Configuring Dictionaries
 
About Dictionaries
Setting Up a Dictionary
Compacting and Building Dictionaries
Clearing Dictionaries
A dictionary is a structure, separate from the index, that stores all the words from the indexed documents, plus their number of occurrences in the corpus. It supports linguistic expansion mechanisms such as spell-checking or regular expression matching.
About Dictionaries
During installation, features requiring a dictionary are set up with the default dictionary, dict0. You can change the configuration of dictionary resources in the default dictionary or create additional dictionaries to suit your needs.
Dictionary Resources
All these resources are already configured for the default dictionary, dict0. Use this list to change default settings or to build new dictionaries.
Words: Stores words and their frequency, which are used to calculate relevance and term expansion. Words that occur fewer times than the specified Min frequency do not appear in the dictionary.
Ngrams: Used to improve spell check accuracy. PREREQUISITE: Select the Extract spell check ngrams option for the semantic types associated with this dictionary.
Phonetic Forms: Used to improve spell check accuracy and to calculate relevance and term expansion. Required for phonetic term expansion. PREREQUISITE: A phonetic semantic processor must be defined in the pipeline, or you must select the Extract phonetic forms option for the semantic types associated with this dictionary.
Related Terms: Required to provide related terms in this language. PREREQUISITE: Define a related terms semantic processor in the pipeline.
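To make the role of the Words and Ngrams resources more concrete, here is a minimal Python sketch of a generic n-gram spell-check technique: candidate words are ranked by how many character trigrams they share with the query, and word frequency breaks ties. This is an illustration only and not CloudView's actual algorithm; the function names `char_ngrams` and `suggest`, the `$` padding and the Jaccard scoring are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 3) -> set:
    """Return the character n-grams of a word, padded with '$' at both ends."""
    padded = "$" + word.lower() + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def suggest(query: str, word_counts: Counter, n: int = 3, limit: int = 3) -> list:
    """Rank dictionary words by n-gram overlap with the query, then by frequency."""
    query_grams = char_ngrams(query, n)
    if not query_grams:
        return []
    scored = []
    for word, count in word_counts.items():
        grams = char_ngrams(word, n)
        # Jaccard similarity between the two n-gram sets.
        overlap = len(query_grams & grams) / len(query_grams | grams)
        if overlap > 0:
            scored.append((overlap, count, word))
    # Best overlap first, then most frequent, then alphabetical for stability.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [word for _, _, word in scored[:limit]]


# Hypothetical content of a Words resource: word -> occurrences in the corpus.
word_counts = Counter({"dictionary": 42, "dictation": 7, "diction": 3, "index": 90})
print(suggest("dictionnary", word_counts))
```

Words that share no trigram with the query, such as "index" here, are never proposed, however frequent they are.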
Multiple Dictionaries
Exalead CloudView supports multiple dictionaries. Each dictionary is configured separately with its own name, maximum size, and so forth.
On the indexing side, you can configure a semantic type to use a specific dictionary. When you then associate a data model property with this semantic type, the generated index field is associated with that dictionary. The dictionary can therefore contain only the words likely to appear in that field.
Symmetrically, each prefix handler at search time can target a specific dictionary (for example, for regexp search).
You can also define filtering rules to control which words are stored in a dictionary, for example to store only words with a minimum number of characters, or only words matching a regular expression.
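The following Python sketch illustrates the idea of routing each field to its own dictionary, with per-dictionary filtering rules. It is a conceptual model only: `DictionarySketch`, `index_field`, the field names and the dictionary name `dict_parts` are hypothetical and do not correspond to CloudView classes or configuration keys.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DictionarySketch:
    """Hypothetical stand-in for a dictionary with its own filtering rules."""
    name: str
    min_chars: int = 1                 # store only words with at least this many characters
    must_match: Optional[str] = None   # store only words fully matching this regex, if set
    counts: Counter = field(default_factory=Counter)

    def add(self, word: str) -> bool:
        """Store the word if it passes the filtering rules; return True if stored."""
        if len(word) < self.min_chars:
            return False
        if self.must_match is not None and not re.fullmatch(self.must_match, word):
            return False
        self.counts[word] += 1
        return True


def index_field(semantic_types: dict, field_name: str, text: str) -> list:
    """Send the words of one field to the dictionary chosen for that field."""
    dictionary = semantic_types[field_name]
    return [word for word in text.split() if dictionary.add(word)]


# Assumed mapping: each index field targets one dictionary.
semantic_types = {
    "title": DictionarySketch("dict0", min_chars=3),
    "part_number": DictionarySketch("dict_parts", must_match=r"[A-Z]{2}-\d{4}"),
}
print(index_field(semantic_types, "title", "an open search engine"))
print(index_field(semantic_types, "part_number", "AB-1234 misc XY-0001"))
```

Keeping part numbers out of the general-purpose dictionary, as in this sketch, is one reason to use several dictionaries: each one stays small and contains only terms relevant to the fields that use it.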
Setting Up a Dictionary
Create a New Dictionary
1. In the Administration Console, go to Index > Linguistics > Dictionaries.
2. Click Add Dictionary.
TIP: For Creation mode, select copy.
To determine which elements you need in this dictionary, see About Dictionaries.
3. Click Apply.
Associate a Dictionary to Metas via Semantic Types
1. In the Administration Console, go to Index > Data Model > Semantic Type.
2. Expand a semantic type, and in the Dictionary field, select the dictionary.
Note: If you do not want to store words in a dictionary, select None.
3. Select the prerequisite options, depending on which elements are in your dictionary. See About Dictionaries.
4. Click Apply.
Associate a Dictionary to Metas via Mappings
1. In the Administration Console, go to Index > Data processing > pipeline name > Mappings.
2. Under the Mapping sources column, expand the meta you want to associate with a dictionary.
3. Under the Mapping targets column, select the dictionary name, and then under the Details column, select the elements where you want this meta to be stored in the dictionary. See About Dictionaries.
4. Repeat for all mappings you want to associate to dictionaries.
5. Click Apply.
Change the Default Dictionary
The first dictionary in your list of dictionaries is the default dictionary. Since a new Exalead CloudView installation only includes one dictionary, dict0, it automatically becomes the default dictionary.
1. To set another default dictionary, use the Default dictionary list under Dictionary.
Set Up a Dictionary Resource
This procedure shows how to set up the Words resource. You can configure other resources similarly.
1. In the Administration Console, go to Index > Linguistics > Dictionaries.
2. Select (or add) your dictionary.
3. Expand Words.
4. Under Actions, click the edit tool next to the language you want to configure.
5. From the Edit language config dialog box, configure:
Max No. terms: Set the maximum number of terms allowed for the selected language.
Min frequency: Set the minimum number of occurrences a word needs in order to be stored in the dictionary for that language.
Regexp filter: Define a pattern of words to exclude from the dictionary for this language.
6. Click Accept.
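As a rough mental model of how the three settings interact, the Python sketch below applies them to a list of tokens for one language. It is an assumption-laden illustration, not the product's implementation: the function `build_words_resource` is hypothetical, the regexp is assumed to match whole words, and when Max No. terms is exceeded the sketch simply keeps the most frequent terms, which may differ from what CloudView does.

```python
import re
from collections import Counter


def build_words_resource(tokens, max_terms, min_frequency, exclude_regexp=None):
    """Return {word: count} after applying the three per-language settings."""
    counts = Counter(tokens)
    exclude = re.compile(exclude_regexp) if exclude_regexp else None
    kept = []
    # most_common() yields words from the most to the least frequent.
    for word, count in counts.most_common():
        if count < min_frequency:
            continue  # Min frequency: too rare to be stored
        if exclude is not None and exclude.fullmatch(word):
            continue  # Regexp filter: word matches the exclusion pattern
        kept.append((word, count))
    # Max No. terms: cap the number of stored terms (assumed: keep the most frequent).
    return dict(kept[:max_terms])


tokens = "search search search index index 2024 2024 2024 typo".split()
print(build_words_resource(tokens, max_terms=2, min_frequency=2, exclude_regexp=r"\d+"))
```

In this example, "typo" is dropped because it occurs only once, and "2024" is dropped by the exclusion pattern even though it is frequent.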
Compacting and Building Dictionaries
Dictionaries have compact and build policies.
Compact policies: Dictionary data is regularly compacted after N import operations and/or N seconds, to keep a single file per resource.
Build policies: Dictionaries are regularly rebuilt after N compact operations and/or N seconds, to stay up to date.
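The "and/or" logic of these policies can be sketched in Python as follows. This is only a conceptual model: `PolicySketch` and its attributes are hypothetical names, and the `disjunctive` flag reflects one reading of how a count threshold and a time threshold might be combined, not a documented CloudView behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicySketch:
    """Hypothetical compact or build policy with a count and/or a time threshold."""
    count_threshold: Optional[int] = None      # N imports (compact) or N compacts (build)
    seconds_threshold: Optional[float] = None  # N seconds since the last operation
    disjunctive: bool = True                   # True: any condition triggers; False: all must hold

    def is_due(self, operations_since_last: int, seconds_since_last: float) -> bool:
        """Tell whether the next compact or build operation should start."""
        checks = []
        if self.count_threshold is not None:
            checks.append(operations_since_last >= self.count_threshold)
        if self.seconds_threshold is not None:
            checks.append(seconds_since_last >= self.seconds_threshold)
        if not checks:
            return False  # no policy configured: never triggered automatically
        return any(checks) if self.disjunctive else all(checks)


# Compact after 5 imports OR after one hour, whichever comes first.
compact_policy = PolicySketch(count_threshold=5, seconds_threshold=3600)
print(compact_policy.is_due(operations_since_last=2, seconds_since_last=4000))
```

The same structure applies to build policies, with compact operations counted instead of import operations.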
The following procedures explain how to configure compact and build operations.
Compact Individual Dictionaries
1. In the Administration Console, go to Index > Linguistics > Dictionaries > Dictionary > dictn > Configuration.
2. Select Enable compact and specify the compact policy.
Choose to compact when N import streams have been done since the last compact operation.
Choose to compact every N seconds.
3. Click Save and Apply.
Fine-Tune the Compact Size
1. Edit the <DATADIR>/config/Dictionary.xml file.
2. Add a FrequencyCompactFilter to the CompactPolicies node, as shown in the following example.
<dict:CompactPolicies disjunctives="true">
  <dict:ImportCountCompactPolicy countThreshold="1"/>
  <dict:FrequencyCompactFilter lang="fr" minFrequency="10"/>
</dict:CompactPolicies>
In this example, all French terms with fewer than 10 occurrences are removed from the compact file.
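The effect of such a filter can be pictured with the Python sketch below, which merges the term counts of several imports into a single table and then drops rare terms for the filtered language. The `compact` function and its data layout are hypothetical, and the sketch assumes that the threshold applies to the merged totals, which the XML example alone does not confirm.

```python
from collections import Counter


def compact(import_files, min_frequency_by_lang):
    """Merge per-import counts into one table, then drop rare terms per language.

    import_files: list of {(lang, term): count} dicts, one per import.
    min_frequency_by_lang: {lang: minimum total occurrences to keep a term}.
    """
    merged = Counter()
    for file_counts in import_files:
        merged.update(file_counts)  # adds counts for identical (lang, term) keys
    return {
        (lang, term): count
        for (lang, term), count in merged.items()
        # Languages without a filter keep all their terms (threshold 0).
        if count >= min_frequency_by_lang.get(lang, 0)
    }


imports = [
    {("fr", "maison"): 6, ("fr", "zzz"): 2, ("en", "house"): 1},
    {("fr", "maison"): 5, ("fr", "zzz"): 3},
]
# Mirrors lang="fr" minFrequency="10" from the example above.
print(compact(imports, {"fr": 10}))
```

Here the French term "zzz" totals 5 occurrences and is removed, while the English term "house" is kept because no filter is defined for English.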
Build Individual Dictionaries
1. In the Administration Console, go to Index > Linguistics > Dictionaries > Dictionary > dictn > Configuration.
2. Select Enable build and specify the build policy.
Choose to build when N compact operations have been done since the last build operation.
Choose to build every N seconds.
3. Click Save and Apply.
Force a Compact and a Build Operation
Sometimes you do not want to wait for the compact or build policy to trigger an operation, and prefer to start both at once.
1. In the Administration Console, go to Index > Linguistics > Dictionaries.
2. In the Dictionary status panel, click Compact & Build for the dictionary you want to compact and build immediately.
3. Click Save and Apply.
Clearing Dictionaries
You sometimes need to clear your dictionaries after you have edited:
The configuration of dictionary resources in Linguistics > Dictionaries > dictN > Resources, for example, related terms or ngrams parameters.
The tokenization configuration associated with dictionary features in Linguistics > Dictionaries > dictN > Features.
Recommendation: Also clear your dictionaries when documents have been deleted from your corpus, to keep the dictionaries reliable.
Clear Individual Dictionaries
1. In the Administration Console, go to Index > Linguistics > Dictionaries.
2. From the Dictionary status section, click Clear for the dictionary you want to clear out.
Clear All Dictionaries
1. In the Administration Console, go to the Home page.
2. From the Indexing section, click Clear.
3. Select Dictionary data for ALL build groups and click Clear.
Clear All Dictionaries (Alternative Procedure)
1. Go to Index > Linguistics > Dictionaries.
2. Click Clear all dictionaries.