Adding Related Terms
 
About Related Terms
Configure Related Terms and Similar Documents Detection
You can configure Related Terms to offer users related query terms that might be relevant to their original queries.
About Related Terms
Related terms are a list of nouns or adjectives separated by link words, and shared by at least N documents of your corpus. The link words are identified by an internal, language-specific resource file that you cannot edit.
Related terms are flagged at index time as semantic annotations, based on the configuration of the Related Terms Extractor semantic processor.
Note: You can also add text directly to the dictionary using the dedicated annotation relatedTermCustom when defining annotations (Kind or Name field).
For related terms to display on the Refinements panel at search time, they must meet the following criteria:
Not be shared by more than X% of your hits (X=25 by default).
Be in at least Y hits (Y=3 by default).
Have a corpus frequency of at least Z (Z=0 by default).
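The three display criteria above can be sketched as a simple filter. This is an illustration only, not CloudView code: the function name, the parameter names, and the input shape (per-term hit counts and corpus frequencies) are assumptions. Only the default values X=25, Y=3, and Z=0 come from the text.

```python
def displayable_related_terms(term_stats, total_hits,
                              max_hit_share_pct=25, min_hits=3,
                              min_corpus_freq=0):
    """Return the terms that pass the three display criteria.

    term_stats maps each term to a (hits_with_term, corpus_frequency) pair.
    All names here are hypothetical; only the defaults come from the text.
    """
    selected = []
    for term, (hits_with_term, corpus_freq) in term_stats.items():
        share_pct = 100.0 * hits_with_term / total_hits if total_hits else 0.0
        if share_pct > max_hit_share_pct:   # shared by more than X% of hits
            continue
        if hits_with_term < min_hits:       # present in fewer than Y hits
            continue
        if corpus_freq < min_corpus_freq:   # corpus frequency below Z
            continue
        selected.append(term)
    return selected


# Made-up statistics for a result set of 100 hits.
stats = {
    "search engine": (10, 500),   # 10% of hits, 10 hits: kept
    "document": (60, 9000),       # 60% of hits: too common
    "rare phrase": (2, 4),        # only 2 hits: too rare
}
print(displayable_related_terms(stats, total_hits=100))
```

With these made-up statistics, only "search engine" passes all three checks.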
Force or Prevent Related Terms with Allow Lists and Block Lists
Allow list: An indexing-time instruction that ensures that whenever the specified expression is detected in the text, it is sent to the dictionary as a possible related term.
Note: This does NOT bypass the selection criteria for displaying related terms in the Refinements panel. The related terms that appear are determined according to their relevance to the search results. You cannot force a certain related term at search time.
Block list: A search-time filter that blocks the specified expression from displaying as a related term in the Refinements panel. There is little to no performance impact, and no need to reindex.
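The difference between the two lists can be pictured as two separate stages. The sketch below is a simplified illustration, not CloudView's implementation: the function names are hypothetical, plain substring matching is an assumption, and the selection criteria applied between the two stages are left out.

```python
def index_time_candidates(extracted_terms, text, allow_list):
    """Index time: extracted terms plus allow-listed expressions found in text."""
    candidates = list(extracted_terms)
    for expression in allow_list:
        # Plain substring matching is an assumption made for this sketch.
        if expression in text and expression not in candidates:
            candidates.append(expression)
    return candidates


def search_time_display(selected_terms, block_list):
    """Search time: drop block-listed terms from those already selected."""
    blocked = set(block_list)
    return [term for term in selected_terms if term not in blocked]


text = "The enterprise search platform indexes structured data."
candidates = index_time_candidates(["structured data"], text,
                                   allow_list=["enterprise search", "big data"])
print(candidates)
print(search_time_display(candidates, block_list=["structured data"]))
```

Because the block list only filters what is displayed, changing it in this sketch never touches the index-time step, which mirrors why no reindexing is needed.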
Configure the Detection of Similar Documents Based on Related Terms
Similar documents search means that for each hit, Exalead CloudView generates a new query that retrieves all indexed documents deemed close enough to it.
Depending on the frequency of each related term in your corpus, the number of related terms in your documents, and the conditions specified in the Similarity configuration parameters, the prefix handler generates:
A new dynamic virtual field that specifies the similarity formula.
A new dynamic sort on this virtual field, to display more similar documents first.
UQL Query
For example, a similar documents query looks like this:
similar:(264 328 579 628 730 782 806 847 853 871 955 1064 1071 1073 1074 1134 1137 1177 1194 1270 1285 1362
1390 1391 1474 1537 1539 1560 1579 1585)
ELLQL Query
This generates the following ELLQL query, which defines the similarity virtual field and assigns a weight to each keyword lookup according to its corpus frequency.
#query{nbdocs=934, score.expr="@term.score * @proximity + @b",
similarity.expr="(#length(keyword) >= 5) * ((score >= 332925) * score / #sqrt(30* #length(keyword)))
/ 0.083231266666667",
proximity.maxDistance=1000,
term.score=RANK_TFIDF}(#or(#num{b=53646}(keyword,==,264)
#num{b=70688}(keyword,==,328) #num{b=75574}(keyword,==,579)
#num{b=74506}(keyword,==,628) #num{b=34317}(keyword,==,730)
#num{b=40264}(keyword,==,782) #num{b=107885}(keyword,==,806)
#num{b=143583}(keyword,==,847) #num{b=76695}(keyword,==,853)
#num{b=80417}(keyword,==,871) #num{b=60146}(keyword,==,955)
#num{b=88194}(keyword,==,1064) #num{b=61715}(keyword,==,1071)
#num{b=30021}(keyword,==,1073) #num{b=46950}(keyword,==,1074)
#num{b=61715}(keyword,==,1134) #num{b=143583}(keyword,==,1137)
#num{b=96514}(keyword,==,1177) #num{b=90061}(keyword,==,1194)
#num{b=51783}(keyword,==,1270) #num{b=130086}(keyword,==,1285)
#num{b=90061}(keyword,==,1362) #num{b=83255}(keyword,==,1390)
#num{b=161950}(keyword,==,1391) #num{b=161950}(keyword,==,1474)
#num{b=43783}(keyword,==,1537) #num{b=92059}(keyword,==,1539)
#num{b=76695}(keyword,==,1560) #num{b=99010}(keyword,==,1579)
#num{b=69832}(keyword,==,1585) ))
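To make the arithmetic of the similarity.expr above easier to follow, here is a literal transcription for a single candidate hit. It is a reading aid only: the function and parameter names are hypothetical, the constants are copied from this one example, and treating a comparison as 1 when true and 0 when false is an assumption about how the expression is evaluated.

```python
import math


def similarity(score, doc_keyword_count,
               query_keyword_count=30, min_doc_keywords=5,
               min_score=332925, divisor=0.083231266666667):
    """Mirror the example similarity.expr for one candidate hit.

    score stands for the hit's score and doc_keyword_count for
    #length(keyword). A failed comparison zeroes the result (assumed
    0/1 semantics for comparisons).
    """
    if doc_keyword_count < min_doc_keywords:   # (#length(keyword) >= 5)
        return 0.0
    if score < min_score:                      # (score >= 332925)
        return 0.0
    # score / #sqrt(30 * #length(keyword)), then divided by the constant
    normalized = score / math.sqrt(query_keyword_count * doc_keyword_count)
    return normalized / divisor


print(similarity(score=100000, doc_keyword_count=12))   # score below threshold
print(similarity(score=400000, doc_keyword_count=3))    # too few keywords
```

Both calls return 0.0, because each one fails one of the two gating comparisons.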
Configure Related Terms and Similar Documents Detection
If you selected the option Enable related terms during setup, the related terms feature is already set up. You can still customize the default values or add block lists and allow lists.
If you did not select the option Enable related terms during setup, see the procedure below to enable related terms first.
Find Out Which Languages Support Related Terms
1. In the Administration Console, go to Index > Linguistics > Dictionaries > dictionary_name > Related Terms.
Alternatively, check your <DATADIR>/config/dictionary.xml file.
Enable Related Terms
1. In the Administration Console, select Data Model > Semantic Types > text.
2. Select the Extract related terms check box. A default semantic processor (RelatedTerms.default) is added to your analysis pipeline and a facet (rt_keyword) is added to your search logic.
3. To allow the display of related terms in the Refinements panel of your search application, select Search > Search Logics > Your_Search_Logic > Facets.
4. Under Related terms (at the bottom), select Enable.
Related terms are enabled. See the procedure below to customize default values or add block lists and allow lists.
Configure Related Terms
1. In the Administration Console, select Search > Search Logics > Your_Search_Logic > Facets.
2. In the Related terms section, configure the following options:
Parameter: Description
Dictionary: Specifies the dictionary to use.
Value field indexing RT: Index field in which related terms are indexed (named keyword by default).
Block list: Blocks the specified expression from displaying as a related term in the Refinements panel. See Set Up Related Terms Block Lists. You can also set up related terms allow lists. See Set Up Related Terms Allow Lists.
Maximum number of RT: Maximum number of related terms to compute for a query.
Minimum frequency for a RT: Minimum number of occurrences in the whole index for a term to be eligible for synthesis.
Result-set Low-pass filter: Filters out terms whose frequency in the result set exceeds this threshold (value between 0 and 1).
Corpus Low-pass filter: Filters out terms whose frequency in the whole index exceeds this threshold (value between 0 and 1).
3. Optionally, for large corpora, you can improve the quality and performance of the Related Terms calculation by tuning two parameters in the Search Logic XML configuration.
a. Open the API Console and click Manage.
b. Search for SetSearchLogicList, go to the <ns:#RelatedTermsSynthesisConfig> node, and configure its parameters.
Note: For information about these parameters, see "RelatedTermsSynthesisConfig" in the Exalead CloudView XML Configuration Reference Guide.
4. Click Apply.
Set Up Related Terms Block Lists
1. In the Administration Console, go to Index > Data processing > Pipeline name > Semantic Processors.
2. Under Block list, specify your resource file.
If you have already created a resource file, click Browse to select the resource file. If you have created a resource file using cvadmin, type the path to the resource file using the format resourcemanager://group_name/resource_name.
Alternatively, create a new resource: click Create new, specify a name for the list, and click Accept. This adds a resource file with the same name to the Administration Console's Resource Manager, to ensure correct deployment of interdependent resource files in multihost environments.
To define the contents of the resource file, click Edit. This takes you to the Business Console. For more information, see "Add a related terms block list or allow list" in the Exalead CloudView Business Console User's Guide.
3. Click Apply.
Set Up Related Terms Allow Lists
1. In the Administration Console, go to Index > Data processing > Pipeline name > Semantic Processors.
2. Under Allow list, specify your resource file.
If you have already created a resource file, click Browse to select the resource file. If you have created a resource file using cvadmin, type the path to the resource file using the format resourcemanager://group_name/resource_name.
Alternatively, create a new resource: click Create new, specify a name for the list, and click Accept. This adds a resource file with the same name to the Administration Console's Resource Manager, to ensure correct deployment of interdependent resource files in multihost environments.
To define the contents of the resource file, click Edit. This takes you to the Business Console. For more information, see "Add a related terms block list or allow list" in the Exalead CloudView Business Console User's Guide.
3. Click Apply.
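Both procedures above accept a path in the format resourcemanager://group_name/resource_name for resource files created with cvadmin. As a minimal sketch, a small helper can assemble such a path consistently. The helper name, the example group and resource names, and the validation rules are assumptions for illustration; only the path format comes from the text.

```python
def resource_path(group_name, resource_name):
    """Build a path in the resourcemanager://group_name/resource_name format."""
    for label, value in (("group_name", group_name),
                         ("resource_name", resource_name)):
        # Rejecting empty names and slashes is an assumption of this sketch,
        # made so that the path always has exactly two segments.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"resourcemanager://{group_name}/{resource_name}"


# Hypothetical group and resource names.
print(resource_path("my_group", "rt_blocklist"))
```

The resulting string is what you would type in the Block list or Allow list field instead of using Browse.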
Enable the Detection of Similar Documents
The similarity of two documents is based on related terms. You must set up related terms before you can use similar documents. See Enable Related Terms.
1. In the Administration Console, go to the Search Logics > Hit Content tab.
2. In the Similarity section, select Enable.
3. Change the configuration parameters if required.
Parameter: Description
Prefix handler name: Specifies the prefix handler that must be entered in the search box before the query.
Min. shared keywords: Returns only documents that share at least the specified number of keywords with the reference document.
Min. keywords per document: Returns only similar documents that have at least the specified number of keywords.
Min. similarity threshold: Specifies the minimum similarity score for two documents to be considered similar. The value must be between 0 and 1, where 1 means an exact match.
Language constraint: Restricts similar document detection to documents in the same language.
4. Click Apply.
5. Start a query in the Mashup UI.
You can view similar documents by clicking the Similar results link (on the lower right).