More About Semantic Analysis

To return relevant search results when the user's query is incomplete, misspelled, or imprecise, Exalead CloudView performs a semantic analysis of both the documents and the queries themselves. This enables word matching operators and fuzzy matching options, such as:

• Did you mean? Spell check: "exalaed" prompts "Did you mean: exalead?"

• Approximation: "exalaed" matches "exalead"

• Phonetic spelling: "exaleed" matches "exalead"

• Word truncations: "exal*" matches "Exalead", "exalid" and "exalted"

• Regular expressions: "/exa.ead/" matches "exalead" and "exahead"
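The truncation and regular-expression operators above can be illustrated with a minimal Python sketch. It is not CloudView code: the vocabulary list and the helper names match_truncation and match_regex are hypothetical, and the matching is done with the Python standard library rather than with the CloudView index.

```python
import fnmatch
import re

# Hypothetical vocabulary standing in for the words of an index.
VOCABULARY = ["Exalead", "exalid", "exalted", "exahead"]


def match_truncation(pattern, words):
    """Return the words matching a wildcard pattern such as 'exal*' (case-insensitive)."""
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pattern.lower())]


def match_regex(pattern, words):
    """Return the words that match the whole regular expression (case-insensitive)."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if compiled.fullmatch(w)]


if __name__ == "__main__":
    print(match_truncation("exal*", VOCABULARY))
    print(match_regex("exa.ead", VOCABULARY))
```

With this vocabulary, the truncation pattern keeps every word that starts with "exal", while the regular expression keeps only the seven-letter words that differ from "exalead" in the fourth character.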

Depending on the semantic feature, the analysis takes place either at index time or at search time.

• Index-time: analyzes documents before indexing, using semantic processors. Whenever you modify semantic processors, you must reindex your documents before the change appears in your application.

• Search-time: analyzes the user’s search request. This process, known as Query Expansion, adds search terms to the user’s original query. For example, if phonetic query expansion is enabled, the query "exaleed" is expanded to "exaleed" OR "exalead".
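As a rough illustration of phonetic query expansion, the following Python sketch expands a query term with dictionary words that share the same phonetic key. The key used here (each run of vowels collapsed into a single symbol) is a toy assumption chosen for this example, not the phonetizer that CloudView uses, and the function names and dictionary are hypothetical.

```python
import re


def phonetic_key(word):
    """Toy phonetic key (an assumption): lowercase, collapse each vowel run into 'A'."""
    return re.sub(r"[aeiouy]+", "A", word.lower())


def expand_query(term, dictionary):
    """OR the original term with the dictionary words that share its phonetic key."""
    key = phonetic_key(term)
    variants = [w for w in dictionary if w != term and phonetic_key(w) == key]
    return " OR ".join('"{}"'.format(t) for t in [term] + variants)


if __name__ == "__main__":
    # Hypothetical dictionary built at index time.
    dictionary = ["exalead", "exalted", "search"]
    print(expand_query("exaleed", dictionary))
```

In this sketch "exaleed" and "exalead" share the key "AxAlAd", so the query is expanded to "exaleed" OR "exalead", while "exalted" is left out because its key differs.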

This section explains how to perform index-time semantic analysis by configuring semantic processors.

For information on search-time semantic analysis, see Configuring Query Expansion.

Begin your semantic configuration using the semantic types delivered in the default Data model. With semantic types, you can configure index-time options such as:

• language detection

• basic indexing form or kind (normalized, exact, or lowercase)

• extractions of phonetized forms and spell-check ngrams

These are examples of the basic building blocks of semantic analysis, which allow you to set up more advanced semantics, for example with the Rules Matcher processor or the Semantic Extractor processor.

For more information on semantic types, see Indexing Options for Alphanumeric Properties.

A semantic processor adds semantic information to text during analysis. This information takes the form of annotations that you can map to fields and categories (index-time facets) in the index.

The annotations are named based on the type of semantic processor and its configuration.

Note: Because this analysis occurs at index-time, you must reindex your documents after enabling or modifying these features.

These are the main semantic processors available in Exalead CloudView:

• Related terms flag related concepts in your corpus. Related terms typically display as a navigation facet in your search application.

• Named entities flag people, places, organizations, or events in your corpus. Named entities typically display as a navigation facet in your search application.

• Phonetizers create a phonetic version of each word in your corpus and store it in the dictionary. Phonetic processing significantly improves the effectiveness of spell check and enables phonetic search (soundslike: exaleed). This processing is language-dependent.

• Rules-based matching and annotations are provided through semantic processors such as the Rules Matcher, Fast Rules, the Ontology Matcher, and Semantic Extraction.

• Ngram Extractors calculate the probability of word or phrase occurrences within the corpus. This significantly improves the effectiveness of spell check at search-time.

• For a detailed reference of the processor parameters, see the semantic processor descriptions in the "Search" section of the CloudView XML Configuration Reference Guide.

• For a detailed reference of the format of a semantic processor’s resource file, see Appendix - Semantic Resources Reference.
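To make the idea behind the Ngram Extractors listed above more concrete, here is a minimal Python sketch that estimates the probability of words (1-grams) and two-word phrases (2-grams) from their relative frequency in a small corpus. The corpus, the whitespace tokenization, and the function name ngram_probabilities are assumptions for illustration only; they do not describe how CloudView computes or stores its ngrams.

```python
from collections import Counter


def ngram_probabilities(documents, n):
    """Estimate each word n-gram's probability as its relative frequency in the corpus."""
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical three-document corpus.
    corpus = [
        "exalead cloudview search",
        "exalead cloudview index",
        "exalead search",
    ]
    print(ngram_probabilities(corpus, 1))
    print(ngram_probabilities(corpus, 2))
```

A spell checker can use such probabilities to rank candidate corrections: among several words close to a misspelled term, the one that occurs most often in the corpus, alone or next to the neighboring query words, is the most plausible suggestion.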