Compound Words Splitter
 
The Compound Words Splitter processor splits compound words written in CamelCase, lowerCamelCase, or underscored_case into separate words.
Example
Input                 Output
SearchServer          Search Server
simpleSearchServer    simple Search Server
simple_value          simple value
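The splitting rule in the table above can be approximated with a regular expression. This is a minimal sketch under stated assumptions, not the processor's actual implementation: the function name `split_compound` is hypothetical, and acronym runs such as "HTTPServer" are deliberately not handled.

```python
import re

# Hypothetical sketch approximating the Compound Words Splitter behavior.
# Not the product's implementation; acronym runs like "HTTPServer" are not split.
# Matches the empty position between a lowercase letter/digit and an uppercase letter.
_CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

def split_compound(word: str) -> str:
    """Split CamelCase, lowerCamelCase and underscored_case words into separate words."""
    words = []
    for chunk in word.split("_"):  # underscored_case: split on underscores
        if chunk:  # skip empty chunks produced by "__" or a leading "_"
            # CamelCase: insert a space at each lowercase-to-uppercase boundary
            words.append(_CASE_BOUNDARY.sub(" ", chunk))
    return " ".join(words)

for w in ["SearchServer", "simpleSearchServer", "simple_value"]:
    print(f"{w} -> {split_compound(w)}")
```

Running the loop prints the three Output values from the table, such as `simpleSearchServer -> simple Search Server`.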
To allow searching for these words individually, you must enable the Tokenize annotations option. This option creates a token for each root word of the compound word. It is required to index the individual values because annotations are not tokenized (the spellchecker behaves the same way).
For example:
Input                 Output tokens
SearchServer          Search, Server
simpleSearchServer    simple, Search, Server
simple_value          simple, value
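To illustrate why tokenization matters, the following sketch builds a toy inverted index in which each root word is a separate entry, so a document can be found by any one of its root words. The function `compound_tokens` and the dictionary-based index are hypothetical illustrations and are not how the product stores annotations.

```python
import re

# Hypothetical sketch: models the effect of "Tokenize annotations", i.e. one
# token per root word of the compound. The toy dict index is an assumption
# for illustration only.
_CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

def compound_tokens(word: str) -> list[str]:
    tokens = []
    for chunk in word.split("_"):
        # Split on case boundaries, then on the inserted spaces
        tokens.extend(_CASE_BOUNDARY.sub(" ", chunk).split())
    return tokens

# Map each root-word token to the set of document ids that contain it.
index: dict[str, set[int]] = {}
for doc_id, value in enumerate(["SearchServer", "simpleSearchServer", "simple_value"]):
    for token in compound_tokens(value):
        index.setdefault(token, set()).add(doc_id)

print(index["Search"])  # {0, 1}
```

Without the tokenization step, each value would be a single entry (for example, "simple Search Server"), and a lookup on "Search" alone would find nothing.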
When to Use
This processor is useful in many situations, including:
- Agglutinated data coming from a database, for example, agglutinated names such as JohnSteed, EmmaPeel, or JohnGambitt.
- Source code, to search for variable, class, and function names. Splitting these compound names into separate words makes searching more convenient, for example, when you want the query "search" to retrieve a document containing SearchServer.
Note: If you need to index "real" compound words that contain no uppercase letters or underscores (for example, wheelchair or editor-in-chief), use standard tokenization instead. For more information, see Customizing the Tokenization Config.
Dependencies
If you want to index not only the exact forms but also the lowercase and normalized forms of the decompounded words, add a Normalizer processor to the analysis pipeline.
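The splitter keeps the original case of each root word (for example, "Search"), so a lowercase query presumably only matches once lowercase forms are indexed as well. The sketch below shows the kind of extra forms a normalization step could add for each word. The actual rules of the Normalizer processor are not described on this page. The hypothetical `index_forms` function uses lowercasing plus accent stripping (Unicode NFKD decomposition, then dropping combining marks) as an assumed stand-in.

```python
import unicodedata

# Hypothetical sketch of the forms a normalization step could add for a word.
# Lowercasing + accent stripping is an assumed stand-in for the real
# Normalizer processor, whose exact rules are not documented here.
def index_forms(token: str) -> set[str]:
    lower = token.lower()
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", lower)
    normalized = "".join(c for c in decomposed if not unicodedata.combining(c))
    return {token, lower, normalized}  # exact, lowercase, normalized forms

print(sorted(index_forms("Search")))      # ['Search', 'search']
print(sorted(index_forms("Caf\u00e9")))   # ['Café', 'cafe', 'café']
```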