LanguageDetector
com.exalead.indexing.analysis.v10.LanguageDetector
Language detection runs on the text of all DocumentChunks that are associated with the specified input ContextNames and whose language has not already been detected or specified. A statistical algorithm takes the whole text of these DocumentChunks into account to detect the language, which is then set as the language of all the specified chunks. The language attribute of a DocumentChunk is used, for example, by semantic processing. A language is represented by its ISO 639-1 code, for example fr or en.
Parent elements:
com.exalead.indexing.analysis.v10.AnalysisPipeline (as AnalysisPipeline)
com.exalead.indexing.analysis.v10.DocumentProcessorGroup (as DocumentProcessorGroup)
Attributes:
| Name | Type | Default value | Description |
| --- | --- | --- | --- |
| name | string | | Name of this processor. It is used only for tracing and debugging purposes. |
| dataModelState | string | | Indicates whether this document processor is managed by a data model. Allowed values: null, auto, customized, error. If null, this document processor is not related to a data model. If "auto", it was auto-generated by a data model. If "customized", it was auto-generated by a data model and then customized. If "error", there is a conflict between this document processor and the data model. |
| dataModelClass | string | | If dataModelState is "auto" or "customized", the name of the DataModelClass that generated this DocumentProcessor. |
| dataModelProperty | string | | If dataModelState is "auto" or "customized", the name of the DataModelProperty that generated this DocumentProcessor. |
| disabled | boolean | | Disables the DocumentProcessor. |
| languageContext | string | | If not null, and if a DocumentChunk has a ContextName matching 'languageContext', no automatic detection is performed and the language specified by that DocumentChunk is used as the language of the DocumentChunks associated with the input ContextNames. |
| languagesToDetect | string | | If not null, restricts the language detector to a set of languages. If you only have a small set of languages to detect, restricting the detector to this set improves precision. The list is comma-separated, for example "en,fr". |
| defaultLanguage | string | | If not null, 'defaultLanguage' is used as the language when automatic detection fails. |
| exclude | boolean | | If true, 'inputContexts' is an exclude list instead of an include list. Language detection is then performed on all DocumentChunks except those whose ContextName appears in 'inputContexts'. |
| outputContext | string | | ContextName of the DocumentChunk to create. It contains the language detected in the processed DocumentChunks, as an ISO 639-1 code. |
| minLangPercentage | int | 33 | Minimum share of a language, as a percentage from 0 to 100, for it to be detected. A value of 0 always keeps a detected language. |
| languagesToKeep | int | | Keeps the n most represented languages in the document. A value of 0 lets minLangPercentage select the languages. |
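As a rough illustration of how the attributes listed above combine, the fragment below restricts detection to English and French and falls back to English when automatic detection fails. The attribute names and the value 33 come from this reference. The processor name is made up, and the namespace declarations your configuration files require are omitted, so treat this as a sketch and not as a complete, validated element.

```xml
<!-- Sketch only: namespace declarations are omitted and the name value is hypothetical. -->
<LanguageDetector
    name="detect-language"
    languagesToDetect="en,fr"
    defaultLanguage="en"
    minLangPercentage="33"
    languagesToKeep="0"/>
```

With languagesToKeep set to 0, the choice of languages is left to minLangPercentage, as described for that attribute.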
Nested elements:
| Name | Type | Description |
| --- | --- | --- |
| inputContexts | exa.bee.StringValue* | The processor is applied only to DocumentChunks whose ContextName appears in this list. If the 'exclude' attribute is true, it is applied to all DocumentChunks except those. |
| fromDataModel | com.exalead.indexing.analysis.v10.DocumentProcessor | If dataModelState is "customized", the original document processor generated by the data model. Use it to revert easily from the "customized" state to the "auto" state. |
| AcceptCondition | com.exalead.indexing.analysis.v10.AcceptCondition | Condition under which this DocumentProcessor is enabled. |
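The include and exclude behaviour of inputContexts can be sketched as follows. The element names LanguageDetector, inputContexts and StringValue and the attributes exclude and outputContext come from this reference. The `value` attribute on StringValue, the context names, and the omission of namespace declarations are assumptions made for illustration only; check them against your own configuration files.

```xml
<!-- Sketch only: namespaces omitted; the "value" attribute and all context names are assumptions. -->
<LanguageDetector name="detect-language" exclude="true" outputContext="language">
  <inputContexts>
    <!-- With exclude="true", chunks with these ContextNames are skipped. -->
    <StringValue value="title"/>
    <StringValue value="metadata"/>
  </inputContexts>
</LanguageDetector>
```

With exclude="true", detection would run on every DocumentChunk except those whose ContextName is listed. Removing the exclude attribute, or setting it to false, would turn the same list into an include list.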