XpathExtractor
com.exalead.indexing.analysis.v10.XpathExtractor
Extraction is performed for the following data types:
text/html. HTML Files.
application/xml. XML Files.
Warning: Place this extractor before the NativeTextExtractor, because the NativeTextExtractor deletes the 'bytes' of each Document Binary Part.
Limitations: This extractor only handles XPath expressions that return a node set or a string. Expressions that return a number or a boolean are not supported. Number and boolean functions can still be used inside an expression: //img[starts-with(@src, "http://")] works because it returns a set of nodes (<img>), but count(//img) does not work because it returns a number.
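The difference between the two expressions in the limitation above can be reproduced with the JDK's standard javax.xml.xpath API. This is only an illustrative sketch of XPath result types: it is not the XpathExtractor implementation, and the class name XpathResultTypes, its helper methods, and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only to illustrate XPath result types.
public class XpathResultTypes {

    // Parse a well-formed XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate the expression as a node set and return how many nodes it selects.
    static int selectCount(String xml, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
                (NodeList) xpath.evaluate(expr, parse(xml), XPathConstants.NODESET);
        return nodes.getLength();
    }

    // Return true if the expression can be evaluated as a node set, and false
    // if evaluation fails (for example because the expression yields a number).
    static boolean returnsNodeSet(String xml, String expr) throws Exception {
        try {
            selectCount(xml, expr);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample markup (an assumption made for this sketch).
        String xml = "<html><body>"
                + "<img src=\"http://example.com/a.png\"/>"
                + "<img src=\"local.png\"/>"
                + "</body></html>";

        // starts-with() returns a boolean, but it is used inside a predicate,
        // so the expression as a whole still returns a node set.
        String nodeSetExpr = "//img[starts-with(@src, 'http://')]";
        System.out.println(nodeSetExpr + " selects "
                + selectCount(xml, nodeSetExpr) + " node(s)");

        // count() makes the whole expression return a number, not a node set.
        String numberExpr = "count(//img)";
        System.out.println(numberExpr + " returns a node set: "
                + returnsNodeSet(xml, numberExpr));
    }
}
```

A quick way to check a candidate expression is to ask what type the outermost operation returns: a location path such as //img[...] yields nodes, whereas an outermost count(), sum() or boolean() call does not.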
Parent elements:
com.exalead.indexing.analysis.v10.AnalysisPipeline (as AnalysisPipeline)
com.exalead.indexing.analysis.v10.DocumentProcessorGroup (as DocumentProcessorGroup)
Attributes:
Name | Type | Default value | Description
name | string | | Name of this processor. The name of a processor is used only for tracing and debugging purposes.
dataModelState | string | | Is this document processor managed by a data model? Allowed values: null, auto, customized, error. If null, this document processor is not related to a data model. If "auto", this document processor is auto-generated by a data model. If "customized", this document processor was auto-generated by a data model and then customized. If "error", there is a conflict between this document processor and the data model.
dataModelClass | string | | If dataModelState is either "auto" or "customized", you will find here the name of the DataModelClass that generated this DocumentProcessor.
dataModelProperty | string | | If dataModelState is either "auto" or "customized", you will find here the name of the DataModelProperty that generated this DocumentProcessor.
disabled | boolean | | Disable the DocumentProcessor.
htmlParserToUse | enum(htmlCleaner, tagSoup) | htmlCleaner | HTML parser to use in priority.
Nested elements:
Name | Type | Description
fromDataModel | com.exalead.indexing.analysis.v10.DocumentProcessor | If dataModelState is "customized", you will find here the original document processor generated by the data model. Use this to easily revert to "auto" state from "customized".
AcceptCondition | com.exalead.indexing.analysis.v10.AcceptCondition | Expresses the enablement condition of this DocumentProcessor.
XpathRule | com.exalead.indexing.analysis.v10.XpathRule* |