NativeTextExtractor

Name	Type	Default value	Description
name	string		Name of this processor, used only for tracing and debugging.
dataModelState	string		Whether this document processor is managed by a data model. Possible values: null, "auto", "customized", "error". • null: this document processor is not related to a data model. • "auto": this document processor was auto-generated by a data model. • "customized": this document processor was auto-generated by a data model and then customized. • "error": there is a conflict between this document processor and the data model.
dataModelClass	string		If dataModelState is "auto" or "customized", contains the name of the DataModelClass that generated this DocumentProcessor.
dataModelProperty	string		If dataModelState is "auto" or "customized", contains the name of the DataModelProperty that generated this DocumentProcessor.
disabled	boolean		Disables this DocumentProcessor.
annotateHTML	boolean		Adds annotations describing HTML structure and styling to DocumentChunks (HTML files only): • html:p for DocumentChunks generated from <p> • html:row for DocumentChunks generated from <tr> • html:column for DocumentChunks generated from <td> or <th> • html:table for DocumentChunks generated from <table> • html:h1 for DocumentChunks generated from <h1> • html:h2 for DocumentChunks generated from <h2> • html:h3 for DocumentChunks generated from <h3> • html:h4 for DocumentChunks generated from <h4> • html:h5 for DocumentChunks generated from <h5> • html:h6 for DocumentChunks generated from <h6> • html:link for DocumentChunks generated from <a>, <iframe> or <frame> ◦ html:link:rel if the link has a "rel" attribute ◦ html:link:name if the link has a "name" attribute • html:list for DocumentChunks generated from <ul>, <ol> or <dl> • html:item for DocumentChunks generated from <li> • html:bold for DocumentChunks generated from <b> or <strong> • html:italic for DocumentChunks generated from <i> or <em> • html:underline for DocumentChunks generated from <u> • html:strike for DocumentChunks generated from <s> or <strike> • html:pre for DocumentChunks generated from <pre> • html:invisible for DocumentChunks containing invisible text (display: none, white on white) • html:class for DocumentChunks inside an element that has a CSS class • html:id for DocumentChunks inside an element that has an id • html:img:src for DocumentChunks created from an <img>. It also creates specific HTML DocumentChunks with the following contexts: • html:lang when parsing an <html> element with a "lang" attribute • html:xml:lang when parsing an <html> element with an "xml:lang" attribute • html:title when parsing a <title> • html:title:other when parsing a second <title> • html:base:href when parsing a <base> • html:link when parsing a <link> with a "src" attribute, annotated by: ◦ html:link:rel if the link has a "rel" attribute ◦ html:link:type if the link has a "type" attribute • html:http-equiv:NAME when parsing an http-equiv <meta> • html:meta:NAME when parsing a <meta> named "NAME"
skipInvisibleHTMLText	boolean		Skips invisible text, for example white text on a white background (HTML files only).
extractJs	boolean		Attempts to parse JavaScript code and extract links from it.
extractHTMLTables	boolean		Adds annotations for <table>, <tr>, <td> and <th> elements.
extractHTMLStyles	boolean		Adds annotations for style attributes.
extractHTMLForms	boolean		Adds annotations for forms and <select> elements.
maxHTMLAnnotationDepth	int	20	Maximum HTML nesting depth for annotations: no new annotations are created for elements nested deeper than this level.
disableAutomaticHTMLDTDFix	boolean		Disables automatic DTD fix on HTML documents.
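To make the annotateHTML mapping and the maxHTMLAnnotationDepth cut-off more concrete, here is a minimal sketch in Java. It is a hypothetical model, not the CloudView API: the class `HtmlAnnotationSketch`, the method `annotationsFor`, and the convention that depth counts enclosing elements from the outermost one are all assumptions made for illustration. Only a subset of the documented tag-to-annotation pairs is included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the CloudView implementation or API.
class HtmlAnnotationSketch {
    // Subset of the tag -> annotation pairs documented for annotateHTML.
    static final Map<String, String> TAG_TO_ANNOTATION = Map.ofEntries(
        Map.entry("p", "html:p"),
        Map.entry("tr", "html:row"),
        Map.entry("td", "html:column"),
        Map.entry("th", "html:column"),
        Map.entry("table", "html:table"),
        Map.entry("h1", "html:h1"),
        Map.entry("h2", "html:h2"),
        Map.entry("a", "html:link"),
        Map.entry("ul", "html:list"),
        Map.entry("ol", "html:list"),
        Map.entry("li", "html:item"),
        Map.entry("b", "html:bold"),
        Map.entry("strong", "html:bold"),
        Map.entry("i", "html:italic"),
        Map.entry("em", "html:italic"),
        Map.entry("pre", "html:pre"));

    // Returns the annotations for a chunk whose enclosing elements are listed
    // outermost first. Assumption: only the first maxDepth levels produce
    // annotations, modelling maxHTMLAnnotationDepth (default 20).
    static List<String> annotationsFor(List<String> openTags, int maxDepth) {
        List<String> result = new ArrayList<>();
        for (int depth = 0; depth < openTags.size() && depth < maxDepth; depth++) {
            String annotation = TAG_TO_ANNOTATION.get(openTags.get(depth).toLowerCase());
            if (annotation != null) {
                result.add(annotation);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> path = List.of("table", "tr", "td", "b");
        System.out.println(annotationsFor(path, 20)); // [html:table, html:row, html:column, html:bold]
        System.out.println(annotationsFor(path, 2));  // [html:table, html:row]
    }
}
```

Elements with no documented annotation (for example <div>) contribute nothing, and lowering the depth limit trims annotations from the innermost elements first.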

Name	Type	Description
fromDataModel	com.exalead.indexing.analysis.v10.DocumentProcessor	If dataModelState is "customized", contains the original document processor generated by the data model. Use it to revert from the "customized" state to the "auto" state.
AcceptCondition	com.exalead.indexing.analysis.v10.AcceptCondition	Condition under which this DocumentProcessor is enabled.
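The relationship between dataModelState and fromDataModel can be sketched as follows. This is a hypothetical model for illustration, not the CloudView API: `DataModelStateSketch`, `ProcessorConfig`, `revertToAuto`, and the `NONE` constant standing for a null dataModelState are all assumed names. The sketch only encodes the documented rule that a "customized" processor keeps the data-model original in fromDataModel, which is what makes reverting possible.

```java
import java.util.Objects;

// Hypothetical illustration only: not the CloudView implementation or API.
class DataModelStateSketch {
    // Documented dataModelState values; a null value is modelled as NONE.
    enum DataModelState {
        NONE, AUTO, CUSTOMIZED, ERROR;

        static DataModelState parse(String value) {
            if (value == null) return NONE;
            switch (value) {
                case "auto": return AUTO;
                case "customized": return CUSTOMIZED;
                case "error": return ERROR;
                default: throw new IllegalArgumentException("Unknown dataModelState: " + value);
            }
        }
    }

    // Minimal stand-in for a document processor configuration.
    static final class ProcessorConfig {
        final String name;
        final DataModelState state;
        final String dataModelClass;          // meaningful only for AUTO or CUSTOMIZED
        final ProcessorConfig fromDataModel;  // set only when CUSTOMIZED

        ProcessorConfig(String name, DataModelState state, String dataModelClass,
                        ProcessorConfig fromDataModel) {
            this.name = name;
            this.state = state;
            this.dataModelClass = dataModelClass;
            this.fromDataModel = fromDataModel;
        }
    }

    // Reverting a customized processor returns the original generated by the data model.
    static ProcessorConfig revertToAuto(ProcessorConfig p) {
        if (p.state != DataModelState.CUSTOMIZED) {
            throw new IllegalStateException("Only a customized processor can be reverted");
        }
        return Objects.requireNonNull(p.fromDataModel, "fromDataModel is missing");
    }

    public static void main(String[] args) {
        ProcessorConfig generated = new ProcessorConfig("extractor", DataModelState.AUTO, "Article", null);
        ProcessorConfig edited = new ProcessorConfig("extractor", DataModelState.CUSTOMIZED, "Article", generated);
        System.out.println(revertToAuto(edited).state); // AUTO
    }
}
```

Throwing on anything other than "customized" reflects that only a customized processor is documented as carrying a fromDataModel original.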