Name | Type | Default value | Description |
---|---|---|---|
name | string | | Name of this processor. The name is used only for tracing and debugging purposes. |
dataModelState | string | | Indicates whether this document processor is managed by a data model. Allowed values: null, "auto", "customized", "error". • If null, this document processor is not related to a data model. • If "auto", this document processor is auto-generated by a data model. • If "customized", this document processor was auto-generated by a data model and then customized. • If "error", there is a conflict between this document processor and the data model. |
dataModelClass | string | | If dataModelState is "auto" or "customized", the name of the DataModelClass that generated this DocumentProcessor. |
dataModelProperty | string | | If dataModelState is "auto" or "customized", the name of the DataModelProperty that generated this DocumentProcessor. |
disabled | boolean | | Disables the DocumentProcessor. |
annotateHTML | boolean | | Adds structural and style annotations to DocumentChunks (HTML files only): • html:p for DocumentChunks generated from <p> • html:row for DocumentChunks generated from <tr> • html:column for DocumentChunks generated from <td> or <th> • html:table for DocumentChunks generated from <table> • html:h1 for DocumentChunks generated from <h1> • html:h2 for DocumentChunks generated from <h2> • html:h3 for DocumentChunks generated from <h3> • html:h4 for DocumentChunks generated from <h4> • html:h5 for DocumentChunks generated from <h5> • html:h6 for DocumentChunks generated from <h6> • html:link for DocumentChunks generated from <a>, <iframe> or <frame> ◦ html:link:rel if the link has a "rel" attribute ◦ html:link:name if the link has a "name" attribute • html:list for DocumentChunks generated from <ul>, <ol> or <dl> • html:item for DocumentChunks generated from <li> • html:bold for DocumentChunks generated from <b> or <strong> • html:italic for DocumentChunks generated from <i> or <em> • html:underline for DocumentChunks generated from <u> • html:strike for DocumentChunks generated from <s> or <strike> • html:pre for DocumentChunks generated from <pre> • html:invisible for DocumentChunks containing invisible text (display: none, white on white) • html:class for DocumentChunks that belong to a CSS class • html:id for DocumentChunks that belong to a CSS id • html:img:src for DocumentChunks created from an <img> It also creates specific HTML DocumentChunks with the following contexts: • html:lang when parsing an <html> containing the "lang" attribute • html:xml:lang when parsing an <html> containing the "xml:lang" attribute • html:title when parsing a <title> • html:title:other when parsing a second <title> • html:base:href when parsing a <base> • html:link when parsing a <link> containing the "src" attribute and annotated by: ◦ html:link:rel if the link has a "rel" attribute ◦ html:link:type if the link has a "type" attribute • html:http-equiv:NAME when parsing an http-equiv meta • html:meta:NAME when parsing a meta named "NAME" |
skipInvisibleHTMLText | boolean | | Skips invisible text, for example white text on a white background (HTML files only). |
extractJs | boolean | | Tries to parse JavaScript and extract links from it. |
extractHTMLTables | boolean | | Adds annotations on table, tr, td, and th elements. |
extractHTMLStyles | boolean | | Adds annotations on style attributes. |
extractHTMLForms | boolean | | Adds annotations on form and select elements. |
maxHTMLAnnotationDepth | int | 20 | Prevents new annotations from being created beyond maxHTMLAnnotationDepth levels of HTML nesting. |
disableAutomaticHTMLDTDFix | boolean | | Disables automatic DTD fix on HTML documents. |

Name | Type | Description |
---|---|---|
fromDataModel | com.exalead.indexing.analysis.v10.DocumentProcessor | If dataModelState is "customized", contains the original document processor generated by the data model. Use it to revert from the "customized" state to "auto". |
AcceptCondition | com.exalead.indexing.analysis.v10.AcceptCondition | Specifies the condition under which this DocumentProcessor is enabled. |
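
The tag-to-annotation mapping listed under annotateHTML, combined with the maxHTMLAnnotationDepth limit, can be illustrated with a small standalone sketch. This is not the product's implementation: it uses Python's standard `html.parser`, and the names `TAG_ANNOTATIONS`, `VOID_TAGS`, `AnnotationSketch`, and `annotate` are hypothetical. It also assumes that "HTML level" means element nesting depth and that elements nested deeper than the limit simply receive no annotation. Only a subset of the annotations from the table is covered.

```python
from html.parser import HTMLParser

# Subset of the tag-to-annotation mapping from the annotateHTML description.
TAG_ANNOTATIONS = {
    "p": "html:p", "table": "html:table", "tr": "html:row",
    "td": "html:column", "th": "html:column",
    "h1": "html:h1", "h2": "html:h2", "h3": "html:h3",
    "h4": "html:h4", "h5": "html:h5", "h6": "html:h6",
    "a": "html:link", "iframe": "html:link",
    "ul": "html:list", "ol": "html:list", "dl": "html:list",
    "li": "html:item",
    "b": "html:bold", "strong": "html:bold",
    "i": "html:italic", "em": "html:italic",
    "u": "html:underline", "s": "html:strike", "strike": "html:strike",
    "pre": "html:pre",
}

# Elements without a closing tag; the sketch ignores them so the stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "base", "frame"}


class AnnotationSketch(HTMLParser):
    """Collects (text, annotations) pairs for each non-empty text node."""

    def __init__(self, max_depth=20):
        super().__init__()
        self.max_depth = max_depth  # assumed meaning of maxHTMLAnnotationDepth
        self.stack = []             # (tag, annotation or None) per open element
        self.chunks = []            # (text, [annotations]) in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        depth = len(self.stack) + 1
        # Assumption: no new annotation is created deeper than max_depth.
        annotation = TAG_ANNOTATIONS.get(tag) if depth <= self.max_depth else None
        self.stack.append((tag, annotation))

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything left open inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((text, [a for _, a in self.stack if a]))


def annotate(html, max_depth=20):
    parser = AnnotationSketch(max_depth)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Calling `annotate("<p>Hello <b>world</b></p>")` yields one chunk per text node, each carrying the annotations of its enclosing elements; lowering `max_depth` drops the annotations of the deeper elements while keeping the text.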