CustomCrawlConfig
com.exalead.mercury.mami.crawl.v21.CustomCrawlConfig
Custom processors specification.
Parent elements:
com.exalead.mercury.mami.crawl.v21.Crawler (as Crawler)
com.exalead.mercury.mami.crawl.v21.FeedFetcher (as FeedFetcher)
com.exalead.mercury.mami.crawl.v21.ICrawler (as ICrawler)
Attributes:
| Name | Type | Default value | Description |
| --- | --- | --- | --- |
| preProcessorClassId | string | | Custom PreProcessor. Called at the end of the preprocess pipe. |
| fetcherClassId | string | | Custom Fetcher. |
| processorClassId | string | | Custom Processor. Called at the end of the process pipe. Catches all MIME types. |
| htmlProcessorClassId | string | | Custom HTML Processor. Called at the end of the HTML process pipe. Catches only HTML documents. |
| linksFilterClassId | string | | Custom LinksFilter. Called at the end of the links filter list. Can decide whether to crawl an outgoing link. |
| postProcessorClassId | string | | Custom PostProcessor. Called at the end of the postprocess pipe. |
| crawlerTemplate | string | | Alternatively, specify the URL of an XML file describing the whole crawler. |
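
As a rough illustration of how these attributes might be combined, the fragment below sketches a CustomCrawlConfig element nested in a Crawler parent. Several things here are assumptions rather than documented facts: the child element is assumed to be named after the class (CustomCrawlConfig), the class ids under com.example.crawl are hypothetical placeholders for your own implementations, and namespace declarations and all other Crawler settings are omitted.

```xml
<!-- Sketch only: element nesting and class ids are assumptions, not taken from the reference. -->
<Crawler>
  <!-- Other Crawler attributes and child elements omitted. -->
  <CustomCrawlConfig
      fetcherClassId="com.example.crawl.MyFetcher"
      htmlProcessorClassId="com.example.crawl.MyHtmlProcessor"
      linksFilterClassId="com.example.crawl.MyLinksFilter"/>
</Crawler>
```

In this sketch only three of the hooks are set. Since every attribute is listed without a default value, the others are presumably left at the crawler's standard behavior when they are not specified. The crawlerTemplate attribute is described as an alternative, so it would likely be set on its own rather than together with the individual class ids.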