 
Feed
com.exalead.mercury.mami.crawl.v21.Feed
A feed root. It contains zero or more KeyValue elements, which are mapped to metas on all documents crawled from this root. Beware: the URL and the metas together are limited to 4 KB of storage.
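As an illustration, a feed root that stamps two metas on every document crawled from it might be declared as below. This is a sketch only: it assumes the element is written as Feed and that KeyValue takes key and value attributes, and the URL and meta names are hypothetical. Check the exact element names and namespace against your own configuration.

```xml
<!-- Hypothetical feed root: every document crawled from it receives the metas below. -->
<!-- Assumption: KeyValue exposes "key" and "value" attributes. -->
<Feed url="http://www.example.com/news/rss.xml">
  <KeyValue key="source" value="example-news" />
  <KeyValue key="category" value="press" />
</Feed>
```

Keep keys and values short: the URL plus all metas must fit within the 4 KB limit.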
Attributes:

Name                 Type     Default value  Description
url                  string                  The root URL.
site                 boolean  True           Enable site mode: only crawl URLs that belong to this site.
priority             int                     Priority shift, to increase or decrease priority: 0 means normal, -1 is higher priority, +1 is lower priority.
group                string   default        Key used to group rules and root URLs.
kvs                  string                  A semicolon-separated list of key-value pairs. Example: "key1=value1;key2=value2".
refreshPeriodS       int      600            How often to refresh this feed, in seconds (600 s = 10 min).
indexFeedItems       boolean  True           Whether to index all items found in the feed, with their metas, before crawling them.
indexItemDocuments   boolean  True           Whether to crawl the items and index the full item pages.
findFeeds            boolean                 Whether to crawl feeds found in HTML headers (<link href="" rel="alternate" />).
forceFeedMimeType    boolean  True           Force processing of the URL as an XML feed (for servers returning incorrect content types). Cannot be used with findFeeds enabled.
findMediaLinks       boolean  True           Find <img src="" /> tags and YouTube/Dailymotion links in item text and push them as metas.
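The sketch below shows a feed that overrides several of these defaults. It assumes boolean attributes are written as lowercase true/false; the URL, group name and kvs content are hypothetical. Because forceFeedMimeType cannot be combined with findFeeds, findFeeds is explicitly disabled.

```xml
<!-- Hypothetical feed whose server returns a wrong content type.           -->
<!-- priority="-1": higher priority than normal (0).                         -->
<!-- refreshPeriodS="3600": refresh every hour instead of every 10 minutes.  -->
<!-- forceFeedMimeType="true" requires findFeeds to stay disabled.           -->
<!-- indexItemDocuments="false": do not crawl and index the full item pages. -->
<Feed url="http://www.example.com/blog/atom"
      group="blogs"
      priority="-1"
      refreshPeriodS="3600"
      forceFeedMimeType="true"
      findFeeds="false"
      indexItemDocuments="false"
      kvs="source=blog;lang=en" />
```

With indexFeedItems left at its default (True) and indexItemDocuments set to false, the feed items are indexed with their metas, but the linked item pages are not crawled.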
Nested elements:

Name       Type                Description
KeyValue   exa.bee.KeyValue*