Advanced Configuration
 
Crawl Rules
Crawl Rule Actions
How Priorities Work
Error Handling
You can configure the Crawler connector using the Crawl Manager of the Exalead CloudView Management APIs. To perform advanced configuration of the Crawler connector, edit the <DATADIR>/config/CrawlConfig.xml file.
You can edit this XML file from the Administration Console by clicking the Edit as XML (Experts only) link.
Crawl Rules
Crawl rules perform Boolean matching on URL strings. The rules operate on any part of a URL (scheme, host, path, query, fragment), and match strings or regular expressions.
You can configure the behavior of the match to define whether it is case-sensitive, and whether it is left- or right-anchored. For example, a Length rule matches the length of a part of the URL against an integer range. Shortcuts are available, as shown in the table below.
Note: For all parts of the URL, the val attribute is a regular expression. You must escape the regular expression special characters ^$.*+?[](){}| with a backslash (\).
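For instance, a minimal sketch (the path shown is hypothetical) of a Path rule whose value contains literal parentheses, escaped with backslashes:

<Path val="/archive/\(old\)" />

Without the backslashes, the parentheses would be interpreted as a regexp group instead of literal characters.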
Rule category
Rule key
Example
Description and parameters
Boolean
And, Not, Or
<Or>...</Or>
Performs Boolean matching for the nested rules defined. Commonly used for Boolean matching on URL strings.
generic
Atom
Syntax:
<Atom field="" kind="" norm="" value="" />
Example:
<Atom field="path" kind="prefix" norm="none" value="/watch" />
This generic rule needs to define the type of match:
field defines the type of element to match. Possible values: url|scheme|host|path|query.
kind defines the part of the field to use for the match. Possible values: exact|prefix|suffix|inside|length, where:
inside: you specify a regexp and its anchoring in the value attribute.
length: you specify the length of a field ([:10], [11:12], [30:]) in the value attribute.
norm sets the normalization level. The default is a case-insensitive match, which corresponds to norm. Possible values: norm|lower|none.
value is a regexp that is matched against the links found during the crawl, based on the field and kind parameters.
shortcut
Domain
Exact match on domain. For example,
<Domain val="foo.com" />
Matches http://foo.com/ and http://bar.foo.com but not http://barfoo.com
This is a shortcut for the following combination of rules:
<Or><Atom field="host" kind="suffix" value=".foo.com" /><Atom field="host" kind="exact" value="foo.com" /></Or>
Path
<Path val="/cgi-bin" />
This rule is a shortcut for atom path-prefix. It is a left anchored match on the path.
Ext
<Ext val=".gif" />
This rule is a shortcut for atom path-suffix. It is a right anchored match on the path.
Host
<Host val="www.wikipedia.org" />
Performs an exact match on host.
This rule is a shortcut for atom host-exact.
Url
<Url val="http://en.wikipedia.org/wiki" />
This rule is a shortcut for atom url-exact.
Scheme
<Scheme val="http" />
This rule is a shortcut for atom scheme-exact. Possible values: http|https
Query
<Query val="q=foo" />
This rule is a shortcut for atom query-exact. It performs an exact match on the query.
InQuery
<InQuery val="q=foo" />
This rule is a shortcut for atom query-inside. It performs a non-anchored match on the query.
Length
<Length field="path" val="[30:]" /> matches URLs with a path length >= 30
This rule is a shortcut for atom field-length. It matches the length of the specified field against an integer range.
Note: All objects belong to the namespace exa:com.exalead.actionrules.v21.
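Shortcut rules can be combined with Boolean rules in the same way as Atom rules. As a sketch (the domain and extension values are hypothetical, and this assumes shortcut rules may be nested inside Boolean rules), the following combination matches URLs on foo.com except GIF images:

<And>
  <Domain val="foo.com" />
  <Not>
    <Ext val=".gif" />
  </Not>
</And>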
Crawl Rule Actions
The table below lists the various crawl rule actions that you can use with their corresponding XML tags.
For example, a crawl rule with an index and follow action is written as follows:
<Rules group="example" key="auto">
  <Rule>
    <ar:Atom litteral="true" value="http://www.example.com/" norm="none" kind="prefix" field="url"/>
    <Index/>
    <Follow/>
    <Accept/>
  </Rule>
</Rules>
Action
Adds XML tags
Index and follow
<Index/>
<Follow/>
<Accept/>
Index and don’t follow
<Index/>
<NoFollow/>
<Accept/>
Follow but don’t index
<NoIndex/>
<Follow/>
<Accept/>
Index
<Index/>
<Accept/>
Follow
<Follow/>
<Accept/>
Don’t index
<NoIndex/>
Don’t follow
<NoFollow/>
Ignore
<NoIndex/>
<NoFollow/>
<Ignore/>
Source
<Source name=""/>
Add meta
<AddMeta value="" name=""/>
Priority
<Priority shift=""/>
Possible values:
-2 = Highest
-1 = Higher
0 = Normal
1 = Lower
2 = Lowest
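As an illustrative sketch (the group name, path prefix, source name, and meta name/value are hypothetical), a rule could combine the Source and AddMeta actions with the index and follow tags to tag documents from a given path and route them to a dedicated source:

<Rules group="example" key="auto">
  <Rule>
    <ar:Atom field="path" kind="prefix" norm="none" value="/products/"/>
    <Index/>
    <Follow/>
    <Accept/>
    <Source name="products"/>
    <AddMeta name="section" value="products"/>
  </Rule>
</Rules>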
How Priorities Work
When Smart refresh is enabled, the crawler scheduler may contain up to 6 URL sources: 5 fifos and 1 refresh source. If Smart refresh is disabled, you can use the Exalead CloudView scheduler to refresh sources at a specific time; URLs are then sent to the fifo: index source.
URL source
Number
Content
Default weight (priority)
fifo: user
0
Only user-submitted root URLs with priority 0, and roots with default priority.
10000
fifo: redir
1
Targets of redirections.
2000
fifo: index
2
Documents that are indexed but whose links are not followed.
1000
fifo: index_follow
3
Documents that are indexed and whose links are followed.
100
fifo: follow
4
Documents whose links are followed, but which are not indexed.
10
smart refresh source
5
Documents to refresh.
1
The crawler scheduler picks URLs from each URL source according to the source's weight. The higher the weight of a fifo, the more links are picked from it.
If you define a crawl rule with a priority action, the priority shift raises or lowers the URL's priority depending on the value (see the example after this list):
-2 = Highest,
-1 = Higher,
0 = Normal,
1 = Lower,
2 = Lowest
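For example, a sketch of a rule (the path prefix is hypothetical) that raises the priority of URLs under /news/ by one level (shift -1 = Higher):

<Rules group="example" key="auto">
  <Rule>
    <ar:Atom field="path" kind="prefix" norm="none" value="/news/"/>
    <Index/>
    <Follow/>
    <Accept/>
    <Priority shift="-1"/>
  </Rule>
</Rules>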
Error Handling
Many errors can occur when crawling a URL.
These errors are split into the following categories:
Permanent errors
HTTP 404 Not Found
HTTP 304 Not Modified when GET was not conditional (inconsistent server behavior)
Redirection to malformed URL
HTTP 5XX server errors
HTTP 4XX content errors
DNS permanent errors (host not found, etc.)
Other connection errors
Temporary errors
Connection time out, connection reset by peer, connection refused
HTTP 503 error with Retry-After header
DNS temporary errors (no answer)
The error status is remembered for each URL. When a URL triggers a permanent error, if a document was indexed for that URL, a deletion order is immediately sent to the Push API.
Documents in error are refreshed like other documents:
When a URL is refreshed and triggers too many temporary errors, if a document was indexed for that URL, a deletion order is sent to the Push API, and its status is removed too. It will not be crawled again unless the crawler comes across a new link to it.
When a URL is refreshed and triggers permanent errors, the URL status is removed. It will not be crawled again unless the crawler comes across a new link to it.