Rule category | Rule key | Example | Description and parameters |
---|---|---|---|
Boolean | And, Not, Or | <Or>...</Or> | Performs Boolean matching for the nested rules defined. Commonly used for Boolean matching on URL strings. |
Generic | Atom | Syntax: <Atom field="" kind="" norm="" value="" /> Example: <Atom field="path" kind="prefix" norm="none" value="/watch" /> | This generic rule must define the type of match. field defines the type of element to match. Possible values: url|scheme|host|path|query. kind defines the part of the field to use for the match. Possible values: exact|prefix|suffix|inside|length. With inside, you specify a regexp and its anchoring in val. With length, you specify the length of a field ([:10], [11:12], [30:]) in val. norm sets the normalization level. The default is a case-insensitive match that corresponds to non. Possible values: norm|lower|none. value is a regexp that must match the links found during the crawl, based on the field and kind parameters. |
Shortcut | Domain | <Domain val="foo.com" /> matches http://foo.com/ and http://bar.foo.com but not http://barfoo.com | Exact match on domain. This rule is a shortcut for this combination of rules: <Or><Atom field="host" kind="suffix" value=".foo.com" /><Atom field="host" kind="exact" value="foo.com" /></Or> |
Shortcut | Path | <Path val="/cgi-bin" /> | This rule is a shortcut for atom path-prefix. It is a left-anchored match on the path. |
Shortcut | Ext | <Ext val=".gif" /> | This rule is a shortcut for atom path-suffix. It is a right-anchored match on the path. |
Shortcut | Host | <Host val="www.wikipedia.org" /> | This rule is a shortcut for atom host-exact. It performs an exact match on the host. |
Shortcut | Url | <Url val="http://en.wikipedia.org/wiki" /> | This rule is a shortcut for atom url-exact. |
Shortcut | Scheme | <Scheme val="http" /> | This rule is a shortcut for atom scheme-exact. Possible values: http|https |
Shortcut | Query | <Query val="q=foo" /> | This rule is a shortcut for atom query-exact. It performs an exact match on the query. |
Shortcut | InQuery | <InQuery val="q=foo" /> | This rule is a shortcut for atom query-inside. It performs an unanchored match on the query. |
Shortcut | Length | <Length field="path" val="[30:]" /> matches URLs with a path length >= 30 | This rule is a shortcut for atom field-length. It matches on the length of the given field, here the URL path. |
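
The matching semantics in the table above can be modeled in a few lines of Python. This is only an illustrative sketch, not the crawler's implementation: the function names `field_of`, `atom`, `domain` and `length` are hypothetical, only the exact, prefix and suffix kinds are covered, no `norm` handling is modeled, and treating both bounds of a length range as inclusive is an assumption (the source only states that `[30:]` means a length >= 30).

```python
from urllib.parse import urlsplit


def field_of(url: str, field: str) -> str:
    """Return the part of the URL named by an Atom `field` value."""
    parts = urlsplit(url)
    return {
        "url": url,
        "scheme": parts.scheme,
        # Note: urlsplit lowercases the hostname; this is a side effect of
        # the Python library, not a model of the `norm` attribute.
        "host": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
    }[field]


def atom(url: str, field: str, kind: str, value: str) -> bool:
    """Hypothetical model of <Atom> for the exact, prefix and suffix kinds."""
    text = field_of(url, field)
    if kind == "exact":
        return text == value
    if kind == "prefix":
        return text.startswith(value)
    if kind == "suffix":
        return text.endswith(value)
    raise ValueError(f"kind not modeled in this sketch: {kind}")


def domain(url: str, val: str) -> bool:
    """<Domain val="..."/> expanded into the <Or> of two host atoms."""
    return atom(url, "host", "suffix", "." + val) or atom(url, "host", "exact", val)


def length(url: str, field: str, spec: str) -> bool:
    """<Length>: spec is "[min:max]", either bound optional, both assumed inclusive."""
    low, _, high = spec.strip("[]").partition(":")
    n = len(field_of(url, field))
    return (not low or n >= int(low)) and (not high or n <= int(high))
```

With these helpers, `domain("http://bar.foo.com", "foo.com")` is true through the suffix atom, while `domain("http://barfoo.com", "foo.com")` is false because the suffix requires the leading dot.
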

Action | adds XML Tags...
---|---|
Index and follow | <Index/> <Follow/> <Accept/> |
Index and don’t follow | <Index/> <NoFollow/> <Accept/> |
Follow but don’t index | <NoIndex/> <Follow/> <Accept/> |
Index | <Index/> <Accept/> |
Follow | <Follow/> <Accept/> |
Don’t index | <NoIndex/> |
Don’t follow | <NoFollow/> |
Ignore | <NoIndex/> <NoFollow/> <Ignore/> |
Source | <Source name=""/> |
Add meta | <AddMeta value="" name=""/> |
Priority | <Priority shift=""/> Possible values: -2 = Highest, -1 = Higher, 0 = Normal, 1 = Lower, 2 = Lowest |

URL source | Number | Content | Default weight (priority) |
---|---|---|---|
fifo: user | 0 | Only user-submitted root URLs with priority 0, and roots with default priority. | 10000 |
fifo: redir | 1 | Targets of redirections. | 2000 |
fifo: index | 2 | Documents that are indexed but whose links are not followed. | 1000 |
fifo: index_follow | 3 | Documents that are indexed and whose links are followed. | 100 |
fifo: follow | 4 | Documents whose links are followed, but which are not indexed. | 10 |
smart refresh source | 5 | Documents to refresh. | 1 |
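
To get a feel for how far apart the default weights are, the sketch below computes each source's fraction of the total weight. The source does not say how the crawler uses these weights, so reading them as relative shares is purely an assumption made for illustration. The names `DEFAULT_WEIGHTS` and `relative_share` are hypothetical.

```python
# Default weights copied from the table above.
DEFAULT_WEIGHTS = {
    "fifo: user": 10000,
    "fifo: redir": 2000,
    "fifo: index": 1000,
    "fifo: index_follow": 100,
    "fifo: follow": 10,
    "smart refresh source": 1,
}


def relative_share(source: str, weights: dict[str, int] = DEFAULT_WEIGHTS) -> float:
    """Fraction of the total weight held by one source.

    Assumption: weights are compared proportionally. The actual scheduling
    policy is not described in the table.
    """
    return weights[source] / sum(weights.values())


if __name__ == "__main__":
    for name in DEFAULT_WEIGHTS:
        print(f"{name}: {relative_share(name):.4%}")
```

Under that reading, user-submitted roots dominate, and each later source carries a smaller weight than the one before it.
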