public class OntologyMatcher extends Processor
Modifier and Type | Field and Description |
---|---|
boolean |
addReferencesForIgnoredTags
add references for ignored tags
|
java.lang.String |
annotationPrefix
prefix to prepend to annotations' tag
|
boolean |
enableApproxMatching
When set to true, approximate matching is enabled in the ontology.
Approximate matching uses the Damerau-Levenshtein edit distance.
|
boolean |
ignoreSpaces
If your ontology was compiled with matchOnSeparators = false:
- this allows 'lemonde' to retrieve the 'le monde' entry, or 'le monde' to retrieve 'lemonde'.
If your ontology was compiled with matchOnSeparators = true:
- this allows 'le monde' to retrieve 'le monde'.
|
java.lang.String |
matchAgainst
annotation the ontology terms are matched against
|
int |
minWordSizeForDist1
Minimum number of characters in a token to enable a Damerau-Levenshtein distance of 1
|
int |
minWordSizeForDist2
Minimum number of characters in a token to enable a Damerau-Levenshtein distance of 2
|
boolean |
packageLevelMatchDedup
When deduping results, try to keep only one match per package, i.e.
|
java.lang.String |
resource
The processor resource
|
boolean |
restrictLanguage
When set to true, keep only expressions added with language == Language.XX or with the document language.
For example, if the ontology contains an expression added with language=En, it will be extracted only
from English documents when restrictLanguage is set to true.
|
boolean |
splitHandling
Match a term even if there are no blanks between alphanumeric tokens in the text
(the spellchecker may generate two tokens from one by splitting it up, but no blank is inserted).
When enabled, the text [air][france] will match the entry [air][ ][france].
|
boolean |
suffixApproxMatching
When set to true, all approximate matches have a tag suffixed by '.approx'.
If you disable this suffix, the trustLevel of exact entries will all be set to 100 (this overrides the trustLevel in the original XML file).
|
java.util.List<java.lang.String> |
tagsToIgnore
Sets the list of tags to be ignored.
This feature lets you define a list of words/expressions to ignore in the recognition of this ontology.
For example, if you add the expressions "of" and "the" with the tag "toIgnore" in ontology A, then add
the expression "website embassy" in ontology B with tagsToIgnore=["toIgnore"],
you will be able to match "website of the embassy", "website of embassy" and "website embassy".
WARNING: for the moment, this option is not compatible with ontologies compiled without
matchOnSeparators=false.
|
boolean |
trustLevelBasedDedup
Keeps only the annotations with the highest trust level when several overlap.
|
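The interaction between enableApproxMatching, minWordSizeForDist1 and minWordSizeForDist2 can be illustrated with a minimal sketch. It is not the native matcher's implementation: it uses the optimal string alignment variant of the Damerau-Levenshtein distance, and it assumes that a token of at least minWordSizeForDist2 characters may differ by 2 edits, a token of at least minWordSizeForDist1 characters by 1 edit, and shorter tokens must match exactly. The class and method names (ApproxMatchSketch, distance, allowedDistance, approxMatches) are hypothetical.

```java
// Illustrative sketch only; not part of the OntologyMatcher API.
final class ApproxMatchSketch {

    // Optimal string alignment distance: insertions, deletions,
    // substitutions and transpositions of adjacent characters.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ab" <-> "ba".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed threshold semantics: longer tokens tolerate more edits.
    static int allowedDistance(int tokenLength, int minWordSizeForDist1, int minWordSizeForDist2) {
        if (tokenLength >= minWordSizeForDist2) return 2;
        if (tokenLength >= minWordSizeForDist1) return 1;
        return 0;
    }

    // True when the token is close enough to the ontology entry.
    static boolean approxMatches(String token, String entry,
                                 int minWordSizeForDist1, int minWordSizeForDist2) {
        int allowed = allowedDistance(token.length(), minWordSizeForDist1, minWordSizeForDist2);
        return distance(token, entry) <= allowed;
    }
}
```

With thresholds of 4 and 8 (values chosen for illustration), `approxMatches("ambassy", "embassy", 4, 8)` accepts the single substitution, while a three-letter token such as "cat" would have to match its entry exactly.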
Constructor and Description |
---|
OntologyMatcher(java.lang.String name,
java.lang.String resource,
java.lang.String fields) |
OntologyMatcher(java.lang.String name,
java.lang.String resource,
java.lang.String matchAgainstAnnotation,
java.lang.String fields,
boolean restrictToLang,
boolean addReferencesForIgnoredTags,
boolean enableApproxMatching,
boolean suffixApproxMatching,
int minWordSizeForDist1,
int minWordSizeForDist2,
boolean ignoreSpaces,
java.lang.String annotationPrefix)
Initialize a features extractor
|
Modifier and Type | Method and Description |
---|---|
protected void |
addTagToIgnore(java.lang.String tag)
Add a tag to ignore
|
protected void |
finalize() |
static java.lang.String |
getApproxTagSuffix() |
void |
init(java.lang.String name,
java.lang.String[] fields)
Initialize the processor
|
void |
init(java.lang.String name,
java.lang.String resource,
java.lang.String matchAgainstAnnotation,
java.lang.String[] fields,
boolean restrictToLang,
boolean addReferencesForIgnoredTags,
boolean enableApproxMatching,
boolean suffixApproxMatching,
int minWordSizeForDist1,
int minWordSizeForDist2,
boolean ignoreSpaces,
java.lang.String annotationPrefix,
boolean trustLevelDedup,
boolean splitHandling,
boolean packageLevelMatchDedup) |
protected void |
initNative(java.lang.String name,
java.lang.String resource,
java.lang.String matchAgainstAnnotation,
java.lang.String[] fields,
boolean restrictToLang,
boolean addReferencesForIgnoredTags,
boolean enableApproxMatching,
boolean suffixApproxMatching,
int minWordSizeForDist1,
int minWordSizeForDist2,
boolean ignoreSpaces,
java.lang.String annotationPrefix,
boolean trustLevelDedup,
boolean splitHandling,
boolean packageLevelMatchDedup) |
Methods inherited from class Processor: checkResource, destroy, getName, init
public java.lang.String resource
public boolean enableApproxMatching
public boolean suffixApproxMatching
public int minWordSizeForDist1
public int minWordSizeForDist2
public boolean ignoreSpaces
public boolean splitHandling
public boolean packageLevelMatchDedup
public boolean restrictLanguage
public boolean addReferencesForIgnoredTags
public java.lang.String matchAgainst
public java.lang.String annotationPrefix
public java.util.List<java.lang.String> tagsToIgnore
public boolean trustLevelBasedDedup
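The effect described for ignoreSpaces (and, for unseparated tokens, splitHandling) can be pictured as comparing text and entry on a whitespace-free key. The sketch below is only an approximation of that idea under this assumption; the native matcher works on tokens and compiled ontology resources, and the names IgnoreSpacesSketch, spaceInsensitiveKey and matchesIgnoringSpaces are hypothetical.

```java
// Illustrative sketch only; not part of the OntologyMatcher API.
final class IgnoreSpacesSketch {

    // Removes every whitespace run so "le monde" and "lemonde" share one key.
    static String spaceInsensitiveKey(String text) {
        return text.replaceAll("\\s+", "");
    }

    // Compares the text and the ontology entry on their space-free keys.
    static boolean matchesIgnoringSpaces(String text, String entry) {
        return spaceInsensitiveKey(text).equals(spaceInsensitiveKey(entry));
    }
}
```

Under this model, "lemonde" retrieves the "le monde" entry and the reverse, and "airfrance" retrieves "air france".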
public OntologyMatcher(java.lang.String name, java.lang.String resource, java.lang.String matchAgainstAnnotation, java.lang.String fields, boolean restrictToLang, boolean addReferencesForIgnoredTags, boolean enableApproxMatching, boolean suffixApproxMatching, int minWordSizeForDist1, int minWordSizeForDist2, boolean ignoreSpaces, java.lang.String annotationPrefix)
name - Its name
resource - The associated resource name
matchAgainstAnnotation - Annotation the ontology terms are matched against
fields - The list of fields on which it is active
restrictToLang - Keep only expressions added with language == Language.XX or with the document language
addReferencesForIgnoredTags - Add references for ignored tags
enableApproxMatching - Approximate matching using the Damerau-Levenshtein edit distance
suffixApproxMatching - Approximate matches have a tag suffixed by '.approx'
minWordSizeForDist1 - Minimum number of characters in a token to enable a Damerau-Levenshtein distance of 1
minWordSizeForDist2 - Minimum number of characters in a token to enable a Damerau-Levenshtein distance of 2
ignoreSpaces - Ignore spaces when matching
annotationPrefix - Prefix to add to the annotations' tag
public OntologyMatcher(java.lang.String name, java.lang.String resource, java.lang.String fields)
public void init(java.lang.String name, java.lang.String resource, java.lang.String matchAgainstAnnotation, java.lang.String[] fields, boolean restrictToLang, boolean addReferencesForIgnoredTags, boolean enableApproxMatching, boolean suffixApproxMatching, int minWordSizeForDist1, int minWordSizeForDist2, boolean ignoreSpaces, java.lang.String annotationPrefix, boolean trustLevelDedup, boolean splitHandling, boolean packageLevelMatchDedup)
public void init(java.lang.String name, java.lang.String[] fields)
public static java.lang.String getApproxTagSuffix()
protected void initNative(java.lang.String name, java.lang.String resource, java.lang.String matchAgainstAnnotation, java.lang.String[] fields, boolean restrictToLang, boolean addReferencesForIgnoredTags, boolean enableApproxMatching, boolean suffixApproxMatching, int minWordSizeForDist1, int minWordSizeForDist2, boolean ignoreSpaces, java.lang.String annotationPrefix, boolean trustLevelDedup, boolean splitHandling, boolean packageLevelMatchDedup)
protected void addTagToIgnore(java.lang.String tag)
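The behaviour that tagsToIgnore and addTagToIgnore enable can be sketched as dropping ignorable words before comparing the text with an entry. This is a simplified model, not the native implementation: it works on plain whitespace-separated words instead of tagged ontology expressions, and the names TagsToIgnoreSketch, dropIgnored and matches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; not part of the OntologyMatcher API.
final class TagsToIgnoreSketch {

    // Lowercases the text, splits it on whitespace and removes ignorable words.
    static List<String> dropIgnored(String text, Set<String> ignoredWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !ignoredWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // The text matches the entry when both reduce to the same word sequence.
    static boolean matches(String text, String entry, Set<String> ignoredWords) {
        return dropIgnored(text, ignoredWords).equals(dropIgnored(entry, ignoredWords));
    }
}
```

With "of" and "the" as the ignorable words, both "website of the embassy" and "website of embassy" reduce to the same sequence as the entry "website embassy".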
protected void finalize()
Overrides: finalize in class java.lang.Object
Copyright © 2013 Dassault Systèmes, All Rights Reserved.