public class StemmingHandler extends java.lang.Object implements LinguisticExpanderResource.Handler
Internal details
The MOT processor annotates the alphabetic tokens of the raw text
with the normalized words that share the same stem
(http://en.wikipedia.org/wiki/Word_stem).
It does so by finding the stem of the word, then querying the dictionary
with the regular expression "stem.*".
This handler uses the Snowball stemmer (http://snowball.tartarus.org/)
to produce stems for most languages.
For PL, CS, ET, SK, and SL, the internal CloudView stemmer is used instead.
The post-processor simply expands the tokenized text with these words under the NORMALIZED form.
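The stem-then-query step described above can be sketched as follows. This is a minimal illustration only, not the CloudView implementation: the class `StemExpansionSketch`, the `toyStem` suffix stripper (a stand-in for Snowball or the CloudView stemmer), and the in-memory word list standing in for the dictionary are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: not the CloudView or Snowball implementation.
class StemExpansionSketch {

    // Hypothetical toy stemmer: strips a few common English suffixes.
    // A real handler would delegate to Snowball or the CloudView stemmer.
    static String toyStem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // Keep at least three characters so short words stay intact.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Returns every dictionary entry matching the regular expression "stem.*".
    static List<String> expand(String token, List<String> dictionary) {
        String stem = toyStem(token);
        // Pattern.quote keeps any regex metacharacters in the stem literal.
        Pattern query = Pattern.compile(Pattern.quote(stem) + ".*");
        List<String> matches = new ArrayList<>();
        for (String entry : dictionary) {
            if (query.matcher(entry).matches()) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the normalized-word dictionary.
        List<String> dictionary = Arrays.asList(
                "connect", "connected", "connection", "connections", "console");
        // Prints [connect, connected, connection, connections]
        System.out.println(expand("connecting", dictionary));
    }
}
```

Note that a prefix query of this kind returns every entry that starts with the stem, so a short stem may also match unrelated words.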
Modifier and Type | Field and Description |
---|---|
protected static org.apache.log4j.Logger | logger |
Constructor and Description |
---|
StemmingHandler(java.lang.String cloudViewStemmerResourceDir) |
Modifier and Type | Method and Description |
---|---|
LinguisticExpanderResource.PostProcessorFactory | buildPostProcessorFactory() |
java.util.List<SemanticProcessor> | buildSemanticProcessor() |
void | release() |
public StemmingHandler(java.lang.String cloudViewStemmerResourceDir)
public void release()
Specified by: release in interface LinguisticExpanderResource.Handler
public java.util.List<SemanticProcessor> buildSemanticProcessor()
Specified by: buildSemanticProcessor in interface LinguisticExpanderResource.Handler
public LinguisticExpanderResource.PostProcessorFactory buildPostProcessorFactory()
Specified by: buildPostProcessorFactory in interface LinguisticExpanderResource.Handler
Copyright © 2013 Dassault Systèmes, All Rights Reserved.