All Implemented Interfaces:
LinguisticExpanderResource.Handler
public class CJHandler
extends java.lang.Object
implements LinguisticExpanderResource.Handler
Specific handling of CJ query chunks
Reminder: after the MOT Pipe, we have:
- one token per ideogram
- CJ forms, if recall mode is enabled: one annotation per ideogram, with annotation tag = cjk and nbTokens = 1 for every annotation
- "tokenizer" forms: one annotation per detected word (with nbTokens > 1)
- all Japanese alternative forms for each detected word (with nbTokens > 1)
Example: query 本田技術研究所
NORMALIZE annotations: [本田] [技術] [研究所]
cjk annotations: [本] [田] [技] [術] [研] [究] [所]
tokens: [本] [田] [技] [術] [研] [究] [所]
result: (本田 OR "本 田") AND (技術 OR "技 術") AND (研究所 OR "研 究 所")
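The shape of the result in the example above can be reproduced with a minimal sketch. This is not the CJHandler implementation: the class CjQuerySketch and its methods are hypothetical, the input is assumed to be the list of words already detected by the tokenizer, and weights and trust levels are ignored. It only illustrates how each detected word is combined with its per-ideogram phrase.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the documented API.
public class CjQuerySketch {

    // Splits a word into one string per code point (one "token" per ideogram).
    static List<String> ideograms(String word) {
        return word.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    // Builds (word OR "i d e o g r a m s") for one detected word.
    // A single-ideogram word is returned unchanged (assumption: no expansion needed).
    static String expandWord(String word) {
        List<String> chars = ideograms(word);
        if (chars.size() <= 1) {
            return word;
        }
        return "(" + word + " OR \"" + String.join(" ", chars) + "\")";
    }

    // Joins the per-word alternatives with AND, as in the documented result.
    static String expandQuery(List<String> detectedWords) {
        return detectedWords.stream()
                .map(CjQuerySketch::expandWord)
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        System.out.println(expandQuery(List.of("本田", "技術", "研究所")));
    }
}
```

Splitting on code points rather than on char values keeps ideograms outside the Basic Multilingual Plane intact as single tokens.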
This handler does the following:
- Merge CJ forms of the same token into a sequence with a weight defined by the cjk form-indexing trust level