We usually place the normalizer right after the tokenizer in analysis pipelines. The normalizer computes the lowercase and unaccented (normalized) forms of each alphanumeric token.
For each alphanumeric token, the output contains one LOWERCASE annotation and one or two NORMALIZE annotations.
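The following is a minimal sketch of the two computations in plain Python. It is not the pipeline's actual implementation. It assumes that "unaccented" means removing Unicode combining marks after canonical decomposition (NFD). It also assumes that an alternative form comes from a per-language character substitution table. The names `lowercase`, `strip_accents`, `normalize` and `GERMAN_ALTERNATIVES` are hypothetical, and the table is modeled only on the German grüne → gruene example.

```python
import unicodedata
from typing import Dict, List, Optional

# Hypothetical German alternative table (assumption): umlauts and ß are
# spelled out as digraphs, mirroring the grüne -> gruene example.
GERMAN_ALTERNATIVES: Dict[str, str] = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def lowercase(token: str) -> str:
    """LOWERCASE form: Unicode-aware lowercasing, recomposed to NFC."""
    return unicodedata.normalize("NFC", token.lower())


def strip_accents(text: str) -> str:
    """Drop combining diacritics after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def normalize(token: str, alternatives: Optional[Dict[str, str]] = None) -> List[str]:
    """Return one or two NORMALIZE forms: the unaccented lowercase form,
    plus an alternative form when a substitution table changes the result."""
    lower = lowercase(token)
    forms = [strip_accents(lower)]
    if alternatives:
        # Apply the substitutions on precomposed (NFC) characters first,
        # then strip any accents that remain.
        alt = strip_accents("".join(alternatives.get(ch, ch) for ch in lower))
        if alt not in forms:
            forms.append(alt)
    return forms
```

With this sketch, `normalize("Café")` returns `["cafe"]`, which is a single NORMALIZE form. `normalize("Grüne", GERMAN_ALTERNATIVES)` returns `["grune", "gruene"]`, which is two forms. A token such as `"Haus"` keeps a single form because the substitution table does not change it.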
The normalizer supports all languages that distinguish between lowercase and uppercase and between accented and unaccented characters.
Moreover, normalization exceptions are defined for:
• German
• Spanish
• French
• Italian
Normalization alternatives are defined for:
• German
For example, in German, grüne (green) has the alternative normalized form gruene. The normalizer produces the following annotations: