Stemmer
Stemming is the process of reducing words to their root form. In English, for example, the root form of "running" and "runs" is "run". The stemmer filter can
be applied to any tokenizer.
stemmer
Available stemmers are
Arabic, Danish, Dutch, English, Finnish,
French, German, Greek, Hungarian, Italian, Norwegian,
Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, and
Turkish.
Remove Long
The remove_long filter removes all tokens longer than a fixed number of bytes. If not specified,
remove_long defaults to 255.
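Because the limit is measured in bytes rather than characters, tokens containing multi-byte characters reach it sooner than their character count suggests. The following is a minimal Python sketch of that behavior, not the filter's actual implementation. It assumes that tokens are measured in their UTF-8 encoding and that a token of exactly the limit is kept. The function name and sample tokens are hypothetical.

```python
def remove_long(tokens, max_bytes=255):
    """Drop tokens whose UTF-8 encoding is longer than max_bytes bytes.

    Assumption: length is measured on the UTF-8 encoding, and a token
    of exactly max_bytes bytes is kept.
    """
    return [t for t in tokens if len(t.encode("utf-8")) <= max_bytes]


# "é" takes 2 bytes in UTF-8, so 200 of them occupy 400 bytes
# even though the token is only 200 characters long.
tokens = ["short", "é" * 200, "x" * 300]
kept = remove_long(tokens)
```

In this sketch only the first token survives: the second is under 255 characters but over 255 bytes, and the third exceeds the limit on both counts.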
Lowercase
The lowercase filter lowercases all tokens. If not specified, lowercase defaults to true.
Stopwords Language
This filter is not supported for the ngram tokenizer.
stopwords_language removes common “stop words” for a specific language from the original text before tokenization.
stopwords_language
Available languages are
Danish, Dutch, English, Finnish, French,
German, Hungarian, Italian, Norwegian, Portuguese, Russian,
Spanish, and Swedish.
Custom Stopwords
This filter is not supported for the ngram tokenizer.
stopwords removes custom words from the original text before tokenization.
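As a rough illustration of removing words before tokenization, the Python sketch below filters a custom word list out of the raw text. It is not the filter's actual implementation. It assumes that words are separated by whitespace and that matching is case-insensitive, neither of which is stated above. The function name and sample inputs are hypothetical.

```python
def remove_stopwords(text, stopwords):
    """Remove every word in `stopwords` from `text` before tokenization.

    Assumptions: words are delimited by whitespace and matching ignores
    case. Punctuation attached to a word is not handled.
    """
    blocked = {word.lower() for word in stopwords}
    kept = [word for word in text.split() if word.lower() not in blocked]
    return " ".join(kept)


cleaned = remove_stopwords("The quick brown fox and the lazy dog", ["the", "and"])
```

The resulting string, with the listed words removed, is what a tokenizer would then receive in this sketch.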