Stemmer
Stemming is the process of reducing words to their root form. In English, for example, the root form of "running" and "runs" is "run". The stemmer filter can be applied to any tokenizer.
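To make the idea concrete, here is a deliberately tiny suffix-stripping sketch in Python. It is an illustration only, not the algorithm the stemmer filter uses: the function name `toy_stem` and its two rules are assumptions made up for this example, and real stemmers apply far more elaborate, language-specific rules.

```python
def toy_stem(word: str) -> str:
    """Toy suffix stripper for illustration only; not a production stemmer."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        # Undo consonant doubling left behind by "-ing": "runn" -> "run".
        if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
            w = w[:-1]
    elif w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        # Strip a plain plural or third-person "s": "runs" -> "run".
        w = w[:-1]
    return w


stems = [toy_stem(w) for w in ["running", "runs", "run"]]
print(stems)  # ['run', 'run', 'run']
```

Because all three forms map to the same stem, a search for any one of them can match documents that contain the others. The toy rules mis-handle many other words, which is why a language-specific stemmer is chosen instead.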
stemmer
Available stemmers are Arabic, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, and Turkish.
Remove Long
The remove_long filter removes all tokens longer than a fixed number of bytes. If not specified, remove_long defaults to 255.
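Because the limit is measured in bytes rather than characters, tokens containing multi-byte characters reach it sooner than their character count suggests. The Python sketch below illustrates that behavior under two assumptions that the text above does not state explicitly: tokens are UTF-8 encoded, and a token of exactly the limit is kept. The helper name `remove_long` simply mirrors the filter name and is not the actual implementation.

```python
def remove_long(tokens, max_bytes=255):
    """Keep tokens whose UTF-8 encoding is at most max_bytes bytes."""
    return [t for t in tokens if len(t.encode("utf-8")) <= max_bytes]


tokens = [
    "short",    # 5 bytes: kept
    "é" * 128,  # 128 characters but 256 bytes in UTF-8: removed
    "a" * 255,  # exactly 255 bytes: kept (assumed boundary behavior)
    "a" * 256,  # 256 bytes: removed
]
kept = remove_long(tokens)
print([len(t) for t in kept])  # [5, 255]
```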
Lowercase
The lowercase filter lowercases all tokens. If not specified, lowercase defaults to true.
Stopwords Language
This filter is not supported for the ngram tokenizer.
stopwords_language removes common “stop words” for a specific language from the original text before tokenization.
stopwords_language
Available languages are Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish.
Custom Stopwords
This filter is not supported for the ngram tokenizer.
stopwords removes custom words from the original text before tokenization.
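The effect of a custom stopword list can be pictured with the short Python sketch below. It is a simplified model, not the filter's implementation: the helper name `remove_stopwords`, the whitespace splitting, and the exact, case-sensitive matching are all assumptions made for illustration.

```python
def remove_stopwords(text, stopwords):
    """Drop every whitespace-separated word that appears in the stopword list."""
    stop = set(stopwords)
    return " ".join(word for word in text.split() if word not in stop)


cleaned = remove_stopwords("the quick fox and the lazy dog", ["the", "and"])
print(cleaned)  # quick fox lazy dog
```

Only the remaining words are passed on to be tokenized and indexed, so a query for a removed word would find nothing in this field.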