Token Filters
Basic Usage
Token filters apply additional processing to tokens after the tokenizer has produced them. For instance, the remove_long filter removes all tokens longer than a fixed number of bytes.
lowercase
default: true
Lowercases all tokens.
remove_long
default: 255
Removes all tokens longer than this number of bytes.
stemmer
Applies language-specific stemming to each token. See Stemming below for the supported languages.
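One way to picture these options is as a chain of functions applied to each token in turn. The sketch below is a plain-Python model, not the library's implementation. `TokenFilterOptions` and `apply_filters` are hypothetical names. The defaults mirror the ones listed above. The filter order (lowercase, then remove_long, then stemmer) and the use of UTF-8 to count bytes are assumptions. The stemmer is left as a pluggable callable because real stemmers are language-specific.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class TokenFilterOptions:
    """Hypothetical model of the token filter options and their defaults."""
    lowercase: bool = True
    remove_long: int = 255  # maximum token length in bytes (UTF-8 assumed)
    stemmer: Optional[Callable[[str], str]] = None  # no stemming unless set


def apply_filters(tokens: Iterable[str], opts: TokenFilterOptions) -> List[str]:
    """Apply filters in an assumed order: lowercase, remove_long, stemmer."""
    out = []
    for token in tokens:
        if opts.lowercase:
            token = token.lower()
        # Length is measured in bytes, so a non-ASCII character like "ä" counts as 2.
        if len(token.encode("utf-8")) > opts.remove_long:
            continue  # drop tokens strictly longer than the limit
        if opts.stemmer is not None:
            token = opts.stemmer(token)
        out.append(token)
    return out


print(apply_filters(["Running", "Shoes", "x" * 300, "Überall"], TokenFilterOptions()))
# ['running', 'shoes', 'überall']
print(apply_filters(["ab", "äb"], TokenFilterOptions(remove_long=2)))
# ['ab']
```

The second call shows why the byte-based limit matters: "ab" and "äb" both have two characters, but "äb" is three bytes in UTF-8, so only it exceeds a limit of 2.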
Stemming
Stemming is the process of reducing words to their root form. In English, for example, the root form of running and runs is run. The stemmer filter can be applied to any tokenizer.
Available stemmers are Arabic, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, and Turkish.
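If you build stemmer settings programmatically, it can help to reject unsupported language names before they reach the index. The sketch below models the languages listed above as an enum. `StemmerLanguage` and `parse_stemmer` are hypothetical names. Exact, case-sensitive matching on the capitalized names is an assumption, since this page does not say how the real option is matched.

```python
from enum import Enum


class StemmerLanguage(Enum):
    """Hypothetical enum of the stemmer languages listed above."""
    ARABIC = "Arabic"
    DANISH = "Danish"
    DUTCH = "Dutch"
    ENGLISH = "English"
    FINNISH = "Finnish"
    FRENCH = "French"
    GERMAN = "German"
    GREEK = "Greek"
    HUNGARIAN = "Hungarian"
    ITALIAN = "Italian"
    NORWEGIAN = "Norwegian"
    PORTUGUESE = "Portuguese"
    ROMANIAN = "Romanian"
    RUSSIAN = "Russian"
    SPANISH = "Spanish"
    SWEDISH = "Swedish"
    TAMIL = "Tamil"
    TURKISH = "Turkish"


def parse_stemmer(name: str) -> StemmerLanguage:
    """Map a configured name to a supported language, rejecting anything else."""
    try:
        return StemmerLanguage(name)  # exact, case-sensitive lookup by value (assumed)
    except ValueError:
        supported = ", ".join(lang.value for lang in StemmerLanguage)
        raise ValueError(f"unsupported stemmer {name!r}; expected one of: {supported}") from None


print(parse_stemmer("English"))
# StemmerLanguage.ENGLISH
```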