Basic Usage

Token filters apply additional processing to tokens after the tokenizer has produced them. For instance, the remove_long filter discards every token longer than a fixed number of bytes.

paradedb.tokenizer('default', lowercase => false, remove_long => 255)

lowercase
default: true

Lowercases all tokens.

remove_long
default: 255

Removes all tokens longer than this number of bytes.

stemmer

Applies language-specific stemming to each token. See Stemming below for the supported languages.
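
The Python sketch below models what the lowercase and remove_long filters do to a token stream. It is a conceptual model, not ParadeDB's implementation. simple_tokenize is a hypothetical stand-in for the 'default' tokenizer, and running lowercase before remove_long is an assumed order. Because remove_long counts UTF-8 bytes rather than characters, "brûlée" (6 characters, 8 bytes) is dropped by a 7-byte limit, while "recipe" (6 bytes) is kept.

```python
import re
from typing import Callable, Iterable, List

# A token filter maps a list of tokens to a new list of tokens.
TokenFilter = Callable[[List[str]], List[str]]


def simple_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a tokenizer: split on any character that is
    # not a letter or digit. This only approximates the 'default' tokenizer.
    return re.findall(r"[^\W_]+", text)


def lowercase_filter(tokens: List[str]) -> List[str]:
    # Models the `lowercase` option.
    return [t.lower() for t in tokens]


def remove_long_filter(limit: int) -> TokenFilter:
    # Models the `remove_long` option: drop tokens whose UTF-8 encoding is
    # longer than `limit` bytes (bytes, not characters). A token of exactly
    # `limit` bytes is kept here; ParadeDB's boundary handling may differ.
    def apply(tokens: List[str]) -> List[str]:
        return [t for t in tokens if len(t.encode("utf-8")) <= limit]
    return apply


def analyze(text: str, filters: Iterable[TokenFilter]) -> List[str]:
    # Tokenize first, then pass the token list through each filter in order.
    tokens = simple_tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    # "brûlée" is 6 characters but 8 UTF-8 bytes, so a 7-byte limit drops it.
    print(analyze("Crème Brûlée recipe", [lowercase_filter, remove_long_filter(7)]))
    # → ['crème', 'recipe']
```

Leaving lowercase_filter out of the list mirrors lowercase => false. The same input then yields ['Crème', 'recipe'], with the original capitalization preserved.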

Stemming

Stemming is the process of reducing words to their root form. In English, for example, the root form of "running" and "runs" is "run". The stemmer filter can be applied to any tokenizer.

paradedb.tokenizer('default', stemmer => 'English')

Available stemmers are Arabic, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, and Turkish.
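
To illustrate what a stemmer does, the sketch below implements a deliberately tiny English suffix stripper in Python. It is built from only a few of the step 1 rules of the Porter algorithm: plural endings, plus -ed/-ing removal with undoubling of a final consonant. It is not the stemmer ParadeDB uses. It also simplifies Porter's vowel test by ignoring "y", and it assumes tokens are already lowercase. The "ponies" → "poni" result shows that a stem does not have to be a dictionary word. It only needs to be the same for related word forms.

```python
VOWELS = set("aeiou")


def has_vowel(stem: str) -> bool:
    # Simplification: Porter also treats 'y' after a consonant as a vowel.
    return any(ch in VOWELS for ch in stem)


def ends_with_double_consonant(word: str) -> bool:
    return len(word) >= 2 and word[-1] == word[-2] and word[-1] not in VOWELS


def toy_stem(word: str) -> str:
    # Expects a lowercase token. Applies a subset of Porter step 1 rules.
    # Step 1a: plural endings.
    if word.endswith("sses"):
        word = word[:-2]          # caresses -> caress
    elif word.endswith("ies"):
        word = word[:-2]          # ponies -> poni
    elif word.endswith("ss"):
        pass                      # caress stays caress
    elif word.endswith("s"):
        word = word[:-1]          # runs -> run

    # Step 1b (partial): strip -ing / -ed if the remaining stem has a vowel.
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and has_vowel(word[: -len(suffix)]):
            word = word[: -len(suffix)]
            # Undo consonant doubling (runn -> run), except for l, s, z (fall).
            if ends_with_double_consonant(word) and word[-1] not in "lsz":
                word = word[:-1]
            break
    return word


if __name__ == "__main__":
    words = ["running", "runs", "run", "hopping", "falling", "ponies"]
    print([toy_stem(w) for w in words])
    # → ['run', 'run', 'run', 'hop', 'fall', 'poni']
```

Production stemmers such as Porter or Snowball apply many more rules and conditions (for example, Porter's measure of a stem), so their output will differ from this toy on many words.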