Stemming is the process of reducing words to their root form. In English, for example, the root form of “running” and “runs” is “run”. Stemming can be configured for any tokenizer except the literal tokenizer. Stemmers in ParadeDB are based on stemming algorithms obtained from the official Snowball website. To set a stemmer, append stemmer=<language> to the tokenizer’s arguments.
The following languages are supported: arabic, czech, danish, dutch, english, finnish, french, german, greek, hungarian, italian, norwegian, polish, portuguese, romanian, russian, spanish, swedish, tamil, and turkish.
To demonstrate this token filter, let’s compare the output of the following two statements:
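The original statements did not survive on this page, so the pair below is a minimal sketch rather than the page's own example. It assumes the pdb.simple tokenizer and the cast-to-text[] idiom for inspecting tokens; the sample text 'I am running' is purely illustrative.

```sql
-- Assumption: pdb.simple is available and casting to text[] returns the tokens.

-- Statement 1: tokenize without a stemmer.
SELECT 'I am running'::pdb.simple::text[];

-- Statement 2: same tokenizer, with stemmer=english appended to its arguments.
SELECT 'I am running'::pdb.simple('stemmer=english')::text[];
```

The only difference between the two statements is the stemmer=english argument, so any difference in the returned tokens can be attributed to the stemmer.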
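Expected Response
With an English stemmer set, a token such as “running” is reduced to “run”; without a stemmer, the token is left as “running”.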