Stopwords are words that are so common or semantically insignificant in most contexts that they can be ignored during indexing. In English, for example, stopwords include “a”, “and”, “or”, etc. All tokenizers besides the literal tokenizer can be configured to automatically remove stopwords for one or more languages.
Supported languages are Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Russian, Spanish, and Swedish. Language names are case-insensitive.
Multiple Languages
For documents containing multiple languages, you can specify multiple stopword languages as a comma-separated list.
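As a conceptual illustration only, the Python sketch below shows how a comma-separated, case-insensitive language list could be resolved into one combined stopword set. It is not ParadeDB's implementation or syntax: the names `SAMPLE_STOPWORDS`, `parse_languages`, `combined_stopwords`, and `remove_stopwords` are hypothetical, and the word lists are tiny samples rather than the real per-language lists.

```python
# Tiny illustrative samples (assumption); real stopword lists are much longer.
SAMPLE_STOPWORDS = {
    "english": {"a", "and", "or", "the", "in"},
    "french": {"le", "la", "et", "ou", "dans"},
}


def parse_languages(spec):
    """Split a comma-separated language list, ignoring case and spaces."""
    return [name.strip().lower() for name in spec.split(",") if name.strip()]


def combined_stopwords(spec):
    """Union the sample stopword sets of every language named in `spec`."""
    words = set()
    for lang in parse_languages(spec):
        if lang not in SAMPLE_STOPWORDS:
            raise ValueError(f"unsupported language: {lang}")
        words |= SAMPLE_STOPWORDS[lang]
    return words


def remove_stopwords(tokens, spec):
    """Drop every token that is a stopword in any of the listed languages."""
    stop = combined_stopwords(spec)
    return [t for t in tokens if t.lower() not in stop]


tokens = ["the", "cat", "et", "le", "chien"]
filtered = remove_stopwords(tokens, "English, French")
print(filtered)  # ['cat', 'chien']
```

In this sketch, a token is removed if it appears in the stopword set of any listed language, so "the" (English) and "et", "le" (French) are all dropped from the same token stream.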
Example
To demonstrate this token filter, compare the tokens produced for the same text with and without stopword removal.
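The effect of the filter can be approximated outside the database with a short Python sketch. This is a simplified stand-in, not ParadeDB code: `simple_tokenize` is a hypothetical helper that lowercases text and splits on non-alphanumeric characters, and `ENGLISH_STOPWORDS` is a small assumed subset of an English stopword list.

```python
import re

# Small illustrative subset of English stopwords (assumption, not ParadeDB's list).
ENGLISH_STOPWORDS = {"a", "and", "or", "the", "in"}


def simple_tokenize(text, stopwords=None):
    """Lowercase, split on non-alphanumeric characters, optionally drop stopwords."""
    tokens = [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]
    if stopwords is None:
        return tokens
    return [t for t in tokens if t not in stopwords]


text = "The cat in the hat"
without_filter = simple_tokenize(text)
with_filter = simple_tokenize(text, ENGLISH_STOPWORDS)

print(without_filter)  # ['the', 'cat', 'in', 'the', 'hat']
print(with_filter)     # ['cat', 'hat']
```

Without the filter every word becomes a token; with it, only the content-bearing words remain, which is what keeps very common words out of the index.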