Skip to main content
Stopwords are words that are so common or semantically insignificant in most contexts that they can be ignored during indexing. In English, for example, stopwords include “a”, “and”, “or”, etc. All tokenizers besides the literal tokenizer can be configured to automatically remove stopwords for a given language.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple('stopwords_language=english')))
WITH (key_field='id');
Valid languages are danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, russian, spanish, swedish. To demonstrate this token filter, let’s compare the output of the following two statements:
SELECT
  'The cat in the hat'::pdb.simple::text[],
  'The cat in the hat'::pdb.simple('stopwords_language=english')::text[];
Expected Response
         text         |   text
----------------------+-----------
 {the,cat,in,the,hat} | {cat,hat}
(1 row)
I