The ASCII folding filter strips away diacritical marks (accents, umlauts, tildes, etc.) while leaving the base character intact.
It is supported for all tokenizers except the literal tokenizer. To enable it, append ascii_folding=true to the tokenizer’s arguments.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple('ascii_folding=true')))
WITH (key_field='id');
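For intuition about what folding does to a token, here is a minimal Python sketch that approximates the behavior outside the database. It is not the filter's actual implementation: the helper name fold_to_ascii is hypothetical, and the approach (Unicode decomposition followed by dropping combining marks) is an assumption chosen for illustration.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (illustrative only, not ParadeDB's code).

    NFD normalization splits an accented character into its base
    character plus combining marks; the marks are then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ["café", "naïve", "São Paulo", "Zürich"]:
    print(word, "->", fold_to_ascii(word))
```

Characters that have no Unicode decomposition (for example ø or ß) pass through this sketch unchanged, so the real filter's output may differ for them.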
To demonstrate this token filter, let’s compare the output of the following two statements: