ParadeDB Docs (v2): ASCII Folding

The ASCII folding filter strips diacritical marks (accents, umlauts, tildes, and so on) from characters while keeping the base character, so é becomes e and ñ becomes n. It is supported by every tokenizer except the literal tokenizer. To enable it, append ascii_folding=true to the tokenizer’s arguments:

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple('ascii_folding=true')))
WITH (key_field='id');

To see the filter’s effect, compare how the same string is tokenized without and with ascii_folding in a single statement:

SELECT
  'Café naïve coöperate'::pdb.simple::text[],
  'Café naïve coöperate'::pdb.simple('ascii_folding=true')::text[];

Expected Response

          text          |          text
------------------------+------------------------
 {café,naïve,coöperate} | {cafe,naive,cooperate}
(1 row)
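Because ascii_folding is a tokenizer argument, it should attach to other tokenizers the same way it attaches to pdb.simple. The sketch below is an illustration only. It assumes that the pdb.whitespace tokenizer, which this page does not cover, exists in your ParadeDB version and accepts the same argument syntax. No output is shown because the result also depends on that tokenizer’s own splitting and lowercasing rules.

```sql
-- Assumption: pdb.whitespace is available and takes arguments like pdb.simple.
-- The first column tokenizes without folding; the second enables ascii_folding.
SELECT
  'Crème brûlée à Paris'::pdb.whitespace::text[],
  'Crème brûlée à Paris'::pdb.whitespace('ascii_folding=true')::text[];
```

If the second column contains the accented tokens with their marks removed, the filter is being applied for that tokenizer.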
