Trim - ParadeDB

The trim filter removes leading and trailing whitespace from a token, leaving any whitespace in the middle untouched. A token that consists only of whitespace is dropped. This filter is useful for tokenizers that don’t already split on whitespace, like the literal normalized tokenizer or certain language-specific tokenizers.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.literal_normalized('trim=true')))
WITH (key_field='id');

To demonstrate this token filter, let’s compare the output of the same input tokenized without and with trim enabled:

SELECT
  '    token with whitespace   '::pdb.literal_normalized::text[],
  '    token with whitespace   '::pdb.literal_normalized('trim=true')::text[];

Expected Response

               text               |           text
----------------------------------+---------------------------
 {"    token with whitespace   "} | {"token with whitespace"}
(1 row)
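
The two edge cases described at the top of this page, interior whitespace and whitespace-only input, can be checked the same way. The following is a minimal sketch that reuses the cast syntax from the statement above. The input strings and column aliases are illustrative assumptions, and no output is listed here, so run it against your own instance to confirm the behavior.

```sql
-- Sketch only: same cast pattern as the example above, with different inputs.
SELECT
  -- Interior whitespace: per the description, only the two ends should be trimmed.
  '  two   inner   gaps  '::pdb.literal_normalized('trim=true')::text[] AS interior_kept,
  -- Whitespace-only input: the filter is described as dropping this token.
  '     '::pdb.literal_normalized('trim=true')::text[] AS whitespace_only;
```

If the filter behaves as described, the first column should keep the runs of spaces between the words and the second should contain no token.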
