The token length filter removes tokens that are longer or shorter than a configured length, measured in bytes.
To remove all tokens longer than a certain length, append a `remove_long` configuration to the tokenizer:
```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple('remove_long=100')))
WITH (key_field='id');
```
To remove all tokens shorter than a certain length, use `remove_short`:
```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple('remove_short=3')))
WITH (key_field='id');
```
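Conceptually, the filter compares each token's encoded byte length against the configured bounds and drops tokens that fall outside them. The following Python sketch illustrates that behavior; it is not ParadeDB's implementation, and the exact boundary semantics (strict vs. inclusive comparison) are an assumption:

```python
def length_filter(tokens, remove_short=None, remove_long=None):
    """Illustrative sketch of a byte-based token length filter.

    Assumes remove_short drops tokens strictly shorter than the bound
    and remove_long drops tokens strictly longer than it.
    """
    kept = []
    for tok in tokens:
        n = len(tok.encode("utf-8"))  # length is measured in bytes, not characters
        if remove_short is not None and n < remove_short:
            continue
        if remove_long is not None and n > remove_long:
            continue
        kept.append(tok)
    return kept


# Mirrors the comparison below: "a" is too short, the long token too long.
print(length_filter(["a", "supersupersuperlong", "token"],
                    remove_short=2, remove_long=10))
```

Note that because lengths are counted in bytes, multi-byte UTF-8 characters can make a token hit the `remove_long` bound with fewer visible characters than the configured limit.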
All tokenizers besides the literal tokenizer accept these configurations.

To demonstrate this token filter, let’s compare the output of the following two statements:
```sql
SELECT
  'A supersupersuperlong token'::pdb.simple::text[],
  'A supersupersuperlong token'::pdb.simple('remove_short=2', 'remove_long=10')::text[];
```
Expected Response
```
             text              |  text
-------------------------------+---------
 {a,supersupersuperlong,token} | {token}
(1 row)
```