ParadeDB Docs

The whitespace tokenizer works like the simple tokenizer, except that it splits text only on whitespace, so punctuation is preserved as part of each token. It also lowercases all characters by default.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.whitespace))
WITH (key_field='id');

To get a feel for this tokenizer, run the following command and replace the text with your own:

SELECT 'Tokenize me!'::pdb.whitespace::text[];

Expected Response

      text
----------------
 {tokenize,me!}
(1 row)
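
To compare this with the simple tokenizer mentioned at the top of the page, the same text can be cast to both types in one query. This is only a sketch: it assumes the simple tokenizer is exposed as `pdb.simple`, following the naming of `pdb.whitespace`, and the column aliases are illustrative. The whitespace column should keep `me!` intact, as in the response above, while the simple tokenizer is expected to drop the punctuation.

```sql
-- Assumption: the simple tokenizer is available as pdb.simple,
-- mirroring the pdb.whitespace type used above.
-- The aliases simple_tokens and whitespace_tokens are illustrative names.
SELECT
  'Tokenize me!'::pdb.simple::text[]     AS simple_tokens,
  'Tokenize me!'::pdb.whitespace::text[] AS whitespace_tokens;
```

Running both casts on a sample of your own data is a quick way to check whether punctuation-sensitive values such as `c++` or `user@example.com` stay together as single tokens before you build the index.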

