The simple tokenizer is the default tokenizer in ParadeDB. It splits text on punctuation and whitespace and lowercases all characters. If no tokenizer is specified for a text field, the simple tokenizer is used.
-- The following statements are equivalent; run only one of them, since both create an index named search_idx
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (key_field='id');

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple))
WITH (key_field='id');
To get a feel for this tokenizer, run the following command and replace the text with your own:
SELECT 'Tokenize me!'::pdb.simple::text[];
Expected Response
     text
---------------
 {tokenize,me}
(1 row)
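To compare several inputs at once, the same cast can be applied to each row of a small inline table. This is a minimal sketch: the `samples` alias, the `input` and `tokens` column names, and the sample strings are made up for illustration, and it assumes the `pdb.simple` cast accepts a `text` expression the same way it accepts the `description` column in the index definition above. No expected output is shown; run it to see how punctuation, repeated whitespace, and mixed case are handled.

```sql
-- Hypothetical sample strings; replace them with your own text.
SELECT
    input,
    input::pdb.simple::text[] AS tokens  -- same cast chain as the single-string example
FROM (
    VALUES
        ('Tokenize me!'::text),
        ('Hello, World'::text),
        ('MIXED case   text'::text)
) AS samples(input);
```

Each row pairs the original string with the token array produced by the simple tokenizer, which makes it easier to spot how a particular character affects splitting.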