The simple tokenizer is the default tokenizer in ParadeDB. It splits text on whitespace and punctuation and lowercases all characters. If no tokenizer is specified for a text field, the simple tokenizer is used.
-- The following statements are equivalent
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (key_field='id');

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple))
WITH (key_field='id');
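For intuition only, the split-and-lowercase rule described above can be roughly approximated in plain PostgreSQL, without ParadeDB installed. This is a sketch under stated assumptions, not ParadeDB's implementation: it assumes every run of non-alphanumeric characters acts as a separator, and the `[:alnum:]` class follows the database's locale settings, so results on non-ASCII text may differ from the real tokenizer.

```sql
-- Approximation only: lowercase the input, split on runs of
-- non-alphanumeric characters, then drop the empty string that a
-- trailing separator leaves behind.
SELECT array_remove(
         regexp_split_to_array(lower('Hello, World!'), '[^[:alnum:]]+'),
         ''
       ) AS approx_tokens;
-- approx_tokens: {hello,world}
```

The sample string is arbitrary. Use the tokenizer itself to confirm how ParadeDB handles a given input.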
To get a feel for this tokenizer, run the following command and replace the text with your own: