Literal - ParadeDB

The literal tokenizer is not well suited to text search queries such as match or phrase. If you need text search over a field that uses the literal tokenizer, consider indexing that field with multiple tokenizers.

Because the literal tokenizer preserves the source text exactly, token filters cannot be configured for this tokenizer.

The literal tokenizer applies no tokenization to the text, preserving it as-is. It is the default for uuid fields (since exact UUID matching is a common use case), and is useful for exact string matching over text fields. It is also required when a text field is used as the sort field in a Top N query or as part of an aggregate.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.literal))
WITH (key_field='id');
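
Once the index exists, an exact-match lookup can be sketched with ParadeDB's term operator `===`. This is a minimal sketch, not a definitive recipe: it assumes the `mock_items` demo table contains a row whose description is exactly the string shown, and the search strings are illustrative.

```sql
-- The literal tokenizer stores the whole value as a single token, so the
-- term query has to supply the full string with identical casing and
-- punctuation.
SELECT id, description
FROM mock_items
WHERE description === 'Sleek running shoes';

-- A partial value is a different token, so this is not expected to match
-- the row above (it would only match a row whose description is exactly
-- 'running shoes').
SELECT id, description
FROM mock_items
WHERE description === 'running shoes';
```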

To get a feel for this tokenizer, run the following command and replace the text with your own:

SELECT 'Tokenize me!'::pdb.literal::text[];

Expected Response

       text
------------------
 {"Tokenize me!"}
(1 row)
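
For text search over the same column, a second tokenizer can be indexed alongside the literal one. The sketch below assumes the `alias=` tokenizer option and the `pdb.alias(...)` cast behave as described in ParadeDB's documentation on multiple tokenizers per field. The alias name `description_simple` is a hypothetical choice, and the syntax should be checked against your ParadeDB version.

```sql
-- Replace the earlier index so one index covers both tokenizers.
DROP INDEX IF EXISTS search_idx;

CREATE INDEX search_idx ON mock_items
USING bm25 (
  id,
  (description::pdb.literal),                            -- exact matching, sorting, aggregates
  (description::pdb.simple('alias=description_simple'))  -- text search (alias name is an assumption)
) WITH (key_field='id');

-- Match query routed to the aliased, simple-tokenized field.
SELECT id, description
FROM mock_items
WHERE description::pdb.alias('description_simple') ||| 'running shoes';
```

With this layout, exact-match queries keep targeting `description` directly, while match and phrase queries go through the alias.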
