ParadeDB Docs

In many cases, a text field needs to be tokenized in more than one way: for instance, with the simple tokenizer for search and the literal tokenizer for Top N ordering. To tokenize a field more than one way, append an alias=<alias_name> argument to each additional tokenizer configuration. The alias name can be any string. For instance, the following statement tokenizes description with both the simple and literal tokenizers.

CREATE INDEX search_idx ON mock_items
USING bm25 (
  id,
  (description::pdb.literal),
  (description::pdb.simple('alias=description_simple'))
) WITH (key_field='id');
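The same pattern extends to more than two tokenizers: each extra configuration gets its own alias. The statement below is a sketch that assumes your ParadeDB version ships a pdb.whitespace tokenizer and accepts the same alias=<alias_name> argument for it. description_whitespace is a hypothetical alias name. The statement drops and recreates search_idx so that it can run after the one above.

```sql
-- Sketch only: assumes pdb.whitespace exists and takes an alias argument
-- the same way pdb.simple does. description_whitespace is hypothetical.
DROP INDEX IF EXISTS search_idx;

CREATE INDEX search_idx ON mock_items
USING bm25 (
  id,
  (description::pdb.literal),                                     -- un-aliased field: description
  (description::pdb.simple('alias=description_simple')),          -- aliased field: description_simple
  (description::pdb.whitespace('alias=description_whitespace'))   -- aliased field: description_whitespace
) WITH (key_field='id');
```

Each aliased field would then be queried with its own cast, for example description::pdb.alias('description_whitespace').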

Under the hood, two distinct fields are created in the index: a field called description, which uses the literal tokenizer, and an aliased field called description_simple, which uses the simple tokenizer. To query the aliased field, cast the column to pdb.alias('<alias_name>'):

-- Query against `description_simple`
SELECT description, rating, category
FROM mock_items
WHERE description::pdb.alias('description_simple') ||| 'Sleek running shoes';

-- Query against `description`
SELECT description, rating, category
FROM mock_items
WHERE description ||| 'Sleek running shoes';
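Because both fields live in the same index, a single query can search one and sort by the other, which is the search-plus-Top-N split described at the start of this page. The sketch below uses only the index and casts shown above: it matches against description_simple and orders by the literal description field. Whether it executes as a Top N scan may depend on your ParadeDB version, so it is worth checking the plan with EXPLAIN.

```sql
-- Match with the simple-tokenized alias, then sort on the un-aliased
-- literal field and keep the first 5 rows.
SELECT description, rating, category
FROM mock_items
WHERE description::pdb.alias('description_simple') ||| 'running shoes'
ORDER BY description
LIMIT 5;
```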

If a text field uses multiple tokenizers and one of them is literal, we recommend aliasing the other tokenizers and leaving the literal tokenizer un-aliased. That way, queries that GROUP BY, ORDER BY, or aggregate the text field can reference the field directly:

SELECT description, rating, category
FROM mock_items
WHERE description @@@ 'shoes'
ORDER BY description
LIMIT 5;
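Aggregation follows the same pattern. The sketch below searches the simple alias and groups by the literal description field; num_items is an illustrative column alias. It is plain PostgreSQL on top of the index defined above, and whether the aggregate is pushed down into the index rather than computed by Postgres may depend on your ParadeDB version.

```sql
-- Count matching rows per distinct description. The search runs against
-- the simple alias; GROUP BY references the literal field directly.
SELECT description, COUNT(*) AS num_items
FROM mock_items
WHERE description::pdb.alias('description_simple') ||| 'shoes'
GROUP BY description
ORDER BY num_items DESC
LIMIT 5;
```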
