In many cases, a text field needs to be tokenized in multiple ways: for instance, with the unicode tokenizer for search and the literal tokenizer for Top K ordering. To tokenize a field in more than one way, append an alias=<alias_name> argument to the additional tokenizer configurations.
The alias name can be any string you like. For instance, the following statement tokenizes description using both the simple and literal tokenizers.
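A minimal sketch of such a statement, assuming ParadeDB's cast-style tokenizer syntax (pdb.literal, pdb.simple) and a hypothetical mock_items table with id and description columns. The index name search_idx and the key_field setting are illustrative assumptions:

```sql
-- Sketch only: table name, index name, and key_field are assumptions.
CREATE INDEX search_idx ON mock_items
USING bm25 (
  id,
  -- Un-aliased: indexed under the column's own name, "description"
  (description::pdb.literal),
  -- Aliased: indexed as a separate field named "description_simple"
  (description::pdb.simple('alias=description_simple'))
) WITH (key_field = 'id');
```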
This creates two distinct fields in the index: description, which uses the literal tokenizer,
and an aliased field called description_simple, which uses the simple tokenizer.
To query against the aliased field, cast it to pdb.alias('alias_name'):
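As a hedged illustration, assume a hypothetical mock_items table whose description column is indexed twice: un-aliased with the literal tokenizer, and as description_simple with the simple tokenizer. The rating and category columns and the use of the ||| match operator are assumptions made for this example:

```sql
-- Query the aliased field (simple tokenizer) by casting to pdb.alias
SELECT description, rating, category
FROM mock_items
WHERE description::pdb.alias('description_simple') ||| 'sleek running shoes';

-- Query the un-aliased field (literal tokenizer); no cast is needed
SELECT description, rating, category
FROM mock_items
WHERE description ||| 'Sleek running shoes';
```

Without the cast, the query targets the field that carries the column's own name, which here is the one using the literal tokenizer.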
If a text field uses multiple tokenizers and one of them is literal, we recommend aliasing
the other tokenizers and leaving the literal tokenizer un-aliased. This is so queries that
GROUP BY, ORDER BY, or aggregate the
text field can reference the field directly:
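For example, a sketch of a Top K query under the same kind of setup, assuming a hypothetical mock_items table where description is un-aliased with the literal tokenizer and description_simple is the aliased simple-tokenized field. The search goes through the alias, while ORDER BY names the column directly:

```sql
-- Sketch only: table, columns, and the ||| operator are assumptions.
SELECT description, rating, category
FROM mock_items
-- Search the simple-tokenized alias
WHERE description::pdb.alias('description_simple') ||| 'shoes'
-- Sort on the un-aliased (literal) field by referencing the column directly
ORDER BY description
LIMIT 5;
```

If the literal tokenizer were the aliased one instead, the ORDER BY clause would also need the pdb.alias cast, which is the inconvenience the recommendation avoids.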