Regex queries search for terms that follow a pattern. For example, the wildcard pattern `key.*` finds all terms that start with `key`.
```sql
SELECT description, rating, category
FROM mock_items
WHERE description @@@ pdb.regex('key.*');
```
ParadeDB supports all regex constructs of the Rust regex crate, with the following exceptions:
- Lazy quantifiers such as `+?`
- Word boundaries such as `\b`
Otherwise, the full syntax of the regex crate is supported, including its Unicode support and relevant flags.
The regex crate documentation lists the available flags and grouping options, which include:
- named and numbered capture groups
- case insensitivity flag (`i`)
- multi-line mode (`m`)
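As a rough illustration of the flags listed above, the query below uses the regex crate's inline `(?i)` flag so that the pattern ignores case. It reuses the `mock_items` table and the `pdb.regex` call from the first example. Whether case differences survive in the index depends on the tokenizer (many tokenizers lowercase terms at index time), so treat this as a sketch rather than a guaranteed result.

```sql
-- Sketch: case-insensitive variant of the earlier query.
-- (?i) is the regex crate's inline case-insensitivity flag.
SELECT description, rating, category
FROM mock_items
WHERE description @@@ pdb.regex('(?i)key.*');
```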
Regex queries operate at the token level: the pattern is matched against individual indexed tokens, not against the full field value. To execute regex over the original text, use the keyword tokenizer.
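To make the token-level behavior concrete, the two queries below contrast a pattern that fits inside a single token with one that spans a space. They assume that `description` is indexed with a tokenizer that splits text into words and that some row contains the words "running shoes". Both are assumptions about the data and index configuration, not something this page states.

```sql
-- Fits within one token: can match a token such as "shoes".
SELECT description, rating, category
FROM mock_items
WHERE description @@@ pdb.regex('sho.*');

-- Spans a space: no single token contains "running sho...",
-- so under a word-splitting tokenizer this is not expected to match.
SELECT description, rating, category
FROM mock_items
WHERE description @@@ pdb.regex('running sho.*');
```

With the keyword tokenizer, the whole field value is indexed as a single token, so a pattern written against the full text (for example `.*running sho.*`) becomes meaningful.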
During a regex query, ParadeDB doesn’t scan through every single word. Instead, it uses a highly optimized structure called a finite state transducer (FST) that makes it possible to jump straight to the matching terms.
Even if the index contains millions of words, the regex query only looks at the ones that have a chance of matching, skipping everything else.
This is why certain regex constructs, such as lazy quantifiers and word boundaries, are not supported: they are difficult to implement efficiently on top of this structure.