ParadeDB supports all regex constructs of the Rust regex crate, with the following exceptions:
Lazy quantifiers such as +?
Word boundaries such as \b
Otherwise, the full syntax of the regex crate is supported, including all Unicode support and relevant flags. The regex crate documentation lists the available flags and grouping options, which include:
named and numbered capture groups
case-insensitivity flag (i)
multi-line mode (m)
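As a rough illustration of what these options do, the sketch below uses Python's built-in re module, which happens to share the (?P<name>...), (?i) and (?m) syntax with the Rust regex crate. Python's re is only a stand-in here; it is not the engine ParadeDB uses, and the sample strings are made up for the example.

```python
import re

# NOTE: Python's `re` is a stand-in for illustration only; ParadeDB uses
# the Rust regex syntax, which shares the constructs shown here.

# Named and numbered capture groups: (?P<name>...) and (...)
m = re.fullmatch(r"(?P<brand>[a-z]+)-(\d+)", "acme-42")
brand = m.group("brand")   # value captured by the named group
number = m.group(2)        # value captured by the second (numbered) group

# Case-insensitivity flag (i): letters match regardless of case
case_insensitive = re.fullmatch(r"(?i)keyboard", "KeyBoard") is not None

# Multi-line mode (m): ^ and $ match at the start and end of each line
lines = re.findall(r"(?m)^\w+$", "alpha\nbeta")
```

Inline flags such as (?i) are written at the start of the pattern, so they travel with the pattern string itself.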
Regex queries operate at the token level. To execute regex over the original
text, use the keyword tokenizer.
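The difference between matching tokens and matching the original text can be sketched in a few lines of Python. This is a conceptual model, not ParadeDB code: whitespace_tokens and keyword_tokens are hypothetical stand-ins for a word-splitting tokenizer and the keyword tokenizer, and the sketch assumes a pattern must match a whole token.

```python
import re

def whitespace_tokens(text):
    # Hypothetical stand-in for a word-splitting tokenizer:
    # lowercase the text and split it on whitespace.
    return text.lower().split()

def keyword_tokens(text):
    # Hypothetical stand-in for a keyword tokenizer:
    # the entire text is kept as a single token.
    return [text]

def regex_matches(pattern, tokens):
    # Assumption: the pattern has to match an entire token.
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(tok) for tok in tokens)

text = "Ergonomic metal keyboard"

# A pattern spanning several words cannot match any single word token.
multi_word_on_tokens = regex_matches(r"ergonomic metal.*", whitespace_tokens(text))

# The same idea works once the whole text is one token.
multi_word_on_keyword = regex_matches(r"Ergonomic metal.*", keyword_tokens(text))

# A pattern that fits inside one word matches at the token level.
single_word_on_tokens = regex_matches(r"key.*", whitespace_tokens(text))
```

In this model the multi-word pattern fails against the word tokens and succeeds against the single keyword token, while the single-word pattern matches the token "keyboard".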
During a regex query, ParadeDB doesn’t scan through every single word. Instead, it uses a highly optimized structure called a finite state transducer (FST) that makes it possible to jump straight to the matching terms.
Even if the index contains millions of words, the regex query only looks at the ones that have a chance of matching, skipping everything else. This is why certain regex constructs are not supported: they are difficult to implement efficiently.
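The pruning idea can be sketched with a plain trie, which is a simpler relative of an FST. The code below is not ParadeDB's implementation: build_trie and prefix_search are hypothetical helpers, and the search handles only patterns of the form prefix.* in order to keep the example short. A real FST-based search walks the term dictionary alongside an automaton compiled from the full regex.

```python
END = None  # marker key: a complete term ends at this node

def build_trie(terms):
    # Store every term character by character in nested dicts.
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def prefix_search(trie, prefix):
    """Return (matching terms, trie nodes visited) for the pattern `prefix.*`."""
    matches = []
    visited = 0

    def walk(node, depth, path):
        nonlocal visited
        visited += 1
        # A term matches once the whole literal prefix has been consumed.
        if depth >= len(prefix) and node.get(END):
            matches.append(path)
        for ch, child in node.items():
            if ch is END:
                continue
            # Prune: inside the literal prefix, follow only the matching
            # character, so whole subtrees are never visited.
            if depth < len(prefix) and ch != prefix[depth]:
                continue
            walk(child, depth + 1, path + ch)

    walk(trie, 0, "")
    return sorted(matches), visited

terms = ["key", "keyboard", "keys", "kettle", "monitor", "mouse", "metal"]
trie = build_trie(terms)
matches, visited = prefix_search(trie, "key")
# matches == ["key", "keyboard", "keys"]
```

Every term starting with "m", and the "kettle" branch, are skipped without being read, which is the effect described above. Constructs such as lazy quantifiers and word boundaries do not map cleanly onto this kind of character-by-character walk.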