regex_pattern
tokenizer tokenizes text using a regular expression. The regular expression can be specified with the pattern parameter.
For instance, the following tokenizer creates tokens only for words starting with the letter h
:
- Lazy quantifiers such as
+?
- Word boundaries such as
\b
Expected Response