Available Tokenizers
ICU
Splits text according to the Unicode standard
The ICU (International Components for Unicode) tokenizer splits text into words using the word-boundary rules of the Unicode standard. It can tokenize most languages and accounts for the differences in how word boundaries work across them.
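To make the idea of Unicode-aware word splitting concrete, here is a minimal Python sketch. It is not the ICU tokenizer: it substitutes Python's Unicode-aware `\w+` regular expression for ICU's boundary rules, and the function name `approx_word_tokens` is hypothetical. It only shows that word splitting can work across scripts without per-language configuration. Real ICU segmentation differs in the details, for example in how it treats apostrophes inside words and scripts that are written without spaces.

```python
import re

# Stand-in for illustration only: in Python 3, \w matches Unicode letters,
# digits, and underscore, but this is NOT the ICU word-boundary algorithm.
_WORD = re.compile(r"\w+")


def approx_word_tokens(text: str) -> list[str]:
    """Return runs of Unicode word characters, dropping punctuation and spaces."""
    return _WORD.findall(text)


if __name__ == "__main__":
    # Latin and Cyrillic words are both split on whitespace and punctuation.
    print(approx_word_tokens("Hello, world! Привет мир"))
```

The regular expression keeps the sketch dependency-free; an actual ICU binding would be needed to reproduce the tokenizer's real output.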
To get a feel for this tokenizer, run the following command and replace the text with your own: