Available Tokenizers
Jieba
The most advanced Chinese tokenizer that leverages both a dictionary and statistical models
The Jieba tokenizer segments Chinese text using both a dictionary and statistical models. It is generally better than the Chinese Lindera and Chinese compatible tokenizers at resolving ambiguous Chinese word boundaries, at the cost of being slower.
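To show what "dictionary plus statistics" means for ambiguous boundaries, here is a minimal Python sketch of the dictionary half of the approach described by the original jieba project. It treats every dictionary word that starts at each position as a candidate and uses dynamic programming to pick the segmentation with the highest total log-probability. This is an illustration, not the tokenizer's actual implementation. `segment` and `FREQ` are hypothetical names, the word frequencies are made up for the example, and real jieba also uses an HMM to handle words missing from its dictionary, which this sketch omits. Here, any unknown character is simply treated as a single-character word with count 1.

```python
import math
from typing import Dict, List

# Hypothetical toy dictionary: word -> frequency count (values invented for illustration).
FREQ: Dict[str, int] = {
    "研究": 500, "研究生": 100, "生命": 400, "命": 50, "起源": 300,
    "研": 10, "究": 5, "生": 80, "起": 20, "源": 20,
}


def segment(text: str, freq: Dict[str, int]) -> List[str]:
    """Return the maximum-probability segmentation of `text` under `freq` (freq must be non-empty)."""
    log_total = math.log(sum(freq.values()))
    max_len = max(len(w) for w in freq)
    n = len(text)
    # best[i] = (score, end): best log-probability for text[i:] and where its first word ends.
    best = [(0.0, n)] * (n + 1)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            # Dictionary words are candidates; a single character is always allowed as a fallback.
            if word in freq or j == i + 1:
                count = freq.get(word, 1)
                candidates.append((math.log(count) - log_total + best[j][0], j))
        best[i] = max(candidates)
    # Walk the stored word ends from the start to rebuild the chosen path.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


print(segment("研究生命起源", FREQ))  # ['研究', '生命', '起源']
```

The string 研究生命起源 is ambiguous because 研究生 ("graduate student") is also a dictionary word. With these made-up frequencies, the path 研究 / 生命 / 起源 ("research / life / origin") scores higher than 研究生 / 命 / 起源, so the statistics choose the intended boundary.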
To get a feel for this tokenizer, run the following command and replace the text with your own: