The Chinese compatible tokenizer is like the simple tokenizer — it lowercases non-CJK characters and splits on
whitespace and punctuation. Additionally, it treats each CJK character as its own token.
```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.chinese_compatible))
WITH (key_field = 'id');
```
To get a feel for this tokenizer, run the following query, replacing the sample text with your own:
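A minimal sketch, assuming the same `pdb.chinese_compatible` cast used in the index definition above and that casting the result to `text[]` exposes the token list; the sample sentence is a placeholder:

```sql
-- Tokenize a mixed CJK/Latin string with the chinese_compatible tokenizer
-- (assumes the ::text[] cast returns the token array)
SELECT 'Hello 世界'::pdb.chinese_compatible::text[];
-- Expected: {hello,世,界} — the Latin word is lowercased and split on
-- whitespace/punctuation, while each CJK character becomes its own token
```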