The source code tokenizer is designed for tokenizing code. In addition to splitting on whitespace,
punctuation, and symbols, it also splits on common casing conventions such as camel case and snake case. For instance,
my_variable and myVariable are both split into the tokens my and variable.
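To make the splitting rule concrete, here is a minimal Python sketch that approximates the behavior described above. It is not the tokenizer's actual implementation: the function name `split_identifier` is hypothetical, the regular expressions handle ASCII only, and lowercasing the output is an assumption based on the example above.

```python
import re


def split_identifier(text: str) -> list[str]:
    """Approximate source-code-style tokenization (illustrative only)."""
    # Insert a space at each lowercase/digit-to-uppercase boundary (camel case):
    # "myVariable" becomes "my Variable".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Split on any run of characters that is not an ASCII letter or digit.
    # This covers whitespace, punctuation, symbols, and underscores (snake case).
    parts = re.split(r"[^A-Za-z0-9]+", spaced)
    # Drop empty strings and lowercase the tokens (lowercasing is an assumption).
    return [part.lower() for part in parts if part]


print(split_identifier("my_variable"))  # ['my', 'variable']
print(split_identifier("myVariable"))   # ['my', 'variable']
```

Both spellings produce the same tokens, which is why a search for variable can match either form once a column is indexed with this tokenizer.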
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.source_code))
WITH (key_field='id');
To get a feel for this tokenizer, run the following command and replace the text with your own: