Skip to main content
The following code block creates a BM25 index over several columns in the mock_items table.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category)
WITH (key_field='id');
Only one BM25 index can exist per table. We recommend indexing all columns in a table that may be present in a search query, including columns used for sorting, grouping, filtering, and aggregations.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');
Most Postgres types, including text, JSON, numeric, timestamp, range, boolean, and arrays, can be indexed.

Choosing a Key Field

In the CREATE INDEX statement above, note the mandatory key_field option. Every BM25 index needs a key_field, which is the name of a column that will function as a row’s unique identifier within the index. The key_field must:
  1. Have a UNIQUE constraint. Usually this means the table’s PRIMARY KEY.
  2. Be the first column in the column list.
  3. Be untokenized, if it is a text field.

Tokenizers

Understanding how tokenization works is key to getting the search results you want, as full text search in ParadeDB (and most search tools) is centered around token matching. For a refresher, please see the text search overview.
By default, text columns are tokenized using the simple tokenizer, which lowercases and splits on both punctuation and whitespace. This tokenization strategy is configurable. For example, the following code block uses the ngrams tokenizer for description.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.ngram(2,3)), category)
WITH (key_field='id');
There are many available tokenizers to choose from.

Token Filters

After tokens are created, token filters can be configured to apply further processing like lowercasing, stemming, or unaccenting. For example, the following code block adds English stemming to description:
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple('stemmer=english')), category)
WITH (key_field='id');
I