Basic Usage

The following code block creates a BM25 index called search_idx over multiple columns of the mock_items table. All columns that are relevant to the search query, including columns used for sorting and filtering, should be indexed for optimal performance.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');

Syntax

CREATE INDEX <index_name> ON <schema_name>.<table_name>
USING bm25 (<columns>)
WITH (key_field='<key_field>');
index_name

The name of the index being created. If unspecified, Postgres will automatically choose a name.

schema_name

The name of the schema that the table belongs to. If unspecified, CURRENT SCHEMA is used.

table_name
required

The name of the table being indexed.

columns
required

A comma-separated list of columns to index, starting with the key field. Text, numeric, datetime, boolean, range, enum, and JSON types can be indexed.

key_field
required

The name of a column in the table that represents a unique identifier for each record. Usually, this is the same column that is the primary key of the table.

Indexing Settings

Several settings exist to control the amount of resources to allocate during index creation. While they have sensible defaults, overall indexing performance can be drastically improved if configured for the host system.

Choosing a Key Field

While the key field can be any unique text, numeric, or datetime value, an integer key field will be the most performant. The Postgres SERIAL type is an easy way to create a unique integer column.

The key field is not intended to be tokenized. For instance, the following configuration is not allowed:

-- This will throw an error
CREATE INDEX search_idx ON mock_items
USING bm25 (description)
WITH (key_field = 'description');

Finally, the key field must be the first column in the target list.

-- Recommended: key_field is the first column in the list
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (key_field = 'id');

-- NOT recommended: key_field is not the first column
CREATE INDEX search_idx ON mock_items
USING bm25 (description, id)
WITH (key_field = 'id');

Partitioned Index

In Postgres, a partitioned index is an index created over a partitioned table. A BM25 index can be created over a partitioned table in the same way as a normal table.

Partial Index

The following code block demonstrates how to pass predicates to CREATE INDEX to construct a partial index. Partial indexes are useful for reducing index size on disk and improving update speeds over non-indexed rows.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (key_field='id')
WHERE category = 'Electronics' AND rating > 2;

Concurrent Indexing

To create a new index without blocking writes to your table, use the CONCURRENTLY keyword:

CREATE INDEX CONCURRENTLY search_idx_v2 ON mock_items
USING bm25 (id, description, category, rating, in_stock)
WITH (key_field='id');

This is particularly useful when you need to:

  • Reindex with different settings
  • Update an existing index without downtime
  • Change the indexed columns

pg_search can only use a single BM25 index per table - the most recently created one will automatically be used for queries. After creating a new index concurrently and verifying it works as expected, you can safely drop the old index:

DROP INDEX search_idx;