Create an Index - ParadeDB

Basic Usage

For large indexes, tuning Postgres’ default memory and parallel worker settings prior to running CREATE INDEX is strongly recommended. Please see index tuning for details.

The following code block creates a BM25 index called search_idx over multiple columns of the mock_items table. All columns that are relevant to the search query, including columns used for sorting and filtering, should be indexed for optimal performance.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');

Note that the index’s column list contains fields of type text, JSON, numeric, timestamp, and boolean. See the Field Configuration section for how to configure these fields.

Syntax

CREATE INDEX <index_name> ON <schema_name>.<table_name>
USING bm25 (<columns_or_expressions>)
WITH (key_field='<key_field>');

index_name

The name of the index being created. If unspecified, Postgres will automatically choose a name.

schema_name

The name of the schema that the table belongs to. If unspecified, CURRENT SCHEMA is used.

table_name

required

The name of the table being indexed.

columns_or_expressions

required

A comma-separated list of columns or expressions to index, starting with the key_field. Text, numeric, datetime, boolean, range, enum, and JSON types can be indexed.

key_field

required

The name of a column in the table that represents a unique identifier for each record. Usually, this is the same column that is the primary key of the table.

Choosing a Key Field

The key_field must have a UNIQUE constraint. A non-unique key field is likely to lead to undefined behavior or incorrect search results. While the key field can be any unique text, numeric, or datetime value, an integer key field will be the most performant. The Postgres SERIAL type is an easy way to create a unique integer column. Finally, the key field must be the first column in the target list.

-- Recommended: key_field is the first column in the list
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (key_field = 'id');

-- NOT recommended: key_field is not the first column
CREATE INDEX search_idx ON mock_items
USING bm25 (description, id)
WITH (key_field = 'id');

Partitioned Index

In Postgres, a partitioned index is an index created over a partitioned table. A BM25 index can be created over a partitioned table in the same way as a normal table.

If the partition key of your partitioned table is only indexed as a fast field by the BM25 index (and not additionally indexed via a B-Tree index), note that the @@@ operator must be used in order for filtering to be optimized by the index.

Partial Index

The following code block demonstrates how to pass predicates to CREATE INDEX to construct a partial index. Partial indexes are useful for reducing index size on disk and improving update speeds over non-indexed rows.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (key_field='id')
WHERE category = 'Electronics' AND rating > 2;

Indexes on Expressions

Postgres supports creating indexes on expressions, which can be useful for optimizing queries that involve computed values. A BM25 index can also be created over an expression, as shown below:

CREATE INDEX search_idx ON mock_items
-- index a single field converting a text field to lowercase
USING bm25 (id, description, lower(description))
WITH (key_field='id');

In order to modify how the BM25 engine will index the expression, you can specify the newly computed field as _pg_search_{i} where {i} is the 0-based ordinal position of the expression in the index definition. In the example above, the lowercase version of the description field will be referred to as _pg_search_2.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, lower(description))
WITH (
  key_field='id',
  text_fields='{
    "_pg_search_2": {
      "fast":true,
      "tokenizer": {"type": "keyword"},
      "record": "basic"
    }
  }'
);

This rule holds for querying the field as well:

SELECT * FROM mock_items
WHERE mock_items @@@ paradedb.term('_pg_search_2', 'television');

Concurrent Indexing

To create a new index without blocking writes to your table, use the CONCURRENTLY keyword:

CREATE INDEX CONCURRENTLY search_idx_v2 ON mock_items
USING bm25 (id, description, category, rating, in_stock)
WITH (key_field='id');

This is particularly useful when you need to:

Reindex with different settings
Update an existing index without downtime
Change the indexed columns

pg_search can only use a single BM25 index per table - the most recently created one will automatically be used for queries. After creating a new index concurrently and verifying it works as expected, you can safely drop the old index:

DROP INDEX search_idx;

Delete Index

The following command deletes a BM25 index.

DROP INDEX search_idx;

Documentation

​Basic Usage

​Syntax

​Choosing a Key Field

​Partitioned Index

​Partial Index

​Indexes on Expressions

​Concurrent Indexing

​Delete Index

Basic Usage

Syntax

Choosing a Key Field

Partitioned Index

Partial Index

Indexes on Expressions

Concurrent Indexing

Delete Index