Overview - ParadeDB

Basic Usage

You can change how individual fields are tokenized by passing JSON strings to the WITH clause of CREATE INDEX. For instance, the following statement configures an ngram tokenizer for the description field.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field = 'id',
    text_fields = '{
        "description": {
          "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false}
        }
    }'
);

The key(s) of the JSON string correspond to field names, and the values are the configuration options. If a configuration option or field name is not specified, the default values are used (see all configuration options).

Configure Multiple Fields

To configure multiple fields, simply pass more keys to the JSON string. For instance, the following statement specifies tokenizers for both the description and category fields.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category)
WITH (
    key_field = 'id',
    text_fields = '{
        "description": {
          "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false}
        },
        "category": {
            "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false}
        }
    }'
);

All Configuration Options

Text Fields

Options for columns of type VARCHAR, TEXT, UUID, and their corresponding array types should be passed to text_fields.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description)
WITH (
    key_field = 'id',
    text_fields = '{
        "description": {
          "fast": true,
          "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false}
        }
    }'
);

The nested configuration JSON for text_fields accepts the following keys.

fast

default:false

See fast fields for when this option should be set to true.

tokenizer

See tokenizers for how to configure the tokenizer.

record

default:"position"

See record for a list of available record types.

normalizer

See normalizers for how to configure the normalizer.

Advanced Options

JSON Fields

Options for columns of type JSON and JSONB should be passed to json_fields.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, metadata)
WITH (
  key_field = 'id',
  json_fields = '{
    "metadata": {
      "fast": true
    }
  }'
);

The nested configuration JSON for json_fields accepts the following keys.

fast

default:false

See fast fields for when this option should be set to true.

tokenizer

See tokenizers for how to configure the tokenizer.

record

default:"position"

See record for a list of available record types.

normalizer

See normalizers for how to configure the normalizer.

expand_dots

default:true

If true, JSON keys containing a . will be expanded. For instance, if expand_dots is true, {"metadata.color": "red"} will be indexed as if it was {"metadata": {"color": "red"}}.

Advanced Options

In addition to text and JSON, ParadeDB exposes options for numeric, datetime, boolean, range, and enum fields. For most use cases, it is not necessary to change these options.

Numeric Fields

Options for columns of type SMALLINT, INTEGER, BIGINT, OID, REAL, DOUBLE PRECISION, NUMERIC, and their corresponding array types should be passed to numeric_fields.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, rating)
WITH (
    key_field = 'id',
    numeric_fields = '{
        "rating": {"fast": true}
    }'
);

Advanced Options

Boolean Fields

Options for columns of type BOOLEAN and BOOLEAN[] should be passed to boolean_fields.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, in_stock)
WITH (
  key_field = 'id',
  boolean_fields = '{
      "in_stock": {"fast": true}
  }'
);

CREATE_INDEX accepts several configuration options for boolean_fields:

Advanced Options

Datetime Fields

Options for columns of type DATE, TIMESTAMP, TIMESTAMPTZ, TIME, TIMETZ, and their corresponding array types should be passed to datetime_fields.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, created_at)
WITH (
  key_field = 'id',
  datetime_fields = '{
      "created_at": {"fast": true}
  }'
);

CREATE INDEX accepts several configuration options for datetime_fields:

Advanced Options

Enumerated Types

Options for custom Postgres enums should be passed to numeric_fields. Enums should be queried with term queries.

If the ordering of the enum is changed with ADD VALUE ... [ BEFORE | AFTER ], the BM25 index should be dropped and recreated to account for the new enum ordinal values.

Documentation

​Basic Usage

​Configure Multiple Fields

​All Configuration Options

​Text Fields

​JSON Fields

​Advanced Options

​Numeric Fields

​Boolean Fields

​Datetime Fields

​Enumerated Types

Basic Usage

Configure Multiple Fields

All Configuration Options

Text Fields

JSON Fields

Advanced Options

Numeric Fields

Boolean Fields

Datetime Fields

Enumerated Types