Overview
Basic Usage
You can change how individual fields are tokenized and stored by passing JSON strings to the WITH
clause of CREATE INDEX
.
For instance, the following statement configures an ngram tokenizer for the description
field.
The key(s) of the JSON string correspond to field names, and the values are the configuration options. If a configuration option or field name is not specified, the default values are used (see all configuration options).
Configure Multiple Fields
To configure multiple fields, simply pass more keys to the JSON string. For instance, the following statement specifies
tokenizers for both the description
and category
fields.
All Configuration Options
Text Fields
Options for columns of type VARCHAR
, TEXT
, UUID
, and their corresponding array types
should be passed to text_fields
.
The nested configuration JSON for text_fields
accepts the following keys.
See fast fields for when this option
should be set to true
.
See tokenizers for how to configure the tokenizer.
See record for a list of available record types.
See normalizers for how to configure the normalizer.
JSON Fields
Options for columns of type JSON
and JSONB
should be passed to json_fields
.
The nested configuration JSON for json_fields
accepts the following keys.
See fast fields for when this option should be set to true
.
See tokenizers for how to configure the tokenizer.
See record for a list of available record types.
See normalizers for how to configure the normalizer.
If true
, JSON keys containing a .
will be expanded. For instance, if expand_dots
is true
,
{"metadata.color": "red"}
will be indexed as if it was {"metadata": {"color": "red"}}
.
Advanced Options
In addition to text and JSON, ParadeDB exposes options for numeric, datetime, boolean, range, and enum fields. For most use cases, it is not necessary to change these options.
Numeric Fields
Options for columns of type SMALLINT
, INTEGER
, BIGINT
, OID
, REAL
, DOUBLE PRECISION
, NUMERIC
, and their corresponding array types
should be passed to numeric_fields
.
Boolean Fields
Options for columns of type BOOLEAN
and BOOLEAN[]
should be passed to boolean_fields
.
CREATE_INDEX
accepts several configuration options for boolean_fields
:
Datetime Fields
Options for columns of type DATE
, TIMESTAMP
, TIMESTAMPTZ
, TIME
, TIMETZ
, and their corresponding array types should be passed to datetime_fields
.
CREATE INDEX
accepts several configuration options for datetime_fields
:
Range Fields
Options for columns of type int4range
, int8range
, numrange
, tsrange
, and tstzrange
should be passed to range_fields
.
The range term query is used to filter over these fields.
CREATE INDEX
accepts several configuration options for range_fields
:
Enumerated Types
Options for custom Postgres enums should be passed to numeric_fields
.
Enums should be queried with term queries.
If the ordering of the enum is changed with ADD VALUE ... [ BEFORE | AFTER ]
, the BM25 index should be dropped
and recreated to account for the new enum ordinal values.
Deprecated Syntax Migration
In ParadeDB v0.13.0
, the old paradedb.create_bm25
function was deprecated in favor of the CREATE INDEX
syntax. To make migration to the new
CREATE INDEX
syntax easier, a new paradedb.format_create_bm25
function has been introduced. This function accepts the same arguments as the deprecated
paradedb.create_bm25
function and outputs an equivalent CREATE INDEX
statement which can be copy, pasted, and executed.
paradedb.format_create_bm25
does not create the index. It simply outputs
a CREATE INDEX
statement for you to run.
Was this page helpful?