ParadeDB Docs

When a JSON column is indexed, ParadeDB automatically indexes every sub-field of the JSON object and infers the type of each one. For example, consider the following statement, where metadata is a JSONB column:

CREATE INDEX search_idx ON mock_items
USING bm25 (id, metadata)
WITH (key_field='id');
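
The statement above assumes that a mock_items table with an id key and a JSONB metadata column already exists. A minimal stand-in, using only standard PostgreSQL, might look like the sketch below. The column list is an assumption made for illustration; a real mock_items table may contain more columns.

```sql
-- Hypothetical minimal table for the examples on this page.
-- Only id and metadata are needed by the CREATE INDEX statement.
CREATE TABLE IF NOT EXISTS mock_items (
    id SERIAL PRIMARY KEY,   -- used as key_field in the index
    metadata JSONB           -- JSON document whose sub-fields get indexed
);

-- Insert one row holding a small JSON document.
INSERT INTO mock_items (metadata)
VALUES ('{"color": "Silver", "location": "United States"}'::jsonb);
```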

A single metadata value may look like:

{ "color": "Silver", "location": "United States" }
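
To check which sub-fields and JSON types a document like this contains, the standard PostgreSQL functions jsonb_each and jsonb_typeof can be used. This sketch only shows the types as PostgreSQL reports them. It is not a view into ParadeDB's own type inference, which may map them differently.

```sql
-- List each top-level key of the example document with its JSON type.
-- jsonb_each expands the object into (key, value) rows;
-- jsonb_typeof reports 'string', 'number', 'boolean', 'object', 'array' or 'null'.
SELECT key, jsonb_typeof(value) AS json_type
FROM jsonb_each('{"color": "Silver", "location": "United States"}'::jsonb);
```

For this document, both color and location are reported as string.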

ParadeDB will automatically index both metadata.color and metadata.location as text. By default, all text sub-fields of a JSON object use the same tokenizer. The tokenizer can be configured in the same way as for text fields:

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (metadata::pdb.ngram(2,3)))
WITH (key_field='id');

Instead of indexing the entire JSON object, individual sub-fields can be indexed on their own. This makes it possible to configure a different tokenizer for each sub-field of a larger JSON object:

CREATE INDEX search_idx ON mock_items
USING bm25 (id, ((metadata->>'color')::pdb.ngram(2,3)))
WITH (key_field='id');
