Index Settings

These settings can be set in postgresql.conf and affect system resources used by the paradedb.create_bm25() function and normal INSERT/UPDATE/COPY statements.

Indexing Threads

This setting requires superuser privileges.

paradedb.create_index_parallelism sets the number of threads used during CREATE INDEX. The default is 8. A value of zero will automatically detect “available parallelism” of the host computer.

For best performance it is recommended to use one less than the total number of host computer cores.

SET paradedb.create_index_parallelism = 8;

Indexing Memory

This setting requires superuser privileges.

paradedb.create_index_memory_budget sets the amount of memory to dedicate per indexing thread before the index segment needs to be written to disk.

If unset, Postgres’ maintenance_work_mem value is used, but is first divided by the total parallelism (paradedb.create_index_parallelism).

In terms of raw indexing performance, larger is generally better.

The value is measured in megabytes. A value of 1024 is the same as 1GB.

SET paradedb.create_index_memory_budget = 1024;

Indexing Progress

When set to true, paradedb.log_create_index_progress creates Postgres LOG: entries every 100,000 rows with information on the indexing rate (in rows per second). Defaults to false.

This can be useful to monitor the progress of a long-running CREATE INDEX statement. This setting can be changed by any user.

SET paradedb.log_create_index_progress = true;

Statement Parallelism

This setting requires superuser privileges.

paradedb.statement_parallelism controls the number of indexing threads used during INSERT/UPDATE/COPY statements. The default is 8. A value of zero will automatically detect “available parallelism” of the host computer.

If your typical update patterns are single-row atomic INSERTs or UPDATEs, then a value of 1 is ideal. If your update patterns typically update many thousands of rows at once, a larger value might be preferred for greater indexing concurrency.

SET paradedb.statement_parallelism = 1;

Statement Memory Budget

This setting requires superuser privileges.

paradedb.statement_memory_budget sets the amount of memory to dedicate per indexing thread before the index segment needs to be written to disk.

Like indexing memory, it defaults to maintenance_work_mem divided by the total parallelism, and is measured in megabytes.

If your typical update patterns are single-row atomic INSERTs or UPDATEs, then a value of 15MB is ideal. If your update patterns typically update many thousands of rows, a larger value might be preferred for greater indexing concurrency.

Generally speaking, for statement level indexing, the parallelism value dictates the number of new index segments that will be created, so there’s a trade-off between finishing the INSERT/UPDATE statement quickly and the longer-term search overhead of having many additional index segments until VACUUM runs.

SET paradedb.statement_memory_budget = 15;

Segment Settings

A segment is an individual unit of an index. Each BM25 index is comprised of multiple segments. Every atomic insert or update creates a new segment.

Target Segment Count

Having too many segments can negatively impact search performance. During a VACUUM, smaller segments can be merged into larger segments.

target_segment_count controls the number of segments that should exist and defaults to the number of available CPUs on the host machine. This allows multiple threads to read from segments in parallel during a search.

While ParadeDB attempts to create target_segment_count segments, the actual number of segments may not always equal target_segment_count. For instance, it could exceed this number of VACUUM is not run, or fall below this number if the table contains a small number of rows.

ALTER INDEX search_idx SET (target_segment_count = 8);

Merge on Insert

merge_on_insert indicates if ParadeDB should look for merge opportunities during INSERT, UPDATE, or DELETE operations in addition to VACUUMs. The default is true. Setting this to false can improve INSERT/UPDATE/COPY throughput at the expense of creating more segments that will later be merged by VACUUM.

ALTER INDEX search_idx SET (merge_on_insert = true);