Indexing Memory

The default Postgres maintenance_work_mem value of 64MB is quite conservative and can significantly slow down index builds. For large indexes, we strongly recommend increasing maintenance_work_mem.

postgresql.conf
maintenance_work_mem = 16GB

maintenance_work_mem should not exceed the server’s available memory.

In addition to improving build times, maintenance_work_mem also affects the number of segments created in the index. This is because, while ParadeDB tries to maintain as many segments as CPUs, a segment that cannot fit into memory will be split into a new segment. As a result, an insufficient maintenance_work_mem can lead to significantly more segments than available CPUs, which degrades search performance. To check if the chosen maintenance_work_mem value is high enough, you can compare the index’s segment count with the server’s CPU count.

Generally speaking, maintenance_work_mem is the only setting that must be tuned to optimize index build times. The rest have sensible defaults.

Indexing Progress

ParadeDB can write indexing progress to the Postgres logs, which helps monitor the progress of a long-running CREATE INDEX statement.

First, Postgres logs must be enabled in postgresql.conf and Postgres must be restarted afterward.

postgresql.conf
logging_collector = on

Next, set paradedb.log_create_index_progress to true. This creates Postgres LOG: entries every 100,000 rows with information on the indexing rate (in rows per second).

SET paradedb.log_create_index_progress = true;

The logs will appear in the directory specified in log_directory.

SHOW log_directory;

Indexing Memory per Thread

This setting requires superuser privileges.

paradedb.create_index_memory_budget defaults to maintenance_work_mem divided by total parallelism. It sets the amount of memory to dedicate per indexing thread before the index segment needs to be written to disk.

The value is measured in megabytes. A value of 1024 is the same as 1GB. In terms of raw indexing performance, larger is generally better.

SET paradedb.create_index_memory_budget = 1024;

Indexing Threads

This setting requires superuser privileges.

paradedb.create_index_parallelism sets the number of threads used during CREATE INDEX. The default is 0, which automatically detects the “available parallelism” of the host computer.

SET paradedb.create_index_parallelism = 8;