Raise Parallel Workers
There are three settings that control how many parallel workers ultimately get assigned to a query. First,max_worker_processes
is a global limit for the number of workers.
Next, max_parallel_workers
is a subset of max_worker_processes
, and sets the limit for workers used in
parallel queries. Finally, max_parallel_workers_per_gather
limits how many workers a single query can receive.
postgresql.conf
4
. The max_parallel_workers
pool
is set to 64
, which means that 16
queries can execute simultaneously with 4
workers each. Finally, max_worker_processes
is
set to 72
to give headroom for other workers like autovacuum and replication.
In practice, we recommend experimenting with different settings, as the best configuration depends on the underlying hardware,
query patterns, and volume of data.
If all
max_parallel_workers
are in use, Postgres will still execute
additional queries, but those queries will run without parallelism. This means
that queries do not fail — they just may run slower due to lack of
parallelism.Raise Shared Buffers
shared_buffers
controls how much memory is available to the Postgres buffer cache. We recommend allocating no more than 40% of total memory
to shared_buffers
.
postgresql.conf
pg_prewarm
extension can be used to load the BM25 index into the buffer cache after Postgres restarts. A higher shared_buffers
value allows more of the index to be
stored in the buffer cache.
Configure Autovacuum
If an index experiences frequent writes, the search performance of some queries like sorting or aggregates can degrade ifVACUUM
has not been recently run. This is because writes can cause parts of Postgres’ visibility map
to go out of date, and VACUUM
updates the visibility map.
To determine if search performance is degraded by lack of VACUUM
, run EXPLAIN ANALYZE
over a query. A Parallel Custom Scan
in the query plan with a large number of Heap Fetches
typically means that VACUUM
should be run.
Postgres can be configured to automatically vacuum a table when a certain number of rows have been updated. Autovacuum settings
can be set globally in postgresql.conf
or for a specific table.
autovacuum_vacuum_scale_factor
triggers an autovacuum if a certain percentage of rows in a table have been updated.autovacuum_vacuum_threshold
triggers an autovacuum if an absolute number of rows have been updated.autovacuum_naptime
ensures that vacuum does not run too frequently.
autovacuum_vacuum_scale_factor
to 0
and autovacuum_vacuum_threshold
to 100000
will trigger an autovacuum
for every 100000
row updates. As a general rule of thumb, we recommend autovacuuming at least once every 100000
single-row updates.
Adjust Target Segment Count
By default,CREATE INDEX
/REINDEX
will create as many segments as there are CPUs on the host machine. This can be changed using the
target_segment_count
index option.
REINDEX
, the same value will be used.
It can be changed with ALTER INDEX, like so:
REINDEX
is required to rebalance the index to that segment count.
For optimal performance, the segment count should equal the number of parallel workers that a query can receive, which is controlled by
max_parallel_workers_per_gather
. If max_parallel_workers_per_gather
is greater than the number of CPUs on the host machine, then increasing the target segment count to match max_parallel_workers_per_gather
can improve query
performance.
target_segment_count
is merely a suggestion.While pg_search
will endeavor to ensure the created index will have exactly this many segments, it is possible for it
to have less or more. Mostly this depends on the distribution of work across parallel builder processes, memory
constraints, and table size.