INSERT/UPDATE/COPY statements to the BM25 index.
Ensure Merging Happens in the Background
During every INSERT/UPDATE/COPY/VACUUM, the BM25 index runs a compaction process that looks for opportunities to merge segments
together. The goal is to consolidate smaller segments into larger ones, reducing the total number of segments and improving query performance.
Segments become candidates for merging if their combined size meets or exceeds one of several configurable layer thresholds. These thresholds define target
segment sizes, such as 10KB, 100KB, and 1MB. For each layer, the compactor checks whether there are enough smaller segments whose total size adds up to the threshold.
The default layer sizes are 100KB, 1MB, 100MB, 1GB, and 10GB but can be configured.
The layer_sizes option allows merging to happen in the foreground.
This is not typically recommended because it slows down writes, but can be used to apply back pressure to writes if segments are being created faster
than they can be merged down.
Setting layer_sizes to 0 disables foreground merging, and setting background_layer_sizes to 0 disables background merging.
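The layer check described above can be illustrated with a small Python sketch. This is a toy model, not ParadeDB's actual merge policy: the function name `find_merge_candidates` is hypothetical, and the order in which layers are checked (largest first) and the rule that a segment joins at most one merge are assumptions made for illustration.

```python
KB = 1024
MB = 1024 * KB

def find_merge_candidates(segment_sizes, layer_sizes):
    """Toy model: map each layer threshold to the segments that would merge into it."""
    remaining = sorted(segment_sizes)
    candidates = {}
    # Assumption: larger layers are checked first and a segment joins at most one merge.
    for layer in sorted(layer_sizes, reverse=True):
        if layer <= 0:
            continue  # a layer size of 0 disables merging
        # Only segments smaller than the layer's target size are considered.
        smaller = [size for size in remaining if size < layer]
        if sum(smaller) >= layer:
            candidates[layer] = smaller
            remaining = [size for size in remaining if size >= layer]
    return candidates

segments = [40 * KB, 40 * KB, 40 * KB, 600 * KB]
print(find_merge_candidates(segments, [100 * KB, 1 * MB]))
print(find_merge_candidates(segments, [0]))
```

In this example the four segments total 720KB, which falls short of the 1MB layer, but the three 40KB segments add up to 120KB and so qualify for the 100KB layer. With a layer size of 0, nothing is selected.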
Increase Work Memory for Bulk Updates
work_mem controls how much memory to allocate to a single INSERT/UPDATE/COPY statement. Each statement that writes to a BM25 index is required to have at least 15MB of memory. If
work_mem is below 15MB, the setting is ignored and 15MB is used instead.
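The 15MB floor amounts to a simple clamp, sketched below in Python. The function name `effective_write_memory_mb` is hypothetical and only models the rule stated above; it is not part of ParadeDB or PostgreSQL.

```python
MIN_WRITE_MEMORY_MB = 15  # floor stated above for statements that write to a BM25 index

def effective_write_memory_mb(work_mem_mb):
    """Memory a single write statement gets under the rule above (toy model)."""
    return max(work_mem_mb, MIN_WRITE_MEMORY_MB)

for setting in (4, 15, 64):
    print(setting, "->", effective_write_memory_mb(setting))
```

Note that PostgreSQL's stock default for work_mem is 4MB, so under this rule an untuned installation writes with the 15MB minimum.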
If your typical update patterns are large bulk updates (not single-row updates), a larger value may be better.
maintenance_work_mem.
Increase Mutable Segment Size
The paradedb.global_mutable_segment_rows setting enables use of mutable segments, which buffer new rows in order to amortize the cost of indexing them.
By default, it is set to 1000, which means that 1000 writes are buffered before being flushed.
Raising paradedb.global_mutable_segment_rows generally improves write throughput at the expense of read performance,
since the mutable data structure is slower to search. Additionally, the mutable data structure is read into
memory, so higher values cause reads to consume more RAM.
Setting paradedb.global_mutable_segment_rows to 0 disables mutable segments across the entire database.
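The buffering behavior can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not ParadeDB's implementation: the class `MutableSegmentBuffer` and its attributes are hypothetical, and treating a limit of 0 as "flush every row immediately" is a simplification of what disabling mutable segments means.

```python
class MutableSegmentBuffer:
    """Toy model of a mutable segment that buffers rows before flushing them."""

    def __init__(self, mutable_segment_rows=1000):
        self.limit = mutable_segment_rows
        self.buffer = []            # rows waiting in the mutable segment
        self.flushed_segments = []  # each flush produces one indexed segment

    def write(self, row):
        self.buffer.append(row)
        # Assumption: with a limit of 0, buffering is off and every row is flushed at once.
        if len(self.buffer) >= self.limit:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed_segments.append(self.buffer)
            self.buffer = []

buf = MutableSegmentBuffer(mutable_segment_rows=3)
for row_id in range(7):
    buf.write(row_id)
print(len(buf.flushed_segments), "segments flushed,", len(buf.buffer), "row still buffered")
```

With a limit of 3, seven writes produce two flushed segments and leave one row in the buffer. A larger limit means fewer, larger flushes, while every search has to scan more unindexed buffered rows.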
The mutable segment size can also be configured per index.
If both are set, paradedb.global_mutable_segment_rows will be used.
To ignore the global setting, set paradedb.global_mutable_segment_rows to -1.
postgresql.conf