Index Size
There are a few things you can do to optimize the BM25 index size.
Index Creation
If possible, creating the BM25 index should be deferred until after a table has been populated. To illustrate:
This allows the BM25 index to create a more tightly packed, efficient representation on disk.
Vacuuming
When rows are updated/deleted or segments are merged together, parts of the index can be rendered “dead.”
VACUUM
s, which can either be run manually or automatically by Postgres autovacuum,
are responsible for marking the space occupied by dead segments as “free for reuse” by future INSERT
/UPDATE
/COPY
statements.
Normal UPDATE
or DELETE
statements do not create free space on their own. This means that, if a table were never vacuumed, the BM25 index size would
grow unbounded as rows are updated or additional rows are inserted. To control the index size, it is important to either:
- Tune autovacuum in
postgresql.conf
such that vacuums occur at a frequency that is acceptable for your write patterns. Please refer to the Postgres documentation for guidance. - Manually run
VACUUM
between largeINSERT
/UPDATE
/DELETE
/COPY
statements.
If the index has already grown too large as a result of failure to vacuum, the only way to shrink the index size is to drop the index or REINDEX
. Vacuums on their own do not decrease the
index size — they only mark space for reuse.
Was this page helpful?