There are a few things you can do to optimize the BM25 index size.

Index Creation

If possible, creating the BM25 index should be deferred until after a table has been populated. To illustrate:

-- This is preferred
CREATE TABLE test (id SERIAL, data text);
INSERT INTO test (data) VALUES ('hello world'), ('many more values');
CREATE INDEX ON test USING bm25 (id, data) WITH (key_field = 'id');

-- ...to this
CREATE TABLE test (id SERIAL, data text);
CREATE INDEX ON test USING bm25 (id, data) WITH (key_field = 'id');
INSERT INTO test (data) VALUES ('hello world'), ('many more values');

This allows the BM25 index to create a more tightly packed, efficient representation on disk.

Vacuuming

When rows are updated/deleted or segments are merged together, parts of the index can be rendered “dead.” VACUUMs, which can either be run manually or automatically by Postgres autovacuum, are responsible for marking the space occupied by dead segments as “free for reuse” by future INSERT/UPDATE/COPY statements.

Normal UPDATE or DELETE statements do not create free space on their own. This means that, if a table were never vacuumed, the BM25 index size would grow unbounded as rows are updated or additional rows are inserted. To control the index size, it is important to either:

  1. Tune autovacuum in postgresql.conf such that vacuums occur at a frequency that is acceptable for your write patterns. Please refer to the Postgres documentation for guidance.
  2. Manually run VACUUM between large INSERT/UPDATE/DELETE/COPY statements.

If the index has already grown too large as a result of failure to vacuum, the only way to shrink the index size is to drop the index or REINDEX. Vacuums on their own do not decrease the index size — they only mark space for reuse.