pg_bm25

Brand New API Interface

0.4.1 introduces a major API overhaul of pg_bm25. All queries have been rewritten to use a consistent dynamic function interface. For instance, the old way to create a BM25 index was:

CREATE INDEX <index_name> ON <table_name>
USING bm25 ((<table_name>.*))
WITH (text_fields = '<options>');

The new way is:

CALL paradedb.create_bm25(
    index_name => '<index_name>',
    table_name => '<table_name>',
    text_fields => '<options>'
    key_field => '<key_field>'
)

Similarly, the old way to search a BM25 index was:

SELECT * FROM mock_items
WHERE mock_items @@@ '<query>';

The new way is:

SELECT * FROM <index_name>.search('<query>');

Not only is this new API interface more pleasant to look at, it helps unlock several new features and bug fixes (continue reading below).

Searching Over JOINed Tables

The new create_bm25 function requires a key_field parameter, which is a unique integer column. This is necessary for unlocking search over JOINed tables. Previously, pg_bm25 relied on the Postgres-internal ctid column, which doesn’t exist in a JOINed table.

Multiple Concurrent Indexes

The new interface allows multiple indexes to exist over the same table. Previously, only one BM25 index was allowed per table. This prevented use cases where users wanted to create multiple different tokenizers over the same column.

Sequential Scan Fix

The sequential scan code path was completely broken, which led to queries unexpectedly failing. This issue has been addressed.

pg_sparse

pg_sparse has been introduced to the ParadeDB Docker Image and to ParadeDB Cloud.

Full Changelog

The full changelog is available here.