Brand New Query API ✨

#1739 by @eeeebbbbrrrr brought a custom scan implementation to pg_search. Barring the technical details, this PR centers the entire ParadeDB search API around the @@@ operator. The old .search() syntax has been removed in #1781 by @neilyio and the documentation has been overhauled.

By upgrading to 0.11.0, any queries that rely on the old search syntax will need to be rewritten.

-- Old query syntax
SELECT description, rating, category
FROM search_idx.search(
  '(description:keyboard OR category:electronics) AND rating:>2',
  limit_rows => 5
);

-- New query syntax
SELECT description, rating, category
FROM mock_items
WHERE (description @@@ 'shoes' OR category @@@ 'footwear') AND rating @@@ '>2'
ORDER BY description
LIMIT 5;

This new query syntax brings a number of serious benefits:

JOIN Friendly

The old .search() syntax made it difficult to search over joined tables. The @@@ operator, on the other hand, acts like any other Postgres operator and can be dropped into both sides of a JOIN query:

SELECT o.order_id
FROM orders o
JOIN mock_items m ON o.product_id = m.id
WHERE o.customer_name @@@ 'Johnson' AND m.description @@@ 'shoes';

Intuitive SQL Syntax

The @@@ operator relies on SQL LIMIT, OFFSET, and ORDER BY clauses for limits, offsets, and sorting. These parameters no longer need to be passed in as arguments to the .search() function.

Direct Scores and Snippets

The old .score_bm25() and .snippet() functions needed to be JOINed with the original tables to associate rows with their scores/snippets. This is no longer the case with the custom scan, which is capable of projecting scores and snippets directly onto the result table:

-- BM25 scoring
SELECT id, paradedb.score(id)
FROM mock_items
WHERE description @@@ 'shoes';

-- Snippets
SELECT id, paradedb.snippet(description)
FROM mock_items
WHERE description @@@ 'shoes';

Performance Improvements 🚀

Incredibly Efficient Top N Queries

#1765 brought significant performance improvements to “top N” style search queries:

-- Top 5 results by score
SELECT id, paradedb.score(id)
FROM mock_items
WHERE description @@@ 'shoes'
LIMIT 5;

On test datasets with 1 billion+ rows, we’ve seen query times drop from over 10 seconds to under 20 milliseconds.

Concurrent Writes

#1707 brought concurrent writes to multiple indexes. Prior to this change, a write to one index would block writes to any other index. This PR also removes the background writer from pg_search, which significantly reduces the surface area for bugs.

Query Builder 💡

JSON Support

Query builder term, phrase, and range functions can now query JSON fields in the same way as text fields.

-- Queries {"metadata": {"color": "white"}}
SELECT description, rating, category
FROM mock_items
WHERE id @@@ paradedb.term('metadata.color', 'white');

Range Term

Columns of type int4range, int8range, numrange, tsrange, and tstzrange can be indexed as range_fields. The range term query is used to filter over these fields.

CALL paradedb.create_bm25(
  index_name => 'search_idx',
  table_name => 'mock_items',
  key_field => 'id',
  range_fields => paradedb.field('weight_range')
);

SELECT id, weight_range FROM mock_items
WHERE id @@@ paradedb.range_term('weight_range', 1);

Parse

paradedb.parse now accepts lenient and conjunction_mode parameters. A new function, paradedb.parse_with_field, has also been introduced.

Stability Improvements 💪

Flat JSONB Arrays

Fixed a bug where paradedb.term failed to search over flat JSONB arrays like '["red", "blue", "green"]'::JSONB.

Prepared Statements

Fixed a bug where prepared statements using the @@@ operator would incorrectly choose a sequential scan.

Docker Image 🐳

The ParadeDB Docker image is now available for all versions of Postgres from 13 to 17. Previously, only Postgres 16 was available.

New Contributors 👋

Full Changelog

The full changelog is available here.