ParadeDB’s full text and similarity search APIs can be combined in the same query to execute hybrid search.

This guide uses the mock_items table, which was created in the quickstart. It assumes that the entire quickstart tutorial has been completed, including the vector search section.

Reciprocal Rank Fusion

Reciprocal rank fusion is a popular hybrid search algorithm that:

  1. Calculates a BM25 score and a similarity score for the top n documents.
  2. Ranks documents by their BM25 and similarity scores separately. The highest-ranked document for each score receives a rank r of 1.
  3. Calculates a reciprocal rank for each score as 1/(k + r), where k is a constant. k is usually set to 60.
  4. Calculates each document’s reciprocal rank fusion score as the sum of its BM25 and similarity reciprocal ranks. A document that is missing from one of the two rankings contributes 0 for that ranking.
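
The four steps can be tried on toy data before running them against a real index. The following is a minimal sketch in plain PostgreSQL, with no ParadeDB functions involved. The document ids, BM25 scores, and distances are made-up values for illustration, and the similarity side is assumed to be expressed as a distance, where smaller means more similar.

```sql
-- Hypothetical inputs: BM25 scores for documents 1-3 and vector distances
-- for documents 2-4. Document 1 is missing from the similarity list and
-- document 4 is missing from the BM25 list.
WITH bm25 (id, score) AS (
    VALUES (1, 3.2), (2, 2.9), (3, 1.1)
),
similarity (id, distance) AS (
    VALUES (2, 0.10), (3, 0.25), (4, 0.40)
),
bm25_ranked AS (
    -- A higher BM25 score is better, so rank in descending order.
    SELECT id, RANK() OVER (ORDER BY score DESC) AS r
    FROM bm25
),
similarity_ranked AS (
    -- A lower distance is better, so rank in ascending order.
    SELECT id, RANK() OVER (ORDER BY distance) AS r
    FROM similarity
)
SELECT
    COALESCE(b.id, s.id) AS id,
    b.r AS bm25_rank,
    s.r AS similarity_rank,
    -- k = 60. A document absent from one ranking contributes 0 for it.
    ROUND(COALESCE(1.0 / (60 + b.r), 0.0) +
          COALESCE(1.0 / (60 + s.r), 0.0), 6) AS rrf_score
FROM bm25_ranked b
FULL OUTER JOIN similarity_ranked s ON b.id = s.id
ORDER BY rrf_score DESC;
```

With these inputs, documents 2 and 3 appear in both rankings and score about 0.0325 (1/62 + 1/61) and 0.0320 (1/63 + 1/62). Documents 1 and 4 appear in only one ranking and score about 0.0164 (1/61) and 0.0159 (1/63), so they sort below the documents that both rankings agree on.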

The following code block implements reciprocal rank fusion over the mock_items table. BM25 scores are calculated against the query description:keyboard and similarity scores are calculated against the vector [1,2,3].

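WITH bm25_candidates AS (
    -- Top 20 full text matches. The BM25 score is selected here so that
    -- the next CTE can rank by it without re-running the search.
    SELECT id, paradedb.score(id) AS score
    FROM mock_items
    WHERE description @@@ 'keyboard'
    ORDER BY paradedb.score(id) DESC
    LIMIT 20
),
bm25_ranked AS (
    SELECT id, RANK() OVER (ORDER BY score DESC) AS rank
    FROM bm25_candidates
),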
semantic_search AS (
    SELECT id, RANK() OVER (ORDER BY embedding <=> '[1,2,3]') AS rank
    FROM mock_items
    ORDER BY embedding <=> '[1,2,3]'
    LIMIT 20
)
SELECT
    COALESCE(semantic_search.id, bm25_ranked.id) AS id,
    COALESCE(1.0 / (60 + semantic_search.rank), 0.0) +
    COALESCE(1.0 / (60 + bm25_ranked.rank), 0.0) AS score,
    mock_items.description,
    mock_items.embedding
FROM semantic_search
FULL OUTER JOIN bm25_ranked ON semantic_search.id = bm25_ranked.id
JOIN mock_items ON mock_items.id = COALESCE(semantic_search.id, bm25_ranked.id)
ORDER BY score DESC, description
LIMIT 5;

Here, we see that the top five results either contain keyboard in the description field or have an embedding of [1,2,3].