Prerequisite: Before performing hybrid search over a table, ensure that you have created both a BM25 index and an HNSW index on it.

Overview

Hybrid search combines BM25-based full text scores with vector-based similarity scores. It is especially useful when you want to match on both exact keywords and semantic meaning.

Basic Usage

To calculate a row’s hybrid score, ParadeDB introduces a rank_hybrid function. Under the hood, this function does the following:

  1. Calculates the BM25 and similarity scores for the respective queries
  2. Applies min-max normalization to both scores, which sets the lowest score to 0 and the highest score to 1
  3. Calculates the weighted mean of the normalized scores
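
The three steps above can be sketched in plain Python. This is a minimal illustration of min-max normalization followed by a weighted mean, not ParadeDB's actual implementation. The function names `minmax_normalize` and `hybrid_scores` are hypothetical. The sketch assumes that both inputs are scores where higher is better, that a row missing from one result set contributes 0 for that score, and that equal scores all normalize to 0.

```python
from typing import Dict, Hashable


def minmax_normalize(scores: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Rescale scores so the lowest maps to 0 and the highest maps to 1."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: identical scores all map to 0.0 to avoid dividing by zero.
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(
    bm25: Dict[Hashable, float],
    similarity: Dict[Hashable, float],
    bm25_weight: float = 0.5,
    similarity_weight: float = 0.5,
) -> Dict[Hashable, float]:
    """Weighted mean of the min-max normalized BM25 and similarity scores."""
    total = bm25_weight + similarity_weight
    if total == 0:
        raise ValueError("weights must not sum to zero")
    bm25_norm = minmax_normalize(bm25)
    sim_norm = minmax_normalize(similarity)
    # Assumption: a row absent from one result set scores 0 on that side.
    keys = set(bm25_norm) | set(sim_norm)
    return {
        key: (
            bm25_weight * bm25_norm.get(key, 0.0)
            + similarity_weight * sim_norm.get(key, 0.0)
        )
        / total
        for key in keys
    }


if __name__ == "__main__":
    # Row ids mapped to raw scores (made-up values for illustration only).
    bm25 = {1: 0.0, 2: 5.0, 3: 10.0}
    similarity = {1: 1.0, 2: 0.5, 3: 0.0}
    ranked = sorted(
        hybrid_scores(bm25, similarity, 0.8, 0.2).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    print(ranked)
```

When the two weights add up to 1, dividing by their sum changes nothing and every hybrid score stays between 0 and 1, which makes scores easier to compare across queries.
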

SELECT * FROM <index_name>.rank_hybrid(
  bm25_query => '<bm25_query>',
  similarity_query => '<similarity_query>',
  bm25_weight => <bm25_weight>,
  similarity_weight => <similarity_weight>,
  bm25_limit_n => <bm25_limit_n>,
  similarity_limit_n => <similarity_limit_n>
);

index_name
required

The name of the BM25 index associated with this table. For instance, if you ran CREATE INDEX my_index ON my_table USING bm25 ((my_table.*)), the index name would be 'my_index'.

bm25_query
required

The full text search query string. For instance, 'description:keyboard'.

similarity_query
required

The similarity query string. For instance, '''[1,2,3]'' <-> embedding'. Note that each single quote inside the string is escaped by doubling it.
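
The quoting rule is easy to get wrong by hand, so here is a small Python sketch that builds the literal from a vector. The helper name `similarity_query_literal` and its `column` parameter are hypothetical and not part of ParadeDB. The sketch only illustrates how the quotes are doubled. In application code, prefer your database driver's parameter binding over string concatenation.

```python
from typing import Sequence


def similarity_query_literal(vector: Sequence[float], column: str = "embedding") -> str:
    """Build a SQL string literal such as '''[1,2,3]'' <-> embedding'."""
    # Inner expression as it should read once the literal is unescaped:
    #   '[1,2,3]' <-> embedding
    inner = "'[" + ",".join(str(x) for x in vector) + "]' <-> " + column
    # Escape each single quote by doubling it, then wrap in single quotes.
    return "'" + inner.replace("'", "''") + "'"


if __name__ == "__main__":
    print(similarity_query_literal([1, 2, 3]))  # '''[1,2,3]'' <-> embedding'
```

The result can be pasted directly as the value of the similarity query argument.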

bm25_weight
default: 0.5

The weight applied to the BM25 score. It is recommended that this weight and the similarity weight add up to 1.

similarity_weight
default: 0.5

The weight applied to the similarity score. It is recommended that this weight and the BM25 weight add up to 1.

bm25_limit_n
default: 100

The maximum number of rows that are considered for ranking using BM25.

similarity_limit_n
default: 100

The maximum number of rows that are considered for ranking using similarity search.