Basic Hybrid Search
Prerequisite: Before performing hybrid search over a table, ensure that you have created both a BM25 index and an HNSW index.
Overview
Hybrid search, which combines BM25-based full text scores with vector-based similarity scores, is especially useful in scenarios where you want to match by both exact keywords and semantic meaning.
Basic Usage
To calculate a row's hybrid score, ParadeDB introduces a rank_hybrid function. Under the hood, this function does the following:
- Calculates the BM25 and similarity scores for the respective queries
- Applies min-max normalization to both scores, which sets the lowest score to 0 and the highest score to 1
- Calculates the weighted mean of the normalized scores
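The three steps above can be modeled in plain Python. This is an illustrative sketch of the scoring, not ParadeDB's implementation. It makes several assumptions:
- Both inputs map row IDs to scores where higher is better. Note that pgvector's <-> operator returns a distance, where smaller means closer, so a real implementation has to account for that.
- A row missing from one input contributes 0 for that component.
- A set of all-equal scores normalizes to 1.

The names minmax and hybrid_scores are hypothetical.

```python
from typing import Dict, Hashable

Scores = Dict[Hashable, float]


def minmax(scores: Scores) -> Scores:
    """Rescale scores so the lowest becomes 0.0 and the highest becomes 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when every score is equal, treat them all as the best score.
        return {row: 1.0 for row in scores}
    return {row: (s - lo) / (hi - lo) for row, s in scores.items()}


def hybrid_scores(bm25: Scores, similarity: Scores,
                  bm25_weight: float, similarity_weight: float) -> Scores:
    """Weighted mean of min-max normalized BM25 and similarity scores."""
    norm_bm25 = minmax(bm25)
    norm_sim = minmax(similarity)
    total_weight = bm25_weight + similarity_weight
    rows = set(norm_bm25) | set(norm_sim)
    # Assumption: a row missing from one input contributes 0.0 for that component.
    return {
        row: (bm25_weight * norm_bm25.get(row, 0.0)
              + similarity_weight * norm_sim.get(row, 0.0)) / total_weight
        for row in rows
    }


bm25 = {1: 2.0, 2: 4.0, 3: 3.0}          # raw BM25 scores per row ID
similarity = {1: 0.75, 2: 0.25, 3: 0.5}  # raw similarity scores, higher is better
ranked = hybrid_scores(bm25, similarity, bm25_weight=0.25, similarity_weight=0.75)
for row, score in sorted(ranked.items(), key=lambda kv: kv[1], reverse=True):
    print(row, score)
# prints "1 0.75", then "3 0.5", then "2 0.25"
```

Dividing by the sum of the weights makes this a true weighted mean. When the weights already add up to 1, the division has no effect.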
SELECT * FROM <index_name>.rank_hybrid(
bm25_query => '<bm25_query>',
similarity_query => '<similarity_query>',
bm25_weight => <bm25_weight>,
similarity_weight => <similarity_weight>,
bm25_limit_n => <bm25_limit_n>,
similarity_limit_n => <similarity_limit_n>
);
index_name: The name of the BM25 index associated with this table. For instance, if you ran CREATE INDEX my_index ON my_table USING bm25 ((my_table.*)), the index name would be my_index.
bm25_query: The full text search query string. For instance, 'description:keyboard'.
similarity_query: The similarity query string. For instance, '''[1,2,3]'' <-> embedding'.
Single quotes inside the string are escaped by doubling them, so '[1,2,3]' is written as ''[1,2,3]''.
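The similarity query is itself passed as a SQL string, so any quotes it contains must be doubled. Below is a minimal sketch of building that literal in application code. The helpers sql_string_literal and similarity_query_for are hypothetical, and the embedding column name comes from the example above. If your database driver supports bound parameters, passing the query string as a parameter avoids manual escaping.

```python
def sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, doubling any single quote inside it (standard SQL escaping)."""
    return "'" + text.replace("'", "''") + "'"


def similarity_query_for(vector, column: str = "embedding") -> str:
    """Build an expression such as '[1,2,3]' <-> embedding (hypothetical helper)."""
    vec = "[" + ",".join(str(x) for x in vector) + "]"
    return f"'{vec}' <-> {column}"


expr = similarity_query_for([1, 2, 3])
print(sql_string_literal(expr))  # prints '''[1,2,3]'' <-> embedding'
```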
bm25_weight: The weight applied to the BM25 score. It is recommended that this weight and the similarity weight add up to 1.
similarity_weight: The weight applied to the similarity score. It is recommended that this weight and the BM25 weight add up to 1.
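Your application may let users pick relative weights, for example "3 parts keyword, 1 part semantic". In that case, a small helper can rescale them before they are passed as bm25_weight and similarity_weight. The normalize_weights helper below is hypothetical and not part of ParadeDB. With weights that add up to 1, the weighted combination of two scores that each lie in [0, 1] also stays within [0, 1].

```python
from typing import Tuple


def normalize_weights(bm25_weight: float, similarity_weight: float) -> Tuple[float, float]:
    """Rescale two non-negative weights so that they add up to 1."""
    if bm25_weight < 0 or similarity_weight < 0:
        raise ValueError("weights must be non-negative")
    total = bm25_weight + similarity_weight
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return bm25_weight / total, similarity_weight / total


print(normalize_weights(3, 1))  # prints (0.75, 0.25)
```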
bm25_limit_n: The maximum number of rows considered for ranking by BM25.
similarity_limit_n: The maximum number of rows considered for ranking by similarity search.
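The two limits bound how many rows each side contributes before the scores are combined. The sketch below models candidate selection under two assumptions: each side keeps its top rows, and the two lists are merged with a set union. ParadeDB's exact merging behavior is not described here. The names top_n and candidate_rows are hypothetical, and both inputs are treated as higher-is-better scores.

```python
import heapq
from typing import Dict, Hashable, Set

Scores = Dict[Hashable, float]


def top_n(scores: Scores, n: int) -> Set[Hashable]:
    """Row IDs of the n highest-scoring rows."""
    return set(heapq.nlargest(n, scores, key=scores.__getitem__))


def candidate_rows(bm25: Scores, similarity: Scores,
                   bm25_limit_n: int, similarity_limit_n: int) -> Set[Hashable]:
    # Assumption: the two limited candidate lists are merged with a set union.
    return top_n(bm25, bm25_limit_n) | top_n(similarity, similarity_limit_n)


bm25 = {"a": 3.0, "b": 2.0, "c": 1.0}
similarity = {"a": 0.1, "c": 0.9, "d": 0.8}
print(sorted(candidate_rows(bm25, similarity, bm25_limit_n=1, similarity_limit_n=2)))
# prints ['a', 'c', 'd']
```

In this model, row "b" is dropped because it misses the BM25 cutoff of 1 and has no similarity score. Raising either limit can therefore change which rows reach the final ranking.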