Quickstart
This guide will walk you through the following steps to get started with ParadeDB:
- Full text search
- Similarity (i.e. vector) search
Full Text Search
ParadeDB comes with a helpful procedure that creates a table populated with mock data to help
you get started. Once connected with psql
, run the following commands to create and inspect
this table.
CALL paradedb.create_bm25_test_table(
schema_name => 'public',
table_name => 'mock_items'
);
SELECT description, rating, category
FROM mock_items
LIMIT 3;
Next, let’s create a BM25 index called search_idx
on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index.
The following code block demonstrates the various Postgres types that can be combined inside a single index.
CALL paradedb.create_bm25(
index_name => 'search_idx',
table_name => 'mock_items',
key_field => 'id',
text_fields => paradedb.field('description') || paradedb.field('category'),
numeric_fields => paradedb.field('rating'),
boolean_fields => paradedb.field('in_stock'),
datetime_fields => paradedb.field('created_at'),
json_fields => paradedb.field('metadata')
);
Note the mandatory key_field
option. Every BM25 index needs a key_field
,
which should be the name of a column that will function as a row’s unique
identifier within the index. Usually, the key_field
can just be the name of
your table’s primary key column.
The indexing documentation provides an in-depth explanation of what each of these options means.
We’re now ready to execute a full-text search. We’ll look for rows with a rating
greater than 2
where description
matches keyboard
or category
matches electronics
.
SELECT description, rating, category
FROM search_idx.search(
'(description:keyboard OR category:electronics) AND rating:>2',
limit_rows => 5
);
Note the usage of limit_rows
instead of the SQL LIMIT
clause. For optimal performance, we recommend always using
limit_rows
and offset_rows
instead of LIMIT
and OFFSET
.
Similarly, the rating
column was indexed and the rating:>2
filter was used instead of the SQL WHERE
clause
for efficient filtering.
Next, let’s see how ParadeDB handles a phrase query like bluetooth speaker
. Let’s also surface results even
if there is a word between bluetooth
and speaker
.
SELECT description, rating, category
FROM search_idx.search('description:"bluetooth speaker"~1');
Note that phrases must be wrapped in double quotes. Also note our use of the ~1
slop operator,
which tells ParadeDB to return matches even if they are separated by one word.
Finally, let’s use the snippet
function to examine the BM25 scores and generate
highlighted snippets for our results.
SELECT * FROM search_idx.snippet(
'description:bluetooth',
highlight_field => 'description'
);
Let’s join this result with our mock_items
table to see the BM25 scores and highlighted snippets
next to the original data:
WITH snippet AS (
SELECT * FROM search_idx.snippet(
'description:bluetooth',
highlight_field => 'description'
)
)
SELECT description, snippet, score_bm25
FROM snippet
LEFT JOIN mock_items ON snippet.id = mock_items.id;
For more on full text search, refer to the full text search API section.
Similarity Search
For vector similarity search, let’s first generate a vector embeddings column. For the sake of this tutorial, we’ll randomly generate these embeddings.
ALTER TABLE mock_items ADD COLUMN embedding vector(3);
UPDATE mock_items m
SET embedding = ('[' ||
((m.id + 1) % 10 + 1)::integer || ',' ||
((m.id + 2) % 10 + 1)::integer || ',' ||
((m.id + 3) % 10 + 1)::integer || ']')::vector;
SELECT description, rating, category, embedding
FROM mock_items
LIMIT 3;
Next, let’s create an HNSW index on the embedding
column of our table.
While not required, an HNSW index can drastically improve query performance over very large datasets.
CREATE INDEX on mock_items
USING hnsw (embedding vector_l2_ops);
Next, let’s query our table with a vector and order the results by L2 distance:
SELECT description, category, rating, embedding
FROM mock_items
ORDER BY embedding <-> '[1,2,3]'
LIMIT 3;
For Further Assistance
The paradedb.help
function opens a GitHub Discussion that the ParadeDB team will respond to.
SELECT paradedb.help(
subject => $$Something isn't working$$,
body => $$Issue description$$
);
Was this page helpful?