Quickstart
This guide will walk you through the following steps to get started with ParadeDB:
- Full text search over a single table
- Full text search over a JOIN
- Similarity (i.e. vector) search
Single Table Search
ParadeDB comes with a helpful procedure that creates a table populated with mock data to help
you get started. Once connected with psql
, run the following commands to create and inspect
this table.
Next, let’s create a BM25 index called search_idx
on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index.
The following code block demonstrates the various Postgres types that can be combined inside a single index.
Note the mandatory key_field
option. Every BM25 index needs a key_field
,
which should be the name of a column that will function as a row’s unique
identifier within the index. Additionally, the key_field
must be the first field
in the list of columns. See choosing a key field for more details.
We’re now ready to execute a full-text search. We’ll look for the first five rows with a rating
greater than 2
where description
matches shoes
or category
matches footwear
.
The ParadeDB-specific @@@
operator instructs ParadeDB to execute a full-text search using the BM25 index. Its job is to return matching rows as quickly as possible in no
particular order. To sort by relevance, the paradedb.score
function generates BM25 scores for each row.
The result table shows that rows matching on both description
and category
scored more highly than rows matching only one of the fields.
Finally, let’s see how ParadeDB handles a phrase query like white shoes
. Let’s also surface results even
if there is a word between white
and shoes
using the ~
slop operator.
In addition to simple string queries, ParadeDB supports advanced query builder functions similar to the Elastic DSL.
Joined Search
ParadeDB supports full text search over JOINs, which is crucial for database schemas that store data in a normalized fashion. To demonstrate, let’s create a table
called orders
that references mock_items
.
Next, let’s create a BM25 index over the orders
table.
The following query searches for rows where customer_name
matches Johnson
and description
matches shoes
.
Similarity Search
For vector similarity search, let’s first generate a vector embeddings column. For the sake of this tutorial, we’ll randomly generate these embeddings.
Next, let’s create an HNSW index on the embedding
column of our table.
While not required, an HNSW index can drastically improve query performance over very large datasets.
Next, let’s query our table with a vector and order the results by cosine distance:
That’s it! Next, let’s load your data to start running real queries.
Was this page helpful?