ParadeDB comes with a helpful procedure that creates a table populated with mock data to help
you get started. Once connected with psql, run the following commands to create and inspect
this table.
Next, let’s create a BM25 index called search_idx on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index.
The following code block demonstrates the various Postgres types that can be combined inside a single index.
Note the mandatory key_field option. Every BM25 index needs a key_field,
which should be the name of a column that will function as a row’s unique
identifier within the index. Usually, the key_field can just be the name of
your table’s primary key column.
The indexing documentation provides an in-depth
explanation of what each of these options means.
We’re now ready to execute a full-text search. We’ll look for the first five rows with a rating greater than 2 where description matches shoes
or category matches footwear.
The ParadeDB-specific @@@ operator instructs ParadeDB to execute a full-text search using the BM25 index. Its job is to return matching rows as quickly as possible in no
particular order. To sort by relevance, the paradedb.score function generates BM25 scores for each row.
The result table shows that rows matching on both description and category scored more highly than rows matching only one of the fields.
Finally, let’s see how ParadeDB handles a phrase query like white shoes. Let’s also surface results even
if there is a word between white and shoes using the ~ slop operator.
ParadeDB supports full text search over JOINs, which is crucial for database schemas that store data in a normalized fashion. To demonstrate, let’s create a table
called orders that references mock_items.
The following query searches for rows where customer_name matches Johnson and description matches shoes.
SELECT o.order_id, o.customer_name, m.descriptionFROM orders oJOIN mock_items m ON o.product_id = m.idWHERE o.customer_name @@@ 'Johnson'AND m.description @@@ 'shoes'ORDERBY order_idLIMIT5;
order_id | customer_name | description----------+---------------+--------------------- 3 | Alice Johnson | Sleek running shoes 6 | Alice Johnson | White jogging shoes 36 | Alice Johnson | White jogging shoes(3 rows)
For vector similarity search, let’s first generate a vector embeddings column. For the sake of this tutorial, we’ll
randomly generate these embeddings.
Next, let’s create an HNSW index on the embedding column of our table.
While not required, an HNSW index can drastically improve query performance over very large datasets.