ParadeDB comes with a helpful procedure that creates a table populated with mock data to help
you get started. Once connected with psql, run the following commands to create and inspect
this table.
Next, let’s create a BM25 index called search_idx on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index.
The following code block demonstrates the various Postgres types that can be combined inside a single index.
CREATE INDEX search_idx ON mock_itemsUSING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)WITH (key_field='id');
Note the mandatory key_field option. Every BM25 index needs a key_field,
which should be the name of a column that will function as a row’s unique
identifier within the index. Additionally, the key_field must be the first field
in the list of columns. See choosing a key field for more details.
We’re now ready to execute a full-text search. We’ll look for the first five rows with a rating greater than 2 where description matches shoes
or category matches footwear.
SELECT description, rating, categoryFROM mock_itemsWHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2'ORDER BY descriptionLIMIT 5;
The ParadeDB-specific @@@ operator instructs ParadeDB to execute a full-text search using the BM25 index. Its job is to return matching rows as quickly as possible in no
particular order. To sort by relevance, the paradedb.score function generates BM25 scores for each row.
SELECT description, rating, category, paradedb.score(id)FROM mock_itemsWHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2'ORDER BY score DESC, descriptionLIMIT 5;
The result table shows that rows matching on both description and category scored more highly than rows matching only one of the fields.
Finally, let’s see how ParadeDB handles a phrase query like white shoes. Let’s also surface results even
if there is a word between white and shoes using the ~ slop operator.
ParadeDB supports full text search over JOINs, which is crucial for database schemas that store data in a normalized fashion. To demonstrate, let’s create a table
called orders that references mock_items.
order_id | product_id | order_quantity | order_total | customer_name----------+------------+----------------+-------------+--------------- 1 | 1 | 3 | 99.99 | John Doe 2 | 2 | 1 | 49.99 | Jane Smith 3 | 3 | 5 | 249.95 | Alice Johnson(3 rows)
Next, let’s create a BM25 index over the orders table.
CREATE INDEX orders_idx ON ordersUSING bm25 (order_id, customer_name)WITH (key_field='order_id');
The following query searches for rows where customer_name matches Johnson and description matches shoes.
SELECT o.order_id, o.customer_name, m.descriptionFROM orders oJOIN mock_items m ON o.product_id = m.idWHERE o.customer_name @@@ 'Johnson' AND m.description @@@ 'shoes'ORDER BY order_idLIMIT 5;
order_id | customer_name | description----------+---------------+--------------------- 3 | Alice Johnson | Sleek running shoes 6 | Alice Johnson | White jogging shoes 36 | Alice Johnson | White jogging shoes(3 rows)
For vector similarity search, let’s first generate a vector embeddings column. For the sake of this tutorial, we’ll
randomly generate these embeddings.
Next, let’s create an HNSW index on the embedding column of our table.
While not required, an HNSW index can drastically improve query performance over very large datasets.
CREATE INDEX on mock_itemsUSING hnsw (embedding vector_cosine_ops);
Next, let’s query our table with a vector and order the results by cosine distance: