Quickstart - ParadeDB

This guide will walk you through the following steps to get started with ParadeDB:

Full text search over a single table
Full text search over a JOIN
Similarity (i.e. vector) search

Single Table Search

ParadeDB comes with a helpful procedure that creates a table populated with mock data to help you get started. Once connected with psql, run the following commands to create and inspect this table.

CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'mock_items'
);

SELECT description, rating, category
FROM mock_items
LIMIT 3;

Expected Response

       description        | rating |  category
--------------------------+--------+-------------
 Ergonomic metal keyboard |      4 | Electronics
 Plastic Keyboard         |      4 | Electronics
 Sleek running shoes      |      5 | Footwear
(3 rows)

Next, let’s create a BM25 index called search_idx on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index. The following code block demonstrates the various Postgres types that can be combined inside a single index.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');

Note the mandatory key_field option. Every BM25 index needs a key_field, which should be the name of a column that will function as a row’s unique identifier within the index. Additionally, the key_field must be the first field in the list of columns. See choosing a key field for more details.

We’re now ready to execute a full-text search. We’ll look for the first five rows with a rating greater than 2 where description matches shoes or category matches footwear.

SELECT description, rating, category
FROM mock_items
WHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2'
ORDER BY description
LIMIT 5;

Expected Response

     description      | rating | category
----------------------+--------+----------
 Comfortable slippers |      3 | Footwear
 Generic shoes        |      4 | Footwear
 Sleek running shoes  |      5 | Footwear
 Sturdy hiking boots  |      4 | Footwear
 White jogging shoes  |      3 | Footwear
(5 rows)

The ParadeDB-specific @@@ operator instructs ParadeDB to execute a full-text search using the BM25 index. Its job is to return matching rows as quickly as possible in no particular order. To sort by relevance, the paradedb.score function generates BM25 scores for each row.

SELECT description, rating, category, paradedb.score(id)
FROM mock_items
WHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2'
ORDER BY score DESC, description
LIMIT 5;

Expected Response

     description      | rating | category |   score
----------------------+--------+----------+-----------
 Generic shoes        |      4 | Footwear | 5.8135376
 Sleek running shoes  |      5 | Footwear | 5.4211845
 White jogging shoes  |      3 | Footwear | 5.4211845
 Comfortable slippers |      3 | Footwear | 2.9362776
 Sturdy hiking boots  |      4 | Footwear | 2.9362776
(5 rows)

The result table shows that rows matching on both description and category scored more highly than rows matching only one of the fields. Finally, let’s see how ParadeDB handles a phrase query like white shoes. Let’s also surface results even if there is a word between white and shoes using the ~ slop operator.

SELECT description, rating, category
FROM mock_items
WHERE description @@@ '"white shoes"~1'
LIMIT 5;

Expected Response

In addition to simple string queries, ParadeDB supports advanced query builder functions similar to the Elastic DSL.

Joined Search

ParadeDB supports full text search over JOINs, which is crucial for database schemas that store data in a normalized fashion. To demonstrate, let’s create a table called orders that references mock_items.

CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'orders',
  table_type => 'Orders'
);

ALTER TABLE orders
ADD CONSTRAINT foreign_key_product_id
FOREIGN KEY (product_id)
REFERENCES mock_items(id);

SELECT * FROM orders LIMIT 3;

Expected Response

 order_id | product_id | order_quantity | order_total | customer_name
----------+------------+----------------+-------------+---------------
        1 |          1 |              3 |       99.99 | John Doe
        2 |          2 |              1 |       49.99 | Jane Smith
        3 |          3 |              5 |      249.95 | Alice Johnson
(3 rows)

Next, let’s create a BM25 index over the orders table.

CREATE INDEX orders_idx ON orders
USING bm25 (order_id, customer_name)
WITH (key_field='order_id');

The following query searches for rows where customer_name matches Johnson and description matches shoes.

SELECT o.order_id, o.customer_name, m.description
FROM orders o
JOIN mock_items m ON o.product_id = m.id
WHERE o.customer_name @@@ 'Johnson' AND m.description @@@ 'shoes'
ORDER BY order_id
LIMIT 5;

Expected Response

 order_id | customer_name |     description
----------+---------------+---------------------
        3 | Alice Johnson | Sleek running shoes
        6 | Alice Johnson | White jogging shoes
       36 | Alice Johnson | White jogging shoes
(3 rows)

Similarity Search

For vector similarity search, let’s first generate a vector embeddings column. For the sake of this tutorial, we’ll randomly generate these embeddings.

ALTER TABLE mock_items ADD COLUMN embedding vector(3);

UPDATE mock_items m
SET embedding = ('[' ||
    ((m.id + 1) % 10 + 1)::integer || ',' ||
    ((m.id + 2) % 10 + 1)::integer || ',' ||
    ((m.id + 3) % 10 + 1)::integer || ']')::vector;

SELECT description, rating, category, embedding
FROM mock_items
LIMIT 3;

Expected Response

       description        | rating |  category   | embedding
--------------------------+--------+-------------+-----------
 Ergonomic metal keyboard |      4 | Electronics | [3,4,5]
 Plastic Keyboard         |      4 | Electronics | [4,5,6]
 Sleek running shoes      |      5 | Footwear    | [5,6,7]
(3 rows)

Next, let’s create an HNSW index on the embedding column of our table. While not required, an HNSW index can drastically improve query performance over very large datasets.

CREATE INDEX on mock_items
USING hnsw (embedding vector_cosine_ops);

Next, let’s query our table with a vector and order the results by cosine distance:

SELECT description, category, rating, embedding
FROM mock_items
ORDER BY embedding <=> '[1,2,3]', description
LIMIT 3;

Expected Response

       description        |  category  | rating | embedding
--------------------------+------------+--------+-----------
 Artistic ceramic vase    | Home Decor |      4 | [1,2,3]
 Designer wall paintings  | Home Decor |      5 | [1,2,3]
 Handcrafted wooden frame | Home Decor |      5 | [1,2,3]
(3 rows)

That’s it! Next, let’s load your data to start running real queries.

Documentation

​Single Table Search

​Joined Search

​Similarity Search

Single Table Search

Joined Search

Similarity Search