This guide will walk you through the following steps to get started with ParadeDB:

  1. Full text search over a single table
  2. Full text search over a JOIN
  3. Similarity (i.e. vector) search

ParadeDB comes with a helpful procedure that creates a table populated with mock data to help you get started. Once connected with psql, run the following commands to create and inspect this table.

CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'mock_items'
);

SELECT description, rating, category
FROM mock_items
LIMIT 3;

Next, let’s create a BM25 index called search_idx on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index. The following code block demonstrates the various Postgres types that can be combined inside a single index.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range)
WITH (key_field='id');

Note the mandatory key_field option. Every BM25 index needs a key_field, which should be the name of a column that will function as a row’s unique identifier within the index. Additionally, the key_field must be the first field in the list of columns. See choosing a key field for more details.

We’re now ready to execute a full-text search. We’ll look for the first five rows with a rating greater than 2 where description matches shoes or category matches footwear.

SELECT description, rating, category
FROM mock_items
WHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2'
ORDER BY description
LIMIT 5;

The ParadeDB-specific @@@ operator instructs ParadeDB to execute a full-text search using the BM25 index. Its job is to return matching rows as quickly as possible in no particular order. To sort by relevance, the paradedb.score function generates BM25 scores for each row.

SELECT description, rating, category, paradedb.score(id)
FROM mock_items
WHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2'
ORDER BY score DESC, description
LIMIT 5;

The result table shows that rows matching on both description and category scored more highly than rows matching only one of the fields.

Finally, let’s see how ParadeDB handles a phrase query like white shoes. Let’s also surface results even if there is a word between white and shoes using the ~ slop operator.

SELECT description, rating, category
FROM mock_items
WHERE description @@@ '"white shoes"~1'
LIMIT 5;

In addition to simple string queries, ParadeDB supports advanced query builder functions similar to the Elastic DSL.

ParadeDB supports full text search over JOINs, which is crucial for database schemas that store data in a normalized fashion. To demonstrate, let’s create a table called orders that references mock_items.

CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'orders',
  table_type => 'Orders'
);

ALTER TABLE orders
ADD CONSTRAINT foreign_key_product_id
FOREIGN KEY (product_id)
REFERENCES mock_items(id);

SELECT * FROM orders LIMIT 3;

Next, let’s create a BM25 index over the orders table.

CREATE INDEX orders_idx ON orders
USING bm25 (order_id, customer_name)
WITH (key_field='order_id');

The following query searches for rows where customer_name matches Johnson and description matches shoes.

SELECT o.order_id, o.customer_name, m.description
FROM orders o
JOIN mock_items m ON o.product_id = m.id
WHERE o.customer_name @@@ 'Johnson' AND m.description @@@ 'shoes'
ORDER BY order_id
LIMIT 5;

For vector similarity search, let’s first generate a vector embeddings column. For the sake of this tutorial, we’ll randomly generate these embeddings.

ALTER TABLE mock_items ADD COLUMN embedding vector(3);

UPDATE mock_items m
SET embedding = ('[' ||
    ((m.id + 1) % 10 + 1)::integer || ',' ||
    ((m.id + 2) % 10 + 1)::integer || ',' ||
    ((m.id + 3) % 10 + 1)::integer || ']')::vector;

SELECT description, rating, category, embedding
FROM mock_items
LIMIT 3;

Next, let’s create an HNSW index on the embedding column of our table. While not required, an HNSW index can drastically improve query performance over very large datasets.

CREATE INDEX on mock_items
USING hnsw (embedding vector_cosine_ops);

Next, let’s query our table with a vector and order the results by cosine distance:

SELECT description, category, rating, embedding
FROM mock_items
ORDER BY embedding <=> '[1,2,3]', description
LIMIT 3;

That’s it! Next, let’s load your data to start running real queries.