Joins - ParadeDB

ParadeDB supports all PostgreSQL JOIN types and extends them with BM25-powered full-text search. This guide explains how JOINs behave with search, how to identify sub-optimal query plans, and offers strategies to keep queries fast.

Supported JOIN Types

ParadeDB supports all PostgreSQL JOINs:

INNER JOIN
LEFT / RIGHT / FULL OUTER JOIN
CROSS JOIN
LATERAL
Semi and Anti JOINs

For the most part you can mix search and relational queries without changing your SQL.

Scoring in JOINs

When using paradedb.score() or paradedb.snippet() inside JOINs:

Scores and snippets are computed before the JOIN at the base table level.
JOIN conditions never change the score, they only determine which rows are combined.

This design keeps scores predictable and consistent across queries.

Performance Characteristics

Fast Cases

Queries are efficient when search filters can be applied directly to the underlying tables. In these cases, PostgreSQL can push down the ||| operators so that each table does its own filtered index scan before the JOIN runs. That means:

Each table only contributes rows that already match the search condition.
The JOIN operates on much smaller intermediate sets.

In this query, both a.bio and b.content are filtered independently. The planner runs efficient index scans on each table and then joins the results.

SELECT a.name, b.title, paradedb.score(a.id)
FROM authors a
JOIN books b ON a.id = b.author_id
WHERE
    a.bio ||| 'science fiction'
    AND b.content ||| 'space travel';

The plan will have this shape:

Gather
  -> Parallel Hash Join
       Hash Cond: (b.id = a.id)
       -> Parallel ParadeDB Scan on authors a
       -> Parallel Hash
            -> Parallel ParadeDB Scan on books b

Slower Cases

Queries become slower when search conditions span multiple tables in a way that prevents PostgreSQL from pushing them down. The most common example is an OR across different tables:

SELECT a.name, b.title
FROM authors a
JOIN books b ON a.id = b.author_id
WHERE
    a.bio ||| 'science'
    OR b.content ||| 'artificial';

Because the condition references both a and b, PostgreSQL cannot apply it until after the join. As a result, both tables must be scanned in full, joined, and only then filtered. The plan will have this shape:

Gather
  -> Parallel Hash Join
       Hash Cond: (a.id = b.author_id)
       Join Filter: (a.bio ||| (...) OR b.content ||| (...))
       -> Parallel Seq Scan on authors a
       -> Parallel Hash
            -> Parallel Seq Scan on books b

Note that the ||| query is in the Join Filter, not in the scan.

Diagnosing Performance

Use EXPLAIN to check the query plan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT a.name, b.title, paradedb.score(a.id)
FROM authors a
JOIN books b ON a.id = b.author_id
WHERE a.bio ||| 'science'
   OR b.content ||| 'artificial';

Watch for:

Custom Scan nodes with large row counts
ParadeDB operators inside JOIN conditions
Tantivy Query: all (full index scan)

Writing Faster JOIN Queries

Replace Cross-Table OR with UNION

If you don’t need scores/snippets and have a simple JOIN, express the OR as a UNION of two separately filtered joins. This lets PostgreSQL push each search predicate down to a Custom Index Scan and avoid a join-time filter.

SELECT a.name, b.title
FROM authors a
JOIN books b ON a.id = b.author_id
WHERE a.bio ||| 'science'
UNION
SELECT a.name, b.title
FROM authors a
JOIN books b ON a.id = b.author_id
WHERE b.content ||| 'artificial';

Use CTEs for Complex Queries

Use common table expressions (CTEs) to pre-filter each table with its own search condition, then join the smaller result sets together. If possible, add a LIMIT to each CTE to keep the result sets small.

WITH matching_authors AS (
  SELECT id, name, paradedb.score(id) AS author_score
  FROM authors
  WHERE bio ||| 'science'
  LIMIT 100
),
matching_books AS (
  SELECT id, title, author_id, paradedb.score(id) AS book_score
  FROM books
  WHERE content ||| 'artificial'
  LIMIT 100
)
SELECT
  COALESCE(ma.name, a.name) AS name,
  COALESCE(mb.title, b.title) AS title,
  ma.author_score,
  mb.book_score
FROM matching_authors ma
FULL JOIN matching_books mb ON ma.id = mb.author_id
LEFT JOIN authors a ON mb.author_id = a.id AND ma.id IS NULL
LEFT JOIN books b ON ma.id = b.author_id AND mb.id IS NULL;

BM25 scores should not be added, if you want to combine scores then consider using reciprocal rank fusion (RRF).

Roadmap

We really want to remove the need to think about the way to do JOINs in ParadeDB. At the moment we are actively working on:

A CustomScan Join API for native join handling
Smarter cost estimation for the PostgreSQL planner

Documentation

​Supported JOIN Types

​Scoring in JOINs

​Performance Characteristics

​Fast Cases

​Slower Cases

​Diagnosing Performance

​Writing Faster JOIN Queries

​Replace Cross-Table OR with UNION

​Use CTEs for Complex Queries

​Roadmap