Skip to main content
ParadeDB supports all standard PostgreSQL JOIN types, including:
  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  • FULL JOIN
  • SEMI JOIN
  • ANTI JOIN
In most cases, join queries run using PostgreSQL’s native execution exactly as they would in a vanilla Postgres database. However, ParadeDB also includes a beta optimization called join pushdown that can significantly accelerate INNER, SEMI, and ANTI joins when they involve ParadeDB search queries.

Join Pushdown (Beta)

Join pushdown is in beta and is available on versions 0.22.0 and up.
Join pushdown is an optimization that allows ParadeDB to execute parts of a JOIN directly inside the ParadeDB executor instead of in Postgres’ row-based executor. This can dramatically reduce latency for certain queries because ParadeDB tries to answer as much of the query as possible using the index before touching the underlying table. To enable join pushdown, first enable the feature:
SET paradedb.enable_join_custom_scan TO on;

Requirements for Join Pushdown

Join pushdown is automatically used when a query meets several conditions. If any of these are not satisfied, PostgreSQL will simply execute the join normally.
RequirementDescription
Supported join typeThe query must use an INNER, SEMI, or ANTI join. Pushdown for other join types is coming soon.
BM25 indexesAll tables participating in the join must have a ParadeDB BM25 index.
Search predicateThe query must contain a ParadeDB operator such as &&&, ===, etc.
Equi-join keyThe join must contain at least one equality condition such as a.id = b.id.
Indexed fieldsAll join keys, filters, and ORDER BY columns must be present in the BM25 index. Text and JSON fields must be columnar.
LIMIT clauseThe query must include a LIMIT.
No aggregatesQueries containing aggregates (GROUP BY, COUNT, etc.) are not currently supported.
If any checks fail, ParadeDB will emit a NOTICE explaining why and fall back to Postgres’ native join execution. To demonstrate, let’s create a second table called orders that can be joined with mock_items:
CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'orders',
  table_type => 'Orders'
);

ALTER TABLE orders
ADD CONSTRAINT foreign_key_product_id
FOREIGN KEY (product_id)
REFERENCES mock_items(id);

CREATE INDEX orders_idx ON orders
USING bm25 (order_id, product_id, order_quantity, order_total, customer_name)
WITH (key_field = 'order_id');
SELECT * FROM orders ORDER BY order_id LIMIT 3;
Expected Response
 order_id | product_id | order_quantity | order_total | customer_name
----------+------------+----------------+-------------+---------------
        1 |          1 |              3 |       99.99 | John Doe
        2 |          2 |              1 |       49.99 | Jane Smith
        3 |          3 |              5 |      249.95 | Alice Johnson
(3 rows)

Inner Join

An inner join returns rows where a matching row exists in both tables according to the join condition.
SELECT o.order_id, o.customer_name, o.order_total, m.description
FROM orders o
INNER JOIN mock_items m
  ON o.product_id = m.id
WHERE m.description ||| 'keyboard'
AND o.customer_name ||| 'John'
ORDER BY o.order_total DESC
LIMIT 5;
Expected Response
 order_id | customer_name | order_total |       description
----------+---------------+-------------+--------------------------
        4 | John Doe      |      501.87 | Plastic Keyboard
        1 | John Doe      |       99.99 | Ergonomic metal keyboard
(2 rows)
To verify join pushdown, run EXPLAIN on the query and look for a ParadeDB Join Scan in the output.
Expected Response
                                                                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.00..11.00 rows=5 width=55)
   ->  Custom Scan (ParadeDB Join Scan)  (cost=10.00..11.00 rows=5 width=55)
         Relation Tree: m INNER o
         Join Cond: o.product_id = m.id
         Limit: 5
         Order By: o.order_total desc
         DataFusion Physical Plan:
           : ProjectionExec: expr=[NULL as col_1, NULL as col_2, order_total@2 as col_3, NULL as col_4, ctid_0@0 as ctid_0, ctid_1@1 as ctid_1]
           :   SortExec: TopK(fetch=5), expr=[order_total@2 DESC], preserve_partitioning=[false]
           :     HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(id@1, product_id@1)], projection=[ctid_0@0, ctid_1@2, order_total@4]
           :       ProjectionExec: expr=[ctid@0 as ctid_0, id@1 as id]
           :         CooperativeExec
           :           PgSearchScan: segments=1, query={"with_index":{"query":{"match":{"field":"description","value":"keyboard","tokenizer":null,"distance":null,"transposition_cost_one":null,"prefix":null,"conjunction_mode":false}}}}
           :       ProjectionExec: expr=[ctid@0 as ctid_1, product_id@1 as product_id, order_total@2 as order_total]
           :         CooperativeExec
           :           PgSearchScan: segments=1, dynamic_filters=2, query={"with_index":{"query":{"match":{"field":"customer_name","value":"John","tokenizer":null,"distance":null,"transposition_cost_one":null,"prefix":null,"conjunction_mode":false}}}}
(16 rows)

Semi Join

A semi join returns rows from the left table when a matching row exists in the right table. In SQL, this usually appears as an IN or EXISTS query:
SELECT o.order_id, o.order_total FROM orders o
WHERE o.product_id IN (
  SELECT m.id
  FROM mock_items m
  WHERE m.description ||| 'keyboard'
)
ORDER BY o.order_total DESC
LIMIT 5;
Expected Response
 order_id | order_total
----------+-------------
       27 |      676.15
       57 |      676.15
       11 |      633.94
       41 |      633.94
        4 |      501.87
(5 rows)
To verify join pushdown, run EXPLAIN on the query and look for a ParadeDB Join Scan in the output.
Expected Response
                                                                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.00..11.00 rows=5 width=11)
   ->  Custom Scan (ParadeDB Join Scan)  (cost=10.00..11.00 rows=5 width=11)
         Relation Tree: m INNER o
         Join Cond: o.product_id = m.id
         Limit: 5
         Order By: o.order_total desc
         DataFusion Physical Plan:
           : ProjectionExec: expr=[NULL as col_1, order_total@2 as col_2, ctid_0@0 as ctid_0, ctid_1@1 as ctid_1]
           :   SortExec: TopK(fetch=5), expr=[order_total@2 DESC], preserve_partitioning=[false]
           :     HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(id@1, product_id@1)], projection=[ctid_0@0, ctid_1@2, order_total@4]
           :       ProjectionExec: expr=[ctid@0 as ctid_0, id@1 as id]
           :         CooperativeExec
           :           PgSearchScan: segments=1, query={"with_index":{"query":{"match":{"field":"description","value":"keyboard","tokenizer":null,"distance":null,"transposition_cost_one":null,"prefix":null,"conjunction_mode":false}}}}
           :       ProjectionExec: expr=[ctid@0 as ctid_1, product_id@1 as product_id, order_total@2 as order_total]
           :         CooperativeExec
           :           PgSearchScan: segments=1, dynamic_filters=2, query="all"
(16 rows)

Anti Join

An anti join returns rows from the left table when no matching row exists in the right table. This typically appears as NOT EXISTS or NOT IN.
SELECT o.order_id, o.order_total FROM orders o
WHERE NOT EXISTS (
  SELECT 1
  FROM mock_items m
  WHERE m.id = o.product_id
    AND m.description ||| 'keyboard'
)
ORDER BY o.order_total DESC
LIMIT 5;
Expected Response
 order_id | order_total
----------+-------------
       10 |      638.73
       40 |      638.73
       21 |      632.08
       51 |      632.08
       22 |      605.18
(5 rows)
To verify join pushdown, run EXPLAIN on the query and look for a ParadeDB Join Scan in the output.
Expected Response
                                                                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.00..11.00 rows=5 width=11)
   ->  Custom Scan (ParadeDB Join Scan)  (cost=10.00..11.00 rows=5 width=11)
         Relation Tree: o ANTI m
         Join Cond: m.id = o.product_id
         Limit: 5
         Order By: o.order_total desc
         DataFusion Physical Plan:
           : ProjectionExec: expr=[NULL as col_1, order_total@1 as col_2, ctid_0@0 as ctid_0]
           :   SortExec: TopK(fetch=5), expr=[order_total@1 DESC], preserve_partitioning=[false]
           :     HashJoinExec: mode=CollectLeft, join_type=RightAnti, on=[(id@0, product_id@1)], projection=[ctid_0@0, order_total@2]
           :       CooperativeExec
           :         PgSearchScan: segments=1, query={"with_index":{"query":{"match":{"field":"description","value":"keyboard","tokenizer":null,"distance":null,"transposition_cost_one":null,"prefix":null,"conjunction_mode":false}}}}
           :       ProjectionExec: expr=[ctid@0 as ctid_0, product_id@1 as product_id, order_total@2 as order_total]
           :         CooperativeExec
           :           PgSearchScan: segments=1, dynamic_filters=1, query="all"
(15 rows)

Future Work

We are actively improving join pushdown, specifically when it comes to pushing down more shapes of joins. If your join query is not currently supported by join pushdown (or isn’t as fast as you’d like!), we invite you to open a Github issue.