Top N - ParadeDB

ParadeDB is highly optimized for quickly returning the Top N results out of the index. In SQL, this means queries that contain an ORDER BY...LIMIT:

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;

In order for a Top N query to be executed by ParadeDB vs. vanilla Postgres, all of the following conditions must be met:

All ORDER BY fields must be indexed. If they are text fields, they must use the literal tokenizer.
At least one ParadeDB text search operator must be present at the same level as the ORDER BY...LIMIT.
The query must have a LIMIT.
With the exception of lower, ordering by expressions is not supported — only the raw fields themselves.

To verify that ParadeDB is executing the Top N, look for a Custom Scan with a TopNScanExecState in the EXPLAIN output:

EXPLAIN SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;

Expected Response

                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.00..10.02 rows=3 width=552)
   ->  Custom Scan (ParadeDB Scan) on mock_items  (cost=10.00..10.02 rows=3 width=552)
         Table: mock_items
         Index: search_idx
         Segment Count: 1
         Exec Method: TopNScanExecState
         Scores: false
            TopN Order By: rating asc
            TopN Limit: 5
         Tantivy Query: {"with_index":{"query":{"match":{"field":"description","value":"running shoes","tokenizer":null,"distance":null,"transposition_cost_one":null,"prefix":null,"conjunction_mode":false}}}}
(10 rows)

If any of the above conditions are not met, the query cannot be fully optimized and you will not see a TopNScanExecState in the EXPLAIN output.

Tiebreaker Sorting

To guarantee stable sorting in the event of a tie, additional columns can be provided to ORDER BY:

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating, id
LIMIT 5;

ParadeDB is currently able to handle 3 ORDER BY columns. If there are more than 3 columns, the ORDER BY will not be efficiently executed by ParadeDB.

Sorting by Text

If a text field is present in the ORDER BY clause, it must be indexed with the literal or literal normalized tokenizer. Sorting by lowercase text using lower(<text_field>) is also supported. To enable this, the expression lower(<text_field>) must be indexed with either the literal or literal normalized tokenizer. See indexing expressions for more information.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (lower(description)::pdb.literal))
WITH (key_field='id');

This allows sorting by lowercase to be optimized.

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'sleek running shoes'
ORDER BY lower(description)
LIMIT 5;

Sorting by JSON

Ordering by JSON subfield is on the roadmap but not yet supported. For example, this query will not receive an optimized Top N scan:

SELECT id, description, metadata
FROM mock_items
WHERE description ||| 'sleek running shoes'
ORDER BY metadata->'weight'
LIMIT 5;

Documentation

​Tiebreaker Sorting

​Sorting by Text

​Sorting by JSON

Tiebreaker Sorting

Sorting by Text

Sorting by JSON