Top N - ParadeDB

ParadeDB is highly optimized for quickly returning the Top N results out of the index. In SQL, this means queries that contain an ORDER BY...LIMIT:

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;

In order for a Top N query to be executed by ParadeDB vs. vanilla Postgres, all of the following conditions must be met:

All ORDER BY fields must be indexed. If they are text fields, they must use the literal tokenizer.
At least one ParadeDB text search operator must be present at the same level as the ORDER BY...LIMIT.
The query must have a LIMIT.
With the exception of lower, ordering by expressions is not supported — only the raw fields themselves.

To verify that ParadeDB is executing the Top N, look for a Custom Scan with a TopNScanExecState in the EXPLAIN output:

EXPLAIN SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;

Expected Response

                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.00..10.02 rows=3 width=552)
   ->  Custom Scan (ParadeDB Scan) on mock_items  (cost=10.00..10.02 rows=3 width=552)
         Table: mock_items
         Index: search_idx
         Segment Count: 1
         Exec Method: TopNScanExecState
         Scores: false
            TopN Order By: rating asc
            TopN Limit: 5
         Tantivy Query: {"with_index":{"query":{"match":{"field":"description","value":"running shoes","tokenizer":null,"distance":null,"transposition_cost_one":null,"prefix":null,"conjunction_mode":false}}}}
(10 rows)

If any of the above conditions are not met, the query cannot be fully optimized and you will not see a TopNScanExecState in the EXPLAIN output.

Tiebreaker Sorting

To guarantee stable sorting in the event of a tie, additional columns can be provided to ORDER BY:

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating, id
LIMIT 5;

ParadeDB is currently able to handle 3 ORDER BY columns. If there are more than 3 columns, the ORDER BY will not be efficiently executed by ParadeDB.

Sorting by Text

If a text field is present in the ORDER BY clause, it must be indexed with the literal tokenizer. The reason is that the literal tokenizer preserves the original text, which is necessary for accurate sorting. Sorting by lowercase text using lower(<text_field>) is also supported. To enable this, first ensure that lower(<text_field>) is indexed with the literal tokenizer. See indexing expressions for more information.

CREATE INDEX search_idx ON mock_items
USING bm25 (id, (lower(description)::pdb.literal))
WITH (key_field='id');

This allows sorting by lowercase to be optimized.

SELECT description, rating, category
FROM mock_items
WHERE description ||| 'sleek running shoes'
ORDER BY lower(description)
LIMIT 5;

Documentation (v2)

​Tiebreaker Sorting

​Sorting by Text

Tiebreaker Sorting

Sorting by Text