ParadeDB is highly optimized for quickly returning the Top N results out of the index. In SQL, this means queries that contain an ORDER BY...LIMIT
:
SELECT description , rating, category
FROM mock_items
WHERE description || | 'running shoes'
ORDER BY rating
LIMIT 5 ;
In order for a Top N query to be executed by ParadeDB vs. vanilla Postgres, all of the following conditions must be met:
All ORDER BY
fields must be indexed. If they are text fields, they must use the literal tokenizer .
At least one ParadeDB text search operator must be present at the same level as the ORDER BY...LIMIT
.
The query must have a LIMIT
.
With the exception of lower
, ordering by expressions is not supported — only the raw fields themselves.
To verify that ParadeDB is executing the Top N, look for a Custom Scan
with a TopNScanExecState
in the EXPLAIN
output:
EXPLAIN SELECT description , rating, category
FROM mock_items
WHERE description || | 'running shoes'
ORDER BY rating
LIMIT 5 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10.00..10.02 rows=3 width=552)
-> Custom Scan (ParadeDB Scan) on mock_items (cost=10.00..10.02 rows=3 width=552)
Table: mock_items
Index: search_idx
Segment Count: 1
Exec Method: TopNScanExecState
Scores: false
TopN Order By: rating asc
TopN Limit: 5
Tantivy Query: {"with_index":{"query":{"match":{"field":"description", "value":"running shoes", "tokenizer":null, "distance":null, "transposition_cost_one":null, "prefix":null, "conjunction_mode":false}}}}
(10 rows)
If any of the above conditions are not met, the query cannot be fully optimized and you will not see a TopNScanExecState
in the EXPLAIN
output.
Tiebreaker Sorting
To guarantee stable sorting in the event of a tie, additional columns can be provided to ORDER BY
:
SELECT description , rating, category
FROM mock_items
WHERE description || | 'running shoes'
ORDER BY rating, id
LIMIT 5 ;
ParadeDB is currently able to handle 3 ORDER BY
columns. If there are more
than 3 columns, the ORDER BY
will not be efficiently executed by ParadeDB.
Sorting by Text
If a text field is present in the ORDER BY
clause, it must be indexed with the literal tokenizer.
The reason is that the literal tokenizer preserves the original text, which is necessary for accurate sorting.
Sorting by lowercase text using lower(<text_field>)
is also supported. To enable this, first ensure that lower(<text_field>)
is indexed with the literal tokenizer.
See indexing expressions for more information.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, ( lower ( description ):: pdb . literal ))
WITH (key_field = 'id' );
This allows sorting by lowercase to be optimized.
SELECT description , rating, category
FROM mock_items
WHERE description || | 'sleek running shoes'
ORDER BY lower ( description )
LIMIT 5 ;