A terms aggregation counts the number of occurrences for every unique value in a field. For example, the following query groups the mock_items table by rating, and calculates the number of items for each unique rating.
SELECT rating, COUNT(*) FROM mock_items
WHERE id @@@ paradedb.all()
GROUP BY rating
LIMIT 10;
Ordering by the bucketing field is supported:
SELECT rating, COUNT(*) FROM mock_items
WHERE id @@@ paradedb.all()
GROUP BY rating
ORDER BY rating
LIMIT 10;
Ordering by COUNT is not yet supported — queries that do so will not be optimized.
All fields in the GROUP BY and ORDER BY clauses, in this case rating, must be indexed as fast.
For performance reasons, we strongly recommend adding a LIMIT to the GROUP BY. Terms aggregations without a LIMIT consume more memory and are slower to execute. If a query does not have a limit and more than 65000 unique values are found in a field, an error will be returned. To verify that a terms aggregation was pushed down, look for a ParadeDB Aggregate Scan with terms in the EXPLAIN output:
EXPLAIN SELECT rating, COUNT(*) FROM mock_items
WHERE id @@@ paradedb.all()
GROUP BY rating
LIMIT 10;
Expected Response
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.00 rows=1 width=12)
   ->  Custom Scan (ParadeDB Aggregate Scan) on mock_items  (cost=0.00..0.00 rows=0 width=12)
         Index: search_idx
         Tantivy Query: {"with_index":{"query":"all"}}
         Aggregate Definition: {"group_0":{"terms":{"field":"rating","size":10,"segment_size":10}}}
(5 rows)