A common pattern in search is to query for both an aggregate and a set of search results. For example, “find the top 10
results, and also count the total number of results.”
Instead of issuing two separate queries — one for the search results, and another for the aggregate — pdb.agg allows for
these results to be returned in a single “faceted” query. This can significantly improve read throughput, since issuing a single
query uses less CPU and disk I/O.
For example, this query returns the top 3 search results alongside the total number of results found.
SELECT
id, description, rating,
pdb.agg('{"value_count": {"field": "id"}}') OVER ()
FROM mock_items
WHERE category === 'electronics'
ORDER BY rating DESC
LIMIT 3;
id | description | rating | agg
----+-----------------------------+--------+----------------
12 | Innovative wireless earbuds | 5 | {"value": 5.0}
1 | Ergonomic metal keyboard | 4 | {"value": 5.0}
2 | Plastic Keyboard | 4 | {"value": 5.0}
(3 rows)
Faceted queries require that pdb.agg be used as a window function:
pdb.agg() OVER ().
On every query, ParadeDB runs checks to ensure that deleted or updated-away rows are not factored into the result set.
If your table is not frequently updated or you can tolerate an approximate result, the performance of faceted queries can be improved by disabling these visibility checks.
To do so, set the second argument of pdb.agg to false.
SELECT
description, rating, category,
pdb.agg('{"value_count": {"field": "id"}}', false) OVER ()
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;
Disabling this check can improve query times by 2-4x in some cases (at the expense of correctness).