Facets - ParadeDB

A common pattern in search is to query for both an aggregate and a set of search results. For example, “find the top 10 results, and also count the total number of results.” Instead of issuing two separate queries — one for the search results, and another for the aggregate — pdb.agg allows for these results to be returned in a single “faceted” query. This can significantly improve read throughput, since issuing a single query uses less CPU and disk I/O. For example, this query returns the top 3 search results alongside the total number of results found.

SELECT
  id, description, rating,
  pdb.agg('{"value_count": {"field": "id"}}') OVER ()
FROM mock_items
WHERE category === 'electronics'
ORDER BY rating DESC
LIMIT 3;

Expected Response

 id |         description         | rating |      agg
----+-----------------------------+--------+----------------
 12 | Innovative wireless earbuds |      5 | {"value": 5.0}
  1 | Ergonomic metal keyboard    |      4 | {"value": 5.0}
  2 | Plastic Keyboard            |      4 | {"value": 5.0}
(3 rows)

Faceted queries require that pdb.agg be used as a window function: pdb.agg() OVER ().

Faceted Performance Optimization

On every query, ParadeDB runs checks to ensure that deleted or updated-away rows are not factored into the result set. If your table is not frequently updated or you can tolerate an approximate result, the performance of faceted queries can be improved by disabling these visibility checks. To do so, set the second argument of pdb.agg to false.

SELECT
     description, rating, category,
     pdb.agg('{"value_count": {"field": "id"}}', false) OVER ()
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;

Disabling this check can improve query times by 2-4x in some cases (at the expense of correctness).

Documentation

​Faceted Performance Optimization

Faceted Performance Optimization