Basic Usage

ParadeDB supports aggregates through plain SQL functions like COUNT, SUM, and AVG. For instance, the following query tallies the total number of search results for a full text query.

SELECT COUNT(*) FROM mock_items
WHERE description @@@ 'shoes';

Fast Aggregates

Fast aggregates are an enterprise-only feature. Contact us for access.

The performance of aggregates over very large result sets (millions of rows or more) is a well-known problem in Postgres.

In ParadeDB Enterprise, aggregates over full text search results are drastically accelerated by fast fields. Fast fields are stored in a column-oriented fashion, which makes them much quicker to read and process.

The following steps are necessary to prepare the ParadeDB instance for fast fields.

Configure Parallel Workers

ParadeDB Enterprise uses Postgres parallel workers. By default, Postgres allows two workers per parallel query. Increasing the number of parallel workers allows parallel queries to use all of the available hardware on the host machine and can deliver significant speedups. For instance, if the host machine has 16 CPUs, max_worker_processes, max_parallel_workers, and max_parallel_workers_per_gather should be set to 16.

Remember to restart Postgres in order for postgresql.conf changes to take effect.

Run VACUUM

VACUUM updates the table’s visibility map, which speeds up Postgres’ visibility checks. Specifying INDEX_CLEANUP = false allows VACUUM to run much faster over large tables by skipping the cleanup of dead rows in the index.

Make sure to pass the table name, not the index name, to VACUUM.

VACUUM (INDEX_CLEANUP = false) mock_items;

If the table experiences frequent updates, we recommend configuring autovacuum.

Run Cold Query

The following query allows Postgres to cache the underlying index files. This is a one-time cold query that will not be fast, but allows subsequent queries to be fast.

SELECT COUNT(*) FROM mock_items WHERE id @@@ paradedb.all();

Replace mock_items with your table name, and id with your key field.

Index Fast Fields

You can check which fields are indexed as fast by running paradedb.schema.

SELECT name, fast FROM paradedb.schema('search_idx') ORDER BY fast;

In order for a query to be accelerated, all of the returned columns must be fast fields. For instance, the following query is accelerated because rating is a fast field, and COUNT(*) does not specify any fields.

SELECT rating, COUNT(*) FROM mock_items
WHERE description @@@ 'shoes'
GROUP BY rating LIMIT 5;

You can verify if a query will be accelerated by running EXPLAIN. Accelerated queries will show either a Custom Scan or Parallel Index Only Scan in the query plan.

EXPLAIN SELECT rating, COUNT(*) FROM mock_items
WHERE description @@@ 'shoes'
GROUP BY rating LIMIT 5;

On the other hand, a query that selects a non-fast field like description is not accelerated.

SELECT description, COUNT(*) FROM mock_items
WHERE description @@@ 'shoes'
GROUP BY description LIMIT 5;

If we run EXPLAIN on this query, we’ll see a regular Index Scan, which indicates that no fast field optimization occurred.

EXPLAIN SELECT description, COUNT(*) FROM mock_items
WHERE description @@@ 'shoes'
GROUP BY description LIMIT 5;

That’s it! ParadeDB is now ready for fast aggregates.