Overview
Basic Usage
ParadeDB supports aggregates through plain SQL functions like COUNT
, SUM
, and AVG
. For instance, the following query tallies the total number of search results for a full text query.
Fast Aggregates
The performance of aggregates over very large result sets (millions of rows or more) is a well-known problem in Postgres.
In ParadeDB Enterprise, aggregates over full text search results are drastically accelerated by fast fields. Fast fields are stored in a column-oriented fashion, which makes them much quicker to read and process.
The following steps are necessary to prepare the ParadeDB instance for fast fields.
Configure Parallel Workers
ParadeDB Enterprise uses Postgres parallel workers. By default, Postgres allows two workers per parallel query.
Increasing the number of parallel workers allows parallel queries to use all of the available hardware on the host machine and can deliver significant
speedups. For instance, if the host machine has 16 CPUs, max_worker_processes
, max_parallel_workers
, and max_parallel_workers_per_gather
should be set to 16.
Remember to restart Postgres in order for postgresql.conf
changes to take effect.
Run VACUUM
VACUUM
updates the table’s visibility map,
which speeds up Postgres’ visibility checks. Specifying INDEX_CLEANUP false
allows VACUUM
to run much
faster over large tables by skipping the cleanup of dead rows in the index.
Make sure to pass the table name, not the index name, to VACUUM
.
If the table experiences frequent updates, we recommend configuring autovacuum.
Run pg_prewarm
The pg_prewarm
extension can be used to preload data from the index into the Postgres buffer cache, which
improves the response times of “cold” queries (i.e. the first search query after Postgres has restarted).
Index Fast Fields
You can check which fields are indexed as fast by running paradedb.schema
.
In order for a query to be accelerated, all of the returned columns must be fast fields. For instance, the following query is accelerated because rating
is a fast
field, and COUNT(*)
does not specify any fields.
You can verify if a query will be accelerated by running EXPLAIN
. Accelerated queries will show either a Custom Scan
or Parallel Index Only Scan
in the query plan.
On the other hand, a query that selects a non-fast field like description
is not accelerated.
If we run EXPLAIN
on this query, we’ll see a regular Index Scan
, which indicates that no fast field optimization occurred.
That’s it! ParadeDB is now ready for fast aggregates.
Was this page helpful?