Facets/aggregations are a ParadeDB enterprise feature. Contact us for access.

Prerequisite Aggregations work only on fast fields. Ensure that the fields you want to aggregate on are configured as fast fields.

ParadeDB provides powerful aggregation capabilities over text search results using Tantivy’s aggregation features. With the aggregate function, you can compute various statistical metrics and group data into buckets based on search queries. This is especially useful for deriving insights from your data. You can specify the aggregation operation and the field on which the operation is performed in a JSON format compatible with Elasticsearch.

Basic Usage

<index_name>.aggregate accepts an aggregate query.

SELECT search_idx.aggregate('<aggregation_query>');

This function can be used to aggregate over an entire table:

-- Average rating for all documents
SELECT search_idx.aggregate('{
  "avg_rating": {
    "avg": {"field": "rating"}
  }
}');

Or compute facets on top of a search query:

-- Average rating for documents containing "keyboard"
SELECT bm25_search.aggregate(
  '{
    "avg_rating": {
      "avg": {"field": "rating"}
    }
  }',
  query => paradedb.parse('description:keyboard')
);

Aggregation Names

In the examples above, the top-level key (avg_rating) in the aggregation query is a user-defined name. This name is arbitrary and can be any valid string. It serves as an identifier for the aggregation result, allowing you to reference and distinguish between multiple aggregations in the response.

This naming convention is similar to Elasticsearch’s format, where user-defined names are used to identify and organize aggregation results. This flexibility is essential when you have complex aggregation queries involving multiple metrics or nested aggregations.

Aggregate Refresh

Unlike search, aggregate bypasses the Postgres index access method and queries the index directly. As a result, previously deleted rows may influence the aggregate.

Running VACUUM on the underlying table will remove all dead rows from the index and ensures that only rows visible to the current transaction are factored into the aggregate.

VACUUM <table_name>;