ParadeDB Analytics is powered by our new extension, pg_lakehouse. pg_analytics, our original extension for analytics inside Postgres, is being refactored and will be re-released in the coming months. Documentation for pg_analytics has been preserved for existing users.

Overview

ParadeDB Analytics transforms Postgres into a fast analytical query engine over external object stores like Amazon S3, file formats like Parquet, and table formats like Delta Lake. Queries are pushed down to Apache DataFusion, which delivers excellent analytical performance.

Getting Started

The following example queries an example dataset of 3 million NYC taxi trips from January 2024, hosted in a public S3 bucket provided by ParadeDB.

CREATE FOREIGN DATA WRAPPER s3_wrapper
HANDLER s3_fdw_handler
VALIDATOR s3_fdw_validator;

-- Provide S3 credentials
CREATE SERVER s3_server FOREIGN DATA WRAPPER s3_wrapper
OPTIONS (
    region 'us-east-1',
    allow_anonymous 'true'
);

-- Create foreign table
CREATE FOREIGN TABLE trips (
    "VendorID"              INT,
    "tpep_pickup_datetime"  TIMESTAMP,
    "tpep_dropoff_datetime" TIMESTAMP,
    "passenger_count"       BIGINT,
    "trip_distance"         DOUBLE PRECISION,
    "RatecodeID"            DOUBLE PRECISION,
    "store_and_fwd_flag"    TEXT,
    "PULocationID"          REAL,
    "DOLocationID"          REAL,
    "payment_type"          DOUBLE PRECISION,
    "fare_amount"           DOUBLE PRECISION,
    "extra"                 DOUBLE PRECISION,
    "mta_tax"               DOUBLE PRECISION,
    "tip_amount"            DOUBLE PRECISION,
    "tolls_amount"          DOUBLE PRECISION,
    "improvement_surcharge" DOUBLE PRECISION,
    "total_amount"          DOUBLE PRECISION
)
SERVER s3_server
OPTIONS (
    path 's3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet',
    extension 'parquet'
);

-- Success! Now you can query the file like a regular Postgres table
SELECT COUNT(*) FROM trips;
  count
---------
 2964624
(1 row)

If path points to a directory of partitioned files, it must end in a /.

Column names must be wrapped in double quotes to preserve uppercase letters. This is because DataFusion is case-sensitive and Postgres’ foreign table column names must match the foreign table’s column names exactly.

That’s it! To query your own object store, please refer to the object stores and foreign table schema documentation.

For Further Assistance

The paradedb.help function opens a Github Discussion that the ParadeDB team will respond to.

SELECT paradedb.help(
  subject => $$Something isn't working$$,
  body => $$Issue description$$
);