Overview

Regular Postgres tables, known as heap tables, organize data by row. While this makes sense for operational data, it is inefficient for analytical queries, which often scan a large amount of data from a subset of the columns in a table.

ParadeDB introduces special tables called parquet tables. These tables behave like regular Postgres tables but use a column-oriented layout via Apache Arrow and Parquet and leverage Apache DataFusion, a query engine optimized for column-oriented data.

Using Parquet Tables

-- USING parquet must be provided
CREATE TABLE movies (name text, rating int) USING parquet;

INSERT INTO movies VALUES ('Star Wars', 9), ('Indiana Jones', 8);
SELECT AVG(rating) FROM movies;

DROP TABLE movies;

That’s it! parquet tables accept standard Postgres queries, so there’s nothing new to learn.

Copying into a Parquet Table

This example demonstrates how to copy data from an existing heap table into a parquet table.

-- Create heap table
CREATE TABLE heap (a int, b int);
INSERT INTO heap VALUES (1, 2);

-- Create parquet table with the same schema
CREATE TABLE events (a int, b int) USING parquet;

-- Copy data into parquet table
INSERT INTO events SELECT * FROM heap;

Use Cases

parquet tables have two primary advantages over regular tables:

  1. Significantly faster aggregate queries
  2. Lower disk space, since data is stored as highly-compressed Parquet files

They should be used for storing and querying large volumes of event or analytics data.

Non Use Cases

Regular Postgres tables should be used for operational data that requires frequent updates or deletes.

Known Limitations

parquet tables are currently in beta. The following is a list of known limitations. Many of these will become resolved as parquet tables become production-ready.

  • UPDATE statements
  • Nested DELETE statements
  • Partitioning tables by column
  • Some Postgres types like JSON, time, and timestamp with time zone
  • User-defined functions, aggregations, or types
  • Referencing parquet and regular Postgres heap tables in the same query
  • Write-ahead-log (WAL) support and ROLLBACK
  • Foreign keys
  • Index scans
  • Collations
  • Using an external data lake as a table storage provider
  • Full text search over parquet tables with pg_bm25