Overview
Today, a vast amount of data is stored in:
- File formats like Parquet or CSV
- Data lakes like S3 or GCS
- Table formats like Delta Lake or Iceberg
ParadeDB provides integrations that make it easy to ingest this data without data processing engines or ETL tools, which can be complex and error-prone.
Basic Usage
In this example, we will query and copy a Parquet file stored in S3 to Postgres. The Parquet file contains 3 million NYC taxi trips from January 2024, hosted in a public S3 bucket provided by ParadeDB.
To begin, let’s create a Postgres foreign data wrapper, which is how ParadeDB connects to S3.
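A minimal sketch of what this looks like is below. The wrapper, handler, and validator names (`parquet_wrapper`, `parquet_fdw_handler`, `parquet_fdw_validator`) and the bucket path are illustrative assumptions and may differ in your ParadeDB version:

```sql
-- Illustrative identifiers: the wrapper/handler/validator names and the
-- bucket path are assumptions for this sketch and may differ in your setup
CREATE FOREIGN DATA WRAPPER parquet_wrapper
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;

-- The server represents the connection to the S3 object store
CREATE SERVER parquet_server
FOREIGN DATA WRAPPER parquet_wrapper;

-- Leaving the column list empty lets the schema be inferred from the file
CREATE FOREIGN TABLE trips ()
SERVER parquet_server
OPTIONS (files 's3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet');
```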
Next, let’s query the foreign table `trips`. You’ll notice that the column names and types of this table are automatically inferred from the Parquet file.
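For instance, a simple count over the foreign table created above works as expected:

```sql
-- Count all rows read from the Parquet file via the foreign table
SELECT COUNT(*) FROM trips;
```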
Queries over this table are powered by DuckDB, an in-process analytical query engine. This means you can run fast analytical queries over data lakes directly from ParadeDB.
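As a sketch, an aggregation like the following benefits from DuckDB’s execution; the column names (`passenger_count`, `total_amount`) are assumptions based on the standard NYC taxi schema and may differ in the actual file:

```sql
-- Hypothetical aggregation; column names assume the standard NYC taxi schema
SELECT passenger_count, AVG(total_amount) AS avg_fare
FROM trips
GROUP BY passenger_count
ORDER BY passenger_count;
```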
Finally, let’s copy this table into a Postgres heap table. For demonstration, we will copy over the first 100 rows.
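One way to do this, using a hypothetical destination table name `trips_copy`:

```sql
-- Materialize the first 100 rows into a regular Postgres heap table
CREATE TABLE trips_copy AS
SELECT * FROM trips
LIMIT 100;
```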
That’s it! Please refer to the other sections for instructions on how to ingest from other file and table formats and object stores.