Parquet
Overview
This code block demonstrates how to query Parquet file(s).
Foreign data wrapper name. Can be any string.
Foreign server name. Can be any string.
Foreign table name. Can be any string.
The path of a single Parquet file or multiple Parquet files.
For instance, s3://bucket/folder/file.parquet
if the file is in Amazon S3, https://domain.tld/file.parquet
if the file is on a HTTP server, or /path/to/file.parquet
if the file is on the local file system.
Parquet Options
There are a number of options that can be passed into the CREATE FOREIGN TABLE
statement.
These are the same options accepted
by DuckDB’s read_parquet
function.
The path of a single Parquet file or multiple Parquet files.
For instance, s3://bucket/folder/file.parquet
if the file is in Amazon S3 or /path/to/file.parquet
if the file is on the local file system.
Parquet files generated by legacy writers do not correctly set the UTF8
flag for strings,
causing string columns to be loaded as BLOB
instead. Set this to true to load binary columns as
strings.
Whether or not an extra filename
column should be included in the result.
Whether or not to include the file_row_number
column.
Whether or not to interpret the path as a Hive partitioned path.
If hive_partitioning
is enabled, hive_types
can be used to specify the logical types of the hive
partitions in a struct.
hive_types will be autodetected for the following types: DATE
, TIMESTAMP
and BIGINT
.
To switch off the autodetection, this option can be set to 0
.
Whether the columns of multiple schemas should be unified by name, rather than by position.
Multiple Parquet Files
To treat multiple Parquet files as a single table, their paths should be passed in as a comma-separated string.
To treat a directory of Parquet files as a single table, the glob pattern should be used.
The glob pattern can also be used to read all Parquet files from multiple directories.
Parquet Schema
The parquet_describe
function returns the column names and types contained within a Parquet file. This function is useful
for determining the schema of the Postgres foreign table.
The parquet_schema
function returns the internal schema contained within the metadata of a Parquet file.
Cloud Object Stores
The object stores documentation explains how to provide secrets and other credentials for Parquet files stored in object stores like S3.
Was this page helpful?