Overview

This code block demonstrates how to query Parquet file(s).

CREATE FOREIGN DATA WRAPPER <wrapper_name>
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;

CREATE SERVER <server_name>
FOREIGN DATA WRAPPER <wrapper_name>;

CREATE FOREIGN TABLE <table_name> ()
SERVER <server_name>
OPTIONS (files '<files>');
wrapper_name
required

Foreign data wrapper name. Can be any string.

server_name
required

Foreign server name. Can be any string.

table_name
required

Foreign table name. Can be any string.

files
required

The path of a single Parquet file or multiple Parquet files. For instance, s3://bucket/folder/file.parquet if the file is in Amazon S3, https://domain.tld/file.parquet if the file is on a HTTP server, or /path/to/file.parquet if the file is on the local file system.

Parquet Options

There are a number of options that can be passed into the CREATE FOREIGN TABLE statement. These are the same options accepted by DuckDB’s read_parquet function.

CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
    files 's3://bucket/folder/file.parquet',
    binary_as_string 'true',
    hive_partitioning 'true'
);
files
required

The path of a single Parquet file or multiple Parquet files. For instance, s3://bucket/folder/file.parquet if the file is in Amazon S3 or /path/to/file.parquet if the file is on the local file system.

binary_as_string
default:
"false"

Parquet files generated by legacy writers do not correctly set the UTF8 flag for strings, causing string columns to be loaded as BLOB instead. Set this to true to load binary columns as strings.

filename
default:
"false"

Whether or not an extra filename column should be included in the result.

file_row_number
default:
"false"

Whether or not to include the file_row_number column.

hive_partitioning
default:
"false"

Whether or not to interpret the path as a Hive partitioned path.

hive_types

If hive_partitioning is enabled, hive_types can be used to specify the logical types of the hive partitions in a struct.

hive_types_autocast

hive_types will be autodetected for the following types: DATE, TIMESTAMP and BIGINT. To switch off the autodetection, this option can be set to 0.

union_by_name
default:
"false"

Whether the columns of multiple schemas should be unified by name, rather than by position.

Multiple Parquet Files

To treat multiple Parquet files as a single table, their paths should be passed in as a comma-separated string.

CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
    files '/path/to/file1.parquet, /path/to/file2.parquet'
);

To treat a directory of Parquet files as a single table, the glob pattern should be used.

CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
    files '/folder/*.parquet',
);

The glob pattern can also be used to read all Parquet files from multiple directories.

CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
    files '/folder1/*.parquet, /folder2/*.parquet'
);

Parquet Schema

The parquet_describe function returns the column names and types contained within a Parquet file. This function is useful for determining the schema of the Postgres foreign table.

SELECT * FROM parquet_describe('/path/to/file.parquet')

The parquet_schema function returns the internal schema contained within the metadata of a Parquet file.

SELECT * FROM parquet_schema('/path/to/file.parquet');

Cloud Object Stores

The object stores documentation explains how to provide secrets and other credentials for Parquet files stored in object stores like S3.