Overview

This code block demonstrates how to query CSV file(s).

CREATE FOREIGN DATA WRAPPER <wrapper_name>
HANDLER csv_fdw_handler
VALIDATOR csv_fdw_validator;

CREATE SERVER <server_name>
FOREIGN DATA WRAPPER <wrapper_name>;

CREATE FOREIGN TABLE <table_name> ()
SERVER <server_name>
OPTIONS (files '<files>');
wrapper_name
required

Foreign data wrapper name. Can be any string.

server_name
required

Foreign server name. Can be any string.

table_name
required

Foreign table name. Can be any string.

files
required

The path of a single CSV file or multiple CSV files. For instance, s3://bucket/folder/file.csv if the file is in Amazon S3 or /path/to/file.csv if the file is on the local file system.

CSV Options

There are a number of options that can be passed into the CREATE FOREIGN TABLE statement. These are the same options accepted by DuckDB’s read_csv function.

all_varchar
default: "false"

Option to skip type detection for CSV parsing and assume all columns to be of typeVARCHAR.

allow_quoted_nulls
default: "true"

Option to allow the conversion of quoted values to NULL values.

auto_detect
default: "true"

Enables auto detection of CSV parameters. See Auto Detection.

auto_type_candidates

This option allows you to specify the types that the sniffer will use when detecting CSV column types. The VARCHAR type is always included in the detected types (as a fallback option). See Auto Type Candidates.

columns

A struct that specifies the column names and column types contained within the CSV file (e.g., {'col1': 'INTEGER', 'col2': 'VARCHAR'}). Using this option implies that auto detection is not used.

compression
default: "auto"

The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.csv.gz will use gzip, t.csv will use none). Options are none, gzip, zstd.

dateformat

Specifies the date format to use when parsing dates. See Date Format.

decimal_separator
default: "."

The decimal separator of numbers.

delim
default: ","

Specifies the delimiter character that separates columns within each row (line) of the file. Alias for sep.

escape
default: "\""

Specifies the string that should appear before a data character sequence that matches the quote value.

filename
default: "false"

Whether or not an extra filename column should be included in the result.

force_not_null

Do not match the specified columns’ values against the NULL string. In the default case where the NULL string is empty, this means that empty values will be read as zero-length strings rather than NULLs.

header
default: "false"

Specifies that the file contains a header line with the names of each column in the file.

hive_partitioning
default: "false"

Whether or not to interpret the path as a Hive partitioned path.

hive_types

If hive_partitioning is enabled, hive_types can be used to specify the logical types of the hive partitions in a struct.

hive_types_autocast

hive_types will be autodetected for the following types: DATE, TIMESTAMP and BIGINT. To switch off the autodetection, this option can be set to 0.

ignore_errors
default: "false"

Option to ignore any parsing errors encountered and instead ignore rows with errors.

max_line_size
default: "2097152"

The maximum line size in bytes.

names

The column names as a list if the file does not contain a header.

new_line

Set the new line character(s) in the file. Options are ‘\r’,‘\n’, or ‘\r\n’.

normalize_names
default: "false"

Boolean value that specifies whether or not column names should be normalized, removing any non-alphanumeric characters from them.

null_padding
default: "false"

If this option is enabled, when a row lacks columns, it will pad the remaining columns on the right with null values.

nullstr

Specifies the string that represents a NULL value or a list of strings that represent a NULL value.

parallel
default: "true"

Whether or not the parallel CSV reader is used.

quote
default: "\""

Specifies the quoting string to be used when a data value is quoted.

sample_size
default: "20480"

The number of sample rows for auto detection of parameters.

sep
default: ","

Specifies the delimiter character that separates columns within each row (line) of the file. Alias for delim.

skip
default: "0"

The number of lines at the top of the file to skip.

timestampformat
default: "(empty)"

Specifies the date format to use when parsing timestamps. See Date Format.

types

The column types as a list by position.

union_by_name
default: "false"

Whether the columns of multiple schemas should be unified by name, rather than by position.

Multiple CSV Files

To treat multiple CSV files as a single table, their paths should be passed in as a comma-separated string.

CREATE FOREIGN TABLE csv_table ()
SERVER csv_server
OPTIONS (
    files '/path/to/file1.csv, /path/to/file2.csv'
);

To treat a directory of CSV files as a single table, the glob pattern should be used.

CREATE FOREIGN TABLE csv_table ()
SERVER csv_server
OPTIONS (
    files '/folder/*.csv',
);

The glob pattern can also be used to read all CSV files from multiple directories.

CREATE FOREIGN TABLE csv_table ()
SERVER csv_server
OPTIONS (
    files '/folder1/*.csv, /folder2/*.csv'
);

Cloud Object Stores

The object stores documentation explains how to provide secrets and other credentials for CSV files stored in object stores like S3.