CSV
Overview
This code block demonstrates how to query CSV file(s).
CREATE FOREIGN DATA WRAPPER <wrapper_name>
HANDLER csv_fdw_handler
VALIDATOR csv_fdw_validator;
CREATE SERVER <server_name>
FOREIGN DATA WRAPPER <wrapper_name>;
CREATE FOREIGN TABLE <table_name> ()
SERVER <server_name>
OPTIONS (files '<files>');
Foreign data wrapper name. Can be any string.
Foreign server name. Can be any string.
Foreign table name. Can be any string.
The path of a single CSV file or multiple CSV files.
For instance, s3://bucket/folder/file.csv
if the file is in Amazon S3, https://domain.tld/file.csv
if the file is on a HTTP server, or /path/to/file.csv
if the file is on the local file system.
CSV Options
There are a number of options that can be passed into the CREATE FOREIGN TABLE
statement.
These are the same options accepted
by DuckDB’s read_csv
function.
Option to skip type detection for CSV parsing and assume all columns to be of typeVARCHAR
.
Option to allow the conversion of quoted values to NULL
values.
Enables auto detection of CSV parameters. See Auto Detection.
This option allows you to specify the types that the sniffer will use when detecting CSV column types.
The VARCHAR
type is always included in the detected types (as a fallback option).
See Auto Type Candidates.
A struct that specifies the column names and column types contained within the CSV file
(e.g., {'col1': 'INTEGER', 'col2': 'VARCHAR'}
). Using this option implies that auto detection is
not used.
The compression type for the file. By default this will be detected
automatically from the file extension (e.g., t.csv.gz
will use gzip
,
t.csv
will use none
). Options are none
, gzip
, zstd
.
Specifies the date format to use when parsing dates. See Date Format.
The decimal separator of numbers.
Specifies the delimiter character that separates columns within each row
(line) of the file. Alias for sep
.
Specifies the string that should appear before a data character sequence that matches the quote value.
Whether or not an extra filename column should be included in the result.
Do not match the specified columns’ values against the NULL
string. In the default case where the NULL
string is empty, this means that empty values will be read as zero-length strings rather than NULLs.
Specifies that the file contains a header line with the names of each column in the file.
Whether or not to interpret the path as a Hive partitioned path.
If hive_partitioning
is enabled, hive_types
can be used to specify the logical types of the hive
partitions in a struct.
hive_types will be autodetected for the following types: DATE
, TIMESTAMP
and BIGINT
.
To switch off the autodetection, this option can be set to 0
.
Option to ignore any parsing errors encountered and instead ignore rows with errors.
The maximum line size in bytes.
The column names as a list if the file does not contain a header.
Set the new line character(s) in the file. Options are ‘\r’,‘\n’, or ‘\r\n’.
Boolean value that specifies whether or not column names should be normalized, removing any non-alphanumeric characters from them.
If this option is enabled, when a row lacks columns, it will pad the remaining columns on the right with null values.
Specifies the string that represents a NULL
value or a list of strings that represent a NULL
value.
Whether or not the parallel CSV reader is used.
Specifies the quoting string to be used when a data value is quoted.
The number of sample rows for auto detection of parameters.
Specifies the delimiter character that separates columns within each row
(line) of the file. Alias for delim
.
The number of lines at the top of the file to skip.
Specifies the date format to use when parsing timestamps. See Date Format.
The column types as a list by position.
Whether the columns of multiple schemas should be unified by name, rather than by position.
Multiple CSV Files
To treat multiple CSV files as a single table, their paths should be passed in as a comma-separated string.
CREATE FOREIGN TABLE csv_table ()
SERVER csv_server
OPTIONS (
files '/path/to/file1.csv, /path/to/file2.csv'
);
To treat a directory of CSV files as a single table, the glob pattern should be used.
CREATE FOREIGN TABLE csv_table ()
SERVER csv_server
OPTIONS (
files '/folder/*.csv',
);
The glob pattern can also be used to read all CSV files from multiple directories.
CREATE FOREIGN TABLE csv_table ()
SERVER csv_server
OPTIONS (
files '/folder1/*.csv, /folder2/*.csv'
);
Cloud Object Stores
The object stores documentation explains how to provide secrets and other credentials for CSV files stored in object stores like S3.