Below is a collection of tips to help when attempting to import complex CSV files. In the examples, we use the flights.csv file.
If a file contains only string columns the header auto-detection might fail. Provide the header option to override this behavior.
SELECT * FROM read_csv('flights.csv', header = true); If the file does not contain a header, names will be auto-generated by default. You can provide your own names with the names option.
SELECT * FROM read_csv('flights.csv', names = ['DateOfFlight', 'CarrierName']); The types flag can be used to override types of only certain columns by providing a struct of name → type mappings.
SELECT * FROM read_csv('flights.csv', types = {'FlightDate': 'DATE'}); COPY When Loading Data into a Table The COPY statement copies data directly into a table. The CSV reader uses the schema of the table instead of auto-detecting types from the file. This speeds up the auto-detection, and prevents mistakes from being made during auto-detection.
COPY tbl FROM 'test.csv';
union_by_name When Loading Files with Different Schemas The union_by_name option can be used to unify the schema of files that have different or missing columns. For files that do not have certain columns, NULL values are filled in.
SELECT * FROM read_csv('flights*.csv', union_by_name = true);
© Copyright 2018–2024 Stichting DuckDB Foundation
Licensed under the MIT License.
https://duckdb.org/docs/data/csv/tips.html