On this page is a brief description of the internals of the DuckDB engine.
The parser converts a query string into the following tokens:
The parser is not aware of the catalog or any other aspect of the database. It will not throw errors if tables do not exist, and will not resolve any types of columns yet. It only transforms a query string into a set of tokens as specified.
The ParsedExpression represents an expression within a SQL statement. This can be e.g., a reference to a column, an addition operator or a constant value. The type of the ParsedExpression indicates what it represents, e.g., a comparison is represented as a ComparisonExpression.
ParsedExpressions do not have types, except for nodes with explicit types such as CAST statements. The types for expressions are resolved in the Binder, not in the Parser.
The TableRef represents any table source. This can be a reference to a base table, but it can also be a join, a table-producing function or a subquery.
The QueryNode represents either (1) a SELECT statement, or (2) a set operation (i.e. UNION, INTERSECT or DIFFERENCE).
The SQLStatement represents a complete SQL statement. The type of the SQL Statement represents what kind of statement it is (e.g., StatementType::SELECT represents a SELECT statement). A single SQL string can be transformed into multiple SQL statements in case the original query string contains multiple queries.
The binder converts all nodes into their bound equivalents. In the binder phase:
The following conversions happen:
BoundStatement
BoundQueryNode
BoundTableRef
Expression
The logical planner creates LogicalOperator nodes from the bound statements. In this phase, the actual logical query tree is created.
After the logical planner has created the logical query tree, the optimizers are run over that query tree to create an optimized query plan. The following query optimizers are run:
DPccp algorithm from the paper Dynamic Programming Strikes Back is used.The column binding resolver converts logical BoundColumnRefExpresion nodes that refer to a column of a specific table into BoundReferenceExpression nodes that refer to a specific index into the DataChunks that are passed around in the execution engine.
The physical plan generator converts the resulting logical operator tree into a PhysicalOperator tree.
In the execution phase, the physical operators are executed to produce the query result. DuckDB uses a push-based vectorized model, where DataChunks are pushed through the operator tree. For more information, see the talk Push-Based Execution in DuckDB.
© Copyright 2018–2024 Stichting DuckDB Foundation
Licensed under the MIT License.
https://duckdb.org/docs/internals/overview.html