Issue #, if available:
Closes #201
Description of changes
Adds the required changes for enabling query evaluation using a DAG model.
What is our definition of DAG?
We are only interested in DAGs that can be used as execution plans, which leads to the following definition. A DAG is a directed, cycle-free graph G = (V, E) with a denoted root node v0 ∈ V such that all v ∈ V {v0} are reachable from v0. Note that this is the definition of trees without the condition |E| = |V | − 1. Hence, all trees are DAGs. Reference: https://link.springer.com/article/10.1007/s00450-009-0061-0
In addition
- The steel thread includes only
Scan
and Project
operators, and passes the test for planning and evaluation of the following query:
SELECT a AS b FROM data
- The code is not clean, maybe not idiomatic at some instances, and lacks documentation. I'll iterate through the code with a consequent PR when adding new operators.
Algorithm
We're loosely using a push model with DAG as called out in the following: https://link.springer.com/article/10.1007/s00450-009-0061-0 (Section 10.3.5. Pushing)
- Create a DAG logical plan
LogicalPlan
with each node as a logical operator (for now manual until we integrate with AST -> PLAN).
- Create an evaluation plan
EvalPlan
with the same structure as logical plan.
- Toposort the
EvalPlan
to ensure all dependencies of a node get evaluated before the node itself.
- For each operator in
EvalPlan
, eval the node.
- Store the result in the node.
- Push the output to each consumer—for now, for final result, we use a
Sink
node that is considered as the node that accumulates output; we need to see if there are better alternatives.
Yet to be implemented
- [ ] port all the other existing nodes and ensure the current tests pass against them
- [ ] add error handling and validation (incl. DAG cyclic validation).
- [ ] add documentation.
Finally, we're not performing any optimization at this stage (and we won't for extended amount of time).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.