Datafuse is a Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture written in Rust, inspired by ClickHouse and powered by arrow-rs, built to make it easy to power the Data Cloud.
Principles
- Fearless
  - No data races, no unsafe code, and as few unhandled errors as possible
- High Performance
  - Everything is parallel
- High Scalability
  - Everything is distributed
- High Reliability
  - Reliability is Datafuse's primary design goal
Architecture
Performance
- Measures in-memory SIMD vector processing performance only
- Dataset: 100,000,000,000 rows (100 billion)
- Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU cores, 16 threads
- Rust: rustc 1.55.0-nightly (868c702d0 2021-06-30)
- Built with link-time optimization and CPU-specific instructions
- ClickHouse server version 21.4.6 revision 54447
Query | FuseQuery (v0.4.40-nightly) | ClickHouse (v21.4.6) |
---|---|---|
SELECT avg(number) FROM numbers_mt(100000000000) | 4.35 s. (22.97 billion rows/s., 183.91 GB/s.) | ×1.4 slower (6.04 s., 16.57 billion rows/s., 132.52 GB/s.) |
SELECT sum(number) FROM numbers_mt(100000000000) | 4.20 s. (23.79 billion rows/s., 190.50 GB/s.) | ×1.4 slower (5.90 s., 16.95 billion rows/s., 135.62 GB/s.) |
SELECT min(number) FROM numbers_mt(100000000000) | 4.92 s. (20.31 billion rows/s., 162.64 GB/s.) | ×2.7 slower (13.05 s., 7.66 billion rows/s., 61.26 GB/s.) |
SELECT max(number) FROM numbers_mt(100000000000) | 4.77 s. (20.95 billion rows/s., 167.78 GB/s.) | ×3.0 slower (14.07 s., 7.11 billion rows/s., 56.86 GB/s.) |
SELECT count(number) FROM numbers_mt(100000000000) | 2.91 s. (34.33 billion rows/s., 274.90 GB/s.) | ×1.3 slower (3.71 s., 26.93 billion rows/s., 215.43 GB/s.) |
SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 19.83 s. (5.04 billion rows/s., 40.37 GB/s.) | ×12.1 slower (233.71 s., 427.87 million rows/s., 3.42 GB/s.) |
SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 3.90 s. (25.62 billion rows/s., 205.13 GB/s.) | ×2.5 slower (9.70 s., 10.31 billion rows/s., 82.52 GB/s.) |
SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 8.28 s. (12.07 billion rows/s., 96.66 GB/s.) | ×4.0 slower (32.87 s., 3.04 billion rows/s., 24.34 GB/s.) |
SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 100 | 4.80 s. (2.08 billion rows/s., 16.67 GB/s.) | ×2.9 slower (13.95 s., 716.62 million rows/s., 5.73 GB/s.) |
SELECT max(number), sum(number) FROM numbers_mt(1000000000) GROUP BY sipHash64(number % 3), sipHash64(number % 4) | 14.84 s. (67.38 million rows/s., 539.51 MB/s.) | ×1.5 faster (10.24 s., 97.65 million rows/s., 781.23 MB/s.) |
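The derived columns in the table follow from the row count and the elapsed time. A minimal sketch, assuming each row is one 8-byte UInt64 `number` and that GB means 10^9 bytes (both are assumptions, chosen because they reproduce the table's figures); `throughput` and `speed_ratio` are hypothetical helpers, not part of Datafuse:

```rust
/// Throughput derived from a row count and an elapsed time.
#[derive(Debug, Clone, Copy)]
struct Throughput {
    rows_per_sec: f64,
    /// Bytes per second, assuming 8 bytes (one UInt64) per row.
    bytes_per_sec: f64,
}

/// Hypothetical helper: rows/s and bytes/s for `rows` rows scanned in
/// `elapsed_secs` seconds.
fn throughput(rows: u64, elapsed_secs: f64) -> Throughput {
    const BYTES_PER_ROW: f64 = 8.0; // assumption: `number` is a UInt64
    let rows_per_sec = rows as f64 / elapsed_secs;
    Throughput {
        rows_per_sec,
        bytes_per_sec: rows_per_sec * BYTES_PER_ROW,
    }
}

/// Hypothetical helper: how many times slower the slower run is
/// (the "×N slower/faster" factor in the ClickHouse column).
fn speed_ratio(a_secs: f64, b_secs: f64) -> f64 {
    a_secs.max(b_secs) / a_secs.min(b_secs)
}
```

For the first row, `throughput(100_000_000_000, 4.35)` gives about 22.99 billion rows/s and 183.91 GB/s, and `speed_ratio(4.35, 6.04)` gives about 1.39, shown as ×1.4. Small differences from the printed figures can come from the elapsed times being rounded to two decimals.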
Note:
- ClickHouse system.numbers_mt uses 16-way parallel processing (see gist)
- FuseQuery system.numbers_mt uses 16-way parallel processing (see gist)
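The Note above says both engines run numbers_mt with 16-way parallelism. The following is a minimal sketch, not Datafuse's actual source, of how an aggregate such as `sum(number)` over such a source can be parallelized without `unsafe` or shared mutable state: the range is split into 16 contiguous partitions, each thread folds its own partition, and the partial results are merged at the end. `partition` and `parallel_sum` are hypothetical names; scoped threads require Rust 1.63 or later.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical helper: split `0..n` into `ways` contiguous, near-equal ranges,
/// one possible way to partition a `numbers_mt(n)`-style source.
fn partition(n: u64, ways: u64) -> Vec<Range<u64>> {
    assert!(ways > 0, "need at least one partition");
    let base = n / ways;
    let extra = n % ways; // the first `extra` ranges get one more element
    let mut start = 0;
    (0..ways)
        .map(|i| {
            let len = base + if i < extra { 1 } else { 0 };
            let r = start..start + len;
            start += len;
            r
        })
        .collect()
}

/// Hypothetical `SELECT sum(number) FROM numbers_mt(n)` with `ways` threads.
/// Each thread owns its partial sum; u128 avoids overflow for large `n`.
fn parallel_sum(n: u64, ways: u64) -> u128 {
    let parts = partition(n, ways);
    thread::scope(|s| {
        // Spawn every worker first so they all run concurrently...
        let handles: Vec<_> = parts
            .into_iter()
            .map(|r| s.spawn(move || r.map(u128::from).sum::<u128>()))
            .collect();
        // ...then join them and merge the partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u128>()
    })
}
```

`parallel_sum(1_000_000, 16)` returns 499_999_500_000, which equals n(n − 1)/2 for n = 1,000,000. Because each thread only reads its own range and returns its own result, the borrow checker can prove the code free of data races, which is the property the Fearless principle relies on.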
Status
General
- SQL Parser
- Query Planner
- Query Optimizer
- Predicate Push Down
- Limit Push Down
- Projection Push Down
- Type coercion
- Parallel Query Execution
- Distributed Query Execution
- Shuffle Hash GroupBy
- Merge-Sort OrderBy
- Joins (WIP)
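The list above mentions Merge-Sort OrderBy, and the benchmark table includes `ORDER BY number DESC LIMIT 100`. As a hedged illustration of the general technique (not Datafuse's implementation), the sketch below sorts each partition independently and then performs a k-way merge with a `BinaryHeap`, stopping as soon as the limit is reached. The function name `order_by_desc_limit` and the in-memory `Vec<Vec<u64>>` input are assumptions made for the example.

```rust
use std::collections::BinaryHeap;

/// Hypothetical sketch of a merge-sort `ORDER BY value DESC LIMIT limit`
/// over data that is already split into partitions.
fn order_by_desc_limit(mut partitions: Vec<Vec<u64>>, limit: usize) -> Vec<u64> {
    // Phase 1: sort every partition independently, in descending order.
    // (A real engine would do this in parallel; here it is sequential.)
    for p in partitions.iter_mut() {
        p.sort_unstable_by(|a, b| b.cmp(a));
    }

    // Phase 2: k-way merge. The max-heap holds (value, partition, index)
    // for the current head of every non-empty partition.
    let mut heap = BinaryHeap::new();
    for (pi, p) in partitions.iter().enumerate() {
        if let Some(&v) = p.first() {
            heap.push((v, pi, 0usize));
        }
    }

    let mut out = Vec::new();
    while out.len() < limit {
        // Stop early once the limit is reached or every partition is drained.
        let (v, pi, idx) = match heap.pop() {
            Some(head) => head,
            None => break,
        };
        out.push(v);
        // Advance within the partition the value came from.
        if let Some(&next) = partitions[pi].get(idx + 1) {
            heap.push((next, pi, idx + 1));
        }
    }
    out
}
```

After the per-partition sorts, the merge phase costs O((k + limit) log k) for k partitions, so a small LIMIT stops early instead of materializing the fully sorted result.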
SQL Support
- Projection
- Filter (WHERE)
- Limit
- Aggregate Functions
- Scalar Functions
- User-Defined Functions (UDFs)
- Subqueries
- Sorting
- Joins (WIP)
- Window (TODO)
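Aggregate functions that run under parallel or distributed execution need partial results that can be merged, which is also why the table benchmarks `sum(number) / count(number)` alongside `avg(number)`. A minimal sketch, assuming a hypothetical `AvgState` type with update/merge/finalize steps (this is not Datafuse's aggregate-function API):

```rust
/// Hypothetical mergeable state for `avg(number)`: keeps (sum, count)
/// per partition instead of a per-partition average.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct AvgState {
    sum: u128,
    count: u64,
}

impl AvgState {
    /// Accumulate one input row.
    fn update(&mut self, value: u64) {
        self.sum += u128::from(value);
        self.count += 1;
    }

    /// Combine the partial state produced by another partition or node.
    fn merge(&mut self, other: &AvgState) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Produce the final result; `None` for empty input (SQL NULL).
    fn finalize(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum as f64 / self.count as f64)
        }
    }
}
```

Averaging per-partition averages would be wrong when partitions differ in size: for partitions [1, 2, 3] and [10], merging the states gives 16 / 4 = 4.0, whereas averaging 2.0 and 10.0 gives 6.0.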
Getting Started
Roadmap
Datafuse is currently in alpha and is not ready for production use. See Roadmap 2021.
Contributing
License
Datafuse is licensed under Apache 2.0.