A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

Overview

Datafuse

Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture

Built to make the Data Cloud easy!



Principles

  • Fearless

    • No data races, No unsafe, Minimize unhandled errors
  • High Performance

    • Everything is Parallelism
  • High Scalability

    • Everything is Distributed
  • High Reliability

    • Datafuse's primary design goal is reliability

Architecture

Datafuse Architecture

Performance

  • In-memory SIMD vector processing performance only
  • Dataset: 100,000,000,000 (100 Billion)
  • Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
  • Rust: rustc 1.55.0-nightly (868c702d0 2021-06-30)
  • Built with link-time optimization and CPU-specific instructions
  • ClickHouse server version 21.4.6 revision 54447
| Query | FuseQuery (v0.4.48-nightly) | ClickHouse (v21.4.6) |
|---|---|---|
| SELECT avg(number) FROM numbers_mt(100000000000) | 4.35 s. (22.97 billion rows/s., 183.91 GB/s.) | ×1.4 slow, (6.04 s.) (16.57 billion rows/s., 132.52 GB/s.) |
| SELECT sum(number) FROM numbers_mt(100000000000) | 4.20 s. (23.79 billion rows/s., 190.50 GB/s.) | ×1.4 slow, (5.90 s.) (16.95 billion rows/s., 135.62 GB/s.) |
| SELECT min(number) FROM numbers_mt(100000000000) | 4.92 s. (20.31 billion rows/s., 162.64 GB/s.) | ×2.7 slow, (13.05 s.) (7.66 billion rows/s., 61.26 GB/s.) |
| SELECT max(number) FROM numbers_mt(100000000000) | 4.77 s. (20.95 billion rows/s., 167.78 GB/s.) | ×3.0 slow, (14.07 s.) (7.11 billion rows/s., 56.86 GB/s.) |
| SELECT count(number) FROM numbers_mt(100000000000) | 2.91 s. (34.33 billion rows/s., 274.90 GB/s.) | ×1.3 slow, (3.71 s.) (26.93 billion rows/s., 215.43 GB/s.) |
| SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 19.83 s. (5.04 billion rows/s., 40.37 GB/s.) | ×12.1 slow, (233.71 s.) (427.87 million rows/s., 3.42 GB/s.) |
| SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 3.90 s. (25.62 billion rows/s., 205.13 GB/s.) | ×2.5 slow, (9.70 s.) (10.31 billion rows/s., 82.52 GB/s.) |
| SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 8.28 s. (12.07 billion rows/s., 96.66 GB/s.) | ×4.0 slow, (32.87 s.) (3.04 billion rows/s., 24.34 GB/s.) |
| SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 100 | 4.80 s. (2.08 billion rows/s., 16.67 GB/s.) | ×2.9 slow, (13.95 s.) (716.62 million rows/s., 5.73 GB/s.) |
| SELECT max(number), sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | 6.31 s. (158.49 million rows/s., 1.27 GB/s.) | ×1.02 fast, (6.18 s.) (161.84 million rows/s., 1.29 GB/s.) |
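
The "×N slow" factors reported above appear to be the ratio of the ClickHouse elapsed time to the FuseQuery elapsed time. The sketch below recomputes that ratio and an approximate throughput for the avg(number) row; the function names are hypothetical, and the published figures were presumably derived from unrounded timings, so small differences are expected.

```rust
/// Ratio of the ClickHouse elapsed time to the FuseQuery elapsed time.
/// A value above 1.0 means ClickHouse took longer on that query.
fn slowdown(fusequery_secs: f64, clickhouse_secs: f64) -> f64 {
    clickhouse_secs / fusequery_secs
}

/// Throughput in billions of rows per second.
fn billion_rows_per_sec(rows: f64, secs: f64) -> f64 {
    rows / secs / 1e9
}

fn main() {
    // avg(number) row: 4.35 s (FuseQuery) vs 6.04 s (ClickHouse), 100 billion rows.
    println!("x{:.1} slow", slowdown(4.35, 6.04));
    println!("{:.2} billion rows/s", billion_rows_per_sec(100e9, 4.35));
}
```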

Note:

  • ClickHouse system.numbers_mt uses 16-way parallel processing (see gist)
  • FuseQuery system.numbers_mt uses 16-way parallel processing (see gist)
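
To illustrate what N-way parallel processing of a numbers range can look like, here is a minimal sketch that splits 0..n into chunks and sums each chunk on its own thread. It uses only the standard library; `parallel_sum` and its parameters are hypothetical and this is not how FuseQuery or ClickHouse implement numbers_mt.

```rust
use std::thread;

/// Sum the integers in 0..n by splitting the range across `workers` threads.
/// `workers` must be greater than zero.
fn parallel_sum(n: u64, workers: u64) -> u64 {
    let chunk = (n + workers - 1) / workers; // ceiling division
    thread::scope(|s| {
        // Spawn every worker first, then join them all.
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let start = (w * chunk).min(n);
                let end = ((w + 1) * chunk).min(n);
                s.spawn(move || (start..end).sum::<u64>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    // 16 workers, matching the parallelism mentioned in the note above.
    println!("{}", parallel_sum(1_000_000, 16));
}
```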

Getting Started

Roadmap

Datafuse is currently in Alpha and is not ready to be used in production. See Roadmap 2021.

Contributing

License

Datafuse is licensed under Apache 2.0.

Issues
  • Rename trait type names from I$Name to $Name

    I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

    Summary

    Rename trait type names from I$Name to $Name

    Changelog

    • Renames:

      • ITable to Table
      • IDatabase to Database
    • Removes IDataSource, use struct DataSource directly

    • And relevant code

    Related Issues

    Fixes #727

    Test Plan

    No extra ut/stateless_test

    opened by dantengsky 127
  • [ci] fix gcov install failed

    Signed-off-by: Chojan Shang

    I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

    Summary

    remove ~/.cargo/bin/ from cache

    Changelog

    • Build/Testing/CI
    • Not for changelog (changelog entry is not required)

    Related Issues

    Fixes #1012

    Test Plan

    No

    pr-build pr-not-for-changelog 
    opened by PsiACE 63
  • Implements Feature 630

    Summary

    It's a baby step toward integrating Store with Query, which implements:

    • update metadata after appending data parts to the table
    • remote table read_plan
    • remote table read

    Basic statements like insert into ... and select .. from ... could be executed now. (and lots of interesting things are left to do)

    Changelog

    • Store: implementations for ITable read_plan and read a5c42b2e5d14d042f3c3d928a35c625ca32f4410

    • Query: implements RemoteTable's read_plan & read b55eacf912bc7985765b870ecd658d505eb75a56

    • Adds remote flag to ReadDataSourcePlan deaea8ea29b4a6d4afb1390cdd3e0d3540b2597c

    • Tweaks stateless test cases ed69c92fc37c01650f17478f4d6e446f828f74ad

    The following issues might be worthy of your concern:

    • Remove trait bound Sync from SendableDataBlockStream ed69c92fc37c01650f17478f4d6e446f828f74ad

      Turns out, at least for now, we do not need this trait bound, and without Sync constraint, SendableDataBlockStream is more stream-combinator friendly.

    • Keep ITable::read_plan as a non-async method

      By using the runtime of ctx (and a channel). IMHO, changing ITable::read_plan to an async fn may be too harsh at this stage.

    • Add an extra flag to ReadDataSourcePlan and SourceTransform

      So that we can tell when we are operating on a remote table (and fetch the remote table accordingly). It is a temporary workaround; let's postpone it until the Catalog API is ready. SourceTransform::execute and FuseQueryContext are tweaked accordingly. Please see deaea8ea29b4a6d4afb1390cdd3e0d3540b2597c
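
The point about dropping the Sync bound can be illustrated with a standard-library sketch. It uses a boxed iterator rather than a real async stream, and `SendableBlockIter`, `Counter`, `blocks` and `total_rows_on_worker` are hypothetical names. A source that keeps state in a `Cell` is `Send` but not `Sync`, so it fits a `+ Send` alias and can still be moved to a worker thread, whereas a `+ Sync` bound would reject it.

```rust
use std::cell::Cell;
use std::thread;

/// Hypothetical stand-in for a stream of data blocks: `Send`, but no `Sync`.
type SendableBlockIter = Box<dyn Iterator<Item = Vec<u64>> + Send>;

/// A block source with interior mutability. `Cell` is `Send` but not `Sync`,
/// so this type could not be boxed behind an additional `+ Sync` bound.
struct Counter {
    next: Cell<u64>,
    limit: u64,
}

impl Iterator for Counter {
    type Item = Vec<u64>;

    fn next(&mut self) -> Option<Vec<u64>> {
        let n = self.next.get();
        if n >= self.limit {
            return None;
        }
        self.next.set(n + 1);
        Some(vec![n; 2]) // each block holds two rows
    }
}

fn blocks(limit: u64) -> SendableBlockIter {
    Box::new(Counter { next: Cell::new(0), limit })
}

/// Moving the stream to another thread only requires `Send`.
fn total_rows_on_worker(stream: SendableBlockIter) -> usize {
    thread::spawn(move || stream.map(|b| b.len()).sum::<usize>())
        .join()
        .unwrap()
}

fn main() {
    println!("{}", total_rows_on_worker(blocks(3)));
}
```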

    Related Issues

    resolves #630

    Test Plan

    • UT & Stateless Tests

    Progress

    • [x] Update meta
    • [x] Flight Service
    • [x] Store Client
    • [x] Remote table - read_plan
    • [x] Remote table - read (read partition)
    • [x] Unit tests & integration tests
    • [x] Multi-Node integration tests
    • [x] Code GC
    • [x] Squash commits
    pr-improvement 
    opened by dantengsky 63
  • ISSUE-1639:Remove session_api.rs

    I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

    Summary

    Remove common/store-api/session_api.rs

    Changelog

    • Improvement

    Related Issues

    Fixes #1639

    Test Plan

    Unit Tests

    Stateless Tests

    pr-improvement 
    opened by jyz0309 39
  • Consider renaming project. DataFuse is too similar to DataFusion.

    This project appears to have similar goals to Apache Arrow DataFusion, contains code from DataFusion, and has a very similar name.

    The names "DataFuse" and "DataFusion" only differ by a few characters and this could cause confusion about the relationship between these projects.

    On behalf of the Apache Arrow DataFusion community, who have put a lot of work into building the DataFusion software and brand over the past three years, I respectfully ask that you consider renaming this project.

    opened by andygrove 35
  • Refactor Aggregator to improve performance and prepare for refactoring transform_aggregator_final

    I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

    Summary

    • [x] Refactor the aggregator so that it adapts better to the planned refactor of transform_aggregator_final
    • [x] Investigate and fix the causes of performance degradation
      • [x] Inline not work
      • [x] Declare local references
      • [x] While match optimize match while
      • [x] clippy::ptr_arg
    Query:
      SELECT  sum(number) FROM numbers_mt(10000000000)  group by number % 3, number % 4, number % 5;
    
    Before:
      60 rows in set. Elapsed: 12.008 sec. Processed 10.00 billion rows, 80.00 GB (832.77 million rows/s., 6.66 GB/s.)
    
    After:
      60 rows in set. Elapsed: 11.258 sec. Processed 10.00 billion rows, 80.00 GB (888.24 million rows/s., 7.11 GB/s.)
    
    ClickHouse (master):
      60 rows in set. Elapsed: 12.152 sec. Processed 10.00 billion rows, 80.00 GB (822.93 million rows/s., 6.58 GB/s.)
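
For reference, the query above returns 60 rows because the moduli 3, 4 and 5 are pairwise coprime, so every one of the 3 × 4 × 5 key combinations occurs. The sketch below shows a naive single-threaded hash aggregation over the same keys; `group_sums` is a hypothetical name and this is not the aggregator refactored in this PR.

```rust
use std::collections::HashMap;

/// Hash aggregation: sum(number) grouped by (number % 3, number % 4, number % 5).
fn group_sums(n: u64) -> HashMap<(u64, u64, u64), u64> {
    let mut groups = HashMap::new();
    for number in 0..n {
        let key = (number % 3, number % 4, number % 5);
        // Create the group on first sight, then accumulate into it.
        *groups.entry(key).or_insert(0u64) += number;
    }
    groups
}

fn main() {
    let groups = group_sums(1_000);
    println!("{} groups", groups.len());
}
```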
    

    Changelog

    • Performance Improvement
    pr-performance 
    opened by zhang2014 34
  • [compute] use auto vectorized compute for some cases

    I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

    Summary

    • Use auto-vectorized compute for sum and rem; this avoids type casting from other types.
    • Use get instead of get_mut
    • Custom Hash functions
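
As a rough illustration of the first bullet, kernels written as plain loops over the native element type give the compiler a chance to auto-vectorize them in optimized builds, without widening casts. Whether vectorization actually happens is up to the compiler and target. The function names are hypothetical and this is not the code from this PR.

```rust
/// Sum a slice of u64 directly on the native type (wrapping on overflow).
fn sum_u64(values: &[u64]) -> u64 {
    values.iter().fold(0u64, |acc, &v| acc.wrapping_add(v))
}

/// Element-wise remainder by a constant, again without casting to a wider type.
/// Panics if `divisor` is zero.
fn rem_u8(values: &[u8], divisor: u8) -> Vec<u8> {
    values.iter().map(|&v| v % divisor).collect()
}

fn main() {
    println!("{}", sum_u64(&[1, 2, 3]));
    println!("{:?}", rem_u8(&[5, 6, 7], 3));
}
```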

    Changelog

    • Performance Improvement

    Related Issues

    Fixes #issue

    Test Plan

    Unit Tests

    Stateless Tests

    pr-performance 
    opened by sundy-li 32
  • [function] compatible with MySQL when aggregate functions work on empty data

    I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

    Summary

    Summary about this PR
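
For context on the title: in MySQL, SUM, MIN and MAX return NULL when there are no input rows, while COUNT returns 0. A minimal sketch of that behaviour, modelling NULL as `None`, is shown here; the `sql_*` function names are hypothetical and this is not the implementation from this PR.

```rust
/// COUNT over an empty input is 0, never NULL.
fn sql_count(values: &[i64]) -> u64 {
    values.len() as u64
}

/// SUM over an empty input is NULL, modelled here as `None`.
fn sql_sum(values: &[i64]) -> Option<i64> {
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum())
    }
}

/// MIN over an empty input is NULL as well.
fn sql_min(values: &[i64]) -> Option<i64> {
    values.iter().copied().min()
}

fn main() {
    let empty: [i64; 0] = [];
    println!("{} {:?} {:?}", sql_count(&empty), sql_sum(&empty), sql_min(&empty));
}
```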

    Changelog

    • Bug Fix
    • Improvement

    Related Issues

    Fixes #771

    Test Plan

    Unit Tests

    Stateless Tests

    pr-bugfix pr-improvement 
    opened by zhaox1n 29
  • refactoring(store): store no longer depends on common-store-api directly

    I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

    Summary

    pure refactoring, removes dependency common-store-api from fuser-store

    Changelog

    • Not for changelog (changelog entry is not required)

    Related Issues

    N/A

    Test Plan

    No extra tests

    pr-bugfix pr-other pr-performance pr-build pr-feature pr-doc-fix pr-not-for-changelog 
    opened by dantengsky 28
  • [test] Flaky tests

    Tests issues

    testing 
    opened by BohuTANG 26
  • WIP: do not read: split metasrv and store

    I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/

    Summary

    WIP: split metasrv and store

    Changelog

    Related Issues

    databend-store 
    opened by drmingdrmer 1
  • Need help: stateless test fails on local mac

    Env:

    With the latest master (b2f51d2446eee8cd086dbd1bca7ce89b7517ed25), running make stateless-test:

    mysql --version
    mysql  Ver 8.0.25 for macos10.15 on x86_64 (Homebrew)
    

    Failures:

    It seems like every test fails. But unit tests all passed. I am not sure if it is caused by some of my local lib/bin, e.g. mysql client, or by some incompatible dependency.

    Starting databend-test
    
    Running 45 stateless tests.
    
    00_0000_dummy_select_1:                                                 [ FAIL ] - return code 1
    , result:
    
    ERROR 2013 (HY000): Lost connection to MySQL server during query
    ...
    

    No logs output to /_logs/.

    The query backtrace is as follows (nohup.out.zip):

    [2021-09-18T07:05:48Z ERROR databend_query::servers::mysql::mysql_session] Unexpected error occurred during query execution: Code: 1002, displayText = peer terminated connection.
    
           0: backtrace::backtrace::libunwind::trace
                     at /Users/drdrxp/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.61/src/backtrace/libunwind.rs:90:5
              backtrace::backtrace::trace_unsynchronized
                     at /Users/drdrxp/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.61/src/backtrace/mod.rs:66:5
           1: backtrace::backtrace::trace
                     at /Users/drdrxp/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.61/src/backtrace/mod.rs:53:14
           2: backtrace::capture::Backtrace::create
                     at /Users/drdrxp/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.61/src/capture.rs:176:9
           3: backtrace::capture::Backtrace::new
                     at /Users/drdrxp/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.61/src/capture.rs:140:22
           4: common_exception::exception::ErrorCode::from_std_error
                     at common/exception/src/exception.rs:408:65
           5: <common_exception::exception::ErrorCode as core::convert::From<std::io::error::Error>>::from
                     at common/exception/src/exception.rs:374:9
           6: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
                     at /rustc/b69fe57261086e70aea9d5b58819a1794bf7c121/library/core/src/result.rs:1915:27
           7: msql_srv::MysqlIntermediary<B,R,W>::init
                     at /Users/drdrxp/.cargo/git/checkouts/msql-srv-62b5153e0120bb3a/60e369b/src/lib.rs:420:54
           8: msql_srv::MysqlIntermediary<B,R,W>::run_on
                     at /Users/drdrxp/.cargo/git/checkouts/msql-srv-62b5153e0120bb3a/60e369b/src/lib.rs:307:9
           9: msql_srv::MysqlIntermediary<B,std::net::tcp::TcpStream,std::net::tcp::TcpStream>::run_on_tcp
                     at /Users/drdrxp/.cargo/git/checkouts/msql-srv-62b5153e0120bb3a/60e369b/src/lib.rs:274:9
          10: databend_query::servers::mysql::mysql_session::MySQLConnection::session_executor
                     at query/src/servers/mysql/mysql_session.rs:42:29
          11: databend_query::servers::mysql::mysql_session::MySQLConnection::run_on_stream::{{closure}}
                     at query/src/servers/mysql/mysql_session.rs:34:13
    ...
    
    bug 
    opened by drmingdrmer 1
  • Cleanup redundant mod path: `metasrv/meta_service/**`

    After extracting metasrv as a standalone crate, the inner mod meta_service becomes unnecessary.

    improvement metasrv 
    opened by drmingdrmer 0
  • refactoring

    I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/

    Summary

    fix:

    • issue #1869

    Changelog

    • Improvement

    Related Issues

    Fixes #1869

    Test Plan

    Unit Tests

    Stateless Tests

    pr-improvement 
    opened by dantengsky 2
  • [improvement] store client factory

    Part of #1855

    Add a factory to create either LocalKvApi (if config.meta.meta_address is empty; this is only for tests) or RemoteKvAPI.
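
A minimal sketch of the selection logic described above, assuming the only input is the meta address string. `KvClient` and `create_kv_client` are hypothetical stand-ins, not the real LocalKvApi or RemoteKvAPI types, and no connection is made.

```rust
use std::collections::HashMap;

/// Hypothetical client handle: either an in-process store or a remote endpoint.
#[derive(Debug)]
enum KvClient {
    /// In-memory store, intended for tests only.
    Local(HashMap<String, Vec<u8>>),
    /// Address of a remote meta service (not connected in this sketch).
    Remote { address: String },
}

/// Pick the local store when no meta address is configured, else the remote one.
fn create_kv_client(meta_address: &str) -> KvClient {
    if meta_address.is_empty() {
        KvClient::Local(HashMap::new())
    } else {
        KvClient::Remote { address: meta_address.to_string() }
    }
}

fn main() {
    println!("{:?}", create_kv_client(""));
    println!("{:?}", create_kv_client("127.0.0.1:9191"));
}
```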

    feature prio: high 
    opened by drmingdrmer 0
  • [improvement] change the query config group default values to StructOpt default

    Summary

    In query/config, every config group has its own default values. We could use StructOpt's defaults for that, for example:

    impl MetaConfig {
        pub fn default() -> Self {
            MetaConfig {
                meta_address: "".to_string(),
                meta_username: "root".to_string(),
                meta_password: "".to_string(),
                rpc_tls_meta_server_root_ca_cert: "".to_string(),
                rpc_tls_meta_service_domain_name: "localhost".to_string(),
            }
        }
    }
    

    ->

    impl MetaConfig {
        pub fn default() -> Self {
           <Self as StructOpt>::from_iter(&Vec::<&'static str>::new())
        }
    }
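
Separately from the StructOpt proposal above, a standard-library-only sketch of the same defaults expressed through the `Default` trait is shown here. The field names and values are copied from the first snippet; implementing `Default` (instead of an inherent `default()` method) is an assumption of this sketch, not something the issue proposes, and it allows struct-update syntax for overrides.

```rust
#[derive(Debug, Clone, PartialEq)]
struct MetaConfig {
    meta_address: String,
    meta_username: String,
    meta_password: String,
    rpc_tls_meta_server_root_ca_cert: String,
    rpc_tls_meta_service_domain_name: String,
}

impl Default for MetaConfig {
    fn default() -> Self {
        MetaConfig {
            meta_address: String::new(),
            meta_username: "root".to_string(),
            meta_password: String::new(),
            rpc_tls_meta_server_root_ca_cert: String::new(),
            rpc_tls_meta_service_domain_name: "localhost".to_string(),
        }
    }
}

fn main() {
    // Override one field and take the rest from the defaults.
    let cfg = MetaConfig {
        meta_address: "127.0.0.1:9191".to_string(),
        ..Default::default()
    };
    println!("{:?}", cfg);
}
```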
    
    improvement good first issue 
    opened by BohuTANG 0
  • New testing style  in Rust crate

    Summary

    Description for this feature.

    Databend currently uses a Golang testing style, but Cargo supports a more Rustacean testing style; see https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html

    feature discuss 
    opened by sundy-li 4
  • [improvement] add tenant id to `UserMgrApi`

    Summary

    Like namespace, UserMgrApi also needs the tenant as a prefix: to get a user, we must first identify its tenant.
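
One way to picture the tenant prefix is as part of the storage key, so that the same user name under two tenants maps to two different entries. The key layout, the `__fd_users` prefix and the `user_key` function below are hypothetical, chosen only for illustration.

```rust
use std::collections::BTreeMap;

/// Build a storage key that is scoped by tenant (hypothetical layout).
fn user_key(tenant: &str, user: &str) -> String {
    format!("__fd_users/{}/{}", tenant, user)
}

fn main() {
    let mut kv: BTreeMap<String, String> = BTreeMap::new();
    // The same user name can exist independently under different tenants.
    kv.insert(user_key("tenant_a", "root"), "admin of A".to_string());
    kv.insert(user_key("tenant_b", "root"), "admin of B".to_string());
    println!("{:?}", kv.get(&user_key("tenant_a", "root")));
}
```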

    prio: high improvement 
    opened by BohuTANG 1
  • [Cross Compile] cannot cross compile on arm/v6 and arm/v7 architecture

    Summary: when cross-compiling for the arm/v6 and arm/v7 architectures, the following error occurs; cross-compilation for arm64 succeeds.

    error[E0277]: the trait bound `u64: ToUsize` is not satisfied
       --> /cargo/git/checkouts/msql-srv-62b5153e0120bb3a/bb7ba7b/src/commands.rs:39:44
        |
    39  |                 nom::bytes::complete::take(size)(i)?
        |                                            ^^^^ the trait `ToUsize` is not implemented for `u64`
        |
    note: required by a bound in `nom::bytes::complete::take`
       --> /cargo/registry/src/github.com-1ecc6299db9ec823/nom-7.0.0/src/bytes/complete.rs:406:6
        |
    406 |   C: ToUsize,
        |      ^^^^^^^ required by this bound in `nom::bytes::complete::take`
    
    

    How to replicate

    cross --version
    cross 0.1.16
    info: syncing channel updates for 'nightly-2021-09-11-x86_64-unknown-linux-gnu'
    
      nightly-2021-09-11-x86_64-unknown-linux-gnu unchanged - rustc 1.57.0-nightly (b69fe5726 2021-09-10)
    
    cargo 1.56.0-nightly (18751dd3f 2021-09-01)
    
    make cross-compile-debug 
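
On arm/v6 and arm/v7, `usize` is 32 bits wide, so a `u64` length cannot always be represented as a `usize`; the error above suggests that the nom version in use does not provide `ToUsize` for `u64` on such targets. One possible workaround, shown here as a standard-library sketch rather than the fix adopted by the project, is to convert the length explicitly with a checked conversion before taking bytes. `take_bytes` is a hypothetical helper that mimics the (remaining, taken) result order.

```rust
use std::convert::TryFrom;

/// Split `size` bytes off the front of `input`, returning (remaining, taken).
/// Returns `None` if `size` does not fit in `usize` or exceeds the input length.
fn take_bytes(input: &[u8], size: u64) -> Option<(&[u8], &[u8])> {
    // Checked conversion: fails instead of truncating on 32-bit targets.
    let n = usize::try_from(size).ok()?;
    if n > input.len() {
        return None;
    }
    let (taken, remaining) = input.split_at(n);
    Some((remaining, taken))
}

fn main() {
    println!("{:?}", take_bytes(b"hello", 2));
}
```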
    
    bug 
    opened by ZhiHanZ 0
  • [improvement] consider move out management/namespace/local_kv_store.rs

    Summary

    For now, local_kv_store lives in management/namespace and is used by namespace, but the user mod also needs it:

    impl<T> UserMgr<T>
    where T: KVApi
    {
        #[allow(dead_code)]
        pub fn new(kv_api: T) -> Self {
            UserMgr { kv_api }
        }
    }
    
    1. Consider moving it out of the namespace directory to store-api/src/impls/local_kv_store.rs? cc @drmingdrmer
    2. Refactor the utils.rs to:
      • Add async fn auth_user(&mut self, password: impl AsRef<[u8]>) -> bool to UserMgrApi trait (@BohuTANG)
      • Remove the NewUser (@BohuTANG)
      • Remove the utils.rs
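
The shape of this proposal can be sketched with a manager that is generic over a key-value trait and backed by an in-memory store. Everything below is hypothetical: the trait is a synchronous stand-in for KVApi, `auth_user` takes a user name and is not async (unlike the signature suggested above), and passwords are stored in plain text purely to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for the KVApi trait.
trait KVApi {
    fn get_kv(&self, key: &str) -> Option<Vec<u8>>;
    fn upsert_kv(&mut self, key: &str, value: Vec<u8>);
}

/// In-memory store that could live outside the namespace module.
#[derive(Default)]
struct LocalKVStore {
    data: HashMap<String, Vec<u8>>,
}

impl KVApi for LocalKVStore {
    fn get_kv(&self, key: &str) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }
    fn upsert_kv(&mut self, key: &str, value: Vec<u8>) {
        self.data.insert(key.to_string(), value);
    }
}

struct UserMgr<T: KVApi> {
    kv_api: T,
}

impl<T: KVApi> UserMgr<T> {
    fn new(kv_api: T) -> Self {
        UserMgr { kv_api }
    }

    /// Store the password as-is (a real system would store a hash instead).
    fn add_user(&mut self, name: &str, password: impl AsRef<[u8]>) {
        self.kv_api.upsert_kv(name, password.as_ref().to_vec());
    }

    /// True only if the user exists and the stored bytes match.
    fn auth_user(&self, name: &str, password: impl AsRef<[u8]>) -> bool {
        self.kv_api
            .get_kv(name)
            .map_or(false, |stored| stored.as_slice() == password.as_ref())
    }
}

fn main() {
    let mut mgr = UserMgr::new(LocalKVStore::default());
    mgr.add_user("root", "secret");
    println!("{}", mgr.auth_user("root", "secret"));
}
```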
    improvement 
    opened by BohuTANG 10
Releases(v0.4.111-nightly)
Owner
Datafuse Labs
The open-source Lakehouse runtime that powers the Modern Data Cloud