A new, arguably faster, implementation of Apache Spark from scratch in Rust

Overview

vega

Previously known as native_spark.

Join the chat at https://gitter.im/fast_spark/community

Documentation

A new, arguably faster, implementation of Apache Spark from scratch in Rust. WIP

The framework has been tested only on Linux and requires nightly Rust. Read how to get started in the documentation.
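
For orientation, a minimal local-mode program might look like this. This is a sketch that assumes only the API visible in the issues below (Context::new, parallelize, map with Fn!, and collect), not an authoritative quickstart:

    use vega::*;

    fn main() -> Result<()> {
        // Create the driver context.
        let sc = Context::new()?;
        // Distribute a small vector across 2 partitions.
        let nums = sc.parallelize(vec![1, 2, 3, 4], 2);
        // Transform each element and gather the results back on the driver.
        let doubled = nums.map(Fn!(|x: i32| x * 2)).collect()?;
        println!("{:?}", doubled); // [2, 4, 6, 8]
        Ok(())
    }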

Contributing

If you are interested in contributing, jump on Gitter and head to the issue tracker, where you can find open issues (anything labelled good first issue or help wanted is up for grabs).

Comments
  • Fixes and improvements for distributed mode

    • [x] The configuration is now handled entirely via env vars and config files, no more passing it through arguments. This frees the CLI args for user-provided args, in case they want to use clap or any other parser, and does not collide with cargo itself.
    • [x] The config is propagated to the workers properly through a config file and loaded on the fly. Bonus: the same can be done in the master, so persistence of configuration should be easy to implement (it probably can be merged with the hosts.conf file).
    • [x] Fixed/improved distributed mode, where asynchronous reading/writing of the socket traffic from/to driver/executor was not being handled properly (it is now done through the correct read_exact method, with the right message size indicated at both ends, so communication between the two ends is robust; see the sketch after this list).
    • [x] Fixed the disabled unit test in the executor (rather easy with the other fixes in place).
    • [x] Added the missing feature to gracefully shut down workers on failure or job completion (and found out why they were not shutting down properly).
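
    A minimal sketch of the length-prefixed framing behind the read_exact item above (standalone and simplified; the real code paths live in the driver/executor networking):

    use std::io::{Read, Result, Write};
    use std::net::TcpStream;

    // Sender: write the message size first, then the payload.
    fn send_msg(stream: &mut TcpStream, msg: &[u8]) -> Result<()> {
        stream.write_all(&(msg.len() as u64).to_le_bytes())?;
        stream.write_all(msg)
    }

    // Receiver: read_exact guarantees a frame is never acted on half-read,
    // even if the bytes arrive across several socket reads.
    fn recv_msg(stream: &mut TcpStream) -> Result<Vec<u8>> {
        let mut len_buf = [0u8; 8];
        stream.read_exact(&mut len_buf)?;
        let mut buf = vec![0u8; u64::from_le_bytes(len_buf) as usize];
        stream.read_exact(&mut buf)?;
        Ok(buf)
    }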

    There are still a couple of changes I want to make:

    • [ ] There is still a bug in the async version of the shuffle manager where shuffle jobs are not working properly
    opened by iduartgomez 12
  • Deadlock while partitioning

    As discussed on Gitter, while developing union I found a problem where the application enters a deadlock while resolving the partitioning or computation of a DAG. The working branch is: https://github.com/iduartgomez/native_spark/tree/dev

    The error is reproducible by executing:

    #[test]
    fn test_error() {
        let sc = CONTEXT.clone();
        let join = || {
            let col1 = vec![
                (1, ("A".to_string(), "B".to_string())),
                (2, ("C".to_string(), "D".to_string())),
                (3, ("E".to_string(), "F".to_string())),
                (4, ("G".to_string(), "H".to_string())),
            ];
            let col1 = sc.parallelize(col1, 4);
            let col2 = vec![
                (1, "A1".to_string()),
                (1, "A2".to_string()),
                (2, "B1".to_string()),
                (2, "B2".to_string()),
                (3, "C1".to_string()),
                (3, "C2".to_string()),
            ];
            let col2 = sc.parallelize(col2, 4);
            col2.join(col1.clone(), 4)
        };
        let join1 = join();
        let join2 = join();
        let res = join1.union(join2).unwrap().collect().unwrap();
        assert_eq!(res.len(), 12);
    }
    

    Inside some executor there is a thread panic here:

    let mut stream_r = std::io::BufReader::new(&mut stream);
    // `r` here is the capnp ReaderOptions value built earlier in the function.
    let message_reader = serialize_packed::read_message(&mut stream_r, r).unwrap();
    
    bug 
    opened by iduartgomez 7
  • Handling resources destruction when program exits

    Ctrl-C handling, proper destruction of resources in case of panic, and removal of the explicit drop-executor logic. Instead of cloning Context as we do currently, create a single context, wrap it inside a ref count, and move the resource destruction logic (deleting all temp files and closing all spawned processes) inside the Drop trait.
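
    A minimal sketch of that shape (hypothetical types; the real Context holds more state than shown):

    use std::sync::Arc;

    // Everything that needs tear-down lives behind one ref-counted inner value.
    struct ContextInner {
        temp_dirs: Vec<std::path::PathBuf>,
    }

    impl Drop for ContextInner {
        // Runs exactly once, when the last Context handle goes away
        // (including during unwinding after a panic): remove temp files,
        // stop spawned worker processes, and so on.
        fn drop(&mut self) {
            for dir in &self.temp_dirs {
                let _ = std::fs::remove_dir_all(dir);
            }
        }
    }

    #[derive(Clone)]
    pub struct Context(Arc<ContextInner>);

    Ctrl-C handling then reduces to making the signal path drop the last handle rather than calling explicit shutdown logic.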

    enhancement 
    opened by rajasekarv 7
  • Improve application configuration execution/deployment

    Right now the way we are doing configuration is a bit lacklustre: we are using clap to parse many of the configuration parameters, passing them as command-line arguments. This creates a problem in user-created applications, where it will collide with their own command-line arguments.

    Similarly, this already collides with cargo's own optional parameters; for example, something like this will fail: cargo test -- --test-threads=1.

    We must provide a more elegant and ergonomic way to pass configuration parameters which does not collide with user (or generated, e.g. cargo) code. A first approach is to add/revamp the configuration file we are already using (hosts.conf) to include more configuration parameters, which we would eventually have to do anyway. Additionally, centralize all the environment variable configuration management (under env.rs) on initialization and document it, so the user can use those to set up any required parameters.

    Also, for local execution and testing, many of the defaults could be provided (e.g. NS_LOCAL_IP) so they don't have to be supplied either by env variable or argument (e.g. Spark itself assigns a free local IP if necessary when executing in local mode). See the sketch below.
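
    A sketch of that centralized env handling with a local-mode default (the Configuration struct and from_env helper are hypothetical; NS_LOCAL_IP is the variable named above):

    use std::env;
    use std::net::Ipv4Addr;

    struct Configuration {
        local_ip: Ipv4Addr,
    }

    impl Configuration {
        // Read every env var once, at initialization, with sensible
        // local-mode fallbacks instead of mandatory CLI arguments.
        fn from_env() -> Self {
            let local_ip = env::var("NS_LOCAL_IP")
                .ok()
                .and_then(|ip| ip.parse().ok())
                .unwrap_or(Ipv4Addr::LOCALHOST);
            Configuration { local_ip }
        }
    }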

    enhancement help wanted good first issue 
    opened by iduartgomez 6
  • could not compile `packed_simd`

    error[E0432]: unresolved import `crate::arch::x86_64::_mm_movemask_pi8`
      --> /root/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/codegen/reductions/mask/x86/sse.rs:47:21

    https://asciinema.org/a/0EOIzjxOI9UcyQLohN9Lwgvb5

    opened by uk0 5
  • Koalas-like implementation

    Hi, I read in your article (https://medium.com/@rajasekar3eg/fastspark-a-new-fast-native-implementation-of-spark-from-scratch-368373a29a5c) that you want to take inspiration from pandas for implementing DataFrames (and their API). You could consider basing your implementation on Koalas (https://github.com/databricks/koalas). Regards

    opened by xmehaut 5
  • '_mm_movemask_epi8' error

    error[E0432]: unresolved import `crate::arch::x86_64::_mm_movemask_pi8`
      --> /root/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/packed_simd-0.3.3/src/codegen/reductions/mask/x86/sse.rs:47:21
       |
    47 | use crate::arch::x86_64::_mm_movemask_pi8;
       |     ^^^^^^^^^^^^^^^^^^^^^----------------
       |     |                    |
       |     |                    help: a similar name exists in the module: `_mm_movemask_epi8`
       |     no `_mm_movemask_pi8` in `arch::x86_64`
       |
      ::: /root/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/packed_simd-0.3.3/src/codegen/reductions/mask.rs:41:1
       |
    41 | impl_mask_reductions!(m8x8);
       | ---------------------------- in this macro invocation
       |
       = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

    opened by liergou99 4
  • Add union_rdd

    Preparing the PR to add union_rdd.

    Still need to fix the incorrect dependency graph, plus a couple more polishing changes after that (I'm not really happy with how I am exposing the two variants publicly), but the fixes are required before proceeding with the polishing.

    opened by iduartgomez 4
  • Refactor local and distributed scheduler

    Refactored the local and distributed schedulers to extract most of the common functionality and make everything more maintainable; removed duplication and code surface.

    With this change it should be easier to progress and complete work on the DAGScheduler and Scheduler traits. I didn't change any of those, as I am not sure what you want to expose in the public API yet, but in any case it should be fairly easy to change things around.

    The refactor was done using a new non-leaking trait (NativeScheduler), which could be temporary until everything is moved to either of the other two traits.

    Tests pass both in local and distributed mode.

    opened by iduartgomez 3
  • Sort by

    Implement the sort_by transform using a very simple range_partitioner.

    The algorithm is almost the same as Apache Spark's: partition all the data into ordered partitions and sort each one separately.

    There is still some work to be done (see the sketch after this list):

    1. Find a better algorithm for building range_bounds.
    2. Implement descending order.
    3. Use binary search in the get_partitions() method.
    4. Perhaps F: SerFunc(&Self::Item) -> K + Clone is better than F: SerFunc(Self::Item) -> K.
    5. Test corner cases.
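
    A self-contained sketch of the range partitioning idea (hypothetical; the real version derives range_bounds from the data):

    // Route a key to the ordered partition delimited by the sorted bounds.
    // `bounds` holds num_partitions - 1 upper bounds.
    fn get_partition(bounds: &[i32], key: i32) -> usize {
        // Point 3 above: this linear scan could become a binary search,
        // e.g. bounds.partition_point(|b| *b <= key).
        bounds.iter().take_while(|b| **b <= key).count()
    }

    fn main() {
        let bounds = [10, 20, 30]; // four ordered partitions
        assert_eq!(get_partition(&bounds, 5), 0); // keys below 10
        assert_eq!(get_partition(&bounds, 25), 2); // keys in 20..30
        assert_eq!(get_partition(&bounds, 99), 3); // keys of 30 and above
    }
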
    opened by return02 2
  • build failed!!

    errorddeMacBook-Pro:native_spark d$ cargo build
       Compiling native_spark v0.1.0 (/Users/d/Work/opensource/native_spark)
       Compiling bincode v1.2.0
       Compiling serde_closure v0.2.7
       Compiling rustc_version v0.2.3
    error: failed to run custom build command for `native_spark v0.1.0 (/Users/d/Work/opensource/native_spark)`

    Caused by:
      process didn't exit successfully: `/Users/d/Work/opensource/native_spark/target/debug/build/native_spark-3382f7e3c05897a6/build-script-build` (exit code: 101)
    --- stderr
    thread 'main' panicked at 'capnpc compiling issue: Error { kind: Failed, description: "Error while trying to execute `capnp compile`: Failed: No such file or directory (os error 2). Please verify that version 0.5.2 or higher of the capnp executable is installed on your system. See https://capnproto.org/install.html" }', src/libcore/result.rs:1165:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

    warning: build failed, waiting for other jobs to finish...
    error: build failed

    opened by errord 2
  • Core dump as the number of rdds grows

    Recently, I found that the following code snippets cause a core dump.

    let mut tc = sc.parallelize(vec![1], 1);
    let tc0 = tc.clone();
    let mut idx = 0;
    let mut next_count = tc.count().unwrap();
    while idx < 100 {
        // Each iteration stacks one more union node onto the lineage.
        tc = tc.union(tc0.clone().into());
        next_count = tc.count().unwrap();
        idx += 1;
    }
    
    let mut tc = sc.parallelize(vec![1], 1);
    let mut idx = 0;
    let mut next_count = tc.count().unwrap();
    while idx < 100 {
        // Each distinct() adds further stages to the lineage.
        tc = tc.distinct();
        next_count = tc.count().unwrap();
        idx += 1;
    }
    
    let mut tc = sc.parallelize(vec![(1u32, 1u32)], 1);
    let mut idx = 0;
    let mut next_count = tc.count().unwrap();
    while idx < 100 {
        // group_by_key requires a pair RDD, hence the (u32, u32) elements.
        tc = tc.group_by_key(1)
            .map(Fn!(|i: (u32, Vec<u32>)| (i.0, i.1[0])));
        next_count = tc.count().unwrap();
        idx += 1;
    }
    

    I used bt in gdb and got the following information about the error:

    #11919 0x0000557a28dfdf69 in <erased_serde::de::erase::Visitor<T> as erased_serde::de::Visitor>::erased_visit_enum ()
    #11920 0x0000557a28e19c82 in <erased_serde::de::erase::Deserializer<T> as erased_serde::de::Deserializer>::erased_deserialize_enum ()
    #11921 0x0000557a28e19e9f in <erased_serde::de::erase::Deserializer<T> as erased_serde::de::Deserializer>::erased_deserialize_enum ()
    #11922 0x0000557a28e15dc0 in <erased_serde::de::erase::Visitor<T> as erased_serde::de::Visitor>::erased_visit_newtype_struct ()
    #11923 0x0000557a28e16539 in <erased_serde::de::erase::Visitor<T> as erased_serde::de::Visitor>::erased_visit_newtype_struct ()
    #11924 0x0000557a28e1c43b in <erased_serde::de::erase::Deserializer<T> as erased_serde::de::Deserializer>::erased_deserialize_newtype_struct ()
    #11925 0x0000557a28e1c2cc in <erased_serde::de::erase::Deserializer<T> as erased_serde::de::Deserializer>::erased_deserialize_newtype_struct ()
    #11926 0x0000557a28e4e8f1 in <T as serde_traitobject::deserialize::Sealed>::deserialize_erased ()
    #11927 0x0000557a28dc00a7 in <erased_serde::de::erase::DeserializeSeed<T> as erased_serde::de::DeserializeSeed>::erased_deserialize_seed ()
    #11928 0x0000557a28e1670b in <erased_serde::de::erase::SeqAccess<T> as erased_serde::de::SeqAccess>::erased_next_element ()
    #11929 0x0000557a28de32b5 in <erased_serde::de::erase::Visitor<T> as erased_serde::de::Visitor>::erased_visit_seq ()
    #11930 0x0000557a28e1ab53 in <erased_serde::de::erase::Deserializer<T> as erased_serde::de::Deserializer>::erased_deserialize_tuple ()
    #11931 0x0000557a28e3a7d0 in serde_traitobject::deserialize ()
    #11932 0x0000557a28dc45b6 in <erased_serde::de::erase::DeserializeSeed<T> as erased_serde::de::DeserializeSeed>::erased_deserialize_seed ()
    #11933 0x0000557a28e1670b in <erased_serde::de::erase::SeqAccess<T> as erased_serde::de::SeqAccess>::erased_next_element ()
    #11934 0x0000557a28de85d6 in <erased_serde::de::erase::Visitor<T> as erased_serde::de::Visitor>::erased_visit_seq ()
    #11935 0x0000557a28e1b1f7 in <erased_serde::de::erase::Deserializer<T> as erased_serde::de::Deserializer>::erased_deserialize_struct ()
    #11936 0x0000557a28ea6877 in <T as serde_traitobject::deserialize::Sealed>::deserialize_erased ()
    #11937 0x0000557a28eab37f in <&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple ()
    #11938 0x0000557a28e44407 in vega::scheduler::task::_::<impl serde::de::Deserialize for vega::scheduler::task::TaskOption>::deserialize ()
    #11939 0x0000557a28e2cd1a in bincode::internal::deserialize ()
    #11940 0x0000557a28e4094b in vega::scheduler::local_scheduler::LocalScheduler::run_task ()
    #11941 0x0000557a28e2c517 in tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut ()
    #11942 0x0000557a28e268e5 in tokio::runtime::task::core::Core<T,S>::poll ()
    #11943 0x0000557a28e7d0b6 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
    #11944 0x0000557a28e7b09d in tokio::runtime::task::harness::Harness<T,S>::poll ()
    #11945 0x0000557a29165267 in tokio::runtime::blocking::pool::Inner::run ()
    #11946 0x0000557a2916e1e6 in tokio::runtime::context::enter ()
    #11947 0x0000557a2917494e in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    #11948 0x0000557a2916f031 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    #11949 0x0000557a291b4e9a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
        at /rustc/ffa2e7ae8fbf9badc035740db949b9dae271c29f/library/alloc/src/boxed.rs:1042
    #11950 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/ffa2e7ae8fbf9badc035740db949b9dae271c29f/library/alloc/src/boxed.rs:1042
    #11951 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:89
    #11952 0x00007f783216b6db in start_thread (arg=0x7f7813386700) at pthread_create.c:463
    #11953 0x00007f78318f2a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    

    However, this snippet runs successfully.

    let mut tc = sc.parallelize(vec![(1u32, 1u32)], 1);
    let mut idx = 0;
    let mut next_count = tc.count().unwrap();
    while idx < 100 {
        // The same 100-iteration loop, but with a plain map.
        tc = tc.map(Fn!(|i: (u32, u32)| (i.0, i.1)));
        next_count = tc.count().unwrap();
        idx += 1;
    }
    

    It seems that the problem is related to serde_traitobject; judging by the frame numbers in the backtrace (#11919 and up), deserialization recurses once per node of the RDD lineage until the stack overflows. But I don't know why.

    opened by AmbitionXiang 0
  • questions about wordcount example

    I wrote a WordCount example with your framework as follows. It only processes a 17-line text file but takes 240 seconds to finish on my computer. Why does it run so slowly?

    use vega::io::*;
    use vega::*;

    fn main() -> Result<()> {
        let context = Context::new()?;

        let num_splits = 4;
        // Turn each file's raw bytes into a list of its lines.
        let deserializer = Fn!(|file: Vec<u8>| {
            String::from_utf8(file)
                .unwrap()
                .lines()
                .map(|s| s.to_string())
                .collect::<Vec<_>>()
        });
        let lines = context
            .read_source(LocalFsReaderConfig::new("./README.md"), deserializer)
            .flat_map(Fn!(|lines: Vec<String>| {
                Box::new(lines.into_iter()) as Box<dyn Iterator<Item = _>>
            }));

        // Emit a (word, 1) pair for every space-separated token.
        let words = lines.flat_map(Fn!(|line: String| {
            Box::new(
                line.split(' ')
                    .map(|s| (s.to_string(), 1))
                    .collect::<Vec<_>>()
                    .into_iter(),
            ) as Box<dyn Iterator<Item = _>>
        }));

        // Sum the counts per word across all partitions.
        let result = words.reduce_by_key(Fn!(|(a, b)| a + b), num_splits);

        let output = result.collect().unwrap();
        println!("result: {:?}", output);

        Ok(())
    }
    
    opened by Bran-Sun 1
  • Support cache() & persist() of RDD

    I see there are cache.rs and cache_tracker.rs, but some functions, e.g. get_or_compute, are not used in the framework. Since persistence is also a core piece of functionality in Spark, it's necessary to implement it. See the sketch below.
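
    For reference, the get_or_compute contract could look roughly like this (a hypothetical sketch; the real cache in cache.rs is keyed and typed differently):

    use std::collections::HashMap;

    // Cache computed partitions by (rdd_id, partition_index).
    struct BoundedMemoryCache {
        store: HashMap<(usize, usize), Vec<u8>>,
    }

    impl BoundedMemoryCache {
        // Return the cached partition, computing and storing it on a miss.
        fn get_or_compute<F>(&mut self, key: (usize, usize), compute: F) -> &Vec<u8>
        where
            F: FnOnce() -> Vec<u8>,
        {
            self.store.entry(key).or_insert_with(compute)
        }
    }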

    opened by AmbitionXiang 0
  • error in the distributed mode of Vega : executor not initialized

    thread 'tokio-runtime-worker' panicked at 'executor @15000 not initialized', /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/macros.rs:16:9
    stack backtrace:
       0: 0x5577bacef035 - backtrace::backtrace::libunwind::trace::h508d2c55eb856ef2 at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
       1: 0x5577bacef035 - backtrace::backtrace::trace_unsynchronized::h31c9d8a0097b92f8 at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
       2: 0x5577bacef035 - std::sys_common::backtrace::_print_fmt::h520f528c4c103a3f at src/libstd/sys_common/backtrace.rs:78
       3: 0x5577bacef035 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hf4dec5da8360bccb at src/libstd/sys_common/backtrace.rs:59
       4: 0x5577bad1b31c - core::fmt::write::hdf236390fbd68d3d at src/libcore/fmt/mod.rs:1076
       5: 0x5577bace77f3 - std::io::Write::write_fmt::hd33447f61d92a88a at src/libstd/io/mod.rs:1537
       6: 0x5577bacf1d40 - std::sys_common::backtrace::_print::h856c809943c588ab at src/libstd/sys_common/backtrace.rs:62
       7: 0x5577bacf1d40 - std::sys_common::backtrace::print::hc042b4237cc3648a at src/libstd/sys_common/backtrace.rs:49
       8: 0x5577bacf1d40 - std::panicking::default_hook::{{closure}}::h35997132c163f935 at src/libstd/panicking.rs:198
       9: 0x5577bacf1a8c - std::panicking::default_hook::h8555bf398b0f1318 at src/libstd/panicking.rs:218
      10: 0x5577bacf2377 - std::panicking::rust_panic_with_hook::h9b80e82887819c44 at src/libstd/panicking.rs:486
      11: 0x5577bacf1f7b - rust_begin_unwind at src/libstd/panicking.rs:388
      12: 0x5577bacf1eeb - std::panicking::begin_panic_fmt::ha3a705fe707278db at src/libstd/panicking.rs:342
      13: 0x5577ba25b625 - <vega::scheduler::distributed_scheduler::DistributedScheduler as vega::scheduler::base_scheduler::NativeScheduler>::submit_task::{{closure}}::hf126ad5c35648b1b at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/macros.rs:16
      14: 0x5577ba227119 - <core::future::from_generator::GenFuture as core::future::future::Future>::poll::h6ec335f1fcc81da4 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libcore/future/mod.rs:73
      15: 0x5577ba333639 - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::h78aed4ccd1a6d03f at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/core.rs:173
      16: 0x5577ba2fcdf7 - tokio::loom::std::unsafe_cell::UnsafeCell::with_mut::h3593b9cfa5044276 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/loom/std/unsafe_cell.rs:14
      17: 0x5577ba333483 - tokio::runtime::task::core::Core<T,S>::poll::hb10803f461ff2df1 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/core.rs:158
      18: 0x5577ba258dab - tokio::runtime::task::harness::Harness<T,S>::poll::{{closure}}::hbcfbeb30ed9245a2 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/harness.rs:107
      19: 0x5577ba2aac80 - core::ops::function::FnOnce::call_once::h73693a3fa1033ef3 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libcore/ops/function.rs:232
      20: 0x5577ba239bfb - <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once::h3f558966588bd2e8 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panic.rs:318
      21: 0x5577ba228eaa - std::panicking::try::do_call::had1ce840843fdd5e at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panicking.rs:297
      22: 0x5577ba22943d - __rust_try
      23: 0x5577ba228845 - std::panicking::try::h3af79c0afa54065e at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panicking.rs:274
      24: 0x5577ba239cca - std::panic::catch_unwind::h61eaa281eeede289 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panic.rs:394
      25: 0x5577ba258263 - tokio::runtime::task::harness::Harness<T,S>::poll::h82302ef8aaffdc48 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/harness.rs:89
      26: 0x5577ba2f3f10 - tokio::runtime::task::raw::poll::hae75b29e62d6bf24 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/raw.rs:104
      27: 0x5577babdbddf - tokio::runtime::task::raw::RawTask::poll::h6749f790f11914f5 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/raw.rs:66
      28: 0x5577bac1f2d1 - tokio::runtime::task::Notified::run::hb335b7e3feb41ffd at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/mod.rs:169
      29: 0x5577babc4312 - tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}::hb1c45cf00218c4a8 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/thread_pool/worker.rs:349
      30: 0x5577babe4306 - tokio::coop::with_budget::{{closure}}::h344cf887f743b125 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/coop.rs:127
      31: 0x5577babbc905 - std::thread::local::LocalKey::try_with::hb4e0647c5427c023 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/thread/local.rs:263
      32: 0x5577babba64e - std::thread::local::LocalKey::with::h3cbe7749f39e09be at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/thread/local.rs:239
      33: 0x5577babc4149 - tokio::coop::with_budget::h1fe3b166fff60a8c at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/coop.rs:120
      34: 0x5577babc4149 - tokio::coop::budget::h5652104cdec93792 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/coop.rs:96
      35: 0x5577babc4149 - tokio::runtime::thread_pool::worker::Context::run_task::h7fb4a7e7ae9f9872 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/thread_pool/worker.rs:348
      36: 0x5577babc3caa - tokio::runtime::thread_pool::worker::Context::run::hf0265362430c720c at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/thread_pool/worker.rs:327
      37: 0x5577babc3953 - tokio::runtime::thread_pool::worker::run::{{closure}}::hfd6cde609be5c2bd at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/thread_pool/worker.rs:305
      38: 0x5577babe31ea - tokio::macros::scoped_tls::ScopedKey::set::h9f8531be72da2945 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/macros/scoped_tls.rs:63
      39: 0x5577babc3836 - tokio::runtime::thread_pool::worker::run::hc527e313d268fd78 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/thread_pool/worker.rs:302
      40: 0x5577babc36eb - tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}::h86157bd7d8180940 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/thread_pool/worker.rs:281
      41: 0x5577babe58f0 - <tokio::runtime::blocking::task::BlockingTask as core::future::future::Future>::poll::h67fbd38c2e1a9351 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/blocking/task.rs:41
      42: 0x5577bac308de - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::hdd8902314b689e4a at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/core.rs:173
      43: 0x5577babb727b - tokio::loom::std::unsafe_cell::UnsafeCell::with_mut::h2ceb6eff3f2f1bc0 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/loom/std/unsafe_cell.rs:14
      44: 0x5577bac3071e - tokio::runtime::task::core::Core<T,S>::poll::hade3b71839b36aff at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/core.rs:158
      45: 0x5577bac36f14 - tokio::runtime::task::harness::Harness<T,S>::poll::{{closure}}::h1588ee4c72a0a8da at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/harness.rs:107
      46: 0x5577bac11a50 - core::ops::function::FnOnce::call_once::h3c73c9db1103969e at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libcore/ops/function.rs:232
      47: 0x5577bac3c46c - <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once::hde0706673733aec5 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panic.rs:318
      48: 0x5577babbd7dc - std::panicking::try::do_call::h355a5f001591d6ef at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panicking.rs:297
      49: 0x5577babc7f3d - __rust_try
      50: 0x5577babbd692 - std::panicking::try::h796f493406880011 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panicking.rs:274
      51: 0x5577bac3ccab - std::panic::catch_unwind::h9c809335ae627240 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panic.rs:394
      52: 0x5577bac36a60 - tokio::runtime::task::harness::Harness<T,S>::poll::hf0e8b958f9e7e54a at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/harness.rs:89
      53: 0x5577babdbf32 - tokio::runtime::task::raw::poll::h5447fbc06bff0fae at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/raw.rs:104
      54: 0x5577babdbddf - tokio::runtime::task::raw::RawTask::poll::h6749f790f11914f5 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/raw.rs:66
      55: 0x5577bac1f261 - tokio::runtime::task::Notified::run::h80475a8c73406fa0 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/task/mod.rs:169
      56: 0x5577bac397e4 - tokio::runtime::blocking::pool::Inner::run::h38d2163f810734d6 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/blocking/pool.rs:250
      57: 0x5577bac3954e - tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::{{closure}}::h7f3f28f10c9fa340 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/blocking/pool.rs:230
      58: 0x5577bac3293b - tokio::runtime::context::enter::h078aa1060d2bea1c at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/context.rs:72
      59: 0x5577bac0f712 - tokio::runtime::handle::Handle::enter::hd63e017e1370edee at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/handle.rs:76
      60: 0x5577bac395e2 - tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::h170dea7809362524 at /root/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/tokio-0.2.23/src/runtime/blocking/pool.rs:229
      61: 0x5577babe6990 - std::sys_common::backtrace::__rust_begin_short_backtrace::h1b6282dd728f6658 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/sys_common/backtrace.rs:130
      62: 0x5577bac32191 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::h5b672195a48076b0 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/thread/mod.rs:475
      63: 0x5577bac3c443 - <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once::h806f67c01629286d at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panic.rs:318
      64: 0x5577babbd8b4 - std::panicking::try::do_call::h919a00c733aaa7f8 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panicking.rs:297
      65: 0x5577babc7f3d - __rust_try
      66: 0x5577babbd4d4 - std::panicking::try::h2e8874678db6a632 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panicking.rs:274
      67: 0x5577bac3ccf3 - std::panic::catch_unwind::hfc6aba7db14e3b76 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/panic.rs:394
      68: 0x5577bac31f9a - std::thread::Builder::spawn_unchecked::{{closure}}::hd546233487009d23 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libstd/thread/mod.rs:474
      69: 0x5577bac1195f - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc6aec8618c0d2ff9 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/libcore/ops/function.rs:232
      70: 0x5577bacfa47a - <alloc::boxed::Box as core::ops::function::FnOnce>::call_once::h82579ad6d01265b5 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/liballoc/boxed.rs:1076
      71: 0x5577bacfa47a - <alloc::boxed::Box as core::ops::function::FnOnce>::call_once::ha64b44ac705bd524 at /rustc/74e80468347471779be6060d8d7d6d04e98e467f/src/liballoc/boxed.rs:1076
      72: 0x5577bacfa47a - std::sys::unix::thread::Thread::new::thread_start::hbaf2c67a26ced5a5 at src/libstd/sys/unix/thread.rs:87
      73: 0x7fb25ccd4ea5 - start_thread
      74: 0x7fb25c7e78dd - __clone
      75: 0x0 -

    opened by liergou99 0
  • Maybe use DataFusion and Apache Arrow as building blocks ?

    There is a competing project called Ballista (https://github.com/ballista-compute/ballista) which uses DataFusion. I don't quite get why the Ballista examples include weird syntax for querying. I understand that distributed SQL execution is more complex than just combining results from individual executors, but I think having a single-node SQL engine would be of great help. What do you think?

    opened by constantOut 1
Owner
raja sekar
Dabbling in deep learning and distributed systems in free time
Official Rust implementation of Apache Arrow

Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar

The Apache Software Foundation 1.3k Jan 9, 2023
Apache TinkerPop from Rust via Rucaja (JNI)

Apache TinkerPop from Rust An example showing how to call Apache TinkerPop from Rust via Rucaja (JNI). This repository contains two directories: java

null 8 Sep 27, 2022
Fill Apache Arrow record batches from an ODBC data source in Rust.

arrow-odbc Fill Apache Arrow arrays from ODBC data sources. This crate is build on top of the arrow and odbc-api crate and enables you to read the dat

Markus Klein 21 Dec 27, 2022
Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.

Polars Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow Blazingly fast DataFrames in Rust, Python & Node.js Polars is

null 11.8k Jan 8, 2023
Apache Arrow DataFusion and Ballista query engines

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

The Apache Software Foundation 2.9k Jan 2, 2023
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Apache Arrow Powering In-Memory Analytics Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enabl

The Apache Software Foundation 10.9k Jan 6, 2023
🦖 Evolve your fixed length data files into Apache Arrow tables, fully parallelized!

🦖 Evolve your fixed length data files into Apache Arrow tables, fully parallelized! Overview ... Installation The easiest way to install evolut

Firelink Data 3 Dec 22, 2023
TensorBase is a new big data warehousing with modern efforts.

TensorBase is a new big data warehousing with modern efforts.

null 1.3k Jan 4, 2023
New generation decentralized data warehouse and streaming data pipeline

World's first decentralized real-time data warehouse, on your laptop Docs | Demo | Tutorials | Examples | FAQ | Chat Get Started Watch this introducto

kamu 184 Dec 22, 2022
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso

Jorge Leitao 237 Jan 1, 2023
bspipe A Rust implementation of Bidirectional Secure Pipe

bspipe A Rust implementation of Bidirectional Secure Pipe

xufanglu 2 Nov 14, 2022
Yet Another Technical Analysis library [for Rust]

YATA Yet Another Technical Analysis library YaTa implements most common technical analysis methods and indicators. It also provides you an interface t

Dmitry 197 Dec 29, 2022
Dataframe structure and operations in Rust

Utah Utah is a Rust crate backed by ndarray for type-conscious, tabular data manipulation with an expressive, functional interface. Note: This crate w

Suchin 139 Sep 26, 2022
Rust DataFrame library

Polars Blazingly fast DataFrames in Rust & Python Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arro

Ritchie Vink 11.9k Jan 8, 2023
Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

null 7.8k Jan 8, 2023
A Rust crate that reads and writes tfrecord files

tfrecord-rust The crate provides the functionality to serialize and deserialize TFRecord data format from TensorFlow. Features Provide both high level

null 22 Nov 3, 2022
sparse linear algebra library for rust

sprs, sparse matrices for Rust sprs implements some sparse matrix data structures and linear algebra algorithms in pure Rust. The API is a work in pro

Vincent Barrielle 311 Dec 18, 2022
PyO3-based Rust binding of NumPy C-API

rust-numpy Rust bindings for the NumPy C-API API documentation Latest release (possibly broken) Current Master Requirements Rust >= 1.41.1 Basically,

PyO3 759 Jan 3, 2023
DataFrame / Series data processing in Rust

black-jack While PRs are welcome, the approach taken only allows for concrete types (String, f64, i64, ...) I'm not sure this is the way to go. I want

Miles Granger 30 Dec 10, 2022