Overview

Fluvio

Data streaming for real-time applications

Fluvio is a high-performance distributed streaming platform that's written in Rust, built to make it easy to develop real-time applications.

Release Status

Fluvio is currently in Alpha and is not ready for production use. Our CLI and APIs are also under rapid development and may introduce breaking changes at any time. We do our best to adhere to semantic versioning, but until our R1 release we cannot guarantee it.

Contributing

If you'd like to contribute to the project, please read our Contributing guide.

License

This project is licensed under the Apache license.

Comments
  • [Merged by Bors] - Feature/fluvio cli partition list show size

    Add a human-readable partition size field to the fluvio partition list command.

    $ fluvio partition list
     TOPIC      PARTITION  LEADER  REPLICAS  RESOLUTION  SIZE    HW  LEO  LRS  FOLLOWER OFFSETS 
     cat-facts  0          5001    []        Online      6 KB    12  12   0    [] 
     cat-facts  1          5001    []        Online      2.1 KB  10  10   0    [] 
     cat-facts  2          5001    []        Online      4.7 KB  10  10   0    [] 
     cat-facts  3          5001    []        Online      4.6 KB  9   9    0    []
    

    Prints ERROR when the leader is unable to fetch the log size.

    TOPIC      PARTITION  LEADER  REPLICAS  RESOLUTION  SIZE   HW  LEO  LRS  FOLLOWER OFFSETS 
    cat-facts  0          5001    []        Online      ERROR  12  12   0    [] 
    cat-facts  1          5001    []        Online      ERROR  10  10   0    [] 
    cat-facts  2          5001    []        Online      ERROR  10  10   0    [] 
    cat-facts  3          5001    []        Online      ERROR  9   9    0    []
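
The human-readable SIZE column above can be sketched with a small helper. This is an illustrative conversion only; the function name and rounding rules are assumptions, not Fluvio's actual code:

```rust
// Hypothetical helper: render a byte count the way the SIZE column above
// does ("6 KB", "2.1 KB"). Not Fluvio's implementation.
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    // Drop the decimal when the value is (nearly) whole: "6 KB", not "6.0 KB".
    if (value - value.round()).abs() < 0.05 {
        format!("{} {}", value.round(), UNITS[unit])
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

fn main() {
    println!("{}", human_size(6 * 1024)); // 6 KB
    println!("{}", human_size(2150));     // 2.1 KB
}
```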
    

    Forces the bash shell in k8-util/docker/build.sh so the script also works in environments whose default shell is not bash-compatible.

    Solves #2148

    opened by Afourcat 35
  • Add Producer API for sending keyless records

    After #965 lands, we will have deprecated the TopicProducer::send_record API, which allowed users to send a record without specifying a key for it. This API was deprecated because send_record accepted a partition number as one of the arguments, and we decided that for now, users should not be able to manually specify partitions.

    This leaves a gap in our API for a particular use-case: sending a single record with no key. Without send_record, the only way to send a record with no key is to use the send_all function, which forces users to wrap their single record in some kind of iterator before passing it to the producer, like this:

    producer.send_all(Some( (None, "My record data with no key") )).await?;
    

    I believe that this is a workaround that we should not force users to deal with, as it is non-obvious and non-ergonomic. I think we need to introduce a new API method for the "single record with no key" use-case. I originally proposed a send_keyless API in #965 (discussion at https://github.com/infinyon/fluvio/pull/965#discussion_r615967714), but we decided to exclude it from that PR in order to discuss it more, which is what this issue is for.

    For reference, the send_keyless API that I would like to introduce looks like this:

    impl TopicProducer {
        pub async fn send_keyless<V>(&self, value: V) -> Result<(), FluvioError>
        where
            V: Into<Vec<u8>> { ... }
    }
    

    So the questions I would like to resolve in this discussion are:

    1. Do we believe that a new API to fill this use-case is necessary? (I believe it is.)
    2. Is there a different API that better serves this use-case? (I have not thought of one.)
    3. Are we happy with the name send_keyless, or would we prefer something else?

    I think it makes sense to put a timeline on this. If nobody has any strong reasons why we should NOT introduce this API, I would like to add this issue to our roadmap as an action item by this Wednesday (4/21/21) to give enough time to ship it with the 0.7.4 release this Thursday.


    Discussion Results

    After debating the API, we've decided to implement the following:

    pub struct RecordKey(RecordKeyInner);
    
    impl RecordKey {
        pub const NULL: Self = Self(RecordKeyInner::Null);
    }
    
    enum RecordKeyInner {
        Null,
        Key(RecordData),
    }
    
    impl<K: Into<Vec<u8>>> From<K> for RecordKey {
        fn from(k: K) -> Self {
            Self(RecordKeyInner::Key(RecordData::from(k)))
        }
    }
    
    /// A type to hold the contents of a record's value.
    pub struct RecordData(bytes::Bytes);
    
    impl<V: Into<Vec<u8>>> From<V> for RecordData {
        fn from(v: V) -> Self {
            let value: Vec<u8> = v.into();
            Self(value.into())
        }
    }
    
    fn send<K, V>(key: K, value: V)
        where
            K: Into<RecordKey>,
            V: Into<RecordData>,
    { ... }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_conversion() {
            send(RecordKey::NULL, vec![2]);
            send(vec![3],vec![4]);
            send("Hello", "world");
            send("Hello".to_string(), "World".to_string());
        }
    }
    
    Client API producer 
    opened by nicholastmosher 31
  • [Merged by Bors] - fix: clap-command updates

    Overview

    1. Adds tests for valid and invalid fluvio cli arguments.
    2. Changes to make flag names clearer and more consistent.
    3. Some bug fixes to make command usage consistent with help.
    4. Simplifies consume status message and adds tests.

    Changes to fluvio consume options for specifying offsets.

      -B, --beginning
              Consume records from the beginning of the log
    
      -H, --head <integer>
              Consume records starting <integer> from the beginning of the log
    
      -T, --tail <integer>
              Consume records starting <integer> from the end of the log
    
          --start <integer>
              The absolute offset of the first record to begin consuming from
    
          --end <integer>
              Consume records until end offset (inclusive)
    

    Command Summary

    Read messages from a topic/partition
    
    Usage: fluvio consume [OPTIONS] <topic>
    
    Arguments:
      <topic>
              Topic name
    
    Options:
      -p, --partition <integer>
              Partition id
              
              [default: 0]
    
      -A, --all-partitions
              Consume records from all partitions
    
      -d, --disable-continuous
              Disable continuous processing of messages
    
          --disable-progressbar
              Disable the progress bar and wait spinner
    
      -k, --key-value
              Print records in "[key] value" format, with "[null]" for no key
    
      -F, --format <FORMAT>
              Provide a template string to print records with a custom format. See --help for details.
              
              Template strings may include the variables {{key}}, {{value}}, {{offset}}, {{partition}} and {{time}} which will have each record's contents substituted in their place. Note that timestamp is displayed using RFC3339, is always UTC and ignores system timezone.
              
              For example, the following template string:
              
              Offset {{offset}} has key {{key}} and value {{value}}
              
              Would produce a printout where records might look like this:
              
              Offset 0 has key A and value Apple
    
          --table-format <TABLE_FORMAT>
              Consume records using the formatting rules defined by TableFormat name
    
      -B, --beginning
              Consume records from the beginning of the log
    
      -H, --head <integer>
              Consume records starting <integer> from the beginning of the log
    
      -T, --tail <integer>
              Consume records starting <integer> from the end of the log
    
          --start <integer>
              The absolute offset of the first record to begin consuming from
    
          --end <integer>
              Consume records until end offset (inclusive)
    
      -b, --maxbytes <integer>
              Maximum number of bytes to be retrieved
    
          --suppress-unknown
              Suppress items that have an unknown output type
    
      -O, --output <type>
              Output
              
              [possible values: dynamic, text, binary, json, raw, table, full-table]
    
          --smartmodule <SMARTMODULE>
              Name of the smart module
    
          --smartmodule-path <SMARTMODULE_PATH>
              Path to the SmartModule
    
          --aggregate-initial <AGGREGATE_INITIAL>
              (Optional) Path to a file to use as an initial accumulator value with --aggregate
    
      -e, --params <PARAMS>
              (Optional) Extra input parameters passed to the SmartModule. They should be passed in key=value format, e.g. fluvio consume topic-name --filter filter.wasm -e foo=bar -e key=value -e one=1
    
          --isolation <ISOLATION>
              Isolation level that consumer must respect. Supported values: read_committed (ReadCommitted) - consume only committed records, read_uncommitted (ReadUncommitted) - consume all records accepted by leader
    
      -h, --help
              Print help information (use `-h` for a summary)
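
The --format templating described in the help text above can be sketched in a few lines. This is a minimal substitution using plain string replacement; the render helper is hypothetical, not the CLI's actual implementation:

```rust
// Minimal sketch of rendering a --format template for one record.
// The {{key}}/{{value}}/{{offset}} variables match the help text above;
// the function itself is hypothetical.
fn render(template: &str, key: &str, value: &str, offset: u64) -> String {
    template
        .replace("{{key}}", key)
        .replace("{{value}}", value)
        .replace("{{offset}}", &offset.to_string())
}

fn main() {
    let template = "Offset {{offset}} has key {{key}} and value {{value}}";
    println!("{}", render(template, "A", "Apple", 0));
    // Offset 0 has key A and value Apple
}
```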
    
    

    Loose ends

    1. Integration tests for the command behavior
    2. Consistency for offset datatype.
    3. --end is inclusive so that fluvio consume --end 1 displays 2 records.
    CLI 
    opened by davidbeesley 28
  • [Merged by Bors] - Add bors configuration

    This PR is intended to be merged after https://github.com/infinyon/fluvio/pull/937

    @sehz if we want to use this, you'll need to add the Bors bot to this repository. You can do that here: https://bors.tech/ at the top you click "Log into dashboard". It will ask to add Bors to your Github account, but you can selectively choose which repositories it has access to, so you can choose infinyon/fluvio. Once that's done, I believe we should be able to use the bors commands in this PR.

    Here's the command documentation for how to use Bors: https://bors.tech/documentation/. The general gist though is just that if you are reviewing a PR, instead of clicking the "Merge/Rebase/Squash" button, you just comment with bors r+ and it will take care of adding it to a queue to be merged, then it will merge it once all CI passes.

    opened by nicholastmosher 26
  • [Merged by Bors] - feat: Improved SmartStream API using proc macro

    Closes #1033.

    • Adds a #[smartstream] attribute that generates glue code to pass data to and from WASM
    • Detects attribute arguments #[smartstream(filter)] and #[smartstream(map)]
    • Adds SimpleRecord that may be passed to user functions, while hiding internal DefaultRecord details
    opened by nicholastmosher 25
  • [Merged by Bors] - feat: support to generate action tokens in other environments

    Introduces the get_action_auth_with_token method for HubAccess to allow tokens in other applications of the ecosystem to generate action tokens similar to how the Hub CLI does.

    opened by EstebanBorai 24
  • [Merged by Bors] - feat: generate SmartModules using SMDK

    Provides support to generate new SmartModules using cargo-generate as a library.

    Resolve: https://github.com/infinyon/fluvio/issues/2621


    Examples

    Generates SmartModule Projects using smdk generate

    ➜  Desktop smdk generate log-filter
    Generating new SmartModule project: log-filter
    🔧   Destination: /Users/esteban/Desktop/log-filter ...
    🔧   Generating template ...
    ✔ 🤷   Which type of SmartModule would you like? · filter
    ✔ 🤷   Want to use SmartModule parameters? · false
    🤷   SmartModule Version [default: git = "https://github.com/infinyon/fluvio.git"]: git = "https://github.com/infinyon/fluvio.git"
    ✔ 🤷   Want to use SmartModule init? · true
    Ignoring: /var/folders/0g/l702s31x2mj4zdlw2_43rpyr0000gn/T/.tmpwQ4Yql/cargo-generate.toml
    [1/5]   Done: Cargo.toml
    [2/5]   Done: README.md
    [3/5]   Done: Smart.toml
    [4/5]   Done: src/lib.rs
    [5/5]   Done: src
    🔧   Moving generated files into: `/Users/esteban/Desktop/log-filter`...
    💡   Initializing a fresh Git repository
    ✨   Done! New project created /Users/esteban/Desktop/log-filter
    

    Generated Project

    total 24
    drwxr-xr-x  7 esteban  staff   224B Oct  2 13:54 .
    drwx------+ 6 esteban  staff   192B Oct  2 13:54 ..
    drwxr-xr-x  9 esteban  staff   288B Oct  2 13:54 .git
    -rw-r--r--  1 esteban  staff   358B Oct  2 13:54 Cargo.toml
    -rw-r--r--  1 esteban  staff   1.8K Oct  2 13:54 README.md
    -rw-r--r--  1 esteban  staff   152B Oct  2 13:54 Smart.toml
    drwxr-xr-x  3 esteban  staff    96B Oct  2 13:54 src
    

    Generated src/lib.rs with the provided options

    use fluvio_smartmodule::dataplane::smartmodule::{SmartModuleExtraParams, SmartModuleInitError};
    
    use once_cell::sync::OnceCell;
    use fluvio_smartmodule::eyre;
    
    
    
    use fluvio_smartmodule::{smartmodule, Result, Record};
    
    #[smartmodule(filter)]
    pub fn filter(record: &Record) -> Result<bool> {
        let string = std::str::from_utf8(record.value.as_ref())?;
        Ok(string.contains('a'))
    }
    
    
    
    
    static CRITERIA: OnceCell<String> = OnceCell::new();
    
    #[smartmodule(init)]
    fn init(params: SmartModuleExtraParams) -> Result<()> {
        // You can refer to the example SmartModules in Fluvio's GitHub Repository
        // https://github.com/infinyon/fluvio/tree/master/smartmodule
        if let Some(key) = params.get("key") {
            CRITERIA.set(key.clone()).map_err(|err| eyre!("failed setting key: {:#?}", err))
        } else {
            Err(SmartModuleInitError::MissingParam("key".to_string()).into())
        }
    }
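
The generated code above stores an init-time parameter in a OnceCell and reads it later. The same pattern can be shown with the standard library's OnceLock alone; this is a sketch independent of the fluvio_smartmodule crate, where init and filter are plain functions rather than SmartModule hooks:

```rust
use std::sync::OnceLock;

// Init-time parameter: set once, read by every later filter call.
static CRITERIA: OnceLock<String> = OnceLock::new();

fn init(param: Option<&str>) -> Result<(), String> {
    match param {
        Some(key) => CRITERIA
            .set(key.to_string())
            .map_err(|v| format!("failed setting key: {v:?}")),
        None => Err("missing param `key`".to_string()),
    }
}

fn filter(value: &str) -> bool {
    // If init never ran, reject everything rather than panic.
    CRITERIA.get().map(|c| value.contains(c.as_str())).unwrap_or(false)
}

fn main() {
    init(Some("a")).unwrap();
    println!("{}", filter("tasty"));   // true  ("tasty" contains "a")
    println!("{}", filter("testing")); // false
}
```

With the parameter set to "a", this mirrors the smdk test runs shown below: "testing" is filtered out, "tasty" passes.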
    

    Building SMDK Project

    ➜  Desktop cd ./log-filter
    ➜  log-filter git:(main) ✗ smdk build
        Updating git repository `https://github.com/infinyon/fluvio.git`
        Updating crates.io index
       Compiling proc-macro2 v1.0.46
    

    Testing SM Project using smdk test

    ➜  log-filter git:(main) ✗ smdk test --params "key=a" --text "testing"
    project name: "log-filter"
    loading module at: target/wasm32-unknown-unknown/release-lto/log_filter.wasm
    0 records outputed
    ➜  log-filter git:(main) ✗ smdk test --params "key=a" --text "tasty"
    project name: "log-filter"
    loading module at: target/wasm32-unknown-unknown/release-lto/log_filter.wasm
    1 records outputed
    tasty
    
    opened by EstebanBorai 21
  • [Merged by Bors] - Introduce channels into CLI

    This adds the concept of separate channels in the CLI, with the ability to change between them.

    3 default channels:

    • stable
    • latest
    • dev

    By default, we use the stable channel, but you can switch to another channel with:

    fluvio version switch <channel>

    To add a version channel, you need to create it first

    For example, to create a channel for 0.9.15 run fluvio version create 0.9.15

    Then you can switch over to it with

    fluvio version switch 0.9.15


    Simplifies starting a cluster to fluvio cluster start for all channels (only if using the fluvio-channel binary)

    Changes fluvio-cli to respect 3 environment variables to control cluster and plugin interactions

    • FLUVIO_RELEASE_CHANNEL
    • FLUVIO_EXTENSIONS_DIR
    • FLUVIO_IMAGE_TAG_STRATEGY

    Manual tests:

    • Copy the fluvio-channel binary to ~/.fluvio/bin/fluvio
    • Switch to stable channel fluvio version switch stable, start a cluster with fluvio cluster start
    • Switch to latest, fluvio version switch latest, start a cluster with fluvio cluster start
    • Verify the image that is used for pods is correct

    Additionally, modifies install.sh to support the installation of fluvio-channel to ~/.fluvio/bin/fluvio, and a target fluvio binary as ~/.fluvio/bin/fluvio-<channel>
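
Under that layout, the launcher's job is essentially to map the active channel to the right binary path. A sketch of the resolution logic (an assumption based on this description; the real fluvio-channel code also handles switching and version creation):

```rust
use std::env;

// Resolve the versioned binary a launcher installed at ~/.fluvio/bin/fluvio
// would exec: ~/.fluvio/bin/fluvio-<channel>. The channel is read from
// FLUVIO_RELEASE_CHANNEL (one of the env vars listed above), falling back
// to "stable", the default channel.
fn channel_binary(home: &str) -> String {
    let channel = env::var("FLUVIO_RELEASE_CHANNEL").unwrap_or_else(|_| "stable".to_string());
    format!("{home}/.fluvio/bin/fluvio-{channel}")
}

fn main() {
    println!("{}", channel_binary("/home/user"));
}
```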

    Resolves #1663

    opened by tjtelan 21
  • [Merged by Bors] - feat: Add `smdk` install support in CLI and release

    Closes #2644

    Adds a name check to fluvio install. If the name starts with fluvio-, it is an extension and is installed in the channel extension dir. Otherwise, it is installed in the fluvio bin dir.
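
The name check described above amounts to a prefix test; a minimal sketch (the helper name is hypothetical):

```rust
// Names starting with "fluvio-" are extensions and go to the channel
// extension dir; everything else (e.g. smdk) goes to the fluvio bin dir.
fn is_extension(name: &str) -> bool {
    name.starts_with("fluvio-")
}

fn main() {
    println!("{}", is_extension("fluvio-cloud")); // true  -> extension dir
    println!("{}", is_extension("smdk"));         // false -> fluvio bin dir
}
```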

    Running fluvio install smdk --develop installs the latest published smdk into my fluvio path.

    --

    Added smdk build and publish steps to CI

    Moving a lot of publish and release workflows into the Makefile so I can test them more easily locally

    Got rid of the cargo make configs since they are out of date and unmaintained

    opened by tjtelan 20
  • [Merged by Bors] - Migrate to `comfy-table`

    I've migrated all components from prettytable to comfy-table.

    Although the APIs of the two libraries are quite different and many features differ, I've done my best to preserve the expected behavior.

    I would appreciate it if you could review this PR and merge it if the tables render as expected.

    enhancement 
    opened by suptejas 19
  • [Merged by Bors] - Add check for unused dependencies

    Resolves #415

    Runs in a separate workflow, on the staging branch (when Bors performs the PR merge to master).

    • Install nightly in CI (to be used by udeps)
    • Add udeps to Makefile
    • Remove unused crates and minor-bump affected crates
    opened by tjtelan 18
  • Tracking: Connector Development Kit Phase 1

    CDK is a new component for Fluvio. It facilitates the development, testing, and deployment of connectors.

    $ cdk initialize
    
    
    $ cdk test
    Testing connector
    
    $ cdk deploy
    

    TODO:

    • [x] Initial Skeleton
    • [x] Build cdk build @sehz
    • [ ] Schema validation @sehz
    • [ ] Initial cdk initialize @tjtelan
    • [ ] Test cdk test @sehz
    • [ ] Docker Deploy cdk deploy @sehz
    feature/connector DX/Developer Experience 
    opened by sehz 0
  • cli: Add `fluvio cluster status`

    Behavior

    Adds fluvio cluster status to the CLI

    Tries to connect to the SC and reports "none" if unsuccessful; if successful, it reports where the cluster is deployed (local/k8s/cloud). If the SC is up, it then reports whether any SPUs are down or whether there are none at all.
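
The decision tree described above can be sketched as follows; the states and the all-SPUs-up threshold are assumptions based on this description, not the PR's actual types:

```rust
// Hypothetical status model for `fluvio cluster status`.
#[derive(Debug, PartialEq)]
enum ClusterStatus {
    None,     // could not connect to the SC
    SpusDown, // SC is up but some (or all) SPUs are not
    Running,  // SC and all SPUs are up
}

fn cluster_status(sc_reachable: bool, spus_up: usize, spus_total: usize) -> ClusterStatus {
    if !sc_reachable {
        ClusterStatus::None
    } else if spus_up < spus_total {
        ClusterStatus::SpusDown
    } else {
        ClusterStatus::Running
    }
}

fn main() {
    println!("{:?}", cluster_status(false, 0, 0)); // None
    println!("{:?}", cluster_status(true, 1, 2));  // SpusDown
    println!("{:?}", cluster_status(true, 2, 2));  // Running
}
```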

    Comment

    I'm not sure if this was still wanted, but it looked like a fun first issue. It's also possible I misinterpreted the original request. Specifically, I'm not sure how we could determine if the status should be "stopped (no sc, spu running but data exists)", as I'm not sure how to determine whether there is data in the SPUs if the connection to the SC fails.

    I would love some feedback and would be happy to make modifications!

    https://github.com/infinyon/fluvio/issues/1673

    CLI cluster 
    opened by crajcan 2
  • Bring back stat features in the CLI

    Prior to 0.10.0, stat features displayed consumer/producer statistics. They depended on the stat feature, which was removed along with other metrics features. We should bring this feature back in the CLI. Some options for metric analysis are:

    • Bring back stat in the CLI or other metric crates.
    • Use OpenTelemetry pipeline
    • Other metric calculation
    CLI Performance 
    opened by sehz 0
A highly efficient daemon for streaming data from Kafka into Delta Lake

kafka-delta-ingest The kafka-delta-ingest project aims to build a highly efficient daemon for streaming data through Apache Kafka into Delta Lake. Thi

Delta Lake 164 Nov 19, 2022
Magical Automatic Deterministic Simulator for distributed systems in Rust.

MadSim Magical Automatic Deterministic Simulator for distributed systems. Deterministic simulation MadSim is a Rust async runtime similar to tokio, bu

MadSys Research Group 242 Dec 1, 2022
Raft distributed consensus for WebAssembly in Rust

WRaft: Raft in WebAssembly What is this? A toy implementation of the Raft Consensus Algorithm for WebAssembly, written in Rust. Basically, it synchron

Emanuel Evans 60 Oct 22, 2022
Paxakos is a pure Rust implementation of a distributed consensus algorithm

Paxakos is a pure Rust implementation of a distributed consensus algorithm based on Leslie Lamport's Paxos. It enables distributed systems to consistently modify shared state across their network, even in the presence of failures.

Pavan Ananth Sharma 2 Jul 5, 2022
A model checker for implementing distributed systems.

A model checker for implementing distributed systems.

Stateright Actor Framework 1.3k Dec 1, 2022
The lightest distributed consensus library. Run your own replicated state machine! ❤️

Little Raft The lightest distributed consensus library. Run your own replicated state machine! ❤️ Installing Simply import the crate. In your Cargo.to

Ilya Andreev 354 Nov 27, 2022
Sorock is an experimental "so rocking" scale-out distributed object storage

Sorock is an experimental "so rocking" scale-out distributed object storage

Akira Hayakawa 6 Jun 13, 2022
A universal, distributed package manager

Cask A universal, distributed package manager. Installation | Usage | How to publish package? | Design | Contributing | Cask.toml If you are tired of:

null 35 Nov 20, 2022
A fully asynchronous, futures-based Kafka client library for Rust based on librdkafka

rust-rdkafka A fully asynchronous, futures-enabled Apache Kafka client library for Rust based on librdkafka. The library rust-rdkafka provides a safe

Federico Giraud 1.1k Dec 2, 2022
Rust client for Apache Kafka

Kafka Rust Client Project Status This project is starting to be maintained by John Ward, the current status is that I am bringing the project up to da

Yousuf Fauzan 871 Nov 22, 2022
Easy-to-use beanstalkd client for Rust (IronMQ compatible)

rust-beanstalkd Easy-to-use beanstalkd client for Rust (IronMQ compatible) Install Add this dependency to your Cargo.toml beanstalkd = "*" Documentati

Johannes Schickling 44 Oct 4, 2022
libhdfs binding and wrapper APIs for Rust

hdfs-rs libhdfs binding library and rust APIs which safely wraps libhdfs binding APIs Current Status Alpha Status (Rust wrapping APIs can be changed)

Hyunsik Choi 31 Apr 4, 2022
The Raft algorithm implement by Rust.

Raft The Raft algorithm implement by Rust. This project refers to Eli Bendersky's website, the link as follows: https://eli.thegreenplace.net/2020/imp

Qiang Zhao 1 Oct 23, 2021
Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such CPU distributed architectures or multi GPU distributed architectures.

Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such CPU distributed architectures or multi GPU distributed architectures.

MichelNowak 0 Mar 29, 2022
Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such CPU distributed architectures or multi GPU distributed architectures.

Damavand is a code that simulates quantum circuits. In order to learn more about damavand, refer to the documentation. Development status Core feature

prevision.io 6 Mar 29, 2022
The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 12 Nov 18, 2022
High performance and distributed KV store w/ REST API. 🦀

About Lucid KV High performance and distributed KV store w/ REST API. Introduction Lucid is a high performance, secure and distributed key-value s

Lucid ᵏᵛ 300 Nov 18, 2022
High performance distributed framework for training deep learning recommendation models based on PyTorch.

PERSIA (Parallel rEcommendation tRaining System with hybrId Acceleration) is developed by AI platform@Kuaishou Technology, collaborating with ETH. It

null 337 Nov 24, 2022
A high-performance, distributed, schema-less, cloud native time-series database

CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 1.7k Nov 27, 2022
A high-performance, high-reliability observability data pipeline.

Quickstart • Docs • Guides • Integrations • Chat • Download What is Vector? Vector is a high-performance, end-to-end (agent & aggregator) observabilit

Timber 11.9k Dec 1, 2022