Fwumious Wabbit, fast on-line machine learning toolkit written in Rust

Overview

Fwumious Wabbit is

  • a very fast machine learning tool
  • built with Rust
  • inspired by and partially compatible with Vowpal Wabbit (much love! read more about compatibility here)
  • currently supports logistic regression and field-aware factorization machines

Fwumious Wabbit is actively used at Outbrain for offline research, as well as for some production flows. It enables "high bandwidth research" when doing feature engineering, feature selection, hyper-parameter tuning, and the like.

Data scientists can train hundreds of models over hundreds of millions of examples in a matter of hours on a single machine.

For our tested scenarios it is almost two orders of magnitude faster than the fastest TensorFlow implementation of Logistic Regression and FFMs that we could come up with. It is an order of magnitude faster than Vowpal Wabbit for some specific use cases.

Check out our benchmark, here's a teaser:

benchmark results

Why is it faster? (see here for more details)

  • Only implements Logistic Regression and Field-aware Factorization Machines
  • Uses the hashing trick, a lookup table for AdaGrad, and a tight encoding format for the "input cache"
  • Features' namespaces have to be declared up-front
  • Prefetching of weights from memory (avoiding pipeline stalls)
  • Written in Rust with heavy use of code specialization (via macros and traits)
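
As a rough illustration of the hashing trick listed above, the sketch below maps a (namespace, feature) pair straight to a slot in a fixed-size weight table, so no feature dictionary is needed. The hash function (`DefaultHasher`), the table size, and the names `feature_index` and `linear_score` are assumptions made for this example; they are not taken from the Fwumious Wabbit source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of bits used to index the weight table (hypothetical value).
const HASH_BITS: u32 = 18;
const TABLE_SIZE: usize = 1 << HASH_BITS;

/// Map a (namespace, feature) pair to an index in the weight table.
/// Collisions are accepted as the price of a fixed memory footprint.
fn feature_index(namespace: &str, feature: &str) -> usize {
    let mut hasher = DefaultHasher::new();
    namespace.hash(&mut hasher);
    feature.hash(&mut hasher);
    // TABLE_SIZE is a power of two, so masking keeps the low HASH_BITS bits.
    (hasher.finish() as usize) & (TABLE_SIZE - 1)
}

/// Linear part of a logistic regression score: the sum of the weights
/// of all active (namespace, feature) pairs.
fn linear_score(weights: &[f32], features: &[(&str, &str)]) -> f32 {
    features
        .iter()
        .map(|(ns, f)| weights[feature_index(ns, f)])
        .sum()
}
```

With a table of `TABLE_SIZE` weights, `linear_score(&weights, &[("A", "user_123"), ("B", "ad_7")])` touches exactly two slots, however many distinct feature values exist in the data.
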
Comments
  • Java interoperation

    Looking forward to the moment when there is some way to test interoperation with Java: to use at least the predict part from Java via JNI/JNA or something similar, the same way we currently use the VW Maven component (https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/java).

    opened by josepowera 4
  • Add an alignment check before calling slice::from_raw_parts

    A Vec<u8>'s buffer is only guaranteed to be 1-aligned, but it's used here in code assuming it's 4-aligned. That might work, depending on what the allocator does, but it should be checked to avoid UB.

    (Alternatively it could over-allocate the buffer and do something like align_to, but I didn't want to make broad changes.)
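
A minimal sketch of the kind of check described here, assuming the bytes are to be viewed as `u32` values. The helper name `bytes_as_u32s` is hypothetical and this is not the code from the PR.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// View a byte buffer as `u32`s, refusing if the buffer is misaligned
/// or its length is not a whole number of `u32`s.
fn bytes_as_u32s(bytes: &[u8]) -> Option<&[u32]> {
    let ptr = bytes.as_ptr();
    let misaligned = (ptr as usize) % align_of::<u32>() != 0;
    let ragged = bytes.len() % size_of::<u32>() != 0;
    if misaligned || ragged {
        return None;
    }
    // SAFETY: alignment and length were checked above, every bit pattern
    // is a valid u32, and the returned slice borrows from `bytes`, so it
    // cannot outlive the underlying buffer.
    Some(unsafe { slice::from_raw_parts(ptr as *const u32, bytes.len() / size_of::<u32>()) })
}
```

Returning `None` turns a potential source of undefined behaviour into an error the caller has to handle. The standard library's `slice::align_to` is the alternative route mentioned above: it splits a slice into an unaligned prefix, an aligned middle, and a suffix.
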

    opened by scottmcm 3
  • Additional tests of classification performance and weight properties

    • parameterized training data generator to simplify running many ad hoc experiments if required
    • added a bash script that trains fw on generated data, stores the outputs, computes the relevant classification metrics (on training data, intentionally), and alerts the user if something is off. The checks cover properties of the prediction files as well as classification quality compared with a random baseline. Current experiments indicate that the difference in balanced accuracy on training data, (sensitivity + specificity) / 2, between fw and random is more than 0.2, so this is the margin currently required for the test to pass.
    • added an action that runs this for each commit, so we know where things went south if that's the case

    The plan is to include this as a git action so that for each new binary we have a few learning-level sanity checks conducted. Currently, we included the main properties we are most interested in with each new version; more can be added should the need arise.
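
The metric and the pass condition described above can be sketched as follows. The PR implements the check as a bash script; this Rust version, including the function names `balanced_accuracy` and `passes_margin`, is only an illustration of the arithmetic.

```rust
/// Balanced accuracy = (sensitivity + specificity) / 2.
/// Returns None if the inputs differ in length or one class is absent.
fn balanced_accuracy(labels: &[bool], predictions: &[bool]) -> Option<f64> {
    if labels.len() != predictions.len() {
        return None;
    }
    let (mut tp, mut tn, mut fp, mut fn_) = (0u64, 0u64, 0u64, 0u64);
    for (&y, &p) in labels.iter().zip(predictions) {
        match (y, p) {
            (true, true) => tp += 1,
            (false, false) => tn += 1,
            (false, true) => fp += 1,
            (true, false) => fn_ += 1,
        }
    }
    if tp + fn_ == 0 || tn + fp == 0 {
        return None;
    }
    let sensitivity = tp as f64 / (tp + fn_) as f64;
    let specificity = tn as f64 / (tn + fp) as f64;
    Some((sensitivity + specificity) / 2.0)
}

/// The sanity check passes when the model beats the random baseline
/// by more than `margin` (0.2 in the description above).
fn passes_margin(model_ba: f64, random_ba: f64, margin: f64) -> bool {
    model_ba - random_ba > margin
}
```

For labels `[true, true, false, false]` and predictions `[true, false, false, false]`, sensitivity is 0.5 and specificity is 1.0, giving a balanced accuracy of 0.75.
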

    opened by SkBlaz 2
  • Use `if let` to deconstruct a single pattern, instead of `match`

    Hi there, I've recently started my Rust journey and was hoping you'd be open to some ornamental changes recommended by clippy.

    This PR replaces match ... Some(...) blocks with the more idiomatic if let.
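
For readers unfamiliar with the pattern, here is a small before/after sketch of the kind of change clippy's `single_match` lint suggests. Both functions are made up for illustration and are not taken from the PR.

```rust
/// Before: a `match` with one meaningful arm and an empty fallback.
fn describe_with_match(value: Option<u32>) -> String {
    let mut out = String::from("value:");
    match value {
        Some(v) => out.push_str(&format!(" {}", v)),
        None => {}
    }
    out
}

/// After: `if let` expresses the same single-pattern case directly.
fn describe_with_if_let(value: Option<u32>) -> String {
    let mut out = String::from("value:");
    if let Some(v) = value {
        out.push_str(&format!(" {}", v));
    }
    out
}
```

The two functions behave identically; the second simply drops the empty `None` arm.
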

    opened by sed-i 2
  • Combine

    • fix a small bug with not accepting "w" as a valid character for a namespace
    • allow for transformations of transformed namespaces
    • implement Combine transformer
    opened by andraztori 2
  • Binning

    Add basic binning support with a few default binners implemented.

    This allows float values to be transformed in arbitrary ways before they are turned into categorical features for use in LR/FFM.
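
To make the idea concrete, the sketch below turns a float into a categorical feature using fixed-width bins. The binner itself, its parameters, and the feature naming scheme are assumptions for illustration; the PR's actual default binners may work differently.

```rust
/// Map a float to a bin index using fixed-width bins (hypothetical binner).
/// Non-positive and NaN inputs go to bin 0; large values saturate at `max_bin`.
fn bin_fixed_width(value: f32, bin_width: f32, max_bin: u32) -> u32 {
    // `!(x > 0.0)` is also true for NaN, so NaN falls into bin 0.
    if !(value > 0.0) || !(bin_width > 0.0) {
        return 0;
    }
    let bin = (value / bin_width).floor();
    if bin >= max_bin as f32 {
        max_bin
    } else {
        bin as u32
    }
}

/// Turn the binned value into a categorical feature string
/// (the naming scheme here is made up).
fn binned_feature(name: &str, value: f32, bin_width: f32, max_bin: u32) -> String {
    format!("{}_bin{}", name, bin_fixed_width(value, bin_width, max_bin))
}
```

Once a value is binned, the resulting string can be hashed like any other categorical feature, so the model learns one weight per bin rather than a single slope for the raw float.
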

    opened by andraztori 2
  • Reduce examples in benchmark, benchmark only fw by default

    1. Run the benchmark on only 1 million train and test examples each instead of 10 million (should be indicative enough and will run more quickly)
    2. Benchmarking scripts run the benchmark only on fw by default, not on both fw and vw
    opened by bbenshalom 1
  • Use `if let` to deconstruct a single pattern, instead of `match`

    (Reposting #61 under a different branch name.)

    Hi there, I've recently started my Rust journey and was hoping you'd be open to some ornamental changes recommended by clippy.

    This PR replaces match ... Some(...) blocks with the more idiomatic if let.

    opened by sed-i 1
  • discarding of temporary pointer as it may become invalid in case the vector is reallocated

    While investigating some FW crashes due to segmentation faults, caused by malloc trying to allocate a block and complaining that the block CRC isn't the same as when it was freed, I ran valgrind (on our reproducible setup offline) and got a hint that something bad is going on in some lines in parser.rs. The only thing I could imagine happening is the "buf" pointer somehow becoming invalid; it seems this can happen if the output_buffer vec grows and is reallocated. I got rid of the pointer, and the valgrind complaints went away, as did the crashes.

    I'm still not sure about the exact scenario, though, because the vector is preallocated generously on startup, so we might want to continue looking into the input as there may be something fishy going on there. WDYT?
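
The general hazard can be avoided by remembering a position rather than a raw pointer, as in the sketch below. The function `write_record` and the record layout are hypothetical; only the name `output_buffer` comes from the description above, and this is not the code from parser.rs.

```rust
/// Reserve one slot for a length header, append the payload, then patch
/// the header. The slot is remembered as an index, which stays valid even
/// if the vector reallocates; a raw pointer taken before the append would not.
fn write_record(output_buffer: &mut Vec<u32>, payload: &[u32]) -> usize {
    let header_index = output_buffer.len();
    output_buffer.push(0); // placeholder for the payload length
    output_buffer.extend_from_slice(payload); // may reallocate the buffer
    output_buffer[header_index] = payload.len() as u32;
    header_index
}
```

Indexing after the append costs a bounds check, but it removes any dependence on the buffer's address staying the same while it grows.
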

    opened by yonatankarni 1
  • Verbose namespaces

    A) Introduce two new parameters --linear (to be used instead of --interaction and --keep) and --ffm_field_verbose (to be used instead of --ffm_field)

    This now allows for passing feature names / namespaces as full namespace names as found in vw_namespace_map.csv.

    It's a first step to unlock more flexible namespace definitions in input files.

    B) Implement multi-byte namespaces in vw_namespace_map.csv and when parsing vw files

    opened by andraztori 1
  • Version

    LMK what you think. I discussed this with @flaunderg. Currently we publish artifacts internally by explicitly triggering a build from Jenkins. The produced artifact is put in Artifactory at fw-/fw-<branch_name>-, and that's when we set a git tag "fw-<branch_name>-version" in the repo.

    We suggest that, only when creating a new tag for the "main" branch, two additional things happen:

    1. the version.rs file will be overwritten, with the auto-incremented version (current is 0.1, so next is 0.2 etc.)
    2. the benchmark will be run and BENCHMARK.md and benchmark_results.png will be committed

    So when someone builds (or if we choose to publish artifacts from "main"), we'll be able to tell the binary version and won't have to rely only on the commit # (which we can also add to the version info, as vw does, btw). This way there is less potential for conflict when we merge branches from which we already published artifacts to try them out: the version will always be taken from main. If main was promoted and you pull, you'll get the updated version.

    opened by yonatankarni 1
  • Field interactions

    Implement the ability to specify per-pair field interaction weights. The feature is turned on with --ffm_interaction_matrix; field weights can then be expressed with --ffm_interaction field_id_1:field_id_2:weight.

    A weight of 0 means the interaction is fully masked out. The default weight is 1.0 everywhere. Field ids are the sequential ids of the fields, in the order given by the field declaration parameters.

    WARNING: since fields are not named, any change to the order of field declarations requires careful adjustment of the interactions too.
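
A sketch of how specs in the field_id_1:field_id_2:weight form could be turned into a weight matrix with a default of 1.0. The function name `build_interaction_matrix`, zero-based field ids, and symmetric treatment of each pair are assumptions made for this example, not details confirmed by the PR.

```rust
/// Build a num_fields x num_fields matrix of interaction weights from
/// specs of the form "field_id_1:field_id_2:weight". Unlisted pairs keep
/// the default weight 1.0; a weight of 0.0 masks the interaction out.
fn build_interaction_matrix(num_fields: usize, specs: &[&str]) -> Result<Vec<Vec<f32>>, String> {
    let mut matrix = vec![vec![1.0f32; num_fields]; num_fields];
    for spec in specs {
        let parts: Vec<&str> = spec.split(':').collect();
        if parts.len() != 3 {
            return Err(format!("expected field_id_1:field_id_2:weight, got '{}'", spec));
        }
        let a = parts[0]
            .parse::<usize>()
            .map_err(|e| format!("bad field id in '{}': {}", spec, e))?;
        let b = parts[1]
            .parse::<usize>()
            .map_err(|e| format!("bad field id in '{}': {}", spec, e))?;
        let weight = parts[2]
            .parse::<f32>()
            .map_err(|e| format!("bad weight in '{}': {}", spec, e))?;
        if a >= num_fields || b >= num_fields {
            return Err(format!("field id out of range in '{}'", spec));
        }
        // Assumption: the interaction of (a, b) is the same as that of (b, a).
        matrix[a][b] = weight;
        matrix[b][a] = weight;
    }
    Ok(matrix)
}
```

Because the ids are positional, reordering the field declarations silently changes which pair each spec refers to, which is exactly the risk the warning above points out.
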

    ping @adischw

    opened by andraztori 0
Owner
Outbrain