Tiny, no-nonsense, self-contained TensorFlow and ONNX inference

Overview

Requires rustc >= 1.42.0. Dual-licensed MIT/Apache 2.

Sonos' Neural Network inference engine.

This project used to be called tfdeploy, or Tensorflow-deploy-rust.

What?

tract is a Neural Network inference toolkit. It can read TensorFlow 1, ONNX, or NNEF models, optimize them, and run data through them.

Quick start
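
Here is a minimal sketch of loading and running an ONNX model with the Rust API. It assumes a hypothetical model.onnx with a single 1x3x224x224 f32 input; adjust the fact to your own network, and note that the exact input type expected by run() has varied slightly across tract versions.

    use tract_onnx::prelude::*;

    fn main() -> TractResult<()> {
        // Load the model, pin the input shape and type, optimize, and make it runnable.
        let model = tract_onnx::onnx()
            .model_for_path("model.onnx")?
            .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)))?
            .into_optimized()?
            .into_runnable()?;

        // Run dummy data through the network and print the first output.
        let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
        let result = model.run(tvec!(input))?;
        println!("{:?}", result[0]);
        Ok(())
    }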

Tract in the landscape

ONNX

As of today (October 2020), tract successfully passes about 85% of the ONNX backend tests. All "real life" integration tests in the ONNX test suite are passing: bvlc_alexnet, densenet121, inception_v1, inception_v2, resnet50, shufflenet, squeezenet, vgg19, zfnet512.

The following operators are implemented and tested.

Abs, Acos, Acosh, Add, And, ArgMax, ArgMin, Asin, Asinh, Atan, Atanh, AveragePool, BatchNormalization, Cast, CategoryMapper, Ceil, Clip, Compress, Concat, Constant, ConstantLike, ConstantOfShape, Conv, ConvInteger, Cos, Cosh, DequantizeLinear, Div, Dropout, Elu, Equal, Erf, Exp, Expand, EyeLike, Flatten, Floor, GRU, Gather, Gemm, GlobalAveragePool, GlobalLpPool, GlobalMaxPool, Greater, GreaterOrEqual, HardSigmoid, Hardmax, Identity, InstanceNormalization, IsInf, IsNaN, LRN, LSTM, LeakyRelu, Less, LessOrEqual, Log, LogSoftmax, MatMul, MatMulInteger, Max, MaxPool, Mean, Min, Mod, Mul, Neg, NonZero, Not, Or, PRelu, Pad, ParametricSoftplus, Pow, QLinearConv, QLinearMatMul, QuantizeLinear, RNN, Reciprocal, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp, ReduceMax, ReduceMean, ReduceMin, ReduceProd, ReduceSum, ReduceSumSquare, Relu, Reshape, Resize, Round, Rsqrt, ScaledTanh, Scan, Selu, Shape, Shrink, Sigmoid, Sign, Sin, Sinh, Size, Slice, Softmax, Softplus, Softsign, Split, Sqrt, Squeeze, Sub, Sum, Tan, Tanh, ThresholdedRelu, Tile, Transpose, Unsqueeze, Where, Xor

We test these operators against ONNX 1.4.1 (operator set 9), ONNX 1.5.0 (operator set 10), ONNX 1.6.0 (operator set 11), and ONNX 1.7.0 (operator set 12). Many networks in operator set 8 are also working.

TensorFlow

Even if tract is very far from supporting arbitrary models, it can run Google Inception v3 and Snips wake word models. Missing operators are relatively easy to add. The lack of an easy-to-reuse test suite and the wide diversity of operators in TensorFlow make full support difficult to target.
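
As a sketch, loading a frozen TensorFlow graph looks much like the ONNX case, using the tract_tensorflow entry point instead. The file name model.pb and the 1x299x299x3 input fact are hypothetical; frozen graphs usually need the input fact spelled out explicitly:

    use tract_tensorflow::prelude::*;

    fn main() -> TractResult<()> {
        let model = tract_tensorflow::tensorflow()
            .model_for_path("model.pb")?
            // Hypothetical placeholder shape; use your own graph's input here.
            .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 299, 299, 3)))?
            .into_optimized()?
            .into_runnable()?;
        let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 299, 299, 3)).into();
        model.run(tvec!(input))?;
        Ok(())
    }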

The following operators are implemented and tested:

Abs, Add, AddN, AddV2, Assign, AvgPool, BatchToSpaceND, BiasAdd, BlockLSTM, Cast, Ceil, ConcatV2, Const, Conv2D, DepthwiseConv2dNative, Div, Enter, Equal, Exit, ExpandDims, FakeQuantWithMinMaxVars, Fill, FloorMod, FusedBatchNorm, GatherNd, GatherV2, Greater, GreaterEqual, Identity, Less, LessEqual, Log, LogicalAnd, LogicalOr, LoopCond, MatMul, Max, MaxPool, Maximum, Mean, Merge, Min, Minimum, Mul, Neg, NoOp, Pack, Pad, Placeholder, Pow, Prod, RandomUniform, RandomUniformInt, Range, RealDiv, Relu, Relu6, Reshape, Rsqrt, Shape, Sigmoid, Slice, Softmax, SpaceToBatchND, Squeeze, StridedSlice, Sub, Sum, Switch, Tanh, Tile, Transpose, VariableV2

TensorFlow-Lite

TensorFlow-Lite is a TensorFlow subproject that also focuses on inference on smaller devices. It uses a precompiler to transform a TensorFlow network to its own format. It only supports a subset of operators from TensorFlow though, and is only optimised for devices with Arm Neon support.

Tract supports a wider subset of TensorFlow operators, and has been optimised for CPUs of the previous generation (Arm VFP), also targeting devices in the Raspberry Pi Zero family that TensorFlow Lite does not address.

NNEF

Long story short, the TensorFlow and ONNX formats are good for designing and training networks. They need to move fast to follow the research field, and tend to integrate new features and operators greedily. They also exhibit a high level of expressivity to facilitate network design.

On the other hand, only a subset of operators and network features actually reach production, so systems running production networks do not have to deal with as many operators. Furthermore, some information required for training can be stripped from the network before it goes to production for prediction.

NNEF tries to bridge the gap between training frameworks and inference engines by proposing a format dedicated to production and prediction.

Tract supports NNEF:

  • tract_nnef can load and execute NNEF networks (see the sketch after this list)
  • tract supports most of the NNEF specification, the most notable exceptions being the ROI operators and deconvolution
  • tract introduces tract-OPL, a series of NNEF extensions to support other operators (or extend some operators' semantics) in order to represent the full range of tract-core neural network support: any network understood by tract should be serializable to tract-OPL. This is a work in progress.
  • the tract command line can translate networks from TensorFlow or ONNX to NNEF/OPL.
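
As a sketch, loading an NNEF archive from Rust could look like the following. The archive name net.nnef.tgz and the 1x3x224x224 input are hypothetical, and with_tract_core() is assumed here to be the programmatic counterpart of the CLI --nnef-tract-core flag enabling the tract-OPL core extensions:

    use tract_nnef::prelude::*;

    fn main() -> TractResult<()> {
        let model = tract_nnef::nnef()
            .with_tract_core()
            .model_for_path("net.nnef.tgz")?
            .into_optimized()?
            .into_runnable()?;
        let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
        model.run(tvec!(input))?;
        Ok(())
    }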

Example of supported networks

These models, among others, are used to track tract's performance evolution as part of the Continuous Integration jobs. See .travis/README.md and .travis/bundle-entrypoint.sh for more information.

Keyword spotting on Arm Cortex-M Microcontrollers

https://github.com/ARM-software/ML-KWS-for-MCU

ARM demonstrated the capabilities of the Cortex-M family by providing tutorials and pre-trained models for keyword spotting. While the exercise is ultimately meant for microcontrollers, tract can run the intermediate TensorFlow models.

For instance, on a Raspberry Pi Zero, the "CNN M" model runs in about 70 micro-seconds, and in 11 micro-seconds on a Raspberry Pi 3.

Snips wake word models

https://arxiv.org/abs/1811.07684

Snips uses tract to run its wake word detectors. While earlier models were class-based and did not require any special treatment, tract's pulsing capabilities made it possible to run WaveNet models efficiently enough for a Raspberry Pi Zero.

Inception v3

Device              Family         TensorFlow-Lite   tract
Raspberry Pi Zero   Armv6 VFP      113s              39s
Raspberry Pi 2      Armv7 NEON     25s               7s
Raspberry Pi 3      aarch32 NEON   5s                5s

Notes:

  • while the Raspberry Pi 3 is an Armv8 device, this bench runs on Raspbian, an Armv6 operating system, crippling the performance of both contenders
  • there exist other benchmarks on the internet that show better results for TensorFlow (not -Lite) on the Pi 3, but they use all four cores of the device; both TensorFlow-Lite and tract here have been restricted to a single core

Roadmap

One important guiding cross-cutting concern: this library must cross-compile as easily as practical to small-ish devices (think $20 boards).

License

Note: files in the tensorflow/protos directory are copied from the TensorFlow project and are not covered by the following license statement.

Note: files in the onnx/protos directory are copied from the ONNX project and are not covered by the following license statement.

Apache 2.0/MIT

All original work licensed under either of:

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • Integer-sizing a decluttered streaming TypedModel without Pulse (for non causal models)

    Hey, I came across another problem trying the bidirectional LSTM model in a browser. It is the same LSTM that is now in CI (download link). Now normally I'd use code similar to this:

    use tract_onnx::prelude::*;
    
    fn main() -> TractResult<()> {
        let model = tract_onnx::onnx()
            .model_for_path("model.onnx")?
            .into_optimized()?
            .into_runnable()?;
    
        let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 100)).into();
        model.run(tvec!(input))?;
    
        Ok(())
    }
    

    but I get an error:

    ➜  ~/Documents/Experiments/sblstmtest git:(master) ✗ cargo run
       Compiling sblstmtest v0.1.0 (/Users/bminixhofer/Documents/Experiments/sblstmtest)
        Finished dev [unoptimized + debuginfo] target(s) in 4.51s
         Running `target/debug/sblstmtest`
    Error: TractError(Msg("Translating node #1 \"input\" Source ToTypedTranslator"), State { next_error: Some(TractError(Msg("Output type not determined"), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })), backtrace: InternalBacktrace { backtrace: None } })
    

    Running it without into_optimized, or with an input fact works. So I understand that the model can not be optimized because the shape of the input (batch size and seq len) is not known at the time of building. Is that correct? In practice I don't want to fix the input shape at build time because it has to work with different batch sizes.
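
    For reference, a minimal sketch of the "with an input fact" variant mentioned above could look like this (the 1x100 u8 fact simply mirrors the dummy input in the snippet; a real model would use its own shape):

    use tract_onnx::prelude::*;
    
    fn main() -> TractResult<()> {
        let model = tract_onnx::onnx()
            .model_for_path("model.onnx")?
            // Pinning the input fact gives the optimizer concrete shapes and types.
            .with_input_fact(0, InferenceFact::dt_shape(u8::datum_type(), tvec!(1, 100)))?
            .into_optimized()?
            .into_runnable()?;
    
        let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 100)).into();
        model.run(tvec!(input))?;
    
        Ok(())
    }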

    Now so far it wouldn't be a problem, I'd just add an option optimize to the JS API to turn optimization on or off depending on whether dynamic shapes are needed during inference.

    The problem comes when I try to store the model that I got by calling into_runnable without calling into_optimized before.

    I get a model of type SimplePlan<InferenceFact, Box<dyn InferenceOp>, ModelImpl<InferenceFact, Box<dyn InferenceOp>>>. When I want to store such a model in a struct like:

    use tract_onnx::prelude::*;
    
    struct Model {
        inner: SimplePlan<InferenceFact, Box<dyn tract_hir::infer::ops::InferenceOp>, InferenceModel>,
    }
    

    I get an error which says that the module ops is private:

    ➜  ~/Documents/Experiments/sblstmtest git:(master) ✗ cargo run
       Compiling sblstmtest v0.1.0 (/Users/bminixhofer/Documents/Experiments/sblstmtest)
    error[E0603]: module `ops` is private
      --> src/main.rs:4:64
       |
    4  |     inner: SimplePlan<InferenceFact, Box<dyn tract_hir::infer::ops::InferenceOp>, InferenceModel>,
       |                                                                ^^^ private module
       |
    note: the module `ops` is defined here
      --> /Users/bminixhofer/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-hir-0.7.0/src/infer/mod.rs:12:1
       |
    12 | mod ops;
       | ^^^^^^^^
    
    error: aborting due to previous error
    
    For more information about this error, try `rustc --explain E0603`.
    error: could not compile `sblstmtest`.
    
    To learn more, run the command again with --verbose.
    

    So I can't store the result. Am I missing something? And if not, is there some way to work around this?

    Thanks for all your help :)

    opened by bminixhofer 36
  • Input fact propagation wonky for NNEF

    Maybe I was a bit too quick to close #718, as it seems there are still some issues when running, depending on the exact flags I pass.

    I'll post these in one go as I think they're related; but we'll see.

    As a base; I'm using the image.nnef.tar we can now generate. I'm using the following base command line:

    tract image.nnef.tar --nnef-tract-core --nnef-tract-onnx -i input:1,3,224,224,f32 --allow-random-input 
    

    This always works with dump (except when passing --profile), but run fails with the following error:

    [2022-06-14T13:55:45.961766439Z WARN  tract::tensor] Using random input for input called "input": 1,3,224,224,F32
    [2022-06-14T13:55:45.969444249Z ERROR tract] Evaluating #1 "ConstantOfShape_25" MultiBroadcastTo
    
        Caused by:
            Undetermined symbol in expression: N
    

    Adding --set N=1 to the run fixes this. I'd have expected something like --override-fact input:1,3,224,224,f32 to also work, as a more aggressive -i input:1,3,224,224,f32.
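
    For reference, the combination that works here would presumably look like this (a sketch assuming the flags compose in the obvious order):

    tract image.nnef.tar --nnef-tract-core --nnef-tract-onnx -i input:1,3,224,224,f32 --allow-random-input --set N=1 run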

    If I attempt to optimize the graph, it fails with the following wonky error, where it seems unable to unify two compatible shapes:

    [2022-06-14T13:42:11.116655254Z ERROR tract] Error at stage optimize
    
        Caused by:
            0: codegen node #4 "Conv_0" ConvUnary
            1: Trying to substitute a N,768,7,7,F32 by 1,768,7,7,F32.
               ModelPatch { context: ["wire_as_lazy_im2col"], dont_apply_twice: None, model: Graph { nodes: [Node { id: 0, name: "incoming-3/0", inputs: [], op: TypedSource { fact: 1,3,224,224,F32 }, outputs: [1,3,224,224,F32 >1/0] }, Node { id: 1, name: "Conv_0.matmatmul", inputs: [0/0>], op: LirMatMulUnary { c_fact: 1,768,49,F32, c_m_axis: 1, c_n_axis: 2, micro_ops: [(2359296,F32 0.0050811768, 0.002538681, -0.0051002502, -0.0015630722, 0.0034770966, 0.0017652512, -0.0231781, -0.0051574707, -0.013504028, 0.002796173, 0.00044894218, -0.0076141357..., [Store])], shape=[1], strides=[1], layout=CFcf (0xf), dynamic ndim=1, c_final_shape: 1,768,49, geometry: Concrete(ConcreteMatMulGeometry { m: 768, k: 3072, n: 49, b_storage: VirtualPacking { packer: Packer { r: 5, alignment: 4, end_padding_record: 0 }, func: LazyIm2colSpec { n_bytes_offsets: [0, ...], k_bytes_offsets: [0, 4, ...] }, k: 3072 } }), mmm: MMM (fma_mmm_f32_16x5 16x5), reshape_post: [] }, outputs: [1,768,49,F32 >2/0] }, Node { id: 2, name: "Conv_0", inputs: [1/0>], op: Reshape(2, [Val(49)], [Val(7), Val(7)]), outputs: [1,768,7,7,F32 ] }], inputs: [0/0>], outputs: [], outlet_labels: {}, properties: {} }, inputs: {}, incoming: {0/0>: 3/0>}, shunt_outlet_by: {}, obliterate: [] }`
    

    Looks like it somehow fails to propagate the input facts?

    opened by tgolsson 32
  • Add support for GPU inference

    Addresses #688.

    Tasks:

    • [x] GPUTensor
      • [x] Import
        • [x] Proper type for imported tensor
      • [x] Export
      • [x] Intermediate data in GPU memory
      • [x] Pass tensor strides as uniforms
      • [x] Have way of processing rank 4 tensors
    • [ ] Ops
      • [x] Validate tensor props before applying ops, at least in debug builds
      • [ ] Convolution
      • [ ] Activations
        • [x] tanh
        • [x] sigmoid
        • [ ] relu
      • [ ] Fully-connected
      • [ ] Pooling
      • [ ] Softmax
    • [ ] Runner for models
      • [ ] Managing GPU memory
        • [ ] Free buffers no longer in use to allow for models larger than GPU memory
    • [ ] Examples working
      • [ ] tensorflow-mobilenet-v2
    • [ ] Test various platforms
      • [ ] Linux
        • [x] Vulkan + RADV
        • [ ] Other GPUs
        • [ ] Various embedded systems like RPi
      • [ ] Windows
      • [ ] macOS and iOS
      • [ ] Android
      • [ ] WASM
        • [ ] WebGPU
        • [ ] WebGL
    opened by sh7dm 27
  • Some tensorflow extensions for keras layers support

    I'm trying to load a model into Rust and I'm getting an error when I run the model.

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: 
    TractError(Msg("Translating #30 \"global_average_pooling1d/Mean\" Unimplemented(Mean)"), 
    State { next_error: Some(TractError(Msg("Operator can not be made a TypedOp."), 
    State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })), 
    backtrace: InternalBacktrace { backtrace: None } })
    

    It seems that the mean operation of global_average_pooling1d is not supported; does anyone know any more about this?

    opened by CharlieBickerton 25
  • Unimplemented Unimplemented(RandomNormalLike) ToTypedTranslator

    I'm trying to make a Soft Actor-Critic model from stable_baselines3 work with ONNX and get the following error.

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Translating node #39 "RandomNormalLike_25" Unimplemented(RandomNormalLike) ToTypedTranslator
    
    Caused by:
        0: translating op UnimplementedOp { outputs: 1, name: "RandomNormalLike", message: "NodeProto { input: [\"onnx::RandomNormalLike_102\"], output: [\"onnx::Mul_103\"], name: \"RandomNormalLike_25\", op_type: \"RandomNormalLike\", domain: \"\", attribute: [], doc_string: \"\" }" }
    

    Do you plan to support RandomNormalLike op?

    opened by stillonearth 23
  • BERT support

    Not too sure what specific operators the BERT architecture will require but:

    • https://github.com/onnx/models/tree/master/text/machine_comprehension/bert-squad uses OneHot, which is not implemented but easy to add (as an ONNX primitive)
    • as reported here, https://github.com/snipsco/tract/issues/313#issuecomment-661937254 we may encounter ConstantOfShape with dynamic inputs
    opened by kali 22
  • Tree ensemble ONNX ML ops [WIP]

    @kali Opening this draft PR so as to collect some potential early feedback.

    The core generic tree ensemble engine seems to be working (at least for what I've managed to test so far; I was actually super surprised that the tests against lightgbm passed on the first try after I got it to compile, lol). Now I need to pin it to the protobuf config, set up the rules, run a few basic benchmarks, wrap it in classifier/regressor types, etc. All features that can be provided in the ONNX protobuf config (including score post-transforms except probit, the various comparison ops, input NaN handling, etc.) are already supported here.

    It's definitely not implemented in the most efficient way yet, but I think it shouldn't be too bad (although until there are benchmarks, I have no idea how much slower it would be than the existing lightgbm/xgboost C++ engines).

    opened by aldanor 22
  • WebAssembly support

    Hi!

    At the moment, there is no easy way to reliably run ONNX models in the browser. ONNX.js exists but is apparently unmaintained and lacks support for important operations like Conv1d and LSTM.

    The alternative is TensorFlow.js, which does not directly support ONNX, so a model would have to be converted from ONNX to TF, then to a TFJS model, which also does not work at the moment (see https://github.com/onnx/onnx-tensorflow/issues/490).

    So there is a bit of a gap in the ecosystem there.

    That gap could be filled by compiling tract to WASM, and exposing a higher-level API (i.e. load and predict functions) to JavaScript. WebGL support would of course be missing but that is out of scope.

    I did some prototyping today, and got the latest release (tract-onnx = "0.6.3") to work in the browser without any changes. So I think a JS wrapper would not be too hard to make.

    I'll start working on this in the next couple of days. Depending on how it goes and if there is interest on your side, this could be merged back into tract at a later point.

    It would be great if you could officially support compiling to WASM, and add WASM to the CI (e.g. the current master branch does not compile to WASM because of the memory maps from https://github.com/snipsco/tract/commit/99c622ad8279e676fde4485ab9b4db0e537418e4).

    Thanks, and please let me know what you think!

    opened by bminixhofer 21
  • Support TreeEnsembleClassifier op

    (I'm aware that it overlaps somewhat with #56, but it's a bit more specific, hence opening it as a separate issue)

    Given that it's now officially possible to convert LightGBM (and xgboost) tree ensemble classifiers into ONNX, how realistic would it be to expect tract to support the TreeEnsembleClassifier op in the foreseeable future? This would potentially be a huge feature, instantly unlocking a whole universe of tree ensemble classifiers (and potentially regressors as well).

    // I'd be glad to help if there was some guidance on what to do and where, if needed; not quite sure how much work it is to implement this since I'm not very familiar with the internals of tract.

    Thanks!

    opened by aldanor 21
  • Internal multithreading

    Hi!

    From #326:

    tract does not make any effort to run a computation using multiple cores, but is safe to use in multiple threads. So you may get better results by calling run() on several inputs (or several copies) from different threads (using a parallel iterator may do the trick).

    Are there any plans to support internal multithreading? Tract is already very fast. With internal multithreading it could possibly be faster than onnxruntime and ONNX.js*.

    *That is, if we can exploit multithreading in the browser, but there is already a working wasm-bindgen example with rayon so I'm confident we would get there.
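
    As a concrete illustration of the quoted advice about external parallelism, here is a minimal sketch using rayon to run one shared plan over several independent inputs (model.onnx, the input shapes, and the rayon dependency are all assumptions for the example):

    use rayon::prelude::*;
    use tract_onnx::prelude::*;
    
    fn main() -> TractResult<()> {
        let model = tract_onnx::onnx()
            .model_for_path("model.onnx")?
            .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 100)))?
            .into_optimized()?
            .into_runnable()?;
    
        // Hypothetical batch of independent inputs.
        let batches: Vec<Tensor> = (0..8)
            .map(|_| tract_ndarray::Array2::<f32>::zeros((1, 100)).into())
            .collect();
    
        // The plan is shared read-only across rayon worker threads; each run() call
        // builds its own execution state, so no extra synchronization is needed.
        let results: Vec<TractResult<_>> = batches
            .into_par_iter()
            .map(|input| model.run(tvec!(input)))
            .collect();
    
        for r in results {
            r?;
        }
        Ok(())
    }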

    I'm not very familiar with parallelized implementations of neural nets but I think there are three major points where parallelization is possible:

    1. Slicing the input into chunks that get computed on different cores, e.g. with batch sizes > 1.
    2. Computing different operators on different cores; each operator could start its computation once all its inputs are computed.
    3. Internal parallelization of an operation, e.g. different convolution filters on different cores.

    Feel free to close this issue if this does not align with your Roadmap for tract.

    opened by bminixhofer 20
  • MobileNet ops not supported

    Hi

    I wanted to run the pretrained frozen .pb models from mobilenetv1 and mobilenetv2 with

    let tfd = ::tract_tensorflow::tensorflow().model_for_path(mobilenetv1_frozen).unwrap();
    let plan = ::tract::SimplePlan::new(&tfd).unwrap();
    let input = load_image(img);
    let outputs = plan.run(tvec![input]).unwrap();
    

    But for MobilenetV1 I get

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TractError(Msg("Evaluating #13 \"MobilenetV1/MobilenetV1/Conv2d_0/Relu6\" Unimplemented(Relu6): unimplemented operation: Relu6"), State { next_error: None, backtrace: InternalBacktrace })', src/libcore/result.rs:997:5
    

    and for MobilenetV2

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TractError(Msg("Node named MobilenetV2/Conv/BatchNorm/FusedBatchNorm not found"), State { next_error: None, backtrace: InternalBacktrace })', src/libcore/result.rs:997:5
    

    Any plan to support Relu6 or FusedBatchNorm? Would you be willing to point me to where I can add those?

    opened by ehsanmok 20
  • DepthWise conv Inner loop f16 support

    https://github.com/Rikorose/DeepFilterNet/pull/211#issuecomment-1353637586

    Digging a bit into why I was seeing so many f32/f16 conversions despite the A55 supporting fp16 storage and arithmetic, it seems like this is just a limitation of Rust's f16 support.

    To fully take advantage of FP16, I think avoiding these conversions is necessary… though, I’m not sure what the best solution is…

    Maybe just rewriting the inner loop in assembly for f16 when the CPU says it supports f16?

    Overriding the operators in the half crate might work too.

    opened by VariantXYZ 22
  • Instructions on training new cost_models

    It would be great to take advantage of the cost_model setup for arbitrary Arm CPUs (like the A57) and generate better cost models for them, but I'm not really sure about the procedure.

    Digging in a little, it looks like the cost_model binary gets run on the platform and then that data gets processed by the train script, which generates the file. Seems straightforward, but there are a lot of parameters that I'm not really sure about…

    opened by VariantXYZ 18
  • Supporting ARM64 CPUs on systems without /proc/cpuinfo

    Related to #847

    It is possible to override the detector with environment variables... It's not documented; I thought it only made sense in qemu test contexts where detection is confused. See https://github.com/sonos/tract/blob/main/linalg/src/arm64.rs#L61/

    (I just happened to hit this issue since /proc/cpuinfo didn't exist)

    Which part of what getrandom does do you refer to? The ability to register a generator at runtime?

    Right, the ability to provide a fallback implementation for unsupported targets.

    https://docs.rs/getrandom/latest/getrandom/macro.register_custom_getrandom.html

    I'm not a huge fan of it though, as it involves modifying the crate to add the fallback implementation for a platform.

    The environment variable setup would work, but it would be more ideal to have this available during compile-time.

    I was thinking about using features, but ARM is a bit strange such that in-order/out-of-order execution is actually implied by the CPU name, which is a bit of a pain. Things like fp16, neon, etc... can be handled via this though (with the caveat that target_feature can only be applied to unsafe functions). Maybe the CPU ID could be provided via a regular feature?

    In a situation where a user is building on a platform that isn't windows/linux, I think it's not unreasonable to expect them to provide a .cargo/config.toml or something similar to define their setup better.

    opened by VariantXYZ 8
  • Unnecessary copy of inputs

    Hello,

    In the run method of SimplePlan, the inputs parameter is a TVec<Tensor>. Since the tensors eventually get converted into an Arc<Tensor>, each input tensor gets copied once.

    Of course this doesn't matter most of the time because the input is small enough, but we have a use case in which the inputs would be huge and reused between different runs.

    Therefore I was wondering if it would be possible to allow the user to pass a TVec<Arc<Tensor>> to the run method, hence avoiding the later conversion. In case you don't want to change the API, there is also the option of adding a new run_with_arcs method that would have the same effect.

    If you agree with one of the two possibilities, I can do the corresponding PR shortly as there is almost nothing to do.

    wdyt?

    opened by mbrunel 1