A Voice Activity Detector Rust library using the Silero VAD model.

Overview

Voice Activity Detector

Provides a model and extensions for detecting speech in audio.

Standalone Voice Activity Detector

This crate provides a standalone Voice Activity Detector (VAD) which can be used to predict speech in a chunk of audio. This implementation uses the Silero VAD.

The VAD predicts speech in a chunk of Linear Pulse Code Modulation (LPCM) encoded audio samples. These may be 8- or 16-bit integers or 32-bit floats.

The model is trained using chunk sizes of 256, 512, and 768 samples for an 8000 Hz sample rate, and chunk sizes of 512, 768, and 1024 samples for a 16,000 Hz sample rate. These values are recommended for optimal performance but are not required. The only requirement imposed by the underlying model is that the sample rate must be no larger than 31.25 times the chunk size.
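
The constraint above can be checked before building a detector. The following is a minimal sketch in plain Rust; `is_valid_config` and `min_chunk_size` are hypothetical helper names used for illustration and are not part of this crate.

```rust
/// Ratio taken from the requirement above: sample_rate <= 31.25 * chunk_size.
const MAX_RATE_PER_CHUNK_SAMPLE: f64 = 31.25;

/// Hypothetical helper: returns true when the pair satisfies the stated constraint.
fn is_valid_config(sample_rate: u32, chunk_size: usize) -> bool {
    f64::from(sample_rate) <= MAX_RATE_PER_CHUNK_SAMPLE * chunk_size as f64
}

/// Hypothetical helper: smallest chunk size that satisfies the constraint
/// for the given sample rate.
fn min_chunk_size(sample_rate: u32) -> usize {
    (f64::from(sample_rate) / MAX_RATE_PER_CHUNK_SAMPLE).ceil() as usize
}
```

Under this sketch, an 8000 Hz stream needs at least 256 samples per chunk and a 16,000 Hz stream at least 512, which matches the smallest trained chunk sizes listed above.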

The samples passed to predict will be truncated or padded if they are not of the correct length.
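
To make the truncate-or-pad behavior concrete, here is a rough sketch of what such a step could look like. It assumes that extra samples are dropped from the end and that short input is padded with zeros; the text above does not specify either detail, and `fit_to_chunk` is a hypothetical name, not the crate's implementation.

```rust
/// Hypothetical helper, not part of the crate.
/// Assumption: truncation drops trailing samples and padding appends zeros.
fn fit_to_chunk(samples: &[f32], chunk_size: usize) -> Vec<f32> {
    // Keep at most `chunk_size` samples from the front.
    let mut chunk: Vec<f32> = samples.iter().copied().take(chunk_size).collect();
    // If the input was too short, extend it with silence (0.0).
    chunk.resize(chunk_size, 0.0);
    chunk
}
```

Passing chunks that already match the configured chunk size avoids relying on either behavior.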

fn main() -> Result<(), voice_activity_detector::Error> {
    use voice_activity_detector::{VoiceActivityDetector};

    let chunk = vec![0i16; 512];
    let mut vad = VoiceActivityDetector::builder()
        .sample_rate(8000)
        .chunk_size(512usize)
        .build()?;
    let probability = vad.predict(chunk);
    println!("probability: {}", probability);

    Ok(())
}

Extensions

Some extensions have been added for dealing with streams of audio. These extensions have variants to work with both Iterators and Async Iterators (Streams) of audio samples. The Stream utilities are enabled as part of the async feature.

Predict Iterator/Stream

The PredictIterator and PredictStream work on an iterator/stream of samples, and return an iterator/stream containing a tuple of a chunk of audio and its probability of speech. Be sure to use the IteratorExt and StreamExt traits to bring the predict function on iterators into scope.

fn main() -> Result<(), voice_activity_detector::Error> {
    use voice_activity_detector::{IteratorExt, VoiceActivityDetector};

    let samples = [0i16; 512000];
    let vad = VoiceActivityDetector::builder()
        .sample_rate(8000)
        .chunk_size(512usize)
        .build()?;

    let probabilities = samples.into_iter().predict(vad);
    for (_chunk, probability) in probabilities {
        if probability > 0.5 {
            println!("speech detected!");
        }
    }
    Ok(())
}

Label Iterator/Stream

The LabelIterator and LabelStream also work on an iterator/stream of samples. Rather than returning just the probability of speech for each chunk, these return labels of speech or non-speech. This helper allows adding additional padding to speech chunks to prevent sudden cutoffs of speech.

  • threshold: Value between 0.0 and 1.0. Probabilities greater than or equal to this value will be considered speech.
  • padding_chunks: Adds additional chunks to the start and end of speech chunks.

fn main() -> Result<(), voice_activity_detector::Error> {
    use voice_activity_detector::{LabeledAudio, IteratorExt, VoiceActivityDetector};

    let samples = [0i16; 51200];
    let vad = VoiceActivityDetector::builder()
        .sample_rate(8000)
        .chunk_size(512usize)
        .build()?;

    // This will label any audio chunks with a probability of at least 75% as speech,
    // and label the 3 additional chunks before and after these chunks as speech.
    let labels = samples.into_iter().label(vad, 0.75, 3);
    for label in labels {
        match label {
            LabeledAudio::Speech(_) => println!("speech detected!"),
            LabeledAudio::NonSpeech(_) => println!("non-speech detected!"),
        }
    }
    Ok(())
}
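
The effect of threshold and padding_chunks can be illustrated on a plain list of per-chunk probabilities. This is a simplified sketch of the idea only; `label_with_padding` is a hypothetical function, and the crate's LabelIterator may be implemented differently (for example, it works on streaming input rather than a full slice).

```rust
/// Hypothetical sketch: marks each chunk as speech (true) or non-speech (false).
/// A chunk counts as speech when its probability is >= `threshold`, and the
/// `padding_chunks` neighbors on each side of a speech chunk are marked as well.
fn label_with_padding(probabilities: &[f32], threshold: f32, padding_chunks: usize) -> Vec<bool> {
    let mut labels = vec![false; probabilities.len()];
    for (i, &p) in probabilities.iter().enumerate() {
        if p >= threshold {
            // Clamp the padded window to the bounds of the input.
            let start = i.saturating_sub(padding_chunks);
            let end = (i + padding_chunks).min(probabilities.len() - 1);
            for label in &mut labels[start..=end] {
                *label = true;
            }
        }
    }
    labels
}
```

With probabilities `[0.1, 0.1, 0.9, 0.1, 0.1, 0.1]`, a threshold of 0.75, and one padding chunk, the chunks at indices 1 through 3 are labeled as speech.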

More Examples

Please see the tests directory for more examples.

Limitations

The voice activity detector and helper functions work only on mono-channel audio streams. If your use case involves multiple channels, you will need to split the channels and potentially interleave them again depending on your needs.
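
Splitting interleaved multi-channel audio into separate mono channels could look like the sketch below. It assumes the common interleaved layout (one sample per channel in each frame, e.g. L R L R for stereo); `split_channels` is a hypothetical helper and not part of this crate.

```rust
/// Hypothetical helper: de-interleaves samples into one Vec per channel.
/// Assumption: input is frame-interleaved; a trailing incomplete frame is dropped.
fn split_channels(interleaved: &[i16], channels: usize) -> Vec<Vec<i16>> {
    assert!(channels > 0, "channel count must be non-zero");
    let mut out = vec![Vec::new(); channels];
    // Each frame holds exactly one sample for every channel.
    for frame in interleaved.chunks_exact(channels) {
        for (channel, &sample) in out.iter_mut().zip(frame) {
            channel.push(sample);
        }
    }
    out
}
```

Each resulting channel can then be passed to its own detector instance.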

We have not yet verified functionality on all platforms. Here is what we have tested:

Windows | macOS | Linux
🟢 | 🟢 | 🟢

🟢 = Available

🔵 = Currently in the works

🟡 = Currently not tested

🔴 = Not working currently (possible in the future)

Comments
  • Making the chunk size not constant for real-time audio purposes

    Hello, sorry for giving you more work again and creating a new issue since I already made a PR today, but there is a feature missing in your crate that I would like to add by PR, it would be a pretty huge breaking change probably because it would require some huge changes in the code base.

    I would like to suggest making the chunk_size a value in the VoiceActivityDetector struct instead of a generic value since it is otherwise impossible to write code like this:

    let sample_rate: u32 = default_config.sample_rate().0;
    let buffer_size = sample_rate / 16;
    let _work_channels = 1; // Stereo doesn't work at the moment (will fix in the future or never)
    let mic_channels = default_config.channels();
    let config: StreamConfig = StreamConfig {
        channels: mic_channels,
        sample_rate: cpal::SampleRate(sample_rate),
        buffer_size: cpal::BufferSize::Fixed(buffer_size),
    };
    
    let mut vad = VoiceActivityDetector::<buffer_size>::try_with_sample_rate(
        default_config.sample_rate().0,
    )
    .expect("how dare you");
    

    The above code will error because buffer_size isn't constant, and there is no way to make it constant since it has to be computed every time this code runs. Since this code always gets a different sample rate, I would like to be able to compute the chunk_size based on the sample rate. If I just set a constant, it will error with different sample rates as well, just like when going with something as low as 256 as a constant for the chunk_size the code errors when passing in a sample_rate above 44100 (that's what I tested with). It would be cool to be able to compute this on the fly using something like I have shown in the example here. The only way to currently work around this is to have different recorders for different sample rates, meaning you write the same code like 5 times just to get this value to change. If there is another way that fixes this issue, please let me know, I'm kind of new to Rust, so I'm sorry if there's something basic I'm overlooking here.

    Thanks for your work and I hope we can get this resolved, Julian.

    opened by Unbreathable 6
  • Crate doesn't build (want to pull request)

    Hello, thanks for this amazing project!

    I've just gotten started on developing a voice chat update for my app and thought this crate was a perfect fit. When importing it from Cargo, I got some errors though.

    error[E0599]: no method named `with_model_from_memory` found for struct `SessionBuilder` in the current scope
      --> voice_activity_detector\src\vad.rs:36:14
       |
    28 |           let session = Session::builder()
       |  _______________________-
    29 | |             .unwrap()
    30 | |             .with_optimization_level(GraphOptimizationLevel::Level3)
    31 | |             .unwrap()
    ...  |
    35 | |             .unwrap()
    36 | |             .with_model_from_memory(MODEL)
       | |             -^^^^^^^^^^^^^^^^^^^^^^ help: there is a method with a similar name: `commit_from_memory`
       | |_____________|
       | 
    
    error[E0599]: no method named `extract_tensor` found for reference `&Value` in the current scope
      --> voice_activity_detector\src\vad.rs:87:45
       |
    87 |         let hn = outputs.get("hn").unwrap().extract_tensor::<f32>().unwrap();
       |                                             ^^^^^^^^^^^^^^ help: there is a method with a similar name: `try_extract_tensor`
    
    error[E0599]: no method named `extract_tensor` found for reference `&Value` in the current scope
      --> voice_activity_detector\src\vad.rs:88:45
       |
    88 |         let cn = outputs.get("cn").unwrap().extract_tensor::<f32>().unwrap();
       |                                             ^^^^^^^^^^^^^^ help: there is a method with a similar name: `try_extract_tensor`
    
    error[E0599]: no method named `extract_tensor` found for reference `&Value` in the current scope
      --> voice_activity_detector\src\vad.rs:97:14
       |
    94 |           let output = outputs
       |  ______________________-
    95 | |             .get("output")
    96 | |             .unwrap()
    97 | |             .extract_tensor::<f32>()
       | |             -^^^^^^^^^^^^^^ help: there is a method with a similar name: `try_extract_tensor`
       | |_____________|
       | 
    
    For more information about this error, try `rustc --explain E0599`.
    error: could not compile `voice_activity_detector` (lib) due to 4 previous errors
    

    I've already fixed them for me, but I want to contribute them back to here if you want me to do so, I mean all you really have to do is listen to Cargo here and everything will be fixed.

    Let me know if I should make a pull request, Julian.

    opened by Unbreathable 3
  • Add platform support table to README.md

    Hello, it's me again from yesterday. One of the first things that jumped at me when looking at this repository is that there is no supported platforms table. I currently tested the latest build of voice_activity_detector on macOS and Windows (doing Linux today) and thought I would already give you a draft of what I would like to add to the README.md. Again, this is just a draft so feel free to come in and change some things as I'm not familiar with how you want your README.md to look.

    Hope this gets accepted, Julian.

    opened by Unbreathable 2
  • feat: remove const generic chunk size; add vad builder

    Address Issue #9.

    • Removes all of the const generics for the chunk size
    • Replaces all of the arrays with Vecs
    • Adds a builder to the VAD in order to avoid any confusion with the order of arguments for sample rate and chunk size. Also provides flexibility to add additional configuration options in the future without a breaking change.

    The only real change for consumers is in constructing the VAD. What was previously written as:

    let mut vad = VoiceActivityDetector::<512>::try_with_sample_rate(8000)?;
    

    will now be written as:

    let mut vad = VoiceActivityDetector::builder()
        .sample_rate(8000)
        .chunk_size(512usize)
        .build()?;
    
    opened by nkeenan38 0
  • draft: Add a different struct with a chunk_size value

    This is what I described in #9 and what I made this morning. I've also updated some of the test cases since they were erroring for me when editing the project and not having the async feature enabled.

    opened by Unbreathable 0
  • chore: release v0.0.3

    🤖 New release

    • voice_activity_detector: 0.0.2 -> 0.0.3 (⚠️ API breaking changes)

    ⚠️ voice_activity_detector breaking changes

    --- failure derive_trait_impl_removed: built-in derived trait no longer implemented ---
    
    Description:
    A public type has stopped deriving one or more traits. This can break downstream code that depends on those types implementing those traits.
            ref: https://doc.rust-lang.org/reference/attributes/derive.html#derive
           impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.30.0/src/lints/derive_trait_impl_removed.ron
    
    Failed in:
      type LabeledAudio no longer derives Copy, in /tmp/.tmpW8mNA9/voice_activity_detector/src/label.rs:5
    
    --- failure inherent_method_missing: pub method removed or renamed ---
    
    Description:
    A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
            ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
           impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.30.0/src/lints/inherent_method_missing.ron
    
    Failed in:
      VoiceActivityDetector::try_with_sample_rate, previously in file /tmp/.tmpm78W1H/voice_activity_detector/src/vad.rs:19
      VoiceActivityDetector::with_session, previously in file /tmp/.tmpm78W1H/voice_activity_detector/src/vad.rs:45
      VoiceActivityDetector::predict_array, previously in file /tmp/.tmpm78W1H/voice_activity_detector/src/vad.rs:116
    
    Changelog

    0.0.3 - 2024-04-03

    Added

    • remove const generic chunk size; add vad builder (#12)

    Other

    • (deps) bump tokio from 1.36.0 to 1.37.0 (#7)


    This PR was generated with release-plz.

    opened by github-actions[bot] 0
  • chore(deps): bump tokio from 1.36.0 to 1.37.0

    Bumps tokio from 1.36.0 to 1.37.0.

    Release notes

    Sourced from tokio's releases.

    Tokio v1.37.0

    1.37.0 (March 28th, 2024)

    Added

    • fs: add set_max_buf_size to tokio::fs::File (#6411)
    • io: add try_new and try_with_interest to AsyncFd (#6345)
    • sync: add forget_permits method to semaphore (#6331)
    • sync: add is_closed, is_empty, and len to mpsc receivers (#6348)
    • sync: add a rwlock() method to owned RwLock guards (#6418)
    • sync: expose strong and weak counts of mpsc sender handles (#6405)
    • sync: implement Clone for watch::Sender (#6388)
    • task: add TaskLocalFuture::take_value (#6340)
    • task: implement FromIterator for JoinSet (#6300)

    Changed

    • io: make io::split use a mutex instead of a spinlock (#6403)

    Fixed

    • docs: fix docsrs build without net feature (#6360)
    • macros: allow select with only else branch (#6339)
    • runtime: fix leaking registration entries when os registration fails (#6329)

    Documented

    • io: document cancel safety of AsyncBufReadExt::fill_buf (#6431)
    • io: document cancel safety of AsyncReadExt's primitive read functions (#6337)
    • runtime: add doc link from Runtime to #[tokio::main] (#6366)
    • runtime: make the enter example deterministic (#6351)
    • sync: add Semaphore example for limiting the number of outgoing requests (#6419)
    • sync: fix missing period in broadcast docs (#6377)
    • sync: mark mpsc::Sender::downgrade with #[must_use] (#6326)
    • sync: reorder const_new before new_with (#6392)
    • sync: update watch channel docs (#6395)
    • task: fix documentation links (#6336)

    Changed (unstable)

    • runtime: include task Id in taskdumps (#6328)
    • runtime: panic if unhandled_panic is enabled when not supported (#6410)

    ... (truncated)

    Commits
    • 9c337ca chore: prepare Tokio v1.37.0 (#6435)
    • e542501 io: document cancel safety of AsyncBufReadExt::fill_buf (#6431)
    • 4601c84 stream: add next_many and poll_next_many to StreamMap (#6409)
    • deff252 util: document cancel safety of SinkExt::send and StreamExt::next (#6417)
    • 4565b81 sync: add a rwlock() method to owned RwLock guards (#6418)
    • 3ce4720 sync: add is_closed, is_empty, and len to mpsc receivers (#6348)
    • 8342e4b util: assert compatibility between LengthDelimitedCodec options (#6414)
    • 4c453e9 readme: add description about benchmarks (#6425)
    • 1846483 sync: expose strong and weak counts of mpsc sender handles (#6405)
    • baad270 sync: add Semaphore example for limiting the number of outgoing requests (#6419)
    • Additional commits viewable in compare view

    opened by dependabot[bot] 0
  • chore: release v0.0.2

    🤖 New release

    • voice_activity_detector: 0.0.1 -> 0.0.2 (✓ API compatible changes)
    Changelog

    0.0.2 - 2024-04-01

    Fixed

    • update ort (#5)


    This PR was generated with release-plz.

    opened by github-actions[bot] 0
  • chore: release v0.0.1

    🤖 New release

    • voice_activity_detector: 0.0.0 -> 0.0.1 (✓ API compatible changes)
    Changelog

    0.0.1 - 2024-03-24

    Other

    • update github settings (#3)
    • release (#1)


    This PR was generated with release-plz.

    opened by github-actions[bot] 0
  • chore: release v0.0.4

    🤖 New release

    • voice_activity_detector: 0.0.3 -> 0.0.4 (✓ API compatible changes)
    Changelog

    0.0.4 - 2024-04-03

    Other

    • Add platform support table to README.md (#8)


    This PR was generated with release-plz.

    opened by github-actions[bot] 0