GStreamer plugin for speech to text using the Vosk Toolkit.

Overview

This project has moved upstream to the GStreamer Rust Plugins: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/729

Vosk Speech Recognition GStreamer Plugin

This plugin transcribes speech using the Vosk Toolkit. It can be used to generate subtitles for movies, live streams, lectures, and interviews.

Vosk is an offline, open-source speech recognition toolkit. It enables speech recognition for 20+ languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi. More to come.

https://github.com/alphacep/vosk-api

This GStreamer plugin was inspired by the work of @MathieuDuponchelle in the AwsTranscriber element.

Build

Compiling this project produces a shared library that your local GStreamer installation can load.

cargo build --release

The compiled shared library ./target/release/libgstvosk.dylib must be made loadable by GStreamer (the .dylib name is what a macOS build produces; on Linux the file is libgstvosk.so). One option is to pass the argument --gst-plugin-path= pointing to the directory that contains the library file every time you run the gst-launch-1.0 command-line tool.
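As a rough sketch of the two usual ways to make the library visible, assuming it was built in the current directory with the command above: the --gst-plugin-path flag applies to a single invocation, while the GST_PLUGIN_PATH environment variable applies to every GStreamer tool started from the same shell. PLUGIN_DIR is just an illustrative variable name.

```shell
# Assumption: `cargo build --release` was run in the current directory.
PLUGIN_DIR="$PWD/target/release"

# Option 1: pass the directory on each invocation.
# gst-inspect-1.0 --gst-plugin-path="$PLUGIN_DIR" vosk_transcriber

# Option 2: export it once for this shell session,
# keeping any plugin path that was already set.
export GST_PLUGIN_PATH="$PLUGIN_DIR${GST_PLUGIN_PATH:+:$GST_PLUGIN_PATH}"

# If GStreamer is installed, check that the element can be found.
if command -v gst-inspect-1.0 >/dev/null 2>&1; then
  gst-inspect-1.0 vosk_transcriber ||
    echo "vosk_transcriber not found in $PLUGIN_DIR" >&2
fi
```

If gst-inspect-1.0 prints the element details, gst-launch-1.0 run from the same shell should be able to use vosk_transcriber without the extra flag.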

Example Usage

This plugin connects to the Vosk Server over the WebSocket protocol. The easiest way to run the Vosk Server is with Docker. You can run the server locally using this command:

docker run --rm --name vosk-server -d -p 2700:2700 alphacep/kaldi-en:latest

Running the recognition server as a separate process has the additional benefit that you don't need to install any special software. It also keeps the speech recognition workload out of your GStreamer pipeline process.
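Because the server runs detached (-d) under a fixed name, it can be inspected and stopped with standard Docker commands. The following is only a sketch; CONTAINER is an illustrative variable holding the name passed to --name above.

```shell
# Name given via --name in the docker run command above.
CONTAINER=vosk-server

if command -v docker >/dev/null 2>&1 &&
   [ -n "$(docker ps -q --filter "name=$CONTAINER" 2>/dev/null)" ]; then
  # Show recent server output, such as incoming connections.
  docker logs --tail 20 "$CONTAINER"
  # Uncomment to stop the server; because of --rm the container is then removed.
  # docker stop "$CONTAINER"
else
  echo "container $CONTAINER is not running"
fi
```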

This example just prints the raw text buffers produced by the Vosk transcriber:

gst-launch-1.0 \
  vosk_transcriber name=tc ! fakesink sync=true dump=true \
  uridecodebin uri=https://studio.blender.org/download-source/d1/d1f3b354a8f741c6afabf305489fa510/d1f3b354a8f741c6afabf305489fa510-1080p.mp4 ! audioconvert ! tc.

Comments
  • Low Latency Causes Disconnect

    Hello! Giving this plugin a whirl and was curious about trying to achieve lower latency. When using the default latency of 30s it seems to work fine. However, if I go to anything lower, the plugin runs for a bit, but then vosk eventually terminates the connection.

    In browsing through the code it doesn't appear that the latency actually affects the configuration of the vosk server at all?

    I've got debug logs if that might explain what's going on: https://gist.github.com/raytiley/c9f741093a78367a010606eba337a8ec

    This is from the vosk server docker container.

    INFO:root:Connection from ('172.17.0.1', 61632)
    INFO:root:Config {'sample_rate': 48000, 'words': True}
    LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00932693 seconds in looped compilation.
    ERROR:websockets.server:Error in connection handler
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 827, in transfer_data
        message = await self.read_message()
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 895, in read_message
        frame = await self.read_data_frame(max_size=self.max_size)
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 971, in read_data_frame
        frame = await self.read_frame(max_size)
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 1047, in read_frame
        frame = await Frame.read(
      File "/usr/lib/python3/dist-packages/websockets/framing.py", line 105, in read
        data = await reader(2)
      File "/usr/lib/python3.9/asyncio/streams.py", line 723, in readexactly
        await self._wait_for_data('readexactly')
      File "/usr/lib/python3.9/asyncio/streams.py", line 517, in _wait_for_data
        await self._waiter
    asyncio.exceptions.CancelledError

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/websockets/server.py", line 191, in handler
        await self.ws_handler(self, path)
      File "/opt/vosk-server/websocket/./asr_server.py", line 38, in recognize
        message = await websocket.recv()
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 509, in recv
        await self.ensure_open()
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 812, in ensure_open
        raise self.connection_closed_exc()
    websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason
    

    The command I'm using is:

    gst-launch-1.0 vosk_transcriber latency=5000 name=tc ! fakesink sync=true dump=true uridecodebin uri=file:///e:/content/dn.mp4 name=decode ! queue ! audioconvert ! tc.

    opened by raytiley 2
Owner
Rafael Carício
I like writing code. #StandWithUkraine