GStreamer plugin for speech to text using the Vosk Toolkit.

Overview

This project has moved upstream to the GStreamer Rust Plugins: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/729

Vosk Speech Recognition GStreamer Plugin

Transcribes speech using the Vosk Toolkit. It can be used to generate subtitles for movies, live streams, lectures and interviews.

Vosk is an offline, open-source speech recognition toolkit. It enables speech recognition for 20+ languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi. More to come.

https://github.com/alphacep/vosk-api

This GStreamer plugin was inspired by the work of @MathieuDuponchelle in the AwsTranscriber element.

Build

Compiling this project produces a shared library that your local GStreamer installation can load as a plugin.

cargo build --release

GStreamer must be able to find the compiled shared library, ./target/release/libgstvosk.dylib (the .dylib extension applies to macOS; on Linux the file ends in .so). One option is to pass the argument --gst-plugin-path= with the directory that contains the library file every time you run the gst-launch-1.0 command line tool.
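
As an alternative to passing --gst-plugin-path= on every invocation, the GST_PLUGIN_PATH environment variable can carry the same directory for the whole shell session. The sketch below assumes a POSIX shell, that it is run from the repository root after the build, and that the element is registered under the name vosk_transcriber used in the examples of this README:

```shell
# Assumption: the current directory is the repository root and
# `cargo build --release` has already finished.
# Prepend the build directory, keeping any search path that is already set.
export GST_PLUGIN_PATH="$PWD/target/release${GST_PLUGIN_PATH:+:$GST_PLUGIN_PATH}"

# Ask GStreamer to describe the element; print a hint if it cannot be found.
gst-inspect-1.0 vosk_transcriber \
  || echo "vosk_transcriber not found; check GST_PLUGIN_PATH" >&2
```

If gst-inspect-1.0 lists the element's pads and properties, gst-launch-1.0 started from the same shell should find the plugin without the extra argument.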

Example Usage

This plugin connects to the Vosk server over the WebSocket protocol. The easiest way to run the Vosk server is with Docker. You can start the server locally with this command:

docker run --rm --name vosk-server -d -p 2700:2700 alphacep/kaldi-en:latest

Running the recognition server as a separate process has the additional benefit that you don't need to install any special software. It also keeps the speech recognition workload out of your GStreamer pipeline process.

This example just prints the raw text buffers emitted by the Vosk transcriber:

gst-launch-1.0 \
  vosk_transcriber name=tc ! fakesink sync=true dump=true \
  uridecodebin uri=https://studio.blender.org/download-source/d1/d1f3b354a8f741c6afabf305489fa510/d1f3b354a8f741c6afabf305489fa510-1080p.mp4 ! audioconvert ! tc.
Comments
  • Low Latency Causes Disconnect

    Hello! I'm giving this plugin a whirl and was curious about trying to achieve lower latency. With the default latency of 30s it seems to work fine. However, if I go to anything lower, the plugin runs for a bit, but then Vosk eventually terminates the connection.

    In browsing through the code it doesn't appear that the latency actually affects the configuration of the vosk server at all?

    I've got debug logs if that might explain what's going on: https://gist.github.com/raytiley/c9f741093a78367a010606eba337a8ec

    This is from the vosk server docker container.

    INFO:root:Connection from ('172.17.0.1', 61632)
    INFO:root:Config {'sample_rate': 48000, 'words': True}
    LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00932693 seconds in looped compilation.
    ERROR:websockets.server:Error in connection handler
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 827, in transfer_data
        message = await self.read_message()
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 895, in read_message
        frame = await self.read_data_frame(max_size=self.max_size)
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 971, in read_data_frame
        frame = await self.read_frame(max_size)
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 1047, in read_frame
        frame = await Frame.read(
      File "/usr/lib/python3/dist-packages/websockets/framing.py", line 105, in read
        data = await reader(2)
      File "/usr/lib/python3.9/asyncio/streams.py", line 723, in readexactly
        await self._wait_for_data('readexactly')
      File "/usr/lib/python3.9/asyncio/streams.py", line 517, in _wait_for_data
        await self._waiter
    asyncio.exceptions.CancelledError

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/websockets/server.py", line 191, in handler
        await self.ws_handler(self, path)
      File "/opt/vosk-server/websocket/./asr_server.py", line 38, in recognize
        message = await websocket.recv()
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 509, in recv
        await self.ensure_open()
      File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 812, in ensure_open
        raise self.connection_closed_exc()
    websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason

    The command I'm using is:

    gst-launch-1.0 vosk_transcriber latency=5000 name=tc ! fakesink sync=true dump=true uridecodebin uri=file:///e:/content/dn.mp4 name=decode ! queue ! audioconvert ! tc.

    opened by raytiley
Owner
Rafael Carício