This project has moved upstream to the GStreamer Rust Plugins: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/729

Vosk Speech Recognition GStreamer Plugin

Transcribes speech using the Vosk Toolkit. It can be used to generate subtitles for movies, live streams, lectures, and interviews.

Vosk is an offline, open-source speech recognition toolkit. It enables speech recognition for 20+ languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, and Hindi, with more to come.

https://github.com/alphacep/vosk-api

This GStreamer plugin was inspired by the work of @MathieuDuponchelle in the AwsTranscriber element.

Build

Compiling this project will provide a shared library that can be used by your local GStreamer installation.

cargo build --release

The compiled shared library ./target/release/libgstvosk.dylib must be discoverable by GStreamer (the .dylib extension applies to macOS; on Linux the file ends in .so). One option is to pass the argument --gst-plugin-path= with the directory that contains the library file every time you run the gst-launch-1.0 command-line tool.
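To avoid retyping the flag on every run, the command line can be assembled by a small wrapper. The following is only a sketch in Python: the helper names gst_launch_argv and gst_env are hypothetical, and the second helper relies on the GST_PLUGIN_PATH environment variable, which GStreamer also consults when searching for plugins.

```python
import os
import shlex
from pathlib import Path


def gst_launch_argv(plugin_dir, pipeline):
    """Build an argv list for gst-launch-1.0 with plugin_dir on the plugin search path.

    gst_launch_argv is a hypothetical helper name, not part of GStreamer.
    """
    plugin_dir = Path(plugin_dir).resolve()
    return ["gst-launch-1.0", f"--gst-plugin-path={plugin_dir}", *shlex.split(pipeline)]


def gst_env(plugin_dir, base_env=None):
    """Alternative: return a copy of the environment with GST_PLUGIN_PATH extended."""
    env = dict(os.environ if base_env is None else base_env)
    path = str(Path(plugin_dir).resolve())
    existing = env.get("GST_PLUGIN_PATH")
    # Put the new directory first so a freshly built plugin wins over older copies.
    env["GST_PLUGIN_PATH"] = path if not existing else path + os.pathsep + existing
    return env


if __name__ == "__main__":
    argv = gst_launch_argv("./target/release", "audiotestsrc ! fakesink")
    # Print the command instead of running it; pass argv to subprocess.run() to execute.
    print(shlex.join(argv))
```

Either approach works with gst-inspect-1.0 as well, which is a quick way to confirm that the vosk_transcriber element is visible before building a full pipeline.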

Example Usage

This plugin connects to the Vosk server over the WebSocket protocol. The easiest way to run the Vosk server is with Docker. You can start the server locally using this command:

docker run --rm --name vosk-server -d -p 2700:2700 alphacep/kaldi-en:latest

Running the recognition server as a separate process has the additional benefit that you don't need to install any special software. It also keeps the speech recognition workload out of your GStreamer pipeline process.
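To check that the server is reachable independently of GStreamer, it can help to talk to it directly. The sketch below is not the plugin's code. It assumes the message shapes used by the vosk-server example client: a JSON "config" message, then raw binary audio chunks, then a JSON "eof" message, with one JSON reply per chunk. It also assumes 16-bit mono PCM input and the third-party websockets package. The function names, the chunk size, and the ws://localhost:2700 URI (matching the Docker port mapping above) are illustrative choices.

```python
import json


def config_message(sample_rate, words=True):
    """JSON text sent first so the server knows the audio sample rate (assumed protocol)."""
    return json.dumps({"config": {"sample_rate": sample_rate, "words": words}})


# JSON text that tells the server no more audio will follow (assumed protocol).
EOF_MESSAGE = json.dumps({"eof": 1})


def pcm_chunks(pcm, chunk_bytes=8000):
    """Split raw PCM bytes into fixed-size chunks; the last chunk may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


async def transcribe(pcm, sample_rate, uri="ws://localhost:2700"):
    """Send audio to a running Vosk server and collect its JSON replies."""
    import websockets  # third-party: pip install websockets

    replies = []
    async with websockets.connect(uri) as ws:
        await ws.send(config_message(sample_rate))
        for chunk in pcm_chunks(pcm):
            await ws.send(chunk)  # bytes are sent as a binary frame
            replies.append(json.loads(await ws.recv()))
        await ws.send(EOF_MESSAGE)
        replies.append(json.loads(await ws.recv()))  # final result
    return replies


# Usage, assuming audio.raw holds 16 kHz 16-bit mono PCM:
#   import asyncio
#   print(asyncio.run(transcribe(open("audio.raw", "rb").read(), 16000)))
```

If this client gets replies but the pipeline does not, the problem is more likely on the GStreamer side than in the server.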

This example simply prints the raw text buffers produced by the Vosk transcriber:

gst-launch-1.0 \
  vosk_transcriber name=tc ! fakesink sync=true dump=true \
  uridecodebin uri=https://studio.blender.org/download-source/d1/d1f3b354a8f741c6afabf305489fa510/d1f3b354a8f741c6afabf305489fa510-1080p.mp4 ! audioconvert ! tc.

Hello! Giving this plugin a whirl and was curious about trying to achieve lower latency. With the default latency of 30s it seems to work fine. However, if I go to anything lower, the plugin runs for a bit, but then vosk eventually terminates the connection.

In browsing through the code it doesn't appear that the latency actually affects the configuration of the vosk server at all?

I've got debug logs if that might explain what's going on: https://gist.github.com/raytiley/c9f741093a78367a010606eba337a8ec

This is from the vosk server docker container.

INFO:root:Connection from ('172.17.0.1', 61632)
INFO:root:Config {'sample_rate': 48000, 'words': True}
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00932693 seconds in looped compilation.
ERROR:websockets.server:Error in connection handler
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 827, in transfer_data
    message = await self.read_message()
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 895, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 971, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 1047, in read_frame
    frame = await Frame.read(
  File "/usr/lib/python3/dist-packages/websockets/framing.py", line 105, in read
    data = await reader(2)
  File "/usr/lib/python3.9/asyncio/streams.py", line 723, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/lib/python3.9/asyncio/streams.py", line 517, in _wait_for_data
    await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/websockets/server.py", line 191, in handler
    await self.ws_handler(self, path)
  File "/opt/vosk-server/websocket/./asr_server.py", line 38, in recognize
    message = await websocket.recv()
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 509, in recv
    await self.ensure_open()
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 812, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason

The command I'm using is:

gst-launch-1.0 vosk_transcriber latency=5000 name=tc ! fakesink sync=true dump=true uridecodebin uri=file:///e:/content/dn.mp4 name=decode ! queue ! audioconvert ! tc.


Low Latency Causes Disconnect

opened by raytiley 2

GStreamer plugin for speech to text using the Vosk Toolkit.


Owner

Rafael Carício
