a universal meta-transliterator that can decipher arbitrary encoding schemas, built in pure Rust

Catherine Koshka

Last update: Dec 21, 2022

transliterati

what does it do?

You give it this:

Барлығына сенімді және тиімді бағдарламалық жасақтаманы құруға мүмкіндік беретін тіл. Ол өте жылдам және жадты үнемдейді: жұмыс уақыты немесе қоқыс жинағышсыз ол өнімділігі маңызды қызметтерді қуаттай алады, ендірілген құрылғыларда жұмыс істей алады және басқа тілдермен оңай біріктіре алады. тың тамаша құжаттамалары, пайдалы қате туралы хабарлары бар ыңғайлы компиляторы және жоғары деңгейлі құралдары — біріктірілген пакет менеджері және құрастыру құралы, автоматты аяқтау және типті тексерулері бар смарт мульти-редакторды қолдау, автоматты пішімдеу және т.б. бар.

And this:

Barlığına senimdi jäne tïimdi bağdarlamalıq jasaqtamanı qurwğa mümkindik beretin til. Ol öte jıldam jäne jadtı ünemdeydi: jumıs waqıtı nemese qoqıs jïnağışsız ol önimdiligi mañızdı qızmetterdi qwattay aladı, endirilgen qurılğılarda jumıs istey aladı jäne basqa tildermen oñay biriktire aladı. tıñ tamaşa qujattamaları, paydalı qate twralı xabarları bar ıñğaylı kompïlyatorı jäne joğarı deñgeyli quraldarı — biriktirilgen paket menedjeri jäne qurastırw quralı, avtomattı ayaqtaw jäne tïpti tekserwleri bar smart mwltï-redaktordı qoldaw, avtomattı pişimdew

And it gives you this:

{
  ...
  "ал": "al",
  "ар": "ar",
  "б": "e",
  "в": "ü",
  "г": "g",
  "д": "d",
  "ді": "di",
  ...
}

And it works for any transliteration scheme in any language. This example uses a single paragraph, but the more text you give it, the better.
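The README does not describe how the mapping is inferred. To illustrate the general idea only, here is a minimal sketch of one naive approach, which is not transliterati's algorithm: pair words by position, learn only from word pairs with equal character counts, and keep the most frequent target for each source character. The function name `infer_char_map` is hypothetical, and unlike the output above it only learns single-character mappings.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical helper, NOT transliterati's algorithm: assumes the two texts
// are word-aligned by position and only learns 1:1 character mappings.
fn infer_char_map(src: &str, dst: &str) -> BTreeMap<char, char> {
    // counts[s][d] = how often source char `s` lined up with target char `d`
    let mut counts: HashMap<char, HashMap<char, usize>> = HashMap::new();

    for (sw, dw) in src.split_whitespace().zip(dst.split_whitespace()) {
        let s_chars: Vec<char> = sw.to_lowercase().chars().collect();
        let d_chars: Vec<char> = dw.to_lowercase().chars().collect();
        if s_chars.len() != d_chars.len() {
            continue; // not a clean 1:1 alignment (e.g. щ -> shch), so skip it
        }
        for (s, d) in s_chars.into_iter().zip(d_chars) {
            *counts.entry(s).or_default().entry(d).or_insert(0) += 1;
        }
    }

    // Keep the most frequent target per source char; on a tie, prefer the
    // smallest target char so the result is deterministic.
    counts
        .into_iter()
        .filter_map(|(s, targets)| {
            targets
                .into_iter()
                .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
                .map(|(d, _)| (s, d))
        })
        .collect()
}

fn main() {
    let map = infer_char_map("мама папа мало", "mama papa malo");
    for (s, d) in &map {
        println!("\"{s}\": \"{d}\",");
    }
}
```

Real transliteration schemes also contain one-to-many mappings like the `"ді": "di"` entry above, which this sketch deliberately ignores by skipping words whose lengths differ.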

how fast is it?

Its speed is bounded by the longest newline-separated paragraph, because all paragraphs are processed in parallel. A run generally takes between 15 ms and 600 ms.
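Processing paragraphs in parallel is why the longest one is the bottleneck: total wall-clock time is roughly that of the slowest worker. Below is a minimal sketch of that pattern using `std::thread::scope` (Rust 1.63+), with one thread per pair of corresponding lines. It is an assumption that transliterati is structured this way; the README only says everything runs in parallel, and `count_pairs_parallel` and its per-line character counting are hypothetical.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical sketch, not transliterati's actual code. Each pair of
// corresponding lines gets its own scoped thread, so the wall-clock time
// is roughly that of the longest pair.
fn count_pairs_parallel(src: &str, dst: &str) -> HashMap<(char, char), usize> {
    let pairs: Vec<(&str, &str)> = src.lines().zip(dst.lines()).collect();

    // thread::scope lets the threads borrow `src` and `dst` directly.
    let partials: Vec<HashMap<(char, char), usize>> = thread::scope(|scope| {
        let handles: Vec<_> = pairs
            .iter()
            .map(|&(s, d)| {
                scope.spawn(move || {
                    let mut local: HashMap<(char, char), usize> = HashMap::new();
                    for (sw, dw) in s.split_whitespace().zip(d.split_whitespace()) {
                        // Only learn from words with equal character counts.
                        if sw.chars().count() == dw.chars().count() {
                            for pair in sw.chars().zip(dw.chars()) {
                                *local.entry(pair).or_insert(0) += 1;
                            }
                        }
                    }
                    local
                })
            })
            .collect();
        // Joining waits for every thread, i.e. for the slowest line pair.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Merge the per-line counts into one table.
    let mut total: HashMap<(char, char), usize> = HashMap::new();
    for partial in partials {
        for (pair, n) in partial {
            *total.entry(pair).or_insert(0) += n;
        }
    }
    total
}

fn main() {
    let counts = count_pairs_parallel("мама\nпапа", "mama\npapa");
    // Cyrillic 'а' lines up with Latin 'a' twice on each line: prints Some(4)
    println!("{:?}", counts.get(&('а', 'a')));
}
```

Because the threads only share read-only borrows and the merge happens after all joins, no locking is needed.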

how accurate is it?

Accuracy seems to depend on three things:

How much data do you have? The more, the better.
Do the two orthographies correspond 1:1? Russian is close to perfect with as few as 14 words, while Japanese is only 75% accurate even with 1,000 words, because it mixes several writing systems.
Are they completely different writing systems? If you pair a logographic script like Chinese with phonetic pinyin, you will need a godawful amount of data.

That's pretty much it.
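The README does not say how the accuracy figures above were measured. One plausible way to put a number on a result, assuming you have a trusted reference table, is the share of reference entries that the inferred table reproduces exactly. The sketch below implements that hypothetical metric; `mapping_accuracy` and `table` are illustrative names, not part of transliterati.

```rust
use std::collections::HashMap;

// Hypothetical metric (the README does not say how its figures were
// computed): share of reference entries reproduced exactly.
fn mapping_accuracy(
    inferred: &HashMap<String, String>,
    reference: &HashMap<String, String>,
) -> f64 {
    if reference.is_empty() {
        return 0.0; // nothing to compare against
    }
    let correct = reference
        .iter()
        .filter(|(src, expected)| inferred.get(*src) == Some(*expected))
        .count();
    correct as f64 / reference.len() as f64
}

// Illustrative helper for building tables from string literals.
fn table(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let reference = table(&[("а", "a"), ("б", "b"), ("в", "v"), ("г", "g")]);
    let inferred = table(&[("а", "a"), ("б", "b"), ("в", "w"), ("г", "g")]);
    // 3 of the 4 reference entries match, so this prints 0.75.
    println!("{}", mapping_accuracy(&inferred, &reference));
}
```

Entries the tool inferred but the reference lacks are ignored here; a stricter metric could penalize them too.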

how do I use it?

transliterati file1.txt file2.txt 200

Here 200 is the minimum vocabulary size; only set it if you're really sure you know what you're doing. You will probably have to clone the repository and build it from source, since I only learned Rust a week ago and I'm not yet confident enough with Cargo.
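For orientation, here is a minimal sketch of how the documented invocation could be parsed with only the standard library. It assumes, as the wording above suggests, that the minimum vocabulary size is an optional third argument; `Args`, `parse_args` and the usage message are hypothetical and may not match the real tool.

```rust
use std::env;

// Hypothetical parser mirroring `transliterati file1.txt file2.txt 200`;
// treating the third argument as optional is an assumption.
#[derive(Debug, PartialEq)]
struct Args {
    source_path: String,
    target_path: String,
    min_vocab: Option<usize>,
}

fn parse_args(args: &[String]) -> Result<Args, String> {
    let (src, dst, min_vocab) = match args {
        [src, dst] => (src, dst, None),
        [src, dst, raw] => {
            let n = raw
                .parse::<usize>()
                .map_err(|e| format!("invalid minimum vocab size {raw:?}: {e}"))?;
            (src, dst, Some(n))
        }
        _ => return Err("usage: transliterati <file1> <file2> [min_vocab_size]".to_string()),
    };
    Ok(Args {
        source_path: src.clone(),
        target_path: dst.clone(),
        min_vocab,
    })
}

fn main() {
    // Skip argv[0] (the program name) and keep the positional arguments.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_args(&args) {
        Ok(parsed) => println!("{parsed:?}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Matching on slice patterns keeps the arity check and the parsing in one place without pulling in an argument-parsing crate.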

Tips:

If you have a long text and know where the boundaries fall, split it into evenly sized chunks, one per line. Longer chunks take longer to process; the number of chunks hardly matters. Make sure there are no blank lines.
Experiment with the vocabulary size if you get odd results.
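The chunking tip can be sketched as follows, assuming you already have both texts split into aligned sentence lists (sentence i in one corresponds to sentence i in the other). The hypothetical `chunk_pair` puts the same number of sentences on each line on both sides, so chunks stay paired, and rejects input that would produce blank lines or mismatched counts.

```rust
// Hypothetical preprocessing helper: groups `per_chunk` aligned sentences
// per line on both sides so that line i of one text matches line i of the other.
fn chunk_pair(src: &[&str], dst: &[&str], per_chunk: usize) -> Result<(String, String), String> {
    if per_chunk == 0 {
        return Err("per_chunk must be at least 1".into());
    }
    if src.len() != dst.len() {
        return Err(format!("sentence count mismatch: {} vs {}", src.len(), dst.len()));
    }
    // A blank sentence would produce a blank or misaligned line, so refuse it.
    if let Some(i) = src
        .iter()
        .zip(dst)
        .position(|(s, d)| s.trim().is_empty() || d.trim().is_empty())
    {
        return Err(format!("blank sentence at index {i}"));
    }
    // Join each group of sentences with spaces, and the groups with newlines.
    let join = |side: &[&str]| -> String {
        side.chunks(per_chunk)
            .map(|c| c.join(" "))
            .collect::<Vec<_>>()
            .join("\n")
    };
    Ok((join(src), join(dst)))
}

fn main() {
    let src = ["Ол өте жылдам.", "Тіл.", "Жадты үнемдейді."];
    let dst = ["Ol öte jıldam.", "Til.", "Jadtı ünemdeydi."];
    match chunk_pair(&src, &dst, 2) {
        Ok((a, b)) => println!("{a}\n---\n{b}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Using the same `per_chunk` on both sides is what keeps the chunks even and aligned; the last line simply holds whatever sentences remain.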
