Encode/Decode bytes as emoji base2048

Overview

mojibake

Encode and decode arbitrary bytes as a sequence of emoji optimized to produce the smallest number of graphemes.

Description

This is not a space-efficient library.

Services such as Twitter and Mastodon generally limit the length of a post by grapheme count, not by the count of underlying characters (code points). A single emoji grapheme often consists of a multi-byte sequence that spans several characters.
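To make the distinction concrete, here is a small Rust sketch that uses only the standard library. The helper name byte_and_char_counts is made up for illustration and is not part of this library. Counting graphemes themselves needs a segmentation library (for example the unicode-segmentation crate), which is left out to keep the sketch dependency-free.

```rust
/// Returns (UTF-8 byte count, Unicode scalar value count) for a string.
/// Hypothetical helper for illustration only; not part of mojibake.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // `len` counts UTF-8 bytes, `chars` iterates Unicode scalar values.
    (s.len(), s.chars().count())
}
```

For the flag of Vietnam, written as the two regional-indicator code points "\u{1F1FB}\u{1F1F3}", this returns (8, 2), while a grapheme-aware counter would report a single grapheme.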

Therefore, if you can encode more data in a smaller number of graphemes, you can transmit more information within the same limit, even though the encoded text takes up far more bytes than the original.

There are at least 2048 unique emoji graphemes in the Unicode specification, and 2048 = 2^11. An emoji is therefore just an 11-bit unsigned integer with extra steps.

This library packs bytes into 11-bit unsigned integers, which are then mapped to sequences of Unicode characters that each display as a single grapheme.
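The packing step can be sketched roughly as follows. This is not the library's actual code: the function name pack_11bit, the most-significant-bit-first order, and the zero padding of the final value are all assumptions made for illustration. Mapping each value to an emoji would additionally need a 2048-entry lookup table, which is not shown.

```rust
/// Packs bytes into 11-bit values, most significant bit first.
/// Assumption: a short final group is zero-padded on the right.
/// Hypothetical sketch; not the library's implementation.
fn pack_11bit(bytes: &[u8]) -> Vec<u16> {
    let mut out = Vec::with_capacity((bytes.len() * 8 + 10) / 11);
    let mut acc: u32 = 0; // bit accumulator
    let mut nbits: u32 = 0; // number of unread bits held in `acc`
    for &b in bytes {
        acc = (acc << 8) | u32::from(b);
        nbits += 8;
        // Emit every complete 11-bit group currently in the accumulator.
        while nbits >= 11 {
            nbits -= 11;
            out.push(((acc >> nbits) & 0x7FF) as u16);
        }
        // Drop the bits that were already emitted.
        acc &= (1 << nbits) - 1;
    }
    // Flush the leftover bits, padded with zeros up to 11 bits.
    if nbits > 0 {
        out.push(((acc << (11 - nbits)) & 0x7FF) as u16);
    }
    out
}
```

Each resulting value lies in the range 0..2048 and could serve as an index into an emoji table.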

Example

Original Text:
 Value: Shrek 2 was the greatest film ever made!!
 Bytes: 41,
 Characters: 41,
 Graphemes: 41

Mojibake Encoded:
 Value: ๐Ÿ‡ป๐Ÿ‡ณ๐Ÿ‘Œ??๐Ÿช€๐Ÿ”ถ๐Ÿซณ๐Ÿฟ๐Ÿง๐Ÿป๐Ÿ“ผ๐Ÿ•บ๐Ÿพ๐Ÿค›๐Ÿป๐Ÿฆบ๐Ÿคต๐Ÿฝ๐Ÿ‘ฆ๐Ÿผ๐Ÿ—„๏ธ๐Ÿ’†๐Ÿฟโš—๏ธโ†—๏ธ2๏ธโƒฃ๐Ÿงฅ๐Ÿคต๐Ÿป๐Ÿ•ค๐Ÿ™†๐Ÿซš๐Ÿช™๐Ÿ˜Ÿ๐Ÿ‡ฆ๐Ÿ‡ช๐Ÿซณ๐Ÿฝ๐Ÿ‡ธ๐Ÿ‡ฒ๐Ÿ˜น๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ๐Ÿ›Œ๐Ÿป
 Bytes: 210,
 Characters: 55,
 Graphemes: 30

Decoded Text:
 Value: Shrek 2 was the greatest film ever made!!
 Bytes: 41,
 Characters: 41,
 Graphemes: 41
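
The grapheme count in the example follows from simple arithmetic: 41 bytes are 328 bits, and 328 / 11 rounded up is 30. A minimal sketch of that calculation, assuming one grapheme per 11-bit value (symbols_needed is a name invented here, not a library function):

```rust
/// Number of 11-bit symbols needed to hold `n_bytes` bytes: ceil(8 * n / 11).
/// Hypothetical helper for illustration only.
fn symbols_needed(n_bytes: usize) -> usize {
    (n_bytes * 8 + 10) / 11
}
```

For the 41-byte input above, symbols_needed(41) is 30, matching the grapheme count of the encoded value, while the byte count grows from 41 to 210.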

Comments
  • Add property based testing

    This seemed like an obvious candidate for property-based tests, so I tried it out and it found some issues. Did I understand correctly that it should be possible to encode and then decode any sequence of bytes and end up with the original input? The regression test I committed (run cargo test to reproduce) found that decode(encode(vec![0u8, 223u8, 124u8])) returns None.

    opened by bondo 1
  • Add encoding optimizing for Grapheme Clusters instead of Graphemes

    While services often use grapheme count for character limits, the better analog for the number of visual elements is grapheme clusters.

    An encoding that takes advantage of the zero-width joiner (ZWJ) to encode grapheme clusters made of multiple graphemes (e.g. gender, skin tone modifiers) should improve the visual density of encoded information. As a bonus, this will also increase the diversity of generated emojis.

    enhancement
    opened by Kylebrown9 1
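
The round-trip property raised in the first comment, decode(encode(x)) == x, can be sketched without any dependencies. The pack and unpack functions below are stand-ins written for this example and are not the library's encode and decode. Inputs come from a fixed-seed generator, where a property-testing crate such as proptest or quickcheck would normally generate and shrink them. One assumption to note: unpack takes the original byte length explicitly, because zero padding alone is ambiguous for some lengths. Three bytes, for instance, pack into three 11-bit values (33 bits), which could also hold four bytes. Whether that is related to the failure reported above is not known.

```rust
/// Packs bytes into 11-bit values (MSB first, final value zero-padded).
/// Stand-in for illustration; not the library's `encode`.
fn pack(bytes: &[u8]) -> Vec<u16> {
    let bits: Vec<bool> = bytes
        .iter()
        .flat_map(|&b| (0..8u32).rev().map(move |i| ((b >> i) & 1) == 1))
        .collect();
    bits.chunks(11)
        .map(|chunk| {
            let v = chunk.iter().fold(0u16, |acc, &bit| (acc << 1) | bit as u16);
            v << (11 - chunk.len()) // right-pad a short final chunk with zeros
        })
        .collect()
}

/// Unpacks 11-bit values into exactly `len` bytes.
/// Returns None if the values hold fewer than `len` bytes.
fn unpack(values: &[u16], len: usize) -> Option<Vec<u8>> {
    let bits: Vec<bool> = values
        .iter()
        .flat_map(|&v| (0..11u32).rev().map(move |i| ((v >> i) & 1) == 1))
        .collect();
    if bits.len() < len * 8 {
        return None;
    }
    Some(
        bits[..len * 8]
            .chunks(8)
            .map(|chunk| chunk.iter().fold(0u8, |acc, &bit| (acc << 1) | bit as u8))
            .collect(),
    )
}

/// Checks the round-trip property for one pseudo-random input of each
/// length from 0 to `max_len`. The fixed seed keeps failures reproducible.
fn round_trip_holds(max_len: usize) -> bool {
    let mut state: u32 = 0x1234_5678;
    for len in 0..=max_len {
        let input: Vec<u8> = (0..len)
            .map(|_| {
                // Linear congruential step; the top byte is the output.
                state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
                (state >> 24) as u8
            })
            .collect();
        if unpack(&pack(&input), len).as_deref() != Some(input.as_slice()) {
            return false;
        }
    }
    true
}
```

A specific regression input, such as the three bytes from the comment, can be checked the same way by comparing unpack(&pack(&input), input.len()) against the input.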