A crate for converting an ASCII text string or file to a single Unicode character

Overview

zalgo codec

This crate implements the zalgo encoding and decoding functions originally written in Python by Scott Conner, and extends them for Rust by providing a procedural macro that can run encoded source code.

With the functions defined in this crate you can transform an ASCII string into a Unicode string that is a single "character" wide. While the encoding is reversible, the encoded string will be larger than the original in terms of bytes.
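As a rough illustration of why the encoded string is larger, here is a minimal sketch. It assumes the layout suggested by the examples below: one leading `E` as the base character, followed by one combining character from U+0300–U+036F per input byte. Every character in that range takes two bytes in UTF-8, so an n-byte ASCII input should become 2n + 1 bytes. The helper `predicted_encoded_len` is hypothetical and not part of the crate's API.

```rust
/// Hypothetical helper (not part of the crate): predicted UTF-8 length of the
/// encoded form of an ASCII string that is `n` bytes long, assuming one base
/// character 'E' followed by one combining character per input byte.
fn predicted_encoded_len(n: usize) -> usize {
    'E'.len_utf8() + n * '\u{0300}'.len_utf8()
}

fn main() {
    // Every combining character in U+0300..=U+036F needs two bytes in UTF-8,
    // so each one-byte ASCII character doubles in size once encoded.
    assert!(('\u{0300}'..='\u{036F}').all(|c| c.len_utf8() == 2));

    let input = "Lorem ipsum";
    // prints "11 bytes in, 23 bytes out"
    println!(
        "{} bytes in, {} bytes out",
        input.len(),
        predicted_encoded_len(input.len())
    );
}
```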

The crate also provides the zalgo_embed! macro that can be used to decode a string of encoded source code and pass the results on to the compiler. Imagine the code clarity!

Additionally, the crate provides functions to encode Python code and wrap the result in a decoder that decodes and executes the encoded string.
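The idea behind the Python wrapper can be sketched as follows. `wrap_python` is a hypothetical Rust helper, not the crate's actual function: given an already encoded string (starting with the base character `E`), it emits a two-line Python program that skips the `E`, turns each combining character back into ASCII with the decoding formula from the Explanation section, and passes the result to `exec`. Since U+0300 is 768, `ord(c) - 746` in Python equals the offset from U+0300 plus 22.

```rust
/// Hypothetical helper (not the crate's API): wrap an already encoded string
/// in a small Python program that decodes and executes it.
///
/// The generated Python skips the leading base character 'E' and maps every
/// combining character back to ASCII with (offset + 22) % 133 + 10, where
/// offset = ord(c) - 0x300, so ord(c) - 768 + 22 = ord(c) - 746.
fn wrap_python(encoded: &str) -> String {
    format!(
        r#"s = "{encoded}"
exec("".join(chr((ord(c) - 746) % 133 + 10) for c in s[1:]))
"#
    )
}

fn main() {
    // "E" followed by U+0300 is the encoding of a single space.
    let program = wrap_python("E\u{0300}");
    print!("{program}");
}
```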

The crate cannot encode carriage returns, so files written on non-Unix operating systems might not work. The file-encoding functions will attempt to encode such files anyway by ignoring carriage returns, but the string-encoding functions will return an error.
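A minimal sketch of the two behaviours described above, assuming the codec accepts only LF (0x0A) and the ASCII range 0x20–0x7F as outlined in the Explanation section. `is_encodable`, `first_unencodable` and `strip_carriage_returns` are hypothetical helpers, not the crate's API: the second plays the role of the string encoder's check and reports where encoding would fail, and the third mimics the file encoder dropping carriage returns from Windows-style CRLF line endings.

```rust
/// Hypothetical check: is `byte` something the codec can represent?
/// Assumes the accepted set is LF plus the ASCII range 0x20..=0x7F.
fn is_encodable(byte: u8) -> bool {
    byte == b'\n' || (0x20..=0x7F).contains(&byte)
}

/// Hypothetical stand-in for the string encoder's validation: returns the
/// byte index of the first byte that cannot be encoded (e.g. a '\r').
fn first_unencodable(text: &str) -> Option<usize> {
    text.bytes().position(|b| !is_encodable(b))
}

/// Hypothetical stand-in for the file encoder's behaviour: drop carriage
/// returns so that CRLF line endings become plain LF.
fn strip_carriage_returns(text: &str) -> String {
    text.chars().filter(|&c| c != '\r').collect()
}

fn main() {
    let windows_text = "line one\r\nline two\r\n";
    // The string encoder would reject this input at the first '\r'.
    println!("{:?}", first_unencodable(windows_text)); // Some(8)
    // The file encoder would instead encode the cleaned-up text.
    let cleaned = strip_carriage_returns(windows_text);
    println!("{:?}", first_unencodable(&cleaned)); // None
}
```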

Examples

We can execute encoded code with the macro:

use zalgo_codec::zalgo_embed;

// This expands to the code
// `fn add(x: i32, y: i32) -> i32 {x + y}`
zalgo_embed!("E͎͉͙͉̞͉͙͆̀́̈́̈́̈̀̓̒̌̀̀̓̒̉̀̍̀̓̒̀͛̀̋̀͘̚̚͘͝");

// The `add` function is now available
assert_eq!(add(10, 20), 30);

as well as evaluate expressions:

let x = 20;
let y = -10;
// This expands to the code 
// `x + y`
let z = zalgo_embed!("È͙̋̀͘");
assert_eq!(z, x + y);

The cursed character at the bottom of this section is the standard "Lorem ipsum" encoded with the encoding function in this crate.

E̬͏͍͉͓͕͍͒̀͐̀̈́ͅ͏͌͏͓͉͔͍͔͒̀̀́̌̀̓ͅ͏͎͓͔͔͕͉͉͓͉͎͇͉͔͓̓͒̀́̈́͐̓̀͌̌̀̈́̀̈́ͅͅͅͅ͏͉͕͓͍̀ͅ͏͔͍̈́̀͐ͅ͏͉͎͉͉͕͎͔͕͔͒̀̓̈́̈́̀̀͌́͂͏͔͒̀̀̈́ͅͅ͏͌͏͍͇͎͉͒̀́́̀́͌ͅ

Explanation

Characters U+0300–U+036F are the Unicode combining diacritical marks. The fun thing about combining characters is that you can add as many of them as you like to a base character and they do not create any new symbols; they only stack marks on top of that character. They are meant for building characters such as á, by taking a plain a and adding a combining character that gives it the mark (U+0301, in this case). Fun fact: Unicode doesn't specify any limit on the number of these characters.

Conveniently, this gives us 112 different characters to map to, which is enough for the 96 characters in the ASCII range 0x20–0x7F, i.e. everything except the control characters below 0x20. The only issue is that this leaves out newlines, so the mapping also has to make room for 0x0A (LF). Both are handled by encoding a character as the offset (CHARACTER - 11) % 133 - 21 from U+0300, and decoding an offset with (CHARACTER + 22) % 133 + 10, where % is the non-negative remainder. The wraparound in the modulus sends LF to offset 111, i.e. U+036F, the last character in the block.
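The two formulas translate directly into code. Below is a minimal Rust sketch of the scheme as described above, not the crate's actual implementation: `encode_byte`, `decode_char`, `encode` and `decode` are hypothetical names, the base character is assumed to be `E` (as in the examples), and only LF and bytes 0x20–0x7F are accepted. Rust's `%` keeps the sign of the left operand, so `rem_euclid` is used to get the non-negative remainder the formula relies on.

```rust
/// Hypothetical check shared by encoder and decoder: LF or 0x20..=0x7F.
fn is_encodable(byte: u8) -> bool {
    byte == b'\n' || (0x20..=0x7F).contains(&byte)
}

/// Maps one byte to its combining character using
/// offset = (byte - 11) mod 133 - 21, counted from U+0300.
/// Bytes 0x00..=0x09 would also land inside the block but would not
/// decode back to themselves, so they are rejected up front.
fn encode_byte(byte: u8) -> Option<char> {
    if !is_encodable(byte) {
        return None;
    }
    // rem_euclid gives the non-negative remainder, so LF (10) wraps to 111.
    let offset = (i32::from(byte) - 11).rem_euclid(133) - 21;
    char::from_u32(0x300 + offset as u32)
}

/// Inverse mapping: byte = (offset + 22) mod 133 + 10.
fn decode_char(c: char) -> Option<u8> {
    let offset = u32::from(c).checked_sub(0x300)?;
    if offset > 0x6F {
        return None; // outside U+0300..=U+036F
    }
    // The result is at most 142, so it always fits in a u8.
    let byte = ((offset + 22) % 133 + 10) as u8;
    // Offsets 96..=110 decode to non-ASCII bytes; treat them as invalid.
    is_encodable(byte).then_some(byte)
}

/// Encodes a whole string: the base character 'E' followed by one combining
/// character per byte. Returns None if any byte cannot be encoded.
fn encode(text: &str) -> Option<String> {
    let mut out = String::with_capacity(1 + 2 * text.len());
    out.push('E');
    for byte in text.bytes() {
        out.push(encode_byte(byte)?);
    }
    Some(out)
}

/// Decodes a string produced by `encode`, skipping the base character.
fn decode(encoded: &str) -> Option<String> {
    let mut chars = encoded.chars();
    if chars.next()? != 'E' {
        return None;
    }
    chars.map(|c| decode_char(c).map(char::from)).collect()
}

fn main() {
    let source = "fn add(x: i32, y: i32) -> i32 {x + y}";
    let encoded = encode(source).expect("source is plain ASCII");
    assert_eq!(decode(&encoded).as_deref(), Some(source));
    println!("{} bytes -> {} bytes", source.len(), encoded.len());
}
```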

Links

The original post where the Python code was first presented, together with the above explanation.
docs.rs.
crates.io.

You might also like...
Shows how to implement USB device on RP2040 in Rust, in a single file, with no hidden parts.

Rust RP2040 USB Device Example This is a worked example of implementing a USB device on the RP2040 microcontroller, in Rust. It is designed to be easy

Download a single file from a Git repository.

git-download Microservices architecture requires sharing service definition files like in protocol buffer, for clients to access the server. To share

Stack heap flexible string designed to improve performance for Rust

flexible-string A stack heap flexible string designed to improve performance. FlexibleString was first implemented in spdlog-rs crate, which improved

Simple string matching with questionmark- and star-wildcard operator

wildmatch Match strings against a simple wildcard pattern. Tests a wildcard pattern p against an input string s. Returns true only when p matches the

A special rope, designed to work with any data type that is not String

AnyRope AnyRope is an arbitrary data type rope for Rust, designed for similar operations that a rope would do, but targeted at data types that are not

Compact, clone-on-write vector and string.

ecow Compact, clone-on-write vector and string. Types An EcoVec is a reference-counted clone-on-write vector. It takes up two words of space (= 2 usiz

Parses a relative time string and returns a `Duration`

humantime_to_duration A Rust crate for parsing human-readable relative time strings and converting them to a Duration. Features Parses a variety of hu

Idiomatic Rust implementations for various Windows string types (like UNICODE_STRING)

nt-string by Colin Finck Provides idiomatic Rust implementations for various Windows string types: NtUnicodeString (with NtUnicode

Rust based magic-string with source map chains support

enhanced-magic-string Rust implementation of https://www.npmjs.com/package/magic-string with original sourcemap chain support. license. This project i

Comments
  • Embed macro

    I saw this on Reddit, and specifically your comment

    I don't know enough about Rust macros to know if it would be possible to create a macro that decodes a character like this and then passes the result on to the compiler. Does anyone know if this is feasible?

    Inspired me to give it a go, this PR adds a zalgo_embed macro that does exactly this.

    opened by alexkeizer 4
Owner
Johanna Sörngård
Currently PhDing at the physics department at Stockholm University. Enjoys collecting cool rocks, and also other cool things that are not rocks.
API for the creation of character-based games in Linux.

Console Game Engine for Linux. API for the creation of character based games in Linux. The inspiration came from the olcConsoleGameEngine. This is my

Arjob Mukherjee 4 Sep 27, 2022
📱️🚫️🌝️💾️ 3FakeIM is a joke program meant to imitate various fictional characters, and the "[CHARACTER] CALLED ME AT 3:00 AM" clickbait trend, while poking fun.

3FakeIM 📱️🚫️🌝️💾️ 3FakeIM is a joke program meant to imitate various fictional characters, and the "[CHARACTER] CALLED ME AT 3:00 AM" clickbait tre

Sean P. Myrick V19.1.7.2 2 Jul 3, 2023
Convert character to binary using Rust.

Character-to-Binary-Rust This is a simple operation that is used to convert character to binary using Rust. Installation and Requirements First instal

Kariappa K R 8 Nov 20, 2023
A single-producer single-consumer Rust queue with smart batching

Batching Queue A library that implements smart batching between a producer and a consumer. In other words, a single-producer single-consumer queue tha

Roland Kuhn 2 Dec 21, 2021
A turing-complete programming language using only zero-width unicode characters, inspired by brainfuck and whitespace.

Zero-Width A turing-complete programming language using only zero-width unicode characters, inspired by brainfuck and whitespace. Currently a (possibl

Gavin M 2 Jan 14, 2022
Like wc, but unicode-aware, and with per-line mode

Skyler Hawthorne 34 May 24, 2022
UNIC: Unicode and Internationalization Crates for Rust

UNIC: Unicode and Internationalization Crates for Rust https://github.com/open-i18n/rust-unic UNIC is a project to develop components for the Rust pro

open-i18n — Open Internationalization Initiative 219 Nov 12, 2022
OOLANG - an esoteric stack-based programming language where all instructions/commands are different unicode O characters

OOLANG is an esoteric stack-based programming language where all instructions/commands are different unicode O characters

RNM Enterprises 2 Mar 20, 2022
Utilities for converting Vega-Lite specs from the command line and Python

VlConvert VlConvert provides a Rust library, CLI utility, and Python library for converting Vega-Lite chart specifications into static images (SVG or

Vega 24 Feb 13, 2023
Rust crate for obfuscating string literals.

Obfustring This crate provides an obfuscation macro for string literals. This makes it easy to protect them from common reverse engineering attacks lik

7 Mar 1, 2023