Fast array expressions on the stack, no-std compatible

Related tags

Cryptography strobe
Overview

Strobe

Fast, low-memory, elementwise array expressions on the stack. Compatible with no-std (and no-alloc) environments.

This crate provides array expressions of arbitrary depth and executes without allocation (if an output array is provided) or with a single allocation only for the output array.

Vectorization over segments of the data is achieved whenever the inner function is compatible with the compiler's loop vectorizer and matches the available (and enabled) SIMD instruction sets on the target CPU.

Who is it for?

If you're

  • doing multi-step array operations of size >> 64 elements,
  • and can formulate your array expression as a tree,
  • and are bottlenecked on allocation or can't allocate at all,
  • or need concretely bounded memory usage,
  • or are memory or CPU usage constrained,
  • or don't have the standard library,
  • or want to bag a speedup without hand-vectorizing,

then this may be helpful.

Who is it not for?

If, however, you're

  • doing single-step or small-size array operations,
  • or can't reasonably formulate your expression as a tree,
  • or can get the performance you need from a library with excellent ergonomics, like ndarray,
  • or need rigorous elimination of all panic branches,
  • or need absolutely the fastest and lowest-memory method possible and are willing and able and have time to hand-vectorize your application,

then this may not be helpful at all.

Example: (A - B) * (C + D) with one allocation

use strobe::{array, add, sub, mul};

// Generate some arrays to operate on
const NT: usize = 10_000;
let a = vec![1.25_f64; NT];
let b = vec![-5.32; NT];
let c = vec![1e-3; NT];
let d = vec![3.14; NT];

// Associate those arrays with inputs to the expression
let an = &mut array(&a);
let bn = &mut array(&b);
let cn = &mut array(&c);
let dn = &mut array(&d);

// Build the expression tree, then evaluate,
// allocating once for the output array purely for convenience
let y = mul(&mut sub(an, bn), &mut add(cn, dn)).eval();

// Check results for consistency
(0..NT).for_each(|i| { assert_eq!(y[i], (a[i] - b[i]) * (c[i] + d[i]) ) });

Example: Evaluation with zero allocation

While we use a simple example here, any strobe expression can be evaluated into existing storage in this way.

use strobe::{array, mul};

// Generate some arrays to operate on
const NT: usize = 10_000;
let a = vec![1.25_f64; NT];
let an0 = &mut array(&a);  // Two input nodes from `a`, for brevity
let an1 = &mut array(&a);

// Pre-allocate storage
let mut y = vec![0.0; NT];

// Build the expression tree, then evaluate into preallocated storage.
mul(an0, an1).eval_into(&mut y);

// Check results for consistency
(0..NT).for_each(|i| { assert_eq!(y[i], a[i] * a[i] ) });

Example: Custom expression nodes

Many common functions are already implemented. Ones that are not can be assembled using the unary, binary, ternary, and accumulator functions along with a matching function pointer or closure.

use strobe::{array, unary};

let x = [0.0_f64, 1.0, 2.0];
let mut xn = array(&x);

let sq_func = |a: &[f64], out: &mut [f64]| { (0..a.len()).for_each(|i| {out[i] = x[i].powi(2)}) };
let xsq = unary(&mut xn, &sq_func).eval();

(0..x.len()).for_each(|i| {assert_eq!(x[i] * x[i], xsq[i])});

Conspicuous Design Decisions and UAQ (Un-Asked Questions)

  • Why not implement the standard num ops?
    • Because we can't guarantee the lifetime of the returned node will match the lifetime of the references it owns, unless the outside scope is guaranteed to own both the inputs and outputs.
  • Why abandon the more idiomatic functional programming regime at the interface?
    • Same reason we don't have the standard num ops - unfortunately, this usage breaks lifetime guarantees.
  • Why isn't it panic-never compatible?
    • In order to guarantee the lengths of the data and eliminate panic branches in slice operations, we would need to use fixed-length input arrays and propagate those lengths with const generics. This would give unpleasant ergonomics and, because array lengths would need to be established at compile time, this would also prevent any useful interlanguage bindings.
  • Why do expressions need to be strict trees?
    • Because we are strictly on the stack, and we need mutable references to each node in order to use intermediate storage, which we need in order to allow vectorization, and since we can only have one mutable reference to a anything, this naturally produces a tree structure.
    • However since the input arrays do not need to be mutable, more than one expression node can refer to a given input array, which resolves many (but not all) potential cases where the strict tree structure might prove inadequate.
  • I can hand-vectorize my array operations and do them even faster with even less memory!
    • Cool! Have fun with that.

License

Licensed under either of

at your option.

Comments
  • Use criterion for benchmarks

    Use criterion for benchmarks

    • Remove existing bench harness
    • Add criterion as a dev dep
    • Add benchmark comparisons of various strategies for multiplying arrays of various sizes in benches/
    • Move release profile up to workspace root

    criterion.zip

    image image image image image
    opened by jlogan03 1
  • chore: release v0.1.3

    chore: release v0.1.3

    🤖 New release

    • strobe: 0.1.2 -> 0.1.3 (✓ API compatible changes)
    Changelog

    0.1.3 - 2023-09-23

    Other

    • Use criterion for benchmarks (#6)


    This PR was generated with release-plz.

    opened by github-actions[bot] 0
  • chore: release v0.1.2

    chore: release v0.1.2

    🤖 New release

    • strobe: 0.1.1 -> 0.1.2 (✓ API compatible changes)
    Changelog

    0.1.2 - 2023-09-20

    Other

    • Add iterator input (#4)


    This PR was generated with release-plz.

    opened by github-actions[bot] 0
  • Add iterator input

    Add iterator input

    • Add iterator input
    • Remove reset functionality & update benchmarks
    • Add benchmark of iterator input
    • Implement From<> -> Expr for array, slice, and iterator inputs
    • Duplicate README up a directory
    opened by jlogan03 0
  • chore: release v0.1.1

    chore: release v0.1.1

    🤖 New release

    • strobe: 0.1.0 -> 0.1.1 (✓ API compatible changes)
    Changelog

    0.1.1 - 2023-09-17

    Other

    • Merge pull request #3 from jlogan03/jlogan/readme
    • Populate readme with lib root docs and add readme field to Cargo.toml


    This PR was generated with release-plz.

    opened by github-actions[bot] 0
  • chore: release v0.1.0

    chore: release v0.1.0

    🤖 New release

    • strobe: 0.1.0
    Changelog

    0.1.0 - 2023-09-17

    Other

    • canonicalize license expression
    • add description
    • Squash early development.


    This PR was generated with release-plz.

    opened by github-actions[bot] 0
Releases(v0.1.3)
Owner
James Logan
Senior Scientific Software Engineer at Commonwealth Fusion Systems.
James Logan
std::simd implementation of BLAKE3

BLAKE3-STD the first blake3 implementation on std::simd ONLY COMPILES WITH NIGHTLY [dependencies] blake3-std = "0.0.1" OFFICIAL DOC BLAKE3 is a crypto

LemonHX 16 Nov 23, 2022
A no-std / no-alloc TLS 1.3 client

puny-tls - no-std/no-alloc TLS 1.3 client This is an improvement over tiny-tls-rs to make it more useable. However the only reason this exists is to r

Björn Quentin 2 Aug 22, 2022
EVM compatible chain with NPoS/PoC consensus

Reef Chain Reef chain is written in Rust. A basic familiarity with Rust tooling is required. To learn more about Reef chain, please refer to Documenta

Reef Finance 148 Dec 31, 2022
Trustworthy encrypted command line authenticator app compatible with multiple backups.

cotp - command line totp authenticator I believe that security is of paramount importance, especially in this digital world. I created cotp because I

Reply 71 Dec 30, 2022
Decrypt your LUKS partition using a FIDO2 compatible authenticator

fido2luks This will allow you to unlock your LUKS encrypted disk with an FIDO2 compatible key. Note: This has only been tested under Fedora 31, Ubuntu

null 118 Dec 24, 2022
An Ethereum compatible Substrate blockchain for bounties and governance for the Devcash community.

Substrate Node Template A fresh FRAME-based Substrate node, ready for hacking ?? Getting Started Follow the steps below to get started with the Node T

null 4 Mar 30, 2022
Eternally liquid. Forward compatible. Nested, conditional, & Multi-resourced NFTs.

RMRK Substrate Rust Setup First, complete the basic Rust setup instructions. Run Use Rust's native cargo command to build and launch the template node

RMRK Team 67 Dec 25, 2022
Fiddi is a command line tool that does the boring and complex process of checking and processing/watching transactions on EVM compatible Blockchain.

Fiddi is a command line tool that does the boring and complex process of checking and processing/watching transactions on EVM compatible Blockchain.

Ahmad Abdullahi Adamu 7 Jan 9, 2023
Selendra is a multichains interoperable nominated Proof-of-Stake network for developing and running Substrate-based and EVM compatible blockchain applications.

Selendra An interoperable nominated Proof-of-Stake network for developing and running Substrate-based and EVM compatible blockchain applications. Read

Selendra 16 Nov 29, 2022
Minimalistic EVM-compatible chain indexer.

EVM Indexer Minimalistic EVM-compatible blockchain indexer written in rust. This repository contains a program to index helpful information from any E

Kike B 14 Dec 24, 2022
Minimalistic EVM-compatible chain indexer.

EVM Indexer Minimalistic EVM-compatible blockchain indexer written in rust. This repository contains a program to index helpful information from any E

LlamaFolio 11 Dec 15, 2022
Reference library that implements all the necessary functionality for developing a client that is compatible with TAPLE's DLT network.

⚠️ TAPLE is in early development and should not be used in production ⚠️ TAPLE Core TAPLE (pronounced T+ ?? ['tapəl]) stands for Tracking (Autonomous)

Open Canarias 6 Jan 25, 2023
Write Anchor-compatible Solana programs in TypeScript

Axolotl Write Achor-compatible Solana programs using TypeScript. Writing Rust is hard, but safe. It's also the go-to language for writing Solana progr

Anthony Morris 17 Nov 27, 2022
A framework for developing EVM-compatible chains

rt-evm A compact development framework for creating EVM-compatible runtimes/chains. Usage Check the example for details. Projects referenced trie, MPT

Rust Util Collections 4 Mar 15, 2023
Y-Octo is a high-performance CRDT implementation compatible with yjs

Y-Octo Y-Octo is a high-performance CRDT implementation compatible with yjs. Introduction Y-Octo is a tiny, ultra-fast CRDT collaboration library buil

null 79 Oct 5, 2023
Flexible Rust implementation of the MuSig2 multisignature protocol, compatible with Bitcoin.

MuSig2 This crate provides a flexible rust implementation of MuSig2, an optimized digital signature aggregation protocol, on the secp256k1 elliptic cu

null 4 Oct 31, 2023
A Boring(SSL)-compatible API abstraction for Rust cryptographic implementations.

A Boring(SSL)-compatible API abstraction for Rust cryptographic implementations. What is Superboring? Superboring hides the complexity, diversity and

Frank Denis 7 Dec 29, 2023
A high-performance, highly compatible EVM Inscriptions Indexer

Insdexer A high-performance, highly compatible EVM Inscriptions Indexer by Rust. An accessible and complete version of the documentation is available

null 105 Mar 17, 2024
Cryptography-oriented big integer library with constant-time, stack-allocated (no_std-friendly) implementations of modern formulas

RustCrypto: Cryptographic Big Integers Pure Rust implementation of a big integer library which has been designed from the ground-up for use in cryptog

Rust Crypto 88 Dec 31, 2022