Compact, clone-on-write vector and string.

Overview

ecow

Crates.io Documentation

Compact, clone-on-write vector and string.

Types

  • An EcoVec is a reference-counted clone-on-write vector. It takes up two words of space (= 2 usize) and has the same memory layout as a &[T] slice. Within its allocation, it stores a reference count, its capacity and its elements.

  • An EcoString is a reference-counted clone-on-write string with inline storage. It takes up 16 bytes of space. It has 15 bytes of inline storage and starting from 16 bytes it becomes an EcoVec<u8>.

Example

// This is stored inline.
let small = ecow::EcoString::from("Welcome");

// This spills to the heap, but only once: `big` and `third` share the
// same underlying allocation. Vectors and spilled strings are only
// really cloned upon mutation.
let big = small + " to earth! đŸŒ±";
let mut third = big.clone();

// This allocates again to mutate `third` without affecting `big`.
assert_eq!(third.pop(), Some('đŸŒ±'));
assert_eq!(third, "Welcome to earth! ");

Why should I use this instead of ...

Type Details
Vec<T> / String Normal vectors are a great general purpose data structure. But they have a quite big footprint (3 machine words) and are expensive to clone. The EcoVec has a bit of overhead for mutation, but is cheap to clone and only takes two words.
Arc<Vec<T>> / Arc<String> These require two allocations instead of one and are less convenient to mutate.
Arc<[T]> / Arc<str> While these require only one allocation, they aren't mutable.
Small vector Different trade-off. Great when there are few, small Ts, but expensive to clone when spilled to the heap.
Small string The EcoString combines different small string qualities into a very practical package: It has inline storage, a smaller footprint than a normal String, is efficient to clone even when spilled, and at the same time mutable.

License

This crate is dual-licensed under the MIT and Apache 2.0 licenses.

Comments
  • Setup GitHub actions for CI

    Setup GitHub actions for CI

    This sets up and action that runs on:

    • Pushes
    • Pull requests
    • A weekly schedule (to proactively point out any new warnings from new stable/beta releases)

    This enforces:

    • cargo fmt, cargo clippy, cargo test, and cargo doc warnings and hard errors on stable and beta
    • cargo check and cargo test hard errors on an MSRV version
    • cargo miri test on nightly
    opened by CosmicHorrorDev 8
  • Public API structure

    Public API structure

    I think it would make sense to expose the inline string size limit and then have the public API be something along the lines of

    ecow::{
        eco_vec,
        format_eco, // Maybe rename to eco_format to keep name ordering consistent?
        vec::{EcoVec, IntoIter},
        str::{EcoString, INLINE_SIZE_LIMIT},
    }
    

    The existence of IntoIter to me is enough to have a separate vec module at least (and there is the chance that other iterators could be added like Drain)

    I would be up for re-exporting or exclusively exposing EcoVec and EcoString in the root of the API though since they are likely going to be the most commonly use types

    opened by CosmicHorrorDev 6
  • Restructure public api

    Restructure public api

    Resolves #18

    This follows the described restructuring, renaming, and switches the unit tests to integration tests. The only change that the latter involved was switching a single call from .extend_from_byte_slice() to .extend_from_slice() since .extend_from_byte_slice() isn't included in the public API (and should get tested well by the EcoString tests anyways)

    opened by CosmicHorrorDev 5
  • Publish coverage to codecov?

    Publish coverage to codecov?

    Wanted to check before I go ahead and open a PR, but would you be fine with a CI job to publish code coverage information to codecov? I'm planning on adding property testing and fuzzing, and having code coverage information published would be helpful (and you can get a cool little badge for it)

    opened by CosmicHorrorDev 5
  • Loom scaffolding

    Loom scaffolding

    This just sets up the scaffolding for writing future tests using loom

    The atomics usage currently mirrors std, so I doubt there will be any issues, but it certainly still seems good to have

    opened by CosmicHorrorDev 3
  • Inline size for 32-bit systems

    Inline size for 32-bit systems

    I was doing some testing (got a lot planned for that btw), and I noticed that the test_mem_size test fails

    #[test]
    fn test_mem_size() {
        let word = mem::size_of::<usize>();
        assert_eq!(mem::size_of::<EcoVec<u8>>(), 2 * word);
        assert_eq!(mem::size_of::<Option<EcoVec<u8>>>(), 2 * word);
    
        if cfg!(target_endian = "little") {
            assert_eq!(mem::size_of::<EcoString>(), 2 * word);    // <-- Fails here. left: 16, right: 8
            assert_eq!(mem::size_of::<Option<EcoString>>(), 3 * word);
        }
    }
    

    It looks like the inline limit on 32-bit LE systems is still 15. I'd love to submit a PR to fix things, but I wasn't sure if the right approach was to use a smaller limit on 32-bit systems or to keep a larger limit even though the word size is smaller

    opened by CosmicHorrorDev 3
  • Fix `EcoVec`'s `Send` and `Sync` bounds

    Fix `EcoVec`'s `Send` and `Sync` bounds

    Last PR I can do for the day (got a lot of IRL stuff today), but it's another soundness fix

    This fixes EcoVec's Send and Sync bounds to match the standard library's Arc

    Arc-like constructs need both Send and Sync on the inner type to allow for Send or Sync. Without this it's trivial to violate the inner type's bound

    E.g. a type can be Send + !Sync, but that would allow you to create two Arcs on the same thread and then pass one of the Arcs to another thread. From there you have a reference to the type from multiple threads even though it's !Sync. A simple example program that makes miri angry

    use std::cell::Cell;
    
    use ecow::eco_vec;
    
    fn main() {
        let send_not_sync = Cell::new("foo");
        let one = eco_vec![send_not_sync];
        let another = one.clone();
    
        std::thread::spawn(move || loop {
            another[0].set("bar");
            another[0].set("foo");
        });
    
        // đŸ˜±
        loop {
            println!("{}", one[0].get());
        }
    }
    
    $ cargo +nightly miri run
    // -- 8< -- build output -- 8< --
    error: Undefined Behavior: Data race detected between (1) Read on thread `main` and (2) Write on thread `<unnamed>` at alloc1739+0x18. (2) just happened here
       --> /.../rustlib/src/rust/library/core/src/cell.rs:413:31
        |
    413 |         mem::replace(unsafe { &mut *self.value.get() }, val)
        |                               ^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) Read on thread `main` and (2) Write on thread `<unnamed>` at alloc1739+0x18. (2) just happened here
        |
    help: and (1) occurred earlier here
       --> src/main.rs:17:24
        |
    17  |         println!("{}", one[0].get());
        |                        ^^^^^^^^^^^^
        = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
        = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
        = note: BACKTRACE (of the first span):
        = note: inside `std::cell::Cell::<&str>::replace` at /.../rustlib/src/rust/library/core/src/cell.rs:413:31: 413:53
        = note: inside `std::cell::Cell::<&str>::set` at /.../rustlib/src/rust/library/core/src/cell.rs:363:19: 363:36
    note: inside closure
       --> src/main.rs:11:9
        |
    11  |         another[0].set("bar");
        |         ^^^^^^^^^^^^^^^^^^^^^
    
    note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
    
    error: aborting due to previous error
    

    For even more fun you can change one of the strings that get swapped to something really big and then you can get fun runtime issues.

    $ # --release optimizes stuff away
    $ # --show-all prevents arbitrary bytes from messing with my terminal session
    $ cargo run | bat --show-all
    // -- 8< -- searching for "home" -- 8< --
    9962   │ foo#/rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/io/mod.rsfailed·to·write·whole·bufferfor
           │ matter·errorfatal·runtime·error:·␊
    9963   │ thread·result·panicked·on·dropassertion·failed:·self.is_allocated()/.../ecow/s
           │ rc/vec.rsassertion·failed:·self.is_unique()assertion·failed:·target·>·
    self.capacity()␀␀␀␀␀␀␀␀␀attempt·to·add
           │ ·with·overflow␀␀␀␀attempt·to·subtract·with·overflow␀␀␀␀␀%                                                   26;46m␀␀␀␀␀␀␀␀␀␀attempt·to·multiply·with·overflowreference
           │ ·count·overflow␀␀␀␀␀␀␀,O\xFC\xFFhO\xFC\xFFjO\xFC\xFFlO\xFC\xFFnO\xFC\xFFthere·is·no·such·thing·as·a·relaxed·
           │ fence␀␀␀/rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/sync/atomic.rs␀>P\xFC\xFFSP\xFC\xFF
           │ hP\xFC\xFF}P\xFC\xFF\x92P\xFC\xFFfailed·to·spawn·thread/rustc/fc594f15669680fa70d255faec3ca3fb507c3405/libra
           │ ry/std/src/thread/mod.rsthread·name·may·not·contain·interior·null·bytes␀␀\xA2e\xFC\xFF\xDAe\xFC\xFFie\xFC\xF
           │ F\x89e\xFC\xFF\xE2g\xFC\xFF\u{1a}h\xFC\xFF\xA9g\xFC\xFF\xC9g\xFC\xFFinternal·error:·entered·unreachable·code
           │ /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/io/error/repr_bitpacked.rs␀␀\xFEx\xFC\xFF\u{
           │ 12}y\xFC\xFF·y\xFC\xFF3y\xFC\xFF/.../Pro␊
    

    (Being totally honest I'm just having fun seeing what wacky stuff can happen with Rust's UB. Sometimes I miss how cursed C programming can be)

    opened by CosmicHorrorDev 3
  • Undo broken invariant before panicking in ref count overflow

    Undo broken invariant before panicking in ref count overflow

    The last PR in today's barrage is an extra spooky one :ghost::ghost::ghost:

    I believe this should be enough to fix exception safety from overflowing the ref count. This prevents a possible use-after-free that can occur in incredibly degenerate programs. For example

    use ecow::EcoString;
    
    fn main() {
        let one = EcoString::from("This is so large that it's a `Repr::Large`");
        let two = one.clone();
    
        for _ in 0..(usize::MAX - 1) {
            let _ = std::panic::catch_unwind(|| {
                std::mem::forget(two.clone());
            });
        }
    
        // The ref count is now at 2 + usize::MAX - 1 aka we overflowed to 1 :D
    
        drop(two);
    
        // Now that the ref count is 0 the backing allocation gets dropped, buuuuut we still have a
        // reference to it which lets us UAF
    
        println!("Boom {one}");
    }
    

    Considering this involves overflowing a usize it takes a considerable amount of time (I tried optimizing things, but even using cross to run on a 32-bit platform gets really slow when we start catching panics)

    If you wind up adding an std feature you could change ref_count_overflow() to abort when the feature is enabled which matches the standard library's handling

    opened by CosmicHorrorDev 3
  • Defensively test against mistakes found in the advisory db

    Defensively test against mistakes found in the advisory db

    I figured it would be worthwhile to start digging through issues reported in the advisory-db. So far it hasn't found anything interesting, but this includes areas that could have issues from future optimizations (primarily avoiding extra bounds/capacity checking)

    This is just the first set since there's more I would like to add (namely around panicking in either clone() or drop(), but that seems fine so far). Common sources of issues seem to be

    • Wrong Send or Sync bounds: already fixed
    • Forgetting to bounds check, or off by one error when bounds checking: already seems decently tested, but could be worth some property testing
    • Panicking in unexpected places: everything seems to update the bounds correctly to avoid dropping things multiple times, but could use some more testing since it's something that could get some error-prone performance improvements in the future
    • Using incorrect alignment: seems fine to me at a glance, and has some existing tests
    • Multiple mutable references to the same data: seems to correctly check uniqueness before allowing for mutation, although some more debug_assert!s wouldn't hurt
    • Use after free: Only issue I found was that extremely degenerate clone() issue. All the lifetimes seem right, so far
    • Making assumptions about #[repr(Rust)] layout: EcoVec seems appropriately #[repr(C)] and running with RUSTFLAGS="-Zrandomize-layout" didn't find any issues

    Still have more auditing to do, but so far things seem very solid :+1:

    opened by CosmicHorrorDev 2
  • Made vec and string generic over ref counter

    Made vec and string generic over ref counter

    Added Counter trait

    Don't think you wanna merge as is because I had to make the Vec ptr field into a regular, nullable pointer as I couldn't figure out how to create a static Header<Rc> value for any Rc: Counter.

    Also fixed some clippy lints.

    opened by jesnor 1
  • fix(clippy): Ran `cargo clippy --fix`

    fix(clippy): Ran `cargo clippy --fix`

    It removed some unnecessary references and changed a str comparison to a char comparison, it also removed a clone in a test case, but I kept it because it seemed intentional.

    opened by jalil-salame 1
Releases(v0.1.1)
Owner
Typst
Compose papers faster: Focus on your text and let Typst take care of layout and formatting.
Typst
Thread-safe clone-on-write container for fast concurrent writing and reading.

sync_cow Thread-safe clone-on-write container for fast concurrent writing and reading. SyncCow is a container for concurrent writing and reading of da

null 40 Jan 16, 2023
A compact generational arena data structure for Rust.

Compact Generational Arena This crate provides Arena<T>, a contiguous growable container which assigns and returns IDs to values when they are added t

Chevy Ray Johnston 17 Dec 6, 2022
Powerful math lib for Vector, Matrix and Quaternion operations

An opinionated, powerful math lib for Vector2, Vector3, Matrix and Quaternion operations Vector2 Add, Sub, Div, Mul, Eq Distance Move towards target a

O(ʒ) 4 Mar 28, 2022
Powerful math lib for Vector, Matrix and Quaternion operations

An opinionated, powerful math lib for Vector2, Vector3, Matrix and Quaternion operations Vector2 Add, Sub, Div, Mul, Eq Distance Move towards target a

O(ʒ) 4 Mar 28, 2022
An AI-native lightweight, reliable, and high performance open-source vector database.

What is OasysDB? OasysDB is a vector database that can be used to store and query high-dimensional vectors. Our goal is to make OasysDB fast and easy

Oasys 3 Dec 25, 2023
A HashMap/Vector hybrid: efficient, ordered key-value data storage in Rust.

hashvec A HashVec is a hash map / dictionary whose key-value pairs are stored (and can be iterated over) in a fixed order, by default the order in whi

Skye Terran 2 May 16, 2022
Type erased vector. All elements have the same type.

Type erased vector. All elements have the same type. Designed to be type-erased as far as possible - most of the operations does not know about concre

null 7 Dec 3, 2022
Vemcache is an in-memory vector database.

Vemcache Vemcache is an in-memory vector database. Vemcache can be thought of as the Redis equivalent for vector databases. Getting Started Prerequisi

Faizaan Chishtie 8 May 21, 2023
The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.

Fast Vector Similarity Library Introduction The Fast Vector Similarity Library is designed to provide efficient computation of various similarity meas

Jeff Emanuel 243 Sep 6, 2023
The first compute-centric vector graphic video game

?? Vong This repository contains source code for the first native use of a compute-centric vector graphics video game, inspired by Pong. ✍ Authors @

Spencer C. Imbleau 29 Nov 24, 2023
Rust Vector for large amounts of data, that does not copy when growing, by using full `mmap`'d pages.

Large Vector Rust Vector for large amounts of data, that does not copy when growing, by using full mmap'd pages. Maturity I made ths to learn about mm

Wonko der VerstÀndige 21 Apr 23, 2024
sodiumoxide clone, now with less chlorides and more oxides

sodiumoxide2 Less Chlorides, more Oxides! This package is a fork of sodiumoxide, using rust-native crypto. It's not intended to be easy to use or impl

Conrad Ludgate 3 Apr 16, 2023
Simple string matching with questionmark- and star-wildcard operator

wildmatch Match strings against a simple wildcard pattern. Tests a wildcard pattern p against an input string s. Returns true only when p matches the

Armin Becher 38 Dec 18, 2022
Parses a relative time string and returns a `Duration`

humantime_to_duration A Rust crate for parsing human-readable relative time strings and converting them to a Duration. Features Parses a variety of hu

null 5 Apr 25, 2023
clone of grep cli written in Rust. From Chapter 12 of the Rust Programming Language book

minigrep is a clone of the grep cli in rust Minigrep will find a query string in a file. To test it out, clone the project and run cargo run body poem

Raunak Singh 1 Dec 14, 2021
Twitter clone written in Rust.

Percent A Work In Progress Twitter clone written in Rust. Requirements Docker compose Rust Building Run: cargo build Running Start database: docker-co

Nik 3 Dec 26, 2022
Derive `clone_from` for `Clone` in Rust

Derive Clone including clone_from This crate offers a derive macro called CloneFrom to derive Clone with a specialized clone_from implementation. When

Moritz Hoffmann 7 Mar 13, 2023
Aim is to make PCBoard clone that's finished on PCboards 100th birthday

icy_board Aim is to make PCBoard clone that's finished on PCboards 100th birthday - for sure it'll take much time. I've started to port a PPL decompil

Mike KrĂŒger 6 May 7, 2024
Stack heap flexible string designed to improve performance for Rust

flexible-string A stack heap flexible string designed to improve performance. FlexibleString was first implemented in spdlog-rs crate, which improved

Sprite 6 Feb 9, 2022