Rust port of simdjson

SIMD Json for Rust

Rust port of the extremely fast simdjson JSON parser with serde compatibility.


readme (for real!)

simdjson version

Currently tracking version 0.2.x of simdjson upstream (work in progress, feedback welcome!).

CPU target

To take advantage of simd-json, your system needs to be SIMD-capable. simd-json needs to be compiled with native CPU support and the relevant target features, and projects that use simd-json must also be configured with native CPU support. See the cargo config in this repository for an example of how to configure this in your project.

simd-json supports AVX2, SSE4.2 and NEON.
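Because these features are fixed when the code is compiled, it can help to check what a given build actually targets. The sketch below is not part of simd-json and uses only the standard library: cfg! reports the target features enabled at compile time, while is_x86_feature_detected! and is_aarch64_feature_detected! report what the running CPU supports. The function names compiled_features and runtime_features are illustrative. A build whose compile-time features exceed what the CPU reports is the kind of mismatch that typically ends in an "Illegal instruction" crash.

```rust
/// Features enabled at compile time. `cfg!` is evaluated by the compiler, so this
/// reflects RUSTFLAGS / -C target-cpu, not the machine the binary runs on.
fn compiled_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if cfg!(target_feature = "avx2") {
        found.push("avx2");
    }
    if cfg!(target_feature = "sse4.2") {
        found.push("sse4.2");
    }
    if cfg!(target_feature = "neon") {
        found.push("neon");
    }
    found
}

/// Features the current x86/x86_64 CPU supports, detected at run time.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if is_x86_feature_detected!("avx2") {
        found.push("avx2");
    }
    if is_x86_feature_detected!("sse4.2") {
        found.push("sse4.2");
    }
    found
}

/// Features the current aarch64 CPU supports, detected at run time.
#[cfg(target_arch = "aarch64")]
fn runtime_features() -> Vec<&'static str> {
    if std::arch::is_aarch64_feature_detected!("neon") {
        vec!["neon"]
    } else {
        Vec::new()
    }
}

/// Other architectures: none of the SIMD backends listed above apply.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
fn runtime_features() -> Vec<&'static str> {
    Vec::new()
}

fn main() {
    let compiled = compiled_features();
    let runtime = runtime_features();
    println!("compiled for: {:?}", compiled);
    println!("CPU supports: {:?}", runtime);
    for f in &compiled {
        if !runtime.contains(f) {
            println!("warning: built for {f}, but this CPU does not report it");
        }
    }
}
```

Running it once with and once without -C target-cpu=native shows how the compile-time list changes while the run-time list stays the same.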

Unless the allow-non-simd feature is enabled for your simd-json dependency in Cargo.toml, simd-json will fail to compile on targets without these SIMD features. This is to prevent unexpected slowness in fallback mode, which can be hard to understand and hard to debug.

allocator

For best performance, we strongly suggest using mimalloc or jemalloc instead of the system allocator used by default. Another recent allocator that works well (though we have yet to test it in a production setting) is snmalloc.
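In Rust, swapping the allocator goes through the #[global_allocator] attribute. With the mimalloc crate added as a dependency, this is typically a single static (shown as a comment below). To stay dependency-free, the runnable part of this sketch wraps std's System allocator in a hypothetical CountingAlloc that only counts allocations. It does not make anything faster; it only demonstrates the hook that mimalloc, jemalloc or snmalloc plug into.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// With the mimalloc crate in Cargo.toml, the swap would typically be:
//     #[global_allocator]
//     static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
// This sketch uses the same hook with a hypothetical counting wrapper instead.

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator: forwards to the system allocator and counts calls.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in this binary now goes through CountingAlloc.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations observed while running `f` (the counter only grows).
fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

fn main() {
    let n = allocations_during(|| {
        let buf: Vec<u8> = Vec::with_capacity(4096);
        // Keep the optimizer from eliding the allocation.
        std::hint::black_box(&buf);
    });
    println!("allocations: {n}");
}
```

Because the attribute applies to the whole binary, the choice of allocator belongs in the final application rather than in a library such as simd-json.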

serde

simd-json is compatible with serde and serde-json. The Value types it provides implement serializers and deserializers. In addition, simd-json implements the Deserializer trait for the parser, so it can deserialize anything that implements the serde Deserialize trait. Note that serde provides both a Deserializer and a Deserialize trait.

The serde support is contained in the serde_impl feature, which is part of simd-json's default feature set but can be disabled.

known-key

The known-key feature changes the hash mechanism for the DOM representation of the underlying JSON object from ahash to fxhash. The ahash hasher is faster at hashing and provides protection against DoS attacks that try to force many keys into a single hash bucket. The fxhash hasher, on the other hand, produces repeatable hashing results, which in turn allows memoizing hashes for well-known keys and saving time on lookups. In workloads that frequently access a few well-known keys, this can be a performance advantage.

The known-key feature is optional, disabled by default, and must be enabled explicitly.
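To illustrate why repeatable hashes enable memoization, here is a std-only sketch. It is not simd-json's API: KnownKey, hash_with and FixedState are hypothetical names, and std's DefaultHasher with a fixed default state stands in for fxhash. The key is hashed once up front, and lookups compare the stored hash before comparing strings, so the key is never rehashed. With a randomly seeded hasher such as std's RandomState, a hash memoized under one seed would not match a table built with another.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Deterministic hasher builder: every hasher starts from the same state,
/// so a given key always hashes the same way (stand-in for fxhash).
type FixedState = BuildHasherDefault<DefaultHasher>;

fn hash_with<S: BuildHasher>(state: &S, key: &str) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical known key: its hash is computed once and reused for every lookup.
struct KnownKey {
    key: &'static str,
    hash: u64,
}

impl KnownKey {
    fn new(key: &'static str) -> Self {
        KnownKey { key, hash: hash_with(&FixedState::default(), key) }
    }

    /// Searches (hash, key, value) entries, checking the cheap precomputed hash
    /// before the string comparison. No hashing happens here.
    fn lookup<'a, V>(&self, entries: &'a [(u64, String, V)]) -> Option<&'a V> {
        entries
            .iter()
            .find(|(hash, key, _)| *hash == self.hash && key == self.key)
            .map(|(_, _, value)| value)
    }
}

fn main() {
    let state = FixedState::default();
    let entries: Vec<(u64, String, i32)> = ["id", "name", "age"]
        .iter()
        .zip([1, 2, 3])
        .map(|(k, v)| (hash_with(&state, k), k.to_string(), v))
        .collect();

    let name = KnownKey::new("name");
    println!("{:?}", name.lookup(&entries)); // prints "Some(2)"
}
```

A real hash map would use the memoized hash to pick a bucket directly; the linear scan here only keeps the sketch short.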

serializing

simd-json is not capable of serializing JSON data as there would be very little gain in re-implementing it. For serialization, we typically rely on serde-json.

For DOM values we provide convenience methods for serialization.

For struct values we defer to external serde-compatible serialization mechanisms.

unsafe

simd-json uses a lot of unsafe code.

There are a few reasons for this:

  • SIMD intrinsics are inherently unsafe. These uses of unsafe are inescapable in a library such as simd-json.
  • We work around some performance bottlenecks imposed by safe Rust. These uses are avoidable, but only at a cost to performance. This is a more deliberate choice in simd-json.

simd-json goes through extra scrutiny for unsafe code. These steps are:

  • Unit tests - to test 'the obvious' cases, edge cases, and regression cases
  • Structural constructive property-based testing - We generate random valid JSON objects to exercise the full simd-json codebase stochastically. Floats are currently excluded, since slightly different parsing algorithms lead to slightly different results here. In short, this answers the question "is simd-json correct?"
  • Data-oriented property-based testing of string-like data - to assert that sequences of legal printable characters don't panic or crash the parser (they may, and often will, produce errors, since they are not valid JSON!)
  • Destructive property-based testing - to make sure that no illegal byte sequence crashes the parser in any way
  • Fuzzing (using American Fuzzy Lop - afl) - fuzz based on upstream simdjson pass/fail cases
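The destructive property-based testing idea can be sketched without external crates. This is a simplified illustration, not simd-json's actual test suite: find_panics and the XorShift generator are hypothetical, and std::str::from_utf8 stands in for the parser under test. The property checked is that random bytes may be rejected with an error but must never cause a panic. Note that catch_unwind only observes Rust panics; memory corruption or a segfault would abort the whole process rather than be reported.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Minimal xorshift64 generator (hypothetical helper) so the sketch needs no crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feeds `iterations` random byte buffers (0..64 bytes) to `parse` and returns
/// every input that made it panic. Returning an error is fine; panicking is not.
fn find_panics<F: Fn(&mut [u8])>(parse: F, iterations: usize, seed: u64) -> Vec<Vec<u8>> {
    let mut rng = XorShift(seed.max(1)); // xorshift must never be seeded with 0
    let mut failures = Vec::new();
    for _ in 0..iterations {
        let len = (rng.next() % 64) as usize;
        let mut input: Vec<u8> = (0..len).map(|_| rng.next() as u8).collect();
        let original = input.clone(); // the parser may mutate its buffer in place
        let outcome = panic::catch_unwind(AssertUnwindSafe(|| parse(input.as_mut_slice())));
        if outcome.is_err() {
            failures.push(original);
        }
    }
    failures
}

fn main() {
    // from_utf8 returns an error for invalid input instead of panicking,
    // so this run is expected to find no failing inputs.
    let failures = find_panics(|buf| { let _ = std::str::from_utf8(buf); }, 10_000, 42);
    println!("{} panicking inputs found", failures.len());
}
```

Keeping the failing inputs (rather than just a count) makes it easy to turn each one into a regression unit test.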

This doesn't ensure complete safety, nor is it a bulletproof guarantee, but it goes a long way toward asserting that the library is production quality and fit for purpose in practical industrial applications.

Other interesting things

There are also bindings for upstream simdjson available here

License

simd-json itself is licensed under either of

However, it ports a lot of code from simdjson, so their work and copyright on that code should be respected alongside it.

The serde integration is based on their example and on serde-json, so their copyright should be respected as well.

Comments
  • RFC: Neon support (pretty much working)

    Hello hello!

    I have been pulling some of your Neon intrinsics and porting the simdjson neon code. Maybe it's useful! I'll keep improving it... Comments welcome anytime!

    All the best,

    -Sunny

    opened by sunnygleason 50
  • 0.2 work

    This is a work branch for the 0.2 release where we can introduce breaking changes for things we do not like in 0.1.

    So far:

    • [x] Add U64 type
    • [x] Remove deprecated functions
    • [x] Box objects to reduce enum size
    • [x] Add object and array access to value trait for convenience
    • [x] Arch alignment
    opened by Licenser 21
  • Test Rust optimization when target-features has "allow-non-simd" and the CPU flag is set

    Summary

    To enable SIMD one has to do these things

    1. In Cargo.toml, set the target-features:
    simd-json = { version = "0.4", features = ["allow-non-simd", "known-key", "serde_impl"] }

    2. Pass the CPU flag either directly to rustc or by adding the following to .cargo/config:
    [build]
    rustflags = "-C target-cpu=native"

    But sometimes, despite adding the above, and especially with allow-non-simd in the features, the result is still not SIMD compatible code.

    To do

    Write tests that help identify if the rustc optimization is prevented by modifying Cargo.toml

    simd-json = { version = "0.4", features = ["known-key", "serde_impl"] }
    
    opened by amanjeev 17
  • Support for Apple Silicon(aarch64, M1)

    I ran into a problem while compiling.

       Compiling value-trait v0.1.19
    error[E0425]: cannot find function `write_str_simd` in this scope
       --> /Users/jack/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/value-trait-0.1.19/src/generator.rs:162:19
        |
    162 |             stry!(write_str_simd(self.get_writer(), &mut string,));
        |                   ^^^^^^^^^^^^^^ not found in this scope
    

    I looked up the value-trait and added

    value-trait = { version = "*", features = ["neon"] }
    

    The problem became

       Compiling value-trait v0.1.19
    error[E0614]: type `u8` cannot be dereferenced
       --> /Users/jack/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/value-trait-0.1.19/src/generator.rs:669:48
        |
    669 |                 b'u' => stry!(u_encode(writer, *ch)),
        |                                                ^^^
    

    Since value-trait is not in active development, I post the issue here as well.

    ➜  ~ neofetch
    OS: macOS 11.1 20C69 arm64
    Host: MacBookAir10,1
    Kernel: 20.2.0
    Uptime: 18 hours, 36 mins
    Packages: 45 (brew)
    Shell: zsh 5.8
    Resolution: 1440x900
    DE: Aqua
    WM: Quartz Compositor
    WM Theme: Blue (Dark)
    Terminal: iTerm2
    Terminal Font: FiraCode-Regular 12
    CPU: Apple M1
    GPU: Apple M1
    Memory: 1296MiB / 8192MiB

    opened by qiujiangkun 15
  • Implement number parsing from simdjson v0.3.1

    This includes accurate and consistent float parsing. Float parsing is generally handled by the fast path but falls back to lexical-core.

    This passes number tests from JSONTestSuite except n_multidigit_number_then_00.json because of the trailing padding.

    opened by ijl 14
  • Illegal Instruction (core dumped)

    Important to note:

    • According to cat /proc/cpuinfo, my cpu supports SSE4.2.
    • I have the .cargo/config of this repository in place
    • I am using rustc 1.53.0

    After compiling the program, it crashes with the message Illegal Instruction (core dumped).

    How can I fix this?

    opened by superblaubeere27 12
  • RFC: implementation of SSE 4.2 compatible parsing (incl. utf8)

    Greetings! Thank you so much for your amazing work with simdjson-rs.

    I did some work earlier this year on an SSE 4.2 port of simdjson, and I'd love to gain more experience with Rust.

    I humbly submit this code for your comment & consideration. I'm not an expert in Rust feature detection & conditional compilation, but hopefully this code gives an idea of what the SSE-compatible code looks like.

    If you think it is an interesting idea, I'd be happy to do whatever work to get it in shape for possible merging.

    The items that are still noticeable TODO:

    • utf8 validation (can port SSE version for conditional usage)
    • conditional compilation/feature detection

    Thank you again for your consideration!

    Sincerely,

    -Sunny Gleason

    opened by sunnygleason 12
  • error[E0432]: unresolved import `crate::implementation::aarch64::neon`

    error[E0432]: unresolved import `crate::implementation::aarch64::neon`
       --> /Users/davirain/.cargo/registry/src/github.com-1ecc6299db9ec823/simdutf8-0.1.3/src/basic.rs:223:53
        |
    223 |             pub use crate::implementation::aarch64::neon::validate_utf8_basic as validate_utf8;
        |                                                     ^^^^ could not find `neon` in `aarch64`
    
    error[E0432]: unresolved import `crate::implementation::aarch64::neon`
       --> /Users/davirain/.cargo/registry/src/github.com-1ecc6299db9ec823/simdutf8-0.1.3/src/basic.rs:224:53
        |
    224 |             pub use crate::implementation::aarch64::neon::ChunkedUtf8ValidatorImp;
        |                                                     ^^^^ could not find `neon` in `aarch64`
    
    error[E0432]: unresolved import `crate::implementation::aarch64::neon`
       --> /Users/davirain/.cargo/registry/src/github.com-1ecc6299db9ec823/simdutf8-0.1.3/src/basic.rs:225:53
        |
    225 |             pub use crate::implementation::aarch64::neon::Utf8ValidatorImp;
        |                                                     ^^^^ could not find `neon` in `aarch64`
    
    error[E0432]: unresolved import `crate::implementation::aarch64::neon`
       --> /Users/davirain/.cargo/registry/src/github.com-1ecc6299db9ec823/simdutf8-0.1.3/src/compat.rs:124:53
        |
    124 |             pub use crate::implementation::aarch64::neon::validate_utf8_compat as validate_utf8;
        |                                                     ^^^^ could not find `neon` in `aarch64`
    
    For more information about this error, try `rustc --explain E0432`.
    error: could not compile `simdutf8` due to 4 previous errors
    warning: build failed, waiting for other jobs to finish...
    error: build failed
    
    opened by DaviRain-Su 11
  • Use simdutf8 for UTF-8 validation

    This went rather well. LLVM does an awesome job optimizing and the benchmark results are virtually identical.

    Still experimenting a bit with the API in https://github.com/rusticstuff/simdutf8/pull/44.

    Comments welcome.

    opened by hkratz 11
  • Use with serde_derive

    Is it possible to use this crate with serde_derive to speed up parsing? E.g. currently I have

    use serde::{Deserialize, Serialize};
    use std::fs::File;
    use std::io::BufReader;

    #[derive(Serialize, Deserialize, Debug)]
    pub struct Foo {
      pub a: u64,
      pub b: Vec<u64>,
      pub c: Option<u64>,
    }
    ...
    
      let file = File::open(filename)?;
      let reader = BufReader::new(file);
      let result: Foo = serde_json::from_reader(reader)?;
    

    Is there a way to do something like this?

      let result: Foo = simd_json::from_reader(reader)?;
    
    opened by Timmmm 11
  • Improved number handling

    It would be nice to improve number handling. Right now we only have I64 and F64, which ignores U64 and extended values such as i128 and u128.

    Defaulting to the 128-bit types would probably decrease performance, so we might want to avoid that. We could, however, add them as additional types, but that would explode the number of types we have (although ValueTrait mitigates that).

    We definitely want a u64 type alongside the i64 one to be able to represent those numbers.

    enhancement medium 
    opened by Licenser 11
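The number representation discussed in "Improved number handling" can be sketched as follows, assuming a hypothetical Number enum with separate I64, U64 and F64 variants (not simd-json's actual types). Integers are tried as i64 first, then as u64 for values above i64::MAX, and anything else falls back to f64. Rust's str::parse accepts some forms JSON forbids, such as a leading '+', so a real parser would validate the JSON grammar before this step.

```rust
/// Hypothetical number representation with a dedicated U64 variant.
#[derive(Debug, PartialEq)]
enum Number {
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Signed integers first, u64 for values above i64::MAX, f64 for everything else.
fn classify(s: &str) -> Option<Number> {
    if let Ok(i) = s.parse::<i64>() {
        return Some(Number::I64(i));
    }
    if let Ok(u) = s.parse::<u64>() {
        return Some(Number::U64(u));
    }
    s.parse::<f64>().ok().map(Number::F64)
}

fn main() {
    for s in ["42", "-7", "18446744073709551615", "18446744073709551616", "1.5"] {
        println!("{s} -> {:?}", classify(s));
    }
}
```

Without the U64 variant, 18446744073709551615 would have to be stored as an f64 and lose precision, which is the gap the issue points out.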
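The "fast path with fallback" structure described in the "Implement number parsing from simdjson v0.3.1" pull request can be illustrated with the classic Clinger fast path. This is a sketch under stated assumptions, not the ported simdjson code: when all digits fit into an integer mantissa of at most 2^53 and the decimal exponent is within ±22, both operands are exactly representable as f64, so a single multiplication or division is correctly rounded. Anything else goes to a slower exact parser; std's str::parse::<f64> stands in for lexical-core here. fast_path and parse_float are hypothetical names, and the code does not validate JSON grammar.

```rust
/// Powers of ten that f64 represents exactly (10^0 ..= 10^22).
const POW10: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

/// Clinger-style fast path. Returns None when the input is outside the range
/// where a single IEEE multiply/divide is guaranteed to round correctly.
fn fast_path(s: &str) -> Option<f64> {
    let (negative, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    // Split "123.45e-3" into the digits part "123.45" and the exponent -3.
    let (num, exp): (&str, i64) = match body.find(|c: char| c == 'e' || c == 'E') {
        Some(pos) => (&body[..pos], body[pos + 1..].parse().ok()?),
        None => (body, 0),
    };
    let (int_part, frac_part) = num.split_once('.').unwrap_or((num, ""));
    if int_part.is_empty() {
        return None;
    }
    // Accumulate all digits into one integer mantissa; bail out on overflow.
    let mut mantissa: u64 = 0;
    for b in int_part.bytes().chain(frac_part.bytes()) {
        if !b.is_ascii_digit() {
            return None;
        }
        mantissa = mantissa.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
    }
    let exp10 = exp.checked_sub(frac_part.len() as i64)?;
    // Both operands must be exact: mantissa <= 2^53 and |exp10| <= 22.
    if mantissa > (1u64 << 53) || exp10.unsigned_abs() > 22 {
        return None;
    }
    let m = mantissa as f64;
    let value = if exp10 >= 0 {
        m * POW10[exp10 as usize]
    } else {
        m / POW10[exp10.unsigned_abs() as usize]
    };
    Some(if negative { -value } else { value })
}

/// Fast path first; fall back to std's correctly rounded parser otherwise.
fn parse_float(s: &str) -> Option<f64> {
    fast_path(s).or_else(|| s.parse::<f64>().ok())
}

fn main() {
    for s in ["1.5", "-0.1", "6.02214076e23", "123456789012345678901234567890"] {
        println!("{s} -> {:?}", parse_float(s));
    }
}
```

Because the fast path is exact within its range, its results match the slow parser bit for bit; it only changes how quickly common inputs are handled.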