rsonpath – SIMD-powered JSONPath
Experimental JSONPath engine for querying massive streamed datasets.
Features
The `rsonpath` crate provides a JSONPath parser and a query execution engine, which uses SIMD instructions to provide massive throughput improvements over conventional engines.
Benchmarks of `rsonpath` against a reference no-SIMD engine on the Pison dataset. NOTE: Scale is logarithmic!
Supported selectors
The project is actively developed and currently supports only a subset of the JSONPath query language.
Selector | Syntax | Supported | Since | Tracking Issue |
---|---|---|---|---|
Root | $ | Yes | v0.1.0 | |
Dot | .<label> | Yes | v0.1.0 | |
Index (object member) | [<label>] | Yes | v0.1.0 | |
Index (array index) | [<index>] | No | - | #64 |
Index (array index from end) | [-<index>] | No | - | |
Descendant | .. | Yes | v0.1.0 | |
Child wildcard | .*, .[*] | Yes | v0.3.0 | |
Descendant wildcard | ..*, ..[*] | Yes | v0.4.0 | |
Slice | [<start>:<end>:<step>] | No | - | |
List | [<sel1>, <sel2>, ..., <selN>] | No | - | |
Filter | [?(<expr>)] | No | - | |
Installation
See Releases for precompiled binaries for all first-class support targets.
The easiest way to install is via `cargo`.
cargo install rsonpath
This might fail with the following error:
Target architecture is not supported by SIMD features of this crate. Disable the default `simd` feature.
This means the SIMD features of the engine are not implemented for the machine's CPU. You can still use `rsonpath`, but the speed will be limited (see the reference engine in the chart above). To install without SIMD, run `cargo install rsonpath --no-default-features -F default-optimizations`.
Alternatively, you can download the source code and manually run `just install` (requires `just`) or `cargo install --path ./crates/rsonpath`.
Native CPU optimizations
If maximum speed is paramount, you should install `rsonpath` with native CPU instructions support. This will result in a binary that is not portable and might work incorrectly on any other machine, but will squeeze out every last bit of throughput.
To do this, run the following `cargo install` variant:
RUSTFLAGS="-C target-cpu=native" cargo install rsonpath
Usage
To run a JSONPath query on a file, execute:
rsonpath '$..a.b' ./file.json
If the file is omitted, the engine reads standard input.
For details, consult `rsonpath --help`.
Results
The results are presented as an array of indices into the input, one per match. Each index points at the colon preceding a matched object member's value, at the comma directly preceding a matched element of a list, or at the opening bracket of the list when the match is its first element. Alternatively, passing `--result count` returns only the number of matches. Work to support more useful result reports is ongoing.
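To make the index format concrete, here is a minimal sketch that scans a JSON document naively and records the position of the colon following every member with a given label. It only illustrates the output format and is not how the engine works. `colon_indices` is a hypothetical helper and not part of rsonpath. It ignores nesting depth (so it behaves roughly like a `$..label` query), compares raw label bytes, and assumes that indices are byte offsets into the input.

```rust
/// Hypothetical helper (not part of rsonpath): returns the byte offset of the
/// colon following every object member whose raw label equals `label`.
pub fn colon_indices(json: &str, label: &str) -> Vec<usize> {
    let bytes = json.as_bytes();
    let mut result = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'"' {
            i += 1;
            continue;
        }
        // Found an opening quote; scan to the matching closing quote,
        // stepping over escaped characters such as \" along the way.
        let start = i + 1;
        let mut end = start;
        while end < bytes.len() && bytes[end] != b'"' {
            end += if bytes[end] == b'\\' { 2 } else { 1 };
        }
        if end >= bytes.len() {
            break; // unterminated string, nothing more to scan
        }
        // Skip whitespace after the closing quote; a colon there means
        // the string was a member label rather than a value.
        let mut next = end + 1;
        while next < bytes.len() && bytes[next].is_ascii_whitespace() {
            next += 1;
        }
        let is_label = next < bytes.len() && bytes[next] == b':';
        if is_label && &bytes[start..end] == label.as_bytes() {
            result.push(next);
        }
        i = end + 1;
    }
    result
}
```

For the input `{"key":"value","key":"other value"}` and the label `key`, this sketch yields the offsets 6 and 20, the positions of the two colons that follow a `"key"` label.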
Engine
By default, the main SIMD engine is used. On machines not supporting SIMD, the recursive implementation might be faster in some cases. To change the engine, use `--engine recursive`.
Supported platforms
The crate is continuously built for all Tier 1 Rust targets, and tests are continuously run for targets that can be run on GitHub Actions images. SIMD is supported only on x86-64 platforms for AVX2, while nosimd builds are always available for all targets.
Target triple | nosimd build | SIMD support | Continuous testing | Tracking issues |
---|---|---|---|---|
aarch64-unknown-linux-gnu | | | | #21, #115 |
i686-unknown-linux-gnu | | | | #14 |
x86_64-unknown-linux-gnu | | | | |
x86_64-apple-darwin | | | | |
i686-pc-windows-gnu | | | | #14 |
i686-pc-windows-msvc | | | | #14 |
x86_64-pc-windows-gnu | | | | |
x86_64-pc-windows-msvc | | | | |
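Since SIMD support is tied to AVX2 on x86-64, it can be useful to check whether a given machine offers AVX2 before choosing between the default and nosimd builds. The following is a minimal sketch using the standard library's `is_x86_feature_detected!` macro. `avx2_available` and `suggested_build` are hypothetical helper names, and this runtime check is an illustration only, not how rsonpath itself selects its implementation.

```rust
/// Hypothetical helper: reports whether the CPU running this program
/// advertises AVX2. On non-x86-64 targets it always returns `false`.
#[cfg(target_arch = "x86_64")]
pub fn avx2_available() -> bool {
    std::is_x86_feature_detected!("avx2")
}

#[cfg(not(target_arch = "x86_64"))]
pub fn avx2_available() -> bool {
    false
}

/// Maps the detection result to the build flavour named in the table above.
pub fn suggested_build(avx2: bool) -> &'static str {
    if avx2 {
        "simd"
    } else {
        "nosimd"
    }
}
```

Calling `suggested_build(avx2_available())` on the installation machine gives a quick hint as to which of the installation variants described in this README is likely to apply.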
Caveats and limitations
JSONPath
Not all selectors are supported, see the support table above.
Duplicate keys
The engine assumes that every object in the input JSON has no duplicate keys. Behavior on duplicate keys is not guaranteed to be stable, but currently the engine will simply match the first such key.
> rsonpath '$.key'
{"key":"value","key":"other value"}
[6]
This behavior can be overridden with a custom installation of `rsonpath` that disables the default `unique-labels` feature. This will hurt performance.
> cargo install rsonpath --no-default-features -F simd -F head-skip -F tail-skip
> rsonpath '$.key'
{"key":"value","key":"other value"}
[6, 20]
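The difference between the two modes can be modelled with a small sketch. It assumes, purely for illustration, that an object is already available as a list of (label, colon index) pairs, and that the uniqueness assumption lets a scan stop at the first hit. `match_label` is a hypothetical function; rsonpath's internals may work quite differently.

```rust
/// Hypothetical model (not rsonpath code): `members` lists the labels of one
/// JSON object in document order, each paired with the index of its colon.
/// With `unique_labels` set, the scan stops at the first hit, matching the
/// default behaviour described above; otherwise every duplicate is reported.
pub fn match_label(members: &[(&str, usize)], label: &str, unique_labels: bool) -> Vec<usize> {
    let mut hits = Vec::new();
    for &(name, colon) in members {
        if name == label {
            hits.push(colon);
            if unique_labels {
                break; // assume no later member can share this label
            }
        }
    }
    hits
}
```

With the members `[("key", 6), ("key", 20)]`, the unique-labels mode returns only the first index, while the other mode returns both.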
Unicode
The engine does not parse Unicode escape sequences in labels. This means that a label "a" is different from a label "\u0061", even though semantically they represent the same string. Parsing Unicode sequences is costly, so support for this was postponed in favour of high performance. It would be possible for a flag to exist to trigger this behaviour, but it is not currently being worked on.
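To illustrate what label decoding would involve, the sketch below decodes `\uXXXX` escapes before comparing labels. It is a simplified example, not engine code: `decode_unicode_escapes` is a hypothetical helper, it handles only single four-digit escapes, and it leaves surrogate pairs and all other escape sequences untouched.

```rust
/// Hypothetical helper (not part of rsonpath): decodes `\uXXXX` escapes in a
/// raw JSON label. Surrogates and all other escapes are kept verbatim.
pub fn decode_unicode_escapes(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('u') => {
                // Peek at the next four characters without consuming them.
                let hex: String = chars.clone().take(4).collect();
                let valid = hex.len() == 4 && hex.chars().all(|h| h.is_ascii_hexdigit());
                let decoded = if valid {
                    u32::from_str_radix(&hex, 16).ok().and_then(char::from_u32)
                } else {
                    None
                };
                match decoded {
                    Some(d) => {
                        out.push(d);
                        // Consume the four hex digits that were decoded.
                        for _ in 0..4 {
                            chars.next();
                        }
                    }
                    // Not decodable (e.g. a lone surrogate): keep as written.
                    None => out.push_str("\\u"),
                }
            }
            // Any other escape, including `\\`, is copied through unchanged.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}
```

A raw comparison of `a` with `\u0061` reports a mismatch, whereas comparing `a` with `decode_unicode_escapes(r"\u0061")` reports equality. The extra pass over every label hints at the cost mentioned above.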
Build & test
The dev workflow utilizes `just`. Use the included `Justfile`. It will automatically install Rust for you using the `rustup` tool if it detects there is no Cargo in your environment.
just build
just test
Benchmarks
Benchmarks for `rsonpath` are located in a separate repository, included as a git submodule in this main repository.
The easiest way to run all the benchmarks is `just bench`. For details, look at the README in the submodule.
Background
This project is the result of my thesis. You can read it for the theoretical background of the engine and the details of its implementation.
Dependencies
Direct dependencies are shown here; for the full graph, see below.
cargo tree --package rsonpath --edges normal --depth 1
rsonpath v0.4.0 (/home/mat/rsonpath/crates/rsonpath)
├── clap v4.1.11
├── color-eyre v0.6.2
├── eyre v0.6.8
├── log v0.4.17
├── rsonpath-lib v0.4.0 (/home/mat/rsonpath/crates/rsonpath-lib)
└── simple_logger v4.1.0
cargo tree --package rsonpath-lib --edges normal --depth 1
rsonpath-lib v0.4.0 (/home/mat/rsonpath/crates/rsonpath-lib)
├── aligners v0.0.10
├── cfg-if v1.0.0
├── log v0.4.17
├── memchr v2.5.0
├── nom v7.1.3
├── replace_with v0.1.7
├── smallvec v1.10.0
├── thiserror v1.0.40
└── vector-map v1.0.1
Justification
- `clap` – standard crate to provide the CLI.
- `color-eyre`, `eyre` – more accessible error messages for the parser.
- `log`, `simple_logger` – diagnostic logs during compilation and execution.
- `aligners` – SIMD operations require correct input data alignment, putting those requirements at type level makes our code more robust.
- `cfg-if` – used to support SIMD and no-SIMD versions.
- `memchr` – rapid, SIMDified substring search for fast-forwarding to labels.
- `nom` – for parser implementation.
- `replace_with` – for safe handling of internal classifier state when switching classifiers.
- `smallvec` – crucial for small-stack performance.
- `thiserror` – idiomatic `Error` implementations.
- `vector-map` – used in the query compiler for measurably better performance.
Full dependency tree
cargo tree --package rsonpath --edges normal
rsonpath v0.4.0 (/home/mat/rsonpath/crates/rsonpath)
├── clap v4.1.11
│   ├── bitflags v2.0.2
│   ├── clap_derive v4.1.9 (proc-macro)
│   │   ├── heck v0.4.1
│   │   ├── proc-macro-error v1.0.4
│   │   │   ├── proc-macro-error-attr v1.0.4 (proc-macro)
│   │   │   │   ├── proc-macro2 v1.0.52
│   │   │   │   │   └── unicode-ident v1.0.6
│   │   │   │   └── quote v1.0.26
│   │   │   │       └── proc-macro2 v1.0.52 (*)
│   │   │   ├── proc-macro2 v1.0.52 (*)
│   │   │   ├── quote v1.0.26 (*)
│   │   │   └── syn v1.0.107
│   │   │       ├── proc-macro2 v1.0.52 (*)
│   │   │       ├── quote v1.0.26 (*)
│   │   │       └── unicode-ident v1.0.6
│   │   ├── proc-macro2 v1.0.52 (*)
│   │   ├── quote v1.0.26 (*)
│   │   └── syn v1.0.107 (*)
│   ├── clap_lex v0.3.1
│   │   └── os_str_bytes v6.4.1
│   ├── is-terminal v0.4.3
│   │   ├── io-lifetimes v1.0.5
│   │   │   └── libc v0.2.139
│   │   └── rustix v0.36.8
│   │       ├── bitflags v1.3.2
│   │       ├── io-lifetimes v1.0.5 (*)
│   │       ├── libc v0.2.139
│   │       └── linux-raw-sys v0.1.4
│   ├── once_cell v1.17.0
│   ├── strsim v0.10.0
│   ├── termcolor v1.2.0
│   └── terminal_size v0.2.3
│       └── rustix v0.36.8 (*)
├── color-eyre v0.6.2
│   ├── backtrace v0.3.67
│   │   ├── addr2line v0.19.0
│   │   │   └── gimli v0.27.1
│   │   ├── cfg-if v1.0.0
│   │   ├── libc v0.2.139
│   │   ├── miniz_oxide v0.6.2
│   │   │   └── adler v1.0.2
│   │   ├── object v0.30.3
│   │   │   └── memchr v2.5.0
│   │   └── rustc-demangle v0.1.21
│   ├── eyre v0.6.8
│   │   ├── indenter v0.3.3
│   │   └── once_cell v1.17.0
│   ├── indenter v0.3.3
│   ├── once_cell v1.17.0
│   └── owo-colors v3.5.0
├── eyre v0.6.8 (*)
├── log v0.4.17
│   └── cfg-if v1.0.0
├── rsonpath-lib v0.4.0 (/home/mat/rsonpath/crates/rsonpath-lib)
│   ├── aligners v0.0.10
│   │   ├── cfg-if v1.0.0
│   │   ├── lazy_static v1.4.0
│   │   └── page_size v0.4.2
│   │       └── libc v0.2.139
│   ├── cfg-if v1.0.0
│   ├── log v0.4.17 (*)
│   ├── memchr v2.5.0
│   ├── nom v7.1.3
│   │   ├── memchr v2.5.0
│   │   └── minimal-lexical v0.2.1
│   ├── replace_with v0.1.7
│   ├── smallvec v1.10.0
│   ├── thiserror v1.0.40
│   │   └── thiserror-impl v1.0.40 (proc-macro)
│   │       ├── proc-macro2 v1.0.52 (*)
│   │       ├── quote v1.0.26 (*)
│   │       └── syn v2.0.4
│   │           ├── proc-macro2 v1.0.52 (*)
│   │           ├── quote v1.0.26 (*)
│   │           └── unicode-ident v1.0.6
│   └── vector-map v1.0.1
│       ├── contracts v0.4.0 (proc-macro)
│       │   ├── proc-macro2 v1.0.52 (*)
│       │   ├── quote v1.0.26 (*)
│       │   └── syn v1.0.107 (*)
│       └── rand v0.7.3
│           ├── getrandom v0.1.16
│           │   ├── cfg-if v1.0.0
│           │   └── libc v0.2.139
│           ├── libc v0.2.139
│           ├── rand_chacha v0.2.2
│           │   ├── ppv-lite86 v0.2.17
│           │   └── rand_core v0.5.1
│           │       └── getrandom v0.1.16 (*)
│           └── rand_core v0.5.1 (*)
└── simple_logger v4.1.0
    ├── colored v2.0.0
    │   ├── atty v0.2.14
    │   │   └── libc v0.2.139
    │   └── lazy_static v1.4.0
    ├── log v0.4.17 (*)
    └── time v0.3.17
        ├── itoa v1.0.5
        ├── libc v0.2.139
        ├── num_threads v0.1.6
        ├── time-core v0.1.0
        └── time-macros v0.2.6 (proc-macro)
            └── time-core v0.1.0