Strongly typed JSON library for Rust

Overview

Serde JSON (Rustc 1.31+)

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically, and Serde JSON builds on it to handle JSON. Add it as a dependency in Cargo.toml:

[dependencies]
serde_json = "1.0"

JSON is a ubiquitous open-standard format that uses human-readable text to transmit data objects consisting of key-value pairs.

{
    "name": "John Doe",
    "age": 43,
    "address": {
        "street": "10 Downing Street",
        "city": "London"
    },
    "phones": [
        "+44 1234567",
        "+44 2345678"
    ]
}

There are three common ways that you might find yourself needing to work with JSON data in Rust.

  • As text data. An unprocessed string of JSON data that you receive on an HTTP endpoint, read from a file, or prepare to send to a remote server.
  • As an untyped or loosely typed representation. Maybe you want to check that some JSON data is valid before passing it on, but without knowing the structure of what it contains. Or you want to do very basic manipulations like insert a key in a particular spot.
  • As a strongly typed Rust data structure. When you expect all or most of your data to conform to a particular structure and want to get real work done without JSON's loosey-goosey nature tripping you up.

Serde JSON provides efficient, flexible, safe ways of converting data between each of these representations.

Operating on untyped JSON values

Any valid JSON data can be manipulated in the following recursive enum representation. This data structure is serde_json::Value.

enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

A string of JSON data can be parsed into a serde_json::Value by the serde_json::from_str function. There is also from_slice for parsing from a byte slice &[u8] and from_reader for parsing from any io::Read like a File or a TCP stream.
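
For instance, a minimal sketch of reading from a file with from_reader (the helper name read_value_from_file and the expect-based error handling are illustrative, not part of the crate):

use std::fs::File;
use std::io::BufReader;
use serde_json::Value;

fn read_value_from_file(path: &str) -> serde_json::Result<Value> {
    // Any io::Read works here; BufReader avoids issuing a read syscall per byte.
    let file = File::open(path).expect("failed to open file");
    serde_json::from_reader(BufReader::new(file))
}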

use serde_json::{Result, Value};

fn untyped_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into serde_json::Value.
    let v: Value = serde_json::from_str(data)?;

    // Access parts of the data by indexing with square brackets.
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    Ok(())
}

The result of square bracket indexing like v["name"] is a borrow of the data at that index, so the type is &Value. A JSON map can be indexed with string keys, while a JSON array can be indexed with integer keys. If the type of the data is not right for the type with which it is being indexed, or if a map does not contain the key being indexed, or if the index into a vector is out of bounds, the returned element is Value::Null.

When a Value is printed, it is printed as a JSON string. So in the code above, the output looks like Please call "John Doe" at the number "+44 1234567". The quotation marks appear because v["name"] is a &Value containing a JSON string and its JSON representation is "John Doe". Printing as a plain string without quotation marks involves converting from a JSON string to a Rust string with as_str() or avoiding the use of Value as described in the following section.
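
A short sketch demonstrating both behaviors (the misspelled key "nmae" is deliberate):

use serde_json::{json, Value};

fn main() {
    let v = json!({ "name": "John Doe", "phones": ["+44 1234567"] });

    // A missing key or out-of-bounds index yields Value::Null, not a panic.
    assert_eq!(v["nmae"], Value::Null);
    assert_eq!(v["phones"][9], Value::Null);

    // Value::get returns Option<&Value>, making the missing case explicit.
    assert!(v.get("nmae").is_none());

    // as_str() borrows the inner string without the JSON quotation marks.
    assert_eq!(v["name"].as_str(), Some("John Doe"));
}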

The Value representation is sufficient for very basic tasks but can be tedious to work with for anything more significant. Error handling is verbose to implement correctly, for example imagine trying to detect the presence of unrecognized fields in the input data. The compiler is powerless to help you when you make a mistake, for example imagine typoing v["name"] as v["nmae"] in one of the dozens of places it is used in your code.

Parsing JSON as strongly typed data structures

Serde provides a powerful way of mapping JSON data into Rust data structures largely automatically.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}

fn typed_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into a Person object. This is exactly the
    // same function as the one that produced serde_json::Value above, but
    // now we are asking it for a Person as output.
    let p: Person = serde_json::from_str(data)?;

    // Do things just like with any other Rust data structure.
    println!("Please call {} at the number {}", p.name, p.phones[0]);

    Ok(())
}

This is the same serde_json::from_str function as before, but this time we assign the return value to a variable of type Person so Serde will automatically interpret the input data as a Person and produce informative error messages if the layout does not conform to what a Person is expected to look like.

Any type that implements Serde's Deserialize trait can be deserialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Deserialize)].
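
As a brief sketch, deserializing straight into standard library collections:

use std::collections::HashMap;

fn collections_example() -> serde_json::Result<()> {
    // Collections of Deserialize types are themselves Deserialize.
    let scores: HashMap<String, u32> = serde_json::from_str(r#"{"alice": 10, "bob": 7}"#)?;
    let phones: Vec<String> = serde_json::from_str(r#"["+44 1234567", "+44 2345678"]"#)?;

    assert_eq!(scores["alice"], 10);
    assert_eq!(phones.len(), 2);
    Ok(())
}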

Once we have p of type Person, our IDE and the Rust compiler can help us use it correctly like they do for any other Rust code. The IDE can autocomplete field names to prevent typos, which was impossible in the serde_json::Value representation. And the Rust compiler can check that when we write p.phones[0], then p.phones is guaranteed to be a Vec<String> so indexing into it makes sense and produces a String.

The necessary setup for using Serde's derive macros is explained on the Using derive page of the Serde site.

Constructing JSON values

Serde JSON provides a json! macro to build serde_json::Value objects with very natural JSON syntax.

use serde_json::json;

fn main() {
    // The type of `john` is `serde_json::Value`
    let john = json!({
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    });

    println!("first phone number: {}", john["phones"][0]);

    // Convert to a string of JSON and print it out
    println!("{}", john.to_string());
}

The Value::to_string() function converts a serde_json::Value into a String of JSON text.
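
For human-readable output there is also serde_json::to_string_pretty; a minimal sketch of both:

use serde_json::json;

fn main() -> serde_json::Result<()> {
    let john = json!({ "name": "John Doe", "age": 43 });

    // Compact: {"age":43,"name":"John Doe"} (keys are sorted because the
    // default Map is backed by a BTreeMap).
    println!("{}", john.to_string());

    // Indented across multiple lines for readability.
    println!("{}", serde_json::to_string_pretty(&john)?);
    Ok(())
}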

One neat thing about the json! macro is that variables and expressions can be interpolated directly into the JSON value as you are building it. Serde will check at compile time that the value you are interpolating is able to be represented as JSON.

let full_name = "John Doe";
let age_last_year = 42;

// The type of `john` is `serde_json::Value`
let john = json!({
    "name": full_name,
    "age": age_last_year + 1,
    "phones": [
        format!("+44 {}", random_phone())
    ]
});

This is amazingly convenient, but we have the problem we had before with Value: the IDE and Rust compiler cannot help us if we get it wrong. Serde JSON provides a better way of serializing strongly-typed data structures into JSON text.

Creating JSON by serializing data structures

A data structure can be converted to a JSON string by serde_json::to_string. There is also serde_json::to_vec which serializes to a Vec<u8> and serde_json::to_writer which serializes to any io::Write such as a File or a TCP stream.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Address {
    street: String,
    city: String,
}

fn print_an_address() -> Result<()> {
    // Some data structure.
    let address = Address {
        street: "10 Downing Street".to_owned(),
        city: "London".to_owned(),
    };

    // Serialize it to a JSON string.
    let j = serde_json::to_string(&address)?;

    // Print, write to a file, or send to an HTTP server.
    println!("{}", j);

    Ok(())
}

Any type that implements Serde's Serialize trait can be serialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Serialize)].
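
A hedged sketch of the streaming variant, serializing into an in-memory buffer (a File or TCP stream would work the same way):

use serde::Serialize;

#[derive(Serialize)]
struct Address {
    street: String,
    city: String,
}

fn write_an_address() -> serde_json::Result<()> {
    let address = Address {
        street: "10 Downing Street".to_owned(),
        city: "London".to_owned(),
    };

    // to_writer streams JSON bytes into any io::Write without building
    // an intermediate String.
    let mut buf: Vec<u8> = Vec::new();
    serde_json::to_writer(&mut buf, &address)?;
    assert_eq!(buf, serde_json::to_vec(&address)?);
    Ok(())
}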

Performance

It is fast. You should expect in the ballpark of 500 to 1000 megabytes per second deserialization and 600 to 900 megabytes per second serialization, depending on the characteristics of your data. This is competitive with the fastest C and C++ JSON libraries or even 30% faster for many use cases. Benchmarks live in the serde-rs/json-benchmark repo.

Getting help

Serde is one of the most widely used Rust libraries so any place that Rustaceans congregate will be able to help you out. For chat, consider trying the #general or #beginners channels of the unofficial community Discord, the #rust-usage channel of the official Rust Project Discord, or the #general stream in Zulip. For asynchronous, consider the [rust] tag on StackOverflow, the /r/rust subreddit which has a pinned weekly easy questions post, or the Rust Discourse forum. It's acceptable to file a support issue in this repo but they tend not to get as many eyes as any of the above and may get closed without a response after some time.

No-std support

As long as there is a memory allocator, it is possible to use serde_json without the rest of the Rust standard library. This is supported on Rust 1.36+. Disable the default "std" feature and enable the "alloc" feature:

[dependencies]
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }

For JSON support in Serde without a memory allocator, please see the serde-json-core crate.


License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Issues
  • Perfect accuracy float parsing

    Float parsing is currently implemented by calculating the significand as u64, casting to f64, and then multiplying or dividing by a nonnegative power of 10. For example the input 10.9876543210987654 would be parsed into the value 109876543210987654_u64 as f64 / 10e15.

    This algorithm is sometimes correct, or else usually close to correct in practical usage. It matches how JSON parsers are implemented in other languages.

    However, it can happen that the result from this algorithm is not the mathematically nearest 64-bit floating point value to the exact value of the input. A "correct" algorithm would always produce the mathematically nearest answer. This requires high precision big-integer arithmetic in the general case so there would be a large performance cost; if implemented, we would likely want this behind a cfg that is off by default, with the current approximate behavior as default. This way programs can opt in to the more expensive algorithm as required.

    fn main() {
        let input = "10.9876543210987654";
        let n: f64 = serde_json::from_str(input).unwrap();
    
        // produces 10.9876543210987644982878919108770787715911865234375
        // which is low by 9.017e-16
        let current_algorithm = 109876543210987654_u64 as f64 / 10e15;
        println!("{}", precise::to_string(current_algorithm));
        assert_eq!(n, current_algorithm);
    
        // produces 10.98765432109876627464473131112754344940185546875
        // which is high by 8.746e-16 (closer)
        let correct_answer = 10.9876543210987654_f64;
        println!("{}", precise::to_string(correct_answer));
        assert_ne!(n, correct_answer);
    }
    
    opened by dtolnay 50
  • Arbitrary-precision numerics support

    Added support for arbitrary-precision numerics, in a similar way that the toml crate does for date-times (using an internal special struct).

    opened by alexreg 30
  • Allow increasing recursion limit

    I am parsing/serializing pretty large JSON files and I regularly encounter RecursionLimitExceeded. I need a way to instantiate a Serializer/Deserializer with a much larger recursion limit.

    Could we introduce code to let us tweak that?
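
    For what it's worth, serde_json's opt-in unbounded_depth feature provides this knob; a minimal sketch, assuming the feature is enabled in Cargo.toml:

    use serde::Deserialize;
    use serde_json::Value;

    // Requires: serde_json = { version = "1.0", features = ["unbounded_depth"] }
    fn parse_deeply_nested(input: &str) -> serde_json::Result<Value> {
        let mut de = serde_json::Deserializer::from_str(input);
        // Lifts the built-in recursion limit; the caller then becomes
        // responsible for guarding the stack (e.g. with the serde_stacker crate).
        de.disable_recursion_limit();
        Value::deserialize(&mut de)
    }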

    enhancement 
    opened by Yoric 23
  • Parser cannot read arbitrary precision numbers

    http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf specifies in the second paragraph of the introduction that JSON is agnostic about numbers, and simply represents them as a series of digits.

    However, serde-json parses anything with a decimal point as a Rust f64, which causes numbers to be read incorrectly. There is no way to avoid this because this behaviour is chosen as soon as a decimal point is encountered. This makes it impossible to use serde-json to interoperate with financial software using JSON.

    enhancement 
    opened by apoelstra 21
  • Add a RawValue type

    It would be helpful to have a type similar to Go's json.RawMessage that is not tokenized during deserialization, but rather its raw contents stored as a Vec<u8> or &'de [u8].

    The following pseudo-code demonstrates the idea.

    #[derive(Deserialize)]
    struct Struct {
        /// Deserialized normally.
        core_data: Vec<i32>,
    
        /// Contents of `user_data` are copied / borrowed directly from the input
        /// with no modification.
        ///
        /// `RawValue<'static>` is akin to `Vec<u8>`.
        /// `RawValue<'a>` is akin to `&'a [u8]`.
        user_data: serde_json::RawValue<'static>,
    }
    
    fn main() {
        let json = r#"
        {
            "core_data": [1, 2, 3],
            "user_data": { "foo": {}, "bar": 123, "baz": "abc" }
        }
        "#;
        
        let s: Struct = serde_json::from_bytes(&json).unwrap();
        println!("{}", s.user_data); // "{ \"foo\": {}, \"bar\": 123, \"baz\": \"abc\" }"
    }
    

    The main advantage of this would be to have 'lazily-deserialized' values.

    enhancement 
    opened by alteous 17
  • Consider serializing map integer keys as strings

    Right now serde_json rejects serializing maps with integer keys because, semantically speaking, JSON only supports maps with string keys. There are workarounds in serde 0.7 with the new #[serde(serialize_with="...", deserialize_with="...")] (see this gist), but it still can be annoying if this keeps causing problems.

    Is there any real value in erroring out on non-string keys?
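
    A hedged sketch of that serialize_with workaround, stringifying integer keys during serialization (the helper name ser_int_map is hypothetical):

    use std::collections::HashMap;
    use serde::{Serialize, Serializer};

    // Hypothetical helper: serialize a HashMap<u32, String> as a JSON object
    // by converting each integer key to a string.
    fn ser_int_map<S: Serializer>(map: &HashMap<u32, String>, s: S) -> Result<S::Ok, S::Error> {
        s.collect_map(map.iter().map(|(k, v)| (k.to_string(), v)))
    }

    #[derive(Serialize)]
    struct Data {
        #[serde(serialize_with = "ser_int_map")]
        by_id: HashMap<u32, String>,
    }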

    enhancement 
    opened by erickt 16
  • Output JSON schema during build process

    It would be great if Serde could optionally produce a JSON schema as a side-effect of the build process. AFAIK it has all the information it needs to write one. You just need to translate the structs/enums to their appropriate schema representations (read: matching JSON type).

    Additional:

    While the above is an awesome starting block, it would also be really nice if you could compile-time check that Serde's JSON will validate against an externally provided schema. This isn't totally necessary, as you could do this after the fact with a tool like ajv. It would just provide stronger guarantees if it was compile-time checked.

    Motivation

    • Compatibility: Presently there is no way to guarantee that JSON produced by Serde is compatible with another framework. We can only write tests against JSON samples and write code to match an API spec. We have no way of knowing if either of them is up-to-date or correct.
    • Extendability: Having a portable artifact of your data-representation is an enormously useful tool. In many dynamic languages, you can auto generate data bindings and UIs provided a schema. This allows devs to quickly develop across platforms and languages while maintaining integrity of their data.

    Anticipated Questions:

    • Why Serde? - Serde already has all of the user-facing hardware necessary to produce a schema. Using attributes and types already in the user's code makes adding this feature "free" for existing libraries.
    • Why at compile-time? - Validating against a schema at compile-time enables devs to "Hack without fear", because they will know that they are properly encoding their data types. It allows devs to easily update their code and immediately know if their schema/data-bindings are out of date.
    enhancement 
    opened by lylemoffitt 16
  • Round trip floats

    Ideally this test would pass.

    extern crate serde_json;
    
    #[macro_use]
    extern crate quickcheck;
    
    quickcheck! {
        fn floats(n: f64) -> bool {
            let j = serde_json::to_string(&n).unwrap();
            serde_json::from_str::<f64>(&j).unwrap() == n
        }
    }
    

    On the printing side grisu2 guarantees that the f64 closest to the string representation is identical to the original input, so the inaccuracy is somewhere on the parsing side.

    bug 
    opened by dtolnay 15
  • Arbitrary precision numbers

    Fixes #18.

    serde_json = { version = "0.9", features = ["arbitrary_precision"] }
    
    #[derive(Serialize, Deserialize)]
    struct S {
        n: serde_json::Number,
        v: serde_json::Value,
    }
    
    let s: S = serde_json::from_str(...)?;
    
    // full precision
    println!("{}", s.n);
    println!("{}", s.v);
    println!("{}", serde_json::to_string(&s)?);
    
    do not merge 
    opened by dtolnay 14
  • Document behavior of to_pretty_string when passing Value

    From https://docs.rs/serde_json/1.0.68/serde_json/fn.to_string_pretty.html:

    I assume that if I call

    use serde_json::*;
    to_string_pretty::<Value>(...);
    

    I can .unwrap() this result safely, is that right? Or might Value's implementation of Serialize decide to fail?

    If it's the former, can I make a PR adding this piece of info to this doc page?

    opened by marcospb19 0
  • JSON abstraction

    JSON is a ubiquitous format used in many applications. There is no single way of storing JSON values; depending on the context, some applications even use multiple representations of JSON values in the same place. This is a problem for JSON processing libraries that should not care about the actual internal representation of JSON values, but are forced to stick to a particular format, leading to unwanted and costly conversions between the different formats.

    I am currently working on the json-ld library that provides an implementation of the JSON-LD data interchange format, based on JSON. One of my goals is to provide useful error reports when the input JSON document cannot be processed. This includes pinpointing the exact line-column position of the error, something that cannot be done with serde_json since code-mapping metadata is not kept by the parser. That is why I am also working on a JSON parsing crate providing such information. However I do not want to force my users to use my crate over serde_json whenever precise error reports are not needed.

    To solve this issue, my idea was to define common JSON features in a dedicated library, generic-json (still a work in progress), defining a basic Json trait abstracting away implementation details and a standard Value type defining the structure of a JSON value. This could give something like this:

    /// JSON document with metadata.
    pub trait Json {
    	/// Metadata associated to each JSON value.
    	/// In the case of `serde_json` this would be `()`.
    	/// In my case, that would be a more complicated type including code-mapping info.
    	type Metadata;
    	
    	/// Number type.
    	type Number;
    
    	/// String type.
    	type String;
    
    	/// Array type.
    	type Array;
    
    	/// Object key type.
    	type Key;
    
    	/// Object type.
    	type Object;
    }
    
    pub trait MetaValue<T: Json> {
    	fn metadata(&self) -> &T::Metadata;
    
    	fn value(&self) -> &Value<T>;
    }
    
    pub enum Value<T: Json> {
    	Null,
    	Bool(bool),
    	Number(T::Number),
    	String(T::String),
    	Array(T::Array),
    	Object(T::Object)
    }
    

    Would you be open to rely on such crate to define the Value type and improve interoperability with other JSON crates? Your Value type definition would become something like:

    pub struct SerdeJson;
    
    impl generic_json::Json for SerdeJson {
    	type Metadata = ();
    	
    	type Number = Number;
    
    	type String = String;
    
    	type Array = Vec<Value>;
    
    	type Key = String;
    
    	type Object = Map<String, Value>;
    }
    
    pub type Value = generic_json::Value<SerdeJson>;
    
    impl generic_json::MetaValue<SerdeJson> for Value {
    	fn metadata(&self) -> &() {
    		&()
    	}
    
    	fn value(&self) -> &Value {
    		self
    	}
    }
    

    In practice, that would not change anything about the Value type except that its actual definition would end up in an upstream crate. In theory, you would not need a major release for this. What do you think? I can open a PR for this (once we agree on the content of the generic-json crate).

    opened by timothee-haudebourg 0
  • Misleading type_ascription error message on bogus JSON attribute syntax inside array

    When I make this particular mistake (missing curly braces around the "foo": ... attribute):

      let exp = json!(
        [
          "foo": { "bar": "baz" }
        ]
      );
    

    ... the error message is not great 😊:

    error: expected type, found `{`
       --> my-proj/tests/json.rs:22:14
        |
    22  |       "foo": { "bar": "baz" }
        |            - ^ expected type
        |            |
        |            tried to parse a type due to this type ascription
        |
       ::: /Users/gthb/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.67/src/macros.rs:113:32
        |
    113 |     (@array [$($elems:expr,)*] $next:expr, $($rest:tt)*) => {
        |                                ---------- while parsing argument for this `expr` macro fragment
        |
        = note: `#![feature(type_ascription)]` lets you annotate an expression with a type: `<expr>: <type>`
        = note: see issue #23416 <https://github.com/rust-lang/rust/issues/23416> for more information
    
    opened by gthb 0
  • Document features

    There are some cargo features that appear to modify the behavior of the crate, but it is not clear what exactly they do. It would be useful to have an overview of the following features in the crate-level docs:

    • preserve_order
    • raw_value
    • unbounded_depth
    • arbitrary_precision
    • float_roundtrip
    opened by fenhl 0
  • mention `serde_json::from_value` in the doc

    It feels wrong not to mention serde_json::from_value the same way as from_slice and from_reader.
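
    For comparison, a minimal sketch of from_value next to from_str:

    use serde::Deserialize;
    use serde_json::{json, Value};

    #[derive(Deserialize)]
    struct User {
        name: String,
    }

    fn main() -> serde_json::Result<()> {
        // from_value converts an already-parsed Value into a typed struct.
        let v: Value = json!({ "name": "John Doe" });
        let u: User = serde_json::from_value(v)?;
        assert_eq!(u.name, "John Doe");
        Ok(())
    }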

    opened by RouquinBlanc 1
  • Is it possible to leave all strings escaped?

    Is it possible to leave all strings escaped? For example, in a Visitor, I would like the following JSON to invoke the following Visitor calls:

    {"abc\"def": "012\"345"}
    
    // visit map elements:
    visit_borrowed_str // for a &'de str "abc\"def"
    visit_borrowed_str // for a &'de str "012\"345"
    

    instead of these visitor calls:

    // visit map elements:
    visit_string // for a Cow<'de, str>::Owned(r#"abc"def"#)
    visit_string // for a Cow<'de, str>::Owned(r#"012"345"#)
    

    Thanks!

    opened by rw 0
  • Questions about enum deserialization

    I've been working on fixing the enum implementation in msgpack-rust: https://github.com/3Hren/msgpack-rust/issues/225.

    I was looking at the implementation of enum deserialization here: https://github.com/serde-rs/json/blob/df1fb717badea8bda80f7e104d80265da0686166/src/de.rs#L1837 and tried to copy it for msgpack-rust.

    My attempt is here: https://github.com/vedantroy/msgpack-rust/blob/2da1609d9d66aee94e80b3640d5a3c0d93a6a8a7/rmp-serde/src/decode.rs#L639

    Quick context: In msgpack-rust, enums are currently serialized as maps with a single key / value, where the key = an integer / string, and the value is any associated data. I'm changing it so that if enums are unit variants, then they aren't serialized as maps.

    Explanation of the code: The code above checks whether there is a map; if there is, it deserializes the enum using VariantAccess, which handles any associated data. Otherwise, we deserialize the enum as a unit variant using UnitVariantAccess (I should be doing a check here to see whether the "marker" is a valid integer / string). The problem is that take_or_read_marker consumes the token (unlike in serde_json, where parse_whitespace does not consume the token); in the UnitVariantAccess case, this causes variant_seed to try to deserialize whatever comes after the enum into an enum variant, which causes a bug.

    https://github.com/vedantroy/msgpack-rust/blob/2da1609d9d66aee94e80b3640d5a3c0d93a6a8a7/rmp-serde/src/decode.rs#L861

    An example:

    We have the enum: enum Foo { A, B }. The MessagePack data buffer is [0, 1]. visit_enum reads the 0 "marker" and calls visitor.visit_enum(UnitVariantAccess::new(self)), but the reader is now right before the 1. It then calls UnitVariantAccess::variant_seed, which ends up deserializing the 1 into variant B and leaving the reader at the end of the buffer, when in reality we wanted to deserialize an A and leave the reader after the 0 but before the 1.

    Was wondering if you had any thoughts on what to do.

    Hope this makes sense!

    opened by vedantroy 2
  • RawValue does not work in an enum

    This is similar, but not quite the same as https://github.com/serde-rs/json/issues/497. In this case it is a normally tagged enum.

    Rust playground example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4d9015b013535f143aa5b8ad829be43c

    Code:

    use serde_json::value::RawValue;
    use serde::{Serialize, Deserialize};
    
    #[derive(Serialize, Deserialize)]
    pub struct RequestBody {
        pub payload: Box<RawValue>,
    }
    
    #[derive(Serialize, Deserialize)]
    #[serde(tag = "type")]
    enum Incoming {
        Request {
            payload: Box<RawValue>,
        },
    }
    
    fn main() {
        let foo = Incoming::Request {
            payload: serde_json::value::to_raw_value(&42).unwrap()
        };
        let txt = serde_json::to_string(&foo).unwrap();
        let roundtrip: Incoming = serde_json::from_str(&txt).unwrap();
    }
    
    opened by rklaehn 0
  • Provide an Object type alias

    I find myself writing

    type JsonObject = serde_json::Map<String, serde_json::Value>;
    

    again and again. Would you accept a PR that adds a type alias like that (without the Json prefix) to serde_json::value, uses it for the Value::Object variant and maybe re-exports it at the crate root too?
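
    A sketch of how the proposed alias would read in practice (the name Object is the proposal; serde_json does not currently export it):

    use serde_json::Value;

    // The proposed alias, spelled out locally.
    type Object = serde_json::Map<String, Value>;

    fn main() {
        let mut obj = Object::new();
        obj.insert("name".to_owned(), Value::String("John Doe".to_owned()));
        println!("{}", Value::Object(obj)); // {"name":"John Doe"}
    }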

    opened by jplatte 0
  • Update Moderate Path Float Parsing Algorithms

    There have been a few updates to the moderate path float parsing algorithms in minimal-lexical, which can either provide performance benefits or reduce the amount of static storage required, depending on your use-case. I'll summarize the plausible options below, and if any seem beneficial to the maintainers of serde-json, I'll be happy to submit a PR.

    Quick Background

    Just for a quick summary: the float parsing algorithm is broken into 3 parts:

    • A fast path algorithm, where the significant digits can be exactly represented as a native float without truncation.
    • A moderate path algorithm, to process all floats except near-halfway cases through an extended representation of a float.
    • A slow path algorithm, that discerns the proper way to round near-halfway floats using arbitrary-precision arithmetic.

    The moderate path is ~66-75% faster than the slow path, and therefore improvements to it either from a performance standpoint or correctness standpoint can lead to dramatic performance gains.

    Interpolate the Cached Power Exponent

    Serde uses pre-computed values for the cached float exponents in cached_float80. However, we can interpolate all these exponents, since each exponent is effectively just ⌈ log2(10) * exp10 ⌉. Using a pre-computed, fixed-point approximation of log2(10), we can calculate the exponent exactly from the index of the cached power.

    The following pseudo-code can be used to generate this magic number, and to verify that it produces the correct result over the entire range of valid exponents:

    import math
    
    def get_range(radix, max_exp, bitshift):
        den = 1 << bitshift
        num = int(math.ceil(math.log2(radix) * den))
        for exp in range(0, max_exp):
            exp2_exact = int(math.log2(radix**exp))
            exp2_guess = num * exp // den
            if exp2_exact != exp2_guess:
                raise ValueError(f'{exp}')
        return num, bitshift
    
    >>> get_range(10, 350, 16)
    (217706, 16)
    

    See the appendix to see the full changes required to implement this change.
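
    A hedged Rust rendering of that interpolation, using the constants derived by the pseudo-code above:

    // ceil(log2(10) * 2^16), as computed by get_range(10, 350, 16) above.
    const LOG2: i64 = 217706;
    const LOG2_SHIFT: i64 = 16;

    /// Interpolated binary exponent of 10^exp10; exact for 0 <= exp10 < 350.
    fn binary_exponent(exp10: i64) -> i32 {
        ((LOG2 * exp10) >> LOG2_SHIFT) as i32
    }

    fn main() {
        // floor(log2(10^20)) == 66
        assert_eq!(binary_exponent(20), 66);
    }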

    Pros:

    • Less storage required.
    • No discernible impact on runtime performance.

    Cons:

    • N/A

    Correctness Concerns:

    • N/A, can be proven the generated exponents are identical for all valid inputs.

    Add the Eisel-Lemire Algorithm.

    A fast algorithm for creating correct representations of floats from an extended 128-bit (or 192-bit) representation was developed and is currently in use in major Google projects like Golang and Wuffs, among others. The Eisel-Lemire algorithm is ~15% faster for a uniform distribution of randomly-generated floats over the entire float range, and catches halfway cases differently than the existing extended-float algorithm.

    A few examples of cases:

    • "9007199254740992.0" (or 1<<53): correctly classified by both.
    • "9007199254740992000.0e-3"(or 1<<53): only classified by extended-float only.
    • "9007199254740993.0" (or1 + (1<<53`): both cannot classify.
    • "9007199254740994.0" (or2 + (1<<53)`): correctly classified by both.
    • "9007199254740995.0" (or3 + (1<<53)`): correctly classified by Eisel-Lemire only.
    • "9007199254740996.0" (or4 + (1<<53)`): correctly classified by both.
    • "2.470328229206232720e-324" (near-halfway subnormal float): correctly classified by extended-float only.
    • "8.988465674311580536e307" (large near-halfway float): correctly classified by Eisel-Lemire only.

    In short, the two algorithms have complementary coverage; combined, they can avoid delegating to the slow path algorithm, leading to major performance benefits. See minimal-lexical/lemire.rs for an example implementation of this algorithm. The general approach therefore is to run Eisel-Lemire and, if the algorithm fails, delegate to the extended-float algorithm.

    Pros:

    • Slightly faster performance than extended-float in some cases.
    • Can be combined with extended-float to minimize delegating to the slow path.
    • Can use pre-computed powers for Eisel-Lemire for extended-float too, leading to minor performance improvements.

    Cons:

    • Increased storage required (requires an additional 1226 u64s, or ~9.8 KB).

    Correctness Concerns:

    • Substantial, but well-established algorithm and passes all correctness tests.
    • It passes the curated suite of halfway cases, a large, curated suite of cases used to validate Go's parser, and Rust's extensive randomly-generated test-cases.

    Replace Extended-Float with Lemire

    A third option is to entirely remove the extended-float algorithm, and replace it with the Eisel-Lemire algorithm. In order to do so, we need to round-down to b so the slow algorithm can correctly differentiate between b, b+h, and b+u. Extensive comments and code samples are included in lexical-core/lemire.rs for how to implement this.

    Pros:

    • Slightly faster performance than extended-float in some cases.

    Cons:

    • Increased storage required (requires an additional 1226 u64s, or ~9.8 KB).
    • Less correct than extended-float, and therefore delegates to the slow path algorithm more often.

    Correctness Concerns:

    • Substantial, but well-established algorithm and passes all correctness tests.
    • It passes the curated suite of halfway cases, a large, curated suite of cases used to validate Go's parser, and Rust's extensive randomly-generated test-cases.

    Appendix

    Interpolation

    The full changes to interpolate the exponent are the following:

    diff --git a/src/lexical/cached.rs b/src/lexical/cached.rs
    index ef5a9fe..701a897 100644
    --- a/src/lexical/cached.rs
    +++ b/src/lexical/cached.rs
    @@ -5,31 +5,8 @@
     use super::cached_float80;
     use super::float::ExtendedFloat;
     
    -// POWERS
    -
    -/// Precalculated powers that uses two-separate arrays for memory-efficiency.
    -#[doc(hidden)]
    -pub(crate) struct ExtendedFloatArray {
    -    // Pre-calculated mantissa for the powers.
    -    pub mant: &'static [u64],
    -    // Pre-calculated binary exponents for the powers.
    -    pub exp: &'static [i32],
    -}
    -
    -/// Allow indexing of values without bounds checking
    -impl ExtendedFloatArray {
    -    #[inline]
    -    pub fn get_extended_float(&self, index: usize) -> ExtendedFloat {
    -        let mant = self.mant[index];
    -        let exp = self.exp[index];
    -        ExtendedFloat { mant, exp }
    -    }
    -
    -    #[inline]
    -    pub fn len(&self) -> usize {
    -        self.mant.len()
    -    }
    -}
    +const LOG2: i64 = 217706;
    +const LOG2_SHIFT: i32 = 16;
     
     // MODERATE PATH POWERS
     
    @@ -37,9 +14,9 @@ impl ExtendedFloatArray {
     #[doc(hidden)]
     pub(crate) struct ModeratePathPowers {
         // Pre-calculated small powers.
    -    pub small: ExtendedFloatArray,
    +    pub small: &'static [u64],
         // Pre-calculated large powers.
    -    pub large: ExtendedFloatArray,
    +    pub large: &'static [u64],
         /// Pre-calculated small powers as 64-bit integers
         pub small_int: &'static [u64],
         // Step between large powers and number of small powers.
    @@ -52,17 +29,23 @@ pub(crate) struct ModeratePathPowers {
     impl ModeratePathPowers {
         #[inline]
         pub fn get_small(&self, index: usize) -> ExtendedFloat {
    -        self.small.get_extended_float(index)
    +        let mant = self.small[index];
    +        let exp = -63 + ((LOG2 * index as i64) >> LOG2_SHIFT);
    +        ExtendedFloat {
    +            mant,
    +            exp: exp as i32,
    +        }
         }
     
         #[inline]
         pub fn get_large(&self, index: usize) -> ExtendedFloat {
    -        self.large.get_extended_float(index)
    -    }
    -
    -    #[inline]
    -    pub fn get_small_int(&self, index: usize) -> u64 {
    -        self.small_int[index]
    +        let mant = self.large[index];
    +        let biased_e = index as i64 * self.step as i64 - self.bias as i64;
    +        let exp = -63 + ((LOG2 * biased_e) >> LOG2_SHIFT);
    +        ExtendedFloat {
    +            mant,
    +            exp: exp as i32,
    +        }
         }
     }
     
    diff --git a/src/lexical/cached_float80.rs b/src/lexical/cached_float80.rs
    index 9beda3d..43e18e8 100644
    --- a/src/lexical/cached_float80.rs
    +++ b/src/lexical/cached_float80.rs
    @@ -10,7 +10,7 @@
     //! integer to calculate exact extended-representation of each value.
     //! These values are all normalized.
     
    -use super::cached::{ExtendedFloatArray, ModeratePathPowers};
    +use super::cached::ModeratePathPowers;
     
     // LOW-LEVEL
     // ---------
    @@ -29,18 +29,6 @@ const BASE10_SMALL_MANTISSA: [u64; 10] = [
         13743895347200000000, // 10^8
         17179869184000000000, // 10^9
     ];
    -const BASE10_SMALL_EXPONENT: [i32; 10] = [
    -    -63, // 10^0
    -    -60, // 10^1
    -    -57, // 10^2
    -    -54, // 10^3
    -    -50, // 10^4
    -    -47, // 10^5
    -    -44, // 10^6
    -    -40, // 10^7
    -    -37, // 10^8
    -    -34, // 10^9
    -];
     const BASE10_LARGE_MANTISSA: [u64; 66] = [
         11555125961253852697, // 10^-350
         13451937075301367670, // 10^-340
    @@ -109,74 +97,6 @@ const BASE10_LARGE_MANTISSA: [u64; 66] = [
         11830521861667747109, // 10^290
         13772540099066387756, // 10^300
     ];
    -const BASE10_LARGE_EXPONENT: [i32; 66] = [
    -    -1226, // 10^-350
    -    -1193, // 10^-340
    -    -1160, // 10^-330
    -    -1127, // 10^-320
    -    -1093, // 10^-310
    -    -1060, // 10^-300
    -    -1027, // 10^-290
    -    -994,  // 10^-280
    -    -960,  // 10^-270
    -    -927,  // 10^-260
    -    -894,  // 10^-250
    -    -861,  // 10^-240
    -    -828,  // 10^-230
    -    -794,  // 10^-220
    -    -761,  // 10^-210
    -    -728,  // 10^-200
    -    -695,  // 10^-190
    -    -661,  // 10^-180
    -    -628,  // 10^-170
    -    -595,  // 10^-160
    -    -562,  // 10^-150
    -    -529,  // 10^-140
    -    -495,  // 10^-130
    -    -462,  // 10^-120
    -    -429,  // 10^-110
    -    -396,  // 10^-100
    -    -362,  // 10^-90
    -    -329,  // 10^-80
    -    -296,  // 10^-70
    -    -263,  // 10^-60
    -    -230,  // 10^-50
    -    -196,  // 10^-40
    -    -163,  // 10^-30
    -    -130,  // 10^-20
    -    -97,   // 10^-10
    -    -63,   // 10^0
    -    -30,   // 10^10
    -    3,     // 10^20
    -    36,    // 10^30
    -    69,    // 10^40
    -    103,   // 10^50
    -    136,   // 10^60
    -    169,   // 10^70
    -    202,   // 10^80
    -    235,   // 10^90
    -    269,   // 10^100
    -    302,   // 10^110
    -    335,   // 10^120
    -    368,   // 10^130
    -    402,   // 10^140
    -    435,   // 10^150
    -    468,   // 10^160
    -    501,   // 10^170
    -    534,   // 10^180
    -    568,   // 10^190
    -    601,   // 10^200
    -    634,   // 10^210
    -    667,   // 10^220
    -    701,   // 10^230
    -    734,   // 10^240
    -    767,   // 10^250
    -    800,   // 10^260
    -    833,   // 10^270
    -    867,   // 10^280
    -    900,   // 10^290
    -    933,   // 10^300
    -];
     const BASE10_SMALL_INT_POWERS: [u64; 10] = [
         1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000,
     ];
    @@ -187,14 +107,8 @@ const BASE10_BIAS: i32 = 350;
     // ----------
     
     const BASE10_POWERS: ModeratePathPowers = ModeratePathPowers {
    -    small: ExtendedFloatArray {
    -        mant: &BASE10_SMALL_MANTISSA,
    -        exp: &BASE10_SMALL_EXPONENT,
    -    },
    -    large: ExtendedFloatArray {
    -        mant: &BASE10_LARGE_MANTISSA,
    -        exp: &BASE10_LARGE_EXPONENT,
    -    },
    +    small: &BASE10_SMALL_MANTISSA,
    +    large: &BASE10_LARGE_MANTISSA,
         small_int: &BASE10_SMALL_INT_POWERS,
         step: BASE10_STEP,
         bias: BASE10_BIAS,
    
    
    opened by Alexhuszagh 1