A CSV parser for Rust, with Serde support.

Overview

csv

A fast and flexible CSV reader and writer for Rust, with support for Serde.


Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/csv

If you're new to Rust, the tutorial is a good place to start.

Usage

Add this to your Cargo.toml:

[dependencies]
csv = "1.1"

Example

This example shows how to read CSV data from stdin and print each record to stdout.

There are more examples in the cookbook.

use std::error::Error;
use std::io;
use std::process;

fn example() -> Result<(), Box<dyn Error>> {
    // Build the CSV reader and iterate over each record.
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.records() {
        // The iterator yields Result<StringRecord, Error>, so we check the
        // error here.
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}

The above example can be run like so:

$ git clone https://github.com/BurntSushi/rust-csv
$ cd rust-csv
$ cargo run --example cookbook-read-basic < examples/data/smallpop.csv

Example with Serde

This example shows how to read CSV data from stdin into your own custom struct. By default, the member names of the struct are matched with the values in the header record of your CSV data.

use std::error::Error;
use std::io;
use std::process;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Record {
    city: String,
    region: String,
    country: String,
    population: Option<u64>,
}

fn example() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.deserialize() {
        // Notice that we need to provide a type hint for automatic
        // deserialization.
        let record: Record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}

The above example can be run like so:

$ git clone https://github.com/BurntSushi/rust-csv
$ cd rust-csv
$ cargo run --example cookbook-read-serde < examples/data/smallpop.csv

Comments
  • Parse rows on demand?

    Parse rows on demand?

    I'm trying to use rust-csv in an async context and am having trouble getting this done with the Reader. Maybe there is a way to improve the API? (Or maybe I am missing something.)

    So I'm consuming a futures::Stream and want to map each String line (which would be my CSV line) into a HashMap of its parsed fields.

    So my question is: is there a way to "set up" rust-csv with all the params once, then push lines in as they come and consume the parsed output?

    Thanks, Michael

    opened by daschl 24
  • use serde

    use serde

    I should start using serde for the automatic serialization component of this crate.

    I don't think this means removing support for rustc-serialize just yet though.

    opened by BurntSushi 24
  • Remove deprecated std::error::Error::description and use CARGO_MANIFEST_DIR to find the examples/data/ for test cases

    Remove deprecated std::error::Error::description and use CARGO_MANIFEST_DIR to find the examples/data/ for test cases

    Was looking for unsafe code to eliminate and deprecated warnings to resolve. So far I've only done the latter. Please reconsider how you use unsafe code. Yes, this appears safe, but it would be nicer if it were simply safe.

    Fortunately for me, this is only a transitive dependency via [dev-dependencies.criterion] and does not involve user controlled data.

    Additionally, as I use a custom RUST_TARGET_PATH, I caught the directory traversal hack you're using and replaced it with env!("CARGO_MANIFEST_DIR").

    opened by WildCryptoFox 19
  • StringRecord to String without Result?

    StringRecord to String without Result?

    This seems to be the recommended way to turn a StringRecord into a String:

    fn example() -> Result<(), Box<dyn Error>> {
        let mut wtr = Writer::from_writer(vec![]);
        wtr.write_record(&["a", "b", "c"])?;
        wtr.write_record(&["x", "y", "z"])?;
    
        let data = String::from_utf8(wtr.into_inner()?)?;
        assert_eq!(data, "a,b,c\nx,y,z\n");
        Ok(())
    }
    

    Would it be possible to convert directly into a String instead of a Result?

    opened by probablykasper 15
  • Updating vs Overriding csv file

    Updating vs Overriding csv file

    I'm trying to update my CSV file using let wtr = Writer::from_path("cities.csv"); and wtr.serialize(&new_record); my code is below. I noticed that my code overwrites whatever is in the file, i.e. it deletes everything in cities.csv and inserts the new record. I need to know how to grow the file by appending a new record to it. Thanks

    use std::error::Error;
    use std::env;
    use csv::Writer;
    
    use city_model::Record;
    
    pub fn write_csv(csv_file: &str) -> Result<(), Box<dyn Error>> {
        let executable = env::current_exe()?;
        let path = match executable.parent() {
            Some(name) => name,
            _ => panic!()
        };
    
        let data_file = format!("{}/{}", path.display(), csv_file);
    
        println!("the data_file name is: {}", data_file);
        let new_record = Record {
                latitude: 456.67,
                longitude: 5675.78,
                population: Some(45768),
                city: "Zaqra".to_string(),
                state: "Jordan".to_string(),
        };
        
        let wtr = Writer::from_path(data_file);
        match wtr {
            Ok(mut t) => match t.serialize(&new_record) {
            Ok(_) => println!("File updated, record had been added successfully"),
                Err(e) => panic!("Could not add the record: {:?}", e)
            },
            Err(e) => panic!("Could not open the file: {:?}", e)
        };
        Ok(())
    }
    

    When I tried t.write_record(&new_record) I got a compile error.

    My model is defined as below:

    #[derive(Debug, Serialize, Deserialize)]
    #[serde(rename_all = "PascalCase")]
    pub struct Record {
        pub latitude: f64,
        pub longitude: f64,
        pub population: Option<u64>,
        pub city: String,
        pub state: String,
    }
    
    duplicate invalid 
    opened by hasanOryx 12
  • Accept whitespace trimming settings

    Accept whitespace trimming settings

    Fixes burntsushi/rust-csv#78

    Per my comment in #78, I'm popping whitespace and updating the record bounds to reflect the changes.

    Some thoughts:

    • I don't know if match or if statements are more idiomatic in this case. Let me know if you have an opinion.
    • If the user turns off headers, I apply trimming to the first row when Trim::All or Trim::Fields is set. I think this is reasonable, but I could imagine that maybe Trim::Headers should be Trim::FirstRow or something, in which case it isn't reasonable.
    • Just now I was thinking maybe Trim::Fields should be Trim::Records instead.
    opened by medwards 12
  • Support (de)serialization of tuple of structs with header row

    Support (de)serialization of tuple of structs with header row

    The new serde functionality is convenient for reading/writing CSV files with a header row, but only when a CSV record corresponds to a single struct in your program. However, a common case is to read/write a CSV record that corresponds to two or more structs. For example, I might have

    struct Input {
        x: f64,
        y: f64,
    }
    
    struct Properties {
        prop1: f64,
        prop2: f64,
    }
    

    and want to get a CSV file such as:

    x,y,prop1,prop2
    1,2,3,4
    

    by serializing an instance of (Input, Properties).

    With csv 1.0.0-beta.4, to achieve this and ensure the correct field names, I have to create a new struct InputWithProperties that combines the fields from the two structs and write methods to convert between (Input, Properties) and InputWithProperties.

    It would be much more convenient if the csv crate supported serialization/deserialization of tuples of structs. I would suggest restricting this functionality to tuples of structs with unique field names, although that wouldn't be strictly necessary if the ordering in the tuple was used.

    I may get around to implementing this and submitting a PR at some point, but I did want to suggest this feature before 1.0.0 is stabilized in case implementing it would require breaking changes.

    By the way, thanks for creating the csv crate!

    opened by jturner314 12
  • Leading spaces in header fields

    Leading spaces in header fields

    I came across a CSV file which had leading spaces in the header fields:

    medallion, hack_license, vendor_id, pickup_datetime, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount
    

    When I deserialized it with Serde, each field name had a space in front. Would it make sense to have an option to trim the header fields, or to transform them in a more general way?

    enhancement help-wanted 
    opened by vmx 11
  • Request: Optional header support for serialize nested struct

    Request: Optional header support for serialize nested struct

    May I ask if it is possible for an option to serialize struct that contains array / tuple / struct?

    Previously in Python I used a flatten function to convert a nested dictionary into a flat dictionary, with field names combined and separated by some separator (e.g. "velocity.0"), but serializing a map or nested struct with a header is currently not supported.

    Therefore, I would like a way to optionally serialize nested struct headers with a user-provided separator.

    opened by minstrel271 11
  • Single record deserialization is 250 times slower than json

    Single record deserialization is 250 times slower than json

    Maybe I'm doing something wrong, but it seems to be 250 times slower compared to serde_json. I've tried the latest (1.1.6) release and the master version. Here's the code of the benchmark:

    use serde::Deserialize;
    
    const CSV_RECORD: &[u8] = "1,foo".as_bytes();
    const JSON_RECORD: &str = r#"{ "id": 1, "text": "foo" }"#;
    
    #[derive(Debug, Deserialize, PartialEq)]
    struct Record {
        id: u32,
        text: String,
    }
    
    fn main() {
        let expected = Record {
            id: 1,
            text: "foo".into(),
        };
        assert_eq!(Record::from_csv(), expected);
        assert_eq!(Record::from_json(), expected);
    
        let times = 100_000;
        bench("csv", Record::from_csv, times);
        bench("json", Record::from_json, times);
    }
    
    impl Record {
        fn from_csv() -> Self {
            let mut rdr = csv::ReaderBuilder::new()
                .has_headers(false)
                .from_reader(CSV_RECORD);
            rdr.deserialize().next().unwrap().unwrap()
        }
    
        fn from_json() -> Self {
            serde_json::from_str(JSON_RECORD).unwrap()
        }
    }
    
    fn bench(name: &str, f: fn() -> Record, times: usize) {
        let start = std::time::Instant::now();
        for _ in 0..times {
            let _row = f();
        }
        let per_sec = times as f64 / start.elapsed().as_secs_f64();
        println!("{:>5}{:>10.0} / sec", name, per_sec);
    }
    

    And its result on my laptop:

    $ cargo run --release -q --bin serde-performance
      csv     54865 / sec
     json  13123199 / sec
    
    opened by imbolc 10
  • Deserializer does not work with `serde(default)`

    Deserializer does not work with `serde(default)`

    I'm using serde(default) / serde(default = "...") to populate records with fallback values when they're not present in the input, just like with any other deserializer:

    extern crate csv;
    extern crate serde;
    
    #[macro_use]
    extern crate serde_derive;
    
    #[derive(Debug, Deserialize)]
    pub struct Record {
        #[serde(default)]
        x: u32,
    
        #[serde(default)]
        y: u32,
    }
    
    fn main() {
        let reader = csv::Reader::from_reader(b"\
    x,y
    0,1
    ,2
    3,
    ,
    5,6
    " as &[u8]);
    
        reader.into_deserialize().for_each(|res: Result<Record, _>| {
            println!("{:?}", res);
        });
    }
    

    However, this doesn't work with the csv crate: apparently, even for struct fields, it yields an empty string instead of a missing field, so such records are reported as errors:

    Ok(Record { x: 0, y: 1 })
    Err(Error(Deserialize { pos: Some(Position { byte: 8, line: 3, record: 2 }), err: DeserializeError { field: Some(0), kind: ParseInt(ParseIntError { kind: Empty }) } }))
    Err(Error(Deserialize { pos: Some(Position { byte: 11, line: 4, record: 3 }), err: DeserializeError { field: Some(1), kind: ParseInt(ParseIntError { kind: Empty }) } }))
    Err(Error(Deserialize { pos: Some(Position { byte: 14, line: 5, record: 4 }), err: DeserializeError { field: Some(0), kind: ParseInt(ParseIntError { kind: Empty }) } }))
    Ok(Record { x: 5, y: 6 })
    
    opened by RReverser 10
  • Issue in reader

    Issue in reader

    Possible error in code: file reader.rs

    Check the code when self.next_class == 255:

    existing code: if self.next_class > CLASS_SIZE {

    probably should be: if self.next_class >= CLASS_SIZE {

    const CLASS_SIZE: usize = 256;
    
        fn new() -> DfaClasses {
            DfaClasses {
                classes: [0; CLASS_SIZE],
                next_class: 1,
            }
        }
    
        fn add(&mut self, b: u8) {
            if self.next_class > CLASS_SIZE {
                panic!("added too many classes")
            }
            self.classes[b as usize] = self.next_class as u8;
            self.next_class += 1;
        }
    

    A checked conversion to u8 could be used:

    use std::convert::TryInto;
    pub fn main() {
        // let mut c = DfaClasses::new();
        let b: u32 = 256;
    //let xxx: u8 = b as u8; // -> ignores overflow
        //let xxx: u8 = b.try_into().unwrap(); // -> runtime error: out of range in conversion
        let xxx: u8 = match b.try_into() {
            Result::Ok(b2) => b2,
            Result::Err(e2) => {
                println!("Error: {}", e2);
                0
            }
        }; // -> Error: out of range integral type conversion attempted
        // c.next_class = 255;
        // println!("{}", c.next_class);
        // c.add(123);
        // println!("{}", c.next_class);
        // c.add(123);
        println!("xxx = {}", xxx);
    }
    
    opened by wiluite 1
  • Add support for source byte-range tracking for ByteRecord

    Add support for source byte-range tracking for ByteRecord

    ByteRecord (via ByteRecordInner) already exposes a position method to access line number, record number and byte offset of the start of the record. This PR adds information to ByteRecord to track not only the byte offset of the start of the record, but also its end, via a span method returning a Span; this is useful to retrieve the original source bytes for a parsed record when e.g. reporting errors.

    opened by AndreaOddo89 3
  • impl Reader<Reader<File>>

    impl Reader<Reader<File>>

    Hello Andrew!

    First of all, thank you for this library and all other awesome work that you do! I was browsing through sources of this crate to get a better understanding of how everything works and found something odd in reader.rs:789.

    impl Reader<Reader<File>> {
    

    I've spent quite some time trying to understand the intent here, but luckily we have Cargo, so I ended up cloning the project and experimenting with it. It turns out that the project will build and the tests pass if we remove the inner Reader or replace it with any other type.

    This silliness also works:

    impl Reader<Reader<Reader<Reader<File>>>> {
    
    impl Reader<Vec<File>> {
    

    So I'm assuming that this is just a typo that curiously slipped through the type checker.

    I think it should be fixed to be just impl Reader<File> { ... to not scare Rust newbies like myself.

    opened by Murtaught 0
  • Automatically escape fields that contain the comment character

    Automatically escape fields that contain the comment character

    Currently, if data is written with QuoteStyle::Necessary, and the first field of a row happens to start with a comment character, the row will be ignored as a comment when later reading it back in.

    This change adds a comment property to Writer, and automatically quotes fields that have the provided comment character in them, so they round-trip correctly.

    opened by dae 0
Owner
Andrew Gallant