This crate lets you write a struct in Rust and derive from it a struct of arrays laid out in memory according to the Arrow format.

    is_deleted: bool,
    a1: Option<f64>,
    a2: i64,
    // binary
    a3: Option<Vec<u8>>,
    // date32
    a4: NaiveDate,
    // optional list array of optional strings
    nullable_list: Option<Vec<Option<String>>>,

In the example above, the derived struct is

#[derive(Default, Debug)]
pub struct FooArray {
    name: MutableUtf8Array<i32>,
    is_deleted: MutableBooleanArray,
    a1: MutablePrimitiveArray<f64>,
    a2: MutablePrimitiveArray<i64>,
    a3: MutableBinaryArray<i32>,
    nullable_list: MutableListArray<i32, MutableUtf8Array<i32>>,
    required_list: MutableListArray<i32, MutableUtf8Array<i32>>,
    other_list: MutableListArray<i32, MutablePrimitiveArray<i32>>,
}

FooArray::push lays out data in memory according to the Arrow spec, so the result can be used with the IPC, FFI, etc. supported by arrow2.
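
To illustrate the struct-of-arrays idea without depending on arrow2, here is a minimal std-only sketch. `Row`, `RowColumns`, and their fields are hypothetical names chosen for this example, not the types the derive macro generates. Real Arrow arrays use packed validity bitmaps and offset buffers rather than `Vec<bool>` and `Vec<String>`.

```rust
/// A row-oriented record (hypothetical example type).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    name: String,
    a1: Option<f64>,
}

/// Column-oriented storage: one vector per field.
/// The nullable field carries a separate validity vector.
#[derive(Debug, Default)]
struct RowColumns {
    name: Vec<String>,
    a1_values: Vec<f64>,
    a1_validity: Vec<bool>,
}

impl RowColumns {
    /// Append one row by pushing each field onto its own column.
    fn push(&mut self, row: Row) {
        self.name.push(row.name);
        // A null still occupies a slot so all columns keep the same length.
        self.a1_values.push(row.a1.unwrap_or_default());
        self.a1_validity.push(row.a1.is_some());
    }

    /// Number of rows pushed so far.
    fn len(&self) -> usize {
        self.name.len()
    }
}

fn main() {
    let mut columns = RowColumns::default();
    columns.push(Row { name: "a".to_string(), a1: Some(1.5) });
    columns.push(Row { name: "b".to_string(), a1: None });
    println!("{} rows: {:?}", columns.len(), columns);
}
```

Keeping a placeholder value for nulls is what lets every column be indexed by the same row number.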


Licensed under either of

at your option.


Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

  • trait bound not satisfied

    Hey! This seems to be exactly what I need for translating structured log data that I've packed into structs cleanly into the arrow format, so I'm really excited by the potential of this crate.

    I seem to be getting an issue in my project:

    error[E0277]: the trait bound `arrow2::array::struct_::StructArray: From<LogDataArray>` is not satisfied
      --> src/
    37 | #[derive(Debug, Clone, PartialEq, StructOfArrow)]
       |                                   ^^^^^^^^^^^^^ the trait `From<LogDataArray>` is not implemented for `arrow2::array::struct_::StructArray`
      ::: /home/weaton/.cargo/git/checkouts/arrow2-derive-1c9d82f1d5ff8f2d/d4f7231/src/
    9  | pub trait ArrowStruct: Into<StructArray> {
       |                        ----------------- required by this bound in `ArrowStruct`
       = help: the following implementations were found:
                 <arrow2::array::struct_::StructArray as From<arrow2::array::growable::structure::GrowableStruct<'a>>>
                 <arrow2::array::struct_::StructArray as From<arrow2::record_batch::RecordBatch>>
       = note: required because of the requirements on the impl of `Into<arrow2::array::struct_::StructArray>` for `LogDataArray`
       = note: this error originates in the derive macro `StructOfArrow` (in Nightly builds, run with -Z macro-backtrace for more info)

    Any idea what could be causing this? When I drop my struct definition into your test suite it compiles, which has me scratching my head.

    Here's my cargo.toml entry, am I missing something silly?

    arrow2-derive = { git = "", branch = "main" }

    Thanks again for your work on this!

  • Support conversion of rust struct to an arrow2 chunk

    Created from the discussion in

    A rust struct can conceptually represent either an Arrow Struct or an arrow2::Chunk (a column group). The arrow2::Chunk is important since it's used in the deserialization/serialization API for parquet and flight conversion.

    We can extend the arrow2_convert::TryIntoArrow and arrow2_convert::FromArrow traits to convert to/from arrow2::Chunk, but there are two possible mappings from a vector of structs, Vec<S> to Chunk:

    1. The Chunk has a single field of type Struct
    2. The Chunk contains the same number of fields as the struct.

    1 can be easily supported by wrapping an arrow2::Array in a Chunk.

    2 has a couple of approaches:

    a. A new derive macro to generate the mapping to a Chunk (e.g. ArrowChunk or ArrowRoot).
    b. A helper method to convert an arrow2::StructArray to a Chunk by unwrapping the fields.
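
    The two mappings, and the unwrapping helper from option b, can be sketched with std-only stand-ins. `Point`, `Column`, and the three functions are hypothetical names for illustration. They are not arrow2 or arrow2_convert APIs, and a `Vec<Column>` here only plays the role of a Chunk.

```rust
/// Example row type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Point {
    x: i32,
    y: f64,
}

/// Hypothetical stand-in for a typed column.
#[derive(Debug, PartialEq)]
enum Column {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
    /// A struct column owns one child column per field.
    Struct(Vec<Column>),
}

/// Mapping 2: the chunk has one column per struct field.
fn to_flat_columns(rows: &[Point]) -> Vec<Column> {
    vec![
        Column::Int32(rows.iter().map(|p| p.x).collect()),
        Column::Float64(rows.iter().map(|p| p.y).collect()),
    ]
}

/// Mapping 1: the chunk has a single column of struct type.
fn to_single_struct_column(rows: &[Point]) -> Vec<Column> {
    vec![Column::Struct(to_flat_columns(rows))]
}

/// Option b: turn a lone struct column into its child columns.
/// Any other input is returned unchanged.
fn flatten(mut columns: Vec<Column>) -> Vec<Column> {
    if columns.len() == 1 && matches!(columns[0], Column::Struct(_)) {
        if let Some(Column::Struct(children)) = columns.pop() {
            return children;
        }
    }
    columns
}

fn main() {
    let rows = vec![Point { x: 1, y: 2.0 }, Point { x: 3, y: 4.0 }];
    let single = to_single_struct_column(&rows);
    println!("single struct column: {:?}", single);
    println!("flattened: {:?}", flatten(single));
}
```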

    One related use case that could guide this design is supporting generic, typed versions of the arrow2 csv, json, parquet, and flight serialize/deserialize methods, where the schema is specified by a Rust struct (opened for this). To achieve this, it would be useful to access the serialize/deserialize methods of each column separately for parallelism, which is cleaner via 2a.

  • Add support for large types

    The complexity is mostly in the serialize path since for deserialize we can just look at the arrow type (LargeList, LargeUtf8, etc) and cast to the appropriate array type.

    A couple of ways I can think of to support this for serialize:

    1. Only support i64 offsets, and provide a conversion method that converts large types to small types in another pass

    2. Support an attribute either on a container or per field to use the large offset.
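
    The conversion pass in option 1 could look roughly like the std-only sketch below, which narrows 64-bit offsets to 32-bit offsets and fails if any offset does not fit. `narrow_offsets` is a hypothetical helper, not an arrow2 or arrow2_convert function.

```rust
use std::convert::TryFrom;

/// Convert 64-bit list/string offsets to 32-bit offsets.
/// Returns None if any offset does not fit in an i32.
fn narrow_offsets(offsets: &[i64]) -> Option<Vec<i32>> {
    offsets.iter().map(|&o| i32::try_from(o).ok()).collect()
}

fn main() {
    let small = narrow_offsets(&[0, 3, 7]);
    let too_big = narrow_offsets(&[0, i64::from(i32::MAX) + 1]);
    println!("{:?} {:?}", small, too_big);
}
```

    Collecting into an `Option<Vec<i32>>` stops at the first offset that overflows, so a partially converted buffer is never returned.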

  • Renaming crates before publishing?

    One thing that stands out is that we have two crates, arrow2_derive and derive_internals, and both need to be published.

    If we were to follow the typical convention, the derive_internals crate should actually be arrow2_derive, and the current arrow2_derive, which contains the recently added traits should be called something else, perhaps arrow2_convert?

    There is some additional functionality that could go into arrow2_convert to provide a helper API that's higher-level than what the arrow2 crate provides.

    @jorgecarleitao thoughts?

  • Fix #20 Prepare crate for publishing

    • Rename to arrow2_convert
    • Improve error reporting in proc macro
    • Add tests for error reporting in proc macro
    • Beginnings of macro attributes for serialize only and deserialize only
    • Beginnings of optionally enabling serialize/deserialize
    • More modular tests
    • Add licenses and symlinks to readme and licenses
    • Updated documentation
    • Follow the serde serializer pattern for passing in mutable arrays, which enables serializing borrowed values

  • [question] Why are union variants without payload mapped to a bool array instead of a null array?

    The following enum gets mapped to a bool array currently:

    enum Foo {

    Couldn't it be mapped to a null array instead? I've just recently started using arrow and your lib, so I'm trying to figure out the landscape of how things are done and how to take best advantage of what's available.

  • Crash while serializing

    thread 'main' panicked at 'attempt to subtract with overflow', src/analysis/
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/e0098a5cc3a87d857e597af824d0ce1ed1ad85e0/library/std/src/
       1: core::panicking::panic_fmt
                 at /rustc/e0098a5cc3a87d857e597af824d0ce1ed1ad85e0/library/core/src/
       2: core::panicking::panic
                 at /rustc/e0098a5cc3a87d857e597af824d0ce1ed1ad85e0/library/core/src/
       3: <lisa_rust_analysis::analysis::tasks::MutableTaskStateArray as arrow2::array::TryPush<core::option::Option<__T>>>::try_push
                 at ./tools/analysis/src/analysis/
       4: <lisa_rust_analysis::analysis::tasks::TaskState as arrow2_convert::serialize::ArrowSerialize>::arrow_serialize
                 at ./tools/analysis/src/analysis/
       5: <lisa_rust_analysis::analysis::tasks::MutableTasksStatesRowArray as arrow2::array::TryPush<core::option::Option<__T>>>::try_push
                 at ./tools/analysis/src/analysis/
       6: <lisa_rust_analysis::analysis::tasks::TasksStatesRow as arrow2_convert::serialize::ArrowSerialize>::arrow_serialize
                 at ./tools/analysis/src/analysis/
       7: arrow2_convert::serialize::arrow_serialize_extend_internal
                 at /home/.cargo/registry/src/
       8: arrow2_convert::serialize::arrow_serialize_to_mutable_array
                 at /home/.cargo/registry/src/
       9: <Collection as arrow2_convert::serialize::TryIntoArrow<alloc::boxed::Box<dyn arrow2::array::Array>,Element>>::try_into_arrow
                 at /home/.cargo/registry/src/

    This crashes in debug, and compiling in --release mode silently leads to a corrupted file that pyarrow chokes on when using pyarrow.compute.struct_field:

    pyarrow.lib.ArrowIndexError: Index -1 out of bounds
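
    The difference between the debug panic and the silent release-mode result matches Rust's standard integer overflow semantics: an overflowing subtraction panics when overflow checks are enabled (the debug default) and wraps around otherwise. The std-only sketch below illustrates that general behavior only. It does not identify the actual bug in the derived `try_push`, and `previous_index` is a hypothetical helper.

```rust
/// Index of the previous element, or None when `i` is 0.
/// `checked_sub` makes the underflow case explicit in every build profile.
fn previous_index(i: usize) -> Option<usize> {
    i.checked_sub(1)
}

fn main() {
    // A plain `i - 1` with i == 0 panics in debug builds and wraps in
    // release builds. `wrapping_sub` reproduces the wrapped result in
    // any build profile.
    let wrapped = 0usize.wrapping_sub(1);
    assert_eq!(wrapped, usize::MAX);
    // Reinterpreted as a signed integer, the wrapped value is -1.
    assert_eq!(wrapped as isize, -1);

    assert_eq!(previous_index(0), None);
    assert_eq!(previous_index(5), Some(4));
    println!("wrapped = {}", wrapped);
}
```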

  • derive(ArrowField) breaks in macros

    I'm trying to either:

    • apply #[derive(ArrowField)] to a struct defined in a macro_rules
    • apply my macro_rules on a struct using #[derive(ArrowField)]

    The struct definition is parsed using a pattern from The Little Book of Rust Macros:

    use arrow2_convert::ArrowField;
    macro_rules! mymacro {
        (
            // attributes on the struct
            $( #[$meta:meta] )*
            $vis:vis struct $name:ident {
                $(
                    // attributes on a field
                    $( #[$field_meta:meta] )*
                    // a single field
                    $field_vis:vis $field_name:ident : $field_ty:ty
                ),*
            $(,)? }
        ) => {
            $( #[$meta] )*
            $vis struct $name {
                $(
                    $( #[$field_meta] )*
                    $field_vis $field_name : $field_ty
                ),*
            }
        }
    }
    mymacro! {
        #[derive(ArrowField)]
        struct Foo2 {
            myfield: u8,
        }
    }

    Sadly this fails with:

    error: proc-macro derive panicked
      --> src/
    97 |     #[derive(ArrowField)]
       |              ^^^^^^^^^^
       = help: message: Only types are supported atm
    error: could not compile `dataframe` due to previous error

    The only macro that seemed to work is this one:

    macro_rules! mymacro2 {
        ($($tts:tt)*) => { $($tts)* }
    }

    Other derive macros such as JsonSchema don't seem to have this issue:

    EDIT: This also fails, but differently:

    mymacro! {
        struct Foo2 {
            myfield: u8,
        }
    }
    #[derive(ArrowField)]
    struct Foo3(Foo2);

    fails with:

    error: proc-macro derive panicked
       --> src/
    114 | #[derive(ArrowField)]
        |          ^^^^^^^^^^
        = help: message: called `Option::unwrap()` on a `None` value
  • enable customizing list inner child element name?

    When Spark outputs a parquet file, I believe it always uses the inner list item name of element as opposed to item:

    message spark_schema {
      OPTIONAL group mylistcolumn (LIST) {
        REPEATED group list {
          OPTIONAL BYTE_ARRAY element (UTF8);
        }
      }
    }

    It appears this crate (or one of its dependencies, perhaps arrow2 itself?), is always assuming that the inner field name of a list is item rather than element.

    Expected: Struct([Field { name: "mylistcolumn", data_type: List(Field { name: "item", data_type: Int32, is_nullable: false, metadata: {} }), is_nullable: false, metadata: {} }])

    Actual: Struct([Field { name: "mylistcolumn", data_type: List(Field { name: "element", data_type: Int32, is_nullable: false, metadata: {} }), is_nullable: false, metadata: {} }])

    I'm guessing this is because of this line of code?

    1. If this is controlled by arrow2-convert, can we perhaps customize this via an annotation on the struct member?
    2. Should the default be re-evaluated if parquet-mr / Spark uses element?
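
    To show why the two schemas above compare as different, and what a configurable inner name might do, here is a std-only sketch. `DataType`, `Field`, and `with_list_item_name` are simplified, hypothetical stand-ins, not the arrow2 types or an existing arrow2-convert option.

```rust
/// Simplified stand-in for an Arrow data type.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
}

/// Simplified stand-in for an Arrow field: a name plus a data type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Return a copy of `dt` in which every list's inner field is renamed.
fn with_list_item_name(dt: &DataType, item_name: &str) -> DataType {
    match dt {
        DataType::List(inner) => DataType::List(Box::new(Field {
            name: item_name.to_string(),
            data_type: with_list_item_name(&inner.data_type, item_name),
        })),
        other => other.clone(),
    }
}

fn list_of_int32(item_name: &str) -> DataType {
    DataType::List(Box::new(Field {
        name: item_name.to_string(),
        data_type: DataType::Int32,
    }))
}

fn main() {
    let item = list_of_int32("item");
    let element = list_of_int32("element");
    // Structural equality includes the inner field name, so these differ.
    println!("equal before rename: {}", item == element);
    println!("equal after rename: {}", with_list_item_name(&item, "element") == element);
}
```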

    P.S. Likely not related, but I ran into a very similar error in this other crate as well:

  • improve "Data type mismatch" error message

    Currently the error message doesn't give enough information to debug what's going on.

    Something like this helped me debug an issue I was running into:

        "Data type mismatch. Expected: {:?} | Found: {:?}",
        &<ArrowType as ArrowField>::data_type(),

    It produced an error message that looks like:

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Data type mismatch. Expected: Struct([...redacted...]) | Found: Struct(...redacted...)")', examples/

    Could potentially make it even better by doing a diff.
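
    A very rough version of the diff idea is sketched below, using only std: it reports the position where the two `Debug` representations first diverge. `mismatch_message` is a hypothetical helper, and comparing `Debug` strings is an assumption made for brevity. A real implementation would more likely walk the data types field by field.

```rust
use std::fmt::Debug;

/// Build a mismatch message that includes both values and the character
/// position where their Debug representations first diverge.
fn mismatch_message<T: Debug>(expected: &T, found: &T) -> String {
    let e = format!("{:?}", expected);
    let f = format!("{:?}", found);
    let pos = e
        .chars()
        .zip(f.chars())
        .position(|(a, b)| a != b)
        // No differing pair: one string is a prefix of the other (or equal).
        .unwrap_or_else(|| e.chars().count().min(f.chars().count()));
    format!(
        "Data type mismatch. Expected: {} | Found: {} | first difference at character {}",
        e, f, pos
    )
}

fn main() {
    let msg = mismatch_message(&"List(item)", &"List(element)");
    println!("{}", msg);
}
```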

  • is it possible to run `try_into_collection` on a `Chunk` instead of an `Array`?

    Starting with the parquet_read_parallel example from arrow2, I am trying to deserialize a Chunk into a Vec of structs.

    Using the deserialize_parallel function as defined in the above example, the following code currently works for me:

    pub struct Document {
        content: String,
    }
    let chunk = deserialize_parallel(&mut columns)?;
    let array = StructArray::new(
    let documents: Vec<Document> = array.to_boxed().try_into_collection().unwrap();


    1. With the currently exposed APIs in arrow2 and arrow2-convert, is there a better way to convert the Chunk into a Struct? I think the extra conversion from Chunk to StructArray with the to_boxed at the end is perhaps not the most efficient.
    2. Would it be possible to expose TryIntoCollection::try_into_collection directly on the Chunk as well?

  • v0.3.2 (Sep 29, 2022)

    What's Changed

    • Upgrade to arrow2 v0.14 @ncpenke

  • v0.3.0 (Aug 25, 2022)

    Thank you @nielsmeima and @teymour-aldridge for your contributions to this release!

    Features and Enhancements

    • Add support for converting to Chunk @ncpenke
    • Add support for i128 @ncpenke
    • Add support for enums @ncpenke
    • Flatten chunks @nielsmeima
    • Serialize escaped Rust identifiers unescaped. @teymour-aldridge
    • Update arrow2 version. @teymour-aldridge

  • v0.2.0 (Jun 13, 2022)

  • v0.1.0 (Mar 3, 2022)

