Rust DataFrame library

Overview

Polars

Blazingly fast DataFrames in Rust & Python

Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arrow as its backend.

It currently consists of an eager API similar to pandas and a lazy API that is somewhat similar to Spark. To learn more about the inner workings of Polars, read the WIP book.

Among other things, Polars offers the following functionality:

Functionality Eager Lazy (DataFrame) Lazy (Series)
Filters
Shifts
Joins
GroupBys + aggregations
Comparisons
Arithmetic
Sorting
Reversing
Closure application (User Defined Functions)
SIMD
Pivots
Melts
Filling nulls + fill strategies
Aggregations
Moving Window aggregates
Find unique values
Rust iterators
IO (csv, json, parquet, Arrow IPC)
Query optimization: (predicate pushdown)
Query optimization: (projection pushdown)
Query optimization: (type coercion)
Query optimization: (simplify expressions)
Query optimization: (aggregate pushdown)

Note that almost all operations supported by the eager API on Series/ChunkedArrays can be used in the lazy API via UDFs.

Documentation

Want to know about all the features Polars supports? Read the docs!

Rust

Python

Performance

Polars is written to be performant, and it is! But don't take my word for it; take a look at the results in h2oai's db-benchmark.

Cargo Features

Additional cargo features:

  • temporal (default)
    • Conversions between Chrono and Polars for temporal data
  • simd (nightly)
    • SIMD operations
  • parquet
    • Read Apache Parquet format
  • json
    • JSON serialization
  • ipc
    • Arrow's IPC format serialization
  • random
    • Generate arrays with randomly sampled values
  • ndarray
    • Convert from DataFrame to ndarray
  • lazy
    • Lazy API
  • strings
    • String utilities for Utf8Chunked
  • object
    • Support for generic ChunkedArrays called ObjectChunked<T> (generic over T). These are downcastable from Series through the Any trait.
  • parallel
    • ChunkedArrays can be used by rayon::par_iter()
  • [plain_fmt | pretty_fmt] (mutually exclusive)
    • One of them should be chosen to format DataFrames. pretty_fmt can deal with overflowing cells and looks nicer, but has more dependencies. plain_fmt is plain formatting.

Contribution

Want to contribute? Read our contribution guideline.

Env vars

  • POLARS_PAR_SORT_BOUND -> Sets the lower bound of rows at which Polars will use a parallel sorting algorithm. Default is 1M rows.
  • POLARS_FMT_MAX_COLS -> maximum number of columns shown when formatting DataFrames.
  • POLARS_FMT_MAX_ROWS -> maximum number of rows shown when formatting DataFrames.
  • POLARS_TABLE_WIDTH -> width of the tables used during DataFrame formatting.
  • POLARS_MAX_THREADS -> maximum number of threads used in the join algorithm. Default is unbounded.
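
Environment variables are plain strings, so numeric limits must be converted before they are set. The sketch below prepares an environment for a child process with two of the formatting variables listed above; the helper name `polars_fmt_env` is hypothetical, and because the list does not say at which point Polars reads these variables, the sketch assumes the safest option of setting them before the process starts.

```python
import os


def polars_fmt_env(max_rows: int, max_cols: int) -> dict:
    """Return a copy of the current environment with the formatting limits set."""
    env = dict(os.environ)
    # Environment values must be strings, not integers.
    env["POLARS_FMT_MAX_ROWS"] = str(max_rows)
    env["POLARS_FMT_MAX_COLS"] = str(max_cols)
    return env


env = polars_fmt_env(max_rows=20, max_cols=8)
print(env["POLARS_FMT_MAX_ROWS"], env["POLARS_FMT_MAX_COLS"])  # 20 8
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching a script that uses Polars.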
Comments
  • Groupby on integer + date column of large dataframe requests enormous memory alloc [Windows only]

    Are you using Python or Rust?

    Python

    Which feature gates did you use?

    NA

    What version of polars are you using?

    0.10.5

    What operating system are you using polars on?

    Windows10

    Describe your bug.

    Polars fails to execute a groupby on a modestly sized dataframe.

    What are the steps to reproduce the behavior?

    I've used something like the below code to generate the data (it's pretty big) and then run the polars_load function

    from pathlib import Path
    
    import numpy as np
    import pandas as pd
    import polars as pl
    
    
    def make_some_data(data_path: Path):
        # data_path: directory in which the parquet file is written
        n_ids = int(1e3)
        n_features = 10
        freq = "1H"
        year = 2000
        start = f"{year}0101"
        end = f"{year + 1}0615"  # one year and a bit
        date_index = pd.date_range(start=start, end=end, freq=freq, closed="left")
        dates = np.tile(date_index, n_ids)
        n_dates = len(date_index)
    
        ids = np.repeat(np.arange(n_ids, dtype=np.int32), n_dates)
        features = np.random.randn(n_ids * n_dates, n_features).astype(np.float32)
        df = pd.DataFrame(
            {
                "ids": ids,
                "dates": dates,
            }
        )
        df[[f"feature_{i}" for i in range(n_features)]] = features
    
        print(f"\t[year={year}] n_ids={n_ids}, n_dates={n_dates}, n_rows={n_ids * n_dates}, n_cols={n_features}", flush=True)
        df.to_parquet(data_path / f"features_{year}.parquet")
    
    
    def pandas_load(f: Path):
        # f: path to the parquet file written by make_some_data
        df = pd.read_parquet(f)
        agg_df = df.groupby(["ids", "dates"]).agg("mean")
        return agg_df
    
    
    def polars_load(f: Path):
        # f: path to the parquet file written by make_some_data
        df = pl.read_parquet(f)
        agg_df = df.groupby(["ids", "dates"]).agg(pl.col("*").mean())
        return agg_df
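
To put a number on "pretty big", the size of the generated frame can be worked out from the parameters in the report alone. This is a rough sketch that counts only raw column bytes (int32 ids, 64-bit timestamps, float32 features) and ignores any index or allocator overhead.

```python
from datetime import datetime, timedelta

n_ids = 1_000
n_features = 10

# Hourly timestamps from 2000-01-01 up to, but excluding, 2001-06-15.
start = datetime(2000, 1, 1)
end = datetime(2001, 6, 15)
n_dates = (end - start) // timedelta(hours=1)

n_rows = n_ids * n_dates
# int32 id (4 bytes) + 64-bit timestamp (8 bytes) + ten float32 features.
bytes_per_row = 4 + 8 + 4 * n_features

print(n_dates)                 # 12744
print(n_rows)                  # 12744000
print(n_rows * bytes_per_row)  # 662688000
```

Under these assumptions the raw data comes to roughly 0.66 GB.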
    

    If we cannot reproduce the bug, it is unlikely that we will be able to fix it.

    What is the actual behavior?

    I get the following error

    UserWarning: Conversion of (potentially) timezone aware to naive datetimes. TZ information may be 
    lost
      "Conversion of (potentially) timezone aware to naive datetimes. TZ information may be lost",
    memory allocation of memory allocation of memory allocation of 191160001911600019116000 bytes failed
    

    The CPU usage goes to 100%, memory usage stays modest, and then the screen goes dark. I regain control after a minute or two, with an unkillable "python.exe - System Error" window.

    What is the expected behavior?

    Running the pandas_load function exhibits the correct behaviour.

    help wanted 
    opened by CHDev93 43
  • refactor[python]: Dispatch Series namespace methods to Expr using a decorator

    Relates to #4422

    Changes:

    • Added a decorator for dispatching Series methods to the Expr equivalent.
    • Created a module series.utils to house the decorator. Moved the get_ffi_func here as well.
    • Applied the decorator to all Series namespace methods. Only a handful of methods did not have a directly equivalent expression.

    I like that it's now very explicit that these methods do not implement anything fancy; they only dispatch to another implementation.
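
The general shape of such a dispatching decorator can be sketched with toy classes. Everything below is hypothetical and is not the code from this PR: `Expr`, `Series` and `dispatch_to_expr` are simplified stand-ins that only show how a decorated method with an empty body can forward to the same-named method of another class.

```python
from functools import wraps


class Expr:
    """Toy stand-in for an expression: a pipeline of per-value operations."""

    def __init__(self, ops=()):
        self.ops = tuple(ops)

    def to_uppercase(self):
        return Expr(self.ops + (str.upper,))

    def evaluate(self, values):
        for op in self.ops:
            values = [op(v) for v in values]
        return values


def dispatch_to_expr(method):
    """Replace the decorated method's body with a call to the same-named Expr method."""

    @wraps(method)  # keep the Series method's name and docstring
    def wrapper(self, *args, **kwargs):
        expr = getattr(Expr(), method.__name__)(*args, **kwargs)
        return Series(expr.evaluate(self.values))

    return wrapper


class Series:
    """Toy stand-in for a Series holding a list of values."""

    def __init__(self, values):
        self.values = list(values)

    @dispatch_to_expr
    def to_uppercase(self):
        """Uppercase every string (implemented by Expr.to_uppercase)."""


print(Series(["a", "b"]).to_uppercase().values)  # ['A', 'B']
```

The decorated method keeps only its signature and docstring, which makes it visible at a glance that the real work happens elsewhere.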

    If you like this approach, I will try to apply this to the Series non-namespace methods next, and then see if I can do something similar for DataFrame/LazyFrame.

    python 
    opened by stinodego 33
  • Reading nested struct panics with `OutOfSpec` error

    What language are you using?

    Rust

    Which feature gates did you use?

    "polars-io", "parquet", "lazy", "dtype-struct"

    Have you tried latest version of polars?

    • [yes]

    What version of polars are you using?

    Latest, master branch.

    What operating system are you using polars on?

    macOS Monterey 12.3.1

    What language version are you using

    $ rustc --version
    rustc 1.64.0-nightly (495b21669 2022-07-03)
    
    $ cargo --version
    cargo 1.64.0-nightly (dbff32b27 2022-06-24)
    

    Describe your bug.

    Reading nested struct panics with OutOfSpec error.

    What are the steps to reproduce the behavior?

    Given the attached parquet file with only 2 rows: nested_struct_OutOfSpec.snappy.parquet.zip

    Running the following code:

    let file_location = "nested_struct_OutOfSpec.snappy.parquet".to_string();
    let df = LazyFrame::scan_parquet(
        file_location, 
        ScanArgsParquet::default())
        .unwrap()
        .select([all()])
        .collect()
        .unwrap();
    dbg!(df);
    

    Results in this panic error:

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The children 
    DataTypes of a StructArray must equal the children data types.\n                         However, the 
    values 1 has a length of 11, which is different from values 0, 2.")', 
    /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:118:52
    

    What is the actual behavior?

    The result is a panic error with this output:

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The children 
    DataTypes of a StructArray must equal the children data types.\n                         However, the 
    values 1 has a length of 11, which is different from values 0, 2.")', 
    /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:118:52
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/std/src/panicking.rs:584:5
       1: core::panicking::panic_fmt
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/panicking.rs:142:14
       2: core::result::unwrap_failed
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:1805:5
       3: core::result::Result<T,E>::unwrap
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:1098:23
       4: arrow2::array::struct_::StructArray::new
                 at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:118:9
       5: arrow2::array::struct_::StructArray::from_data
                 at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:127:9
       6: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next
                 at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/io/parquet/read/deserialize/struct_.rs:50:22
       7: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/boxed.rs:1868:9
       8: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next::{{closure}}
                 at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/io/parquet/read/deserialize/struct_.rs:26:25
       9: core::iter::adapters::map::map_fold::{{closure}}
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:84:28
      10: core::iter::traits::iterator::Iterator::fold
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:2414:21
      11: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:124:9
      12: core::iter::traits::iterator::Iterator::for_each
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:831:9
      13: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_extend.rs:40:17
      14: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter_nested.rs:62:9
      15: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter.rs:33:9
      16: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/mod.rs:2648:9
      17: core::iter::traits::iterator::Iterator::collect
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:1836:9
      18: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next
                 at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/io/parquet/read/deserialize/struct_.rs:23:22
      19: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/boxed.rs:1868:9
      20: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:103:9
      21: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/boxed.rs:1868:9
      22: core::iter::traits::iterator::Iterator::try_fold
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:2237:29
      23: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:191:9
      24: core::iter::traits::iterator::Iterator::try_for_each
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
      25: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
      26: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
      27: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter.rs:33:9
      28: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/mod.rs:2648:9
      29: core::iter::traits::iterator::Iterator::collect
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
      30: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
      31: core::iter::adapters::try_process
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:160:17
      32: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:9
      33: core::iter::traits::iterator::Iterator::collect
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:1836:9
      34: polars_io::parquet::read_impl::array_iter_to_series
                 at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:47:17
      35: polars_io::parquet::read_impl::column_idx_to_series
                 at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:36:9
      36: polars_io::parquet::read_impl::rg_to_dfs::{{closure}}
                 at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:126:21
      37: core::iter::adapters::map::map_try_fold::{{closure}}
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:91:28
      38: core::iter::traits::iterator::Iterator::try_fold
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:2238:21
      39: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:117:9
      40: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:191:9
      41: core::iter::traits::iterator::Iterator::try_for_each
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
      42: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
      43: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
      44: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter.rs:33:9
      45: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/mod.rs:2648:9
      46: core::iter::traits::iterator::Iterator::collect
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
      47: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
      48: core::iter::adapters::try_process
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:160:17
      49: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:9
      50: core::iter::traits::iterator::Iterator::collect
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:1836:9
      51: polars_io::parquet::read_impl::rg_to_dfs
                 at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:123:13
      52: polars_io::parquet::read_impl::read_parquet
                 at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:249:63
      53: polars_io::parquet::read::ParquetReader<R>::_finish_with_scan_ops
                 at /.../github/polars/polars/polars-io/src/parquet/read.rs:60:9
      54: polars_lazy::physical_plan::executors::scan::parquet::ParquetExec::read
                 at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/scan/parquet.rs:39:9
      55: <polars_lazy::physical_plan::executors::scan::parquet::ParquetExec as polars_lazy::physical_plan::Executor>::execute::{{closure}}
                 at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/scan/parquet.rs:61:68
      56: polars_lazy::physical_plan::file_cache::FileCache::read
                 at /.../github/polars/polars/polars-lazy/src/physical_plan/file_cache.rs:40:13
      57: <polars_lazy::physical_plan::executors::scan::parquet::ParquetExec as polars_lazy::physical_plan::Executor>::execute
                 at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/scan/parquet.rs:59:9
      58: <polars_lazy::physical_plan::executors::udf::UdfExec as polars_lazy::physical_plan::Executor>::execute
                 at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/udf.rs:12:18
      59: polars_lazy::frame::LazyFrame::collect
                 at /.../github/polars/polars/polars-lazy/src/frame/mod.rs:718:19
      60: gyrfalcon::main
                 at ./src/main.rs:21:14
      61: core::ops::function::FnOnce::call_once
                 at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/ops/function.rs:248:5
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    

    What is the expected behavior?

    The parquet file should have been correctly loaded.

    The parquet-tools util shows it properly. Also, Apache Spark properly reads it and processes it.

    bug 
    opened by andrei-ionescu 32
  • Failed join #2

    Are you using Python or Rust?

    Python

    What version of polars are you using?

    0.12.18

    What operating system are you using polars on?

    CentOS Linux release 8.1.1911 (Core)

    Describe your bug.

    Rows that should join aren't joining: on an inner join, some rows that should be there are omitted from the joined data frame, and on a left join, some of the right columns are filled with null values when they shouldn't be.

    What are the steps to reproduce the behavior?

    Output:

    >>> ./reproduce_error.py
    --right_df--
    shape: (1, 4)
    ┌──────────┬───────────────────┬───────────────────┬────────────────┐
    │ join_col ┆ new_col_from_join ┆ table_1_indicator ┆ other_join_len │
    │ ---      ┆ ---               ┆ ---               ┆ ---            │
    │ i64      ┆ f64               ┆ bool              ┆ i64            │
    ╞══════════╪═══════════════════╪═══════════════════╪════════════════╡
    │ 59546483 ┆ 6.9900e-46        ┆ true              ┆ null           │
    └──────────┴───────────────────┴───────────────────┴────────────────┘
    --left_df--
    shape: (1, 3)
    ┌────────────┬──────────┬────────────────┐
    │ temp       ┆ join_col ┆ other_join_len │
    │ ---        ┆ ---      ┆ ---            │
    │ str        ┆ i64      ┆ i64            │
    ╞════════════╪══════════╪════════════════╡
    │ 1_59546483 ┆ 59546483 ┆ null           │
    └────────────┴──────────┴────────────────┘
    --join--
    shape: (1, 5)
    ┌────────────┬──────────┬────────────────┬───────────────────┬───────────────────┐
    │ temp       ┆ join_col ┆ other_join_len ┆ new_col_from_join ┆ table_1_indicator │
    │ ---        ┆ ---      ┆ ---            ┆ ---               ┆ ---               │
    │ str        ┆ i64      ┆ i64            ┆ f64               ┆ bool              │
    ╞════════════╪══════════╪════════════════╪═══════════════════╪═══════════════════╡
    │ 1_59546483 ┆ 59546483 ┆ null           ┆ null              ┆ null              │
    └────────────┴──────────┴────────────────┴───────────────────┴───────────────────┘
    

    Notice that the two columns being joined on ('join_col' and 'other_join_len') are the same in the right and left DFs, but the right columns don't show up in the joined DF.
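
One candidate explanation, which this report does not confirm, is that the join treats a null key as unequal to every other key, including another null, as SQL does. The plain-Python sketch below uses a hypothetical `left_join` helper (not Polars code) on the single row shown in the output above and reproduces the symptom under that assumption.

```python
def left_join(left_rows, right_rows, on, nulls_equal):
    """Left-join two lists of dict rows on the columns named in `on`."""
    right_cols = [c for c in right_rows[0] if c not in on]
    joined = []
    for left_row in left_rows:
        key = tuple(left_row[c] for c in on)
        matches = [
            r for r in right_rows
            if tuple(r[c] for c in on) == key
            # With SQL-style semantics, a key containing a null never matches.
            and (nulls_equal or None not in key)
        ]
        if not matches:
            joined.append({**left_row, **{c: None for c in right_cols}})
        for r in matches:
            joined.append({**left_row, **{c: r[c] for c in right_cols}})
    return joined


left = [{"temp": "1_59546483", "join_col": 59546483, "other_join_len": None}]
right = [{"join_col": 59546483, "new_col_from_join": 6.99e-46,
          "table_1_indicator": True, "other_join_len": None}]
on = ["join_col", "other_join_len"]

print(left_join(left, right, on, nulls_equal=False)[0]["table_1_indicator"])  # None
print(left_join(left, right, on, nulls_equal=True)[0]["table_1_indicator"])   # True
```

If this is what happens, the result depends entirely on whether null keys are meant to compare equal in a join.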

    Reproducing code

    #!/usr/bin/env python3
    
    import polars as pl
    
    right_df_1 = pl.scan_csv(
        'repro_tsv_1.txt',
        sep='\t',
    ).select([
        'join_col',
        'new_col_from_join',
        pl.lit(True).alias('table_1_indicator'),
        pl.lit(None).cast(int).alias('other_join_len'),
    ])
    
    right_df_2 = pl.scan_csv(
        'repro_tsv_2.txt',
        sep='\t',
    ).select([
        'join_col',
        'new_col_from_join',
        pl.col('other_join_col').str.lengths().cast(int).alias('other_join_len'),
    ]).groupby(['join_col', 'other_join_len']).agg([
        pl.col('new_col_from_join').min().alias('new_col_from_join'),
    ]).with_columns([
        pl.lit(False).alias('table_1_indicator')
    ]).select([ 'join_col', 'new_col_from_join', 'table_1_indicator', 'other_join_len'])
    
    right_df = pl.concat([right_df_1, right_df_2])
    print('--right_df--')
    print(right_df.filter((pl.col('join_col') == 59546483) & pl.col('other_join_len').is_null()).collect())
    
    rownames_fname = 'munged_rownames.txt'
    with open(rownames_fname) as var_file:
        rownames = [line.strip() for line in var_file if line.strip()]
    
    left_df = pl.DataFrame({
        'temp': rownames,
    }).lazy().with_columns([
        pl.col('temp').str.extract('^[^_]*_([^_]*)', 1).cast(int).alias('join_col'),
        pl.col('temp').str.extract('^[^_]*_[^_]*_([^_]*)$', 1).str.lengths().cast(int).alias('other_join_len'),
    ])
    print('--left_df--')
    print(left_df.filter((pl.col('join_col') == 59546483) & pl.col('other_join_len').is_null()).collect())
    
    total_df = left_df.join(
        right_df,
        how='left',
        on=['join_col', 'other_join_len']
    ).collect()
    print('--join--')
    print(total_df.filter((pl.col('join_col') == 59546483) & pl.col('other_join_len').is_null()))
    

    repro_tsv_1.txt repro_tsv_2.txt munged_rownames.txt

    bug 
    opened by LiterallyUniqueLogin 32
  • test(python): Parametric test coverage for EWM functions

    Massive expansion of EWM test coverage, using pandas (with ignore_na=True) as a reference implementation for comparison purposes.

    Parametric tests cover...

    • use of all three decay params: com, span, half_life.
    • floating point values between -1e8 and 1e8.
    • null values present / not present.
    • different min_period values.
    • chunked / unchunked series.
    • series of different lengths.
    • int and float series.
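
The three decay parameters are alternative ways of specifying the same smoothing factor alpha. The conversions below are the ones pandas documents for its `ewm` methods; that Polars uses the same definitions is assumed here because pandas serves as the reference, and the helper name `ewm_alpha` is hypothetical.

```python
import math


def ewm_alpha(com=None, span=None, half_life=None):
    """Convert exactly one decay parameter into the smoothing factor alpha."""
    given = [p for p in (com, span, half_life) if p is not None]
    if len(given) != 1:
        raise ValueError("specify exactly one of com, span, half_life")
    if com is not None:
        return 1.0 / (1.0 + com)
    if span is not None:
        return 2.0 / (span + 1.0)
    return 1.0 - math.exp(-math.log(2.0) / half_life)


# All three of these settings describe (approximately) the same alpha of 0.5.
print(ewm_alpha(com=1), ewm_alpha(span=3), ewm_alpha(half_life=1))
```

A parametric test can draw any one of the three parameters and still compare against the same reference computation.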
    python test 
    opened by alexander-beedie 31
  • Additional lints for the Python code base

    As the project becomes more popular, we can expect more people to start contributing to the code base. Having a good linting setup will make sure our code quality remains consistently high, while aiding in the code review process. I outlined a number of tools/settings that I think will help. Suggestions are more than welcome.

    flake8

    • [x] Set max-line-length = 88
      • We have a lot of unnecessary inconsistency regarding line lengths in the code. ~#4041 helps address this for the code itself.~ There are many strings/comments/docstrings that could easily fit within 88 characters, but are now going over this limit for no reason. These should be fixed. Exceptions like these can be ignored on a per-case basis using # noqa: E501.

    flake8 plugins

    flake8 has a rich plugin ecosystem with additional lints that can help keep your code clean. They can be enabled simply by adding them to our build requirements. Using these, flake8 becomes more like the programming buddy that cargo is for Rust. Below is a list that I recommend (loosely in order of importance):

    All of the following find legitimate issues in the existing code base:

    We will skip the flake8 lints below for now. They have minimal impact.

    The following should be nice to enforce, but we are currently compliant:

    The following I am not sure about, but might be useful:

    • ~flake8-use-fstring - Force the use of f-strings for formatted strings. Should be nice, but seems to have some false positives for our code base.~
    • ~flake8-annotations - Helps set certain requirements for type hints. Not sure how well this complements mypy.~
    • ~flake8-eradicate - Helps identify and remove commented out code. The idea is nice, but we have some code commented out on purpose, so there might be false positives.~

    mypy

    I would like to set strict = True for mypy in order to improve reliability and quality of our type hints. This currently produces 1157 errors in 38 files. The strict flag is a combination of multiple strictness-related flags. I recommend we enable these one-by-one and fix the related errors.

    • [x] warn-unused-configs
    • [x] disallow-any-generics
    • [x] disallow-subclassing-any
    • [x] disallow-untyped-calls
    • [x] disallow-untyped-defs
    • [x] disallow-incomplete-defs
    • [x] check-untyped-defs
    • [x] disallow-untyped-decorators
    • [x] no-implicit-optional
    • [x] warn-redundant-casts
    • [x] warn-unused-ignores
    • [x] warn-return-any
    • [x] no-implicit-reexport
    • [x] strict-equality
    • [x] strict-concatenate

    Other helpful CLI tools

    These can be added as additional commands in the CI pipeline.

    • [x] pyupgrade - Makes sure that we're using the latest language features.
    • ~pycln - Automatically detect and remove unused imports. Has functionality for identifying side effects (like the pyarrow imports).~ Not worth it for the small benefit it brings. flake8 will catch unused imports; fix them manually.
    • ~yesqa - Functions like mypy's warn-unused-ignores. It makes sure all the # noqa comments are actually necessary.~ Not worth incorporating in the CI right now.
    feature python 
    opened by stinodego 31
  • shift_and_fill by groups + other operations by grouping variables

    I'm new to Polars and would like to understand how to run certain methods by group variables.

    Examples include lags and rolling statistics with varying window sizes: lag1, lag2, ..., lagN and ma1, ma2, ..., maN.

    Using Python:

    import polars as pl
    
    # Create data (data from apply section of docs)
    data = pl.DataFrame(
        {
            "A": [1, 2, 3, 4, 5],
            "fruits": ["banana", "banana", "apple", "apple", "banana"],
            "B": [5, 4, 3, 2, 1],
            "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        }
    )
    
    # Define variables
    ShiftColName = "B"
    N = 1
    ImputeValue = -1
    GroupVariable = "fruits"
    
    # shift_and_fill: no groups
    data.hstack(data[[ShiftColName]].shift_and_fill(N, fill_value = ImputeValue).rename({ShiftColName : 'Lag_' + str(N) + '_' + ShiftColName}))
    
    # shift_and_fill: by fruits?
    ?
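
To pin down what a shift "by fruits" is expected to return, here is a plain-Python sketch of the semantics. It does not use the Polars API, and the helper name `shift_and_fill_by_group` is hypothetical; in Polars itself, window expressions (`Expr.over`) are the usual tool for per-group computations like this one.

```python
from collections import defaultdict


def shift_and_fill_by_group(values, groups, periods=1, fill_value=None):
    """Lag `values` by `periods` rows within each group (positive periods only)."""
    seen = defaultdict(list)  # group key -> values seen so far, in row order
    out = []
    for value, key in zip(values, groups):
        history = seen[key]
        # Take the value `periods` rows back within the same group, if it exists.
        out.append(history[-periods] if len(history) >= periods else fill_value)
        history.append(value)
    return out


B = [5, 4, 3, 2, 1]
fruits = ["banana", "banana", "apple", "apple", "banana"]
print(shift_and_fill_by_group(B, fruits, periods=1, fill_value=-1))
# [-1, 5, -1, 3, 4]
```

The first row of each fruit receives the fill value, and every other row receives the previous value of B for the same fruit.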
    
    opened by AdrianAntico 26
  • write_parquet function in polars-u64-idx does not support large data frames

    What language are you using?

    Python

    What version of polars are you using?

    0.13.21

    What operating system are you using polars on?

    Ubuntu 20.04.1 LTS

    What language version are you using

    Python 3.8.5

    Describe your bug.

    I'm using the 64 bit version of Polars. However, the write_parquet function does not seem to support large data frames.

    What are the steps to reproduce the behavior?

    import polars as pl
    
    df = pl.select(pl.repeat(0, n=2**32).alias('col_1'))
    df.write_parquet('tmp.parquet')
    

    What is the actual behavior?

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/dist-packages/polars/internals/frame.py", line 1400, in write_parquet
        self._df.to_parquet(file, compression, statistics)
    exceptions.ArrowErrorException: ExternalFormat("underlying snap error: snappy: input buffer (size = 34359738368) is larger than allowed (size = 4294967295)")
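
The two sizes in the error message can be checked with simple arithmetic. The sketch assumes 8 bytes per value (a 64-bit integer column), which is an assumption, although it matches the reported buffer size exactly.

```python
n_rows = 2**32
bytes_per_value = 8  # assumption: 64-bit integers

buffer_size = n_rows * bytes_per_value
snappy_limit = 2**32 - 1  # the "allowed" size from the error message

print(buffer_size)                      # 34359738368
print(snappy_limit)                     # 4294967295
print(buffer_size > snappy_limit)       # True
print(snappy_limit // bytes_per_value)  # 536870911
```

Under that assumption, a single compressed buffer could hold at most 536870911 such values, far fewer than the 2**32 rows in the frame.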
    
    bug 
    opened by jnthnhss 25
  • Add optional lexical ordering of dtype `Categorical`

    Are you using Python or Rust?

    Python

    What version of polars are you using?

    0.12.20

    What operating system are you using polars on?

    macOS Monterey

    Describe your bug.

    DataFrame.sort() on a Categorical column behaves incorrectly.

    What are the steps to reproduce the behavior?

    import polars as pl
    
    df = pl.DataFrame(
        [
            pl.Series("col1", [1, 1], dtype=pl.UInt8),
            pl.Series("col2", ["foo", "bar"], dtype=pl.Categorical),
            pl.Series("col3", [3.3, 1.1], dtype=pl.Float64),
        ]
    )
    df.sort(['col1', 'col2'])
    

    What is the actual behavior?

    shape: (2, 3)
    ┌──────┬──────┬──────┐
    │ col1 ┆ col2 ┆ col3 │
    │ ---  ┆ ---  ┆ ---  │
    │ u8   ┆ cat  ┆ f64  │
    ╞══════╪══════╪══════╡
    │ 1    ┆ foo  ┆ 3.3  │
    ├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
    │ 1    ┆ bar  ┆ 1.1  │
    └──────┴──────┴──────┘
    

    What is the expected behavior?

    shape: (2, 3)
    ┌──────┬──────┬──────┐
    │ col1 ┆ col2 ┆ col3 │
    │ ---  ┆ ---  ┆ ---  │
    │ u8   ┆ cat  ┆ f64  │
    ╞══════╪══════╪══════╡
    │ 1    ┆ bar  ┆ 1.1  │
    ├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
    │ 1    ┆ foo  ┆ 3.3  │
    └──────┴──────┴──────┘
    

    I think polars should sort correctly on categorical type, like the following:

    df.sort('col2')
    shape: (2, 3)
    ┌──────┬──────┬──────┐
    │ col1 ┆ col2 ┆ col3 │
    │ ---  ┆ ---  ┆ ---  │
    │ u8   ┆ cat  ┆ f64  │
    ╞══════╪══════╪══════╡
    │ 1    ┆ bar  ┆ 1.1  │
    ├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
    │ 1    ┆ foo  ┆ 3.3  │
    └──────┴──────┴──────┘
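
One plausible reading of the difference, consistent with the issue title, is that the multi-column sort orders categoricals by their integer codes rather than by their string values. The sketch below uses plain Python tuples, not Polars, and assumes codes are assigned in order of first appearance; both the encoding and the helper names are illustrative assumptions.

```python
rows = [(1, "foo", 3.3), (1, "bar", 1.1)]

# Assumed physical encoding: integer codes in order of first appearance.
codes = {}
for _, label, _ in rows:
    codes.setdefault(label, len(codes))

# Sorting on (col1, code) keeps "foo" first because its code is 0.
physical = sorted(rows, key=lambda r: (r[0], codes[r[1]]))

# Sorting on (col1, string value) puts "bar" first.
lexical = sorted(rows, key=lambda r: (r[0], r[1]))

print([r[1] for r in physical])
print([r[1] for r in lexical])
```

An optional lexical ordering, as the title proposes, would correspond to the second sort key.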
    
    feature 
    opened by mutecamel 25
  •  fix(python): fix delta issues

    fix(python): fix delta issues

    Aims to fix the following issues:

    • https://github.com/pola-rs/polars/issues/5790
    • https://github.com/pola-rs/polars/issues/5785

    Checklist

    • [x] Fix imports issue
    • [x] Resolve paths before passing to delta for relative path support
    • [x] read_delta should rely on delta side implementation
    • [x] scan_delta should work as expected without relying on delta implementation.
    • [x] create internal test matrix of [local, s3, azure, gcs] X [read, scan] X [absolute, relative paths, absolute with external fs, relative with external fs]
    • [x] additional cases for relative paths
    • [x] perform tests in a separate env
    • [x] added example to read and scan from azure
    • [x] added example to read and scan from gcs

    Notes

    1. Until https://github.com/delta-io/delta-rs/issues/1015 is resolved, I have added a simple delta-specific path resolution in scan_delta. Later we can remove this in favor of the implementation provided in delta-rs itself.
    2. read_delta now relies on the implementation provided in delta-rs itself.
    3. For GCS, Azure and S3 there is no relative path support, full URI must be provided.
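
The path handling described in the notes can be sketched roughly as follows. This is not the code from the PR: `resolve_table_uri` is a hypothetical helper name, and the rule it applies (make local paths absolute, pass anything with a URI scheme through unchanged) is an assumption based on notes 1 and 3.

```python
import os
from urllib.parse import urlparse


def resolve_table_uri(table_uri: str) -> str:
    """Hypothetical helper: make local paths absolute, leave full URIs as-is."""
    scheme = urlparse(table_uri).scheme
    # A one-letter "scheme" is treated as a Windows drive letter (e.g. C:\data),
    # so only longer schemes such as s3, gs or abfs count as remote URIs.
    if len(scheme) > 1:
        return table_uri
    return os.path.abspath(os.path.expanduser(table_uri))


print(resolve_table_uri("s3://bucket/table"))   # passed through unchanged
print(resolve_table_uri("./data/table"))        # absolute path under the cwd
```

Remote stores get no relative-path support in this sketch, which matches note 3: a full URI must be provided.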

    Testing

    1. For LocalFS, GCS, Azure and S3, I've created an internal test matrix as described in the checklist, which contains over 35 additional tests beyond the unit tests. I'll check how to mock the different pyarrow.fs implementations and then add these to the unit tests later.
    2. This test matrix was also executed in a separate venv, after installing polars locally with the fixes from a full local build (maturin build), because the import errors were not caught by the unit tests before.

    Signed-off-by: chitralverma [email protected]

    python fix 
    opened by chitralverma 23
  • Optimize python imports

    Optimize python imports

    Problem Description

    Our optional dependencies influence our import time. Pandas import is ~500ms!!

    Import times:

    only polars installed:
    0.268s
    polars + pandas installed
    0.755s
    

    We should explore if we can do this lazily.
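
A minimal sketch of what "lazily" could mean here, using only the standard library. `LazyModule` is a hypothetical name, and `fractions` merely stands in for a heavy optional dependency such as pandas; this is not how Polars ended up implementing it.

```python
import importlib


class LazyModule:
    """Proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module = None

    def __getattr__(self, attr: str):
        # Only reached for attributes not found on the proxy itself,
        # so _name and _module never trigger an import.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# "fractions" stands in for an expensive optional dependency.
fractions = LazyModule("fractions")
loaded_before = fractions._module is not None   # nothing imported yet
half = fractions.Fraction(1, 2)                 # first use triggers the import
loaded_after = fractions._module is not None
print(loaded_before, loaded_after, half)
```

The import cost is then paid by the first call that needs the dependency rather than by every `import polars`.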


    performance 
    opened by ritchie46 22
  • No example of SQLContext use in documentation

    No example of SQLContext use in documentation

    Polars version checks

    • [X] I have checked that this issue has not already been reported.

    • [X] I have confirmed this bug exists on the latest version of Polars.

    Issue description

    There is no example on how to use SQLContext. Luckily I found one in the PR which added it.

    https://pola-rs.github.io/polars/py-polars/html/reference/sql.html

    In particular, I think it would be useful to show how it can replicate pandas.DataFrame.query when using Polars expressions is difficult (for example, with JSON-serialized filter conditions):

    # pandas
    df.query("level > 2")
    
    # polars
    ctx = pl.SQLContext()
    ctx.register("df", df.lazy())
    ctx.query("SELECT * FROM df WHERE level > 2")
    

    Is SQLContext reasonably well supported, or is it at risk of being removed in the future?

    Reproducible example

    NA
    

    Expected behavior

    NA

    Installed versions

    Replace this line with the output of pl.show_versions(), leave the backticks in place
    
    bug python 
    opened by 2-5 0
  • Performance degradation caused by presence of optional numpy dependency

    Performance degradation caused by presence of optional numpy dependency

    First of all, thanks for making polars available! It turned out to be a blazing fast alternative to pandas when doing pre-processing for real-time ML inference. True game changer!

    We run an internal benchmark to flag possible performance regressions as part of testing new polars releases. Starting with 0.15.9, I noticed a minor performance regression that affects DataFrame initialization from data: dict[str, tuple[float | str | None, ...]]. After some digging, I believe this is caused by #5918 which dramatically increases calls to _NUMPY_TYPE().

    I was surprised by the performance impact of _NUMPY_TYPE():

    • without numpy being installed, the impact is minimal. Great!
    • with numpy being merely installed, the impact grows by a factor of 27, regardless of whether numpy is actually used. Not so great!

    There is a trivial hack that speeds up DataFrame initialization and our benchmark by ~10%, well offsetting the initially discovered regression:

    import polars.dependencies
    polars.dependencies._NUMPY_AVAILABLE = False

    Would you consider supporting a corresponding fast path for users that are also working with polars and numpy, just not in combination?

    A few potential approaches come to mind:

    1. Refactor polars.dependencies._NUMPY_AVAILABLE and make it available as part of the API - for users to overwrite manually when required. Something like polars.NO_NUMPY = True. -> Still feels hacky
    2. Make polars.dependencies._NUMPY_AVAILABLE user configurable, e.g. by supporting optional environment variables or config files that may be used to trigger an override of polars.dependencies._NUMPY_AVAILABLE. -> I would be happy to use an environment variable, such as PY_POLARS_NO_NUMPY=1
    3. Understand and accept performance penalty of polars.dependencies._NUMPY_AVAILABLE and polars.dependencies._NUMPY_TYPE() and potentially reduce reliance on those throughout the code-base. -> high effort

    BTW: The same issue applies to similar constants, such as polars.dependencies._PANDAS_AVAILABLE, which might affect other use cases, too.

    opened by jakob-keller 0
  • feat(python): Improve iterating over `GroupBy`

    feat(python): Improve iterating over `GroupBy`

    Changes:

    • Removed the GroupBy._select and GroupBy._select_all deprecated private methods. These are unnecessary as we have GroupBy.agg which can offer the same functionality. This means that the GBSelection class can also be removed.
    • Removed the GroupBy._groups private method. This method is unnecessary as we can get the same functionality by using GroupBy.agg with the agg_groups expression. And the benefit of using agg is that it respects the maintain_order parameter.
    • Implemented the GroupBy.__next__ method to complement __iter__. This makes the GroupBy object an iterator, which means you can do things like call list(df.groupby('a')) to get a list of dataframes (one for each group).

    This is not breaking because these methods were private and undocumented. The fact that it is now an iterator instead of just an iterable should not break anyone's workflows, I believe.
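
To illustrate the iterable/iterator distinction, here is a toy class in plain Python. `ToyGroupBy` is hypothetical and groups tuples rather than DataFrames; resetting the position in `__iter__` is a design assumption of this sketch, not a statement about the PR.

```python
class ToyGroupBy:
    """Toy stand-in: groups rows by one field and iterates over the groups."""

    def __init__(self, rows, key_index):
        self._groups = {}
        for row in rows:
            # dicts preserve insertion order, so groups keep first-seen order.
            self._groups.setdefault(row[key_index], []).append(row)
        self._keys = list(self._groups)
        self._pos = 0

    def __iter__(self):
        # Restart from the first group so the object can be looped over again.
        self._pos = 0
        return self

    def __next__(self):
        if self._pos >= len(self._keys):
            raise StopIteration
        key = self._keys[self._pos]
        self._pos += 1
        return self._groups[key]


rows = [("a", 1), ("b", 2), ("a", 3)]
gb = ToyGroupBy(rows, key_index=0)
groups = list(gb)          # one list of rows per group
first = next(iter(gb))     # next() works because the object is its own iterator
print(groups, first)
```

Because `__iter__` returns the object itself, two loops over the same object share one position, which is the main behavioural difference from a plain iterable.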

    This also addresses the last bit of (currently) deprecated behaviour in the code base!

    python enhancement 
    opened by stinodego 0
  • fix(python): Fix typing for `DataFrame.select`

    fix(python): Fix typing for `DataFrame.select`

    Fixes #6026

    Turns out Iterable is an acceptable type, rather than Sequence. There are probably a lot of other functions that can be fixed this way (anything that utilizes selection_to_pyexpr_list), but I'll leave it at this for now.
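
The practical difference between the two annotations shows up with generators, which are iterables but not sequences. `select_names` below is a toy stand-in, not the real `DataFrame.select`.

```python
from typing import Iterable, List


def select_names(columns: Iterable[str]) -> List[str]:
    """Toy stand-in for a select-like function: consume the names exactly once."""
    return list(columns)


from_list = select_names(["a", "b"])
# A generator has no len() or indexing, so a type checker would reject it
# for a Sequence[str] parameter, while Iterable[str] accepts it.
from_generator = select_names(name for name in ("a", "b") if name != "b")
print(from_list, from_generator)
```

Annotating with `Iterable` is only safe when the function iterates the argument a single time, as this sketch does.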

    python fix 
    opened by stinodego 0
Releases(py-0.15.11)
  • py-0.15.11(Jan 3, 2023)

    🚀 Performance improvements

    • ensure set_at_idx is O(1) (#5977)

    ✨ Enhancements

    • allow eq,ne,lt etc (#5995)
    • Improve Expr.is_between API (#5981)
    • large speedup for df.iterrows (~200-400%) (#5979)
    • updated default table format from "UTF8_FULL" to "UTF8_FULL_CONDENSED" (#5967)
    • Access rows as namedtuples (#5966)
    • Improve assert_frame_equal messages (#5962)

    🐞 Bug fixes

    • make weekday tz-aware (#5989)
    • fix categorical in struct anyvalue issue (#5987)
    • fix invalid boolean simplification (#5976)
    • allow empty sort on any dtype (#5975)
    • properly deal with categoricals in streaming queries (#5974)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @ritchie46 and @stinodego

    Source code(tar.gz)
    Source code(zip)
  • py-0.15.9(Dec 31, 2022)

    🚀 Performance improvements

    • improve reducing window function performance ~33% (#5878)

    ✨ Enhancements

    • str.strip with multiple chars (#5929)
    • add iterrows (#5945)
    • read decimal as f64 (#5938)
    • improve query plan scan formatting (#5937)
    • allow all null cast (#5933)
    • allow objects in struct types (#5925)
    • handle Series init from python sequence of numpy arrays (#5918)
    • merge sorted dataframes (#5817)
    • impl hex and base64 for binary (#5892)
    • Add datatype hierarchy (#5901)
    • Add .item() on DataFrame and Series (#5893)
    • make get_any_value fallible (#5877)
    • Add string representation for data types (#5861)
    • directly push all operator result into sink, prev… (#5856)

    🐞 Bug fixes

    • don't panic on ignored context (#5958)
    • don't allow named expression in arr.eval (#5957)
    • error on invalid dtype (#5956)
    • fix panic in join expressions (#5954)
    • block ordered predicates before explode (#5951)
    • adhere to schema in arr.eval of empty list (#5947)
    • fix from_dict schema_inference=0 (#5948)
    • fix arrow nested null conversion (#5946)
    • allow None in arr.slice length (#5934)
    • fix time to duration cast (#5932)
    • error on addition with datetime/time (#5931)
    • don't create categoricals in streaming (#5926)
    • object filter should keep single chunk (#5913)
    • csv, read escaped "" as missing (#5912)
    • fix pivot of signed integers (#5909)
    • don't allow duplicate columns in read_csv arg (#5908)
    • fix latest oob in streaming conversion (#5902)
    • adapt k to len in topk (#5888)
    • fix lazy swapping rename (#5884)
    • fix window function with nullable values; regression due… (#5874)
    • improve equality consistency between types (#5873)
    • evaluate whole branch expression to determine if r… (#5864)
    • fix top_k on empty (#5865)
    • fix slice in streaming (#5854)
    • Fix type hint for IO *_options arguments (#5852)

    🛠️ Other improvements

    • Fix docs for sink_parquet (#5952)
    • Fix misspelling in LazyFrame docstring (#5917)
    • add bin, series.is_sorted and merge_sorted (#5914)

    Thank you to all our contributors for making this release possible! @AnatolyBuga, @alexander-beedie, @cannero, @chitralverma, @dannyvankooten, @johngunerli, @ozgrakkurt, @ritchie46, @stinodego, @winding-lines and @zundertj

    Source code(tar.gz)
    Source code(zip)
  • rs-0.26.0(Dec 22, 2022)

    ⚠️ Breaking changes

    • remove Series::append_array (#5681)
    • iso weekday (#5598)

    🚀 Performance improvements

    • improve reducing window function performance ~33% (#5878)
    • improve performance reducing window functions with numeric output ~-14% (#5841)
    • set_sorted flag when creating from literal (#5728)
    • use sorted fast path in streaming groupby (#5727)
    • ensure fast_explode propagates (#5676)
    • fix quadratic time complexity of groupby in stream… (#5614)
    • Aggregate projection pushdown (#5556)
    • improve streaming primitive groupby (#5575)
    • vectorize integer vec-hash by using very simple, … (#5572)
    • specialized utf8 groupby in streaming (#5535)

    ✨ Enhancements

    • make get_any_value fallible (#5877)
    • directly push all operator result into sink, prev… (#5856)
    • add sink_parquet (#5480)
    • Support parsing more float string representations. (#5824)
    • implement mean aggregation for duration (#5807)
    • implement sensible boolean aggregates (#5806)
    • allow expression as quantile input (#5751)
    • accept expression in str.extract_all (#5742)
    • tz-aware strptime (#5736)
    • Add "fmt_no_tty" feature for formatting support without r… (#5725)
    • lazy diagonal concat. (#5647)
    • to_struct add upper_bound (#5714)
    • inversely scale chunk_size with thread count in s… (#5699)
    • add streaming minmax (#5693)
    • improve dynamic inference of anyvalues and structs (#5690)
    • support is_in for boolean dtype (#5682)
    • add a cache to strptime (#5628)
    • add nearest interpolation strategy (#5626)
    • make cast recursive (#5596)
    • add arg_min/arg_max for series of dtype boolean (#5592)
    • prefer streaming groupby if partitionable (#5580)
    • make map_alias fallible (#5532)
    • pl.min & pl.max accept wildcard similar to pl.sum (#5511)
    • add predicate pushdown to anonymous_scan (#5467)
    • make streaming work with multiple sinks in a sing… (#5474)
    • add streaming slice operation (#5466)
    • run partial streaming queries (#5464)
    • streaming left joins (#5456)
    • file statistics so we only (try to) keep smallest table in memory (#5454)
    • streaming inner joins. (#5400)
    • build_info() provides detailed information how polars was built (#5423)
    • add missing width property to LazyFrame (#5431)
    • allow regex and wildcard in groupby (#5425)
    • Streaming joins architecture and Cross join implementation. (#5339)
    • add support for am/pm notation in parse_dates read_csv (#5373)
    • add reduce/cumreduce expression as an easier fold (#5364)

    🐞 Bug fixes

    • fix lazy swapping rename (#5884)
    • improve equality consistency between types (#5873)
    • evaluate whole branch expression to determine if r… (#5864)
    • fix top_k on empty (#5865)
    • fix slice in streaming (#5854)
    • correct invalid type in struct anyvalue access (#5844)
    • don't set fast_explode if null values in list (#5838)
    • duration formatting (#5837)
    • respect fetch in union (#5836)
    • keep f32 dtype in fill_null by int (#5834)
    • err on epoch on time dtype (#5831)
    • fix panic in hmean (#5808)
    • asof join by logical groups (#5805)
    • fix parquet regression upstream in arrow2 (#5797)
    • Fix lazy cumsum and cumprod result types (#5792)
    • fix nested writer (#5777)
    • fix(rust, python) Summation on empty series evaluates to Some(0) (#5773)
    • empty concat utf8 (#5768)
    • projection pushdown with union and asof join (#5763)
    • check null values in asof_join + groupby (#5756)
    • fix generic streaming groupby on logical types (#5752)
    • fix date_range on expressions (#5750)
    • fix dtypes in join_asof_by (#5746)
    • fix group order in binary aggregation (#5744)
    • implement min/max aggregation for utf8 in groupby (#5737)
    • fix all_null/sorted into_groups panic (#5733)
    • asof join 'by', 'forward' combination (#5720)
    • fix pivot on floating point indexes (#5704)
    • fix arange with column/literal input (#5703)
    • fix double projection that leads to uneven union d… (#5700)
    • Fix a bug in floating regex handling used in CSV type inference (#5695)
    • fix asof join schema (#5686)
    • fix owned arithmetic schema (#5685)
    • take glob into account in scan_csv 'with_schema_mo… (#5683)
    • fix boolean schema in agg_max/min (#5678)
    • fix boolean arg-max if all equal (#5680)
    • early error on duplicate names in streaming groupby (#5638)
    • fix streaming groupby aggregate types (#5636)
    • convert panic to err in concat_list (#5637)
    • fix dot diagram of single nodes (#5624)
    • fix dynamic struct inference (#5619)
    • keep dtype when eval on empty list (#5597)
    • fix ternary with list output on empty frame (#5595)
    • fix tz-awareness of truncate (#5591)
    • check chunks before doing chunked_id join optimiza… (#5589)
    • invert cast_time_zone conversion (#5587)
    • asof join ensure join column is not dropped when '… (#5585)
    • fix ub due to invalid dtype on splitting dfs (#5579)
    • fix(rust, python); fix projection pushdown in asof joins (#5542)
    • streaming hstack allow duplicates (#5538)
    • fix streaming empty join panic (#5534)
    • fix duplicate caches in cse and prevent quadratic … (#5528)
    • allow appending categoricals that are all null (#5526)
    • tz-aware strftime (#5525)
    • make 'truncate' tz-aware (#5522)
    • fix coalesce expression expansion (#5521)
    • fix nested aggregation in when then and window expr… (#5520)
    • fix sort_by expression if groups already aggregated (#5518)
    • fix bug in batched parquet reader that dropped dfs… (#5506)
    • fix bugs in skew and kurtosis (#5484)
    • compute correct offset for streaming join on multi… (#5479)
    • return error on invalid sortby expression (#5478)
    • add missing AnyValueBuffer specialisation for Duration dtype (#5436)
    • fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
    • properly handle json with unclosed strings (#5427)
    • fix null poisoning in rank operation (#5417)
    • correct expr::diff dtype for temporal columns (#5416)
    • fix cse for nested caches (#5412)
    • don't set sorted flag in argsort (#5410)
    • explicit nan comparison in min/max agg (#5403)
    • Correct CSV row indexing (#5385)

    🛠️ Other improvements

    • Update rustc and fix clippy (#5880)
    • update arrow (#5862)
    • move join dispatch to polars-ops (#5809)
    • Remove dbg statement from union (#5791)
    • Continue removing compilation warnings (#5778)
    • shrink anyvalue size (#5770)
    • update arrow (#5766)
    • chore(rust,python) Change allow_streaming to streaming (#5747)
    • remove rev-map from ChunkedArray (#5721)
    • simplify fast projection by schema (#5716)
    • Reindent df! docs code (#5698)
    • remove Series::append_array (#5681)
    • Remove unused symbols and unneeded mut qualifier (#5672)
    • Include license files in Rust crates (#5675)
    • Use NaiveTime::from_hms_opt instead of NaiveTime::from_hms (#5664)
    • use xxhash3 for string types (#5617)
    • iso weekday (#5598)
    • Improve contributing guide (#5558)
    • streaming improvements (#5541)
    • Refer to DataFrame::unique instead of distinct (#5482)
    • don't panic if part of query cannot run strea… (#5458)
    • make generic join builder more dry (#5439)
    • use IdHash for streaming groupby generic (#5435)
    • fix freeze/stall when writing more than 2^31 string values to parquet (#5366)

    Thank you to all our contributors for making this release possible! @AnatolyBuga, @CalOmnie, @Kuhlwein, @MarcoGorelli, @OneRaynyDay, @YuRiTan, @alexander-beedie, @andrewpollack, @ankane, @braaannigan, @chitralverma, @dannyvankooten, @ghais, @ghuls, @jjerphan, @matteosantama, @messense, @owrior, @pickfire, @ritchie46, @s1ck, @sa-, @slonik-az, @sorhawell, @stinodego, @universalmind303 and @zundertj

    Source code(tar.gz)
    Source code(zip)
  • py-0.15.7(Dec 19, 2022)

    🚀 Performance improvements

    • improve performance reducing window functions with numeric output ~-14% (#5841)

    ✨ Enhancements

    • allow more pyarrow literals (#5842)
    • add sink_parquet (#5480)
    • release GIL when writing (#5830)
    • Support parsing more float string representations. (#5824)
    • implement mean aggregation for duration (#5807)
    • implement sensible boolean aggregates (#5806)

    🐞 Bug fixes

    • correct invalid type in struct anyvalue access (#5844)
    • don't set fast_explode if null values in list (#5838)
    • duration formatting (#5837)
    • respect fetch in union (#5836)
    • keep f32 dtype in fill_null by int (#5834)
    • fix(python): fix delta issues (#5802)
    • err on epoch on time dtype (#5831)
    • fix panic in hmean (#5808)
    • asof join by logical groups (#5805)

    🛠️ Other improvements

    • lazily import connectorx (#5835)

    Thank you to all our contributors for making this release possible! @chitralverma, @ghuls and @ritchie46

    Source code(tar.gz)
    Source code(zip)
  • py-0.15.6(Dec 14, 2022)

    🐞 Bug fixes

    • fix struct dataset (#5798)
    • fix parquet regression upstream in arrow2 (#5797)

    🛠️ Other improvements

    • remove unused cmake-rs patch (#5794)

    Thank you to all our contributors for making this release possible! @OneRaynyDay, @messense, @ritchie46 and @universalmind303

    Source code(tar.gz)
    Source code(zip)
  • py-0.15.3(Dec 12, 2022)

    🚀 Performance improvements

    • set_sorted flag when creating from literal (#5728)
    • use sorted fast path in streaming groupby (#5727)

    ✨ Enhancements

    • push down predicates to pyarrow datasets (#5780)
    • Support for reading delta lake tables (#5761)
    • Add DataFrame.glimpse() (#5622)
    • allow expression as quantile input (#5751)
    • accept expression in str.extract_all (#5742)
    • tz-aware strptime (#5736)
    • lazy diagonal concat. (#5647)
    • to_struct add upper_bound (#5714)

    🐞 Bug fixes

    • fix(rust, python) Summation on empty series evaluates to Some(0) (#5773)
    • empty concat utf8 (#5768)
    • projection pushdown with union and asof join (#5763)
    • check null values in asof_join + groupby (#5756)
    • fix generic streaming groupby on logical types (#5752)
    • fix date_range on expressions (#5750)
    • fix dtypes in join_asof_by (#5746)
    • fix group order in binary aggregation (#5744)
    • implement min/max aggregation for utf8 in groupby (#5737)
    • fix all_null/sorted into_groups panic (#5733)
    • address several edge-cases found when asserting NaN equality (#5732)
    • asof join 'by', 'forward' combination (#5720)

    🛠️ Other improvements

    • add DataFrame.pearson_corr to reference (#5772)
    • Parse fixed timezone offsets without pytz (#5769)
    • chore(rust,python) Change allow_streaming to streaming (#5747)
    • Remove pyarrow nightlies requirement. (#5719)
    • fix incorrect accepted type in df.write_csv (#5715)

    Thank you to all our contributors for making this release possible! @AnatolyBuga, @MarcoGorelli, @alexander-beedie, @andrewpollack, @braaannigan, @chitralverma, @ghuls, @ritchie46, @sa- and @zundertj

    Source code(tar.gz)
    Source code(zip)
  • py-0.15.2(Dec 2, 2022)

    🚀 Performance improvements

    • ensure fast_explode propagates (#5676)

    ✨ Enhancements

    • Series.get_chunks (#5701)
    • inversely scale chunk_size with thread count in s… (#5699)
    • add streaming minmax (#5693)
    • Support large page sizes on aarch64 linux builds (#5694)
    • improve dynamic inference of anyvalues and structs (#5690)
    • support is_in for boolean dtype (#5682)
    • add notebook html repr for Series (#5653)

    🐞 Bug fixes

    • fix pivot on floating point indexes (#5704)
    • fix arange with column/literal input (#5703)
    • fix double projection that leads to uneven union d… (#5700)
    • Fix Series -> Expr dispatch for @property methods (#5689)
    • fix asof join schema (#5686)
    • fix owned arithmetic schema (#5685)
    • take glob into account in scan_csv 'with_schema_mo… (#5683)
    • fix boolean schema in agg_max/min (#5678)
    • fix boolean arg-max if all equal (#5680)
    • respect python objects read method even if filename is f… (#5677)
    • Fix DataFrame.n_chunks return type (#5650)

    🛠️ Other improvements

    • Parametrize test_parquet_datetime (#5696)
    • Function and lazy function docstrings (#5657)
    • Fix formatting (#5658)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @ankane, @braaannigan, @ghais, @ghuls, @jjerphan, @pickfire, @ritchie46, @stinodego and @zundertj

    Source code(tar.gz)
    Source code(zip)
  • py-0.15.1(Nov 26, 2022)

    ⚠️ Breaking changes

    • Update Expr.sample signature and change random seeding (#4648)
    • rollup breaking changes (#5602)
    • iso weekday (#5598)
    • Change null_equal default to True for Series.series_equal (#5051)

    🚀 Performance improvements

    • fix quadratic time complexity of groupby in stream… (#5614)
    • Improve performance of indexing operations on Series. (#5610)
    • Aggregate projection pushdown (#5556)

    ✨ Enhancements

    • add a cache to strptime (#5628)
    • add nearest interpolation strategy (#5626)
    • Update Expr.sample signature and change random seeding (#4648)
    • Change null_equal default to True for Series.series_equal (#5051)
    • make cast recursive (#5596)
    • add arg_min/arg_max for series of dtype boolean (#5592)

    🐞 Bug fixes

    • early error on duplicate names in streaming groupby (#5638)
    • fix streaming groupby aggregate types (#5636)
    • convert panic to err in concat_list (#5637)
    • fix dot diagram of single nodes (#5624)
    • fix dynamic struct inference (#5619)
    • tz-aware filtering (#5603)
    • keep dtype when eval on empty list (#5597)
    • fix ternary with list output on empty frame (#5595)
    • fix tz-awareness of truncate (#5591)
    • check chunks before doing chunked_id join optimiza… (#5589)
    • invert cast_time_zone conversion (#5587)
    • asof join ensure join column is not dropped when '… (#5585)

    🛠️ Other improvements

    • Remaining docstring examples for frame and lazyframe (#5630)
    • use xxhash3 for string types (#5617)
    • only trigger build.rs file if that file itself has cha… (#5618)
    • iso weekday (#5598)
    • Merge release workflows (#5564)
    • Fix broken lint workflow (#5584)

    Thank you to all our contributors for making this release possible! @Kuhlwein, @braaannigan, @ghuls, @matteosantama, @ritchie46 and @stinodego

    Source code(tar.gz)
    Source code(zip)
  • py-0.14.31(Nov 22, 2022)

    🚀 Performance improvements

    • improve streaming primitive groupby (#5575)
    • vectorize integer vec-hash by using very simple, … (#5572)

    ✨ Enhancements

    • prefer streaming groupby if partitionable (#5580)

    🐞 Bug fixes

    • fix ub due to invalid dtype on splitting dfs (#5579)

    🛠️ Other improvements

    • Remove old Python changelog file (#5577)
    • namespace registration docs update (#5565)
    • Improve contributing guide (#5558)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @ghuls, @ritchie46 and @stinodego

    Source code(tar.gz)
    Source code(zip)
  • py-0.14.29(Nov 19, 2022)

    🚀 Performance improvements

    • specialized utf8 groupby in streaming (#5535)

    ✨ Enhancements

    • add dataframe.pearson_corr (#5533)
    • support namespace registration (#5531)
    • make map_alias fallible (#5532)
    • pl.min & pl.max accept wildcard similar to pl.sum (#5511)
    • additional support for using timedelta with duration-type arguments (#5487)

    🐞 Bug fixes

    • fix(rust, python); fix projection pushdown in asof joins (#5542)
    • streaming hstack allow duplicates (#5538)
    • fix streaming empty join panic (#5534)
    • fix duplicate caches in cse and prevent quadratic … (#5528)
    • allow appending categoricals that are all null (#5526)
    • tz-aware strftime (#5525)
    • make 'truncate' tz-aware (#5522)
    • fix coalesce expression expansion (#5521)
    • fix nested aggregation in when then and window expr… (#5520)
    • fix sort_by expression if groups already aggregated (#5518)
    • fix bug in batched parquet reader that dropped dfs… (#5506)
    • preserve Series name when exporting to pandas (#5498)
    • Refactor is_between (#5491)
    • fix bugs in skew and kurtosis (#5484)

    🛠️ Other improvements

    • support tabbed panels in sphinx, add namespace docs (#5540)
    • Update dev dependencies (#5517)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @braaannigan, @ghuls, @ritchie46, @sorhawell and @zundertj

    Source code(tar.gz)
    Source code(zip)
  • py-0.14.27(Nov 11, 2022)

    ✨ Enhancements

    • additional autocomplete affordances for IPython users (#5477)
    • make streaming work with multiple sinks in a sing… (#5474)
    • add streaming slice operation (#5466)
    • run partial streaming queries (#5464)
    • streaming left joins (#5456)
    • file statistics so we only (try to) keep smallest table in memory (#5454)
    • streaming inner joins. (#5400)

    🐞 Bug fixes

    • compute correct offset for streaming join on multi… (#5479)
    • return error on invalid sortby expression (#5478)
    • use json for expr pickle (#5476)
    • improved namespace/accessor behaviour (resolves VSCode autocomplete issue) (#5469)
    • further improved lazy loading (#5459)
    • fix for categorical inserts from row-oriented data (#5462)
    • use of fill_null with temporal literals (#5440)

    🛠️ Other improvements

    • don't panic if part of query cannot run strea… (#5458)
    • add build_info() to the API doc (#5442)
    • Improved structure for DataFrame and LazyFrame API docs, misc design improvements (#5433)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @dannyvankooten, @ritchie46, @s1ck, @slonik-az, @stinodego and @universalmind303

    Source code(tar.gz)
    Source code(zip)
  • py-0.14.26(Nov 6, 2022)

    ✨ Enhancements

    • build_info() provides detailed information how polars was built (#5423)
    • add missing width property to LazyFrame (#5431)
    • enhanced Series.dot method and related interop (#5428)
    • allow regex and wildcard in groupby (#5425)
    • support DataFrame init from generators (#5424)
    • support Series init from generator (#5411)

    🐞 Bug fixes

    • fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
    • properly handle json with unclosed strings (#5427)
    • fix null poisoning in rank operation (#5417)
    • correct expr::diff dtype for temporal columns (#5416)
    • fix cse for nested caches (#5412)
    • don't set sorted flag in argsort (#5410)

    🛠️ Other improvements

    • Fix dependencies on memory allocator (#5426)
    • Better docstring for keep_name (#5378) (#5421)

    Thank you to all our contributors for making this release possible! @CalOmnie, @alexander-beedie, @ghuls, @ritchie46, @slonik-az, @stinodego and @universalmind303

    Source code(tar.gz)
    Source code(zip)
  • py-0.14.25(Nov 2, 2022)

    ✨ Enhancements

    • 30x speedup initialising Series from python range object (#5397)
    • r-associative support for commutative DataFrame operators (#5394)
    • pl.from_epoch function (#5330)
    • Streaming joins architecture and Cross join implementation. (#5339)
    • enable frame init from sequence of pandas series, and improve lazy typechecks (handle subclasses) (#5383)
    • add support for am/pm notation in parse_dates read_csv (#5373)
    • add reduce/cumreduce expression as an easier fold (#5364)

    🐞 Bug fixes

    • explicit nan comparison in min/max agg (#5403)
    • lazy proxy module does not require global registration (#5390)
    • Correct CSV row indexing (#5385)

    🛠️ Other improvements

    • Docstrings for frame, lazyframe and time series (#5398)
    • add integrated support for copying API examples, and auto-parallelise docs build (#5393)
    • improve rendering of API docs type signatures, mark PivotOps as deprecated, misc tidy-ups (#5388)
    • Expression docstrings (#5377)
    • minor navbar improvements; adds discord and twitter links, fixes github icon (#5379)
    • improve structure of sphinx-generated API docs (#5376)
    • Add with_time_zone to reference guide (#5369)

    Thank you to all our contributors for making this release possible! @YuRiTan, @alexander-beedie, @braaannigan, @owrior, @ritchie46 and @zundertj

    Source code(tar.gz)
    Source code(zip)
  • rs-0.25.0(Oct 28, 2022)

    The most notable change in this release is the start of out-of-core support in Polars, meaning we can process datasets that are larger than RAM. This is currently supported for the parts of a query that read from CSV or Parquet, and is limited to select, filter, and groupby operations. Many more operations will follow in upcoming releases.

    See https://github.com/pola-rs/polars/pull/5139#issuecomment-1274687634 where we were able to process an 80GB dataset on a laptop with only 16GB RAM.
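
    To make the idea concrete, here is a minimal standard-library sketch of why a filter plus groupby can run on data larger than RAM: rows are read in bounded batches and only the per-group partial results are kept. It is not Polars code and does not reflect how the Polars streaming engine is implemented; the function name `stream_groupby_sum`, its parameters, and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def stream_groupby_sum(lines, key, value, predicate, chunk_size=2):
    """Filter rows and sum `value` per `key`, reading `chunk_size` rows at a time."""
    reader = csv.DictReader(lines)
    totals = defaultdict(float)  # only the partial aggregates stay in memory
    while True:
        chunk = list(islice(reader, chunk_size))  # one bounded batch of rows
        if not chunk:
            break
        for row in chunk:
            if predicate(row):  # filter step
                totals[row[key]] += float(row[value])  # groupby + sum step
    return dict(totals)


# Hypothetical sample data; a real input would be a large file on disk.
data = io.StringIO("city,sales\nA,1\nB,2\nA,3\nB,-4\nA,5\n")
result = stream_groupby_sum(data, "city", "sales", lambda r: float(r["sales"]) > 0)
print(result)  # {'A': 9.0, 'B': 2.0}
```

    Memory use depends on `chunk_size` and the number of distinct groups rather than on the size of the input, which is the property that makes out-of-core processing possible for these operations.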

    Thanks to everyone who contributed to another release! 🙌

    ⚠️ Breaking changes

    • rename expand_at_index -> new_from_index (#5259)

    🚀 Performance improvements

    • lower contention in out of core filter (#5311)
    • improve pivot performance by using faster series… (#5172)
    • improve streaming performance (~15%) (#5170)
    • don't block projection pushdown on unnest (#5123)
    • more conservative JIT sort settings (#5080)
    • sort and unsort join key if other side is sorted (#5069)
    • do not rechunk left joins (#5066)
    • Prune unneeded projections (#5032)
    • Improve predicate pushdown + with_columns (#5029)
    • Don't execute unused with_column expressions (#5026)

    ✨ Enhancements

    • shrink_type expression (#5351)
    • tz_localize expression (#5340)
    • accept expr in arr.get (#5337)
    • Implement forward strategy in groupby join_asof (#5335)
    • improve dynamic inference of struct types (#5297)
    • Add newline to Aggregate..FROM describe_optimization_plan (#5253)
    • date_range expression (#5267)
    • show expression where error originated if raised … (#5263)
    • improve error msg if window expressions length do… (#5262)
    • Add round for date and datetime (#5153)
    • new n_chars functionality for utf8 strings (#5252)
    • added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)
    • make date_range timezone aware (#5234)
    • Rust functions for typed JsonPath implementation (#5140)
    • allow polars Config options to be serialised/shared, and more easily unset (#5219)
    • batched csv reader (#5212)
    • accept expressions in arr.slice (#5191)
    • is_sorted aggregation fast path for Utf8Chunked (#5184)
    • hybrid streaming query engine (#5139)
    • add binary dtype (#5122)
    • improve function expansion (#5110)
    • add struct arithmetics (#5107)
    • add cumfold/cumsum expression (#5103)
    • error on invalid asof join inputs (#5100)
    • small plan and profile chart improvements (#5067)
    • Initial implementation of histogram algorithm (#4752)

    🐞 Bug fixes

    • unnest only pushdown column if there are projections (#5360)
    • block is_null predicate in asof join (#5358)
    • ensure that no-projection is seen as select all in… (#5356)
    • resolve duplicated column names in pivot (#5349)
    • fix serde of expression (pickle) (#5333)
    • don't set auto-explode in apply_multiple (#5265)
    • export anonymousscan in lazy prelude (#5295)
    • fix explicit list + sort aggregation in groupby co… (#5317)
    • fix sort-merge dispatch of utf8 (#5315)
    • properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)
    • don't block non matching groups in binary expression (#5273)
    • fix logical type of nested take (#5271)
    • tag IntoSeries trait as unsafe (#5258)
    • include single null value in global cat builder (#5254)
    • include slice in sort fast path (#5247)
    • determine supertype of datetimes with timezones an… (#5240)
    • fix groupby dynamic truncate for > days resolution (#5235)
    • set timezone on groupby_dynamic boundaries (#5233)
    • fix incorrect duration dtype (#5226)
    • set string cache if lazy schema contains categorical (#5225)
    • fix pipeline dtypes (#5224)
    • fix asof_join schema (#5213)
    • fix single thread loop if schema length is off by 1 (#5210)
    • improve numeric stability of rolling_variance (#5207)
    • fix overflow in partitioned groupby mean of int32/… (#5204)
    • don't allow categorical append that is not under s… (#5195)
    • include offset in arr.get (#5193)
    • fix rolling_float in case closure returns None (#5180)
    • Implement missing extract conversion for Time datatype (#5161)
    • implement missing conversion to python time object (#5152)
    • microsecond noise on date >> time cast (add 00:00:00 fast-path) (#5149)
    • wrong operator mapped for LtEq (#5120)
    • unique include null (#5112)
    • don't recurse assign unions as it SO > 5k files (#5098)
    • block projection pushdown on unnest (#5093)
    • projection_node always do projection locally if no… (#5090)
    • fix iso_year for Date dtype (#5074)
    • fix bug in unneeded projection pruning (#5071)
    • Improve printing controls of DataFrame and Series (#5047)
    • Double projections should be checked on input schema (#5058)
    • Apply flat overlapping row groups when possible (#5039)
    • Ensure all predicates use same key function when inserting… (#5034)
    • Only consider dt series equal if they have the same tz (#5025)
    • Special-case ewm_mean(alpha=1) (#5019)
    • Time zone conversion bug (NY -> UTC works, UTC -> NY doesn't) (#5014)
    • Fix timezone cast (#5016)

    🛠️ Other improvements

    • update to rustc to nightly-2022-10-24 (#5312)
    • update ahash and add nightly features of hashbrown (#5310)
    • Update comfy-table and memchr. (#5276)
    • rename expand_at_index -> new_from_index (#5259)
    • ensure streaming groupby take slice into account (#5178)
    • move polars-sql under polars folder (#5176)
    • remove aggregate pushdown optimization (#5173)
    • relax sync requirement on Executor trait impls (#5142)
    • Get rid of unnecessary check in SplitLines iterator (#5141)
    • Constant instead of literal (#5088)
    • Use release-drafter to draft releases with changelogs (#5033)
    • Fix docs by activating docfg feature (#5028)
    • Split up polars-lazy crate. (#5020)

    Thank you to all our contributors for making this release possible! @AlecZorab, @YuRiTan, @alexander-beedie, @cjermain, @dannyvankooten, @dpatton-gr, @egorchakov, @ghuls, @hpux735, @matteosantama, @mcrumiller, @owrior, @ritchie46, @slonik-az, @sorhawell, @stinodego, @thatlittleboy, @universalmind303 and @zundertj

  • py-0.14.24(Oct 28, 2022)

    ✨ Enhancements

    • shrink_type expression (#5351)
    • don't raise error but print a warning if mp fork method… (#5342)
    • tz_localize expression (#5340)
    • accept expr in arr.get (#5337)
    • Implement forward strategy in groupby join_asof (#5335)

    🐞 Bug fixes

    • unnest only pushdown column if there are projections (#5360)
    • block is_null predicate in asof join (#5358)
    • ensure that no-projection is seen as select all in… (#5356)
    • resolve duplicated column names in pivot (#5349)
    • remove unused branch in getitem (#5348)
    • nested dicts / list generation (#5336)
    • fix serde of expression (pickle) (#5333)
    • handle old-style module loaders such that we can still lazy load them (#5331)
    • explicit output type in apply (#5328)

    🛠️ Other improvements

    • remove multiprocessing check, and leave it to the user (#5347)
    • Update dev, lint and docs dependencies (#5338)
    • lazy module proxy (obviate attribute access guards for missing modules) (#5320)

    Thank you to all our contributors for making this release possible! @AlecZorab, @alexander-beedie, @ghuls and @ritchie46

  • py-0.14.23(Oct 25, 2022)

    🐞 Bug fixes

    • fix explicit list + sort aggregation in groupby co… (#5317)
    • fix sort-merge dispatch of utf8 (#5315)
    • close multi-threading pool in df creation (#5309)
    • fix and check all uninstalled imports in ci (#5304)

    🛠️ Other improvements

    • Add "import polars.testing" to testing docstrings (#5316) (#5318)
    • streamline lazy imports (#5302)
    • Catch deprecation warnings in unit tests (#5306)
    • fix and check all uninstalled imports in ci (#5304)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @ghuls, @ritchie46, @thatlittleboy, @universalmind303 and @zundertj

  • py-0.14.22(Oct 22, 2022)

    🚀 Performance improvements

    • Make all expensive imports lazy - ~85% (#5287)
    • remove pandas imports (#5286)
    • never import hypothesis in user code (#5282)

    ✨ Enhancements

    • expose to_struct to series list namespace (#5298)
    • improve dynamic inference of struct types (#5297)
    • don't panic in failing apply (#5294)
    • improve error message in struct apply (#5291)
    • accept schema in read_dicts (#5290)
    • Do not import polars.testing by default (#5284)
    • Pass more options to pyarrow in write_parquet (#5278) (#5280)
    • date_range expression (#5267)
    • allow implicit None branch in when then otherwise (#5264)
    • show expression where error originated if raised … (#5263)
    • improve error msg if window expressions length do… (#5262)
    • pl.ones, pl.zeros and Series.new_from_index functions (#5260)
    • Add round for date and datetime (#5153)
    • new n_chars functionality for utf8 strings (#5252)
    • added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)

    🐞 Bug fixes

    • throw error on invalid lazy concat strategy (#5292)
    • fix to_pandas edge case (#5293)
    • properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)
    • respect schema overwrite in from rows (#5275)
    • don't block non matching groups in binary expression (#5273)
    • fix logical type of nested take (#5271)
    • Check if BatchedCsvReader.next_batches() is None befor… (#5256)
    • include single null value in global cat builder (#5254)
    • Check multiprocessing start_method on import (#3144) (#5237)

    🛠️ Other improvements

    • Add ModuleType for import functions in import_check.py (#5289)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @ghuls, @owrior and @ritchie46

  • py-0.14.21(Oct 18, 2022)

    🐞 Bug fixes

    • include slice in sort fast path (#5247)
    • don't use zoneinfo globally (#5246)

    Thank you to all our contributors for making this release possible! @ritchie46

  • py-0.14.20(Oct 18, 2022)

    ✨ Enhancements

    • make date_range timezone aware (#5234)
    • infer timezone and improve display (#5232)
    • allow Config to be used as a context manager, and update some docs (#5223)
    • allow polars Config options to be serialised/shared, and more easily unset (#5219)

    🐞 Bug fixes

    • determine supertype of datetimes with timezones an… (#5240)
    • fix groupby dynamic truncate for > days resolution (#5235)
    • ensure that polars_type_to_constructor works with tz-aware Datetime dtypes (#5239)
    • set timezone on groupby_dynamic boundaries (#5233)
    • accept tuple[bool, bool] instead of Sequence[bool] for Expr.is_between (#5094)
    • fix incorrect duration dtype (#5226)
    • set string cache if lazy schema contains categorical (#5225)
    • fix pipeline dtypes (#5224)

    🛠️ Other improvements

    • update lazyframe lazygroupby apply docstring (#5238)
    • Consistent naming for Python release workflow (#5229)

    Thank you to all our contributors for making this release possible! @YuRiTan, @alexander-beedie, @cjermain, @matteosantama, @ritchie46 and @stinodego

  • py-0.14.19(Oct 15, 2022)

    🚀 Performance improvements

    • improve pivot performance by using faster series… (#5172)
    • improve streaming performance (~15%) (#5170)
    • don't block projection pushdown on unnest (#5123)

    ✨ Enhancements

    • batched csv reader (#5212)
    • accept expressions in arr.slice (#5191)
    • is_sorted aggregation fast path for Utf8Chunked (#5184)
    • support DataFrame init with Datetime dtypes that specify a timezone (#5174)
    • frame-level n_unique() that can count unique rows or col/expr subsets (#5165)
    • hybrid streaming query engine (#5139)
    • return Datetime/Duration with appropriate timeunit when inferring from pytype (#5127)
    • add binary dtype (#5122)

    🐞 Bug fixes

    • fix asof_join schema (#5213)
    • fix single thread loop if schema length is off by 1 (#5210)
    • improve numeric stability of rolling_variance (#5207)
    • fix apply function over object dtype (#5206)
    • fix overflow in partitioned groupby mean of int32/… (#5204)
    • don't allow categorical append that is not under s… (#5195)
    • include offset in arr.get (#5193)
    • DataFrame.fill_null include unsigned integers (#5192)
    • error on fill_nan on non float dtype (#5185)
    • infer missing columns in from_dicts (#5183)
    • fix rolling_float in case closure returns None (#5180)
    • Implement missing extract conversion for Time datatype (#5161)
    • implement missing conversion to python time object (#5152)
    • Rendering long docstring lines. (#5150)
    • add missing _NUMPY_AVAILABLE check in Series.__getitem__ (#5126)
    • wrong operator mapped for LtEq (#5120)

    🛠️ Other improvements

    • skip failing test until #5177 is resolved (#5205)
    • ensure streaming groupby take slice into account (#5178)
    • remove aggregate pushdown optimization (#5173)
    • Add support for ruff python linter. (#5151)
    • improve typing; many list types are better defined as Sequence (#5164)
    • Get rid of unnecessary check in SplitLines iterator (#5141)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @dannyvankooten, @ghuls, @ritchie46 and @sorhawell

  • py-0.14.18(Oct 5, 2022)

    🚀 Performance improvements

    • take advantage of sorted join for frame alignment (#5106)

    ✨ Enhancements

    • improve function expansion (#5110)
    • add struct arithmetics (#5107)
    • add cumfold/cumsum expression (#5103)
    • error on invalid asof join inputs (#5100)

    🐞 Bug fixes

    • unique include null (#5112)
    • don't recurse assign unions as it SO > 5k files (#5098)
    • block projection pushdown on unnest (#5093)
    • projection_node always do projection locally if no… (#5090)

    🛠️ Other improvements

    • deprecate name argument in drop (#5099)
    • improve py-polars/Makefile (#5089)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @owrior, @ritchie46 and @slonik-az

  • py-0.14.17(Oct 3, 2022)

    🚀 Performance improvements

    • more conservative JIT sort settings (#5080)

    Thank you to all our contributors for making this release possible! @mcrumiller, @ritchie46 and @zundertj

  • py-0.14.16(Oct 3, 2022)

    🚀 Performance improvements

    • sort and unsort join key if other side is sorted (#5069)
    • do not rechunk left joins (#5066)

    ✨ Enhancements

    • deprecate boolean mask for Series indexing (#5075)
    • small plan and profile chart improvements (#5067)
    • add gantt chart plot to LazyFrame::profile (#5063)
    • Support Series init as struct from @dataclass and annotated NamedTuple (#5057)

    🐞 Bug fixes

    • fix iso_year for Date dtype (#5074)
    • tz-aware get_idx (#5072)
    • Fix empty method detection when PYTHONOPTIMIZE=2 (#5043)
    • fix bug in unneeded projection pruning (#5071)
    • remove overloads for from_arrow (#5065)
    • Improve printing controls of DataFrame and Series (#5047)
    • Double projections should be checked on input schema (#5058)
    • Add missing cse param to LazyFrame "profile" method (#5054)

    🛠️ Other improvements

    • Default to zstd parquet compression (#5060)
    • Refactor show_graph (#5059)
    • Use release-drafter to draft releases with changelogs (#5033)
    • Update Makefile (#5056)
    • Parametric test coverage for EWM functions (#5011)

    Thank you to all our contributors for making this release possible! @alexander-beedie, @egorchakov, @matteosantama, @ritchie46, @slonik-az, @stinodego and @zundertj

  • py-polars-v0.14.15(Oct 1, 2022)

  • rs-0.24.3(Oct 1, 2022)

  • rust-polars-v0.24.0(Sep 18, 2022)

    New rust polars release! 🚀

    This is the release of rust polars 0.24.0. This release comes with a lot of bug fixes, performance improvements and added functionality. The changes that stand out are larger-than-RAM memory mapping of IPC files and a new common-subplan optimization that prunes duplicated sub-plans from the query plan and thereby potentially saves a lot of duplicated work.
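
    The saving from recognising duplicated sub-plans can be illustrated with a toy executor. This is a sketch under simplifying assumptions, not the Polars optimizer: plans are hypothetical nested tuples of the form `(op, *inputs)`, and duplicates are detected by memoising results at execution time rather than by rewriting the plan.

```python
def run(plan, log, cache=None):
    """Toy executor for plans written as nested tuples: (op, *inputs)."""
    if not isinstance(plan, tuple):
        return str(plan)  # leaf, e.g. a file name or a predicate string
    if cache is not None and plan in cache:
        return cache[plan]  # structurally identical sub-plan already computed
    op, *args = plan
    inputs = [run(arg, log, cache) for arg in args]
    log.append(op)  # record that this node did real work
    out = f"{op}({', '.join(inputs)})"
    if cache is not None:
        cache[plan] = out
    return out


def branch():
    # Hypothetical sub-plan that appears on both sides of a self-join.
    return ("filter", ("scan", "data.csv"), "x > 1")


plan = ("join", branch(), branch())

naive_log, dedup_log = [], []
naive_out = run(plan, naive_log)            # executes every node
dedup_out = run(plan, dedup_log, cache={})  # executes each distinct sub-plan once

print(naive_log)  # ['scan', 'filter', 'scan', 'filter', 'join']
print(dedup_log)  # ['scan', 'filter', 'join']
```

    Both runs produce the same result, but the second one scans and filters the file only once.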


    Update to arrow2 0.14.0

    See the 0.14.0 release for all upstream improvements.

    New Contributors

    • @ydarma made their first contribution in https://github.com/pola-rs/polars/pull/4269
    • @gaoxinge made their first contribution in https://github.com/pola-rs/polars/pull/4300
    • @SimonSchneider made their first contribution in https://github.com/pola-rs/polars/pull/4436
    • @lorenzwalthert made their first contribution in https://github.com/pola-rs/polars/pull/4445
    • @neeldug made their first contribution in https://github.com/pola-rs/polars/pull/4384
    • @isaacthefallenapple made their first contribution in https://github.com/pola-rs/polars/pull/4522
    • @Chuxiaof made their first contribution in https://github.com/pola-rs/polars/pull/4524
    • @luk-f-a made their first contribution in https://github.com/pola-rs/polars/pull/4565
    • @OneRaynyDay made their first contribution in https://github.com/pola-rs/polars/pull/4621
    • @abalkin made their first contribution in https://github.com/pola-rs/polars/pull/4650
    • @tikkanz made their first contribution in https://github.com/pola-rs/polars/pull/4676
    • @hpux735 made their first contribution in https://github.com/pola-rs/polars/pull/4693
    • @huang12zheng made their first contribution in https://github.com/pola-rs/polars/pull/4823
    • @owrior made their first contribution in https://github.com/pola-rs/polars/pull/4840
    • @jly36963 made their first contribution in https://github.com/pola-rs/polars/pull/4886

    Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.23.0...rust-polars-v0.24.0

  • rust-polars-v0.23.0(Aug 4, 2022)

    What's Changed

    • respect ipc column ordering by @ritchie46 in https://github.com/pola-rs/polars/pull/3591
    • zfill expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3593
    • Patch release by @ritchie46 in https://github.com/pola-rs/polars/pull/3595
    • Fix TOML typos by @ryanrussell in https://github.com/pola-rs/polars/pull/3598
    • Anonymous scan lazyframe by @universalmind303 in https://github.com/pola-rs/polars/pull/3561
    • ljust and rjust expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3603
    • cast string to categorical in 'is_in' by @ritchie46 in https://github.com/pola-rs/polars/pull/3606
    • python data type units by @ritchie46 in https://github.com/pola-rs/polars/pull/3609
    • unset sorted metadata on append by @ritchie46 in https://github.com/pola-rs/polars/pull/3610
    • feat(nodejs): scan json by @universalmind303 in https://github.com/pola-rs/polars/pull/3611
    • Expand regex function input by @ritchie46 in https://github.com/pola-rs/polars/pull/3613
    • node 0.5.3 release by @universalmind303 in https://github.com/pola-rs/polars/pull/3612
    • improve when then otherwise for lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3614
    • python polars 0.13.44 by @ritchie46 in https://github.com/pola-rs/polars/pull/3615
    • Fix mode for multiple modes by @GregoryBL in https://github.com/pola-rs/polars/pull/3566
    • fix empty list edge case by @ritchie46 in https://github.com/pola-rs/polars/pull/3621
    • fix invalid concat dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3622
    • respect n_rows by @ritchie46 in https://github.com/pola-rs/polars/pull/3624
    • Python: scan_ipc/parquet can scan from fsspec sources e.g. s3. by @ritchie46 in https://github.com/pola-rs/polars/pull/3626
    • Fix Series init (as pl.Object dtype) from mixed-type input and extend test coverage by @alexander-beedie in https://github.com/pola-rs/polars/pull/3627
    • restrict parallel branches in lazy Union by @ritchie46 in https://github.com/pola-rs/polars/pull/3628
    • native exp expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3629
    • python dict parallel dataframe creation by @ritchie46 in https://github.com/pola-rs/polars/pull/3630
    • Enhanced column typedef/inference support for DataFrame init by @alexander-beedie in https://github.com/pola-rs/polars/pull/3633
    • fix row count file projection pushdown by @ritchie46 in https://github.com/pola-rs/polars/pull/3635
    • fix list concat by @ritchie46 in https://github.com/pola-rs/polars/pull/3636
    • rust publish makefile by @ritchie46 in https://github.com/pola-rs/polars/pull/3637
    • improve explode of empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3638
    • Improve numpy ufunc support. fixes: #3228 by @ghuls in https://github.com/pola-rs/polars/pull/3583
    • Update various python build requirements. by @ghuls in https://github.com/pola-rs/polars/pull/3641
    • is_in for struct dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3639
    • Update black and change some code so it sees it as a call chain. by @ghuls in https://github.com/pola-rs/polars/pull/3645
    • concat list determine supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/3649
    • update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3650
    • Parallel csv writer by @ritchie46 in https://github.com/pola-rs/polars/pull/3652
    • fix groups state in complex aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/3656
    • Rust Comment Readability Fixes by @ryanrussell in https://github.com/pola-rs/polars/pull/3662
    • Add Expr.reverse() Python API example by @cnpryer in https://github.com/pola-rs/polars/pull/3660
    • Added StringCache Python API example by @cnpryer in https://github.com/pola-rs/polars/pull/3659
    • improve dtype selection by @ritchie46 in https://github.com/pola-rs/polars/pull/3664
    • accept regex in filter by @ritchie46 in https://github.com/pola-rs/polars/pull/3666
    • python: improve html render by @ritchie46 in https://github.com/pola-rs/polars/pull/3667
    • Python: infer_schema_len arg to from_dicts by @ritchie46 in https://github.com/pola-rs/polars/pull/3669
    • Add LICENSE link to py-polars by @gyscos in https://github.com/pola-rs/polars/pull/3674
    • python: fix and test globbing by @ritchie46 in https://github.com/pola-rs/polars/pull/3675
    • python polars 0.13.45 by @ritchie46 in https://github.com/pola-rs/polars/pull/3676
    • Add useful example for pl.StringCache(). by @ghuls in https://github.com/pola-rs/polars/pull/3677
    • Fix StringCache docstring typo by @cnpryer in https://github.com/pola-rs/polars/pull/3678
    • Fix polars.Expr.apply() Python API docs text by @cnpryer in https://github.com/pola-rs/polars/pull/3661
    • Anonymous scan enhancements & cleanup by @universalmind303 in https://github.com/pola-rs/polars/pull/3657
    • add pyarrow install to quickstart setup by @ritchie46 in https://github.com/pola-rs/polars/pull/3682
    • fix oob in sorted groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3681
    • fix branch supertypes by @ritchie46 in https://github.com/pola-rs/polars/pull/3683
    • fix cargo.toml for docs.rs by @ritchie46 in https://github.com/pola-rs/polars/pull/3684
    • python polars 0.13.46 by @ritchie46 in https://github.com/pola-rs/polars/pull/3686
    • ndjson reader complex types support by @universalmind303 in https://github.com/pola-rs/polars/pull/3665
    • fix groupby aggregation on empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3688
    • Nodejs groupbyrolling by @universalmind303 in https://github.com/pola-rs/polars/pull/3670
    • Add pl.Expr.hash Python example by @cnpryer in https://github.com/pola-rs/polars/pull/3679
    • Adding 'line-height' at 95% to df _html.py print by @LVG77 in https://github.com/pola-rs/polars/pull/3691
    • unique counts for logical types by @ritchie46 in https://github.com/pola-rs/polars/pull/3694
    • Update arrow and prepare for mutable arithmetics by @ritchie46 in https://github.com/pola-rs/polars/pull/3695
    • Improve lit agg by @ritchie46 in https://github.com/pola-rs/polars/pull/3702
    • panic on invalid groupby rolling input by @ritchie46 in https://github.com/pola-rs/polars/pull/3703
    • docs: Readability improvements in py-polars by @ryanrussell in https://github.com/pola-rs/polars/pull/3700
    • docs: polars-lazy readability improvements by @ryanrussell in https://github.com/pola-rs/polars/pull/3701
    • Python: parallel concat df by @gunjunlee in https://github.com/pola-rs/polars/pull/3671
    • fix ipc column order by @ritchie46 in https://github.com/pola-rs/polars/pull/3706
    • nodejs release by @universalmind303 in https://github.com/pola-rs/polars/pull/3698
    • add coc by @ritchie46 in https://github.com/pola-rs/polars/pull/3712
    • inplace arithmetic by @ritchie46 in https://github.com/pola-rs/polars/pull/3709
    • format empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3719
    • Add typing overloads for DataFrame.hstack() by @adamgreg in https://github.com/pola-rs/polars/pull/3697
    • Add Series to DataFrame.with_columns() argument annotation by @adamgreg in https://github.com/pola-rs/polars/pull/3696
    • fix rolling groupby ordering with 'by' argument by @ritchie46 in https://github.com/pola-rs/polars/pull/3720
    • allow literal as aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/3722
    • Improve performance of categorical casting by @ritchie46 in https://github.com/pola-rs/polars/pull/3724
    • Add flag to allow str.contains to search for string literals (#3711) by @alexander-beedie in https://github.com/pola-rs/polars/pull/3718
    • fix join negative keys by @ritchie46 in https://github.com/pola-rs/polars/pull/3730
    • fix arr.get() offsets by @ritchie46 in https://github.com/pola-rs/polars/pull/3731
    • update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3732
    • fix from_pandas object null array by @ritchie46 in https://github.com/pola-rs/polars/pull/3733
    • python polars 0.13.47 by @ritchie46 in https://github.com/pola-rs/polars/pull/3734
    • Replace OOB slice indexing with spare_capacity_mut by @saethlin in https://github.com/pola-rs/polars/pull/3737
    • pow fast paths by @ritchie46 in https://github.com/pola-rs/polars/pull/3738
    • Simplify contains check that opts-in to contains_literal fast-path by @alexander-beedie in https://github.com/pola-rs/polars/pull/3736
    • fix arithmetic bug introduced in #3709 by @ritchie46 in https://github.com/pola-rs/polars/pull/3741
    • check nan in sort by single column by @ritchie46 in https://github.com/pola-rs/polars/pull/3742
    • python fix concat by @ritchie46 in https://github.com/pola-rs/polars/pull/3743
    • patch python polars 0.13.48 by @ritchie46 in https://github.com/pola-rs/polars/pull/3744
    • ternary literal predicates by @ritchie46 in https://github.com/pola-rs/polars/pull/3747
    • python polars 0.13.49 by @ritchie46 in https://github.com/pola-rs/polars/pull/3748
    • unset sorted on take by @ritchie46 in https://github.com/pola-rs/polars/pull/3756
    • reexport polars for extension libraries by @universalmind303 in https://github.com/pola-rs/polars/pull/3760
    • add global pl by @universalmind303 in https://github.com/pola-rs/polars/pull/3763
    • arg_where expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3757
    • update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3762
    • python lhs power and broadcast by @ritchie46 in https://github.com/pola-rs/polars/pull/3768
    • allow regex expansion in binary/ternary expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3769
    • str.ends_with/ str.starts_with by @ritchie46 in https://github.com/pola-rs/polars/pull/3770
    • fix bug in agg projections and init tpch schema tests by @ritchie46 in https://github.com/pola-rs/polars/pull/3771
    • always include offset in groupby_dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/3779
    • Cache file reads (tpch 2/7) ~5% faster by @ritchie46 in https://github.com/pola-rs/polars/pull/3774
    • python fix arr.contains type by @ritchie46 in https://github.com/pola-rs/polars/pull/3782
    • improve predicate combination and schema state by @ritchie46 in https://github.com/pola-rs/polars/pull/3788
    • fix duration computation by @ritchie46 in https://github.com/pola-rs/polars/pull/3790
    • Update arrow2 to support IPC Stream Reading with projections by @joshuataylor in https://github.com/pola-rs/polars/pull/3793
    • Some API alignment (missing funcs) between DataFrame, LazyFrame, and Series by @alexander-beedie in https://github.com/pola-rs/polars/pull/3791
    • Docs: sort entries within subsections by @alexander-beedie in https://github.com/pola-rs/polars/pull/3794
    • csv don't skip delimiter in whitespace trimming by @ritchie46 in https://github.com/pola-rs/polars/pull/3796
    • don't copy the sorted flag on many operations by @ritchie46 in https://github.com/pola-rs/polars/pull/3795
    • csv don't skip trailing delimiters when inferring schema. by @ghuls in https://github.com/pola-rs/polars/pull/3799
    • Allow date_range to produce date ranges as well as datetime by @alexander-beedie in https://github.com/pola-rs/polars/pull/3798
    • quarter expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3797
    • Update rustc to 2022-06-22 by @ritchie46 in https://github.com/pola-rs/polars/pull/3801
    • Fix Node installation instructions by @Smittyvb in https://github.com/pola-rs/polars/pull/3804
    • python polars 0.13.50 by @ritchie46 in https://github.com/pola-rs/polars/pull/3802
    • rolling groupby fix index column output order by @ritchie46 in https://github.com/pola-rs/polars/pull/3806
    • Add support for IPC Streaming Read/Write by @joshuataylor in https://github.com/pola-rs/polars/pull/3783
    • chore: chunked_array readability improvements by @ryanrussell in https://github.com/pola-rs/polars/pull/3810
    • Add serde feature to field to fix serde feature by @joshuataylor in https://github.com/pola-rs/polars/pull/3808
    • fix join asof on floats by @ritchie46 in https://github.com/pola-rs/polars/pull/3812
    • chore: /polars/polars-core/src/frame/ readability by @ryanrussell in https://github.com/pola-rs/polars/pull/3813
    • Fixing small typos in docs by @thatlittleboy in https://github.com/pola-rs/polars/pull/3811
    • fix join asof tolerance by @ritchie46 in https://github.com/pola-rs/polars/pull/3816
    • docs: use quotes in pip install instruction by @thatlittleboy in https://github.com/pola-rs/polars/pull/3820
    • Improve parquet reading performance ~35-40% by @ritchie46 in https://github.com/pola-rs/polars/pull/3821
    • from anyvalue for small integers by @ritchie46 in https://github.com/pola-rs/polars/pull/3826
    • add date offset by @ritchie46 in https://github.com/pola-rs/polars/pull/3827
    • fix sorted unique by @ritchie46 in https://github.com/pola-rs/polars/pull/3837
    • fix ternary groupby agg_list/not_aggregated combination by @ritchie46 in https://github.com/pola-rs/polars/pull/3835
    • don't parallelize upsample by @ritchie46 in https://github.com/pola-rs/polars/pull/3836
    • python fix time divide by zero by @ritchie46 in https://github.com/pola-rs/polars/pull/3838
    • Improve map/apply docstrings by @braaannigan in https://github.com/pola-rs/polars/pull/3750
    • don't cache in-expression window functions by @ritchie46 in https://github.com/pola-rs/polars/pull/3840
    • Hypothesis testing framework integrations for Polars by @alexander-beedie in https://github.com/pola-rs/polars/pull/3842
    • docs: Improve expr.string documentation by @thatlittleboy in https://github.com/pola-rs/polars/pull/3841
    • make hypothesis optional and don't fail if not installed by @ritchie46 in https://github.com/pola-rs/polars/pull/3849
    • update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3848
    • python: fix time conversion by @ritchie46 in https://github.com/pola-rs/polars/pull/3851
    • Make frame/series asserts more resilient against integer overflow by @alexander-beedie in https://github.com/pola-rs/polars/pull/3850
    • parquet: allow writing smaller row groups by @ritchie46 in https://github.com/pola-rs/polars/pull/3852
    • python polars 0.13.51 by @ritchie46 in https://github.com/pola-rs/polars/pull/3854
    • allow branching null with struct dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3856
    • Address distinction between DataType and DataType() by @alexander-beedie in https://github.com/pola-rs/polars/pull/3857
    • Deprecate df/ldf argument to .join by @thomasaarholt in https://github.com/pola-rs/polars/pull/3855
    • null_probability functionality for dataframes/series test strategies. by @alexander-beedie in https://github.com/pola-rs/polars/pull/3860
    • Modern style type hints by @stinodego in https://github.com/pola-rs/polars/pull/3863
    • Concise empty class syntax by @stinodego in https://github.com/pola-rs/polars/pull/3864
    • fix groups after take expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3881
    • fix predicate pushdown in union + count expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3882
    • add join/union branch in window cache keys by @ritchie46 in https://github.com/pola-rs/polars/pull/3884
    • Fast/cheap empty clone ops by @alexander-beedie in https://github.com/pola-rs/polars/pull/3883
    • parquet read: fix remaining_rows counter by @ritchie46 in https://github.com/pola-rs/polars/pull/3887
    • Parquet writing: reduce heap allocs by @ritchie46 in https://github.com/pola-rs/polars/pull/3879
    • Negative-indexing support for additional functions, and frame-level take_every by @alexander-beedie in https://github.com/pola-rs/polars/pull/3888
    • Make numpy an optional requirement by @stinodego in https://github.com/pola-rs/polars/pull/3861
    • Address deprecation warnings while running pytest by @stinodego in https://github.com/pola-rs/polars/pull/3889
    • Fix reading of gzipped CSV files. Fixes: #3895 by @ghuls in https://github.com/pola-rs/polars/pull/3896
    • Relocate hypothesis unit tests to parallel tests_parametric dir by @alexander-beedie in https://github.com/pola-rs/polars/pull/3899
    • Assign dtypes to expected columns when dtypes is a list and column se… by @ghuls in https://github.com/pola-rs/polars/pull/3901
    • docs: fix link to series method in DataFrame by @duskmoon314 in https://github.com/pola-rs/polars/pull/3897
    • docs: Improve py-polars docs by @thatlittleboy in https://github.com/pola-rs/polars/pull/3873
    • Complete pythonic slice support (inc. negative indexing/stride) for DataFrame and Series by @alexander-beedie in https://github.com/pola-rs/polars/pull/3904
    • Update docstring outputs by @ghuls in https://github.com/pola-rs/polars/pull/3912
    • Make embedded CSV test strings easier to read. by @ghuls in https://github.com/pola-rs/polars/pull/3907
    • Quiet an unnecessary warning (tests), and minor optimisation for slices with negative stride by @alexander-beedie in https://github.com/pola-rs/polars/pull/3913
    • fix dataframe explode with empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3916
    • Implement pow/rpow for Series by @stinodego in https://github.com/pola-rs/polars/pull/3908
    • Fix Series __setitem__ and take by @stinodego in https://github.com/pola-rs/polars/pull/3910
    • fix negative offset in groupby_rolling by @ritchie46 in https://github.com/pola-rs/polars/pull/3918
    • make string formatting configurable by @ritchie46 in https://github.com/pola-rs/polars/pull/3919
    • Expr docstrings by @braaannigan in https://github.com/pola-rs/polars/pull/3871
    • parquet: parallelize over row groups ~3x by @ritchie46 in https://github.com/pola-rs/polars/pull/3924
    • Don't unwrap IPC Stream, instead use ? to not panic by @joshuataylor in https://github.com/pola-rs/polars/pull/3927
    • Corrected .select type hint to Sequence[str, Expr] by @thomasaarholt in https://github.com/pola-rs/polars/pull/3931
    • add impl from anyvalue for literal by @savente93 in https://github.com/pola-rs/polars/pull/3921
    • update arrow: ipc limit and reduce categorical-> dictionary bound checks by @ritchie46 in https://github.com/pola-rs/polars/pull/3926
    • fix window expression case by @ritchie46 in https://github.com/pola-rs/polars/pull/3937
    • fix oob panic on expand_at_index and series from pyarrow chunkedarray by @ritchie46 in https://github.com/pola-rs/polars/pull/3938
    • block equality/ordering based predicates on null producing joins by @ritchie46 in https://github.com/pola-rs/polars/pull/3939
    • Extended with_columns to allow **kwargs style named expressions by @alexander-beedie in https://github.com/pola-rs/polars/pull/3917
    • upcast float16 to float32 by @ritchie46 in https://github.com/pola-rs/polars/pull/3940
    • python: fix already mutable borrowed append by @ritchie46 in https://github.com/pola-rs/polars/pull/3943
    • Fixed assert_frame_equal and assert_series_equal for NaN values by @alexander-beedie in https://github.com/pola-rs/polars/pull/3941
    • Add from_numpy constructor by @stinodego in https://github.com/pola-rs/polars/pull/3944
    • Fix Pandas date_range warnings in tests by @zundertj in https://github.com/pola-rs/polars/pull/3945
    • fix ipc ordering by @ritchie46 in https://github.com/pola-rs/polars/pull/3947
    • Remove "import polars as pl" from docstrings by @zundertj in https://github.com/pola-rs/polars/pull/3948
    • [docs] improve python polars documentation by @thatlittleboy in https://github.com/pola-rs/polars/pull/3954
    • Modern style type hints for the test suite by @stinodego in https://github.com/pola-rs/polars/pull/3949
    • Fixed most See Also docstring formatting, quietened the last warnings coming from doctests by @alexander-beedie in https://github.com/pola-rs/polars/pull/3932
    • python: loosen truncate sorted restriction in docstring by @ritchie46 in https://github.com/pola-rs/polars/pull/3956
    • groupby apply: use inner type to infer dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3955
    • python polars 0.13.52 by @ritchie46 in https://github.com/pola-rs/polars/pull/3957
    • Fix pytest warning by @stinodego in https://github.com/pola-rs/polars/pull/3962
    • Update README.md by @cxtruong70 in https://github.com/pola-rs/polars/pull/3959
    • implicit datelike string comparison warning by @ritchie46 in https://github.com/pola-rs/polars/pull/3967
    • fix count union predicate by @ritchie46 in https://github.com/pola-rs/polars/pull/3969
    • docs: conventions, mwe and docstring fixes by @thatlittleboy in https://github.com/pola-rs/polars/pull/3973
    • Pythonic slice support for LazyFrame (efficient computation paths only) by @alexander-beedie in https://github.com/pola-rs/polars/pull/3970
    • add from_numpy to docs by @thatlittleboy in https://github.com/pola-rs/polars/pull/3976
    • use bitflags crate by @ritchie46 in https://github.com/pola-rs/polars/pull/3978
    • fix accidentally slow cross join by @ritchie46 in https://github.com/pola-rs/polars/pull/3980
    • ensure main lazyframe gets file cache opt state by @ritchie46 in https://github.com/pola-rs/polars/pull/3981
    • chore(tests): small readability fixes by @ryanrussell in https://github.com/pola-rs/polars/pull/3989
    • Remove unnecessary imports by @zundertj in https://github.com/pola-rs/polars/pull/3988
    • Add support for loading a collection of parquet files by @andrei-ionescu in https://github.com/pola-rs/polars/pull/3894
    • improve from dictionary -> categorical by @ritchie46 in https://github.com/pola-rs/polars/pull/3996
    • fix col aggregation schema and ternary on empty series by @ritchie46 in https://github.com/pola-rs/polars/pull/3995
    • release memory on 0% selectivity by @ritchie46 in https://github.com/pola-rs/polars/pull/4000
    • col(dtypes).exclude() by @ritchie46 in https://github.com/pola-rs/polars/pull/4001
    • fix explode offsets for empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4005
    • reduce peak memory of reading parquet by row groups ~-22% by @ritchie46 in https://github.com/pola-rs/polars/pull/4006
    • fix rolling groupby with negative windows by @ritchie46 in https://github.com/pola-rs/polars/pull/4010
    • fix: Lazyframe::from(lp) #3877 by @universalmind303 in https://github.com/pola-rs/polars/pull/4012
    • Date encode types by @ritchie46 in https://github.com/pola-rs/polars/pull/4013
    • csv: allow multiple null values by @ritchie46 in https://github.com/pola-rs/polars/pull/4016
    • python polars 0.13.53 by @ritchie46 in https://github.com/pola-rs/polars/pull/4017
    • Improve lazy state struct by @ritchie46 in https://github.com/pola-rs/polars/pull/4008
    • python: fix pyarrow imports by @ritchie46 in https://github.com/pola-rs/polars/pull/4025
    • fix lazy schema by @ritchie46 in https://github.com/pola-rs/polars/pull/4027
    • Align the exclude docstrings and annotation by @thatlittleboy in https://github.com/pola-rs/polars/pull/4020
    • docs: add mwe and internal links by @thatlittleboy in https://github.com/pola-rs/polars/pull/4019
    • impl explode for nested lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4028
    • allow joining on expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/4029
    • allow nulls last in sort by expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/4030
    • python polars 0.13.54 by @ritchie46 in https://github.com/pola-rs/polars/pull/4031
    • feat: implement contains for DataFrame and LazyFrame by @thatlittleboy in https://github.com/pola-rs/polars/pull/4035
    • Remove py-polars legacy package by @stinodego in https://github.com/pola-rs/polars/pull/4037
    • Native trigonometry functions by @stinodego in https://github.com/pola-rs/polars/pull/4034
    • parquet: stop reading when slice is reached by @ritchie46 in https://github.com/pola-rs/polars/pull/4046
    • fix cross join by @ritchie46 in https://github.com/pola-rs/polars/pull/4045
    • More trigonometry by @stinodego in https://github.com/pola-rs/polars/pull/4047
    • Update flake8 settings by @stinodego in https://github.com/pola-rs/polars/pull/4038
    • pivot: fix categorical logicaltype by @ritchie46 in https://github.com/pola-rs/polars/pull/4048
    • Update mypy settings by @stinodego in https://github.com/pola-rs/polars/pull/4049
    • fix: reproducible Expr.hash by @thatlittleboy in https://github.com/pola-rs/polars/pull/4033
    • Fix constructor orient type hint by @stinodego in https://github.com/pola-rs/polars/pull/3961
    • Improve coverage report settings by @stinodego in https://github.com/pola-rs/polars/pull/4039
    • Added literal param to string-replace functions, optimized replace performance in small-string regime (30-80% faster) by @alexander-beedie in https://github.com/pola-rs/polars/pull/4057
    • parquet: low memory arg by @ritchie46 in https://github.com/pola-rs/polars/pull/4050
    • Upgrade Windows 10 tests, benchmark and doc jobs to Python3.10 by @zundertj in https://github.com/pola-rs/polars/pull/4059
    • Revert "Upgrade Windows 10 tests, benchmark and doc jobs to Python3.10" by @ritchie46 in https://github.com/pola-rs/polars/pull/4062
    • fill_null expr: ensure minimal supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/4061
    • Fix connector-x integration for PostgreSQL by @valxv in https://github.com/pola-rs/polars/pull/4063
    • node updates by @universalmind303 in https://github.com/pola-rs/polars/pull/3984
    • python polars 0.13.55 by @ritchie46 in https://github.com/pola-rs/polars/pull/4064
    • Handle wrong input for orient argument by @stinodego in https://github.com/pola-rs/polars/pull/4065
    • Turn on doctests; fix wrong examples by @zundertj in https://github.com/pola-rs/polars/pull/4060
    • Mypy warn redundant casts by @zundertj in https://github.com/pola-rs/polars/pull/4055
    • Add mypy optional error codes by @stinodego in https://github.com/pola-rs/polars/pull/4054
    • recursively convert arrow logical types in to_arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/4067
    • improve unique performance by @ritchie46 in https://github.com/pola-rs/polars/pull/4070
    • Small formatting fixes by @stinodego in https://github.com/pola-rs/polars/pull/4071
    • [mypy] Add error codes by @stinodego in https://github.com/pola-rs/polars/pull/4072
    • reduce contention of global string cache: >4x performance improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/4078
    • Add lazy() method to LazyFrame by @zundertj in https://github.com/pola-rs/polars/pull/4077
    • [flake8] Enable flake8-bugbear extension by @stinodego in https://github.com/pola-rs/polars/pull/4073
    • csv: allow reading with different eol character by @ritchie46 in https://github.com/pola-rs/polars/pull/4080
    • docs: rework some MWE and minor formatting fixes by @thatlittleboy in https://github.com/pola-rs/polars/pull/4082
    • Upgrade maturin to 0.13.0 by @messense in https://github.com/pola-rs/polars/pull/4086
    • dataframe display: use POLARS_FMT_STR_LEN by @ritchie46 in https://github.com/pola-rs/polars/pull/4088
    • don't allow comparing local categoricals by @ritchie46 in https://github.com/pola-rs/polars/pull/4087
    • implement list hash for simply nested lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4090
    • improve error on missing column access by @ritchie46 in https://github.com/pola-rs/polars/pull/4095
    • value_counts add sorted argument by @ritchie46 in https://github.com/pola-rs/polars/pull/4094
    • from_rows improve schema correctness by @ritchie46 in https://github.com/pola-rs/polars/pull/4097
    • Cache length of ChunkedArray. by @ritchie46 in https://github.com/pola-rs/polars/pull/4105
    • fix explode with empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4113
    • fix so rank by @ritchie46 in https://github.com/pola-rs/polars/pull/4114
    • fix explode for sliced arrays by @ritchie46 in https://github.com/pola-rs/polars/pull/4115
    • python: to_numpy use first type as supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/4116
    • python: remove css line for vscode by @ritchie46 in https://github.com/pola-rs/polars/pull/4117
    • Remove read_excel hacks by @cnpryer in https://github.com/pola-rs/polars/pull/4081
    • python allow set by string by @ritchie46 in https://github.com/pola-rs/polars/pull/4118
    • fill_nan preserve name by @ritchie46 in https://github.com/pola-rs/polars/pull/4119
    • Fix prefix/suffix docstrings. by @ghuls in https://github.com/pola-rs/polars/pull/4122
    • allow summing of duration in selection context by @ritchie46 in https://github.com/pola-rs/polars/pull/4124
    • python: improve setitem by @ritchie46 in https://github.com/pola-rs/polars/pull/4121
    • python polars 0.13.56 by @ritchie46 in https://github.com/pola-rs/polars/pull/4127
    • Assert deprecation warning on DataFrame.setitem in tests by @zundertj in https://github.com/pola-rs/polars/pull/4126
    • Run PR workflows on definition changes by @zundertj in https://github.com/pola-rs/polars/pull/4125
    • fix 'fatal: unsafe repository' in python build by @ritchie46 in https://github.com/pola-rs/polars/pull/4129
    • Nested dict by @ritchie46 in https://github.com/pola-rs/polars/pull/4131
    • improve performance of building global string cache from arrow dictio… by @ritchie46 in https://github.com/pola-rs/polars/pull/4132
    • csv writer quote if string contains new line char by @ritchie46 in https://github.com/pola-rs/polars/pull/4134
    • fix explode edge cases by @ritchie46 in https://github.com/pola-rs/polars/pull/4133
    • add pl.cut utility by @ritchie46 in https://github.com/pola-rs/polars/pull/4137
    • python polars 0.13.57 by @ritchie46 in https://github.com/pola-rs/polars/pull/4141
    • Mypy disallow untyped calls by @ritchie46 in https://github.com/pola-rs/polars/pull/4140
    • Improve re-raises of Exceptions by @zundertj in https://github.com/pola-rs/polars/pull/4142
    • pivot fix categorical index by @ritchie46 in https://github.com/pola-rs/polars/pull/4149
    • Fix typo by @stinodego in https://github.com/pola-rs/polars/pull/4146
    • Wrap long strings by @stinodego in https://github.com/pola-rs/polars/pull/4144
    • Fix Python line lengths to 88 characters by @stinodego in https://github.com/pola-rs/polars/pull/4152
    • add is_in for categoricals by @ritchie46 in https://github.com/pola-rs/polars/pull/4153
    • python 0.13.58 by @ritchie46 in https://github.com/pola-rs/polars/pull/4154
    • Docstring lints & improvements by @stinodego in https://github.com/pola-rs/polars/pull/4155
    • pivot: fix logical type of multiple indexes by @ritchie46 in https://github.com/pola-rs/polars/pull/4159
    • more tests by @ritchie46 in https://github.com/pola-rs/polars/pull/4163
    • Use latest arrow2 to support latest nightly rust by @gyscos in https://github.com/pola-rs/polars/pull/4162
    • Fix invalid inputs for trigonometric functions by @stinodego in https://github.com/pola-rs/polars/pull/4164
    • update schema in udfs by @ritchie46 in https://github.com/pola-rs/polars/pull/4165
    • python: expose idx type by @ritchie46 in https://github.com/pola-rs/polars/pull/4167
    • Improve getitem for Dataframe/Series. by @ghuls in https://github.com/pola-rs/polars/pull/4160
    • Dataframe equality by @stinodego in https://github.com/pola-rs/polars/pull/4076
    • Docstring improvements & enable lints by @stinodego in https://github.com/pola-rs/polars/pull/4161
    • Native implementation of the sign function by @stinodego in https://github.com/pola-rs/polars/pull/4147
    • Minor docs updates by @stinodego in https://github.com/pola-rs/polars/pull/4173
    • Validation for groupby arguments by @stinodego in https://github.com/pola-rs/polars/pull/4176
    • update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/4177
    • throw error on schema failure by @ritchie46 in https://github.com/pola-rs/polars/pull/4178
    • with_columns update on duplicates by @ritchie46 in https://github.com/pola-rs/polars/pull/4179
    • fold regex expand by @ritchie46 in https://github.com/pola-rs/polars/pull/4181
    • python: prefer pyarrow when we can memory map the file by @ritchie46 in https://github.com/pola-rs/polars/pull/4182
    • window functions: sort cached groups if needed by @ritchie46 in https://github.com/pola-rs/polars/pull/4184
    • reduce supertype match by calling twice/ allow Some(tz)/None supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/4186
    • Added const empty initializer to DataFrame by @TheDan64 in https://github.com/pola-rs/polars/pull/4187
    • fix utf8 explode for nulls and empty strings by @ritchie46 in https://github.com/pola-rs/polars/pull/4189
    • type-coercion: ignore unknown until replaced by @ritchie46 in https://github.com/pola-rs/polars/pull/4192
    • python: always use stdlib http reader and improve memmap ipc reader a… by @ritchie46 in https://github.com/pola-rs/polars/pull/4193
    • slice pushdown for cross joins by @ritchie46 in https://github.com/pola-rs/polars/pull/4194
    • csv: ignore quoted lines in skip lines by @ritchie46 in https://github.com/pola-rs/polars/pull/4191
    • Small fixes in type formatting by @stinodego in https://github.com/pola-rs/polars/pull/4195
    • use native ndjson reader by @ritchie46 in https://github.com/pola-rs/polars/pull/4196
    • python polars: 0.13.59 by @ritchie46 in https://github.com/pola-rs/polars/pull/4198
    • Miscellaneous improvements by @matteosantama in https://github.com/pola-rs/polars/pull/4203
    • Add flake8 extension: comprehensions by @stinodego in https://github.com/pola-rs/polars/pull/4200
    • Add flake8 extension: simplify by @stinodego in https://github.com/pola-rs/polars/pull/4201
    • don't use pyarrow read if we have categoricals in the schema by @ritchie46 in https://github.com/pola-rs/polars/pull/4205
    • python: don't lock gil in arr.contains by @ritchie46 in https://github.com/pola-rs/polars/pull/4210
    • fix nested struct append by @ritchie46 in https://github.com/pola-rs/polars/pull/4217
    • use default context for col upstream col expression type by @ritchie46 in https://github.com/pola-rs/polars/pull/4219
    • ensure weekday starts at 0 by @ritchie46 in https://github.com/pola-rs/polars/pull/4220
    • python datetime consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/4221
    • python: improve error by @ritchie46 in https://github.com/pola-rs/polars/pull/4223
    • Upgrade black, blackdoc, mypy, flake8 by @matteosantama in https://github.com/pola-rs/polars/pull/4209
    • python: ensure utf8 encoding when writing dot file by @ritchie46 in https://github.com/pola-rs/polars/pull/4225
    • convert arrow map to list by @ritchie46 in https://github.com/pola-rs/polars/pull/4226
    • fast path for sorted min/max by @ritchie46 in https://github.com/pola-rs/polars/pull/4228
    • Set no_implicit_reexport = true in pyproject.toml by @matteosantama in https://github.com/pola-rs/polars/pull/4211
    • fix and improve rolling_skew by @ritchie46 in https://github.com/pola-rs/polars/pull/4232
    • ternary expr: validate predicate in groupby context by @ritchie46 in https://github.com/pola-rs/polars/pull/4237
    • Overload pl.from_arrow type hints by @matteosantama in https://github.com/pola-rs/polars/pull/4236
    • python: allow horizontal expanding sum by @ritchie46 in https://github.com/pola-rs/polars/pull/4242
    • improve strictness/consistency of when then otherwise by @ritchie46 in https://github.com/pola-rs/polars/pull/4241
    • reinstate old ternary behavior as experimental by @ritchie46 in https://github.com/pola-rs/polars/pull/4244
    • correct dtype for power by @ritchie46 in https://github.com/pola-rs/polars/pull/4246
    • csv: improve data/datetime/bool overwrite by @ritchie46 in https://github.com/pola-rs/polars/pull/4247
    • Release rust 0.23.0 by @ritchie46 in https://github.com/pola-rs/polars/pull/4248

    New Contributors

    • @GregoryBL made their first contribution in https://github.com/pola-rs/polars/pull/3566
    • @gyscos made their first contribution in https://github.com/pola-rs/polars/pull/3674
    • @LVG77 made their first contribution in https://github.com/pola-rs/polars/pull/3691
    • @gunjunlee made their first contribution in https://github.com/pola-rs/polars/pull/3671
    • @saethlin made their first contribution in https://github.com/pola-rs/polars/pull/3737
    • @joshuataylor made their first contribution in https://github.com/pola-rs/polars/pull/3793
    • @Smittyvb made their first contribution in https://github.com/pola-rs/polars/pull/3804
    • @thatlittleboy made their first contribution in https://github.com/pola-rs/polars/pull/3811
    • @braaannigan made their first contribution in https://github.com/pola-rs/polars/pull/3750
    • @thomasaarholt made their first contribution in https://github.com/pola-rs/polars/pull/3855
    • @duskmoon314 made their first contribution in https://github.com/pola-rs/polars/pull/3897
    • @savente93 made their first contribution in https://github.com/pola-rs/polars/pull/3921
    • @cxtruong70 made their first contribution in https://github.com/pola-rs/polars/pull/3959
    • @andrei-ionescu made their first contribution in https://github.com/pola-rs/polars/pull/3894
    • @valxv made their first contribution in https://github.com/pola-rs/polars/pull/4063
    • @matteosantama made their first contribution in https://github.com/pola-rs/polars/pull/4203

    Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.22.1...rust-polars-v0.23.0

  • rust-polars-v0.22.1(Jun 6, 2022)

    What's Changed

    • partial support for list arithmetic by @ritchie46 in https://github.com/pola-rs/polars/pull/3307
    • shuffle sample option by @ritchie46 in https://github.com/pola-rs/polars/pull/3308
    • improve predicate pushdown by @ritchie46 in https://github.com/pola-rs/polars/pull/3313
    • Improve partitioned agg by @ritchie46 in https://github.com/pola-rs/polars/pull/3314
    • list to struct by @ritchie46 in https://github.com/pola-rs/polars/pull/3317
    • oncecell in favor of lazy_static by @ritchie46 in https://github.com/pola-rs/polars/pull/3319
    • Update cummax documentation by @briandk in https://github.com/pola-rs/polars/pull/3323
    • scan pyarrow dataset by @ritchie46 in https://github.com/pola-rs/polars/pull/3327
    • fix panic in csv parser by @ritchie46 in https://github.com/pola-rs/polars/pull/3339
    • implement anyvalue -> datatype for all variants by @ritchie46 in https://github.com/pola-rs/polars/pull/3340
    • remove badge by @ritchie46 in https://github.com/pola-rs/polars/pull/3341
    • Added PartitionedWriter for disk partitioning. by @illumination-k in https://github.com/pola-rs/polars/pull/3331
    • Fast json by @universalmind303 in https://github.com/pola-rs/polars/pull/3324
    • add hash to rust expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3350
    • serde for group options by @elferherrera in https://github.com/pola-rs/polars/pull/3349
    • Check if length of index in pivot operation is non-zero. Fixes: #3343. by @ghuls in https://github.com/pola-rs/polars/pull/3346
    • improve agg_list performance of chunked numerical data by @ritchie46 in https://github.com/pola-rs/polars/pull/3351
    • Fix init of DataFrame with empty dataset (eg:"[]") and column/schema typedefs by @alexander-beedie in https://github.com/pola-rs/polars/pull/3353
    • rechunk on default sort and groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3354
    • more partitioned groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3355
    • Add extension_module in python example by @Maxyme in https://github.com/pola-rs/polars/pull/3358
    • allow join on same cat source by @ritchie46 in https://github.com/pola-rs/polars/pull/3363
    • fix rename same name by @ritchie46 in https://github.com/pola-rs/polars/pull/3364
    • initial timezone support by @ritchie46 in https://github.com/pola-rs/polars/pull/3357
    • pivot index maintain logical type by @ritchie46 in https://github.com/pola-rs/polars/pull/3367
    • use array_ref in favor of chunks by @ritchie46 in https://github.com/pola-rs/polars/pull/3368
    • entropy normalization arg by @ritchie46 in https://github.com/pola-rs/polars/pull/3369
    • categorical keep type in comparison by @ritchie46 in https://github.com/pola-rs/polars/pull/3370
    • rechunk in asof and allow concat to empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3376
    • improve overflow of numeric mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3377
    • fix parquet stats by @ritchie46 in https://github.com/pola-rs/polars/pull/3378
    • delay rechunk optimization by @ritchie46 in https://github.com/pola-rs/polars/pull/3381
    • Allow Z in native strptime by @ritchie46 in https://github.com/pola-rs/polars/pull/3382
    • more partitioned aggregators by @ritchie46 in https://github.com/pola-rs/polars/pull/3385
    • improve partition_by by @ritchie46 in https://github.com/pola-rs/polars/pull/3386
    • Add overload support to partition_by. by @ghuls in https://github.com/pola-rs/polars/pull/3388
    • Check if some arguments for read_csv and scan_csv got a 1 byte input. by @ghuls in https://github.com/pola-rs/polars/pull/3389
    • fix rayon SO in partition_by by @ritchie46 in https://github.com/pola-rs/polars/pull/3391
    • fix bug in predicate pushdown on dependent predicates by @ritchie46 in https://github.com/pola-rs/polars/pull/3394
    • fix predicate pushdown for predicates that do aggregations by @ritchie46 in https://github.com/pola-rs/polars/pull/3396
    • cumulative_eval by @ritchie46 in https://github.com/pola-rs/polars/pull/3400
    • ensure that Cast expressions first updates groups before it flattens by @ritchie46 in https://github.com/pola-rs/polars/pull/3401
    • improve and simplify ternary aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/3403
    • fix explode empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3405
    • Improve list builders, iteration and construction by @ritchie46 in https://github.com/pola-rs/polars/pull/3419
    • feature gate timezones by @ritchie46 in https://github.com/pola-rs/polars/pull/3422
    • fix cumulative_eval on window expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3421
    • csv allow only header and fix lazy rename by @ritchie46 in https://github.com/pola-rs/polars/pull/3423
    • upgrade arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3425
    • infer dtype of empty list in recursive list construction & fix struct.arr take by @ritchie46 in https://github.com/pola-rs/polars/pull/3433
    • fix struct list concat by @ritchie46 in https://github.com/pola-rs/polars/pull/3435
    • csv parser fallback on chrono if datetime pattern fails by @ritchie46 in https://github.com/pola-rs/polars/pull/3436
    • improve rolling_quantile kernel (no nulls) ~28x by @ritchie46 in https://github.com/pola-rs/polars/pull/3437
    • improve rolling_{min/max/sum/mean} performance ~3.4x by @ritchie46 in https://github.com/pola-rs/polars/pull/3444
    • struct add chunk and impl reverse by @ritchie46 in https://github.com/pola-rs/polars/pull/3445
    • fix struct equality by @ritchie46 in https://github.com/pola-rs/polars/pull/3446
    • Struct error on different dict orders by @ritchie46 in https://github.com/pola-rs/polars/pull/3447
    • Inherit Exception in fallback exception classes by @adamgreg in https://github.com/pola-rs/polars/pull/3450
    • Struct creations/append/extend stricter schema by @ritchie46 in https://github.com/pola-rs/polars/pull/3454
    • don't allow predicate pushdown if compared column is being coerced by @ritchie46 in https://github.com/pola-rs/polars/pull/3457
    • improve rolling_min/max for columns with null values by @ritchie46 in https://github.com/pola-rs/polars/pull/3458
    • Improve rolling_sum/rolling_mean for windows with null values. by @ritchie46 in https://github.com/pola-rs/polars/pull/3466
    • explode series after slide fast path by @ritchie46 in https://github.com/pola-rs/polars/pull/3467
    • Improve struct by @ritchie46 in https://github.com/pola-rs/polars/pull/3468
    • improve rolling_var performance by @ritchie46 in https://github.com/pola-rs/polars/pull/3470
    • power by expression and improve rust lazy ergonomics by @ritchie46 in https://github.com/pola-rs/polars/pull/3475
    • add specialized rolling_std kernel by @ritchie46 in https://github.com/pola-rs/polars/pull/3476
    • fix null commutativity by @ritchie46 in https://github.com/pola-rs/polars/pull/3479
    • use anyvalue if first apply list result is empty by @ritchie46 in https://github.com/pola-rs/polars/pull/3480
    • Added describe method to rust library by @glennpierce in https://github.com/pola-rs/polars/pull/3320
    • Groupby Optimization for sorted keys: ~15x perf gain. by @ritchie46 in https://github.com/pola-rs/polars/pull/3489
    • make cat merge fallible and loosen restrictions on categorical appends by @ritchie46 in https://github.com/pola-rs/polars/pull/3491
    • Fix LazyFrame.join_asof documentation reference by @adamgreg in https://github.com/pola-rs/polars/pull/3493
    • feat: support pl.Time in Series.str.strptime by @fsimkovic in https://github.com/pola-rs/polars/pull/3496
    • str().extract_all / str().count_match by @ritchie46 in https://github.com/pola-rs/polars/pull/3507
    • add apply to cookbooks by @ritchie46 in https://github.com/pola-rs/polars/pull/3504
    • support all arrow dictionary keys < 64 bit by @ritchie46 in https://github.com/pola-rs/polars/pull/3508
    • fix accidental quadratic behavior in rolling_groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3510
    • Fix some unit test deprecation warnings by @adamgreg in https://github.com/pola-rs/polars/pull/3503
    • Experimental: allow rolling_<agg> expressions to determine window size by another {Date, Datetime} series. by @ritchie46 in https://github.com/pola-rs/polars/pull/3514
    • use specialized kernels in rolling_groupby aggregation ~10x perf gain (window of 100 elements) by @ritchie46 in https://github.com/pola-rs/polars/pull/3515
    • reduce probability of quadratic behavior in min/max rolling by @ritchie46 in https://github.com/pola-rs/polars/pull/3516
    • adjust for kleene logic in drop_na by @ritchie46 in https://github.com/pola-rs/polars/pull/3529
    • fix aggregation of empty list by @ritchie46 in https://github.com/pola-rs/polars/pull/3527
    • fix sorting of chunked numeric arrays by @ritchie46 in https://github.com/pola-rs/polars/pull/3528
    • adjust for kleene logic in drop_na by @ritchie46 in https://github.com/pola-rs/polars/pull/3530
    • Improve rolling min max by @ritchie46 in https://github.com/pola-rs/polars/pull/3531
    • fix null aggregation edge case by @ritchie46 in https://github.com/pola-rs/polars/pull/3536
    • allow concat/append expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3541
    • make sort by multiple columns parallel by @ritchie46 in https://github.com/pola-rs/polars/pull/3549
    • allow more aggregations on dtype duration by @ritchie46 in https://github.com/pola-rs/polars/pull/3550
    • use first series to validate length by @ritchie46 in https://github.com/pola-rs/polars/pull/3551
    • Raise a more helpful TypeError when trying to subscript a LazyFrame. by @ghuls in https://github.com/pola-rs/polars/pull/3554
    • Readability Fixes r2 by @ryanrussell in https://github.com/pola-rs/polars/pull/3556
    • add count_match, extract_all to python ref guide by @ritchie46 in https://github.com/pola-rs/polars/pull/3558
    • fill_null limits by @ritchie46 in https://github.com/pola-rs/polars/pull/3559
    • test sortedness propagation by @ritchie46 in https://github.com/pola-rs/polars/pull/3560
    • update boolean aggregates and ensure they return IdxSize by @ritchie46 in https://github.com/pola-rs/polars/pull/3563
    • Improve parse_lines error message. by @ghuls in https://github.com/pola-rs/polars/pull/3569
    • sorted_merge_join by @ritchie46 in https://github.com/pola-rs/polars/pull/3505
    • Rust Readability Improvements by @ryanrussell in https://github.com/pola-rs/polars/pull/3573
    • fix invalid fast path of sorted joins and improve sortedness propagation by @ritchie46 in https://github.com/pola-rs/polars/pull/3577
    • prevent expensive type coercion in expression and fix when->then->oth… by @ritchie46 in https://github.com/pola-rs/polars/pull/3579
    • Updated the fmt feature flag error message by @TheDan64 in https://github.com/pola-rs/polars/pull/3586
    • Fix u16 Series formatting. by @ghuls in https://github.com/pola-rs/polars/pull/3584
    • update arrow to crates.io: ~2x json parsing improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/3588
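
The "adjust for kleene logic in drop_na" entries in this list name Kleene (three-valued) logic without spelling it out. As a minimal sketch, assuming a null is modelled as `None` in an `Option<bool>`, the truth tables look like this. The function names `kleene_and` and `kleene_or` are hypothetical and this is not the Polars implementation; how `drop_na` applies the logic is not stated in the release notes.

```rust
/// Kleene (three-valued) AND. `None` stands for an unknown (null) value.
/// Hypothetical helper for illustration only.
fn kleene_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // A definite `false` on either side decides the result.
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        // Otherwise an unknown operand leaves the result unknown.
        _ => None,
    }
}

/// Kleene (three-valued) OR. `None` stands for an unknown (null) value.
fn kleene_or(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // A definite `true` on either side decides the result.
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

fn main() {
    // `false AND null` is known to be false, but `true AND null` is not known.
    assert_eq!(kleene_and(Some(false), None), Some(false));
    assert_eq!(kleene_and(Some(true), None), None);
    // `true OR null` is known to be true, but `false OR null` is not known.
    assert_eq!(kleene_or(Some(true), None), Some(true));
    assert_eq!(kleene_or(Some(false), None), None);
    println!("Kleene truth tables hold");
}
```

The point of the design is that a null only propagates when the other operand cannot decide the result on its own.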

    New Contributors

    • @kianmeng made their first contribution in https://github.com/pola-rs/polars/pull/3311
    • @briandk made their first contribution in https://github.com/pola-rs/polars/pull/3323
    • @EwoutH made their first contribution in https://github.com/pola-rs/polars/pull/3352
    • @adamgreg made their first contribution in https://github.com/pola-rs/polars/pull/3450
    • @ryanrussell made their first contribution in https://github.com/pola-rs/polars/pull/3488
    • @fsimkovic made their first contribution in https://github.com/pola-rs/polars/pull/3496
    • @chitralverma made their first contribution in https://github.com/pola-rs/polars/pull/3578
    • @TheDan64 made their first contribution in https://github.com/pola-rs/polars/pull/3586

    Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.21.1...rust-polars-v0.22.1

  • rust-polars-v0.21.1(Jun 6, 2022)

    What's Changed

    • Remove crate num_cpus from polars by @dandxy89 in https://github.com/pola-rs/polars/pull/2890
    • temporarily pin crossbeam-epoch by @ritchie46 in https://github.com/pola-rs/polars/pull/2902
    • fix unique and drop by @ritchie46 in https://github.com/pola-rs/polars/pull/2908
    • fix explode of empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/2910
    • fix function input expansion by @ritchie46 in https://github.com/pola-rs/polars/pull/2913
    • fix compilation lazy + string by @ritchie46 in https://github.com/pola-rs/polars/pull/2914
    • respect dtype overwrite when schema is overwritten in lazy csv scanner by @ritchie46 in https://github.com/pola-rs/polars/pull/2915
    • deprecate to_ and string cache in lazy by @ritchie46 in https://github.com/pola-rs/polars/pull/2916
    • Refactor: move most temporal related code to polars-time. by @ritchie46 in https://github.com/pola-rs/polars/pull/2918
    • improve datetime inference by @ritchie46 in https://github.com/pola-rs/polars/pull/2923
    • rename distinct to unique by @ritchie46 in https://github.com/pola-rs/polars/pull/2926
    • fix some warnings by @ritchie46 in https://github.com/pola-rs/polars/pull/2927
    • improve date/datetime inference by @ritchie46 in https://github.com/pola-rs/polars/pull/2925
    • fix fill_nan dtypes by @ritchie46 in https://github.com/pola-rs/polars/pull/2933
    • fix future calculation in groupby dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/2935
    • add tolerance to asof + by by @ritchie46 in https://github.com/pola-rs/polars/pull/2937
    • fix(scan_csv): handle empty csv file exception by @LuisCardosoOliveira in https://github.com/pola-rs/polars/pull/2934
    • handle Utf8Owned AnyValue for DataType by @cigrainger in https://github.com/pola-rs/polars/pull/2944
    • Fix argsort by @ritchie46 in https://github.com/pola-rs/polars/pull/2946
    • value_counts and unique_counts expression by @ritchie46 in https://github.com/pola-rs/polars/pull/2947
    • use schema in 'with_columns' to amortize lookups and fix bug in emptr… by @ritchie46 in https://github.com/pola-rs/polars/pull/2949
    • add native log and entropy expression by @ritchie46 in https://github.com/pola-rs/polars/pull/2952
    • csv parsing: skip whitespace on failed parse by @ritchie46 in https://github.com/pola-rs/polars/pull/2953
    • Literal in groupby context, arange and repeat by @ritchie46 in https://github.com/pola-rs/polars/pull/2958
    • Huge perf improvement of many expressions and ListChunked::from_iter perf by @ritchie46 in https://github.com/pola-rs/polars/pull/2962
    • update groups in count() agg and correctly update state by @ritchie46 in https://github.com/pola-rs/polars/pull/2963
    • add sign by @ritchie46 in https://github.com/pola-rs/polars/pull/2977
    • see kurtosis as aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/2993
    • fix groups state after apply by @ritchie46 in https://github.com/pola-rs/polars/pull/2992
    • Home directory support by @cjermain in https://github.com/pola-rs/polars/pull/2940
    • make sure that sort does not index empty list by @ritchie46 in https://github.com/pola-rs/polars/pull/2996
    • python: improve arithmetic consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/3001
    • python: add apply on struct dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3003
    • fix null in non-fast-explode explode of numeric arrays by @ritchie46 in https://github.com/pola-rs/polars/pull/3006
    • also expand rename in filters by @ritchie46 in https://github.com/pola-rs/polars/pull/3008
    • fix when then with literal by @ritchie46 in https://github.com/pola-rs/polars/pull/3009
    • fix groups update to match exploded offsets by @ritchie46 in https://github.com/pola-rs/polars/pull/3010
    • add duration expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3017
    • allow nested groupby in groupby_rolling by @ritchie46 in https://github.com/pola-rs/polars/pull/3018
    • Fix read_parquet with list having nested struct by @cjermain in https://github.com/pola-rs/polars/pull/2991
    • fix outer join schema by @ritchie46 in https://github.com/pola-rs/polars/pull/3021
    • lazy: fix drop all by @ritchie46 in https://github.com/pola-rs/polars/pull/3023
    • fix schemas of groupby rolling/dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/3028
    • fix div by zero by @ritchie46 in https://github.com/pola-rs/polars/pull/3031
    • fix incorrect match in agg_mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3030
    • check alias in whole expr on opt by @ritchie46 in https://github.com/pola-rs/polars/pull/3032
    • align groups in binary when they do not align by @ritchie46 in https://github.com/pola-rs/polars/pull/3033
    • only expand function inputs if wildcard expansion allows it by @ritchie46 in https://github.com/pola-rs/polars/pull/3039
    • fix when_then_chain containing nulls by @ritchie46 in https://github.com/pola-rs/polars/pull/3040
    • fixed typo in format_path docstring by @cnpryer in https://github.com/pola-rs/polars/pull/3045
    • fix when-then-chain by @ritchie46 in https://github.com/pola-rs/polars/pull/3048
    • throw error on empty keyed groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3049
    • compare expand_cols by variant not exact datatype by @ritchie46 in https://github.com/pola-rs/polars/pull/3050
    • dot: use apply instead of map by @ritchie46 in https://github.com/pola-rs/polars/pull/3051
    • check output length of all 'map' expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3052
    • error on invalid asof_join by input by @ritchie46 in https://github.com/pola-rs/polars/pull/3053
    • improve performance of asof_join by equal or more than 2 keys by @ritchie46 in https://github.com/pola-rs/polars/pull/3055
    • remove unneeded expensive assert by @ritchie46 in https://github.com/pola-rs/polars/pull/3069
    • improve boolean null comparisons consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/3068
    • fix entropy by @ritchie46 in https://github.com/pola-rs/polars/pull/3070
    • fix explode empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3083
    • Lazy: update schema in explode op by @ritchie46 in https://github.com/pola-rs/polars/pull/3084
    • CSV datetime inference 3x performance improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/2950
    • [polars-sql] Adding SQL Context, SELECT and GROUP BY by @potter420 in https://github.com/pola-rs/polars/pull/3024
    • Default sample n param to 1 by @cnpryer in https://github.com/pola-rs/polars/pull/3090
    • Expose 'rechunk' param from "read_ipc" for consistency (default behaviour unchanged) by @alexander-beedie in https://github.com/pola-rs/polars/pull/3088
    • Add optional seeding for sampling by @cnpryer in https://github.com/pola-rs/polars/pull/3080
    • default to native strptime by @ritchie46 in https://github.com/pola-rs/polars/pull/3093
    • Raise error in sample() if n and frac are both passed by @cnpryer in https://github.com/pola-rs/polars/pull/3091
    • split up planner by @ritchie46 in https://github.com/pola-rs/polars/pull/3095
    • add test for #3097 by @ritchie46 in https://github.com/pola-rs/polars/pull/3098
    • Initial support for serde/pickling expressions. by @ritchie46 in https://github.com/pola-rs/polars/pull/3096
    • Adding nested struct support by fixing ArrayRef determination by @cjermain in https://github.com/pola-rs/polars/pull/3103
    • Enhanced columns param for DataFrame init, additionally allowing for inline type specification by @alexander-beedie in https://github.com/pola-rs/polars/pull/3100
    • Improve rolling agg by @ritchie46 in https://github.com/pola-rs/polars/pull/3101
    • add estimate_size methods by @ritchie46 in https://github.com/pola-rs/polars/pull/3110
    • fix and test estimated_size by @ritchie46 in https://github.com/pola-rs/polars/pull/3113
    • remove unused datafusion integration by @ritchie46 in https://github.com/pola-rs/polars/pull/3115
    • Nodejs writejson fix & avro read/write by @universalmind303 in https://github.com/pola-rs/polars/pull/3116
    • Parquet statistics: don't panic by @ritchie46 in https://github.com/pola-rs/polars/pull/3127
    • lazy: expand cols in filter by @ritchie46 in https://github.com/pola-rs/polars/pull/3128
    • melt extra arguments by @ritchie46 in https://github.com/pola-rs/polars/pull/3133
    • Lazy: Don't materialize whole table in JOIN followed by SLICE by @ritchie46 in https://github.com/pola-rs/polars/pull/3136
    • Pushdown SLICE to GROUPBY nodes by @ritchie46 in https://github.com/pola-rs/polars/pull/3138
    • Switch from unmaintained jemallocator to maintained tikv-jemallocator. by @ghuls in https://github.com/pola-rs/polars/pull/3141
    • Polars vs Pivot: Round 3 🥊 ~2-25x improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/3143
    • DataFrame::partition_by by @ritchie46 in https://github.com/pola-rs/polars/pull/3148
    • Add semi and anti joins. by @ritchie46 in https://github.com/pola-rs/polars/pull/3149
    • derive clone for lazy groupby by @elferherrera in https://github.com/pola-rs/polars/pull/3156
    • pushdown slice to sort nodes by @ritchie46 in https://github.com/pola-rs/polars/pull/3159
    • slice_pushdown projections by @ritchie46 in https://github.com/pola-rs/polars/pull/3160
    • lazy err on not found col by @ritchie46 in https://github.com/pola-rs/polars/pull/3169
    • improve inner join performance by @ritchie46 in https://github.com/pola-rs/polars/pull/3168
    • fix duration filters with different time units by @marcvanheerden in https://github.com/pola-rs/polars/pull/3179
    • fix overflow in agg_mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3183
    • list eval expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3185
    • Supporting Struct comparison and any/all API by @cjermain in https://github.com/pola-rs/polars/pull/3180
    • struct logical type arrow conversion by @ritchie46 in https://github.com/pola-rs/polars/pull/3193
    • make series comparisons fallible by @ritchie46 in https://github.com/pola-rs/polars/pull/3192
    • fix_pivot by @ritchie46 in https://github.com/pola-rs/polars/pull/3199
    • recursively convert arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3200
    • fix arr.eval type inference by @ritchie46 in https://github.com/pola-rs/polars/pull/3203
    • Improve Left join on chunked data by @ritchie46 in https://github.com/pola-rs/polars/pull/3177
    • polars-ops by @ritchie46 in https://github.com/pola-rs/polars/pull/3212
    • Fix tree traversal complexity by @ritchie46 in https://github.com/pola-rs/polars/pull/3213
    • Adding struct column tests by @ishmandoo in https://github.com/pola-rs/polars/pull/3209
    • struct: handle validity by @ritchie46 in https://github.com/pola-rs/polars/pull/3217
    • bug template bounce resolved bugs by @ritchie46 in https://github.com/pola-rs/polars/pull/3218
    • add duration minutes by @ritchie46 in https://github.com/pola-rs/polars/pull/3219
    • fix partition boundary by @ritchie46 in https://github.com/pola-rs/polars/pull/3223
    • Option to check column order when comparing polars dataframes by @physinet in https://github.com/pola-rs/polars/pull/3206
    • fix dispatch of quantile aggregations by @ritchie46 in https://github.com/pola-rs/polars/pull/3234
    • Improving array refs for to_list by @cjermain in https://github.com/pola-rs/polars/pull/3231
    • fix offsets in categorical merge by @ritchie46 in https://github.com/pola-rs/polars/pull/3242
    • Serialize/Deserialize LazyFrames/Logical plans by @ritchie46 in https://github.com/pola-rs/polars/pull/3244
    • setup serializable function + null_count expr by @ritchie46 in https://github.com/pola-rs/polars/pull/3247
    • improve ternary in groupby context by @ritchie46 in https://github.com/pola-rs/polars/pull/3248
    • fix skew autoexplode and add test by @marcvanheerden in https://github.com/pola-rs/polars/pull/3251
    • quantile agg; update grouptuples by @ritchie46 in https://github.com/pola-rs/polars/pull/3252
    • Only pass dtype to array, if not None: Fixes #3253 by @ghuls in https://github.com/pola-rs/polars/pull/3257
    • polars 0.21.0 by @ritchie46 in https://github.com/pola-rs/polars/pull/3258
    • do not write empty chunk to parquet by @ritchie46 in https://github.com/pola-rs/polars/pull/3259
    • Improve partitioned groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3263
    • improve sample_perf by @ritchie46 in https://github.com/pola-rs/polars/pull/3264
    • add iso strptime patterns by @ritchie46 in https://github.com/pola-rs/polars/pull/3265
    • add partial decompression in read_csv by @ritchie46 in https://github.com/pola-rs/polars/pull/3268
    • fix partitioned and error don't ignore errors by @ritchie46 in https://github.com/pola-rs/polars/pull/3273
    • fix row count for u64 idx by @ritchie46 in https://github.com/pola-rs/polars/pull/3285
    • Code coverage for Rust/Python by @cjermain in https://github.com/pola-rs/polars/pull/3278
    • Improve groupby states by @ritchie46 in https://github.com/pola-rs/polars/pull/3291
    • recursive list builder in rows by @ritchie46 in https://github.com/pola-rs/polars/pull/3293
    • Fix ipc_read_schema so Path() and filename which start with "~/" work. by @ghuls in https://github.com/pola-rs/polars/pull/3297
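
The "Add semi and anti joins" entry in this list introduces two filtering joins: a semi join keeps the left rows whose key has a match on the right, and an anti join keeps the left rows whose key has no match. The sketch below illustrates those semantics on plain slices of keys using a hash set. The helpers `semi_join` and `anti_join` are hypothetical names for illustration and do not reflect the Polars API or its internal algorithm.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keep every left key that also occurs on the right (hypothetical helper).
/// Left rows are never duplicated, unlike in an inner join.
fn semi_join<K: Eq + Hash + Clone>(left: &[K], right: &[K]) -> Vec<K> {
    let right_keys: HashSet<&K> = right.iter().collect();
    left.iter()
        .filter(|key| right_keys.contains(key))
        .cloned()
        .collect()
}

/// Keep every left key that does not occur on the right (hypothetical helper).
fn anti_join<K: Eq + Hash + Clone>(left: &[K], right: &[K]) -> Vec<K> {
    let right_keys: HashSet<&K> = right.iter().collect();
    left.iter()
        .filter(|key| !right_keys.contains(key))
        .cloned()
        .collect()
}

fn main() {
    let left = [1, 2, 3, 2];
    // The right side repeats key 2, which must not duplicate left rows.
    let right = [2, 2, 4];

    assert_eq!(semi_join(&left, &right), vec![2, 2]);
    assert_eq!(anti_join(&left, &right), vec![1, 3]);
    println!("semi: {:?}", semi_join(&left, &right));
    println!("anti: {:?}", anti_join(&left, &right));
}
```

Together the two results partition the left side: every left row lands in exactly one of them, and no columns from the right side are added.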

    New Contributors

    • @LuisCardosoOliveira made their first contribution in https://github.com/pola-rs/polars/pull/2934
    • @keiv-fly made their first contribution in https://github.com/pola-rs/polars/pull/2930
    • @cigrainger made their first contribution in https://github.com/pola-rs/polars/pull/2944
    • @slonik-az made their first contribution in https://github.com/pola-rs/polars/pull/3124
    • @physinet made their first contribution in https://github.com/pola-rs/polars/pull/3215

    Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.20.0...rust-polars-v0.21.

  • rust-polars-v0.20.0(Mar 14, 2022)

    New rust polars release! :rocket:

    This release of 286 commits is here thanks to the contributions of (in no specific order):

    • @moritzwilksch
    • @JakobGM
    • @illumination-k
    • @tamasfe
    • @ghuls
    • @alexander-beedie
    • @Maxyme
    • @universalmind303
    • @qiemem
    • @glennpierce
    • @nmandery
    • @ilsley
    • @marcvanheerden

    Did I forget your contribution? Please ping me; I do this manually :see_no_evil:

    Most notable changes are:

    • Many bug fixes.
    • Many performance improvements.

    features

    • Made representation of groups tuples more cache friendly #2431

    • Remove Seek requirement of readers

    • Add groupby_rolling as new entrance to expression API.

    • Improve CSV parser stability and performance on several occasions

    • Horizontal aggregations are parallelized #2454

    • Reduce pivot code bloat and improve performance #2458

    • Struct data type added.

    • Extend methods that allow in-place modification of the same memory when the Arc reference count is 1

    • Avro readers and writers.

    • Improved rules of window expressions.

    • Support for the us (microsecond) time unit.

    • Use Parquet statistics in query optimizations.

    • Optimize projections in lazy computations. (Mostly useful when you deal with a large number of columns e.g. millions).

    • Improve performance and flexibility of melt operation #2799

    • new expressions

      • str.split
      • str.split_inclusive
      • arr.join
      • unique_stable
      • str.split_exact
      • count expression that does not require column names
      • arr.arg_min
      • arr.arg_max
      • arr.diff
      • arr.shift
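
The feature entry about extend methods that reuse the same memory when the `Arc` reference count is 1 can be illustrated with the standard library alone. In the sketch below, `Arc::get_mut` returns a mutable reference only when no other `Arc` or `Weak` pointer to the buffer exists, so the data is extended in place in that case and copied otherwise. The function `extend_shared` is a hypothetical helper, not a Polars method, and the real implementation works on Arrow buffers rather than a `Vec<i32>`.

```rust
use std::sync::Arc;

/// Append `extra` to the shared buffer (hypothetical helper).
/// Returns `true` if the existing allocation was modified in place,
/// `false` if the data had to be copied because it was shared.
fn extend_shared(data: &mut Arc<Vec<i32>>, extra: &[i32]) -> bool {
    match Arc::get_mut(data) {
        // We hold the only reference: mutating in place is safe.
        Some(buf) => {
            buf.extend_from_slice(extra);
            true
        }
        // Someone else can still see the buffer: copy, then extend the copy.
        None => {
            let mut copy: Vec<i32> = (**data).clone();
            copy.extend_from_slice(extra);
            *data = Arc::new(copy);
            false
        }
    }
}

fn main() {
    // Unique owner: strong count is 1, so the buffer is extended in place.
    let mut unique = Arc::new(vec![1, 2]);
    assert_eq!(Arc::strong_count(&unique), 1);
    assert!(extend_shared(&mut unique, &[3]));
    assert_eq!(*unique, vec![1, 2, 3]);

    // Shared owner: the other handle must keep seeing the old data.
    let mut shared = Arc::new(vec![1, 2]);
    let other = Arc::clone(&shared);
    assert!(!extend_shared(&mut shared, &[3]));
    assert_eq!(*shared, vec![1, 2, 3]);
    assert_eq!(*other, vec![1, 2]);
    println!("in-place extend only happens for a unique Arc");
}
```

`Arc::make_mut` packages the same copy-on-write decision into one call. The explicit match is used here only to make the two paths visible.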

    Update to arrow2 0.10.0

    See the 0.10.0 release for all upstream improvements.

Owner
Ritchie Vink
Data Scientist | Data Engineer
A Rust DataFrame implementation, built on Apache Arrow

Rust DataFrame A dataframe implementation in Rust, powered by Apache Arrow. What is a dataframe? A dataframe is a 2-dimensional tabular data structure

Wakahisa 287 Nov 11, 2022
DataFrame / Series data processing in Rust

black-jack While PRs are welcome, the approach taken only allows for concrete types (String, f64, i64, ...) I'm not sure this is the way to go. I want

Miles Granger 30 Dec 10, 2022
DataFrame & its adaptors

Fabrix Fabrix is a lib crate that uses Polars Series and DataFrame as fundamental data structures, and is capable of communicating among different data

Jacob Xie 18 Dec 12, 2022
Provides multiple-dtype columnar storage, known as DataFrame in pandas/R

brassfibre Provides multiple-dtype columnar storage, known as DataFrame in pandas/R. Series Single-dtype 1-dimensional vector with label (index). Crea

Sinhrks 21 Nov 28, 2022
A dataframe manipulation tool inspired by dplyr and powered by polars.

dply is a command line tool for viewing, querying, and writing csv and parquet files, inspired by dplyr and powered by polars. Usage overview A dply p

null 14 May 29, 2023
Yet Another Technical Analysis library [for Rust]

YATA Yet Another Technical Analysis library YaTa implements most common technical analysis methods and indicators. It also provides you an interface t

Dmitry 197 Dec 29, 2022
Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

null 7.8k Jan 8, 2023
sparse linear algebra library for rust

sprs, sparse matrices for Rust sprs implements some sparse matrix data structures and linear algebra algorithms in pure Rust. The API is a work in pro

Vincent Barrielle 311 Dec 18, 2022
ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

SFU Database Group 939 Jan 5, 2023
A bit vector with the Rust standard library's portable SIMD API

bitsvec A bit vector with the Rust standard library's portable SIMD API Usage Add bitsvec to Cargo.toml: bitsvec = "x.y.z" Write some code like this:

Chojan Shang 31 Dec 21, 2022
A rust library built to support building time-series based projection models

TimeSeries TimeSeries is a framework for building analytical models in Rust that have a time dimension. Inspiration The inspiration for writing this i

James MacAdie 12 Dec 7, 2022
Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.

Polars Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow Blazingly fast DataFrames in Rust, Python & Node.js Polars is

null 11.8k Jan 8, 2023
Library for scripting analyses against crates.io's database dumps

crates.io database dumps Library for scripting analyses against crates.io's database dumps. These database dumps contain all information exposed by th

David Tolnay 52 Dec 14, 2022
A cross-platform library to retrieve performance statistics data.

A toolkit designed to be a foundation for applications to monitor their performance.

Lark Technologies Pte. Ltd. 155 Nov 12, 2022
This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

null 2 Nov 2, 2022
Dataflow is a data processing library, primarily for machine learning

Dataflow Dataflow is a data processing library, primarily for machine learning. It provides efficient pipeline primitives to build a directed acyclic

Sidekick AI 9 Dec 19, 2022
A Rust crate that reads and writes tfrecord files

tfrecord-rust The crate provides the functionality to serialize and deserialize TFRecord data format from TensorFlow. Features Provide both high level

null 22 Nov 3, 2022
Official Rust implementation of Apache Arrow

Native Rust implementation of Apache Arrow Welcome to the implementation of Arrow, the popular in-memory columnar format, in Rust. This part of the Ar

The Apache Software Foundation 1.3k Jan 9, 2023
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Parquet2 This is a re-write of the official parquet crate with performance, parallelism and safety in mind. The five main differentiators in compariso

Jorge Leitao 237 Jan 1, 2023