Rust DataFrame library

Ritchie Vink

Last update: Jan 8, 2023

Overview

Polars

Blazingly fast DataFrames in Rust & Python

Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model is backed by Apache Arrow.

It currently consists of an eager API similar to pandas and a lazy API that is somewhat similar to Spark. To learn more about the inner workings of Polars, read the WIP book.

Among other things, Polars offers the following functionality:

Functionality	Eager	Lazy (DataFrame)	Lazy (Series)
Filters	✔	✔	✔
Shifts	✔	✔	✔
Joins	✔	✔
GroupBys + aggregations	✔	✔
Comparisons	✔	✔	✔
Arithmetic	✔		✔
Sorting	✔	✔	✔
Reversing	✔	✔	✔
Closure application (User Defined Functions)	✔		✔
SIMD	✔		✔
Pivots	✔	✗
Melts	✔	✗
Filling nulls + fill strategies	✔	✗	✔
Aggregations	✔	✔	✔
Moving Window aggregates	✔	✗	✗
Find unique values	✔		✗
Rust iterators	✔		✔
IO (csv, json, parquet, Arrow IPC)	✔	✗
Query optimization: (predicate pushdown)	✗	✔
Query optimization: (projection pushdown)	✗	✔
Query optimization: (type coercion)	✗	✔
Query optimization: (simplify expressions)	✗	✔
Query optimization: (aggregate pushdown)	✗	✔

Note that almost all operations the eager API supports on Series/ChunkedArrays can be used in the lazy API via UDFs.

Documentation

Want to know about all the features Polars supports? Read the docs!

Rust

Python

installation guide: pip install py-polars
the book
Reference guide

Performance

Polars is written to be performant, and it is! But don't take my word for it, take a look at the results in h2oai's db-benchmark.

Cargo Features

Additional cargo features:

temporal (default)
- Conversions between Chrono and Polars for temporal data
simd (nightly)
- SIMD operations
parquet
- Read Apache Parquet format
json
- JSON serialization
ipc
- Arrow's IPC format serialization
random
- Generate arrays with randomly sampled values
ndarray
- Convert from DataFrame to ndarray
lazy
- Lazy API
strings
- String utilities for Utf8Chunked
object
- Support for generic ChunkedArrays called ObjectChunked<T> (generic over T). These are downcastable from Series through the Any trait.
parallel
- ChunkedArrays can be used by rayon::par_iter()
[plain_fmt | pretty_fmt] (mutually exclusive)
- One of them should be chosen to format DataFrames. pretty_fmt can deal with overflowing cells and looks nicer but has more dependencies. plain_fmt is plain formatting.

Contribution

Want to contribute? Read our contribution guideline.

Env vars

POLARS_PAR_SORT_BOUND -> Sets the lower bound of rows at which Polars will use a parallel sorting algorithm. Default is 1M rows.
POLARS_FMT_MAX_COLS -> maximum number of columns shown when formatting DataFrames.
POLARS_FMT_MAX_ROWS -> maximum number of rows shown when formatting DataFrames.
POLARS_TABLE_WIDTH -> width of the tables used during DataFrame formatting.
POLARS_MAX_THREADS -> maximum number of threads used in the join algorithm. Default is unbounded.
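Since these are ordinary process environment variables, one way to apply them from a Python session is through os.environ. The following is a minimal sketch, not an official recipe: the values are purely illustrative, and setting them before importing polars is an assumption made to stay on the safe side in case the library reads them early.

```python
import os

# Illustrative values only; the variable names come from the list above.
polars_settings = {
    "POLARS_FMT_MAX_COLS": "12",
    "POLARS_FMT_MAX_ROWS": "20",
    "POLARS_TABLE_WIDTH": "120",
    "POLARS_MAX_THREADS": "4",
}

# Environment variables are always strings. Setting them before
# `import polars` is an assumption, chosen as the conservative option.
for name, value in polars_settings.items():
    os.environ[name] = value

print({name: os.environ[name] for name in polars_settings})
```

The same variables can of course be exported in the shell that launches the program instead.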

Comments

Groupby on integer + date column of large dataframe requests enormous memory alloc [Windows only]

Are you using Python or Rust?

Python

Which feature gates did you use?

What version of polars are you using?

0.10.5

What operating system are you using polars on?

Windows10

Describe your bug.

Polars fails to execute a groupby on a modestly sized dataframe.

What are the steps to reproduce the behavior?

I've used something like the below code to generate the data (it's pretty big) and then run the polars_load function

from pathlib import Path

import numpy as np
import pandas as pd
import polars as pl

data_path = Path(".")  # assumed output directory; not given in the original report


def make_some_data():
    n_ids = int(1e3)
    n_features = 10
    freq = "1H"
    year = 2000
    start = f"{year}0101"
    end = f"{year + 1}0615"  # one year and a bit
    # pandas >= 1.4 spells this argument inclusive="left"
    date_index = pd.date_range(start=start, end=end, freq=freq, closed="left")
    dates = np.tile(date_index, n_ids)
    n_dates = len(date_index)

    ids = np.repeat(np.arange(n_ids, dtype=np.int32), n_dates)
    features = np.random.randn(n_ids * n_dates, n_features).astype(np.float32)
    df = pd.DataFrame({"ids": ids, "dates": dates})
    df[[f"feature_{i}" for i in range(n_features)]] = features

    print(f"\t[year={year}] n_ids={n_ids}, n_dates={n_dates}, n_rows={n_ids * n_dates}, n_cols={n_features}", flush=True)
    df.to_parquet(data_path / f"features_{year}.parquet")


def pandas_load(f: Path):
    # f is the path of the parquet file written by make_some_data
    df = pd.read_parquet(f)
    agg_df = df.groupby(["ids", "dates"]).agg("mean")
    return agg_df


def polars_load(f: Path):
    # same file, same aggregation, but through Polars
    df = pl.read_parquet(f)
    agg_df = df.groupby(["ids", "dates"]).agg(pl.col("*").mean())
    return agg_df

If we cannot reproduce the bug, it is unlikely that we will be able to fix it.

What is the actual behavior?

I get the following error

UserWarning: Conversion of (potentially) timezone aware to naive datetimes. TZ information may be lost
  "Conversion of (potentially) timezone aware to naive datetimes. TZ information may be lost",
memory allocation of memory allocation of memory allocation of 191160001911600019116000 bytes failed

CPU usage goes to 100%, memory usage stays modest, and then the screen goes dark. I regain control after a minute or two, with an unkillable "python.exe - System Error" window.
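For a sense of scale, the size of the generated frame can be estimated from the parameters in make_some_data. This is a rough sketch using only the standard library; the per-row width assumes int32 ids, 64-bit timestamps and float32 features, and ignores any container overhead.

```python
from datetime import datetime, timedelta

n_ids = 1_000
n_features = 10
start = datetime(2000, 1, 1)
end = datetime(2001, 6, 15)

# Hourly timestamps in [start, end), matching closed="left" in the report.
n_dates = int((end - start) / timedelta(hours=1))
n_rows = n_ids * n_dates

# Assumed raw widths: int32 id + 64-bit timestamp + float32 features.
bytes_per_row = 4 + 8 + 4 * n_features
approx_bytes = n_rows * bytes_per_row

print(f"n_dates={n_dates}, n_rows={n_rows}, approx_size={approx_bytes / 2**30:.2f} GiB")
```

If the interleaved error message is read as three concurrent reports of 19116000 bytes each (an interpretation, not something the report states), each failed allocation is far smaller than the frame itself.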

What is the expected behavior?

Running the pandas_load function exhibits the correct behaviour.

help wanted

opened by CHDev93 43

refactor[python]: Dispatch Series namespace methods to Expr using a decorator
Relates to #4422

Changes:

Added a decorator for dispatching Series methods to the Expr equivalent.

Created a module series.utils to house the decorator. Moved the get_ffi_func here as well.

Applied the decorator to all Series namespace methods. Only a handful of methods did not have a directly equivalent expression.

I like that it's now very explicit that these methods do not implement anything fancy; they only dispatch to another implementation.

If you like this approach, I will try to apply this to the Series non-namespace methods next, and then see if I can do something similar for DataFrame/LazyFrame.
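The dispatch pattern described above can be illustrated with toy classes. This is only a sketch of the general idea: ToyExpr, ToySeries and dispatch_to_expr are hypothetical names invented for the example and are not the Polars internals or the decorator added in this PR.

```python
from functools import wraps


class ToyExpr:
    """Stand-in for an expression: holds the 'real' implementation."""

    def __init__(self, values):
        self.values = list(values)

    def to_uppercase(self):
        return ToyExpr(v.upper() for v in self.values)


def dispatch_to_expr(method):
    """Replace a stub method's body with a call to the same-named ToyExpr method."""

    @wraps(method)  # keep the stub's name and docstring
    def wrapper(self, *args, **kwargs):
        expr_method = getattr(ToyExpr(self.values), method.__name__)
        return ToySeries(expr_method(*args, **kwargs).values)

    return wrapper


class ToySeries:
    def __init__(self, values):
        self.values = list(values)

    @dispatch_to_expr
    def to_uppercase(self):
        """Docstring and signature live here; the logic lives on ToyExpr."""


result = ToySeries(["a", "b"]).to_uppercase()
print(result.values)
```

The stub body is never executed; it exists only to carry the signature and documentation, which is what makes the dispatch explicit at the definition site.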
python
opened by stinodego 33

Reading nested struct panics with `OutOfSpec` error

What language are you using?

Rust

Which feature gates did you use?

"polars-io", "parquet", "lazy", "dtype-struct"

Have you tried latest version of polars?

[yes]

What version of polars are you using?

Latest, master branch.

What operating system are you using polars on?

macOS Monterey 12.3.1

What language version are you using

$ rustc --version
rustc 1.64.0-nightly (495b21669 2022-07-03)

$ cargo --version
cargo 1.64.0-nightly (dbff32b27 2022-06-24)

Describe your bug.

Reading nested struct panics with OutOfSpec error.

What are the steps to reproduce the behavior?

Given the attached parquet file with only 2 rows: nested_struct_OutOfSpec.snappy.parquet.zip

Running the following code:

let file_location = "nested_struct_OutOfSpec.snappy.parquet".to_string();
let df = LazyFrame::scan_parquet(
    file_location, 
    ScanArgsParquet::default())
    .unwrap()
    .select([all()])
    .collect()
    .unwrap();
dbg!(df);

Results in this panic error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The children 
DataTypes of a StructArray must equal the children data types.\n                         However, the 
values 1 has a length of 11, which is different from values 0, 2.")', 
/.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:118:52

What is the actual behavior?

The result is a panic error with this output:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The children 
DataTypes of a StructArray must equal the children data types.\n                         However, the 
values 1 has a length of 11, which is different from values 0, 2.")', 
/.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:118:52
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:1805:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:1098:23
   4: arrow2::array::struct_::StructArray::new
             at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:118:9
   5: arrow2::array::struct_::StructArray::from_data
             at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/array/struct_/mod.rs:127:9
   6: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next
             at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/io/parquet/read/deserialize/struct_.rs:50:22
   7: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/boxed.rs:1868:9
   8: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next::{{closure}}
             at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/io/parquet/read/deserialize/struct_.rs:26:25
   9: core::iter::adapters::map::map_fold::{{closure}}
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:84:28
  10: core::iter::traits::iterator::Iterator::fold
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:2414:21
  11: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:124:9
  12: core::iter::traits::iterator::Iterator::for_each
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:831:9
  13: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_extend.rs:40:17
  14: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter_nested.rs:62:9
  15: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter.rs:33:9
  16: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/mod.rs:2648:9
  17: core::iter::traits::iterator::Iterator::collect
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:1836:9
  18: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next
             at /.../.cargo/git/checkouts/arrow2-945af624853845da/eeddfac/src/io/parquet/read/deserialize/struct_.rs:23:22
  19: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/boxed.rs:1868:9
  20: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:103:9
  21: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/boxed.rs:1868:9
  22: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:2237:29
  23: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:191:9
  24: core::iter::traits::iterator::Iterator::try_for_each
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
  25: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
  26: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
  27: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter.rs:33:9
  28: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/mod.rs:2648:9
  29: core::iter::traits::iterator::Iterator::collect
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
  30: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
  31: core::iter::adapters::try_process
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:160:17
  32: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:9
  33: core::iter::traits::iterator::Iterator::collect
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:1836:9
  34: polars_io::parquet::read_impl::array_iter_to_series
             at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:47:17
  35: polars_io::parquet::read_impl::column_idx_to_series
             at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:36:9
  36: polars_io::parquet::read_impl::rg_to_dfs::{{closure}}
             at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:126:21
  37: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:91:28
  38: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:2238:21
  39: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/map.rs:117:9
  40: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:191:9
  41: core::iter::traits::iterator::Iterator::try_for_each
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
  42: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:174:9
  43: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
  44: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/spec_from_iter.rs:33:9
  45: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/alloc/src/vec/mod.rs:2648:9
  46: core::iter::traits::iterator::Iterator::collect
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
  47: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:49
  48: core::iter::adapters::try_process
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/adapters/mod.rs:160:17
  49: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/result.rs:2092:9
  50: core::iter::traits::iterator::Iterator::collect
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/iter/traits/iterator.rs:1836:9
  51: polars_io::parquet::read_impl::rg_to_dfs
             at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:123:13
  52: polars_io::parquet::read_impl::read_parquet
             at /.../github/polars/polars/polars-io/src/parquet/read_impl.rs:249:63
  53: polars_io::parquet::read::ParquetReader<R>::_finish_with_scan_ops
             at /.../github/polars/polars/polars-io/src/parquet/read.rs:60:9
  54: polars_lazy::physical_plan::executors::scan::parquet::ParquetExec::read
             at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/scan/parquet.rs:39:9
  55: <polars_lazy::physical_plan::executors::scan::parquet::ParquetExec as polars_lazy::physical_plan::Executor>::execute::{{closure}}
             at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/scan/parquet.rs:61:68
  56: polars_lazy::physical_plan::file_cache::FileCache::read
             at /.../github/polars/polars/polars-lazy/src/physical_plan/file_cache.rs:40:13
  57: <polars_lazy::physical_plan::executors::scan::parquet::ParquetExec as polars_lazy::physical_plan::Executor>::execute
             at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/scan/parquet.rs:59:9
  58: <polars_lazy::physical_plan::executors::udf::UdfExec as polars_lazy::physical_plan::Executor>::execute
             at /.../github/polars/polars/polars-lazy/src/physical_plan/executors/udf.rs:12:18
  59: polars_lazy::frame::LazyFrame::collect
             at /.../github/polars/polars/polars-lazy/src/frame/mod.rs:718:19
  60: gyrfalcon::main
             at ./src/main.rs:21:14
  61: core::ops::function::FnOnce::call_once
             at /rustc/7b46aa594c4bdc507fbd904b6777ca30c37a9209/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

What is the expected behavior?

The parquet file should have been correctly loaded.

The parquet-tools util shows it properly. Also, Apache Spark properly reads it and processes it.

bug

opened by andrei-ionescu 32

Failed join #2

Are you using Python or Rust?

Python

What version of polars are you using?

0.12.18

What operating system are you using polars on?

CentOS Linux release 8.1.1911 (Core)

Describe your bug.

Some rows that should join do not. On an inner join, rows that should be present are omitted from the joined data frame. On a left join, some of the right columns are filled with null values when they shouldn't be.

What are the steps to reproduce the behavior?

Output:

>>> ./reproduce_error.py
--right_df--
shape: (1, 4)
┌──────────┬───────────────────┬───────────────────┬────────────────┐
│ join_col ┆ new_col_from_join ┆ table_1_indicator ┆ other_join_len │
│ ---      ┆ ---               ┆ ---               ┆ ---            │
│ i64      ┆ f64               ┆ bool              ┆ i64            │
╞══════════╪═══════════════════╪═══════════════════╪════════════════╡
│ 59546483 ┆ 6.9900e-46        ┆ true              ┆ null           │
└──────────┴───────────────────┴───────────────────┴────────────────┘
--left_df--
shape: (1, 3)
┌────────────┬──────────┬────────────────┐
│ temp       ┆ join_col ┆ other_join_len │
│ ---        ┆ ---      ┆ ---            │
│ str        ┆ i64      ┆ i64            │
╞════════════╪══════════╪════════════════╡
│ 1_59546483 ┆ 59546483 ┆ null           │
└────────────┴──────────┴────────────────┘
--join--
shape: (1, 5)
┌────────────┬──────────┬────────────────┬───────────────────┬───────────────────┐
│ temp       ┆ join_col ┆ other_join_len ┆ new_col_from_join ┆ table_1_indicator │
│ ---        ┆ ---      ┆ ---            ┆ ---               ┆ ---               │
│ str        ┆ i64      ┆ i64            ┆ f64               ┆ bool              │
╞════════════╪══════════╪════════════════╪═══════════════════╪═══════════════════╡
│ 1_59546483 ┆ 59546483 ┆ null           ┆ null              ┆ null              │
└────────────┴──────────┴────────────────┴───────────────────┴───────────────────┘

Notice that the two columns being joined on ('join_col' and 'other_join_len') are the same in the right and left DFs, but the right columns don't show up in the joined DF.

Reproducing code

#!/usr/bin/env python3

import polars as pl

right_df_1 = pl.scan_csv(
    'repro_tsv_1.txt',
    sep='\t',
).select([
    'join_col',
    'new_col_from_join',
    pl.lit(True).alias('table_1_indicator'),
    pl.lit(None).cast(int).alias('other_join_len'),
])

right_df_2 = pl.scan_csv(
    'repro_tsv_2.txt',
    sep='\t',
).select([
    'join_col',
    'new_col_from_join',
    pl.col('other_join_col').str.lengths().cast(int).alias('other_join_len'),
]).groupby(['join_col', 'other_join_len']).agg([
    pl.col('new_col_from_join').min().alias('new_col_from_join'),
]).with_columns([
    pl.lit(False).alias('table_1_indicator')
]).select([ 'join_col', 'new_col_from_join', 'table_1_indicator', 'other_join_len'])

right_df = pl.concat([right_df_1, right_df_2])
print('--right_df--')
print(right_df.filter((pl.col('join_col') == 59546483) & pl.col('other_join_len').is_null()).collect())

rownames_fname = 'munged_rownames.txt'
with open(rownames_fname) as var_file:
    rownames = [line.strip() for line in var_file if line.strip()]

left_df = pl.DataFrame({
    'temp': rownames,
}).lazy().with_columns([
    pl.col('temp').str.extract('^[^_]*_([^_]*)', 1).cast(int).alias('join_col'),
    pl.col('temp').str.extract('^[^_]*_[^_]*_([^_]*)$', 1).str.lengths().cast(int).alias('other_join_len'),
])
print('--left_df--')
print(left_df.filter((pl.col('join_col') == 59546483) & pl.col('other_join_len').is_null()).collect())

total_df = left_df.join(
    right_df,
    how='left',
    on=['join_col', 'other_join_len']
).collect()
print('--join--')
print(total_df.filter((pl.col('join_col') == 59546483) & pl.col('other_join_len').is_null()))

repro_tsv_1.txt repro_tsv_2.txt munged_rownames.txt

bug

opened by LiterallyUniqueLogin 32

test(python): Parametric test coverage for EWM functions
Massive expansion of EWM test coverage, using pandas (with ignore_na=True) as a reference implementation for comparison purposes.

Parametric tests cover...

use of all three decay params: com, span, half_life.

floating point values between -1e8 and 1e8.

null values present / not present.

different min_period values.

chunked / unchunked series.

series of different lengths

int and float series.
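For readers unfamiliar with the three decay parameters, the sketch below converts each of them to the smoothing factor alpha using the relations given in the pandas ewm documentation, and computes an adjusted EWM mean for a null-free series. It is a plain-Python illustration, not the Polars or pandas implementation, and the function names are made up for this example.

```python
import math


def alpha_from(com=None, span=None, half_life=None):
    """Convert exactly one decay parameter to the smoothing factor alpha."""
    given = [p for p in (com, span, half_life) if p is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one of com, span, half_life")
    if com is not None:
        return 1.0 / (1.0 + com)
    if span is not None:
        return 2.0 / (span + 1.0)
    return 1.0 - math.exp(-math.log(2.0) / half_life)


def ewm_mean(values, alpha):
    """Adjusted EWM mean: weights (1 - alpha)**i over the current and earlier points."""
    out, num, den = [], 0.0, 0.0
    for x in values:
        num = x + (1.0 - alpha) * num  # weighted sum of observations
        den = 1.0 + (1.0 - alpha) * den  # sum of the weights
        out.append(num / den)
    return out


alpha = alpha_from(span=3)
result = ewm_mean([1.0, 2.0, 3.0], alpha)
print(alpha, result)
```

Null handling, min_periods and chunking, which the parametric tests also cover, are deliberately left out of this sketch.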

python test
opened by alexander-beedie 31
Additional lints for the Python code base
As the project becomes more popular, we can expect more people to start contributing to the code base. Having a good linting setup will make sure our code quality remains consistently high, while aiding in the code review process. I outlined a number of tools/settings that I think will help. Suggestions are more than welcome.

flake8

[x] Set max-line-length = 88

We have a lot of unnecessary inconsistency regarding line lengths in the code. ~#4041 helps address this for the code itself.~ There are many strings/comments/docstrings that could easily fit within 88 characters, but are now going over this limit for no reason. These should be fixed. Exceptions like these can be ignored on a per-case basis using # noqa: E501.

flake8 plugins

flake8 has a rich plugin ecosystem with additional lints that can help keep your code clean. They can be enabled simply by adding them to our build requirements. Using these, flake8 becomes more like the programming buddy that cargo is for Rust. Below is a list that I recommend (loosely in order of importance):

All of the following find legitimate issues in the existing code base:

[x] flake8-docstrings - Enforce docstring uniformity.

[x] flake8-bugbear - Finds possible bugs.

[x] flake8-comprehensions - Helps optimize list/set/dict comprehensions.

[x] flake8-simplify - Helps simplify some code patterns.

~pep8-naming - Helps enforce PEP8 naming conventions.~ Not worth it, finds a single error that is questionable.

[x] flake8-tidy-imports - Helps keep imports clean

~flake8-type-checking - Enforces the use of if TYPE_CHECKING: blocks in order to minimize import overhead.~ Cannot use this right now due to requirement of Python 3.8 and up.

We will skip the flake8 lints below for now. They have minimal impact.

The following should be nice to enforce, but we are currently compliant:

~flake8-typing-imports - Makes sure your typing imports (i.e. Literal) are valid in your supported Python versions.~

~flake8-broken-line - Avoid backslash line breaks.~

The following I am not sure about, but might be useful:

~flake8-use-fstring - Force the use of f-strings for formatted strings. Should be nice, but seems to have some false positives for our code base.~

~flake8-annotations - Helps set certain requirements for type hints. Not sure how well this complements mypy.~

~flake8-eradicate - Helps identify and remove commented out code. The idea is nice, but we have some code commented out on purpose, so there might be false positives.~

mypy

I would like to set strict = True for mypy in order to improve reliability and quality of our type hints. This currently produces 1157 errors in 38 files. The strict flag is a combination of multiple strictness-related flags. I recommend we enable these one-by-one and fix the related errors.

[x] warn-unused-configs

[x] disallow-any-generics

[x] disallow-subclassing-any

[x] disallow-untyped-calls

[x] disallow-untyped-defs

[x] disallow-incomplete-defs

[x] check-untyped-defs

[x] disallow-untyped-decorators

[x] no-implicit-optional

[x] warn-redundant-casts

[x] warn-unused-ignores

[x] warn-return-any

[x] no-implicit-reexport

[x] strict-equality

[x] strict-concatenate

Other helpful CLI tools

These can be added as additional commands in the CI pipeline.

[x] pyupgrade - Makes sure that we're using the latest language features.

~pycln - Automatically detect and remove unused imports. Has functionality for identifying side effects (like the pyarrow imports).~ Not worth it for the small benefit it brings. flake8 will catch unused imports; fix them manually.

~yesqa - Functions like mypy's warn-unused-ignores. It makes sure all the # noqa comments are actually necessary.~ Not worth incorporating in the CI right now.

feature python
opened by stinodego 31

shift_and_fill by groups + other operations by grouping variables

I'm new to polars and would like to understand how to run certain methods by group variables

A few methods include lags and rolling stats with varying window sizes: lag1, lag2, ..., lagN, ma1, ma2, ..., maN.

Using python

# Create data (data from apply section of docs)
data = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)

# Define variables
ShiftColName = "B"
N = 1
ImputeValue = -1
GroupVariable = "fruits"

# shift_and_fill: no groups
data.hstack(data[[ShiftColName]].shift_and_fill(N, fill_value = ImputeValue).rename({ShiftColName : 'Lag_' + str(N) + '_' + ShiftColName}))

# shift_and_fill: by fruits?
?
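To make the intended result concrete, here is a plain-Python sketch of a per-group shift with a fill value, using the B and fruits columns from the example above. The helper name is hypothetical and it only handles n >= 0. In Polars itself the usual route would be a window expression, that is, applying .over("fruits") to the shifted column expression, but the exact method names should be checked against the installed version.

```python
from collections import defaultdict


def shift_and_fill_by_group(values, groups, n, fill_value):
    """Shift `values` forward by n positions within each group, padding with fill_value."""
    positions = defaultdict(list)  # group key -> row indices, in original order
    for idx, key in enumerate(groups):
        positions[key].append(idx)

    out = [fill_value] * len(values)
    for idxs in positions.values():
        # The k-th row of a group receives the value of its (k - n)-th row.
        for k in range(n, len(idxs)):
            out[idxs[k]] = values[idxs[k - n]]
    return out


B = [5, 4, 3, 2, 1]
fruits = ["banana", "banana", "apple", "apple", "banana"]
lag_1_B = shift_and_fill_by_group(B, fruits, n=1, fill_value=-1)
print(lag_1_B)
```

The first row of each group gets the fill value, and the result keeps the original row order.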

opened by AdrianAntico 26

write_parquet function in polars-u64-idx does not support large data frames
What language are you using?

Python

What version of polars are you using?

0.13.21

What operating system are you using polars on?

Ubuntu 20.04.1 LTS

What language version are you using

Python 3.8.5

Describe your bug.

I'm using the 64 bit version of Polars. However, the write_parquet function does not seem to support large data frames.

What are the steps to reproduce the behavior?

df = pl.select(pl.repeat(0,n=2**32).alias('col_1'))
df.write_parquet('tmp.parquet')

What is the actual behavior?

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/polars/internals/frame.py", line 1400, in write_parquet
    self._df.to_parquet(file, compression, statistics)
exceptions.ArrowErrorException: ExternalFormat("underlying snap error: snappy: input buffer (size = 34359738368) is larger than allowed (size = 4294967295)")
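The two sizes in the error message can be checked with a little arithmetic. The sketch below assumes that the repeated 0 is stored as a 64-bit integer (8 bytes per value), which is consistent with the reported buffer size but is not stated in the report.

```python
n_rows = 2**32
bytes_per_value = 8  # assumption: pl.repeat(0, ...) yields 64-bit integers

buffer_size = n_rows * bytes_per_value
snappy_limit = 2**32 - 1  # the "allowed" size quoted in the error message

print(buffer_size, snappy_limit, buffer_size > snappy_limit)

# Smallest number of equal pieces that keeps every piece within the limit.
min_pieces = -(-buffer_size // snappy_limit)
print(min_pieces)
```

So the limit being hit is the compressor's 32-bit input size, which applies regardless of the 64-bit row index of the polars-u64-idx build.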
bug
opened by jnthnhss 25

Add optional lexical ordering of dtype `Categorical`

Are you using Python or Rust?

Python

What version of polars are you using?

0.12.20

What operating system are you using polars on?

macOS Monterey

Describe your bug.

DataFrame.sort() on a Categorical column behaves incorrectly.

What are the steps to reproduce the behavior?

df = pl.DataFrame(
    [
        pl.Series("col1", [1, 1], dtype=pl.UInt8),
        pl.Series("col2", ["foo", "bar"], dtype=pl.Categorical),
        pl.Series("col3", [3.3, 1.1], dtype=pl.Float64),
    ]
)
df.sort(['col1', 'col2'])

What is the actual behavior?

shape: (2, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ u8   ┆ cat  ┆ f64  │
╞══════╪══════╪══════╡
│ 1    ┆ foo  ┆ 3.3  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1    ┆ bar  ┆ 1.1  │
└──────┴──────┴──────┘

What is the expected behavior?

shape: (2, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ u8   ┆ cat  ┆ f64  │
╞══════╪══════╪══════╡
│ 1    ┆ bar  ┆ 1.1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1    ┆ foo  ┆ 3.3  │
└──────┴──────┴──────┘

I think polars should sort correctly on categorical type, like the following:

df.sort('col2')
shape: (2, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ u8   ┆ cat  ┆ f64  │
╞══════╪══════╪══════╡
│ 1    ┆ bar  ┆ 1.1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1    ┆ foo  ┆ 3.3  │
└──────┴──────┴──────┘
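The difference between the two outputs is consistent with sorting by the category's integer code rather than by its string value. The sketch below illustrates that distinction in plain Python, assuming codes are assigned in order of first appearance; it shows the two orderings and makes no claim about how Polars implements either.

```python
values = ["foo", "bar"]

# Assign integer codes in order of first appearance (assumed encoding).
codes = {}
for v in values:
    codes.setdefault(v, len(codes))

# "Physical" order sorts by the integer code, lexical order by the string.
physical_order = sorted(values, key=codes.__getitem__)
lexical_order = sorted(values)

print(physical_order)
print(lexical_order)
```

Under this assumption, physical ordering reproduces the actual behavior reported above and lexical ordering reproduces the expected one.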

feature

opened by mutecamel 25

fix(python): fix delta issues
Aims to fix the following issues:

https://github.com/pola-rs/polars/issues/5790

https://github.com/pola-rs/polars/issues/5785

Checklist

[x] Fix imports issue

[x] Resolve paths before passing to delta for relative path support

[x] read_delta should rely on delta side implementation

[x] scan_delta should work as expected without relying on delta implementation.

[x] create internal test matrix of [local, s3, azure, gcs] X [read, scan] X [absolute, relative paths, absolute with external fs, relative with external fs]

[x] additional cases for relative paths

[x] perform tests in a separate env

[x] added example to read and scan from azure

[x] added example to read and scan from gcs

Notes

Until https://github.com/delta-io/delta-rs/issues/1015 is resolved, I have added a simple delta-specific path resolution in scan_delta. Later we can remove this in favor of the implementation provided in delta-rs itself.

read_delta now relies on the implementation provided in delta-rs itself.

For GCS, Azure and S3 there is no relative path support; a full URI must be provided.
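The rule in these notes (resolve local paths, pass remote URIs through untouched) could look roughly like the sketch below. The function name and the drive-letter heuristic are assumptions made for illustration; this is not the code added to scan_delta.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_table_path(table_uri: str) -> str:
    """Make local paths absolute; return URIs with a scheme unchanged."""
    scheme = urlparse(table_uri).scheme
    # Single-letter "schemes" are treated as Windows drive letters (C:\...).
    if scheme and len(scheme) > 1:
        return table_uri
    return str(Path(table_uri).expanduser().resolve())


print(resolve_table_path("s3://bucket/path/to/table"))
print(resolve_table_path("data/my_table"))
```

A relative local path is resolved against the current working directory, while anything carrying a scheme such as s3:// is left for the storage backend to interpret.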

Testing

For LocalFS, GCS, Azure and S3, I've created an internal test matrix as described in the check list which contains over 35 additional tests other than the unit tests. I'll check how to mock different pyarrow.fs somehow and then add these to the unit tests later.

This matrix was also executed in a separate venv, after installing polars locally with the fixes from a full local build (maturin build), because the import errors had not been caught by the unit tests before.
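
The base test matrix from the checklist can be enumerated with a Cartesian product. The sketch below is not the internal test suite; the list names are made up for illustration. The plain product gives 32 combinations, so the additional relative-path cases mentioned in the checklist would come on top of that.

```python
from itertools import product

STORAGES = ["local", "s3", "azure", "gcs"]
OPERATIONS = ["read", "scan"]
PATH_KINDS = [
    "absolute",
    "relative",
    "absolute with external fs",
    "relative with external fs",
]

# Every combination of storage backend, operation, and path style.
test_matrix = list(product(STORAGES, OPERATIONS, PATH_KINDS))

for storage, operation, path_kind in test_matrix[:3]:
    print(f"{operation}_delta on {storage} with {path_kind} path")

print(len(test_matrix))  # 4 * 2 * 4 = 32 base combinations
```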

Signed-off-by: chitralverma [email protected]
python fix
opened by chitralverma 23
Optimize python imports
Problem Description

Our optional dependencies influence our import time. Pandas import is ~500ms!!

Import times:

only polars installed: 0.268s

polars + pandas installed: 0.755s

We should explore if we can do this lazily.
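
One common way to defer this cost is a proxy object that performs the real import on first attribute access. The sketch below is an illustration only, not Polars code: `LazyModule` is a hypothetical class, and the stdlib module `colorsys` stands in for a heavy optional dependency such as pandas.

```python
import importlib


class LazyModule:
    """Hypothetical stand-in for an optional dependency, imported on first use."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not set in __init__, i.e. the module's own.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


heavy = LazyModule("colorsys")          # creating the proxy imports nothing
print(heavy._module is None)            # the real module is not loaded yet
print(heavy.rgb_to_hls(1.0, 0.0, 0.0))  # first access triggers the import
print(heavy._module is not None)
```

With this pattern the import cost is paid only by code paths that actually touch the optional dependency, so a bare `import` of the host package stays cheap.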

performance
opened by ritchie46 22
No example of SQLContext use in documentation
Polars version checks

[X] I have checked that this issue has not already been reported.

[X] I have confirmed this bug exists on the latest version of Polars.

Issue description

There is no example on how to use SQLContext. Luckily I found one in the PR which added it.

https://pola-rs.github.io/polars/py-polars/html/reference/sql.html

In particular, I think it would be useful to show how it can replicate pandas.DataFrame.query when using Polars expressions is difficult (for example, with JSON-serialized filter conditions):

# pandas
df.query("level > 2")

# polars
ctx = pl.SQLContext()
ctx.register("df", df.lazy())
ctx.query("SELECT * FROM df WHERE level > 2")

Is SQLContext reasonably well supported, or is it at risk of being removed in the future?

Reproducible example

NA

Expected behavior

NA

Installed versions

Replace this line with the output of pl.show_versions(), leave the backticks in place

bug python
opened by 2-5 0
Performance degradation caused by presence of optional numpy dependency
First of all, thanks for making polars available! It turned out to be a blazing fast alternative to pandas when doing pre-processing for real-time ML inference. True game changer!

We run an internal benchmark to flag possible performance regressions as part of testing new polars releases. Starting with 0.15.9, I noticed a minor performance regression that affects DataFrame initialization from data: dict[str, tuple[float | str | None, ...]]. After some digging, I believe this is caused by #5918 which dramatically increases calls to _NUMPY_TYPE().

I was surprised by the performance impact of _NUMPY_TYPE():

without numpy being installed, the impact is minimal. Great!

with numpy being merely installed, the impact grows by a factor of 27x - regardless of whether numpy is actually used. Not so great!

There is a trivial hack that speeds up DataFrame initialization and our benchmark by ~10%, well offsetting the initially discovered regression:

import polars.dependencies
polars.dependencies._NUMPY_AVAILABLE = False

Would you consider supporting a corresponding fast path for users that are also working with polars and numpy, just not in combination?

A few potential approaches come to mind:

Refactor polars.dependencies._NUMPY_AVAILABLE and make it available as part of the API - for users to overwrite manually when required. Something like polars.NO_NUMPY = True. -> Still feels hacky

Make polars.dependencies._NUMPY_AVAILABLE user configurable, e.g. by supporting optional environment variables or config files that may be used to trigger an override of polars.dependencies._NUMPY_AVAILABLE. -> I would be happy to use an environment variable, such as PY_POLARS_NO_NUMPY=1

Understand and accept performance penalty of polars.dependencies._NUMPY_AVAILABLE and polars.dependencies._NUMPY_TYPE() and potentially reduce reliance on those throughout the code-base. -> high effort

BTW: The same issue applies to similar constants, such as polars.dependencies._PANDAS_AVAILABLE, which might affect other use cases, too.
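
The second approach could look roughly like the sketch below. It is not Polars code: `numpy_available` is a hypothetical function, and `PY_POLARS_NO_NUMPY` is only the variable name floated in this issue, not an existing setting. The result is cached, on the assumption that the check is called very often.

```python
import importlib.util
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def numpy_available() -> bool:
    """Hypothetical check: is numpy importable, and has the user not opted out?"""
    # PY_POLARS_NO_NUMPY is the name proposed in the issue, not a real setting.
    if os.environ.get("PY_POLARS_NO_NUMPY") == "1":
        return False
    # find_spec locates the package without importing it.
    return importlib.util.find_spec("numpy") is not None


print(numpy_available())
```

Because of the cache, the environment variable is read only once per process; that keeps the hot path to a dictionary lookup, at the cost of ignoring later changes to the environment.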
opened by jakob-keller 0
feat(python): Improve iterating over `GroupBy`
Changes:

Removed the GroupBy._select and GroupBy._select_all deprecated private methods. These are unnecessary as we have GroupBy.agg which can offer the same functionality. This means that the GBSelection class can also be removed.

Removed the GroupBy._groups private method. This method is unnecessary as we can get the same functionality by using GroupBy.agg with the agg_groups expression. And the benefit of using agg is that it respects the maintain_order parameter.

Implemented the GroupBy.__next__ method to complement __iter__. This makes the GroupBy object an iterator, which means you can do things like call list(df.groupby('a')) to get a list of dataframes (one for each group).

This is not breaking because these methods were private and undocumented. The fact that it is now an iterator instead of just an iterable should not break anyone's workflows, I believe.
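
The difference between an iterable and an iterator can be shown with a toy class. This is a sketch of the protocol only, not the Polars implementation: `ToyGroupBy` is hypothetical and groups plain dictionaries instead of DataFrames.

```python
from collections import defaultdict


class ToyGroupBy:
    """Hypothetical stand-in for GroupBy: yields one list of rows per group."""

    def __init__(self, rows, key):
        groups = defaultdict(list)
        for row in rows:
            groups[row[key]].append(row)
        # dicts preserve insertion order, so groups come out in first-seen order.
        self._groups = list(groups.values())
        self._pos = 0

    def __iter__(self):
        # An iterator returns itself from __iter__ ...
        return self

    def __next__(self):
        # ... and hands out one item per call until it is exhausted.
        if self._pos >= len(self._groups):
            raise StopIteration
        group = self._groups[self._pos]
        self._pos += 1
        return group


rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 1, "b": "z"}]
gb = ToyGroupBy(rows, key="a")
first = next(gb)      # works because ToyGroupBy defines __next__
remaining = list(gb)  # consumes the rest; the first group is not repeated
print(first)
print(remaining)
```

In this toy version the object is consumed by iteration, so a second `list(gb)` returns an empty list. That is the general behavioural difference between an iterator and a plain iterable.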

This also addresses the last bit of (currently) deprecated behaviour in the code base!
python enhancement
opened by stinodego 0
fix(python): Fix typing for `DataFrame.select`

Fixes #6026

Turns out Iterable is an acceptable type, rather than Sequence. There are probably a lot of other functions that can be fixed this way (anything that utilizes selection_to_pyexpr_list), but I'll leave it at this for now.
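
Why `Iterable` is the looser and more accurate annotation can be seen in a small sketch. `select_columns` below is a hypothetical function, not the Polars method; it only loops over its argument once, which is all that `Iterable` promises.

```python
from typing import Iterable, List


def select_columns(columns: Iterable[str]) -> List[str]:
    """Hypothetical function: iterates once, so Iterable is the honest type."""
    return [name.strip() for name in columns]


# A list is a Sequence and therefore also an Iterable.
from_list = select_columns(["a", "b"])

# A generator is an Iterable but not a Sequence (no len(), no indexing);
# with a Sequence[str] annotation a type checker would reject this call.
from_generator = select_columns(name for name in ("a ", " b"))

print(from_list, from_generator)
```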
python fix

opened by stinodego 0

Releases(py-0.15.11)

py-0.15.11(Jan 3, 2023)
🚀 Performance improvements

ensure set_at_idx is O(1) (#5977)

✨ Enhancements

allow eq,ne,lt etc (#5995)

Improve Expr.is_between API (#5981)

large speedup for df.iterrows (~200-400%) (#5979)

updated default table format from "UTF8_FULL" to "UTF8_FULL_CONDENSED" (#5967)

Access rows as namedtuples (#5966)

Improve assert_frame_equal messages (#5962)

🐞 Bug fixes

make weekday tz-aware (#5989)

fix categorical in struct anyvalue issue (#5987)

fix invalid boolean simplification (#5976)

allow empty sort on any dtype (#5975)

properly deal with categoricals in streaming queries (#5974)

Thank you to all our contributors for making this release possible! @alexander-beedie, @ritchie46 and @stinodego
py-0.15.9(Dec 31, 2022)
🚀 Performance improvements

improve reducing window function performance ~33% (#5878)

✨ Enhancements

str.strip with multiple chars (#5929)

add iterrows (#5945)

read decimal as f64 (#5938)

improve query plan scan formatting (#5937)

allow all null cast (#5933)

allow objects in struct types (#5925)

handle Series init from python sequence of numpy arrays (#5918)

merge sorted dataframes (#5817)

impl hex and base64 for binary (#5892)

Add datatype hierarchy (#5901)

Add .item() on DataFrame and Series (#5893)

make get_any_value fallible (#5877)

Add string representation for data types (#5861)

directly push all operator result into sink, prev… (#5856)

🐞 Bug fixes

don't panic on ignored context (#5958)

don't allow named expression in arr.eval (#5957)

error on invalid dtype (#5956)

fix panic in join expressions (#5954)

block ordered predicates before explode (#5951)

adhere to schema in arr.eval of empty list (#5947)

fix from_dict schema_inference=0 (#5948)

fix arrow nested null conversion (#5946)

allow None in arr.slice length (#5934)

fix time to duration cast (#5932)

error on addition with datetime/time (#5931)

don't create categoricals in streaming (#5926)

object filter should keep single chunk (#5913)

csv, read escaped "" as missing (#5912)

fix pivot of signed integers (#5909)

don't allow duplicate columns in read_csv arg (#5908)

fix latest oob in streaming conversion (#5902)

adapt k to len in topk (#5888)

fix lazy swapping rename (#5884)

fix window function with nullable values; regression due… (#5874)

improve equality consistency between types (#5873)

evaluate whole branch expression to determine if r… (#5864)

fix top_k on empty (#5865)

fix slice in streaming (#5854)

Fix type hint for IO *_options arguments (#5852)

🛠️ Other improvements

Fix docs for sink_parquet (#5952)

Fix misspelling in LazyFrame docstring (#5917)

add bin, series.is_sorted and merge_sorted (#5914)

Thank you to all our contributors for making this release possible! @AnatolyBuga, @alexander-beedie, @cannero, @chitralverma, @dannyvankooten, @johngunerli, @ozgrakkurt, @ritchie46, @stinodego, @winding-lines and @zundertj
rs-0.26.0(Dec 22, 2022)
⚠️ Breaking changes

remove Series::append_array (#5681)

iso weekday (#5598)

🚀 Performance improvements

improve reducing window function performance ~33% (#5878)

improve performance of reducing window functions with numeric output ~-14% (#5841)

set_sorted flag when creating from literal (#5728)

use sorted fast path in streaming groupby (#5727)

ensure fast_explode propagates (#5676)

fix quadratic time complexity of groupby in stream… (#5614)

Aggregate projection pushdown (#5556)

improve streaming primitive groupby (#5575)

vectorize integer vec-hash by using very simple, … (#5572)

specialized utf8 groupby in streaming (#5535)

✨ Enhancements

make get_any_value fallible (#5877)

directly push all operator result into sink, prev… (#5856)

add sink_parquet (#5480)

Support parsing more float string representations. (#5824)

implement mean aggregation for duration (#5807)

implement sensible boolean aggregates (#5806)

allow expression as quantile input (#5751)

accept expression in str.extract_all (#5742)

tz-aware strptime (#5736)

Add "fmt_no_tty" feature for formatting support without r… (#5725)

lazy diagonal concat. (#5647)

to_struct add upper_bound (#5714)

inversely scale chunk_size with thread count in s… (#5699)

add streaming minmax (#5693)

improve dynamic inference of anyvalues and structs (#5690)

support is_in for boolean dtype (#5682)

add a cache to strptime (#5628)

add nearest interpolation strategy (#5626)

make cast recursive (#5596)

add arg_min/arg_max for series of dtype boolean (#5592)

prefer streaming groupby if partitionable (#5580)

make map_alias fallible (#5532)

pl.min & pl.max accept wildcard similar to pl.sum (#5511)

add predicate pushdown to anonymous_scan (#5467)

make streaming work with multiple sinks in a sing… (#5474)

add streaming slice operation (#5466)

run partial streaming queries (#5464)

streaming left joins (#5456)

file statistics so we only (try to) keep smallest table in memory (#5454)

streaming inner joins. (#5400)

build_info() provides detailed information how polars was built (#5423)

add missing width property to LazyFrame (#5431)

allow regex and wildcard in groupby (#5425)

Streaming joins architecture and Cross join implementation. (#5339)

add support for am/pm notation in parse_dates read_csv (#5373)

add reduce/cumreduce expression as an easier fold (#5364)

🐞 Bug fixes

fix lazy swapping rename (#5884)

improve equality consistency between types (#5873)

evaluate whole branch expression to determine if r… (#5864)

fix top_k on empty (#5865)

fix slice in streaming (#5854)

correct invalid type in struct anyvalue access (#5844)

don't set fast_explode if null values in list (#5838)

duration formatting (#5837)

respect fetch in union (#5836)

keep f32 dtype in fill_null by int (#5834)

err on epoch on time dtype (#5831)

fix panic in hmean (#5808)

asof join by logical groups (#5805)

fix parquet regression upstream in arrow2 (#5797)

Fix lazy cumsum and cumprod result types (#5792)

fix nested writer (#5777)

fix(rust, python) Summation on empty series evaluates to Some(0) (#5773)

empty concat utf8 (#5768)

projection pushdown with union and asof join (#5763)

check null values in asof_join + groupby (#5756)

fix generic streaming groupby on logical types (#5752)

fix date_range on expressions (#5750)

fix dtypes in join_asof_by (#5746)

fix group order in binary aggregation (#5744)

implement min/max aggregation for utf8 in groupby (#5737)

fix all_null/sorted into_groups panic (#5733)

asof join 'by', 'forward' combination (#5720)

fix pivot on floating point indexes (#5704)

fix arange with column/literal input (#5703)

fix double projection that leads to uneven union d… (#5700)

Fix a bug in floating regex handling used in CSV type inference (#5695)

fix asof join schema (#5686)

fix owned arithmetic schema (#5685)

take glob into account in scan_csv 'with_schema_mo… (#5683)

fix boolean schema in agg_max/min (#5678)

fix boolean arg-max if all equal (#5680)

early error on duplicate names in streaming groupby (#5638)

fix streaming groupby aggregate types (#5636)

convert panic to err in concat_list (#5637)

fix dot diagram of single nodes (#5624)

fix dynamic struct inference (#5619)

keep dtype when eval on empty list (#5597)

fix ternary with list output on empty frame (#5595)

fix tz-awareness of truncate (#5591)

check chunks before doing chunked_id join optimiza… (#5589)

invert cast_time_zone conversion (#5587)

asof join ensure join column is not dropped when '… (#5585)

fix ub due to invalid dtype on splitting dfs (#5579)

fix(rust, python); fix projection pushdown in asof joins (#5542)

streaming hstack allow duplicates (#5538)

fix streaming empty join panic (#5534)

fix duplicate caches in cse and prevent quadratic … (#5528)

allow appending categoricals that are all null (#5526)

tz-aware strftime (#5525)

make 'truncate' tz-aware (#5522)

fix coalesce expression expansion (#5521)

fix nested aggregation in when then and window expr… (#5520)

fix sort_by expression if groups already aggregated (#5518)

fix bug in batched parquet reader that dropped dfs… (#5506)

fix bugs in skew and kurtosis (#5484)

compute correct offset for streaming join on multi… (#5479)

return error on invalid sortby expression (#5478)

add missing AnyValueBuffer specialisation for Duration dtype (#5436)

fix freeze/stall when writing more than 2^31 string values to parquet (#5366)

properly handle json with unclosed strings (#5427)

fix null poisoning in rank operation (#5417)

correct expr::diff dtype for temporal columns (#5416)

fix cse for nested caches (#5412)

don't set sorted flag in argsort (#5410)

explicit nan comparison in min/max agg (#5403)

Correct CSV row indexing (#5385)

🛠️ Other improvements

Update rustc and fix clippy (#5880)

update arrow (#5862)

move join dispatch to polars-ops (#5809)

Remove dbg statement from union (#5791)

Continue removing compilation warnings (#5778)

shrink anyvalue size (#5770)

update arrow (#5766)

chore(rust,python) Change allow_streaming to streaming (#5747)

remove rev-map from ChunkedArray (#5721)

simplify fast projection by schema (#5716)

Reindent df! docs code (#5698)

remove Series::append_array (#5681)

Remove unused symbols and unneeded mut qualifier (#5672)

Include license files in Rust crates (#5675)

Use NaiveTime::from_hms_opt instead of NaiveTime::from_hms (#5664)

use xxhash3 for string types (#5617)

iso weekday (#5598)

Improve contributing guide (#5558)

streaming improvements (#5541)

Refer to DataFrame::unique instead of distinct (#5482)

don't panic if part of query cannot run strea… (#5458)

make generic join builder more dry (#5439)

use IdHash for streaming groupby generic (#5435)

fix freeze/stall when writing more than 2^31 string values to parquet (#5366)

Thank you to all our contributors for making this release possible! @AnatolyBuga, @CalOmnie, @Kuhlwein, @MarcoGorelli, @OneRaynyDay, @YuRiTan, @alexander-beedie, @andrewpollack, @ankane, @braaannigan, @chitralverma, @dannyvankooten, @ghais, @ghuls, @jjerphan, @matteosantama, @messense, @owrior, @pickfire, @ritchie46, @s1ck, @sa-, @slonik-az, @sorhawell, @stinodego, @universalmind303 and @zundertj
py-0.15.7(Dec 19, 2022)
🚀 Performance improvements

improve performance of reducing window functions with numeric output ~-14% (#5841)

✨ Enhancements

allow more pyarrow literals (#5842)

add sink_parquet (#5480)

release GIL when writing (#5830)

Support parsing more float string representations. (#5824)

implement mean aggregation for duration (#5807)

implement sensible boolean aggregates (#5806)

🐞 Bug fixes

correct invalid type in struct anyvalue access (#5844)

don't set fast_explode if null values in list (#5838)

duration formatting (#5837)

respect fetch in union (#5836)

keep f32 dtype in fill_null by int (#5834)

fix(python): fix delta issues (#5802)

err on epoch on time dtype (#5831)

fix panic in hmean (#5808)

asof join by logical groups (#5805)

🛠️ Other improvements

lazily import connectorx (#5835)

Thank you to all our contributors for making this release possible! @chitralverma, @ghuls and @ritchie46
py-0.15.6(Dec 14, 2022)
🐞 Bug fixes

fix struct dataset (#5798)

fix parquet regression upstream in arrow2 (#5797)

🛠️ Other improvements

remove unused cmake-rs patch (#5794)

Thank you to all our contributors for making this release possible! @OneRaynyDay, @messense, @ritchie46 and @universalmind303
py-0.15.3(Dec 12, 2022)
🚀 Performance improvements

set_sorted flag when creating from literal (#5728)

use sorted fast path in streaming groupby (#5727)

✨ Enhancements

push down predicates to pyarrow datasets (#5780)

Support for reading delta lake tables (#5761)

Add DataFrame.glimpse() (#5622)

allow expression as quantile input (#5751)

accept expression in str.extract_all (#5742)

tz-aware strptime (#5736)

lazy diagonal concat. (#5647)

to_struct add upper_bound (#5714)

🐞 Bug fixes

fix(rust, python) Summation on empty series evaluates to Some(0) (#5773)

empty concat utf8 (#5768)

projection pushdown with union and asof join (#5763)

check null values in asof_join + groupby (#5756)

fix generic streaming groupby on logical types (#5752)

fix date_range on expressions (#5750)

fix dtypes in join_asof_by (#5746)

fix group order in binary aggregation (#5744)

implement min/max aggregation for utf8 in groupby (#5737)

fix all_null/sorted into_groups panic (#5733)

address several edge-cases found when asserting NaN equality (#5732)

asof join 'by', 'forward' combination (#5720)

🛠️ Other improvements

add DataFrame.pearson_corr to reference (#5772)

Parse fixed timezone offsets without pytz (#5769)

chore(rust,python) Change allow_streaming to streaming (#5747)

Remove pyarrow nightlies requirement. (#5719)

fix incorrect accepted type in df.write_csv (#5715)

Thank you to all our contributors for making this release possible! @AnatolyBuga, @MarcoGorelli, @alexander-beedie, @andrewpollack, @braaannigan, @chitralverma, @ghuls, @ritchie46, @sa- and @zundertj
py-0.15.2(Dec 2, 2022)
🚀 Performance improvements

ensure fast_explode propagates (#5676)

✨ Enhancements

Series.get_chunks (#5701)

inversely scale chunk_size with thread count in s… (#5699)

add streaming minmax (#5693)

Support large page sizes on aarch64 linux builds (#5694)

improve dynamic inference of anyvalues and structs (#5690)

support is_in for boolean dtype (#5682)

add notebook html repr for Series (#5653)

🐞 Bug fixes

fix pivot on floating point indexes (#5704)

fix arange with column/literal input (#5703)

fix double projection that leads to uneven union d… (#5700)

Fix Series -> Expr dispatch for @property methods (#5689)

fix asof join schema (#5686)

fix owned arithmetic schema (#5685)

take glob into account in scan_csv 'with_schema_mo… (#5683)

fix boolean schema in agg_max/min (#5678)

fix boolean arg-max if all equal (#5680)

respect python objects read method even if filename is f… (#5677)

Fix DataFrame.n_chunks return type (#5650)

🛠️ Other improvements

Parametrize test_parquet_datetime (#5696)

Function and lazy function docstrings (#5657)

Fix formatting (#5658)

Thank you to all our contributors for making this release possible! @alexander-beedie, @ankane, @braaannigan, @ghais, @ghuls, @jjerphan, @pickfire, @ritchie46, @stinodego and @zundertj
py-0.15.1(Nov 26, 2022)
⚠️ Breaking changes

Update Expr.sample signature and change random seeding (#4648)

rollup breaking changes (#5602)

iso weekday (#5598)

Change null_equal default to True for Series.series_equal (#5051)

🚀 Performance improvements

fix quadratic time complexity of groupby in stream… (#5614)

Improve performance of indexing operations on Series. (#5610)

Aggregate projection pushdown (#5556)

✨ Enhancements

add a cache to strptime (#5628)

add nearest interpolation strategy (#5626)

Update Expr.sample signature and change random seeding (#4648)

Change null_equal default to True for Series.series_equal (#5051)

make cast recursive (#5596)

add arg_min/arg_max for series of dtype boolean (#5592)

🐞 Bug fixes

early error on duplicate names in streaming groupby (#5638)

fix streaming groupby aggregate types (#5636)

convert panic to err in concat_list (#5637)

fix dot diagram of single nodes (#5624)

fix dynamic struct inference (#5619)

tz-aware filtering (#5603)

keep dtype when eval on empty list (#5597)

fix ternary with list output on empty frame (#5595)

fix tz-awareness of truncate (#5591)

check chunks before doing chunked_id join optimiza… (#5589)

invert cast_time_zone conversion (#5587)

asof join ensure join column is not dropped when '… (#5585)

🛠️ Other improvements

Remaining docstring examples for frame and lazyframe (#5630)

use xxhash3 for string types (#5617)

only trigger build.rs file if that file itself has cha… (#5618)

iso weekday (#5598)

Merge release workflows (#5564)

Fix broken lint workflow (#5584)

Thank you to all our contributors for making this release possible! @Kuhlwein, @braaannigan, @ghuls, @matteosantama, @ritchie46 and @stinodego
py-0.14.31(Nov 22, 2022)
🚀 Performance improvements

improve streaming primitive groupby (#5575)

vectorize integer vec-hash by using very simple, … (#5572)

✨ Enhancements

prefer streaming groupby if partitionable (#5580)

🐞 Bug fixes

fix ub due to invalid dtype on splitting dfs (#5579)

🛠️ Other improvements

Remove old Python changelog file (#5577)

namespace registration docs update (#5565)

Improve contributing guide (#5558)

Thank you to all our contributors for making this release possible! @alexander-beedie, @ghuls, @ritchie46 and @stinodego
py-0.14.29(Nov 19, 2022)
🚀 Performance improvements

specialized utf8 groupby in streaming (#5535)

✨ Enhancements

add dataframe.pearson_corr (#5533)

support namespace registration (#5531)

make map_alias fallible (#5532)

pl.min & pl.max accept wildcard similar to pl.sum (#5511)

additional support for using timedelta with duration-type arguments (#5487)

🐞 Bug fixes

fix(rust, python); fix projection pushdown in asof joins (#5542)

streaming hstack allow duplicates (#5538)

fix streaming empty join panic (#5534)

fix duplicate caches in cse and prevent quadratic … (#5528)

allow appending categoricals that are all null (#5526)

tz-aware strftime (#5525)

make 'truncate' tz-aware (#5522)

fix coalesce expression expansion (#5521)

fix nested aggregation in when then and window expr… (#5520)

fix sort_by expression if groups already aggregated (#5518)

fix bug in batched parquet reader that dropped dfs… (#5506)

preserve Series name when exporting to pandas (#5498)

Refactor is_between (#5491)

fix bugs in skew and kurtosis (#5484)

🛠️ Other improvements

support tabbed panels in sphinx, add namespace docs (#5540)

Update dev dependencies (#5517)

Thank you to all our contributors for making this release possible! @alexander-beedie, @braaannigan, @ghuls, @ritchie46, @sorhawell and @zundertj
py-0.14.27(Nov 11, 2022)
✨ Enhancements

additional autocomplete affordances for IPython users (#5477)

make streaming work with multiple sinks in a sing… (#5474)

add streaming slice operation (#5466)

run partial streaming queries (#5464)

streaming left joins (#5456)

file statistics so we only (try to) keep smallest table in memory (#5454)

streaming inner joins. (#5400)

🐞 Bug fixes

compute correct offset for streaming join on multi… (#5479)

return error on invalid sortby expression (#5478)

use json for expr pickle (#5476)

improved namespace/accessor behaviour (resolves VSCode autocomplete issue) (#5469)

further improved lazy loading (#5459)

fix for categorical inserts from row-oriented data (#5462)

use of fill_null with temporal literals (#5440)

🛠️ Other improvements

don't panic if part of query cannot run strea… (#5458)

add build_info() to the API doc (#5442)

Improved structure for DataFrame and LazyFrame API docs, misc design improvements (#5433)

Thank you to all our contributors for making this release possible! @alexander-beedie, @dannyvankooten, @ritchie46, @s1ck, @slonik-az, @stinodego and @universalmind303
py-0.14.26(Nov 6, 2022)
✨ Enhancements

build_info() provides detailed information how polars was built (#5423)

add missing width property to LazyFrame (#5431)

enhanced Series.dot method and related interop (#5428)

allow regex and wildcard in groupby (#5425)

support DataFrame init from generators (#5424)

support Series init from generator (#5411)

🐞 Bug fixes

fix freeze/stall when writing more than 2^31 string values to parquet (#5366)

properly handle json with unclosed strings (#5427)

fix null poisoning in rank operation (#5417)

correct expr::diff dtype for temporal columns (#5416)

fix cse for nested caches (#5412)

don't set sorted flag in argsort (#5410)

🛠️ Other improvements

Fix dependencies on memory allocator (#5426)

Better docstring for keep_name (#5378) (#5421)

Thank you to all our contributors for making this release possible! @CalOmnie, @alexander-beedie, @ghuls, @ritchie46, @slonik-az, @stinodego and @universalmind303
py-0.14.25(Nov 2, 2022)
✨ Enhancements

30x speedup initialising Series from python range object (#5397)

r-associative support for commutative DataFrame operators (#5394)

pl.from_epoch function (#5330)

Streaming joins architecture and Cross join implementation. (#5339)

enable frame init from sequence of pandas series, and improve lazy typechecks (handle subclasses) (#5383)

add support for am/pm notation in parse_dates read_csv (#5373)

add reduce/cumreduce expression as an easier fold (#5364)

🐞 Bug fixes

explicit nan comparison in min/max agg (#5403)

lazy proxy module does not require global registration (#5390)

Correct CSV row indexing (#5385)

🛠️ Other improvements

Docstrings for frame, lazyframe and time series (#5398)

add integrated support for copying API examples, and auto-parallelise docs build (#5393)

improve rendering of API docs type signatures, mark PivotOps as deprecated, misc tidy-ups (#5388)

Expression docstrings (#5377)

minor navbar improvements; adds discord and twitter links, fixes github icon (#5379)

improve structure of sphinx-generated API docs (#5376)

Add with_time_zone to reference guide (#5369)

Thank you to all our contributors for making this release possible! @YuRiTan, @alexander-beedie, @braaannigan, @owrior, @ritchie46 and @zundertj
rs-0.25.0(Oct 28, 2022)
The most notable change in this release is the start of out-of-core support in Polars, meaning we can process larger-than-RAM datasets. This is currently supported for the parts of a query that read from CSV or Parquet, and is limited to select, filter, and groupby operations. Many more operations will follow in upcoming releases.

See https://github.com/pola-rs/polars/pull/5139#issuecomment-1274687634 where we were able to process an 80GB dataset on a laptop with only 16GB RAM.

Thanks to everyone who contributed to another release! :raised_hands:

⚠️ Breaking changes

rename expand_at_index -> new_from_index (#5259)

🚀 Performance improvements

lower contention in out of core filter (#5311)

improve pivot performance by using faster series… (#5172)

improve streaming performance (~15%) (#5170)

don't block projection pushdown on unnest (#5123)

more conservative JIT sort settings (#5080)

sort and unsort join key if other side is sorted (#5069)

do not rechunk left joins (#5066)

Prune unneeded projections (#5032)

Improve predicate pushdown + with_columns (#5029)

Don't execute unused with_column expressions (#5026)

✨ Enhancements

shrink_type expression (#5351)

tz_localize expression (#5340)

accept expr in arr.get (#5337)

Implement forward strategy in groupby join_asof (#5335)

improve dynamic inference of struct types (#5297)

Add newline to Aggregate..FROM describe_optimization_plan (#5253)

date_range expression (#5267)

show expression where error originated if raised … (#5263)

improve error msg if window expressions length do… (#5262)

Add round for date and datetime (#5153)

new n_chars functionality for utf8 strings (#5252)

added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)

make date_range timezone aware (#5234)

Rust functions for typed JsonPath implementation (#5140)

allow polars Config options to be serialised/shared, and more easily unset (#5219)

batched csv reader (#5212)

accept expressions in arr.slice (#5191)

is_sorted aggregation fast path for Utf8Chunked (#5184)

hybrid streaming query engine (#5139)

add binary dtype (#5122)

improve function expansion (#5110)

add struct arithmetics (#5107)

add cumfold/cumsum expression (#5103)

error on invalid asof join inputs (#5100)

small plan and profile chart improvements (#5067)

Initial implementation of histogram algorithm (#4752)

🐞 Bug fixes

unnest only pushdown column if there are projections (#5360)

block is_null predicate in asof join (#5358)

ensure that no-projection is seen as select all in… (#5356)

resolve duplicated column names in pivot (#5349)

fix serde of expression (pickle) (#5333)

don't set auto-explode in apply_multiple (#5265)

export anonymousscan in lazy prelude (#5295)

fix explicit list + sort aggregation in groupby co… (#5317)

fix sort-merge dispatch of utf8 (#5315)

properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)

don't block non matching groups in binary expression (#5273)

fix logical type of nested take (#5271)

tag IntoSeries trait as unsafe (#5258)

include single null value in global cat builder (#5254)

include slice in sort fast path (#5247)

determine supertype of datetimes with timezones an… (#5240)

fix groupby dynamic truncate for > days resolution (#5235)

set timezone on groupby_dynamic boundaries (#5233)

fix incorrect duration dtype (#5226)

set string cache if lazy schema contains categorical (#5225)

fix pipeline dtypes (#5224)

fix asof_join schema (#5213)

fix single thread loop if schema length is off by 1 (#5210)

improve numeric stability of rolling_variance (#5207)

fix overflow in partitioned groupby mean of int32/… (#5204)

don't allow categorical append that is not under s… (#5195)

include offset in arr.get (#5193)

fix rolling_float in case closure returns None (#5180)

Implement missing extract conversion for Time datatype (#5161)

implement missing conversion to python time object (#5152)

microsecond noise on date >> time cast (add 00:00:00 fast-path) (#5149)

wrong operator mapped for LtEq (#5120)

unique include null (#5112)

don't recurse assign unions as it SO > 5k files (#5098)

block projection pushdown on unnest (#5093)

projection_node always do projection locally if no… (#5090)

fix iso_year for Date dtype (#5074)

fix bug in unneeded projection pruning (#5071)

Improve printing controls of DataFrame and Series (#5047)

Double projections should be checked on input schema (#5058)

Apply flat overlapping row groups when possible (#5039)

Ensure all predicates use same key function when inserting… (#5034)

Only consider dt series equal if they have the same tz (#5025)

Special-case ewm_mean(alpha=1) (#5019)

Time zone conversion bug (NY -> UTC works, UTC -> NY doesn't) (#5014)

Fix timezone cast (#5016)

🛠️ Other improvements

update rustc to nightly-2022-10-24 (#5312)

update ahash and add nightly features of hashbrown (#5310)

Update comfy-table and memchr. (#5276)

rename expand_at_index -> new_from_index (#5259)

ensure streaming groupby take slice into account (#5178)

move polars-sql under polars folder (#5176)

remove aggregate pushdown optimization (#5173)

relax sync requirement on Executor trait impls (#5142)

Get rid of unnecessary check in SplitLines iterator (#5141)

Constant instead of literal (#5088)

Use release-drafter to draft releases with changelogs (#5033)

Fix docs by activating docfg feature (#5028)

Split up polars-lazy crate. (#5020)

Thank you to all our contributors for making this release possible! @AlecZorab, @YuRiTan, @alexander-beedie, @cjermain, @dannyvankooten, @dpatton-gr, @egorchakov, @ghuls, @hpux735, @matteosantama, @mcrumiller, @owrior, @ritchie46, @slonik-az, @sorhawell, @stinodego, @thatlittleboy, @universalmind303 and @zundertj
py-0.14.24(Oct 28, 2022)
✨ Enhancements

shrink_type expression (#5351)

don't raise error but print a warning if mp fork method… (#5342)

tz_localize expression (#5340)

accept expr in arr.get (#5337)

Implement forward strategy in groupby join_asof (#5335)

🐞 Bug fixes

unnest only pushdown column if there are projections (#5360)

block is_null predicate in asof join (#5358)

ensure that no-projection is seen as select all in… (#5356)

resolve duplicated column names in pivot (#5349)

remove unused branch in getitem (#5348)

nested dicts / list generation (#5336)

fix serde of expression (pickle) (#5333)

handle old-style module loaders such that we can still lazy load them (#5331)

explicit output type in apply (#5328)

🛠️ Other improvements

remove multiprocessing check, and leave it to the user (#5347)

Update dev, lint and docs dependencies (#5338)

lazy module proxy (obviate attribute access guards for missing modules) (#5320)
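
The lazy module proxy entry (#5320) names a general pattern: defer an import until the first attribute access, so a missing optional dependency only fails at the point of use. The sketch below shows that pattern with the standard library only. `LazyModule` and its `loaded` property are hypothetical names chosen for illustration, not the py-polars implementation.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports `name` on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Import once, then reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only reached for attributes not found on the proxy itself.
        return getattr(self._load(), attr)

    @property
    def loaded(self):
        return self._module is not None


fractions = LazyModule("fractions")
before = fractions.loaded          # nothing imported through the proxy yet
half = fractions.Fraction(1, 2)    # first attribute access triggers the import
after = fractions.loaded

# Creating a proxy for a missing module does not raise; using it does.
missing = LazyModule("module_that_does_not_exist_xyz")
try:
    missing.anything
    missing_raised = False
except ModuleNotFoundError:
    missing_raised = True
```

Because the error surfaces only on use, calling code does not need to guard every attribute access with an availability check.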

Thank you to all our contributors for making this release possible! @AlecZorab, @alexander-beedie, @ghuls and @ritchie46
py-0.14.23(Oct 25, 2022)
🐞 Bug fixes

fix explicit list + sort aggregation in groupby co… (#5317)

fix sort-merge dispatch of utf8 (#5315)

close multi-threading pool in df creation (#5309)

fix and check all uninstalled imports in ci (#5304)

🛠️ Other improvements

Add "import polars.testing" to testing docstrings (#5316) (#5318)

streamline lazy imports (#5302)

Catch deprecation warnings in unit tests (#5306)

fix and check all uninstalled imports in ci (#5304)

Thank you to all our contributors for making this release possible! @alexander-beedie, @ghuls, @ritchie46, @thatlittleboy, @universalmind303 and @zundertj
py-0.14.22(Oct 22, 2022)
🚀 Performance improvements

Make all expensive imports lazy - ~85% (#5287)

remove pandas imports (#5286)

never import hypothesis in user code (#5282)

✨ Enhancements

expose to_struct to series list namespace (#5298)

improve dynamic inference of struct types (#5297)

don't panic in failing apply (#5294)

improve error message in struct apply (#5291)

accept schema in read_dicts (#5290)

Do not import polars.testing by default (#5284)

Pass more options to pyarrow in write_parquet (#5278) (#5280)

date_range expression (#5267)

allow implicit None branch in when then otherwise (#5264)

show expression where error originated if raised … (#5263)

improve error msg if window expressions length do… (#5262)

pl.ones, pl.zeros and Series.new_from_index functions (#5260)

Add round for date and datetime (#5153)

new n_chars functionality for utf8 strings (#5252)

added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)

🐞 Bug fixes

throw error on invalid lazy concat strategy (#5292)

fix to_pandas edge case (#5293)

properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)

respect schema overwrite in from rows (#5275)

don't block non matching groups in binary expression (#5273)

fix logical type of nested take (#5271)

Check if BatchedCsvReader.next_batches() is None befor… (#5256)

include single null value in global cat builder (#5254)

Check multiprocessing start_method on import (#3144) (#5237)

🛠️ Other improvements

Add ModuleType for import functions in import_check.py (#5289)

Thank you to all our contributors for making this release possible! @alexander-beedie, @ghuls, @owrior and @ritchie46
py-0.14.21(Oct 18, 2022)
🐞 Bug fixes

include slice in sort fast path (#5247)

don't use zoneinfo globally (#5246)

Thank you to all our contributors for making this release possible! @ritchie46
py-0.14.20(Oct 18, 2022)
✨ Enhancements

make date_range timezone aware (#5234)

infer timezone and improve display (#5232)

allow Config to be used as a context manager, and update some docs (#5223)

allow polars Config options to be serialised/shared, and more easily unset (#5219)

🐞 Bug fixes

determine supertype of datetimes with timezones an… (#5240)

fix groupby dynamic truncate for > days resolution (#5235)

ensure that polars_type_to_constructor works with tz-aware Datetime dtypes (#5239)

set timezone on groupby_dynamic boundaries (#5233)

accept tuple[bool, bool] instead of Sequence[bool] for Expr.is_between (#5094)

fix incorrect duration dtype (#5226)

set string cache if lazy schema contains categorical (#5225)

fix pipeline dtypes (#5224)

🛠️ Other improvements

update lazyframe lazygroupby apply docstring (#5238)

Consistent naming for Python release workflow (#5229)

Thank you to all our contributors for making this release possible! @YuRiTan, @alexander-beedie, @cjermain, @matteosantama, @ritchie46 and @stinodego
py-0.14.19(Oct 15, 2022)
🚀 Performance improvements

improve pivot performance by using faster series… (#5172)

improve streaming performance (~15%) (#5170)

don't block projection pushdown on unnest (#5123)

✨ Enhancements

batched csv reader (#5212)

accept expressions in arr.slice (#5191)

is_sorted aggregation fast path for Utf8Chunked (#5184)

support DataFrame init with Datetime dtypes that specify a timezone (#5174)

frame-level n_unique() that can count unique rows or col/expr subsets (#5165)

hybrid streaming query engine (#5139)

return Datetime/Duration with appropriate timeunit when inferring from pytype (#5127)

add binary dtype (#5122)

🐞 Bug fixes

fix asof_join schema (#5213)

fix single thread loop if schema length is off by 1 (#5210)

improve numeric stability of rolling_variance (#5207)

fix apply function over object dtype (#5206)

fix overflow in partitioned groupby mean of int32/… (#5204)

don't allow categorical append that is not under s… (#5195)

include offset in arr.get (#5193)

DataFrame.fill_null include unsigned integers (#5192)

error on fill_nan on non float dtype (#5185)

infer missing columns in from_dicts (#5183)

fix rolling_float in case closure returns None (#5180)

Implement missing extract conversion for Time datatype (#5161)

implement missing conversion to python time object (#5152)

Rendering long docstring lines. (#5150)

add missing _NUMPY_AVAILABLE check in Series.__getitem__ (#5126)

wrong operator mapped for LtEq (#5120)

🛠️ Other improvements

skip failing test until #5177 is resolved (#5205)

ensure streaming groupby take slice into account (#5178)

remove aggregate pushdown optimization (#5173)

Add support for ruff python linter. (#5151)

improve typing; many list types are better defined as Sequence (#5164)

Get rid of unnecessary check in SplitLines iterator (#5141)

Thank you to all our contributors for making this release possible! @alexander-beedie, @dannyvankooten, @ghuls, @ritchie46 and @sorhawell
py-0.14.18(Oct 5, 2022)
🚀 Performance improvements

take advantage of sorted join for frame alignment (#5106)

✨ Enhancements

improve function expansion (#5110)

add struct arithmetics (#5107)

add cumfold/cumsum expression (#5103)

error on invalid asof join inputs (#5100)

🐞 Bug fixes

unique include null (#5112)

don't recurse assign unions as it SO > 5k files (#5098)

block projection pushdown on unnest (#5093)

projection_node always do projection locally if no… (#5090)

🛠️ Other improvements

deprecate name argument in drop (#5099)

improve py-polars/Makefile (#5089)

Thank you to all our contributors for making this release possible! @alexander-beedie, @owrior, @ritchie46 and @slonik-az
py-0.14.17(Oct 3, 2022)
🚀 Performance improvements

more conservative JIT sort settings (#5080)

Thank you to all our contributors for making this release possible! @mcrumiller, @ritchie46 and @zundertj
py-0.14.16(Oct 3, 2022)
🚀 Performance improvements

sort and unsort join key if other side is sorted (#5069)

do not rechunk left joins (#5066)

✨ Enhancements

deprecate boolean mask for Series indexing (#5075)

small plan and profile chart improvements (#5067)

add gantt chart plot to LazyFrame::profile (#5063)

Support Series init as struct from @dataclass and annotated NamedTuple (#5057)

🐞 Bug fixes

fix iso_year for Date dtype (#5074)

tz-aware get_idx (#5072)

Fix empty method detection when PYTHONOPTIMIZE=2 (#5043)

fix bug in unneeded projection pruning (#5071)

remove overloads for from_arrow (#5065)

Improve printing controls of DataFrame and Series (#5047)

Double projections should be checked on input schema (#5058)

Add missing cse param to LazyFrame "profile" method (#5054)

🛠️ Other improvements

Default to zstd parquet compression (#5060)

Refactor show_graph (#5059)

Use release-drafter to draft releases with changelogs (#5033)

Update Makefile (#5056)

Parametric test coverage for EWM functions (#5011)

Thank you to all our contributors for making this release possible! @alexander-beedie, @egorchakov, @matteosantama, @ritchie46, @slonik-az, @stinodego and @zundertj
py-polars-v0.14.15(Oct 1, 2022)

rs-0.24.3(Oct 1, 2022)

rust-polars-v0.24.0(Sep 18, 2022)
New rust polars release! 🚀

This is the release of rust polars 0.24.0. It comes with many bug fixes, performance improvements, and added functionality. The changes that stand out are larger-than-RAM memory mapping of IPC files and a new common-subplan optimization that prunes duplicated sub-plans from the query plan and thereby potentially saves a lot of duplicated work.
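
As a rough illustration of the common-subplan idea, the sketch below deduplicates structurally equal sub-trees in a toy query plan so that each one is represented (and could be executed) only once. The `Node` type, the `dedupe` helper, and the operation strings are all hypothetical; this is not how Polars represents or optimizes its logical plans.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable plan node: an operation name plus its inputs."""
    op: str
    inputs: Tuple["Node", ...] = ()


def dedupe(node, cache):
    """Return a canonical instance of `node`, reusing equal sub-plans."""
    canonical_inputs = tuple(dedupe(child, cache) for child in node.inputs)
    candidate = Node(node.op, canonical_inputs)
    # Frozen dataclasses compare and hash by value, so structurally equal
    # sub-plans map to the same cache entry.
    return cache.setdefault(candidate, candidate)


def scan_filter():
    # Builds a fresh (but structurally identical) sub-plan on every call.
    return Node("filter(a > 1)", (Node("scan(data.ipc)"),))


plan = Node(
    "join",
    (
        Node("select(x)", (scan_filter(),)),
        Node("select(y)", (scan_filter(),)),
    ),
)

cache = {}
optimized = dedupe(plan, cache)
left, right = optimized.inputs

# Before: the two filter sub-plans are distinct objects.
shared_before = plan.inputs[0].inputs[0] is plan.inputs[1].inputs[0]
# After: both branches point at one shared sub-plan.
shared_after = left.inputs[0] is right.inputs[0]
unique_nodes = len(cache)
```

In this toy plan the original tree has seven nodes, while the cache holds five unique ones; a real engine would additionally need to cache the shared sub-plan's result at execution time.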


Full changelog

crates.io

documentation

Update to arrow2 0.14.0

See the 0.14.0 release for all upstream improvements.

New Contributors

@ydarma made their first contribution in https://github.com/pola-rs/polars/pull/4269

@gaoxinge made their first contribution in https://github.com/pola-rs/polars/pull/4300

@SimonSchneider made their first contribution in https://github.com/pola-rs/polars/pull/4436

@lorenzwalthert made their first contribution in https://github.com/pola-rs/polars/pull/4445

@neeldug made their first contribution in https://github.com/pola-rs/polars/pull/4384

@isaacthefallenapple made their first contribution in https://github.com/pola-rs/polars/pull/4522

@Chuxiaof made their first contribution in https://github.com/pola-rs/polars/pull/4524

@luk-f-a made their first contribution in https://github.com/pola-rs/polars/pull/4565

@OneRaynyDay made their first contribution in https://github.com/pola-rs/polars/pull/4621

@abalkin made their first contribution in https://github.com/pola-rs/polars/pull/4650

@tikkanz made their first contribution in https://github.com/pola-rs/polars/pull/4676

@hpux735 made their first contribution in https://github.com/pola-rs/polars/pull/4693

@huang12zheng made their first contribution in https://github.com/pola-rs/polars/pull/4823

@owrior made their first contribution in https://github.com/pola-rs/polars/pull/4840

@jly36963 made their first contribution in https://github.com/pola-rs/polars/pull/4886

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.23.0...rust-polars-v0.24.0
rust-polars-v0.23.0(Aug 4, 2022)
What's Changed

respect ipc column ordering by @ritchie46 in https://github.com/pola-rs/polars/pull/3591

zfill expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3593

Patch release by @ritchie46 in https://github.com/pola-rs/polars/pull/3595

Fix TOML typos by @ryanrussell in https://github.com/pola-rs/polars/pull/3598

Anonymous scan lazyframe by @universalmind303 in https://github.com/pola-rs/polars/pull/3561

ljust and rjust expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3603

cast string to categorical in 'is_in' by @ritchie46 in https://github.com/pola-rs/polars/pull/3606

python data type units by @ritchie46 in https://github.com/pola-rs/polars/pull/3609

unset sorted metadata on append by @ritchie46 in https://github.com/pola-rs/polars/pull/3610

feat(nodejs): scan json by @universalmind303 in https://github.com/pola-rs/polars/pull/3611

Expand regex function input by @ritchie46 in https://github.com/pola-rs/polars/pull/3613

node 0.5.3 release by @universalmind303 in https://github.com/pola-rs/polars/pull/3612

improve when then otherwise for lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3614

python polars 0.13.44 by @ritchie46 in https://github.com/pola-rs/polars/pull/3615

Fix mode for multiple modes by @GregoryBL in https://github.com/pola-rs/polars/pull/3566

fix empty list edge case by @ritchie46 in https://github.com/pola-rs/polars/pull/3621

fix invalid concat dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3622

respect n_rows by @ritchie46 in https://github.com/pola-rs/polars/pull/3624

Python: scan_ipc/parquet can scan from fsspec sources e.g. s3. by @ritchie46 in https://github.com/pola-rs/polars/pull/3626

Fix Series init (as pl.Object dtype) from mixed-type input and extend test coverage by @alexander-beedie in https://github.com/pola-rs/polars/pull/3627

restrict parallel branches in lazy Union by @ritchie46 in https://github.com/pola-rs/polars/pull/3628

native exp expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3629

python dict parallel dataframe creation by @ritchie46 in https://github.com/pola-rs/polars/pull/3630

Enhanced column typedef/inference support for DataFrame init by @alexander-beedie in https://github.com/pola-rs/polars/pull/3633

fix row count file projection pushdown by @ritchie46 in https://github.com/pola-rs/polars/pull/3635

fix list concat by @ritchie46 in https://github.com/pola-rs/polars/pull/3636

rust publish makefile by @ritchie46 in https://github.com/pola-rs/polars/pull/3637

improve explode of empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3638

Improve numpy ufunc support. fixes: #3228 by @ghuls in https://github.com/pola-rs/polars/pull/3583

Update various python build requirements. by @ghuls in https://github.com/pola-rs/polars/pull/3641

is_in for struct dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3639

Update black and change some code so it sees it as a call chain. by @ghuls in https://github.com/pola-rs/polars/pull/3645

concat list determine supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/3649

update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3650

Parallel csv writer by @ritchie46 in https://github.com/pola-rs/polars/pull/3652

fix groups state in complex aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/3656

Rust Comment Readability Fixes by @ryanrussell in https://github.com/pola-rs/polars/pull/3662

Add Expr.reverse() Python API example by @cnpryer in https://github.com/pola-rs/polars/pull/3660

Added StringCache Python API example by @cnpryer in https://github.com/pola-rs/polars/pull/3659

improve dtype selection by @ritchie46 in https://github.com/pola-rs/polars/pull/3664

accept regex in filter by @ritchie46 in https://github.com/pola-rs/polars/pull/3666

python: improve html render by @ritchie46 in https://github.com/pola-rs/polars/pull/3667

Python: infer_schema_len arg to from_dicts by @ritchie46 in https://github.com/pola-rs/polars/pull/3669

Add LICENSE link to py-polars by @gyscos in https://github.com/pola-rs/polars/pull/3674

python: fix and test globbing by @ritchie46 in https://github.com/pola-rs/polars/pull/3675

python polars 0.13.45 by @ritchie46 in https://github.com/pola-rs/polars/pull/3676

Add useful example for pl.StringCache(). by @ghuls in https://github.com/pola-rs/polars/pull/3677

Fix StringCache docstring typo by @cnpryer in https://github.com/pola-rs/polars/pull/3678

Fix polars.Expr.apply() Python API docs text by @cnpryer in https://github.com/pola-rs/polars/pull/3661

Anonymous scan enhancements & cleanup by @universalmind303 in https://github.com/pola-rs/polars/pull/3657

add pyarrow install to quickstart setup by @ritchie46 in https://github.com/pola-rs/polars/pull/3682

fix oob in sorted groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3681

fix branch supertypes by @ritchie46 in https://github.com/pola-rs/polars/pull/3683

fix cargo.toml for docs.rs by @ritchie46 in https://github.com/pola-rs/polars/pull/3684

python polars 0.13.46 by @ritchie46 in https://github.com/pola-rs/polars/pull/3686

ndjson reader complex types support by @universalmind303 in https://github.com/pola-rs/polars/pull/3665

fix groupby aggregation on empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3688

Nodejs groupbyrolling by @universalmind303 in https://github.com/pola-rs/polars/pull/3670

Add pl.Expr.hash Python example by @cnpryer in https://github.com/pola-rs/polars/pull/3679

Adding 'line-height' at 95% to df _html.py print by @LVG77 in https://github.com/pola-rs/polars/pull/3691

unique counts for logical types by @ritchie46 in https://github.com/pola-rs/polars/pull/3694

Update arrow and prepare for mutable arithmetics by @ritchie46 in https://github.com/pola-rs/polars/pull/3695

Improve lit agg by @ritchie46 in https://github.com/pola-rs/polars/pull/3702

panic on invalid groupby rolling input by @ritchie46 in https://github.com/pola-rs/polars/pull/3703

docs: Readability improvements in py-polars by @ryanrussell in https://github.com/pola-rs/polars/pull/3700

docs: polars-lazy readability improvements by @ryanrussell in https://github.com/pola-rs/polars/pull/3701

Python: parallel concat df by @gunjunlee in https://github.com/pola-rs/polars/pull/3671

fix ipc column order by @ritchie46 in https://github.com/pola-rs/polars/pull/3706

nodejs release by @universalmind303 in https://github.com/pola-rs/polars/pull/3698

add coc by @ritchie46 in https://github.com/pola-rs/polars/pull/3712

inplace arithmetic by @ritchie46 in https://github.com/pola-rs/polars/pull/3709

format empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3719

Add typing overloads for DataFrame.hstack() by @adamgreg in https://github.com/pola-rs/polars/pull/3697

Add Series to DataFrame.with_columns() argument annotation by @adamgreg in https://github.com/pola-rs/polars/pull/3696

fix rolling groupby ordering with 'by' argument by @ritchie46 in https://github.com/pola-rs/polars/pull/3720

allow literal as aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/3722

Improve performance of categorical casting by @ritchie46 in https://github.com/pola-rs/polars/pull/3724

Add flag to allow str.contains to search for string literals (#3711) by @alexander-beedie in https://github.com/pola-rs/polars/pull/3718

fix join negative keys by @ritchie46 in https://github.com/pola-rs/polars/pull/3730

fix arr.get() offsets by @ritchie46 in https://github.com/pola-rs/polars/pull/3731

update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3732

fix from_pandas object null array by @ritchie46 in https://github.com/pola-rs/polars/pull/3733

python polars 0.13.47 by @ritchie46 in https://github.com/pola-rs/polars/pull/3734

Replace OOB slice indexing with spare_capacity_mut by @saethlin in https://github.com/pola-rs/polars/pull/3737

pow fast paths by @ritchie46 in https://github.com/pola-rs/polars/pull/3738

Simplify contains check that opts-in to contains_literal fast-path by @alexander-beedie in https://github.com/pola-rs/polars/pull/3736

fix arithmetic bug introduced in #3709 by @ritchie46 in https://github.com/pola-rs/polars/pull/3741

check nan in sort by single column by @ritchie46 in https://github.com/pola-rs/polars/pull/3742

python fix concat by @ritchie46 in https://github.com/pola-rs/polars/pull/3743

patch python polars 0.13.48 by @ritchie46 in https://github.com/pola-rs/polars/pull/3744

ternary literal predicates by @ritchie46 in https://github.com/pola-rs/polars/pull/3747

python polars 0.13.49 by @ritchie46 in https://github.com/pola-rs/polars/pull/3748

unset sorted on take by @ritchie46 in https://github.com/pola-rs/polars/pull/3756

reexport polars for extension libraries by @universalmind303 in https://github.com/pola-rs/polars/pull/3760

add global pl by @universalmind303 in https://github.com/pola-rs/polars/pull/3763

arg_where expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3757

update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3762

python lhs power and broadcast by @ritchie46 in https://github.com/pola-rs/polars/pull/3768

allow regex expansion in binary/ternary expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3769

str.ends_with/ str.starts_with by @ritchie46 in https://github.com/pola-rs/polars/pull/3770

fix bug in agg projections and init tpch schema tests by @ritchie46 in https://github.com/pola-rs/polars/pull/3771

always include offset in groupby_dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/3779

Cache file reads (tpch 2/7) ~5% faster by @ritchie46 in https://github.com/pola-rs/polars/pull/3774

python fix arr.contains type by @ritchie46 in https://github.com/pola-rs/polars/pull/3782

improve predicate combination and schema state by @ritchie46 in https://github.com/pola-rs/polars/pull/3788

fix duration computation by @ritchie46 in https://github.com/pola-rs/polars/pull/3790

Update arrow2 to support IPC Stream Reading with projections by @joshuataylor in https://github.com/pola-rs/polars/pull/3793

Some API alignment (missing funcs) between DataFrame, LazyFrame, and Series by @alexander-beedie in https://github.com/pola-rs/polars/pull/3791

Docs: sort entries within subsections by @alexander-beedie in https://github.com/pola-rs/polars/pull/3794

csv don't skip delimiter in whitespace trimming by @ritchie46 in https://github.com/pola-rs/polars/pull/3796

don't copy the sorted flag on many operations by @ritchie46 in https://github.com/pola-rs/polars/pull/3795

csv don't skip trailing delimiters when inferring schema. by @ghuls in https://github.com/pola-rs/polars/pull/3799

Allow date_range to produce date ranges as well as datetime by @alexander-beedie in https://github.com/pola-rs/polars/pull/3798

quarter expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3797

Update rustc to 2022-06-22 by @ritchie46 in https://github.com/pola-rs/polars/pull/3801

Fix Node installation instructions by @Smittyvb in https://github.com/pola-rs/polars/pull/3804

python polars 0.13.50 by @ritchie46 in https://github.com/pola-rs/polars/pull/3802

rolling groupby fix index column output order by @ritchie46 in https://github.com/pola-rs/polars/pull/3806

Add support for IPC Streaming Read/Write by @joshuataylor in https://github.com/pola-rs/polars/pull/3783

chore: chunked_array readability improvements by @ryanrussell in https://github.com/pola-rs/polars/pull/3810

Add serde feature to field to fix serde feature by @joshuataylor in https://github.com/pola-rs/polars/pull/3808

fix join asof on floats by @ritchie46 in https://github.com/pola-rs/polars/pull/3812

chore: /polars/polars-core/src/frame/ readability by @ryanrussell in https://github.com/pola-rs/polars/pull/3813

Fixing small typos in docs by @thatlittleboy in https://github.com/pola-rs/polars/pull/3811

fix join asof tolerance by @ritchie46 in https://github.com/pola-rs/polars/pull/3816

docs: use quotes in pip install instruction by @thatlittleboy in https://github.com/pola-rs/polars/pull/3820

Improve parquet reading performance ~35-40% by @ritchie46 in https://github.com/pola-rs/polars/pull/3821

from anyvalue for small integers by @ritchie46 in https://github.com/pola-rs/polars/pull/3826

add date offset by @ritchie46 in https://github.com/pola-rs/polars/pull/3827

fix sorted unique by @ritchie46 in https://github.com/pola-rs/polars/pull/3837

fix ternary groupby agg_list/not_aggregated combination by @ritchie46 in https://github.com/pola-rs/polars/pull/3835

don't parallelize upsample by @ritchie46 in https://github.com/pola-rs/polars/pull/3836

python fix time divide by zero by @ritchie46 in https://github.com/pola-rs/polars/pull/3838

Improve map/apply docstrings by @braaannigan in https://github.com/pola-rs/polars/pull/3750

don't cache in-expression window functions by @ritchie46 in https://github.com/pola-rs/polars/pull/3840

Hypothesis testing framework integrations for Polars by @alexander-beedie in https://github.com/pola-rs/polars/pull/3842

docs: Improve expr.string documentation by @thatlittleboy in https://github.com/pola-rs/polars/pull/3841

make hypothesis optional and don't fail if not installed by @ritchie46 in https://github.com/pola-rs/polars/pull/3849

update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3848

python: fix time conversion by @ritchie46 in https://github.com/pola-rs/polars/pull/3851

Make frame/series asserts more resilient against integer overflow by @alexander-beedie in https://github.com/pola-rs/polars/pull/3850

parquet: allow writing smaller row groups by @ritchie46 in https://github.com/pola-rs/polars/pull/3852

python polars 0.13.51 by @ritchie46 in https://github.com/pola-rs/polars/pull/3854

allow branching null with struct dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3856

Address distinction between DataType and DataType() by @alexander-beedie in https://github.com/pola-rs/polars/pull/3857

Deprecate df/ldf argument to .join by @thomasaarholt in https://github.com/pola-rs/polars/pull/3855

null_probability functionality for dataframes/series test strategies. by @alexander-beedie in https://github.com/pola-rs/polars/pull/3860

Modern style type hints by @stinodego in https://github.com/pola-rs/polars/pull/3863

Concise empty class syntax by @stinodego in https://github.com/pola-rs/polars/pull/3864

fix groups after take expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3881

fix predicate pushdown in union + count expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3882

add join/union branch in window cache keys by @ritchie46 in https://github.com/pola-rs/polars/pull/3884

Fast/cheap empty clone ops by @alexander-beedie in https://github.com/pola-rs/polars/pull/3883

parquet read: fix remaining_rows counter by @ritchie46 in https://github.com/pola-rs/polars/pull/3887

Parquet writing: reduce heap allocs by @ritchie46 in https://github.com/pola-rs/polars/pull/3879

Negative-indexing support for additional functions, and frame-level take_every by @alexander-beedie in https://github.com/pola-rs/polars/pull/3888

Make numpy an optional requirement by @stinodego in https://github.com/pola-rs/polars/pull/3861

Address deprecation warnings while running pytest by @stinodego in https://github.com/pola-rs/polars/pull/3889

Fix reading of gzipped CSV files. Fixes: #3895 by @ghuls in https://github.com/pola-rs/polars/pull/3896

Relocate hypothesis unit tests to parallel tests_parametric dir by @alexander-beedie in https://github.com/pola-rs/polars/pull/3899

Assign dtypes to expected columns when dtypes is a list and column se… by @ghuls in https://github.com/pola-rs/polars/pull/3901

docs: fix link to series method in DataFrame by @duskmoon314 in https://github.com/pola-rs/polars/pull/3897

docs: Improve py-polars docs by @thatlittleboy in https://github.com/pola-rs/polars/pull/3873

Complete pythonic slice support (inc. negative indexing/stride) for DataFrame and Series by @alexander-beedie in https://github.com/pola-rs/polars/pull/3904

Update docstring outputs by @ghuls in https://github.com/pola-rs/polars/pull/3912

Make embedded CSV test strings easier to read. by @ghuls in https://github.com/pola-rs/polars/pull/3907

Quiet an unnecessary warning (tests), and minor optimisation for slices with negative stride by @alexander-beedie in https://github.com/pola-rs/polars/pull/3913

fix dataframe explode with empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3916

Implement pow/rpow for Series by @stinodego in https://github.com/pola-rs/polars/pull/3908

Fix Series __setitem__ and take by @stinodego in https://github.com/pola-rs/polars/pull/3910

fix negative offset in groupby_rolling by @ritchie46 in https://github.com/pola-rs/polars/pull/3918

make string formatting configurable by @ritchie46 in https://github.com/pola-rs/polars/pull/3919

Expr docstrings by @braaannigan in https://github.com/pola-rs/polars/pull/3871

parquet: parallelize over row groups ~3x by @ritchie46 in https://github.com/pola-rs/polars/pull/3924

Don't unwrap IPC Stream, instead use ? to not panic by @joshuataylor in https://github.com/pola-rs/polars/pull/3927

Corrected .select type hint to Sequence[str, Expr] by @thomasaarholt in https://github.com/pola-rs/polars/pull/3931

add impl from anyvalue for literal by @savente93 in https://github.com/pola-rs/polars/pull/3921

update arrow: ipc limit and reduce categorical-> dictionary bound checks by @ritchie46 in https://github.com/pola-rs/polars/pull/3926

fix window expression case by @ritchie46 in https://github.com/pola-rs/polars/pull/3937

fix oob panic on expand_at_index and series from pyarrow chunkedarray by @ritchie46 in https://github.com/pola-rs/polars/pull/3938

block equality/ordering based predicates on null producing joins by @ritchie46 in https://github.com/pola-rs/polars/pull/3939

Extended with_columns to allow **kwargs style named expressions by @alexander-beedie in https://github.com/pola-rs/polars/pull/3917

upcast float16 to float32 by @ritchie46 in https://github.com/pola-rs/polars/pull/3940

python: fix already mutable borrowed append by @ritchie46 in https://github.com/pola-rs/polars/pull/3943

Fixed assert_frame_equal and assert_series_equal for NaN values by @alexander-beedie in https://github.com/pola-rs/polars/pull/3941

Add from_numpy constructor by @stinodego in https://github.com/pola-rs/polars/pull/3944

Fix Pandas date_range warnings in tests by @zundertj in https://github.com/pola-rs/polars/pull/3945

fix ipc ordering by @ritchie46 in https://github.com/pola-rs/polars/pull/3947

Remove "import polars as pl" from docstrings by @zundertj in https://github.com/pola-rs/polars/pull/3948

[docs] improve python polars documentation by @thatlittleboy in https://github.com/pola-rs/polars/pull/3954

Modern style type hints for the test suite by @stinodego in https://github.com/pola-rs/polars/pull/3949

Fixed most See Also docstring formatting, quietened the last warnings coming from doctests by @alexander-beedie in https://github.com/pola-rs/polars/pull/3932

python: loosen truncate sorted restriction in docstring by @ritchie46 in https://github.com/pola-rs/polars/pull/3956

groupby apply: use inner type to infer dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3955

python polars 0.13.52 by @ritchie46 in https://github.com/pola-rs/polars/pull/3957

Fix pytest warning by @stinodego in https://github.com/pola-rs/polars/pull/3962

Update README.md by @cxtruong70 in https://github.com/pola-rs/polars/pull/3959

implicit datelike string comparison warning by @ritchie46 in https://github.com/pola-rs/polars/pull/3967

fix count union predicate by @ritchie46 in https://github.com/pola-rs/polars/pull/3969

docs: conventions, mwe and docstring fixes by @thatlittleboy in https://github.com/pola-rs/polars/pull/3973

Pythonic slice support for LazyFrame (efficient computation paths only) by @alexander-beedie in https://github.com/pola-rs/polars/pull/3970

add from_numpy to docs by @thatlittleboy in https://github.com/pola-rs/polars/pull/3976

use bitflags crate by @ritchie46 in https://github.com/pola-rs/polars/pull/3978

fix accidentally slow cross join by @ritchie46 in https://github.com/pola-rs/polars/pull/3980

ensure main lazyframe gets file cache opt state by @ritchie46 in https://github.com/pola-rs/polars/pull/3981

chore(tests): small readability fixes by @ryanrussell in https://github.com/pola-rs/polars/pull/3989

Remove unnecessary imports by @zundertj in https://github.com/pola-rs/polars/pull/3988

Add support for loading a collection of parquet files by @andrei-ionescu in https://github.com/pola-rs/polars/pull/3894

improve from dictionary -> categorical by @ritchie46 in https://github.com/pola-rs/polars/pull/3996

fix col aggregation schema and ternary on empty series by @ritchie46 in https://github.com/pola-rs/polars/pull/3995

release memory on 0% selectivity by @ritchie46 in https://github.com/pola-rs/polars/pull/4000

col(dtypes).exclude() by @ritchie46 in https://github.com/pola-rs/polars/pull/4001

fix explode offsets for empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4005

reduce peak memory of reading parquet by row groups ~-22% by @ritchie46 in https://github.com/pola-rs/polars/pull/4006

fix rolling groupby with negative windows by @ritchie46 in https://github.com/pola-rs/polars/pull/4010

fix: Lazyframe::from(lp) #3877 by @universalmind303 in https://github.com/pola-rs/polars/pull/4012

Date encode types by @ritchie46 in https://github.com/pola-rs/polars/pull/4013

csv: allow multiple null values by @ritchie46 in https://github.com/pola-rs/polars/pull/4016

python polars 0.13.53 by @ritchie46 in https://github.com/pola-rs/polars/pull/4017

Improve lazy state struct by @ritchie46 in https://github.com/pola-rs/polars/pull/4008

python: fix pyarrow imports by @ritchie46 in https://github.com/pola-rs/polars/pull/4025

fix lazy schema by @ritchie46 in https://github.com/pola-rs/polars/pull/4027

Align the exclude docstrings and annotation by @thatlittleboy in https://github.com/pola-rs/polars/pull/4020

docs: add mwe and internal links by @thatlittleboy in https://github.com/pola-rs/polars/pull/4019

impl explode for nested lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4028

allow joining on expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/4029

allow nulls last in sort by expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/4030

python polars 0.13.54 by @ritchie46 in https://github.com/pola-rs/polars/pull/4031

feat: implement contains for DataFrame and LazyFrame by @thatlittleboy in https://github.com/pola-rs/polars/pull/4035

Remove py-polars legacy package by @stinodego in https://github.com/pola-rs/polars/pull/4037

Native trigonometry functions by @stinodego in https://github.com/pola-rs/polars/pull/4034

parquet: stop reading when slice is reached by @ritchie46 in https://github.com/pola-rs/polars/pull/4046

fix cross join by @ritchie46 in https://github.com/pola-rs/polars/pull/4045

More trigonometry by @stinodego in https://github.com/pola-rs/polars/pull/4047

Update flake8 settings by @stinodego in https://github.com/pola-rs/polars/pull/4038

pivot: fix categorical logicaltype by @ritchie46 in https://github.com/pola-rs/polars/pull/4048

Update mypy settings by @stinodego in https://github.com/pola-rs/polars/pull/4049

fix: reproducible Expr.hash by @thatlittleboy in https://github.com/pola-rs/polars/pull/4033

Fix constructor orient type hint by @stinodego in https://github.com/pola-rs/polars/pull/3961

Improve coverage report settings by @stinodego in https://github.com/pola-rs/polars/pull/4039

Added literal param to string-replace functions, optimized replace performance in small-string regime (30-80% faster) by @alexander-beedie in https://github.com/pola-rs/polars/pull/4057

parquet: low memory arg by @ritchie46 in https://github.com/pola-rs/polars/pull/4050

Upgrade Windows 10 tests, benchmark and doc jobs to Python3.10 by @zundertj in https://github.com/pola-rs/polars/pull/4059

Revert "Upgrade Windows 10 tests, benchmark and doc jobs to Python3.10" by @ritchie46 in https://github.com/pola-rs/polars/pull/4062

fill_null expr: ensure minimal supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/4061

Fix connector-x integration for PostgreSQL by @valxv in https://github.com/pola-rs/polars/pull/4063

node updates by @universalmind303 in https://github.com/pola-rs/polars/pull/3984

python polars 0.13.55 by @ritchie46 in https://github.com/pola-rs/polars/pull/4064

Handle wrong input for orient argument by @stinodego in https://github.com/pola-rs/polars/pull/4065

Turn on doctests; fix wrong examples by @zundertj in https://github.com/pola-rs/polars/pull/4060

Mypy warn redundant casts by @zundertj in https://github.com/pola-rs/polars/pull/4055

Add mypy optional error codes by @stinodego in https://github.com/pola-rs/polars/pull/4054

recursively convert arrow logical types in to_arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/4067

improve unique performance by @ritchie46 in https://github.com/pola-rs/polars/pull/4070

Small formatting fixes by @stinodego in https://github.com/pola-rs/polars/pull/4071

[mypy] Add error codes by @stinodego in https://github.com/pola-rs/polars/pull/4072

reduce contention of global string cache: >4x performance improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/4078

Add lazy() method to LazyFrame by @zundertj in https://github.com/pola-rs/polars/pull/4077

[flake8] Enable flake8-bugbear extension by @stinodego in https://github.com/pola-rs/polars/pull/4073

csv: allow reading with different eol character by @ritchie46 in https://github.com/pola-rs/polars/pull/4080

docs: rework some MWE and minor formatting fixes by @thatlittleboy in https://github.com/pola-rs/polars/pull/4082

Upgrade maturin to 0.13.0 by @messense in https://github.com/pola-rs/polars/pull/4086

dataframe display: use POLARS_FMT_STR_LEN by @ritchie46 in https://github.com/pola-rs/polars/pull/4088

don't allow comparing local categoricals by @ritchie46 in https://github.com/pola-rs/polars/pull/4087

implement list hash for simply nested lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4090

improve error on missing column access by @ritchie46 in https://github.com/pola-rs/polars/pull/4095

value_counts add sorted argument by @ritchie46 in https://github.com/pola-rs/polars/pull/4094

from_rows improve schema correctness by @ritchie46 in https://github.com/pola-rs/polars/pull/4097

Cache length of ChunkedArray. by @ritchie46 in https://github.com/pola-rs/polars/pull/4105

fix explode with empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/4113

fix so rank by @ritchie46 in https://github.com/pola-rs/polars/pull/4114

fix explode for sliced arrays by @ritchie46 in https://github.com/pola-rs/polars/pull/4115

python: to_numpy use first type as supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/4116

python: remove css line for vscode by @ritchie46 in https://github.com/pola-rs/polars/pull/4117

Remove read_excel hacks by @cnpryer in https://github.com/pola-rs/polars/pull/4081

python allow set by string by @ritchie46 in https://github.com/pola-rs/polars/pull/4118

fill_nan preserve name by @ritchie46 in https://github.com/pola-rs/polars/pull/4119

Fix prefix/suffix docstrings. by @ghuls in https://github.com/pola-rs/polars/pull/4122

allow summing of duration in selection context by @ritchie46 in https://github.com/pola-rs/polars/pull/4124

python: improve setitem by @ritchie46 in https://github.com/pola-rs/polars/pull/4121

python polars 0.13.56 by @ritchie46 in https://github.com/pola-rs/polars/pull/4127

Assert deprecation warning on DataFrame.setitem in tests by @zundertj in https://github.com/pola-rs/polars/pull/4126

Run PR workflows on definition changes by @zundertj in https://github.com/pola-rs/polars/pull/4125

fix 'fatal: unsafe repository' in python build by @ritchie46 in https://github.com/pola-rs/polars/pull/4129

Nested dict by @ritchie46 in https://github.com/pola-rs/polars/pull/4131

improve performance of building global string cache from arrow dictio… by @ritchie46 in https://github.com/pola-rs/polars/pull/4132

csv writer quote if string contains new line char by @ritchie46 in https://github.com/pola-rs/polars/pull/4134

fix explode edge cases by @ritchie46 in https://github.com/pola-rs/polars/pull/4133

add pl.cut utility by @ritchie46 in https://github.com/pola-rs/polars/pull/4137

python polars 0.13.57 by @ritchie46 in https://github.com/pola-rs/polars/pull/4141

Mypy disallow untyped calls by @ritchie46 in https://github.com/pola-rs/polars/pull/4140

Improve re-raises of Exceptions by @zundertj in https://github.com/pola-rs/polars/pull/4142

pivot fix categorical index by @ritchie46 in https://github.com/pola-rs/polars/pull/4149

Fix typo by @stinodego in https://github.com/pola-rs/polars/pull/4146

Wrap long strings by @stinodego in https://github.com/pola-rs/polars/pull/4144

Fix Python line lengths to 88 characters by @stinodego in https://github.com/pola-rs/polars/pull/4152

add is_in for categoricals by @ritchie46 in https://github.com/pola-rs/polars/pull/4153

python 0.13.58 by @ritchie46 in https://github.com/pola-rs/polars/pull/4154

Docstring lints & improvements by @stinodego in https://github.com/pola-rs/polars/pull/4155

pivot: fix logical type of multiple indexes by @ritchie46 in https://github.com/pola-rs/polars/pull/4159

more tests by @ritchie46 in https://github.com/pola-rs/polars/pull/4163

Use latest arrow2 to support latest nightly rust by @gyscos in https://github.com/pola-rs/polars/pull/4162

Fix invalid inputs for trigonometric functions by @stinodego in https://github.com/pola-rs/polars/pull/4164

update schema in udfs by @ritchie46 in https://github.com/pola-rs/polars/pull/4165

python: expose idx type by @ritchie46 in https://github.com/pola-rs/polars/pull/4167

Improve getitem for Dataframe/Series. by @ghuls in https://github.com/pola-rs/polars/pull/4160

Dataframe equality by @stinodego in https://github.com/pola-rs/polars/pull/4076

Docstring improvements & enable lints by @stinodego in https://github.com/pola-rs/polars/pull/4161

Native implementation of the sign function by @stinodego in https://github.com/pola-rs/polars/pull/4147

Minor docs updates by @stinodego in https://github.com/pola-rs/polars/pull/4173

Validation for groupby arguments by @stinodego in https://github.com/pola-rs/polars/pull/4176

update arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/4177

throw error on schema failure by @ritchie46 in https://github.com/pola-rs/polars/pull/4178

with_columns update on duplicates by @ritchie46 in https://github.com/pola-rs/polars/pull/4179

fold regex expand by @ritchie46 in https://github.com/pola-rs/polars/pull/4181

python: prefer pyarrow when we can memory map the file by @ritchie46 in https://github.com/pola-rs/polars/pull/4182

window functions: sort cached groups if needed by @ritchie46 in https://github.com/pola-rs/polars/pull/4184

reduce supertype match by calling twice/ allow Some(tz)/None supertype by @ritchie46 in https://github.com/pola-rs/polars/pull/4186

Added const empty initializer to DataFrame by @TheDan64 in https://github.com/pola-rs/polars/pull/4187

fix utf8 explode for nulls and empty strings by @ritchie46 in https://github.com/pola-rs/polars/pull/4189

type-coercion: ignore unknown until replaced by @ritchie46 in https://github.com/pola-rs/polars/pull/4192

python: always use stdlib http reader and improve memmap ipc reader a… by @ritchie46 in https://github.com/pola-rs/polars/pull/4193

slice pushdown for cross joins by @ritchie46 in https://github.com/pola-rs/polars/pull/4194

csv: ignore quoted lines in skip lines by @ritchie46 in https://github.com/pola-rs/polars/pull/4191

Small fixes in type formatting by @stinodego in https://github.com/pola-rs/polars/pull/4195

use native ndjson reader by @ritchie46 in https://github.com/pola-rs/polars/pull/4196

python polars: 0.13.59 by @ritchie46 in https://github.com/pola-rs/polars/pull/4198

Miscellaneous improvements by @matteosantama in https://github.com/pola-rs/polars/pull/4203

Add flake8 extension: comprehensions by @stinodego in https://github.com/pola-rs/polars/pull/4200

Add flake8 extension: simplify by @stinodego in https://github.com/pola-rs/polars/pull/4201

don't use pyarrow read if we have categoricals in the schema by @ritchie46 in https://github.com/pola-rs/polars/pull/4205

python: don't lock gil in arr.contains by @ritchie46 in https://github.com/pola-rs/polars/pull/4210

fix nested struct append by @ritchie46 in https://github.com/pola-rs/polars/pull/4217

use default context for col upstream col expression type by @ritchie46 in https://github.com/pola-rs/polars/pull/4219

ensure weekday starts at 0 by @ritchie46 in https://github.com/pola-rs/polars/pull/4220

python datetime consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/4221

python: improve error by @ritchie46 in https://github.com/pola-rs/polars/pull/4223

Upgrade black, blackdoc, mypy, flake8 by @matteosantama in https://github.com/pola-rs/polars/pull/4209

python: ensure utf8 encoding when writing dot file by @ritchie46 in https://github.com/pola-rs/polars/pull/4225

convert arrow map to list by @ritchie46 in https://github.com/pola-rs/polars/pull/4226

fast path for sorted min/max by @ritchie46 in https://github.com/pola-rs/polars/pull/4228

Set no_implicit_reexport = true in pyproject.toml by @matteosantama in https://github.com/pola-rs/polars/pull/4211

fix and improve rolling_skew by @ritchie46 in https://github.com/pola-rs/polars/pull/4232

ternary expr: validate predicate in groupby context by @ritchie46 in https://github.com/pola-rs/polars/pull/4237

Overload pl.from_arrow type hints by @matteosantama in https://github.com/pola-rs/polars/pull/4236

python: allow horizontal expanding sum by @ritchie46 in https://github.com/pola-rs/polars/pull/4242

improve strictness/consistency of when then otherwise by @ritchie46 in https://github.com/pola-rs/polars/pull/4241

reinstate old ternary behavior as experimental by @ritchie46 in https://github.com/pola-rs/polars/pull/4244

correct dtype for power by @ritchie46 in https://github.com/pola-rs/polars/pull/4246

csv: improve data/datetime/bool overwrite by @ritchie46 in https://github.com/pola-rs/polars/pull/4247

Release rust 0.23.0 by @ritchie46 in https://github.com/pola-rs/polars/pull/4248

New Contributors

@GregoryBL made their first contribution in https://github.com/pola-rs/polars/pull/3566

@gyscos made their first contribution in https://github.com/pola-rs/polars/pull/3674

@LVG77 made their first contribution in https://github.com/pola-rs/polars/pull/3691

@gunjunlee made their first contribution in https://github.com/pola-rs/polars/pull/3671

@saethlin made their first contribution in https://github.com/pola-rs/polars/pull/3737

@joshuataylor made their first contribution in https://github.com/pola-rs/polars/pull/3793

@Smittyvb made their first contribution in https://github.com/pola-rs/polars/pull/3804

@thatlittleboy made their first contribution in https://github.com/pola-rs/polars/pull/3811

@braaannigan made their first contribution in https://github.com/pola-rs/polars/pull/3750

@thomasaarholt made their first contribution in https://github.com/pola-rs/polars/pull/3855

@duskmoon314 made their first contribution in https://github.com/pola-rs/polars/pull/3897

@savente93 made their first contribution in https://github.com/pola-rs/polars/pull/3921

@cxtruong70 made their first contribution in https://github.com/pola-rs/polars/pull/3959

@andrei-ionescu made their first contribution in https://github.com/pola-rs/polars/pull/3894

@valxv made their first contribution in https://github.com/pola-rs/polars/pull/4063

@matteosantama made their first contribution in https://github.com/pola-rs/polars/pull/4203

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.22.1...rust-polars-v0.23.0
rust-polars-v0.22.1(Jun 6, 2022)
What's Changed

partial support for list arithmetic by @ritchie46 in https://github.com/pola-rs/polars/pull/3307

shuffle sample option by @ritchie46 in https://github.com/pola-rs/polars/pull/3308

improve predicate pushdown by @ritchie46 in https://github.com/pola-rs/polars/pull/3313

Improve partitioned agg by @ritchie46 in https://github.com/pola-rs/polars/pull/3314

list to struct by @ritchie46 in https://github.com/pola-rs/polars/pull/3317

oncecell in favor of lazy_static by @ritchie46 in https://github.com/pola-rs/polars/pull/3319

Update cummax documentation by @briandk in https://github.com/pola-rs/polars/pull/3323

scan pyarrow dataset by @ritchie46 in https://github.com/pola-rs/polars/pull/3327

fix panic in csv parser by @ritchie46 in https://github.com/pola-rs/polars/pull/3339

implement anyvalue -> datatype for all variants by @ritchie46 in https://github.com/pola-rs/polars/pull/3340

remove badge by @ritchie46 in https://github.com/pola-rs/polars/pull/3341

Added PartitionedWriter for disk partitioning. by @illumination-k in https://github.com/pola-rs/polars/pull/3331

Fast json by @universalmind303 in https://github.com/pola-rs/polars/pull/3324

add hash to rust expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3350

serde for group options by @elferherrera in https://github.com/pola-rs/polars/pull/3349

Check if length of index in pivot operation is non-zero. Fixes: #3343. by @ghuls in https://github.com/pola-rs/polars/pull/3346

improve agg_list performance of chunked numerical data by @ritchie46 in https://github.com/pola-rs/polars/pull/3351

Fix init of DataFrame with empty dataset (eg:"[]") and column/schema typedefs by @alexander-beedie in https://github.com/pola-rs/polars/pull/3353

rechunk on default sort and groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3354

more partitioned groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3355

Add extension_module in python example by @Maxyme in https://github.com/pola-rs/polars/pull/3358

allow join on same cat source by @ritchie46 in https://github.com/pola-rs/polars/pull/3363

fix rename same name by @ritchie46 in https://github.com/pola-rs/polars/pull/3364

initial timezone support by @ritchie46 in https://github.com/pola-rs/polars/pull/3357

pivot index maintain logical type by @ritchie46 in https://github.com/pola-rs/polars/pull/3367

use array_ref in favor of chunks by @ritchie46 in https://github.com/pola-rs/polars/pull/3368

entropy normalization arg by @ritchie46 in https://github.com/pola-rs/polars/pull/3369

categorical keep type in comparison by @ritchie46 in https://github.com/pola-rs/polars/pull/3370

rechunk in asof and allow concat to empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3376

improve overflow of numeric mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3377

fix parquet stats by @ritchie46 in https://github.com/pola-rs/polars/pull/3378

delay rechunk optimization by @ritchie46 in https://github.com/pola-rs/polars/pull/3381

Allow Z in native strptime by @ritchie46 in https://github.com/pola-rs/polars/pull/3382

more partitioned aggregators by @ritchie46 in https://github.com/pola-rs/polars/pull/3385

improve partition_by by @ritchie46 in https://github.com/pola-rs/polars/pull/3386

Add overload support to partition_by. by @ghuls in https://github.com/pola-rs/polars/pull/3388

Check if some arguments for read_csv and scan_csv got a 1 byte input. by @ghuls in https://github.com/pola-rs/polars/pull/3389

fix rayon SO in partition_by by @ritchie46 in https://github.com/pola-rs/polars/pull/3391

fix bug in predicate pushdown on dependent predicates by @ritchie46 in https://github.com/pola-rs/polars/pull/3394

fix predicate pushdown for predicates that do aggregations by @ritchie46 in https://github.com/pola-rs/polars/pull/3396

cumulative_eval by @ritchie46 in https://github.com/pola-rs/polars/pull/3400

ensure that Cast expressions first updates groups before it flattens by @ritchie46 in https://github.com/pola-rs/polars/pull/3401

improve and simplify ternary aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/3403

fix explode empty df by @ritchie46 in https://github.com/pola-rs/polars/pull/3405

Improve list builders, iteration and construction by @ritchie46 in https://github.com/pola-rs/polars/pull/3419

feature gate timezones by @ritchie46 in https://github.com/pola-rs/polars/pull/3422

fix cumulative_eval on window expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3421

csv allow only header and fix lazy rename by @ritchie46 in https://github.com/pola-rs/polars/pull/3423

upgrade arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3425

infer dtype of empty list in recursive list construction & fix struct.arr take by @ritchie46 in https://github.com/pola-rs/polars/pull/3433

fix struct list concat by @ritchie46 in https://github.com/pola-rs/polars/pull/3435

csv parser fallback on chrono if datetime pattern fails by @ritchie46 in https://github.com/pola-rs/polars/pull/3436

improve rolling_quantile kernel (no nulls) ~28x by @ritchie46 in https://github.com/pola-rs/polars/pull/3437

improve rolling_{min/max/sum/mean} performance ~3.4x by @ritchie46 in https://github.com/pola-rs/polars/pull/3444

struct add chunk and impl reverse by @ritchie46 in https://github.com/pola-rs/polars/pull/3445

fix struct equality by @ritchie46 in https://github.com/pola-rs/polars/pull/3446

Struct error on different dict orders by @ritchie46 in https://github.com/pola-rs/polars/pull/3447

Inherit Exception in fallback exception classes by @adamgreg in https://github.com/pola-rs/polars/pull/3450

Struct creations/append/extend stricter schema by @ritchie46 in https://github.com/pola-rs/polars/pull/3454

don't allow predicate pushdown if compared column is being coerced by @ritchie46 in https://github.com/pola-rs/polars/pull/3457

improve rolling_min/max for columns with null values by @ritchie46 in https://github.com/pola-rs/polars/pull/3458

Improve rolling_sum/rolling_mean for windows with null values. by @ritchie46 in https://github.com/pola-rs/polars/pull/3466

explode series after slide fast path by @ritchie46 in https://github.com/pola-rs/polars/pull/3467

Improve struct by @ritchie46 in https://github.com/pola-rs/polars/pull/3468

improve rolling_var performance by @ritchie46 in https://github.com/pola-rs/polars/pull/3470

power by expression and improve rust lazy ergonomics by @ritchie46 in https://github.com/pola-rs/polars/pull/3475

add specialized rolling_std kernel by @ritchie46 in https://github.com/pola-rs/polars/pull/3476

fix null commutativity by @ritchie46 in https://github.com/pola-rs/polars/pull/3479

use anyvalue if first apply list result is empty by @ritchie46 in https://github.com/pola-rs/polars/pull/3480

Added describe method to rust library by @glennpierce in https://github.com/pola-rs/polars/pull/3320

Groupby Optimization for sorted keys: ~15x perf gain. by @ritchie46 in https://github.com/pola-rs/polars/pull/3489

make cat merge fallible and loosen restrictions on categorical appends by @ritchie46 in https://github.com/pola-rs/polars/pull/3491

Fix LazyFrame.join_asof documentation reference by @adamgreg in https://github.com/pola-rs/polars/pull/3493

feat: support pl.Time in Series.str.strptime by @fsimkovic in https://github.com/pola-rs/polars/pull/3496

str().extract_all / str().count_match by @ritchie46 in https://github.com/pola-rs/polars/pull/3507

add apply to cookbooks by @ritchie46 in https://github.com/pola-rs/polars/pull/3504

support all arrow dictionary keys < 64 bit by @ritchie46 in https://github.com/pola-rs/polars/pull/3508

fix accidental quadratic behavior in rolling_groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3510

Fix some unit test deprecation warnings by @adamgreg in https://github.com/pola-rs/polars/pull/3503

Experimental Allow rolling_<agg> expressions to determine window size by another {Date, Datetime} series. by @ritchie46 in https://github.com/pola-rs/polars/pull/3514

use specialized kernels in rolling_groupby aggregation ~10x perf gain (window of 100 elements) by @ritchie46 in https://github.com/pola-rs/polars/pull/3515

reduce probability of quadratic behavior in min/max rolling by @ritchie46 in https://github.com/pola-rs/polars/pull/3516

adjust for kleene logic in drop_na by @ritchie46 in https://github.com/pola-rs/polars/pull/3529

fix aggregation of empty list by @ritchie46 in https://github.com/pola-rs/polars/pull/3527

fix sorting of chunked numeric arrays by @ritchie46 in https://github.com/pola-rs/polars/pull/3528

adjust for kleene logic in drop_na by @ritchie46 in https://github.com/pola-rs/polars/pull/3530

Improve rolling min max by @ritchie46 in https://github.com/pola-rs/polars/pull/3531

fix null aggregation edge case by @ritchie46 in https://github.com/pola-rs/polars/pull/3536

allow concat/append expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3541

make sort by multiple columns parallel by @ritchie46 in https://github.com/pola-rs/polars/pull/3549

allow more aggregations on dtype duration by @ritchie46 in https://github.com/pola-rs/polars/pull/3550

use first series to validate length by @ritchie46 in https://github.com/pola-rs/polars/pull/3551

Raise a more helpful TypeError when trying to subscript a LazyFrame. by @ghuls in https://github.com/pola-rs/polars/pull/3554

Readability Fixes r2 by @ryanrussell in https://github.com/pola-rs/polars/pull/3556

add count_match, extract_all to python ref guide by @ritchie46 in https://github.com/pola-rs/polars/pull/3558

fill_null limits by @ritchie46 in https://github.com/pola-rs/polars/pull/3559

test sortedness propagation by @ritchie46 in https://github.com/pola-rs/polars/pull/3560

update boolean aggregates and ensure they return IdxSize by @ritchie46 in https://github.com/pola-rs/polars/pull/3563

Improve parse_lines error message. by @ghuls in https://github.com/pola-rs/polars/pull/3569

sorted_merge_join by @ritchie46 in https://github.com/pola-rs/polars/pull/3505

Rust Readability Improvements by @ryanrussell in https://github.com/pola-rs/polars/pull/3573

fix invalid fast path of sorted joins and improve sortedness propagation by @ritchie46 in https://github.com/pola-rs/polars/pull/3577

prevent expensive type coercion in expression and fix when->then->oth… by @ritchie46 in https://github.com/pola-rs/polars/pull/3579

Updated the fmt feature flag error message by @TheDan64 in https://github.com/pola-rs/polars/pull/3586

Fix u16 Series formatting. by @ghuls in https://github.com/pola-rs/polars/pull/3584

update arrow to crates.io: ~2x json parsing improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/3588

New Contributors

@kianmeng made their first contribution in https://github.com/pola-rs/polars/pull/3311

@briandk made their first contribution in https://github.com/pola-rs/polars/pull/3323

@EwoutH made their first contribution in https://github.com/pola-rs/polars/pull/3352

@adamgreg made their first contribution in https://github.com/pola-rs/polars/pull/3450

@ryanrussell made their first contribution in https://github.com/pola-rs/polars/pull/3488

@fsimkovic made their first contribution in https://github.com/pola-rs/polars/pull/3496

@chitralverma made their first contribution in https://github.com/pola-rs/polars/pull/3578

@TheDan64 made their first contribution in https://github.com/pola-rs/polars/pull/3586

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.21.1...rust-polars-v0.22.1
rust-polars-v0.21.1(Jun 6, 2022)
What's Changed

Remove crate num_cpus from polars by @dandxy89 in https://github.com/pola-rs/polars/pull/2890

temporarily pin crossbeam-epoch by @ritchie46 in https://github.com/pola-rs/polars/pull/2902

fix unique and drop by @ritchie46 in https://github.com/pola-rs/polars/pull/2908

fix explode of empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/2910

fix function input expansion by @ritchie46 in https://github.com/pola-rs/polars/pull/2913

fix compilation lazy + string by @ritchie46 in https://github.com/pola-rs/polars/pull/2914

respect dtype overwrite when schema is overwritten in lazy csv scanner by @ritchie46 in https://github.com/pola-rs/polars/pull/2915

deprecate to_ and string cache in lazy by @ritchie46 in https://github.com/pola-rs/polars/pull/2916

Refactor: move most temporal related code to polars-time. by @ritchie46 in https://github.com/pola-rs/polars/pull/2918

improve datetime inference by @ritchie46 in https://github.com/pola-rs/polars/pull/2923

rename distinct to unique by @ritchie46 in https://github.com/pola-rs/polars/pull/2926

fix some warning by @ritchie46 in https://github.com/pola-rs/polars/pull/2927

improve date/datetime inference by @ritchie46 in https://github.com/pola-rs/polars/pull/2925

fix fill_nan dtypes by @ritchie46 in https://github.com/pola-rs/polars/pull/2933

fix future calculation in groupby dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/2935

add tolerance to asof + by by @ritchie46 in https://github.com/pola-rs/polars/pull/2937

fix(scan_csv): handle empty csv file exception by @LuisCardosoOliveira in https://github.com/pola-rs/polars/pull/2934

handle Utf8Owned AnyValue for DataType by @cigrainger in https://github.com/pola-rs/polars/pull/2944

Fix argsort by @ritchie46 in https://github.com/pola-rs/polars/pull/2946

value_counts and unique_counts expression by @ritchie46 in https://github.com/pola-rs/polars/pull/2947

use schema in 'with_columns' to amortize lookups and fix bug in emptr… by @ritchie46 in https://github.com/pola-rs/polars/pull/2949

add native log and entropy expression by @ritchie46 in https://github.com/pola-rs/polars/pull/2952

csv parsing: skip whitespace on failed parse by @ritchie46 in https://github.com/pola-rs/polars/pull/2953

Literal in groupby context, arange and repeat by @ritchie46 in https://github.com/pola-rs/polars/pull/2958

Huge perf improvement of many expressions and ListChunked::from_iter perf by @ritchie46 in https://github.com/pola-rs/polars/pull/2962

update groups in count() agg and correctly update state by @ritchie46 in https://github.com/pola-rs/polars/pull/2963

add sign by @ritchie46 in https://github.com/pola-rs/polars/pull/2977

see kurtosis as aggregation by @ritchie46 in https://github.com/pola-rs/polars/pull/2993

fix groups state after apply by @ritchie46 in https://github.com/pola-rs/polars/pull/2992

Home directory support by @cjermain in https://github.com/pola-rs/polars/pull/2940

make sure that sort does not index empty list by @ritchie46 in https://github.com/pola-rs/polars/pull/2996

python: improve arithmetic consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/3001

python: add apply on struct dtype by @ritchie46 in https://github.com/pola-rs/polars/pull/3003

fix null in non-fast-explode explode of numeric arrays by @ritchie46 in https://github.com/pola-rs/polars/pull/3006

also expand rename in filters by @ritchie46 in https://github.com/pola-rs/polars/pull/3008

fix when then with literal by @ritchie46 in https://github.com/pola-rs/polars/pull/3009

fix groups update to match exploded offsets by @ritchie46 in https://github.com/pola-rs/polars/pull/3010

add duration expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3017

allow nested groupby in groupby_rolling by @ritchie46 in https://github.com/pola-rs/polars/pull/3018

Fix read_parquet with list having nested struct by @cjermain in https://github.com/pola-rs/polars/pull/2991

fix outer join schema by @ritchie46 in https://github.com/pola-rs/polars/pull/3021

lazy: fix drop all by @ritchie46 in https://github.com/pola-rs/polars/pull/3023

fix schemas of groupby rolling/dynamic by @ritchie46 in https://github.com/pola-rs/polars/pull/3028

fix div by zero by @ritchie46 in https://github.com/pola-rs/polars/pull/3031

fix incorrect match in agg_mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3030

check alias in whole expr on opt by @ritchie46 in https://github.com/pola-rs/polars/pull/3032

align groups in binary when they do not align by @ritchie46 in https://github.com/pola-rs/polars/pull/3033

only expand function inputs if wildcard expansion allows it by @ritchie46 in https://github.com/pola-rs/polars/pull/3039

fix when_then_chain containing nulls by @ritchie46 in https://github.com/pola-rs/polars/pull/3040

fixed typo in format_path docstring by @cnpryer in https://github.com/pola-rs/polars/pull/3045

fix when-then-chain by @ritchie46 in https://github.com/pola-rs/polars/pull/3048

throw error on empty keyed groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3049

compare expand_cols by variant not exact datatype by @ritchie46 in https://github.com/pola-rs/polars/pull/3050

dot: use apply instead of map by @ritchie46 in https://github.com/pola-rs/polars/pull/3051

check output length of all 'map' expressions by @ritchie46 in https://github.com/pola-rs/polars/pull/3052

error on invalid asof_join by input by @ritchie46 in https://github.com/pola-rs/polars/pull/3053

improve performance of asof_join by equal or more than 2 keys by @ritchie46 in https://github.com/pola-rs/polars/pull/3055

remove unneeded expensive assert by @ritchie46 in https://github.com/pola-rs/polars/pull/3069

improve boolean null comparisons consistency by @ritchie46 in https://github.com/pola-rs/polars/pull/3068

fix entropy by @ritchie46 in https://github.com/pola-rs/polars/pull/3070

fix explode empty lists by @ritchie46 in https://github.com/pola-rs/polars/pull/3083

Lazy: update schema in explode op by @ritchie46 in https://github.com/pola-rs/polars/pull/3084

CSV datetime inference 3x performance improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/2950

[polars-sql] Adding SQL Context, SELECT and GROUP BY by @potter420 in https://github.com/pola-rs/polars/pull/3024

Default sample n param to 1 by @cnpryer in https://github.com/pola-rs/polars/pull/3090

Expose 'rechunk' param from "read_ipc" for consistency (default behaviour unchanged) by @alexander-beedie in https://github.com/pola-rs/polars/pull/3088

Add optional seeding for sampling by @cnpryer in https://github.com/pola-rs/polars/pull/3080

default to native strptime by @ritchie46 in https://github.com/pola-rs/polars/pull/3093

Raise error in sample() if n and frac are both passed by @cnpryer in https://github.com/pola-rs/polars/pull/3091

split up planner by @ritchie46 in https://github.com/pola-rs/polars/pull/3095

add test for #3097 by @ritchie46 in https://github.com/pola-rs/polars/pull/3098

Initial support for serde/pickling expressions. by @ritchie46 in https://github.com/pola-rs/polars/pull/3096

Adding nested struct support by fixing ArrayRef determination by @cjermain in https://github.com/pola-rs/polars/pull/3103

Enhanced columns param for DataFrame init, additionally allowing for inline type specification by @alexander-beedie in https://github.com/pola-rs/polars/pull/3100

Improve rolling agg by @ritchie46 in https://github.com/pola-rs/polars/pull/3101

add estimate_size methods by @ritchie46 in https://github.com/pola-rs/polars/pull/3110

fix and test estimated_size by @ritchie46 in https://github.com/pola-rs/polars/pull/3113

remove unused datafusion integration by @ritchie46 in https://github.com/pola-rs/polars/pull/3115

Nodejs writejson fix & avro read/write by @universalmind303 in https://github.com/pola-rs/polars/pull/3116

Parquet statistics: don't panic by @ritchie46 in https://github.com/pola-rs/polars/pull/3127

lazy: expand cols in filter by @ritchie46 in https://github.com/pola-rs/polars/pull/3128

melt extra arguments by @ritchie46 in https://github.com/pola-rs/polars/pull/3133

Lazy: Don't materialize whole table in JOIN followed by SLICE by @ritchie46 in https://github.com/pola-rs/polars/pull/3136

Pushdown SLICE to GROUPBY nodes by @ritchie46 in https://github.com/pola-rs/polars/pull/3138

Switch from unmaintained jemallocator to maintained tikv-jemallocator. by @ghuls in https://github.com/pola-rs/polars/pull/3141

Polars vs Pivot: Round 3 🥊 ~2-25x improvement by @ritchie46 in https://github.com/pola-rs/polars/pull/3143

DataFrame::partition_by by @ritchie46 in https://github.com/pola-rs/polars/pull/3148

Add semi and anti joins. by @ritchie46 in https://github.com/pola-rs/polars/pull/3149

derive clone for lazy groupby by @elferherrera in https://github.com/pola-rs/polars/pull/3156

pushdown slice to sort nodes by @ritchie46 in https://github.com/pola-rs/polars/pull/3159

slice_pushdown projections by @ritchie46 in https://github.com/pola-rs/polars/pull/3160

lazy err on not found col by @ritchie46 in https://github.com/pola-rs/polars/pull/3169

improve inner join performance by @ritchie46 in https://github.com/pola-rs/polars/pull/3168

fix duration filters with different time units by @marcvanheerden in https://github.com/pola-rs/polars/pull/3179

fix overflow in agg_mean by @ritchie46 in https://github.com/pola-rs/polars/pull/3183

list eval expression by @ritchie46 in https://github.com/pola-rs/polars/pull/3185

Supporting Struct comparison and any/all API by @cjermain in https://github.com/pola-rs/polars/pull/3180

struct logical type arrow conversion by @ritchie46 in https://github.com/pola-rs/polars/pull/3193

make series comparisons fallible by @ritchie46 in https://github.com/pola-rs/polars/pull/3192

fix_pivot by @ritchie46 in https://github.com/pola-rs/polars/pull/3199

recursively convert arrow by @ritchie46 in https://github.com/pola-rs/polars/pull/3200

fix arr.eval type inference by @ritchie46 in https://github.com/pola-rs/polars/pull/3203

Improve Left join on chunked data by @ritchie46 in https://github.com/pola-rs/polars/pull/3177

polars-ops by @ritchie46 in https://github.com/pola-rs/polars/pull/3212

Fix tree traversal complexity by @ritchie46 in https://github.com/pola-rs/polars/pull/3213

Adding struct column tests by @ishmandoo in https://github.com/pola-rs/polars/pull/3209

struct: handle validity by @ritchie46 in https://github.com/pola-rs/polars/pull/3217

bug template bounce resolved bugs by @ritchie46 in https://github.com/pola-rs/polars/pull/3218

add duration minutes by @ritchie46 in https://github.com/pola-rs/polars/pull/3219

fix partition boundary by @ritchie46 in https://github.com/pola-rs/polars/pull/3223

Option to check column order when comparing polars dataframes by @physinet in https://github.com/pola-rs/polars/pull/3206

fix dispatch of quantile aggregations by @ritchie46 in https://github.com/pola-rs/polars/pull/3234

Improving array refs for to_list by @cjermain in https://github.com/pola-rs/polars/pull/3231

fix offsets in categorical merge by @ritchie46 in https://github.com/pola-rs/polars/pull/3242

Serialize/Deserialize LazyFrames/Logical plans by @ritchie46 in https://github.com/pola-rs/polars/pull/3244

setup serializable function + null_count expr by @ritchie46 in https://github.com/pola-rs/polars/pull/3247

improve ternary in groupby context by @ritchie46 in https://github.com/pola-rs/polars/pull/3248

fix skew autoexplode and add test by @marcvanheerden in https://github.com/pola-rs/polars/pull/3251

quantile agg; update grouptuples by @ritchie46 in https://github.com/pola-rs/polars/pull/3252

Only pass dtype to array, if not None: Fixes #3253 by @ghuls in https://github.com/pola-rs/polars/pull/3257

polars 0.21.0 by @ritchie46 in https://github.com/pola-rs/polars/pull/3258

do not write empty chunk to parquet by @ritchie46 in https://github.com/pola-rs/polars/pull/3259

Improve partitioned groupby by @ritchie46 in https://github.com/pola-rs/polars/pull/3263

improve sample_perf by @ritchie46 in https://github.com/pola-rs/polars/pull/3264

add iso strptime patterns by @ritchie46 in https://github.com/pola-rs/polars/pull/3265

add partial decompression in read_csv by @ritchie46 in https://github.com/pola-rs/polars/pull/3268

fix partitioned and error don't ignore errors by @ritchie46 in https://github.com/pola-rs/polars/pull/3273

fix row count for u64 idx by @ritchie46 in https://github.com/pola-rs/polars/pull/3285

Code coverage for Rust/Python by @cjermain in https://github.com/pola-rs/polars/pull/3278

Improve groupby states by @ritchie46 in https://github.com/pola-rs/polars/pull/3291

recursive list builder in rows by @ritchie46 in https://github.com/pola-rs/polars/pull/3293

Fix ipc_read_schema so Path() and filename which start with "~/" work. by @ghuls in https://github.com/pola-rs/polars/pull/3297

New Contributors

@LuisCardosoOliveira made their first contribution in https://github.com/pola-rs/polars/pull/2934

@keiv-fly made their first contribution in https://github.com/pola-rs/polars/pull/2930

@cigrainger made their first contribution in https://github.com/pola-rs/polars/pull/2944

@slonik-az made their first contribution in https://github.com/pola-rs/polars/pull/3124

@physinet made their first contribution in https://github.com/pola-rs/polars/pull/3215

Full Changelog: https://github.com/pola-rs/polars/compare/rust-polars-v0.20.0...rust-polars-v0.21.
rust-polars-v0.20.0 (Mar 14, 2022)
New rust polars release! :rocket:

This release of 286 commits is here thanks to the contributions of (in no specific order):

@moritzwilksch

@JakobGM

@illumination-k

@tamasfe

@ghuls

@alexander-beedie

@Maxyme

@universalmind303

@qiemem

@glennpierce

@nmandery

@ilsley

@marcvanheerden

Did I forget your contribution? Please ping me; I do this manually :see_no_evil:

Full changelog

crates.io

documentation

Most notable changes are:

Many bug fixes.

Many performance improvements.

features

Made representation of groups tuples more cache friendly #2431

Remove Seek requirement of readers

Add groupby_rolling as new entrance to expression API.

Improved the CSV parser's stability and performance in several places

Horizontal aggregations are parallelized #2454

Reduce pivot code bloat and improve performance #2458

Struct data type added.

Extend methods that allow modification of the same memory if the Arc reference count is 1 (i.e. Arc::strong_count == 1)

Avro readers and writers.

Improved rules of window expressions.

Support for the microsecond (us) time unit.

Parquet: use statistics in query optimizations.

Optimize projections in lazy computations. (Mostly useful when you deal with a large number of columns e.g. millions).

Improve performance and flexibility of melt operation #2799
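
The in-place modification item in the list above relies on a general Rust pattern: a buffer behind an Arc can be mutated without copying when nothing else holds a reference to it. The sketch below shows that pattern with the standard library only. It is not Polars' internal code, and the function name add_in_place_or_copy is hypothetical, chosen for illustration.

```rust
use std::sync::Arc;

/// Hypothetical helper: add `delta` to every value, reusing the
/// allocation when the Arc is not shared and copying otherwise.
fn add_in_place_or_copy(mut values: Arc<Vec<i64>>, delta: i64) -> Arc<Vec<i64>> {
    // Arc::make_mut returns a mutable reference to the existing data when
    // this is the only strong reference (and there are no Weak pointers).
    // If the data is shared, it clones the data first.
    for v in Arc::make_mut(&mut values).iter_mut() {
        *v += delta;
    }
    values
}

fn main() {
    // Unique owner: the same buffer is modified in place.
    let unique = Arc::new(vec![1, 2, 3]);
    let ptr_before = unique.as_ptr();
    let out = add_in_place_or_copy(unique, 10);
    assert_eq!(*out, vec![11, 12, 13]);
    assert_eq!(out.as_ptr(), ptr_before);

    // Shared owner: the data is copied, and the other handle is untouched.
    let shared = Arc::new(vec![1, 2, 3]);
    let other = Arc::clone(&shared);
    let out = add_in_place_or_copy(shared, 10);
    assert_eq!(*out, vec![11, 12, 13]);
    assert_eq!(*other, vec![1, 2, 3]);
    assert_ne!(out.as_ptr(), other.as_ptr());
}
```

The design trade-off is that callers never need to know whether a copy happened: the result is the same either way, and only the allocation cost differs.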

new expressions

str.split

str.split_inclusive

arr.join

unique_stable

str.split_exact

count expression that does not require column names

arr.arg_min

arr.arg_max

arr.diff

arr.shift
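
The difference between str.split and str.split_inclusive can be sketched with the methods of the same name on Rust's standard str type. This assumes the Polars expressions follow the standard library's semantics, which these release notes do not spell out. The helper names split_plain and split_keep_delim are hypothetical, and str.split_exact is not covered here.

```rust
/// Hypothetical helper: split on `by` and drop the delimiter.
fn split_plain(s: &str, by: char) -> Vec<&str> {
    s.split(by).collect()
}

/// Hypothetical helper: split on `by` and keep the delimiter
/// at the end of each piece that was followed by one.
fn split_keep_delim(s: &str, by: char) -> Vec<&str> {
    s.split_inclusive(by).collect()
}

fn main() {
    let s = "a_b_c";
    assert_eq!(split_plain(s, '_'), ["a", "b", "c"]);
    assert_eq!(split_keep_delim(s, '_'), ["a_", "b_", "c"]);
}
```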

Update to arrow2 0.10.0

See the 0.10.0 release for all upstream improvements.