Extra iterator adaptors, iterator methods, free functions, and macros.

Last update: Dec 30, 2022

Related tags

Overview

Itertools

Extra iterator adaptors, functions and macros.

Please read the API documentation here

How to use with cargo:

[dependencies]
itertools = "0.10.0"

How to use in your crate:

use itertools::Itertools;

How to contribute

Fix a bug or implement a new thing
Include tests for your new feature, preferably a quickcheck test
Make a Pull Request

For new features, please first consider filing a PR to rust-lang/rust, adding your new feature to the Iterator trait of the standard library, if you believe it is reasonable. If it isn't accepted there, proposing it for inclusion in itertools is a good idea. The reason for doing is this is so that we avoid future breakage as with .flatten(). However, if your feature involves heap allocation, such as storing elements in a Vec<T>, then it can't be accepted into libcore, and you should propose it for itertools directly instead.

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

Comments

Add exactly_one function

Adds a small function I found uses for personally. Can't be added to the core library because in the error case it allocates a vec and stores data in it.

opened by Xaeroxe 26
Preparation for next release series (0.5.x)
We will remove a lot of deprecated items, and rename some.

This is on the road towards an eventual 1.0 version of the library.

Notes

Numerical things will move because we don't want num-traits in itertools. linspace will move to a crate that uses num-traits

Use free functions to construct iterators that are sources. For example Unfold::new(x, y) → unfold(x, y). Follows libstd convention (std::iter::once).

Unstable/nightly parts will need to live in another crate

Rename and move stuff around to increase consistency. This is the best for the long run.

See the commits for an overview

See also #87

Fixes #86
opened by bluss 23
Powerset iterator adaptor

Implements a powerset iterator adaptor that iterates over all subsets of the input iterator's elements. Returns vectors representing these subsets. Internally uses Combinations of increasing length.

I've taken the strategy of using a 'position' field that acts both as a means to to detect the special case of the first element and also allows optimal size_hint() implementation.

Additionally there is a commit to improve performance that alters Combinations implementation slightly. I've added Combinations benchmark as a stand-alone commit to allow checking for performance regressions. Powerset performance after this commit improves some cases (with small sizes of n) by 10-30%

This is my first attempt at a Rust contribution, happy to put in whatever discussion/work to get this merged. Cheers!
waiting-on-review

opened by willcrozi 20
Add k-way merge adaptor.

Merges an arbitrary number of iterators in ascending order.

Uses std's BinaryHeap to decide which iterator to take from next. This seems quite heavyweight. Two-way merge benchmarks take roughly ten times longer than the dedicated two-way merge adaptor. Profiling identifies BinaryHeaps sift_up as the hot-spot.

Not completely sure about the interface, the double use of IntoIterator in the free-standing function in particular.

opened by bsteinb 20
Add `exhaust` which simply exhausts the iterator

This is useful for iterators which are simply used for their side effects.

The new function is added to make intent clearer for places where you'd previously call count or similar to exhaust the iterator.

opened by tbu- 18
Added `diff.rs` module with `diff` and `copy_on_diff` - for efficiently "diff"ing and caching non-`Clone` iterators.

I originally wrote this module for a couple of cases in conrod where I wanted to cache a non-Clone iterator without having to collect the iterator into a Vec for comparison in case the cache needed updating.

I've just come across another similar case in a personal project, so before copying code across I thought I'd approach the itertools team in case you deem it worthy of adding to your luscious collection.

Either way, any feedback appreciated!

opened by mitchmindtree 18
$Add Itertools::filter_{,map_}results$

Add Itertools::filter_{,map_}results

I found myself wanting filter_map_results() and just implemented filter_results() in the same stretch. I hope you find it useful!

I wasn't sure if #236 applies to these new iterators; I just implemented them.

opened by gin-ahirsch 16
Permutations
Fixes #285.

This implementation is based on the Python implementation specified in the issue. There are quite a few points which should be considered.

Method name, k-less form

The adaptor function is called permutations, for user familiarity. However, since the size of the output can be specified, a more mathematically accurate name would probably be k_permutations. Perhaps two adaptor functions would be better: .k_permutations(k: usize) and .permutations() (which just sets k = vals.len())?

Item value ordering/distinctness

Input items are not inspected in any way; they are treated purely by their initial index. This means that:

Permutations are yielded in lexicographical order of the index, not by any PartialOrd implementation of the items.

Identical items will not be detected, and will result in some identical permutations.

1 can be achieved by the user by collecting-and-sorting their input.

2 is a little trickier, but can be achieved by filtering the output. However, I think there is a more efficient algorithm to avoid duplicated. Maybe we should provide this option?

Permutations from source buffer/indices

In addition to the iterator adaptor, I've added Permutations::from_vals, to create directly from a Vec or a slice. This saves some cloning compared to using source.[into_]iter().permutations(k).

There is also Permutations::new(n, k), which is functionally equivalent to (0..n).permutations(k), but a little faster (about 0.6x the run time).

But perhaps you would consider these unnecessary additions to the API?

PermutationSource trait

These different implementations (from Vec/slice/just indices) are achieved with the trait PermutationSource. It's visible to the user to implement for other structures if they wish, but this is probably of limited value. There's not much harm in allowing the user to access it, but again, maybe it's just API bloat. (or a future breaking change when it's removed/changed...)

On the other hand, perhaps there are other places in the library which could benefit from taking a source generically? Any adaptor where input elements are used more than once will need to store them, and it might be more efficient to allow users to supply the memory directly. This could be done in another PR.

For completeness, I also made full implementations of the three variations, without the trait, for benchmarking. The pure-indices implementation is about 10% slower using the trait, but Vecs and slices are unaffected (or even a little faster...).

Combinations changes

I've made small changes to the Combinations documentation to match Permutations - most significantly, replacing the letter n with k. I've also changed all uses of the variable n to k in the implementation... but maybe this is considered commit noise :)

There are some other potential changes which would bring Combinations and Permutations more in line with one another.

As mentioned above, Combinations might benefit from using a _____Source trait, to allow direct memory buffers.

My Permutations implementations doesn't make use of LazyBuffer. Perhaps the buffer is useful for iterators which never complete, and have a very small k value. Has any benchmarking been done?

Either way, it makes sense for both adaptors to either use or not use LazyBuffer. Maybe it could be integrated into the _______Source trait if it's useful?

Let me know what you think when you have the chance. (Sorry for submitting two reasonably big PRs at once. I hope you get a chance to go through it all eventually!)
waiting-on-author
opened by tobz1000 14
Add Itertools::tree_fold1

This is useful for non-associative operations like building a tree (folding over a binop constructor) or combining floating point data (where this order keeps the magnitudes more similar).

opened by scottmcm 14
Add ProcessResults
This has been living in the standard library for a while and seems like it could be more widely useful.

TODO

[x] Missing IntoIterator conversions

[x] Missing size_hint
opened by shepmaster 14
Add into_grouping_map for efficient group-and-fold operations
Adds two functions on the Itertools trait, into_grouping_map and into_grouping_map_by.

into_grouping_map expects an iter of (K, V) where the K will be used as key and V as value. into_grouping_map_by expect only an iter of V as values, the keys will be calculated using the provided functions.

Both of them return a GroupingMap, which is just a wrapper on an iterator. Since it introduces a lot of related methods I thought it would be better to separate them from the Itertools trait. This also prevents duplicating every method for the _by version.

All of these functions have in common the fact they perform efficient group-and-fold operations without allocating temporary vecs like you would normally do if you used into_group_map + into_iter + map + collect::<HashMap<_, _>>().

Here's the possible issues I can see, I would like to hear some feedback before trying to fix any of them:

name: I initially though of grouping_by which is the java/kotlin equivalent, but later changed it to into_grouping_map to match the already existing into_group_map and to differentiate from group_by;

fold_first: the equivalent function in the Itertools trait is fold1 but there's an unstable stdlib function that does the same thing and is called fold_first. I decided to be consistent with the stdlib;

minmax return type is the already existing MinMaxResult, but the NoElements variant is never returned. I didn't want to duplicate that struct thinking it could cause confusion (and if I did what name could I have chosen?);

sum and product: They dont' use the Sum and Product traits but instead require V: Add<V, Output=V> and V: Mul<V, Output=V>. They're pretty much a wrapper around fold_first. I don't really know if they're meaningful or if I should just remove them;

almost every closure takes as one of the parameters a reference to the key. It bloats them a bit, but could be useful if computing the key is a relatively expensive operation.

no scan function. Even though it is an iterator adapter I could sort of "collect" it into a Vec (more like extend) but I don't really see an use for this;

~~no integration tests for the _by and _by_key versions of min, max and minmax. To be fair I was a bit lazy, but I also couldn't find any integration test for the normal minmax_by and minmax_by_key so I though it was fine;~~ added;

no benchmark (I don't think it is required but probably would be cool to compare this with into_group_map;

I didn't want to touch the rest of the library, but I guess we could implement into_group_map in terms of into_grouping_map?

Related issues: #457, #309 Related PR: #406 (in particular relates to the into_group_map_by_fold function that was proposed but later removed)
opened by SkiFire13 13

Add a `PeekMap` iterator adaptor

Add an iterator that allows mapping with (next_item, Option(&peek_item). This is useful for parsers. Exact semantics may very, but the goal is to get something like tuple_windows without needing to clone

100% credit to @talchas on Discord (I don't see that user here) for the implementation:

use std::iter::Peekable;
pub struct PeekMap<I: Iterator, F>(Peekable<I>, F);
pub fn peek_map<R, I: Iterator, F: FnMut(I::Item, Option<&I::Item>) -> R>(it: Peekable<I>, f: F) -> PeekMap<I, F> {
    PeekMap(it, f)
}
impl<R, I: Iterator, F: FnMut(I::Item, Option<&I::Item>) -> R> Iterator for PeekMap<I, F> {
    type Item = R;
    fn next(&mut self) -> Option<R> {
        let x = self.0.next()?;
        Some((self.1)(x, self.0.peek()))
    }
}

opened by tgross35 0

Revisit panicking in `std::fmt::Display` implementations as a footgun

Panicking in std::fmt::Display implementations is a glaring footgun. Here is an example, where you are guaranteed to spend several hours debugging it, and potentially you will never find a cause because the panic happens in production deeply inside the logging system, which is your eyes, but they are now blind. Also, you won't notice this during development, because this log statement occurs in some obscure error-handling code which is hit 0.00001% of the time:

use std::collections::HashMap;
use tracing_subscriber::prelude::*;
use itertools::Itertools;

fn main() {
    let (loki, _join_handle) = tracing_loki::layer(
        "http://loki:3100".parse().unwrap(),
        HashMap::new(),
        HashMap::new(),
    )
    .unwrap();

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        // Commenting out the following line fixes the panic, but it's not obvious.
        // Basically any other layer that runs the `fmt::Display` implementation
        // of the `Itertools::format` second time after `fmt` subscriber may suffer from this panic
        .with(loki)
        .init();

    std::panic::set_hook(Box::new(move |_| {
        tracing::error!("panic happened");
    }));

    // panics here deep inside tracing when several layers display the value
    tracing::info!(field = %[1, 2, 3].iter().format(","), "Log event");

    eprintln!("This is never hit, and the panic `error!()` is never printed");
}

With this you'll see only this log output without any indication of panic:

Solution

Don't panic in std::fmt::Display implementations. Instead, the best approach would be to format a message with a lot of CAPS letters that say what the problem there is in the rendered message. Or panicking behavior could be made conditional upon a feature flag or maybe some other way of configuration, like a global variable or setup function

opened by Veetaha 0

Cannot use std::iter::try_collect after importing itertools

The try_collect method in itertools is shadowing the method of the same name in std::iter on nightly : https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.try_collect

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=97cdf9666d33eeca8a69acdd1e59948e

#![allow(unused)]
#![feature(iterator_try_collect)]

use itertools::Itertools; // 0.10.5

fn main() {
    let u = vec![Some(1), Some(2), Some(3)];
    let v = u.into_iter().try_collect::<Vec<i32>>();
    assert_eq!(v, Some(vec![1, 2, 3]));
}

Compiling playground v0.0.1 (/playground)
error[[E0107]](https://doc.rust-lang.org/nightly/error-index.html#E0107): this associated function takes 3 generic arguments but 1 generic argument was supplied
    --> src/main.rs:8:27
     |
8    |     let v = u.into_iter().try_collect::<Vec<i32>>();
     |                           ^^^^^^^^^^^   -------- supplied 1 generic argument
     |                           |
     |                           expected 3 generic arguments
     |
note: associated function defined here, with 3 generic parameters: `T`, `U`, `E`
    --> /playground/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.10.5/src/lib.rs:2009:8
     |
2009 |     fn try_collect<T, U, E>(self) -> Result<U, E>
     |        ^^^^^^^^^^^ -  -  -
help: add missing generic arguments
     |
8    |     let v = u.into_iter().try_collect::<Vec<i32>, U, E>();
     |                                                 ++++++

error[[E0271]](https://doc.rust-lang.org/nightly/error-index.html#E0271): expected `IntoIter<Option<{integer}>>` to be an iterator that yields `Result<Vec<i32>, _>`, but it yields `Option<{integer}>`
    --> src/main.rs:8:27
     |
8    |     let v = u.into_iter().try_collect::<Vec<i32>>();
     |                           ^^^^^^^^^^^ expected enum `Result`, found enum `Option`
     |
     = note: expected enum `Result<Vec<i32>, _>`
                found enum `Option<{integer}>`
note: the method call chain might not have had the expected associated types
    --> src/main.rs:8:15
     |
7    |     let u = vec![Some(1), Some(2), Some(3)];
     |             ------------------------------- this expression has type `Vec<Option<{integer}>>`
8    |     let v = u.into_iter().try_collect::<Vec<i32>>();
     |               ^^^^^^^^^^^ `Iterator::Item` is `Option<{integer}>` here
note: required by a bound in `itertools::Itertools::try_collect`
    --> /playground/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.10.5/src/lib.rs:2011:32
     |
2011 |         Self: Sized + Iterator<Item = Result<T, E>>,
     |                                ^^^^^^^^^^^^^^^^^^^ required by this bound in `Itertools::try_collect`

error[[E0308]](https://doc.rust-lang.org/nightly/error-index.html#E0308): mismatched types
 --> src/main.rs:9:5
  |
9 |     assert_eq!(v, Some(vec![1, 2, 3]));
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected enum `Result`, found enum `Option`
  |
  = note: expected enum `Result<_, _>`
             found enum `Option<Vec<{integer}>>`
  = note: this error originates in the macro `assert_eq` (in Nightly builds, run with -Z macro-backtrace for more info)

Some errors have detailed explanations: E0107, E0271, E0308.
For more information about an error, try `rustc --explain E0107`.
error: could not compile `playground` due to 3 previous errors

opened by ShiveshM 0

group_by referencing temporary value issue

I really like group_by, but there is a limitation I sometimes run into, when transforming iterators, due to the temporary values. Currently that forces to consume the group_by always locally.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=18e06b09b59c1260ba0574d48c272eff


use itertools::Itertools;
fn return_group_by(iter: impl Iterator<Item=u32> + 'static) -> impl Iterator<Item=u32> + 'static{

    iter.group_by(move |idx| {
        get_block_id(*idx)
    })
    .into_iter()
    .flat_map(move |(block_id, chunk)| {
        // handle all elements in one block at once
        chunk.map(move |idx_in_block| {
            idx_in_block + 10
        })
    })
}

fn get_block_id(val: u32) -> u32{
    val >> 16
}

Compiling playground v0.0.1 (/playground)
error[[E0515]](https://doc.rust-lang.org/stable/error-index.html#E0515): cannot return value referencing temporary value
  [--> src/lib.rs:5:5](https://play.rust-lang.org/#)

opened by PSeitz 0

Implement ExactSizeIterator for CircularTupleWindows
.circular_tuple_windows() will produce the same amount of elements as the upstream iterator, so I believe the implementation should be pretty simple.

Speaking about the practical example, I want to convert a vector of 2D points to pairs of normalized vectors.

let vecs = shape .into_iter() .circular_tuple_windows::<(_, _)>() .map(|((x1, y1), (x2, y2)): (Point, Point)| (x2 - x1, y2 - y1) as Point) .circular_tuple_windows(); // trait `ExactSizeIterator` is not satisfied
opened by xamgore 0