Extra iterator adaptors, iterator methods, free functions, and macros.

Overview

Itertools

Extra iterator adaptors, functions and macros.

Please read the API documentation here

build_status crates

How to use with cargo:

[dependencies]
itertools = "0.10.0"

How to use in your crate:

use itertools::Itertools;

How to contribute

  • Fix a bug or implement a new thing
  • Include tests for your new feature, preferably a quickcheck test
  • Make a Pull Request

For new features, please first consider filing a PR to rust-lang/rust, adding your new feature to the Iterator trait of the standard library, if you believe it is reasonable. If it isn't accepted there, proposing it for inclusion in itertools is a good idea. The reason for doing is this is so that we avoid future breakage as with .flatten(). However, if your feature involves heap allocation, such as storing elements in a Vec<T>, then it can't be accepted into libcore, and you should propose it for itertools directly instead.

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

Comments
  • Add exactly_one function

    Add exactly_one function

    Adds a small function I found uses for personally. Can't be added to the core library because in the error case it allocates a vec and stores data in it.

    opened by Xaeroxe 26
  • Preparation for next release series (0.5.x)

    Preparation for next release series (0.5.x)

    We will remove a lot of deprecated items, and rename some.

    This is on the road towards an eventual 1.0 version of the library.

    Notes

    • Numerical things will move because we don't want num-traits in itertools. linspace will move to a crate that uses num-traits
    • Use free functions to construct iterators that are sources. For example Unfold::new(x, y)unfold(x, y). Follows libstd convention (std::iter::once).
    • Unstable/nightly parts will need to live in another crate
    • Rename and move stuff around to increase consistency. This is the best for the long run.
    • See the commits for an overview

    See also #87

    Fixes #86

    opened by bluss 23
  • Powerset iterator adaptor

    Powerset iterator adaptor

    Implements a powerset iterator adaptor that iterates over all subsets of the input iterator's elements. Returns vectors representing these subsets. Internally uses Combinations of increasing length.

    I've taken the strategy of using a 'position' field that acts both as a means to to detect the special case of the first element and also allows optimal size_hint() implementation.

    Additionally there is a commit to improve performance that alters Combinations implementation slightly. I've added Combinations benchmark as a stand-alone commit to allow checking for performance regressions. Powerset performance after this commit improves some cases (with small sizes of n) by 10-30%

    This is my first attempt at a Rust contribution, happy to put in whatever discussion/work to get this merged. Cheers!

    waiting-on-review 
    opened by willcrozi 20
  • Add k-way merge adaptor.

    Add k-way merge adaptor.

    Merges an arbitrary number of iterators in ascending order.

    Uses std's BinaryHeap to decide which iterator to take from next. This seems quite heavyweight. Two-way merge benchmarks take roughly ten times longer than the dedicated two-way merge adaptor. Profiling identifies BinaryHeaps sift_up as the hot-spot.

    Not completely sure about the interface, the double use of IntoIterator in the free-standing function in particular.

    opened by bsteinb 20
  • Add `exhaust` which simply exhausts the iterator

    Add `exhaust` which simply exhausts the iterator

    This is useful for iterators which are simply used for their side effects.

    The new function is added to make intent clearer for places where you'd previously call count or similar to exhaust the iterator.

    opened by tbu- 18
  • Added `diff.rs` module with `diff` and `copy_on_diff` - for efficiently

    Added `diff.rs` module with `diff` and `copy_on_diff` - for efficiently "diff"ing and caching non-`Clone` iterators.

    I originally wrote this module for a couple of cases in conrod where I wanted to cache a non-Clone iterator without having to collect the iterator into a Vec for comparison in case the cache needed updating.

    I've just come across another similar case in a personal project, so before copying code across I thought I'd approach the itertools team in case you deem it worthy of adding to your luscious collection.

    Either way, any feedback appreciated!

    opened by mitchmindtree 18
  • Add Itertools::filter_{,map_}results

    Add Itertools::filter_{,map_}results

    I found myself wanting filter_map_results() and just implemented filter_results() in the same stretch. I hope you find it useful!

    I wasn't sure if #236 applies to these new iterators; I just implemented them.

    opened by gin-ahirsch 16
  • Permutations

    Permutations

    Fixes #285.

    This implementation is based on the Python implementation specified in the issue. There are quite a few points which should be considered.

    Method name, k-less form

    The adaptor function is called permutations, for user familiarity. However, since the size of the output can be specified, a more mathematically accurate name would probably be k_permutations. Perhaps two adaptor functions would be better: .k_permutations(k: usize) and .permutations() (which just sets k = vals.len())?

    Item value ordering/distinctness

    Input items are not inspected in any way; they are treated purely by their initial index. This means that:

    1. Permutations are yielded in lexicographical order of the index, not by any PartialOrd implementation of the items.
    2. Identical items will not be detected, and will result in some identical permutations.

    1 can be achieved by the user by collecting-and-sorting their input.

    2 is a little trickier, but can be achieved by filtering the output. However, I think there is a more efficient algorithm to avoid duplicated. Maybe we should provide this option?

    Permutations from source buffer/indices

    In addition to the iterator adaptor, I've added Permutations::from_vals, to create directly from a Vec or a slice. This saves some cloning compared to using source.[into_]iter().permutations(k).

    There is also Permutations::new(n, k), which is functionally equivalent to (0..n).permutations(k), but a little faster (about 0.6x the run time).

    But perhaps you would consider these unnecessary additions to the API?

    PermutationSource trait

    These different implementations (from Vec/slice/just indices) are achieved with the trait PermutationSource. It's visible to the user to implement for other structures if they wish, but this is probably of limited value. There's not much harm in allowing the user to access it, but again, maybe it's just API bloat. (or a future breaking change when it's removed/changed...)

    On the other hand, perhaps there are other places in the library which could benefit from taking a source generically? Any adaptor where input elements are used more than once will need to store them, and it might be more efficient to allow users to supply the memory directly. This could be done in another PR.

    For completeness, I also made full implementations of the three variations, without the trait, for benchmarking. The pure-indices implementation is about 10% slower using the trait, but Vecs and slices are unaffected (or even a little faster...).

    Combinations changes

    I've made small changes to the Combinations documentation to match Permutations - most significantly, replacing the letter n with k. I've also changed all uses of the variable n to k in the implementation... but maybe this is considered commit noise :)

    There are some other potential changes which would bring Combinations and Permutations more in line with one another.

    As mentioned above, Combinations might benefit from using a _____Source trait, to allow direct memory buffers.

    My Permutations implementations doesn't make use of LazyBuffer. Perhaps the buffer is useful for iterators which never complete, and have a very small k value. Has any benchmarking been done?

    Either way, it makes sense for both adaptors to either use or not use LazyBuffer. Maybe it could be integrated into the _______Source trait if it's useful?


    Let me know what you think when you have the chance. (Sorry for submitting two reasonably big PRs at once. I hope you get a chance to go through it all eventually!)

    waiting-on-author 
    opened by tobz1000 14
  • Add Itertools::tree_fold1

    Add Itertools::tree_fold1

    This is useful for non-associative operations like building a tree (folding over a binop constructor) or combining floating point data (where this order keeps the magnitudes more similar).

    opened by scottmcm 14
  • Add ProcessResults

    Add ProcessResults

    This has been living in the standard library for a while and seems like it could be more widely useful.

    TODO

    • [x] Missing IntoIterator conversions
    • [x] Missing size_hint
    opened by shepmaster 14
  • Add into_grouping_map for efficient group-and-fold operations

    Add into_grouping_map for efficient group-and-fold operations

    Adds two functions on the Itertools trait, into_grouping_map and into_grouping_map_by.

    into_grouping_map expects an iter of (K, V) where the K will be used as key and V as value. into_grouping_map_by expect only an iter of V as values, the keys will be calculated using the provided functions.

    Both of them return a GroupingMap, which is just a wrapper on an iterator. Since it introduces a lot of related methods I thought it would be better to separate them from the Itertools trait. This also prevents duplicating every method for the _by version.

    All of these functions have in common the fact they perform efficient group-and-fold operations without allocating temporary vecs like you would normally do if you used into_group_map + into_iter + map + collect::<HashMap<_, _>>().

    Here's the possible issues I can see, I would like to hear some feedback before trying to fix any of them:

    • name: I initially though of grouping_by which is the java/kotlin equivalent, but later changed it to into_grouping_map to match the already existing into_group_map and to differentiate from group_by;
    • fold_first: the equivalent function in the Itertools trait is fold1 but there's an unstable stdlib function that does the same thing and is called fold_first. I decided to be consistent with the stdlib;
    • minmax return type is the already existing MinMaxResult, but the NoElements variant is never returned. I didn't want to duplicate that struct thinking it could cause confusion (and if I did what name could I have chosen?);
    • sum and product: They dont' use the Sum and Product traits but instead require V: Add<V, Output=V> and V: Mul<V, Output=V>. They're pretty much a wrapper around fold_first. I don't really know if they're meaningful or if I should just remove them;
    • almost every closure takes as one of the parameters a reference to the key. It bloats them a bit, but could be useful if computing the key is a relatively expensive operation.
    • no scan function. Even though it is an iterator adapter I could sort of "collect" it into a Vec (more like extend) but I don't really see an use for this;
    • ~~no integration tests for the _by and _by_key versions of min, max and minmax. To be fair I was a bit lazy, but I also couldn't find any integration test for the normal minmax_by and minmax_by_key so I though it was fine;~~ added;
    • no benchmark (I don't think it is required but probably would be cool to compare this with into_group_map;
    • I didn't want to touch the rest of the library, but I guess we could implement into_group_map in terms of into_grouping_map?

    Related issues: #457, #309 Related PR: #406 (in particular relates to the into_group_map_by_fold function that was proposed but later removed)

    opened by SkiFire13 13
  • Add a `PeekMap` iterator adaptor

    Add a `PeekMap` iterator adaptor

    Add an iterator that allows mapping with (next_item, Option(&peek_item). This is useful for parsers. Exact semantics may very, but the goal is to get something like tuple_windows without needing to clone

    100% credit to @talchas on Discord (I don't see that user here) for the implementation:

    use std::iter::Peekable;
    pub struct PeekMap<I: Iterator, F>(Peekable<I>, F);
    pub fn peek_map<R, I: Iterator, F: FnMut(I::Item, Option<&I::Item>) -> R>(it: Peekable<I>, f: F) -> PeekMap<I, F> {
        PeekMap(it, f)
    }
    impl<R, I: Iterator, F: FnMut(I::Item, Option<&I::Item>) -> R> Iterator for PeekMap<I, F> {
        type Item = R;
        fn next(&mut self) -> Option<R> {
            let x = self.0.next()?;
            Some((self.1)(x, self.0.peek()))
        }
    }
    
    opened by tgross35 0
  • Revisit panicking in `std::fmt::Display` implementations as a footgun

    Revisit panicking in `std::fmt::Display` implementations as a footgun

    Panicking in std::fmt::Display implementations is a glaring footgun. Here is an example, where you are guaranteed to spend several hours debugging it, and potentially you will never find a cause because the panic happens in production deeply inside the logging system, which is your eyes, but they are now blind. Also, you won't notice this during development, because this log statement occurs in some obscure error-handling code which is hit 0.00001% of the time:

    use std::collections::HashMap;
    use tracing_subscriber::prelude::*;
    use itertools::Itertools;
    
    fn main() {
        let (loki, _join_handle) = tracing_loki::layer(
            "http://loki:3100".parse().unwrap(),
            HashMap::new(),
            HashMap::new(),
        )
        .unwrap();
    
        tracing_subscriber::registry()
            .with(tracing_subscriber::fmt::layer())
            // Commenting out the following line fixes the panic, but it's not obvious.
            // Basically any other layer that runs the `fmt::Display` implementation
            // of the `Itertools::format` second time after `fmt` subscriber may suffer from this panic
            .with(loki)
            .init();
    
        std::panic::set_hook(Box::new(move |_| {
            tracing::error!("panic happened");
        }));
    
        // panics here deep inside tracing when several layers display the value
        tracing::info!(field = %[1, 2, 3].iter().format(","), "Log event");
    
        eprintln!("This is never hit, and the panic `error!()` is never printed");
    }
    

    With this you'll see only this log output without any indication of panic:

    image

    Solution

    Don't panic in std::fmt::Display implementations. Instead, the best approach would be to format a message with a lot of CAPS letters that say what the problem there is in the rendered message. Or panicking behavior could be made conditional upon a feature flag or maybe some other way of configuration, like a global variable or setup function

    opened by Veetaha 0
  • Cannot use std::iter::try_collect after importing itertools

    Cannot use std::iter::try_collect after importing itertools

    The try_collect method in itertools is shadowing the method of the same name in std::iter on nightly : https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.try_collect

    https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=97cdf9666d33eeca8a69acdd1e59948e

    #![allow(unused)]
    #![feature(iterator_try_collect)]
    
    use itertools::Itertools; // 0.10.5
    
    fn main() {
        let u = vec![Some(1), Some(2), Some(3)];
        let v = u.into_iter().try_collect::<Vec<i32>>();
        assert_eq!(v, Some(vec![1, 2, 3]));
    }
    
    Compiling playground v0.0.1 (/playground)
    error[[E0107]](https://doc.rust-lang.org/nightly/error-index.html#E0107): this associated function takes 3 generic arguments but 1 generic argument was supplied
        --> src/main.rs:8:27
         |
    8    |     let v = u.into_iter().try_collect::<Vec<i32>>();
         |                           ^^^^^^^^^^^   -------- supplied 1 generic argument
         |                           |
         |                           expected 3 generic arguments
         |
    note: associated function defined here, with 3 generic parameters: `T`, `U`, `E`
        --> /playground/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.10.5/src/lib.rs:2009:8
         |
    2009 |     fn try_collect<T, U, E>(self) -> Result<U, E>
         |        ^^^^^^^^^^^ -  -  -
    help: add missing generic arguments
         |
    8    |     let v = u.into_iter().try_collect::<Vec<i32>, U, E>();
         |                                                 ++++++
    
    error[[E0271]](https://doc.rust-lang.org/nightly/error-index.html#E0271): expected `IntoIter<Option<{integer}>>` to be an iterator that yields `Result<Vec<i32>, _>`, but it yields `Option<{integer}>`
        --> src/main.rs:8:27
         |
    8    |     let v = u.into_iter().try_collect::<Vec<i32>>();
         |                           ^^^^^^^^^^^ expected enum `Result`, found enum `Option`
         |
         = note: expected enum `Result<Vec<i32>, _>`
                    found enum `Option<{integer}>`
    note: the method call chain might not have had the expected associated types
        --> src/main.rs:8:15
         |
    7    |     let u = vec![Some(1), Some(2), Some(3)];
         |             ------------------------------- this expression has type `Vec<Option<{integer}>>`
    8    |     let v = u.into_iter().try_collect::<Vec<i32>>();
         |               ^^^^^^^^^^^ `Iterator::Item` is `Option<{integer}>` here
    note: required by a bound in `itertools::Itertools::try_collect`
        --> /playground/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.10.5/src/lib.rs:2011:32
         |
    2011 |         Self: Sized + Iterator<Item = Result<T, E>>,
         |                                ^^^^^^^^^^^^^^^^^^^ required by this bound in `Itertools::try_collect`
    
    error[[E0308]](https://doc.rust-lang.org/nightly/error-index.html#E0308): mismatched types
     --> src/main.rs:9:5
      |
    9 |     assert_eq!(v, Some(vec![1, 2, 3]));
      |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected enum `Result`, found enum `Option`
      |
      = note: expected enum `Result<_, _>`
                 found enum `Option<Vec<{integer}>>`
      = note: this error originates in the macro `assert_eq` (in Nightly builds, run with -Z macro-backtrace for more info)
    
    Some errors have detailed explanations: E0107, E0271, E0308.
    For more information about an error, try `rustc --explain E0107`.
    error: could not compile `playground` due to 3 previous errors
    
    opened by ShiveshM 0
  • group_by referencing temporary value issue

    group_by referencing temporary value issue

    I really like group_by, but there is a limitation I sometimes run into, when transforming iterators, due to the temporary values. Currently that forces to consume the group_by always locally.

    https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=18e06b09b59c1260ba0574d48c272eff

    
    use itertools::Itertools;
    fn return_group_by(iter: impl Iterator<Item=u32> + 'static) -> impl Iterator<Item=u32> + 'static{
    
        iter.group_by(move |idx| {
            get_block_id(*idx)
        })
        .into_iter()
        .flat_map(move |(block_id, chunk)| {
            // handle all elements in one block at once
            chunk.map(move |idx_in_block| {
                idx_in_block + 10
            })
        })
    }
    
    fn get_block_id(val: u32) -> u32{
        val >> 16
    }
    
    
    
    Compiling playground v0.0.1 (/playground)
    error[[E0515]](https://doc.rust-lang.org/stable/error-index.html#E0515): cannot return value referencing temporary value
      [--> src/lib.rs:5:5](https://play.rust-lang.org/#)
    
    opened by PSeitz 0
  • Implement ExactSizeIterator for CircularTupleWindows

    Implement ExactSizeIterator for CircularTupleWindows

    .circular_tuple_windows() will produce the same amount of elements as the upstream iterator, so I believe the implementation should be pretty simple.

    Speaking about the practical example, I want to convert a vector of 2D points to pairs of normalized vectors.

      let vecs = shape
        .into_iter()
        .circular_tuple_windows::<(_, _)>()
        .map(|((x1, y1), (x2, y2)): (Point, Point)| (x2 - x1, y2 - y1) as Point)
        .circular_tuple_windows(); //  trait `ExactSizeIterator` is not satisfied
    
    opened by xamgore 0
Garbage Collector(Hyaline- Safe Memory Reclaimation) for lock free data structures

Hyaline-SMR This crate provides garbage collection using hyaline algorithm for building concurrent data structures. When a thread removes an object fr

Abishek 2 Dec 21, 2022
A lock-free multi-producer multi-consumer unbounded queue.

lf-queue A lock-free multi-producer multi-consumer unbounded queue. Examples [dependencies] lf-queue = "0.1" Single Producer - Single Consumer: use lf

Pierre Brouca 2 Sep 11, 2022
Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

null 6.5k Dec 31, 2022
Coding-challenge - Algorithms and Data-structures, problems and solutions in Rust language using cargo-workspaces

Coding Challenge LeetCode/Hackerrank e.t.c Using this as an opportunity to improve my knowledge of rust lang If you found this repo useful to you, add

Tolumide Shopein 17 Apr 24, 2022
Array helpers for Rust's Vector and String types

array_tool Array helpers for Rust. Some of the most common methods you would use on Arrays made available on Vectors. Polymorphic implementations for

Daniel P. Clark 69 Dec 9, 2022
K-dimensional tree in Rust for fast geospatial indexing and lookup

kdtree K-dimensional tree in Rust for fast geospatial indexing and nearest neighbors lookup Crate Documentation Usage Benchmark License Usage Add kdtr

Rui Hu 152 Jan 2, 2023
Algorithms and Data Structures of all kinds written in Rust.

Classic Algorithms in Rust This repo contains the implementation of various classic algorithms for educational purposes in Rust. Right now, it is in i

Alexander González 49 Dec 14, 2022
Obake is a procedural macro for declaring and maintaining versioned data-structures.

Obake is a procedural macro for declaring and maintaining versioned data-structures. The name 'obake' is taken from the Japanese 'お化け (おばけ)', a class of supernatural beings in Japanese folklore that shapeshift.

Nathan Corbyn 174 Dec 26, 2022
A simple rust library to help create octrees and quadtrees for chunked level of detail

LodTree LodTree, a simple tree data structure for doing chunk-based level of detail. Goals The aim of this crate is to provide a generic, easy to use

Dimev 14 Dec 29, 2022
Serialize/DeSerialize for Rust built-in types and user defined types (complex struct types)

Serialize/DeSerialize for Rust built-in types and user defined types (complex struct types)

null 2 May 3, 2022
Broot - A new way to see and navigate directory trees

Broot A better way to navigate directories Installation Instructions Get an overview of a directory, even a big one br -s Notice the unlisted? That's

Canop 8k Jan 8, 2023
Rust-algorithm-club - Learn algorithms and data structures with Rust

Rust Algorithm Club ?? ?? This repo is under construction. Most materials are written in Chinese. Check it out here if you are able to read Chinese. W

Weihang Lo 360 Dec 28, 2022
Common data structures and algorithms in Rust

Contest Algorithms in Rust A collection of classic data structures and algorithms, emphasizing usability, beauty and clarity over full generality. As

Aram Ebtekar 3.3k Dec 27, 2022
A hashmap implementation, which uses hashset, and keys are contained within values.

A hashmap implementation, which uses hashset, and keys are contained within values.

Piotr Mikulski 2 Nov 29, 2022
Rust data structures and client for the PubChem REST API

pubchem.rs Rust data structures and client for the PubChem REST API. ?? Usage ?? Compound Create a Compound to query the PubChem API for a single comp

Martin Larralde 2 Jan 18, 2022
Fast, efficient, and robust memory reclamation for concurrent data structures

Seize Fast, efficient, and robust memory reclamation for concurrent data structures. Introduction Concurrent data structures are faced with the proble

Ibraheem Ahmed 240 Dec 23, 2022
A scalable message queue powered by a segmented, partitioned, replicated and immutable log.

A scalable message queue powered by a segmented, partitioned, replicated and immutable log. This is currently a work in progress. laminarmq is intende

Arindam Das 20 Dec 16, 2022
rust_aads - Rust Algorithms And Data Structures

rust_aads - Rust Algorithms And Data Structures rust_aads is an open repository with algorithms and data structures, used in computer science and comp

stepa 2 Dec 15, 2022
proc macros for generating mut and non-mut methods without duplicating code

mwt Hey! You! Read this before using! mwt was thrown together pretty quickly for personal use, because I couldn't find an existing crate that does thi

null 1 Dec 24, 2021
An iterator adapter to peek at future elements without advancing the cursor of the underlying iterator.

multipeek An iterator adapter to peek at future elements without advancing the cursor of the underlying iterator. Check out the documentation for more

Luca Palmieri 20 Jul 16, 2022