Extra iterator adaptors, iterator methods, free functions, and macros.

Overview

Itertools

Extra iterator adaptors, functions and macros.

Please read the API documentation here

build_status crates

How to use with cargo:

[dependencies]
itertools = "0.10.0"

How to use in your crate:

use itertools::Itertools;

How to contribute

  • Fix a bug or implement a new thing
  • Include tests for your new feature, preferably a quickcheck test
  • Make a Pull Request

For new features, please first consider filing a PR to rust-lang/rust, adding your new feature to the Iterator trait of the standard library, if you believe it is reasonable. If it isn't accepted there, proposing it for inclusion in itertools is a good idea. The reason for doing is this is so that we avoid future breakage as with .flatten(). However, if your feature involves heap allocation, such as storing elements in a Vec<T>, then it can't be accepted into libcore, and you should propose it for itertools directly instead.

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.

Comments
  • Add exactly_one function

    Add exactly_one function

    Adds a small function I found uses for personally. Can't be added to the core library because in the error case it allocates a vec and stores data in it.

    opened by Xaeroxe 26
  • Preparation for next release series (0.5.x)

    Preparation for next release series (0.5.x)

    We will remove a lot of deprecated items, and rename some.

    This is on the road towards an eventual 1.0 version of the library.

    Notes

    • Numerical things will move because we don't want num-traits in itertools. linspace will move to a crate that uses num-traits
    • Use free functions to construct iterators that are sources. For example Unfold::new(x, y)unfold(x, y). Follows libstd convention (std::iter::once).
    • Unstable/nightly parts will need to live in another crate
    • Rename and move stuff around to increase consistency. This is the best for the long run.
    • See the commits for an overview

    See also #87

    Fixes #86

    opened by bluss 23
  • Powerset iterator adaptor

    Powerset iterator adaptor

    Implements a powerset iterator adaptor that iterates over all subsets of the input iterator's elements. Returns vectors representing these subsets. Internally uses Combinations of increasing length.

    I've taken the strategy of using a 'position' field that acts both as a means to to detect the special case of the first element and also allows optimal size_hint() implementation.

    Additionally there is a commit to improve performance that alters Combinations implementation slightly. I've added Combinations benchmark as a stand-alone commit to allow checking for performance regressions. Powerset performance after this commit improves some cases (with small sizes of n) by 10-30%

    This is my first attempt at a Rust contribution, happy to put in whatever discussion/work to get this merged. Cheers!

    waiting-on-review 
    opened by willcrozi 20
  • Add k-way merge adaptor.

    Add k-way merge adaptor.

    Merges an arbitrary number of iterators in ascending order.

    Uses std's BinaryHeap to decide which iterator to take from next. This seems quite heavyweight. Two-way merge benchmarks take roughly ten times longer than the dedicated two-way merge adaptor. Profiling identifies BinaryHeaps sift_up as the hot-spot.

    Not completely sure about the interface, the double use of IntoIterator in the free-standing function in particular.

    opened by bsteinb 20
  • Add `exhaust` which simply exhausts the iterator

    Add `exhaust` which simply exhausts the iterator

    This is useful for iterators which are simply used for their side effects.

    The new function is added to make intent clearer for places where you'd previously call count or similar to exhaust the iterator.

    opened by tbu- 18
  • Added `diff.rs` module with `diff` and `copy_on_diff` - for efficiently

    Added `diff.rs` module with `diff` and `copy_on_diff` - for efficiently "diff"ing and caching non-`Clone` iterators.

    I originally wrote this module for a couple of cases in conrod where I wanted to cache a non-Clone iterator without having to collect the iterator into a Vec for comparison in case the cache needed updating.

    I've just come across another similar case in a personal project, so before copying code across I thought I'd approach the itertools team in case you deem it worthy of adding to your luscious collection.

    Either way, any feedback appreciated!

    opened by mitchmindtree 18
  • Add Itertools::filter_{,map_}results

    Add Itertools::filter_{,map_}results

    I found myself wanting filter_map_results() and just implemented filter_results() in the same stretch. I hope you find it useful!

    I wasn't sure if #236 applies to these new iterators; I just implemented them.

    opened by gin-ahirsch 16
  • Permutations

    Permutations

    Fixes #285.

    This implementation is based on the Python implementation specified in the issue. There are quite a few points which should be considered.

    Method name, k-less form

    The adaptor function is called permutations, for user familiarity. However, since the size of the output can be specified, a more mathematically accurate name would probably be k_permutations. Perhaps two adaptor functions would be better: .k_permutations(k: usize) and .permutations() (which just sets k = vals.len())?

    Item value ordering/distinctness

    Input items are not inspected in any way; they are treated purely by their initial index. This means that:

    1. Permutations are yielded in lexicographical order of the index, not by any PartialOrd implementation of the items.
    2. Identical items will not be detected, and will result in some identical permutations.

    1 can be achieved by the user by collecting-and-sorting their input.

    2 is a little trickier, but can be achieved by filtering the output. However, I think there is a more efficient algorithm to avoid duplicated. Maybe we should provide this option?

    Permutations from source buffer/indices

    In addition to the iterator adaptor, I've added Permutations::from_vals, to create directly from a Vec or a slice. This saves some cloning compared to using source.[into_]iter().permutations(k).

    There is also Permutations::new(n, k), which is functionally equivalent to (0..n).permutations(k), but a little faster (about 0.6x the run time).

    But perhaps you would consider these unnecessary additions to the API?

    PermutationSource trait

    These different implementations (from Vec/slice/just indices) are achieved with the trait PermutationSource. It's visible to the user to implement for other structures if they wish, but this is probably of limited value. There's not much harm in allowing the user to access it, but again, maybe it's just API bloat. (or a future breaking change when it's removed/changed...)

    On the other hand, perhaps there are other places in the library which could benefit from taking a source generically? Any adaptor where input elements are used more than once will need to store them, and it might be more efficient to allow users to supply the memory directly. This could be done in another PR.

    For completeness, I also made full implementations of the three variations, without the trait, for benchmarking. The pure-indices implementation is about 10% slower using the trait, but Vecs and slices are unaffected (or even a little faster...).

    Combinations changes

    I've made small changes to the Combinations documentation to match Permutations - most significantly, replacing the letter n with k. I've also changed all uses of the variable n to k in the implementation... but maybe this is considered commit noise :)

    There are some other potential changes which would bring Combinations and Permutations more in line with one another.

    As mentioned above, Combinations might benefit from using a _____Source trait, to allow direct memory buffers.

    My Permutations implementations doesn't make use of LazyBuffer. Perhaps the buffer is useful for iterators which never complete, and have a very small k value. Has any benchmarking been done?

    Either way, it makes sense for both adaptors to either use or not use LazyBuffer. Maybe it could be integrated into the _______Source trait if it's useful?


    Let me know what you think when you have the chance. (Sorry for submitting two reasonably big PRs at once. I hope you get a chance to go through it all eventually!)

    waiting-on-author 
    opened by tobz1000 14
  • Add Itertools::tree_fold1

    Add Itertools::tree_fold1

    This is useful for non-associative operations like building a tree (folding over a binop constructor) or combining floating point data (where this order keeps the magnitudes more similar).

    opened by scottmcm 14
  • Add ProcessResults

    Add ProcessResults

    This has been living in the standard library for a while and seems like it could be more widely useful.

    TODO

    • [x] Missing IntoIterator conversions
    • [x] Missing size_hint
    opened by shepmaster 14
  • Add into_grouping_map for efficient group-and-fold operations

    Add into_grouping_map for efficient group-and-fold operations

    Adds two functions on the Itertools trait, into_grouping_map and into_grouping_map_by.

    into_grouping_map expects an iter of (K, V) where the K will be used as key and V as value. into_grouping_map_by expect only an iter of V as values, the keys will be calculated using the provided functions.

    Both of them return a GroupingMap, which is just a wrapper on an iterator. Since it introduces a lot of related methods I thought it would be better to separate them from the Itertools trait. This also prevents duplicating every method for the _by version.

    All of these functions have in common the fact they perform efficient group-and-fold operations without allocating temporary vecs like you would normally do if you used into_group_map + into_iter + map + collect::<HashMap<_, _>>().

    Here's the possible issues I can see, I would like to hear some feedback before trying to fix any of them:

    • name: I initially though of grouping_by which is the java/kotlin equivalent, but later changed it to into_grouping_map to match the already existing into_group_map and to differentiate from group_by;
    • fold_first: the equivalent function in the Itertools trait is fold1 but there's an unstable stdlib function that does the same thing and is called fold_first. I decided to be consistent with the stdlib;
    • minmax return type is the already existing MinMaxResult, but the NoElements variant is never returned. I didn't want to duplicate that struct thinking it could cause confusion (and if I did what name could I have chosen?);
    • sum and product: They dont' use the Sum and Product traits but instead require V: Add<V, Output=V> and V: Mul<V, Output=V>. They're pretty much a wrapper around fold_first. I don't really know if they're meaningful or if I should just remove them;
    • almost every closure takes as one of the parameters a reference to the key. It bloats them a bit, but could be useful if computing the key is a relatively expensive operation.
    • no scan function. Even though it is an iterator adapter I could sort of "collect" it into a Vec (more like extend) but I don't really see an use for this;
    • ~~no integration tests for the _by and _by_key versions of min, max and minmax. To be fair I was a bit lazy, but I also couldn't find any integration test for the normal minmax_by and minmax_by_key so I though it was fine;~~ added;
    • no benchmark (I don't think it is required but probably would be cool to compare this with into_group_map;
    • I didn't want to touch the rest of the library, but I guess we could implement into_group_map in terms of into_grouping_map?

    Related issues: #457, #309 Related PR: #406 (in particular relates to the into_group_map_by_fold function that was proposed but later removed)

    opened by SkiFire13 13
  • Implement ExactSizeIterator for CircularTupleWindows

    Implement ExactSizeIterator for CircularTupleWindows

    .circular_tuple_windows() will produce the same amount of elements as the upstream iterator, so I believe the implementation should be pretty simple.

    Speaking about the practical example, I want to convert a vector of 2D points to pairs of normalized vectors.

      let vecs = shape
        .into_iter()
        .circular_tuple_windows::<(_, _)>()
        .map(|((x1, y1), (x2, y2)): (Point, Point)| (x2 - x1, y2 - y1) as Point)
        .circular_tuple_windows(); //  trait `ExactSizeIterator` is not satisfied
    
    opened by xamgore 0
  • using itertools::Tuples with rayon

    using itertools::Tuples with rayon

    hello, I would like to use rayon::Tuples with rayon into_par_iter, could you tell me how it would be possible to provide an example of how to implement it?

    opened by bobi6666 0
  • Feature request: `filter_parition`

    Feature request: `filter_parition`

    Starting with the use case that prompted this:

    fn get_dir(path: &path::Path) -> io::Result<(Vec<fs::DirEntry>, Vec<fs::DirEntry>)> {
        path.read_dir()?
            .filter_map(|content| content.ok())
            .filter_partition(|content| content.file_type().ok().and_then(|file_type| file_type.is_file())
    }
    

    Basically the same thing as the normal partition but the provided function has 3 return values:

    • None to drop the element
    • Some(true) to put the element in the left side of the tuple
    • Some(false) to put the element in the right side of the tuple

    Without filter_partition the best way I can think to get separate lists of files and folders is this:

    fn get_dir(path: &path::Path) -> io::Result<(Vec<fs::DirEntry>, Vec<fs::DirEntry>)> {
        path.read_dir()?
            .filter_map(|content| content.ok())
            .filter(|content| content.file_type().is_ok())
            .partition(|content| content.file_type().expect("content.file_type() to be Ok").is_file())
    }
    

    Which shouldn't ever cause issues unless a file's/folder's permissions are changed while the function is running, but that'd probably cause issues later on anyway.

    opened by Scripter17 0
  • Missing tag for version 0.10.5

    Missing tag for version 0.10.5

    Their is no tag for the version 0.10.5. It is useful when looking at the source code to be able to checkout a given tag and to be sure it represent what the doc says.

    opened by hommeabeil 0
  • converting Vec<Vec<u8>> to Vec<u8>

    converting Vec> to Vec

    Hello, I would like to ask if itertools includes something like flat_map? because I would need to convert Vec<Vec> so that at the end it becomes Vec but with the fact that between each iteration a separator would be inserted, for example 47, which represents the character / and it would be a delimiter.

    opened by bobi6666 2
Garbage Collector(Hyaline- Safe Memory Reclaimation) for lock free data structures

Hyaline-SMR This crate provides garbage collection using hyaline algorithm for building concurrent data structures. When a thread removes an object fr

Abishek 1 Oct 21, 2021
A lock-free multi-producer multi-consumer unbounded queue.

lf-queue A lock-free multi-producer multi-consumer unbounded queue. Examples [dependencies] lf-queue = "0.1" Single Producer - Single Consumer: use lf

Pierre Brouca 2 Sep 11, 2022
Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

null 6.4k Nov 28, 2022
Coding-challenge - Algorithms and Data-structures, problems and solutions in Rust language using cargo-workspaces

Coding Challenge LeetCode/Hackerrank e.t.c Using this as an opportunity to improve my knowledge of rust lang If you found this repo useful to you, add

Tolumide Shopein 17 Apr 24, 2022
Array helpers for Rust's Vector and String types

array_tool Array helpers for Rust. Some of the most common methods you would use on Arrays made available on Vectors. Polymorphic implementations for

Daniel P. Clark 68 Nov 26, 2022
K-dimensional tree in Rust for fast geospatial indexing and lookup

kdtree K-dimensional tree in Rust for fast geospatial indexing and nearest neighbors lookup Crate Documentation Usage Benchmark License Usage Add kdtr

Rui Hu 145 Nov 19, 2022
Algorithms and Data Structures of all kinds written in Rust.

Classic Algorithms in Rust This repo contains the implementation of various classic algorithms for educational purposes in Rust. Right now, it is in i

Alexander González 47 Nov 30, 2022
Obake is a procedural macro for declaring and maintaining versioned data-structures.

Obake is a procedural macro for declaring and maintaining versioned data-structures. The name 'obake' is taken from the Japanese 'お化け (おばけ)', a class of supernatural beings in Japanese folklore that shapeshift.

Nathan Corbyn 166 Nov 17, 2022
A simple rust library to help create octrees and quadtrees for chunked level of detail

LodTree LodTree, a simple tree data structure for doing chunk-based level of detail. Goals The aim of this crate is to provide a generic, easy to use

Dimev 13 Oct 4, 2022
Serialize/DeSerialize for Rust built-in types and user defined types (complex struct types)

Serialize/DeSerialize for Rust built-in types and user defined types (complex struct types)

null 2 May 3, 2022
Broot - A new way to see and navigate directory trees

Broot A better way to navigate directories Installation Instructions Get an overview of a directory, even a big one br -s Notice the unlisted? That's

Canop 7.9k Nov 26, 2022
Rust-algorithm-club - Learn algorithms and data structures with Rust

Rust Algorithm Club ?? ?? This repo is under construction. Most materials are written in Chinese. Check it out here if you are able to read Chinese. W

Weihang Lo 355 Nov 11, 2022
Common data structures and algorithms in Rust

Contest Algorithms in Rust A collection of classic data structures and algorithms, emphasizing usability, beauty and clarity over full generality. As

Aram Ebtekar 3.2k Nov 24, 2022
A hashmap implementation, which uses hashset, and keys are contained within values.

A hashmap implementation, which uses hashset, and keys are contained within values.

Piotr Mikulski 2 Nov 29, 2022
Rust data structures and client for the PubChem REST API

pubchem.rs Rust data structures and client for the PubChem REST API. ?? Usage ?? Compound Create a Compound to query the PubChem API for a single comp

Martin Larralde 2 Jan 18, 2022
Fast, efficient, and robust memory reclamation for concurrent data structures

Seize Fast, efficient, and robust memory reclamation for concurrent data structures. Introduction Concurrent data structures are faced with the proble

Ibraheem Ahmed 238 Nov 27, 2022
A scalable message queue powered by a segmented, partitioned, replicated and immutable log.

A scalable message queue powered by a segmented, partitioned, replicated and immutable log. This is currently a work in progress. laminarmq is intende

Arindam Das 18 Nov 12, 2022
rust_aads - Rust Algorithms And Data Structures

rust_aads - Rust Algorithms And Data Structures rust_aads is an open repository with algorithms and data structures, used in computer science and comp

stepa 2 Nov 5, 2022
proc macros for generating mut and non-mut methods without duplicating code

mwt Hey! You! Read this before using! mwt was thrown together pretty quickly for personal use, because I couldn't find an existing crate that does thi

null 1 Dec 24, 2021
An iterator adapter to peek at future elements without advancing the cursor of the underlying iterator.

multipeek An iterator adapter to peek at future elements without advancing the cursor of the underlying iterator. Check out the documentation for more

Luca Palmieri 20 Jul 16, 2022