An extremely fast glob matching library in Rust.

Overview

glob-match

An extremely fast glob matching library with support for wildcards, character classes, and brace expansion.

  • Linear time matching. No exponential backtracking.
  • Zero allocations.
  • No regex compilation. Matching occurs on the glob pattern in place.
  • Support for capturing matched ranges of wildcards.
  • Thousands of tests based on Bash and micromatch.
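
To illustrate how a matcher can work directly on the pattern while avoiding exponential backtracking, here is a minimal sketch of the classic approach that remembers only the most recent `*` as a retry point. It is not the crate's implementation. The name `wildcard_match` is hypothetical, only `?` and `*` are handled, matching is byte-wise, and `*` is allowed to cross `/` for simplicity. Its worst case is proportional to pattern length times path length, so it is bounded but not strictly linear.

```rust
/// Simplified wildcard matcher supporting only `?` and `*`.
/// Hypothetical helper, not part of glob-match. Works on bytes and allocates nothing.
fn wildcard_match(pattern: &str, path: &str) -> bool {
    let (p, s) = (pattern.as_bytes(), path.as_bytes());
    let (mut pi, mut si) = (0, 0);
    // Where to resume after the most recent `*`: (pattern index, path index).
    let mut backtrack: Option<(usize, usize)> = None;

    while si < s.len() {
        if pi < p.len() && p[pi] == b'*' {
            // Tentatively let `*` match nothing and remember where to retry.
            backtrack = Some((pi + 1, si));
            pi += 1;
        } else if pi < p.len() && (p[pi] == b'?' || p[pi] == s[si]) {
            pi += 1;
            si += 1;
        } else if let Some((bp, bs)) = backtrack {
            // Mismatch: let the last `*` absorb one more byte, then retry.
            backtrack = Some((bp, bs + 1));
            pi = bp;
            si = bs + 1;
        } else {
            return false;
        }
    }
    // The path is consumed; only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == b'*')
}
```

Because only one retry point is kept, an earlier `*` is never revisited once a later one has been seen, which is what rules out the exponential blow-up of naive recursive matching.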

Example

use glob_match::glob_match;

assert!(glob_match("some/**/{a,b,c}/**/needle.txt", "some/path/a/to/the/needle.txt"));

Wildcard values can also be captured using the glob_match_with_captures function. On a match, this returns a Vec (wrapped in an Option) containing ranges within the path string that matched dynamic parts of the glob pattern. You can use these ranges to get slices from the original path string.

use glob_match::glob_match_with_captures;

let glob = "some/**/{a,b,c}/**/needle.txt";
let path = "some/path/a/to/the/needle.txt";
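// `glob_match_with_captures` yields an Option, so map over it to slice the path.
let result: Option<Vec<&str>> = glob_match_with_captures(glob, path)
  .map(|v| v.into_iter().map(|capture| &path[capture]).collect());

assert_eq!(result, Some(vec!["path", "a", "to/the"]));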
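
As a sketch of what the captured ranges make possible, the helper below substitutes the captured slices into a template string. The function name `fill_template` and the `$0`, `$1` placeholder syntax are assumptions made for illustration and are not part of glob-match. The helper only needs the ranges, so it can be tried with hard-coded values.

```rust
use std::ops::Range;

/// Replaces `$0`, `$1`, ... in `template` with the captured slices of `path`.
/// Hypothetical helper; placeholders without a matching capture are left as-is.
/// Panics if a range does not lie on valid boundaries of `path`.
fn fill_template(template: &str, path: &str, captures: &[Range<usize>]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '$' {
            // Collect the decimal digits that follow the `$`.
            let mut digits = String::new();
            while let Some(d) = chars.peek().copied().filter(|d| d.is_ascii_digit()) {
                digits.push(d);
                chars.next();
            }
            match digits.parse::<usize>().ok().and_then(|i| captures.get(i)) {
                Some(range) => out.push_str(&path[range.clone()]),
                None => {
                    // Not a known capture: emit the text unchanged.
                    out.push('$');
                    out.push_str(&digits);
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

With the path "some/path/a/to/the/needle.txt" and the ranges 5..9, 10..11 and 12..18 (the slices "path", "a" and "to/the"), the template "out/$0-$1/$2.bak" expands to "out/path-a/to/the.bak".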

Syntax

Syntax   Meaning
?        Matches any single character.
*        Matches zero or more characters, except for path separators (e.g. /).
**       Matches zero or more characters, including path separators. Must match a complete path segment (i.e. followed by a / or the end of the pattern).
[ab]     Matches one of the characters contained in the brackets. Character ranges, e.g. [a-z], are also supported. Use [!ab] or [^ab] to match any character except those contained in the brackets.
{a,b}    Matches one of the patterns contained in the braces. Any of the wildcard characters can be used in the sub-patterns. Braces may be nested up to 10 levels deep.
!        When at the start of the glob, this negates the result. Multiple ! characters negate the glob multiple times.
\        A backslash character may be used to escape any of the above special characters.
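
The rule for repeated `!` can be read as a parity check on the leading `!` characters: an odd count negates the result and an even count leaves it unchanged. The sketch below is one way to express that reading. The helper name `split_negation` is hypothetical and the code is not taken from the crate's source.

```rust
/// Splits leading `!` characters off a glob (hypothetical helper).
/// Returns `(negated, rest)`, where `negated` is true for an odd number of `!`.
fn split_negation(glob: &str) -> (bool, &str) {
    let rest = glob.trim_start_matches('!');
    // `!` is a single byte in UTF-8, so the length difference is the count.
    let bangs = glob.len() - rest.len();
    (bangs % 2 == 1, rest)
}
```

Under this reading, "!*.rs" is a negated "*.rs", while "!!*.rs" behaves like plain "*.rs". A `!` that is not at the start of the glob is left untouched.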

Benchmarks

globset                 time:   [35.176 µs 35.200 µs 35.235 µs]
glob                    time:   [339.77 ns 339.94 ns 340.13 ns]
glob_match              time:   [179.76 ns 179.96 ns 180.27 ns]

Fuzzing

You can fuzz glob-match itself using cargo fuzz. See the Rust Fuzz Book for guidance on setup and installation, and for information on how to configure and run the fuzzing steps.

After discovering artifacts, use cargo fuzz fmt [target] [artifact-path] to get the original input back.

$ cargo fuzz fmt both_fuzz fuzz/artifacts/both_fuzz/slow-unit-LONG_HASH
Output of `std::fmt::Debug`:

Data {
    pat: "some pattern",
    input: "some input",
}

Comments
  • Add support for captures

    This adds support for capturing wildcard matches in a glob, similar to regex capture groups. This enables you to match a glob and then extract the dynamic parts, or insert them in a template string.

    A new glob_match_with_captures function is exported for this purpose, which on a match returns a Vec<Range<usize>> (wrapped in an Option). You can use these ranges to get slices from the path string.

    use glob_match::glob_match_with_captures;
    
    let glob = "test/**/*.js";
    let path = "test/dir/test/a.js";
    // The result is an Option, so compare against Some(...).
    let result: Option<Vec<&str>> = glob_match_with_captures(glob, path)
      .map(|v| v.into_iter().map(|capture| &path[capture]).collect());
    
    assert_eq!(result, Some(vec!["dir/test", "a"]));
    
    opened by devongovett 0
  • Possible Timing Issues on Untrusted Patterns

    Hello, I saw this crate on Twitter and decided to fuzz it with a primitive algorithm: simply generate a random pattern and match it to the original input. After running for quite a while, it came up with an input that took 4 seconds that was only 177 characters. Therefore, there is most likely some point where the code is getting stuck. Sadly, I do not have time to profile and investigate further at this time. However, this might want to be reviewed if users are to use this crate on untrusted patterns as this could most likely be exploited further if the cause was pinpointed. Anyways, here is the PoC:

    use glob_match::glob_match;
    
    fn main() {
      let start = std::time::Instant::now();
      let s = "{*{??*{??**,Uz*zz}w**{*{**a,z***b*[!}w??*azzzzzzzz*!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!z[za,z&zz}w**z*z*}";
      assert!(!glob_match(s, s));
      println!(
        "{}",
        std::time::Instant::now()
          .duration_since(start)
          .as_secs_f32()
      );
    }
    

    Also notice that it only takes a long time when the pattern itself is matched with the pattern. I believe this is so because the pattern's * and named entries are being consumed by themselves in the matching process. Although, this still does not explain the slowdowns.

    Thanks

    Edit:

    The following input string takes 800ms on my machine with 102 characters so it might help narrow down the issue:

    "**** *{*{??*{??***\u{5} *{*{??*{??***\u{5},\0U\0}]*****\u{1},\0***\0,\0\0}w****,\0U\0}]*****\u{1},\0***\0,\0\0}w*****\u{1}***{}*.*\0\0*\0"
    
    opened by sno2 4
Owner
Devon Govett
Creator of @parcel-bundler. Engineer @adobe working on React Aria and React Spectrum.