🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

Last update: Jun 5, 2022

python-daachorse

daachorse is a fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. This is a Python wrapper.

Installation

To use daachorse, run the following command:

$ pip install daachorse

Example usage

Daachorse offers several search options, ranging from standard matching with the Aho-Corasick algorithm to trickier leftmost matching. All of them run fast on the double-array data structure and can be plugged into your application as shown below.

Finding overlapped occurrences

To find all occurrences of the registered patterns in the input text, including ones that overlap each other, use find_overlapping(). When you instantiate a new automaton, each pattern is assigned a unique identifier in input order. Each match result is a tuple of the start and end character positions of the occurrence and the identifier of the matched pattern.

>>> import daachorse
>>> patterns = ['bcd', 'ab', 'a']
>>> pma = daachorse.Automaton(patterns)
>>> pma.find_overlapping('abcd')
[(0, 1, 2), (0, 2, 1), (1, 4, 0)]
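To make the shape of the result concrete, here is a brute-force sketch in plain Python that produces the same tuples for this input. It is not how daachorse works internally: naive_find_overlapping is a hypothetical helper, and the ordering of the output (by end position, longer match first on ties) is an assumption chosen to agree with the example above.

```python
def naive_find_overlapping(patterns, text):
    """Brute-force reference: every occurrence of every pattern, overlaps included."""
    matches = []
    for pattern_id, pattern in enumerate(patterns):
        if not pattern:
            continue  # empty patterns are skipped in this sketch
        start = text.find(pattern)
        while start != -1:
            # (start, end, id): end is exclusive, id is the index in `patterns`
            matches.append((start, start + len(pattern), pattern_id))
            start = text.find(pattern, start + 1)
    # Assumed ordering: by end position, longer match first on ties.
    matches.sort(key=lambda m: (m[1], m[0]))
    return matches


patterns = ['bcd', 'ab', 'a']
text = 'abcd'
result = naive_find_overlapping(patterns, text)
print(result)  # [(0, 1, 2), (0, 2, 1), (1, 4, 0)]

# Each tuple slices the text back to the pattern it identifies.
for start, end, pattern_id in result:
    assert text[start:end] == patterns[pattern_id]
```

The positions index characters of the Python string, so text[start:end] recovers the matched pattern directly.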

Finding non-overlapped occurrences with standard matching

If you do not want to allow positional overlap, use find() instead. It performs the search on the Aho-Corasick automaton and, in each iteration, reports the first pattern found.

>>> import daachorse
>>> patterns = ['bcd', 'ab', 'a']
>>> pma = daachorse.Automaton(patterns)
>>> pma.find('abcd')
[(0, 1, 2), (1, 4, 0)]
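One way to read "first found in each iteration" is as a greedy pass over all occurrences ordered by where they end: accept the earliest-ending occurrence, then ignore everything that starts before its end. The sketch below follows that reading in plain Python. naive_find is a hypothetical helper, and its equivalence to daachorse's find() is an assumption that is only checked against the example above.

```python
def naive_find(patterns, text):
    """Greedy non-overlapping selection over brute-force occurrences."""
    candidates = sorted(
        (
            (start, start + len(pattern), pattern_id)
            for pattern_id, pattern in enumerate(patterns)
            if pattern  # empty patterns are skipped in this sketch
            for start in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, start)
        ),
        key=lambda m: (m[1], m[0]),  # earliest end first
    )
    result = []
    next_free = 0  # first position not covered by an accepted match
    for start, end, pattern_id in candidates:
        if start >= next_free:
            result.append((start, end, pattern_id))
            next_free = end
    return result


print(naive_find(['bcd', 'ab', 'a'], 'abcd'))  # [(0, 1, 2), (1, 4, 0)]
```

Here 'ab' is dropped because 'a' ends earlier and covers position 0, while 'bcd' starts at position 1 and is still free to match.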

Finding non-overlapped occurrences with longest matching

If you want to find the longest pattern without positional overlap in each iteration, pass MATCH_KIND_LEFTMOST_LONGEST when constructing the automaton.

>>> import daachorse
>>> patterns = ['ab', 'a', 'abcd']
>>> pma = daachorse.Automaton(patterns, daachorse.MATCH_KIND_LEFTMOST_LONGEST)
>>> pma.find('abcd')
[(0, 4, 2)]
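The leftmost-longest rule can be sketched in plain Python as follows: at the leftmost position where any pattern matches, take the longest one, then resume right after it. naive_leftmost_longest is a hypothetical helper written only to illustrate the rule; it scans every pattern at every position and says nothing about how daachorse implements the search.

```python
def naive_leftmost_longest(patterns, text):
    """Brute-force leftmost-longest matching without overlap."""
    result = []
    pos = 0
    while pos < len(text):
        # (length, id) of every pattern that matches starting exactly at pos
        here = [
            (len(pattern), pattern_id)
            for pattern_id, pattern in enumerate(patterns)
            if pattern and text.startswith(pattern, pos)
        ]
        if here:
            length, pattern_id = max(here, key=lambda c: c[0])  # longest wins
            result.append((pos, pos + length, pattern_id))
            pos += length  # continue after the reported match
        else:
            pos += 1
    return result


print(naive_leftmost_longest(['ab', 'a', 'abcd'], 'abcd'))  # [(0, 4, 2)]
```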

Finding non-overlapped occurrences with leftmost-first matching

If you want to find the earliest registered pattern among those starting at the search position, use MATCH_KIND_LEFTMOST_FIRST.

This is the so-called leftmost-first match, a somewhat tricky search option. For example, in the following code, ab is reported because it is the earliest registered pattern that matches at position 0.

>>> import daachorse
>>> patterns = ['ab', 'a', 'abcd']
>>> pma = daachorse.Automaton(patterns, daachorse.MATCH_KIND_LEFTMOST_FIRST)
>>> pma.find('abcd')
[(0, 2, 0)]
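The same rule can be sketched in plain Python: at the leftmost position where any pattern matches, report the one registered first, regardless of length. naive_leftmost_first is a hypothetical helper for illustration only, not the library's implementation.

```python
def naive_leftmost_first(patterns, text):
    """Brute-force leftmost-first matching without overlap."""
    result = []
    pos = 0
    while pos < len(text):
        # Patterns are tried in registration order, so the first hit wins.
        for pattern_id, pattern in enumerate(patterns):
            if pattern and text.startswith(pattern, pos):
                result.append((pos, pos + len(pattern), pattern_id))
                pos += len(pattern)  # continue after the reported match
                break
        else:
            pos += 1  # nothing matches here, move on
    return result


print(naive_leftmost_first(['ab', 'a', 'abcd'], 'abcd'))  # [(0, 2, 0)]
print(naive_leftmost_first(['abcd', 'ab', 'a'], 'abcd'))  # [(0, 4, 0)]
```

The second call shows why registration order matters under this option: the same three patterns in a different order give a different match.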

License

Licensed under either of

at your option.

For software under tests/data, follow the license terms of each piece of software.

GitHub

https://github.com/vbkaisetsu/python-daachorse