An implemenetation of (part of) the suffix array construction algorithm developed by Zhize Li (2016)

Overview

suffix-array-li2016

An implemenetation of (part of) the suffix array construction algorithm developed by Zhize Li et al. (2016). This algorithm is claimed by the authors to be in-place (i.e. O(1) space complexity) and runs in linear time (O(n) where n is the number of characters in the string). However in practice, the time complexity does not seem to be linear, as evident in the experimental results in their own paper and in experiments with my implementations.

Performance

I provide both C++ and Rust implementations, which have similar performance. Li's algorithm is not faster than the naive algorithm (naive sorting of all suffixes, which is also in-place) with a 32-bit unsigned interger alphabet until n is greater than about 800,000. Here I am referring to the Rust implementation of the algorithm in section 3, which allows a mutable string and has a stronger constraint, i.e. n >= alphabet_size. The algorithm in section 4 (the main contribution of this paper, which operates on immutable strings and allows alphabet_size > n) is expected to be slower.

// test tests::bench_li_500_1000      ... bench:     168,711 ns/iter (+/- 27,549)
// test tests::bench_li_500_10000     ... bench:   1,356,838 ns/iter (+/- 79,133)       x 8.0  x   8.0
// test tests::bench_li_500_100000    ... bench:  14,359,321 ns/iter (+/- 1,307,696)    x10.6  x  85.1
// test tests::bench_li_500_1000000   ... bench: 180,594,903 ns/iter (+/- 13,631,112)   x12.6  x1070.4
// test tests::bench_naive_500_1000    ... bench:      90,554 ns/iter (+/- 5,723)
// test tests::bench_naive_500_10000   ... bench:   1,262,911 ns/iter (+/- 58,702)      x13.9  x  10.3
// test tests::bench_naive_500_100000  ... bench:  16,251,237 ns/iter (+/- 661,166)     x12.9  x 179.5
// test tests::bench_naive_500_1000000 ... bench: 220,851,474 ns/iter (+/- 6,532,465)   x13.6  x2438.9

Curiously, when testing with 10,000,000 or more characters, Li's algorithm becomes slower than the naive algorithm (the C++ implementation has similar performance). What could be the culprit?

test tests::bench_li_500_10000000    ... bench: 4,796,887,136 ns/iter (+/- 20,770,245)  x26.56  x28433
test tests::bench_naive_500_10000000 ... bench: 3,082,987,837 ns/iter (+/- 63,887,953)  x13.95  x34045
You might also like...
As part of the IOP Stack™ Morpheus is a toolset to have gatekeeper-free identity management and verifiable claims as a 2nd layer on top of a blockchain

Internet of People Internet of People (IoP) is a software project creating a decentralized software stack that provides the building blocks and tools

An ND812 decoder implemented as part of the yaxpeax project

yaxpeax-nd812 an ND812 decoder implemented as part of the yaxpeax project, including traits provided by yaxpeax-arch. the ND812 is a 12-bit microcompu

Focus Annotator - a tool for annotation the focal plane of a part of an image
Focus Annotator - a tool for annotation the focal plane of a part of an image

Focus Annotator Focus Annotator is a tool for annotation the focal plane of a part of an image. It is a tool I built in rust as part of my master's th

Implement Quicktime screen sharing part protocol.

Quicktime Screen sharing for iOS devices implement Quicktime part protocol. take screen record from iOS devices. Thank's for quicktime_video_hack full

Lua bytecode parser written in Rust using nom, part of metaworm's lua decompiler

luac-parser (中文) lua字节码解析器, 目前支持 lua51, lua53, lua54 这是目前效果最好的lua反编译器 metaworm's luadec 的一部分 可以基于此代码定制你所需的lua字节码解析器,编译成WASM,让metaworm's luadec加载使用,来反编

Code sample for "Reading files the hard way Part 3"

read-raw-ext4 Rust code sample to read an ext4 partition from Rust, for: https://fasterthanli.me/series/reading-files-the-hard-way/part-3 Usage Don't.

The PC-based component of a two-part Linux driver for using a TI calculator as an external keyboard
The PC-based component of a two-part Linux driver for using a TI calculator as an external keyboard

Introduction i68apollo is the computer-based component of the two-part i68 (*I*nput from Motorola *68*000[fn:4]-based calculator) prototype userspace

Marvin-Blockchain-Rust: A Rust-based blockchain implementation, part of the Marvin blockchain project.

Marvin Blockchain - Rust Implementation Welcome to the Rust implementation of the Marvin Blockchain. This project is part of a comparative study on bu

Array helpers for Rust's Vector and String types

array_tool Array helpers for Rust. Some of the most common methods you would use on Arrays made available on Vectors. Polymorphic implementations for

Generic array types in Rust

generic-array This crate implements generic array types for Rust. Requires minumum Rust version of 1.36.0, or 1.41.0 for From[T; N] implementations

enum-map enum-map xfix/enum-map [enum-map] — An optimized map implementation for enums using an array to store values.

enum-map A library providing enum map providing type safe enum array. It is implemented using regular Rust arrays, so using them is as fast as using r

🐎 Daac Horse: Double-Array Aho-Corasick in Rust

🐎 daachorse Daac Horse: Double-Array Aho-Corasick Overview A fast implementation of the Aho-Corasick algorithm using Double-Array Trie. Examples use

Shade Protocol is an array of connected privacy-preserving dApps built on Secret Network

Shade Protocol Core Contracts Contract Reference Description mint doc Handles asset burning and silk minting oracle doc Handles asset price queries tr

An efficient method of heaplessly converting numbers into their string representations, storing the representation within a reusable byte array.

NumToA #![no_std] Compatible with Zero Heap Allocations The standard library provides a convenient method of converting numbers into strings, but thes

This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

A lightweight Rust library for BitVector Rank&Select operations, coupled with a generic Sparse Array implementation

A lightweight Rust library for BitVector Rank&Select operations, coupled with a generic Sparse Array implementation

A prefix tree (trie) is a data structure that allows you to store an associative array whose keys are strings

RadixTrie A prefix tree (trie) is a data structure that allows you to store an associative array whose keys are strings. It is a root tree, each edge

🦞 Rust library of natural language dictionaries using character-wise double-array tries.

🦞 Crawdad: ChaRActer-Wise Double-Array Dictionary Overview Crawdad is a library of natural language dictionaries using character-wise double-array tr

Python module implemented in Rust for counting the number of one bits in a numpy array.

bit-counter Package for counting the number of one bits in a numpy array of uint8 values. Implemented as a Python module using Rust, providing high pe

Owner
Tianyi Shi
Biochemistry student; Open-source enthusiast; Rustacean
Tianyi Shi
:construction: EXPERIMENTAL :construction: Secure hidden service webserver

narnia narnia is a fast static webserver specifically designed for Tor hidden services. It's also able to spawn a Tor thread and expose itself as a hi

null 41 Dec 30, 2022
ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations

ndarray The ndarray crate provides an n-dimensional container for general elements and for numerics. Please read the API documentation on docs.rs or t

null 2.6k Jan 7, 2023
This tool was developed as part of a course on forensic analysis and cybersecurity. It is intended to be used as a training resource to help students understand the structure and content of job files in Windows environments.

Job File Parser Job File Parser is a Rust-based tool designed for parsing both legacy binary job files and modern XML job files used by the Windows Ta

Mehrnoush 3 Aug 12, 2024
Fast suffix arrays for Rust (with Unicode support).

suffix Fast linear time & space suffix arrays for Rust. Supports Unicode! Dual-licensed under MIT or the UNLICENSE. Documentation https://docs.rs/suff

Andrew Gallant 207 Dec 26, 2022
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

python-daachorse daachorse is a fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. This is a Python wrap

Koichi Akabe 11 Nov 30, 2022
Implementation of algorithms for Domain Name System (DNS) Cookies construction

DNS Cookie RFC7873 left the construction of Server Cookies to the discretion of the DNS Server (implementer) which has resulted in a gallimaufry of di

Rushmore Mushambi 2 Feb 4, 2022
RustRedOps is a repository dedicated to gathering and sharing advanced techniques and malware for Red Team, with a specific focus on the Rust programming language. (In Construction)

RustRedOps In Construction.... The project is still under development Overview RustRedOps is a repository that houses various tools and projects relat

João Victor 17 Dec 14, 2023
Suite for automatically testing algorithm questions from the Polish Algorithm Olympiad.

oisuite Your number #1 tool to managing your algo questions! This software only works on UNIX-based operating systems (macOS, Linux, BSD, etc.) Projec

null 3 Nov 25, 2021
Convert UFO .glif files to SVG, whether they're part of a font or not

Convert UFO glyph files (.glif) to SVG There exists already an svg2glif, but for some reason not the opposite operation. My MFEKglif editor treats .gl

Modular Font Editor K 3 Apr 26, 2022