stringsext - search for multi-byte encoded strings in binary data

Jens Getreu

Last update: Dec 14, 2022

Overview

stringsext is a Unicode enhancement of the GNU strings tool with additional functionality: it recognizes Cyrillic, Arabic, CJKV characters and other scripts in all supported multi-byte encodings, whereas GNU strings fails to find any of these scripts in UTF-16 and many other encodings.

stringsext prints all graphic character sequences in FILE or stdin that are at least MIN bytes long.
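For the plain ASCII case, the core of this scan can be sketched in a few lines of Rust. This is a simplified illustration, not stringsext's implementation. It assumes that "graphic" means printable ASCII (0x20 to 0x7E) or tab. The function name ascii_strings is hypothetical, and its parameter min plays the role of MIN.

```rust
/// Hypothetical helper (not stringsext's API): returns (byte offset, string)
/// for every run of printable ASCII bytes that is at least `min` bytes long.
fn ascii_strings(data: &[u8], min: usize) -> Vec<(usize, String)> {
    let min = min.max(1); // treat 0 as 1 so empty runs are never reported
    let mut found = Vec::new();
    let mut run = String::new();
    let mut start = 0; // byte offset where the current run began

    for (i, &b) in data.iter().enumerate() {
        // Assumption: printable ASCII (0x20..=0x7E) and tab count as graphic.
        if (0x20..=0x7e).contains(&b) || b == b'\t' {
            if run.is_empty() {
                start = i;
            }
            run.push(b as char);
        } else if run.len() >= min {
            // The run ended and is long enough: keep it and start afresh.
            found.push((start, std::mem::take(&mut run)));
        } else {
            // The run ended but is too short: discard it.
            run.clear();
        }
    }
    // A run may extend to the very end of the input.
    if run.len() >= min {
        found.push((start, run));
    }
    found
}

fn main() {
    let data = b"\x00\x01hello\xffab\x00world!\x00";
    // Print offsets in hexadecimal, similar in spirit to `-t x`.
    for (offset, s) in ascii_strings(data, 4) {
        println!("{:x}\t{}", offset, s);
    }
    // prints "2\thello" and "b\tworld!" on two lines
}
```

With min set to 4, the two-byte run "ab" is dropped. Only runs of at least MIN bytes are reported.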

Unlike GNU strings, stringsext can be configured to search for valid characters not only in ASCII but also in many other input encodings, e.g. UTF-8, UTF-16BE, UTF-16LE, BIG5-2003, EUC-JP and KOI8-R. The option --list-encodings shows a list of valid encoding names based on the WHATWG Encoding Standard. When more than one encoding is specified, the scans run simultaneously in separate threads.
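The idea of one scan per encoding, with all scans running in parallel, can be illustrated with a Rust sketch that uses only the standard library. This is not how stringsext is implemented, and it makes several simplifying assumptions:

- It decodes only UTF-8, UTF-16LE and UTF-16BE. The standard library has no decoders for legacy encodings such as BIG5-2003, EUC-JP or KOI8-R.
- It treats every non-control character as printable.
- It uses the hypothetical names Enc, scan and scan_all.

```rust
use std::thread;

/// Hypothetical subset of encodings that the standard library can decode.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Enc {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Collects runs of at least `min` non-control characters.
/// `None` marks an undecodable position and ends the current run.
fn runs(chars: impl Iterator<Item = Option<char>>, min: usize) -> Vec<String> {
    let min = min.max(1);
    let mut out = Vec::new();
    let mut cur = String::new();
    for c in chars {
        match c {
            Some(c) if !c.is_control() => cur.push(c),
            _ => {
                if cur.chars().count() >= min {
                    out.push(std::mem::take(&mut cur));
                } else {
                    cur.clear();
                }
            }
        }
    }
    if cur.chars().count() >= min {
        out.push(cur);
    }
    out
}

/// Decodes `data` in one encoding and extracts candidate strings.
fn scan(data: &[u8], enc: Enc, min: usize) -> Vec<String> {
    match enc {
        Enc::Utf8 => {
            // from_utf8_lossy marks invalid bytes with U+FFFD; treat it as a break.
            let text = String::from_utf8_lossy(data);
            runs(text.chars().map(|c| (c != '\u{FFFD}').then_some(c)), min)
        }
        Enc::Utf16Le | Enc::Utf16Be => {
            // Only even-aligned code units are decoded in this sketch.
            let units = data.chunks_exact(2).map(|p| {
                if enc == Enc::Utf16Le {
                    u16::from_le_bytes([p[0], p[1]])
                } else {
                    u16::from_be_bytes([p[0], p[1]])
                }
            });
            // Unpaired surrogates decode to Err and end the current run.
            runs(char::decode_utf16(units).map(Result::ok), min)
        }
    }
}

/// Scans the same buffer once per encoding, each in its own scoped thread.
fn scan_all(data: &[u8], encs: &[Enc], min: usize) -> Vec<(Enc, Vec<String>)> {
    thread::scope(|s| {
        let handles: Vec<_> = encs
            .iter()
            .map(|&enc| s.spawn(move || (enc, scan(data, enc, min))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // "Привет" encoded as UTF-16LE.
    let data: Vec<u8> = "Привет".encode_utf16().flat_map(u16::to_le_bytes).collect();
    for (enc, found) in scan_all(&data, &[Enc::Utf8, Enc::Utf16Le, Enc::Utf16Be], 4) {
        println!("{:?}: {:?}", enc, found);
    }
}
```

Run on the UTF-16LE bytes of "Привет", the three workers behave differently:

- The UTF-16LE worker finds the word.
- The UTF-8 worker finds nothing.
- The UTF-16BE worker reports a six-character run of unrelated Greek and CJK-range characters. This is the kind of false positive that the Unicode block filter is meant to suppress.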

When searching for UTF-16 encoded strings, keep in mind that over 96% of all possible two-byte sequences, interpreted as a UTF-16 code unit, map directly to a Unicode code point. Only the 2,048 surrogate values 0xD800 to 0xDFFF do not. As a result, any two bytes of a random byte stream, interpreted as UTF-16, decode to a valid Unicode character with a probability of over 96%. To reduce this large number of false positives, stringsext provides a parameterizable Unicode block filter. See the --encodings and --same-unicode-block options in the man page for details.
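The 96% figure above can be checked with a short Rust sketch. The check assumes that "maps directly to a Unicode code point" means "decodes on its own to a Unicode scalar value", i.e. is not a surrogate. The sketch also includes same_page, a deliberately coarse and hypothetical block test that groups code points into 128-value pages, to show why a block filter helps. same_page is not the filter stringsext implements; the man page describes the real options.

```rust
/// Counts the 16-bit values that decode on their own to a Unicode scalar value.
/// char::from_u32 returns None exactly for the surrogates 0xD800..=0xDFFF.
fn standalone_units() -> u32 {
    (0..=0xFFFFu32).filter(|&u| char::from_u32(u).is_some()).count() as u32
}

/// Hypothetical, coarse "same block" test: all non-whitespace characters must
/// share the same 128-code-point page. Real Unicode blocks vary in size, and
/// stringsext's --same-unicode-block option may work differently.
fn same_page(s: &str) -> bool {
    let mut pages = s
        .chars()
        .filter(|c| !c.is_whitespace()) // assumption: whitespace is never filtered
        .map(|c| (c as u32) >> 7);
    match pages.next() {
        Some(first) => pages.all(|p| p == first),
        None => true,
    }
}

fn main() {
    let n = standalone_units();
    let p = f64::from(n) / 65536.0;
    println!("{n} of 65536 code units decode on their own ({:.3}%)", 100.0 * p);
    // prints: 63488 of 65536 code units decode on their own (96.875%)

    // Chance that k random code units in a row all decode on their own.
    for k in [4, 8, 16] {
        println!("{k} units in a row: {:.3}", p.powi(k));
    }

    println!("{}", same_page("Привет мир")); // true
    println!("{}", same_page("\u{1F04}\u{4004}\u{3804}")); // false
}
```

Even a run of 16 random code units decodes cleanly with a probability of roughly 60%. A minimum length alone therefore filters UTF-16 noise poorly, whereas a check that all characters belong to one script region rejects most of it.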

stringsext is mainly useful for extracting Unicode content out of non-text files.

When invoked as stringsext -e ascii, stringsext can be used as a replacement for GNU strings.

Screenshot

```
stringsext -tx -e utf-8 -e utf-16le -e utf-16be \
           -n 10 -a None -u African  /dev/disk/by-uuid/567a8410
```

 3de2fff0+	(b UTF-16LE)	ݒݓݔݕݖݗݙݪ
 3de30000+	(b UTF-16LE)	ݫݱݶݷݸݹݺ
<3de36528 	(a UTF-8)	فيأنمامعكلأورديافىهولملكاولهبسالإنهيأيقدهلثمبهلوليبلايبكشيام
>3de36528+	(a UTF-8)	أمنتبيلنحبهممشوش
<3de3a708 	(a UTF-8)	علىإلىهذاآخرعددالىهذهصورغيركانولابينعرضذلكهنايومقالعليانالكن
>3de3a708+	(a UTF-8)	حتىقبلوحةاخرفقطعبدركنإذاكمااحدإلافيهبعضكيفبح
 3de3a780+	(a UTF-8)	ثومنوهوأناجدالهاسلمعندليسعبرصلىمنذبهاأنهمثلكنتالاحيثمصرشرححو
 3de3a7f8+	(a UTF-8)	لوفياذالكلمرةانتالفأبوخاصأنتانهاليعضووقدابنخيربنتلكمشاءوهياب
 3de3a870+	(a UTF-8)	وقصصومارقمأحدنحنعدمرأياحةكتبدونيجبمنهتحتجهةسنةيتمكرةغزةنفسبي
 3de3a8e8+	(a UTF-8)	تللهلناتلكقلبلماعنهأولشيءنورأمافيكبكلذاترتببأنهمسانكبيعفقدحس
 3de3a960+	(a UTF-8)	نلهمشعرأهلشهرقطرطلب
 3df4cca8 	(c UTF-16BE)	փօև։֋֍֏֑֛֚֓֕֗֙֜֝֞׹
<3df4cd20 	(c UTF-16BE)	־ֿ׀ׁׂ׃ׅׄ׆ׇ׈׉׊׋

Documentation

User documentation

Developer documentation

Source code

Repository

Distribution

Binaries for Ubuntu Linux 18.04, Windows and macOS (see below for Debian binaries):
1. Open: Releases - getreu/stringsext
2. Open the latest release.
3. Open the assets.
4. Download the packed executable for your operating system.
5. Installation: see below.

Binaries and packages (usually built from the latest commit):
- Executable for Windows:
  - x86_64-pc-windows-gnu/release/stringsext.exe
- Binaries for Debian 10 Buster:
  - x86_64-unknown-linux-gnu/release/stringsext
  - x86_64-unknown-linux-musl/release/stringsext
  - i686-unknown-linux-gnu/release/stringsext
  - i686-unknown-linux-musl/release/stringsext
- Packages for Debian 10 Buster:
  - x86_64-unknown-linux-gnu/debian/stringsext_2.3.4_amd64.deb
  - i686-unknown-linux-gnu/debian/stringsext_2.3.4_i386.deb

Installable Unix man page:
- stringsext.1.gz

Zip file with all binaries and documentation:
- stringsext all

Building and installing

Install Rust, e.g. with rustup:
```
curl https://sh.rustup.rs -sSf | sh
```

Download, compile and install:

```
cargo install stringsext
sudo cp ~/.cargo/bin/stringsext /usr/local/bin
```

This project follows Semantic Versioning.

About

Author

Jens Getreu

License: Apache 2.0 or MIT

Comments

Byte offsets not accurate

Hello and thank you. The byte offsets produced by the -t flag don't appear to be entirely accurate; there are duplicates. Reading the manual, it appears that they signify either a range (< and >) or that the line continues a line that exceeded the length limit (+). This makes them an approximation rather than the exact location given by the standard strings command. Is this because the worker threads are not aware of one another and so cannot piece together an exact picture?

opened by STashakkori

No ELF? Intended behavior?

When I run stringsext on a binary, I get an empty line where the ELF designator would be. Example:

GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 test.c main .symtab .strtab .shstrtab .text .data .bss .comment .note.GNU-stack .note.gnu.property .rela.eh_frame

Whereas the standard strings command gives: ELF GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 test.c main .symtab .strtab .shstrtab .text .data .bss .comment .note.GNU-stack .note.gnu.property .rela.eh_frame

Is this by design? Frankly, I would rather see that string in the output. Thank you.

opened by STashakkori

a few typos

s/Courir/Courier/ s/are then are/are then/ s/A valid strings is/A valid string is/ s/In practise/In practice,/ s/as invalid character/as an invalid character/ s/as sequence/as a sequence/ s/therefor/therefore/ s/Therefor/Therefore/

opened by oylenshpeegul

Implements support for start and end offsets.

Hi,

This relates to #3 in my Need for Speed. It would be great to be able to specify start and end offsets for reading the file. This way, one could cheaply parallelize the entire scan across processes by assigning different chunks of a large file to different stringsext instances.

Cheers,

Thomas
enhancement

opened by KelSolaar

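
The chunking idea in this request can be sketched as below. chunk_ranges is a hypothetical helper, not an existing stringsext feature or option. It splits a file length into byte ranges and lets each range overlap the next. A string that crosses a boundary is then seen whole by at least one worker, provided it is no longer than the overlap. Strings near chunk edges may still be reported twice or as truncated fragments, so per-chunk results would need to be merged by offset.

```rust
/// Hypothetical helper (not a stringsext option): splits a file of `len` bytes
/// into at most `n` half-open byte ranges [start, end). Each range is extended
/// by `overlap` bytes (clamped to the file end), so a string no longer than
/// `overlap` bytes that crosses a chunk boundary lies whole in one range.
fn chunk_ranges(len: u64, n: u64, overlap: u64) -> Vec<(u64, u64)> {
    assert!(n > 0, "need at least one chunk");
    let size = (len + n - 1) / n; // ceiling division
    (0..n)
        .map(|i| i * size)
        .take_while(|&start| start < len)
        .map(|start| (start, (start + size + overlap).min(len)))
        .collect()
}

fn main() {
    // A 100-byte file split for 4 workers with a 3-byte overlap.
    println!("{:?}", chunk_ranges(100, 4, 3));
    // prints [(0, 28), (25, 53), (50, 78), (75, 100)]
}
```
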
Implement support for Regex filtering of strings.

Hi,

Following up on the email thread: I was looking at using stringsext to scrape paths from binary files. However, piping the output to grep, for example, is very slow for large files (e.g. 25 GB). I was thinking that native regex filtering of the found strings might help here, instead of piping a torrent of data via stdout.

Cheers,

Thomas
enhancement

opened by KelSolaar

Split out into a library crate?

Hi, this looks great! I was wondering if you would be open to splitting out the functionality into a library crate for use via crates.io. Then this repo would end up being a command line interface for the library.
enhancement

opened by rlabrecque

Releases (v2.3.4)

v2.3.4 (Nov 8, 2021)

- Source code (tar.gz)
- Source code (zip)
- stringsext--manpage.pdf (40.07 KB)
- stringsext-v2.3.4-x86_64-apple-darwin.tar.gz (538.87 KB)
- stringsext-v2.3.4-x86_64-pc-windows-msvc.zip (483.09 KB)
- stringsext-v2.3.4-x86_64-unknown-linux-gnu.tar.gz (577.71 KB)
- stringsext_2.3.4_amd64.deb (464.78 KB)

v2.3.3 (Sep 22, 2020)

- Source code (tar.gz)
- Source code (zip)
- stringsext--manpage.pdf (40.06 KB)
- stringsext-v2.3.3-x86_64-apple-darwin.tar.gz (503.48 KB)
- stringsext-v2.3.3-x86_64-pc-windows-msvc.zip (494.51 KB)
- stringsext-v2.3.3-x86_64-unknown-linux-gnu.tar.gz (559.28 KB)
- stringsext_2.3.3_amd64.deb (448.87 KB)

v2.3.2 (Sep 19, 2020)

- Source code (tar.gz)
- Source code (zip)
- stringsext--manpage.pdf (40.06 KB)
- stringsext-v2.3.2-x86_64-apple-darwin.tar.gz (505.22 KB)
- stringsext-v2.3.2-x86_64-pc-windows-msvc.zip (495.80 KB)
- stringsext-v2.3.2-x86_64-unknown-linux-gnu.tar.gz (560.37 KB)
- stringsext_2.3.2_amd64.deb (450.89 KB)

v2.3.1 (Apr 4, 2020)

- Source code (tar.gz)
- Source code (zip)
- stringsext--manpage.pdf (39.91 KB)
- stringsext-v2.3.1-x86_64-apple-darwin.tar.gz (513.03 KB)
- stringsext-v2.3.1-x86_64-pc-windows-msvc.zip (489.26 KB)
- stringsext-v2.3.1-x86_64-unknown-linux-gnu.tar.gz (549.59 KB)
- stringsext_2.3.1_amd64.deb (442.67 KB)

v2.3.0 (Mar 19, 2020)
Changes:

996db0c47eb625e58d342c2f538bde86f638b0f2 prepare v2.3.0

6bfd5fe4e5e01f8cd7d7564a912cdbd913f1318a update online help

0c62d270121e09ac4e114c06b31087f7c4ed0c40 remove DocOpt dependency

51441fee3170560f5d2be1ca0bed27b0583fd80f doc: update examples

1ff1e3da63dea4abdf8818ca6f1ebd91d4d7c896 replace macros with constants

253bbd70e2d183976c638d61cf8517cca93ae419 migrate argument parsing from DocOpt to StructOpt

d24ca6243866cc1f4884b7ec22e724a7dc4b8fd6 doc: update build-scripts and links

748ff686f99ff906a0f3816f510ac2447f6436be build: rename scripts

This list of changes was auto generated.

- Source code (tar.gz)
- Source code (zip)
- stringsext-v2.3.0-x86_64-apple-darwin.tar.gz (589.76 KB)
- stringsext-v2.3.0-x86_64-pc-windows-msvc.zip (495.12 KB)
- stringsext-v2.3.0-x86_64-unknown-linux-gnu.tar.gz (563.81 KB)

v2.2.0 (Mar 19, 2020)
Changes:

4c5776894556888e11ac6ca29c1d03f39bdb1cee add continuous delivery pipeline

f2d0f482a81c2c61f0c3f93255a8ab9ba472e821 prepare v2.2.0

08a67af5712eb4f3d7efdf190639f3ccdc566671 document new command-line-option

97b2b6a3e527c6e6a233bbff84a03bf549c04cb3 add optional filter constraint

b41f408775bacf00db4c96c5425591583c850982 add additional filter parameter

ff56abf9d064e364a42a7e56347ec8b95b164318 split filter into two parts

03c6adee030e4ff611e83bd63fddaad8e5df66f9 add manifest metadata

009e713abddef60cc4fc0e1a6a0ed67e8cfc6023 upgrade dependencies

f985e57892257e2b447f14d3c4255397deebdfc9 enable Debian package builds for i386 architecture

844cdc9ba3d47931e99914d6fe89eb381ed0e21b move scripts in own folder

7cc7bba56b1503b036d86f2bd4bd5411a9246194 build Debian packages

0771b47abd41b706718cc4edffa5b2f0f22cdc18 doc: add link to Gitlab

356f25d20a678535d5f940eb18a328563604d3e9 document more limitations

c6f717f728ad9893c6628c332d447f1d3374cc7a add closing quotation mark

e3bd8c19da9f940badf37ac4ac3a806d4673f317 update documentation

b67c180814815fa8f79197e2a3b33240b8ad5f87 prepare v2.1.1

edf7e89967f8e3d23d3d0be668df060e1bffce04 correct typo

e9b8392a4557084e197e920f5336540f3acf4213 avoid warning about unused labels

a47c9b71f9ed1d6360d83f313ae7ba511f445be4 set default window width to 2*64 bytes = 128 bytes

e80859d72c4fcb52448c44665dc7589685764875 better use if-then-else then && || in shell script

559f19d6120d1fb55a934e36fa05b5fa120c66c1 document coner case in source code

587a1d947ea0765841729eee75c117c2463d8781 add screenshot

6adae44628189f8f0ac7f73876dbe8c1928b0bc3 add functional test

8bff55730cfb521a4d6262a4aaa3917ddb077099 prepare v2.1.0

e777688c4a1e00374fe65652d72d21b7516a0b87 exclude NULL from AF=None

62b891f3c66a6adde416926f272cdf69f120a39d document enlargement of input_window

68bb3b212342a62440882570809721ce1ed58173 doc: correct indent

55b93b4e657d8cae1f63461ca32f1bd3473e779c add link to paper

3beb57fb22c6898ff38f90cabc71da60c590cf6a prepare v2.0.0

e099be44707fafff408e7a2039cf16b87409d648 add functional test

6a0defe4364f1927d8af523b93f0fe3dc7ca1534 prepare v1.99.5 (2.0.0pre5)

95f0cb6886413475175a8ab7eff4d1c981c0e1d6 add command-line-test

79d7b9288183bf18bcfd174b92cc154a9a53fe0c set filter defaults

356f2d67d21637e6d82a6195c91cf9e286806c8e count output-line length in charactes (not bytes)

9c62c2e7085453a15b36ca58f56b81419f7023c1 rename variable

e4ecce96a101ce69ba1b2a7b373534e34ae34880 set AF defaults closer to orig. strings

83b41544a0bc88108089e76061d44d4faa756432 enlarge scanner::window to better fill output_line

ed6020c6efb36f261c82599a26a31ad5749c6ffe detail man-page

e964d5d82350ac06bb993d16f909a26f6c011034 detail filter description

70280bb0e515bc032b6af35a414d6a160f91f60b prepare version 1.99.4 (2.0.0pre4)

5418ca26d9da850f93ca5ae631fed77c16a30579 use strict camel case in filter names

0d309654321ffe2e1af26128cbc616d09eb09fff test for captial script names

8cd56bf97c41d804357ff93afe9f911f76d81828 prepare version 1.99.3 (2.0.0pre3)

cc3e47e0a17b287bfd4c55cca1edaf81a5bd07b0 let all filter names start with capital

62dfe7dec7787dc452fe268177683a02f0e22f5d order script list alphabetically

b631b1a08af06c3b1a2143ffed82c14ce14bf8ae add Kana, CJK, and Hangul scrpts

c67931a12aa14f621b71f5a7b746112e40a76ae4 include Kana in Asian

5bf53641f534599dc94ff3685b31b000d472f5dd add Armenian and Hebrew filter

06e080d3b972287223a77a7662af734babbf4628 add links to blogposts about Stringsext

794123809c50e6955238cd9838f9faeffeecce47 update version numbers

b4fcc5834d04ddeb5b41d9c5496c56f16ad7ba28 run inner loop consistantly with stack only

15802dea6df4235121ff9cc0a8270305666ccd61 prepare v1.99.2 (2.0.0pre2)

676ba55b97ae3f8f9a4e63ddb49e44f2c1614a82 reset carry over directly after usage

058b2ba97af9094f9393b73c9cacc971fe81fe41 prepare v1.99.1 (v2.0.0pre1)

4ff03dde359098bcb07eeb389979dfb3565c569b replace codebase with version 2 branch

4a61c180a4fea135a7c28aed614339b42379bdec prepare version 1.7.1

194dd37943b1dc17eac7c5283f4c17affc76a34d use separators in long literals

81a18040d55c800d90d08f2f6eed872153d5f79e remove redundant lifetime

6195861637e4bee45b591192583d1fbf466cb0ce clean up match expression in favour of closure

68fc37a646190eca80b2fa9629c62b98a2093fc5 Rust format

95e3ef6371c29e7ccefe487c099663bab04d7568 add cross-compilation script for windows target

1dba6dae70be959bc6cba26e430a6db71490083b enable library installation by default

3a8669429b4a27efc586aaedf3fa23c80407dd37 clean more thoroughly

a6b218bbf37ccb66eeafa3ee0e761c8222df39a8 remember upcoming PR making work-around obsolete

a962804d7f41e4be6bd9a2fc4fc9425b75d543ab use eprintln macro

a76135ed3eb9c2b361f93041c932e4f5232f9158 correct path in documentation build script

d3f151f10ff583a1f19290cb0af2f4e7c9c381aa remove backup of build script

639377188e5d02fc52a692f01905385a292a1155 Merge pull request #2 from KelSolaar/patch-1

768a8be88fd8a7e49311b1b2faad92e95e904bb5 Fix typo in "stringsext--man.md" file.

ac8e8aa1275ae964b8aafcb17f6249156f781a97 replace deprecated trim_left_matches

15cb6f0a84058046015e334b71d543167bcd0400 migrate man page source to markdown

0ed67218a046d8f6bed339042f6525237c245dc2 migrate README to markdown

78d140ac95b31c35608f598bee62b6fa209606d2 prepare version 1.7.0

35b2c31be5b58f8b802aa980c116d7dd67ece608 suppress warning: methods called new usually return Self

76617b0c1d7cf4d40f78a154336dc41292df0cc6 avoid redundant initialization

2b125948012e175549e62fd1dc8275e149426369 constants have by default a 'static lifetime

28e7174e76c5f8ba053eb8f640279caafd340d6b economize one reference

9770e084e5f0f02b39995b2eda83691d00de11e4 call by value is more efficient

4c5ab70048f72c7466e3de4500263b45a8b53bbd remove explicit lifetime, can be elided

45980475abdad14d9779f6f3eacb3b1f03b3f9b3 remove redundant pattern matching

f4d4f9b56a19c72c21b16903281c5d276ae33232 remove unused import

80c1b5ab28db3f05f73f62ddf416f9ef369bbb9a call by value is more efficient

7764e09d2bd2741b94ecf01f3d3698620982a50b is_empty() is more explicit

b11a50746e2fc6360a6ae944d3e9197dc87ea39a no Box required

542a9925a6250a3b8570249bc1e64d4dff8a9600 using is_empty is clearer and more explicit

2491f8ae8e2e208380d60b9956e45e086057ad44 avoid unsafe mem::transmute

0433c58b2dfb5973e85df6e18f48f6bc04349c23 is_empty() is clearer and more explicit

a7c3b3ca41a6cdfe1c815881ad0b115cbdb83a30 remove unnecessary ref

323e3cc4dec133bfb99ac9708074644e8e5747a0 detail todo list

3bd9c5774bf8f289d49af9f988df6dbbabf808a8 document last successful build

011296cc149a71d36a2a6de61bfafcd0d44eaa81 echo building steps on console

b7f2682a2963fc72b4139b5cac627861b5821428 remove unused import

35b7941df7daf2f348914e03fa5fee9f6bf8e256 switch back to stable toolchain

655d047b3e92466e3e2d6727e1ae4d7687128743 import macros from external crates via use

5b9c4fccea87726b61d55c97209d66e4e814057b switch temporally to beta toolchain

781828b2f7d20ad0ee681bacb34826777e9f5ac2 format source code with cargo-fmt

dff02fcb093850f44e043365012c98dc1606d988 writing idiomatic code in a new edition

338d23e2755ac08c3693244a6711f68866768767 enabling the new edition to use new features

19df4d86fdf4fd10161233403640fb27a87f1bbc after cargo fix --edition

d936f42ab2d79bfcfd01124478da76065bcd8a09 make 'next iteration' test more descriptive

73b3824e3c8c3df25d40ec9385163d367d2b0073 correct typo

c8c85d478417ca43c8e5945900505b442f038732 configure Gitlab's continuous integration service

c87dc6aa06ef5c62db30eaf56771c150a3363eb4 do not clean .md notes in target/release dir

74ed5119079629b95b539a31725dbf564e758374 keep content of README.rst in root directory

398385445b58f263ad0f4b44ee99b4644f0f5615 prepare version 1.6.0

678d0ed6128b4acaf2cfd46817657489e21a3aa4 bugfix: read the input data always until the end

3d7a803ba184f3d6e3a7c8894f99e62daaeaed55 upgrade memmap to v0.7

00fd70e9f9f81673e8291cea524985814ae92201 delete unused imports in tests

1433b1fbcc69c3d071b5bba5e448e65022805562 prepare version 1.5.0

72ed522d3f3c59eb9c4a85ed948a43d76817a03f update todo

f498b03941fccab1f1c23962bf32c316b12c1a4c make Docopt use Serde

e74c883a91f044c1f7930467ba5068efd368f90b stay with memmap 0.5.2

624c4b58ab7cf3f0a18ce60c4927184d8603b992 Replace rustc_serialize with serde

fa3d9bf1888912c80c74c47cf38c76b417d500cf some ideas for future development

cb3d2199c2e0dc8e69a3f0e230c6ca4ec48ba66d dual license stringsext: Apache 2.0 or MIT

db462eb9b3e16549d4d1cbb20f10c76b9c13b8cc remove debugging symbols from binary

9ae9638cdec43803cb1603711e169fc4f13e049e prepare version 1.4.4

61ee2807777a4f04513ac752194722c6e82ba589 replace try macros with ?

f42767b3c3e7f720df40db93db814e78c81c6eaf variable outp does not need to be mutable

c8fb03152a2932312d98b2e16c5d19799ff22387 remove unused extern crates

ecfaea0bb5b58aaa47be0fc7fbe41e18ac2ea212 refactor build tree to sphinx conventions

cce7e3210f72ceccc7eb04b8e7f718f8323b533d fix broken link to API documentation

6674fc00821abf9fb7fa2cf77cdecb62241e9feb add build status icon on README

5c3d963e3be97f349099b58ab109d00c1730cc18 use Travis CI

f35da89246725fb3b4974225cb3fd14534cc794e add custom error type

f31fda01fa6c73aa74e6d7510b3ffdcb145fc6ec update man-page installation instructions

0b227b90b970a9cf8ef33f2ac6e2877c5b072db3 detail LIMITATIONS section in man-page

5f99f035c389cb2e00b063207126cdb0b5677ac4 Merge pull request #1 from oylenshpeegul/master

dc68ee7d202ea8e3dd2acd229e13ede01b14e413 a few typos

845ec12f74747f5de59e96173fc37ab8d656fda6 update README

71a8fb681061b82952011274a4277ed290fd7394 Mission outsourced, new option: --print-file-name

39e1ae7a0925b7e8841d7126869120ca1c66e9c7 prepend Unicode BOM, improve error handling

2237db13a7bb98756459d324c8ba35964666329c improve treatment of split strings, add cut label

09c0c9cfd77e719bce88abbe20f075e4b21c8125 add 2. Unicode-block-filter

e05e0711ca66d5f058aa6ae664b61c1b2d611bbd make Mission static, keep mutable in ScannerState

e408070d662966eb74a37edea5d74e469d8b0acd process multiple files given on the command line

ade97e037892efef182893b76bd017a19df23583 print U+0020..U+003F whatever the block-filter is, reduce false positives with incomplete strings

ca5ce111895e88abe96ca665f97b2f074990bdfe prepare version 1.1.1

da7e8d9dd26601759c828c16c0dee376441d74e3 never filter space or tab even with active Unicode-block-filter

f4a62751563810f64dc7a607cef30589222a656c point targets in main page to https

This list of changes was auto generated.

- Source code (tar.gz)
- Source code (zip)
- stringsext-v2.2.0-x86_64-apple-darwin.tar.gz (861.54 KB)
- stringsext-v2.2.0-x86_64-pc-windows-msvc.zip (758.72 KB)
- stringsext-v2.2.0-x86_64-unknown-linux-gnu.tar.gz (856.51 KB)

Owner

Jens Getreu

GitHub https://blog.getreu.net/projects/stringsext/
