rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Overview


rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.


For more detail, see this introductory blogpost: https://phiresky.github.io/blog/2019/rga--ripgrep-for-zip-targz-docx-odt-epub-jpg/

rga will recursively descend into archives and match text in every file type it knows.

Here is an example directory with different file types:

demo/
├── greeting.mkv
├── hello.odt
├── hello.sqlite3
└── somearchive.zip
    ├── dir
    │   ├── greeting.docx
    │   └── inner.tar.gz
    │       └── greeting.pdf
    └── greeting.epub

[Screenshot: rga output]

Integration with fzf


You can use rga interactively via fzf. Add the following to your ~/.{bash,zsh}rc:

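rga-fzf() {
	# Base command: list only the files that contain a match.
	# fzf re-runs it on every query change and previews matches with 5 lines of context.
	# The selected file is opened with xdg-open (the default handler on Linux desktops).
	local RG_PREFIX="rga --files-with-matches"
	local file
	file="$(
		FZF_DEFAULT_COMMAND="$RG_PREFIX '$1'" \
			fzf --sort --preview="[[ ! -z {} ]] && rga --pretty --context 5 {q} {}" \
				--phony -q "$1" \
				--bind "change:reload:$RG_PREFIX {q}" \
				--preview-window="70%:wrap"
	)" &&
	echo "opening $file" &&
	xdg-open "$file"
}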

INSTALLATION

Linux x64, macOS and Windows binaries are available in GitHub Releases.

Linux

Arch Linux

Install from the AUR: yay -S ripgrep-all

Nix

nix-env -iA nixpkgs.ripgrep-all

Debian-based

Download the rga binary and install the dependencies like this:

apt install ripgrep pandoc poppler-utils ffmpeg

If ripgrep is not included in your package sources, get it from here.

rga will search for all binaries it calls in $PATH and in the directory that the rga binary itself is in.

Windows

Install ripgrep-all via Chocolatey:

choco install ripgrep-all

Note that installing via Chocolatey or Scoop is the only supported download method on Windows. If you download the binary from releases manually, you will not get the dependencies (for example pdftotext from poppler).

If you get an error like VCRUNTIME140.DLL could not be found, you need to install vc_redist.x64.exe.

Homebrew/Linuxbrew

rga can be installed with Homebrew:

brew install rga

To install the optional dependencies, which are not strictly necessary but very useful:

brew install pandoc poppler tesseract ffmpeg

Compile from source

rga should compile with stable Rust (v1.36.0+, check with rustc --version). To build it, run the following (or the equivalent in your OS):

   ~$ apt install build-essential pandoc poppler-utils ffmpeg ripgrep cargo
   ~$ cargo install ripgrep_all
   ~$ rga --version    # this should work now
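
The minimum Rust version mentioned above can be checked in a script before building. The following is a minimal sketch and not part of rga: version_ge and check_rustc are hypothetical helper names, both versions are assumed to have exactly three numeric fields, and the parsing assumes that rustc --version prints the version number as its second field.

```sh
# Hypothetical helper (not part of rga): succeed if version $1 >= version $2.
# Assumes both arguments look like MAJOR.MINOR.PATCH with numeric fields.
version_ge() {
	local IFS=.
	# Word splitting on "." turns the two versions into six positional parameters.
	set -- $1 $2
	# Compare major, then minor, then patch; the first differing field decides.
	[ "$1" -ne "$4" ] && { [ "$1" -gt "$4" ]; return; }
	[ "$2" -ne "$5" ] && { [ "$2" -gt "$5" ]; return; }
	[ "$3" -ge "$6" ]
}

# Hypothetical wrapper: compare the installed rustc against the minimum from this README.
check_rustc() {
	local v
	# Assumes output like "rustc 1.63.0 (...)"; the second field is the version.
	v="$(rustc --version | awk '{print $2}')"
	# Drop a pre-release suffix such as "-nightly" before comparing.
	version_ge "${v%%-*}" 1.36.0
}
```

Running check_rustc && cargo install ripgrep_all would then only start the build when the toolchain is new enough.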

Available Adapters

rga --rga-list-adapters

Adapters:

  • ffmpeg Uses ffmpeg to extract video metadata/chapters and subtitles
    Extensions: .mkv, .mp4, .avi

  • pandoc Uses pandoc to convert binary/unreadable text documents to plain markdown-like text
    Extensions: .epub, .odt, .docx, .fb2, .ipynb

  • poppler Uses pdftotext (from poppler-utils) to extract plain text from PDF files
    Extensions: .pdf
    Mime Types: application/pdf

  • zip Reads a zip file as a stream and recurses down into its contents
    Extensions: .zip
    Mime Types: application/zip

  • decompress Reads compressed file as a stream and runs a different extractor on the contents.
    Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
    Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd

  • tar Reads a tar file as a stream and recurses down into its contents
    Extensions: .tar

  • sqlite Uses sqlite bindings to convert sqlite databases into a simple plain text format
    Extensions: .db, .db3, .sqlite, .sqlite3
    Mime Types: application/x-sqlite3

The following adapters are disabled by default, and can be enabled using '--rga-adapters=+pdfpages,tesseract':

  • pdfpages Converts a pdf to its individual pages as png files. Only useful in combination with tesseract
    Extensions: .pdf
    Mime Types: application/pdf

  • tesseract Uses tesseract to run OCR on images to make them searchable. May need -j1 to prevent overloading the system. Make sure you have tesseract installed.
    Extensions: .jpg, .png

USAGE:

rga [RGA OPTIONS] [RG OPTIONS] PATTERN [PATH ...]

FLAGS:

--rga-accurate

Use more accurate but slower matching by mime type

By default, rga will match files using file extensions. Some programs, such as sqlite3, don't care about the file extension at all, so users sometimes use any or no extension at all. With this flag, rga will try to detect the mime type of input files using the magic bytes (similar to the `file` utility), and use that to choose the adapter. Detection is only done on the first 8KiB of the file, since we can't always seek on the input (in archives).
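
To illustrate what detection by magic bytes means, here is a minimal sketch that guesses a suitable adapter from the first bytes of a file, ignoring its extension. It is not rga's actual detection code: sniff_adapter is a hypothetical helper, it knows only four well-known signatures (SQLite, PDF, zip, gzip), and it reads 16 bytes rather than the 8KiB mentioned above because these signatures all sit at the very start of the file. The adapter names are taken from the adapter list in this README.

```sh
# Hypothetical helper (not part of rga): print the adapter that would fit a file,
# judged only by its leading magic bytes.
sniff_adapter() {
	local hex
	# Dump the first 16 bytes as one lowercase hex string without separators.
	hex="$(head -c 16 "$1" | od -An -v -tx1 | tr -d ' \n')"
	case "$hex" in
		53514c69746520666f726d6174203300*) printf '%s\n' sqlite ;;  # "SQLite format 3" + NUL
		255044462d*) printf '%s\n' poppler ;;                       # "%PDF-"
		504b0304*) printf '%s\n' zip ;;                             # "PK" 0x03 0x04
		1f8b*) printf '%s\n' decompress ;;                          # gzip stream
		*) printf '%s\n' unknown ;;
	esac
}
```

A SQLite database saved without any extension would be reported as sqlite by this sketch, which is the situation --rga-accurate is meant to handle.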

-h, --help

Prints help information

--rga-list-adapters

List all known adapters

--rga-no-cache

Disable caching of results

By default, rga caches the extracted text, if it is small enough, to a database in ~/.cache/rga on Linux, ~/Library/Caches/rga on macOS, or C:\Users\username\AppData\Local\rga on Windows. This way, repeated searches on the same set of files will be much faster. If you pass this flag, all caching will be disabled.

--rg-help

Show help for ripgrep itself

--rg-version

Show version of ripgrep itself

-V, --version

Prints version information

OPTIONS:

--rga-adapters=<adapters>...

Change which adapters to use and in which priority order (descending)

"foo,bar" means use only adapters foo and bar. "-bar,baz" means use all default adapters except for bar and baz. "+bar,baz" means use all default adapters and also bar and baz.

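The three spellings of the adapter list can be sketched as a small shell function. This is only an illustration of the rules described above, not rga's implementation: resolve_adapters is a hypothetical helper, the default list is copied from the adapter overview in this README, and placing "+" adapters after the defaults is an assumption about priority order.

```sh
# Hypothetical helper (not part of rga): expand an --rga-adapters value
# into the space-separated list of adapters it selects.
resolve_adapters() {
	local spec="$1"
	local defaults="ffmpeg pandoc poppler zip decompress tar sqlite"
	local names="" out="" d n skip
	case "$spec" in
		+*)
			# "+a,b": all defaults plus the named adapters (appended; an assumption).
			names="$(printf '%s' "${spec#+}" | tr ',' ' ')"
			out="$defaults $names"
			;;
		-*)
			# "-a,b": all defaults except the named adapters.
			names="$(printf '%s' "${spec#-}" | tr ',' ' ')"
			for d in $defaults; do
				skip=0
				for n in $names; do
					if [ "$d" = "$n" ]; then skip=1; fi
				done
				if [ "$skip" -eq 0 ]; then out="${out:+$out }$d"; fi
			done
			;;
		*)
			# "a,b": only the named adapters, in the given order.
			out="$(printf '%s' "$spec" | tr ',' ' ')"
			;;
	esac
	printf '%s\n' "$out"
}
```

For example, resolve_adapters "-zip,tar" prints the default list without zip and tar, while resolve_adapters "poppler,pandoc" prints only those two.
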
--rga-cache-compression-level=<cache-compression-level>

ZSTD compression level to apply to adapter outputs before storing in cache db

Ranges from 1 - 22 [default: 12]

--rga-cache-max-blob-len=<cache-max-blob-len>

Max compressed size to cache

Longest byte length (after compression) to store in cache. Longer adapter outputs will not be cached and will be recomputed every time. Allowed suffixes: k M G [default: 2000000]

--rga-max-archive-recursion=<max-archive-recursion>

Maximum nestedness of archives to recurse into [default: 4]

-h shows a concise overview, --help shows more detail and advanced options.

All other options not shown here are passed directly to rg, especially [PATTERN] and [PATH ...]

Development

To enable debug logging:

export RUST_LOG=debug
export RUST_BACKTRACE=1

Also remember to disable caching with --rga-no-cache or clear the cache (~/Library/Caches/rga on macOS, ~/.cache/rga on other Unixes, or C:\Users\username\AppData\Local\rga on Windows) to debug the adapters.
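
Clearing the cache is the same operation on every platform; only the path differs. A minimal sketch, assuming the default locations listed above: rga_cache_dir is a hypothetical helper and not an rga command, and using the LOCALAPPDATA environment variable in Windows shells (Git Bash, MSYS, Cygwin) is an assumption.

```sh
# Hypothetical helper (not part of rga): print the default rga cache directory.
# The optional first argument overrides the OS name reported by uname -s.
rga_cache_dir() {
	local os="${1:-$(uname -s)}"
	case "$os" in
		Darwin)
			printf '%s\n' "$HOME/Library/Caches/rga" ;;
		MINGW*|MSYS*|CYGWIN*)
			# Assumption: LOCALAPPDATA points at C:\Users\username\AppData\Local.
			printf '%s\n' "${LOCALAPPDATA:-$HOME/AppData/Local}/rga" ;;
		*)
			printf '%s\n' "$HOME/.cache/rga" ;;
	esac
}
```

With the helper defined, rm -rf "$(rga_cache_dir)" would clear the cache before a debugging run; print the path first to confirm it is the directory you expect.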

Comments
  • `Ripgrep-all` runs very slowly for the first time after the computer starts.


    On Ubuntu 20.04.3 LTS, I'm using a self-compiled git master build of ripgrep-all. I noticed that it runs very slowly the first time after the computer starts, so I assume it relies on a caching mechanism. The question is how to preserve the cache across restarts so that it stays fast.

    Any hints for this problem will be highly appreciated.

    Regards, HZ

    opened by hongyi-zhao 8
  • Fix installation and CI


    • Fixes installation with the stable toolchain. Essentially it's just cargo update
    • Fixes the push pipeline, now it fails on a test
    • Fixes the release pipeline
    opened by TriplEight 8
  • error running pdf search on windows 10 - 64bit


    I tried running the pdf search with the adapter "poppler" on both version 0.9.2 and 0.9.3 and I get the following error message. What am I missing here?

    Reference.pdf: preprocessor command failed: '"rga-preproc" "Reference.pdf"':
    -------------------------------------------------------------------------------
    adapter: poppler
    pdftotext version 4.00
    Copyright 1996-2017 Glyph & Cog, LLC
    Usage: pdftotext [options] <PDF-file> [<text-file>]
      -f <int>             : first page to convert
      -l <int>             : last page to convert
      -layout              : maintain original physical layout
      -simple              : simple one-column page layout
      -table               : similar to -layout, but optimized for tables
      -lineprinter         : use strict fixed-pitch/height layout
      -raw                 : keep strings in content stream order
      -fixed <number>      : assume fixed-pitch (or tabular) text
      -linespacing <number>: fixed line spacing for LinePrinter mode
      -clip                : separate clipped text
      -nodiag              : discard diagonal text
      -enc <string>        : output text encoding name
      -eol <string>        : output end-of-line convention (unix, dos, or mac)
      -nopgbrk             : don't insert page breaks between pages
      -bom                 : insert a Unicode BOM at the start of the text file
      -opw <string>        : owner password (for encrypted files)
      -upw <string>        : user password (for encrypted files)
      -q                   : don't print any messages or errors
      -cfg <string>        : configuration file to use in place of .xpdfrc
      -v                   : print copyright and version info
      -h                   : print usage information
      -help                : print usage information
      --help               : print usage information
      -?                   : print usage information
    Error: The pipe has been ended. (os error 109)
    
    opened by neelabalan 8
  • preprocessor command failed: '"rga-preproc" "/Users/user/Desktop/test/test.pdf.zip"

    I am getting this error while executing:

    rga "hello" ~/Desktop/test/

    where I have a zip file. I don't understand from the documentation whether ZIP files need an extra argument or not. Thanks in advance.

    opened by AtomicNess123 7
  • brew troubles?


    I did an install on macOS Catalina (10.15.7) using brew install rga and for all files I test this on I get this error when I try rga testing:

    -------------------------------------------------------------------------------
    ./some file.pdf: preprocessor command failed: '"/usr/local/bin/rga-preproc" "./some file.pdf"':
    -------------------------------------------------------------------------------
    adapter: poppler
    Error: Couldn't open file '-'
    Error: Broken pipe (os error 32)

    So then I went ahead and installed all the additional libraries mentioned in the readme with brew install pandoc poppler tesseract ffmpeg but this didn't seem to help at all. Even tried reinstalling rga after that.

    opened by hyperjeff 6
  • Respect .rgignore


    Hi, first of all thanks for this - it is incredibly useful for me. Caching makes all the difference compared to the slow pdfgrep!

    It seems rga doesn't respect .rgignore file? Would it be possible to add it please?

    opened by rsuhada 6
  • Build fails with "unstable feature" error in rkv dependency

    I tried doing cargo build (of master at commit ef2e4ebf28f) and got this error:

      $ cargo build 
          Updating crates.io index
       Downloading crates ...
        Downloaded chrono v0.4.6
        Downloaded encoding_rs v0.8.17
        [...]
         Compiling zip v0.5.2
         Compiling serde_json v1.0.39
         Compiling rkv v0.9.6
      error[E0658]: use of unstable library feature 'try_from' (see issue #33417)
         --> /home/kfogel/.cargo/registry/src/github.com-1ecc6299db9ec823/rkv-0.9.6/src/error.rs:166:11
          |
      166 | impl From<::std::num::TryFromIntError> for MigrateError {
          |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      error[E0658]: use of unstable library feature 'try_from' (see issue #33417)
        --> /home/kfogel/.cargo/registry/src/github.com-1ecc6299db9ec823/rkv-0.9.6/src/migrate.rs:78:5
         |
      78 |     convert::TryFrom,
         |     ^^^^^^^^^^^^^^^^
      
      [...many more similar error lines...]
      
      error: aborting due to 12 previous errors
      
      For more information about this error, try `rustc --explain E0658`.
      error: Could not compile `rkv`.
      warning: build failed, waiting for other jobs to finish...
      error: build failed
      $ 
    

    I don't know much Rust, but it looks like rkv is using an unstable feature (rust bug 33417 has more about it), and since rga depends on rkv, this affects the rga build too. I ran rustc --explain E0658 and got some information about how to solve the problem. Presumably those solutions would have to be implemented upstream in rkv if we wanted to solve this for everyone; otherwise I'd have to either build a modified rkv locally or get the nightly version of rustc to do the build I just tried to do.

    I'm not sure what ways might be available to solve this within rga. Ideas welcome; like I said, I don't know Rust that well.

    Anyway, this was all along the way to submitting a PR for README.md to add installation instructions. I'll submit that PR, and then in its commentary mention this issue.

    opened by kfogel 6
  • Opening a xml file and ran code inside when it wasn't supposed to (security?)


    I used rga-fzf to search for a xml file. That file had a powershell script in it. When clicking on enter to open the file, the powershell script got executed which wasn't intended as it was malicious 😅

    I am using Manjaro (Arch Linux) with zsh and powershell+wine installed.

    Has anyone else observed that?


    opened by evilcel3ri 5
  • feature_request(books): detect incorrect and poor quality text


    1. Summary

    It would be nice if ripgrep-all showed a warning when the text in a book is incorrect or of poor quality.

    2. Problem

    2.1. Summary

    Some books have a bad OCR layer, which makes it impossible to search for normal words in them. It would be nice if ripgrep-all detected these books.

    2.2. Details

    Books may have bad quality of searchable text. Reasons:

    1. The user who added the OCR layer to the book chose the wrong OCR language. For example, a user may have added an English OCR layer to Russian text, as in my example 4.2.
    2. Poor quality of the scanned book and/or of the tool used to add the OCR layer. See my example 3.

    I couldn't find a way to detect these books in my book list automatically. Currently, I need to check the OCR layer quality of every book manually, which takes a lot of time.

    3. Compact Language Detector

    Possibly, Compact Language Detector can solve this problem.

    I installed cld2-cffi (CLD3 exists, but I had problems installing it on Windows) and ran this code in my Python interpreter:

    >>> import cld2
    >>> isReliable, textBytesFound, details = cld2.detect("Here text from examples 4.1—4.3")
    >>> print('  details: %s' % str(details))
    

    It may be possible to get similar behavior with Rust tools; for example, see Whatlang and CLD3 langdetect.

    4. Example texts

    4.1. Normal Russian text

    Например, название Полтавы связано с названием речки Лтавы (так раньше называлась Ворскла) и означает, соответственно, «город на Лтаве». Название города Ужгород также образовано от названия реки Уж. Винница обязана своим названием речке Винничке, которая протекает через город. Название реки, в свою очередь, происходит от слова «венок»: когда-то молодые девушки собирались на ее берегу и пускали на воду венки, чтобы узнать о своем будущем. Луганск назван в честь речки Луганки.
    
    • cld2-cffi output:
    details: (Detection(language_name='RUSSIAN', language_code='ru', percent=99, score=709.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
    

    4.2. English OCR language for Russian text

    IIepBhle HeCKOJI:bKO COT MHJIJIHOHOB JIeT 6h1JIH nOHCTHHe KOWMapHhlMH ,n;JIH nJIaHeThI: OHa HenpephlBHO COTpHC8.JIaC:b no,n; y,n;apaMH KpynHhlx MeTeopHTOB, ChlnaBWHXCH Ha Hee H3 KOCMoca. IIoBepxHOCT:b COBpeMeHHOH JIYHhI, nOKpLITaSi MeTeopHTHhlMH KpaTepaMH, n03BOJIHeT HaM npe,n;CTaBHT:b, KaK MOrJIa BhlrJIH,n;eT:b 3eMJIH npHMepHO 4 MJIp,n; JIeT Ha3. OqeH:b CKOpO BHyrpH HaweH nJIaHeThl3apa60T8.JI tTenJIOBOH ABHraTeJI:b., rOplOqHM ,n;JIH KOToporo CJIymHJI pacn pHoaKTHBHhlX SJIeMeHTOB. B He,n;pax 3eMJIH HaqaJIOCh Me,n;JIeHHOe ,n;BHmeHHe BeeCTBa, HarpeThle CTPYH KOToporo nOAHHM8.JIHC:b BBepx, a XOJIO.D;Hhle onYCK8.JIHCh BHH3. IIJIaHeTa CT8.JIa noxoma Ha CneJIhlH nepCHK.
    
    details: (Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
    

    Note: I removed the Information Separator One gremlin characters from this text before passing it to cld2-cffi; otherwise I get this traceback:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python38\lib\site-packages\cld2\__init__.py", line 393, in detect
        raise ValueError("input contains invalid UTF-8 around byte " +
    ValueError: input contains invalid UTF-8 around byte 348 (of 792534779)
    

    4.3. Bad OCR

    1(этрин ска3ала' что знакома с книгой его )кены 3леоноры 8ирек по лекарстве[1пым расте|{ия!| аляски. ёа мой в3дох по поводу тогц что у нас в библиотеке тодько од'!а книга на эту фамилию, (этрин пообещала прислать книгу о лекарстве!тных расте]|иях аляски. !! действительно' не прошло и месяца' как у меня на столе появилась небольшая по объему эффектвого дизайва книга <а|-а5'(а'5 ш||овпшп$5 мвр1с]ш85> с изображе1|и₠ м ца обложке такого 3вакомого ка}<дому х{ителю нашей о6ласти ольховника. правда, в книге он значился 11од другим видовым названием' чем у ;1ас
    
    details: (Detection(language_name='RUSSIAN', language_code='ru', percent=59, score=503.0), Detection(language_name='SERBIAN', language_code='sr', percent=40, score=468.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
    

    5. Example of expected behavior

    1. ripgrep-all adapters extract text from books.

    2. CLD (or a similar tool) checks 2 (4 may be better) random pages of every book.

    3. If the percent value is 95 or more (another threshold may be better; this needs practical tests), do nothing. If it is below 95, the ripgrep-all user gets a warning. Example warning text:

      WARNING! Possibly, file {Filename} have a text written not in natural language. The reason for this may be incorrect or poor quality OCR layer. Please, check your {Filename}.
      

    6. Note

    Some language-recognition tools may not solve this problem, because they do not detect that a text is not written in a real natural language.

    For example, I tried the langdetect, TextBlob, guess_language and langid examples from this Stack Overflow answer; they report that my examples 4.2 and 4.3 are written in real natural languages.

    Thanks.

    opened by Kristinita 5
  • Pdfgrep and rga comparison


    Hello; this is not a bug or anything, just a question.

    I know that ripgrep is blazingly fast compared to other grep options, and I was recently recommended to use your ripgrep-all for my scripts to mass grep/sort/filter on pdfs. With little knowledge of specifics, I expected little from rga for search in pdfs: it seemed to me that decoding the pdf would be the time-consuming part and ripgrep would not make much difference for, say, a 200 page text over other greps. On the contrary, when I benchmarked rga against pdfgrep the difference was ridiculous and the diffs seem to clear, so no inconsistencies so far.

    Could you let me know, briefly, what makes rga so much faster than things like pdfgrep (that is, if you know or can guess)? The speed difference seems so remarkable that for my purposes makes pdfgrep useless.

    opened by ykonstant1 5
  • Update rusqlite to 0.20.0


    Fixes #22.

    Stuff to do:

    • [X] bump Rust version to stable
      • [ ] update README.md

    https://github.com/jgallagher/rusqlite/issues/543#issuecomment-515663685

    opened by Br1ght0ne 5
  • --auto-hybrid-regex bad offset into UTF string


    --auto-hybrid-regex should produce the same output as --pcre2 if one passes in a PCRE2 pattern. However, it throws an error on some files.

    I've included the following examples and files.

    • The offending PDF file that reproduces the error can be downloaded here: https://onlinelibrary.wiley.com/doi/10.1111/j.1472-4642.2008.00521.x
    • Another file which does not present this problem and shows identical output: https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2486.2011.02549.x

    With --auto-hybrid-regex:

    rga '(?=.*biotic interaction)(?=.*plant)' 'Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf' --auto-hybrid-regex
    

    Faulty output:

    Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf: preprocessor command failed: '"/home/aj/.cargo/bin/rga-preproc" "Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf"': PCRE2: error matching: bad offset into UTF string
    

    With --pcre2:

    rga '(?=.*biotic interaction)(?=.*plant)' 'Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf' --pcre2
    

    Correct output:

    Page 18: Vázquez, D.P. (2006) Biotic interactions and plant invasions.
    
    opened by InvisOn 0
  • Searching encrypted files


    I have some password protected PDF files that I'd like to search with rga. How do I specify the password?

    More generally, is it possible to pass options to the underlying adapters on invocation?

    opened by dideler 0
  • Errors not showing in stderr


    Error codes in rg can indicate whether a match was found at all, but when the same query is used with rga the error isn't passed to stderr. Am I missing something obvious?

    I'm trying to get rga to stop searching after a match is found, exit with a '0' to stderr, otherwise exiting with a '1' to stderr as is the behavior in rg.

    Thanks! Sterling

    opened by SterlingHooten 0
  • [FEATURE] replace pandoc with epub2txt2 for Epub search


    First - what an awesome project! It really makes searching of huge document libraries possible.

    Currently I have a lot of issues with Epub parsing, pandoc hangs forever with 100% CPU when parsing some EPUB files, sometimes bigger, but sometimes also on smaller ones. Currently I don't have a good workaround for this.

    I tried parsing the files that cause issues with https://github.com/kevinboone/epub2txt2 and it returns the content instantly. Also, judging by the number of issues here about EPUB parsing, this could be a good solution for many other issues.

    Please consider allowing to use epub2txt2 as backend for EPUB extraction.

    Thanks!

    opened by mindreframer 2
  • epub not recognized

    I have the following problem when I use rga:

    ~$ rga -s 'Losartan'
    Escritorio/receta.txt
    Amlodipino 5mg 1 caja, Alopurinol 300mg 1 caja, Carvedilol 12,5mg 2 cajas, Losartan 50 mg 2 cajas, Hipoglucin 1000mg 1 caja 60 comprimidos, Amioradona 200mg 2 cajas, Xarelto 20mg 1 caja, Atorvastatina 20mg 1 caja.
    Zotero/storage/Y5TFHDQX/Aquelarre.epub: preprocessor command failed: '"/home/jorge/.local/bin/rga-preproc" "Zotero/storage/Y5TFHDQX/Aquelarre.epub"':

    adapter: pandoc
    Couldn't extract ePub file
    Error: subprocess failed: ExitStatus(ExitStatus(16384))

    opened by ecovisiones 0
Releases(v0.9.6)