xargs + awk with pattern matching support. `ls *.bak | rargs -p '(.*)\.bak' mv {0} {1}`

Overview

Rargs is a kind of xargs + awk with pattern-matching support.

Installation

Mac OS

brew install rargs

Nix

nix-env -i rargs

(Currently available in the unstable channel)

Binary

Download the archive from the Release Page, uncompress it, and put the binary somewhere in your PATH.

Using Cargo

cargo install --git https://github.com/lotabout/rargs.git

Example usage

Batch rename files

Suppose you have several backup files whose names match the pattern <scriptname>.sh.bak, and you want to map each filename back to <scriptname>.sh. We want to do it in a batch, so xargs is a natural choice, but how do we specify the name for each file? I believe there is no easy way.

With rargs, however, you are able to do:

ls *.bak | rargs -p '(.*)\.bak' mv {0} {1}

Here {0} refers to the whole input line, while {1} refers to the first group captured in the regular expression.
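The substitution described above can be modelled in a few lines of Python. This is only a sketch of the idea, not rargs' actual implementation: Python's `re` module stands in for the Rust regex engine, the function name `expand_numbered` is hypothetical, and treating a non-matching line as having empty groups is an assumption.

```python
import re

def expand_numbered(pattern: str, line: str, template: list[str]) -> list[str]:
    """Fill {0}, {1}, ... in each template argument from a regex match on line."""
    match = re.search(pattern, line)
    # {0} is the whole input line; {1}, {2}, ... are the capture groups.
    # Assumption: a line that does not match simply yields no groups.
    groups = [line] + (list(match.groups(default="")) if match else [])

    def lookup(m: re.Match) -> str:
        index = int(m.group(1))
        return groups[index] if index < len(groups) else ""

    return [re.sub(r"\{(\d+)\}", lookup, arg) for arg in template]

argv = expand_numbered(r"(.*)\.bak", "deploy.sh.bak", ["mv", "{0}", "{1}"])
print(argv)  # ['mv', 'deploy.sh.bak', 'deploy.sh']
```

Each placeholder is expanded inside a single argument, so a filename containing spaces would still be passed to mv as one argument in this model.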

Batch download

I had a bunch of URLs and their corresponding target filenames stored in a CSV file:

URL1,filename1
URL2,filename2

I hoped there was a simple way to download and save each URL with its specified filename. With rargs there is:

cat download-list.csv | rargs -p '(?P<url>.*),(?P<filename>.*)' wget {url} -O {filename}

Here (?P<group_name>...) assigns the name group_name to the captured group. This can then be referred to as {group_name} in the command.
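Python's `re` module happens to accept the same (?P<name>...) syntax, so the named-group lookup can be sketched there. The helper name `expand_named` and the example URL are hypothetical, and this models only the substitution step, not the download itself.

```python
import re

def expand_named(pattern: str, line: str, template: list[str]) -> list[str]:
    """Fill {name} placeholders from the named groups captured on line."""
    match = re.search(pattern, line)
    names = match.groupdict(default="") if match else {}

    def lookup(m: re.Match) -> str:
        # Assumption: an unknown name expands to an empty string.
        return names.get(m.group(1), "")

    return [re.sub(r"\{([A-Za-z_]\w*)\}", lookup, arg) for arg in template]

argv = expand_named(
    r"(?P<url>.*),(?P<filename>.*)",
    "https://example.com/a.tar.gz,a.tar.gz",
    ["wget", "{url}", "-O", "{filename}"],
)
print(argv)  # ['wget', 'https://example.com/a.tar.gz', '-O', 'a.tar.gz']
```

Because `.*` is greedy, a line with several commas is split at the last one in this sketch; a pattern such as `(?P<url>[^,]*)` would split at the first comma instead.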

AWK replacement?

Suppose you have an xSV file with lots of columns, and you only want to extract and format some of them, e.g.:

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false

Here's an example of how rargs can be used to process it:

$ cat /etc/passwd | rargs -d: echo -e 'id: "{1}"\t name: "{5}"\t rest: "{6..::}"'
id: "nobody"     name: "Unprivileged User"       rest: "/var/empty:/usr/bin/false"
id: "root"       name: "System Administrator"    rest: "/var/root:/bin/sh"
id: "daemon"     name: "System Services"         rest: "/var/root:/usr/bin/false"

rargs allows you to specify the delimiter (a regex) to split the input on, and to refer to the resulting fields or field ranges. This lets it serve as an AWK replacement for some simple but common cases.
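To make the field numbering concrete, here is a small Python model of what the -d example does to one line of the input. It is an illustration under assumptions (the helper name `split_fields` is hypothetical, and `re.split` stands in for the regex delimiter), not a description of rargs internals.

```python
import re

def split_fields(line: str, delimiter: str) -> list[str]:
    """Split a line on a regex delimiter, as the -d option is described to do."""
    return re.split(delimiter, line)

line = "root:*:0:0:System Administrator:/var/root:/bin/sh"
fields = split_fields(line, ":")

# Placeholders are 1-based, Python lists are 0-based.
user = fields[0]             # {1}
name = fields[4]             # {5}
rest = ":".join(fields[5:])  # {6..::} -> fields 6 onwards, joined with ':'

print(f'id: "{user}"\t name: "{name}"\t rest: "{rest}"')
```

For this line the three values are root, System Administrator and /var/root:/bin/sh, matching the second row of the output shown above.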

How does it work?

  1. receive the input on stdin and split it into lines
  2. split (-d) or extract (-p) the input into named or numbered groups, with {0} matching the whole line
  3. map the named and numbered groups into a command passed as the remaining arguments, and execute the command
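The three steps above can be sketched end to end in Python. Everything here is a simplified model under assumptions: the function names are hypothetical, only numbered placeholders are handled, commands run one at a time, and the current Python interpreter is used as a portable stand-in for echo.

```python
import io
import re
import subprocess
import sys

def groups_for(line, pattern=None, delimiter=None):
    """Step 2: return the numbered groups, with index 0 holding the whole line."""
    if pattern is not None:
        match = re.search(pattern, line)
        return [line] + (list(match.groups(default="")) if match else [])
    if delimiter is not None:
        return [line] + re.split(delimiter, line)
    return [line]

def build_argv(template, groups):
    """Step 3a: substitute {N} placeholders in every command argument."""
    def lookup(m):
        index = int(m.group(1))
        return groups[index] if index < len(groups) else ""
    return [re.sub(r"\{(\d+)\}", lookup, arg) for arg in template]

def run_lines(stream, template, pattern=None, delimiter=None):
    """Steps 1-3: run one command per input line and collect each stdout."""
    outputs = []
    for raw in stream:                      # step 1: one line at a time
        line = raw.rstrip("\n")
        argv = build_argv(template, groups_for(line, pattern, delimiter))
        done = subprocess.run(argv, capture_output=True, text=True)  # step 3b
        outputs.append(done.stdout)
    return outputs

# A tiny echo replacement that prints its arguments separated by spaces.
echo = [sys.executable, "-c", "import sys; print(*sys.argv[1:])"]
out = run_lines(io.StringIO("a,b\nc,d\n"), echo + ["{2}", "{1}"], delimiter=",")
print(out)  # ['b a\n', 'd c\n']
```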

Features

Regexp captures

rargs allows you to use any regular expression to match the input, and captures anything you are interested in. The syntax is the standard, mostly Perl-compatible Rust regex syntax used by tools such as ripgrep.

  • positional (numbered) groups are captured with parentheses, e.g. '(\w+):(\d+)', and the corresponding groups are referred to by {1}, {2} etc. in the command
  • named groups are captured with (?P<name>...) and referred to by {name} in the command

Delimiter captures

For simple usage, you might not want to write the whole regular expression to extract parts of the line. All you want is to split the groups by some delimiter. With rargs you can achieve this by using the -d (delimiter) option.

Field ranges

We already know how to refer to captures by number ({1}) or by name ({name}). There are also cases where you might want to substitute multiple fields at the same time. rargs also supports this with field-range expressions.

Suppose we have already captured 5 groups representing the strings 1, 2, 3, 4 and 5:

  • {..} gathers them all into 1 2 3 4 5 (note that they are separated by a space; this can be overridden by the -s option)
  • {..3} results in 1 2 3
  • {4..} results in 4 5
  • {2..4} results in 2 3 4
  • {3..3} results in 3

You can also specify a "local" separator (which will not affect the global setting):

  • {..3:-} results in 1-2-3
  • {..3:/} results in 1/2/3
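The range rules listed above are compact enough to model directly. The following Python sketch handles only positive bounds and is an interpretation of the examples, not rargs' parser; `expand_range` is a hypothetical name, and `default_sep` plays the role of the global separator.

```python
def expand_range(spec: str, fields: list[str], default_sep: str = " ") -> str:
    """Expand a range such as '..3', '4..', '2..4' or '..3:-' over 1-based fields."""
    # Everything after the first ':' is the local separator, if present.
    range_part, colon, local_sep = spec.partition(":")
    sep = local_sep if colon else default_sep
    start_text, _, end_text = range_part.partition("..")
    start = int(start_text) if start_text else 1         # open start -> first field
    end = int(end_text) if end_text else len(fields)     # open end -> last field
    return sep.join(fields[start - 1:end])               # both bounds inclusive

fields = ["1", "2", "3", "4", "5"]
for spec in ["..", "..3", "4..", "2..4", "3..3", "..3:-", "..3:/"]:
    print(f"{{{spec}}} -> {expand_range(spec, fields)}")
```

Splitting on the first colon only is what lets a separator that is itself a colon work, as in the {6..::} expression from the /etc/passwd example.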

Negative field

Sometimes you may want to refer to the last few fields but have no way to predict the total number of fields in the input. For this, rargs offers negative fields, which count from the end.

Suppose we have already captured 5 groups representing the strings 1, 2, 3, 4 and 5:

  • {-1} results in 5
  • {-5} results in 1
  • {-6} results in nothing
  • {-3..} results in 3 4 5
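One way to read these rules is that a negative index is converted to a 1-based position counted from the end. The Python sketch below follows that reading; the function names are hypothetical, and clamping an out-of-range start to the first field is an assumption the examples above do not cover.

```python
def field(index: int, fields: list[str]) -> str:
    """Return the field at a 1-based or negative index, or '' if out of range."""
    count = len(fields)
    if index < 0:
        index = count + index + 1   # -1 -> count, -count -> 1
    return fields[index - 1] if 1 <= index <= count else ""

def fields_from(start: int, fields: list[str], sep: str = " ") -> str:
    """Model an open-ended range such as {-3..}."""
    if start < 0:
        start = len(fields) + start + 1
    start = max(start, 1)           # assumption: clamp to the first field
    return sep.join(fields[start - 1:])

fields = ["1", "2", "3", "4", "5"]
print(field(-1, fields))        # 5
print(field(-5, fields))        # 1
print(repr(field(-6, fields)))  # ''
print(fields_from(-3, fields))  # 3 4 5
```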

Multithreading

You can run commands in multiple threads to improve performance:

  • -w <num> specifies the number of workers you want to run simultaneously
  • -w 0 defaults the number of workers to the number of CPUs on your system
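The worker setting can be pictured as a thread pool that runs the generated commands concurrently. This Python sketch is only an analogy built on the standard library: the names `worker_count` and `run_all` are hypothetical, and the commands are small Python one-liners chosen so the example runs anywhere.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def worker_count(requested: int) -> int:
    """0 means one worker per CPU; any positive value is used as given."""
    return requested if requested > 0 else (os.cpu_count() or 1)

def run_all(commands, workers):
    """Run each argv list in a thread pool; exit codes come back in input order."""
    with ThreadPoolExecutor(max_workers=worker_count(workers)) as pool:
        return list(pool.map(lambda argv: subprocess.run(argv).returncode, commands))

# Three stand-in commands that exit with codes 0, 3 and 0.
commands = [[sys.executable, "-c", f"raise SystemExit({code})"] for code in (0, 3, 0)]
print(run_all(commands, workers=2))  # [0, 3, 0]
```

With more than one worker, commands may finish in any order and their output can interleave, so parallel runs suit commands that are independent of each other.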

Special Variables

  • {LINENUM} or {LN} refers to the current line number.
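As a rough model of this variable, the Python sketch below replaces either spelling with a counter while iterating over the input. The helper name is hypothetical, and starting the count at 1 is an assumption; the text above does not say whether numbering begins at 0 or 1.

```python
import re

def expand_line_number(template: list[str], line_number: int) -> list[str]:
    """Replace {LINENUM} or {LN} in each argument with the given line number."""
    return [re.sub(r"\{(?:LINENUM|LN)\}", str(line_number), arg) for arg in template]

lines = ["alpha", "beta"]
for number, line in enumerate(lines, start=1):  # assumption: numbering starts at 1
    print(expand_line_number(["echo", "{LN}:", line], number))
```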

Interested?

All feedback and PRs are welcome!

Comments
  • Rename threads option (`-w`) to `-j`

    Many tools, such as GNU Parallel, GNU Make, fd, and ripgrep, use -j to specify the number of concurrent threads/jobs.

    Adopting the same convention would make Rargs easier to grasp.

    opened by ngirard 1
  • README: small rewrite + section reordering

    Hi, this is a small rewrite of the description section, plus a small section reordering; basically I put the description and examples before the installation instructions. Cheers!

    opened by ngirard 0
  • Set `stdin` of child process to null

    See https://www.reddit.com/r/rust/comments/ebnooh/weird_behaviour_of_command/. The TL;DR is that rargs shouldn't send data to the stdin of the child process in any case, but Command inherits the stdin of the parent process by default, so we should explicitly set the child's stdin to null.

    opened by crides 0
  • Exit code?

    Firstly — thank you v much for rargs! I use it a lot. It's probably the most underrated "new cli tools" library.

    What should the exit code of an rargs command be?

    Currently it seems to return successfully even if there were lots of errors in the individual commands.

    I'd like to be able to understand if any command failed (e.g. did any string fail to be grepped in a file). Is there a way of doing that with the current library?

    opened by max-sixty 1
  • Default to using multi-threading 🚀

    To increase performance with the default settings, default to using multi-threading, specifically one thread per CPU. The user must explicitly opt out if they want single-threaded behavior, the previous default.

    This indirectly solves a different issue: the only way to automatically peg the number of threads to the number of CPUs was to set --worker 0 --threads 0 even though the worker argument had been deprecated.

    Fixes #18

    opened by kesyog 1
  • Default to using multi-threading

    For the sake of ergonomics and performance, IMO the default number of threads should be the number of CPUs. In my experience, the cases where single-threaded behavior is required are the exception, not the rule, and users can explicitly opt in to single-threaded behavior if needed.

    This would indirectly solve a different issue: the only way to automatically peg the number of threads to the number of CPUs is to set --worker 0 --threads 0 even though the worker argument is listed as deprecated.

    opened by kesyog 1
  • Allow to only run command when given pattern is matched

    Hi and happy new year,

    Given the most general running scenario involving a pattern

    rargs -p <pattern> <cmd>
    

    I'm unsure whether it makes sense to run cmd regardless of whether the pattern matches, but at least I can tell that it is not always what I expect from Rargs.

    Take for instance the common scenario when some files, e.g.

    1. Some name
    10. Another name
    

    need to be renamed like

    01. Some name
    10. Another name
    

    There, I wish I was able to type

    ls |rargs -p '^(\d\..*)' -- mv "'{1}'" "'0{1}'"
    

    but it wouldn't currently work since Rargs would execute mv for both files, instead of the first one only.

    opened by ngirard 2
  • panic on missing cmd

    Currently rargs completely panics if you pass it a program it can't exec; I'd expected it to fail of course, but maybe with a slightly better message 😄

    $ cat myfile.txt | rargs something
    thread '<unnamed>' panicked at 'command failed to start: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1165:5
    ...
    

    It prints the panic for each line of the input file.

    Thanks for the tool! I've been looking for something handy like this 👍

    opened by ChrisPenner 0
Releases(v0.3.0)
Owner
Jinzhou Zhang
A Program is a Process, Not a Thing