frawk is a small programming language for writing short programs processing textual data

Overview

frawk

frawk is a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of the AWK language; many common Awk programs produce equivalent output when passed to frawk. You might be interested in frawk if you want your scripts to handle escaped CSV/TSV like standard Awk fields, or if you want your scripts to execute faster.
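As a rough illustration of the escaped-CSV point, the sketch below feeds one quoted CSV row to a POSIX awk with a plain comma separator and then, only if a frawk binary happens to be on the PATH, to frawk's CSV input mode. The -icsv flag is the one used in the issue reports further down this page; the sample row and the variable names are made up for the example.

```shell
# Sample CSV row whose first field contains a comma inside quotes
# (hypothetical data, chosen only for this illustration).
csv_row='"Doe, Jane",42'

# Plain comma splitting in any POSIX awk treats the quoted comma as a separator.
naive_fields=$(printf '%s\n' "$csv_row" | awk -F, '{ print NF }')
echo "awk -F, sees $naive_fields fields"

# frawk's CSV input mode; this branch runs only if frawk is installed.
if command -v frawk >/dev/null 2>&1; then
    printf '%s\n' "$csv_row" | frawk -icsv '{ print NF }'
fi
```

The row has two logical fields, so the count reported by plain comma splitting is one too many.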

The info subdirectory has more in-depth information on frawk:

  • Overview: what frawk is all about, how it differs from Awk.
  • Types: A quick gloss on frawk's approach to types and type inference.
  • Parallelism: An overview of frawk's parallelism support.
  • Benchmarks: A sense of the relative performance of frawk and other tools when processing large CSV or TSV files.
  • Builtin Functions Reference: A list of builtin functions implemented by frawk, including some that are new when compared with Awk.

frawk is dual-licensed under MIT or Apache 2.0.

Installation

You will need to install Rust. If you would like to use the LLVM backend, you will need an installation of LLVM 12 on your machine:

  • See this site for installation instructions on some debian-based Linux distros. See also the comments on this issue for docker files that can be used to build a binary on Ubuntu.
  • On Arch pacman -Sy llvm llvm-libs and a C compiler (e.g. clang) are sufficient as of September 2020.
  • brew install llvm@12 or similar seems to work on Mac OS.

Depending on where your package manager puts these libraries, you may need to point LLVM_SYS_120_PREFIX at the llvm library installation (e.g. /usr/lib/llvm-12 on Linux or /usr/local/opt/llvm@12 on Mac OS when installing llvm@12 via Homebrew).
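One possible way to set the variable, sketched under the assumption that LLVM 12 lives in one of the two example locations named above. The helper name first_existing_dir is made up for this sketch; adjust the candidate paths to match your system.

```shell
# Print the first argument that is an existing directory; fail if none is.
# (first_existing_dir is a hypothetical helper, not part of frawk or LLVM.)
first_existing_dir() {
    for candidate in "$@"; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# The two candidate paths are the examples given in the paragraph above.
if prefix=$(first_existing_dir /usr/lib/llvm-12 /usr/local/opt/llvm@12); then
    export LLVM_SYS_120_PREFIX="$prefix"
    echo "using LLVM at $LLVM_SYS_120_PREFIX"
else
    echo "LLVM 12 not found in the usual places; set LLVM_SYS_120_PREFIX by hand" >&2
fi
```

Run this in the same shell session that later invokes cargo, since the export only affects that session.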

Building Without LLVM

While the LLVM backend is recommended, it is possible to build frawk only with support for the Cranelift-based JIT and its bytecode interpreter. To do this, build without the llvm_backend feature. The Cranelift backend provides comparable performance to LLVM for smaller scripts, but LLVM's optimizations can sometimes deliver a substantial performance boost over Cranelift (see the benchmarks document for some examples of this).

Building Using Stable

frawk currently requires a nightly compiler by default. To compile frawk using stable, compile without the unstable feature. Otherwise, you need rustup default nightly, or some other way of running a nightly compiler release, to build frawk.
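A minimal sketch of what a stable build might look like, assuming rustup is installed so that cargo +stable is available. The feature names use_jemalloc, allow_avx2, llvm_backend, and unstable are the ones mentioned in this README; the RUN_BUILD switch and the variable names are made up for the example, and the block only prints the command unless RUN_BUILD=1 is set.

```shell
# Features to enable explicitly: the recommended ones from this README,
# leaving out "unstable" so that a stable compiler is sufficient.
stable_features="use_jemalloc,allow_avx2"

# Uncomment to keep the LLVM backend (needs LLVM 12 installed).
# stable_features="$stable_features,llvm_backend"

build_cmd="cargo +stable install --path . --no-default-features --features $stable_features"
echo "$build_cmd"

# RUN_BUILD is a hypothetical switch for this sketch: set RUN_BUILD=1 to
# actually run the command from the root of a frawk checkout.
if [ "${RUN_BUILD:-0}" = "1" ]; then
    $build_cmd
fi
```

Passing --no-default-features matters here: without it, the default feature set would still be enabled alongside the listed features.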

Building a Binary

With those prerequisites in place, clone this repository and run cargo build --release or cargo [+nightly] install --path <repository path> to produce a binary that you can add to your PATH if you so choose:

$ cd <path to your clone of this repository>
# With LLVM
$ cargo +nightly install --path .
# Without LLVM, but with other recommended defaults
$ cargo +nightly install --path . --no-default-features --features use_jemalloc,allow_avx2,unstable

frawk is now on crates.io, so running cargo install frawk with the desired features should also work.

While there are no deliberate unix-isms in frawk, I have not tested it on Windows.

Bugs and Feature Requests

frawk has bugs, and many rough edges. If you notice a bug in frawk, filing an issue with an explanation of how to reproduce the error would be very helpful. There are no guarantees on response time or latency for a fix. No one works on frawk full-time. The same policy holds for feature requests.

Comments
  • Default field splitting behaving inconsistently

    $ cat fields.txt
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxx  yyyyyyyyyyyyyyyyyyyyyyyyyyyy 111111
    xxxxxxxxxxxxxxxxxxxxxxxxxxx    yyyyyyyyyyyyyyyyyyyyyyyy     222222
    xxxxxxxxxxxxxxxxxxxxxxxxxxxx  yyyyyyyyyyyyyyyyyyyyyyyyyyyy 3333333
    xxxxxxxxxxxxxxxxxxxxxxxxxx    yyyyyyyyyyyyyyyyyyyyyyyy     4444444
    
    # wrong output for 2nd and 4th lines, and the failure is different
    $ frawk '{print NR ":" $3 }' fields.txt
    1:111111
    2:yyyyyyyyyyyyyyyyyyyyyyyy
    3:3333333
    4:
    
    # works correctly if those lines are given as the sole input
    $ sed -n '2p' fields.txt | frawk '{print NR ":" $3 }'
    1:222222
    $ sed -n '4p' fields.txt | frawk '{print NR ":" $3 }'
    1:4444444
    

    The failure also seems to depend on the length of the input lines, which is why the input has those long runs of x and y; I couldn't find a simpler failing case.

    opened by learnbyexample 24
  • doesn't compile - clap issue

    Hi, I'm not a rust programmer so I might be doing something wrong here but I couldn't compile frawk.

    I tried the following ways to install:

    cargo +nightly install --path .
    cargo  install frawk
    cargo +nightly install --path . --no-default-features --features use_jemalloc,allow_avx2,unstable
    cargo +nightly install --path . --no-default-features --features allow_avx2,unstable
    

    all failed with the following message:

    error[E0599]: no method named `multiple` found for struct `clap::Arg` in the current scope
       --> src/main.rs:306:15
        |
    306 |              .multiple(true)
        |               ^^^^^^^^ method not found in `clap::Arg<'_>`
    
    error[E0599]: no method named `multiple` found for struct `clap::Arg` in the current scope
       --> src/main.rs:329:15
        |
    329 |              .multiple(true)
        |               ^^^^^^^^ method not found in `clap::Arg<'_>`
    
    error[E0599]: no method named `multiple` found for struct `clap::Arg` in the current scope
       --> src/main.rs:350:15
        |
    350 |              .multiple(true))
        |               ^^^^^^^^ method not found in `clap::Arg<'_>`
    
    For more information about this error, try `rustc --explain E0599`.
    warning: `frawk` (bin "frawk") generated 2 warnings
    error: failed to compile `frawk v0.4.2 (/usr/src/myapp)`, intermediate artifacts can be found at `/usr/src/myapp`
    

    I'm using ubuntu 20.04 and I ran rustup default nightly for nightly compilation. Here are the version numbers in case they are needed:

    # cargo --version
    cargo 1.56.0-nightly (e96bdb0c3 2021-08-17)
    # rustc --version
    rustc 1.56.0-nightly (ad02dc46b 2021-08-26)
    # rustup --version
    rustup 1.24.3 (ce5817a94 2021-05-31)
    info: This is the version for the rustup toolchain manager, not the rustc compiler.
    info: The currently active `rustc` version is `rustc 1.56.0-nightly (ad02dc46b 2021-08-26)`
    
    opened by alperyilmaz 10
  • Feature request: Parse the header-line

    Given foobar.csv:

    foo,bar
    1,2
    3,4
    

    I'd like to be able to write bar_mean.awk:

    { acc += FIELDS["bar"] }
    END { print "bar's mean is " acc/NR }
    

    and run it like this:

    $ frawk -icsv --headers -f bar_mean.awk <foobar.csv
    bar's mean is 3
    

    Differences when --headers is passed:

    • The first line is skipped: it's neither processed by the awk program, nor does it count towards NR.
    • A magic hashmap is defined (I called it FIELDS for the sake of the example above) which maps header name to field value.

    If you're open to going crazy with the syntax, you could even do something like $"bar". (I think $bar is probably going too far, since that's already valid AWK.)

    opened by asayers 10
  • frawk does not build on rust-nightly due to cranelift-jit

    I am trying to build frawk without LLVM (#67) and am getting:

    $ cargo +nightly install --path . --no-default-features --features use_jemalloc,allow_avx2,unstable
       Compiling cranelift-jit v0.75.0
    error[E0034]: multiple applicable items in scope
      --> /home/motiejus/.cargo/registry/src/github.com-1ecc6299db9ec823/lalrpop-0.17.2/src/message/horiz.rs:22:14
       |
    22 |             .intersperse(self.separate)
       |              ^^^^^^^^^^^ multiple `intersperse` found
       |
       = note: candidate #1 is defined in an impl of the trait `Iterator` for the type `std::iter::Map<I, F>`
       = note: candidate #2 is defined in an impl of the trait `Itertools` for the type `T`
    help: disambiguate the associated function for candidate #1
       |
    19 ~         Iterator::intersperse(self.items
    20 +             .iter()
    21 +             .map(|c| c.min_width()), self.separate)
       |
    help: disambiguate the associated function for candidate #2
       |
    19 ~         Itertools::intersperse(self.items
    20 +             .iter()
    21 +             .map(|c| c.min_width()), self.separate)
    

    I recognize it's a broken dependency and should be reported upstream. However, I looked upstream and didn't find a way to compile the standalone project and report it there. Guidance would be appreciated; I am new to the Rust ecosystem.

    motiejus ~/code/frawk $ rustup show
    Default host: x86_64-unknown-linux-gnu
    rustup home:  /home/motiejus/.rustup
    
    nightly-x86_64-unknown-linux-gnu (default)
    rustc 1.57.0-nightly (fdf65053e 2021-09-07)
    motiejus ~/code/frawk $ rustc --version
    rustc 1.57.0-nightly (fdf65053e 2021-09-07)
    motiejus ~/code/frawk $ cargo --version
    cargo 1.56.0-nightly (18751dd3f 2021-09-01)
    motiejus ~/code/frawk $ 
    
    opened by motiejus 9
  • frawk does not detect start of program without -- when -v is last argument before program start.

    frawk does not detect start of program without -- when -v is last argument before program start.

    ❯ seq 1 10 | frawk -v multiplier=20 '{ print $0 * multiplier }'
    must specify program at command line, or in a file via -f
    
    ❯ seq 1 10 | frawk -v multiplier=20 -- '{ print $0 * multiplier }'
    20.0
    40.0
    60.0
    80.0
    100.0
    120.0
    140.0
    160.0
    180.0
    200.0
    
    ❯ seq 1 10 | frawk -F '\t' -v multiplier=20 '{ print $0 * multiplier }'
    must specify program at command line, or in a file via -f
    
    ❯ seq 1 10 | frawk -v multiplier=20 -F '\t' '{ print $0 * multiplier }'
    20.0
    40.0
    60.0
    80.0
    100.0
    120.0
    140.0
    160.0
    180.0
    200.0
    

    frawk also seems to have problems with parsing variables with dashes (and maybe other special characters):

    ❯ seq 1 10 | awk -v multiplier=20 -v var_with_dash=variable-with-dash -F '\t' '{ print $0 * multiplier, var_with_dash }'
    20 variable-with-dash
    40 variable-with-dash
    60 variable-with-dash
    80 variable-with-dash
    100 variable-with-dash
    120 variable-with-dash
    140 variable-with-dash
    160 variable-with-dash
    180 variable-with-dash
    200 variable-with-dash
    
    ❯ seq 1 10 | frawk -v multiplier=20 -v var_with_dash=variable-with-dash -F '\t' '{ print $0 * multiplier, var_with_dash }'
    failed to parse var at index 2:
    var_with_dash=variable-with-dash
    error:UnrecognizedToken { token: (Loc { line: 0, col: 22, offset: 22 }, Sub, Loc { line: 0, col: 23, offset: 23 }), expected: ["\"[\""] }
    
    ❯ seq 1 10 | frawk -v multiplier=20 -v 'var_with_dash=variable-with-dash' -F '\t' '{ print $0 * multiplier, var_with_dash }'
    failed to parse var at index 2:
    var_with_dash=variable-with-dash
    error:UnrecognizedToken { token: (Loc { line: 0, col: 22, offset: 22 }, Sub, Loc { line: 0, col: 23, offset: 23 }), expected: ["\"[\""] }
    
    ❯ seq 1 10 | frawk -v multiplier=20 -v 'var_with_dash="variable-with-dash"' -F '\t' '{ print $0 * multiplier, var_with_dash }'
    20.0 variable-with-dash
    40.0 variable-with-dash
    60.0 variable-with-dash
    80.0 variable-with-dash
    100.0 variable-with-dash
    120.0 variable-with-dash
    140.0 variable-with-dash
    160.0 variable-with-dash
    180.0 variable-with-dash
    200.0 variable-with-dash
    
    ❯ seq 1 10 | frawk -v multiplier=20 -v var_with_dash="variable-with-dash" -F '\t' '{ print $0 * multiplier, var_with_dash }'
    failed to parse var at index 2:
    var_with_dash=variable-with-dash
    error:UnrecognizedToken { token: (Loc { line: 0, col: 22, offset: 22 }, Sub, Loc { line: 0, col: 23, offset: 23 }), expected: ["\"[\""] }
    

    If a passed variable is not a number, it will be empty unless it is quoted twice (once for bash and once for frawk), which is quite different from normal awk.

    ❯ seq 1 10 | frawk -v multiplier=20 -v X=A -v 'var_with_dash="variable-with-dash"' -F '\t' '{ print $0 * multiplier, X, var_with_dash }'
    20.0  variable-with-dash
    40.0  variable-with-dash
    60.0  variable-with-dash
    80.0  variable-with-dash
    100.0  variable-with-dash
    120.0  variable-with-dash
    140.0  variable-with-dash
    160.0  variable-with-dash
    180.0  variable-with-dash
    200.0  variable-with-dash
    
    ❯ seq 1 10 | frawk -v multiplier=20 -v 'X=A' -v 'var_with_dash="variable-with-dash"' -F '\t' '{ print $0 * multiplier, X, var_with_dash }'
    20.0  variable-with-dash
    40.0  variable-with-dash
    60.0  variable-with-dash
    80.0  variable-with-dash
    100.0  variable-with-dash
    120.0  variable-with-dash
    140.0  variable-with-dash
    160.0  variable-with-dash
    180.0  variable-with-dash
    200.0  variable-with-dash
    
    ❯ seq 1 10 | frawk -v multiplier=20 -v 'X="A"' -v 'var_with_dash="variable-with-dash"' -F '\t' '{ print $0 * multiplier, X, var_with_dash }'
    20.0 A variable-with-dash
    40.0 A variable-with-dash
    60.0 A variable-with-dash
    80.0 A variable-with-dash
    100.0 A variable-with-dash
    120.0 A variable-with-dash
    140.0 A variable-with-dash
    160.0 A variable-with-dash
    180.0 A variable-with-dash
    200.0 A variable-with-dash
    
    opened by ghuls 8
  • Add basic support for gensub function

    This adds support for gensub function, which is a common GNU extension.

    This is accomplished by defining it as a built-in function, adding a bytecode instruction for it and finally implementing it as a method on a Str.

    I am a bit unsure how to handle runtime errors: should I do something smart, or are ad hoc .unwrap()s fine?

    Another thing: I am not sure whether (and how) I should handle GenSub in the feedback function for type propagation.

    Performance can be improved by handling the very common case of a constant "how" parameter and introducing specialized instructions for the g case (replace all) and the number case (replace only the n-th match).

    opened by DCNick3 7
  • Missing Cargo.lock

    Hi,

    I'm trying to package frawk for nixos/nixpkgs, but I cannot do that without a Cargo.lock package. Are there any reasons why one isn't included? Thanks.

    opened by ethancedwards8 6
  • Support `exit`

    exit opt_expr An exit statement causes immediate execution of the END actions or program termination if there are none or if the exit occurs in an END action. The opt_expr sets the exit value of the program unless overridden by a later exit or subsequent error.

    ❯ seq 1 10 | mawk '{ print $0; if ($1 >= 8) { last = $0; exit 33; } } END { print "last line before exit: " last;}' ; echo "Exit code: $?";
    1
    2
    3
    4
    5
    6
    7
    8
    last line before exit: 8
    Exit code: 33
    
    ❯ seq 1 10 | frawk '{ print $0; if ($1 >= 8) { last = $0; exit 33; } } END { print "last line before exit: " last;}' ; echo "Exit code: $?";
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    last line before exit: 10
    Exit code: 0
    
    
    # Same but with exit with extra parentheses (frawk does not remove extra (), (print had this issue too in the past)):
    
    ❯ seq 1 10 | mawk '{ print $0; if ($1 >= 8) { last = $0; exit(33); } } END { print "last line before exit: " last;}' ; echo "Exit code: $?";
    1
    2
    3
    4
    5
    6
    7
    8
    last line before exit: 8
    Exit code: 33
    
    ❯ seq 1 10 | frawk '{ print $0; if ($1 >= 8) { last = $0; exit(33); } } END { print "last line before exit: " last;}' ; echo "Exit code: $?";
    failed to create program context: [src/cfg.rs:1641:21] Call to unknown function "exit"
    Exit code: 1
    
    
    opened by ghuls 5
  • fast-float can be removed when using

    fast-float can be removed when using a recent Rust compiler. A similar algorithm is now in the standard library: https://www.reddit.com/r/rust/comments/omelz4/making_rust_float_parsing_fast_libcore_edition/ https://github.com/pola-rs/polars/issues/1010

    opened by ghuls 3
  • script file and data file needs to be provided in certain order

    When I was running the file info/scripts/groupby.sh I encountered an error at the following line:

    time $FRAWK -bllvm -F'\t' -f "$SCRIPT_FILE" "${TSV}"
    

    While trying to troubleshoot the problem, I noticed that:

    cat datafile | frawk -f scriptfile   # works
    frawk datafile -f scriptfile         # works
    frawk -f scriptfile datafile         # fails
    

    After trying different options, I found that with frawk -f scriptfile datafile, the data file needs to be preceded by --. Let me demonstrate the case:

    frawk works fine with pipe:

    $ echo -e "a,b,c\n1,2,3\n4,5,6"
    a,b,c
    1,2,3
    4,5,6
    
    $ echo -e "a,b,c\n1,2,3\n4,5,6" | frawk -icsv '{ print $2; }'
    b
    2
    5
    

    If we save the script as a file, it also works fine with a pipe:

    $ echo '{ print $2; }' > test.awk
    $ echo -e "a,b,c\n1,2,3\n4,5,6" | frawk -icsv -f test.awk
    b
    2
    5
    

    If both the data and the script come from files, it fails:

    $ echo -e "a,b,c\n1,2,3\n4,5,6" > test.dat
    $ frawk -icsv -f test.awk test.dat
    Unrecognized token `,` found at line 3, column 4:line 3, column 5
    Expected one of "\n", "[" or "{"
    

    The workaround is to use --:

    $ frawk -icsv -f test.awk -- test.dat
    b
    2
    5
    

    I'm not sure if this is a bug or a feature, but it would be great if frawk could be a drop-in replacement for awk, so that simply replacing awk with frawk in scripts just works. In particular, frawk should accept the data file after the script file, as awk does.

    I'm using frawk version 0.4.2 in Linux.

    opened by alperyilmaz 3
  • Allow more than one `-f` argument

    Allow more than one -f argument. Now it results in this error message:

    error: The argument '--program-file <FILE>' was provided more than once, but cannot be used multiple times

    opened by ghuls 3
  • Not issue - but a large file performance stat

    I have compared nawk with frawk on a macOS Darwin: i7 with 16GB RAM.

    I used a 17GB .csv file for test using: frawk 'BEGIN{FS=","};{count[NF]++};END{for(i in count){print "With " i " fields - count is: " count[i]}}' aggregated.csv

    Results: frawk took 2 min; nawk took 23 min.

    Impressive. Thanks for the work on frawk.

    opened by theguyhill 2
  • support for parquet files

    This might sound crazy but still I wanted to propose a feature request about parquet files.

    You might ask, why? Parquet files are becoming more widespread and might even be considered "the new csv". There are specialized tools such as duckdb for running SQL commands on them, but I haven't come across any awk-like utility that can process parquet files.

    IMHO, parquet support in frawk would be a huge win for the "data analysis at the command line" camp.

    opened by alperyilmaz 2
  • Can't build on latest Arch Linux

    yay -Sy frawk .................
       Compiling frawk v0.4.6 (/tmp/makepkg/frawk/src/frawk-0.4.6)
       Compiling tikv-jemallocator v0.4.3
    error[E0554]: #![feature] may not be used on the stable release channel
     --> src/main.rs:2:43
      |
    2 | #![cfg_attr(feature = "unstable", feature(core_intrinsics))]
      |                                           ^^^^^^^^^

    error[E0554]: #![feature] may not be used on the stable release channel
     --> src/main.rs:3:43
      |
    3 | #![cfg_attr(feature = "unstable", feature(test))]
      |                                           ^^^^

    error[E0554]: #![feature] may not be used on the stable release channel
     --> src/main.rs:4:43
      |
    4 | #![cfg_attr(feature = "unstable", feature(write_all_vectored))]
      |                                           ^^^^^^^^^^^^^^^^^^

    For more information about this error, try rustc --explain E0554.
    error: could not compile frawk due to 3 previous errors
    ==> ERROR: A failure occurred in build().

    opened by ato2 2
  • Windows compilation breaks on `jemalloc` integration

    To my surprise, this crate compiles with cargo build --no-default-features on Windows 10! Woot, awk on Windows! :D

    Attempting to compile with the use_jemalloc feature against x86_64-pc-windows-msvc (the most common Windows target AFAIK), one gets a compile error of the form:

    $ cargo check # `default` features enable `use_jemalloc`
    
    <snip>
    
    The following warnings were emitted during compilation:
    
    warning: "jemalloc support for `x86_64-pc-windows-msvc` is untested"
    
    error: failed to run custom build command for `tikv-jemalloc-sys v0.4.2+5.2.1-patched.2`
    
    Caused by:
      process didn't exit successfully: `<snip>\frawk\target\debug\build\tikv-jemalloc-sys-8070626b6b43d50a\build-script-build` (exit code: 101)
      --- stdout
      TARGET=x86_64-pc-windows-msvc
      HOST=x86_64-pc-windows-msvc
      NUM_JOBS=8
      OUT_DIR="<snip>\\target\\debug\\build\\tikv-jemalloc-sys-1f5ef3a81eb3f627\\out"
      BUILD_DIR="<snip>\\target\\debug\\build\\tikv-jemalloc-sys-1f5ef3a81eb3f627\\out\\build"
      SRC_DIR="<snip>\\.cargo\\registry\\src\\github.com-1ecc6299db9ec823\\tikv-jemalloc-sys-0.4.2+5.2.1-patched.2"
      cargo:warning="jemalloc support for `x86_64-pc-windows-msvc` is untested"
      cargo:rustc-cfg=prefixed
      OPT_LEVEL = Some("0")
      TARGET = Some("x86_64-pc-windows-msvc")
      HOST = Some("x86_64-pc-windows-msvc")
      CC_x86_64-pc-windows-msvc = None
      CC_x86_64_pc_windows_msvc = None
      HOST_CC = None
      CC = None
      CFLAGS_x86_64-pc-windows-msvc = None
      CFLAGS_x86_64_pc_windows_msvc = None
      HOST_CFLAGS = None
      CFLAGS = None
      CRATE_CC_NO_DEFAULTS = None
      CARGO_CFG_TARGET_FEATURE = Some("aes,avx,avx2,bmi1,bmi2,fma,fxsr,lzcnt,pclmulqdq,popcnt,rdrand,rdseed,sse,sse2,sse3,sse4.1,sse4.2,ssse3,xsave,xsavec,xsaveopt,xsaves")
      DEBUG = Some("true")
      CC="C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.12.25827\\bin\\HostX64\\x64\\cl.exe"
      CFLAGS="-nologo -MD -Z7 -Brepro -W4"
      JEMALLOC_REPO_DIR="jemalloc"
      --with-jemalloc-prefix=_rjem_
      running: "sh" "<snip>/frawk/target/debug/build/tikv-jemalloc-sys-1f5ef3a81eb3f627/out/build/configure" "--disable-cxx" "--with-jemalloc-prefix=_rjem_" "--with-private-namespace=_rjem_" "--host=x86_64-pc-win32" "--build=x86_64-pc-win32" "--prefix=<snip>\\frawk\\target\\debug\\build\\tikv-jemalloc-sys-1f5ef3a81eb3f627\\out"
      checking for xsltproc... /usr/bin/xsltproc
      checking for x86_64-pc-win32-gcc... C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX64\x64\cl.exe
      checking whether the C compiler works... no
      running: "tail" "-n" "100" "<snip>\\frawk\\target\\debug\\build\\tikv-jemalloc-sys-1f5ef3a81eb3f627\\out\\build\\config.log"
      exe=''
      exec_prefix='<snip>\frawk\target\debug\build\tikv-jemalloc-sys-1f5ef3a81eb3f627\out'
      EXEEXT=''
      EXTRA_CFLAGS=''
      EXTRA_CXXFLAGS=''
      EXTRA_LDFLAGS=''
      GREP=''
      HAVE_CXX14=''
      host_alias='x86_64-pc-win32'
      host_cpu=''
      host_os=''
      host_vendor=''
      host='x86_64-pc-win32'
      htmldir='${docdir}'
      importlib=''
      includedir='${prefix}/include'
      INCLUDEDIR='<snip>frawktargetdebugbuildtikv-jemalloc-sys-1f5ef3a81eb3f627out/include'
      infodir='${datarootdir}/info'
      INSTALL_DATA=''
      INSTALL_PROGRAM=''
      INSTALL_SCRIPT=''
      install_suffix=''
      je_=''
      JEMALLOC_CPREFIX=''
      JEMALLOC_PREFIX=''
      jemalloc_version_bugfix=''
      jemalloc_version_gid=''
      jemalloc_version_major=''
      jemalloc_version_minor=''
      jemalloc_version_nrev=''
      jemalloc_version=''
      LD_PRELOAD_VAR=''
      LD=''
      LDFLAGS='-nologo -MD -Z7 -Brepro -W4'
      LDTARGET=''
      libdir='${exec_prefix}/lib'
      LIBDIR='<snip>frawktargetdebugbuildtikv-jemalloc-sys-1f5ef3a81eb3f627out/lib'
      libdl=''
      libexecdir='${exec_prefix}/libexec'
      LIBOBJS=''
      libprefix=''
      LIBS=''
      link_whole_archive=''
      LM=''
      localedir='${datarootdir}/locale'
      localstatedir='${prefix}/var'
      LTLIBOBJS=''
      mandir='${datarootdir}/man'
      MANDIR='<snip>frawk\target\debug\build\tikv-jemalloc-sys-1f5ef3a81eb3f627\out/share/man'
      MKLIB=''
      NM=''
      o=''
      OBJEXT=''
      objroot=''
      oldincludedir='/usr/include'
      PACKAGE_BUGREPORT=''
      PACKAGE_NAME=''
      PACKAGE_STRING=''
      PACKAGE_TARNAME=''
      PACKAGE_URL=''
      PACKAGE_VERSION=''
      PATH_SEPARATOR=':'
      pdfdir='${docdir}'
      PIC_CFLAGS=''
      PREFIX='<snip>\frawk\target\debug\build\tikv-jemalloc-sys-1f5ef3a81eb3f627\out'
      prefix='<snip>\frawk\target\debug\build\tikv-jemalloc-sys-1f5ef3a81eb3f627\out'
      private_namespace=''
      program_transform_name='s,x,x,'
      psdir='${docdir}'
      RANLIB=''
      rev='2'
      RPATH_EXTRA=''
      RPATH=''
      sbindir='${exec_prefix}/sbin'
      sharedstatedir='${prefix}/com'
      SHELL='/bin/sh'
      so=''
      SOREV=''
      SPECIFIED_CFLAGS='-nologo -MD -Z7 -Brepro -W4'
      SPECIFIED_CXXFLAGS=''
      srcroot=''
      sysconfdir='${prefix}/etc'
      target_alias=''
      TEST_LD_MODE=''
      XSLROOT=''
      XSLTPROC='/usr/bin/xsltproc'
    
      ## ----------- ##
      ## confdefs.h. ##
      ## ----------- ##
    
      /* confdefs.h */
      #define PACKAGE_NAME ""
      #define PACKAGE_TARNAME ""
      #define PACKAGE_VERSION ""
      #define PACKAGE_STRING ""
      #define PACKAGE_BUGREPORT ""
      #define PACKAGE_URL ""
    
      configure: exit 77
    
      --- stderr
      configure: error: in `<snip>/frawk/target/debug/build/tikv-jemalloc-sys-1f5ef3a81eb3f627/out/build':
      configure: error: C compiler cannot create executables
      See `config.log' for more details
      thread 'main' panicked at 'command did not execute successfully: "sh" "<snip>/frawk/target/debug/build/tikv-jemalloc-sys-1f5ef3a81eb3f627/out/build/configure" "--disable-cxx" "--with-jemalloc-prefix=_rjem_" "--with-private-namespace=_rjem_" "--host=x86_64-pc-win32" "--build=x86_64-pc-win32" "--prefix=C:\\Users\\K0RYU\\workspace\\frawk\\target\\debug\\build\\tikv-jemalloc-sys-1f5ef3a81eb3f627\\out"
      expected success, got: exit code: 77', C:\Users\K0RYU\.cargo\registry\src\github.com-1ecc6299db9ec823\tikv-jemalloc-sys-0.4.2+5.2.1-patched.2\build.rs:333:9
      note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    warning: build failed, waiting for other jobs to finish...
    error: build failed
    

    Basically, jemalloc isn't something we should expect to compile on Windows right now. I think this is unfortunate, given that it's currently impossible to define a feature that only is default for a specific platform.

    opened by ErichDonGubler 2
  • Should ARM64 use be avoided?

    The Overview doc says "I suspect [frawk] would run much slower on a 64-bit non-x86 architecture." Does that mean frawk would be slower on ARM than frawk on x86, or that frawk would be slower than gawk/mawk/etc? In other words, would frawk still be faster, or should it be avoided on ARM if performance is a goal?

    opened by farski 4
  • Scientific notation is not supported in awk script.

    Scientific notation is not supported in awk script, but works for parsing files with scientific notation.

    $ printf '6.18163e-27\n1.80782e-40\n2.38296e-05\n1.92843e-09\n7.37465e-39\n' | frawk '$1 <= 1e-8'
    
    $ printf '6.18163e-27\n1.80782e-40\n2.38296e-05\n1.92843e-09\n7.37465e-39\n' | frawk '$1 <= 1E-8'
    
    
    $ printf '6.18163e-27\n1.80782e-40\n2.38296e-05\n1.92843e-09\n7.37465e-39\n' | frawk '$1 <= 10^-8'
    6.18163e-27
    1.80782e-40
    1.92843e-09
    7.37465e-39
    
    # It understands scientific notation when parsing files itself.
    $ printf '6.18163e-27\n1.80782e-40\n2.38296e-05\n1.92843e-09\n7.37465e-39\n' | frawk '{ if ($1 <= 10^-8) { print $1, ($1 + 0.0); } }'
    6.18163e-27 6.18163e-27
    1.80782e-40 1.80782e-40
    1.92843e-09 1.92843e-9
    7.37465e-39 7.37465e-39
    
    
    $ printf '6.18163e-27\n1.80782e-40\n2.38296e-05\n1.92843e-09\n7.37465e-39\n' | gawk '$1 <= 1e-8'
    6.18163e-27
    1.80782e-40
    1.92843e-09
    7.37465e-39
    
    $ printf '6.18163e-27\n1.80782e-40\n2.38296e-05\n1.92843e-09\n7.37465e-39\n' | gawk '$1 <= 1E-8'
    6.18163e-27
    1.80782e-40
    1.92843e-09
    7.37465e-39
    
    $ printf '6.18163e-27\n1.80782e-40\n2.38296e-05\n1.92843e-09\n7.37465e-39\n' | gawk '$1 <= 10^-8'
    6.18163e-27
    1.80782e-40
    1.92843e-09
    7.37465e-39
    
    opened by ghuls 2
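The behavior described in the header-line feature request above can be approximated in portable Awk, independent of whether frawk offers such a flag. This is only a sketch: the col array is a made-up stand-in for the proposed FIELDS map, it maps header names to column indexes instead of values, and plain -F, splitting does not handle quoted CSV fields.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
printf 'foo,bar\n1,2\n3,4\n' > "$workdir/foobar.csv"

# "col" is a hypothetical name: it maps a header name to its column index,
# so $(col["bar"]) is the value of the "bar" column on the current line.
mean_line=$(awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # consume the header
    { acc += $(col["bar"]); rows++ }                          # rows excludes the header
    END { print "bar\047s mean is " acc / rows }
' "$workdir/foobar.csv")
echo "$mean_line"

rm -rf "$workdir"
```

Counting rows separately stands in for the request that the header line should not count towards NR.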
Releases (v0.4.7)
  • v0.4.7 (Jan 2, 2023)

    This release contains a number of small improvements and updated dependencies around cranelift. The main new feature is baseline support for M1 Macs (and potentially other ARM CPUs). We still do not have ARM SIMD equivalents for parsing, so performance on ARM may still lag behind what I'd hope.

  • v0.4.1 (Jun 21, 2021)

    Initial github release. I am still experimenting with how to best go about releasing binaries. I've provided static linux x86 binaries using musl libc. Let me know if there's another platform you are interested in. I haven't yet been able to produce a static executable while linking in jemalloc or LLVM, but I may look into this more in the future. For now, I still recommend building from source if you want more control or transparency in how the binary is produced.

Owner
Eli Rosenthal