Human numeric sorting program — does what `sort -h` is supposed to do!

Fredrick Brennan

Last update: Sep 25, 2022

Overview

`hns` — Human Numeric Sort v0.1.0 (⏫︎2022-09-20)

© 2022 Fredrick R. Brennan and hns Authors
- Apache 2.0 licensed, see LICENSE.
man page

Packages

Commercial

Has this ever happened to you?

$ seq 1 30 | awk '{printf "data_%s.txt\n", $1}' | sort -h > important_filenames.txt
$ sort -h < important_filenames.txt

data_10.txt
data_11.txt
data_12.txt
data_13.txt
data_14.txt
data_15.txt
data_16.txt
data_17.txt
data_18.txt
data_19.txt
data_1.txt
data_20.txt
data_21.txt
data_22.txt
data_23.txt
data_24.txt
data_25.txt
data_26.txt
data_27.txt
data_28.txt
data_29.txt
data_2.txt
data_30.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt

Oh no! You forgot that the `-h` flag of GNU coreutils `sort` doesn't actually do what its name suggests, and can't be fixed for various historical reasons (stay with me now, don't fall asleep)!

       -h, --human-numeric-sort
              compare human readable numbers (e.g., 2K 1G)

If only there was a better way!

Hi, FREDDY MAYS here with another FUR-tastic invention. All your numbers, sorted just for you!

$ mv important_filenames.txt tests/data/README_example.txt
$ hns < tests/data/README_example.shuf.txt

data_1.txt
data_2.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt
data_10.txt
data_11.txt
data_12.txt
data_13.txt
data_14.txt
data_15.txt
data_16.txt
data_17.txt
data_18.txt
data_19.txt
data_20.txt
data_21.txt
data_22.txt
data_23.txt
data_24.txt
data_25.txt
data_26.txt
data_27.txt
data_28.txt
data_29.txt
data_30.txt

Wow!
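For the curious, the ordering above can be approximated by splitting each line into runs of digits and non-digits, then comparing digit runs by value instead of character by character. What follows is only a minimal sketch of that general idea, not the code `hns` actually uses; the names `chunks`, `cmp_digits`, and `natural_cmp` are made up for illustration.

```rust
use std::cmp::Ordering;

/// Split a line into alternating runs of ASCII digits and non-digits.
fn chunks(line: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev_is_digit: Option<bool> = None;
    for (i, c) in line.char_indices() {
        let is_digit = c.is_ascii_digit();
        if let Some(prev) = prev_is_digit {
            if prev != is_digit {
                out.push(&line[start..i]);
                start = i;
            }
        }
        prev_is_digit = Some(is_digit);
    }
    if start < line.len() {
        out.push(&line[start..]);
    }
    out
}

/// Compare two digit runs by numeric value without parsing them,
/// so arbitrarily long numbers cannot overflow.
fn cmp_digits(a: &str, b: &str) -> Ordering {
    let a = a.trim_start_matches('0');
    let b = b.trim_start_matches('0');
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// True if the chunk is a digit run (chunks never mix digits and non-digits).
fn is_digits(s: &str) -> bool {
    s.as_bytes().first().map_or(false, |b| b.is_ascii_digit())
}

/// Natural-order comparison: digit runs compare by value, other runs bytewise.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (ca, cb) = (chunks(a), chunks(b));
    for (x, y) in ca.iter().zip(cb.iter()) {
        let ord = if is_digits(x) && is_digits(y) {
            cmp_digits(x, y)
        } else {
            x.cmp(y)
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    // All shared chunks are equal: fewer chunks first, then plain byte order
    // as a tie-break so that "01" and "1" still have a stable relative order.
    ca.len().cmp(&cb.len()).then_with(|| a.cmp(b))
}

fn main() {
    let mut lines = vec!["data_10.txt", "data_2.txt", "data_1.txt", "data_30.txt", "data_9.txt"];
    lines.sort_by(|a, b| natural_cmp(a, b));
    for line in &lines {
        println!("{line}");
    }
}
```

Comparing digit runs by length and then by bytes avoids integer parsing altogether, which is one way to stay correct on numbers too long for a machine integer.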

But if you git pull in the next Unix epoch, you'll also get my super duper negative number-understanding version!

$ seq -10 10 | awk '{printf "data_%s.txt\n", $1}' | sort -h > tests/data/README_example2.shuf.txt

Before your numbers were sad and drab…

data_0.txt
data_-10.txt
data_10.txt
data_-1.txt
data_1.txt
data_-2.txt
data_2.txt
data_-3.txt
data_3.txt
data_-4.txt
data_4.txt
data_-5.txt
data_5.txt
data_-6.txt
data_6.txt
data_-7.txt
data_7.txt
data_-8.txt
data_8.txt
data_-9.txt
data_9.txt

But now they can be radically sequential! (Woah!)

$ hns < tests/data/README_example2.shuf.txt

data_-10.txt
data_-9.txt
data_-8.txt
data_-7.txt
data_-6.txt
data_-5.txt
data_-4.txt
data_-3.txt
data_-2.txt
data_-1.txt
data_0.txt
data_1.txt
data_2.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt
data_10.txt
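One way to get negative numbers into the right order is to treat a `-` directly in front of a digit run as a sign and sort on the resulting signed value. The sketch below assumes exactly one number per line and is not taken from the `hns` source; `split_signed` is a hypothetical helper.

```rust
/// Split a line at its first integer, treating a '-' directly before the
/// digits as a sign. Returns (text before, value, text after), or None if
/// the line has no digits or the number does not fit in an i64.
fn split_signed(line: &str) -> Option<(&str, i64, &str)> {
    let first_digit = line.find(|c: char| c.is_ascii_digit())?;
    let rest = &line[first_digit..];
    let digits_len = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    // Pull a directly preceding '-' into the number.
    let start = if line[..first_digit].ends_with('-') {
        first_digit - 1
    } else {
        first_digit
    };
    let end = first_digit + digits_len;
    let value = line[start..end].parse::<i64>().ok()?;
    Some((&line[..start], value, &line[end..]))
}

fn main() {
    let mut lines = vec!["data_1.txt", "data_-10.txt", "data_0.txt", "data_10.txt", "data_-1.txt"];
    // Tuples compare field by field: prefix, then signed value, then suffix.
    // Lines without a parsable number yield None, which sorts first.
    lines.sort_by(|a, b| split_signed(a).cmp(&split_signed(b)));
    for line in &lines {
        println!("{line}");
    }
}
```

Note the trade-off in this sketch: a hyphen used as a separator (as in `2022-09-20`) would also be read as a minus sign, so a real tool has to decide how to treat that case.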

So simple to use! No command line options! Just standard in and standard out, one size fits all! (Really, Freddy?) (Yes!)
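The "standard in, standard out, no options" shape is easy to picture as a plain filter. Here is a minimal sketch using only the Rust standard library; it sorts in plain byte order as a placeholder, and `sort_stream` is an illustrative name rather than anything from `hns` itself.

```rust
use std::io::{self, BufRead, Write};

/// Read every line from `input`, sort them, and write the result to `output`.
/// The comparison here is plain byte order; a real tool would plug in its
/// own comparator at the marked line.
fn sort_stream<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    let mut lines = input.lines().collect::<io::Result<Vec<String>>>()?;
    lines.sort(); // placeholder comparator
    for line in &lines {
        writeln!(output, "{line}")?;
    }
    output.flush()
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    // Lock once and buffer the output so large inputs are not flushed line by line.
    sort_stream(stdin.lock(), io::BufWriter::new(stdout.lock()))
}
```

Keeping the logic in a function that is generic over `BufRead` and `Write` means the same code can be exercised against in-memory buffers without touching the real standard streams.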

And it's not only for short files, oh no no no! It's written in Rust so you know it can handle even the largest data overflows, such as an entire randomly sorted Class A network!

$ time RUST_LOG=INFO target/release/hns < /tmp/0.0.0.0／8.shuf.txt > /dev/null

[2022-09-20T14:58:00Z INFO  hns] Reading done; got 16777216 lines in 348513µs
[2022-09-20T14:58:28Z INFO  hns] Sorting done; sorted 16777216 lines in 27122184µs
[2022-09-20T14:58:32Z INFO  hns] Writing done; wrote in 4370570µs

real    0m31.855s
user    0m30.306s
sys     0m1.548s

Sixteen million lines with four comparison points each, sorted in under thirty seconds (27.1 s for the sort itself, 31.9 s wall-clock including reading and writing)! And that's a Freddy Mays guarantee.
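The "four comparison points" are presumably the four octets of each IPv4 address. As a rough illustration of sorting on four numeric fields and timing the sort phase, here is a sketch that uses only the standard library on a tiny stand-in data set. It is an assumption-laden toy, not the benchmark harness or logging code behind the numbers above; `octets` is a hypothetical helper.

```rust
use std::time::Instant;

/// Parse a dotted quad such as "10.0.2.15" into its four numeric fields.
fn octets(line: &str) -> Option<[u8; 4]> {
    let mut out = [0u8; 4];
    let mut parts = line.split('.');
    for slot in out.iter_mut() {
        *slot = parts.next()?.parse().ok()?;
    }
    // Reject trailing fields such as "1.2.3.4.5".
    if parts.next().is_some() {
        return None;
    }
    Some(out)
}

fn main() {
    // A tiny stand-in for the 16,777,216-line file: one /24 in reverse order.
    let mut lines: Vec<String> = (0..=255u8).rev().map(|d| format!("10.0.0.{d}")).collect();

    let started = Instant::now();
    // Each key is computed once per line; lines that fail to parse get the
    // key None and therefore sort first.
    lines.sort_by_cached_key(|l| octets(l));
    let elapsed = started.elapsed();

    eprintln!("sorted {} lines in {}µs", lines.len(), elapsed.as_micros());
    println!("{} .. {}", lines[0], lines[lines.len() - 1]);
}
```

`sort_by_cached_key` is used here so that each line is parsed only once rather than on every comparison, which matters when the key is more expensive than the comparison.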

Benchmarking data

See humnumsort-test-data repository.

You may want to clone it as:

$ git clone https://github.com/ctrlcctrlv/humnumsort-test-data/ tests/data/expensive

TODO

Zero-padding via another binary?
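To make the idea concrete: if every digit run is left-padded with zeros to a common width, a plain lexicographic sort already yields numeric order. The sketch below is one guess at what such a helper could look like; nothing like it exists in the repository as far as this README says, and `zero_pad` and `flush` are hypothetical names.

```rust
/// Left-pad every run of ASCII digits in `line` with zeros to at least
/// `width` digits. Runs that are already long enough are left unchanged.
fn zero_pad(line: &str, width: usize) -> String {
    let mut out = String::with_capacity(line.len());
    let mut digits = String::new();
    for c in line.chars() {
        if c.is_ascii_digit() {
            digits.push(c);
        } else {
            flush(&mut out, &mut digits, width);
            out.push(c);
        }
    }
    flush(&mut out, &mut digits, width);
    out
}

/// Append the pending digit run (if any) to `out`, padded to `width`, and clear it.
fn flush(out: &mut String, digits: &mut String, width: usize) {
    if !digits.is_empty() {
        for _ in digits.len()..width {
            out.push('0');
        }
        out.push_str(digits.as_str());
        digits.clear();
    }
}

fn main() {
    let mut lines: Vec<String> = ["data_10.txt", "data_2.txt", "data_1.txt"]
        .iter()
        .map(|l| zero_pad(l, 2))
        .collect();
    lines.sort(); // plain lexicographic order now matches numeric order
    for line in &lines {
        println!("{line}");
    }
}
```

This approach only works when the chosen width is at least as large as the longest number in the input, and it does nothing for negative numbers.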

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “as is” basis, without warranties or conditions of any kind, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Releases (v0.1.1)

v0.1.1 (Sep 23, 2022)

v0.1.0 (Sep 21, 2022)
