Human numeric sorting program — does what `sort -h` is supposed to do!

Overview

hnsHuman Numeric Sort v0.1.0 (⏫︎2022-09-20)

  • © 2022 Fredrick R. Brennan and hns Authors
    • Apache 2.0 licensed, see LICENSE.
  • man page

Packages

Commercial

Has this ever happened to you?

$ seq 1 30 | awk '{printf "data_%s.txt\n", $1}' | sort -h > important_filenames.txt
$ sort -h < important_filenames.txt
data_10.txt
data_11.txt
data_12.txt
data_13.txt
data_14.txt
data_15.txt
data_16.txt
data_17.txt
data_18.txt
data_19.txt
data_1.txt
data_20.txt
data_21.txt
data_22.txt
data_23.txt
data_24.txt
data_25.txt
data_26.txt
data_27.txt
data_28.txt
data_29.txt
data_2.txt
data_30.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt

Oh no! You forgot that the -h flag of the GNU coreutils sort package doesn't actually do what it claims to, and can't be fixed for various historical reasons (stay with me now don't fall asleep)!

       -h, --human-numeric-sort
              compare human readable numbers (e.g., 2K 1G)

If only there was a better way!

Hi, FREDDY MAYS here with another FUR-tastic invention. All your numbers, sorted just for you!

$ mv important_filenames.txt tests/data/README_example.txt
$ hns < tests/data/README_example.shuf.txt
data_1.txt
data_2.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt
data_10.txt
data_11.txt
data_12.txt
data_13.txt
data_14.txt
data_15.txt
data_16.txt
data_17.txt
data_18.txt
data_19.txt
data_20.txt
data_21.txt
data_22.txt
data_23.txt
data_24.txt
data_25.txt
data_26.txt
data_27.txt
data_28.txt
data_29.txt
data_30.txt

Wow!

But if you git pull in the next Unix epoch, you'll also get my super duper negative number-understanding version!

$ seq -10 10 | awk '{printf "data_%s.txt\n", $1}' | sort -h > tests/data/README_example2.shuf.txt

Before your numbers were sad and drab…

data_0.txt
data_-10.txt
data_10.txt
data_-1.txt
data_1.txt
data_-2.txt
data_2.txt
data_-3.txt
data_3.txt
data_-4.txt
data_4.txt
data_-5.txt
data_5.txt
data_-6.txt
data_6.txt
data_-7.txt
data_7.txt
data_-8.txt
data_8.txt
data_-9.txt
data_9.txt

But now they can be radically sequential! (Woah!)

$ hns < tests/data/README_example2.shuf.txt
data_-10.txt
data_-9.txt
data_-8.txt
data_-7.txt
data_-6.txt
data_-5.txt
data_-4.txt
data_-3.txt
data_-2.txt
data_-1.txt
data_0.txt
data_1.txt
data_2.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt
data_10.txt

So simple to use! No command line options! Just standard in and standard out, one size fits all! (Really, Freddy?) (Yes!)

And it's not only for short files, oh no no no! It's written in Rust so you know it can handle even the largest data overflows, such as an entire randomly sorted Class A network!

$ time RUST_LOG=INFO target/release/hns < /tmp/0.0.0.0/8.shuf.txt > /dev/null
[2022-09-20T14:58:00Z INFO  hns] Reading done; got 16777216 lines in 348513µs
[2022-09-20T14:58:28Z INFO  hns] Sorting done; sorted 16777216 lines in 27122184µs
[2022-09-20T14:58:32Z INFO  hns] Writing done; wrote in 4370570µs
real    0m31.855s
user    0m30.306s
sys     0m1.548s

Sixteen million lines with four comparison points each sorted in under thirty seconds! And that's a Freddy Mays guarantee.

Benchmarking data

TODO

  • Zero-padding via another binary?

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “as is” basis, without warranties or conditions of any kind, either express or implied. See the License for the specific language governing permissions and limitations under the License.

You might also like...
ddi  is a wrapper for dd. It takes all the same arguments, and all it really does is call dd in the background
ddi is a wrapper for dd. It takes all the same arguments, and all it really does is call dd in the background

ddi A safer dd Introduction If you ever used dd, the GNU coreutil that lets you copy data from one file to another, then you may have encountered a ty

This is choose, a human-friendly and fast alternative to cut and (sometimes) awk
This is choose, a human-friendly and fast alternative to cut and (sometimes) awk

Choose This is choose, a human-friendly and fast alternative to cut and (sometimes) awk Features terse field selection syntax similar to Python's list

Grep with human-friendly search output
Grep with human-friendly search output

hgrep: Human-friendly GREP hgrep is a grep tool to search files with given pattern and print the matched code snippets with human-friendly syntax high

Codemod - Codemod is a tool/library to assist you with large-scale codebase refactors that can be partially automated but still require human oversight and occasional intervention

Codemod - Codemod is a tool/library to assist you with large-scale codebase refactors that can be partially automated but still require human oversight and occasional intervention. Codemod was developed at Facebook and released as open source.

A simple, human-friendly, embeddable scripting language

Mica Language reference · Rust API A simple, human-friendly scripting language, developed one feature at a time. Human-friendly syntax inspired by Rub

Fuzzy Index for Python, written in Rust. Works like error-tolerant dict, keyed by a human input.

FuzzDex FuzzDex is a fast Python library, written in Rust. It implements an in-memory fuzzy index that works like an error-tolerant dictionary keyed b

Display file sizes in human-readable units

hsize Display file sizes in human-readable units $ hsize 1000 1000000 5000000 1.00 KB 1.00 MB 5.00 MB $ hsize -p 5 1048576 12345678 1.04858 MB 12.345

Parallel iteration of FASTA/FASTQ files, for when sequence order doesn't matter but speed does

Rust-parallelfastx A truly parallel parser for FASTA/FASTQ files. Principle The input file is memory-mapped then virtually split into N chunks. Each c

A BASIC language interpreter. Does not conform to existing standards. Mostly a toy.
A BASIC language interpreter. Does not conform to existing standards. Mostly a toy.

JW-Basic A toy language that is somewhat like QBasic. Features: Graphics: 160x96 (255 colors & transparent) Text: 32x16 (4x5 font) Character set: ASCI

Releases(v0.1.1)
Owner
Fredrick Brennan
I make fonts and stuff
Fredrick Brennan
Yet another sort crate, porting Golang sort package to Rust.

IndexSort IndexSort Yet another sort crate (in place), porting Golang's standard sort package to Rust. Installation [dependencies] indexsort = "0.1.0"

Al Liu 4 Sep 28, 2022
Technically, this does exactly what sleep does but completes much faster!

hypersleep Technically does everything that sleep does but it is "blazingly fast!" For example, $ time sleep 1 real 0m1.005s user 0m0.001s sys

Nigel 4 Oct 27, 2022
A cross-platform file sorting program

Cabinet Cross-platform file sorting system that sorts files based on their attributes, such as file type, file name and date modified. Disclaimer: Not

Ray 2 Jul 12, 2022
A collection of numeric types and traits for Rust.

num A collection of numeric types and traits for Rust. This includes new types for big integers, rationals (aka fractions), and complex numbers, new t

null 813 Dec 27, 2022
Rust implementation of custom numeric base conversion.

base_custom Use any characters as your own numeric base and convert to and from decimal. This can be taken advantage of in various ways: Mathematics:

Daniel P. Clark 5 Dec 28, 2021
Learning rust by coding different sorting algorithms in it

Sorting in Rust Sorting a 32 bit unsigned integer vector by using: Radix Sort Quick Sort Merge Sort Bubble Sort Inbuilt Sort (Probably Tim Sort) This

Pulak Malhotra 1 Nov 28, 2021
FileSorterX is an automatic file sorting application that sorts your files into folders based on their file extension

FileSorterX is an automatic file sorting application that sorts your files into folders based on their file extension. With FileSorterX, you can easily keep your files organized and find what you need quickly.

Xanthus 22 Apr 4, 2023
A Rust implementation of Glidesort, my stable adaptive quicksort/mergesort hybrid sorting algorithm.

Glidesort Glidesort is a novel stable sorting algorithm that combines the best-case behavior of Timsort-style merge sorts for pre-sorted data with the

Orson Peters 1.4k Apr 22, 2023
Zenith - sort of like top or htop but with zoom-able charts, CPU, GPU, network, and disk usage

Zenith - sort of like top or htop but with zoom-able charts, CPU, GPU, network, and disk usage

Benjamin Vaisvil 1.6k Jan 4, 2023
Filter, Sort & Delete Duplicate Files Recursively

Deduplicator Find, Sort, Filter & Delete duplicate files Usage Usage: deduplicator [OPTIONS] [scan_dir_path] Arguments: [scan_dir_path] Run Dedupl

Sreedev Kodichath 108 Jan 27, 2023