Sniffer - a tool to quickly inspect CSV and other flat files for basic information


sniffer

sniffer is a tool for quickly inspecting CSV and other flat files for basic information.

Need to see how many rows are in a CSV file? Want to see the first few rows printed out to your terminal?

Then sniffer is for you!

sniffer is built with Rust and is made for the average data engineer or data person who frequently needs to inspect CSV files quickly.

sniffer reports the following about a flat file. Some checks are controlled by options.

  • File size in MB.
  • Number of lines in the file.
  • The header row.
  • The first few rows.
  • Option to indicate whether the flat file is quoted.
  • Option to check all columns for NULL values.
  • Option to check for whitespace at the beginning and end of columns.
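
As a rough illustration of the NULL check listed above, the following sketch flags columns that contain at least one null field. It is not sniffer's actual implementation: the function name `columns_with_nulls` is hypothetical, and the rule that a field is null when it is empty or equals "NULL" (case-insensitive, after trimming) is an assumption.

```rust
/// Returns the names of columns that contain at least one "null" field.
/// Assumption: a field counts as null when, after trimming, it is empty
/// or equals "NULL" ignoring case. sniffer's own rule may differ.
fn columns_with_nulls(headers: &[&str], rows: &[Vec<&str>]) -> Vec<String> {
    // One flag per header column, set once a null field is seen.
    let mut flagged = vec![false; headers.len()];
    for row in rows {
        // Ignore any extra fields beyond the header width.
        for (i, field) in row.iter().enumerate().take(headers.len()) {
            let trimmed = field.trim();
            if trimmed.is_empty() || trimmed.eq_ignore_ascii_case("null") {
                flagged[i] = true;
            }
        }
    }
    // Keep only the header names whose flag was set.
    headers
        .iter()
        .zip(flagged)
        .filter(|(_, has_null)| *has_null)
        .map(|(name, _)| name.to_string())
        .collect()
}
```

An empty result would correspond to a report such as "No columns with nulls".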

Usage

Usage: sniffer [OPTIONS] --file-path  --delimiter 

Options:
      --file-path 
      --delimiter 
      --quote           [default: 0]
      --check-nulls             [default: 1]
      --check-whitespace   [default: 1]
  -h, --help                   Print help
  -V, --version                Print version

To inspect a flat file with sniffer, pass the file path and delimiter:

    cargo run -- --file-path sample.csv --delimiter , --quote 1 --check-nulls 1

This produces output similar to the following:

Headers: StringRecord(["ride_id", "rideable_type", "started_at", "ended_at", "start_station_name", "start_station_id", "end_station_name", "end_station_id", "start_lat", "start_lng", "end_lat", "end_lng", "member_casual"])

'Row: StringRecord(["CBCD0D7777F0E45F", "classic_bike", "2023-02-14 11:59:42", "2023-02-14 12:13:38", "Southport Ave & Clybourn Ave", "TA1309000030", "Clark St & Schiller St", "TA1309000024", "41.920771", "-87.663712", "41.907993", "-87.631501", "casual"])

'Row: StringRecord(["F3EC5FCE5FF39DE9", "electric_bike", "2023-02-15 13:53:48", "2023-02-15 13:59:08", "Clarendon Ave & Gordon Ter", "13379", "Sheridan Rd & Lawrence Ave", "TA1309000041", "41.957879424", "-87.649583697", "41.969517", "-87.654691", "casual"])

'Row: StringRecord(["E54C1F27FA9354FF", "classic_bike", "2023-02-19 11:10:57", "2023-02-19 11:35:01", "Southport Ave & Clybourn Ave", "TA1309000030", "Aberdeen St & Monroe St", "13156", "41.920771", "-87.663712", "41.880419", "-87.655519", "member"])

number of lines: 4
No columns with nulls
No columns with whitespace at beginning or end
File size in MB: 0.001027107238769531
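
The reported size is consistent with a 1,077-byte file divided by 1024 × 1024. That is an inference from the sample output, not something the README states. A minimal sketch of such a conversion, with the hypothetical helper names `bytes_to_mb` and `file_size_mb`:

```rust
use std::fs;
use std::io;

/// Converts a byte count to megabytes.
/// Assumption: 1 MB = 1024 * 1024 bytes, inferred from the sample output.
fn bytes_to_mb(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

/// Reads the size of the file at `path` and reports it in megabytes.
/// Any I/O error (for example, a missing file) is returned to the caller.
fn file_size_mb(path: &str) -> io::Result<f64> {
    Ok(bytes_to_mb(fs::metadata(path)?.len()))
}
```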

Testing and CI, Building.

To run the pre-commit checks:

    pre-commit run --all-files

Comments
  • Enhance function read_number_lines_in_file with Error handling and Unit tests

    Problem

    The function read_number_lines_in_file panics, which crashes the application at run time.

    Enhancement

    • Implement error propagation and handling in lib.rs for read_number_lines_in_file, which now returns a Result. Anyone using the library can then handle errors as they prefer (panic, continue, and so on).
    • Implement unit tests for read_number_lines_in_file.

    Code Example

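    use std::error::Error;
    use std::fs;
    use std::io::{BufRead, BufReader};

    /// Counts the lines in the file at `file_path`, returning any I/O error to the caller.
    pub fn read_number_lines_in_file(file_path: &str) -> Result<u32, Box<dyn Error>> {
        let mut count: u32 = 0;
        let file: fs::File = fs::File::open(file_path)?;
        let bf: BufReader<fs::File> = BufReader::new(file);
        for line in bf.lines() {
            line?; // propagate read errors instead of counting them as lines
            count += 1;
        }
        Ok(count)
    }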
    
    opened by M-Farag 2
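
The issue above asks for unit tests but does not show them. The following is a minimal sketch of what such tests might look like. The module name `line_count_tests`, the test names, and the temporary file names are hypothetical. The function is repeated, with read errors propagated, so the block compiles on its own.

```rust
use std::error::Error;
use std::fs;
use std::io::{BufRead, BufReader};

// Repeated from the issue so this sketch is self-contained.
pub fn read_number_lines_in_file(file_path: &str) -> Result<u32, Box<dyn Error>> {
    let file = fs::File::open(file_path)?;
    let mut count: u32 = 0;
    for line in BufReader::new(file).lines() {
        line?; // surface read errors to the caller
        count += 1;
    }
    Ok(count)
}

#[cfg(test)]
mod line_count_tests {
    use super::*;

    #[test]
    fn counts_lines_in_existing_file() {
        // Hypothetical scratch file in the system temp directory.
        let path = std::env::temp_dir().join("sniffer_line_count_sketch.csv");
        fs::write(&path, "a,b\n1,2\n3,4\n").unwrap();
        let count = read_number_lines_in_file(path.to_str().unwrap()).unwrap();
        fs::remove_file(&path).unwrap();
        assert_eq!(count, 3);
    }

    #[test]
    fn missing_file_returns_err_instead_of_panicking() {
        let result = read_number_lines_in_file("no_such_file_for_sniffer_sketch.csv");
        assert!(result.is_err());
    }
}
```

Because the function returns a Result, the second test can assert on the error case directly rather than relying on `#[should_panic]`.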
  • Update function has_whitespace_at_beginning_or_end

    PR Description

    • Enhancements to the function has_whitespace_at_beginning_or_end
      • Simplifying the white space checking
      • Improving the error handling with clear error messages
      • Propagating errors to the function caller using the Result enum for better error handling
      • Adding unit tests against the function to test different use cases and check the results
      • Adding simple validation against the string length
      • Adding function documentation

    Code snippet

    /// Returns Ok(true) if `s` starts or ends with a whitespace character.
    /// An empty string is reported as Ok(false).
    fn has_whitespace_at_beginning_or_end(s: &str) -> Result<bool, &'static str> {
        if s.is_empty() {
            return Ok(false);
        }

        // Return an error instead of panicking if a character cannot be read.
        let first = s.chars().next().ok_or("Error getting first character")?;
        let last = s.chars().next_back().ok_or("Error getting last character")?;

        Ok(first.is_whitespace() || last.is_whitespace())
    }
    
    opened by M-Farag 0
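
The PR description mentions unit tests covering different use cases, but the snippet does not include them. The following is a minimal sketch of such tests. The module name `whitespace_tests` and the test names are hypothetical. The function is repeated in a simplified form that returns an error rather than calling `expect`, so the block compiles on its own.

```rust
/// Returns Ok(true) if `s` starts or ends with a whitespace character.
fn has_whitespace_at_beginning_or_end(s: &str) -> Result<bool, &'static str> {
    if s.is_empty() {
        return Ok(false);
    }
    let first = s.chars().next().ok_or("Error getting first character")?;
    let last = s.chars().next_back().ok_or("Error getting last character")?;
    Ok(first.is_whitespace() || last.is_whitespace())
}

#[cfg(test)]
mod whitespace_tests {
    use super::*;

    #[test]
    fn detects_leading_and_trailing_whitespace() {
        assert_eq!(has_whitespace_at_beginning_or_end(" abc"), Ok(true));
        assert_eq!(has_whitespace_at_beginning_or_end("abc\t"), Ok(true));
    }

    #[test]
    fn ignores_interior_whitespace_and_empty_input() {
        assert_eq!(has_whitespace_at_beginning_or_end("a b c"), Ok(false));
        assert_eq!(has_whitespace_at_beginning_or_end(""), Ok(false));
    }
}
```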
Owner
Daniel B
Data Engineer. Data lover. Data warehouse expert. Python and SQL are the spice of life.