Count your code by tokens, types of syntax tree nodes, and patterns in the syntax tree. A tokei/scc/cloc alternative.

Overview

tcount

(pronounced "tee-count")

Count your code by tokens, types of syntax tree nodes, and patterns in the syntax tree.

Quick Start

Run tcount in your project root to count tokens and files and print the results grouped by language. For example:

tcount
────────────────────────────
 Group        Files  Tokens
────────────────────────────
 Rust            18   10309
 Go               8    4539
 Ruby             6    1301
────────────────────────────

Installation

cargo install --git https://github.com/RRethy/tcount.git

Requirements

  • Latest stable Rust compiler.
  • Mac or Linux. Windows is untested; most functionality should work, but the --query option likely will not.

tcount Cookbook

Note: None of these use --query. See Queries for information on that option.

Compare size of each language in pwd

tcount
────────────────────────────
 Group        Files  Tokens
────────────────────────────
 Rust            18   10309
 Go               8    4539
 Ruby             6    1301
────────────────────────────

Top 5 files by token count

tcount --groupby=file --top=5
──────────────────────────────────
 Group              Files  Tokens
──────────────────────────────────
 ./src/count.rs         1    2451
 ./src/language.rs      1    1685
 ./src/main.rs          1    1214
 ./src/output.rs        1    1157
 ./src/cli.rs           1     757
──────────────────────────────────

Compare size of two directories

tcount --groupby=arg go/scc/ rust/tokei/
─────────────────────────────────────────────
 Group                         Files  Tokens 
─────────────────────────────────────────────
 go/scc                          170  479544 
 rust/tokei                      152   39797 
─────────────────────────────────────────────

Compare size of a Go file and a Rust file

tcount --groupby=file foo.go foo.rs
────────────────────────────
 Group        Files  Tokens
────────────────────────────
 foo.rs           1    1214
 foo.go           1     757
────────────────────────────

Count comments for each language

tcount --kind-pattern=".*comment"
──────────────────────────────────────────────────
 Group        Files  Tokens  Pattern(.*comment)
──────────────────────────────────────────────────
 Rust            18   10309                    78
 Go               7    1302                    35
 Ruby             4     802                    12
──────────────────────────────────────────────────

Note: Comment nodes can have different names depending on the parser. To see what names a language's parser gives to its nodes, look in the node-types.json file in the parser repo (e.g. the Go parser repo's node-types.json).
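
As a rough illustration of what counting by kind pattern means, the sketch below walks a small hand-built tree and counts nodes whose kind name ends in "comment". Everything in it is an assumption made for illustration: the Node type, the example tree, and the use of a plain suffix check as a stand-in for the regex .*comment (tcount itself uses Tree-sitter trees and Rust regular expressions, and its internals may differ).

```rust
/// Toy syntax-tree node (hypothetical; not tcount's or Tree-sitter's real type).
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

/// Stand-in for the regex `.*comment`: true when the kind name ends in "comment".
/// ASSUMPTION: the pattern has to match the whole kind name.
fn is_comment_kind(kind: &str) -> bool {
    kind.ends_with("comment")
}

/// Count every node in the tree (inner nodes and leaves) whose kind matches.
fn count_matching(node: &Node, matches: fn(&str) -> bool) -> usize {
    let own = usize::from(matches(node.kind));
    let below: usize = node
        .children
        .iter()
        .map(|child| count_matching(child, matches))
        .sum();
    own + below
}

/// Small hand-built tree containing three differently named comment nodes.
fn example_tree() -> Node {
    let leaf = |kind: &'static str| Node { kind, children: Vec::new() };
    Node {
        kind: "source_file",
        children: vec![
            leaf("line_comment"),
            Node {
                kind: "function_item",
                children: vec![leaf("identifier"), leaf("block_comment")],
            },
            leaf("comment"),
        ],
    }
}
```

Calling count_matching(&example_tree(), is_comment_kind) counts line_comment, block_comment, and comment, which is why a single pattern can cover parsers that name their comment nodes differently.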

Track change in project size over time

tcount --format=csv > tcount-$(date +%m-%d-%Y).csv

These CSV files can then be read and graphed using your tool of choice.
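
If you prefer to post-process the snapshots in code, a minimal sketch might look like the following. It assumes the CSV has a header row with a Tokens column and one row per group, mirroring the table output above; check that against the real files before relying on it. The function name total_tokens is made up for this example, and the naive comma splitting does not handle quoted fields.

```rust
/// Sum the `Tokens` column of one CSV snapshot.
///
/// ASSUMPTION: the first line is a header containing a `Tokens` column and
/// every later non-empty line is a data row with plain, unquoted fields.
/// Returns `None` if the column is missing or a value is not a number.
fn total_tokens(csv: &str) -> Option<u64> {
    let mut lines = csv.lines();
    let header = lines.next()?;
    // Locate the Tokens column by name rather than by fixed position.
    let col = header
        .split(',')
        .position(|name| name.trim().eq_ignore_ascii_case("tokens"))?;

    let mut total = 0u64;
    for line in lines.filter(|l| !l.trim().is_empty()) {
        let field = line.split(',').nth(col)?;
        total += field.trim().parse::<u64>().ok()?;
    }
    Some(total)
}

/// Difference in total tokens between two snapshots (newer minus older).
fn token_delta(older: &str, newer: &str) -> Option<i64> {
    let old = total_tokens(older)? as i64;
    let new = total_tokens(newer)? as i64;
    Some(new - old)
}
```

Reading two files with std::fs::read_to_string and passing their contents to token_delta gives the change in project size between the two dates.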

Compare size of all Go files vs all Rust files in foo/

tcount --whitelist Go Rust -- foo/
──────────────────────
 Group  Files  Tokens
──────────────────────
 Rust       9    9034
 Go         6    2011
──────────────────────

Supported languages

tcount --list-languages
──────────────────────────────────────────────────────────────────────
 Language           Extensions                        Query Dir Name 
──────────────────────────────────────────────────────────────────────
 Bash               .bash                             bash 
 BibTeX             .bib                              bibtex 
 C                  .h,.c                             c 
 C#                 .csx,.cs                          c_sharp 
 Clojure            .clj                              clojure 
 C++                .cxx,.c++,.h++,.hh,.cc,.cpp,.hpp  cpp 
 CSS                .css                              css 
 Elm                .elm                              elm 
 Erlang             .erl,.hrl                         erlang 
 Go                 .go                               go 
 HTML               .html                             html 
 Java               .java                             java 
 Javascript         .js,.mjs                          javascript 
 JSON               .json                             json 
 Julia              .jl                               julia 
 LaTeX              .tex                              latex 
 Markdown           .md                               markdown 
 OCaml              .ml                               ocaml 
 OCaml Interface    .mli                              ocaml_interface 
 Python             .pyw,.py                          python 
 Tree-sitter Query  .scm                              query 
 Ruby               .rb                               ruby 
 Rust               .rs                               rust 
 Scala              .scala,.sc                        scala 
 Svelte             .svelte                           svelte 
 Typescript         .ts                               typescript 
──────────────────────────────────────────────────────────────────────

Why count tokens instead of lines

  1. Counting lines rewards dense programs. For example,

int nums[4] = { 1, 2, 3, 4 };
int mult[4] = {0};
for (int i = 0; i < 4; i++) {
    mult[i] = nums[i] * 2;
}
printf("[%d] [%d] [%d] [%d]", mult[0], mult[1], mult[2], mult[3]);

nums := []int{1, 2, 3, 4}
mult := make([]int, 4)
for i, n := range nums {
    mult[i] = n * 2
}
fmt.Println(mult)

Are these programs the same size? They are each 6 lines long, but clearly the Go version is considerably smaller than the C version. While this is a contrived example, line counting still rewards dense programs and dense programming languages.

  2. Counting lines rewards short variable names. Is ns shorter than namespace? By bytes it is, and when used throughout a project it will likely result in fewer line breaks, but I don't think a program should be considered smaller just because it uses cryptic variable names whenever possible.

  3. Counting lines penalizes line comments mixed with code. Consider the following contrived example,

v.iter() // iterate over the vector
    .map(|n| n * 2) // multiply each number by two
    .collect::<Vec<u32>>(); // collect the iterator into a vector of u32

Without the comments, it could be written on one line as v.iter().map(|n| n * 2).collect::<Vec<u32>>();.

  4. Counting lines rewards short syntactical elements in languages. For example:

[1, 2, 3, 4].map { |n| n * 2 }

Compared with the equivalent

[1, 2, 3, 4].map do |n|
    n * 2
end

  5. Counting lines rewards horizontal programming and penalizes vertical programming.

Usage

tcount -h
tcount 0.1.0
Count your code by tokens, node kinds, and patterns in the syntax tree.

USAGE:
    tcount [FLAGS] [OPTIONS] [--] [paths]...

FLAGS:
        --count-hidden        Count hidden files
    -h, --help                Prints help information
        --list-languages      Show a list of supported languages for parsing
        --no-dot-ignore       Don't respect .ignore files
        --no-git              Don't respect gitignore and .git/info/exclude files
        --no-parent-ignore    Don't respect ignore files from parent directories
        --show-totals         Show column totals. This is not affected by --top
    -V, --version             Prints version information

OPTIONS:
        --blacklist <blacklist>...          Blacklist of languages not to parse. This is overriden by --whitelist and
                                            must be an exact match
        --format <format>                   One of table|csv [default: table]
        --groupby <groupby>                 One of language|file|arg. "arg" will group by the `paths` arguments provided
                                            [default: language]
    -k, --kind <kind>...                    kinds of nodes in the syntax tree to count. See node-types.json in the
                                            parser's repo to see the names of nodes or use https://tree-
                                            sitter.github.io/tree-sitter/playground.
    -p, --kind-pattern <kind-pattern>...    Patterns of node kinds to count in the syntax tree (e.g. ".*comment" to
                                            match nodes of type "line_comment", "block_comment", and "comment").
                                            Supports Rust regular expressions
        --query <query>...                  Tree-sitter queries to match and count. Captures can also be counted with
                                            --query=query_name@capture_name,capture_name2. See
                                            https://github.com/RRethy/tcount/blob/master/QUERIES.md for more information
        --sort-by <sort-by>                 One of group|numfiles|tokens. "group" will sort based on --groupby value
                                            [default: tokens]
        --top <top>                         How many of the top results to show
        --verbose <verbose>                 Logging level. 0 to not print errors. 1 to print IO and filesystem errors. 2
                                            to print parsing errors. 3 to print everything else. [default: 0]
        --whitelist <whitelist>...          Whitelist of languages to parse. This overrides --blacklist and must be an
                                            exact match

ARGS:
    <paths>...    Files and directories to parse and count. [default: .]

Counting Tree-sitter Queries

See QUERIES.md

Performance

tcount parses each file using a Tree-sitter parser to create a full syntax tree. This takes more time than only counting lines of code/comments so programs like tokei, scc, and cloc will typically be faster than tcount.

Here are some benchmarks using hyperfine to give an overview of how much slower it is than line counting programs:

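tcount

─────────────────────────────
 Program  Runtime
─────────────────────────────
 tcount   19.5 ms ± 1.7 ms
 scc      13.0 ms ± 1.4 ms
 tokei    7.2 ms ± 1.2 ms
 cloc     1.218 s ± 0.011 s
─────────────────────────────

Redis

─────────────────────────────
 Program  Runtime
─────────────────────────────
 tcount   1.339 s ± 0.125 s
 scc      49.9 ms ± 1.6 ms
 tokei    79.9 ms ± 5.3 ms
 cloc     1.331 s ± 0.016 s
─────────────────────────────

CPython

─────────────────────────────
 Program  Runtime
─────────────────────────────
 tcount   11.580 s ± 0.199 s
 scc      256.7 ms ± 3.0 ms
 tokei    512.2 ms ± 96.4 ms
 cloc     12.467 s ± 0.139 s
─────────────────────────────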

Limitations

  • tcount does not support nested languages like ERB. This may change in the future.
  • It's not always clear what counts as a token. tcount treats any node in the syntax tree without children as a token. This usually works, but in some cases, such as strings in the Rust Tree-sitter parser, which can have children (escape codes), it may produce slightly unexpected results.
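
The "node without children" rule, and the string edge case, can be sketched with a toy tree. The Node type and the tree shape below are hypothetical and heavily simplified; they are not actual Tree-sitter output for Rust, only an illustration of how a string with escape children ends up contributing more than one token.

```rust
/// Toy syntax-tree node (hypothetical; tcount itself walks Tree-sitter trees).
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

fn leaf(kind: &'static str) -> Node {
    Node { kind, children: Vec::new() }
}

/// Collect the kinds of all childless nodes, in source order.
/// Under the rule described above, each of these is one token.
fn token_kinds(node: &Node) -> Vec<&'static str> {
    if node.children.is_empty() {
        vec![node.kind]
    } else {
        node.children.iter().flat_map(token_kinds).collect()
    }
}

/// Simplified tree for a `let` statement binding a string literal that
/// contains `escapes` escape sequences (each modelled as a child node).
fn let_with_string(escapes: usize) -> Node {
    let string = Node {
        kind: "string_literal",
        children: (0..escapes).map(|_| leaf("escape_sequence")).collect(),
    };
    Node {
        kind: "let_declaration",
        children: vec![leaf("let"), leaf("identifier"), leaf("="), string, leaf(";")],
    }
}
```

With no escapes the string literal is itself a leaf, so let_with_string(0) yields 5 tokens. With two escapes the string node has children, so it is no longer counted and its two escape children are counted instead, giving 6 tokens for what a reader would probably still call a single string.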

Why Tree-sitter

Tree-sitter has relatively efficient parsing and has support for many languages without the need to create and maintain individual parsers or lexers. Support for new languages is easy to add and only requires an up-to-date Tree-sitter parser.

Contributing

To add support for a new language, add its information to https://github.com/RRethy/tcount/blob/master/src/language.rs and add the language's Tree-sitter parser crate to Cargo.toml.

Acknowledgements

All parsing is done using Tree-sitter parsers

Comments
  • Unable to install

    I tried installing, and it fails at this spot. Any other system deps I need? I'm running macos

    # cargo install --git https://github.com/RRethy/tcount.git
        Updating git repository `https://github.com/RRethy/tcount.git`
      Installing tcount v0.1.0 (https://github.com/RRethy/tcount.git#71638e65)
        Updating crates.io index
        Updating git repository `https://github.com/latex-lsp/tree-sitter-bibtex`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-c`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-c-sharp`
        Updating git repository `https://github.com/sogaiu/tree-sitter-clojure`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-css`
        Updating git repository `https://github.com/AbstractMachinesLab/tree-sitter-erlang`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-go`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-html`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-json`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-julia`
        Updating git repository `https://github.com/latex-lsp/tree-sitter-latex`
        Updating git repository `https://github.com/nvim-treesitter/tree-sitter-lua`
        Updating git repository `https://github.com/nvim-treesitter/tree-sitter-query`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-ruby`
        Updating git repository `https://github.com/tree-sitter/tree-sitter-scala`
    error: failed to compile `tcount v0.1.0 (https://github.com/RRethy/tcount.git#71638e65)`, intermediate artifacts can be found at `/var/folders/rr/bzwz4lzx7mbbymtknwk1y7140000gn/T/cargo-installFHZdFy`
    
    Caused by:
      failed to select a version for the requirement `tree-sitter-c = "^0.16.0"`
      candidate versions found which didn't match: 0.20.1
      location searched: Git repository https://github.com/tree-sitter/tree-sitter-c
      required by package `tcount v0.1.0 (/Users/pthrasher/.cargo/git/checkouts/tcount-92c5348212cf527d/71638e6)
    
    opened by pthrasher 1
Owner
Adam P. Regasz-Rethy