Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.

htmlq

Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files. Mozilla's MDN has a good reference for CSS selector syntax.

Usage

$ htmlq -h
htmlq 0.0.1
Runs CSS selectors on HTML

USAGE:
    htmlq [FLAGS] [OPTIONS] <selector>...

FLAGS:
    -h, --help                 Prints help information
    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace
    -p, --pretty               Pretty-print the serialised output
    -t, --text                 Output only the contents of text nodes inside selected elements
    -V, --version              Prints version information

OPTIONS:
    -a, --attribute <attribute>    Only return this attribute (if present) from selected elements
    -f, --filename <FILE>          The input file. Defaults to stdin
    -o, --output <FILE>            The output file. Defaults to stdout

ARGS:
    <selector>...    The CSS expression to select
$

Examples

Using with cURL to find part of a page by ID

$ curl -s https://www.rust-lang.org/ | htmlq '#get-help'
<div class="four columns mt3 mt0-l" id="get-help">
        <h4>Get help!</h4>
        <ul>
          <li><a href="https://doc.rust-lang.org">Documentation</a></li>
          <li><a href="https://users.rust-lang.org">Ask a Question on the Users Forum</a></li>
          <li><a href="http://ping.rust-lang.org">Check Website Status</a></li>
        </ul>
        <div class="languages">
            <label class="hidden" for="language-footer">Language</label>
            <select id="language-footer">
                <option title="English (US)" value="en-US">English (en-US)</option>
<option title="French" value="fr">Français (fr)</option>
<option title="German" value="de">Deutsch (de)</option>

            </select>
        </div>
      </div>
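
For readers who want to see what selecting by ID amounts to, here is a rough Python sketch using only the standard library. It is not how htmlq is implemented; the class name `IdFragmentCollector` and the helper `select_by_id` are made up for illustration, and it assumes well-formed markup where every non-void element has a closing tag.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class IdFragmentCollector(HTMLParser):
    """Hypothetical helper: re-emit the first element whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # > 0 while inside the matched element
        self.done = False   # True once the matched element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.element_id:
            return
        # Reuse the start tag exactly as it appeared in the input.
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_ELEMENTS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            # The parser decodes character references, so escape them again.
            self.parts.append(escape(data, quote=False))


def select_by_id(html, element_id):
    collector = IdFragmentCollector(element_id)
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


sample = ('<main><div id="get-help"><h4>Get help!</h4><ul><li>'
          '<a href="https://doc.rust-lang.org">Documentation</a></li></ul>'
          '</div><p>after</p></main>')
print(select_by_id(sample, "get-help"))
```

The sketch only handles a single ID. htmlq accepts full CSS selectors, which this does not attempt to cover.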

Find all the links in a page

$ curl -s https://www.rust-lang.org/ | htmlq -a href a
/
/tools/install
/learn
/tools
/governance
/community
https://blog.rust-lang.org/
/learn/get-started
https://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html
https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html
[...]
$
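
As a point of comparison, the same kind of attribute extraction can be sketched in Python with the standard library's `html.parser`. This is an illustration only, not htmlq's implementation; `AttributeCollector` and `extract_attribute` are hypothetical names, and the sketch matches on a bare tag name rather than a full CSS selector.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Hypothetical helper: collect one attribute from every element with a given tag."""

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names are lower-cased.
        if tag != self.tag:
            return
        for name, value in attrs:
            # Elements without the attribute are skipped, like "if present" above.
            if name == self.attribute and value is not None:
                self.values.append(value)


def extract_attribute(html, tag, attribute):
    collector = AttributeCollector(tag, attribute)
    collector.feed(html)
    collector.close()
    return collector.values


sample = '<nav><a href="/">Home</a> <a href="/learn">Learn</a> <a>No link</a></nav>'
for value in extract_attribute(sample, "a", "href"):
    print(value)
```

For the sample markup this prints `/` and `/learn`, one per line, and skips the anchor that has no `href`.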

Get the text content of a post

$ curl -s https://nixos.org/nixos/about.html | htmlq -t .main

          About NixOS

NixOS is a GNU/Linux distribution that aims to
improve the state of the art in system configuration management.  In
existing distributions, actions such as upgrades are dangerous:
upgrading a package can cause other packages to break, upgrading an
entire system is much less reliable than reinstalling from scratch,
you can’t safely test what the results of a configuration change will
be, you cannot easily undo changes to the system, and so on.  We want
to change that.  NixOS has many innovative features:

[...]
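
The idea behind `-t`, and behind `-w` for skipping whitespace-only text nodes, can be sketched in Python as below. This is a simplified illustration rather than htmlq's actual code: `ClassTextCollector` and `text_of_class` are hypothetical names, only a single class name is matched instead of a full CSS selector, and the input is assumed to close every non-void element explicitly.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: gather text nodes inside elements with a given class."""

    def __init__(self, class_name, ignore_whitespace=False):
        super().__init__()
        self.class_name = class_name
        self.ignore_whitespace = ignore_whitespace
        self.depth = 0  # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth > 0:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> contain no text and do not nest.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            return
        if self.ignore_whitespace and not data.strip():
            return  # comparable in spirit to --ignore-whitespace
        self.chunks.append(data)


def text_of_class(html, class_name, ignore_whitespace=False):
    collector = ClassTextCollector(class_name, ignore_whitespace)
    collector.feed(html)
    collector.close()
    return collector.chunks


sample = ('<body><div class="main"><h1>About</h1>\n'
          '<p>First <b>bold</b> line.<br>Second line.</p></div>'
          '<p>Footer</p></body>')
print("".join(text_of_class(sample, "main", ignore_whitespace=True)))
```

Text outside the matching element (the footer paragraph in the sample) is never collected, and the whitespace-only node between the heading and the paragraph is dropped only when `ignore_whitespace` is set.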

Pretty print HTML

(This is a bit of a work in progress)

Owner
Michael Maclean