htmlq
Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files. Mozilla's MDN has a good reference for CSS selector syntax.
Usage
$ htmlq -h
htmlq 0.0.1
Runs CSS selectors on HTML

USAGE:
    htmlq [FLAGS] [OPTIONS] ...

FLAGS:
    -h, --help                 Prints help information
    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace
    -p, --pretty               Pretty-print the serialised output
    -t, --text                 Output only the contents of text nodes inside selected elements
    -V, --version              Prints version information

OPTIONS:
    -a, --attribute    Only return this attribute (if present) from selected elements
    -f, --filename     The input file. Defaults to stdin
    -o, --output       The output file. Defaults to stdout

ARGS:
    ...    The CSS expression to select
$
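To make the difference between `-t/--text` and `-w/--ignore-whitespace` concrete, here is a toy Rust model of the two modes. It is a sketch under stated assumptions, not htmlq's code: the `Node` type and the `text`, `text_nodes` and `sample_list` functions are hypothetical names invented for illustration, and the tree is built by hand instead of being parsed from HTML.

```rust
// Hypothetical toy model of the `-t/--text` and `-w/--ignore-whitespace`
// output modes. This is NOT htmlq's implementation or API.

/// A minimal stand-in for a parsed HTML node.
enum Node {
    Element { tag: &'static str, children: Vec<Node> },
    Text(String),
}

/// Shorthand for building a text node.
fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Collects the contents of every text node under `node`, in document order
/// (the idea behind `-t`). When `ignore_whitespace` is true (the idea behind
/// `-w`), text nodes consisting entirely of whitespace are dropped.
fn text_nodes(node: &Node, ignore_whitespace: bool) -> Vec<String> {
    match node {
        Node::Text(t) if ignore_whitespace && t.trim().is_empty() => Vec::new(),
        Node::Text(t) => vec![t.clone()],
        Node::Element { children, .. } => children
            .iter()
            .flat_map(|child| text_nodes(child, ignore_whitespace))
            .collect(),
    }
}

/// Hand-built tree standing in for:
/// <ul>\n  <li>Documentation</li>\n  <li>Check Website Status</li>\n</ul>
fn sample_list() -> Node {
    let li = |label: &str| Node::Element { tag: "li", children: vec![text(label)] };
    Node::Element {
        tag: "ul",
        children: vec![
            text("\n  "),
            li("Documentation"),
            text("\n  "),
            li("Check Website Status"),
            text("\n"),
        ],
    }
}

fn main() {
    let list = sample_list();
    if let Node::Element { tag, .. } = &list {
        println!("text nodes under <{}>:", tag);
    }
    // Without the whitespace filter, the newline/indent nodes between the
    // <li> elements are kept alongside the two labels.
    println!("{:?}", text_nodes(&list, false));
    // With the filter, only the two labels remain.
    println!("{:?}", text_nodes(&list, true));
}
```

The whitespace-only nodes in this model come from the line breaks and indentation between tags, which is typically where they come from in real pages too.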
Examples
Using with cURL to find part of a page by ID
$ curl -s https://www.rust-lang.org/ | htmlq '#get-help'
<div class="four columns mt3 mt0-l" id="get-help">
  <h4>Get help!</h4>
  <ul>
    <li><a href="https://doc.rust-lang.org">Documentation</a></li>
    <li><a href="https://users.rust-lang.org">Ask a Question on the Users Forum</a></li>
    <li><a href="http://ping.rust-lang.org">Check Website Status</a></li>
  </ul>
  <div class="languages">
    <label class="hidden" for="language-footer">Language</label>
    <select id="language-footer">
      <option title="English (US)" value="en-US">English (en-US)</option>
      <option title="French" value="fr">Français (fr)</option>
      <option title="German" value="de">Deutsch (de)</option>
    </select>
  </div>
</div>
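The cURL example selects one element by its `id`. As a rough illustration of what that selection involves, and of how an attribute filter like `-a/--attribute` narrows the output, here is a hypothetical Rust sketch. It understands only the bare `#id` selector form (real CSS selectors are far richer), and `Element`, `el`, `select_by_id`, `attribute_values` and `sample_page` are names made up for the sketch, not part of htmlq.

```rust
// Hypothetical sketch, not htmlq's implementation: applying a bare `#id`
// selector and an `--attribute`-style filter to a small hand-built tree.
use std::collections::HashMap;

/// A minimal stand-in for an HTML element (text nodes are left out).
struct Element {
    tag: String,
    attrs: HashMap<String, String>,
    children: Vec<Element>,
}

/// Convenience constructor for building the toy tree.
fn el(tag: &str, attrs: &[(&str, &str)], children: Vec<Element>) -> Element {
    Element {
        tag: tag.to_string(),
        attrs: attrs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
        children,
    }
}

/// Returns, in document order, every element whose `id` equals the one named
/// by a `#id` selector. Anything not starting with `#` yields `None`; this
/// sketch does not parse any other CSS syntax.
fn select_by_id<'a>(root: &'a Element, selector: &str) -> Option<Vec<&'a Element>> {
    let id = selector.strip_prefix('#')?;
    let mut found = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if node.attrs.get("id").map(String::as_str) == Some(id) {
            found.push(node);
        }
        // Push children in reverse so the depth-first walk visits them in document order.
        stack.extend(node.children.iter().rev());
    }
    Some(found)
}

/// Models the attribute filter: the value of `name` on each selected element
/// that has it; elements without the attribute are skipped.
fn attribute_values<'a>(selected: &[&'a Element], name: &str) -> Vec<&'a str> {
    selected
        .iter()
        .filter_map(|&e| e.attrs.get(name).map(String::as_str))
        .collect()
}

/// A trimmed, hand-built stand-in for part of the page in the cURL example.
fn sample_page() -> Element {
    el("body", &[], vec![
        el("div", &[("class", "four columns mt3 mt0-l"), ("id", "get-help")], vec![
            el("h4", &[], vec![]),
            el("select", &[("id", "language-footer")], vec![
                el("option", &[("title", "French"), ("value", "fr")], vec![]),
            ]),
        ]),
    ])
}

fn main() {
    let page = sample_page();
    let selected = select_by_id(&page, "#get-help").unwrap_or_default();
    for e in &selected {
        println!("matched <{}>", e.tag);
    }
    // Roughly what adding `--attribute class` to the selection would return.
    println!("{:?}", attribute_values(&selected, "class"));
}
```

The explicit stack just keeps the walk short and non-recursive; a real selector engine would also handle classes, combinators, pseudo-classes and so on.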