jq, but for HTML

Tom Forbes

Last update: Jan 5, 2023

Overview

hq

jq, but for HTML.

hq reads HTML and converts it into a JSON object based on a series of CSS selectors. Queries are written in a JSON-like syntax in which the values are CSS selectors. For example:

{posts: .athing | [ {title: .titleline > a, url: .titleline > a | @(href)} ] }

This selects all .athing elements and creates an array (| [{...}]) containing one object per selected element. For each element, it takes the text of the .titleline > a element and its href attribute (| @(href)).

The end result is the following structure:

{
  "posts": [
    {
      "title": "...",
      "url": "..."
    }
  ]
}
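To make the mapping from query to output concrete, here is a minimal Python sketch that mimics the query above on a tiny, hypothetical HTML snippet. It is not how hq is implemented: it uses the standard library's xml.etree.ElementTree, so it assumes well-formed markup and matches class attributes exactly, and names such as extract_posts and the example URLs are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a Hacker News listing.
HTML = """
<table>
  <tr class="athing">
    <td><span class="titleline"><a href="https://example.com/a">First story</a></span></td>
  </tr>
  <tr class="athing">
    <td><span class="titleline"><a href="https://example.com/b">Second story</a></span></td>
  </tr>
</table>
"""


def extract_posts(markup):
    root = ET.fromstring(markup.strip())
    posts = []
    # `.athing | [ {...} ]`: build one object per element whose class is "athing".
    for row in root.iterfind(".//*[@class='athing']"):
        # `.titleline > a`: an <a> that is a direct child of a .titleline element.
        link = row.find(".//*[@class='titleline']/a")
        posts.append({
            "title": link.text,        # a bare selector yields the element's text
            "url": link.get("href"),   # `| @(href)` yields the attribute value
        })
    return {"posts": posts}


result = extract_posts(HTML)
print(json.dumps(result, indent=2))
```

The printed structure has the same shape as the result shown above, with one entry in "posts" per .athing row.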

Install

cargo install html-query

Examples

Full hacker news story extraction

{posts: .athing | [{href: .titleline > a | @(href), title: .titleline > a, meta: @sibling(1) | {user: .hnuser, posted: .age | @(title) }}]}

This selects each .athing element and extracts the title and the link URL (from the href attribute). It then selects the sibling of the .athing element (@sibling(1)) and extracts the user and the post time from it:

{
  "posts": [
    {
      "title": "...",
      "href": "...",
      "meta": {
        "posted": "...",
        "user": "..."
      }
    }
  ]
}
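The nested meta object can be illustrated with a small Python sketch. This is only an approximation under stated assumptions: it treats @sibling(1) as "the element immediately following the .athing row", which the text above does not spell out, and it relies on a hypothetical, well-formed snippet parsed with xml.etree.ElementTree rather than on hq's own parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: a story row followed by a row holding its metadata.
HTML = """
<table>
  <tr class="athing">
    <td><span class="titleline"><a href="https://example.com/a">First story</a></span></td>
  </tr>
  <tr>
    <td><a class="hnuser">alice</a> <span class="age" title="2023-01-05T10:00:00">1 hour ago</span></td>
  </tr>
</table>
"""


def extract_stories(markup):
    root = ET.fromstring(markup.strip())
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    posts = []
    for row in root.iterfind(".//*[@class='athing']"):
        link = row.find(".//*[@class='titleline']/a")
        # Assumed meaning of `@sibling(1)`: the next element after `row`.
        siblings = list(parents[row])
        meta_row = siblings[siblings.index(row) + 1]
        posts.append({
            "href": link.get("href"),
            "title": link.text,
            "meta": {
                "user": meta_row.find(".//*[@class='hnuser']").text,
                # `.age | @(title)`: the title attribute, not the visible text.
                "posted": meta_row.find(".//*[@class='age']").get("title"),
            },
        })
    return {"posts": posts}


stories = extract_stories(HTML)
print(stories)
```

Note that "posted" comes from the title attribute of the .age element, so it holds the timestamp rather than the relative "1 hour ago" text.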

Special query syntax

Selecting attributes

.foo | @(href)

This will select the href attribute from the first element matching .foo.

Parents

.foo | @parent

This will return the parent of the first element matching .foo.

Siblings

.foo | @sibling(1)

This will return the sibling of the first element matching .foo.
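As a rough analogy for the three special forms, the following Python sketch performs the equivalent lookups on a hypothetical snippet with xml.etree.ElementTree. It is an illustration only: the markup, the variable names, and the reading of @sibling(1) as "the next element after the match" are assumptions, not documented hq behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet with two elements matching `.foo`.
HTML = """
<div id="outer">
  <a class="foo" href="/first">first</a>
  <p id="next">after</p>
  <a class="foo" href="/second">second</a>
</div>
"""

root = ET.fromstring(HTML.strip())
# Child -> parent map, since ElementTree elements do not know their parent.
parents = {child: parent for parent in root.iter() for child in parent}

foo = root.find(".//*[@class='foo']")    # first element matching `.foo`

href = foo.get("href")                   # `.foo | @(href)`
parent = parents[foo]                    # `.foo | @parent`
children = list(parent)
following = children[children.index(foo) + 1]  # assumed `.foo | @sibling(1)`

print(href, parent.get("id"), following.get("id"))
```

In each case only the first .foo element is used; the second match is ignored.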

Comments

Add a trim command

hey! tried to use the tool to scrape a table out of a page :/ I did hq '#<ID_OF_TABLE> > tbody' and the output was not valid json :( I needed to run a pre-trim and post-trim on the data

if you can add a trim module that will remove pre and post whitespaces that would be great (I could ask the jq team the same but starting here)

opened by othorotka 2

Bump clap from 4.0.29 to 4.0.30

Bumps clap from 4.0.29 to 4.0.30.

Release notes

Sourced from clap's releases.

v4.0.30

[4.0.30] - 2022-12-21

Fixes

(error) Improve error for args_conflicts_with_subcommand

Changelog

Sourced from clap's changelog.

[4.0.30] - 2022-12-21

Fixes

(error) Improve error for args_conflicts_with_subcommand

Commits

d2d0222 chore: Release

56a0bb6 docs: Update changelog

b941a3e Merge pull request #4567 from epage/error

453ac0b fix(parser): Be less confusing with args/subcommand conflicts

2a374db test(parser): Show bad behavior

f632424 test(parser): Consolidate args_conflicts_with tests

a72f962 docs(builder): Escape non-tags

ac48e2d docs: Make less brittle for rust versions

a3381a2 docs(readme): Fix build status badge (#4559)

aa54204 Merge pull request #4555 from epage/reset

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies rust
opened by dependabot[bot] 1

Bump scraper from 0.13.0 to 0.14.0

Bumps scraper from 0.13.0 to 0.14.0.

Release notes

Sourced from scraper's releases.

0.14.0

What's Changed

Update dependencies by @teymour-aldridge in causal-agent/scraper#81

Add a test for tags with newline. by @teymour-aldridge in causal-agent/scraper#82

Implement serializer for Html by @TonalidadeHidrica in causal-agent/scraper#86

refactor: Make selectors field private by @volsa in causal-agent/scraper#87

implement DoubleEndedIterator for Select by @arctic-penguin in causal-agent/scraper#96

An Error Type for Selector::parse by @Kiwifuit in causal-agent/scraper#95

New Contributors

@TonalidadeHidrica made their first contribution in causal-agent/scraper#86

@volsa made their first contribution in causal-agent/scraper#87

@arctic-penguin made their first contribution in causal-agent/scraper#96

@Kiwifuit made their first contribution in causal-agent/scraper#95

Full Changelog: https://github.com/causal-agent/scraper/compare/v0.13.0...v0.14.0

Commits

24baef8 Version 0.14.0

3fba089 Merge pull request #95 from Kiwifuit/master

78b4d53 Removed .clone() for comment

9f2dbeb Reverted a change in render_token

8fdaaf6 Changed num.to_string()

5cb378a Resolved (some of) Clippy's warnings

7f4cf0f Switched the match block for .map_err()

a7452bc Removed caps-lock on line 100

be5d791 .clone()'d the data when rendering Tokens

24d2e4d Merge pull request #96 from arctic-penguin/master

Additional commits viewable in compare view

dependencies rust
opened by dependabot[bot] 1

Bump thiserror from 1.0.37 to 1.0.38

Bumps thiserror from 1.0.37 to 1.0.38.

Release notes

Sourced from thiserror's releases.

1.0.38

Documentation improvements

Commits

74bfe75 Release 1.0.38

cfc7d8c Update build status badge

db78fa2 Update ui test suite to nightly-2022-12-15

c25a710 Time out workflows after 45 minutes

464e2e7 Merge pull request #200 from dtolnay/displayattr

4b06a3e Add test of Display impl nested inside display attribute

29ee95e Ui test changes for trybuild 1.0.66

See full diff in compare view

dependencies rust
opened by dependabot[bot] 1

Bump serde_json from 1.0.89 to 1.0.91

Bumps serde_json from 1.0.89 to 1.0.91.

Release notes

Sourced from serde_json's releases.

v1.0.90

Documentation improvements

Commits

26f147f Release 1.0.91

d9cdb98 Opt out -Zrustdoc-scrape-examples on docs.rs

331511d Release 1.0.90

8753829 Replace ancient CI service provider in readme

0a43394 Update build status badge

8794844 Prevent build.rs rerunning unnecessarily on all source changes

0b54871 Time out workflows after 45 minutes

ecad462 Fix renamed let_underscore_drop lint

9295c96 Resolve needless_borrowed_reference clippy lints

See full diff in compare view

dependencies rust
opened by dependabot[bot] 1

Bump anyhow from 1.0.66 to 1.0.68

Bumps anyhow from 1.0.66 to 1.0.68.

Release notes

Sourced from anyhow's releases.

1.0.67

Improve the backtrace captured when context() is used on an Option (#280)

Commits

867763b Release 1.0.68

c0a87d0 Opt out -Zrustdoc-scrape-examples on docs.rs

1cc707b Release 1.0.67

613b261 Update build status badge

0f922d7 Disable backtrace CI on Rust 1.50

acecd9b Update ui test suite to nightly-2022-12-15

0bac51f Time out workflows after 45 minutes

60e8800 Fix renamed let_underscore_drop lint

8d1c734 Update ui test suite to nightly-2022-11-16

451651b Update ui test suite to nightly-2022-11-11

Additional commits viewable in compare view

dependencies rust
opened by dependabot[bot] 1

Bump nom from 7.1.1 to 7.1.2

Bumps nom from 7.1.1 to 7.1.2.

Changelog

Sourced from nom's changelog.

7.1.2 - 2023-01-01

Thanks

@joubs

@Fyko

@LoganDark

@darnuria

@jkugelman

@barower

@puzzlewolf

@epage

@cky

@wolthom

@w1ll-i-code

Changed

documentation fixes

tests fixes

limit the initial capacity of the result vector of many_m_n to 64kiB

bits parser now accept Parser implementors instead of only functions

Added

implement Tuple parsing for the unit type as a special case

implement ErrorConvert on the unit type to make it usable as error type for bits parsers

bool parser for bits input

Commits

6be62d3 v7.1.2 (#1605)

6860641 1533 implement bool function for bits (#1534)

6e45c5d Remove duplicated section from error_management.md (#1529)

3c5e08c impl ErrorConvert<()> for () (#1583)

9c357ed Move the succ! macro to its own file (#1598)

9cff115 Ensure all examples compile (#1604)

b66ff43 fix(bits): Accept Parser, not parser-like functions (#1599)

326344a Clarify role of 'cut' (#1603)

0884bf8 Remove hidden map! and call! macros (#1601)

fadde7c Ensure prop tests treat inf/nan the same (#1600)

Additional commits viewable in compare view

dependencies rust
opened by dependabot[bot] 0

Bump clap from 4.0.29 to 4.0.32

Bumps clap from 4.0.29 to 4.0.32.

Release notes

Sourced from clap's releases.

v4.0.32

[4.0.32] - 2022-12-22

Fixes

(parser) When overriding required(true), consider args that conflict with its group

v4.0.31

[4.0.31] - 2022-12-22

Performance

Speed up parsing when a lot of different flags are present (100 unique flags)

v4.0.30

[4.0.30] - 2022-12-21

Fixes

(error) Improve error for args_conflicts_with_subcommand

Changelog

Sourced from clap's changelog.

[4.0.32] - 2022-12-22

Fixes

(parser) When overriding required(true), consider args that conflict with its group

[4.0.31] - 2022-12-22

Performance

Speed up parsing when a lot of different flags are present (100 unique flags)

[4.0.30] - 2022-12-21

Fixes

(error) Improve error for args_conflicts_with_subcommand

Commits

ec4ccf0 chore: Release

13fdb83 docs: Update changelog

b877345 Merge pull request #4573 from epage/conflict

85ecb3e fix(parser): Override required when parent group has conflict

d145b8b test(parser): Demonstrate required-overload bug

0eccd55 chore: Release

1e37c25 docs: Update changelog

dcd5fec Merge pull request #4572 from epage/group

dde22e7 style: Update for latest clippy

dd8435d perf(parser): Reduce duplicate lookups

Additional commits viewable in compare view

dependencies rust
opened by dependabot[bot] 0

Bump pest_derive from 2.5.1 to 2.5.2

Bumps pest_derive from 2.5.1 to 2.5.2.

Release notes

Sourced from pest_derive's releases.

v2.5.2

What's Changed

Allow use of rust keywords as pest rules by @DvvCz in pest-parser/pest#750

Add Unicode Script into built-in rules. by @huacnlee in pest-parser/pest#751

New Contributors

@DvvCz made their first contribution in pest-parser/pest#750

Full Changelog: https://github.com/pest-parser/pest/compare/v2.5.1...v2.5.2

Happy Holidays and Best Wishes for 2023! ☃️🎄 🎆

Commits

024b857 bump version to 2.5.2

25ba0a2 Add Unicode Script into built-in rules. (#751)

2c47201 Allow use of rust keywords as pest rules (#750)

See full diff in compare view

dependencies rust
opened by dependabot[bot] 0

Bump pest from 2.5.1 to 2.5.2

⚠️ Dependabot is rebasing this PR ⚠️

Rebasing might not happen immediately, so don't worry if this takes some time.

Note: if you make any changes to this PR yourself, they will take precedence over the rebase.

Bumps pest from 2.5.1 to 2.5.2.

Commits

024b857 bump version to 2.5.2

25ba0a2 Add Unicode Script into built-in rules. (#751)

2c47201 Allow use of rust keywords as pest rules (#750)

See full diff in compare view

dependencies rust
opened by dependabot[bot] 0

Consider adding a license file

Project license is specified as MIT in Cargo.toml:

https://github.com/orf/hq/blob/a855a728959d849653a44d170689c9d999cd0e45/Cargo.toml#L7

But LICENSE file is not present in the repository. You might consider adding it.

opened by orhun 0

trim command :/

hey! tried to use the tool to scrape a table out of a page :/ I did hq '#<ID_OF_TABLE> > tbody' and the output was not valid json :( I needed to run a pre-trim and post-trim on the data

if you can add a trim module that will remove pre and post whitespaces that would be great (I could ask the jq team the same but starting here)

opened by georgettica 3