url-scraper

Overview

Rust crate for scraping URLs from HTML pages.

Example

extern crate url_scraper;
use url_scraper::UrlScraper;

fn main() {
    let directory = "http://phoronix.com/";

    // Fetch the page and scrape it for links.
    let scraper = UrlScraper::new(directory).unwrap();
    // Each item is a (link text, URL) pair found in the document.
    for (text, url) in scraper.into_iter() {
        println!("{}: {}", text, url);
    }
}
You might also like...
`prometheus` backend for `metrics` crate

metrics + prometheus = ❤️ API Docs | Changelog prometheus backend for metrics crate. Motivation Rust has at least two ecosystems regarding metrics col

Crate extending futures stream combinators, that is adding precise rate limiter

stream-rate-limiter Stream combinator .rate_limiter(opt: RateLimitOptions) Provides way to limit stream element rate with constant intervals. It adds

Minimal self-contained crate to accept file descriptors from systemd

sd-listen-fds Exposes file descriptors passed in by systemd, the Linux init daemon, without dependencies, foreign or otherwise. Enables easy implement

Grow Rust is a Growtopia Private Server made in Rust

Multiplex server for rust-analyzer, allows multiple LSP clients (editor windows) to share a single rust-analyzer instance per cargo workspace

ra-multiplex   Multiplex server for rust-analyzer, allows multiple LSP clients (editor windows) to share a single rust-analyzer instance per cargo wor

DNS Server written in Rust for fun, see https://dev.to/xfbs/writing-a-dns-server-in-rust-1gpn

DNS Fun Ever wondered how you can write a DNS server in Rust? No? Well, too bad, I'm telling you anyways. But don't worry, this is going to be a fun o

FTP client for Rust

rust-ftp FTP client for Rust Documentation rust-ftp Installation Usage License Contribution Development environment Installation FTPS support is achie

The gRPC library for Rust built on C Core library and futures

gRPC-rs gRPC-rs is a Rust wrapper of gRPC Core. gRPC is a high performance, open source universal RPC framework that puts mobile and HTTP/2 first. Sta

A library to work with CIDRs in rust

ipnetwork This is a library to work with IPv4 and IPv6 CIDRs in Rust Run Clippy by doing rustup component add clippy cargo clippy Installation This c

Comments
  • Cannot Build: Error Compiling CSS Parser

    I have tried to reproduce the code exactly in the url_scraper documentation, but am unable to build the program.

    I think it is an error with the cssparser dependency. Any ideas how to fix?

    Error:

       Compiling cssparser v0.24.1
    error[E0506]: cannot assign to `self.input.cached_token` because it is borrowed
       --> /home/jared/.cargo/registry/src/github.com-1ecc6299db9ec823/cssparser-0.24.1/src/parser.rs:572:17
        |
    547 |     pub fn next_including_whitespace_and_comments(&mut self) -> Result<&Token<'i>, BasicParseError<'i>> {
        |                                                   - let's call the lifetime of this reference `'1`
    ...
    560 |             Some(ref cached_token)
        |                  ---------------- borrow of `self.input.cached_token` occurs here
    ...
    572 |                 self.input.cached_token = Some(CachedToken {
        |                 ^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `self.input.cached_token` occurs here
    ...
    584 |         Ok(token)
        |         --------- returning this value requires that `self.input.cached_token.0` is borrowed for `'1`
    
    error: aborting due to previous error
    
    For more information about this error, try `rustc --explain E0506`.
    error: could not compile `cssparser`.
    warning: build failed, waiting for other jobs to finish...
    error: build failed
    

    To Reproduce:

    My main.rs is:

    extern crate url_scraper;
    use url_scraper::UrlScraper;
    
    fn main() {
        let directory = "http://phoronix.com/";
    
        let scraper = UrlScraper::new(directory).unwrap();
        for (text, url) in scraper.into_iter() {
            println!("{}: {}", text, url);
        }
    }
    

    My Cargo.toml dependencies are:

    [dependencies]
    url-scraper = "0.1.1"
    
    opened by jaredforth 1
  • Updating scraper

    error[E0506]: cannot assign to `self.input.cached_token` because it is borrowed
       --> /home/callen/.cargo/registry/src/github.com-1ecc6299db9ec823/cssparser-0.24.1/src/parser.rs:572:17
        |
    547 |     pub fn next_including_whitespace_and_comments(&mut self) -> Result<&Token<'i>, BasicParseError<'i>> {
        |                                                   - let's call the lifetime of this reference `'1`
    ...
    560 |             Some(ref cached_token)
        |                  ---------------- borrow of `self.input.cached_token` occurs here
    ...
    572 |                 self.input.cached_token = Some(CachedToken {
        |                 ^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `self.input.cached_token` occurs here
    ...
    584 |         Ok(token)
        |         --------- returning this value requires that `self.input.cached_token.0` is borrowed for `'1`
    
    error: aborting due to previous error
    
    

    This compile error broke downstream crates such as url-crawler. This version bump seems to clear it up.

    opened by bitemyapp 0
  • Bump scraper version

    Referring to this issue, I think this might solve the compilation issues. I don't anticipate any problems, but you might want to double-check that this doesn't break anything.

    opened by jicee13 0
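The E0506 error quoted in the comments above is the classic "conditional return of a borrow" pattern that the borrow checker rejects: a cached value is returned by reference in one branch, so the borrow of the field is still considered live when the other path tries to reassign that field. Below is a minimal self-contained sketch of the pattern and of a shape that compiles; the Tokenizer type and token values are made up for illustration and are not cssparser's actual code.

```rust
// Illustrative sketch only: this Tokenizer mimics the borrow pattern behind
// the E0506 reports above; it is not cssparser's real implementation.
struct Tokenizer {
    cached_token: Option<String>,
}

impl Tokenizer {
    // The rejected shape, roughly what the cssparser 0.24.1 error shows:
    //
    //     fn next_token(&mut self) -> &str {
    //         if let Some(ref t) = self.cached_token {
    //             return t;                         // borrow must live until return
    //         }
    //         self.cached_token = Some(tokenize()); // E0506: assign while borrowed
    //         self.cached_token.as_deref().unwrap()
    //     }
    //
    // A shape that compiles: test the cache without holding a borrow,
    // fill it if needed, then borrow exactly once for the return value.
    fn next_token(&mut self) -> &str {
        if self.cached_token.is_none() {
            self.cached_token = Some("token".to_string());
        }
        self.cached_token.as_deref().unwrap()
    }
}

fn main() {
    let mut t = Tokenizer { cached_token: None };
    // The first call fills the cache; the second returns the cached value.
    assert_eq!(t.next_token(), "token");
    assert_eq!(t.next_token(), "token");
}
```

For users of url-scraper, the practical fix is the dependency bump discussed in the comments above, which pulls in a cssparser release that restructures this code.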
Owner
Pop!_OS
An Operating System by System76
A simple tool in Rust to split URLs into their protocol, host, port, path and query parts.

rexturl A simple tool to split urls in their protocol, host, port, path and query parts. Install cargo install rexturl or clone the source code and r

Volker Schwaberow 3 Oct 22, 2022
Rust crate for configurable parallel web crawling, designed to crawl for content

url-crawler A configurable parallel web crawler, designed to crawl a website for content. Changelog Docs.rs Example extern crate url_crawler; use std:

Pop!_OS 56 Aug 22, 2021
Safe Rust crate for creating socket servers and clients with ease.

bitsock Safe Rust crate for creating socket servers and clients with ease. Description This crate can be used for Client <--> Server applications of e

Lorenzo Torres 3 Nov 25, 2021
Dav-server-rs - Rust WebDAV server library. A fork of the webdav-handler crate.

dav-server-rs A fork of the webdav-handler-rs project. Generic async HTTP/Webdav handler Webdav (RFC4918) is defined as HTTP (GET/HEAD/PUT/DELETE) plu

messense 30 Dec 29, 2022
The netns-rs crate provides an ultra-simple interface for handling network namespaces in Rust.

netns-rs The netns-rs crate provides an ultra-simple interface for handling network namespaces in Rust. Changing namespaces requires elevated privileg

OpenAnolis Community 7 Dec 15, 2022
Rust utility crate for parsing, encoding and generating x25519 keys used by WireGuard

WireGuard Keys This is a utility crate for parsing, encoding and generating x25519 keys that are used by WireGuard. It exports custom types that can b

Fractal Networks 3 Aug 9, 2022
Rust crate providing a variety of automotive related libraries, such as communicating with CAN interfaces and diagnostic APIs

The Automotive Crate Welcome to the automotive crate documentation. The purpose of this crate is to help you with all things automotive related. Most

I CAN Hack 29 Mar 11, 2024
Library + CLI-Tool to measure the TTFB (time to first byte) of HTTP requests. Additionally, this crate measures the times of DNS lookup, TCP connect and TLS handshake.

TTFB: CLI + Lib to Measure the TTFB of HTTP/1.1 Requests Similar to the network tab in Google Chrome or Mozilla Firefox, this crate helps you find the

Philipp Schuster 24 Dec 1, 2022
A crate for parsing HTTP rate limit headers as per the IETF draft

rate-limits A crate for parsing HTTP rate limit headers as per the IETF draft. Unofficial implementations like the Github rate limit headers are also

Matthias 3 Jul 9, 2022
⏱ Cross-platform Prometheus style process metrics collector of metrics crate

⏱ metrics-process This crate provides Prometheus style process metrics collector of metrics crate for Linux, macOS, and Windows. Collector code is man

Alisue 12 Dec 16, 2022