An HTML document parsing and manipulation library written in Rust, with APIs similar to jQuery.

Overview

Visdom


A server-side HTML document parsing and manipulation library written in Rust. Its APIs are similar to jQuery's, but it leaves out the parts that only work in a browser (e.g. rendering and event-related methods) and uses snake_case names instead of JavaScript's camelCase.
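To make the naming rule concrete, the sketch below converts a jQuery-style camelCase method name into the snake_case form used in this library's API listings. The helper `camel_to_snake` is hypothetical and only illustrates the convention; it is not part of visdom.

```rust
/// Convert a camelCase method name to snake_case.
/// Illustrative helper only, not a visdom API.
fn camel_to_snake(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for ch in name.chars() {
        if ch.is_ascii_uppercase() {
            // Start a new word before each capital, except at the very start.
            if !out.is_empty() {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```

For example, jQuery's `nextAll`, `prevUntil`, and `hasClass` correspond to `next_all`, `prev_until`, and `has_class` here.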

It is helpful for web scraping, and it also provides APIs for operating on text nodes, so you can use it to mix dirty HTML segments into your own HTML to keep web scrapers away.

Usage

API documentation (Chinese)    CHANGELOG    Live Demo

main.rs

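use visdom::Vis;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
  let html = r##"
    <html>
      <body>
        <nav id="header">
          <ul>
            <li>Hello,</li>
            <li>Vis</li>
            <li>Dom</li>
          </ul>
        </nav>
      </body>
    </html>
  "##;
  // load html
  let nodes = Vis::load(html)?;
  // collect the text of every `li` under `#header`
  let lis_text = nodes.find("#header li").text();
  println!("{}", lis_text);
  // will output "Hello,VisDom"
  Ok(())
}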

Vis

Static method: load(html: &str) -> Result<Elements, Box<dyn Error>>

Load the `html` string into an `Elements` collection.

Static method: load_catch(html: &str, handle: Box<dyn Fn(Box<dyn Error>)>) -> Elements

Load the `html` string into an `Elements` collection, and use `handle` to deal with errors such as HTML parse errors or invalid selectors. This is useful if you don't want those errors to make the process panic.

Static method: load_options(html: &str, options: html::ParseOptions) -> Result<Elements, Box<dyn Error>>

This method lets you define the parse options used when parsing the `html` string into a document tree. The `load` method is just an alias for this method with the most compatible parse options.
// `load` and `load_catch` use the parse options below
// for more about `ParseOptions`, see the documentation of the `rphtml` library.
ParseOptions{
  auto_fix_unclosed_tag: true,
  auto_fix_unexpected_endtag: true,
  auto_fix_unescaped_lt: true,
  allow_self_closing: true,
}

Static method: load_options_catch(html: &str, options: html::ParseOptions, handle: Box<dyn Fn(Box<dyn Error>)>) -> Elements

This combines `load_options` and `load_catch`: it takes the parse options that define how problems in the HTML are resolved during parsing, and it passes errors to `handle` instead of returning a `Result`.

Static method:dom(ele: &BoxDynElement) -> Elements

Wrap the `ele` node in a single-node `Elements` collection. This copies `ele`, so you don't need it if you only want to call methods of the `BoxDynElement` itself.

e.g.:

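// continuing from the code above
let lis = nodes.find("#header li");
let texts = lis.map(|_index, ele| {
  // wrap the single element so the `Elements` methods are available
  let ele = Vis::dom(ele);
  String::from(ele.text())
});
// now `texts` is a `Vec<String>`: ["Hello,", "Vis", "Dom"]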

API

The following APIs are inherited from the mesdoc library.

Trait methods

Instance | Trait | Inherit | Document
BoxDynNode | INodeTrait | None | INodeTrait Document
BoxDynElement | IElementTrait | INodeTrait | IElementTrait Document
BoxDynText | ITextTrait | INodeTrait | ITextTrait Document
Box<dyn IDocumentTrait> | IDocumentTrait | None | IDocumentTrait Document

Collections APIs

Collections | Document
Elements | Elements Document
Texts | Texts Document

Selector Operation

The caller `Self` is an `Elements` collection and the return value is also `Elements`, except for the `is*` checks. These APIs all behave like their jQuery counterparts.

Selector API | Description
`find(selector: &str)` | Get the descendants of each element in Self, filtered by the selector.
`filter(selector: &str)` | Reduce Self to the elements that match the selector.
`filter_by(handle: |index: usize, ele: &BoxDynElement| -> bool)` | Reduce Self to the elements that pass the handle function's test.
`filter_in(elements: &Elements)` | Reduce Self to the elements that are also in `elements`.
`not(selector: &str)` | Remove the elements that match the selector from Self.
`not_by(handle: |index: usize, ele: &BoxDynElement| -> bool)` | Remove the elements that pass the handle function's test from Self.
`not_in(elements: &Elements)` | Remove the elements that are also in `elements` from Self.
`is(selector: &str)` | Check whether at least one element in Self matches the selector.
`is_by(handle: |index: usize, ele: &BoxDynElement| -> bool)` | Check whether the handle function returns true for at least one element in Self.
`is_in(elements: &Elements)` | Check whether at least one element in Self is also in `elements`.
`is_all(selector: &str)` | Check whether every element in Self matches the selector.
`is_all_by(handle: |index: usize, ele: &BoxDynElement| -> bool)` | Check whether the handle function returns true for every element in Self.
`is_all_in(elements: &Elements)` | Check whether every element in Self is also in `elements`.
`has(selector: &str)` | Reduce Self to the elements that have a descendant matching the selector.
`has_in(elements: &Elements)` | Reduce Self to the elements that have a descendant in `elements`.
`children(selector: &str)` | Get the children of each element in Self; when the selector is not empty, the result is filtered by it.
`parent(selector: &str)` | Get the parent of each element in Self; when the selector is not empty, the result is filtered by it.
`parents(selector: &str)` | Get the ancestors of each element in Self; when the selector is not empty, the result is filtered by it.
`parents_until(selector: &str, filter: &str, contains: bool)` | Get the ancestors of each element in Self, up to the ancestor that matches the selector. When `contains` is true, the matched ancestor is included; otherwise it is excluded. When `filter` is not empty, the result is filtered by it.
`closest(selector: &str)` | Get the first matching element for each element in Self, traversing from the element itself up through its ancestors.
`siblings(selector: &str)` | Get the siblings of each element in Self; when the selector is not empty, the result is filtered by it.
`next(selector: &str)` | Get the next sibling of each element in Self; when the selector is not empty, the result is filtered by it.
`next_all(selector: &str)` | Get all following siblings of each element in Self; when the selector is not empty, the result is filtered by it.
`next_until(selector: &str, filter: &str, contains: bool)` | Get all following siblings of each element in Self, up to the sibling that matches the selector. When `contains` is true, the matched sibling is included; otherwise it is excluded. When `filter` is not empty, the result is filtered by it.
`prev(selector: &str)` | Get the previous sibling of each element in Self; when the selector is not empty, the result is filtered by it.
`prev_all(selector: &str)` | Get all preceding siblings of each element in Self; when the selector is not empty, the result is filtered by it.
`prev_until(selector: &str, filter: &str, contains: bool)` | Get all preceding siblings of each element in Self, up to the previous sibling that matches the selector. When `contains` is true, the matched sibling is included; otherwise it is excluded. When `filter` is not empty, the result is filtered by it.
`eq(index: usize)` | Get the element at the specified index.
`first()` | Get the first element of the set; equivalent to `eq(0)`.
`last()` | Get the last element of the set; equivalent to `eq(len - 1)`.
`slice(range: T)` | Get a subset specified by a range of indices, e.g. `slice(..3)` matches the first three elements.
`add(eles: Elements)` | Concatenate Self and `eles` into a new element set. It takes ownership of the parameter `eles` but has no effect on Self.
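The `parents_until`, `next_until`, and `prev_until` methods share one traversal rule, which the following sketch models over a plain slice instead of a DOM. It assumes that the filter is applied after the walk stops and that it also applies to the matched item when `contains` is true; the text above does not spell this out. `take_until` is a hypothetical helper, not a visdom function.

```rust
/// Walk `items` in order until `stop` matches one of them.
/// When `contains` is true the matching item is kept as the last result.
/// `keep` plays the role of the `filter` argument and is applied at the end.
fn take_until<'a, T>(
    items: &'a [T],
    stop: impl Fn(&T) -> bool,
    keep: impl Fn(&T) -> bool,
    contains: bool,
) -> Vec<&'a T> {
    let mut collected = Vec::new();
    for item in items {
        if stop(item) {
            if contains {
                collected.push(item);
            }
            break;
        }
        collected.push(item);
    }
    collected.into_iter().filter(|item| keep(*item)).collect()
}
```

With the siblings `["a", "b", "stop", "c"]` and a stop test that matches `"stop"`, the walk yields `a, b` when `contains` is false and `a, b, stop` when it is true; `c` is never reached.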

Helpers

Helper API | Description | Remarks
`length()` | Get the number of elements in Self.
`is_empty()` | Check whether Self has no elements, i.e. `length() == 0`.
`for_each(handle: |index: usize, ele: &mut BoxDynElement| -> bool)` | Iterate over the elements in Self; when the handle returns false, the iteration stops. | You can also use `each` if you prefer less code.
`map(handle: |index: usize, ele: &BoxDynElement| -> T) -> Vec<T>` | Get a collection of values by calling the handle function on each element in Self.
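The early-exit rule of `for_each` can be modelled without the library. The sketch below applies a handle to a mutable slice and stops as soon as the handle returns false; `for_each_while` is a hypothetical stand-in used only to show the control flow, not visdom's implementation.

```rust
/// Call `handle(index, item)` for each item in order and stop as soon as
/// it returns false. Returns how many items were visited.
fn for_each_while<T>(
    items: &mut [T],
    mut handle: impl FnMut(usize, &mut T) -> bool,
) -> usize {
    let mut visited = 0;
    for (index, item) in items.iter_mut().enumerate() {
        visited += 1;
        if !handle(index, item) {
            // A false return value ends the iteration early.
            break;
        }
    }
    visited
}
```

Returning `true` from the handle keeps the iteration going, which is why the handles in this document end with `true`.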

Supported Selectors

Selectors | Description | Remarks
`*` | MDN Universal Selectors
`#id` | MDN Id Selector
`.class` | MDN Class Selector
`p` | MDN Type Selectors
`[attr]` | MDN Attribute Selectors
`[attr=value]` | See the above.
`[attr*=value]` | See the above.
`[attr|=value]` | See the above.
`[attr~=value]` | See the above.
`[attr^=value]` | See the above.
`[attr$=value]` | See the above.
`[attr!=value]` | Supported by jQuery; matches an element that has an attribute `attr` whose value is not equal to `value`.
`span > a` | MDN Child Combinator | Matches an `a` element whose parent is a `span`.
`span a` | MDN Descendant Combinator
`span + a` | MDN Adjacent Sibling Combinator
`span ~ a` | MDN Generic Sibling Combinator
`span,a` | MDN Selector list
`span.a` | Adjoining Selectors | Matches an element whose tag type is `span` and that also has the class `a`.
`:empty` | MDN :empty Pseudo Selectors
`:first-child` | MDN :first-child
`:last-child` | MDN :last-child
`:only-child` | MDN :only-child
`:nth-child(nth)` | MDN :nth-child() | `nth` supports the keywords `odd` and `even`.
`:nth-last-child(nth)` | MDN :nth-last-child()
`:first-of-type` | MDN :first-of-type
`:last-of-type` | MDN :last-of-type
`:only-of-type` | MDN :only-of-type
`:nth-of-type(nth)` | MDN :nth-of-type()
`:nth-last-of-type(nth)` | MDN :nth-last-of-type()
`:not(selector)` | MDN :not()
`:contains(content)` | Matches an element whose `text()` contains the content.
`:header` | All title tags; alias of `h1,h2,h3,h4,h5,h6`.
`:input` | All form input tags; alias of `input,select,textarea,button`.
`:submit` | Form submit buttons; alias of `input[type="submit"],button[type="submit"]`.
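The attribute operators listed above differ only in how the attribute's value is compared. The sketch below models those comparisons as described by the CSS Selectors specification, plus the jQuery-style `!=`; it is not visdom's implementation, and `attr_matches` is a hypothetical helper that assumes the attribute is present with the string value `actual`.

```rust
/// Test an attribute value `actual` against `[attr<op>expected]`.
/// Per the CSS Selectors specification, `*=`, `^=` and `$=` match nothing
/// when `expected` is empty.
fn attr_matches(op: &str, actual: &str, expected: &str) -> bool {
    match op {
        "=" => actual == expected,
        "!=" => actual != expected,
        "*=" => !expected.is_empty() && actual.contains(expected),
        "^=" => !expected.is_empty() && actual.starts_with(expected),
        "$=" => !expected.is_empty() && actual.ends_with(expected),
        // `actual` is a whitespace-separated list that contains `expected`.
        "~=" => actual.split_whitespace().any(|word| word == expected),
        // Exact match, or `expected` immediately followed by a hyphen.
        "|=" => {
            actual == expected
                || actual
                    .strip_prefix(expected)
                    .map_or(false, |rest| rest.starts_with('-'))
        }
        _ => false,
    }
}
```

Under this model `[class~=btn]` matches `class="btn btn-primary"` but not `class="btn-primary"`, and `[lang|=en]` matches `lang="en-US"` but not `lang="english"`.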

Attribute Operation

Attribute API | Description | Remarks
`attr(attr_name: &str) -> Option<IAttrValue>` | Get the attribute with the key `attr_name`. | The return value is an `Option` of the enum `IAttrValue`; `IAttrValue` has the methods `is_true()`, `is_str(&str)`, and `to_list()`.
`set_attr(attr_name: &str, value: Option<&str>)` | Set the attribute with the key `attr_name`. The value is an `Option<&str>`; when it is `None`, the attribute has no string value and is a boolean attribute with the value true.
`remove_attr(attr_name: &str)` | Remove the attribute with the key `attr_name`.
`has_class(class_name: &str) -> bool` | Check whether Self's ClassList contains `class_name`; multiple classes can be separated by whitespace.
`add_class(class_name: &str)` | Add the class to Self's ClassList; multiple classes can be separated by whitespace.
`remove_class(class_name: &str)` | Remove the class from Self's ClassList; multiple classes can be separated by whitespace.
`toggle_class(class_name: &str)` | Toggle the class in Self's ClassList; multiple classes can be separated by whitespace.
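As an illustration of passing several whitespace-separated classes at once, the sketch below models a class list as a `Vec<String>` and toggles each name independently. This is an assumed reading of the `toggle_class` description, and `toggle_classes` is a hypothetical helper rather than the library's code.

```rust
/// Toggle each whitespace-separated class in `names` on `class_list`:
/// a class that is already present is removed, a missing one is appended.
fn toggle_classes(class_list: &mut Vec<String>, names: &str) {
    for name in names.split_whitespace() {
        if let Some(pos) = class_list.iter().position(|c| c == name) {
            class_list.remove(pos);
        } else {
            class_list.push(name.to_string());
        }
    }
}
```

Starting from the classes `a b`, toggling `"b c"` leaves `a c`: `b` was present and is removed, `c` was missing and is added.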

Content Operation

Content API | Description | Remarks
`text() -> &str` | Get the text of each element in Self; HTML entities are decoded automatically.
`set_text(content: &str)` | Set the text of Self; HTML entities in `content` are encoded automatically.
`html()` | Get the inner HTML of the first element in Self.
`set_html(content: &str)` | Set the inner HTML of each element in Self to `content`.
`outer_html()` | Get the outer HTML of the first element in Self.
`texts(limit_depth: u32) -> Texts` | Get the text nodes of each element in Self. If `limit_depth` is 0, all descendant text nodes are returned; if it is 1, only the child text nodes are returned. | Unlike `Elements`, `Texts` does not have the methods that come from implementing `IElementTrait`, but it does have the `append_text` and `prepend_text` methods, which come from implementing `ITextTrait`.
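To show what encoding text content means in practice, the sketch below escapes the characters that would otherwise be read as markup. Which entities visdom actually encodes is not stated here, so the set handled below (`&`, `<`, `>`) is an assumption, and `encode_text` is a hypothetical helper.

```rust
/// Escape the characters that would otherwise be parsed as markup.
/// The handled set (&, <, >) is an assumption for illustration.
fn encode_text(content: &str) -> String {
    let mut out = String::with_capacity(content.len());
    for ch in content.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(ch),
        }
    }
    out
}
```

Encoding on write and decoding on read are what keep a string such as `1 < 2` from being parsed as the start of a tag.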

DOM Operation

DOM Insertion and Remove API | Description | Remarks
`append(elements: &mut Elements)` | Append all `elements` into Self, after the last child.
`append_to(elements: &mut Elements)` | The same as the above, but with the caller and the parameter exchanged.
`prepend(elements: &mut Elements)` | Prepend all `elements` into Self, before the first child.
`prepend_to(elements: &mut Elements)` | The same as the above, but with the caller and the parameter exchanged.
`insert_after(elements: &mut Elements)` | Insert all `elements` after Self.
`after(elements: &mut Elements)` | The same as the above, but with the caller and the parameter exchanged.
`insert_before(elements: &mut Elements)` | Insert all `elements` before Self.
`before(elements: &mut Elements)` | The same as the above, but with the caller and the parameter exchanged.
`remove()` | Remove Self. This takes ownership of Self, so you can't use it again.
`empty()` | Remove all children of each element in Self.

Example

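let html = r##"
  <div class="second-child"></div>
  <div id="container">
    <div class="first-child"></div>
  </div>
"##;
let root = Vis::load(html)?;
let mut container = root.find("#container");
let mut second_child = root.find(".second-child");
// append the `second-child` element to the `container`
container.append(&mut second_child);
// the document then becomes:
/*
  <div id="container">
    <div class="first-child"></div>
    <div class="second-child"></div>
  </div>
*/
// create a new element with `Vis::load`
let mut third_child = Vis::load(r##"<div class="third-child"></div>"##)?;
container.append(&mut third_child);
// the document then becomes:
/*
  <div id="container">
    <div class="first-child"></div>
    <div class="second-child"></div>
    <div class="third-child"></div>
  </div>
*/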

Dependencies

Questions, Suggestions & Bugs?

Feel free to open an issue if you have a question, a bug report, or a suggestion.

License

MIT License.

Comments
  • set_html and replace_with do not seem to work

    my cargo.toml

    visdom = { version = "0.5.8", features = ["insertion", "full"] }

    the function do not work

    fn get_images(root: &mut Elements, no: &str) {
        root.find("img").for_each(|idx, img| {
            if let Some(src) = img.get_attribute("data-src") {
                let src = src.to_string();
                let re = Regex::new("wx_fmt=([^&]*)").unwrap();
                if let Some(cap) = re.captures(&src) {
                    if let Some(fmt) = cap.get(1) {
                        let ext = fmt.as_str();
                        let fname = format!("{no}-{idx}.{ext}");
                        let new_img = format!("");
                        // let new_img = Vis::load(new_img).unwrap();
                        // println!("{}", new_img.outer_html());
                        // let it = new_img.get_ref().iter().next().unwrap();
                        // println!("{}", it.as_ref().outer_html());
                        img.set_html(&new_img);
                        // img.as_mut().replace_with(&it);
                        // img.replace_with(it);
                        println!("1. {}", img.outer_html());
                        println!("2. {}", img.html());
                    }
                }
            }
            true
        });

        ()
    }

    opened by 1984204066 6
  • How to remove a DOM element?

    Hi! How can I remove a node that matches a condition, and how can I add a node?

    let mut img_list = document.find("img[src]");
    img_list.for_each(|_index, ele| {
      // How should I remove the element?
    });
    let svg = Vis::load("<svg></svg>").unwrap();
    // How should I append this svg to the document?
    
    opened by tctco 6
  • Please add support for html method for all elements

    Hello, I'd like to get all elements with html tags and attribute. The currently available methods are .html() and .outer_html(). However, these methods only target the first element. The .text() gets all the elements in plain text. Is there a way to achieve this purpose with html?

    opened by Random-G 5
  • Is there a better way to get the value of a select element?

    let html = r##"
    <!doctype html>
    <html>
      <body>
    	
        <select>
          <option value="1">1</option>
          <option value="2" selected="true">2</option>
          <option value="3">3</option>
        </select>
    
      </body>
    </html>
    "##;
    let doc = Vis::load(html)?;
    
    let select = doc.find("#select");
    println!("select value is {:?}", select.text());
    // select value is "\n        1\n        \n        2\n        \n        3\n        \n        "
    

    How can I get the selected value of the select? With the current code I would expect to get "2".

    opened by zhangxianhong 4
  • Update lib.rs

    Code isn't tested. The current implementation causes problems in cases like the following:

    async fn fetchDocument(url: &String) -> Result<Elements, String> {
        let client = reqwest::Client::new();
        let res = client.get(url).send().await.unwrap();
        let body = res.text().await.unwrap();
    
        let doc = Vis::load(&body).unwrap();
    
        Ok(doc)
    }
    

    Error:

    error: cannot return value referencing local variable `body`
    

    Something like the proposed changes should solve the problem.

    opened by John0x 3
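  • How do I get a node's tag name?

    A very good Rust library. While using it I ran into a problem: when iterating over all nodes, I would like to get each node's name. The source code has tag_name(), but I could not find a callable interface for it.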

    let html = r#"
    <!doctype html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <title>.eq</title>
      </head>
      <body>
        <ul id="menu">
          <li class="item-1">item-1</li>
          <li class="item-2">item-2</li>
          <li class="item-3">item-3</li>
        </ul>
      </body>
    </html>
    "#;
    let root = Vis::load(html)?;
    root.each(|_index, ele|{
      println!("tag:{}", ele.tag_name()); // how can I get the tag name?
      true
    })
    
    opened by zhangxianhong 3
  • Navigating sideway with `find` method

    Hello, is there a way to navigate sideway with find method?

    use std::ops::Index;
    use visdom::{Vis, html};
    use visdom::types::{BoxDynError, Elements};
    
    fn main() -> Result<(), BoxDynError> {
        let html = r##"
            <div class='main'>
                <p>abc</p>
                <p>def</p>
            </div>
            <div class='main'>
                <p>ghi</p>
                <p>jkl</p>
            </div>
        "##;
        let doc = Vis::load(html)?;
        let div = doc.find("div.main");
        let div_length = div.length();
        div.each(|index, ele| {
            if index < div_length {
                let p = ele.find("p");
                return true;
            }
            false
        });
        Ok(())
    }
    
    # error[E0599]: the method `find` exists for mutable reference `&mut Box<dyn visdom::mesdoc::interface::element::IElementTrait>`, but its trait bounds were not satisfied
    

    for div.each, ele.find("p") is not working. Also, why does .each need index?

    opened by Random-G 2
  • doc.find("p:contains('好用')") panicked, only when Chinese characters appear in contains()

    use visdom::Vis;
    
    fn main() -> anyhow::Result<()> {
        let html = r#"
        <div>
        <p>Visdom is awesome</p>
        <p>Visdom 很好用</p>
        </div>
        "#;
        let doc = Vis::load(html).unwrap();
        let elements = doc.find("p:contains('Visdom')");  // work
        println!("elements : {}", elements.length());
    
        let elements = doc.find("p:contains('好用')");  // panic
        println!("elements : {}", elements.length());
        println!("done");
        Ok(())
    }
    
    
    elements : 2
    thread 'main' panicked at 'range end index 8 out of range for slice of length 5', library/core/src/slice/index.rs:73:5
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
       1: core::panicking::panic_fmt
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
       2: core::slice::index::slice_end_index_len_fail_rt
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:73:5
       3: core::ops::function::FnOnce::call_once
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
       4: core::intrinsics::const_eval_select
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/intrinsics.rs:2372:5
       5: core::slice::index::slice_end_index_len_fail
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:67:9
       6: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:304:13
       7: <core::ops::range::RangeTo<usize> as core::slice::index::SliceIndex<[T]>>::index
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:351:9
       8: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:18:9
       9: <visdom::mesdoc::selector::pattern::RegExp as visdom::mesdoc::selector::pattern::Pattern>::matched
                 at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/pattern.rs:357:17
      10: visdom::mesdoc::selector::pattern::exec
                 at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/pattern.rs:445:26
      11: visdom::mesdoc::selector::rule::Rule::exec_queues
                 at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/rule.rs:283:54
      12: visdom::mesdoc::selector::rule::Rule::exec
                 at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/rule.rs:276:3
      13: visdom::mesdoc::selector::Selector::from_str
                 at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/mod.rs:157:51
      14: visdom::mesdoc::interface::elements::Elements::find
                 at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/interface/elements.rs:1098:11
      15: repl::main
                 at ./src/bin/repl.rs:17:20
      16: core::ops::function::FnOnce::call_once
                 at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    
    opened by simon639 2
  • error: failed to get `mesdoc` as a dependency of package `visdom v0.1.4`

    # mesdoc = "0.1.11"
    rphtml = "0.3.6"
    mesdoc = { path = "../mesdoc" }
    # rphtml = { path = "../rphtml"}
    

    please fix mesdoc = { path = "../mesdoc" }

    opened by zys864 1
Releases(v0.5.8)
Owner
轩子
dream is possible~