Visdom
A server-side HTML document parsing and manipulation library written in Rust. It uses APIs similar to jQuery, leaves out the parts that only work in a browser (e.g. rendering and event-related methods), and uses snake_case names instead of the camelCase names used in JavaScript.
It is helpful not only for web scraping: it also provides useful APIs for operating on text nodes, so you can mix your HTML with dirty HTML segments to keep web scrapers away.
Usage
main.rs
    use visdom::Vis;
    use std::error::Error;

    fn main() -> Result<(), Box<dyn Error>> {
        let html = r##"
            <nav id="header">
                <ul>
                    <li>Hello,</li>
                    <li>Vis</li>
                    <li>Dom</li>
                </ul>
            </nav>
        "##;
        // load html
        let nodes = Vis::load(html)?;
        let lis_text = nodes.find("#header li").text();
        println!("{}", lis_text);
        // will output "Hello,VisDom"
        Ok(())
    }
Vis
Static method: `load(html: &str) -> Result<Elements, Box<dyn Error>>`
Load the `html` string into an `Elements` collection.
Static method: `load_catch(html: &str, handle: Box<dyn Fn(Box<dyn Error>)>) -> Elements`
Load the `html` string into an `Elements` collection, and use `handle` to deal with errors such as HTML parse errors or invalid selectors. This is useful if you don't want the process to panic on those errors.
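The `handle` parameter follows a common Rust pattern: instead of returning a `Result`, the function passes any error to a caller-supplied closure and still returns a usable value. The sketch below models that pattern with the standard library only. `parse_len` and `parse_len_catch` are hypothetical stand-ins, not visdom functions, and the handler type `Box<dyn Fn(Box<dyn Error>)>` is an assumption about the shape of `handle`.

```rust
use std::error::Error;

// Hypothetical fallible operation standing in for a parser.
fn parse_len(input: &str) -> Result<usize, Box<dyn Error>> {
    if input.is_empty() {
        return Err("empty input".into());
    }
    Ok(input.len())
}

// "Catch" variant: errors are passed to `handle`, and a default value is
// returned so the caller never has to unwrap a `Result`.
fn parse_len_catch(input: &str, handle: Box<dyn Fn(Box<dyn Error>)>) -> usize {
    match parse_len(input) {
        Ok(len) => len,
        Err(err) => {
            handle(err);
            0
        }
    }
}
```

Because the handler in this sketch is an `Fn` closure rather than `FnMut`, a handler that wants to record errors needs interior mutability, for example an `Rc<RefCell<Vec<String>>>` captured by the closure.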
Static method: `load_options(html: &str, options: html::ParseOptions) -> Result<Elements, Box<dyn Error>>`
This method lets you define the parse options used when parsing the `html` string into a document tree. The `load` method is just an alias of this method, called with the most compatible parse options.
    // `load` and `load_catch` use the parse options below;
    // for more about `ParseOptions`, see the documentation of the `rphtml` library.
    ParseOptions{
        auto_fix_unclosed_tag: true,
        auto_fix_unexpected_endtag: true,
        auto_fix_unescaped_lt: true,
        allow_self_closing: true,
    }
Static method: `load_options_catch(html: &str, options: html::ParseOptions, handle: Box<dyn Fn(Box<dyn Error>)>) -> Elements`
Like `load_options`, this method exposes a parse options parameter so that you can define how errors are resolved when parsing HTML; like `load_catch`, it takes a `handle` for dealing with errors.
Static method: `dom(ele: &BoxDynElement) -> Elements`
Wrap the `ele` node in a single-node `Elements` collection. This copies `ele`, so you don't need it if you only want to call the methods of the `BoxDynElement` itself.
e.g.:

    // continuing from the code above
    let lis = nodes.find("#header li");
    let texts = lis.map(|_index, ele| {
        let ele = Vis::dom(ele);
        String::from(ele.text())
    });
    // now `texts` will be a `Vec<String>`: ["Hello,", "Vis", "Dom"]
API
The following APIs are inherited from the mesdoc library.
Trait methods
Instance | Trait | Inherit | Document |
---|---|---|---|
BoxDynNode | INodeTrait | None | INodeTrait Document |
BoxDynElement | IElementTrait | INodeTrait | IElementTrait Document |
BoxDynText | ITextTrait | INodeTrait | ITextTrait Document |
`Box<dyn IDocumentTrait>` | IDocumentTrait | None | IDocumentTrait Document |
Collections APIs
Collections | Document |
---|---|
Elements | Elements Document |
Texts | Texts Document |
Selector Operation
Selector API | Description | Remarks |
---|---|---|
The caller `Self` is an `Elements`; returns `Elements` | All of these APIs behave the same as in the jQuery library | |
`find(selector: &str)` | Get the descendants of each element in `Self`, filtered by the `selector`. | |
`filter(selector: &str)` | Reduce `Self` to the elements that match the `selector`. | |
`filter_by(handle: \|index: usize, ele: &BoxDynElement\| -> bool)` | Reduce `Self` to the elements that pass the `handle` function test. | |
`filter_in(elements: &Elements)` | Reduce `Self` to the elements that are also in `elements`. | |
`not(selector: &str)` | Remove the elements that match the `selector` from `Self`. | |
`not_by(handle: \|index: usize, ele: &BoxDynElement\| -> bool)` | Remove the elements that pass the `handle` function test from `Self`. | |
`not_in(elements: &Elements)` | Remove the elements that are also in `elements` from `Self`. | |
`is(selector: &str)` | Check whether at least one element in `Self` matches the `selector`. | |
`is_by(handle: \|index: usize, ele: &BoxDynElement\| -> bool)` | Check whether the `handle` function returns `true` for at least one element in `Self`. | |
`is_in(elements: &Elements)` | Check whether at least one element in `Self` is also in `elements`. | |
`is_all(selector: &str)` | Check whether every element in `Self` matches the `selector`. | |
`is_all_by(handle: \|index: usize, ele: &BoxDynElement\| -> bool)` | Check whether the `handle` function returns `true` for every element in `Self`. | |
`is_all_in(elements: &Elements)` | Check whether every element in `Self` is also in `elements`. | |
`has(selector: &str)` | Reduce `Self` to the elements that have a descendant matching the `selector`. | |
`has_in(elements: &Elements)` | Reduce `Self` to the elements that have a descendant in `elements`. | |
`children(selector: &str)` | Get the children of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`parent(selector: &str)` | Get the parent of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`parents(selector: &str)` | Get the ancestors of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`parents_until(selector: &str, filter: &str, contains: bool)` | Get the ancestors of each element in `Self`, up to the ancestor that matches the `selector`. When `contains` is `true`, the matched ancestor is included; otherwise it is excluded. When the `filter` is not empty, the result is filtered by the `filter`. | |
`closest(selector: &str)` | Get the first matched element for each element in `Self`, traversing from the element itself up through its ancestors. | |
`siblings(selector: &str)` | Get the siblings of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`next(selector: &str)` | Get the next sibling of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`next_all(selector: &str)` | Get all following siblings of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`next_until(selector: &str, filter: &str, contains: bool)` | Get all following siblings of each element in `Self`, up to the sibling that matches the `selector`. When `contains` is `true`, the matched sibling is included; otherwise it is excluded. When the `filter` is not empty, the result is filtered by the `filter`. | |
`prev(selector: &str)` | Get the previous sibling of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`prev_all(selector: &str)` | Get all preceding siblings of each element in `Self`; when the `selector` is not empty, the result is filtered by the `selector`. | |
`prev_until(selector: &str, filter: &str, contains: bool)` | Get all preceding siblings of each element in `Self`, up to the previous sibling that matches the `selector`. When `contains` is `true`, the matched sibling is included; otherwise it is excluded. When the `filter` is not empty, the result is filtered by the `filter`. | |
`eq(index: usize)` | Get the element at the specified `index`. | |
`first()` | Get the first element of the set, equal to `eq(0)`. | |
`last()` | Get the last element of the set, equal to `eq(len - 1)`. | |
`slice` | Get a subset specified by a range of indices. | e.g. `slice(..3)` matches the first three elements. |
`add(eles: Elements)` | Get a concatenated element set from `Self` and `eles`. It generates a new element set and takes ownership of the parameter `eles`, but has no effect on `Self`. | |
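The three `*_until` methods share the same `selector`, `filter`, and `contains` parameters, and their interaction is easier to see in code. The following is a std-only model, not visdom's implementation: `take_until` is a hypothetical function, plain tag names stand in for elements, and string equality stands in for selector matching. It also assumes that the `filter` applies to the stopping element when `contains` is `true`, which the table does not state.

```rust
// Hypothetical model of the `*_until(selector, filter, contains)` methods.
// Tag names stand in for elements; equality stands in for selector matching.
fn take_until<'a>(
    candidates: &[&'a str], // ordered nearest-first, e.g. parent, grandparent, ...
    selector: &str,         // where to stop
    filter: &str,           // empty means "keep everything"
    contains: bool,         // whether the stopping element itself is kept
) -> Vec<&'a str> {
    let mut picked = Vec::new();
    for &name in candidates {
        let is_stop = name == selector;
        if is_stop && !contains {
            break;
        }
        // Assumption: the filter is also applied to the stopping element.
        if filter.is_empty() || name == filter {
            picked.push(name);
        }
        if is_stop {
            break;
        }
    }
    picked
}
```

With the ancestors `["li", "ul", "nav", "body", "html"]` and the stopping selector `"body"`, this model returns `["li", "ul", "nav"]` when `contains` is `false` and `["li", "ul", "nav", "body"]` when it is `true`.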
Helpers
Helper API | Description | Remarks |
---|---|---|
`length()` | Get the number of elements in `Self`. | |
`is_empty()` | Check whether `Self` has no elements, i.e. `length() == 0`. | |
`for_each(handle: \|index: usize, ele: &mut BoxDynElement\| -> bool)` | Iterate over the elements in `Self`; when the `handle` returns `false`, the iteration stops. | You can also use `each` if you prefer less code. |
`map` | Get a collection of values by iterating over each element in `Self` and calling the `handle` function. | |
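The early-stop contract of `for_each` (return `false` to stop) can be sketched with the standard library alone. `for_each_while` below is a hypothetical stand-in over a mutable slice, not the visdom method; it only illustrates how a `bool`-returning handle controls the iteration.

```rust
// Hypothetical stand-in for a `for_each`-style method on a collection.
fn for_each_while<T>(items: &mut [T], mut handle: impl FnMut(usize, &mut T) -> bool) {
    for (index, item) in items.iter_mut().enumerate() {
        // Returning `false` from the handle stops the iteration early.
        if !handle(index, item) {
            break;
        }
    }
}
```

A handle that returns `index < 1` visits the items at index 0 and 1 and then stops, so later items are left untouched.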
Supported Selectors
Selectors | Description | Remarks |
---|---|---|
`*` | MDN Universal Selectors | |
`#id` | MDN Id Selector | |
`.class` | MDN Class Selector | |
`p` | MDN Type Selectors | |
`[attr]` | MDN Attribute Selectors | |
`[attr=value]` | See above. | |
`[attr*=value]` | See above. | |
`[attr\|=value]` | See above. | |
`[attr~=value]` | See above. | |
`[attr^=value]` | See above. | |
`[attr$=value]` | See above. | |
`[attr!=value]` | Supported by jQuery; matches an element that has an attribute `attr` whose value is not equal to `value`. | |
`span > a` | MDN Child Combinator | Matches an `a` element whose parent is a `span`. |
`span a` | MDN Descendant Combinator | |
`span + a` | MDN Adjacent Sibling Combinator | |
`span ~ a` | MDN General Sibling Combinator | |
`span,a` | MDN Selector list | |
`span.a` | Adjoining Selectors | Matches an element whose tag type is `span` and that also has the class `a`. |
`:empty` | MDN :empty | Pseudo Selectors |
`:first-child` | MDN :first-child | |
`:last-child` | MDN :last-child | |
`:only-child` | MDN :only-child | |
`:nth-child(nth)` | MDN :nth-child() | `nth` supports the keywords `odd` and `even`. |
`:nth-last-child(nth)` | MDN :nth-last-child() | |
`:first-of-type` | MDN :first-of-type | |
`:last-of-type` | MDN :last-of-type | |
`:only-of-type` | MDN :only-of-type | |
`:nth-of-type(nth)` | MDN :nth-of-type() | |
`:nth-last-of-type(nth)` | MDN :nth-last-of-type() | |
`:not(selector)` | MDN :not() | |
`:contains(content)` | Matches an element whose `text()` contains the `content`. | |
`:header` | All title tags; alias of `h1,h2,h3,h4,h5,h6`. | |
`:input` | All form input tags; alias of `input,select,textarea,button`. | |
`:submit` | Form submit buttons; alias of `input[type="submit"],button[type="submit"]`. | |
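For the `odd` and `even` keywords of `:nth-child(nth)`, note that CSS counts child positions from 1, so `odd` selects the first, third, fifth child and so on. The sketch below only illustrates that counting rule with the standard library; `Nth` and `nth_child_positions` are hypothetical names and are not part of visdom.

```rust
// Which 1-based child positions `:nth-child(odd)` / `:nth-child(even)` select
// among `child_count` siblings, following the CSS definition
// (`odd` is 2n+1, `even` is 2n).
#[derive(Clone, Copy)]
enum Nth {
    Odd,
    Even,
}

fn nth_child_positions(child_count: usize, nth: Nth) -> Vec<usize> {
    (1..=child_count)
        .filter(|position| match nth {
            Nth::Odd => position % 2 == 1,
            Nth::Even => position % 2 == 0,
        })
        .collect()
}
```

For five siblings, `Nth::Odd` gives positions `[1, 3, 5]` and `Nth::Even` gives `[2, 4]`.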
Attribute Operation
Attribute API | Description | Remarks |
---|---|---|
`attr(attr_name: &str) -> Option<IAttrValue>` | Get the attribute with the key `attr_name`. | The return value is an `Option` wrapping the enum `IAttrValue`; `IAttrValue` has the methods `is_true()`, `is_str(&str)`, and `to_list()`. |
`set_attr(attr_name: &str, value: Option<&str>)` | Set the attribute with the key `attr_name`. The value is an `Option<&str>`; when the value is `None`, the attribute doesn't have a string value and is a boolean attribute with the value `true`. | |
`remove_attr(attr_name: &str)` | Remove the attribute with the key `attr_name`. | |
`has_class(class_name: &str) -> bool` | Check whether the class list of `Self` contains `class_name`; multiple classes can be separated by whitespace. | |
`add_class(class_name: &str)` | Add the class to the class list of `Self`; multiple classes can be separated by whitespace. | |
`remove_class(class_name: &str)` | Remove the class from the class list of `Self`; multiple classes can be separated by whitespace. | |
`toggle_class(class_name: &str)` | Toggle the class in the class list of `Self`; multiple classes can be separated by whitespace. | |
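The class methods accept several whitespace-separated class names in one call. As a rough, std-only illustration of what toggling means in that case, the hypothetical `toggle_classes` function below works on the raw `class` attribute value; it is a model of the described behaviour, not visdom's code.

```rust
// Hypothetical model of `toggle_class` on a `class` attribute value.
// `class_attr` is the current attribute value; `names` may hold several
// class names separated by whitespace.
fn toggle_classes(class_attr: &str, names: &str) -> String {
    let mut classes: Vec<&str> = class_attr.split_whitespace().collect();
    for name in names.split_whitespace() {
        if let Some(pos) = classes.iter().position(|class| *class == name) {
            classes.remove(pos); // present: remove it
        } else {
            classes.push(name); // absent: add it
        }
    }
    classes.join(" ")
}
```

Toggling `"b c"` on an element whose class value is `"a b"` removes `b` and adds `c`, leaving `"a c"`.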
Content Operation
Content API | Description | Remarks |
---|---|---|
`text() -> &str` | Get the text of each element in `Self`; HTML entities are decoded automatically. | |
`set_text(content: &str)` | Set the text of `Self`; HTML entities in `content` are encoded automatically. | |
`html()` | Get the HTML of the first element in `Self`. | |
`set_html(content: &str)` | Set the HTML of each element in `Self` to `content`. | |
`outer_html()` | Get the outer HTML of the first element in `Self`. | |
`texts(limit_depth: u32) -> Texts` | Get the text nodes of each element in `Self`. If `limit_depth` is `0`, all descendant text nodes are returned; if it is `1`, only the child text nodes are returned. Unlike `Elements`, `Texts` does not have the methods of the `IElementTrait` trait, but it has the `append_text` and `prepend_text` methods from `ITextTrait`. | |
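The `limit_depth` rule of `texts` can be pictured on a toy tree. The sketch below uses a hypothetical `Node` enum and `collect_texts` function, not visdom's node types. It models only the two documented cases (`0` for all descendants, `1` for children only) and assumes that larger values extend the same rule one level at a time.

```rust
// Hypothetical miniature document tree; not visdom's node types.
enum Node {
    Text(String),
    Element(Vec<Node>),
}

// Collect text nodes below `node` in document order.
// `limit_depth == 0` means no limit; `1` means direct children only.
fn collect_texts(node: &Node, limit_depth: u32) -> Vec<&str> {
    fn walk<'a>(node: &'a Node, depth: u32, limit: u32, out: &mut Vec<&'a str>) {
        if let Node::Element(children) = node {
            for child in children {
                match child {
                    Node::Text(text) => out.push(text.as_str()),
                    // Descend only while the depth limit allows it.
                    Node::Element(_) if limit == 0 || depth < limit => {
                        walk(child, depth + 1, limit, out)
                    }
                    Node::Element(_) => {}
                }
            }
        }
    }
    let mut out = Vec::new();
    walk(node, 1, limit_depth, &mut out);
    out
}
```

For an element containing the text `a`, a nested element with the text `b`, and the text `c`, a limit of `0` collects `["a", "b", "c"]`, while a limit of `1` collects `["a", "c"]`.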
DOM Operation
DOM Insertion and Removal API | Description | Remarks |
---|---|---|
`append(elements: &Elements)` | Append all `elements` into `Self`, after the last child. | |
`append_to(elements: &mut Elements)` | The same as above, but with the caller and the parameter swapped. | |
`prepend(elements: &mut Elements)` | Insert all `elements` into `Self`, before the first child. | |
`prepend_to(elements: &mut Elements)` | The same as above, but with the caller and the parameter swapped. | |
`insert_after(elements: &mut Elements)` | Insert all `elements` after `Self`. | |
`after(elements: &mut Elements)` | The same as above, but with the caller and the parameter swapped. | |
`insert_before(elements: &mut Elements)` | Insert all `elements` before `Self`. | |
`before(elements: &mut Elements)` | The same as above, but with the caller and the parameter swapped. | |
`remove()` | Remove `Self`. This takes ownership of `Self`, so you can't use it again. | |
`empty()` | Remove all child nodes of each element in `Self`. | |
Example
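    let html = r##"
        <div class="second-child"></div>
        <div id="container">
            <div class="first-child"></div>
        </div>
    "##;
    let root = Vis::load(html)?;
    let mut container = root.find("#container");
    let mut second_child = root.find(".second-child");
    // append the `second-child` element to the `container`
    container.append(&mut second_child);
    // the HTML then becomes:
    /*
    <div id="container">
        <div class="first-child"></div>
        <div class="second-child"></div>
    </div>
    */
    // create a new element with `Vis::load`
    let mut third_child = Vis::load(r##"<div class="third-child"></div>"##)?;
    container.append(&mut third_child);
    // the HTML then becomes:
    /*
    <div id="container">
        <div class="first-child"></div>
        <div class="second-child"></div>
        <div class="third-child"></div>
    </div>
    */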
Dependencies
- Elements API library: https://github.com/fefit/mesdoc
- HTML parser: https://github.com/fefit/rphtml
- HTML entity encoding and decoding: https://github.com/fefit/htmlentity
Questions, Advice, or Bugs?
Feel free to open an issue if you have a question, a bug report, or a suggestion.