Shiva library: Implementation in Rust of a parser and generator for documents of any type
- Common Document Model (CDM) for all document types
- Parsers produce CDM
- Generators consume CDM
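
The hub-and-spoke idea behind the CDM can be sketched as follows. This is a minimal illustration, not Shiva's actual data model: the `Element` and `Document` types and the `parse_text` and `generate_markdown` functions are hypothetical stand-ins that use only the standard library.

```rust
// Hypothetical, simplified stand-in for a common document model.
// Shiva's real CDM types are richer and may differ.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Header { level: u8, text: String },
    Paragraph(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Document {
    elements: Vec<Element>,
}

// A "parser" turns one concrete format into the common model.
// Toy rule: lines starting with "= " are level-1 headers,
// every other non-empty line is a paragraph.
fn parse_text(input: &str) -> Document {
    let elements = input
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| match line.strip_prefix("= ") {
            Some(title) => Element::Header { level: 1, text: title.to_string() },
            None => Element::Paragraph(line.to_string()),
        })
        .collect();
    Document { elements }
}

// A "generator" turns the common model into one concrete format.
fn generate_markdown(document: &Document) -> String {
    document
        .elements
        .iter()
        .map(|element| match element {
            Element::Header { level, text } => {
                format!("{} {}", "#".repeat(usize::from(*level)), text)
            }
            Element::Paragraph(text) => text.clone(),
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

With a shared model, each format needs one parser and one generator, rather than a dedicated converter for every pair of formats.
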
Document type | Parse | Generate |
---|---|---|
Plain text | + | + |
Markdown | + | + |
HTML | + | + |
PDF | + | + |
JSON | + | + |
XML | + | + |
CSV | + | + |
RTF | + | - |
DOCX | + | + |
XLS | + | - |
XLSX | + | + |
ODS | + | + |
Typst | - | + |

Parse features by document type:

Document type | Header | Paragraph | List | Table | Image | Hyperlink | PageHeader | PageFooter |
---|---|---|---|---|---|---|---|---|
Plain text | - | + | - | - | - | - | - | - |
Markdown | + | + | + | + | + | + | - | - |
HTML | + | + | + | + | + | + | - | - |
PDF | - | + | + | - | - | - | - | - |
DOCX | + | + | + | + | - | + | - | - |
RTF | + | + | + | + | - | + | + | + |
JSON | + | + | + | + | - | + | + | + |
XML | + | + | + | + | + | + | + | + |
CSV | - | - | - | + | - | - | - | - |
XLS | - | - | - | + | - | - | - | - |
XLSX | - | - | - | + | - | - | - | - |
ODS | - | - | - | + | - | - | - | - |

Generate features by document type:

Document type | Header | Paragraph | List | Table | Image | Hyperlink | PageHeader | PageFooter |
---|---|---|---|---|---|---|---|---|
Plain text | + | + | + | + | - | + | + | + |
Markdown | + | + | + | + | + | + | + | + |
HTML | + | + | + | + | + | + | - | - |
PDF | + | + | + | + | + | + | + | + |
JSON | + | + | + | + | - | + | + | + |
XML | + | + | + | + | + | + | + | + |
CSV | - | - | - | + | - | - | - | - |
XLSX | - | - | - | + | - | - | - | - |
ODS | - | - | - | + | - | - | - | - |
Typst | + | + | + | + | + | + | + | + |
DOCX | + | + | + | + | + | + | - | - |

Cargo.toml:

    [dependencies]
    shiva = { version = "1.1.1", features = ["html", "markdown", "text", "pdf", "json",
        "csv", "rtf", "docx", "xml", "xls", "xlsx", "ods", "typst"] }
    # main.rs below uses bytes::Bytes directly; the version here is an assumption
    bytes = "1"
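
main.rs:

    // Trait methods are only callable when the trait is in scope.
    use shiva::core::TransformerTrait;

    fn main() {
        // Read the HTML source and parse it into the common document model.
        let input_vec = std::fs::read("input.html").unwrap();
        let input_bytes = bytes::Bytes::from(input_vec);
        let document = shiva::html::Transformer::parse(&input_bytes).unwrap();
        // Generate Markdown from the same model and write it to disk.
        let output_bytes = shiva::markdown::Transformer::generate(&document).unwrap();
        std::fs::write("out.md", output_bytes).unwrap();
    }
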
Build the executables from source:

    git clone https://github.com/igumnoff/shiva.git
    cd shiva/cli
    cargo build --release

Run the shiva executable:

    cd ./target/release/
    ./shiva --input-format=markdown --output-format=html --input-file=README.md --output-file=README.html

Run the shiva-server executable:

    cd ./target/release/
    ./shiva-server --port=8080 --host=127.0.0.1

I would love to see contributions from the community. If you experience bugs, feel free to open an issue. If you would like to implement a new feature or bug fix, please follow these steps:
- Read the "Contributor License Agreement (CLA)"
- Contact me via Telegram @ievkz or Discord @igumnovnsk
- Confirm the e-mail invitation to the repository
- Run "git clone" (you don't need to fork!)
- Create a branch for your assigned issue
- Create a pull request to the main branch

If you would like to add a new document type, you need to implement the following traits:

Required: shiva::core::TransformerTrait

    pub trait TransformerTrait {
        fn parse(document: &Bytes) -> anyhow::Result<Document>;
        fn generate(document: &Document) -> anyhow::Result<Bytes>;
    }
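
As a rough sketch of what an implementation for a new format might look like, the example below uses local stand-ins so that it compiles with only the standard library: `&[u8]` and `Vec<u8>` in place of `Bytes`, `Result<_, String>` in place of `anyhow::Result`, and a hypothetical one-field `Document`. The `LinesTransformer` format (one paragraph per line) is invented for illustration; a real implementation would target Shiva's own trait and `Document` type.

```rust
// Hypothetical stand-in for the library's Document type.
#[derive(Debug, PartialEq)]
pub struct Document {
    pub paragraphs: Vec<String>,
}

// Same shape as TransformerTrait, with std-only types substituted.
pub trait TransformerTrait {
    fn parse(document: &[u8]) -> Result<Document, String>;
    fn generate(document: &Document) -> Result<Vec<u8>, String>;
}

// Invented toy format: each non-empty line is one paragraph.
pub struct LinesTransformer;

impl TransformerTrait for LinesTransformer {
    fn parse(document: &[u8]) -> Result<Document, String> {
        // Report invalid UTF-8 as an error instead of panicking.
        let text = std::str::from_utf8(document).map_err(|e| e.to_string())?;
        let paragraphs = text
            .lines()
            .filter(|line| !line.trim().is_empty())
            .map(|line| line.to_string())
            .collect();
        Ok(Document { paragraphs })
    }

    fn generate(document: &Document) -> Result<Vec<u8>, String> {
        // Write one paragraph per line, without a trailing newline.
        Ok(document.paragraphs.join("\n").into_bytes())
    }
}
```

Both methods are associated functions without `self`, so a transformer is a stateless type that is called as `LinesTransformer::parse(...)`, matching the call style `shiva::html::Transformer::parse(...)` used in main.rs.
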

Optional: shiva::core::TransformerWithImageLoaderSaverTrait (for formats that store images outside the document, for example HTML and Markdown)

    pub trait TransformerWithImageLoaderSaverTrait {
        fn parse_with_loader<F>(document: &Bytes, image_loader: F) -> anyhow::Result<Document>
        where
            F: Fn(&str) -> anyhow::Result<Bytes>;
        fn generate_with_saver<F>(document: &Document, image_saver: F) -> anyhow::Result<Bytes>
        where
            F: Fn(&Bytes, &str) -> anyhow::Result<()>;
    }
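
Because both callbacks are bounded by `Fn` rather than `FnMut`, a saver that collects images needs interior mutability. The sketch below shows closures of that shape with std-only stand-ins (`Vec<u8>` and `&[u8]` for `Bytes`, `Result<_, String>` for `anyhow::Result`). The in-memory image store and the helpers `load_all`, `save_all`, and `round_trip` are hypothetical and only mimic how a transformer might invoke the callbacks.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical helper: calls the loader once per referenced image path,
// the way a parser might while resolving image references.
fn load_all<F>(paths: &[&str], image_loader: F) -> Result<Vec<Vec<u8>>, String>
where
    F: Fn(&str) -> Result<Vec<u8>, String>,
{
    paths.iter().map(|&path| image_loader(path)).collect()
}

// Hypothetical helper: calls the saver once per image,
// the way a generator might while writing images next to its output.
fn save_all<F>(images: &[(&str, Vec<u8>)], image_saver: F) -> Result<(), String>
where
    F: Fn(&[u8], &str) -> Result<(), String>,
{
    for (path, bytes) in images {
        image_saver(bytes.as_slice(), *path)?;
    }
    Ok(())
}

// Round-trips one image through in-memory maps instead of the file system.
fn round_trip() -> Result<HashMap<String, Vec<u8>>, String> {
    let mut source = HashMap::new();
    source.insert("logo.png".to_string(), vec![1u8, 2, 3]);

    // Loader: look the path up in the map; a missing image becomes an error.
    let loader = |path: &str| {
        source
            .get(path)
            .cloned()
            .ok_or_else(|| format!("image not found: {path}"))
    };
    let loaded = load_all(&["logo.png"], loader)?;

    // Saver: an `Fn` closure cannot mutate captured state directly,
    // so the destination map sits behind a RefCell.
    let saved = RefCell::new(HashMap::new());
    let saver = |bytes: &[u8], path: &str| -> Result<(), String> {
        saved.borrow_mut().insert(path.to_string(), bytes.to_vec());
        Ok(())
    };
    save_all(&[("out/logo.png", loaded[0].clone())], saver)?;

    Ok(saved.into_inner())
}
```

In a real application the loader would typically read from disk or fetch over the network, and the saver would write files relative to the generated document.
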