pulldown-cmark

Documentation

This library is a pull parser for CommonMark, written in Rust. It comes with a simple command-line tool, useful for rendering to HTML, and is also designed to be easy to use as a library.

It is designed to be:

  • Fast; a bare minimum of allocation and copying
  • Safe; written in pure Rust with no unsafe blocks (except in the opt-in SIMD feature)
  • Versatile; in particular source-maps are supported
  • Correct; the goal is 100% compliance with the CommonMark spec

Further, it optionally supports parsing footnotes, GitHub-flavored tables, GitHub-flavored task lists, and strikethrough.
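
For example, a minimal sketch of enabling these extensions when constructing a parser (the option names below are the crate's, as of recent releases):

use pulldown_cmark::{Options, Parser};

// Extensions are opt-in; enable only the ones you need.
let mut options = Options::empty();
options.insert(Options::ENABLE_TABLES);
options.insert(Options::ENABLE_FOOTNOTES);
options.insert(Options::ENABLE_STRIKETHROUGH);
options.insert(Options::ENABLE_TASKLISTS);
let parser = Parser::new_ext("A ~~mistaken~~ corrected word.", options);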

Rustc 1.42 or newer is required to build the crate.

Why a pull parser?

There are many parsers for Markdown and its variants, but to my knowledge none use pull parsing. Pull parsing has become popular for XML, especially for memory-conscious applications, because it uses dramatically less memory than constructing a document tree, but is much easier to use than push parsers. Push parsers are notoriously difficult to use, and also often error-prone because of the need for the user to delicately juggle state in a series of callbacks.

In a clean design, the parsing and rendering stages are neatly separated, but this is often sacrificed in the name of performance and expedience. Many Markdown implementations mix parsing and rendering together, and even designs that try to separate them (such as the popular hoedown) make the assumption that the rendering process can be fully represented as a serialized string.

Pull parsing is in some sense the most versatile architecture. It's possible to drive a push interface, also with minimal memory, and quite straightforward to construct an AST. Another advantage is that source-map information (the mapping between parsed blocks and offsets within the source text) is readily available; you can call into_offset_iter() to create an iterator that yields (Event, Range) pairs, where the second element is the event's corresponding range in the source document.
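
A minimal sketch of reading source-map information this way:

use pulldown_cmark::Parser;

let text = "Hello *world*!";
for (event, range) in Parser::new(text).into_offset_iter() {
	// `range` is the byte range of this event within `text`.
	println!("{:?} comes from {:?}", event, &text[range]);
}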

While manipulating ASTs is the most flexible way to transform documents, operating on iterators is surprisingly easy, and quite efficient. Here, for example, is the code to transform soft line breaks into hard breaks:

let parser = parser.map(|event| match event {
	Event::SoftBreak => Event::HardBreak,
	_ => event
});

Or expanding an abbreviation in text:

let parser = parser.map(|event| match event {
	Event::Text(text) => Event::Text(text.replace("abbr", "abbreviation").into()),
	_ => event
});

Another simple example is code to determine the max nesting level:

let mut max_nesting = 0;
let mut level = 0;
for event in parser {
	match event {
		Event::Start(_) => {
			level += 1;
			max_nesting = std::cmp::max(max_nesting, level);
		}
		Event::End(_) => level -= 1,
		_ => ()
	}
}

There are some basic but fully functional examples of the usage of the crate in the examples directory of this repository.

Using Rust idiomatically

A lot of the internal scanning code is written at a pretty low level (it pretty much scans byte patterns for the bits of syntax), but the external interface is designed to be idiomatic Rust.

A pull parser is, at heart, an iterator of events (start and end tags, text, and other bits and pieces). The parser data structure implements the Rust Iterator trait directly, and Event is an enum. Thus, you can use the full power and expressivity of Rust's iterator infrastructure, including for loops and map (as in the examples above), collecting the events into a vector (for recording, playback, and manipulation), and more.
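
For example, a small sketch of collecting events for later inspection or replay:

use pulldown_cmark::{Event, Parser};

// Collect the event stream into a vector; it can then be inspected,
// filtered, or fed back into a renderer.
let events: Vec<Event<'_>> = Parser::new("*hello* world").collect();
for event in &events {
	println!("{:?}", event);
}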

Further, the Text event (representing text) is a small copy-on-write string. The vast majority of text fragments are just slices of the source document. For these, copy-on-write gives a convenient representation that requires no allocation or copying, but allocated strings are available when they're needed. Thus, when rendering text to HTML, most text is copied just once, from the source document to the HTML buffer.
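
A sketch of observing this in practice (CowStr::Borrowed and the owned/inlined variants are the crate's public representation):

use pulldown_cmark::{CowStr, Event, Parser};

for event in Parser::new("just plain text") {
	if let Event::Text(text) = event {
		match text {
			// Most fragments are zero-copy slices of the source document.
			CowStr::Borrowed(s) => println!("borrowed: {}", s),
			// Owned or inlined strings are used when allocation is unavoidable.
			other => println!("owned or inlined: {}", &*other),
		}
	}
}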

When using pulldown-cmark's own HTML renderer, make sure to write to a buffered target like a Vec or String. Since it performs many (very) small writes, writing directly to stdout, files, or sockets hurts performance. Such writers can be wrapped in a BufWriter.
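
A minimal sketch of rendering into an in-memory String first:

use pulldown_cmark::{html, Parser};

let parser = Parser::new("Hello *world*!");
// Render into a String buffer, then emit it in a single write.
let mut html_buf = String::new();
html::push_html(&mut html_buf, parser);
print!("{}", html_buf);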

Build options

By default, the binary is built as well. If you don't want/need it, then build like this:

> cargo build --no-default-features

Or put this in your Cargo.toml file:

pulldown-cmark = { version = "0.8", default-features = false }

SIMD accelerated scanners are available for the x64 platform from version 0.5 onwards. To enable them, build with the simd feature:

> cargo build --release --features simd

Or add the feature to your project's Cargo.toml:

pulldown-cmark = { version = "0.8", default-features = false, features = ["simd"] }

Authors

The main author is Raph Levien. The implementation of the new design (v0.3+) was completed by Marcus Klaas de Vries.

Contributions

We gladly accept contributions via GitHub pull requests. Please see CONTRIBUTING.md for more details.

Comments
  • Support heading attribute block (especially ID and classes)

    TODO

    • [x] Support ID ({#id})
      • At this stage, {.class} is simply ignored.
    • [x] Enable attribute block support only when the specific parser option is enabled
      • I'll use the name Options::ENABLE_HEADING_ATTRIBUTES.
    • [x] Support classes ({.class1 .class2})

    ~While this is WIP branch, feel free to give me advice.~ Now this branch is ready to merge.

    Summary

    With this patch, section headings will be able to have an ID and classes. For example, # H1 {#id1 .heading} would be converted to <h1 id="id1" class="heading">H1</h1>.

    This is a breaking change: Tag::Heading will have multiple fields (ID and classes) instead of a single HeadingLevel.

    This solves #424.
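
    For reference, a minimal usage sketch of the proposed option (my own illustration, assuming ENABLE_HEADING_ATTRIBUTES lands as described):

    use pulldown_cmark::{html, Options, Parser};

    fn main() {
        // Heading attributes are opt-in via a parser option.
        let parser = Parser::new_ext("# H1 {#id1 .heading}", Options::ENABLE_HEADING_ATTRIBUTES);
        let mut out = String::new();
        html::push_html(&mut out, parser);
        // Expected output: <h1 id="id1" class="heading">H1</h1>
        println!("{}", out);
    }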

    opened by lo48576 40
  • Add boolean to tell if it's an indented code block or not

    Fixes #415.

    Once merged, can we have another release as quickly as possible please? I'd really love to be able to merge https://github.com/rust-lang/rust/pull/65894 (a few fixes depend on it as well).

    opened by GuillaumeGomez 36
  • Numerous parsing fixes

    Fixes #314, #315 and #317.

    Many thanks to @mity for reporting these issues with test cases. Such work makes fixing these things much easier! :bowing_man:

    opened by marcusklaas 25
  • Find non-linear growth patterns

    Fixes #257. Depends on #281 (which is why the diff includes those commits as well). I'm working on that branch as several patterns and issues I found have already been fixed there.

    For my notes about the concept used and different things I tried, see this hackmd.

    Basically I'm writing an intelligent fuzzer, which parses the pulldown-cmark source-code, extracts all literals, then tests if combinations of those literals result in non-linear behaviour.

    opened by oberien 23
  • The lifetimes on BrokenLinkCallback are wrong

    Here is a simple callback which marks every link as working:

    fn callback<'a>(link: BrokenLink<'a>) -> Option<(CowStr<'a>, CowStr<'a>)> {
        Some(("#".into(), link.reference.into()))
    }
    

    On its own, it typechecks fine. Unfortunately, this doesn't work with Parser::with_broken_link_callback:

    fn f(txt: &str) {
        for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
        }
    }
    
    error: implementation of `FnOnce` is not general enough
       --> src/lib.rs:8:80
        |
    8   |       for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
        |                                                                                  ^^^^^^^^^^^^^ implementation of `FnOnce` is not general enough
        | 
       ::: /home/joshua/.local/lib/rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:219:1
        |
    219 | / pub trait FnOnce<Args> {
    220 | |     /// The returned type after the call operator is used.
    221 | |     #[lang = "fn_once_output"]
    222 | |     #[stable(feature = "fn_once_output", since = "1.12.0")]
    ...   |
    227 | |     extern "rust-call" fn call_once(self, args: Args) -> Self::Output;
    228 | | }
        | |_- trait `FnOnce` defined here
        |
        = note: `for<'a> fn(pulldown_cmark::BrokenLink<'a>) -> Option<(pulldown_cmark::CowStr<'a>, pulldown_cmark::CowStr<'a>)> {callback}` must implement `FnOnce<(pulldown_cmark::BrokenLink<'_>,)>`
        = note: ...but `FnOnce<(pulldown_cmark::BrokenLink<'_>,)>` is actually implemented for the type `for<'a> fn(pulldown_cmark::BrokenLink<'a>) -> Option<(pulldown_cmark::CowStr<'a>, pulldown_cmark::CowStr<'a>)> {callback}`
    
    error: aborting due to previous error
    

    The issue is that BrokenLinkCallback is typed as having the same lifetime as its outputs: https://github.com/raphlinus/pulldown-cmark/blob/e97974b8d76195c953f0d427e8725ef9ad1a0c17/src/parse.rs#L1270 That means that the callback can't e.g. be passed to two different parsers, because the first call will fix a set lifetime:

     fn f(txt: &str) {
        let mut callback = |link: BrokenLink<'_>| -> Option<(CowStr<'_>, CowStr<'_>)> {
            Some(("#".into(), link.reference.to_owned().into()))
        };
    
        for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
        }
    
        for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
        }
    }
    
    error[E0499]: cannot borrow `callback` as mutable more than once at a time
      --> src/lib.rs:11:80
       |
    8  |     for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
       |                                                                                ------------- first mutable borrow occurs here
    ...
    11 |     for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
       |                                                                                ^^^^^^^^^^^^^
       |                                                                                |
       |                                                                                second mutable borrow occurs here
       |                                                                                first borrow later used here
    

    The fix I was thinking of was something like this:

    diff --git a/src/parse.rs b/src/parse.rs
    index d6388b1..bccd68c 100644
    --- a/src/parse.rs
    +++ b/src/parse.rs
    @@ -129,12 +129,12 @@ pub struct BrokenLink<'a> {
     }
     
     /// Markdown event iterator.
    -pub struct Parser<'a> {
    -    text: &'a str,
    +pub struct Parser<'input, 'callback: 'input> {
    +    text: &'input str,
         options: Options,
         tree: Tree<Item>,
    -    allocs: Allocations<'a>,
    -    broken_link_callback: BrokenLinkCallback<'a>,
    +    allocs: Allocations<'input>,
    +    broken_link_callback: BrokenLinkCallback<'callback>,
         html_scan_guard: HtmlScanGuard,
     
         // used by inline passes. store them here for reuse
    @@ -1266,8 +1267,8 @@ pub(crate) struct HtmlScanGuard {
         pub declaration: usize,
     }
     
    -pub type BrokenLinkCallback<'a> =
    -    Option<&'a mut dyn FnMut(BrokenLink) -> Option<(CowStr<'a>, CowStr<'a>)>>;
    +pub type BrokenLinkCallback<'b> =
    +    Option<&'b mut dyn for<'a> FnMut(BrokenLink<'a>) -> Option<(CowStr<'a>, CowStr<'a>)>>;
     
     /// Markdown event and source range iterator.
     ///
    

    This does two things:

    1. Separates the lifetime of the link from the lifetime of the callback (by changing &'a FnMut() -> &'a str to &'b for<'a> FnMut() -> &'a str).
    2. Separates the lifetime of the callback from the lifetime of the input (by adding a new 'callback lifetime).

    Unfortunately, this uncovers that the change can't work:

    error[E0597]: `link_label` does not live long enough
       --> src/parse.rs:457:64
        |
    145 |   impl<'a, 'b> Parser<'a, 'b> {
        |        -- lifetime `'a` defined here
    ...
    450 |                                       .or_else(|| {
        |                                                -- value captured here
    ...
    457 |                                                       reference: link_label.as_ref(),
        |                                                                  ^^^^^^^^^^ borrowed value does not live long enough
    ...
    460 | /                                                 callback(broken_link).map(|(url, title)| {
    461 | |                                                     (link_type.to_unknown(), url, title)
    462 | |                                                 })
        | |__________________________________________________- returning this value requires that `link_label` is borrowed for `'a`
    ...
    503 |                               }
        |                               - `link_label` dropped here while still borrowed
    

    The issue is that link_label is only alive for the duration of a single loop iteration, not the lifetime of the input. Even though it's parameterized by a lifetime 'input, it has a Box variant, so if it gets dropped the compiler has to conservatively assume the entire link is invalid.

    I don't have a solution for this; I think it will require a redesign of the parser. But people are running into this in the real world: https://github.com/rust-lang/rust/pull/79781/files#r537769663

    opened by jyn514 19
  • End footnote definition with one blank line.

    According to the linked issue, a footnote definition should end with a blank line. This is similar to the rule for lists, which end with two blank lines. The code previously required two blank lines in both cases, this patch changes it to just one for footnote definitions.

    Fixes issue 20.

    opened by raphlinus 16
  • Allow for more flexible fuzzer patterns

    This PR allows for more flexible patterns, repeating patterns with a variable but fixed prefix and postfix. As a result, almost all of the existing scaling regression tests could be moved into the fuzzer regression suite, which means that they'll be included in the CI pipeline.

    @oberien: the fuzzer seems to panic immediately nowadays when left to fuzz, even on the master branch. I get the following trace:

    marcusklaas@localhost ~/p/fuzzer> env RUST_BACKTRACE=1 cargo run --release
        Finished release [optimized + debuginfo] target(s) in 0.05s
         Running `target/release/fuzzer`
    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("expected `struct`")', src/libcore/result.rs:1051:5
    stack backtrace:
       0: backtrace::backtrace::libunwind::trace
                 at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29/src/backtrace/libunwind.rs:88
       1: backtrace::backtrace::trace_unsynchronized
                 at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29/src/backtrace/mod.rs:66
       2: std::sys_common::backtrace::_print
                 at src/libstd/sys_common/backtrace.rs:47
       3: std::sys_common::backtrace::print
                 at src/libstd/sys_common/backtrace.rs:36
       4: std::panicking::default_hook::{{closure}}
                 at src/libstd/panicking.rs:200
       5: std::panicking::default_hook
                 at src/libstd/panicking.rs:214
       6: std::panicking::rust_panic_with_hook
                 at src/libstd/panicking.rs:481
       7: std::panicking::continue_panic_fmt
                 at src/libstd/panicking.rs:384
       8: rust_begin_unwind
                 at src/libstd/panicking.rs:311
       9: core::panicking::panic_fmt
                 at src/libcore/panicking.rs:85
      10: core::result::unwrap_failed
                 at src/libcore/result.rs:1051
      11: core::result::Result<T,E>::unwrap
                 at /rustc/bc2e84ca0939b73fcf1768209044432f6a15c2e5/src/libcore/result.rs:852
      12: <fuzzer::literals::LiteralParser::extract_literals_from_macro::Bitflags as syn::parse::Parse>::parse
                 at src/literals.rs:218
      13: core::ops::function::FnOnce::call_once
                 at /rustc/bc2e84ca0939b73fcf1768209044432f6a15c2e5/src/libcore/ops/function.rs:231
      14: <F as syn::parse::Parser>::parse2
                 at /home/marcusklaas/.cargo/registry/src/github.com-1ecc6299db9ec823/syn-0.15.34/src/parse.rs:1103
      15: syn::parse2
                 at /home/marcusklaas/.cargo/registry/src/github.com-1ecc6299db9ec823/syn-0.15.34/src/lib.rs:633
      16: fuzzer::literals::LiteralParser::extract_literals_from_macro
                 at src/literals.rs:236
      17: fuzzer::literals::LiteralParser::extract_literals_from_item
                 at src/literals.rs:134
      18: fuzzer::literals::LiteralParser::extract_literals_from_items
                 at src/literals.rs:100
      19: fuzzer::literals::LiteralParser::extract_literals_from_file
                 at src/literals.rs:95
      20: fuzzer::literals::get
                 at src/literals.rs:47
      21: fuzzer::fuzz
                 at src/main.rs:211
      22: fuzzer::main
                 at src/main.rs:148
      23: std::rt::lang_start::{{closure}}
                 at /rustc/bc2e84ca0939b73fcf1768209044432f6a15c2e5/src/libstd/rt.rs:64
      24: std::rt::lang_start_internal::{{closure}}
                 at src/libstd/rt.rs:49
      25: std::panicking::try::do_call
                 at src/libstd/panicking.rs:296
      26: __rust_maybe_catch_panic
                 at src/libpanic_unwind/lib.rs:82
      27: std::panicking::try
                 at src/libstd/panicking.rs:275
      28: std::panic::catch_unwind
                 at src/libstd/panic.rs:394
      29: std::rt::lang_start_internal
                 at src/libstd/rt.rs:48
      30: main
      31: __libc_start_main
      32: _start
    

    I haven't dug too deep, but it seems related to the token parsing. Which would be strange, since neither the fuzzer's dependencies nor the project's structure has really changed. Do you happen to know what's going on here?

    opened by marcusklaas 13
  • Write HTML to a fmt::Write instead of a String.

    My motivation is that I would like to use this library in a static blogging engine but I use a custom buffer type.

    The downside is error handling. This may significantly impact performance (I don't know what benchmarks you usually run). The upside is that users can write directly to custom buffers as long as those buffers implement fmt::Write.

    Error Handling Alternatives:

    • Panic on error and tell users to not pass in writers that can fail. Users can always record write errors on the side.
    • Provide a custom Write trait that doesn't support error reporting.

    Also, I kept the current fresh_line behavior but we could probably make this faster if we allow extra newlines.

    Compatibility note: This requires rust 1.1 beta for Write::write_char.

    FYI, I've signed the CLA.

    opened by Stebalien 12
  • Treat broken reference links as links

    This ensures that broken_link_callback will only see broken links once.

    Closes #444

    Outdated: trouble with FnMut

    I tried to add the following test, but it fails because callbacks are only allowed to be Fn, not FnMut:

        #[test]
        fn broken_links_called_only_once() {
            let markdown = "See also [`g()`][crate::g].";
            let mut times_called = 0;
            let mut callback = |_: &str, _: &str| {
                times_called += 1;
                None
            };
            let parser = Parser::new_with_broken_link_callback(markdown, Options::empty(), Some(&callback));
            for _ in parser {}
            assert_eq!(times_called, 1);
        }
    
    error[E0525]: expected a closure that implements the `Fn` trait, but this closure only implements `FnMut`
        --> src/parse.rs:3133:28
         |
    3133 |         let mut callback = |_: &str, _: &str| {
         |                            ^^^^^^^^^^^^^^^^^^ this closure implements `FnMut`, not `Fn`
    3134 |             times_called += 1;
         |             ------------ closure is `FnMut` because it mutates the variable `times_called` here
    ...
    3137 |         let parser = Parser::new_with_broken_link_callback(markdown, Options::empty(), Some(&callback));
         |                                                                                             --------- the requirement to implement `Fn` derives from here
    

    I tried allowing FnMut closures, but that broke all sorts of other things, including needing to pass in Some(&mut closure) instead of Some(&closure) and removing the Clone impl for Parser. Let me know if you want me to follow up on that; I'm not sure what the best approach is there.

    opened by jyn514 11
  • Preserve link reference definitions in parser output

    Hello,

    I'm using pulldown-cmark to create a markdown code formatter. So far it's been working well and thanks for this library!

    I've hit a bit of a hurdle though because the parser does not have events for link reference definitions.

    For example, given the following text:

    [testing][Some reference]
    
    [Some reference]: https://github.com
    
    testing
    

    The parser output is the following:

    Event::Start(Paragraph)
    Event::Start(Link(Reference, Borrowed("https://github.com"), Borrowed("")))
    Event::Text(Borrowed("testing"))
    Event::End(Link(Reference, Borrowed("https://github.com"), Borrowed("")))
    Event::End(Paragraph)
    Event::Start(Paragraph)
    Event::Text(Borrowed("testing"))
    Event::End(Paragraph)
    

    Is it possible to know that the reference definition appeared between the two paragraphs? Worst case scenario, I will just parse this information out of the file myself.

    Thanks!

    opened by dsherret 11
  • Extensible rendering prototype

    This implements a prototype of option 3 listed in https://github.com/raphlinus/pulldown-cmark/issues/116. It allows users to define their own custom rendering for a selection of events and/or tags. This should add the required flexibility to address many use cases without having to account for them all in pulldown-cmark itself, keeping the implementation lean and fast.

    An example inline HTML stripper use case could look like this:

    use pulldown_cmark::{html, Parser, Event, Tag};
    
    let markdown_str = "No html!<foo>";
    let mut html_buf = String::new();
    let parser = Parser::new(markdown_str);
    
    html::push_html_with_extension(&mut html_buf, parser, |state, event| {
        if let Event::InlineHtml(..) = event {
            Ok(state.write("<REDACTED>"))
        } else {
            // default rendering
            Err(event)
        }
    });
    
    assert_eq!(&html_buf[..], "<p>No html!<REDACTED></p>\n");
    

    The overhead of adding custom rendering would be fairly small, and zero if it's not used.

    Paging issues https://github.com/raphlinus/pulldown-cmark/pull/103, https://github.com/raphlinus/pulldown-cmark/issues/116, https://github.com/raphlinus/pulldown-cmark/issues/142, https://github.com/raphlinus/pulldown-cmark/issues/130 and https://github.com/raphlinus/pulldown-cmark/issues/346.

    Paging users @RadicalZephyr, @maghoff, @Inicola, @Keats, @transitracer and @Figments.

    Would this cover your use cases? And would such an interface work for you?

    opened by marcusklaas 11
  • returning `Events` from a function

    I'm using this library in an MdBook extension (mdbook-d2).

    It's working a treat, but there's one rough edge. I'd like to return some Events from a function, something like the following:

    let image_path = "path/to/image/file.svg";
    fn generate_inline_img(path: &str) -> Vec<Event<'_>> {
        let snippet = format!("![]({path})");
        Parser::new(&snippet).collect()
    }
    let _events = generate_inline_img(image_path);
    

    this doesn't work, since the Parser is carrying a reference to the local snippet variable. Basically I'd like a way to opt out of the zero-copy implementation for cases like this. I think this could be achieved if the Parser accepted an impl Into<CowStr<'input>> instead of &'input str.

    My current workaround is to manually construct the events like so:

    from mdbook-d2

        pub fn render(&self, ctx: RenderContext, content: &str) -> Vec<Event<'static>> {
            fs::create_dir_all(Path::new("src").join(self.output_dir())).unwrap();
    
            self.run_command(&ctx, content);
    
            let depth = ctx.path.ancestors().count() - 1;
            let rel_path: PathBuf = std::iter::repeat(Path::new(".."))
                .take(depth)
                .collect::<PathBuf>()
                .join(self.relative_file_path(&ctx));
    
            vec![
                Event::Start(Tag::Image(
                    LinkType::Inline,
                    rel_path.to_string_lossy().to_string().into(),
                    CowStr::Borrowed(""),
                )),
                Event::End(Tag::Image(
                    LinkType::Inline,
                    rel_path.to_string_lossy().to_string().into(),
                    CowStr::Borrowed(""),
                )),
            ]
        }
    

    or am I perhaps taking the wrong approach entirely in this preprocessor?
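
    One possible workaround (my own sketch, not from this thread): deliberately leak the snippet so the parser borrows from a 'static string, trading a small permanent allocation per call for owned events:

    use pulldown_cmark::{Event, Parser};

    fn generate_inline_img(path: &str) -> Vec<Event<'static>> {
        // Leaking the String gives the parser a &'static str to borrow from.
        let snippet: &'static str = Box::leak(format!("![]({path})").into_boxed_str());
        Parser::new(snippet).collect()
    }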

    opened by danieleades 0
  • Indented footnote definitions confuse the parser

    The following:

    Foo[^foo]
    Bar[^bar]
    
    [^foo]:
        FooDef1
        FooDef2
    
    [^bar]:
        BarDef
    

    renders like so on GitHub:

    image

    but pulldown_cmark::Parser gets confused:

    1. It sees the definition text as indented code blocks.
    2. It does not see the end of the first definition before the start of the second.

    Running the following code:

    static MARKDOWN: &str = r#"\
    Foo[^foo]
    Bar[^bar]
    
    [^foo]:
        FooDef1
        FooDef2
    
    [^bar]:
        BarDef
    "#;
    
    fn main() {
        use pulldown_cmark::{Options, Parser};
        let options = Options::empty().union(Options::ENABLE_FOOTNOTES);
        let parser = Parser::new_ext(MARKDOWN, options);
        for event in parser {
            eprintln!("{event:?}");
        }
    }
    

    yields the following events:

    • Start(Paragraph)
    • HardBreak
    • Text(Borrowed("Foo"))
    • FootnoteReference(Borrowed("foo"))
    • SoftBreak
    • Text(Borrowed("Bar"))
    • FootnoteReference(Borrowed("bar"))
    • End(Paragraph)
    • Start(FootnoteDefinition(Borrowed("foo")))
    • Start(CodeBlock(Indented))
    • Text(Borrowed("FooDef1\n"))
    • Text(Borrowed("FooDef2\n"))
    • End(CodeBlock(Indented))
    • Start(FootnoteDefinition(Borrowed("bar")))
    • Start(CodeBlock(Indented))
    • Text(Borrowed("BarDef\n"))
    • End(CodeBlock(Indented))
    • End(FootnoteDefinition(Borrowed("bar")))
    • End(FootnoteDefinition(Borrowed("foo")))

    I would expect it to:

    • treat the indented definitions as part of the definition, and
    • end the first definition before starting the second.

    There is a workaround:

    Foo[^foo]
    Bar[^bar]
    
    [^foo]: FooDef1
        FooDef2
    
    [^bar]: BarDef
    

    This yields:

    • ...
    • Start(FootnoteDefinition(Borrowed("foo")))
    • Start(Paragraph)
    • Text(Borrowed("FooDef1"))
    • SoftBreak
    • Text(Borrowed("FooDef2"))
    • End(Paragraph)
    • End(FootnoteDefinition(Borrowed("foo")))
    • Start(FootnoteDefinition(Borrowed("bar")))
    • Start(Paragraph)
    • Text(Borrowed("BarDef"))
    • End(Paragraph)
    • End(FootnoteDefinition(Borrowed("bar")))

    i.e. by putting content on the same line as the definition begins, the parser is happy. Unfortunately, at least one code formatter in VSCode for Markdown (maybe the built-in one?) likes to format long definitions indented below the [^foo]: opener.

    opened by allenap 1
  • Support math extension

    This PR adds support for mathematical expressions, which GitHub introduced recently, behind an Options::ENABLE_MATH flag.

    This extension supports both inline-level and block-level expressions:

    Inline-level:  $\sqrt{3x-1}+(1+x)^2$
    
    Block-level:
    
    $$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$$
    

    They are rendered as follows on GitHub:

    Inline-level: $\sqrt{3x-1}+(1+x)^2$

    Block-level:

    $$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$$

    opened by rhysd 2
  • Interrupt paragraph continuing when the line is footnote definition

    Fix #618

    With this PR, a footnote definition no longer requires a blank line before it.

    This is ok
    [^1]: Previous line is not blank but it's ok
    [^2]: This line is also ok
    

    This PR also flattens footnote definitions. For example,

    [^a]: outer
    > [^b]: They also cannot be inside anything else.
    

    is parsed into the following HTML with the current master branch:

    <div class="footnote-definition" id="a"><sup class="footnote-definition-label">1</sup>
    <p>outer</p>
    <blockquote>
    <p><sup class="footnote-reference"><a href="#b">2</a></sup>: They also cannot be inside anything else.</p>
    </blockquote>
    </div>
    

    But with this PR, it is parsed into the following HTML:

    <div class="footnote-definition" id="a"><sup class="footnote-definition-label">1</sup>
    <p>outer</p>
    </div>
    <blockquote>
    <p><sup class="footnote-reference"><a href="#b">2</a></sup>: They also cannot be inside anything else.</p>
    </blockquote>
    

    I'm not sure this is correct behavior, but GitHub's Markdown renderer flattens the definitions as follows:

    Test ^a [^b]

    [^b]: They also cannot be inside anything else.

    opened by rhysd 0
  • Parsing `FootnoteDefinition` is broken when definitions are not separated by a blank line

    Repro

    Convert the following Markdown input to HTML with the --enable-footnotes option.

    Here is a simple footnote[^1].
    
    A footnote can also have footnote[^2].
    
    [^1]: My reference 1.
    [^2]: My reference 2.
    

    Expected behavior

    <p>Here is a simple footnote<sup class="footnote-reference"><a href="#1">1</a></sup>.</p>
    <p>A footnote can also have footnote<sup class="footnote-reference"><a href="#2">2</a></sup>.</p>
    <div class="footnote-definition" id="1"><sup class="footnote-definition-label">1</sup>
    <p>My reference 1.</p>
    </div>
    <div class="footnote-definition" id="2"><sup class="footnote-definition-label">2</sup>
    <p>My reference 2.</p>
    </div>
    

    Actual behavior

    <p>Here is a simple footnote<sup class="footnote-reference"><a href="#1">1</a></sup>.</p>
    <p>A footnote can also have footnote<sup class="footnote-reference"><a href="#2">2</a></sup>.</p>
    <div class="footnote-definition" id="1"><sup class="footnote-definition-label">1</sup>
    <p>My reference 1.
    <sup class="footnote-reference"><a href="#2">2</a></sup>: My reference 2.</p>
    </div>
    

    Only one FootnoteDefinition event was emitted, and a FootnoteReference event was incorrectly emitted while parsing the footnote definition.

    Events while parsing the above input are as follows:

    0..31: Start(Paragraph)
    0..25: Text(Borrowed("Here is a simple footnote"))
    25..29: FootnoteReference(Borrowed("1"))
    29..30: Text(Borrowed("."))
    0..31: End(Paragraph)
    32..71: Start(Paragraph)
    32..65: Text(Borrowed("A footnote can also have footnote"))
    65..69: FootnoteReference(Borrowed("2"))
    69..70: Text(Borrowed("."))
    32..71: End(Paragraph)
    72..116: Start(FootnoteDefinition(Borrowed("1")))
    78..116: Start(Paragraph)
    78..93: Text(Borrowed("My reference 1."))
    93..94: SoftBreak
    94..98: FootnoteReference(Borrowed("2"))
    98..115: Text(Borrowed(": My reference 2."))
    78..116: End(Paragraph)
    72..116: End(FootnoteDefinition(Borrowed("1")))
    EOF
    

    Note

    It seems that the parser expects a blank line to separate footnote definitions. When I added a blank line between [^1] and [^2], both footnote definitions were correctly parsed.

    Here is a simple footnote[^1].
    
    A footnote can also have footnote[^2].
    
    [^1]: My reference 1.
    
    [^2]: My reference 2.
    

    I confirmed the input is rendered correctly on GitHub as follows:

    Here is a simple footnote[^1].

    It can also have another footnote[^2].

    [^1]: My reference 1. [^2]: My reference 2.

    opened by rhysd 2
  • Is there any plan to support custom markdown directives?

    I have a use case where we need to support Markdown directives. Is there support for this? Or is it something I could contribute?

    This is what I mean: https://talk.commonmark.org/t/generic-directives-plugins-syntax/444

    The idea being that we could define and plug these new directives into the AST and define custom renderers for them.

    opened by tonyalaribe 0
Releases (v0.9.2)
  • v0.9.2 (Jul 26, 2022)

  • v0.9.1 (Jan 17, 2022)

  • v0.9.0 (Dec 22, 2021)

    This release brings a number of changes.

    New features

    • Thanks to @lo48576, pulldown now optionally supports custom header ids and classes for headers. Set ENABLE_HEADING_ATTRIBUTES in the options to enable.
    • Users can now access reference definitions, information that was previously only exposed internally.
    • Pulldown is now CommonMark 0.30 compliant.

    Changes

    • The function signature for the broken link callback has changed slightly to allow for FnMut functions.

    There have also been a number of (small) parsing bug fixes.

  • v0.8.0 (Sep 1, 2020)

    This release brings support for markdown smart punctuation. Further, it comes with a renewed design for broken link callbacks. Finally, it fixes a few minor parsing bugs.

  • v0.7.2 (Jul 2, 2020)

  • v0.7.0 (Feb 12, 2020)

  • v0.6.1b (Nov 11, 2019)

  • v0.6.0 (Sep 6, 2019)

    This is a backward incompatible release. However, most users will not experience any breakage. It also fixes some parser correctness bugs.

    Breaking changes:

    • the get_offset method on the parser was removed. Its semantics were poorly defined and only provided users with the start offset of the next event. To get proper source mapping information which includes the entire source range for each event, upgrade the Parser to an OffsetIter using the into_offset_iter method. This produces an iterator over (Event, Range<usize>) tuples.
    • the Event::HtmlBlock and Event::InlineHTML event variants were removed. Inline HTML is now represented by regular HTML events.
    • horizontal rules are now events, and no longer (empty) tags.
    • Event::Header(i32) has been replaced by Event::Heading(u32).
    • the starting index of numbered lists is now represented by a u64 instead of a usize.
    • the FIRST_PASS option has been removed.
  • v0.5.3 (Jul 18, 2019)

  • v0.5.2 (May 28, 2019)

  • v0.5.1 (May 13, 2019)

    Changes:

    • removes last remaining unsafe block in default mode (without simd feature);
    • various bug fixes and guards against quadratic behavior;
    • very minor performance bumps.
  • v0.5.0 (Apr 24, 2019)

    Additions:

    • CommonMark 0.29 compatibility
    • SIMD accelerated parsers feature
    • Guards against known pathological inputs causing quadratic scanning time
    • Speed improvements

    Changes:

    • Code spans are no longer tags, but are now events containing a single CowStr (see the sketch below). This is a breaking change.
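
    A small illustration of the new shape (my own example, assuming pulldown-cmark 0.5+):

    use pulldown_cmark::{Event, Parser};

    fn main() {
        for event in Parser::new("some `inline code` here") {
            // Code spans arrive as a single event carrying a CowStr.
            if let Event::Code(code) = event {
                println!("code span: {}", code);
            }
        }
    }
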
  • v0.4.1 (Apr 12, 2019)

  • v0.4.0 (Mar 18, 2019)

    New extensions (strikethrough, task lists), public CowStr and InlineStr, and some small fixes.

    This is not backward compatible with v0.3.0, but the changes should be very manageable.
