This library is a pull parser for CommonMark, written in Rust

Raph Levien

Last update: Jan 1, 2023

Overview

pulldown-cmark

This library is a pull parser for CommonMark, written in Rust. It comes with a simple command-line tool, useful for rendering to HTML, and is also designed to be easy to use as a library.

It is designed to be:

Fast; a bare minimum of allocation and copying
Safe; written in pure Rust with no unsafe blocks (except in the opt-in SIMD feature)
Versatile; in particular source-maps are supported
Correct; the goal is 100% compliance with the CommonMark spec

Further, it optionally supports parsing footnotes, GitHub-flavored tables, GitHub-flavored task lists, and strikethrough.

Rustc 1.42 or newer is required to build the crate.

Why a pull parser?

There are many parsers for Markdown and its variants, but to my knowledge none use pull parsing. Pull parsing has become popular for XML, especially for memory-conscious applications, because it uses dramatically less memory than constructing a document tree, but is much easier to use than push parsers. Push parsers are notoriously difficult to use, and also often error-prone because of the need for the user to delicately juggle state in a series of callbacks.

In a clean design, the parsing and rendering stages are neatly separated, but this is often sacrificed in the name of performance and expedience. Many Markdown implementations mix parsing and rendering together, and even designs that try to separate them (such as the popular hoedown), make the assumption that the rendering process can be fully represented as a serialized string.

Pull parsing is in some sense the most versatile architecture. It's possible to drive a push interface, also with minimal memory, and quite straightforward to construct an AST. Another advantage is that source-map information (the mapping between parsed blocks and offsets within the source text) is readily available; you can call into_offset_iter() to create an iterator that yields (Event, Range) pairs, where the second element is the event's corresponding range in the source document.
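To make the idea of (Event, Range) pairs concrete, here is a minimal sketch using only the standard library. It is a toy line splitter, not pulldown-cmark's parser: the names ToyEvent, ToyOffsetIter, and toy_offset_iter are hypothetical and exist only for illustration. It shows how an iterator can yield each event together with the byte range it came from.

```rust
use std::ops::Range;

// Toy event type (an assumption for this sketch, not pulldown-cmark's `Event`).
#[derive(Debug, PartialEq)]
enum ToyEvent<'a> {
    Text(&'a str),
    SoftBreak,
}

// Toy pull iterator: yields one event per line or line break, each paired
// with the byte range it occupies in the source.
struct ToyOffsetIter<'a> {
    src: &'a str,
    pos: usize,
}

fn toy_offset_iter(src: &str) -> ToyOffsetIter<'_> {
    ToyOffsetIter { src, pos: 0 }
}

impl<'a> Iterator for ToyOffsetIter<'a> {
    type Item = (ToyEvent<'a>, Range<usize>);

    fn next(&mut self) -> Option<Self::Item> {
        let src = self.src;
        if self.pos >= src.len() {
            return None;
        }
        let start = self.pos;
        let rest = &src[start..];
        if rest.starts_with('\n') {
            // A newline becomes a break event covering exactly one byte.
            self.pos += 1;
            return Some((ToyEvent::SoftBreak, start..self.pos));
        }
        // Otherwise emit the text up to the next newline (or end of input).
        let len = rest.find('\n').unwrap_or(rest.len());
        self.pos += len;
        Some((ToyEvent::Text(&rest[..len]), start..self.pos))
    }
}
```

Slicing the source with a yielded range, as in &src[range], recovers exactly the text that produced the event, which is what makes this kind of iterator usable as a source map.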

While manipulating ASTs is the most flexible way to transform documents, operating on iterators is surprisingly easy, and quite efficient. Here, for example, is the code to transform soft line breaks into hard breaks:

let parser = parser.map(|event| match event {
	Event::SoftBreak => Event::HardBreak,
	_ => event
});

Or expanding an abbreviation in text:

let parser = parser.map(|event| match event {
	Event::Text(text) => Event::Text(text.replace("abbr", "abbreviation").into()),
	_ => event
});

Another simple example is code to determine the max nesting level:

let mut max_nesting = 0;
let mut level = 0;
for event in parser {
	match event {
		Event::Start(_) => {
			level += 1;
			max_nesting = std::cmp::max(max_nesting, level);
		}
		Event::End(_) => level -= 1,
		_ => ()
	}
}

There are some basic but fully functional examples of the usage of the crate in the examples directory of this repository.

Using Rust idiomatically

A lot of the internal scanning code is written at a pretty low level (it pretty much scans byte patterns for the bits of syntax), but the external interface is designed to be idiomatic Rust.

Pull parsers are at heart an iterator of events (start and end tags, text, and other bits and pieces). The parser data structure implements the Rust Iterator trait directly, and Event is an enum. Thus, you can use the full power and expressivity of Rust's iterator infrastructure, including for loops and map (as in the examples above), collecting the events into a vector (for recording, playback, and manipulation), and more.

Further, the Text event (representing text) is a small copy-on-write string. The vast majority of text fragments are just slices of the source document. For these, copy-on-write gives a convenient representation that requires no allocation or copying, but allocated strings are available when they're needed. Thus, when rendering text to HTML, most text is copied just once, from the source document to the HTML buffer.
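The copy-on-write behaviour described above can be sketched with the standard library's Cow type. This is an illustration under assumptions: pulldown-cmark uses its own CowStr type rather than std::borrow::Cow, and expand_abbr is a hypothetical helper, not part of the crate.

```rust
use std::borrow::Cow;

// Returns the input unchanged (borrowed, no allocation) unless it contains
// "abbr", in which case a new owned string is allocated for the result.
fn expand_abbr(text: &str) -> Cow<'_, str> {
    if text.contains("abbr") {
        Cow::Owned(text.replace("abbr", "abbreviation"))
    } else {
        Cow::Borrowed(text)
    }
}
```

Text that needs no change stays a slice of the input, so only the fragments that are actually rewritten pay for an allocation.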

When using pulldown-cmark's own HTML renderer, make sure to write to a buffered target like a Vec or String. Since it performs many (very) small writes, writing directly to stdout, files, or sockets is detrimental to performance. Such writers can be wrapped in a BufWriter.
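The effect of buffering can be sketched with the standard library alone. In the example below, CountingWriter is a hypothetical stand-in for an expensive target such as stdout or a socket, and write_fragments imitates a renderer that emits many tiny pieces; neither is part of pulldown-cmark.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical writer that only counts how often `write` is called on it.
#[derive(Default)]
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

const FRAGMENTS: [&str; 6] = ["<p>", "Hello", ", ", "world", "</p>", "\n"];

// Emits several tiny fragments, then flushes.
fn write_fragments<W: Write>(out: &mut W) -> io::Result<()> {
    for fragment in FRAGMENTS.iter() {
        out.write_all(fragment.as_bytes())?;
    }
    out.flush()
}

// Returns (write calls when writing directly, write calls through BufWriter).
fn compare_write_calls() -> io::Result<(usize, usize)> {
    let mut direct = CountingWriter::default();
    write_fragments(&mut direct)?;

    let mut buffered = BufWriter::new(CountingWriter::default());
    write_fragments(&mut buffered)?;

    Ok((direct.calls, buffered.get_ref().calls))
}
```

With the six small fragments above, the unbuffered writer is called once per fragment, while the BufWriter forwards everything to the underlying writer in a single call when it is flushed.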

Build options

By default, the binary is built as well. If you don't want/need it, then build like this:

> cargo build --no-default-features

Or put in your Cargo.toml file:

pulldown-cmark = { version = "0.8", default-features = false }

SIMD-accelerated scanners are available for the x64 platform from version 0.5 onwards. To enable them, build with the simd feature:

> cargo build --release --features simd

Or add the feature to your project's Cargo.toml:

pulldown-cmark = { version = "0.8", default-features = false, features = ["simd"] }

Authors

The main author is Raph Levien. The implementation of the new design (v0.3+) was completed by Marcus Klaas de Vries.

Contributions

We gladly accept contributions via GitHub pull requests. Please see CONTRIBUTING.md for more details.

Comments

Support heading attribute block (especially ID and classes)
TODO

[x] Support ID ({#id})

At this stage, {.class} is simply ignored.

[x] Enable attribute block support only when the specific parser option is enabled

I'll use the name Options::ENABLE_HEADING_ATTRIBUTES.

[x] Support classes ({.class1 .class2})

~While this is WIP branch, feel free to give me advice.~ Now this branch is ready to merge.

Summary

By this patch, section headings will be able to have ID and classes. For example, # H1 {#id1 .heading} would be converted to <h1 id="id1" class="heading">H1</h1>.
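As a rough illustration of the mapping in the example above, the following toy sketch splits a trailing attribute block off a heading's text and renders an h1 element. It is not the patch's implementation: split_heading_attrs and render_h1 are hypothetical names, the input is assumed to be the heading text after the leading "# ", and escaping and edge cases are ignored.

```rust
// Toy model only: splits a trailing `{...}` block off the heading text and
// collects `#id` and `.class` tokens from it.
fn split_heading_attrs(heading: &str) -> (&str, Option<&str>, Vec<&str>) {
    let trimmed = heading.trim_end();
    if let (Some(open), true) = (trimmed.rfind('{'), trimmed.ends_with('}')) {
        let inner = &trimmed[open + 1..trimmed.len() - 1];
        let mut id = None;
        let mut classes = Vec::new();
        for token in inner.split_whitespace() {
            if let Some(name) = token.strip_prefix('#') {
                id = Some(name);
            } else if let Some(name) = token.strip_prefix('.') {
                classes.push(name);
            }
        }
        return (trimmed[..open].trim_end(), id, classes);
    }
    (trimmed, None, Vec::new())
}

// Renders the heading as an <h1> element (no HTML escaping in this sketch).
fn render_h1(heading: &str) -> String {
    let (text, id, classes) = split_heading_attrs(heading);
    let mut html = String::from("<h1");
    if let Some(id) = id {
        html.push_str(&format!(" id=\"{}\"", id));
    }
    if !classes.is_empty() {
        html.push_str(&format!(" class=\"{}\"", classes.join(" ")));
    }
    html.push_str(&format!(">{}</h1>", text));
    html
}
```

Calling render_h1 with the text H1 {#id1 .heading} produces the h1 element shown in the summary, and a heading without an attribute block is rendered without id or class attributes.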

This is a breaking change: Tag::Heading will have multiple fields (ID and classes) instead of single HeadingLevel.

This solves #424.
opened by lo48576 40
Add boolean to tell if it's an indented code block or not

Fixes #415.

Once merged, can we have another release as quickly as possible please? I'd really love to be able to merge https://github.com/rust-lang/rust/pull/65894 (a few fixes depend on it as well).

opened by GuillaumeGomez 36
Numerous parsing fixes

Fixes #314, #315 and #317.

Many thanks to @mity for reporting these issues with test cases. Such work makes fixing these things much easier! :bowing_man:

opened by marcusklaas 25
Find non-linear growth patterns

Fix #257. Depends on #281 (which is why the diff includes those commits as well). I'm working on that branch because several patterns and issues I found have already been fixed there.

For my notes about the concept used and different things I tried, see this hackmd.

Basically I'm writing an intelligent fuzzer, which parses the pulldown-cmark source-code, extracts all literals, then tests if combinations of those literals result in non-linear behaviour.

opened by oberien 23

The lifetimes on BrokenLinkCallback are wrong

Here is a simple callback which marks every link as working:

fn callback<'a>(link: BrokenLink<'a>) -> Option<(CowStr<'a>, CowStr<'a>)> {
    Some(("#".into(), link.reference.into()))
}

On its own, it typechecks fine. Unfortunately, this doesn't work with Parser::new_with_broken_link_callback:

fn f(txt: &str) {
    for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    }
}

error: implementation of `FnOnce` is not general enough
   --> src/lib.rs:8:80
    |
8   |       for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    |                                                                                  ^^^^^^^^^^^^^ implementation of `FnOnce` is not general enough
    | 
   ::: /home/joshua/.local/lib/rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:219:1
    |
219 | / pub trait FnOnce<Args> {
220 | |     /// The returned type after the call operator is used.
221 | |     #[lang = "fn_once_output"]
222 | |     #[stable(feature = "fn_once_output", since = "1.12.0")]
...   |
227 | |     extern "rust-call" fn call_once(self, args: Args) -> Self::Output;
228 | | }
    | |_- trait `FnOnce` defined here
    |
    = note: `for<'a> fn(pulldown_cmark::BrokenLink<'a>) -> Option<(pulldown_cmark::CowStr<'a>, pulldown_cmark::CowStr<'a>)> {callback}` must implement `FnOnce<(pulldown_cmark::BrokenLink<'_>,)>`
    = note: ...but `FnOnce<(pulldown_cmark::BrokenLink<'_>,)>` is actually implemented for the type `for<'a> fn(pulldown_cmark::BrokenLink<'a>) -> Option<(pulldown_cmark::CowStr<'a>, pulldown_cmark::CowStr<'a>)> {callback}`

error: aborting due to previous error

The issue is that BrokenLinkCallback is typed as having the same lifetime as its outputs (see https://github.com/raphlinus/pulldown-cmark/blob/e97974b8d76195c953f0d427e8725ef9ad1a0c17/src/parse.rs#L1270). That means that the callback can't, for example, be passed to two different parsers, because the first call will fix a set lifetime:

 fn f(txt: &str) {
    let mut callback = |link: BrokenLink<'_>| -> Option<(CowStr<'_>, CowStr<'_>)> {
        Some(("#".into(), link.reference.to_owned().into()))
    };

    for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    }

    for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    }
}

error[E0499]: cannot borrow `callback` as mutable more than once at a time
  --> src/lib.rs:11:80
   |
8  |     for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
   |                                                                                ------------- first mutable borrow occurs here
...
11 |     for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
   |                                                                                ^^^^^^^^^^^^^
   |                                                                                |
   |                                                                                second mutable borrow occurs here
   |                                                                                first borrow later used here

The fix I was thinking of was something like this:

diff --git a/src/parse.rs b/src/parse.rs
index d6388b1..bccd68c 100644
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -129,12 +129,12 @@ pub struct BrokenLink<'a> {
 }
 
 /// Markdown event iterator.
-pub struct Parser<'a> {
-    text: &'a str,
+pub struct Parser<'input, 'callback: 'input> {
+    text: &'input str,
     options: Options,
     tree: Tree<Item>,
-    allocs: Allocations<'a>,
-    broken_link_callback: BrokenLinkCallback<'a>,
+    allocs: Allocations<'input>,
+    broken_link_callback: BrokenLinkCallback<'callback>,
     html_scan_guard: HtmlScanGuard,
 
     // used by inline passes. store them here for reuse
@@ -1266,8 +1267,8 @@ pub(crate) struct HtmlScanGuard {
     pub declaration: usize,
 }
 
-pub type BrokenLinkCallback<'a> =
-    Option<&'a mut dyn FnMut(BrokenLink) -> Option<(CowStr<'a>, CowStr<'a>)>>;
+pub type BrokenLinkCallback<'b> =
+    Option<&'b mut dyn for<'a> FnMut(BrokenLink<'a>) -> Option<(CowStr<'a>, CowStr<'a>)>>;
 
 /// Markdown event and source range iterator.
 ///

This does two things:

Separates the lifetime of the link from the lifetime of the callback (by changing &'a FnMut() -> &'a str to &'b for<'a> FnMut() -> &'a str).
Separates the lifetime of the input text from the lifetime of the callback (by adding a new 'callback lifetime to Parser).

Unfortunately, this uncovers that the change can't work:

error[E0597]: `link_label` does not live long enough
   --> src/parse.rs:457:64
    |
145 |   impl<'a, 'b> Parser<'a, 'b> {
    |        -- lifetime `'a` defined here
...
450 |                                       .or_else(|| {
    |                                                -- value captured here
...
457 |                                                       reference: link_label.as_ref(),
    |                                                                  ^^^^^^^^^^ borrowed value does not live long enough
...
460 | /                                                 callback(broken_link).map(|(url, title)| {
461 | |                                                     (link_type.to_unknown(), url, title)
462 | |                                                 })
    | |__________________________________________________- returning this value requires that `link_label` is borrowed for `'a`
...
503 |                               }
    |                               - `link_label` dropped here while still borrowed

The issue is that link_label is only alive for the duration of a single loop iteration, not for the lifetime of the input. Even though it's parameterized by a lifetime 'input, it has a Box variant, so if it gets dropped the compiler has to conservatively assume the entire link is invalid.

I don't have a solution for this, I think it will require a redesign of the parser. But people are running into this in the real world: https://github.com/rust-lang/rust/pull/79781/files#r537769663
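The higher-ranked callback type from the diff can be illustrated in isolation, outside the parser. The sketch below uses only the standard library: LabelCallback, resolve_labels, and accept_all are hypothetical stand-ins, and std's Cow replaces CowStr. It shows only that a for<'a> callback can be reused across inputs with unrelated lifetimes; it does not address the link_label problem described above.

```rust
use std::borrow::Cow;

// Hypothetical callback type, loosely modelled on the diff: `'cb` is how long
// the callback itself is borrowed, while `for<'a>` lets every call pick its
// own lifetime for the label and the returned string.
type LabelCallback<'cb> = &'cb mut dyn for<'a> FnMut(&'a str) -> Option<Cow<'a, str>>;

// Toy "parser": resolves each label of one input through the callback.
fn resolve_labels<'input>(
    labels: &[&'input str],
    callback: LabelCallback<'_>,
) -> Vec<Cow<'input, str>> {
    let mut resolved = Vec::new();
    for &label in labels {
        if let Some(target) = callback(label) {
            resolved.push(target);
        }
    }
    resolved
}

// A callback that accepts every label unchanged.
fn accept_all(label: &str) -> Option<Cow<'_, str>> {
    Some(Cow::Borrowed(label))
}

// The same callback serves two inputs that live for different durations.
fn resolve_two_inputs() -> usize {
    let mut callback = accept_all;

    let first = String::from("alpha");
    let n_first = resolve_labels(&[first.as_str()], &mut callback).len();

    let second = String::from("beta gamma");
    let parts: Vec<&str> = second.split(' ').collect();
    let n_second = resolve_labels(&parts, &mut callback).len();

    n_first + n_second
}
```

Because the mutable borrow of the callback ends when each resolve_labels call returns, the second call does not conflict with the first.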

opened by jyn514 19

End footnote definition with one blank line.

According to the linked issue, a footnote definition should end with a blank line. This is similar to the rule for lists, which end with two blank lines. The code previously required two blank lines in both cases; this patch changes it to just one for footnote definitions.

Fixes issue 20.

opened by raphlinus 16

Allow for more flexible fuzzer patterns

This PR allows for more flexible patterns, repeating patterns with a variable but fixed prefix and postfix. As a result, almost all of the existing scaling regression tests could be moved into the fuzzer regression suite, which means that they'll be included in the CI pipeline.

@oberien: the fuzzer seems to panic immediately nowadays when left to fuzz, even on the master branch. I get the following trace:

marcusklaas@localhost ~/p/fuzzer> env RUST_BACKTRACE=1 cargo run --release
    Finished release [optimized + debuginfo] target(s) in 0.05s
     Running `target/release/fuzzer`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("expected `struct`")', src/libcore/result.rs:1051:5
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:47
   3: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:36
   4: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:200
   5: std::panicking::default_hook
             at src/libstd/panicking.rs:214
   6: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:481
   7: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:384
   8: rust_begin_unwind
             at src/libstd/panicking.rs:311
   9: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  10: core::result::unwrap_failed
             at src/libcore/result.rs:1051
  11: core::result::Result<T,E>::unwrap
             at /rustc/bc2e84ca0939b73fcf1768209044432f6a15c2e5/src/libcore/result.rs:852
  12: <fuzzer::literals::LiteralParser::extract_literals_from_macro::Bitflags as syn::parse::Parse>::parse
             at src/literals.rs:218
  13: core::ops::function::FnOnce::call_once
             at /rustc/bc2e84ca0939b73fcf1768209044432f6a15c2e5/src/libcore/ops/function.rs:231
  14: <F as syn::parse::Parser>::parse2
             at /home/marcusklaas/.cargo/registry/src/github.com-1ecc6299db9ec823/syn-0.15.34/src/parse.rs:1103
  15: syn::parse2
             at /home/marcusklaas/.cargo/registry/src/github.com-1ecc6299db9ec823/syn-0.15.34/src/lib.rs:633
  16: fuzzer::literals::LiteralParser::extract_literals_from_macro
             at src/literals.rs:236
  17: fuzzer::literals::LiteralParser::extract_literals_from_item
             at src/literals.rs:134
  18: fuzzer::literals::LiteralParser::extract_literals_from_items
             at src/literals.rs:100
  19: fuzzer::literals::LiteralParser::extract_literals_from_file
             at src/literals.rs:95
  20: fuzzer::literals::get
             at src/literals.rs:47
  21: fuzzer::fuzz
             at src/main.rs:211
  22: fuzzer::main
             at src/main.rs:148
  23: std::rt::lang_start::{{closure}}
             at /rustc/bc2e84ca0939b73fcf1768209044432f6a15c2e5/src/libstd/rt.rs:64
  24: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:49
  25: std::panicking::try::do_call
             at src/libstd/panicking.rs:296
  26: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:82
  27: std::panicking::try
             at src/libstd/panicking.rs:275
  28: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  29: std::rt::lang_start_internal
             at src/libstd/rt.rs:48
  30: main
  31: __libc_start_main
  32: _start

I haven't dug too deep, but it seems related to the token parsing. Which would be strange, since neither the fuzzer's dependencies nor the project's structure has really changed. Do you happen to know what's going on here?

opened by marcusklaas 13

Write HTML to a fmt::Write instead of a String.
My motivation is that I would like to use this library in a static blogging engine but I use a custom buffer type.

The downside is error handling. This may significantly impact performance (I don't know what benchmarks you usually run). The upside is that users can write directly to custom buffers as long as those buffers implement fmt::Write.
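The trade-off can be sketched with the standard library alone. In the example below, write_escaped is a hypothetical helper (not this PR's code) that writes to any fmt::Write target and propagates its errors, and BoundedBuf is a made-up custom buffer that can fail.

```rust
use std::fmt::{self, Write};

// Writes `text` with HTML-significant characters escaped into any
// `fmt::Write` target, passing the target's errors back to the caller.
fn write_escaped<W: Write>(out: &mut W, text: &str) -> fmt::Result {
    for ch in text.chars() {
        match ch {
            '<' => out.write_str("&lt;")?,
            '>' => out.write_str("&gt;")?,
            '&' => out.write_str("&amp;")?,
            '"' => out.write_str("&quot;")?,
            _ => out.write_char(ch)?,
        }
    }
    Ok(())
}

// Hypothetical custom buffer that refuses to grow past a fixed capacity.
struct BoundedBuf {
    data: String,
    capacity: usize,
}

impl Write for BoundedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        if self.data.len() + s.len() > self.capacity {
            return Err(fmt::Error);
        }
        self.data.push_str(s);
        Ok(())
    }
}
```

Writing into a String never fails, so the Result is pure overhead there, whereas a target like BoundedBuf relies on the error being reported.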

Error Handling Alternatives:

Panic on error and tell users to not pass in writers that can fail. Users can always record write errors on the side.

Provide a custom Write trait that doesn't support error reporting.

Also, I kept the current fresh_line behavior but we could probably make this faster if we allow extra newlines.

Compatibility note: This requires Rust 1.1 beta for Write::write_char.

FYI, I've signed the CLA.
opened by Stebalien 12

Treat broken reference links as links

This ensures that broken_link_callback will only see broken links once.

Closes #444

Outdated: trouble with FnMut

I tried to add the following test, but it fails because callbacks are only allowed to be Fn, not FnMut:

    #[test]
    fn broken_links_called_only_once() {
        let markdown = "See also [`g()`][crate::g].";
        let mut times_called = 0;
        let mut callback = |_: &str, _: &str| {
            times_called += 1;
            None
        };
        let parser = Parser::new_with_broken_link_callback(markdown, Options::empty(), Some(&callback));
        for _ in parser {}
        assert_eq!(times_called, 1);
    }

error[E0525]: expected a closure that implements the `Fn` trait, but this closure only implements `FnMut`
    --> src/parse.rs:3133:28
     |
3133 |         let mut callback = |_: &str, _: &str| {
     |                            ^^^^^^^^^^^^^^^^^^ this closure implements `FnMut`, not `Fn`
3134 |             times_called += 1;
     |             ------------ closure is `FnMut` because it mutates the variable `times_called` here
...
3137 |         let parser = Parser::new_with_broken_link_callback(markdown, Options::empty(), Some(&callback));
     |                                                                                             --------- the requirement to implement `Fn` derives from here

I tried allowing FnMut closures, but that had all sorts of other things that broke, including needing to pass in Some(&mut closure) instead of Some(&closure) and removing the Clone impl for Parser. Let me know if you want me to follow up with that, I'm not sure what the best approach is there.
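For a test that only needs to count invocations, one alternative to FnMut is interior mutability. This is a different technique from the one discussed above, sketched here with the standard library only; resolve_with_fn is a hypothetical stand-in for an API that accepts only Fn callbacks, not pulldown-cmark's signature.

```rust
use std::cell::Cell;

// Hypothetical stand-in for an API that only accepts `Fn` callbacks: calls
// `callback` once per label and counts how many labels were resolved.
fn resolve_with_fn(labels: &[&str], callback: &dyn Fn(&str) -> Option<String>) -> usize {
    let mut resolved = 0;
    for &label in labels {
        if callback(label).is_some() {
            resolved += 1;
        }
    }
    resolved
}

// Counts invocations without needing `FnMut`: the closure captures the
// `Cell` by shared reference, and `Cell::set` works through `&Cell`.
fn count_callback_calls(labels: &[&str]) -> u32 {
    let times_called = Cell::new(0u32);
    let callback = |_label: &str| -> Option<String> {
        times_called.set(times_called.get() + 1);
        None
    };
    resolve_with_fn(labels, &callback);
    times_called.get()
}
```

The closure stays Fn because it never needs mutable access to its captures, so it can still be passed as a shared reference.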

opened by jyn514 11

Preserve link reference definitions in parser output
Hello,

I'm using pulldown-cmark to create a markdown code formatter. So far it's been working well and thanks for this library!

I've hit a bit of a hurdle though because the parser does not have events for link reference definitions.

For example, given the following text:

[testing][Some reference]

[Some reference]: https://github.com

testing

The parser output is the following:

Event::Start(Paragraph)
Event::Start(Link(Reference, Borrowed("https://github.com"), Borrowed("")))
Event::Text(Borrowed("testing"))
Event::End(Link(Reference, Borrowed("https://github.com"), Borrowed("")))
Event::End(Paragraph)
Event::Start(Paragraph)
Event::Text(Borrowed("testing"))
Event::End(Paragraph)

Is it possible to know that the reference definition appeared between the two paragraphs? Worst case scenario, I will just parse this information out of the file myself.

Thanks!
opened by dsherret 11
Extensible rendering prototype
This implements a prototype of option 3 listed in https://github.com/raphlinus/pulldown-cmark/issues/116. It allows users to define their own custom rendering for a selection of events and/or tags. This should add the required flexibility to address many use cases without having to account for them all in pulldown-cmark itself, keeping the implementation lean and fast.

An example inline HTML stripper use case could look like this:

use pulldown_cmark::{html, Parser, Event, Tag};

let markdown_str = "No html!<foo>";
let mut html_buf = String::new();
let parser = Parser::new(markdown_str);

html::push_html_with_extension(&mut html_buf, parser, |state, event| {
    if let Event::InlineHtml(..) = event {
        Ok(state.write("<REDACTED>"))
    } else {
        // default rendering
        Err(event)
    }
});

assert_eq!(&html_buf[..], "<p>No html!<REDACTED></p>\n");

The overhead for adding custom rendering would be fairly small and none if it's not used.

Paging issues https://github.com/raphlinus/pulldown-cmark/pull/103, https://github.com/raphlinus/pulldown-cmark/issues/116, https://github.com/raphlinus/pulldown-cmark/issues/142, https://github.com/raphlinus/pulldown-cmark/issues/130 and https://github.com/raphlinus/pulldown-cmark/issues/346.

Paging users @RadicalZephyr, @maghoff, @Inicola, @Keats, @transitracer and @Figments.

Would this cover your use cases? And would such an interface work for you?
opened by marcusklaas 11

returning `Events` from a function

I'm using this library in an MdBook extension (mdbook-d2)

It's working a treat, but there's one rough edge. I'd like to return some Events from a function, something like the following-

let image_path = "path/to/image/file.svg";
fn generate_inline_img(path: &str) -> Vec<Event<'_>> {
    let snippet = format!("![]({path})");
    Parser::new(&snippet).collect()
}
let _events = generate_inline_img(image_path);

This doesn't work, since the Parser is carrying a reference to the local snippet variable. Basically, I'd like a way to opt out of the zero-copy implementation for cases like this. I think this could be achieved if the Parser accepted an impl Into<CowStr<'input>> instead of &'input str.
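One general way to detach events from a local source string is to convert every borrowed piece into an owned one before returning. The sketch below shows the idea with a toy event type and std's Cow; LineEvent, parse_lines, and generate_events are hypothetical names, and nothing here is claimed about pulldown-cmark's own API.

```rust
use std::borrow::Cow;

// Toy event type (not pulldown-cmark's `Event`): text is either borrowed
// from the source or owned.
#[derive(Debug, PartialEq)]
enum LineEvent<'a> {
    Text(Cow<'a, str>),
    SoftBreak,
}

impl<'a> LineEvent<'a> {
    // Copies any borrowed text so the result no longer refers to the source.
    fn into_static(self) -> LineEvent<'static> {
        match self {
            LineEvent::Text(text) => LineEvent::Text(Cow::Owned(text.into_owned())),
            LineEvent::SoftBreak => LineEvent::SoftBreak,
        }
    }
}

// Zero-copy toy "parse": every text event borrows from `src`.
fn parse_lines(src: &str) -> Vec<LineEvent<'_>> {
    let mut events = Vec::new();
    for (i, line) in src.lines().enumerate() {
        if i > 0 {
            events.push(LineEvent::SoftBreak);
        }
        events.push(LineEvent::Text(Cow::Borrowed(line)));
    }
    events
}

// Builds the source locally, yet can return the events because each one is
// detached from `snippet` before `snippet` is dropped.
fn generate_events(path: &str) -> Vec<LineEvent<'static>> {
    let snippet = format!("image:\n{}", path);
    parse_lines(&snippet)
        .into_iter()
        .map(LineEvent::into_static)
        .collect()
}
```

The cost is one allocation per text event, which is the price of opting out of zero-copy for this case.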

My current workaround is to manually construct the snippet like so-

from mdbook-d2

    pub fn render(&self, ctx: RenderContext, content: &str) -> Vec<Event<'static>> {
        fs::create_dir_all(Path::new("src").join(self.output_dir())).unwrap();

        self.run_command(&ctx, content);

        let depth = ctx.path.ancestors().count() - 1;
        let rel_path: PathBuf = std::iter::repeat(Path::new(".."))
            .take(depth)
            .collect::<PathBuf>()
            .join(self.relative_file_path(&ctx));

        vec![
            Event::Start(Tag::Image(
                LinkType::Inline,
                rel_path.to_string_lossy().to_string().into(),
                CowStr::Borrowed(""),
            )),
            Event::End(Tag::Image(
                LinkType::Inline,
                rel_path.to_string_lossy().to_string().into(),
                CowStr::Borrowed(""),
            )),
        ]
    }

or am I perhaps taking the wrong approach entirely in this preprocessor?

opened by danieleades 0

Indented footnote definitions confuse the parser
The following:

Foo[^foo]
Bar[^bar]

[^foo]:
    FooDef1
    FooDef2

[^bar]:
    BarDef

renders like so on GitHub:

but pulldown_cmark::Parser gets confused:

It sees the definition text as indented code blocks.

It does not see the end of the first definition before the start of the second.

Running the following code:

static MARKDOWN: &str = r#"\
Foo[^foo]
Bar[^bar]

[^foo]:
    FooDef1
    FooDef2

[^bar]:
    BarDef
"#;

fn main() {
    use pulldown_cmark::{Options, Parser};
    let options = Options::empty().union(Options::ENABLE_FOOTNOTES);
    let parser = Parser::new_ext(MARKDOWN, options);
    for event in parser {
        eprintln!("{event:?}");
    }
}

yields the following events:

Start(Paragraph)

HardBreak

Text(Borrowed("Foo"))

FootnoteReference(Borrowed("foo"))

SoftBreak

Text(Borrowed("Bar"))

FootnoteReference(Borrowed("bar"))

End(Paragraph)

Start(FootnoteDefinition(Borrowed("foo")))

Start(CodeBlock(Indented))

Text(Borrowed("FooDef1\n"))

Text(Borrowed("FooDef2\n"))

End(CodeBlock(Indented))

Start(FootnoteDefinition(Borrowed("bar")))

Start(CodeBlock(Indented))

Text(Borrowed("BarDef\n"))

End(CodeBlock(Indented))

End(FootnoteDefinition(Borrowed("bar")))

End(FootnoteDefinition(Borrowed("foo")))

I would expect it to:

treat the indented definitions as part of the definition, and

end the first definition before starting the second.

There is a workaround:

Foo[^foo]
Bar[^bar]

[^foo]: FooDef1
    FooDef2

[^bar]: BarDef

This yields:

...

Start(FootnoteDefinition(Borrowed("foo")))

Start(Paragraph)

Text(Borrowed("FooDef1"))

SoftBreak

Text(Borrowed("FooDef2"))

End(Paragraph)

End(FootnoteDefinition(Borrowed("foo")))

Start(FootnoteDefinition(Borrowed("bar")))

Start(Paragraph)

Text(Borrowed("BarDef"))

End(Paragraph)

End(FootnoteDefinition(Borrowed("bar")))

That is, when content is placed on the same line where the definition begins, the parser is happy. Unfortunately, at least one code formatter in VSCode for Markdown (maybe the built-in one?) likes to format long definitions indented below the [^foo]: opener.
opened by allenap 1
Support math extension
This PR adds support for mathematical expressions, which GitHub introduced recently. The extension is enabled by the Options::ENABLE_MATH flag.

This extension supports both inline-level and block-level expressions:

Inline-level: $\sqrt{3x-1}+(1+x)^2$ Block-level: $$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$$

They are rendered as follows on GitHub:

Inline-level: $\sqrt{3x-1}+(1+x)^2$

Block-level:

$$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$$
opened by rhysd 2

Interrupt paragraph continuing when the line is footnote definition

Fix #618

With this PR, a footnote definition no longer requires a blank line before it.

This is ok
[^1]: Previous line is not blank but it's ok
[^2]: This line is also ok

This PR also flattens footnote definitions. For example,

[^a]: outer
> [^b]: They also cannot be inside anything else.

is parsed into the following HTML with current master branch:

<div class="footnote-definition" id="a"><sup class="footnote-definition-label">1</sup>
<p>outer</p>
<blockquote>
<p><sup class="footnote-reference"><a href="#b">2</a></sup>: They also cannot be inside anything else.</p>
</blockquote>
</div>

But with this PR, it is parsed into the following HTML:

<div class="footnote-definition" id="a"><sup class="footnote-definition-label">1</sup>
<p>outer</p>
</div>
<blockquote>
<p><sup class="footnote-reference"><a href="#b">2</a></sup>: They also cannot be inside anything else.</p>
</blockquote>

I'm not sure this is correct behavior, but GitHub's Markdown renderer flattens the definitions as follows:

Test ^a [^b]

[^b]: They also cannot be inside anything else.

opened by rhysd 0

Parsing `FootnoteDefinition` is broken when definitions are not separated by a blank line

Repro

Convert the following Markdown input to HTML with --enable-footnotes option.

Here is a simple footnote[^1].

A footnote can also have footnote[^2].

[^1]: My reference 1.
[^2]: My reference 2.

Expected behavior

<p>Here is a simple footnote<sup class="footnote-reference"><a href="#1">1</a></sup>.</p>
<p>A footnote can also have footnote<sup class="footnote-reference"><a href="#2">2</a></sup>.</p>
<div class="footnote-definition" id="1"><sup class="footnote-definition-label">1</sup>
<p>My reference 1.</p>
</div>
<div class="footnote-definition" id="2"><sup class="footnote-definition-label">2</sup>
<p>My reference 2.</p>
</div>

Actual behavior

<p>Here is a simple footnote<sup class="footnote-reference"><a href="#1">1</a></sup>.</p>
<p>A footnote can also have footnote<sup class="footnote-reference"><a href="#2">2</a></sup>.</p>
<div class="footnote-definition" id="1"><sup class="footnote-definition-label">1</sup>
<p>My reference 1.
<sup class="footnote-reference"><a href="#2">2</a></sup>: My reference 2.</p>
</div>

Only one FootnoteDefinition event was emitted, and a FootnoteReference event was incorrectly emitted while parsing the footnote definition.

Events while parsing the above input are as follows:

0..31: Start(Paragraph)
0..25: Text(Borrowed("Here is a simple footnote"))
25..29: FootnoteReference(Borrowed("1"))
29..30: Text(Borrowed("."))
0..31: End(Paragraph)
32..71: Start(Paragraph)
32..65: Text(Borrowed("A footnote can also have footnote"))
65..69: FootnoteReference(Borrowed("2"))
69..70: Text(Borrowed("."))
32..71: End(Paragraph)
72..116: Start(FootnoteDefinition(Borrowed("1")))
78..116: Start(Paragraph)
78..93: Text(Borrowed("My reference 1."))
93..94: SoftBreak
94..98: FootnoteReference(Borrowed("2"))
98..115: Text(Borrowed(": My reference 2."))
78..116: End(Paragraph)
72..116: End(FootnoteDefinition(Borrowed("1")))
EOF

Note

It seems that the parser assumes a blank line separates footnote definitions. When I added a blank line between [^1] and [^2], the two footnote definitions were parsed correctly.

Here is a simple footnote[^1].

A footnote can also have footnote[^2].

[^1]: My reference 1.

[^2]: My reference 2.

I confirmed the input is rendered correctly on GitHub as follows:

Here is a simple footnote[^1].

It can also have another footnote[^2].

[^1]: My reference 1. [^2]: My reference 2.

opened by rhysd 2

Is there any plan to support custom markdown directives?

I have a use case where we need to support markdown directives. Is there support for this? Or is it something I could contribute?

This is what I mean: https://talk.commonmark.org/t/generic-directives-plugins-syntax/444

The idea is that we could define and plug these new directives into the AST and define custom renderers for them.

opened by tonyalaribe 0

Releases(v0.9.2)

v0.9.2(Jul 26, 2022)

This release includes fixes for a few panics and other minor bugs.
v0.9.1(Jan 17, 2022)

Fixes minor parsing bug in nested lists.
v0.9.0(Dec 22, 2021)
This release brings a number of changes.

New features

Thanks to @lo48576, pulldown now optionally supports custom IDs and classes for headings. Set ENABLE_HEADING_ATTRIBUTES in the options to enable this.

Users can now access reference definitions, information that was previously only exposed internally.

Pulldown is now CommonMark 0.30 compliant.

Changes

The function signature for the broken link callback has changed slightly to allow for FnMut functions.

There have also been a number of (small) parsing bug fixes.
v0.8.0(Sep 1, 2020)

This release brings support for markdown smart punctuation. Further, it comes with a renewed design for broken link callbacks. Finally, it fixes a few minor parsing bugs.
v0.7.2(Jul 2, 2020)
Changes:

Minor parsing fixes

v0.7.0(Feb 12, 2020)

Minor parsing fixes and bug fixes. Now exposes the difference between delimited code blocks and indented code blocks.
v0.6.1b(Nov 11, 2019)

Minor parsing fixes.
v0.6.0(Sep 6, 2019)
This is a backward incompatible release. However, most users will not experience any breakage. It also fixes some parser correctness bugs.

Breaking changes:

the get_offset method on the parser was removed. Its semantics were poorly defined and only provided users with the start offset of the next event. To get proper source mapping information which includes the entire source range for each event, upgrade the Parser to an OffsetIter using the into_offset_iter method. This produces an iterator over (Event, Range<usize>) tuples.

the Event::HtmlBlock and Event::InlineHtml event variants were removed. Inline HTML is now represented by regular HTML events.

horizontal rules are now events, and no longer (empty) tags.

Event::Header(i32) has been replaced by Event::Heading(u32).

the starting index of numbered lists is now represented by a u64 instead of a usize.

the FIRST_PASS option has been removed.

v0.5.3(Jul 18, 2019)
Changes:

Addresses rare panics in emphasis routine

Fixes some parser correctness issues

Small bugfixes

v0.5.2(May 28, 2019)
Changes:

bug fixes

improved parsing correctness

v0.5.1(May 13, 2019)
Changes:

removes last remaining unsafe block in default mode (without simd feature);

various bug fixes and guards against quadratic behavior;

very minor performance bumps.

v0.5.0(Apr 24, 2019)
Additions:

CommonMark 0.29 compatibility

SIMD accelerated parsers feature

Guards against known pathological inputs causing quadratic scanning time

Speed improvements

Changes:

Code spans are no longer tags, but are now events containing a single CowStr. This is a breaking change.

v0.4.1(Apr 12, 2019)

Minor release with a number of small bug fixes. No breaking changes.
v0.4.0(Mar 18, 2019)

New extensions (strikethrough, task lists), public CowStr and InlineStr and some small fixes.

This is not backward compatible with v0.3.0, but the changes should be very manageable.