Extensible BBCode parser with scoping rules, auto close tags

Overview

More BBCode parsers?

Yeah! I needed something highly extensible, flexible, and specifically WITH scoping rules so it always produces correct HTML. For instance, there's stuff like:

[b]This is bold! [i]AND ITALIC?[/b] oops where's [/i]?

Here the closing tags don't match the order of the opening tags. A simple regex replacement might handle this in any number of ways, but this library will produce:

<b>This is bold! <i>AND ITALIC?</i></b> oops where&#x27;s ?

Another example:

And so [s] I'm just like [s] opening tons of [sup] tags? Maybe I'll close this [/s] one
And so <s> I&#x27;m just like <s> opening tons of <sup> tags? Maybe I&#x27;ll close this </sup></s> one</s>

All unclosed tags are automatically closed in the correct order, including any that were left at the end of the text. Unmatched closing tags are removed. Of course, this may not be what you want, but I've found that most older or established bbcode parsers work this way. With this library, you can feel (generally) safe knowing it will produce proper HTML.
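
To make the auto-close behaviour concrete, here is a toy sketch of the idea using only the standard library. It is NOT how this crate is implemented (the real parser is regex-based and configurable); the function names `escape_html` and `render_scoped` and the fixed tag list are assumptions made up for illustration. It keeps a stack of open tags, closes everything above a matching closing tag, drops unmatched closing tags, and closes whatever is left at the end:

```rust
/// Escape the usual HTML characters: ' " & < >
fn escape_html(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Toy scoped renderer for a handful of attribute-less tags (illustration only).
fn render_scoped(input: &str) -> String {
    const TAGS: [&str; 6] = ["b", "i", "s", "u", "sup", "sub"];
    let mut out = String::new();
    let mut open: Vec<&str> = Vec::new(); // stack of currently open tags
    let mut rest = input;
    while !rest.is_empty() {
        // Try to read `[tag]` or `[/tag]` at the current position.
        if rest.starts_with('[') {
            if let Some(end) = rest.find(']') {
                let inner = &rest[1..end];
                let (closing, name) = match inner.strip_prefix('/') {
                    Some(n) => (true, n),
                    None => (false, inner),
                };
                if let Some(tag) = TAGS.iter().copied().find(|t| *t == name) {
                    if !closing {
                        open.push(tag);
                        out.push_str(&format!("<{}>", tag));
                    } else if let Some(pos) = open.iter().rposition(|t| *t == tag) {
                        // Close the matched tag and everything opened after it,
                        // innermost first.
                        while open.len() > pos {
                            let t = open.pop().unwrap();
                            out.push_str(&format!("</{}>", t));
                        }
                    }
                    // An unmatched closing tag emits nothing and is dropped.
                    rest = &rest[end + 1..];
                    continue;
                }
            }
        }
        // Plain text: copy (escaped) up to the next '[', consuming at least one char.
        let first = rest.chars().next().map(|c| c.len_utf8()).unwrap_or(0);
        let next = rest[first..].find('[').map(|i| i + first).unwrap_or(rest.len());
        out.push_str(&escape_html(&rest[..next]));
        rest = &rest[next..];
    }
    // Close anything still open at the end of the text.
    while let Some(t) = open.pop() {
        out.push_str(&format!("</{}>", t));
    }
    out
}
```

Feeding the two inputs above through `render_scoped` gives the same HTML as shown, which is the whole point of tracking scope with a stack rather than replacing tags one by one.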

With scoping rules, you also get access to tags which can reject other tags inside of them, by specifying the only vector (more later). For instance, in the extended tagset, I have [code] which rejects all types of matches except normal text and "garbage" (characters we throw away, like \r).

Quickstart

use bbscope::BBCode;

let bbcode = BBCode::default().unwrap(); // Or catch error
let html = bbcode.parse("[url]https://github.com[/url]");
// Make sure to reuse the BBCode you created! Creating is expensive!

Or, if you want the extended set (see next section for list):

// These are just vectors, which means you can construct your own!
let mut matchers = BBCode::basics().unwrap();
let mut extras = BBCode::extras().unwrap();
matchers.append(&mut extras);

//Note: this step could be expensive, as it has to compile a couple dozen regexes
let bbcode = BBCode::from_matchers(matchers);

// Cheap copy: they share the same pre-compiled regex, share it around!
let bbcode2 = bbcode.clone();

Or, if you want to add your own tag:

use std::sync::Arc;

let mut matchers = BBCode::basics().unwrap();

// How your tag gets turned into HTML; you are given the open tag regex capture, the 
// pre-parsed pre-escaped body, and the closing tag regex capture (if the user provided it)
let emitter = |open_capture,body,_c| {
  //NOTE: in production code, don't `unwrap` the named capture group, it might not exist!
  let color = open_capture.unwrap().name("attr").unwrap().as_str();
  format!(r#"<span style="color:{}">{}</span>"#, color, body)
};

BBCode::add_tagmatcher(&mut matchers, "color", ScopeInfo::basic(Arc::new(emitter)), None, None)?;

let bbcode = BBCode::from_matchers(matchers);

The BBCode::add_tagmatcher method constructs a bbcode tag parser for you, but you can technically construct your own matcher manually which can match almost anything. For now, if you're just trying to add basic bbcode tags, you'll see in the above:

  • First parameter is the list to append the matcher to (it adds multiple items).
  • Second is the name of the bbcode tag, all lowercase (so this would match [color])
  • Third is a special "ScopeInfo" struct, but we're calling the "basic" constructor and simply passing a closure wrapped in an Arc rather than configuring the entire ScopeInfo.
  • That closure is a so-called EmitScope, which gives you the regex capture for the open tag, the pre-html-escaped, pre-parsed body, and the closing tag regex capture, which you can use to output (emit) the constructed html. Note that, although the opening tag is nearly always given, the closing tag is OFTEN not given, especially if the user did not close their tags. Do not rely on the last parameter (_c in the example) existing.
  • Note that the opening tag capture has a named group called attr, which is the value of the attribute given in the bbcode tag. For instance, if you had [url=http://whatever]abc[/url], the match attr would house the string http://whatever (NOT pre-escaped, be careful!)
  • The last two parameters are optional newline consumption before and after the opening and closing tag. For instance, if you wanted to consume the first newline before the opening tag, and the first newline after the closing tag, those two might look like Some((1,0)), Some((0,1)) (this may change in the future)
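
Since the attr value is not pre-escaped and the captures may be missing, the body of an emitter probably wants to be a little more defensive than the example above. Here's a minimal sketch of that logic as plain functions, independent of this crate: `escape_attr` and `emit_color` are hypothetical names, and `attr` stands in for whatever you pulled out of the opening capture (or `None` if it wasn't there):

```rust
/// Escape the usual HTML characters: ' " & < >
fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Hypothetical stand-in for the body of a [color] emitter.
/// `attr` is the raw, unescaped attribute value (if any);
/// `body` is assumed to be already parsed and escaped.
fn emit_color(attr: Option<&str>, body: &str) -> String {
    match attr {
        // Escape the attribute before placing it inside the quoted HTML attribute.
        Some(color) => format!(
            r#"<span style="color:{}">{}</span>"#,
            escape_attr(color),
            body
        ),
        // No attribute given: fall back to emitting the body unchanged.
        None => body.to_string(),
    }
}
```

Escaping only stops the value from breaking out of the HTML attribute; for something like a CSS color you may also want to validate the value itself (for example, only allow letters, digits and #).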

Rocket Web example

There are many web frameworks to choose from for Rust, so having an example for each would be a bit difficult. Someone suggested Rocket, so here's an example in 0.5.0-rc.2:

#[macro_use] extern crate rocket;
use rocket::State;
use rocket::response::content;
use bbscope::BBCode;

#[launch]
fn rocket() -> _ {
    let bbcode = BBCode::default().unwrap(); // Or catch error
    rocket::build()
      .mount("/", routes![ index ])
      .manage(bbcode) //Add as state, you want to reuse the bbcode object!!
}

#[get("/")]
fn index(bbcode: &State<BBCode>) -> content::RawHtml<String> {
  content::RawHtml(String::from(bbcode.parse("Hey, it's [b]bbcode[/b]! [i]Oops, [u]forgot to close[/i] a tag")))
}

Default supported tags:

BBCode is so varied, there's so many crazy tags and systems and nobody was ever able to agree on any, or at least if they did, it was too late. These are the tags supported in the 'basic' set:

  • [b]bold[/b]
  • [i]italic[/i]
  • [s]strikethrough[/s]
  • [u]underline[/u]
  • [sup]superscript[/sup]
  • [sub]subscript[/sub]
  • [url=link*]url[/url] (=link attribute optional)
  • [img=link*]link[/img] (only one needed: attribute or inner)
  • [list][*]item[*]item2[/list]

Some of those may be nonstandard, and you may be missing some you find standard! If so, there's also an optional extended list:

  • [quote=cite*]a blockquote[/quote] (=cite attribute optional)
  • [code]verbatim pre[/code]
  • [icode]verbatim inline[/icode]
  • [youtube]youtube link*[/youtube] (CURRENTLY RENDERS AS LINK!)
  • [h1]big header[/h1]
  • [h2]medium header[/h2]
  • [h3]small header[/h3]
  • [anchor=name]some text linkable with #name[/anchor]
  • [spoiler=name]some text to hide[/spoiler]

And of course, the usual HTML characters are escaped everywhere: ', ", &, <, >

URLs not inside a url or img tag are auto-linked, or at least a best attempt is made at autolinking them (your mileage may vary)

Caveats:

  • Output removes \r but RETAINS \n rather than replacing with <br>. This was how an old bbcode parser I was using worked, and this was written to replace that. If there's a need, I can add modes for \n vs <br>
  • Performance was not a main concern, although you can enable additional performance features with the perf feature (enables some regex optimizations, about a 4x improvement in my testing)
  • Many rules are arbitrary and meant to copy an existing bbcode parser I used for many years
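
If you do need <br> in the output today, one option is to post-process the HTML string yourself. A minimal sketch, assuming the output only contains \n (since \r is already removed); `newlines_to_br` is a hypothetical helper, not part of this crate:

```rust
/// Replace every retained newline in already-rendered HTML with a <br> tag.
/// Note: this is a blunt replacement, so it also rewrites newlines inside
/// verbatim sections such as rendered [code] blocks.
fn newlines_to_br(html: &str) -> String {
    html.replace('\n', "<br>")
}
```

Because this also touches newlines inside verbatim sections, it's probably only suitable if you stick to the basic tag set or don't mind <br> inside code output.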

Changelog:

  • 0.0.6: Small bugfix for conditional compilation
  • 0.1.0: Full rewrite; if using BBCode::default(), or BBCode::basics() and BBCode::extras(), it should still be compatible, but if you were creating custom tags at all, the entire system was scrapped in favor of the ScopeInfo and EmitScope combo
  • 0.1.1: Small bugfix to enforce Sync + Send on closures (so bbcode can be used across threads)
  • 0.1.2: Added class to "code" segments
  • 0.1.3: Added ability to convert a bbcode parser into one that only consumes the scoped tags it had
  • 0.1.4: Added secondary syntax I've seen around: [tag tag=attribute]

Future

I mostly published this for my own projects, which have specific requirements, but if for some reason this gets picked up and used and there are gaps or bugs or other required features, I'd be willing to work on it! I just don't see that happening lol

Owner
Carlos Sanchez