A complete port of harfbuzz's shaping algorithm to Rust

Overview

rustybuzz

rustybuzz is a complete port of harfbuzz's shaping algorithm to Rust.

Matches harfbuzz v2.7.0

Why?

Because you can add rustybuzz = "*" to your project and it just works. No need for a C++ compiler. No need to configure anything. No need to link to system libraries.

Conformance

rustybuzz passes 98% of harfbuzz tests (1764 to be more precise). So it's mostly identical, but there are still some tiny edge cases that are not implemented yet or cannot be implemented at all.

Major changes

  • Subsetting removed.
  • TrueType parsing has been implemented from scratch, mostly on the ttf-parser side. And while the parsing algorithm is very different, it's not better or worse, just different.
  • Malformed fonts will cause an error. HarfBuzz uses a fallback/dummy shaper in this case.
  • No font size property. Shaping always uses UnitsPerEm. You should scale the result manually.
  • Most of the TrueType and Unicode handling code was moved into separate crates.
  • rustybuzz doesn't interact with any system libraries and must produce exactly the same results on all operating systems and targets.
  • The mort table is not supported, since it's deprecated by Apple.
  • No Arabic fallback shaper, since it requires subsetting.
  • No graphite library support.
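
Because shaping results are expressed in font units, scaling to a concrete font size is left to the caller. A minimal sketch of that scaling follows; the function names `units_to_pixels` and `run_width_px` are hypothetical and not part of rustybuzz, and the advances are assumed to be plain integers in font units.

```rust
/// Converts a value in font design units to pixels.
/// `units_per_em` comes from the font, `font_size_px` is the desired em size.
/// (Hypothetical helper, not a rustybuzz API.)
pub fn units_to_pixels(value: i32, units_per_em: u16, font_size_px: f32) -> f32 {
    value as f32 * font_size_px / units_per_em as f32
}

/// Sums the scaled advances of a shaped run to get its width in pixels.
pub fn run_width_px(advances: &[i32], units_per_em: u16, font_size_px: f32) -> f32 {
    advances
        .iter()
        .map(|&advance| units_to_pixels(advance, units_per_em, font_size_px))
        .sum()
}
```

The same factor (`font_size_px / units_per_em`) would apply to offsets as well as advances.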

Performance

At the moment, performance isn't that great. We're 1.5-2x slower than harfbuzz. Also, rustybuzz doesn't support shaping plan caching at the moment.

See benches/README.md for details.

Notes about the port

rustybuzz is not a faithful port.

harfbuzz can roughly be split into 6 parts: shaping, subsetting, TrueType parsing, Unicode routines, custom containers and utilities (harfbuzz doesn't use C++ std), and glue for system/3rd party libraries. rustybuzz contains only shaping and some TrueType parsing. Most of the TrueType parsing was moved to ttf-parser. Subsetting was removed. Unicode code was mostly moved to external crates. We don't need custom containers because Rust's std is good enough. And we do not use any non-Rust libraries, so there is no glue code either.

In the end, we still have around 23 KLOC, while harfbuzz is around 80 KLOC.

Lines of code

As mentioned above, rustybuzz has around 23 KLOC. But this is not strictly true, because there are a lot of auto-generated data tables.

You can find the "real" code size using:

tokei --exclude unicode_norm.rs --exclude complex/vowel_constraints.rs \
      --exclude '*_machine.rs' --exclude '*_table.rs'

Which gives us around 16 KLOC, which is still a lot.

Future work

Since the port is finished, there is not much to do other than syncing it with new harfbuzz releases. But there is still a lot of room for performance optimizations and refactoring.

Also, despite the fact that harfbuzz has a vast test suite, there are still a lot of things left to test.

Safety

The library is completely safe.

We do have one unsafe block to cast between two POD structures. Apart from that, there is no unsafe code in this library or in most of its dependencies (excluding bytemuck).

License

rustybuzz is licensed under the MIT license.

harfbuzz is licensed under the Old MIT license.

Comments
  • API Qs/feedback

    Hey @RazrFalcon, just looking over your API for use in KAS-text and I have a few comments.

    I notice that Face is a wrapper around ttf-parser::Face, which KAS-text already has an instance of. I can supply the data pointer and face index no problem, but I could also directly supply a &'a Face (or possibly even just store the rustybuzz::Face, since the embedded ttf-parser::Face could still be accessed). But this isn't really important.

    More significant, KAS-text stores text-runs with a FontId (index in internal list) plus a size (DPU aka pixels per font unit). I see your Face object embeds the size. For my purposes would storing a single Face and adjusting the size before each shaping run be the best option?

    Your GlyphBuffer reports codepoint and cluster. KAS-text needs the text-index of each glyph (required for editing support), and trying to reconstruct this from codepoints is not ideal.

    I see you use the same UnicodeBuffer ←→ GlyphBuffer model as HarfBuzz. I'm not sure how best to cache this type since it is consumed by value in the shape method. Maybe it would be better if UnicodeBuffer and GlyphBuffer were wrappers around Box<Buffer> to avoid large copies? Or does the optimiser avoid this issue anyway?

    opened by dhardy 22
  • Port OpenType Layout.

    This ports all of GSUB and GPOS and uses ttf_parser's APIs for GDEF.

    Notable changes:

    • GDEF, GSUB & GPOS fully in Rust. rb_ot_face_t no longer knows about them.
    • Ports the API exposed by OpenType layout (originally in hb-ot-layout.cc, now in src/ot/layout/mod.rs).
    • Ports a small bit of kerning (from hb-kern.hh) because it depended on rb_ot_apply_context and rb_ot_apply_context::skipping_iterator_t which are stack-allocated and fully in Rust now, so FFI would have been complicated.
    • Removes hb_set_t. This was apparently only used for lookup acceleration (basically collecting the coverage of lookups into sets), which I removed for now.
    • Reorganizes buffer/glyph-info var allocation accessors.
    • Harfbuzz did GDEF blocklisting for some fonts. Since I use ttf_parser's APIs for this, I couldn't port that.

    Note that a lot of the original GSUB/GPOS code was unused from the beginning (originally used for harfbuzz's OpenType API). I removed all of that in the first commit.

    This PR depends on a few changes in ttf_parser. Basically, two helpers to read data at an offset, making the get method on Offsets16 public and DynArray. These changes can be found here. For now, I rewired ttf_parser in rustybuzz's Cargo.toml to that fork. I didn't open a PR there for now because I'm not sure on how you plan to integrate with ttf_parser since the crates.io version doesn't have a public parser module.

    I apologize for the PR being this huge :/

    opened by laurmaedje 21
  • Port normalization.

    Okay, I ported the normalization. This one was already a bit trickier because it was more intertwined, especially with the compose/decompose function pointers through FFI.

    Some observations:

    • The existing complex shaper's compose/decompose functions used Rust's bool through FFI, which I think was incorrect. I changed it to rb_bool_t.
    • I had to rename _rb_glyph_info_set_unicode_props to init_unicode_props in Rust because it would have clashed with the existing set_unicode_props. Furthermore, the logic is duplicated in Rust and C because there wasn't any GlyphInfo FFI so far and I didn't think it was worth it.
    • I also had to do a minimal port of rb_ot_complex_shaper_t to get some properties needed for normalization.

    I think, maybe the main shaping logic could be ported next, but I have to look more closely whether there are still any obstacles that need to be taken care of before.

    opened by laurmaedje 16
  • Port main shaping logic.

    This ports the main shaping logic, including the map, shape plan and complex shaper data structures. Basically, the only things that remain in C++ are AAT and kerning.

    Notes:

    • I merged rb_shape_plan_t and rb_ot_shape_plan_t into a single thing because the shape plan was just a thin wrapper around the OpenType shape plan.
    • There is a new aat module, but that's just a thin layer between Rust and C++. The AAT-Map is stack allocated in the shape plan on the Rust side and accessed on the C++ side. I took care that both sides have the same memory size, but I don't know if there are any pitfalls with doing it this way.
    • Shaper-specific data is stored in a Box<dyn Any>. This means that the data has to be 'static. This was mostly not a problem, except for the indic shape plan that had a reference to some lookups in the ot_map. Because the data and map are both stored in the global shape plan, this didn't work anymore (in safe Rust, without pinning or some other trickery), so I replaced the lookup slice with a Range<usize>.
    • I moved fallback shaping and normalization from the ot module into the root module because from what I saw in the shaping logic they didn't seem to be ot-specific.
    • The coords vector in Face wasn't needed anymore since I can now use ttfp_face.variation_coordinates().
    • I removed a lot of pub(crate) that was spread onto everything that touches a Buffer. Since the Buffer is not re-exported, normal pub is fine I think.
    • A lot of the rb_buffer_... functions are probably not needed anymore, but I kept them for now.
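
The `Box<dyn Any>` point above can be illustrated with a small sketch. It assumes simplified, hypothetical types (`Lookup`, `IndicData`, `ShapePlan`) that only mirror the idea described in the notes, not the actual rustybuzz structures: the shaper data stores a `Range<usize>` instead of a borrowed slice, so it stays `'static`, and the range is resolved against the owning plan on demand.

```rust
use std::any::Any;
use std::ops::Range;

/// Hypothetical stand-in for a lookup entry stored in the map.
#[derive(Debug, PartialEq)]
pub struct Lookup {
    pub index: u16,
}

/// Shaper-specific data. It stores a range rather than `&[Lookup]`,
/// so it is `'static` and can be placed in a `Box<dyn Any>`.
pub struct IndicData {
    pub lookups: Range<usize>,
}

/// Hypothetical plan that owns both the lookups and the shaper data.
pub struct ShapePlan {
    pub lookups: Vec<Lookup>,
    pub data: Option<Box<dyn Any>>,
}

impl ShapePlan {
    /// Downcasts the stored data and resolves the range back into a slice.
    pub fn indic_lookups(&self) -> Option<&[Lookup]> {
        let data = self.data.as_ref()?.downcast_ref::<IndicData>()?;
        self.lookups.get(data.lookups.clone())
    }
}

/// Builds a small plan whose shaper data points at lookups 1 and 2.
pub fn example_plan() -> ShapePlan {
    ShapePlan {
        lookups: (0..5).map(|index| Lookup { index }).collect(),
        data: Some(Box::new(IndicData { lookups: 1..3 })),
    }
}
```

Storing indices avoids a self-referential struct at the cost of a bounds check each time the slice is resolved.
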
    opened by laurmaedje 8
  • Shape failed because different `glyph_index` from `ttf-parser`

    In my code, I pick a face using Face::glyph_index (from ttf-parser) to shape "⚛", but shaping failed. I found that the lookup logic differs between ttf-parser and rustybuzz.

    The rustybuzz face only resolves the glyph ID through prefered_cmap_encoding_subtable:

        pub(crate) fn glyph_index(&self, c: u32) -> Option<GlyphId> {
            let subtable_idx = self.prefered_cmap_encoding_subtable?;
            let subtable = self.tables().cmap?.subtables.get(subtable_idx)?;
            match subtable.glyph_index(c) {
                Some(gid) => Some(gid),
                None => {
                 ...
                }
            }
        }
    

    And ttf-parser iterates over all subtables:

        pub fn glyph_index(&self, code_point: char) -> Option<GlyphId> {
            for encoding in self.tables.cmap?.subtables {
                if !encoding.is_unicode() {
                    continue;
                }
    
                if let Some(id) = encoding.glyph_index(u32::from(code_point)) {
                    return Some(id);
                }
            }
    
            None
        }
    

    Is this a bug? The font face contains this glyph, but it is not used to shape it. Or should I use another way to find a suitable font for shaping?
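
To make the difference between the two lookup strategies concrete, here is a minimal sketch using a hypothetical, simplified `Subtable` type (a plain code point to glyph ID map). It is not the ttf-parser or rustybuzz implementation, and it leaves out the fallback branch that is elided in the snippet above.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified cmap subtable: code point -> glyph ID.
pub struct Subtable {
    pub is_unicode: bool,
    pub map: HashMap<u32, u16>,
}

/// Strategy A: consult only one pre-selected subtable.
pub fn glyph_index_preferred(
    subtables: &[Subtable],
    preferred: Option<usize>,
    c: u32,
) -> Option<u16> {
    let subtable = subtables.get(preferred?)?;
    subtable.map.get(&c).copied()
}

/// Strategy B: try every Unicode subtable in order and return the first hit.
pub fn glyph_index_any(subtables: &[Subtable], c: u32) -> Option<u16> {
    subtables
        .iter()
        .filter(|s| s.is_unicode)
        .find_map(|s| s.map.get(&c).copied())
}

/// Two subtables where only the second one maps U+269B.
pub fn example_subtables() -> Vec<Subtable> {
    vec![
        Subtable { is_unicode: true, map: HashMap::from([(0x41, 1)]) },
        Subtable { is_unicode: true, map: HashMap::from([(0x41, 1), (0x269B, 7)]) },
    ]
}
```

With this data, a face that prefers the first subtable reports no glyph for U+269B, while iterating over all subtables finds one, which matches the mismatch described above.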

    opened by M-Adoo 7
  • Expose UNSAFE_TO_BREAK flag in public API?

    It would be useful if the UNSAFE_TO_BREAK glyph flag were exposed in the public API so that layout algorithms can take it into account. From my understanding, it can enable line-breaking algorithms to avoid reshaping substrings multiple times or making simplifying incorrect assumptions such as that spaces do not participate in shaping.

    It seems like the simplest approach would be to expose the mask field of GlyphInfo as a bitflags struct, but I wanted to check if this was a good approach before filing a pull request.
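
As a rough illustration of how a layout algorithm might consume such a flag, the sketch below collects the clusters at which a run could be split without reshaping. The `Glyph` struct and `safe_break_clusters` function are hypothetical and not the rustybuzz `GlyphInfo` API; the sketch assumes glyphs of the same cluster are adjacent and that a cluster is unsafe to break at if any of its glyphs carries the flag.

```rust
/// Hypothetical glyph record; not the rustybuzz `GlyphInfo` type.
pub struct Glyph {
    pub cluster: u32,
    pub unsafe_to_break: bool,
}

/// Returns the clusters at whose start the run can be split without reshaping.
/// Assumes glyphs belonging to the same cluster are adjacent in `glyphs`.
pub fn safe_break_clusters(glyphs: &[Glyph]) -> Vec<u32> {
    let mut result = Vec::new();
    let mut i = 0;
    while i < glyphs.len() {
        let cluster = glyphs[i].cluster;
        let mut unsafe_here = false;
        // Consume all consecutive glyphs that belong to this cluster.
        while i < glyphs.len() && glyphs[i].cluster == cluster {
            unsafe_here |= glyphs[i].unsafe_to_break;
            i += 1;
        }
        if !unsafe_here {
            result.push(cluster);
        }
    }
    result
}
```

A line breaker could intersect these clusters with its own break opportunities and only reshape when it has to break at a cluster that is not in the list.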

    P.S. I'm very excited about this library; great work.

    opened by glowcoil 7
  • Small refactoring

    Here are a few small improvements. I removed the unused buffer functions, used GlyphInfo::{as_char, as_glyph} for less try_from + unwrap, wrapped the AAT and kern functions I missed before and moved the horizontal-to-vertical char mapping from shape.rs to unicode.rs.

    PS: Would you want an aat feature-gate upstream or not? Otherwise, I would just remove AAT from my fork. And do you plan to make a new crates.io release soon or only once AAT is done?

    opened by laurmaedje 7
  • Reaching ttf_parser's face.

    @RazrFalcon I've ported all the GSUB/GPOS-specific subtables (you can have a look here) and am now thinking about how to port the remaining code in the hb-ot-layout-* files. In doing that, I want to use some GDEF-related functions from ttf_parser. But it turns out that hb-ot-layout.h and hb-ot-layout.hh expose a lot of functions, which only take rb_face_t (and not rb_font_t). The ttfp_face, however, is stored in the font and not in the face and thus unreachable.

    Now, I basically see two options:

    • Either somehow move the ttfp_face into the rb_face_t (not that easy because rb_face_t is still in C++),
    • or change a lot of occurrences all the way up the call chain to rb_font_t and basically not make the font/face distinction (which rustybuzz currently doesn't do anyway at the API level).

    What do you think?

    (If you have any other comments on the code I've produced so far, feel also free to review a bit.)

    opened by laurmaedje 7
  • Coordinate with otf-fea-rs?

    https://github.com/wrl/otf-fea-rs/ is a Rust-based compiler of FEA into GSUB/GPOS, while https://github.com/RazrFalcon/rustybuzz is a Rust-based OT shaper.

    Perhaps @wrl and @RazrFalcon could somehow coordinate so that the two libraries can interoperate. No specific idea yet but I can well imagine this to be beneficial.

    opened by twardoch 7
  • Port fallback shaper.

    Since I'm excited to use this in a full-Rust WASM scenario, I decided to try and tackle one of the things on the roadmap that were marked as easy. I tried making the most straightforward, parallel adaption of the original C++ to make it not too hard to review. (Part of this is that I kept the order of functions as originally, even though I guess in Rust the general -> specific order is more common than the C-compiler-inflicted specific -> general.)

    There are some things I wasn't totally sure about; I marked these with NOTE(laurmaedje). One particular thing I hope I didn't confuse is the Modified and Canonical combining classes.

    opened by laurmaedje 5
  • Rustybuzz 0.5.2 breaks compatibility with 0.5.1

    Version 0.5.2 includes commit https://github.com/RazrFalcon/rustybuzz/commit/9e65c89c4dad60aa896ec3ed293773ce2a79d448 that bumps the ttf-parser dependency to a new major version. Since ttf_parser::Face is publicly re-exported via the Deref impl for rustybuzz::Face, this results in build issues in downstream dependencies.

    I understand that this is a conceptual problem and that the added module re-export is an attempt at solving it, so that applications no longer have to depend on both ttf-parser and rustybuzz with possibly incompatible versions.

    But meanwhile, for any apps that use rustybuzz = "0.5" in their Cargo.toml and also use ttf-parser functions, the build likely breaks, because they also have ttf-parser = "0.15" in their Cargo.toml and that's now incompatible with 0.17 as per the above linked commit.

    I'd be happy to elaborate on the concrete build issue, in case that's unclear. In terms of remedy, I suggest releasing 0.5.3 with ttf-parser downgraded again and releasing 0.6 of rustybuzz with the newer ttf-parser, in case you agree that this was a breaking release.

    opened by tronical 4
  • Allow specifying `item_offset` and `item_length` as in HarfBuzz

    HarfBuzz’s hb_buffer_add_codepoints and friends allow specifying an offset and length from the string that was passed in. This allows HarfBuzz to have the full context of the string being shaped in order to perform proper shaping across runs. I’m not sure how feasible it would be for this crate to implement similar functionality, though.

    opened by bluebear94 2
  • README does not explain "subsetting"

    Hi, I just watched the whole talk by Chris Chapman on the unicode text engine stack, but the term "subsetting" that appears in the rustybuzz README many times was still new to me. It sounded as if it was about settings, but searching the web made me believe that it's from "subsets" of fonts. Yet, the explanations I found were about fonts that contain subsets of glyphs, so it is still not clear to me why that would be a feature of a font shaping library.

    If you give me an explanation, or point me to one, I can suggest a tiny(!) change to the README in a PR.

    opened by hmeine 7
  • Misleading API in `Face::from_face`.

    The Face::from_face is defined as:

        /// Creates a new [`Face`] from [`ttf_parser::Face`].
        ///
        /// Data will be referenced, not owned.
        ///
        /// Returns `None` when face's units per EM is `None`.
        pub fn from_face(face: ttf_parser::Face<'a>) -> Option<Self> {
            Some(Face {
                units_per_em: face.units_per_em(),
                pixels_per_em: None,
                points_per_em: None,
                prefered_cmap_encoding_subtable: find_best_cmap_subtable(&face),
                gsub: face.tables().gsub.map(SubstitutionTable::new),
                gpos: face.tables().gpos.map(PositioningTable::new),
                ttfp_face: face,
            })
        }
    

    This function always returns Some and I believe the docs are invalid, as face.units_per_em() returns u16.

    opened by wdanilo 1
  • Test failed due to ambiguous python shebang

    *** ERROR: ambiguous python shebang in /usr/share/cargo/registry/rustybuzz-0.5.0/scripts/gen-vowel-constraints.py: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.
    *** ERROR: ambiguous python shebang in /usr/share/cargo/registry/rustybuzz-0.5.0/scripts/gen-universal-table.py: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.
    mangling shebang in /usr/share/cargo/registry/rustybuzz-0.5.0/scripts/gen-unicode-norm-table.py from /usr/bin/env python3 to #!/usr/bin/python3
    *** ERROR: ambiguous python shebang in /usr/share/cargo/registry/rustybuzz-0.5.0/scripts/gen-tag-table.py: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.
    *** ERROR: ambiguous python shebang in /usr/share/cargo/registry/rustybuzz-0.5.0/scripts/gen-shaping-tests.py: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.
    *** ERROR: ambiguous python shebang in /usr/share/cargo/registry/rustybuzz-0.5.0/scripts/gen-indic-table.py: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.

    https://download.copr.fedorainfracloud.org/results/remilauzier/rust-unicode/fedora-rawhide-x86_64/03707746-rust-rustybuzz/builder-live.log.gz

    Fedora needs an explicit Python version to be declared so that it knows which interpreter to use.

    opened by ghost 2
  • Is it possible to push `char` into `UnicodeBuffer`?

    I want to push some chars into a UnicodeBuffer. For now, I have to collect them into a String and then use push_str.

    However, push_str seems to just decode the string into chars, so I think a fn push(ch: char) method should be possible, or at least fn push_chars(chars: impl Iterator<Item = char>), but I'm not sure about the cluster values.
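
On the cluster question, one plausible approach is sketched below. It assumes that a cluster value is the UTF-8 byte offset of the character, as it would be if the same characters were pushed as a single string; this is an assumption, not confirmed rustybuzz behavior, and `chars_with_clusters` is a hypothetical helper.

```rust
/// Pairs each char with a cluster value, assumed here to be the byte offset
/// the char would have in the UTF-8 encoding of the whole sequence.
/// (Hypothetical helper, not a rustybuzz API.)
pub fn chars_with_clusters<I>(chars: I) -> Vec<(char, u32)>
where
    I: IntoIterator<Item = char>,
{
    let mut offset = 0u32;
    let mut out = Vec::new();
    for ch in chars {
        out.push((ch, offset));
        // Advance by the number of bytes this char occupies in UTF-8.
        offset += ch.len_utf8() as u32;
    }
    out
}
```

Under that assumption, the result matches what `str::char_indices` yields for the equivalent string, so no intermediate String would be needed.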

    opened by Riey 2
  • Using `Face` with multiple font sizes

    I'm storing a Face in an Arc that is shared across multiple threads. When I need to perform shaping, I call Face::clone and set the font size on the cloned Face. This clone is not entirely free because Face contains a Vec.

    I would like for shape to take an additional argument that contains all of the mutable properties of a Face so that Face can be treated as immutable after construction. E.g.

    struct FaceConfig {
        pub pixels_per_em: Option<(u16, u16)>,
        pub points_per_em: Option<f32>,
    }
    
    pub fn shape(face: &Face, config: &FaceConfig, features: &[Feature], buffer: UnicodeBuffer) -> GlyphBuffer {
    
    opened by mahkoh 2
Owner
Evgeniy Reizner