Translation support for mdbook. The plugins here give you a structured way to maintain a translated book.

Overview

Gettext Translation Support for mdbook

The plugins here make it easy to translate documentation written in mdbook into multiple languages. Support for translations is a long-standing feature request for mdbook.

They've been used successfully to translate Comprehensive Rust 🦀.

Installation

Run

$ cargo install mdbook-i18n-helpers

to install the two binaries in this repository:

  • mdbook-xgettext: This program extracts the source text. It is an mdbook renderer.
  • mdbook-gettext: This program translates the book into a target language. It is an mdbook preprocessor.

Together, the two programs make it possible to do i18n for mdbook in a standard and maintainable way.

Gettext Overview

We use the Gettext system for translations. This system is widely used for translations of open source software and it also works reasonably well for documentation.

The advantage of Gettext is that you get a structured way to approach the translations. Instead of copying Markdown files and tracking changes by hand, you modify .po files in a po/ directory. The .po files are small text-based translation databases. You update the .po files using tools (described below) and you can see at a glance how much text still needs to be translated.
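As a rough illustration of the "at a glance" check: GNU msgfmt prints a one-line summary when called with --statistics. The sketch below turns that summary into a percentage. The function name percent_translated is hypothetical and not part of these plugins, and the parsing assumes the English wording msgfmt uses under LC_ALL=C.

```shell
# Hypothetical helper: reads one msgfmt --statistics summary line on stdin,
# for example
#   12 translated messages, 3 fuzzy translations, 5 untranslated messages.
# and prints the translated share as a whole-number percentage.
# Fuzzy entries are counted as not translated.
percent_translated() {
    awk '{
        translated = 0; fuzzy = 0; untranslated = 0
        # Each count is the field right before its label.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^translated/)   translated = $(i - 1)
            if ($i ~ /^fuzzy/)        fuzzy = $(i - 1)
            if ($i ~ /^untranslated/) untranslated = $(i - 1)
        }
        total = translated + fuzzy + untranslated
        if (total > 0) print int(100 * translated / total); else print 0
    }'
}

# Usage (assumes GNU Gettext is installed; the summary goes to stderr):
#   LC_ALL=C msgfmt --statistics -o /dev/null po/xx.po 2>&1 | percent_translated
```

Counting fuzzy entries as untranslated matches how mdbook-gettext treats them when it builds the book.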

Tip: You should never edit the .po files by hand. Instead use a PO editor, such as Poedit. There are also several online editors available. This will ensure that the file is encoded correctly.

There is a .po file for each language. They are named after the ISO 639 language codes: Danish would go into po/da.po, Korean would go into po/ko.po, etc. The .po files contain all the source text plus the translations. They are initialized from a messages.pot file (a PO template) which contains the extracted source text from your mdbook project.

If your source files are in English, then the messages.pot file will contain the English text and your translators will be translating from English into their target language.

We will show how to update and manipulate the .po and .pot files using the GNU Gettext utilities below.

Creating and Updating Translations

First, you need to know how to update the .pot and .po files.

As a general rule, you should never touch the auto-generated po/messages.pot file. You should not even check it into your repository since it can be fully generated from your source Markdown files.

You should also never edit the msgid entries in a po/xx.po file. If you find mistakes, you need to update the original text instead. The fixes to the original text will flow into the .po files the next time the translators update them.

Generating the PO Template

To extract the original text and generate a messages.pot file, you run mdbook with the mdbook-xgettext renderer:

$ MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot"}}' \
  mdbook build -d po

You will find the generated POT file as po/messages.pot.

Initialize a New Translation

To start a new translation for a fictional xx locale, first generate the po/messages.pot file. Then use msginit to create a xx.po file:

$ msginit -i po/messages.pot -l xx -o po/xx.po

You can also simply copy po/messages.pot to po/xx.po if you don't have msginit from the GNU Gettext tools available. If you do that, then you have to update the header (the first entry with msgid "") manually to the correct language.
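If you go the copy route, the language part of the header edit can be scripted. This is a minimal sketch, assuming the header already contains a "Language: ...\n" line; if it does not, add the line by hand instead. The name set_po_language is hypothetical. msginit also fills in other header fields, which this sketch leaves alone.

```shell
# Hypothetical helper: rewrite the "Language: ...\n" header line of a PO file.
# Usage: set_po_language xx po/xx.po
set_po_language() {
    lang=$1
    po_file=$2
    tmp_file=$(mktemp) || return 1
    # In the file, the header line ends with a literal backslash followed by
    # "n" and a closing quote, so the backslash is escaped for sed.
    # Every line of exactly this form is rewritten; normally only the header has one.
    sed 's/^"Language: .*\\n"$/"Language: '"$lang"'\\n"/' "$po_file" > "$tmp_file" &&
        mv "$tmp_file" "$po_file"
}

# Usage:
#   cp po/messages.pot po/xx.po
#   set_po_language xx po/xx.po
```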

Tip: You can use the cloud-translate tool to quickly machine-translate a new translation. Untranslated entries will be sent through GCP Cloud Translate. Some of the translations will be wrong after this, so you must inspect them by hand afterwards.

Updating an Existing Translation

As the source text changes, translations gradually become outdated. To update the po/xx.po file with new messages, first extract the source text into a po/messages.pot template file. Then run

$ msgmerge --update po/xx.po po/messages.pot

Unchanged messages will stay intact, deleted messages are marked as old, and updated messages are marked "fuzzy". A fuzzy entry will reuse the previous translation: you should then go over it and update it as necessary before you remove the fuzzy marker.
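The two steps above (extract, then merge) can be wrapped in one small function. This is a sketch, not a tool shipped with this project: update_translation, run, and the DRY_RUN variable are hypothetical names, and the commands are the same ones shown in this document.

```shell
# Hypothetical wrapper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Refresh po/messages.pot and merge it into po/<lang>.po.
# Usage: update_translation xx
update_translation() {
    lang=$1
    # Step 1: extract the source text with the mdbook-xgettext renderer.
    run env MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot"}}' \
        mdbook build -d po || return 1
    # Step 2: merge new and changed messages into the existing translation.
    run msgmerge --update "po/$lang.po" po/messages.pot
}

# Usage:
#   DRY_RUN=1 update_translation xx   # show the commands only
#   update_translation xx             # run them (needs mdbook and GNU Gettext)
```

The dry-run mode is only there so you can check the commands before they touch your po/ directory.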

Using Translations

This will show you how to use the translations to generate localized HTML output.

Note: mdbook-gettext will use the original untranslated text for all entries marked as "fuzzy" (visible as "Needs work" in Poedit). This is especially important when using cloud-translate for initial translation as all entries will be marked as "fuzzy".

If your text isn't translated, double-check that you have removed all "fuzzy" flags from your xx.po file.
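A quick way to do that double-check is to count the remaining fuzzy flags. The sketch below assumes the usual PO layout, where flags sit on a comment line starting with "#," directly above the entry; count_fuzzy is a hypothetical name.

```shell
# Hypothetical helper: print how many entries in a PO file carry the fuzzy flag.
# Flag lines look like "#, fuzzy" or "#, fuzzy, no-wrap".
count_fuzzy() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that from
    # looking like a failure to the caller.
    grep -c '^#,.*fuzzy' "$1" || true
}

# Usage:
#   count_fuzzy po/xx.po
```

A result of 0 means no entry will fall back to the original text because of a fuzzy flag.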

Building a Translated Book

The translation is done using the mdbook-gettext preprocessor. Enable it in your project by adding this snippet to your book.toml file:

[preprocessor.gettext]
after = ["links"]

This will run mdbook-gettext on the source after things like {{ #include }} have been executed. This makes it possible to translate included source code.

You can leave mdbook-gettext enabled: if no language is set or if it cannot find the .po file corresponding to the language (e.g., it cannot find po/en.po for English), then it will return the book untranslated.

To use the po/xx.po file for your output, you simply set book.language to xx. You can do this on the command line:

$ MDBOOK_BOOK__LANGUAGE=xx mdbook build -d book/xx

This will set the book's language to xx and store the generated files in book/xx.
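To build every language you have a .po file for, you can loop over the po/ directory and repeat the command above. This is a sketch under the assumption that each po/xx.po file name is the language code; list_languages and build_all_translations are hypothetical names, not commands provided by these plugins.

```shell
# Hypothetical helper: print one language code per .po file in a directory
# (default: po). messages.pot is skipped because it does not end in .po.
list_languages() {
    for po_file in "${1:-po}"/*.po; do
        # If nothing matches, the pattern is left as-is; skip it.
        [ -e "$po_file" ] || continue
        basename "$po_file" .po
    done
}

# Build book/<lang> for every language found in po/.
build_all_translations() {
    for lang in $(list_languages po); do
        MDBOOK_BOOK__LANGUAGE="$lang" mdbook build -d "book/$lang" || return 1
    done
}

# Usage:
#   list_languages po        # e.g. prints "da" and "ko" on separate lines
#   build_all_translations
```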

Serving a Translated Book

As usual, you can use mdbook serve to view your translation as you work on it. Set the language the same way as with mdbook build above:

$ MDBOOK_BOOK__LANGUAGE=xx mdbook serve -d book/xx

To automatically reload the book when you change the po/xx.po file, add this to your book.toml file:

[build]
extra-watch-dirs = ["po"]

Publishing Translations with GitHub Actions

Please see the publish.yml workflow in the Comprehensive Rust 🦀 repository.

Contact

For questions or comments, please contact Martin Geisler or start a discussion. We would love to hear from you.


This is not an officially supported Google product.

Comments
  • Update all dependencies

    I first updated toml to 0.7.3, but this caused

    error[E0631]: type mismatch in function arguments
        --> src/bin/mdbook-gettext.rs:80:45
         |
    80   |     let po_dir = cfg.get("po-dir").and_then(Value::as_str).unwrap_or("po");
         |                                    -------- ^^^^^^^^^^^^^
         |                                    |        |
         |                                    |        expected due to this
         |                                    |        found signature defined here
         |                                    required by a bound introduced by this call
         |
         = note: expected function signature `fn(&toml::value::Value) -> _`
                    found function signature `for<'a> fn(&'a toml::Value) -> _`
    note: required by a bound in `Option::<T>::and_then`
    

    It turned out I could remove the explicit dependency completely, though.

    opened by mgeisler 0
  • Update `polib` dependency to version 0.2.0

    The API changed a little from version 0.1.0.

    The new release includes https://github.com/BrettDong/polib/pull/1, which means we can simplify the code that writes a new PO file.

    opened by mgeisler 0
  • Add support for only publishing a language if it is more than NN% translated

    When the source material keeps changing, the translations will naturally lag behind. In that case, it could be nice to only publish a new version if it is mostly up-to-date, meaning it is more than NN% translated.

    This kind of functionality can be built today by looking at the output of msgfmt since it shows the number of translated and untranslated messages. We should try to package the functionality in a reusable fashion.

    enhancement 
    opened by mgeisler 0
  • Add tool which can build translations for all available languages

    The mdbook-gettext preprocessor translates a book into a single language. To translate your book into all available languages, you need to build a loop yourself. An example of this can be found in the publish.yml GitHub action for Comprehensive Rust 🦀:

          - name: Build all translations
            run: |
              for po_lang in ${{ env.LANGUAGES }}; do
                  echo "::group::Building $po_lang translation"
                  MDBOOK_BOOK__LANGUAGE=$po_lang \
                  MDBOOK_OUTPUT__HTML__SITE_URL=/comprehensive-rust/$po_lang/ \
                  mdbook build -d book/$po_lang
                  echo "::endgroup::"
              done
    

    We should make this easier somehow. Idea:

    • Build a small command line tool (perhaps called mdbook-i18n) where mdbook-i18n build would do the looping seen above.
    enhancement 
    opened by mgeisler 2
  • Package a language selector

    Currently, you have to edit the mdbook theme directly to add a language selector. An example of this can be seen in Comprehensive Rust 🦀.

    We should package this up in some way to make it easy for people to apply. This seems non-trivial because the templating system used in mdbook doesn't seem to make it easy to include new blocks of code without editing the main theme.

    Some ideas:

    • Inject JavaScript into the pages and let this code build the menu client-side.
    • Write a tool which can modify the generated HTML to include the menu when mdbook build is called.
    enhancement 
    opened by mgeisler 0
  • Normalize Markdown in `.pot` files

    When mdbook-xgettext extracts translatable text, it would be great if it could normalize the strings. This would make it possible for us to reformat the entire course without fearing that the translations get destroyed while doing so.

    The normalization would take Markdown like this

    # This is a heading
    
    This is another heading
    =======================
    
    A _little_
    paragraph.
    
    ```rust,editable
    fn main() {
        println!("Hello world!");
    }
    ```
    
    * First
    * Second
    

    and turn it into these messages in the .pot file:

    • "This is a heading" (atx heading is stripped)
    • "This is another heading" (setext heading is stripped)
    • "A _little_ paragraph." (soft-wrapped lines are unfolded)
    • "fn main() {\n println!("Hello world!");\n}" (info string is stripped, we should instead use a #, flag)
    • "First" (bullet point extracted individually)
    • "Second"

    Like in google/comprehensive-rust#318, we should do this in a step-by-step fashion and make sure to apply the transformations to the existing translations. It would also be good if we have a way to let translators update their not-yet-submitted translations.

    opened by mgeisler 9
  • Add scripts for using and updating translations

    The instructions for our translation pipeline in TRANSLATIONS.md don't match the actual steps in publish.yml.

    The difference is mostly because GitHub Actions lets us set environment variables using a different syntax.

    I would like to unify the two via small scripts. The scripts could be shell scripts (though that probably doesn't work well on Windows?) or they could be Rust "scripts" (more setup time).

    I'm imagining something like

    • build-translation which takes a xx locale and outputs a book in book/xx.
    • update-translation which runs both mdbook-xgettext and msgmerge for you.

    A serve-translation would probably also be nice to have.

    Instead of several scripts, a single script with subcommands could also be nice. That could probably live nicely in the i18n-helpers project since it would be tightly coupled to the other binaries there.

    good first issue 
    opened by mgeisler 0
Releases(0.1.0)
Owner
Google