Building an AsciiDoc toolchain in Rust

I think the phrase "Make it work, make it right, make it fast" [1] fits the approach I took to build an AsciiDoc toolchain quite well. Eighteen months ago or so (I think?), I started building acdc, an AsciiDoc parser and toolchain in Rust.

Note

If you’re wondering about "how hard can this be" when you read eighteen months, the work was done in my spare time, around a full-time job and a family I very much enjoy spending time with.

When I started, I had a few ideas but my priorities, in order, were:

  1. "make it work" (parse simple documents and convert them to HTML)

  2. "make it right" (parse and convert 99% of the documents I could find to HTML)

  3. "make it fast" (it could not lag behind Asciidoctor, and it had to be much faster on large documents and on batches of hundreds of documents; I’ll write more about speed and performance in a different blog post).

I ended up switching parser libraries, almost wrote my own parser library (still thinking about it), and built an entire toolchain as a result: a parser and multiple converters (HTML, Markdown, terminal, manpage).

More recently, I also played around with a couple of proofs of concept:

  • an LSP server, which I use from Emacs

  • a live editor in the browser, in WASM

These two were built mostly with Claude Code. I have reviewed their code and use them daily, but they were largely vibe coded.

In this post I want to talk mostly about the first two priorities: making it work, and making it right. Before I do that, though, it’s worth a brief intro to the AsciiDoc language.

What is AsciiDoc?

If you’ve used Markdown, you already know the appeal of writing structured content in plain text. AsciiDoc is basically the same idea, but it gives you a lot more than Markdown, things like:

  • footnotes

  • admonitions

  • cross-references

  • includes

  • conditional directives

  • table of contents

And a bunch more (see Appendix A: How does AsciiDoc compare with Markdown?). In my opinion, it also suffers from far less fragmentation than Markdown with all its flavours. And yes, this blog post (and this blog in general) is written in AsciiDoc [2].

Funnily enough, AsciiDoc is slightly older than Markdown, but Markdown gained more traction early on and the rest is history. I still very much prefer AsciiDoc (and if it were up to me, I would move everything to AsciiDoc).

The tooling for AsciiDoc has been dominated by Asciidoctor, which is a mature and excellent Ruby implementation. In recent years, the Eclipse Foundation has also been working on formalising the AsciiDoc Language spec, which is still a draft but is evolving. There’s also an AsciiDoc TCK (Technology Compatibility Kit) for parser compliance, and a growing community of implementers (yours truly included).

Now, when I started, I felt there wasn’t a modern, user-friendly AsciiDoc toolchain, and I wanted one. I truly wanted something that could go a step further than Asciidoctor. One of my key goals, for example, was much better error reporting. As a Rust developer, I’m used to very high-quality error reporting.

Goals for acdc

I had a few goals when I started:

  1. Build it in Rust: I really wanted a parser with all the safety guarantees Rust provides. And I also wanted an excuse to write more Rust.

  2. Experiment with parser generators: I’d been wanting to dig into PEG-based parsing for a while, and AsciiDoc seemed like a great (if challenging) target. I’ll say more about my missteps below.

  3. Follow the draft spec: I wanted to track the AsciiDoc Language spec as closely as possible and provide implementation feedback as the spec evolves.

  4. Provide high-quality errors: If your parser just says "parse error at line 42", that’s not helpful. I wanted source locations on every AST (Abstract Syntax Tree) node, contextual error messages, and advice on how to fix things. Funnily enough, the spec does state that (mostly) everything should have a location or position.

  5. Be TCK compliant: At any given point in time, the parser should pass the official AsciiDoc TCK tests. The TCK is very basic as of this writing, so I built more than 1500 tests into acdc’s own test suite.

Making it work

Starting with pest

I started with pest, a PEG parser generator for Rust. pest uses external .pest grammar files with a clean, declarative syntax. It’s popular, well-documented, and for block-level parsing it worked quite well.

Here’s what a section looked like in the pest grammar language; it’s fairly straightforward:

section = {
    (anchor | attribute_list | blocktitle)* ~
    section_header_start ~ section_title ~ NEWLINE{2} ~
    section_content*
}

section_level = { section_level_symbol{1,6} }
section_title = { inlines_inner }

And a paragraph:

paragraph = {
    admonition_node? ~ inlines ~ &(NEWLINE+ | EOI)
}

It kind of worked. I had seven .pest grammar files (asciidoc, block, core, delimited, document, inlines, list) covering all the block-level constructs. The grammar mapped nicely to what I expected AsciiDoc to look like, and pest's built-in error reporting was decent enough to get started.

What a fool I was to think this was going to be a breeze. Things got a lot more "fun" (read: difficult) once I hit inline parsing. Ooof.

Attacking inline parsing (this was like being slapped in the face)

AsciiDoc inline parsing is, to put it diplomatically, non-trivial.

Consider the string +*bold _and italic_ text*+. In AsciiDoc, this is bold text containing italic text, all inside an inline passthrough. Now consider **un**constrained versus *constrained*, two different syntaxes with subtly different boundary rules. An asterisk at the start of a word means constrained bold (needs word boundaries). Double asterisks mean unconstrained bold (no boundary requirements). The same distinction exists for every inline formatting type: italic, monospace, highlight, etc.
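To make the constrained/unconstrained split concrete, here is a minimal, std-only Rust sketch of just the lexical part: recognising which formatting delimiter starts at a byte offset, and whether it is the doubled (unconstrained) or single (constrained) form. `Kind`, `Form` and `delimiter_at` are hypothetical names for illustration, not acdc’s API, and the sketch deliberately ignores boundary rules, nesting and passthroughs, which is exactly where the real difficulty lies.

```rust
/// Inline formatting types that share the constrained/unconstrained split.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Bold,
    Italic,
    Monospace,
    Highlight,
}

/// Single delimiter = constrained, doubled delimiter = unconstrained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Form {
    Constrained,
    Unconstrained,
}

/// Maps a delimiter character to its formatting kind (hypothetical table).
fn kind_for(c: char) -> Option<Kind> {
    match c {
        '*' => Some(Kind::Bold),
        '_' => Some(Kind::Italic),
        '`' => Some(Kind::Monospace),
        '#' => Some(Kind::Highlight),
        _ => None,
    }
}

/// Returns the kind and form of a delimiter starting at byte offset `i`,
/// preferring the doubled (unconstrained) form when it is present.
/// Boundary rules are NOT checked here.
fn delimiter_at(text: &str, i: usize) -> Option<(Kind, Form)> {
    let rest = text.get(i..)?; // None if `i` is out of range or mid-character
    let mut chars = rest.chars();
    let first = chars.next()?;
    let kind = kind_for(first)?;
    if chars.next() == Some(first) {
        Some((kind, Form::Unconstrained))
    } else {
        Some((kind, Form::Constrained))
    }
}
```

Note that `delimiter_at` happily reports a constrained `*` in the middle of a word (for example in `a*b`); deciding whether that `*` may actually open formatting is the context-sensitive part.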

Here’s what the pest grammar for inline formatting looked like at the start:

bold_text_unconstrained = { PUSH("**") ~ (!"**" ~ ANY)+ ~ POP }
bold_text = { PUSH("*") ~ (!"*" ~ ANY)+ ~ (!"**" ~ POP) }
italic_text_unconstrained = { PUSH("__") ~ (!"__" ~ ANY)+ ~ POP }
italic_text = { PUSH("_") ~ (!"_" ~ ANY)+ ~ (!"__" ~ POP) }

Looks reasonable, I think. Except it doesn’t handle nesting. It doesn’t handle constrained boundary rules (what characters are allowed before and after the delimiter). It doesn’t handle the interaction between passthrough content and formatted content. And it doesn’t handle the fact that AsciiDoc’s inline substitution rules have a defined order that must be respected [3].

I spent so much time trying to make pest work in an elegant way for inline parsing. I kept adding more rules, more lookaheads, more special cases. At some point, I committed what I called the "initial inline preprocessor", with the commit message: "this is a mess". Eeeek. I was trying to add a pre-processing pass onto pest's grammar to handle the substitution ordering, and was basically struggling.

The problem for me was that pest's grammar model didn’t give me enough control over the parsing process. I couldn’t call Rust functions from within grammar rules. I couldn’t maintain state across parsing boundaries (ok, if any pest guru is reading: you can actually do this, but it isn’t pretty). And the boundary rules for constrained versus unconstrained markup required context-sensitive decisions that a pure PEG grammar in pest's model couldn’t express very well.

I started reading about other crates until, in an Oxide and Friends episode (I believe that’s the right one), someone mentioned rust-peg. I took a look, and a few weeks later I thought I had a path and pushed a commit titled "trying the peg crate for inline preprocessing".

Coincidentally, at that time, location tracking also started getting in the way with pest, and rust-peg's approach made it all that much easier. Anyway…

At that point I had a rough sketch in my mind of how I wanted to bring everything together, but it required swapping pest for rust-peg.

Making it right

Switching to rust-peg

rust-peg is a different kind of PEG parser. Instead of external grammar files, you write the grammar directly in Rust using the peg::parser! macro. This sounds like a minor difference but it requires a lot of adjustment, especially coming from pest. You can call Rust functions from within grammar rules. You can pass mutable state through the parser. You can make context-sensitive decisions at parse time. Yay!

Here’s what that same bold text parsing looks like (roughly) in rust-peg:

rule bold_text_unconstrained() -> InlineNode
    = attrs:inline_attributes()?
      start:position()
      "**"
      content_start:position()
      content:$((!(eol() / ![_] / "**") [_])+)
      "**"
      end:position!()
{?
    // Call Rust functions to process attributes, recurse into
    // nested inline content, build typed AST nodes
    let content = process_inlines(
        state, &bm, &content_start, end - 2,
        state.inline_ctx.offset, content,
    )?;
    Ok(InlineNode::BoldText(Bold {
        content,
        role,
        id,
        form: Form::Unconstrained,
        location: state.create_block_location(...),
    }))
}

The {? … } block allows the magic to happen. It lets me run arbitrary Rust code as part of the grammar rule, including recursively processing the content for nested inline markup. And the state parameter gives me access to parser state, inline context, and document attributes throughout the parse.

For constrained bold, where boundary rules matter, I can call functions like check_constrained_opening_boundary and check_constrained_closing_at_end directly from within the grammar rule, rejecting invalid matches at parse time rather than trying to nail those constraints purely in grammar notation.
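The real check_constrained_opening_boundary and check_constrained_closing_at_end functions aren’t shown here, so the following is only a std-only sketch of the same idea under hypothetical names (`opens_constrained`, `closes_constrained`, `match_constrained_bold`). The boundary rules are an assumption, loosely modelled on Asciidoctor’s constrained-quote pattern: an opening `*` must not follow a word character or `;`, `:`, `}`; the content must start and end with non-whitespace; and a closing `*` must not be followed by a word character. "Word character" is simplified to alphanumeric or `_`.

```rust
/// Approximates a "word" character (letters, digits, underscore).
fn is_word(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Opening check: allowed at the start of the text or after a non-word
/// character other than `;`, `:` or `}`, and must be followed by non-whitespace.
fn opens_constrained(prev: Option<char>, next: Option<char>) -> bool {
    let prev_ok = match prev {
        None => true,
        Some(c) => !is_word(c) && !matches!(c, ';' | ':' | '}'),
    };
    let next_ok = matches!(next, Some(c) if !c.is_whitespace());
    prev_ok && next_ok
}

/// Closing check: content must end in non-whitespace, and the delimiter
/// must not be followed by a word character.
fn closes_constrained(prev: Option<char>, next: Option<char>) -> bool {
    let prev_ok = matches!(prev, Some(c) if !c.is_whitespace());
    let next_ok = next.map_or(true, |c| !is_word(c));
    prev_ok && next_ok
}

/// Tries to match constrained bold `*...*` starting at byte offset `start`.
/// Returns the content and the offset just past the closing `*`, or `None`
/// when a boundary rule rejects the match.
fn match_constrained_bold(text: &str, start: usize) -> Option<(&str, usize)> {
    if !text.get(start..)?.starts_with('*') {
        return None;
    }
    let prev = text[..start].chars().next_back();
    let body_start = start + 1; // `*` is one byte wide
    let next = text[body_start..].chars().next();
    if !opens_constrained(prev, next) {
        return None;
    }
    // Scan for the first closing `*` that satisfies the closing rule,
    // skipping position 0 so the content is never empty.
    for (rel, c) in text[body_start..].char_indices() {
        if c != '*' || rel == 0 {
            continue;
        }
        let close = body_start + rel;
        let before = text[..close].chars().next_back();
        let after = text[close + 1..].chars().next();
        if closes_constrained(before, after) {
            return Some((&text[body_start..close], close + 1));
        }
    }
    None
}
```

Returning `None` here plays the role that rejecting a match plays inside a grammar rule: the caller is free to fall back to treating the `*` as plain text.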

Because everything is now just Rust code, it’s easier (imo) to parse inline markup properly, following the two-pass approach the spec describes.

A note on the two-pass approach

Inline processing works in two passes, and I tried to follow SDR-005:

  1. A preprocessing pass that applies substitutions in the spec-defined order (special characters, attributes, macros, etc.)

  2. A parsing pass that converts the preprocessed content into typed InlineNode elements

This correctly handles the full complexity of AsciiDoc inline markup: nested formatting, attribute references within inline elements, passthrough content, and special character substitutions.
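The two passes can be illustrated with a deliberately tiny, std-only sketch: pass 1 only expands `{name}` attribute references (leaving unknown ones untouched), and pass 2 only recognises unconstrained `**bold**` and `__italic__`, recursing into their content so nesting works. The toy `InlineNode`, `preprocess` and `parse_inlines` are hypothetical and are not acdc’s types; the sketch does not implement SDR-005’s full substitution order, constrained markup, passthroughs, or the offset mapping a real parser needs to report locations in the original source.

```rust
use std::collections::HashMap;

/// Toy inline AST (not acdc's `InlineNode`).
#[derive(Debug, PartialEq)]
enum InlineNode {
    Text(String),
    Bold(Vec<InlineNode>),
    Italic(Vec<InlineNode>),
}

/// Pass 1 (simplified): expand `{name}` attribute references.
/// Unknown references are left in place, unchanged.
fn preprocess(input: &str, attrs: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let name = &after[..close];
                match attrs.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push('{');
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[close + 1..];
            }
            None => {
                // No closing brace: copy the remainder verbatim.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Pass 2 (simplified): turn preprocessed text into typed nodes, recursing
/// into `**bold**` and `__italic__` so nested formatting works.
fn parse_inlines(text: &str) -> Vec<InlineNode> {
    let mut nodes = Vec::new();
    let mut plain = String::new();
    let mut i = 0;
    while i < text.len() {
        let rest = &text[i..];
        let delim = if rest.starts_with("**") {
            Some("**")
        } else if rest.starts_with("__") {
            Some("__")
        } else {
            None
        };
        if let Some(d) = delim {
            // Naive: take the first matching closing delimiter.
            if let Some(len) = rest[2..].find(d) {
                if !plain.is_empty() {
                    nodes.push(InlineNode::Text(std::mem::take(&mut plain)));
                }
                let inner = parse_inlines(&rest[2..2 + len]);
                nodes.push(if d == "**" {
                    InlineNode::Bold(inner)
                } else {
                    InlineNode::Italic(inner)
                });
                i += 2 + len + 2;
                continue;
            }
        }
        let c = rest.chars().next().unwrap();
        plain.push(c);
        i += c.len_utf8();
    }
    if !plain.is_empty() {
        nodes.push(InlineNode::Text(plain));
    }
    nodes
}
```

Matching the first closing delimiter is the simplest possible strategy and mis-handles overlapping markup such as `**a __b** c__`; a real parser needs more careful rules there.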

The final removal of pest

The switch wasn’t fun to pull off. I think I ran both parsers in parallel, migrating piece by piece, and that took a few months of my spare time. I then did the switch and deleted pest entirely from the codebase. That definitely felt good.

Warning

One criticism I have of rust-peg is that the whole parser has to be implemented in the same file. See issue #25 and PR #181. For a grammar complex enough to parse AsciiDoc, that single file becomes quite unwieldy.

Thinking of compatibility and the ecosystem

I want to talk about how much compatibility the parser has achieved, because I think it’s the part of this project I’m most proud of.

The parser currently has hundreds (600+) fixture tests, each one an AsciiDoc input file paired with its expected AST output, plus hundreds of unit tests in the parser itself. These cover everything from the basics (sections, paragraphs, lists, tables, images, videos, audio) to the stuff that is much harder: constrained and unconstrained inline formatting with boundary rules, nested formatting in passthrough blocks, description lists with complex terms and continuations, cross-references with custom text, callout lists, index terms, footnotes with anchor references, book parts, stem notation, curved quotes, and a whole lot more.
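The fixture setup described above boils down to a simple harness. Here is a std-only sketch, assuming each input `name.adoc` sits next to a `name.json` holding the expected output as text; the layout, `run_fixtures`, and the string comparison are illustrative assumptions rather than acdc’s actual test code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Runs every `*.adoc` file in `dir` through `parse` and compares the result
/// with the sibling `*.json` file holding the expected output. Returns the
/// inputs whose output differed or whose expectation file is missing.
fn run_fixtures<F>(dir: &Path, parse: F) -> io::Result<Vec<PathBuf>>
where
    F: Fn(&str) -> String,
{
    let mut inputs: Vec<PathBuf> = fs::read_dir(dir)?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|p| p.extension().map_or(false, |ext| ext == "adoc"))
        .collect();
    inputs.sort(); // deterministic order for readable reports

    let mut failures = Vec::new();
    for input in inputs {
        let expected_path = input.with_extension("json");
        let source = fs::read_to_string(&input)?;
        match fs::read_to_string(&expected_path) {
            // Ignore leading/trailing whitespace differences.
            Ok(expected) if expected.trim() == parse(&source).trim() => {}
            _ => failures.push(input),
        }
    }
    Ok(failures)
}
```

A real harness would more likely deserialize both sides and compare JSON values structurally, so that key order and whitespace don’t cause spurious failures.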

All tests pass against the AsciiDoc TCK although, quite frankly, that’s nothing to shout from the rooftops, as the current TCK is not super comprehensive. I also restructured the AST to comply with the Abstract Semantic Graph (ASG) format, so the parser’s output aligns with what the spec expects. In fact, it had to, in order to pass the TCK tests.

Error reporting is pretty decent and, I think, much better than Asciidoctor’s, at least where I’ve seen errors come through. Every AST node tracks its source location (file, line, column, byte offset range). Errors include contextual advice as well, and I used the wonderful miette library to make them pretty. Take a look below:

High quality error by acdc
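miette takes care of the rendering in acdc. Underneath any such report is some simple bookkeeping, sketched below with std only: turning a byte range into a 1-based line and column, and pairing the message with advice on how to fix it. `Location`, `locate` and `Diagnostic` are hypothetical names, not acdc’s or miette’s API; columns are counted in characters, and `locate` assumes the range starts within the source on a UTF-8 character boundary (it panics otherwise).

```rust
use std::fmt;
use std::ops::Range;

/// A source location: file, 1-based line and column, and byte range.
#[derive(Debug, Clone, PartialEq)]
struct Location {
    file: String,
    line: usize,
    column: usize,
    bytes: Range<usize>,
}

/// Converts a byte range in `src` into a `Location`.
fn locate(file: &str, src: &str, bytes: Range<usize>) -> Location {
    let before = &src[..bytes.start];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = src[line_start..bytes.start].chars().count() + 1;
    Location { file: file.to_string(), line, column, bytes }
}

/// An error message paired with advice on how to fix it.
struct Diagnostic {
    location: Location,
    message: String,
    help: String,
}

impl fmt::Display for Diagnostic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let l = &self.location;
        writeln!(f, "{}:{}:{}: error: {}", l.file, l.line, l.column, self.message)?;
        write!(f, "  help: {}", self.help)
    }
}
```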

The toolchain as of today

The project is now a proper toolchain:

  • acdc-parser: the core parser, published on crates.io. Handles the full AsciiDoc-to-ASG pipeline.

  • acdc: a command-line tool for converting AsciiDoc documents, with five converters:

    • HTML (with optional syntax highlighting)

    • Semantic HTML5

    • terminal (with ANSI colours)

    • manpage (native roff/troff)

    • Markdown (CommonMark and GitHub Flavored, with graceful degradation where Markdown can’t support AsciiDoc)

  • AsciiDoc playground: a live editor for writing and previewing AsciiDoc in the browser, powered by the parser compiled to WASM.

  • An LSP server: diagnostics, go-to-definition, hover, completion, rename, semantic tokens, and more. This one was basically vibe coded (I’ve been upfront about that), but I use it every day and it works well enough, so I’ve included it here as well.
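As one illustration of what "graceful degradation" can mean for the Markdown converter, here is a hypothetical sketch (not acdc’s converter) for admonitions. GitHub renders alert blockquotes for the same five types AsciiDoc has (`> [!NOTE]`, `> [!TIP]`, and so on), while plain CommonMark has no equivalent, so the sketch falls back to a blockquote with a bold label.

```rust
/// The five AsciiDoc admonition types.
#[derive(Debug, Clone, Copy)]
enum Admonition {
    Note,
    Tip,
    Important,
    Caution,
    Warning,
}

impl Admonition {
    fn label(self) -> &'static str {
        match self {
            Admonition::Note => "Note",
            Admonition::Tip => "Tip",
            Admonition::Important => "Important",
            Admonition::Caution => "Caution",
            Admonition::Warning => "Warning",
        }
    }
}

/// Renders an admonition as Markdown. With `gfm` set it emits GitHub's
/// `> [!NOTE]` alert syntax; otherwise it degrades to a plain CommonMark
/// blockquote whose first line starts with a bold label.
fn admonition_to_markdown(kind: Admonition, text: &str, gfm: bool) -> String {
    let mut out = String::new();
    let mut lines = text.lines();
    if gfm {
        out.push_str(&format!("> [!{}]\n", kind.label().to_uppercase()));
    } else {
        let first = lines.next().unwrap_or("");
        out.push_str(&format!("> **{}:** {}\n", kind.label(), first));
    }
    // Remaining lines stay inside the blockquote.
    for line in lines {
        out.push_str(&format!("> {}\n", line));
    }
    out
}
```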

There’s lots to be done but I’m generally happy at this point.

What have I learned?

A few things that stick with me after this:

Think about the tools up front to avoid regret, but either way, just have fun. I spent six months with pest before accepting it wasn’t the right fit. That’s a lot of sunk cost (ugh). But the second attempt with rust-peg was dramatically better precisely because the time spent with the pest grammar had taught me the problem quite well. It didn’t feel like starting over so much as reshaping the parser, well informed this time.

Alignment with the spec is worth the effort. Using PEG-based libraries, because the spec hints at PEG, felt like the right choice. It means I can read a spec rule and quickly find or create a corresponding parser rule. And when the spec evolves, updating the parser is likely to be a mechanical change rather than a deep re-architecture.

"Make it work, make it right" protects from premature optimisation. I deliberately chose correctness and spec compliance over performance. The parser’s performance was fine and it is much better now, but what matters more is that it’s correct (as far as I’ve read and understood the docs), it tracks the spec, it has comprehensive error reporting, and it passes the TCK. In my view, that’s a much better starting point than going for "make it fast" first. Somewhat obvious, but I feel good about having followed this.

I’ll do another post on how I’ve made it fast (I’m still working on it). Stay tuned.

Appendix A: How does AsciiDoc compare with Markdown?

I don’t want to make a "Markdown bad, AsciiDoc good" argument or comparison. And hopefully people don’t take this section that way. I do still write Markdown daily. And with the advent of LLMs, Markdown is here to stay.

In my view, Markdown’s simplicity is its greatest strength. For a README, a quick note, a chat message, Markdown is totally fine.

It is clear, though, that there’s something to be said about tools like Slack and GitHub having their own flavour of Markdown. It’s not clear how we got here, but my take is that Markdown is too simple. The moment you need footnotes, admonitions, include directives, or nested content with semantic meaning, Markdown’s simplicity kinda gets in your way. You end up reaching for the extensions of a given platform (GitHub Flavored Markdown, MDX, Slack mrkdwn, etc.), and each one ends up annoying everyone who works across more than one of those platforms.

On top of that it also fragments the ecosystem every time a new flavour comes along.

AsciiDoc, though, trades some of that simplicity for more capabilities. It’s arguably more verbose, yes, but not by much, and it’s also more expressive and powerful.

If you want a list of most of the AsciiDoc features, I put one in the parser readme. You can also just take a peek at my online editor for a quick tour of some of the stuff it can do.

Also, did you know GitHub supports rendering README.adoc, not just README.md? Now you do.

For what it’s worth, neither AsciiDoc nor Markdown has a well-defined spec for implementers, but both are working towards one, with the AsciiDoc Language spec and the CommonMark Spec respectively.


AsciiDoc® and AsciiDoc Language™ are trademarks of the Eclipse Foundation, Inc.


1. The quote "Make it work, make it right, make it fast" is usually attributed to Kent Beck, I believe.
2. I use acdc to parse and render it.
3. The AsciiDoc Language spec defines this in SDR-005.
~ fin ~