Adventures in making this website:
rendering LaTeX maths

Published on 2022-11-28.

Making and managing this personal website has been an adventure. In this series, I go over the technical challenges I’ve encountered and philosophical decisions I’ve made, and I review some of the tools I’ve used along the way. After part 1 and part 2, this is part 3.

In late 2020, I decided to start a knowledge base, where I wanted to upload some of the physics notes I’d made for myself. Those notes were in LaTeX, which is the lingua franca for writing maths on computers, but unfortunately the web can’t display it out-of-the-box, so some work is needed. This post documents the wild journey I was sent on by a simple misunderstanding, leading me to my current (quite decent) maths-rendering solution.

Phase 0: the easy way

The most famous solution is perhaps MathJax, a JavaScript package that locally renders all LaTeX maths once the page has loaded. Enabling it for your website is easy: just put the following code (copy-pasted from the docs) into the <head> of your HTML template:

<script>
MathJax = {
  tex: {
    inlineMath: [['$', '$'], ['\\(', '\\)']]
  },
  svg: {
    fontCache: 'global'
  }
};
</script>
<script type="text/javascript" id="MathJax-script" async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>

And then enclose your LaTeX in $ or $$ for inline or display-mode maths, respectively. In theory, this should work for all static site generators (SSGs) as long as they preserve the $-symbols into the final HTML files. This should be the case for Zola, the SSG I originally built my site in.

For the sake of completeness, I should point out that tex-svg.js isn’t the only available flavour of MathJax; they’re all listed here, where tex and mml refer to the input formats LaTeX and MathML, while svg and chtml are the output formats: scalable vector graphics, or a bunch of inline-styled <span>s called CommonHTML, respectively. The full files enable all LaTeX extensions offered by MathJax, whereas the others fetch those on demand.

And… that’s it! If you enjoyed this guide, please like and subscribe or something like that!

Phase 1: pandoc’s red herring

… at least, that’s what I would’ve done, if I hadn’t been led astray while researching this.

You see, during my research, one of the options I explored was pandoc, a fairly well-known tool to convert documents between markup formats. Pandoc recognizes LaTeX maths, and transfers it to HTML for rendering by e.g. MathJax. From the manual (emphasis mine):

--mathjax: Use MathJax to display embedded TeX math in HTML output. TeX math will be put between \(...\) (for inline math) or \[...\] (for display math) and wrapped in <span> tags with class math. Then the MathJax JavaScript will render it.

When I first read that, I misunderstood the part about <span> tags: I thought they were needed for MathJax to render the formulas. This is false: pandoc only adds those tags for convenience when you write CSS rules. But it was too late: this misunderstanding, and the fact that few SSGs’ manuals explicitly mention maths, left me thinking that my options were very limited.

So I migrated to Hugo, since it supports pandoc’s Markdown dialect as a content format, which it processes by passing it through the pandoc executable. It even passes --mathjax as an argument, so I could be sure this would work. And indeed it did! This is a legitimate solution, which I arrived at for the wrong reasons. There’s a small catch too: Hugo prides itself on its speed, but passing all content through an external program slows it down a lot… Still, my 200-page website could be built in under 10 seconds, so it wasn’t bad.

Phase 2: screw JavaScript

MathJax is pretty unwieldy: first, the 1.2MiB minified tex-chtml.js script needs to be fetched (plus fonts, or 2.1MiB for tex-svg.js that includes fonts), and then it takes some time to process all the formulas on a page. Although modern systems can run JS very fast, that’s a lot of JS, resulting in a noticeable delay when loading a page. We can do better than that.

To try to reduce the amount of transferred data, I even did a custom minimal build of MathJax, and ended up at 1.6MiB (about 450KiB after Brotli compression). But for some reason (I vaguely remember worrying about fonts) I chose the SVG renderer instead of CHTML, so I probably could’ve got it below 1MiB. This size reduction wasn’t free though: I stripped as many features as I could, including MathJax’s accessibility options… oh dear.

Luckily, MathJax has a competitor, KaTeX. According to this test, KaTeX is consistently faster: on my system it needs roughly 150ms to run, compared to 900ms and 300ms for MathJax v2 and v3, respectively, and the katex.min.js script is only 270KiB (note that this competes with tex-chtml.js, not tex-svg.js; KaTeX doesn’t have an SVG backend). Very impressive! It even has MathML output for accessibility.

But KaTeX has one killer feature: server-side rendering as a first-class citizen. MathJax seems to have this too thanks to MathJax-node (just look up “mathjax server-side”), but it doesn’t seem to be popular, except for use with bigger web frameworks. This post describes what I want, but I think I had problems with the page2html script they use… I don’t remember exactly why, but I decided it wasn’t practical to render MathJax at site build time. So KaTeX it is.

To make this work, I needed to migrate to Jekyll. To render maths while Jekyll builds the website, someone made the jekyll-katex plugin, and it works great! The Markdown source just needs to be wrapped in a {%katexmm%} block, and then all occurrences of $ and $$ are processed by KaTeX. However, this needs to happen before Jekyll turns the Markdown into HTML, so the most obvious HTML template won’t work:

{% comment %} Problem: {{page.content}} is implicitly "markdownified" to HTML {% endcomment %}
{% katexmm %}
  {{ page.content }}
{% endkatexmm %}

Instead, this slightly awkward construction is needed, but it isn’t too bad:

{% comment %} Solution: use explicit "markdownify" filter {% endcomment %}
{% capture content_after_katex %}
  {% katexmm %}
    {{ page.content }}
  {% endkatexmm %}
{% endcapture %}
{{ content_after_katex | markdownify }}

Or you could explicitly wrap all your text in {%katexmm%} in each content file, and use {{content}} in the template instead of {{page.content}}.

Phase 3: rewriting physics

Going from MathJax to KaTeX was easier said than done. See, we physicists have a nasty habit of using the physics package, which provides some useful macros, notably for derivatives (why isn’t this standard in LaTeX? I guess AMS’ mathematicians lost interest in differentiation; they’ve been too busy in the last decades figuring out how to pack oranges in boxes…). MathJax does support it, but KaTeX doesn’t, so my work wasn’t compatible. Damn.

But macros are, by definition, macros, so I should be able to implement them myself. Luckily, KaTeX even has support for this: one of the options I can set is a macros object; you can find my collection here in _config.yml. However, jekyll-katex doesn’t support this setting… or, well, the published version doesn’t: upstream merged this pull request over a year ago, but just never bothered to release it. No problem, it’s easy to patch my local installation.

However, KaTeX’ macro support is fairly basic: it doesn’t allow e.g. asterisks in names or variadic arguments (both heavily used by physics), so I couldn’t make one-to-one implementations of the macros I’d used. I solved this by splitting the original “smart” macros into several “dumb” ones, and updating my files by regex substitution. Below are examples of the petty incompatibilities I had to deal with. Yes, that’s a lot of backslashes: one for the shell, and one for sed. No, I won’t clean it up, I want you to suffer like I did:

# MathJax' "physics" and KaTeX' "braket" package aren't interoperable.
# Some macros are similar, but we want to keep the automatic resizing:
$ sed -i -E "s/\\\\ket\\{/\\\\Ket\\{/g"    $FILE
$ sed -i -E "s/\\\\ket\\*\\{/\\\\ket\\{/g" $FILE
# Sometimes translation isn't so trivial, so we need to define proxy macros:
$ sed -i -E "s/\\\\expval\\{/\\\\Expval\\{/g"       $FILE
$ sed -i -E "s/\\\\expval\\*\\{/\\\\expval\\{/g"    $FILE
$ sed -i -E "s/\\\\expval\\*\\*\\{/\\\\Expval\\{/g" $FILE
# And sometimes it gets a bit awkward to name those proxies...
$ sed -i -E "s/\\\\braket\\{/\\\\Inprod\\{/g"    $FILE
$ sed -i -E "s/\\\\braket\\*\\{/\\\\inprod\\{/g" $FILE
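
For comparison, the same renames can be sketched in Python, where raw strings remove the shell's layer of escaping. This is a minimal sketch rather than what was actually run on the site: the RENAMES table and the rename_macros function are names made up for illustration, and the patterns simply mirror the sed commands above, in the same order.

```python
import re

# (pattern, replacement) pairs, applied top to bottom like the sed commands.
# Raw strings mean each literal backslash is written twice, not four times.
RENAMES = [
    (r"\\ket\{",        r"\\Ket{"),
    (r"\\ket\*\{",      r"\\ket{"),
    (r"\\expval\{",     r"\\Expval{"),
    (r"\\expval\*\{",   r"\\expval{"),
    (r"\\expval\*\*\{", r"\\Expval{"),
    (r"\\braket\{",     r"\\Inprod{"),
    (r"\\braket\*\{",   r"\\inprod{"),
]


def rename_macros(tex):
    """Apply every rename in order and return the rewritten LaTeX source."""
    for pattern, replacement in RENAMES:
        tex = re.sub(pattern, replacement, tex)
    return tex


print(rename_macros(r"\ket{a} + \ket*{b}"))
```

As in the sed version, the order matters: \ket{ has to become \Ket{ before \ket*{ is allowed to take over the name \ket{.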

All in all, not bad at all, I just need to remember to write \Inprod instead of \braket in the future. However, some of the macros, especially for differentiation, weren’t so easy to translate, due to the flexibility physics gives you in how you pass arguments. For example, for a partial derivative \pdv, consider the following (bugged) substitutions to cover all possibilities:

# \pdv{f}{x}{y}  => \mpdv{f}{x}{y}
$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}]*)\\}/\\\\mpdv\\{\\1\\}\\{\\2\\}\\{\\3\\}/g" $FILE
# \pdv{x}        => \pdv{}{x}
$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}[^{]/\\\\pdv\\{\\}\\{\\1\\}/g" $FILE
# \pdv*{x}       => \ipdv{}{x}
$ sed -i -E "s/\\\\pdv\\*\\{([^}]*)\\}[^{]/\\\\ipdv\\{\\}\\{\\1\\}/g" $FILE
# \pdv*{f}{x}    => \ipdv{f}{x}
$ sed -i -E "s/\\\\pdv\\*/\\\\ipdv/g" $FILE
# \pdv[n]{x}     => \pdvn{n}{}{x}
$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\pdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
# \pdv[n]{f}{x}  => \pdvn{n}{f}{x}
$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]/\\\\pdvn\\{\\1\\}/g" $FILE
# \pdv*[n]{x}    => \ipdvn{n}{}{x}
$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\ipdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
# \pdv*[n]{f}{x} => \ipdvn{n}{f}{x}
$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]/\\\\ipdvn\\{\\1\\}/g" $FILE

Yes, I know some of these could be combined, but I was starting to see those backslashes double, and I prefer to list all options explicitly anyway.

The “bug” is the fundamental fact that regular expressions can’t match arbitrarily nested brackets, which takes a context-free grammar: in this case, if an argument to \pdv contains nested braces {}, the regex capture group ([^}]*) ends prematurely at the first }. I could’ve worked around this issue by writing more complex regexes, since the nesting is finite and probably no more than two levels deep, so it can still be handled by a finite state machine. But that would mean a lot more backslashes… no thanks.
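
The alternative to ever-longer regexes is to count braces explicitly. Below is a minimal Python sketch of that idea, not the approach used for this site (which was sed plus manual checking). read_group and rewrite_pdv are hypothetical helper names, the targets \mpdv, \ipdv, \pdvn and \ipdvn are the proxy macros from the substitutions above, and the sketch assumes the arguments follow \pdv directly, with no whitespace in between.

```python
def read_group(text, start, open_ch="{", close_ch="}"):
    """Return (content, end_index) of the balanced group opening at text[start].

    Returns None if text[start] is not open_ch or the group is never closed.
    """
    if not text.startswith(open_ch, start):
        return None
    depth, i = 0, start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip escaped characters such as \{ and \}
            continue
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
        i += 1
    return None


def rewrite_pdv(text):
    """Rewrite every \\pdv call in text to the fixed-arity proxy macros."""
    out, i = [], 0
    while True:
        j = text.find("\\pdv", i)
        if j < 0:
            out.append(text[i:])
            return "".join(out)
        out.append(text[i:j])
        k = j + len("\\pdv")
        if text[k:k + 1].isalpha():  # a longer name such as \pdvn: leave it alone
            out.append("\\pdv")
            i = k
            continue
        star = text.startswith("*", k)
        if star:
            k += 1
        order = read_group(text, k, "[", "]")  # optional [n]
        if order is not None:
            k = order[1]
        # Only the plain form takes a third argument (mixed derivative).
        limit = 2 if (star or order is not None) else 3
        args = []
        while len(args) < limit:
            group = read_group(text, k)
            if group is None:
                break
            args.append(group[0])
            k = group[1]
        if not args:  # nothing recognisable: copy the input through unchanged
            out.append(text[j:k])
        elif len(args) == 3:
            out.append("\\mpdv" + "".join("{%s}" % a for a in args))
        else:
            if len(args) == 1:
                args.insert(0, "")  # \pdv{x} has no function argument
            if order is not None:
                args.insert(0, order[0])
            name = "\\" + ("i" if star else "") + "pdv" + ("n" if order is not None else "")
            out.append(name + "".join("{%s}" % a for a in args))
        i = k


print(rewrite_pdv(r"$\pdv{\vec{r}}$ and $\pdv*[2]{f}{x}$"))
```

Because the scanner only consumes the groups it recognises, it has no need for a trailing [^{] guard, which in the sed patterns matches (and therefore replaces) one extra character after the closing brace.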

So I made a judgment call: I wouldn’t trust the substitution 100% anyway, so it’d probably be best to just check all 190 pages manually to fix things. And that’s what I did; it took me five evenings, and I indeed caught several unexpected side effects of the transition to Jekyll and KaTeX. I even encountered an error from KaTeX’ automatic resizing of | inside \Braket: see how I implemented my \Expval proxy macro in _config.yml.

In the end, it was successful: I could serve statically rendered maths. The heaviest page in the knowledge base is converted into a whopping 1.06MB of HTML and MathML… but these are very compressible, so Brotli is able to bring it down to just 18KB, which amounts to a 98% reduction (holy crap). Compared to running MathJax, this really improved load times.

Phase 4: making it go faster

But jekyll-katex slowed down Jekyll a lot, because a JS interpreter is called to run katex.min.js for every page that needs it (oh no, I hope it isn’t invoked for each formula). Once my migration was done, a clean build of this site took 6m30s, which is long, but not a big problem by itself. The real issue was that processing a single maths-heavy page could take up to 15s, which is just too long while I’m writing and want to see the edits I just made.

Digging deeper, I found that jekyll-katex relies on ExecJS for JS execution, which supports several backends. In my case, it was defaulting to Node.js, which is based on Google’s V8 engine. Maybe a different backend would be faster? First I tried Duktape, but that gave a semantic error, so it can’t run katex.min.js apparently. Then I tried MiniRacer, which also uses V8, so it should work for KaTeX, but it won’t be much fa— Wait what? It’s 800% faster? How is that possible? But it uses the same eng— Sure, whatever, I’ll take it.

Thanks to this one weird trick, my website was building in 50s, and a single maths-heavy page took no more than 2s, as long as I set the EXECJS_RUNTIME variable:

$ EXECJS_RUNTIME="MiniRacer" bundle exec jekyll serve --livereload --incremental

So far so good, and I published it like this. However, Jekyll veterans may have noticed a problem: in the template, I’m using {{page.content}}, not {{content}}. The latter has unavoidably already been converted from Markdown to HTML, which is what I want to avoid, but {{page.content}} has an unfortunate side effect: any Liquid in the page isn’t evaluated. In other words, with this setup, I couldn’t use any templating features inside pages, such as using {%include%} for collapsible boxes.

And come on, 50s to build a 200-page static website? We can do better, surely. In this issue on GitHub for jekyll-katex, I found a link to this post, revealing that Kramdown, Jekyll’s Markdown processor, already has support for LaTeX maths if an appropriate plugin is installed. And I’m in luck: one such plugin is kramdown-math-katex, which renders everything offline!

There’s just one small wrinkle: according to Kramdown, all inline maths formulas need to be enclosed in $$ instead of $. Great, time to update all my files again. Fortunately, that was pretty easy this time:

$ sed -i -E "s/\\\$([^\$]+)\\\$/\\\$\\\$\\1\\\$\\\$/g" */index.md

Kramdown then detects based on the context whether a formula should be rendered inline or in display mode. In the end, I opted for the related plugin kramdown-math-sskatex, which allows me to provide my own newer katex.min.js file but apart from that works the same way.
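
One caveat about the sed one-liner above: it works line by line, and a display formula written on a single line as $$…$$ would also match the pattern and pick up extra dollar signs, so it is only safe when display delimiters sit on lines of their own. Here is a hedged Python sketch of the same substitution that skips any $ adjacent to another $. INLINE_MATH and double_inline_dollars are illustrative names, and the sketch assumes the text contains no literal dollar signs such as prices or \$.

```python
import re

# A "lone" dollar sign is one with no other dollar sign directly next to it.
# The formula body may not contain $ or a newline, as in the sed version.
INLINE_MATH = re.compile(r"(?<!\$)\$([^$\n]+)\$(?!\$)")


def double_inline_dollars(markdown):
    """Turn $...$ into $$...$$ while leaving existing $$...$$ untouched."""
    return INLINE_MATH.sub(r"$$\1$$", markdown)


print(double_inline_dollars("Let $x$ be real, so that $$x^2 \\ge 0$$ holds."))
```

Since already-doubled delimiters are skipped, running the function a second time over the same file changes nothing.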

With some minor reformatting, this plugin ended up working perfectly, so I again had statically-rendered maths, this time without sacrificing Liquid support. And the build times? Down to 5s, a 90% reduction from jekyll-katex. I still need to use MiniRacer to run JS: with Node, the build now takes over 15m30s. Why did Node slow down but MiniRacer speed up? Who knows! The ways of the Lord are truly mysterious.
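
To put the build times quoted in this post side by side, here is a small Python sketch. The numbers are the ones stated above; the labels, the builds dictionary and the speedup helper are illustrative, and the Node.js figure for Kramdown is only a lower bound because the text says “over 15m30s”.

```python
# Clean-build times quoted in this post, in seconds.
builds = {
    "jekyll-katex, Node.js": 6 * 60 + 30,
    "jekyll-katex, MiniRacer": 50,
    "kramdown-math-sskatex, MiniRacer": 5,
    "kramdown-math-sskatex, Node.js": 15 * 60 + 30,  # lower bound
}


def speedup(before, after):
    """How many times faster a build taking `after` is than one taking `before`."""
    return before / after


baseline = builds["jekyll-katex, Node.js"]
for label, seconds in builds.items():
    factor = speedup(baseline, seconds)
    print(f"{label:<34} {seconds:>4}s  {factor:5.1f}x vs. baseline")
```

Swapping Node.js for MiniRacer under jekyll-katex comes out at 7.8 times faster, and moving to Kramdown adds another factor of ten, which is the 90% reduction mentioned above.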

Conclusion

If you want to add maths to your website, the easy solution is to render it on page load. For this, I recommend KaTeX over MathJax: it’s lighter and faster, and it avoids lock-in by disallowing weird LaTeX packages. In the end, I think the best solution is server-side rendering, which is the lightest and fastest of all, although it takes more work to set up. This seems to be easier with KaTeX than MathJax for some reason. I’m happy with my current implementation.


© 2023 Marcus R.A. Newman, CC BY-NC-SA 4.0.