---
title: "Adventures in making this website: rendering LaTeX maths"
date: 2022-11-28
layout: "blog"
toc: true
---
Published on 2022-11-28, last updated on 2023-02-27.

Making and managing this personal website has been an adventure.
In this series, I go over the technical challenges I've encountered
and philosophical decisions I've made,
and I review some of the tools I've used along the way.
After [part 1](/blog/2022/website-adventures-generators/)
and [part 2](/blog/2022/website-adventures-basics/),
this is part 3,
followed by [part 4](/blog/2023/website-adventures-images/).

In late 2020, I decided to start a [knowledge base](/know/),
where I wanted to upload some of the physics notes I'd made for myself.
Those notes were in LaTeX,
which is the *lingua franca* for writing maths on computers,
but unfortunately the web can't display it out-of-the-box,
so some work is needed.
This post documents the wild journey I was sent on by a simple misunderstanding,
leading me to my current (quite decent) maths-rendering solution.

## Phase 0: the easy way

The most famous solution is perhaps
[MathJax](https://www.mathjax.org/), a JavaScript package
that locally renders all LaTeX maths once the page has loaded.
Enabling it for your website is easy:
just put the following code
(copy-pasted from [the docs](https://docs.mathjax.org/en/latest/web/configuration.html#configuration-using-an-in-line-script))
into the `
`--mathjax`: *Use MathJax to display embedded TeX math in HTML output. TeX math will be put between* `\(...\)` *(for inline math) or* `\[...\]` *(for display math)*
***and wrapped in*** `<span>` ***tags with class*** `math`.
*Then the MathJax JavaScript will render it.*

When I first read that, I misunderstood the part about `<span>` tags:
I thought they were needed for MathJax to render the formulas.
This is false: pandoc only adds those tags for convenience when you write CSS rules.
But it was too late: this misunderstanding,
and the fact that few SSGs' manuals explicitly mention maths,
left me thinking that my options were very limited.

So I migrated to [Hugo](https://gohugo.io/), since it
[supports](https://gohugo.io/content-management/formats/#list-of-content-formats)
pandoc's Markdown dialect as a content format,
which it processes by passing it through the `pandoc` executable.
It even passes `--mathjax` as an argument, so I could be sure this would work.
And indeed it did!

This is a legitimate solution, which I arrived at for the wrong reasons.
There's a small catch too: Hugo prides itself on its speed,
but passing all content through an external program slows it down a lot...
Still, my 200-page website could be built in under 10 seconds, so it wasn't bad.

## Phase 2: screw JavaScript

MathJax is pretty unwieldy:
first, the 1.2MiB minified `tex-chtml.js` script needs to be fetched
(plus fonts, or 2.1MiB for `tex-svg.js` that includes fonts),
and then it takes some time to process all the formulas on a page.
Although modern systems can run JS very fast,
that's a lot of JS, resulting in a noticeable delay when loading a page.
We can do better than that.

To try to reduce the amount of transferred data,
I even did a custom minimal build of MathJax,
and ended up at 1.6MiB (about 450KiB after Brotli compression).
But for some reason (I vaguely remember worrying about fonts)
I chose the SVG renderer instead of CHTML,
so I probably could've reduced it even further, to less than 1MiB.
This size reduction wasn't free though:
I stripped as many features as I could,
including MathJax's accessibility options... oh dear.

Luckily, MathJax has a competitor, [KaTeX](https://katex.org/).
According to [this test](https://www.intmath.com/cg5/katex-mathjax-comparison.php),
KaTeX is consistently faster: on my system it needs roughly 150ms to run,
compared to 900ms and 300ms for MathJax v2 and v3, respectively,
and the `katex.min.js` script is only 270KiB
(note that this competes with `tex-chtml.js`, not `tex-svg.js`;
KaTeX doesn't have an SVG backend).
Very impressive!
It even has MathML output for accessibility.

But KaTeX has one killer feature:
server-side rendering as a first-class citizen.
MathJax seems to have this too thanks to
[`MathJax-node`](https://github.com/mathjax/MathJax-node)
(just look up "mathjax server-side"),
but it doesn't seem to be popular,
except for use with bigger web frameworks.
[This post](https://a3nm.net/blog/selfhost_mathjax.html) describes what I want,
but I think I had problems with the `page2html` script they use...
I don't remember exactly why, but I decided
it wasn't practical to render MathJax at site build time.

So KaTeX it is.
To make this work, I needed to migrate to [Jekyll](https://jekyllrb.com/).
To render maths while Jekyll builds the website, someone made the
[`jekyll-katex`](https://github.com/linjer/jekyll-katex) plugin,
and it works great!
The Markdown source just needs to be wrapped in a
`{% raw %}{%katexmm%}{% endraw %}` block,
and then all occurrences of `$` and `$$` are processed by KaTeX.
However, this needs to happen *before* Jekyll turns the Markdown into HTML,
so the most obvious HTML template won't work:

```liquid
{% raw %}{% comment %} Problem: {{content}} is implicitly "markdownified" to HTML {% endcomment %}
{% katexmm %}
{{ content }}
{% endkatexmm %}{% endraw %}
```

Instead, this slightly awkward construction is needed, but it isn't too bad:

```liquid
{% raw %}{% comment %} Solution: use explicit "markdownify" filter {% endcomment %}
{% capture content_after_katex %}
{% katexmm %}
{{ page.content }}
{% endkatexmm %}
{% endcapture %}
{{ content_after_katex | markdownify }}{% endraw %}
```

Or you could explicitly wrap all your text in
`{% raw %}{%katexmm%}{% endraw %}` in each content file,
and use `{% raw %}{{content}}{% endraw %}` in the template
instead of `{% raw %}{{page.content}}{% endraw %}`.

## Phase 3: rewriting physics

Going from MathJax to KaTeX was easier said than done.
See, we physicists have a nasty habit of using the
[`physics`](https://www.ctan.org/pkg/physics) package,
which provides some useful macros, notably for derivatives
(why isn't this standard in LaTeX?
I guess AMS' mathematicians lost interest in differentiation;
they've been too busy in the last decades figuring out how to
[pack oranges](https://en.wikipedia.org/wiki/Kepler_conjecture) in boxes...).
MathJax does support it, but KaTeX doesn't, so my work wasn't compatible.
Damn.

But macros are, by definition, *macros*,
so I should be able to implement them myself.
Luckily, KaTeX even has support for this:
one of the [options](https://katex.org/docs/options.html)
I can set is a `macros` object; you can find my collection here in
[`_config.yml`](/code/prefetch-jekyll/tree/_config.yml).
However, `jekyll-katex` doesn't support this setting...
or, well, the published version doesn't: upstream merged
[this pull request](https://github.com/linjer/jekyll-katex/pull/34)
over a year ago, but just never bothered to release it.
No problem, it's easy to patch my local installation.

However, KaTeX' macro support is fairly basic:
it doesn't allow e.g. asterisks in names or variadic arguments
(both heavily used by `physics`),
so I couldn't make one-to-one implementations of the macros I'd used.
I solved this by splitting the original "smart" macros into several "dumb" ones,
and updating my files by regex substitution.

Below are examples of the petty incompatibilities I had to deal with.
Yes, that's a lot of backslashes: one for the shell, and one for `sed`.
No, I won't clean it up, I want you to suffer like I did:

```sh
# MathJax' "physics" and KaTeX' "braket" package aren't interoperable.
# Some macros are similar, but we want to keep the automatic resizing:
{% comment %}$ sed -i -E "s/\\\\bra\\{/\\\\Bra\\{/g" $FILE
$ sed -i -E "s/\\\\bra\\*\\{/\\\\bra\\{/g" $FILE{% endcomment -%}
$ sed -i -E "s/\\\\ket\\{/\\\\Ket\\{/g" $FILE
$ sed -i -E "s/\\\\ket\\*\\{/\\\\ket\\{/g" $FILE
# Sometimes translation isn't so trivial, so we need to define proxy macros:
$ sed -i -E "s/\\\\expval\\{/\\\\Expval\\{/g" $FILE
$ sed -i -E "s/\\\\expval\\*\\{/\\\\expval\\{/g" $FILE
$ sed -i -E "s/\\\\expval\\*\\*\\{/\\\\Expval\\{/g" $FILE
{% comment %}$ sed -i -E "s/\\\\matrixel\\{/\\\\matrixel\\{/g" $FILE
$ sed -i -E "s/\\\\matrixel\\*\\{/\\\\matrixel\\{/g" $FILE
$ sed -i -E "s/\\\\matrixel\\*\\*\\{/\\\\Matrixel\\{/g" $FILE{% endcomment -%}
# And sometimes it gets a bit awkward to name those proxies...
$ sed -i -E "s/\\\\braket\\{/\\\\Inprod\\{/g" $FILE
$ sed -i -E "s/\\\\braket\\*\\{/\\\\inprod\\{/g" $FILE
{% comment %}$ sed -i -E "s/\\\\dyad\\{/\\\\Exprod\\{/g" $FILE
$ sed -i -E "s/\\\\dyad\\*\\{/\\\\exprod\\{/g" $FILE{% endcomment -%}
```

All in all, not bad at all, I just need to remember
to write `\Inprod` instead of `\braket` in the future.

However, some of the macros, especially for differentiation,
weren't so easy to translate, due to the flexibility
`physics` gives you in how you pass arguments.
For example, for a partial derivative `\pdv`, consider the following
(bugged) substitutions to cover all possibilities:

```sh
# \pdv{f}{x}{y} => \mpdv{f}{x}{y}
$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}]*)\\}/\\\\mpdv\\{\\1\\}\\{\\2\\}\\{\\3\\}/g" $FILE
# \pdv{x} => \pdv{}{x}
$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}[^{]/\\\\pdv\\{\\}\\{\\1\\}/g" $FILE
# \pdv*{x} => \ipdv{}{x}
$ sed -i -E "s/\\\\pdv\\*\\{([^}]*)\\}[^{]/\\\\ipdv\\{\\}\\{\\1\\}/g" $FILE
# \pdv*{f}{x} => \ipdv{f}{x}
$ sed -i -E "s/\\\\pdv\\*/\\\\ipdv/g" $FILE
# \pdv[n]{x} => \pdvn{n}{}{x}
$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\pdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
# \pdv[n]{f}{x} => \pdvn{n}{f}{x}
$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]/\\\\pdvn\\{\\1\\}/g" $FILE
# \pdv*[n]{x} => \ipdvn{n}{}{x}
$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\ipdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
# \pdv*[n]{f}{x} => \ipdvn{n}{f}{x}
$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]/\\\\ipdvn\\{\\1\\}/g" $FILE
```

Yes, I know some of these could be combined,
but I was starting to see those backslashes double,
and I prefer to list all options explicitly anyway.

The "bug" is the fundamental fact that
regular expressions can't parse context-free grammars:
in this case, if an argument to `\pdv` contains nested braces `{}`,
the regex capture group `([^}]*)` ends prematurely at the first `}`.
I could've worked around this issue by writing more complex regexes,
since the nesting is finite and probably no more than two levels deep,
so it can still be handled by a finite state machine.
But that would mean a lot more backslashes... no thanks.

So I made a judgment call: I wouldn't trust the substitution 100% anyway,
so it'd probably be best to just check all 190 pages manually to fix things.
And that's what I did; it took me five evenings,
and I indeed caught several unexpected side effects
of the transition to Jekyll and KaTeX.
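
To make the nested-brace limitation concrete, here is a minimal sketch that can be pasted into a shell.
It is a simplified variant of the first `\pdv` rule above, not the exact command:
it is single-quoted (which removes one layer of backslashes),
it reads from standard input instead of editing `$FILE` in place,
and the `convert_pdv` name and the sample inputs are made up for illustration.

```sh
# Hypothetical helper: single-quoted variant of the
# "\pdv{f}{x}{y} => \mpdv{f}{x}{y}" rule, reading from stdin.
convert_pdv() {
    sed -E 's/\\pdv\{([^}]*)\}\{([^}]*)\}\{([^}]*)\}/\\mpdv{\1}{\2}{\3}/g'
}

# Flat arguments: the pattern matches and the macro is renamed.
printf '%s\n' '\pdv{f}{x}{y}' | convert_pdv

# Nested braces: "[^}]*" stops at the "}" that closes "\vec{f",
# the rest of the pattern no longer lines up, and nothing is replaced.
printf '%s\n' '\pdv{\vec{f}}{x}{y}' | convert_pdv
```

The first call prints `\mpdv{f}{x}{y}`, while the second prints its input unchanged.
In this case the failure is silent rather than destructive, which is another reason a manual check is worthwhile.
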
I even encountered an error from KaTeX' automatic resizing of `|` inside `\Braket`:
see how I implemented my `\Expval` proxy macro in
[`_config.yml`](/code/prefetch-jekyll/tree/_config.yml#n39).

In the end, it was successful: I could serve statically rendered maths.
The [heaviest page](/know/concept/selection-rules/) in the knowledge base
is converted into a whopping 1.06MB of HTML and MathML...
but these are very compressible,
so Brotli is able to bring it down to just 18KB,
which amounts to a 98% reduction (holy crap).
Compared to running MathJax, this really improved load times.

## Phase 4: making it go faster

But `jekyll-katex` slowed down Jekyll *a lot*,
because a JS interpreter is called to run `katex.min.js`
for every page that needs it
(oh no, I hope it isn't invoked for each formula).
Once my migration was done, a clean build of this site took 6m30s,
which is long, but not a big problem by itself.
The real issue was that processing a single maths-heavy page could take up to 15s,
which is just too long while I'm writing and want to see the edits I just made.

Digging deeper, I found that `jekyll-katex` relies on
[ExecJS](https://github.com/rails/execjs) for JS execution,
which supports several backends.
In my case, it was defaulting to [Node.js](https://nodejs.org/en/),
which is based on Google's
[V8](https://en.wikipedia.org/wiki/V8_(JavaScript_engine)) engine.
Maybe a different backend would be faster?

First I tried [Duktape](https://github.com/judofyr/duktape.rb),
but that gave a semantic error, so it can't run `katex.min.js` apparently.
Then I tried [MiniRacer](https://github.com/rubyjs/mini_racer),
which also uses V8, so it should work for KaTeX, but it won't be much fa---
Wait what? It's 800% faster? How is that possible? But it uses the same eng---
Sure, whatever, I'll take it.
Thanks to this one weird trick, my website was building in 50s,
and a single maths-heavy page took no more than 2s,
as long as I set the `EXECJS_RUNTIME` variable:

```sh
$ EXECJS_RUNTIME="MiniRacer" bundle exec jekyll serve --livereload --incremental
```

So far so good, and I published it like this.

However, Jekyll veterans may have noticed a problem: in the template,
I'm using `{%raw%}{{page.content}}{%endraw%}`, not `{%raw%}{{content}}{%endraw%}`.
The latter is unavoidably converted from Markdown to HTML,
which is what I want to avoid,
but `{%raw%}{{page.content}}{%endraw%}` has an unfortunate side effect:
any Liquid in the page isn't evaluated.
In other words, with this setup, I couldn't use any templating features inside pages,
such as using `{%raw%}{%include%}{%endraw%}` for
[collapsible boxes](/blog/2022/website-adventures-basics/#collapsible-content).
And come on, 50s to build a 200-page static website?
We can do better, surely.

In [this issue](https://github.com/linjer/jekyll-katex/issues/35)
on GitHub for `jekyll-katex`, I found a link to
[this post](https://gendignoux.com/blog/2020/05/23/katex.html),
revealing that Kramdown, Jekyll's Markdown processor,
already has support for LaTeX maths if an appropriate plugin is installed.
And I'm in luck: one such plugin is
[`kramdown-math-katex`](https://github.com/kramdown/math-katex),
which renders everything offline!

There's just one small wrinkle: according to Kramdown,
all inline maths formulas need to be enclosed in `$$` instead of `$`.
Great, time to update all my files again.
Fortunately, that was pretty easy this time:

```sh
$ sed -i -E "s/\\\$([^\$]+)\\\$/\\\$\\\$\\1\\\$\\\$/g" */index.md
```

Kramdown then detects based on the context
whether a formula should be rendered inline or in display mode.

In the end, I opted for the related plugin
[`kramdown-math-sskatex`](https://github.com/kramdown/math-sskatex),
which allows me to provide my own newer `katex.min.js` file
but apart from that works the same way.
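
For illustration, the `$` to `$$` substitution can be tried on sample text before touching any files.
This is a sketch under stated assumptions, not the exact command above:
it uses single quotes to shed a layer of backslashes and reads from standard input,
and the `inline_to_kramdown` name and the sample lines are hypothetical.

```sh
# Hypothetical helper: same pattern as the sed command above, but
# single-quoted and reading from stdin, so no file is modified.
inline_to_kramdown() {
    sed -E 's/\$([^$]+)\$/$$\1$$/g'
}

# Inline formulas get their delimiters doubled.
printf '%s\n' 'Energy $E = mc^2$ and momentum $p$.' | inline_to_kramdown

# A "$$" alone on a line has nothing between its dollar signs,
# so multi-line display maths passes through unchanged.
printf '%s\n' '$$' 'E = mc^2' '$$' | inline_to_kramdown

# Caveat: display maths written on a single line gains stray dollar signs.
printf '%s\n' '$$x$$' | inline_to_kramdown
```

The first call prints `Energy $$E = mc^2$$ and momentum $$p$$.`,
the second leaves its three lines as they were,
and the last one turns `$$x$$` into `$$$x$$$`.
So the one-liner is only safe if display formulas keep their `$$` delimiters on separate lines,
which is an assumption about the file layout rather than something stated above.
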
With some minor reformatting, this plugin ended up working perfectly,
so I again had statically-rendered maths,
this time without sacrificing Liquid support.

And the build times? Down to 5s, a 90% reduction from `jekyll-katex`.
I still need to use MiniRacer to run JS:
with Node, the build now takes over 15m30s.
Why did Node slow down but MiniRacer speed up?
Who knows! The ways of the Lord are truly mysterious.

## Conclusion

If you want to add maths to your website,
the easy solution is to render it on page load.
For this, I recommend KaTeX over MathJax:
it's lighter and faster, and it avoids lock-in
by disallowing weird LaTeX packages.

In the end, I think the best solution is server-side rendering,
which is the lightest and fastest of all,
although it takes more work to set up.
This seems to be easier with KaTeX than MathJax for some reason.
I'm happy with my current implementation.