From f0dc83f5ca9fc934081905f8689f8a35f8fbacaa Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Mon, 28 Nov 2022 21:21:44 +0100
Subject: Publish "Website adventures" part 3 about LaTeX maths
---
.../blog/2022/website-adventures-basics/index.md | 5 +-
.../2022/website-adventures-generators/index.md | 6 +-
source/blog/2022/website-adventures-maths/index.md | 404 +++++++++++++++++++++
3 files changed, 410 insertions(+), 5 deletions(-)
create mode 100644 source/blog/2022/website-adventures-maths/index.md
diff --git a/source/blog/2022/website-adventures-basics/index.md b/source/blog/2022/website-adventures-basics/index.md
index 31e0fc3..64ed446 100644
--- a/source/blog/2022/website-adventures-basics/index.md
+++ b/source/blog/2022/website-adventures-basics/index.md
@@ -5,14 +5,15 @@ layout: "blog"
toc: true
---
-Published on 2022-11-20.
+Published on 2022-11-20, last updated on 2022-11-28.
Making and managing this personal website has been an adventure.
In this series, I go over the technical challenges I've encountered
and philosophical decisions I've made,
and I review some of the tools I've used along the way.
After [part 1](/blog/2022/website-adventures-generators/),
-this is part 2, with more coming soon.
+this is part 2,
+followed by [part 3](/blog/2022/website-adventures-maths/).
diff --git a/source/blog/2022/website-adventures-generators/index.md b/source/blog/2022/website-adventures-generators/index.md
index 7e52f17..42a6230 100644
--- a/source/blog/2022/website-adventures-generators/index.md
+++ b/source/blog/2022/website-adventures-generators/index.md
@@ -5,14 +5,14 @@ layout: "blog"
toc: true
---
-Published on 2022-11-15, last updated on 2022-11-20.
+Published on 2022-11-15, last updated on 2022-11-28.
Making and managing this personal website has been an adventure.
In this series, I go over the technical challenges I've encountered
and philosophical decisions I've made,
and I review some of the tools I've used along the way.
-This is part 1, followed by [part 2](/blog/2022/website-adventures-basics/),
-with more coming.
+This is part 1, followed by [part 2](/blog/2022/website-adventures-basics/)
+and [part 3](/blog/2022/website-adventures-maths/).
diff --git a/source/blog/2022/website-adventures-maths/index.md b/source/blog/2022/website-adventures-maths/index.md
new file mode 100644
index 0000000..a49b473
--- /dev/null
+++ b/source/blog/2022/website-adventures-maths/index.md
@@ -0,0 +1,404 @@
+---
+title: "Adventures in making this website: rendering LaTeX maths"
+date: 2022-11-28
+layout: "blog"
+toc: true
+---
+
+Published on 2022-11-28.
+
+Making and managing this personal website has been an adventure.
+In this series, I go over the technical challenges I've encountered
+and philosophical decisions I've made,
+and I review some of the tools I've used along the way.
+After [part 1](/blog/2022/website-adventures-generators/)
+and [part 2](/blog/2022/website-adventures-basics/),
+this is part 3.
+
+
+
+In late 2020, I decided to start a [knowledge base](/know/),
+where I wanted to upload some of the physics notes I'd made for myself.
+Those notes were in LaTeX,
+which is the *lingua franca* for writing maths on computers,
+but unfortunately the web can't display it out-of-the-box,
+so some work is needed.
+This post documents the wild journey I was sent on by a simple misunderstanding,
+leading me to my current (quite decent) maths-rendering solution.
+
+
+
+## Phase 0: the easy way
+
+The most famous solution is perhaps
+[MathJax](https://www.mathjax.org/), a JavaScript package
+that locally renders all LaTeX maths once the page has loaded.
+Enabling it for your website is easy:
+just put the following code
+(copy-pasted from [the docs](https://docs.mathjax.org/en/latest/web/configuration.html#configuration-using-an-in-line-script))
+into the `
`--mathjax`:
+*Use MathJax to display embedded TeX math in HTML output. TeX math will be put between*
+`\(...\)` *(for inline math) or*
+`\[...\]` *(for display math)* ***and wrapped in***
+`<span>` ***tags with class*** `math`.
+*Then the MathJax JavaScript will render it.*
+
+
+When I first read that, I misunderstood the part about `<span>` tags:
+I thought they were needed for MathJax to render the formulas.
+This is false: pandoc only adds those tags for convenience when you write CSS rules.
+But it was too late: this misunderstanding, and the fact that few SSGs' manuals
+explicitly mention maths, left me thinking that my options were very limited.
+
+So I migrated to [Hugo](https://gohugo.io/),
+since it [supports](https://gohugo.io/content-management/formats/#list-of-content-formats)
+pandoc's Markdown dialect as a content format,
+which it processes by passing it through the `pandoc` executable.
+It even passes `--mathjax` as an argument, so I could be sure this would work.
+And indeed it did! This is a legitimate solution,
+which I arrived at for the wrong reasons.
+There's a small catch too: Hugo prides itself on its speed,
+but passing all content through an external program slows it down a lot...
+Still, my 200-page website could be built in under 10 seconds, so it wasn't bad.
+
+
+
+## Phase 2: screw JavaScript
+
+MathJax is pretty unwieldy:
+first, the 1.2MiB minified `tex-chtml.js` script needs to be fetched
+(plus fonts, or 2.1MiB for `tex-svg.js` that includes fonts),
+and then it takes some time to process all the formulas on a page.
+Although modern systems can run JS very fast,
+that's a lot of JS, resulting in a noticeable delay when loading a page.
+We can do better than that.
+
+To try to reduce the amount of transferred data,
+I even did a custom minimal build of MathJax,
+and ended up at 1.6MiB (about 450KiB after Brotli compression).
+But for some reason (I vaguely remember worrying about fonts)
+I chose the SVG renderer instead of CHTML,
+so I probably could've reduced it even further, to less than 1MiB.
+This size reduction wasn't free though:
+I stripped as many features as I could,
+including MathJax's accessibility options... oh dear.
+
+Luckily, MathJax has a competitor, [KaTeX](https://katex.org/).
+According to [this test](https://www.intmath.com/cg5/katex-mathjax-comparison.php),
+KaTeX is consistently faster:
+on my system it needs roughly 150ms to run,
+compared to 900ms and 300ms for MathJax v2 and v3, respectively,
+and the `katex.min.js` script is only 270KiB
+(note that this competes with `tex-chtml.js`, not `tex-svg.js`; KaTeX doesn't have an SVG backend).
+Very impressive! It even has MathML output for accessibility.
+
+But KaTeX has one killer feature: server-side rendering as a first-class citizen.
+MathJax seems to have this too thanks to [`MathJax-node`](https://github.com/mathjax/MathJax-node)
+(just look up "mathjax server-side"), but it doesn't seem to be popular,
+except for use with bigger web frameworks.
+[This post](https://a3nm.net/blog/selfhost_mathjax.html)
+describes what I want,
+but I think I had problems with the `page2html` script they use...
+I don't remember exactly why, but I decided it wasn't practical
+to render MathJax at site build time.
+So KaTeX it is.
+
+To make this work, I needed to migrate to [Jekyll](https://jekyllrb.com/).
+To render maths while Jekyll builds the website,
+someone made the [`jekyll-katex`](https://github.com/linjer/jekyll-katex) plugin,
+and it works great!
+The Markdown source just needs to be wrapped in a `{% raw %}{%katexmm%}{% endraw %}` block,
+and then all occurrences of `$` and `$$` are processed by KaTeX.
+However, this needs to happen *before* Jekyll turns the Markdown into HTML,
+so the most obvious HTML template won't work:
+
+```liquid
+{% raw %}{% comment %} Problem: {{page.content}} is implicitly "markdownified" to HTML {% endcomment %}
+{% katexmm %}
+    {{ page.content }}
+{% endkatexmm %}{% endraw %}
+```
+
+Instead, this slightly awkward construction is needed,
+but it isn't too bad:
+
+```liquid
+{% raw %}{% comment %} Solution: use explicit "markdownify" filter {% endcomment %}
+{% capture content_after_katex %}
+    {% katexmm %}
+        {{ page.content }}
+    {% endkatexmm %}
+{% endcapture %}
+{{ content_after_katex | markdownify }}{% endraw %}
+```
+
+Or you could explicitly wrap all your text in `{% raw %}{%katexmm%}{% endraw %}`
+in each content file,
+and use `{% raw %}{{content}}{% endraw %}` in the template
+instead of `{% raw %}{{page.content}}{% endraw %}`.
+
+
+
+## Phase 3: rewriting physics
+
+Going from MathJax to KaTeX was easier said than done.
+See, we physicists have a nasty habit of using
+the [`physics`](https://www.ctan.org/pkg/physics) package,
+which provides some useful macros, notably for derivatives
+(why isn't this standard in LaTeX?
+I guess AMS' mathematicians lost interest in differentiation;
+they've been too busy in the last decades figuring out how to
+[pack oranges](https://en.wikipedia.org/wiki/Kepler_conjecture) in boxes...).
+MathJax does support it, but KaTeX doesn't,
+so my work wasn't compatible. Damn.
+
+But macros are, by definition, *macros*,
+so I should be able to implement them myself.
+Luckily, KaTeX even has support for this:
+one of the [options](https://katex.org/docs/options.html)
+I can set is a `macros` object; you can find my collection here
+in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml).
+However, `jekyll-katex` doesn't support this setting...
+or, well, the published version doesn't:
+upstream merged [this pull request](https://github.com/linjer/jekyll-katex/pull/34)
+over a year ago, but just never bothered to release it.
+No problem, it's easy to patch my local installation.
+
+However, KaTeX' macro support is fairly basic:
+it doesn't allow e.g. asterisks in names or variadic arguments
+(both heavily used by `physics`),
+so I couldn't make one-to-one implementations of the macros I'd used.
+I solved this by splitting the original "smart" macros into several "dumb" ones,
+and updating my files by regex substitution.
+Below are examples of the petty incompatibilities I had to deal with.
+Yes, that's a lot of backslashes: one for the shell, and one for `sed`.
+No, I won't clean it up, I want you to suffer like I did:
+
+```sh
+# MathJax' "physics" and KaTeX' "braket" package aren't interoperable.
+# Some macros are similar, but we want to keep the automatic resizing:
+{% comment %}$ sed -i -E "s/\\\\bra\\{/\\\\Bra\\{/g" $FILE
+$ sed -i -E "s/\\\\bra\\*\\{/\\\\bra\\{/g" $FILE{% endcomment -%}
+$ sed -i -E "s/\\\\ket\\{/\\\\Ket\\{/g" $FILE
+$ sed -i -E "s/\\\\ket\\*\\{/\\\\ket\\{/g" $FILE
+# Sometimes translation isn't so trivial, so we need to define proxy macros:
+$ sed -i -E "s/\\\\expval\\{/\\\\Expval\\{/g" $FILE
+$ sed -i -E "s/\\\\expval\\*\\{/\\\\expval\\{/g" $FILE
+$ sed -i -E "s/\\\\expval\\*\\*\\{/\\\\Expval\\{/g" $FILE
+{% comment %}$ sed -i -E "s/\\\\matrixel\\{/\\\\matrixel\\{/g" $FILE
+$ sed -i -E "s/\\\\matrixel\\*\\{/\\\\matrixel\\{/g" $FILE
+$ sed -i -E "s/\\\\matrixel\\*\\*\\{/\\\\Matrixel\\{/g" $FILE{% endcomment -%}
+# And sometimes it gets a bit awkward to name those proxies...
+$ sed -i -E "s/\\\\braket\\{/\\\\Inprod\\{/g" $FILE
+$ sed -i -E "s/\\\\braket\\*\\{/\\\\inprod\\{/g" $FILE
+{% comment %}$ sed -i -E "s/\\\\dyad\\{/\\\\Exprod\\{/g" $FILE
+$ sed -i -E "s/\\\\dyad\\*\\{/\\\\exprod\\{/g" $FILE{% endcomment -%}
+```
+
+All in all, not bad at all, I just need to remember to
+write `\Inprod` instead of `\braket` in the future.
+However, some of the macros, especially for differentiation,
+weren't so easy to translate,
+due to the flexibility `physics` gives you
+in how you pass arguments.
+For example, for a partial derivative `\pdv`,
+consider the following (bugged) substitutions to cover all possibilities:
+
+```sh
+# \pdv{f}{x}{y} => \mpdv{f}{x}{y}
+$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}]*)\\}/\\\\mpdv\\{\\1\\}\\{\\2\\}\\{\\3\\}/g" $FILE
+# \pdv{x} => \pdv{}{x}
+$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}[^{]/\\\\pdv\\{\\}\\{\\1\\}/g" $FILE
+# \pdv*{x} => \ipdv{}{x}
+$ sed -i -E "s/\\\\pdv\\*\\{([^}]*)\\}[^{]/\\\\ipdv\\{\\}\\{\\1\\}/g" $FILE
+# \pdv*{f}{x} => \ipdv{f}{x}
+$ sed -i -E "s/\\\\pdv\\*/\\\\ipdv/g" $FILE
+# \pdv[n]{x} => \pdvn{n}{}{x}
+$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\pdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
+# \pdv[n]{f}{x} => \pdvn{n}{f}{x}
+$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]/\\\\pdvn\\{\\1\\}/g" $FILE
+# \pdv*[n]{x} => \ipdvn{n}{}{x}
+$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\ipdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
+# \pdv*[n]{f}{x} => \ipdvn{n}{f}{x}
+$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]/\\\\ipdvn\\{\\1\\}/g" $FILE
+```
+
+Yes, I know some of these could be combined,
+but I was starting to see those backslashes double,
+and I prefer to list all options explicitly anyway.
+
+The "bug" is the fundamental fact that regular expressions
+can't parse context-free grammars:
+in this case, if an argument to `\pdv` contains nested braces `{}`,
+the regex capture group `([^}]*)` ends prematurely at the first `}`.
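
To make that failure concrete, here is a minimal sketch of the `\pdv{x} => \pdv{}{x}` rule from the block above, run on one flat and one nested argument. It assumes a `sed` that supports `-E`; `pdv_single` is a hypothetical helper name, not part of the original scripts, and the pattern is written in single quotes so each backslash only needs escaping once.

```sh
# Hypothetical helper: apply the "\pdv{x} => \pdv{}{x}" rule to stdin.
# Same regex as in the post, minus the extra layer of shell escaping.
pdv_single() {
    sed -E 's/\\pdv\{([^}]*)\}[^{]/\\pdv{}{\1}/g'
}

# Flat argument: the capture group sees all of "x".
printf '%s\n' '\pdv{x} f' | pdv_single

# Nested braces: the capture group stops at the inner "}".
printf '%s\n' '\pdv{x_{1}} f' | pdv_single
```

The first call prints `\pdv{}{x}f` and the second prints `\pdv{}{x_{1} f`. In the nested case the group only captures `x_{1`, so the braces come out unbalanced. Both cases also show that the trailing `[^{]` consumes the character after the match: the space in the first, the outer `}` in the second.
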
+I could've worked around this issue by writing more complex regexes,
+since the nesting is finite and probably no more than two levels deep,
+so it can still be handled by a finite state machine.
+But that would mean a lot more backslashes... no thanks.
+
+So I made a judgment call: I wouldn't trust the substitution 100% anyway,
+so it'd probably be best to just check all 190 pages manually to fix things.
+And that's what I did; it took me five evenings,
+and I indeed caught several unexpected side effects of the transition to Jekyll and KaTeX.
+I even encountered an error from KaTeX' automatic resizing of `|` inside `\Braket`:
+see how I implemented my `\Expval` proxy macro in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml#n39).
+
+In the end, it was successful:
+I could serve statically rendered maths.
+The [heaviest page](/know/concept/selection-rules/) in the knowledge base
+is converted into a whopping 1.06MB of HTML and MathML...
+but these are very compressible,
+so Brotli is able to bring it down to just 18KB,
+which amounts to a 98% reduction (holy crap).
+Compared to running MathJax, this really improved load times.
+
+
+
+## Phase 4: making it go faster
+
+But `jekyll-katex` slowed down Jekyll *a lot*,
+because a JS interpreter is called to run `katex.min.js`
+for every page that needs it (oh no, I hope it isn't invoked for each formula).
+Once my migration was done,
+a clean build of this site took 6m30s,
+which is long, but not a big problem by itself.
+The real issue was that processing a single maths-heavy page could take up to 15s,
+which is just too long
+while I'm writing and want to see the edits I just made.
+
+Digging deeper, I found that `jekyll-katex` relies on
+[ExecJS](https://github.com/rails/execjs) for JS execution,
+which supports several backends.
+In my case, it was defaulting to [Node.js](https://nodejs.org/en/),
+which is based on Google's [V8](https://en.wikipedia.org/wiki/V8_(JavaScript_engine)) engine.
+Maybe a different backend would be faster?
+First I tried [Duktape](https://github.com/judofyr/duktape.rb),
+but that gave a semantic error, so it can't run `katex.min.js` apparently.
+Then I tried [MiniRacer](https://github.com/rubyjs/mini_racer),
+which also uses V8, so it should work for KaTeX, but it won't be much fa---
+Wait what? It's 800% faster? How is that possible?
+But it uses the same eng--- Sure, whatever, I'll take it.
+
+Thanks to this one weird trick, my website was building in 50s,
+and a single maths-heavy page took no more than 2s,
+as long as I set the `EXECJS_RUNTIME` variable:
+
+```sh
+$ EXECJS_RUNTIME="MiniRacer" bundle exec jekyll serve --livereload --incremental
+```
+
+So far so good, and I published it like this.
+However, Jekyll veterans may have noticed a problem:
+in the template, I'm using `{%raw%}{{page.content}}{%endraw%}`,
+not `{%raw%}{{content}}{%endraw%}`.
+The latter has unavoidably already been converted from Markdown to HTML,
+which is what I want to avoid,
+but `{%raw%}{{page.content}}{%endraw%}` has an unfortunate side effect:
+any Liquid in the page isn't evaluated.
+In other words, with this setup, I couldn't use any templating features inside pages,
+such as using `{%raw%}{%include%}{%endraw%}` for
+[collapsible boxes](/blog/2022/website-adventures-basics/#collapsible-content).
+
+And come on, 50s to build a 200-page static website? We can do better, surely.
+In [this issue](https://github.com/linjer/jekyll-katex/issues/35) on GitHub for `jekyll-katex`,
+I found a link to [this post](https://gendignoux.com/blog/2020/05/23/katex.html),
+revealing that Kramdown, Jekyll's Markdown processor,
+already has support for LaTeX maths
+if an appropriate plugin is installed.
+And I'm in luck: one such plugin is
+[`kramdown-math-katex`](https://github.com/kramdown/math-katex),
+which renders everything offline!
+
+There's just one small wrinkle: according to Kramdown,
+all inline maths formulas need to be enclosed in `$$` instead of `$`.
+Great, time to update all my files again.
+Fortunately, that was pretty easy this time:
+
+```sh
+$ sed -i -E "s/\\\$([^\$]+)\\\$/\\\$\\\$\\1\\\$\\\$/g" */index.md
+```
+
+Kramdown then detects based on the context whether a formula
+should be rendered inline or in display mode.
+In the end, I opted for the related plugin
+[`kramdown-math-sskatex`](https://github.com/kramdown/math-sskatex),
+which allows me to provide my own newer `katex.min.js` file
+but apart from that works the same way.
+
+With some minor reformatting, this plugin ended up working perfectly,
+so I again had statically-rendered maths,
+this time without sacrificing Liquid support.
+And the build times? Down to 5s, a 90% reduction from `jekyll-katex`.
+I still need to use MiniRacer to run JS:
+with Node, the build now takes over 15m30s.
+Why did Node slow down but MiniRacer speed up? Who knows!
+The ways of the Lord are truly mysterious.
+
+
+
+## Conclusion
+
+If you want to add maths to your website,
+the easy solution is to render it on page load.
+For this, I recommend KaTeX over MathJax: it's lighter and faster,
+and it avoids lock-in by disallowing weird LaTeX packages.
+In the end, I think the best solution is server-side rendering,
+which is the lightest and fastest of all,
+although it takes more work to set up.
+This seems to be easier with KaTeX than MathJax for some reason.
+I'm happy with my current implementation.
+
--
cgit v1.2.3