author Prefetch 2022-11-28 21:21:44 +0100
committer Prefetch 2022-11-28 21:25:23 +0100
commit f0dc83f5ca9fc934081905f8689f8a35f8fbacaa
tree a56a6df04c9c5ce6c78c77fb064d432470ce06f4 /source/blog/2022/website-adventures-maths
parent 41420c0e32cba69d4f4e19175bd3350fed427275
Publish "Website adventures" part 3 about LaTeX maths
Diffstat (limited to 'source/blog/2022/website-adventures-maths')
-rw-r--r-- source/blog/2022/website-adventures-maths/index.md 404
1 files changed, 404 insertions, 0 deletions
diff --git a/source/blog/2022/website-adventures-maths/index.md b/source/blog/2022/website-adventures-maths/index.md
new file mode 100644
index 0000000..a49b473
--- /dev/null
+++ b/source/blog/2022/website-adventures-maths/index.md
@@ -0,0 +1,404 @@
+---
+title: "Adventures in making this website:<br>rendering LaTeX maths"
+date: 2022-11-28
+layout: "blog"
+toc: true
+---
+
+Published on 2022-11-28.
+
+Making and managing this personal website has been an adventure.
+In this series, I go over the technical challenges I've encountered
+and philosophical decisions I've made,
+and I review some of the tools I've used along the way.
+After [part 1](/blog/2022/website-adventures-generators/)
+and [part 2](/blog/2022/website-adventures-basics/),
+this is part 3.
+
+
+
+In late 2020, I decided to start a [knowledge base](/know/),
+where I wanted to upload some of the physics notes I'd made for myself.
+Those notes were in LaTeX,
+which is the *lingua franca* for writing maths on computers,
+but unfortunately the web can't display it out-of-the-box,
+so some work is needed.
+This post documents the wild journey I was sent on by a simple misunderstanding,
+leading me to my current (quite decent) maths-rendering solution.
+
+
+
+## Phase 0: the easy way
+
+The most famous solution is perhaps
+[MathJax](https://www.mathjax.org/), a JavaScript package
+that locally renders all LaTeX maths once the page has loaded.
+Enabling it for your website is easy:
+just put the following code
+(copy-pasted from [the docs](https://docs.mathjax.org/en/latest/web/configuration.html#configuration-using-an-in-line-script))
+into the `<head>` of your HTML template:
+
+```html
+<script>
+MathJax = {
+ tex: {
+ inlineMath: [['$', '$'], ['\\(', '\\)']]
+ },
+ svg: {
+ fontCache: 'global'
+ }
+};
+</script>
+<script type="text/javascript" id="MathJax-script" async
+ src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
+</script>
+```
+
+And then enclose your LaTeX in `$` or `$$` for inline or display-mode maths, respectively.
+In theory, this should work for all static site generators (SSGs)
+as long as they preserve the `$`-symbols into the final HTML files.
+This should be the case for [Zola](https://www.getzola.org/),
+the SSG I originally built my site in.
+
+For the sake of completeness, I should point out that
+`tex-svg.js` isn't the only available flavour of MathJax;
+they're all listed [here](https://github.com/mathjax/MathJax/tree/master/es5),
+where `tex` and `mml` refer to the input formats
+LaTeX and [MathML](https://en.wikipedia.org/wiki/MathML),
+while `svg` and `chtml` are the output formats:
+scalable vector graphics,
+or a bunch of inline-styled `<span>`s called CommonHTML,
+respectively.
+The `full` files enable all LaTeX extensions offered by MathJax,
+whereas the others fetch those on demand.
+
+And... that's it! If you enjoyed this guide, please like and subscribe or something like that!
+
+
+
+## Phase 1: pandoc's red herring
+
+... at least, that's what I *would've* done,
+if I hadn't been led astray while researching this.
+
+You see, during my research,
+one of the options I explored was [pandoc](https://pandoc.org/),
+a fairly well-known tool to convert documents between markup formats.
+Pandoc recognizes LaTeX maths,
+and transfers it to HTML for rendering by e.g. MathJax.
+From the [manual](https://pandoc.org/MANUAL.html#math-rendering-in-html)
+(emphasis mine):
+
+<blockquote markdown="1">
+`--mathjax`:
+*Use MathJax to display embedded TeX math in HTML output. TeX math will be put between*
+`\(...\)` *(for inline math) or*
+`\[...\]` *(for display math)* ***and wrapped in***
+`<span>` ***tags with class*** `math`.
+*Then the MathJax JavaScript will render it.*
+</blockquote>
+
+When I first read that, I misunderstood the part about `<span>` tags:
+I thought they were needed for MathJax to render the formulas.
+This is false: pandoc only adds those tags for convenience when you write CSS rules.
+But it was too late: this misunderstanding, and the fact that few SSGs' manuals
+explicitly mention maths, left me thinking that my options were very limited.
+
+So I migrated to [Hugo](https://gohugo.io/),
+since it [supports](https://gohugo.io/content-management/formats/#list-of-content-formats)
+pandoc's Markdown dialect as a content format,
+which it processes by passing it through the `pandoc` executable.
+It even gives `--mathjax` as argument, so I could be sure this would work.
+And indeed it did! This is a legitimate solution,
+which I arrived at for the wrong reasons.
+There's a small catch too: Hugo prides itself on its speed,
+but passing all content through an external program slows it down a lot...
+Still, my 200-page website could be built in under 10 seconds, so it wasn't bad.
+
+
+
+## Phase 2: screw JavaScript
+
+MathJax is pretty unwieldy:
+first, the 1.2MiB minified `tex-chtml.js` script needs to be fetched
+(plus fonts; or 2.1MiB for `tex-svg.js`, which includes fonts),
+and then it takes some time to process all the formulas on a page.
+Although modern systems can run JS very fast,
+that's a lot of JS, resulting in a noticeable delay when loading a page.
+We can do better than that.
+
+To try to reduce the amount of transferred data,
+I even did a custom minimal build of MathJax,
+and ended up at 1.6MiB (about 450KiB after Brotli compression).
+But for some reason (I vaguely remember worrying about fonts)
+I chose the SVG renderer instead of CHTML,
+so I probably could've reduced it to less than 1MiB.
+This size reduction wasn't free though:
+I stripped as many features as I could,
+including MathJax's accessibility options... oh dear.
+
+Luckily, MathJax has a competitor, [KaTeX](https://katex.org/).
+According to [this test](https://www.intmath.com/cg5/katex-mathjax-comparison.php),
+KaTeX is consistently faster:
+on my system it needs roughly 150ms to run,
+compared to 900ms and 300ms for MathJax v2 and v3, respectively,
+and the `katex.min.js` script is only 270KiB
+(note that this competes with `tex-chtml.js`, not `tex-svg.js`; KaTeX doesn't have an SVG backend).
+Very impressive! It even has MathML output for accessibility.
+
+But KaTeX has one killer feature: server-side rendering as a first-class citizen.
+MathJax seems to have this too thanks to [`MathJax-node`](https://github.com/mathjax/MathJax-node)
+(just look up "mathjax server-side"), but it doesn't seem to be popular,
+except for use with bigger web frameworks.
+[This post](https://a3nm.net/blog/selfhost_mathjax.html)
+describes what I want,
+but I think I had problems with the `page2html` script they use...
+I don't remember exactly why, but I decided it wasn't practical
+to render MathJax at site build time.
+So KaTeX it is.
+
+To make this work, I needed to migrate to [Jekyll](https://jekyllrb.com/).
+To render maths while Jekyll builds the website,
+someone made the [`jekyll-katex`](https://github.com/linjer/jekyll-katex) plugin,
+and it works great!
+The Markdown source just needs to be wrapped in a `{% raw %}{%katexmm%}{% endraw %}` block,
+and then all occurrences of `$` and `$$` are processed by KaTeX.
+However, this needs to happen *before* Jekyll turns the Markdown into HTML,
+so the most obvious HTML template won't work:
+
+```liquid
+{% raw %}{% comment %} Problem: {{content}} is implicitly "markdownified" to HTML {% endcomment %}
+{% katexmm %}
+ {{ content }}
+{% endkatexmm %}{% endraw %}
+```
+
+Instead, this slightly awkward construction is needed,
+but it isn't too bad:
+
+```liquid
+{% raw %}{% comment %} Solution: use explicit "markdownify" filter {% endcomment %}
+{% capture content_after_katex %}
+ {% katexmm %}
+ {{ page.content }}
+ {% endkatexmm %}
+{% endcapture %}
+{{ content_after_katex | markdownify }}{% endraw %}
+```
+
+Or you could explicitly wrap all your text in `{% raw %}{%katexmm%}{% endraw %}`
+in each content file,
+and use `{% raw %}{{content}}{% endraw %}` in the template
+instead of `{% raw %}{{page.content}}{% endraw %}`.
+
+
+
+## Phase 3: rewriting physics
+
+Going from MathJax to KaTeX was easier said than done.
+See, we physicists have a nasty habit of using
+the [`physics`](https://www.ctan.org/pkg/physics) package,
+which provides some useful macros, notably for derivatives
+(why isn't this standard in LaTeX?
+I guess AMS' mathematicians lost interest in differentiation;
+they've been too busy in the last decades figuring out how to
+[pack oranges](https://en.wikipedia.org/wiki/Kepler_conjecture) in boxes...).
+MathJax does support it, but KaTeX doesn't,
+so my work wasn't compatible. Damn.
+
+But macros are, by definition, *macros*,
+so I should be able to implement them myself.
+Luckily, KaTeX even has support for this:
+one of the [options](https://katex.org/docs/options.html)
+I can set is a `macros` object; you can find my collection here
+in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml).
+However, `jekyll-katex` doesn't support this setting...
+or, well, the published version doesn't:
+upstream merged [this pull request](https://github.com/linjer/jekyll-katex/pull/34)
+over a year ago, but just never bothered to release it.
+No problem, it's easy to patch my local installation.
+
+However, KaTeX' macro support is fairly basic:
+it doesn't allow e.g. asterisks in names or variadic arguments
+(both heavily used by `physics`),
+so I couldn't make one-to-one implementations of the macros I'd used.
+I solved this by splitting the original "smart" macros into several "dumb" ones,
+and updating my files by regex substitution.
+Below are examples of the petty incompatibilities I had to deal with.
+Yes, that's a lot of backslashes: one for the shell, and one for `sed`.
+No, I won't clean it up, I want you to suffer like I did:
+
+```sh
+# MathJax' "physics" and KaTeX' "braket" package aren't interoperable.
+# Some macros are similar, but we want to keep the automatic resizing:
+{% comment %}$ sed -i -E "s/\\\\bra\\{/\\\\Bra\\{/g" $FILE
+$ sed -i -E "s/\\\\bra\\*\\{/\\\\bra\\{/g" $FILE{% endcomment -%}
+$ sed -i -E "s/\\\\ket\\{/\\\\Ket\\{/g" $FILE
+$ sed -i -E "s/\\\\ket\\*\\{/\\\\ket\\{/g" $FILE
+# Sometimes translation isn't so trivial, so we need to define proxy macros:
+$ sed -i -E "s/\\\\expval\\{/\\\\Expval\\{/g" $FILE
+$ sed -i -E "s/\\\\expval\\*\\{/\\\\expval\\{/g" $FILE
+$ sed -i -E "s/\\\\expval\\*\\*\\{/\\\\Expval\\{/g" $FILE
+{% comment %}$ sed -i -E "s/\\\\matrixel\\{/\\\\matrixel\\{/g" $FILE
+$ sed -i -E "s/\\\\matrixel\\*\\{/\\\\matrixel\\{/g" $FILE
+$ sed -i -E "s/\\\\matrixel\\*\\*\\{/\\\\Matrixel\\{/g" $FILE{% endcomment -%}
+# And sometimes it gets a bit awkward to name those proxies...
+$ sed -i -E "s/\\\\braket\\{/\\\\Inprod\\{/g" $FILE
+$ sed -i -E "s/\\\\braket\\*\\{/\\\\inprod\\{/g" $FILE
+{% comment %}$ sed -i -E "s/\\\\dyad\\{/\\\\Exprod\\{/g" $FILE
+$ sed -i -E "s/\\\\dyad\\*\\{/\\\\exprod\\{/g" $FILE{% endcomment -%}
+```
+
+All in all, not bad at all, I just need to remember to
+write `\Inprod` instead of `\braket` in the future.
+However, some of the macros, especially for differentiation,
+weren't so easy to translate,
+due to the flexibility `physics` gives you
+in how you pass arguments.
+For example, for a partial derivative `\pdv`,
+consider the following (bugged) substitutions to cover all possibilities:
+
+```sh
+# \pdv{f}{x}{y} => \mpdv{f}{x}{y}
+$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}]*)\\}/\\\\mpdv\\{\\1\\}\\{\\2\\}\\{\\3\\}/g" $FILE
+# \pdv{x} => \pdv{}{x}
+$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}[^{]/\\\\pdv\\{\\}\\{\\1\\}/g" $FILE
+# \pdv*{x} => \ipdv{}{x}
+$ sed -i -E "s/\\\\pdv\\*\\{([^}]*)\\}[^{]/\\\\ipdv\\{\\}\\{\\1\\}/g" $FILE
+# \pdv*{f}{x} => \ipdv{f}{x}
+$ sed -i -E "s/\\\\pdv\\*/\\\\ipdv/g" $FILE
+# \pdv[n]{x} => \pdvn{n}{}{x}
+$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\pdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
+# \pdv[n]{f}{x} => \pdvn{n}{f}{x}
+$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]/\\\\pdvn\\{\\1\\}/g" $FILE
+# \pdv*[n]{x} => \ipdvn{n}{}{x}
+$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]\\{([^}]*)\\}[^{]/\\\\ipdvn\\{\\1\\}\\{\\}\\{\\2\\}/g" $FILE
+# \pdv*[n]{f}{x} => \ipdvn{n}{f}{x}
+$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]/\\\\ipdvn\\{\\1\\}/g" $FILE
+```
+
+Yes, I know some of these could be combined,
+but I was starting to see those backslashes double,
+and I prefer to list all options explicitly anyway.
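+
+For illustration, the fixed-arity targets of those substitutions
+could be defined along the following lines.
+This is only a sketch in plain LaTeX notation:
+the macro names are the ones from the comments above,
+but the bodies are illustrative assumptions, not copied from `_config.yml`:
+
+```latex
+% Fraction form: \pdv{f}{x}, or \pdv{}{x} for a bare operator
+\newcommand{\pdv}[2]{\frac{\partial #1}{\partial #2}}
+% Inline form, replacing the starred \pdv*{f}{x}
+\newcommand{\ipdv}[2]{\partial #1 / \partial #2}
+% n-th order, replacing the optional argument of \pdv[n]{f}{x}
+\newcommand{\pdvn}[3]{\frac{\partial^{#1} #2}{\partial #3^{#1}}}
+\newcommand{\ipdvn}[3]{\partial^{#1} #2 / \partial #3^{#1}}
+% Mixed second derivative, replacing the three-argument \pdv{f}{x}{y}
+\newcommand{\mpdv}[3]{\frac{\partial^2 #1}{\partial #2 \partial #3}}
+```
+
+Each proxy has a plain name and a fixed number of mandatory arguments,
+so it needs neither asterisks nor variadic arguments.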
+
+The "bug" is the fundamental fact that regular expressions
+can't parse arbitrarily nested structures, which require a context-free grammar:
+in this case, if an argument to `\pdv` contains nested braces `{}`,
+the regex capture group `([^}]*)` ends prematurely at the first `}`.
+For example, in `\pdv{f_{n}}{x}` the first group captures only `f_{n`.
+I could've worked around this issue by writing more complex regexes,
+since the nesting is finite and probably no more than two levels deep,
+so it can still be handled by a finite state machine.
+But that would mean a lot more backslashes... no thanks.
+
+So I made a judgment call: I wouldn't trust the substitution 100% anyway,
+so it'd probably be best to just check all 190 pages manually to fix things.
+And that's what I did; it took me five evenings,
+and I indeed caught several unexpected side effects of the transition to Jekyll and KaTeX.
+I even encountered an error from KaTeX' automatic resizing of `|` inside `\Braket`:
+see how I implemented my `\Expval` proxy macro in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml#n39).
+
+In the end, it was successful:
+I could serve statically rendered maths.
+The [heaviest page](/know/concept/selection-rules/) in the knowledge base
+is converted into a whopping 1.06MB of HTML and MathML...
+but these are very compressible,
+so Brotli is able to bring it down to just 18KB,
+which amounts to a 98% reduction (holy crap).
+Compared to running MathJax, this really improved load times.
+
+
+
+## Phase 4: making it go faster
+
+But `jekyll-katex` slowed down Jekyll *a lot*,
+because a JS interpreter is called to run `katex.min.js`
+for every page that needs it (oh no, I hope it isn't invoked for each formula).
+Once my migration was done,
+a clean build of this site took 6m30s,
+which is long, but not a big problem by itself.
+The real issue was that processing a single maths-heavy page could take up to 15s,
+which is just too long
+while I'm writing and want to see the edits I just made.
+
+Digging deeper, I found that `jekyll-katex` relies on
+[ExecJS](https://github.com/rails/execjs) for JS execution,
+which supports several backends.
+In my case, it was defaulting to [Node.js](https://nodejs.org/en/),
+which is based on Google's [V8](https://en.wikipedia.org/wiki/V8_(JavaScript_engine)) engine.
+Maybe a different backend would be faster?
+First I tried [Duktape](https://github.com/judofyr/duktape.rb),
+but that gave a semantic error, so it can't run `katex.min.js` apparently.
+Then I tried [MiniRacer](https://github.com/rubyjs/mini_racer),
+which also uses V8, so it should work for KaTeX, but it won't be much fa---
+Wait what? It's almost 8 times as fast? How is that possible?
+But it uses the same eng--- Sure, whatever, I'll take it.
+
+Thanks to this one weird trick, my website was building in 50s,
+and a single maths-heavy page took no more than 2s,
+as long as I set the `EXECJS_RUNTIME` variable:
+
+```sh
+$ EXECJS_RUNTIME="MiniRacer" bundle exec jekyll serve --livereload --incremental
+```
+
+So far so good, and I published it like this.
+However, Jekyll veterans may have noticed a problem:
+in the template, I'm using `{%raw%}{{page.content}}{%endraw%}`,
+not `{%raw%}{{content}}{%endraw%}`.
+The latter has unavoidably already been converted from Markdown to HTML,
+which is what I want to avoid,
+but `{%raw%}{{page.content}}{%endraw%}` has an unfortunate side effect:
+any Liquid in the page isn't evaluated.
+In other words, with this setup, I couldn't use any templating features inside pages,
+such as using `{%raw%}{%include%}{%endraw%}` for
+[collapsible boxes](/blog/2022/website-adventures-basics/#collapsible-content).
+
+And come on, 50s to build a 200-page static website? We can do better, surely.
+In [this issue](https://github.com/linjer/jekyll-katex/issues/35) on GitHub for `jekyll-katex`,
+I found a link to [this post](https://gendignoux.com/blog/2020/05/23/katex.html),
+revealing that Kramdown, Jekyll's Markdown processor,
+already has support for LaTeX maths
+if an appropriate plugin is installed.
+And I'm in luck: one such plugin is
+[`kramdown-math-katex`](https://github.com/kramdown/math-katex),
+which renders everything offline!
+
+There's just one small wrinkle: according to Kramdown,
+all inline maths formulas need to be enclosed in `$$` instead of `$`.
+Great, time to update all my files again.
+Fortunately, that was pretty easy this time:
+
+```sh
+$ sed -i -E "s/\\\$([^\$]+)\\\$/\\\$\\\$\\1\\\$\\\$/g" */index.md
+```
+
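+Kramdown then detects based on the context whether a formula
+should be rendered inline or in display mode:
+a `$$...$$` block that forms its own paragraph becomes display maths,
+while one inside running text is rendered inline.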
+In the end, I opted for the related plugin
+[`kramdown-math-sskatex`](https://github.com/kramdown/math-sskatex),
+which allows me to provide my own newer `katex.min.js` file
+but apart from that works the same way.
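+
+Enabling such a plugin is a matter of configuration.
+A minimal sketch of the relevant part of `_config.yml`,
+assuming the plugin's gem is already installed and listed in the `Gemfile`;
+any further plugin-specific options (such as the path to your own `katex.min.js`)
+are left out here, so check the plugin's documentation for those:
+
+```yaml
+# Sketch only: the rest of the configuration is omitted
+markdown: kramdown
+kramdown:
+  # Engine name for kramdown-math-sskatex;
+  # for kramdown-math-katex this would be "katex"
+  math_engine: sskatex
+```
+
+With this in place, Kramdown hands every `$$...$$` formula to the chosen engine
+while it converts the Markdown, so no extra Liquid tags are needed in the templates.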
+
+With some minor reformatting, this plugin ended up working perfectly,
+so I again had statically-rendered maths,
+this time without sacrificing Liquid support.
+And the build times? Down to 5s, a 90% reduction from `jekyll-katex`.
+I still need to use MiniRacer to run JS:
+with Node, the build now takes over 15m30s.
+Why did Node slow down but MiniRacer speed up? Who knows!
+The ways of the Lord are truly mysterious.
+
+
+
+## Conclusion
+
+If you want to add maths to your website,
+the easy solution is to render it on page load.
+For this, I recommend KaTeX over MathJax: it's lighter and faster,
+and it avoids lock-in by disallowing weird LaTeX packages.
+In the end, I think the best solution is server-side rendering,
+which is the lightest and fastest of all,
+although it takes more work to set up.
+This seems to be easier with KaTeX than MathJax for some reason.
+I'm happy with my current implementation.
+