From f0dc83f5ca9fc934081905f8689f8a35f8fbacaa Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Mon, 28 Nov 2022 21:21:44 +0100
Subject: Publish "Website adventures" part 3 about LaTeX maths

---
 .../blog/2022/website-adventures-basics/index.md   |   5 +-
 .../2022/website-adventures-generators/index.md    |   6 +-
 source/blog/2022/website-adventures-maths/index.md | 404 +++++++++++++++++++++
 3 files changed, 410 insertions(+), 5 deletions(-)
 create mode 100644 source/blog/2022/website-adventures-maths/index.md

(limited to 'source/blog/2022')

diff --git a/source/blog/2022/website-adventures-basics/index.md b/source/blog/2022/website-adventures-basics/index.md
index 31e0fc3..64ed446 100644
--- a/source/blog/2022/website-adventures-basics/index.md
+++ b/source/blog/2022/website-adventures-basics/index.md
@@ -5,14 +5,15 @@ layout: "blog"
 toc: true
 ---
 
-Published on 2022-11-20.
+Published on 2022-11-20, last updated on 2022-11-28.
 
 Making and managing this personal website has been an adventure.
 In this series, I go over the technical challenges I've encountered
 and philosophical decisions I've made,
 and I review some of the tools I've used along the way.
 After [part 1](/blog/2022/website-adventures-generators/),
-this is part 2, with more coming soon.
+this is part 2,
+followed by [part 3](/blog/2022/website-adventures-maths/).
 
 
 
diff --git a/source/blog/2022/website-adventures-generators/index.md b/source/blog/2022/website-adventures-generators/index.md
index 7e52f17..42a6230 100644
--- a/source/blog/2022/website-adventures-generators/index.md
+++ b/source/blog/2022/website-adventures-generators/index.md
@@ -5,14 +5,14 @@ layout: "blog"
 toc: true
 ---
 
-Published on 2022-11-15, last updated on 2022-11-20.
+Published on 2022-11-15, last updated on 2022-11-28.
 
 Making and managing this personal website has been an adventure.
 In this series, I go over the technical challenges I've encountered
 and philosophical decisions I've made,
 and I review some of the tools I've used along the way.
-This is part 1, followed by [part 2](/blog/2022/website-adventures-basics/),
-with more coming.
+This is part 1, followed by [part 2](/blog/2022/website-adventures-basics/)
+and [part 3](/blog/2022/website-adventures-maths/).
 
 
 
diff --git a/source/blog/2022/website-adventures-maths/index.md b/source/blog/2022/website-adventures-maths/index.md
new file mode 100644
index 0000000..a49b473
--- /dev/null
+++ b/source/blog/2022/website-adventures-maths/index.md
@@ -0,0 +1,404 @@
+---
+title: "Adventures in making this website:<br>rendering LaTeX maths"
+date: 2022-11-28
+layout: "blog"
+toc: true
+---
+
+Published on 2022-11-28.
+
+Making and managing this personal website has been an adventure.
+In this series, I go over the technical challenges I've encountered
+and philosophical decisions I've made,
+and I review some of the tools I've used along the way.
+After [part 1](/blog/2022/website-adventures-generators/)
+and [part 2](/blog/2022/website-adventures-basics/),
+this is part 3.
+
+
+
+In late 2020, I decided to start a [knowledge base](/know/),
+where I wanted to upload some of the physics notes I'd made for myself.
+Those notes were in LaTeX,
+which is the *lingua franca* for writing maths on computers,
+but unfortunately the web can't display it out of the box,
+so some work is needed.
+This post documents the wild journey I was sent on by a simple misunderstanding,
+leading me to my current (quite decent) maths-rendering solution.
+
+
+
+## Phase 0: the easy way
+
+The most famous solution is perhaps
+[MathJax](https://www.mathjax.org/), a JavaScript package
+that locally renders all LaTeX maths once the page has loaded.
+Enabling it for your website is easy:
+just put the following code
+(copy-pasted from [the docs](https://docs.mathjax.org/en/latest/web/configuration.html#configuration-using-an-in-line-script))
+into the `<head>` of your HTML template:
+
+```html
+
+
+```
+
+And then enclose your LaTeX in `$` or `$$` for inline or display-mode maths, respectively.
+In theory, this should work for all static site generators (SSGs)
+as long as they preserve the `$` symbols in the final HTML files.
+This should be the case for [Zola](https://www.getzola.org/),
+the SSG I originally built my site in.
+
+For the sake of completeness, I should point out that
+`tex-svg.js` isn't the only available flavour of MathJax;
+they're all listed [here](https://github.com/mathjax/MathJax/tree/master/es5),
+where `tex` and `mml` refer to the input formats
+LaTeX and [MathML](https://en.wikipedia.org/wiki/MathML),
+while `svg` and `chtml` are the output formats:
+scalable vector graphics,
+or a bunch of inline-styled HTML elements called CommonHTML,
+respectively.
+The `full` files enable all LaTeX extensions offered by MathJax,
+whereas the others fetch those on demand.
+
+And... that's it! If you enjoyed this guide, please like and subscribe or something like that!
+
+
+
+## Phase 1: pandoc's red herring
+
+... at least, that's what I *would've* done,
+if I hadn't been led astray while researching this.
+
+You see, during my research,
+one of the options I explored was [pandoc](https://pandoc.org/),
+a fairly well-known tool to convert documents between markup formats.
+Pandoc recognizes LaTeX maths,
+and transfers it to HTML for rendering by e.g. MathJax.
+From the [manual](https://pandoc.org/MANUAL.html#math-rendering-in-html)
+(emphasis mine):
+
+
+`--mathjax`:
+*Use MathJax to display embedded TeX math in HTML output. TeX math will be put between*
+`\(...\)` *(for inline math) or*
+`\[...\]` *(for display math)* ***and wrapped in***
+`<span>` ***tags with class*** `math`.
+*Then the MathJax JavaScript will render it.*
+
+
+When I first read that, I misunderstood the part about `<span>` tags:
+I thought they were needed for MathJax to render the formulas.
+This is false: pandoc only adds those tags for convenience when you write CSS rules.
+But it was too late: this misunderstanding, and the fact that few SSGs' manuals
+explicitly mention maths, left me thinking that my options were very limited.
+
+So I migrated to [Hugo](https://gohugo.io/),
+since it [supports](https://gohugo.io/content-management/formats/#list-of-content-formats)
+pandoc's Markdown dialect as a content format,
+which it processes by passing it through the `pandoc` executable.
+It even passes `--mathjax` as an argument, so I could be sure this would work.
+And indeed it did! This is a legitimate solution,
+which I arrived at for the wrong reasons.
+There's a small catch too: Hugo prides itself on its speed,
+but passing all content through an external program slows it down a lot...
+Still, my 200-page website could be built in under 10 seconds, so it wasn't bad.
+
+
+
+## Phase 2: screw JavaScript
+
+MathJax is pretty unwieldy:
+first, the 1.2MiB minified `tex-chtml.js` script needs to be fetched
+(plus fonts, or 2.1MiB for `tex-svg.js`, which includes fonts),
+and then it takes some time to process all the formulas on a page.
+Although modern systems can run JS very fast,
+that's a lot of JS, resulting in a noticeable delay when loading a page.
+We can do better than that.
+
+To try to reduce the amount of transferred data,
+I even did a custom minimal build of MathJax,
+and ended up at 1.6MiB (about 450KiB after Brotli compression).
+But for some reason (I vaguely remember worrying about fonts)
+I chose the SVG renderer instead of CHTML,
+so I probably could've reduced it to less than 1MiB.
+This size reduction wasn't free though:
+I stripped as many features as I could,
+including MathJax's accessibility options... oh dear.
+
+Luckily, MathJax has a competitor, [KaTeX](https://katex.org/).
+According to [this test](https://www.intmath.com/cg5/katex-mathjax-comparison.php),
+KaTeX is consistently faster:
+on my system it needs roughly 150ms to run,
+compared to 900ms and 300ms for MathJax v2 and v3, respectively,
+and the `katex.min.js` script is only 270KiB
+(note that this competes with `tex-chtml.js`, not `tex-svg.js`; KaTeX doesn't have an SVG backend).
+Very impressive! It even has MathML output for accessibility.
+
+But KaTeX has one killer feature: server-side rendering as a first-class citizen.
+MathJax seems to have this too thanks to [`MathJax-node`](https://github.com/mathjax/MathJax-node)
+(just look up "mathjax server-side"), but it doesn't seem to be popular,
+except for use with bigger web frameworks.
+[This post](https://a3nm.net/blog/selfhost_mathjax.html)
+describes what I wanted,
+but I think I had problems with the `page2html` script they use...
+I don't remember exactly why, but I decided it wasn't practical
+to render MathJax at site build time.
+So KaTeX it is.
+
+To make this work, I needed to migrate to [Jekyll](https://jekyllrb.com/).
+To render maths while Jekyll builds the website,
+someone made the [`jekyll-katex`](https://github.com/linjer/jekyll-katex) plugin,
+and it works great!
+The Markdown source just needs to be wrapped in a `{% raw %}{%katexmm%}{% endraw %}` block,
+and then all occurrences of `$` and `$$` are processed by KaTeX.
+However, this needs to happen *before* Jekyll turns the Markdown into HTML,
+so the most obvious HTML template won't work:
+
+```liquid
+{% raw %}{% comment %} Problem: {{page.content}} is implicitly "markdownified" to HTML {% endcomment %}
+{% katexmm %}
+  {{ page.content }}
+{% endkatexmm %}{% endraw %}
+```
+
+Instead, this slightly awkward construction is needed,
+but it isn't too bad:
+
+```liquid
+{% raw %}{% comment %} Solution: use explicit "markdownify" filter {% endcomment %}
+{% capture content_after_katex %}
+  {% katexmm %}
+    {{ page.content }}
+  {% endkatexmm %}
+{% endcapture %}
+{{ content_after_katex | markdownify }}{% endraw %}
+```
+
+Or you could explicitly wrap all your text in `{% raw %}{%katexmm%}{% endraw %}`
+in each content file,
+and use `{% raw %}{{content}}{% endraw %}` in the template
+instead of `{% raw %}{{page.content}}{% endraw %}`.
+
+
+
+## Phase 3: rewriting physics
+
+Going from MathJax to KaTeX was easier said than done.
+See, we physicists have a nasty habit of using
+the [`physics`](https://www.ctan.org/pkg/physics) package,
+which provides some useful macros, notably for derivatives
+(why isn't this standard in LaTeX?
+I guess AMS' mathematicians lost interest in differentiation;
+they've been too busy in the last decades figuring out how to
+[pack oranges](https://en.wikipedia.org/wiki/Kepler_conjecture) in boxes...).
+MathJax does support it, but KaTeX doesn't,
+so my work wasn't compatible. Damn.
+
+But macros are, by definition, *macros*,
+so I should be able to implement them myself.
+Luckily, KaTeX even has support for this:
+one of the [options](https://katex.org/docs/options.html)
+I can set is a `macros` object; you can find my collection here
+in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml).
+However, `jekyll-katex` doesn't support this setting...
+or, well, the published version doesn't:
+upstream merged [this pull request](https://github.com/linjer/jekyll-katex/pull/34)
+over a year ago, but just never bothered to release it.
+No problem, it's easy to patch my local installation.
+
+However, KaTeX' macro support is fairly basic:
+it doesn't allow e.g. asterisks in names or variadic arguments
+(both heavily used by `physics`),
+so I couldn't make one-to-one implementations of the macros I'd used.
+I solved this by splitting the original "smart" macros into several "dumb" ones,
+and updating my files by regex substitution.
+Below are examples of the petty incompatibilities I had to deal with.
+Yes, that's a lot of backslashes: one for the shell, and one for `sed`.
+No, I won't clean it up; I want you to suffer like I did:
+
+```sh
+# MathJax' "physics" and KaTeX' "braket" package aren't interoperable.
+# Some macros are similar, but we want to keep the automatic resizing:
+{% comment %}$ sed -i -E "s/\\\\bra\\{/\\\\Bra\\{/g" $FILE
+$ sed -i -E "s/\\\\bra\\*\\{/\\\\bra\\{/g" $FILE{% endcomment -%}
+$ sed -i -E "s/\\\\ket\\{/\\\\Ket\\{/g" $FILE
+$ sed -i -E "s/\\\\ket\\*\\{/\\\\ket\\{/g" $FILE
+# Sometimes translation isn't so trivial, so we need to define proxy macros:
+$ sed -i -E "s/\\\\expval\\{/\\\\Expval\\{/g" $FILE
+$ sed -i -E "s/\\\\expval\\*\\{/\\\\expval\\{/g" $FILE
+$ sed -i -E "s/\\\\expval\\*\\*\\{/\\\\Expval\\{/g" $FILE
+{% comment %}$ sed -i -E "s/\\\\matrixel\\{/\\\\matrixel\\{/g" $FILE
+$ sed -i -E "s/\\\\matrixel\\*\\{/\\\\matrixel\\{/g" $FILE
+$ sed -i -E "s/\\\\matrixel\\*\\*\\{/\\\\Matrixel\\{/g" $FILE{% endcomment -%}
+# And sometimes it gets a bit awkward to name those proxies...
+$ sed -i -E "s/\\\\braket\\{/\\\\Inprod\\{/g" $FILE
+$ sed -i -E "s/\\\\braket\\*\\{/\\\\inprod\\{/g" $FILE
+{% comment %}$ sed -i -E "s/\\\\dyad\\{/\\\\Exprod\\{/g" $FILE
+$ sed -i -E "s/\\\\dyad\\*\\{/\\\\exprod\\{/g" $FILE{% endcomment -%}
+```
+
+All in all, not bad at all; I just need to remember to
+write `\Inprod` instead of `\braket` in the future.
+However, some of the macros, especially for differentiation,
+weren't so easy to translate,
+due to the flexibility `physics` gives you
+in how you pass arguments.
+For example, for a partial derivative `\pdv`,
+consider the following (bugged) substitutions to cover all possibilities:
+
+```sh
+# \pdv{f}{x}{y} => \mpdv{f}{x}{y}
+$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}]*)\\}/\\\\mpdv\\{\\1\\}\\{\\2\\}\\{\\3\\}/g" $FILE
+# \pdv{x} => \pdv{}{x}
+$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}([^{])/\\\\pdv\\{\\}\\{\\1\\}\\2/g" $FILE
+# \pdv*{x} => \ipdv{}{x}
+$ sed -i -E "s/\\\\pdv\\*\\{([^}]*)\\}([^{])/\\\\ipdv\\{\\}\\{\\1\\}\\2/g" $FILE
+# \pdv[n]{x} => \pdvn{n}{}{x}
+$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]\\{([^}]*)\\}([^{])/\\\\pdvn\\{\\1\\}\\{\\}\\{\\2\\}\\3/g" $FILE
+# \pdv[n]{f}{x} => \pdvn{n}{f}{x}
+$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]/\\\\pdvn\\{\\1\\}/g" $FILE
+# \pdv*[n]{x} => \ipdvn{n}{}{x}
+$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]\\{([^}]*)\\}([^{])/\\\\ipdvn\\{\\1\\}\\{\\}\\{\\2\\}\\3/g" $FILE
+# \pdv*[n]{f}{x} => \ipdvn{n}{f}{x}
+$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]/\\\\ipdvn\\{\\1\\}/g" $FILE
+# \pdv*{f}{x} => \ipdv{f}{x} (must run after the \pdv*[n] rules, or it would swallow them)
+$ sed -i -E "s/\\\\pdv\\*/\\\\ipdv/g" $FILE
+```
+
+Yes, I know some of these could be combined,
+but I was starting to see those backslashes double,
+and I prefer to list all options explicitly anyway.
+
+The "bug" is the fundamental fact that regular expressions
+can't parse context-free grammars:
+in this case, if an argument to `\pdv` contains nested braces `{}`,
+the regex capture group `([^}]*)` ends prematurely at the first `}`.
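To make that failure concrete, here's a small sketch you can run in a shell.
It assumes GNU `sed` (for `-E`, and `\{` as a literal brace, like the commands above),
and the helper names `first_arg`, `naive_mpdv` and `nested_mpdv` are made up for this illustration.
It uses single quotes, so each backslash only needs escaping once.
The last helper accepts exactly one level of nested braces per argument;
it's only one possible shape of the "more complex regexes" discussed next,
not one of the substitutions actually applied to the knowledge base.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the nested-brace problem.
# Single quotes: each backslash is escaped once (for sed), not twice.

# Print what the naive capture group ([^}]*) grabs as the first argument of \pdv.
first_arg() {
    printf '%s\n' "$1" | sed -E 's/.*\\pdv\{([^}]*)\}.*/\1/'
}

# The naive three-argument rule: \pdv{f}{x}{y} => \mpdv{f}{x}{y}
naive_mpdv() {
    printf '%s\n' "$1" |
        sed -E 's/\\pdv\{([^}]*)\}\{([^}]*)\}\{([^}]*)\}/\\mpdv{\1}{\2}{\3}/g'
}

# One level of nesting: an argument is a sequence of non-brace characters
# or brace-free {...} groups. Each argument contributes two capture groups
# (the whole argument, then its last repetition), hence \1, \3 and \5.
arg='(([^{}]|\{[^{}]*\})*)'
nested_mpdv() {
    printf '%s\n' "$1" |
        sed -E 's/\\pdv\{'"$arg"'\}\{'"$arg"'\}\{'"$arg"'\}/\\mpdv{\1}{\3}{\5}/g'
}

first_arg   '\pdv{x_{1}}'        # x_{1  (cut off at the first closing brace)
naive_mpdv  '\pdv{f}{x}{y}'      # \mpdv{f}{x}{y}
naive_mpdv  '\pdv{f}{x_{1}}{y}'  # \pdv{f}{x_{1}}{y}  (no match, left untouched)
nested_mpdv '\pdv{f}{x_{1}}{y}'  # \mpdv{f}{x_{1}}{y}
```

Note that the naive rule fails silently on nested braces:
the formula is simply left as `\pdv`, so nothing warns you that it still needs translating.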
+I could've worked around this issue by writing more complex regexes,
+since the nesting is finite and probably no more than two levels deep,
+so it can still be handled by a finite state machine.
+But that would mean a lot more backslashes... no thanks.
+
+So I made a judgment call: I wouldn't trust the substitution 100% anyway,
+so it'd probably be best to just check all 190 pages manually to fix things.
+And that's what I did; it took me five evenings,
+and I indeed caught several unexpected side effects of the transition to Jekyll and KaTeX.
+I even encountered an error from KaTeX' automatic resizing of `|` inside `\Braket`:
+see how I implemented my `\Expval` proxy macro in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml#n39).
+
+In the end, it was successful:
+I could serve statically rendered maths.
+The [heaviest page](/know/concept/selection-rules/) in the knowledge base
+is converted into a whopping 1.06MB of HTML and MathML...
+but these are very compressible,
+so Brotli is able to bring it down to just 18KB,
+which amounts to a 98% reduction (holy crap).
+Compared to running MathJax, this really improved load times.
+
+
+
+## Phase 4: making it go faster
+
+But `jekyll-katex` slowed down Jekyll *a lot*,
+because a JS interpreter is called to run `katex.min.js`
+for every page that needs it (oh no, I hope it isn't invoked for each formula).
+Once my migration was done,
+a clean build of this site took 6m30s,
+which is long, but not a big problem by itself.
+The real issue was that processing a single maths-heavy page could take up to 15s,
+which is just too long
+while I'm writing and want to see the edits I just made.
+
+Digging deeper, I found that `jekyll-katex` relies on
+[ExecJS](https://github.com/rails/execjs) for JS execution,
+which supports several backends.
+In my case, it was defaulting to [Node.js](https://nodejs.org/en/),
+which is based on Google's [V8](https://en.wikipedia.org/wiki/V8_(JavaScript_engine)) engine.
+Maybe a different backend would be faster?
+First I tried [Duktape](https://github.com/judofyr/duktape.rb),
+but that gave a semantic error, so it can't run `katex.min.js` apparently.
+Then I tried [MiniRacer](https://github.com/rubyjs/mini_racer),
+which also uses V8, so it should work for KaTeX, but it won't be much fa---
+Wait what? It's 800% faster? How is that possible?
+But it uses the same eng--- Sure, whatever, I'll take it.
+
+Thanks to this one weird trick, my website was building in 50s,
+and a single maths-heavy page took no more than 2s,
+as long as I set the `EXECJS_RUNTIME` variable:
+
+```sh
+$ EXECJS_RUNTIME="MiniRacer" bundle exec jekyll serve --livereload --incremental
+```
+
+So far so good, and I published it like this.
+However, Jekyll veterans may have noticed a problem:
+in the template, I'm using `{%raw%}{{page.content}}{%endraw%}`,
+not `{%raw%}{{content}}{%endraw%}`.
+The latter has unavoidably already been converted from Markdown to HTML,
+which is what I want to avoid,
+but `{%raw%}{{page.content}}{%endraw%}` has an unfortunate side effect:
+any Liquid in the page isn't evaluated.
+In other words, with this setup, I couldn't use any templating features inside pages,
+such as using `{%raw%}{%include%}{%endraw%}` for
+[collapsible boxes](/blog/2022/website-adventures-basics/#collapsible-content).
+
+And come on, 50s to build a 200-page static website? We can do better, surely.
+In [this issue](https://github.com/linjer/jekyll-katex/issues/35) on GitHub for `jekyll-katex`,
+I found a link to [this post](https://gendignoux.com/blog/2020/05/23/katex.html),
+revealing that Kramdown, Jekyll's Markdown processor,
+already has support for LaTeX maths
+if an appropriate plugin is installed.
+And I'm in luck: one such plugin is
+[`kramdown-math-katex`](https://github.com/kramdown/math-katex),
+which renders everything offline!
+
+There's just one small wrinkle: according to Kramdown,
+all inline maths formulas need to be enclosed in `$$` instead of `$`.
+Great, time to update all my files again.
+Fortunately, that was pretty easy this time:
+
+```sh
+$ sed -i -E "s/\\\$([^\$]+)\\\$/\\\$\\\$\\1\\\$\\\$/g" */index.md
+```
+
+Kramdown then uses the context to decide whether a formula
+should be rendered inline or in display mode.
+In the end, I opted for the related plugin
+[`kramdown-math-sskatex`](https://github.com/kramdown/math-sskatex),
+which allows me to provide my own newer `katex.min.js` file,
+but apart from that works the same way.
+
+With some minor reformatting, this plugin ended up working perfectly,
+so I again had statically rendered maths,
+this time without sacrificing Liquid support.
+And the build times? Down to 5s, a 90% reduction from `jekyll-katex`.
+I still need to use MiniRacer to run JS:
+with Node, the build now takes over 15m30s.
+Why did Node slow down but MiniRacer speed up? Who knows!
+The ways of the Lord are truly mysterious.
+
+
+
+## Conclusion
+
+If you want to add maths to your website,
+the easy solution is to render it on page load.
+For this, I recommend KaTeX over MathJax: it's lighter and faster,
+and it avoids lock-in by disallowing weird LaTeX packages.
+In the end, I think the best solution is server-side rendering,
+which is the lightest and fastest of all,
+although it takes more work to set up.
+This seems to be easier with KaTeX than MathJax for some reason.
+I'm happy with my current implementation.
+
-- 
cgit v1.2.3