---
title: "Adventures in making this website:<br>rendering LaTeX maths"
date: 2022-11-28
layout: "blog"
toc: true
---

Published on 2022-11-28, last updated on 2023-02-27.

Making and managing this personal website has been an adventure.
In this series, I go over the technical challenges I've encountered
and philosophical decisions I've made,
and I review some of the tools I've used along the way.
After [part 1](/blog/2022/website-adventures-generators/)
and [part 2](/blog/2022/website-adventures-basics/),
this is part 3,
followed by [part 4](/blog/2023/website-adventures-images/).



In late 2020, I decided to start a [knowledge base](/know/),
where I wanted to upload some of the physics notes I'd made for myself.
Those notes were in LaTeX,
which is the *lingua franca* for writing maths on computers,
but unfortunately the web can't display it out of the box,
so some work is needed.
This post documents the wild journey I was sent on by a simple misunderstanding,
leading me to my current (quite decent) maths-rendering solution.



## Phase 0: the easy way

The most famous solution is perhaps
[MathJax](https://www.mathjax.org/), a JavaScript package
that locally renders all LaTeX maths once the page has loaded.
Enabling it for your website is easy:
just put the following code
(copy-pasted from [the docs](https://docs.mathjax.org/en/latest/web/configuration.html#configuration-using-an-in-line-script))
into the `<head>` of your HTML template:

```html
<script>
MathJax = {
  tex: {
    inlineMath: [['$', '$'], ['\\(', '\\)']]
  },
  svg: {
    fontCache: 'global'
  }
};
</script>
<script type="text/javascript" id="MathJax-script" async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>
```

And then enclose your LaTeX in `$` or `$$` for inline or display-mode maths, respectively.
In theory, this should work for all static site generators (SSGs)
as long as they preserve the `$`-symbols into the final HTML files.
This should be the case for [Zola](https://www.getzola.org/),
the SSG I originally built my site in.

For the sake of completeness, I should point out that
`tex-svg.js` isn't the only available flavour of MathJax;
they're all listed [here](https://github.com/mathjax/MathJax/tree/master/es5),
where `tex` and `mml` refer to the input formats
LaTeX and [MathML](https://en.wikipedia.org/wiki/MathML),
while `svg` and `chtml` are the output formats:
scalable vector graphics,
or a bunch of inline-styled `<span>`s called CommonHTML,
respectively.
The `full` files enable all LaTeX extensions offered by MathJax,
whereas the others fetch those on demand.

And... that's it! If you enjoyed this guide, please like and subscribe or something like that!



## Phase 1: pandoc's red herring

... at least, that's what I *would've* done,
if I hadn't been led astray while researching this.

You see, during my research,
one of the options I explored was [pandoc](https://pandoc.org/),
a fairly well-known tool to convert documents between markup formats.
Pandoc recognizes LaTeX maths,
and transfers it to HTML for rendering by e.g. MathJax.
From the [manual](https://pandoc.org/MANUAL.html#math-rendering-in-html)
(emphasis mine):

<blockquote markdown="1">
`--mathjax`:
*Use MathJax to display embedded TeX math in HTML output. TeX math will be put between*
`\(...\)` *(for inline math) or*
`\[...\]` *(for display math)* ***and wrapped in***
`<span>` ***tags with class*** `math`.
*Then the MathJax JavaScript will render it.*
</blockquote>

When I first read that, I misunderstood the part about `<span>` tags:
I thought they were needed for MathJax to render the formulas.
This is false: pandoc only adds those tags for convenience when you write CSS rules.
But it was too late: this misunderstanding, and the fact that few SSGs' manuals
explicitly mention maths, left me thinking that my options were very limited.

So I migrated to [Hugo](https://gohugo.io/),
since it [supports](https://gohugo.io/content-management/formats/#list-of-content-formats)
pandoc's Markdown dialect as a content format,
which it processes by passing it through the `pandoc` executable.
It even gives `--mathjax` as argument, so I could be sure this would work.
And indeed it did! This is a legitimate solution,
which I arrived at for the wrong reasons.
There's a small catch too: Hugo prides itself on its speed,
but passing all content through an external program slows it down a lot...
Still, my 200-page website could be built in under 10 seconds, so it wasn't bad.



## Phase 2: screw JavaScript

MathJax is pretty unwieldy:
first, the 1.2MiB minified `tex-chtml.js` script needs to be fetched
(plus fonts, or 2.1MiB for `tex-svg.js` that includes fonts),
and then it takes some time to process all the formulas on a page.
Although modern systems can run JS very fast,
that's a lot of JS, resulting in a noticeable delay when loading a page.
We can do better than that.

To try to reduce the amount of transferred data,
I even did a custom minimal build of MathJax,
and ended up at 1.6MiB (about 450KiB after Brotli compression).
But for some reason (I vaguely remember worrying about fonts)
I chose the SVG renderer instead of CHTML;
with CHTML I probably could've got it below 1MiB.
This size reduction wasn't free though:
I stripped as many features as I could,
including MathJax's accessibility options... oh dear.

Luckily, MathJax has a competitor, [KaTeX](https://katex.org/).
According to [this test](https://www.intmath.com/cg5/katex-mathjax-comparison.php),
KaTeX is consistently faster:
on my system it needs roughly 150ms to run,
compared to 900ms and 300ms for MathJax v2 and v3, respectively,
and the `katex.min.js` script is only 270KiB
(note that this competes with `tex-chtml.js`, not `tex-svg.js`; KaTeX doesn't have an SVG backend).
Very impressive! It even has MathML output for accessibility.

But KaTeX has one killer feature: server-side rendering as a first-class citizen.
MathJax seems to have this too thanks to [`MathJax-node`](https://github.com/mathjax/MathJax-node)
(just look up "mathjax server-side"), but it doesn't seem to be popular,
except for use with bigger web frameworks.
[This post](https://a3nm.net/blog/selfhost_mathjax.html)
describes what I want,
but I think I had problems with the `page2html` script they use...
I don't remember exactly why, but I decided it wasn't practical
to render MathJax at site build time.
So KaTeX it is.

To make this work, I needed to migrate to [Jekyll](https://jekyllrb.com/).
To render maths while Jekyll builds the website,
someone made the [`jekyll-katex`](https://github.com/linjer/jekyll-katex) plugin,
and it works great!
The Markdown source just needs to be wrapped in a `{% raw %}{%katexmm%}{% endraw %}` block,
and then all occurrences of `$` and `$$` are processed by KaTeX.
However, this needs to happen *before* Jekyll turns the Markdown into HTML,
so the most obvious HTML template won't work:

```liquid
{% raw %}{% comment %} Problem: {{page.content}} is implicitly "markdownified" to HTML {% endcomment %}
{% katexmm %}
  {{ page.content }}
{% endkatexmm %}{% endraw %}
```

Instead, this slightly awkward construction is needed,
but it isn't too bad:

```liquid
{% raw %}{% comment %} Solution: use explicit "markdownify" filter {% endcomment %}
{% capture content_after_katex %}
  {% katexmm %}
    {{ page.content }}
  {% endkatexmm %}
{% endcapture %}
{{ content_after_katex | markdownify }}{% endraw %}
```

Or you could explicitly wrap all your text in `{% raw %}{%katexmm%}{% endraw %}`
in each content file,
and use `{% raw %}{{content}}{% endraw %}` in the template
instead of `{% raw %}{{page.content}}{% endraw %}`.



## Phase 3: rewriting physics

Going from MathJax to KaTeX was easier said than done.
See, we physicists have a nasty habit of using
the [`physics`](https://www.ctan.org/pkg/physics) package,
which provides some useful macros, notably for derivatives
(why isn't this standard in LaTeX?
I guess AMS' mathematicians lost interest in differentiation;
they've been too busy in the last decades figuring out how to
[pack oranges](https://en.wikipedia.org/wiki/Kepler_conjecture) in boxes...).
MathJax does support it, but KaTeX doesn't,
so my work wasn't compatible. Damn.

But macros are, by definition, *macros*,
so I should be able to implement them myself.
Luckily, KaTeX even has support for this:
one of the [options](https://katex.org/docs/options.html)
I can set is a `macros` object; you can find my collection here
in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml).
However, `jekyll-katex` doesn't support this setting...
or, well, the published version doesn't:
upstream merged [this pull request](https://github.com/linjer/jekyll-katex/pull/34)
over a year ago, but just never bothered to release it.
No problem, it's easy to patch my local installation.

However, KaTeX' macro support is fairly basic:
it doesn't allow e.g. asterisks in names or variadic arguments
(both heavily used by `physics`),
so I couldn't make one-to-one implementations of the macros I'd used.
I solved this by splitting the original "smart" macros into several "dumb" ones,
and updating my files by regex substitution.
Below are examples of the petty incompatibilities I had to deal with.
Yes, that's a lot of backslashes: one for the shell, and one for `sed`.
No, I won't clean it up, I want you to suffer like I did:

```sh
# MathJax' "physics" and KaTeX' "braket" package aren't interoperable.
# Some macros are similar, but we want to keep the automatic resizing:
{% comment %}$ sed -i -E "s/\\\\bra\\{/\\\\Bra\\{/g"    $FILE
$ sed -i -E "s/\\\\bra\\*\\{/\\\\bra\\{/g" $FILE{% endcomment -%}
$ sed -i -E "s/\\\\ket\\{/\\\\Ket\\{/g"    $FILE
$ sed -i -E "s/\\\\ket\\*\\{/\\\\ket\\{/g" $FILE
# Sometimes translation isn't so trivial, so we need to define proxy macros:
$ sed -i -E "s/\\\\expval\\{/\\\\Expval\\{/g"       $FILE
$ sed -i -E "s/\\\\expval\\*\\{/\\\\expval\\{/g"    $FILE
$ sed -i -E "s/\\\\expval\\*\\*\\{/\\\\Expval\\{/g" $FILE
{% comment %}$ sed -i -E "s/\\\\matrixel\\{/\\\\matrixel\\{/g"       $FILE
$ sed -i -E "s/\\\\matrixel\\*\\{/\\\\matrixel\\{/g"    $FILE
$ sed -i -E "s/\\\\matrixel\\*\\*\\{/\\\\Matrixel\\{/g" $FILE{% endcomment -%}
# And sometimes it gets a bit awkward to name those proxies...
$ sed -i -E "s/\\\\braket\\{/\\\\Inprod\\{/g"    $FILE
$ sed -i -E "s/\\\\braket\\*\\{/\\\\inprod\\{/g" $FILE
{% comment %}$ sed -i -E "s/\\\\dyad\\{/\\\\Exprod\\{/g"    $FILE
$ sed -i -E "s/\\\\dyad\\*\\{/\\\\exprod\\{/g" $FILE{% endcomment -%}
```
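
For the record, most of those backslashes come from the double quotes:
inside them, the shell strips one level of escaping before `sed` ever sees the pattern.
Below is a minimal sketch of the same `\ket` substitutions in single quotes instead.
The helper name `convert_ket` is made up for this illustration,
and it filters standard input rather than editing `$FILE` in place,
so it's safe to try:

```sh
# Hypothetical helper, not part of the original workflow.
# Single quotes mean the shell passes every backslash through untouched,
# so only sed's own escaping remains: \\ is a literal backslash,
# \{ a literal brace and \* a literal asterisk.
convert_ket() {
  sed -E -e 's/\\ket\{/\\Ket{/g' -e 's/\\ket\*\{/\\ket{/g'
}

printf '%s\n' '\ket{a} + \ket*{b}' | convert_ket
# prints: \Ket{a} + \ket{b}
```

The order of the two expressions matters here too:
the unstarred macro has to be renamed first,
otherwise the freshly created `\ket{` would be caught by the other rule.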

All in all, not bad at all, I just need to remember to
write `\Inprod` instead of `\braket` in the future.
However, some of the macros, especially for differentiation,
weren't so easy to translate,
due to the flexibility `physics` gives you
in how you pass arguments.
For example, for a partial derivative `\pdv`,
consider the following (bugged) substitutions to cover all possibilities:

```sh
# Order matters: the \pdv*[n] rules must run before the catch-all \pdv* rule.
# The trailing ([^{]|$) group is put back via \2 or \3, so no character is eaten.
# \pdv{f}{x}{y}  => \mpdv{f}{x}{y}
$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}]*)\\}/\\\\mpdv\\{\\1\\}\\{\\2\\}\\{\\3\\}/g" $FILE
# \pdv{x}        => \pdv{}{x}
$ sed -i -E "s/\\\\pdv\\{([^}]*)\\}([^{]|\$)/\\\\pdv\\{\\}\\{\\1\\}\\2/g" $FILE
# \pdv*[n]{x}    => \ipdvn{n}{}{x}
$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]\\{([^}]*)\\}([^{]|\$)/\\\\ipdvn\\{\\1\\}\\{\\}\\{\\2\\}\\3/g" $FILE
# \pdv*[n]{f}{x} => \ipdvn{n}{f}{x}
$ sed -i -E "s/\\\\pdv\\*\\[([^]]*)\\]/\\\\ipdvn\\{\\1\\}/g" $FILE
# \pdv*{x}       => \ipdv{}{x}
$ sed -i -E "s/\\\\pdv\\*\\{([^}]*)\\}([^{]|\$)/\\\\ipdv\\{\\}\\{\\1\\}\\2/g" $FILE
# \pdv*{f}{x}    => \ipdv{f}{x}
$ sed -i -E "s/\\\\pdv\\*/\\\\ipdv/g" $FILE
# \pdv[n]{x}     => \pdvn{n}{}{x}
$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]\\{([^}]*)\\}([^{]|\$)/\\\\pdvn\\{\\1\\}\\{\\}\\{\\2\\}\\3/g" $FILE
# \pdv[n]{f}{x}  => \pdvn{n}{f}{x}
$ sed -i -E "s/\\\\pdv\\[([^]]*)\\]/\\\\pdvn\\{\\1\\}/g" $FILE
```

Yes, I know some of these could be combined,
but I was starting to see those backslashes double,
and I prefer to list all options explicitly anyway.

The "bug" is the fundamental fact that regular expressions
can't parse context-free languages, such as arbitrarily nested braces:
in this case, if an argument to `\pdv` contains nested braces `{}`,
the regex capture group `([^}]*)` ends prematurely at the first `}`.
I could've worked around this issue by writing more complex regexes,
since the nesting is finite and probably no more than two levels deep,
so it can still be handled by a finite state machine.
But that would mean a lot more backslashes... no thanks.
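
To make that concrete, here's a small sketch of both behaviours on the three-argument case,
written as filters on standard input instead of `sed -i`.
The helper names `pdv3_naive` and `pdv3_nested` and the `arg` variable
are made up for this illustration,
and the second regex only tolerates one level of nesting:

```sh
# Hypothetical helpers, simplified from the three-argument \pdv rule above.

# Naive version: each argument is "anything up to the first }".
pdv3_naive() {
  sed -E 's/\\pdv\{([^}]*)\}\{([^}]*)\}\{([^}]*)\}/\\mpdv{\1}{\2}{\3}/g'
}

# One level of nesting: an argument is a run of non-brace characters
# and {...} groups that themselves contain no braces.
pdv3_nested() {
  arg='\{(([^{}]|\{[^{}]*\})*)\}'
  sed -E "s/\\\\pdv${arg}${arg}${arg}/\\\\mpdv{\\1}{\\3}{\\5}/g"
}

# The naive pattern finds no match, so the macro is left untranslated:
printf '%s\n' '\pdv{\hat{f}}{x}{y}' | pdv3_naive
# prints: \pdv{\hat{f}}{x}{y}

printf '%s\n' '\pdv{\hat{f}}{x}{y}' | pdv3_nested
# prints: \mpdv{\hat{f}}{x}{y}
```

The nested version needs two capture groups per argument,
hence the odd-numbered `\1`, `\3` and `\5` in the replacement.
A second level of nesting would mean repeating the trick inside the inner group,
which is exactly the backslash explosion I wanted to avoid.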

So I made a judgment call: I wouldn't trust the substitution 100% anyway,
so it'd probably be best to just check all 190 pages manually to fix things.
And that's what I did; it took me five evenings,
and I indeed caught several unexpected side effects of the transition to Jekyll and KaTeX.
I even encountered an error from KaTeX' automatic resizing of `|` inside `\Braket`:
see how I implemented my `\Expval` proxy macro in [`_config.yml`](/code/prefetch-jekyll/tree/_config.yml#n39).

In the end, it was successful:
I could serve statically rendered maths.
The [heaviest page](/know/concept/selection-rules/) in the knowledge base
is converted into a whopping 1.06MB of HTML and MathML...
but these are very compressible,
so Brotli is able to bring it down to just 18KB,
which amounts to a 98% reduction (holy crap).
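
As a sanity check on that percentage, here's the arithmetic as a tiny shell helper.
The name `reduction_pct` is made up,
and I'm assuming decimal units, so 1.06MB is taken as 1060KB:

```sh
# Hypothetical helper: percentage saved when going from $1 to $2 (same unit).
reduction_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

reduction_pct 1060 18   # 1060KB of HTML and MathML vs. 18KB after Brotli
# prints: 98.3
```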
Compared to running MathJax, this really improved load times.



## Phase 4: making it go faster

But `jekyll-katex` slowed down Jekyll *a lot*,
because a JS interpreter is called to run `katex.min.js`
for every page that needs it (oh no, I hope it isn't invoked for each formula).
Once my migration was done,
a clean build of this site took 6m30s,
which is long, but not a big problem by itself.
The real issue was that processing a single maths-heavy page could take up to 15s,
which is just too long
while I'm writing and want to see the edits I just made.

Digging deeper, I found that `jekyll-katex` relies on
[ExecJS](https://github.com/rails/execjs) for JS execution,
which supports several backends.
In my case, it was defaulting to [Node.js](https://nodejs.org/en/),
which is based on Google's [V8](https://en.wikipedia.org/wiki/V8_(JavaScript_engine)) engine.
Maybe a different backend would be faster?
First I tried [Duktape](https://github.com/judofyr/duktape.rb),
but that gave a semantic error, so it can't run `katex.min.js` apparently.
Then I tried [MiniRacer](https://github.com/rubyjs/mini_racer),
which also uses V8, so it should work for KaTeX, but it won't be much fa---
Wait what? It's 800% faster? How is that possible?
But it uses the same eng--- Sure, whatever, I'll take it.

Thanks to this one weird trick, my website was building in 50s,
and a single maths-heavy page took no more than 2s,
as long as I set the `EXECJS_RUNTIME` variable:

```sh
$ EXECJS_RUNTIME="MiniRacer" bundle exec jekyll serve --livereload --incremental
```

So far so good, and I published it like this.
However, Jekyll veterans may have noticed a problem:
in the template, I'm using `{%raw%}{{page.content}}{%endraw%}`,
not `{%raw%}{{content}}{%endraw%}`.
The latter is unavoidably converted from Markdown to HTML,
which is what I want to avoid,
but `{%raw%}{{page.content}}{%endraw%}` has an unfortunate side effect:
any Liquid in the page isn't evaluated.
In other words, with this setup, I couldn't use any templating features inside pages,
such as using `{%raw%}{%include%}{%endraw%}` for
[collapsible boxes](/blog/2022/website-adventures-basics/#collapsible-content).

And come on, 50s to build a 200-page static website? We can do better, surely.
In [this issue](https://github.com/linjer/jekyll-katex/issues/35) on GitHub for `jekyll-katex`,
I found a link to [this post](https://gendignoux.com/blog/2020/05/23/katex.html),
revealing that Kramdown, Jekyll's Markdown processor,
already has support for LaTeX maths
if an appropriate plugin is installed.
And I'm in luck: one such plugin is
[`kramdown-math-katex`](https://github.com/kramdown/math-katex),
which renders everything offline!

There's just one small wrinkle: according to Kramdown,
all inline maths formulas need to be enclosed in `$$` instead of `$`.
Great, time to update all my files again.
Fortunately, that was pretty easy this time:

```sh
$ sed -i -E "s/\\\$([^\$]+)\\\$/\\\$\\\$\\1\\\$\\\$/g" */index.md
```
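
To see what that command does, here's a sketch of the same substitution as a filter,
single-quoted to tame the backslashes.
The name `inline_to_double` is hypothetical and the sample sentence is made up:

```sh
# Hypothetical helper: wrap every $...$ pair on a line in $$...$$ instead.
# Same regex as the command above, minus the escaping that double quotes need.
inline_to_double() {
  sed -E 's/\$([^$]+)\$/$$\1$$/g'
}

printf '%s\n' 'Let $x$ and $y$ be real.' | inline_to_double
# prints: Let $$x$$ and $$y$$ be real.
```

One caveat: a `$$...$$` pair sitting on a single line would pick up extra dollar signs,
so this only works if display maths keeps its delimiters on their own lines.
That's an assumption about the source files, not something the command checks.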

Kramdown then detects based on the context whether a formula
should be rendered inline or in display mode.
In the end, I opted for the related plugin
[`kramdown-math-sskatex`](https://github.com/kramdown/math-sskatex),
which allows me to provide my own newer `katex.min.js` file
but apart from that works the same way.

With some minor reformatting, this plugin ended up working perfectly,
so I again had statically-rendered maths,
this time without sacrificing Liquid support.
And the build times? Down to 5s, a 90% reduction from `jekyll-katex`.
I still need to use MiniRacer to run JS:
with Node, the build now takes over 15m30s.
Why did Node slow down but MiniRacer speed up? Who knows!
The ways of the Lord are truly mysterious.



## Conclusion

If you want to add maths to your website,
the easy solution is to render it on page load.
For this, I recommend KaTeX over MathJax: it's lighter and faster,
and it avoids lock-in by disallowing weird LaTeX packages.
In the end, I think the best solution is server-side rendering,
which is the lightest and fastest of all,
although it takes more work to set up.
This seems to be easier with KaTeX than MathJax for some reason.
I'm happy with my current implementation.