From 75636ed8772512bdf38e3dec431888837eaddc5d Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Mon, 20 Feb 2023 18:08:31 +0100
Subject: Improve knowledge base

---
 source/know/concept/pulay-mixing/index.md | 121 +++++++++++++++----------
 1 file changed, 62 insertions(+), 59 deletions(-)

(limited to 'source/know/concept/pulay-mixing/index.md')

diff --git a/source/know/concept/pulay-mixing/index.md b/source/know/concept/pulay-mixing/index.md
index 6e809dd..81051f1 100644
--- a/source/know/concept/pulay-mixing/index.md
+++ b/source/know/concept/pulay-mixing/index.md
@@ -8,68 +8,70 @@ layout: "concept"
 ---
 
 Some numerical problems are most easily solved *iteratively*,
-by generating a series $$\rho_1$$, $$\rho_2$$, etc.
-converging towards the desired solution $$\rho_*$$.
+by generating a series of "solutions" $$f_1$$, $$f_2$$, etc.
+converging towards the true $$f_\infty$$.
 **Pulay mixing**, also often called
 **direct inversion in the iterative subspace** (DIIS),
 can speed up the convergence for some types of problems,
 and also helps to avoid periodic divergences.
 
 The key concept it relies on is the **residual vector** $$R_n$$
-of the $$n$$th iteration, which in some way measures the error of the current $$\rho_n$$.
-Its exact definition varies,
-but is generally along the lines of the difference between
-the input of the iteration and the raw resulting output:
+of the $$n$$th iteration, which measures the error of the current $$f_n$$.
+Its exact definition can vary,
+but it is generally the difference between
+the input $$f_n$$ of the $$n$$th iteration
+and the raw resulting output $$f_n^\mathrm{new}$$:
 
 $$\begin{aligned}
     R_n
-    = R[\rho_n]
-    = \rho_n^\mathrm{new}[\rho_n] - \rho_n
+    \equiv R[f_n]
+    \equiv f_n^\mathrm{new}[f_n] - f_n
 \end{aligned}$$
 
-It is not always clear what to do with $$\rho_n^\mathrm{new}$$.
-Directly using it as the next input ($$\rho_{n+1} = \rho_n^\mathrm{new}$$)
+It is not always clear what to do with $$f_n^\mathrm{new}$$.
+Directly using it as the next input ($$f_{n+1} = f_n^\mathrm{new}$$)
 often leads to oscillation,
-and linear mixing ($$\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$$)
+and linear mixing ($$f_{n+1} = (1\!-\!c) f_n + c f_n^\mathrm{new}$$)
 can take a very long time to converge properly.
 Pulay mixing offers an improvement.
 
-The idea is to construct the next iteration's input $$\rho_{n+1}$$
-as a linear combination of the previous inputs $$\rho_1$$, $$\rho_2$$, ..., $$\rho_n$$,
-such that it is as close as possible to the optimal $$\rho_*$$:
+The idea is to construct the next iteration's input $$f_{n+1}$$
+as a linear combination of the previous inputs $$f_1$$, $$f_2$$, ..., $$f_n$$,
+such that it is as close as possible to the optimal $$f_\infty$$:
 
 $$\begin{aligned}
     \boxed{
-        \rho_{n+1}
-        = \sum_{m = 1}^n \alpha_m \rho_m
+        f_{n+1}
+        = \sum_{m = 1}^n \alpha_m f_m
     }
 \end{aligned}$$
 
 To do so, we make two assumptions.
-Firstly, the current $$\rho_n$$ is already close to $$\rho_*$$,
-so that such a linear combination makes sense.
-Secondly, the iteration is linear,
-such that the raw output $$\rho_{n+1}^\mathrm{new}$$
-is also a linear combination with the *same coefficients*:
+First, that the current $$f_n$$ is already close to $$f_\infty$$,
+so such a linear combination makes sense.
+Second, that the iteration is linear,
+so the raw output $$f_{n+1}^\mathrm{new}$$
+is also a linear combination *with the same coefficients*:
 
 $$\begin{aligned}
-    \rho_{n+1}^\mathrm{new}
-    = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new}
+    f_{n+1}^\mathrm{new}
+    = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new}
 \end{aligned}$$
 
-We will return to these assumptions later.
-The point is that $$R_{n+1}$$ is also a linear combination:
+We will revisit these assumptions later.
+The point is that $$R_{n+1}$$ can now also be written
+as a linear combination of old residuals $$R_m$$:
 
 $$\begin{aligned}
     R_{n+1}
-    = \rho_{n+1}^\mathrm{new} - \rho_{n+1}
-    = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m \rho_m
+    = f_{n+1}^\mathrm{new} - f_{n+1}
+    = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m f_m
     = \sum_{m = 1}^n \alpha_m R_m
 \end{aligned}$$
 
 The goal is to choose the coefficients $$\alpha_m$$
 such that the norm of the error $$|R_{n+1}| \approx 0$$,
-subject to the following constraint to preserve the normalization of $$\rho_{n+1}$$:
+subject to the following constraint to preserve the normalization of $$f_{n+1}$$:
 
 $$\begin{aligned}
     \sum_{m=1}^n \alpha_m = 1
@@ -79,20 +81,19 @@ We thus want to minimize the following quantity,
 where $$\lambda$$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
 
 $$\begin{aligned}
-    \Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
-    = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big)
+    \inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
+    = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \inprod{R_m}{R_k} + \lambda \Big)
 \end{aligned}$$
 
 By differentiating the right-hand side with respect to $$\alpha_m^*$$
 and demanding that the result is zero,
-we get a system of equations that we can write in matrix form,
-which is cheap to solve:
+we get a cheap-to-solve system of equations, in matrix form:
 
 $$\begin{aligned}
     \begin{bmatrix}
-        \Inprod{R_1}{R_1} & \cdots & \Inprod{R_1}{R_n} & 1 \\
+        \inprod{R_1}{R_1} & \cdots & \inprod{R_1}{R_n} & 1 \\
         \vdots & \ddots & \vdots & \vdots \\
-        \Inprod{R_n}{R_1} & \cdots & \Inprod{R_n}{R_n} & 1 \\
+        \inprod{R_n}{R_1} & \cdots & \inprod{R_n}{R_n} & 1 \\
         1 & \cdots & 1 & 0
     \end{bmatrix}
     \cdot
@@ -106,48 +107,50 @@ $$\begin{aligned}
 \end{aligned}$$
 
 From this, we can also see that the Lagrange multiplier
-$$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$$,
+$$\lambda = - \inprod{R_{n+1}}{R_{n+1}}$$,
 where $$R_{n+1}$$ is the *predicted* residual of the next iteration,
 subject to the two assumptions.
+This fact makes $$\lambda$$ a useful measure of convergence.
 
-However, in practice, the earlier inputs $$\rho_1$$, $$\rho_2$$, etc.
-are much further from $$\rho_*$$ than $$\rho_n$$,
-so usually only the most recent $$N\!+\!1$$ inputs $$\rho_{n - N}$$, ..., $$\rho_n$$ are used:
+In practice, the earlier inputs $$f_1$$, $$f_2$$, etc.
+are much further from $$f_\infty$$ than $$f_n$$,
+so usually only the most recent $$N\!+\!1$$ inputs $$f_{n - N}, ..., f_n$$ are used.
+This also keeps the matrix small:
 
 $$\begin{aligned}
-    \rho_{n+1}
-    = \sum_{m = n-N}^n \alpha_m \rho_m
+    f_{n+1}
+    = \sum_{m = n-N}^n \alpha_m f_m
 \end{aligned}$$
 
-You might be confused by the absence of any $$\rho_m^\mathrm{new}$$
-in the creation of $$\rho_{n+1}$$, as if the iteration's outputs are being ignored.
+You might be confused by the absence of any $$f_m^\mathrm{new}$$
+in the creation of $$f_{n+1}$$, as if the iteration's outputs are being ignored.
 This is due to the first assumption,
-which states that $$\rho_n^\mathrm{new}$$ and $$\rho_n$$ are already similar,
+which states that $$f_n^\mathrm{new}$$ and $$f_n$$ are already similar,
 such that they are basically interchangeable.
 
-Speaking of which, about those assumptions:
-while they will clearly become more accurate as $$\rho_n$$ approaches $$\rho_*$$,
-they might be very dubious in the beginning.
-A consequence of this is that the early iterations might get "trapped"
-in a suboptimal subspace spanned by $$\rho_1$$, $$\rho_2$$, etc.
-To say it another way, we would be varying $$n$$ coefficients $$\alpha_m$$
-to try to optimize a $$D$$-dimensional $$\rho_{n+1}$$,
-where in general $$D \gg n$$, at least in the beginning.
-
-There is an easy fix to this problem:
-add a small amount of the raw residual $$R_m$$
-to "nudge" $$\rho_{n+1}$$ towards the right subspace,
+Although those assumptions will clearly become more realistic as $$f_n \to f_\infty$$,
+they might be very dubious at first.
+Consequently, the early iterations may get "trapped"
+in a suboptimal subspace spanned by $$f_1$$, $$f_2$$, etc.
+Think of it like this:
+we would be varying up to $$n$$ coefficients $$\alpha_m$$
+to try to optimize a $$D$$-dimensional $$f_{n+1}$$, where usually $$D \gg n$$.
+It is almost impossible to find a decent optimum in this way!
+
+This problem is easy to fix,
+by mixing in a small amount of the raw residuals $$R_m$$
+to "nudge" $$f_{n+1}$$ towards the right subspace,
 where $$\beta \in [0,1]$$ is a tunable parameter:
 
 $$\begin{aligned}
     \boxed{
-        \rho_{n+1}
-        = \sum_{m = N}^n \alpha_m (\rho_m + \beta R_m)
+        f_{n+1}
+        = \sum_{m = n-N}^n \alpha_m (f_m + \beta R_m)
     }
 \end{aligned}$$
 
-In other words, we end up introducing a small amount of the raw outputs $$\rho_m^\mathrm{new}$$,
-while still giving more weight to iterations with smaller residuals.
+In this way, the raw outputs $$f_m^\mathrm{new}$$ are (rightfully) included via $$R_m$$,
+but we still give more weight to iterations with smaller residuals.
 
 Pulay mixing is very effective
 for certain types of problems,
 e.g. density functional theory,
-- 
cgit v1.2.3
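
As a concrete illustration of the update rule above, here is a minimal sketch in Python/NumPy. It is not part of the patch and not the API of any particular library: the names `pulay_coefficients`, `pulay_mix`, and `fixed_point_solve`, the default $$\beta = 0.1$$, and the history length $$N\!+\!1 = 5$$ are arbitrary choices, and $$f$$ is assumed to be a real-valued 1D array. The code builds the bordered matrix of residual inner products, solves it for the coefficients $$\alpha_m$$ and the multiplier $$\lambda$$, and forms $$f_{n+1} = \sum_m \alpha_m (f_m + \beta R_m)$$ from the most recent iterations:

```python
# Minimal sketch of Pulay mixing / DIIS as described above.
# Function names and default values are illustrative choices only;
# real-valued arrays are assumed for simplicity.
import numpy as np


def pulay_coefficients(residuals):
    """Solve the bordered linear system for the mixing coefficients.

    `residuals` is a list of 1D arrays [R_{n-N}, ..., R_n].
    Returns (alphas, lam), where the alphas sum to 1 and
    lam = -<R_{n+1}, R_{n+1}> is (minus) the predicted squared
    residual norm of the next iteration.
    """
    k = len(residuals)
    R = np.stack(residuals)            # shape (k, D)
    A = np.zeros((k + 1, k + 1))
    A[:k, :k] = R @ R.T                # Gram matrix of inner products <R_m, R_j>
    A[:k, k] = 1.0                     # column multiplying the multiplier lambda
    A[k, :k] = 1.0                     # bottom row enforces sum(alpha) = 1
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    sol = np.linalg.solve(A, rhs)      # may fail if residuals are (nearly) linearly
                                       # dependent; then drop the oldest ones and retry
    return sol[:k], sol[k]


def pulay_mix(f_history, r_history, beta=0.1):
    """Build the next input f_{n+1} = sum_m alpha_m * (f_m + beta * R_m)."""
    alphas, lam = pulay_coefficients(r_history)
    F = np.stack(f_history)
    R = np.stack(r_history)
    return alphas @ (F + beta * R), abs(lam)


def fixed_point_solve(step, f0, beta=0.1, history=5, tol=1e-10, max_iter=500):
    """Iterate a fixed-point map `step(f) -> f_new` with Pulay mixing,
    keeping only the most recent `history` = N + 1 inputs and residuals."""
    f = np.atleast_1d(np.asarray(f0, dtype=float))
    f_hist, r_hist = [], []
    for _ in range(max_iter):
        f_new = np.atleast_1d(np.asarray(step(f), dtype=float))
        r = f_new - f                  # residual R_n = f_n^new - f_n
        if np.linalg.norm(r) < tol:
            return f_new
        f_hist.append(f)
        r_hist.append(r)
        f_hist, r_hist = f_hist[-history:], r_hist[-history:]
        f, _ = pulay_mix(f_hist, r_hist, beta)
    return f
```

As a usage example, `step` could be one cycle of a self-consistent calculation (for instance the density update in density functional theory), with $$f$$ the discretized density; the $$|\lambda|$$ returned by `pulay_mix` could then serve as the convergence measure mentioned above.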