summaryrefslogtreecommitdiff
path: root/source/know/concept/pulay-mixing
diff options
context:
space:
mode:
Diffstat (limited to 'source/know/concept/pulay-mixing')
-rw-r--r--source/know/concept/pulay-mixing/index.md121
1 files changed, 62 insertions, 59 deletions
diff --git a/source/know/concept/pulay-mixing/index.md b/source/know/concept/pulay-mixing/index.md
index 6e809dd..81051f1 100644
--- a/source/know/concept/pulay-mixing/index.md
+++ b/source/know/concept/pulay-mixing/index.md
@@ -8,68 +8,70 @@ layout: "concept"
---
Some numerical problems are most easily solved *iteratively*,
-by generating a series $$\rho_1$$, $$\rho_2$$, etc.
-converging towards the desired solution $$\rho_*$$.
+by generating a series of "solutions" $$f_1$$, $$f_2$$, etc.
+converging towards the true $$f_\infty$$.
**Pulay mixing**, also often called
**direct inversion in the iterative subspace** (DIIS),
can speed up the convergence for some types of problems,
and also helps to avoid periodic divergences.
The key concept it relies on is the **residual vector** $$R_n$$
-of the $$n$$th iteration, which in some way measures the error of the current $$\rho_n$$.
-Its exact definition varies,
-but is generally along the lines of the difference between
-the input of the iteration and the raw resulting output:
+of the $$n$$th iteration, which measures the error of the current $$f_n$$.
+Its exact definition can vary,
+but it is generally the difference between
+the input $$f_n$$ of the $$n$$th iteration
+and the raw resulting output $$f_n^\mathrm{new}$$:
$$\begin{aligned}
R_n
- = R[\rho_n]
- = \rho_n^\mathrm{new}[\rho_n] - \rho_n
+ \equiv R[f_n]
+ \equiv f_n^\mathrm{new}[f_n] - f_n
\end{aligned}$$
-It is not always clear what to do with $$\rho_n^\mathrm{new}$$.
-Directly using it as the next input ($$\rho_{n+1} = \rho_n^\mathrm{new}$$)
+It is not always clear what to do with $$f_n^\mathrm{new}$$.
+Directly using it as the next input ($$f_{n+1} = f_n^\mathrm{new}$$)
often leads to oscillation,
-and linear mixing ($$\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$$)
+and linear mixing ($$f_{n+1} = (1\!-\!c) f_n + c f_n^\mathrm{new}$$)
can take a very long time to converge properly.
Pulay mixing offers an improvement.
-The idea is to construct the next iteration's input $$\rho_{n+1}$$
-as a linear combination of the previous inputs $$\rho_1$$, $$\rho_2$$, ..., $$\rho_n$$,
-such that it is as close as possible to the optimal $$\rho_*$$:
+The idea is to construct the next iteration's input $$f_{n+1}$$
+as a linear combination of the previous inputs $$f_1$$, $$f_2$$, ..., $$f_n$$,
+such that it is as close as possible to the optimal $$f_\infty$$:
$$\begin{aligned}
\boxed{
- \rho_{n+1}
- = \sum_{m = 1}^n \alpha_m \rho_m
+ f_{n+1}
+ = \sum_{m = 1}^n \alpha_m f_m
}
\end{aligned}$$
To do so, we make two assumptions.
-Firstly, the current $$\rho_n$$ is already close to $$\rho_*$$,
-so that such a linear combination makes sense.
-Secondly, the iteration is linear,
-such that the raw output $$\rho_{n+1}^\mathrm{new}$$
-is also a linear combination with the *same coefficients*:
+First, that the current $$f_n$$ is already close to $$f_\infty$$,
+so such a linear combination makes sense.
+Second, that the iteration is linear,
+so the raw output $$f_{n+1}^\mathrm{new}$$
+is also a linear combination *with the same coefficients*:
$$\begin{aligned}
- \rho_{n+1}^\mathrm{new}
- = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new}
+ f_{n+1}^\mathrm{new}
+ = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new}
\end{aligned}$$
-We will return to these assumptions later.
-The point is that $$R_{n+1}$$ is also a linear combination:
+We will revisit these assumptions later.
+The point is that $$R_{n+1}$$ can now also be written
+as a linear combination of old residuals $$R_m$$:
$$\begin{aligned}
R_{n+1}
- = \rho_{n+1}^\mathrm{new} - \rho_{n+1}
- = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m \rho_m
+ = f_{n+1}^\mathrm{new} - f_{n+1}
+ = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m f_m
= \sum_{m = 1}^n \alpha_m R_m
\end{aligned}$$
The goal is to choose the coefficients $$\alpha_m$$ such that
the norm of the error $$|R_{n+1}| \approx 0$$,
-subject to the following constraint to preserve the normalization of $$\rho_{n+1}$$:
+subject to the following constraint to preserve the normalization of $$f_{n+1}$$:
$$\begin{aligned}
\sum_{m=1}^n \alpha_m = 1
@@ -79,20 +81,19 @@ We thus want to minimize the following quantity,
where $$\lambda$$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
$$\begin{aligned}
- \Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
- = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big)
+ \inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
+ = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \inprod{R_m}{R_k} + \lambda \Big)
\end{aligned}$$
By differentiating the right-hand side with respect to $$\alpha_m^*$$
and demanding that the result is zero,
-we get a system of equations that we can write in matrix form,
-which is cheap to solve:
+we get a cheap-to-solve system of equations, in matrix form:
$$\begin{aligned}
\begin{bmatrix}
- \Inprod{R_1}{R_1} & \cdots & \Inprod{R_1}{R_n} & 1 \\
+ \inprod{R_1}{R_1} & \cdots & \inprod{R_1}{R_n} & 1 \\
\vdots & \ddots & \vdots & \vdots \\
- \Inprod{R_n}{R_1} & \cdots & \Inprod{R_n}{R_n} & 1 \\
+ \inprod{R_n}{R_1} & \cdots & \inprod{R_n}{R_n} & 1 \\
1 & \cdots & 1 & 0
\end{bmatrix}
\cdot
@@ -106,48 +107,50 @@ $$\begin{aligned}
\end{aligned}$$
From this, we can also see that the Lagrange multiplier
-$$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$$,
+$$\lambda = - \inprod{R_{n+1}}{R_{n+1}}$$,
where $$R_{n+1}$$ is the *predicted* residual of the next iteration,
subject to the two assumptions.
+This fact makes $$\lambda$$ a useful measure of convergence.
-However, in practice, the earlier inputs $$\rho_1$$, $$\rho_2$$, etc.
-are much further from $$\rho_*$$ than $$\rho_n$$,
-so usually only the most recent $$N\!+\!1$$ inputs $$\rho_{n - N}$$, ..., $$\rho_n$$ are used:
+In practice, the earlier inputs $$f_1$$, $$f_2$$, etc.
+are much further from $$f_\infty$$ than $$f_n$$,
+so usually only the most recent $$N\!+\!1$$ inputs $$f_{n - N}, ..., f_n$$ are used.
+This also keeps the matrix small:
$$\begin{aligned}
- \rho_{n+1}
- = \sum_{m = n-N}^n \alpha_m \rho_m
+ f_{n+1}
+ = \sum_{m = n-N}^n \alpha_m f_m
\end{aligned}$$
-You might be confused by the absence of any $$\rho_m^\mathrm{new}$$
-in the creation of $$\rho_{n+1}$$, as if the iteration's outputs are being ignored.
+You might be confused by the absence of any $$f_m^\mathrm{new}$$
+in the creation of $$f_{n+1}$$, as if the iteration's outputs are being ignored.
This is due to the first assumption,
-which states that $$\rho_n^\mathrm{new}$$ and $$\rho_n$$ are already similar,
+which states that $$f_n^\mathrm{new}$$ and $$f_n$$ are already similar,
such that they are basically interchangeable.
-Speaking of which, about those assumptions:
-while they will clearly become more accurate as $$\rho_n$$ approaches $$\rho_*$$,
-they might be very dubious in the beginning.
-A consequence of this is that the early iterations might get "trapped"
-in a suboptimal subspace spanned by $$\rho_1$$, $$\rho_2$$, etc.
-To say it another way, we would be varying $$n$$ coefficients $$\alpha_m$$
-to try to optimize a $$D$$-dimensional $$\rho_{n+1}$$,
-where in general $$D \gg n$$, at least in the beginning.
-
-There is an easy fix to this problem:
-add a small amount of the raw residual $$R_m$$
-to "nudge" $$\rho_{n+1}$$ towards the right subspace,
+Although those assumptions will clearly become more realistic as $$f_n \to f_\infty$$,
+they might be very dubious at first.
+Consequently, the early iterations may get "trapped"
+in a suboptimal subspace spanned by $$f_1$$, $$f_2$$, etc.
+Think of it like this:
+we would be varying up to $$n$$ coefficients $$\alpha_m$$
+to try to optimize a $$D$$-dimensional $$f_{n+1}$$, where usually $$D \gg n$$.
+It is almost impossible to find a decent optimum in this way!
+
+This problem is easy to fix,
+by mixing in a small amount of the raw residuals $$R_m$$
+to "nudge" $$f_{n+1}$$ towards the right subspace,
where $$\beta \in [0,1]$$ is a tunable parameter:
$$\begin{aligned}
\boxed{
- \rho_{n+1}
- = \sum_{m = N}^n \alpha_m (\rho_m + \beta R_m)
+ f_{n+1}
+ = \sum_{m = N}^n \alpha_m (f_m + \beta R_m)
}
\end{aligned}$$
-In other words, we end up introducing a small amount of the raw outputs $$\rho_m^\mathrm{new}$$,
-while still giving more weight to iterations with smaller residuals.
+In this way, the raw outputs $$f_m^\mathrm{new}$$ are (rightfully) included via $$R_m$$,
+but we still give more weight to iterations with smaller residuals.
Pulay mixing is very effective for certain types of problems,
e.g. density functional theory,