author     Prefetch    2022-10-14 23:25:28 +0200
committer  Prefetch    2022-10-14 23:25:28 +0200
commit     6ce0bb9a8f9fd7d169cbb414a9537d68c5290aae (patch)
tree       a0abb6b22f77c0e84ed38277d14662412ce14f39 /source/know/concept/pulay-mixing
Initial commit after migration from Hugo
Diffstat (limited to 'source/know/concept/pulay-mixing')
-rw-r--r--  source/know/concept/pulay-mixing/index.md  160
1 files changed, 160 insertions, 0 deletions
diff --git a/source/know/concept/pulay-mixing/index.md b/source/know/concept/pulay-mixing/index.md
new file mode 100644
index 0000000..030ad46
--- /dev/null
+++ b/source/know/concept/pulay-mixing/index.md
@@ -0,0 +1,160 @@
+---
+title: "Pulay mixing"
+date: 2021-03-02
+categories:
+- Numerical methods
+layout: "concept"
+---
+
+Some numerical problems are most easily solved *iteratively*,
+by generating a series $\rho_1$, $\rho_2$, etc.
+converging towards the desired solution $\rho_*$.
+**Pulay mixing**, also often called
+**direct inversion in the iterative subspace** (DIIS),
+can speed up the convergence for some types of problems,
+and also helps to avoid periodic divergences.
+
+The key concept it relies on is the **residual vector** $R_n$
+of the $n$th iteration, which in some way measures the error of the current $\rho_n$.
+Its exact definition varies,
+but is generally along the lines of the difference between
+the input of the iteration and the raw resulting output:
+
+$$\begin{aligned}
+ R_n
+ = R[\rho_n]
+ = \rho_n^\mathrm{new}[\rho_n] - \rho_n
+\end{aligned}$$
+
+It is not always clear what to do with $\rho_n^\mathrm{new}$.
+Directly using it as the next input ($\rho_{n+1} = \rho_n^\mathrm{new}$)
+often leads to oscillation,
+and linear mixing ($\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$)
+can take a very long time to converge properly.
+Pulay mixing offers an improvement.
+
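+As a point of reference, the following is a minimal Python sketch
+of plain fixed-point iteration with linear mixing.
+The map `rho_new` is a made-up linear toy problem (not from any particular application);
+it only serves to give $R_n$ and $f$ concrete forms:
+
+```python
+import numpy as np
+
+def rho_new(rho):
+    # Hypothetical toy fixed-point map with a unique solution rho_*;
+    # in practice this would be e.g. one self-consistency step.
+    A = np.array([[0.7, 0.2], [0.2, 0.5]])
+    b = np.array([1.0, -0.5])
+    return A @ rho + b
+
+rho = np.zeros(2)
+f = 0.3  # linear-mixing fraction
+for n in range(2000):
+    R = rho_new(rho) - rho       # residual R_n = rho_new[rho_n] - rho_n
+    if np.linalg.norm(R) < 1e-10:
+        break
+    # naive update:   rho = rho_new(rho)
+    # linear mixing:  take only a fraction f of that step
+    rho = rho + f * R            # = (1 - f)*rho + f*rho_new(rho)
+
+print(f"done after {n} iterations: rho = {rho}")
+```
+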
+The idea is to construct the next iteration's input $\rho_{n+1}$
+as a linear combination of the previous inputs $\rho_1$, $\rho_2$, ..., $\rho_n$,
+such that it is as close as possible to the optimal $\rho_*$:
+
+$$\begin{aligned}
+ \boxed{
+ \rho_{n+1}
+ = \sum_{m = 1}^n \alpha_m \rho_m
+ }
+\end{aligned}$$
+
+To do so, we make two assumptions.
+Firstly, the current $\rho_n$ is already close to $\rho_*$,
+so that such a linear combination makes sense.
+Secondly, the iteration is linear,
+such that the raw output $\rho_{n+1}^\mathrm{new}$
+is also a linear combination with the *same coefficients*:
+
+$$\begin{aligned}
+ \rho_{n+1}^\mathrm{new}
+ = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new}
+\end{aligned}$$
+
+We will return to these assumptions later.
+The point is that $R_{n+1}$ is also a linear combination:
+
+$$\begin{aligned}
+ R_{n+1}
+ = \rho_{n+1}^\mathrm{new} - \rho_{n+1}
+ = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m \rho_m
+ = \sum_{m = 1}^n \alpha_m R_m
+\end{aligned}$$
+
+The goal is to choose the coefficients $\alpha_m$ such that
+the norm of the error $|R_{n+1}| \approx 0$,
+subject to the following constraint to preserve the normalization of $\rho_{n+1}$:
+
+$$\begin{aligned}
+ \sum_{m=1}^n \alpha_m = 1
+\end{aligned}$$
+
+We thus want to minimize the following quantity,
+where $\lambda$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
+
+$$\begin{aligned}
+ \Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
+ = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big)
+\end{aligned}$$
+
+By differentiating the right-hand side with respect to $\alpha_m^*$
+and demanding that the result is zero,
+we get $\sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda = 0$ for every $m$.
+Together with the normalization constraint,
+these equations form a small matrix system, which is cheap to solve:
+
+$$\begin{aligned}
+ \begin{bmatrix}
+ \Inprod{R_1}{R_1} & \cdots & \Inprod{R_1}{R_n} & 1 \\
+ \vdots & \ddots & \vdots & \vdots \\
+ \Inprod{R_n}{R_1} & \cdots & \Inprod{R_n}{R_n} & 1 \\
+ 1 & \cdots & 1 & 0
+ \end{bmatrix}
+ \cdot
+ \begin{bmatrix}
+ \alpha_1 \\ \vdots \\ \alpha_n \\ \lambda
+ \end{bmatrix}
+ =
+ \begin{bmatrix}
+ 0 \\ \vdots \\ 0 \\ 1
+ \end{bmatrix}
+\end{aligned}$$
+
+From this, we can also see that the Lagrange multiplier
+$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$,
+where $R_{n+1}$ is the *predicted* residual of the next iteration,
+subject to the two assumptions.
+
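+As a quick illustration, here is a minimal NumPy sketch of setting up
+and solving this bordered system, assuming the residuals are stored as
+real-valued flat arrays; the helper name `pulay_coefficients` is just
+an illustrative choice:
+
+```python
+import numpy as np
+
+def pulay_coefficients(residuals):
+    """Solve the bordered system above for the coefficients alpha_m,
+    given a list of residual vectors R_1, ..., R_n."""
+    n = len(residuals)
+    B = np.zeros((n + 1, n + 1))
+    for i in range(n):
+        for j in range(n):
+            B[i, j] = np.dot(residuals[i], residuals[j])  # <R_i|R_j>
+    B[:n, n] = 1.0  # last column of ones
+    B[n, :n] = 1.0  # last row of ones (normalization constraint)
+    rhs = np.zeros(n + 1)
+    rhs[n] = 1.0
+    # in practice one should guard against a (nearly) singular B,
+    # e.g. by discarding the oldest residuals
+    sol = np.linalg.solve(B, rhs)
+    alpha, lam = sol[:n], sol[n]
+    # -lam is the predicted <R_{n+1}|R_{n+1}> of the next iteration
+    return alpha, lam
+```
+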
+However, in practice, the earlier inputs $\rho_1$, $\rho_2$, etc.
+are much further from $\rho_*$ than $\rho_n$,
+so usually only the most recent $N\!+\!1$ inputs $\rho_{n - N}$, ..., $\rho_n$ are used:
+
+$$\begin{aligned}
+ \rho_{n+1}
+ = \sum_{m = n-N}^n \alpha_m \rho_m
+\end{aligned}$$
+
+You might be confused by the absence of any $\rho_m^\mathrm{new}$
+in the creation of $\rho_{n+1}$, as if the iteration's outputs are being ignored.
+This is due to the first assumption,
+which states that $\rho_n^\mathrm{new}$ and $\rho_n$ are already similar,
+such that they are basically interchangeable.
+
+Speaking of which, about those assumptions:
+while they will clearly become more accurate as $\rho_n$ approaches $\rho_*$,
+they might be very dubious in the beginning.
+A consequence of this is that the early iterations might get "trapped"
+in a suboptimal subspace spanned by $\rho_1$, $\rho_2$, etc.
+Put differently, we would be varying only $n$ coefficients $\alpha_m$
+to try to optimize a $D$-dimensional $\rho_{n+1}$,
+where in general $D \gg n$, at least in the beginning.
+
+There is an easy fix to this problem:
+add a small amount of the raw residual $R_m$
+to "nudge" $\rho_{n+1}$ towards the right subspace,
+where $\beta \in [0,1]$ is a tunable parameter:
+
+$$\begin{aligned}
+ \boxed{
+ \rho_{n+1}
+    = \sum_{m = n-N}^n \alpha_m (\rho_m + \beta R_m)
+ }
+\end{aligned}$$
+
+In other words, we end up introducing a small amount of the raw outputs $\rho_m^\mathrm{new}$,
+while still giving more weight to iterations with smaller residuals.
+
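+Putting everything together, a full Pulay mixing loop might look like
+the following sketch, again in NumPy with real-valued arrays;
+the function name `pulay_mix` and its defaults are illustrative,
+and `rho_new` is whatever map is being iterated:
+
+```python
+import numpy as np
+
+def pulay_mix(rho_new, rho0, N=5, beta=0.2, tol=1e-10, max_iter=500):
+    """Sketch of Pulay/DIIS mixing: keep the last N+1 inputs and residuals,
+    solve the bordered system for alpha, and build the next input
+    from rho_m + beta * R_m."""
+    rhos, Rs = [], []
+    rho = np.asarray(rho0, dtype=float)
+    for _ in range(max_iter):
+        R = rho_new(rho) - rho
+        if np.linalg.norm(R) < tol:
+            break
+        rhos.append(rho)
+        Rs.append(R)
+        rhos, Rs = rhos[-(N + 1):], Rs[-(N + 1):]  # truncate the history
+        m = len(Rs)
+        B = np.zeros((m + 1, m + 1))               # bordered system from above
+        B[:m, :m] = [[np.dot(Ri, Rj) for Rj in Rs] for Ri in Rs]
+        B[:m, m] = B[m, :m] = 1.0
+        rhs = np.zeros(m + 1)
+        rhs[m] = 1.0
+        alpha = np.linalg.solve(B, rhs)[:m]
+        rho = sum(a * (r + beta * Rr) for a, r, Rr in zip(alpha, rhos, Rs))
+    return rho
+```
+
+With the toy map from the first sketch, `pulay_mix(rho_new, np.zeros(2))`
+should reach the same tolerance in a handful of iterations,
+rather than the several hundred needed by plain linear mixing.
+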
+Pulay mixing is very effective for certain types of problems,
+e.g. density functional theory,
+where it can accelerate convergence by up to two orders of magnitude!
+
+
+
+## References
+1. P. Pulay,
+ [Convergence acceleration of iterative sequences. The case of SCF iteration](https://doi.org/10.1016/0009-2614(80)80396-4),
+ 1980, Elsevier.