---
title: "Pulay mixing"
date: 2021-03-02
categories:
- Numerical methods
layout: "concept"
---

Some numerical problems are most easily solved *iteratively*,
by generating a series $\rho_1$, $\rho_2$, etc.
converging towards the desired solution $\rho_*$.
**Pulay mixing**, also often called
**direct inversion in the iterative subspace** (DIIS),
can speed up the convergence of some such problems,
and can also help to avoid periodic divergences.

The key concept it relies on is the **residual vector** $R_n$
of the $n$th iteration, which in some way measures the error of the current $\rho_n$.
Its exact definition varies,
but it is generally the difference between
the iteration's input $\rho_n$ and its raw output $\rho_n^\mathrm{new}$:

$$\begin{aligned}
    R_n
    = R[\rho_n]
    = \rho_n^\mathrm{new}[\rho_n] - \rho_n
\end{aligned}$$

It is not always clear what to do with $\rho_n^\mathrm{new}$.
Directly using it as the next input ($\rho_{n+1} = \rho_n^\mathrm{new}$)
often leads to oscillation,
and linear mixing ($\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$)
can take a very long time to converge properly.
Pulay mixing offers an improvement.
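For reference, here is a minimal Python sketch of those two naive schemes,
for a generic fixed-point map `step` (the analogue of $\rho \mapsto \rho^\mathrm{new}[\rho]$).
The function name, the mixing fraction `f`, and the tolerance
are illustrative choices, not part of any particular library:

```python
import numpy as np

def linear_mixing(step, rho0, f=0.3, tol=1e-8, max_iter=500):
    """Solve rho = step(rho) by linear mixing:
    rho_{n+1} = (1 - f) * rho_n + f * step(rho_n).
    Setting f = 1 recovers the plain iteration rho_{n+1} = step(rho_n),
    which often oscillates instead of converging."""
    rho = rho0
    for _ in range(max_iter):
        rho_new = step(rho)    # raw output rho_n^new
        R = rho_new - rho      # residual R_n = rho_n^new - rho_n
        if np.linalg.norm(R) < tol:
            return rho_new
        rho = (1 - f) * rho + f * rho_new  # the linear mixing update
    return rho
```

Pulay mixing replaces that last update
with a weighted combination of *all* recent inputs, constructed as follows.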
The idea is to construct the next iteration's input $\rho_{n+1}$
as a linear combination of the previous inputs $\rho_1$, $\rho_2$, ..., $\rho_n$,
such that it is as close as possible to the optimal $\rho_*$:

$$\begin{aligned}
    \boxed{
        \rho_{n+1}
        = \sum_{m = 1}^n \alpha_m \rho_m
    }
\end{aligned}$$

To do so, we make two assumptions.
Firstly, that the current $\rho_n$ is already close to $\rho_*$,
so that such a linear combination makes sense.
Secondly, that the iteration is linear,
such that the raw output $\rho_{n+1}^\mathrm{new}$
is also a linear combination with the *same coefficients*:

$$\begin{aligned}
    \rho_{n+1}^\mathrm{new}
    = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new}
\end{aligned}$$

We will return to these assumptions later.
The point is that the residual $R_{n+1}$ is then also a linear combination:

$$\begin{aligned}
    R_{n+1}
    = \rho_{n+1}^\mathrm{new} - \rho_{n+1}
    = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m \rho_m
    = \sum_{m = 1}^n \alpha_m R_m
\end{aligned}$$

The goal is to choose the coefficients $\alpha_m$ such that
the norm of the error $|R_{n+1}| \approx 0$,
subject to the following constraint, which preserves the normalization of $\rho_{n+1}$:

$$\begin{aligned}
    \sum_{m=1}^n \alpha_m = 1
\end{aligned}$$

We thus want to minimize the following quantity,
where $\lambda$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):

$$\begin{aligned}
    \Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
    = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big)
\end{aligned}$$

By differentiating the right-hand side with respect to $\alpha_m^*$
and demanding that the result is zero,
we get a system of equations that we can write in matrix form,
which is cheap to solve:

$$\begin{aligned}
    \begin{bmatrix}
        \Inprod{R_1}{R_1} & \cdots & \Inprod{R_1}{R_n} & 1 \\
        \vdots & \ddots & \vdots & \vdots \\
        \Inprod{R_n}{R_1} & \cdots & \Inprod{R_n}{R_n} & 1 \\
        1 & \cdots & 1 & 0
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
        \alpha_1 \\ \vdots \\ \alpha_n \\ \lambda
    \end{bmatrix}
    =
    \begin{bmatrix}
        0 \\ \vdots \\ 0 \\ 1
    \end{bmatrix}
\end{aligned}$$

From this, we can also see that the Lagrange multiplier
$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$,
where $R_{n+1}$ is the *predicted* residual of the next iteration,
subject to the two assumptions.

However, in practice, the earlier inputs $\rho_1$, $\rho_2$, etc.
are much further from $\rho_*$ than $\rho_n$,
so usually only the most recent $N\!+\!1$ inputs $\rho_{n-N}$, ..., $\rho_n$ are used:

$$\begin{aligned}
    \rho_{n+1}
    = \sum_{m = n-N}^n \alpha_m \rho_m
\end{aligned}$$

You might be confused by the absence of any $\rho_m^\mathrm{new}$
in the construction of $\rho_{n+1}$, as if the iteration's outputs are being ignored.
This is due to the first assumption,
which states that $\rho_n^\mathrm{new}$ and $\rho_n$ are already similar,
such that they are basically interchangeable.

Speaking of those assumptions:
while they clearly become more accurate as $\rho_n$ approaches $\rho_*$,
they might be very dubious in the beginning.
A consequence of this is that the early iterations might get "trapped"
in a suboptimal subspace spanned by $\rho_1$, $\rho_2$, etc.
To put it another way, we would be varying $n$ coefficients $\alpha_m$
to try to optimize a $D$-dimensional $\rho_{n+1}$,
where in general $D \gg n$, at least in the beginning.

There is an easy fix to this problem:
add a small amount of the raw residual $R_m$
to "nudge" $\rho_{n+1}$ towards the right subspace,
where $\beta \in [0,1]$ is a tunable parameter:

$$\begin{aligned}
    \boxed{
        \rho_{n+1}
        = \sum_{m = n-N}^n \alpha_m (\rho_m + \beta R_m)
    }
\end{aligned}$$

In other words, we end up introducing a small amount of the raw outputs $\rho_m^\mathrm{new}$,
while still giving more weight to iterations with smaller residuals.

Pulay mixing is very effective for certain types of problems,
e.g. density functional theory,
where it can accelerate convergence by up to two orders of magnitude!
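As a minimal sketch of the above recipe, the Python functions below
perform one Pulay/DIIS update from a stored history of the
last $N\!+\!1$ inputs and residuals.
Real-valued NumPy arrays are assumed, and the names
`pulay_coefficients` and `pulay_step` and the default `beta`
are illustrative, not a canonical API:

```python
import numpy as np

def pulay_coefficients(residuals):
    """Solve the bordered linear system above for the coefficients
    alpha_m; the Lagrange multiplier lambda is discarded."""
    n = len(residuals)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = np.array([[np.vdot(Rm, Rk) for Rk in residuals]
                          for Rm in residuals])   # <R_m|R_k> Gram matrix
    A[:n, n] = A[n, :n] = 1.0                     # border of ones
    b = np.zeros(n + 1)
    b[n] = 1.0                                    # enforces sum(alpha) = 1
    return np.linalg.solve(A, b)[:n]

def pulay_step(rhos, residuals, beta=0.1):
    """One Pulay-mixing update from the histories
    rhos = [rho_{n-N}, ..., rho_n] and residuals = [R_{n-N}, ..., R_n]:
    rho_{n+1} = sum_m alpha_m * (rho_m + beta * R_m)."""
    alpha = pulay_coefficients(residuals)
    return sum(a * (rho + beta * R)
               for a, rho, R in zip(alpha, rhos, residuals))
```

A driver loop would evaluate the iteration map, append each $\rho_n$ and $R_n$
to the histories (discarding entries older than $N\!+\!1$),
and feed the returned $\rho_{n+1}$ back in.
One practical caveat: as the residuals shrink, the matrix of inner products
becomes nearly singular, so implementations typically keep the history short
or drop the oldest entries to keep the solve well-conditioned.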


## References
1.  P. Pulay,
    [Convergence acceleration of iterative sequences. The case of SCF iteration](https://doi.org/10.1016/0009-2614(80)80396-4),
    1980, Elsevier.