--- title: "Pulay mixing" firstLetter: "P" publishDate: 2021-03-02 categories: - Numerical methods date: 2021-03-02T19:11:51+01:00 draft: false markup: pandoc --- # Pulay mixing Some numerical problems are most easily solved *iteratively*, by generating a series $\rho_1$, $\rho_2$, etc. converging towards the desired solution $\rho_*$. **Pulay mixing**, also often called **direct inversion in the iterative subspace** (DIIS), is an effective method to speed up convergence, which also helps to avoid periodic divergences. The key concept it relies on is the **residual vector** $R_n$ of the $n$th iteration, which in some way measures the error of the current $\rho_n$. Its exact definition varies, but is generally along the lines of the difference between the input of the iteration and the raw resulting output: $$\begin{aligned} R_n = R[\rho_n] = \rho_n^\mathrm{new}[\rho_n] - \rho_n \end{aligned}$$ It is not always clear what to do with $\rho_n^\mathrm{new}$. Directly using it as the next input ($\rho_{n+1} = \rho_n^\mathrm{new}$) often leads to oscillation, and linear mixing ($\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$) can take a very long time to converge properly. Pulay mixing offers an improvement. The idea is to construct the next iteration's input $\rho_{n+1}$ as a linear combination of the previous inputs $\rho_1$, $\rho_2$, ..., $\rho_n$, such that it is as close as possible to the optimal $\rho_*$: $$\begin{aligned} \boxed{ \rho_{n+1} = \sum_{m = 1}^n \alpha_m \rho_m } \end{aligned}$$ To do so, we make two assumptions. Firstly, the current $\rho_n$ is already close to $\rho_*$, so that such a linear combination makes sense. Secondly, the iteration is linear, such that the raw output $\rho_{n+1}^\mathrm{new}$ is also a linear combination with the *same coefficients*: $$\begin{aligned} \rho_{n+1}^\mathrm{new} = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} \end{aligned}$$ We will return to these assumptions later. The point is that $R_{n+1}$ is also a linear combination: $$\begin{aligned} R_{n+1} = \rho_{n+1}^\mathrm{new} - \rho_{n+1} = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m \rho_m = \sum_{m = 1}^n \alpha_m R_m \end{aligned}$$ The goal is to choose the coefficients $\alpha_m$ such that the norm of the error $|R_{n+1}| \approx 0$, subject to the following constraint to preserve the normalization of $\rho_{n+1}$: $$\begin{aligned} \sum_{m=1}^n \alpha_m = 1 \end{aligned}$$ We thus want to minimize the following quantity, where $\lambda$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/): $$\begin{aligned} \braket{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m = \sum_{m=1}^n \alpha_m \Big( \sum_{k=1}^n \alpha_k \braket{R_m}{R_k} + \lambda \Big) \end{aligned}$$ By differentiating the right-hand side with respect to $\alpha_m$, we get a system of equations that we can write in matrix form, which is relatively cheap to solve numerically: $$\begin{aligned} \begin{bmatrix} \braket{R_1}{R_1} & \cdots & \braket{R_1}{R_n} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \braket{R_n}{R_1} & \cdots & \braket{R_n}{R_n} & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \end{aligned}$$ This method is very effective. However, in practice, the earlier inputs $\rho_1$, $\rho_2$, etc. 
This method is very effective.
However, in practice, the earlier inputs $\rho_1$, $\rho_2$, etc.
are much further from $\rho_*$ than $\rho_n$,
so usually only the most recent $N\!+\!1$ inputs $\rho_{n-N}$, ..., $\rho_n$ are used:

$$\begin{aligned}
    \rho_{n+1}
    = \sum_{m = n-N}^n \alpha_m \rho_m
\end{aligned}$$

You might be confused by the absence of all $\rho_m^\mathrm{new}$
in the creation of $\rho_{n+1}$, as if the iteration's outputs are being ignored.
This is due to the first assumption,
which states that $\rho_n^\mathrm{new}$ and $\rho_n$ are already similar,
such that they are interchangeable.

Speaking of which, about those assumptions:
while they will clearly become more accurate as $\rho_n$ approaches $\rho_*$,
they might be very dubious in the beginning.
A consequence of this is that the early iterations might get "trapped"
in a suboptimal subspace spanned by $\rho_1$, $\rho_2$, etc.
To put it another way, we would be varying $n$ coefficients $\alpha_m$
to try to optimize a $D$-dimensional $\rho_{n+1}$,
where in general $D \gg n$, at least in the beginning.

There is an easy fix to this problem:
add a small amount of the raw residual $R_m$
to "nudge" $\rho_{n+1}$ towards the right subspace,
where $\beta \in [0,1]$ is a tunable parameter:

$$\begin{aligned}
    \boxed{
        \rho_{n+1}
        = \sum_{m = n-N}^n \alpha_m (\rho_m + \beta R_m)
    }
\end{aligned}$$

In other words, we end up introducing a small amount of the raw outputs $\rho_m^\mathrm{new}$,
while still giving more weight to iterations with smaller residuals.
Pulay mixing is very effective:
it can accelerate convergence by up to one order of magnitude!

## References
1.  P. Pulay,
    [Convergence acceleration of iterative sequences. The case of SCF iteration](https://doi.org/10.1016/0009-2614(80)80396-4),
    1980, Elsevier.
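To tie everything together, below is a minimal sketch of a complete fixed-point loop
using truncated-history Pulay mixing with the $\beta$-nudge described above.
It assumes the iteration is available as a black-box function `step(rho)`
returning $\rho^\mathrm{new}[\rho]$ as a 1D NumPy array;
the names and the default values of `N`, `beta` and `tol`
are illustrative choices, not prescribed by the method:

```python
import numpy as np

def pulay_mix(step, rho0, N=5, beta=0.1, tol=1e-8, max_iter=200):
    """Fixed-point iteration rho -> step(rho), accelerated by Pulay mixing.

    Only the most recent N+1 inputs and residuals are kept, and the new
    input is nudged along the residuals by the small factor beta.
    """
    inputs, residuals = [], []
    rho = rho0
    for _ in range(max_iter):
        rho_new = step(rho)
        R = rho_new - rho                    # residual of this iteration
        if np.linalg.norm(R) < tol:
            return rho_new

        # Store this iteration, discarding anything beyond the window.
        inputs = (inputs + [rho])[-(N + 1):]
        residuals = (residuals + [R])[-(N + 1):]

        # Solve the bordered system for the coefficients (see the earlier sketch).
        n = len(residuals)
        Rmat = np.array(residuals)
        A = np.zeros((n + 1, n + 1))
        A[:n, :n] = Rmat @ Rmat.T
        A[:n, n] = 1.0
        A[n, :n] = 1.0
        b = np.zeros(n + 1)
        b[n] = 1.0
        alpha = np.linalg.solve(A, b)[:n]

        # Next input: mix the stored inputs, each nudged by beta * residual.
        rho = sum(a * (x + beta * r) for a, x, r in zip(alpha, inputs, residuals))

    raise RuntimeError("Pulay mixing did not converge within max_iter")
```

As a sanity check on this sketch: setting `N = 0` keeps only the latest iterate,
in which case the update reduces to plain linear mixing with $f = \beta$.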