Pulay mixing

Some numerical problems are most easily solved iteratively, by generating a series of “solutions” $f_1$, $f_2$, etc. converging towards the true $f_\infty$. Pulay mixing, also often called direct inversion in the iterative subspace (DIIS), can speed up the convergence for some types of problems, and also helps to avoid periodic divergences.

The key concept it relies on is the residual vector $R_n$ of the $n$th iteration, which measures the error of the current $f_n$. Its exact definition can vary, but it is generally the difference between the input $f_n$ of the $n$th iteration and the raw resulting output $f_n^\mathrm{new}$:

$$\begin{aligned} R_n \equiv R[f_n] \equiv f_n^\mathrm{new}[f_n] - f_n \end{aligned}$$

It is not always clear what to do with $f_n^\mathrm{new}$. Directly using it as the next input ($f_{n+1} = f_n^\mathrm{new}$) often leads to oscillation, and linear mixing ($f_{n+1} = (1\!-\!c) f_n + c f_n^\mathrm{new}$) can take a very long time to converge properly. Pulay mixing offers an improvement.
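To make this concrete, here is a small sketch in Python; the toy map `g` and the mixing parameter `c = 0.3` are illustrative choices, not from the text. For this map, directly reusing the raw output diverges, while linear mixing converges, albeit slowly:

```python
import numpy as np

# Hypothetical toy fixed-point problem: f_new = g(f), with exact solution f_inf.
# The rotation-like matrix A has eigenvalues of modulus 1.5, so the plain
# iteration f_{n+1} = g(f_n) diverges; damped (linear) mixing tames it.
def g(f):
    A = np.array([[0.0, -1.5],
                  [1.5,  0.0]])
    f_inf = np.array([1.0, 2.0])
    return f_inf + A @ (f - f_inf)

f = np.zeros(2)
c = 0.3  # linear mixing parameter
for _ in range(200):
    f = (1 - c) * f + c * g(f)  # f_{n+1} = (1-c) f_n + c f_n^new

print(f)  # slowly approaches [1.0, 2.0]
```

With `c = 0.3` the error shrinks by only a factor of about 0.83 per step, which is exactly the kind of slow convergence Pulay mixing is meant to improve on.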

The idea is to construct the next iteration’s input $f_{n+1}$ as a linear combination of the previous inputs $f_1$, $f_2$, …, $f_n$, such that it is as close as possible to the optimal $f_\infty$:

$$\begin{aligned} \boxed{ f_{n+1} = \sum_{m = 1}^n \alpha_m f_m } \end{aligned}$$

To do so, we make two assumptions. First, that the current $f_n$ is already close to $f_\infty$, so such a linear combination makes sense. Second, that the iteration is linear, so the raw output $f_{n+1}^\mathrm{new}$ is also a linear combination with the same coefficients:

$$\begin{aligned} f_{n+1}^\mathrm{new} = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new} \end{aligned}$$

We will revisit these assumptions later. The point is that $R_{n+1}$ can now also be written as a linear combination of old residuals $R_m$:

$$\begin{aligned} R_{n+1} = f_{n+1}^\mathrm{new} - f_{n+1} = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m f_m = \sum_{m = 1}^n \alpha_m R_m \end{aligned}$$

The goal is to choose the coefficients $\alpha_m$ such that the norm of the error $|R_{n+1}| \approx 0$, subject to the following constraint to preserve the normalization of $f_{n+1}$:

$$\begin{aligned} \sum_{m=1}^n \alpha_m = 1 \end{aligned}$$

We thus want to minimize the following quantity, where $\lambda$ is a Lagrange multiplier:

$$\begin{aligned} \inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^* = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \inprod{R_m}{R_k} + \lambda \Big) \end{aligned}$$

By differentiating the right-hand side with respect to $\alpha_m^*$ and demanding that the result is zero, we get a cheap-to-solve system of equations, in matrix form:

$$\begin{aligned} \begin{bmatrix} \inprod{R_1}{R_1} & \cdots & \inprod{R_1}{R_n} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \inprod{R_n}{R_1} & \cdots & \inprod{R_n}{R_n} & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \end{aligned}$$

From this, we can also see that the Lagrange multiplier $\lambda = - \inprod{R_{n+1}}{R_{n+1}}$, where $R_{n+1}$ is the predicted residual of the next iteration, subject to the two assumptions. This fact makes $\lambda$ a useful measure of convergence.
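As a sketch, this bordered system can be assembled and solved with NumPy. The function name `diis_coefficients` is a hypothetical choice for illustration; it returns both the coefficients $\alpha_m$ and the multiplier $\lambda$:

```python
import numpy as np

def diis_coefficients(residuals):
    """Solve the bordered linear system for the mixing coefficients.

    residuals: list of residual vectors R_1 ... R_n (1D numpy arrays).
    Returns (alpha, lam): coefficients that sum to 1, and the Lagrange
    multiplier lam, whose magnitude predicts the next squared residual norm.
    """
    n = len(residuals)
    B = np.empty((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            B[i, j] = np.dot(residuals[i], residuals[j])  # <R_i|R_j>
    B[:n, n] = 1.0  # border enforcing the constraint sum(alpha) = 1
    B[n, :n] = 1.0
    B[n, n] = 0.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    sol = np.linalg.solve(B, rhs)
    return sol[:n], sol[n]
```

For instance, with two orthonormal residuals the optimum is the even split $\alpha = (1/2, 1/2)$, giving $\lambda = -1/2$, i.e. a predicted squared residual norm of $1/2$.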

In practice, the earlier inputs $f_1$, $f_2$, etc. are much further from $f_\infty$ than $f_n$, so usually only the most recent $N\!+\!1$ inputs $f_{n - N}, ..., f_n$ are used. This also keeps the matrix small:

$$\begin{aligned} f_{n+1} = \sum_{m = n-N}^n \alpha_m f_m \end{aligned}$$

You might be confused by the absence of any $f_m^\mathrm{new}$ in the construction of $f_{n+1}$, as if the iteration’s outputs are being ignored. This is due to the first assumption, which states that $f_n^\mathrm{new}$ and $f_n$ are already similar, such that they are basically interchangeable.

Although those assumptions will clearly become more realistic as $f_n \to f_\infty$, they might be very dubious at first. Consequently, the early iterations may get “trapped” in a suboptimal subspace spanned by $f_1$, $f_2$, etc. Think of it like this: we would be varying up to $n$ coefficients $\alpha_m$ to try to optimize a $D$-dimensional $f_{n+1}$, where usually $D \gg n$. It is almost impossible to find a decent optimum in this way!

This problem is easy to fix, by mixing in a small amount of the raw residuals $R_m$ to “nudge” $f_{n+1}$ towards the right subspace, where $\beta \in [0,1]$ is a tunable parameter:

$$\begin{aligned} \boxed{ f_{n+1} = \sum_{m = n-N}^n \alpha_m (f_m + \beta R_m) } \end{aligned}$$

In this way, the raw outputs $f_m^\mathrm{new}$ are (rightfully) included via $R_m$, but we still give more weight to iterations with smaller residuals.
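Putting everything together, a minimal Pulay/DIIS loop that keeps the last $N\!+\!1$ iterations and applies this residual-mixed update might look as follows. The function name `pulay_mix` and the defaults `N = 5`, `beta = 0.5` are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

def pulay_mix(g, f0, N=5, beta=0.5, tol=1e-10, max_iter=100):
    """Sketch of Pulay/DIIS mixing for the fixed point of f_new = g(f).

    Keeps the last N+1 inputs and residuals, and forms each new input as
    f_{n+1} = sum_m alpha_m * (f_m + beta * R_m).
    """
    history_f, history_R = [], []
    f = f0
    for _ in range(max_iter):
        R = g(f) - f                       # residual R_n = f_n^new - f_n
        history_f.append(f)
        history_R.append(R)
        history_f = history_f[-(N + 1):]   # keep only the recent window
        history_R = history_R[-(N + 1):]
        if np.linalg.norm(R) < tol:
            return f
        n = len(history_R)
        B = np.ones((n + 1, n + 1))        # bordered residual-overlap matrix
        B[n, n] = 0.0
        for i in range(n):
            for j in range(n):
                B[i, j] = np.dot(history_R[i], history_R[j])
        rhs = np.zeros(n + 1)
        rhs[n] = 1.0
        alpha = np.linalg.solve(B, rhs)[:n]
        f = sum(a * (fm + beta * Rm)       # the residual-mixed update
                for a, fm, Rm in zip(alpha, history_f, history_R))
    return f
```

For a linear toy map like the one sketched earlier, this loop recovers the fixed point in a handful of iterations, since the residual subspace is exhausted quickly.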

Pulay mixing is very effective for certain types of problems, e.g. density functional theory, where it can accelerate convergence by up to two orders of magnitude!


  1. P. Pulay, Convergence acceleration of iterative sequences. The case of SCF iteration, 1980, Elsevier.