From 75636ed8772512bdf38e3dec431888837eaddc5d Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Mon, 20 Feb 2023 18:08:31 +0100
Subject: Improve knowledge base

---
 source/know/concept/pulay-mixing/index.md | 121 +++++++++++++++----------
 1 file changed, 62 insertions(+), 59 deletions(-)

(limited to 'source/know/concept/pulay-mixing/index.md')

diff --git a/source/know/concept/pulay-mixing/index.md b/source/know/concept/pulay-mixing/index.md
index 6e809dd..81051f1 100644
--- a/source/know/concept/pulay-mixing/index.md
+++ b/source/know/concept/pulay-mixing/index.md
@@ -8,68 +8,70 @@ layout: "concept"
 ---
 
 Some numerical problems are most easily solved *iteratively*,
-by generating a series $$\rho_1$$, $$\rho_2$$, etc.
-converging towards the desired solution $$\rho_*$$.
+by generating a series of "solutions" $$f_1$$, $$f_2$$, etc.
+converging towards the true $$f_\infty$$.
 **Pulay mixing**, also often called
 **direct inversion in the iterative subspace** (DIIS),
 can speed up the convergence for some types of problems,
 and also helps to avoid periodic divergences.
 
 The key concept it relies on is the **residual vector** $$R_n$$
-of the $$n$$th iteration, which in some way measures the error of the current $$\rho_n$$.
-Its exact definition varies,
-but is generally along the lines of the difference between
-the input of the iteration and the raw resulting output:
+of the $$n$$th iteration, which measures the error of the current $$f_n$$.
+Its exact definition can vary,
+but it is generally the difference between
+the input $$f_n$$ of the $$n$$th iteration
+and the raw resulting output $$f_n^\mathrm{new}$$:
 
 $$\begin{aligned}
     R_n
-    = R[\rho_n]
-    = \rho_n^\mathrm{new}[\rho_n] - \rho_n
+    \equiv R[f_n]
+    \equiv f_n^\mathrm{new}[f_n] - f_n
 \end{aligned}$$
 
-It is not always clear what to do with $$\rho_n^\mathrm{new}$$.
-Directly using it as the next input ($$\rho_{n+1} = \rho_n^\mathrm{new}$$)
+It is not always clear what to do with $$f_n^\mathrm{new}$$.
+Directly using it as the next input ($$f_{n+1} = f_n^\mathrm{new}$$)
 often leads to oscillation,
-and linear mixing ($$\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$$)
+and linear mixing ($$f_{n+1} = (1\!-\!c) f_n + c f_n^\mathrm{new}$$)
 can take a very long time to converge properly.
 Pulay mixing offers an improvement.
 
-The idea is to construct the next iteration's input $$\rho_{n+1}$$
-as a linear combination of the previous inputs $$\rho_1$$, $$\rho_2$$, ..., $$\rho_n$$,
-such that it is as close as possible to the optimal $$\rho_*$$:
+The idea is to construct the next iteration's input $$f_{n+1}$$
+as a linear combination of the previous inputs $$f_1$$, $$f_2$$, ..., $$f_n$$,
+such that it is as close as possible to the optimal $$f_\infty$$:
 
 $$\begin{aligned}
     \boxed{
-        \rho_{n+1}
-        = \sum_{m = 1}^n \alpha_m \rho_m
+        f_{n+1}
+        = \sum_{m = 1}^n \alpha_m f_m
     }
 \end{aligned}$$
 
 To do so, we make two assumptions.
-Firstly, the current $$\rho_n$$ is already close to $$\rho_*$$,
-so that such a linear combination makes sense.
-Secondly, the iteration is linear,
-such that the raw output $$\rho_{n+1}^\mathrm{new}$$
-is also a linear combination with the *same coefficients*:
+First, that the current $$f_n$$ is already close to $$f_\infty$$,
+so such a linear combination makes sense.
+Second, that the iteration is linear,
+so the raw output $$f_{n+1}^\mathrm{new}$$
+is also a linear combination *with the same coefficients*:
 
 $$\begin{aligned}
-    \rho_{n+1}^\mathrm{new}
-    = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new}
+    f_{n+1}^\mathrm{new}
+    = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new}
 \end{aligned}$$
 
-We will return to these assumptions later.
-The point is that $$R_{n+1}$$ is also a linear combination:
+We will revisit these assumptions later.
+The point is that $$R_{n+1}$$ can now also be written
+as a linear combination of old residuals $$R_m$$:
 
 $$\begin{aligned}
     R_{n+1}
-    = \rho_{n+1}^\mathrm{new} - \rho_{n+1}
-    = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m \rho_m
+    = f_{n+1}^\mathrm{new} - f_{n+1}
+    = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m f_m
     = \sum_{m = 1}^n \alpha_m R_m
 \end{aligned}$$
 
 The goal is to choose the coefficients $$\alpha_m$$
 such that the norm of the error $$|R_{n+1}| \approx 0$$,
-subject to the following constraint to preserve the normalization of $$\rho_{n+1}$$:
+subject to the following constraint to preserve the normalization of $$f_{n+1}$$:
 
 $$\begin{aligned}
     \sum_{m=1}^n \alpha_m = 1
@@ -79,20 +81,19 @@ We thus want to minimize the following quantity,
 where $$\lambda$$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
 
 $$\begin{aligned}
-    \Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
-    = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big)
+    \inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
+    = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \inprod{R_m}{R_k} + \lambda \Big)
 \end{aligned}$$
 
 By differentiating the right-hand side with respect to $$\alpha_m^*$$
 and demanding that the result is zero,
-we get a system of equations that we can write in matrix form,
-which is cheap to solve:
+we get a cheap-to-solve system of equations, in matrix form:
 
 $$\begin{aligned}
     \begin{bmatrix}
-        \Inprod{R_1}{R_1} & \cdots & \Inprod{R_1}{R_n} & 1 \\
+        \inprod{R_1}{R_1} & \cdots & \inprod{R_1}{R_n} & 1 \\
         \vdots & \ddots & \vdots & \vdots \\
-        \Inprod{R_n}{R_1} & \cdots & \Inprod{R_n}{R_n} & 1 \\
+        \inprod{R_n}{R_1} & \cdots & \inprod{R_n}{R_n} & 1 \\
         1 & \cdots & 1 & 0
     \end{bmatrix}
     \cdot
@@ -106,48 +107,50 @@ $$\begin{aligned}
 \end{aligned}$$
 
 From this, we can also see that the Lagrange multiplier
-$$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$$,
+$$\lambda = - \inprod{R_{n+1}}{R_{n+1}}$$,
 where $$R_{n+1}$$ is the *predicted* residual of the next iteration,
 subject to the two assumptions.
+This fact makes $$\lambda$$ a useful measure of convergence.
 
-However, in practice, the earlier inputs $$\rho_1$$, $$\rho_2$$, etc.
-are much further from $$\rho_*$$ than $$\rho_n$$,
-so usually only the most recent $$N\!+\!1$$ inputs $$\rho_{n - N}$$, ..., $$\rho_n$$ are used:
+In practice, the earlier inputs $$f_1$$, $$f_2$$, etc.
+are much further from $$f_\infty$$ than $$f_n$$,
+so usually only the most recent $$N\!+\!1$$ inputs $$f_{n - N}, ..., f_n$$ are used.
+This also keeps the matrix small:
 
 $$\begin{aligned}
-    \rho_{n+1}
-    = \sum_{m = n-N}^n \alpha_m \rho_m
+    f_{n+1}
+    = \sum_{m = n-N}^n \alpha_m f_m
 \end{aligned}$$
 
-You might be confused by the absence of any $$\rho_m^\mathrm{new}$$
-in the creation of $$\rho_{n+1}$$, as if the iteration's outputs are being ignored.
+You might be confused by the absence of any $$f_m^\mathrm{new}$$
+in the creation of $$f_{n+1}$$, as if the iteration's outputs are being ignored.
 This is due to the first assumption,
-which states that $$\rho_n^\mathrm{new}$$ and $$\rho_n$$ are already similar,
+which states that $$f_n^\mathrm{new}$$ and $$f_n$$ are already similar,
 such that they are basically interchangeable.
 
-Speaking of which, about those assumptions:
-while they will clearly become more accurate as $$\rho_n$$ approaches $$\rho_*$$,
-they might be very dubious in the beginning.
-A consequence of this is that the early iterations might get "trapped"
-in a suboptimal subspace spanned by $$\rho_1$$, $$\rho_2$$, etc.
-To say it another way, we would be varying $$n$$ coefficients $$\alpha_m$$
-to try to optimize a $$D$$-dimensional $$\rho_{n+1}$$,
-where in general $$D \gg n$$, at least in the beginning.
-
-There is an easy fix to this problem:
-add a small amount of the raw residual $$R_m$$
-to "nudge" $$\rho_{n+1}$$ towards the right subspace,
+Although those assumptions will clearly become more realistic as $$f_n \to f_\infty$$,
+they might be very dubious at first.
+Consequently, the early iterations may get "trapped"
+in a suboptimal subspace spanned by $$f_1$$, $$f_2$$, etc.
+Think of it like this:
+we would be varying up to $$n$$ coefficients $$\alpha_m$$
+to try to optimize a $$D$$-dimensional $$f_{n+1}$$, where usually $$D \gg n$$.
+It is almost impossible to find a decent optimum in this way!
+
+This problem is easy to fix,
+by mixing in a small amount of the raw residuals $$R_m$$
+to "nudge" $$f_{n+1}$$ towards the right subspace,
 where $$\beta \in [0,1]$$ is a tunable parameter:
 
 $$\begin{aligned}
     \boxed{
-        \rho_{n+1}
-        = \sum_{m = N}^n \alpha_m (\rho_m + \beta R_m)
+        f_{n+1}
+        = \sum_{m = n-N}^n \alpha_m (f_m + \beta R_m)
     }
 \end{aligned}$$
 
-In other words, we end up introducing a small amount of the raw outputs $$\rho_m^\mathrm{new}$$,
-while still giving more weight to iterations with smaller residuals.
+In this way, the raw outputs $$f_m^\mathrm{new}$$ are (rightfully) included via $$R_m$$,
+but we still give more weight to iterations with smaller residuals.
 
 Pulay mixing is very effective
 for certain types of problems,
 e.g. density functional theory,
-- 
cgit v1.2.3
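
As a concrete illustration of the update rule above, here is a minimal sketch in Python/NumPy. It is not part of the patch and not the API of any particular library: the names `pulay_coefficients`, `pulay_mix`, and `fixed_point_solve`, the default $$\beta = 0.1$$, and the history length $$N\!+\!1 = 5$$ are arbitrary choices, and $$f$$ is assumed to be a real-valued 1D array. The code builds the bordered matrix of residual inner products, solves it for the coefficients $$\alpha_m$$ and the multiplier $$\lambda$$, and forms $$f_{n+1} = \sum_m \alpha_m (f_m + \beta R_m)$$ from the most recent iterations:

```python
# Minimal sketch of Pulay mixing / DIIS as described above.
# Function names and default values are illustrative choices only;
# real-valued arrays are assumed for simplicity.
import numpy as np


def pulay_coefficients(residuals):
    """Solve the bordered linear system for the mixing coefficients.

    `residuals` is a list of 1D arrays [R_{n-N}, ..., R_n].
    Returns (alphas, lam), where the alphas sum to 1 and
    lam = -<R_{n+1}, R_{n+1}> is (minus) the predicted squared
    residual norm of the next iteration.
    """
    k = len(residuals)
    R = np.stack(residuals)            # shape (k, D)
    A = np.zeros((k + 1, k + 1))
    A[:k, :k] = R @ R.T                # Gram matrix of inner products <R_m, R_j>
    A[:k, k] = 1.0                     # column multiplying the multiplier lambda
    A[k, :k] = 1.0                     # bottom row enforces sum(alpha) = 1
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    sol = np.linalg.solve(A, rhs)      # may fail if residuals are (nearly) linearly
                                       # dependent; then drop the oldest ones and retry
    return sol[:k], sol[k]


def pulay_mix(f_history, r_history, beta=0.1):
    """Build the next input f_{n+1} = sum_m alpha_m * (f_m + beta * R_m)."""
    alphas, lam = pulay_coefficients(r_history)
    F = np.stack(f_history)
    R = np.stack(r_history)
    return alphas @ (F + beta * R), abs(lam)


def fixed_point_solve(step, f0, beta=0.1, history=5, tol=1e-10, max_iter=500):
    """Iterate a fixed-point map `step(f) -> f_new` with Pulay mixing,
    keeping only the most recent `history` = N + 1 inputs and residuals."""
    f = np.atleast_1d(np.asarray(f0, dtype=float))
    f_hist, r_hist = [], []
    for _ in range(max_iter):
        f_new = np.atleast_1d(np.asarray(step(f), dtype=float))
        r = f_new - f                  # residual R_n = f_n^new - f_n
        if np.linalg.norm(r) < tol:
            return f_new
        f_hist.append(f)
        r_hist.append(r)
        f_hist, r_hist = f_hist[-history:], r_hist[-history:]
        f, _ = pulay_mix(f_hist, r_hist, beta)
    return f
```

As a usage example, `step` could be one cycle of a self-consistent calculation (for instance the density update in density functional theory), with $$f$$ the discretized density; the $$|\lambda|$$ returned by `pulay_mix` could then serve as the convergence measure mentioned above.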