Diffstat (limited to 'source/know/concept/pulay-mixing/index.md')
-rw-r--r--  source/know/concept/pulay-mixing/index.md  70
1 file changed, 35 insertions, 35 deletions
diff --git a/source/know/concept/pulay-mixing/index.md b/source/know/concept/pulay-mixing/index.md
index 91beb03..6e809dd 100644
--- a/source/know/concept/pulay-mixing/index.md
+++ b/source/know/concept/pulay-mixing/index.md
@@ -8,15 +8,15 @@ layout: "concept"
---
Some numerical problems are most easily solved *iteratively*,
-by generating a series $\rho_1$, $\rho_2$, etc.
-converging towards the desired solution $\rho_*$.
+by generating a series $$\rho_1$$, $$\rho_2$$, etc.
+converging towards the desired solution $$\rho_*$$.
**Pulay mixing**, also often called
**direct inversion in the iterative subspace** (DIIS),
can speed up the convergence for some types of problems,
and also helps to avoid periodic divergences.
-The key concept it relies on is the **residual vector** $R_n$
-of the $n$th iteration, which in some way measures the error of the current $\rho_n$.
+The key concept it relies on is the **residual vector** $$R_n$$
+of the $$n$$th iteration, which in some way measures the error of the current $$\rho_n$$.
Its exact definition varies,
but is generally along the lines of the difference between
the input of the iteration and the raw resulting output:
@@ -27,16 +27,16 @@ $$\begin{aligned}
= \rho_n^\mathrm{new}[\rho_n] - \rho_n
\end{aligned}$$
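For concreteness, a minimal illustrative sketch (not part of the original text) of such a residual, assuming the iteration is available as a map `F` that takes an array-valued $$\rho$$ and returns the raw output $$\rho^\mathrm{new}$$:

```python
def residual(F, rho):
    """Generic residual R_n = F(rho_n) - rho_n of a fixed-point iteration F."""
    return F(rho) - rho
```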
-It is not always clear what to do with $\rho_n^\mathrm{new}$.
-Directly using it as the next input ($\rho_{n+1} = \rho_n^\mathrm{new}$)
+It is not always clear what to do with $$\rho_n^\mathrm{new}$$.
+Directly using it as the next input ($$\rho_{n+1} = \rho_n^\mathrm{new}$$)
often leads to oscillation,
-and linear mixing ($\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$)
+and linear mixing ($$\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$$)
can take a very long time to converge properly.
Pulay mixing offers an improvement.
-The idea is to construct the next iteration's input $\rho_{n+1}$
-as a linear combination of the previous inputs $\rho_1$, $\rho_2$, ..., $\rho_n$,
-such that it is as close as possible to the optimal $\rho_*$:
+The idea is to construct the next iteration's input $$\rho_{n+1}$$
+as a linear combination of the previous inputs $$\rho_1$$, $$\rho_2$$, ..., $$\rho_n$$,
+such that it is as close as possible to the optimal $$\rho_*$$:
$$\begin{aligned}
\boxed{
@@ -46,10 +46,10 @@ $$\begin{aligned}
\end{aligned}$$
To do so, we make two assumptions.
-Firstly, the current $\rho_n$ is already close to $\rho_*$,
+Firstly, the current $$\rho_n$$ is already close to $$\rho_*$$,
so that such a linear combination makes sense.
Secondly, the iteration is linear,
-such that the raw output $\rho_{n+1}^\mathrm{new}$
+such that the raw output $$\rho_{n+1}^\mathrm{new}$$
is also a linear combination with the *same coefficients*:
$$\begin{aligned}
@@ -58,7 +58,7 @@ $$\begin{aligned}
\end{aligned}$$
We will return to these assumptions later.
-The point is that $R_{n+1}$ is also a linear combination:
+The point is that $$R_{n+1}$$ is also a linear combination:
$$\begin{aligned}
R_{n+1}
@@ -67,23 +67,23 @@ $$\begin{aligned}
= \sum_{m = 1}^n \alpha_m R_m
\end{aligned}$$
-The goal is to choose the coefficients $\alpha_m$ such that
-the norm of the error $|R_{n+1}| \approx 0$,
-subject to the following constraint to preserve the normalization of $\rho_{n+1}$:
+The goal is to choose the coefficients $$\alpha_m$$ such that
+the norm of the error $$|R_{n+1}| \approx 0$$,
+subject to the following constraint to preserve the normalization of $$\rho_{n+1}$$:
$$\begin{aligned}
\sum_{m=1}^n \alpha_m = 1
\end{aligned}$$
We thus want to minimize the following quantity,
-where $\lambda$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
+where $$\lambda$$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
$$\begin{aligned}
\Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
= \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big)
\end{aligned}$$
-By differentiating the right-hand side with respect to $\alpha_m^*$
+By differentiating the right-hand side with respect to $$\alpha_m^*$$
and demanding that the result is zero,
we get a system of equations that we can write in matrix form,
which is cheap to solve:
@@ -106,38 +106,38 @@ $$\begin{aligned}
\end{aligned}$$
From this, we can also see that the Lagrange multiplier
-$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$,
-where $R_{n+1}$ is the *predicted* residual of the next iteration,
+$$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$$,
+where $$R_{n+1}$$ is the *predicted* residual of the next iteration,
subject to the two assumptions.
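Concretely, setting each of those derivatives to zero gives $$\sum_k \alpha_k \Inprod{R_m}{R_k} + \lambda = 0$$ for every $$m$$, which together with the normalization constraint is exactly the small bordered system above. A minimal NumPy sketch of that solve (an illustration added here, assuming the recent residuals are stored as flat real arrays so the inner product is a plain dot product):

```python
import numpy as np

def pulay_coefficients(residuals):
    """Solve the bordered system for the mixing coefficients alpha_m.

    residuals: list of flat real arrays R_m (the recent history).
    Returns (alpha, lam), where alpha sums to 1 and lam equals
    -<R_{n+1}|R_{n+1}>, the predicted squared norm of the next residual.
    """
    n = len(residuals)
    A = np.zeros((n + 1, n + 1))
    # Upper-left block: overlap matrix of inner products <R_m|R_k>.
    for m in range(n):
        for k in range(n):
            A[m, k] = np.dot(residuals[m], residuals[k])
    # Border row/column enforce sum_m alpha_m = 1 via the multiplier lambda.
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.zeros(n + 1)
    b[n] = 1.0
    x = np.linalg.solve(A, b)
    return x[:n], x[n]
```

In practice the overlap matrix can become nearly singular once the residuals are almost linearly dependent, so real implementations add some safeguard; that is omitted in this sketch.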
-However, in practice, the earlier inputs $\rho_1$, $\rho_2$, etc.
-are much further from $\rho_*$ than $\rho_n$,
-so usually only the most recent $N\!+\!1$ inputs $\rho_{n - N}$, ..., $\rho_n$ are used:
+However, in practice, the earlier inputs $$\rho_1$$, $$\rho_2$$, etc.
+are much further from $$\rho_*$$ than $$\rho_n$$,
+so usually only the most recent $$N\!+\!1$$ inputs $$\rho_{n - N}$$, ..., $$\rho_n$$ are used:
$$\begin{aligned}
\rho_{n+1}
= \sum_{m = n-N}^n \alpha_m \rho_m
\end{aligned}$$
-You might be confused by the absence of any $\rho_m^\mathrm{new}$
-in the creation of $\rho_{n+1}$, as if the iteration's outputs are being ignored.
+You might be confused by the absence of any $$\rho_m^\mathrm{new}$$
+in the creation of $$\rho_{n+1}$$, as if the iteration's outputs are being ignored.
This is due to the first assumption,
-which states that $\rho_n^\mathrm{new}$ and $\rho_n$ are already similar,
+which states that $$\rho_n^\mathrm{new}$$ and $$\rho_n$$ are already similar,
such that they are basically interchangeable.
Speaking of which, about those assumptions:
-while they will clearly become more accurate as $\rho_n$ approaches $\rho_*$,
+while they will clearly become more accurate as $$\rho_n$$ approaches $$\rho_*$$,
they might be very dubious in the beginning.
A consequence of this is that the early iterations might get "trapped"
-in a suboptimal subspace spanned by $\rho_1$, $\rho_2$, etc.
-To say it another way, we would be varying $n$ coefficients $\alpha_m$
-to try to optimize a $D$-dimensional $\rho_{n+1}$,
-where in general $D \gg n$, at least in the beginning.
+in a suboptimal subspace spanned by $$\rho_1$$, $$\rho_2$$, etc.
+To say it another way, we would be varying $$n$$ coefficients $$\alpha_m$$
+to try to optimize a $$D$$-dimensional $$\rho_{n+1}$$,
+where in general $$D \gg n$$, at least in the beginning.
There is an easy fix to this problem:
-add a small amount of the raw residual $R_m$
-to "nudge" $\rho_{n+1}$ towards the right subspace,
-where $\beta \in [0,1]$ is a tunable parameter:
+add a small amount of the raw residual $$R_m$$
+to "nudge" $$\rho_{n+1}$$ towards the right subspace,
+where $$\beta \in [0,1]$$ is a tunable parameter:
$$\begin{aligned}
\boxed{
@@ -146,7 +146,7 @@ $$\begin{aligned}
}
\end{aligned}$$
-In other words, we end up introducing a small amount of the raw outputs $\rho_m^\mathrm{new}$,
+In other words, we end up introducing a small amount of the raw outputs $$\rho_m^\mathrm{new}$$,
while still giving more weight to iterations with smaller residuals.
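To tie everything together, here is a compact sketch of the full scheme (again an illustration rather than the article's own code, reusing the `pulay_coefficients` helper sketched earlier; the fixed-point map `F`, history length `N`, and `beta` are placeholders for whatever the application provides):

```python
import numpy as np

def pulay_mix(F, rho0, N=5, beta=0.1, tol=1e-8, max_iter=200):
    """Iterate rho -> F(rho) with Pulay/DIIS mixing over the last N+1 inputs."""
    rhos, resids = [], []
    rho = rho0
    for _ in range(max_iter):
        R = F(rho) - rho                       # residual of the current input
        rhos.append(rho)
        resids.append(R)
        rhos, resids = rhos[-(N + 1):], resids[-(N + 1):]  # keep only recent history
        if np.linalg.norm(R) < tol:
            break
        alpha, _ = pulay_coefficients(resids)
        # rho_{n+1} = sum_m alpha_m * (rho_m + beta * R_m)
        rho = sum(a * (p + beta * r) for a, p, r in zip(alpha, rhos, resids))
    return rho
```

On the very first step the history holds a single entry, so the update reduces to plain linear mixing with factor `beta`; the subspace prediction only takes over as the history fills up.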
Pulay mixing is very effective for certain types of problems,