From 16555851b6514a736c5c9d8e73de7da7fc9b6288 Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Thu, 20 Oct 2022 18:25:31 +0200
Subject: Migrate from 'jekyll-katex' to 'kramdown-math-sskatex'

---
 source/know/concept/pulay-mixing/index.md | 70 +++++++++++++++----------------
 1 file changed, 35 insertions(+), 35 deletions(-)

(limited to 'source/know/concept/pulay-mixing/index.md')

diff --git a/source/know/concept/pulay-mixing/index.md b/source/know/concept/pulay-mixing/index.md
index 91beb03..6e809dd 100644
--- a/source/know/concept/pulay-mixing/index.md
+++ b/source/know/concept/pulay-mixing/index.md
@@ -8,15 +8,15 @@ layout: "concept"
 ---
 
 Some numerical problems are most easily solved *iteratively*,
-by generating a series $\rho_1$, $\rho_2$, etc.
-converging towards the desired solution $\rho_*$.
+by generating a series $$\rho_1$$, $$\rho_2$$, etc.
+converging towards the desired solution $$\rho_*$$.
 **Pulay mixing**, also often called
 **direct inversion in the iterative subspace** (DIIS),
 can speed up the convergence for some types of problems,
 and also helps to avoid periodic divergences.
 
-The key concept it relies on is the **residual vector** $R_n$
-of the $n$th iteration, which in some way measures the error of the current $\rho_n$.
+The key concept it relies on is the **residual vector** $$R_n$$
+of the $$n$$th iteration, which in some way measures the error of the current $$\rho_n$$.
 Its exact definition varies, but is generally along the lines of
 the difference between the input of the iteration and the raw resulting output:
 
@@ -27,16 +27,16 @@ $$\begin{aligned}
     = \rho_n^\mathrm{new}[\rho_n] - \rho_n
 \end{aligned}$$
 
-It is not always clear what to do with $\rho_n^\mathrm{new}$.
-Directly using it as the next input ($\rho_{n+1} = \rho_n^\mathrm{new}$)
+It is not always clear what to do with $$\rho_n^\mathrm{new}$$.
+Directly using it as the next input ($$\rho_{n+1} = \rho_n^\mathrm{new}$$)
 often leads to oscillation,
-and linear mixing ($\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$)
+and linear mixing ($$\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$$)
 can take a very long time to converge properly.
 Pulay mixing offers an improvement.
 
-The idea is to construct the next iteration's input $\rho_{n+1}$
-as a linear combination of the previous inputs $\rho_1$, $\rho_2$, ..., $\rho_n$,
-such that it is as close as possible to the optimal $\rho_*$:
+The idea is to construct the next iteration's input $$\rho_{n+1}$$
+as a linear combination of the previous inputs $$\rho_1$$, $$\rho_2$$, ..., $$\rho_n$$,
+such that it is as close as possible to the optimal $$\rho_*$$:
 
 $$\begin{aligned}
     \boxed{
@@ -46,10 +46,10 @@ $$\begin{aligned}
 \end{aligned}$$
 
 To do so, we make two assumptions.
-Firstly, the current $\rho_n$ is already close to $\rho_*$,
+Firstly, the current $$\rho_n$$ is already close to $$\rho_*$$,
 so that such a linear combination makes sense.
 Secondly, the iteration is linear,
-such that the raw output $\rho_{n+1}^\mathrm{new}$
+such that the raw output $$\rho_{n+1}^\mathrm{new}$$
 is also a linear combination with the *same coefficients*:
 
 $$\begin{aligned}
@@ -58,7 +58,7 @@ $$\begin{aligned}
 \end{aligned}$$
 
 We will return to these assumptions later.
-The point is that $R_{n+1}$ is also a linear combination:
+The point is that $$R_{n+1}$$ is also a linear combination:
 
 $$\begin{aligned}
     R_{n+1}
@@ -67,23 +67,23 @@ $$\begin{aligned}
     = \sum_{m = 1}^n \alpha_m R_m
 \end{aligned}$$
 
-The goal is to choose the coefficients $\alpha_m$ such that
-the norm of the error $|R_{n+1}| \approx 0$,
-subject to the following constraint to preserve the normalization of $\rho_{n+1}$:
+The goal is to choose the coefficients $$\alpha_m$$ such that
+the norm of the error $$|R_{n+1}| \approx 0$$,
+subject to the following constraint to preserve the normalization of $$\rho_{n+1}$$:
 
 $$\begin{aligned}
     \sum_{m=1}^n \alpha_m = 1
 \end{aligned}$$
 
 We thus want to minimize the following quantity,
-where $\lambda$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
+where $$\lambda$$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
 
 $$\begin{aligned}
     \Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
     = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big)
 \end{aligned}$$
 
-By differentiating the right-hand side with respect to $\alpha_m^*$
+By differentiating the right-hand side with respect to $$\alpha_m^*$$
 and demanding that the result is zero,
 we get a system of equations that we can write in matrix form,
 which is cheap to solve:
@@ -106,38 +106,38 @@ $$\begin{aligned}
 \end{aligned}$$
 
 From this, we can also see that the Lagrange multiplier
-$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$,
-where $R_{n+1}$ is the *predicted* residual of the next iteration,
+$$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$$,
+where $$R_{n+1}$$ is the *predicted* residual of the next iteration,
 subject to the two assumptions.
 
-However, in practice, the earlier inputs $\rho_1$, $\rho_2$, etc.
-are much further from $\rho_*$ than $\rho_n$,
-so usually only the most recent $N\!+\!1$ inputs $\rho_{n - N}$, ..., $\rho_n$ are used:
+However, in practice, the earlier inputs $$\rho_1$$, $$\rho_2$$, etc.
+are much further from $$\rho_*$$ than $$\rho_n$$,
+so usually only the most recent $$N\!+\!1$$ inputs $$\rho_{n - N}$$, ..., $$\rho_n$$ are used:
 
 $$\begin{aligned}
     \rho_{n+1}
     = \sum_{m = n-N}^n \alpha_m \rho_m
 \end{aligned}$$
 
-You might be confused by the absence of any $\rho_m^\mathrm{new}$
-in the creation of $\rho_{n+1}$, as if the iteration's outputs are being ignored.
+You might be confused by the absence of any $$\rho_m^\mathrm{new}$$
+in the creation of $$\rho_{n+1}$$, as if the iteration's outputs are being ignored.
 This is due to the first assumption,
-which states that $\rho_n^\mathrm{new}$ and $\rho_n$ are already similar,
+which states that $$\rho_n^\mathrm{new}$$ and $$\rho_n$$ are already similar,
 such that they are basically interchangeable.
 
 Speaking of which, about those assumptions:
-while they will clearly become more accurate as $\rho_n$ approaches $\rho_*$,
+while they will clearly become more accurate as $$\rho_n$$ approaches $$\rho_*$$,
 they might be very dubious in the beginning.
 A consequence of this is that the early iterations might get "trapped"
-in a suboptimal subspace spanned by $\rho_1$, $\rho_2$, etc.
-To say it another way, we would be varying $n$ coefficients $\alpha_m$
-to try to optimize a $D$-dimensional $\rho_{n+1}$,
-where in general $D \gg n$, at least in the beginning.
+in a suboptimal subspace spanned by $$\rho_1$$, $$\rho_2$$, etc.
+To say it another way, we would be varying $$n$$ coefficients $$\alpha_m$$
+to try to optimize a $$D$$-dimensional $$\rho_{n+1}$$,
+where in general $$D \gg n$$, at least in the beginning.
 
 There is an easy fix to this problem:
-add a small amount of the raw residual $R_m$
-to "nudge" $\rho_{n+1}$ towards the right subspace,
-where $\beta \in [0,1]$ is a tunable parameter:
+add a small amount of the raw residual $$R_m$$
+to "nudge" $$\rho_{n+1}$$ towards the right subspace,
+where $$\beta \in [0,1]$$ is a tunable parameter:
 
 $$\begin{aligned}
     \boxed{
@@ -146,7 +146,7 @@ $$\begin{aligned}
     }
 \end{aligned}$$
 
-In other words, we end up introducing a small amount of the raw outputs $\rho_m^\mathrm{new}$,
+In other words, we end up introducing a small amount of the raw outputs $$\rho_m^\mathrm{new}$$,
 while still giving more weight to iterations with smaller residuals.
 
 Pulay mixing is very effective for certain types of problems,
-- 
cgit v1.2.3
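The hunks above only swap `$...$` for `$$...$$` delimiters, but the section they touch walks through the full DIIS recipe: build the matrix of residual inner products, solve the bordered system for the coefficients $$\alpha_m$$ under the constraint $$\sum_m \alpha_m = 1$$, then form $$\rho_{n+1} = \sum_m \alpha_m (\rho_m + \beta R_m)$$ from the last $$N\!+\!1$$ histories. The following is a minimal NumPy sketch of that recipe, not code from this commit or from the article: the names `pulay_mix`, `solve` and `iterate` are invented for the example, the state is assumed to be a plain 1-D array, and the ordinary dot product stands in for the inner product of residuals.

```python
import numpy as np

def pulay_mix(rho_hist, res_hist, beta=0.1):
    """One Pulay/DIIS step: return rho_{n+1} = sum_m alpha_m (rho_m + beta R_m),
    with the coefficients alpha_m taken from the bordered linear system."""
    n = len(rho_hist)
    # Matrix of residual inner products, bordered by the Lagrange-multiplier
    # row/column that enforces the normalization constraint sum_m alpha_m = 1.
    A = np.zeros((n + 1, n + 1))
    for m in range(n):
        for k in range(n):
            A[m, k] = np.dot(res_hist[m], res_hist[k])
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    # lstsq rather than solve: the matrix turns ill-conditioned once the
    # stored residuals become nearly linearly dependent.
    alpha = np.linalg.lstsq(A, rhs, rcond=None)[0][:n]
    return sum(a * (rho + beta * res)
               for a, rho, res in zip(alpha, rho_hist, res_hist))

def solve(iterate, rho0, n_keep=6, beta=0.1, tol=1e-10, max_iter=200):
    """Accelerate the fixed-point iteration rho -> iterate(rho) by Pulay
    mixing of the most recent n_keep inputs and residuals (N + 1 in the text)."""
    rho = rho0
    rho_hist, res_hist = [], []
    for _ in range(max_iter):
        res = iterate(rho) - rho       # residual R_n = rho_n^new - rho_n
        if np.linalg.norm(res) < tol:
            break
        rho_hist = (rho_hist + [rho])[-n_keep:]
        res_hist = (res_hist + [res])[-n_keep:]
        rho = pulay_mix(rho_hist, res_hist, beta)
    return rho
```

Here `beta` plays the role of the tunable $$\beta \in [0,1]$$ from the boxed formula, and keeping only the last few histories mirrors the remark that earlier inputs lie much further from $$\rho_*$$ and are best discarded; on the very first step the sketch reduces to plain damped mixing, since the single coefficient is forced to 1 by the constraint.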