1 files changed, 14 insertions, 7 deletions
diff --git a/content/know/concept/pulay-mixing/index.pdc b/content/know/concept/pulay-mixing/index.pdc
index 9102c0e..8daa54f 100644
--- a/content/know/concept/pulay-mixing/index.pdc
+++ b/content/know/concept/pulay-mixing/index.pdc
@@ -83,13 +83,14 @@ We thus want to minimize the following quantity,
 where $\lambda$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/):
 
 $$\begin{aligned}
-	\braket{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m
-	= \sum_{m=1}^n \alpha_m \Big( \sum_{k=1}^n \alpha_k \braket{R_m}{R_k} + \lambda \Big)
+	\braket{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^*
+	= \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \braket{R_m}{R_k} + \lambda \Big)
 \end{aligned}$$
 
-By differentiating the right-hand side with respect to $\alpha_m$,
+By differentiating the right-hand side with respect to $\alpha_m^*$
+and demanding that the result is zero,
 we get a system of equations that we can write in matrix form,
-which is relatively cheap to solve numerically:
+which is cheap to solve:
 
 $$\begin{aligned}
 	\begin{bmatrix}
@@ -107,6 +108,11 @@ $$\begin{aligned}
 	\end{bmatrix}
 \end{aligned}$$
 
+From this, we can also see that the Lagrange multiplier
+$\lambda = - \braket{R_{n+1}}{R_{n+1}}$,
+where $R_{n+1}$ is the *predicted* residual of the next iteration,
+subject to the two assumptions.
+
 This method is very effective.
 However, in practice, the earlier inputs $\rho_1$, $\rho_2$, etc.
 are much further from $\rho_*$ than $\rho_n$,
@@ -121,7 +127,7 @@ You might be confused by the absence of all $\rho_m^\mathrm{new}$
 in the creation of $\rho_{n+1}$, as if the iteration's outputs are being ignored.
 This is due to the first assumption,
 which states that $\rho_n^\mathrm{new}$ and $\rho_n$ are already similar,
-such that they are interchangeable.
+such that they are basically interchangeable.
 
 Speaking of which, about those assumptions:
 while they will clearly become more accurate as $\rho_n$ approaches $\rho_*$,
@@ -147,8 +153,9 @@ $$\begin{aligned}
 In other words, we end up introducing a small amount of the raw outputs $\rho_m^\mathrm{new}$,
 while still giving more weight to iterations with smaller residuals.
 
-Pulay mixing is very effective:
-it can accelerate convergence by up to one order of magnitude!
+Pulay mixing is very effective for certain types of problems,
+e.g. density functional theory,
+where it can accelerate convergence by up to one order of magnitude!