1 files changed, 101 insertions, 100 deletions
diff --git a/source/know/concept/ritz-method/index.md b/source/know/concept/ritz-method/index.md
index 902b7cf..ef694da 100644
--- a/source/know/concept/ritz-method/index.md
+++ b/source/know/concept/ritz-method/index.md
@@ -25,25 +25,26 @@ consider the following functional to be optimized:
 
 $$\begin{aligned}
     R[u]
-    = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
+    \equiv \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
 \end{aligned}$$
 
 Where $$u(x) \in \mathbb{C}$$ is the unknown function,
 and $$p(x), q(x) \in \mathbb{R}$$ are given.
-In addition, $$S$$ is the norm of $$u$$, which we demand be constant
+In addition, $$S$$ is the norm of $$u$$, which we take to be constant
 with respect to a weight function $$w(x) \in \mathbb{R}$$:
 
 $$\begin{aligned}
     S
-    = \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
+    \equiv \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
 \end{aligned}$$
 
-To handle this normalization requirement,
-we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$,
-and define the Lagrangian $$\Lambda$$ for the full constrained optimization problem as:
+This normalization requirement acts as a constraint
+on the optimization problem for $$R[u]$$,
+so we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$,
+and define the Lagrangian $$\mathcal{L}$$ for the full problem as:
 
 $$\begin{aligned}
-    \Lambda
+    \mathcal{L}
     \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg)
 \end{aligned}$$
 
@@ -51,7 +52,7 @@ The resulting Euler-Lagrange equation is then calculated in the standard way, yi
 
 $$\begin{aligned}
     0
-    &= \pdv{\Lambda}{u^*} - \dv{}{x}\Big( \pdv{\Lambda}{u_x^*} \Big)
+    &= \pdv{\mathcal{L}}{u^*} - \dv{}{x}\Big( \pdv{\mathcal{L}}{u_x^*} \Big)
     \\
     &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg)
 \end{aligned}$$
@@ -69,15 +70,14 @@ SLPs have useful properties, but before we can take advantage of those,
 we need to handle an important detail: the boundary conditions (BCs) on $$u$$.
 The above equation is only a valid SLP for certain BCs,
 as seen in the derivation of Sturm-Liouville theory.
-
-Let us return to the definition of $$R[u]$$,
+Let us return to the definition of $$R$$,
 and integrate it by parts:
 
 $$\begin{aligned}
     R[u]
     &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x}
     \\
-    &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x}
+    &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x}
 \end{aligned}$$
 
 The boundary term vanishes for a subset of the BCs that make a valid SLP,
@@ -88,10 +88,11 @@ such that we can use Sturm-Liouville theory later:
 $$\begin{aligned}
     R[u]
     &= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x}
-    \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x}
+    \\
+    &\equiv - \frac{1}{S} \int_a^b u^* \hat{L} u \dd{x}
 \end{aligned}$$
 
-Where $$\hat{H}$$ is the self-adjoint Sturm-Liouville operator.
+Where $$\hat{L}$$ is the self-adjoint Sturm-Liouville operator.
 Because the constrained Euler-Lagrange equation is now an SLP,
 we know that it has an infinite number of real discrete eigenvalues $$\lambda_n$$ with a lower bound,
 corresponding to mutually orthogonal eigenfunctions $$u_n(x)$$.
@@ -102,16 +103,16 @@ and now insert one of the eigenfunctions $$u_n$$ into $$R$$:
 
 $$\begin{aligned}
     R[u_n]
-    &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x}
-    = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x}
+    &= - \frac{1}{S_n} \int_a^b u_n^* \hat{L} u_n \dd{x}
+    \\
+    &= \frac{1}{S_n} \int_a^b \lambda_n w |u_n|^2 \dd{x}
     \\
-    &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x}
-    = \frac{S_n}{S_n} \lambda_n
+    &= \frac{S_n}{S_n} \lambda_n
 \end{aligned}$$
 
 Where $$S_n$$ is the normalization of $$u_n$$.
-In other words, when given $$u_n$$,
-the functional $$R$$ yields the corresponding eigenvalue $$\lambda_n$$:
+In other words, when given $$u_n$$ as input,
+the functional $$R$$ returns the corresponding eigenvalue $$\lambda_n$$:
 
 $$\begin{aligned}
     \boxed{
@@ -121,6 +122,11 @@ $$\begin{aligned}
 \end{aligned}$$
 
 This powerful result was not at all clear from $$R$$'s initial definition.
+Note that some authors use the opposite sign for $$\lambda$$ in their SLP definition,
+in which case this result can still be obtained
+simply by also defining $$R$$ with the opposite sign.
+This sign choice is consistent with quantum mechanics,
+with the Hamiltonian $$\hat{H} = - \hat{L}$$.
 
 
 
@@ -137,81 +143,79 @@ $$\begin{aligned}
 
 Here, we are using the fact that the eigenfunctions of an SLP form a complete set,
 so our (known) guess $$u$$ can be expanded in the true (unknown) eigenfunctions $$u_n$$.
-We are assuming that $$u$$ is already quite close to its target $$u_0$$,
-such that the (unknown) expansion coefficients $$c_n$$ are small;
-specifically $$|c_n|^2 \ll 1$$.
-Let us start from what we know:
+Next, by definition:
 
 $$\begin{aligned}
     \boxed{
         R[u]
-        = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
+        = - \frac{\displaystyle\int u^* \hat{L} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
     }
 \end{aligned}$$
 
-This quantity is known as the **Rayleigh quotient**.
+This quantity is known as the **Rayleigh quotient**,
+and again beware of the sign in its definition; see the remark above.
 Inserting our ansatz $$u$$,
-and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$:
+and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$,
+we have:
 
 $$\begin{aligned}
     R[u]
-    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
+    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{L} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
     {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}}
     \\
-    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
+    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big)
+    \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
     {\displaystyle\int w  \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}}
 \end{aligned}$$
 
 For convenience, we switch to [Dirac notation](/know/concept/dirac-notation/)
-before evaluating further.
+before evaluating further:
 
 $$\begin{aligned}
-    R
-    &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)}
-    {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)}
+    R[u]
+    &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big)
+    \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)}
+    {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)}
     \\
-    &= \frac{\displaystyle \lambda_0 \Inprod{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
-    + \sum_{n = 1}^\infty c_n \lambda_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \Inprod{u_m}{w u_n}}
-    {\displaystyle \Inprod{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
-    + \sum_{n = 1}^\infty c_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \Inprod{u_m}{w u_n}}
+    &= \frac{\displaystyle \lambda_0 \inprod{u_0}{w u_0} + \lambda_0 \sum_{n} c_n^* \inprod{u_n}{w u_0}
+    + \sum_{n} c_n \lambda_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \inprod{u_m}{w u_n}}
+    {\displaystyle \inprod{u_0}{w u_0} + \sum_{n} c_n^* \inprod{u_n}{w u_0}
+    + \sum_{n} c_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \inprod{u_m}{w u_n}}
 \end{aligned}$$
 
-Using orthogonality $$\Inprod{u_m}{w u_n} = S_n \delta_{mn}$$,
+Using orthogonality $$\inprod{u_m}{w u_n} = S_n \delta_{mn}$$,
 and the fact that $$n \neq 0$$ by definition, we find:
 
 $$\begin{aligned}
-    R
+    R[u]
     &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0}
     + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}}
     {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}}
     \\
-    &= \frac{\displaystyle \lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n}
-    {\displaystyle S_0 + 0 + 0 + \sum_{n} c_n c_n^* S_n}
-    = \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
+    &= \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
     {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n}
 \end{aligned}$$
 
 It is always possible to choose our normalizations such that $$S_n = S$$ for all $$u_n$$, leaving:
 
 $$\begin{aligned}
-    R
-    &= \frac{\displaystyle \lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S}
-    {\displaystyle S + \sum_{n} |c_n|^2 S}
-    = \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
+    R[u]
+    &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
     {\displaystyle 1 + \sum_{n} |c_n|^2}
 \end{aligned}$$
 
 And finally, after rearranging the numerator, we arrive at the following relation:
 
 $$\begin{aligned}
-    R
+    R[u]
     &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
     {\displaystyle 1 + \sum_{n} |c_n|^2}
-    = \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
+    \\
+    &= \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
     {\displaystyle 1 + \sum_{n} |c_n|^2}
 \end{aligned}$$
 
-Thus, if we improve our guess $$u$$,
+Thus, if we improve our guess $$u$$ (i.e. reduce $$|c_n|$$),
 then $$R[u]$$ approaches the true eigenvalue $$\lambda_0$$.
 For numerically finding $$u_0$$ and $$\lambda_0$$, this gives us a clear goal: minimize $$R$$, because:
 
@@ -228,19 +232,21 @@ In the context of quantum mechanics, this is not surprising,
 since any superposition of multiple states
 is guaranteed to have a higher energy than the ground state.
 
-Note that the convergence to $$\lambda_0$$ goes as $$|c_n|^2$$,
+As our guess $$u$$ is improved, $$R[u]$$ converges to $$\lambda_0$$ as $$|c_n|^2$$,
 while $$u$$ converges to $$u_0$$ as $$|c_n|$$ by definition,
-so even a fairly bad guess $$u$$ will give a decent estimate for $$\lambda_0$$.
+so even a fairly bad ansatz $$u$$ gives a decent estimate for $$\lambda_0$$.
 
 
 
 ## The method
 
 In the following, we stick to Dirac notation,
-since the results hold for both continuous functions $$u(x)$$ and discrete vectors $$\vb{u}$$,
-as long as the operator $$\hat{H}$$ is self-adjoint.
+since the results hold for both continuous functions $$u(x)$$
+and discrete vectors $$\vb{u}$$,
+as long as the operator $$\hat{L}$$ is self-adjoint.
 Suppose we express our guess $$\Ket{u}$$ as a linear combination
-of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$:
+of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$,
+where $$\Ket{f_n}$$ are not necessarily eigenvectors of $$\hat{L}$$:
 
 $$\begin{aligned}
     \Ket{u}
@@ -250,11 +256,11 @@ $$\begin{aligned}
 \end{aligned}$$
 
 For numerical tractability, we truncate the sum at $$N$$ terms,
-and for generality, we allow $$\Ket{f_n}$$ to be non-orthogonal,
+and for generality we allow $$\Ket{f_n}$$ to be non-orthogonal,
 as described by an *overlap matrix* with elements $$S_{mn}$$:
 
 $$\begin{aligned}
-    \Inprod{f_m}{w f_n} = S_{m n}
+    \inprod{f_m}{w f_n} = S_{m n}
 \end{aligned}$$
 
 From the discussion above,
@@ -262,11 +268,10 @@ we know that the ground-state eigenvalue $$\lambda_0$$ is estimated by:
 
 $$\begin{aligned}
     \lambda_0
-    \approx \lambda
-    = R[u]
-    = \frac{\inprod{u}{\hat{H} u}}{\Inprod{u}{w u}}
-    = \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{H} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \Inprod{f_m}{w f_n}}
-    \equiv \frac{\displaystyle \sum_{m n} a_m^* a_n H_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
+    \approx R[u]
+    = - \frac{\inprod{u}{\hat{L} u}}{\inprod{u}{w u}}
+    = - \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{L} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{w f_n}}
+    \equiv - \frac{\displaystyle \sum_{m n} a_m^* a_n L_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
 \end{aligned}$$
 
 And we also know that our goal is to minimize $$R[u]$$,
@@ -274,25 +279,27 @@ so we vary $$a_k^*$$ to find its extremum:
 
 $$\begin{aligned}
     0
-    = \pdv{R}{a_k^*}
-    &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
-    - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)}
+    = - \pdv{R}{a_k^*}
+    &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
+    - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* L_{mn} \Big)}
     {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2}
     \\
-    &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\Inprod{u}{w u}}
-    = \frac{\displaystyle \sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\Inprod{u}{w u}}
+    &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}
+    {\displaystyle \sum_{m n} a_n a_m^* S_{mn}}
+    \\
+    &= \sum_{n} a_n \frac{\big(L_{k n} - \lambda S_{k n}\big)}{\inprod{u}{w u}}
 \end{aligned}$$
 
 Clearly, this is only satisfied if the following holds for all $$k = 0, 1, ..., N\!-\!1$$:
 
 $$\begin{aligned}
     0
-    = \sum_{n = 0}^{N - 1} a_n \big(H_{k n} - \lambda S_{k n}\big)
+    = \sum_{n = 0}^{N - 1} a_n \big(L_{k n} - \lambda S_{k n}\big)
 \end{aligned}$$
 
 For illustrative purposes,
 we can write this as a matrix equation
-with $$M_{k n} \equiv H_{k n} - \lambda S_{k n}$$:
+with $$M_{k n} \equiv L_{k n} - \lambda S_{k n}$$:
 
 $$\begin{aligned}
     \begin{bmatrix}
@@ -311,53 +318,47 @@ $$\begin{aligned}
     \end{bmatrix}
 \end{aligned}$$
 
-Note that this looks like an eigenvalue problem for $$\lambda$$.
-Indeed, demanding that $$\overline{M}$$ cannot simply be inverted
-(i.e. the solution is non-trivial)
-yields a characteristic polynomial for $$\lambda$$:
+This looks like an eigenvalue problem for $$\lambda$$, so for a non-trivial
+solution $$a_n$$ we demand that the determinant of $$\bar{M}$$ vanishes:
 
 $$\begin{aligned}
     0
-    = \det\!\Big[ \overline{M} \Big]
-    = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big]
+    = \det\!\Big[ \bar{M} \Big]
+    = \det\!\Big[ \bar{L} - \lambda \bar{S} \Big]
 \end{aligned}$$
 
 This gives a set of $$\lambda$$,
-which are the exact eigenvalues of $$\overline{H}$$,
-and the estimated eigenvalues of $$\hat{H}$$
-(recall that $$\overline{H}$$ is $$\hat{H}$$ expressed in a truncated basis).
+which are exact eigenvalues of $$\bar{L}$$,
+and estimated eigenvalues of $$\hat{L}$$
+(recall that $$\bar{L}$$ is $$\hat{L}$$ expressed in a truncated basis).
 The eigenvector $$\big[ a_0, a_1, ..., a_{N-1} \big]$$ of the lowest $$\lambda$$
-gives the optimal weights to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$.
-Likewise, the higher $$\lambda$$'s eigenvectors approximate
-excited (i.e. non-ground) eigenstates of $$\hat{H}$$,
-although in practice the results are less accurate the higher we go.
+gives the optimal weights $$a_n$$ to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$.
+Likewise, the eigenvectors of the higher $$\lambda$$ approximate
+excited (i.e. non-ground) eigenstates of $$\hat{L}$$,
+although in practice the results become less accurate the higher we go.
+If we only care about the ground state,
+then we already know $$\lambda$$ from $$R[u]$$,
+so we just need to solve the matrix equation for $$a_n$$.
 
-The overall accuracy is determined by how good our truncated basis is,
-i.e. how large a subspace it spans
-of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides.
-Clearly, adding more basis vectors will improve the results,
-at the cost of computation.
-For example, if $$\hat{H}$$ represents a helium atom,
-a good choice for $$\{\Ket{f_n}\}$$ would be hydrogen orbitals,
-since those are qualitatively similar.
-
-You may find this result unsurprising;
-it makes some intuitive sense that approximating $$\hat{H}$$
-in a limited basis would yield a matrix $$\overline{H}$$ giving rough eigenvalues.
+You may find this result unsurprising:
+it makes some intuitive sense that approximating $$\hat{L}$$
+in a limited basis would yield a matrix $$\bar{L}$$ giving rough eigenvalues.
 The point of this discussion is to rigorously show
 the validity of this approach.
 
-If we only care about the ground state,
-then we already know $$\lambda$$ from $$R[u]$$,
-so all we need to do is solve the above matrix equation for $$a_n$$.
-Keep in mind that $$\overline{M}$$ is singular,
-and $$a_n$$ are only defined up to a constant factor.
-
 Nowadays, there exist many other methods to calculate eigenvalues
-of complicated operators $$\hat{H}$$,
+of complicated operators $$\hat{L}$$,
 but an attractive feature of the Ritz method is that it is single-step,
 whereas its competitors tend to be iterative.
-That said, the Ritz method cannot recover from a poorly chosen basis.
+That said, this method cannot recover from a poorly chosen basis $$\{\Ket{f_n}\}$$.
+
+Indeed, the overall accuracy is determined by how good our truncated basis is,
+i.e. how large a subspace it spans
+of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides.
+Clearly, adding more basis vectors improves the results,
+but at a computational cost;
+it is usually more efficient to carefully choose *which* $$\ket{f_n}$$ to use,
+rather than just *how many*.
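
As a rough illustration of the method described above (not part of the patch itself), the following minimal sketch applies it to the simplest Sturm-Liouville problem, $$u_{xx} = -\lambda u$$ on $$[0, 1]$$ with $$u(0) = u(1) = 0$$, i.e. $$p = 1$$, $$q = 0$$ and $$w = 1$$. The function name `ritz_eigenvalues` and the polynomial basis $$f_n(x) = x^{n+1} (1 - x)$$ are assumptions chosen for this example only. Following the sign in the definition of $$R[u]$$, the sketch diagonalizes $$-\bar{L}$$ against the overlap matrix $$\bar{S}$$.

```python
import numpy as np
from numpy.polynomial import Polynomial


def ritz_eigenvalues(N):
    """Ritz estimates for u'' = -lambda u on [0, 1] with u(0) = u(1) = 0."""
    x = Polynomial([0, 1])
    # Assumed (hypothetical) non-orthogonal basis f_n(x) = x^(n+1) (1 - x);
    # every f_n satisfies the boundary conditions.
    basis = [x ** (n + 1) * (1 - x) for n in range(N)]

    def integrate(poly):
        # Exact definite integral of a polynomial over [0, 1].
        antiderivative = poly.integ()
        return antiderivative(1.0) - antiderivative(0.0)

    # L_mn = <f_m | L f_n> with L = d^2/dx^2, and S_mn = <f_m | w f_n> with w = 1.
    L = np.array([[integrate(fm * fn.deriv(2)) for fn in basis] for fm in basis])
    S = np.array([[integrate(fm * fn) for fn in basis] for fm in basis])

    # R[u] = -<u|L u> / <u|w u>, so its stationary points satisfy
    # (-L) a = lambda S a. This is solved in a single step, without iteration.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S, -L))
    order = np.argsort(eigvals.real)
    return eigvals.real[order], eigvecs.real[:, order]


# Compare the two lowest estimates with the exact values pi^2 and 4 pi^2.
estimates, weights = ritz_eigenvalues(2)
print(estimates, [np.pi ** 2, 4 * np.pi ** 2])
```

With only two basis functions, the estimates come out as approximately 10 and 42, compared to the exact $$\pi^2 \approx 9.87$$ and $$4 \pi^2 \approx 39.48$$. Both lie above the true eigenvalues, and the ground state is the more accurate of the two, as expected from the discussion above.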