summaryrefslogtreecommitdiff
path: root/content/know/concept/ritz-method/index.pdc
diff options
context:
space:
mode:
Diffstat (limited to 'content/know/concept/ritz-method/index.pdc')
-rw-r--r--content/know/concept/ritz-method/index.pdc364
1 files changed, 364 insertions, 0 deletions
diff --git a/content/know/concept/ritz-method/index.pdc b/content/know/concept/ritz-method/index.pdc
new file mode 100644
index 0000000..320771a
--- /dev/null
+++ b/content/know/concept/ritz-method/index.pdc
@@ -0,0 +1,364 @@
+---
+title: "Ritz method"
+firstLetter: "R"
+publishDate: 2022-09-18
+categories:
+- Physics
+- Mathematics
+- Perturbation
+- Quantum mechanics
+- Numerical methods
+
+date: 2022-09-10T13:33:07+02:00
+draft: false
+markup: pandoc
+---
+
+# Ritz method
+
+In various branches of physics,
+the **Ritz method** is a technique to approximately find the lowest solutions to an eigenvalue problem.
+Some call it the **Rayleigh-Ritz method**, the **Ritz-Galerkin method**,
+or simply the **variational method**.
+
+
+
+## Background
+
+In the context of [variational calculus](/know/concept/calculus-of-variations/),
+consider the following functional to be optimized:
+
+$$\begin{aligned}
+ R[u]
+ = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
+\end{aligned}$$
+
+Where $u(x) \in \mathbb{C}$ is the unknown function,
+and $p(x), q(x) \in \mathbb{R}$ are given.
+In addition, $S$ is the norm of $u$, which we demand be constant
+with respect to a weight function $w(x) \in \mathbb{R}$:
+
+$$\begin{aligned}
+ S
+ = \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
+\end{aligned}$$
+
+To handle this normalization requirement,
+we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $\lambda$,
+and define the Lagrangian $\Lambda$ for the full constrained optimization problem as:
+
+$$\begin{aligned}
+ \Lambda
+ \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg)
+\end{aligned}$$
+
+The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:
+
+$$\begin{aligned}
+ 0
+ &= \pdv{\Lambda}{u^*} - \dv{x} \Big( \pdv{\Lambda}{u_x^*} \Big)
+ \\
+ &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{x} \big( p u_x \big) \bigg)
+\end{aligned}$$
+
+Which is clearly satisfied if and only if the following equation is fulfilled:
+
+$$\begin{aligned}
+ \dv{x} \big( p u_x \big) + q u
+ = - \lambda w u
+\end{aligned}$$
+
+This has the familiar form of a [Sturm-Liouville problem](/know/concept/sturm-liouville-theory/) (SLP),
+with $\lambda$ representing an eigenvalue.
+SLPs have useful properties, but before we can take advantage of those,
+we need to handle an important detail: the boundary conditions (BCs) on $u$.
+The above equation is only a valid SLP for certain BCs,
+as seen in the derivation of Sturm-Liouville theory.
+
+Let us return to the definition of $R[u]$,
+and integrate it by parts:
+
+$$\begin{aligned}
+ R[u]
+ &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x}
+ \\
+ &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{x} \Big(p u_x\Big) u^* + q u u^* \dd{x}
+\end{aligned}$$
+
+The boundary term vanishes for a subset of the BCs that make a valid SLP,
+including Dirichlet BCs $u(a) = u(b) = 0$, Neumann BCs $u_x(a) = u_x(b) = 0$, and periodic BCs.
+Therefore, we assume that this term does indeed vanish,
+such that we can use Sturm-Liouville theory later:
+
+$$\begin{aligned}
+ R[u]
+ &= - \frac{1}{S} \int_a^b \bigg( \dv{x} \Big(p u_x\Big) + q u \bigg) u^* \dd{x}
+ \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x}
+\end{aligned}$$
+
+Where $\hat{H}$ is the self-adjoint Sturm-Liouville operator.
+Because the constrained Euler-Lagrange equation is now an SLP,
+we know that it has an infinite number of real discrete eigenvalues $\lambda_n$ with a lower bound,
+corresponding to mutually orthogonal eigenfunctions $u_n(x)$.
+
+To understand the significance of this result,
+suppose we have solved the SLP,
+and now insert one of the eigenfunctions $u_n$ into $R$:
+
+$$\begin{aligned}
+ R[u_n]
+ &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x}
+ = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x}
+ \\
+ &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x}
+ = \frac{S_n}{S_n} \lambda_n
+\end{aligned}$$
+
+Where $S_n$ is the normalization of $u_n$.
+In other words, when given $u_n$,
+the functional $R$ yields the corresponding eigenvalue $\lambda_n$:
+
+$$\begin{aligned}
+ \boxed{
+ R[u_n]
+ = \lambda_n
+ }
+\end{aligned}$$
+
+This powerful result was not at all clear from $R$'s initial definition.
+
+
+
+## Justification
+
+But what if we do not know the eigenfunctions? Is $R$ still useful?
+Yes, as we shall see. Suppose we make an educated guess $u(x)$
+for the ground state (i.e. lowest-eigenvalue) solution $u_0(x)$:
+
+$$\begin{aligned}
+ u(x)
+ = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x)
+\end{aligned}$$
+
+Here, we are using the fact that the eigenfunctions of an SLP form a complete set,
+so our (known) guess $u$ can be expanded in the true (unknown) eigenfunctions $u_n$.
+We are assuming that $u$ is already quite close to its target $u_0$,
+such that the (unknown) expansion coefficients $c_n$ are small;
+specifically $|c_n|^2 \ll 1$.
+Let us start from what we know:
+
+$$\begin{aligned}
+ \boxed{
+ R[u]
+ = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
+ }
+\end{aligned}$$
+
+This quantity is known as the **Rayleigh quotient**.
+Inserting our ansatz $u$,
+and using that the true $u_n$ have corresponding eigenvalues $\lambda_n$:
+
+$$\begin{aligned}
+ R[u]
+ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
+ {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}}
+ \\
+ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
+ {\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}}
+\end{aligned}$$
+
+For convenience, we switch to [Dirac notation](/know/concept/dirac-notation/)
+before evaluating further.
+
+$$\begin{aligned}
+ R
+ &= \frac{\displaystyle \Big( \bra{u_0} + \sum_n c_n^* \bra{u_n} \Big) \cdot \Big( \lambda_0 \ket{w u_0} + \sum_n c_n \lambda_n \ket{w u_n} \Big)}
+ {\displaystyle \Big( \bra{u_0} + \sum_n c_n^* \bra{u_n} \Big) \cdot \Big( \ket{w u_0} + \sum_n c_n \ket{w u_n} \Big)}
+ \\
+ &= \frac{\displaystyle \lambda_0 \braket{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \braket{u_n}{w u_0}
+ + \sum_{n = 1}^\infty c_n \lambda_n \braket{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \braket{u_m}{w u_n}}
+ {\displaystyle \braket{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \braket{u_n}{w u_0}
+ + \sum_{n = 1}^\infty c_n \braket{u_0}{w u_n} + \sum_{m n} c_n c_m^* \braket{u_m}{w u_n}}
+\end{aligned}$$
+
+Using orthogonality $\braket{u_m}{w u_n} = S_n \delta_{mn}$,
+and the fact that $n \neq 0$ by definition, we find:
+
+$$\begin{aligned}
+ R
+ &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0}
+ + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}}
+ {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}}
+ \\
+ &= \frac{\displaystyle \lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n}
+ {\displaystyle S_0 + 0 + 0 + \sum_{n} c_n c_n^* S_n}
+ = \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
+ {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n}
+\end{aligned}$$
+
+It is always possible to choose our normalizations such that $S_n = S$ for all $u_n$, leaving:
+
+$$\begin{aligned}
+ R
+ &= \frac{\displaystyle \lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S}
+ {\displaystyle S + \sum_{n} |c_n|^2 S}
+ = \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
+ {\displaystyle 1 + \sum_{n} |c_n|^2}
+\end{aligned}$$
+
+And finally, after rearranging the numerator, we arrive at the following relation:
+
+$$\begin{aligned}
+ R
+ &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
+ {\displaystyle 1 + \sum_{n} |c_n|^2}
+ = \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
+ {\displaystyle 1 + \sum_{n} |c_n|^2}
+\end{aligned}$$
+
+Thus, if we improve our guess $u$,
+then $R[u]$ approaches the true eigenvalue $\lambda_0$.
+For numerically finding $u_0$ and $\lambda_0$, this gives us a clear goal: minimize $R$, because:
+
+$$\begin{aligned}
+ \boxed{
+ R[u]
+ = \lambda_0 + \frac{\displaystyle \sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)}
+ {\displaystyle 1 + \sum_{n = 1}^\infty |c_n|^2}
+ \ge \lambda_0
+ }
+\end{aligned}$$
+
+Note that the convergence to $\lambda_0$ goes as $|c_n|^2$,
+while $u$ converges to $u_0$ as $|c_n|$ by definition,
+so even a fairly bad guess $u$ will give a decent estimate for $\lambda_0$.
+
+
+
+## The method
+
+In the following, we stick to Dirac notation,
+since the results hold for both continuous functions $u(x)$ and discrete vectors $\vb{u}$,
+as long as the operator $\hat{H}$ is self-adjoint.
+Suppose we express our guess $\ket{u}$ as a linear combination
+of *known* basis vectors $\ket{f_n}$ with weights $a_n \in \mathbb{C}$:
+
+$$\begin{aligned}
+ \ket{u}
+ = \sum_{n = 0}^\infty c_n \ket{u_n}
+ = \sum_{n = 0}^\infty a_n \ket{f_n}
+ \approx \sum_{n = 0}^{N - 1} a_n \ket{f_n}
+\end{aligned}$$
+
+For numerical tractability, we truncate the sum at $N$ terms,
+and for generality, we allow $\ket{f_n}$ to be non-orthogonal,
+as described by an *overlap matrix* with elements $S_{mn}$:
+
+$$\begin{aligned}
+ \braket{f_m}{w f_n} = S_{m n}
+\end{aligned}$$
+
+From the discussion above,
+we know that the ground-state eigenvalue $\lambda_0$ is estimated by:
+
+$$\begin{aligned}
+ \lambda_0
+ \approx \lambda
+ = R[u]
+ = \frac{\braket*{u}{\hat{H} u}}{\braket{u}{w u}}
+ = \frac{\displaystyle \sum_{m n} a_m^* a_n \braket*{f_m}{\hat{H} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \braket{f_m}{w f_n}}
+ \equiv \frac{\displaystyle \sum_{m n} a_m^* a_n H_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
+\end{aligned}$$
+
+And we also know that our goal is to minimize $R[u]$,
+so we vary $a_k^*$ to find its extremum:
+
+$$\begin{aligned}
+ 0
+ = \pdv{R}{a_k^*}
+ &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
+ - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)}
+ {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2}
+ \\
+ &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\braket{u}{w u}}
+ = \frac{\displaystyle \sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\braket{u}{w u}}
+\end{aligned}$$
+
+Clearly, this is only satisfied if the following holds for all $k = 0, 1, ..., N\!-\!1$:
+
+$$\begin{aligned}
+ 0
+ = \sum_{n = 0}^{N - 1} a_n \big(H_{k n} - \lambda S_{k n}\big)
+\end{aligned}$$
+
+For illustrative purposes,
+we can write this as a matrix equation
+with $M_{k n} \equiv H_{k n} - \lambda S_{k n}$:
+
+$$\begin{aligned}
+ \begin{bmatrix}
+ M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\
+ M_{1,0} & \ddots & & \vdots \\
+ \vdots & & \ddots & \vdots \\
+ M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1}
+ \end{bmatrix}
+ \cdot
+ \begin{bmatrix}
+ a_0 \\ a_1 \\ \vdots \\ a_{N-1}
+ \end{bmatrix}
+ =
+ \begin{bmatrix}
+ 0 \\ 0 \\ \vdots \\ 0
+ \end{bmatrix}
+\end{aligned}$$
+
+Note that this looks like an eigenvalue problem for $\lambda$.
+Indeed, demanding that $\overline{M}$ cannot simply be inverted
+(i.e. the solution is non-trivial)
+yields a characteristic polynomial for $\lambda$:
+
+$$\begin{aligned}
+ 0
+ = \det\!\Big[ \overline{M} \Big]
+ = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big]
+\end{aligned}$$
+
+This gives a set of $\lambda$,
+which are the exact eigenvalues of $\overline{H}$,
+and the estimated eigenvalues of $\hat{H}$
+(recall that $\overline{H}$ is $\hat{H}$ expressed in a truncated basis).
+The eigenvector $\big[ a_0, a_1, ..., a_{N-1} \big]$ of the lowest $\lambda$
+gives the optimal weights to approximate $\ket{u_0}$ in the basis $\{\ket{f_n}\}$.
+Likewise, the higher $\lambda$'s eigenvectors approximate
+excited (i.e. non-ground) eigenstates of $\hat{H}$,
+although in practice the results are less accurate the higher we go.
+
+The overall accuracy is determined by how good our truncated basis is,
+i.e. how large a subspace it spans
+of the [Hilbert space](/know/concept/hilbert-space/) in which the true $\ket{u_0}$ resides.
+Clearly, adding more basis vectors will improve the results,
+at the cost of computation.
+For example, if $\hat{H}$ represents a helium atom,
+a good choice for $\{\ket{f_n}\}$ would be hydrogen orbitals,
+since those are qualitatively similar.
+
+You may find this result unsurprising;
+it makes some intuitive sense that approximating $\hat{H}$
+in a limited basis would yield a matrix $\overline{H}$ giving rough eigenvalues.
+The point of this discussion is to rigorously show
+the validity of this approach.
+
+Nowadays, there exist many other methods to calculate eigenvalues
+of complicated operators $\hat{H}$,
+but an attractive feature of the Ritz method is that it is single-step,
+whereas its competitors tend to be iterative.
+
+
+
+## References
+1. G.B. Arfken, H.J. Weber,
+ *Mathematical methods for physicists*, 6th edition, 2005,
+ Elsevier.
+2. O. Bang,
+ *Applied mathematics for physicists: lecture notes*, 2019,
+ unpublished.