From 6ce0bb9a8f9fd7d169cbb414a9537d68c5290aae Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Fri, 14 Oct 2022 23:25:28 +0200
Subject: Initial commit after migration from Hugo

source/know/concept/ritz-method/index.md

---
title: "Ritz method"
date: 2022-09-18
categories:
- Physics
- Mathematics
- Perturbation
- Quantum mechanics
- Numerical methods
layout: "concept"
---

In various branches of physics,
the **Ritz method** is a technique to approximately find
the lowest-eigenvalue solutions to an eigenvalue problem.
Some call it the **Rayleigh-Ritz method**, the **Ritz-Galerkin method**,
or simply the **variational method**.



## Background

In the context of [variational calculus](/know/concept/calculus-of-variations/),
consider the following functional to be optimized:

$$\begin{aligned}
    R[u]
    = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$

Where $u(x) \in \mathbb{C}$ is the unknown function,
and $p(x), q(x) \in \mathbb{R}$ are given.
In addition, $S$ is the norm of $u$ with respect to a weight function $w(x) \in \mathbb{R}$,
which we demand to be constant:

$$\begin{aligned}
    S
    = \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$

To handle this normalization requirement,
we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $\lambda$,
and define the Lagrangian $\Lambda$ for the full constrained optimization problem as:

$$\begin{aligned}
    \Lambda
    \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg)
\end{aligned}$$

The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

$$\begin{aligned}
    0
    &= \pdv{\Lambda}{u^*} - \dv{}{x}\Big( \pdv{\Lambda}{u_x^*} \Big)
    \\
    &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg)
\end{aligned}$$

Which is clearly satisfied if and only if the following equation is fulfilled:

$$\begin{aligned}
    \dv{}{x}\big( p u_x \big) + q u
    = - \lambda w u
\end{aligned}$$

This has the familiar form of a [Sturm-Liouville problem](/know/concept/sturm-liouville-theory/) (SLP),
with $\lambda$ representing an eigenvalue.
SLPs have useful properties, but before we can take advantage of those,
we need to handle an important detail: the boundary conditions (BCs) on $u$.
The above equation is only a valid SLP for certain BCs,
as seen in the derivation of Sturm-Liouville theory.

Let us return to the definition of $R[u]$,
and integrate it by parts:

$$\begin{aligned}
    R[u]
    &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x}
    \\
    &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x}
\end{aligned}$$

The boundary term vanishes for a subset of the BCs that make a valid SLP,
including Dirichlet BCs $u(a) = u(b) = 0$, Neumann BCs $u_x(a) = u_x(b) = 0$, and periodic BCs.
Therefore, we assume that this term does indeed vanish,
such that we can use Sturm-Liouville theory later:

$$\begin{aligned}
    R[u]
    &= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x}
    \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x}
\end{aligned}$$

Where $\hat{H}$ is the self-adjoint Sturm-Liouville operator.
Because the constrained Euler-Lagrange equation is now an SLP,
we know that it has an infinite number of real discrete eigenvalues $\lambda_n$ with a lower bound,
corresponding to mutually orthogonal eigenfunctions $u_n(x)$.

To understand the significance of this result,
suppose we have solved the SLP,
and now insert one of the eigenfunctions $u_n$ into $R$:

$$\begin{aligned}
    R[u_n]
    &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x}
    = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x}
    \\
    &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x}
    = \frac{S_n}{S_n} \lambda_n
\end{aligned}$$

Where $S_n$ is the normalization of $u_n$.
In other words, when given $u_n$,
the functional $R$ yields the corresponding eigenvalue $\lambda_n$:

$$\begin{aligned}
    \boxed{
        R[u_n]
        = \lambda_n
    }
\end{aligned}$$

This powerful result was not at all clear from $R$'s initial definition.



## Justification

But what if we do not know the eigenfunctions? Is $R$ still useful?
Yes, as we shall see. Suppose we make an educated guess $u(x)$
for the ground state (i.e. lowest-eigenvalue) solution $u_0(x)$:

$$\begin{aligned}
    u(x)
    = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x)
\end{aligned}$$

Here, we are using the fact that the eigenfunctions of an SLP form a complete set,
so our (known) guess $u$ can be expanded in the true (unknown) eigenfunctions $u_n$.
We are assuming that $u$ is already quite close to its target $u_0$,
such that the (unknown) expansion coefficients $c_n$ are small;
specifically $|c_n|^2 \ll 1$.
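
As a quick sanity check of the boxed identity $R[u_n] = \lambda_n$
(an added example, not part of the original derivation),
take the simplest case $p = w = 1$ and $q = 0$ on the interval $[0, \pi]$
with Dirichlet BCs $u(0) = u(\pi) = 0$.
The SLP then reduces to $u_{xx} = -\lambda u$,
whose eigenfunctions are $\sin(k x)$ with eigenvalues $k^2$ for integers $k \ge 1$,
and indeed:

$$\begin{aligned}
    R[\sin(k x)]
    = \frac{\displaystyle\int_0^\pi k^2 \cos^2(k x) \dd{x}}{\displaystyle\int_0^\pi \sin^2(k x) \dd{x}}
    = \frac{k^2 \pi / 2}{\pi / 2}
    = k^2
\end{aligned}$$
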
Let us start from what we know:

$$\begin{aligned}
    \boxed{
        R[u]
        = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
    }
\end{aligned}$$

This quantity is known as the **Rayleigh quotient**.
Inserting our ansatz $u$,
and using that the true $u_n$ have corresponding eigenvalues $\lambda_n$:

$$\begin{aligned}
    R[u]
    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
    {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}}
    \\
    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
    {\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}}
\end{aligned}$$

For convenience, we switch to [Dirac notation](/know/concept/dirac-notation/)
before evaluating further.

$$\begin{aligned}
    R
    &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)}
    {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)}
    \\
    &= \frac{\displaystyle \lambda_0 \Inprod{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
    + \sum_{n = 1}^\infty c_n \lambda_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \Inprod{u_m}{w u_n}}
    {\displaystyle \Inprod{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
    + \sum_{n = 1}^\infty c_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \Inprod{u_m}{w u_n}}
\end{aligned}$$

Using orthogonality $\Inprod{u_m}{w u_n} = S_n \delta_{mn}$,
and the fact that $n \neq 0$ by definition, we find:

$$\begin{aligned}
    R
    &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0}
    + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}}
    {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}}
    \\
    &= \frac{\displaystyle \lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n}
    {\displaystyle S_0 + 0 + 0 + \sum_{n} c_n c_n^* S_n}
    = \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
    {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n}
\end{aligned}$$

It is always possible to choose our normalizations such that $S_n = S$ for all $u_n$, leaving:

$$\begin{aligned}
    R
    &= \frac{\displaystyle \lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S}
    {\displaystyle S + \sum_{n} |c_n|^2 S}
    = \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$

And finally, after rearranging the numerator, we arrive at the following relation:

$$\begin{aligned}
    R
    &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
    = \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$

Thus, as we improve our guess $u$,
$R[u]$ approaches the true eigenvalue $\lambda_0$.
For numerically finding $u_0$ and $\lambda_0$, this gives us a clear goal: minimize $R$, because:

$$\begin{aligned}
    \boxed{
        R[u]
        = \lambda_0 + \frac{\displaystyle \sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)}
        {\displaystyle 1 + \sum_{n = 1}^\infty |c_n|^2}
        \ge \lambda_0
    }
\end{aligned}$$

In the context of quantum mechanics, this is not surprising,
since any superposition of multiple states
is guaranteed to have an energy no lower than that of the ground state.

Note that the convergence to $\lambda_0$ goes as $|c_n|^2$,
while $u$ converges to $u_0$ as $|c_n|$ by definition,
so even a fairly bad guess $u$ will give a decent estimate for $\lambda_0$.
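
To make this bound concrete, here is a small numerical sketch
(an added example; the helper `rayleigh_quotient` is ours, not from the text).
We assume the textbook case $p = w = 1$, $q = 0$ on $[0, 1]$ with Dirichlet BCs,
so the exact ground state is $\sin(\pi x)$ with $\lambda_0 = \pi^2$.
The trial guess $u(x) = x (1 - x)$ is only a rough approximation of that sine:

```python
import math

# Rayleigh quotient R[u] = int |u'|^2 dx / int |u|^2 dx for the case
# p = w = 1, q = 0 on [0, 1], where the exact ground state is sin(pi x)
# with eigenvalue lambda_0 = pi^2. Integrals use a simple midpoint rule.

def rayleigh_quotient(u, du, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule estimate of R[u], given u and its derivative du."""
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    num = h * sum(du(x) ** 2 for x in xs)  # int |u_x|^2 dx
    den = h * sum(u(x) ** 2 for x in xs)   # int |u|^2 dx  (the norm S)
    return num / den

# Trial function u(x) = x(1 - x), which satisfies the Dirichlet BCs.
R = rayleigh_quotient(lambda x: x * (1 - x), lambda x: 1 - 2 * x)
lam0 = math.pi ** 2

print(f"R[u]     = {R:.6f}")
print(f"lambda_0 = {lam0:.6f}")
assert R >= lam0  # the variational bound R[u] >= lambda_0
```

The exact quotient here is $(1/3) / (1/30) = 10$, versus $\pi^2 \approx 9.8696$:
the guess is visibly wrong as a function,
yet $R[u]$ overshoots $\lambda_0$ by only about $1.3\%$,
illustrating the quadratic convergence noted above.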

## The method

In the following, we stick to Dirac notation,
since the results hold for both continuous functions $u(x)$ and discrete vectors $\vb{u}$,
as long as the operator $\hat{H}$ is self-adjoint.
Suppose we express our guess $\Ket{u}$ as a linear combination
of *known* basis vectors $\Ket{f_n}$ with weights $a_n \in \mathbb{C}$:

$$\begin{aligned}
    \Ket{u}
    = \sum_{n = 0}^\infty c_n \Ket{u_n}
    = \sum_{n = 0}^\infty a_n \Ket{f_n}
    \approx \sum_{n = 0}^{N - 1} a_n \Ket{f_n}
\end{aligned}$$

For numerical tractability, we truncate the sum at $N$ terms,
and for generality, we allow $\Ket{f_n}$ to be non-orthogonal,
as described by an *overlap matrix* with elements $S_{mn}$:

$$\begin{aligned}
    \Inprod{f_m}{w f_n} = S_{m n}
\end{aligned}$$

From the discussion above,
we know that the ground-state eigenvalue $\lambda_0$ is estimated by
(where we define $H_{mn} \equiv -\inprod{f_m}{\hat{H} f_n}$,
absorbing the minus sign from $R$'s definition):

$$\begin{aligned}
    \lambda_0
    \approx \lambda
    = R[u]
    = - \frac{\inprod{u}{\hat{H} u}}{\Inprod{u}{w u}}
    = - \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{H} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \Inprod{f_m}{w f_n}}
    \equiv \frac{\displaystyle \sum_{m n} a_m^* a_n H_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
\end{aligned}$$

And we also know that our goal is to minimize $R[u]$,
so we vary $a_k^*$ to find its extremum:

$$\begin{aligned}
    0
    = \pdv{R}{a_k^*}
    &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
    - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)}
    {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2}
    \\
    &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\Inprod{u}{w u}}
    = \frac{\displaystyle \sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\Inprod{u}{w u}}
\end{aligned}$$

Clearly, this is only satisfied if the following holds for all $k = 0, 1, ..., N\!-\!1$:

$$\begin{aligned}
    0
    = \sum_{n = 0}^{N - 1} a_n
    \big(H_{k n} - \lambda S_{k n}\big)
\end{aligned}$$

For illustrative purposes,
we can write this as a matrix equation
with $M_{k n} \equiv H_{k n} - \lambda S_{k n}$:

$$\begin{aligned}
    \begin{bmatrix}
        M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\
        M_{1,0} & \ddots & & \vdots \\
        \vdots & & \ddots & \vdots \\
        M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1}
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
        a_0 \\ a_1 \\ \vdots \\ a_{N-1}
    \end{bmatrix}
    =
    \begin{bmatrix}
        0 \\ 0 \\ \vdots \\ 0
    \end{bmatrix}
\end{aligned}$$

Note that this looks like an eigenvalue problem for $\lambda$.
Indeed, demanding that $\overline{M}$ is singular,
i.e. that it cannot simply be inverted,
so that a non-trivial solution exists,
yields a characteristic polynomial for $\lambda$:

$$\begin{aligned}
    0
    = \det\!\Big[ \overline{M} \Big]
    = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big]
\end{aligned}$$

This gives a set of $\lambda$,
which are the exact generalized eigenvalues of $\overline{H}$ with respect to $\overline{S}$,
and the estimated eigenvalues of $\hat{H}$
(recall that $\overline{H}$ is $\hat{H}$ expressed in a truncated basis).
The eigenvector $\big[ a_0, a_1, ..., a_{N-1} \big]$ of the lowest $\lambda$
gives the optimal weights to approximate $\Ket{u_0}$ in the basis $\{\Ket{f_n}\}$.
Likewise, the eigenvectors of the higher $\lambda$ approximate
excited (i.e. non-ground) eigenstates of $\hat{H}$,
although in practice the results become less accurate the higher we go.

The overall accuracy is determined by how good our truncated basis is,
i.e. how large a subspace it spans
of the [Hilbert space](/know/concept/hilbert-space/) in which the true $\Ket{u_0}$ resides.
Clearly, adding more basis vectors will improve the results,
at the cost of more computation.
For example, if $\hat{H}$ represents a helium atom,
a good choice for $\{\Ket{f_n}\}$ would be hydrogen orbitals,
since those are qualitatively similar.
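
As an illustrative sketch of the whole procedure
(an added example; the function name and basis choice are ours):
take again $p = w = 1$, $q = 0$ on $[0, 1]$ with Dirichlet BCs,
and the non-orthogonal polynomial basis $f_n(x) = x^{n+1} (1 - x)$.
With the sign convention above, $H_{mn} = \Inprod{f_m'}{f_n'}$ and $S_{mn} = \Inprod{f_m}{f_n}$
both have closed forms via $\int_0^1 x^k \dd{x} = 1/(k\!+\!1)$,
and the lowest generalized eigenvalue should land just above $\lambda_0 = \pi^2$:

```python
import numpy as np

# Ritz method for -u'' = lambda u on [0, 1] with u(0) = u(1) = 0,
# whose exact eigenvalues are (k pi)^2 for k = 1, 2, ...
# Non-orthogonal basis: f_n(x) = x^(n+1) * (1 - x), n = 0..N-1.
# Using int_0^1 x^k dx = 1/(k+1), the matrix elements are analytic:
#   S_mn = <f_m|f_n>   = 1/(m+n+3) - 2/(m+n+4) + 1/(m+n+5)
#   H_mn = <f_m'|f_n'> = (m+1)(n+1)/(m+n+1)
#                        - ((m+1)(n+2) + (m+2)(n+1))/(m+n+2)
#                        + (m+2)(n+2)/(m+n+3)

def ritz_eigenvalues(N):
    """Sorted generalized eigenvalues of (H, S) in the truncated basis."""
    m = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    S = 1/(m+n+3) - 2/(m+n+4) + 1/(m+n+5)
    H = ((m+1)*(n+1)/(m+n+1)
         - ((m+1)*(n+2) + (m+2)*(n+1))/(m+n+2)
         + (m+2)*(n+2)/(m+n+3))
    # Solve H a = lambda S a as an ordinary eigenproblem for S^-1 H.
    lams = np.linalg.eigvals(np.linalg.solve(S, H))
    return np.sort(lams.real)

lams = ritz_eigenvalues(5)
print(lams[0], np.pi**2)  # lowest Ritz eigenvalue vs the exact pi^2
```

With a single basis function ($N = 1$) this reproduces the quotient $10$ from before;
already at $N = 5$ the ground-state estimate agrees with $\pi^2$ to many digits,
and, as the boxed inequality requires, always from above.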

You may find this result unsurprising;
it makes some intuitive sense that approximating $\hat{H}$
in a limited basis would yield a matrix $\overline{H}$ giving rough eigenvalues.
The point of this discussion is to rigorously show
the validity of this approach.

If we only care about the ground state,
then we already know $\lambda$ from minimizing $R[u]$,
so all we need to do is solve the above matrix equation for $a_n$.
Keep in mind that $\overline{M}$ is singular,
so the $a_n$ are only defined up to a constant factor.

Nowadays, there exist many other methods to calculate eigenvalues
of complicated operators $\hat{H}$,
but an attractive feature of the Ritz method is that it is single-step,
whereas its competitors tend to be iterative.
That said, the Ritz method cannot recover from a poorly chosen basis.



## References
1.  G.B. Arfken, H.J. Weber,
    *Mathematical methods for physicists*, 6th edition, 2005,
    Elsevier.
2.  O. Bang,
    *Applied mathematics for physicists: lecture notes*, 2019,
    unpublished.