Categories: Mathematics, Numerical methods, Perturbation, Physics, Quantum mechanics.

Ritz method

In various branches of physics, the Ritz method is a technique to approximately find the lowest solutions to an eigenvalue problem. Some call it the Rayleigh-Ritz method, the Ritz-Galerkin method, or simply the variational method.

Background

In the context of variational calculus, consider the following functional to be optimized:

\begin{aligned} R[u] \equiv \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x} \end{aligned}

Where $u(x) \in \mathbb{C}$ is the unknown function, and $p(x), q(x) \in \mathbb{R}$ are given. In addition, $S$ is the norm of $u$ , which we take to be constant with respect to a weight function $w(x) \in \mathbb{R}$ :

\begin{aligned} S \equiv \int_a^b w(x) \big|u(x)\big|^2 \dd{x} \end{aligned}

This normalization requirement acts as a constraint to the optimization problem for $R[u]$ , so we introduce a Lagrange multiplier $\lambda$ , and define the Lagrangian $\mathcal{L}$ for the full problem as:

\begin{aligned} \mathcal{L} \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg) \end{aligned}

The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

\begin{aligned} 0 &= \pdv{\mathcal{L}}{u^*} - \dv{}{x}\Big( \pdv{\mathcal{L}}{u_x^*} \Big) \\ &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg) \end{aligned}

Which is clearly satisfied if and only if the following equation is fulfilled:

\begin{aligned} \dv{}{x}\big( p u_x \big) + q u = - \lambda w u \end{aligned}

This has the familiar form of a Sturm-Liouville problem (SLP), with $\lambda$ representing an eigenvalue. SLPs have useful properties, but before we can take advantage of those, we need to handle an important detail: the boundary conditions (BCs) on $u$ . The above equation is only a valid SLP for certain BCs, as seen in the derivation of Sturm-Liouville theory. Let us return to the definition of $R$ , and integrate it by parts:

\begin{aligned} R[u] &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x} \\ &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{N} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x} \end{aligned}

The boundary term vanishes for a subset of the BCs that make a valid SLP, including Dirichlet BCs $u(a) = u(b) = 0$ , Neumann BCs $u_x(a) = u_x(b) = 0$ , and periodic BCs. Therefore, we assume that this term does indeed vanish, such that we can use Sturm-Liouville theory later:

\begin{aligned} R[u] &= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x} \\ &\equiv - \frac{1}{S} \int_a^b u^* \hat{L} u \dd{x} \end{aligned}

Where $\hat{L}$ is the self-adjoint Sturm-Liouville operator. Because the constrained Euler-Lagrange equation is now an SLP, we know that it has an infinite number of real discrete eigenvalues $\lambda_n$ with a lower bound, corresponding to mutually orthogonal eigenfunctions $u_n(x)$ .

To understand the significance of this result, suppose we have solved the SLP, and now insert one of the eigenfunctions $u_n$ into $R$ :

\begin{aligned} R[u_n] &= - \frac{1}{S_n} \int_a^b u_n^* \hat{L} u_n \dd{x} \\ &= \frac{1}{S_n} \int_a^b \lambda_n w |u_n|^2 \dd{x} \\ &= \frac{S_n}{S_n} \lambda_n \end{aligned}

Where $S_n$ is the normalization of $u_n$ . In other words, when given $u_n$ as input, the functional $R$ returns the corresponding eigenvalue $\lambda_n$ :

\begin{aligned} \boxed{ R[u_n] = \lambda_n } \end{aligned}

This powerful result was not at all clear from $R$ ’s initial definition. Note that some authors use the opposite sign for $\lambda$ in their SLP definition, in which case this result can still be obtained simply by also defining $R$ with the opposite sign. This sign choice is consistent with quantum mechanics, with the Hamiltonian $\hat{H} = - \hat{L}$ .

Justification

But what if we do not know the eigenfunctions? Is $R$ still useful? Yes, as we shall see. Suppose we make an educated guess $u(x)$ for the ground state (i.e. lowest-eigenvalue) solution $u_0(x)$ :

\begin{aligned} u(x) = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x) \end{aligned}

Here, we are using the fact that the eigenfunctions of an SLP form a complete set, so our (known) guess $u$ can be expanded in the true (unknown) eigenfunctions $u_n$ . Next, by definition:

\begin{aligned} \boxed{ R[u] = - \frac{\displaystyle\int u^* \hat{L} u \dd{x}}{\displaystyle\int u^* w u \dd{x}} } \end{aligned}

This quantity is known as the Rayleigh quotient, and again beware of the sign in its definition; see the remark above. Inserting our ansatz $u$ , and using that the true $u_n$ have corresponding eigenvalues $\lambda_n$ , we have:

\begin{aligned} R[u] &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{L} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}} {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}} \\ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}} {\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}} \end{aligned}

For convenience, we switch to Dirac notation before evaluating further:

\begin{aligned} R[u] &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)} {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)} \\ &= \frac{\displaystyle \lambda_0 \inprod{u_0}{w u_0} + \lambda_0 \sum_{n} c_n^* \inprod{u_n}{w u_0} + \sum_{n} c_n \lambda_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \inprod{u_m}{w u_n}} {\displaystyle \inprod{u_0}{w u_0} + \sum_{n} c_n^* \inprod{u_n}{w u_0} + \sum_{n} c_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \inprod{u_m}{w u_n}} \end{aligned}

Using orthogonality $\inprod{u_m}{w u_n} = S_n \delta_{mn}$ , and the fact that $n \neq 0$ by definition, we find:

\begin{aligned} R[u] &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}} {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}} \\ &= \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n} {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n} \end{aligned}

It is always possible to choose our normalizations such that $S_n = S$ for all $u_n$ , leaving:

\begin{aligned} R[u] &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}

And finally, after rearranging the numerator, we arrive at the following relation:

\begin{aligned} R[u] &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} \\ &= \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}

Thus, if we improve our guess $u$ (i.e. reduce $|c_n|$ ), then $R[u]$ approaches the true eigenvalue $\lambda_0$ . For numerically finding $u_0$ and $\lambda_0$ , this gives us a clear goal: minimize $R$ , because:

\begin{aligned} \boxed{ R[u] = \lambda_0 + \frac{\displaystyle \sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n = 1}^\infty |c_n|^2} \ge \lambda_0 } \end{aligned}

In the context of quantum mechanics, this is not surprising, since any superposition of multiple states is guaranteed to have a higher energy than the ground state.

As our guess $u$ is improved, $\lambda_0$ converges as $|c_n|^2$ , while $u$ converges to $u_0$ as $|c_n|$ by definition, so even a fairly bad ansatz $u$ gives a decent estimate for $\lambda_0$ .

The method

In the following, we stick to Dirac notation, since the results hold for both continuous functions $u(x)$ and discrete vectors $\vb{u}$ , as long as the operator $\hat{L}$ is self-adjoint. Suppose we express our guess $\Ket{u}$ as a linear combination of known basis vectors $\Ket{f_n}$ with weights $a_n \in \mathbb{C}$ , where $\Ket{f_n}$ are not necessarily eigenvectors of $\hat{L}$ :

\begin{aligned} \Ket{u} = \sum_{n = 0}^\infty c_n \Ket{u_n} = \sum_{n = 0}^\infty a_n \Ket{f_n} \approx \sum_{n = 0}^{N - 1} a_n \Ket{f_n} \end{aligned}

For numerical tractability, we truncate the sum at $N$ terms, and for generality we allow $\Ket{f_n}$ to be non-orthogonal, as described by an overlap matrix with elements $S_{mn}$ :

\begin{aligned} \inprod{f_m}{w f_n} = S_{m n} \end{aligned}

From the discussion above, we know that the ground-state eigenvalue $\lambda_0$ is estimated by:

\begin{aligned} \lambda_0 \approx R[u] = - \frac{\inprod{u}{\hat{L} u}}{\inprod{u}{w u}} = - \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{L} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{w f_n}} \equiv - \frac{\displaystyle \sum_{m n} a_m^* a_n L_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}} \end{aligned}

And we also know that our goal is to minimize $R[u]$ , so we vary $a_k^*$ to find its extremum:

\begin{aligned} 0 = - \pdv{R}{a_k^*} &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big) - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* L_{mn} \Big)} {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2} \\ &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)} {\displaystyle \sum_{m n} a_n a_m^* S_{mn}} \\ &= \sum_{n} a_n \frac{\big(L_{k n} - \lambda S_{k n}\big)}{\inprod{u}{w u}} \end{aligned}

Clearly, this is only satisfied if the following holds for all $k = 0, 1, ..., N\!-\!1$ :

\begin{aligned} 0 = \sum_{n = 0}^{N - 1} a_n \big(L_{k n} - \lambda S_{k n}\big) \end{aligned}

For illustrative purposes, we can write this as a matrix equation with $M_{k n} \equiv L_{k n} - \lambda S_{k n}$ :

\begin{aligned} \begin{bmatrix} M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\ M_{1,0} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1} \end{bmatrix} \cdot \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \end{aligned}

This looks like an eigenvalue problem for $\lambda$ , so we demand that its determinant vanishes:

\begin{aligned} 0 = \det\!\Big[ \bar{M} \Big] = \det\!\Big[ \bar{L} - \lambda \bar{S} \Big] \end{aligned}

This gives a set of $\lambda$ , which are exact eigenvalues of $\bar{L}$ , and estimated eigenvalues of $\hat{L}$ (recall that $\bar{L}$ is $\hat{L}$ expressed in a truncated basis). The eigenvector $\big[ a_0, a_1, ..., a_{N-1} \big]$ of the lowest $\lambda$ gives the optimal weights $a_n$ to approximate $\Ket{u_0}$ in the basis $\{\Ket{f_n}\}$ . Likewise, the higher $\lambda$ s’ eigenvectors approximate excited (i.e. non-ground) eigenstates of $\hat{L}$ , although in practice the results become less accurate the higher we go. If we only care about the ground state, then we already know $\lambda$ from $R[u]$ , so we just need to solve the matrix equation for $a_n$ .

You may find this result unsurprising: it makes some intuitive sense that approximating $\hat{L}$ in a limited basis would yield a matrix $\bar{L}$ giving rough eigenvalues. The point of this discussion is to rigorously show the validity of this approach.

Nowadays, there exist many other methods to calculate eigenvalues of complicated operators $\hat{L}$ , but an attractive feature of the Ritz method is that it is single-step, whereas its competitors tend to be iterative. That said, this method cannot recover from a poorly chosen basis $\{\Ket{f_n}\}$ .

Indeed, the overall accuracy is determined by how good our truncated basis is, i.e. how large a subspace it spans of the Hilbert space in which the true $\Ket{u_0}$ resides. Clearly, adding more basis vectors improves the results, but at a computational cost; it is usually more efficient to carefully choose which $\ket{f_n}$ to use, rather than just how many.

References

G.B. Arfken, H.J. Weber, Mathematical methods for physicists, 6th edition, 2005, Elsevier.
O. Bang, Applied mathematics for physicists: lecture notes, 2019, unpublished.