Categories: Mathematics, Numerical methods, Perturbation, Physics, Quantum mechanics.

# Ritz method

In various branches of physics, the Ritz method is a technique to approximately find the lowest solutions to an eigenvalue problem. Some call it the Rayleigh-Ritz method, the Ritz-Galerkin method, or simply the variational method.

## Background

In the context of variational calculus, consider the following functional to be optimized:

\begin{aligned} R[u] = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x} \end{aligned}

Where $$u(x) \in \mathbb{C}$$ is the unknown function, and $$p(x), q(x) \in \mathbb{R}$$ are given. In addition, $$S$$ is the norm of $$u$$, which we demand be constant with respect to a weight function $$w(x) \in \mathbb{R}$$:

\begin{aligned} S = \int_a^b w(x) \big|u(x)\big|^2 \dd{x} \end{aligned}

To handle this normalization requirement, we introduce a Lagrange multiplier $$\lambda$$, and define the Lagrangian $$\Lambda$$ for the full constrained optimization problem as:

\begin{aligned} \Lambda \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg) \end{aligned}

The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

\begin{aligned} 0 &= \pdv{\Lambda}{u^*} - \dv{x} \Big( \pdv{\Lambda}{u_x^*} \Big) \\ &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{x} \big( p u_x \big) \bigg) \end{aligned}

Which is clearly satisfied if and only if the following equation is fulfilled:

\begin{aligned} \dv{x} \big( p u_x \big) + q u = - \lambda w u \end{aligned}

This has the familiar form of a Sturm-Liouville problem (SLP), with $$\lambda$$ representing an eigenvalue. SLPs have useful properties, but before we can take advantage of those, we need to handle an important detail: the boundary conditions (BCs) on $$u$$. The above equation is only a valid SLP for certain BCs, as seen in the derivation of Sturm-Liouville theory.

Let us return to the definition of $$R[u]$$, and integrate it by parts:

\begin{aligned} R[u] &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x} \\ &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{x} \Big(p u_x\Big) u^* + q u u^* \dd{x} \end{aligned}

The boundary term vanishes for a subset of the BCs that make a valid SLP, including Dirichlet BCs $$u(a) = u(b) = 0$$, Neumann BCs $$u_x(a) = u_x(b) = 0$$, and periodic BCs. Therefore, we assume that this term does indeed vanish, such that we can use Sturm-Liouville theory later:

\begin{aligned} R[u] &= - \frac{1}{S} \int_a^b \bigg( \dv{x} \Big(p u_x\Big) + q u \bigg) u^* \dd{x} \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x} \end{aligned}

Where $$\hat{H}$$ is the self-adjoint Sturm-Liouville operator. Because the constrained Euler-Lagrange equation is now an SLP, we know that it has an infinite number of real discrete eigenvalues $$\lambda_n$$ with a lower bound, corresponding to mutually orthogonal eigenfunctions $$u_n(x)$$.

To understand the significance of this result, suppose we have solved the SLP, and now insert one of the eigenfunctions $$u_n$$ into $$R$$:

\begin{aligned} R[u_n] &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x} = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x} \\ &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x} = \frac{S_n}{S_n} \lambda_n \end{aligned}

Where $$S_n$$ is the normalization of $$u_n$$. In other words, when given $$u_n$$, the functional $$R$$ yields the corresponding eigenvalue $$\lambda_n$$:

\begin{aligned} \boxed{ R[u_n] = \lambda_n } \end{aligned}

This powerful result was not at all clear from $$R$$’s initial definition.

## Justification

But what if we do not know the eigenfunctions? Is $$R$$ still useful? Yes, as we shall see. Suppose we make an educated guess $$u(x)$$ for the ground state (i.e. lowest-eigenvalue) solution $$u_0(x)$$:

\begin{aligned} u(x) = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x) \end{aligned}

Here, we are using the fact that the eigenfunctions of an SLP form a complete set, so our (known) guess $$u$$ can be expanded in the true (unknown) eigenfunctions $$u_n$$. We are assuming that $$u$$ is already quite close to its target $$u_0$$, such that the (unknown) expansion coefficients $$c_n$$ are small; specifically $$|c_n|^2 \ll 1$$. Let us start from what we know:

\begin{aligned} \boxed{ R[u] = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}} } \end{aligned}

This quantity is known as the Rayleigh quotient. Inserting our ansatz $$u$$, and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$:

\begin{aligned} R[u] &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}} {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}} \\ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}} {\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}} \end{aligned}

For convenience, we switch to Dirac notation before evaluating further.

\begin{aligned} R &= \frac\Big( \bra{u_0} + \sum_n c_n^* \bra{u_n} \Big) \cdot \Big( \lambda_0 \ket{w u_0} + \sum_n c_n \lambda_n \ket{w u_n} \Big)} \Big( \bra{u_0} + \sum_n c_n^* \bra{u_n} \Big) \cdot \Big( \ket{w u_0} + \sum_n c_n \ket{w u_n} \Big)} \\ &= \frac\lambda_0 \braket{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \braket{u_n}{w u_0} + \sum_{n = 1}^\infty c_n \lambda_n \braket{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \braket{u_m}{w u_n}} \braket{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \braket{u_n}{w u_0} + \sum_{n = 1}^\infty c_n \braket{u_0}{w u_n} + \sum_{m n} c_n c_m^* \braket{u_m}{w u_n}} \end{aligned

Using orthogonality $$\braket{u_m}{w u_n} = S_n \delta_{mn}$$, and the fact that $$n \neq 0$$ by definition, we find:

\begin{aligned} R &= \frac\lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}} S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}} \\ &= \frac\lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n} S_0 + 0 + 0 + \sum_{n} c_n c_n^* S_n} = \frac\lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n} S_0 + \sum_{n} |c_n|^2 S_n} \end{aligned

It is always possible to choose our normalizations such that $$S_n = S$$ for all $$u_n$$, leaving:

\begin{aligned} R &= \frac\lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S} S + \sum_{n} |c_n|^2 S} = \frac\lambda_0 + \sum_{n} |c_n|^2 \lambda_n} 1 + \sum_{n} |c_n|^2} \end{aligned

And finally, after rearranging the numerator, we arrive at the following relation:

\begin{aligned} R &= \frac\lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} 1 + \sum_{n} |c_n|^2} = \lambda_0 + \frac\sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} 1 + \sum_{n} |c_n|^2} \end{aligned

Thus, if we improve our guess $$u$$, then $$R[u]$$ approaches the true eigenvalue $$\lambda_0$$. For numerically finding $$u_0$$ and $$\lambda_0$$, this gives us a clear goal: minimize $$R$$, because:

\begin{aligned} \boxed{ R[u] = \lambda_0 + \frac\sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)} 1 + \sum_{n = 1}^\infty |c_n|^2} \ge \lambda_0 } \end{aligned

Note that the convergence to $$\lambda_0$$ goes as $$|c_n|^2$$, while $$u$$ converges to $$u_0$$ as $$|c_n|$$ by definition, so even a fairly bad guess $$u$$ will give a decent estimate for $$\lambda_0$$.

## The method

In the following, we stick to Dirac notation, since the results hold for both continuous functions $$u(x)$$ and discrete vectors $$\vb{u}$$, as long as the operator $$\hat{H}$$ is self-adjoint. Suppose we express our guess $$\ket{u}$$ as a linear combination of known basis vectors $$\ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$:

\begin{aligned} \ket{u} = \sum_{n = 0}^\infty c_n \ket{u_n} = \sum_{n = 0}^\infty a_n \ket{f_n} \approx \sum_{n = 0}^{N - 1} a_n \ket{f_n} \end{aligned}

For numerical tractability, we truncate the sum at $$N$$ terms, and for generality, we allow $$\ket{f_n}$$ to be non-orthogonal, as described by an overlap matrix with elements $$S_{mn}$$:

\begin{aligned} \braket{f_m}{w f_n} = S_{m n} \end{aligned}

From the discussion above, we know that the ground-state eigenvalue $$\lambda_0$$ is estimated by:

\begin{aligned} \lambda_0 \approx \lambda = R[u] = \frac{\braket*{u}{\hat{H} u}}{\braket{u}{w u}} = \frac\sum_{m n} a_m^* a_n \braket*{f_m}{\hat{H} f_n}}\sum_{m n} a_m^* a_n \braket{f_m}{w f_n}} \equiv \frac\sum_{m n} a_m^* a_n H_{m n}}\sum_{m n} a_m^* a_n S_{mn}} \end{aligned

And we also know that our goal is to minimize $$R[u]$$, so we vary $$a_k^*$$ to find its extremum:

\begin{aligned} 0 = \pdv{R}{a_k^*} &= \frac\Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big) - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)} {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2} \\ &= \frac\Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\braket{u}{w u}} = \frac\sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\braket{u}{w u}} \end{aligned

Clearly, this is only satisfied if the following holds for all $$k = 0, 1, ..., N\!-\!1$$:

\begin{aligned} 0 = \sum_{n = 0}^{N - 1} a_n \big(H_{k n} - \lambda S_{k n}\big) \end{aligned}

For illustrative purposes, we can write this as a matrix equation with $$M_{k n} \equiv H_{k n} - \lambda S_{k n}$$:

\begin{aligned} \begin{bmatrix} M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\ M_{1,0} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1} \end{bmatrix} \cdot \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \end{aligned}

Note that this looks like an eigenvalue problem for $$\lambda$$. Indeed, demanding that $$\overline{M}$$ cannot simply be inverted (i.e. the solution is non-trivial) yields a characteristic polynomial for $$\lambda$$:

\begin{aligned} 0 = \det\!\Big[ \overline{M} \Big] = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big] \end{aligned}

This gives a set of $$\lambda$$, which are the exact eigenvalues of $$\overline{H}$$, and the estimated eigenvalues of $$\hat{H}$$ (recall that $$\overline{H}$$ is $$\hat{H}$$ expressed in a truncated basis). The eigenvector $$\big[ a_0, a_1, ..., a_{N-1} \big]$$ of the lowest $$\lambda$$ gives the optimal weights to approximate $$\ket{u_0}$$ in the basis $$\{\ket{f_n}\}$$. Likewise, the higher $$\lambda$$’s eigenvectors approximate excited (i.e. non-ground) eigenstates of $$\hat{H}$$, although in practice the results are less accurate the higher we go.

The overall accuracy is determined by how good our truncated basis is, i.e. how large a subspace it spans of the Hilbert space in which the true $$\ket{u_0}$$ resides. Clearly, adding more basis vectors will improve the results, at the cost of computation. For example, if $$\hat{H}$$ represents a helium atom, a good choice for $$\{\ket{f_n}\}$$ would be hydrogen orbitals, since those are qualitatively similar.

You may find this result unsurprising; it makes some intuitive sense that approximating $$\hat{H}$$ in a limited basis would yield a matrix $$\overline{H}$$ giving rough eigenvalues. The point of this discussion is to rigorously show the validity of this approach.

Nowadays, there exist many other methods to calculate eigenvalues of complicated operators $$\hat{H}$$, but an attractive feature of the Ritz method is that it is single-step, whereas its competitors tend to be iterative.

1. G.B. Arfken, H.J. Weber, Mathematical methods for physicists, 6th edition, 2005, Elsevier.
2. O. Bang, Applied mathematics for physicists: lecture notes, 2019, unpublished.

© Marcus R.A. Newman, CC BY-SA 4.0.
Privacy-friendly statistics by GoatCounter.