Categories: Mathematics, Numerical methods, Perturbation, Physics, Quantum mechanics.

Ritz method

In various branches of physics, the Ritz method is a technique to approximately find the lowest solutions to an eigenvalue problem. Some call it the Rayleigh-Ritz method, the Ritz-Galerkin method, or simply the variational method.

Background

In the context of variational calculus, consider the following functional to be optimized:

\[\begin{aligned} R[u] = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x} \end{aligned}\]

Where \(u(x) \in \mathbb{C}\) is the unknown function, and \(p(x), q(x) \in \mathbb{R}\) are given. In addition, \(S\) is the norm of \(u\), which we demand be constant with respect to a weight function \(w(x) \in \mathbb{R}\):

\[\begin{aligned} S = \int_a^b w(x) \big|u(x)\big|^2 \dd{x} \end{aligned}\]

To handle this normalization requirement, we introduce a Lagrange multiplier \(\lambda\), and define the Lagrangian \(\Lambda\) for the full constrained optimization problem as:

\[\begin{aligned} \Lambda \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg) \end{aligned}\]

The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

\[\begin{aligned} 0 &= \pdv{\Lambda}{u^*} - \dv{x} \Big( \pdv{\Lambda}{u_x^*} \Big) \\ &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{x} \big( p u_x \big) \bigg) \end{aligned}\]

Which is clearly satisfied if and only if the following equation is fulfilled:

\[\begin{aligned} \dv{x} \big( p u_x \big) + q u = - \lambda w u \end{aligned}\]

This has the familiar form of a Sturm-Liouville problem (SLP), with \(\lambda\) representing an eigenvalue. SLPs have useful properties, but before we can take advantage of those, we need to handle an important detail: the boundary conditions (BCs) on \(u\). The above equation is only a valid SLP for certain BCs, as seen in the derivation of Sturm-Liouville theory.

Let us return to the definition of \(R[u]\), and integrate it by parts:

\[\begin{aligned} R[u] &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x} \\ &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{x} \Big(p u_x\Big) u^* + q u u^* \dd{x} \end{aligned}\]

The boundary term vanishes for a subset of the BCs that make a valid SLP, including Dirichlet BCs \(u(a) = u(b) = 0\), Neumann BCs \(u_x(a) = u_x(b) = 0\), and periodic BCs. Therefore, we assume that this term does indeed vanish, such that we can use Sturm-Liouville theory later:

\[\begin{aligned} R[u] &= - \frac{1}{S} \int_a^b \bigg( \dv{x} \Big(p u_x\Big) + q u \bigg) u^* \dd{x} \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x} \end{aligned}\]

Where \(\hat{H}\) is the self-adjoint Sturm-Liouville operator. Because the constrained Euler-Lagrange equation is now an SLP, we know that it has an infinite number of real discrete eigenvalues \(\lambda_n\) with a lower bound, corresponding to mutually orthogonal eigenfunctions \(u_n(x)\).

To understand the significance of this result, suppose we have solved the SLP, and now insert one of the eigenfunctions \(u_n\) into \(R\):

\[\begin{aligned} R[u_n] &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x} = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x} \\ &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x} = \frac{S_n}{S_n} \lambda_n \end{aligned}\]

Where \(S_n\) is the normalization of \(u_n\). In other words, when given \(u_n\), the functional \(R\) yields the corresponding eigenvalue \(\lambda_n\):

\[\begin{aligned} \boxed{ R[u_n] = \lambda_n } \end{aligned}\]

This powerful result was not at all clear from \(R\)’s initial definition.

Justification

But what if we do not know the eigenfunctions? Is \(R\) still useful? Yes, as we shall see. Suppose we make an educated guess \(u(x)\) for the ground state (i.e. lowest-eigenvalue) solution \(u_0(x)\):

\[\begin{aligned} u(x) = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x) \end{aligned}\]

Here, we are using the fact that the eigenfunctions of an SLP form a complete set, so our (known) guess \(u\) can be expanded in the true (unknown) eigenfunctions \(u_n\). We are assuming that \(u\) is already quite close to its target \(u_0\), such that the (unknown) expansion coefficients \(c_n\) are small; specifically \(|c_n|^2 \ll 1\). Let us start from what we know:

\[\begin{aligned} \boxed{ R[u] = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}} } \end{aligned}\]

This quantity is known as the Rayleigh quotient. Inserting our ansatz \(u\), and using that the true \(u_n\) have corresponding eigenvalues \(\lambda_n\):

\[\begin{aligned} R[u] &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}} {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}} \\ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}} {\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}} \end{aligned}\]

For convenience, we switch to Dirac notation before evaluating further.

\[\begin{aligned} R &= \frac{\displaystyle \Big( \bra{u_0} + \sum_n c_n^* \bra{u_n} \Big) \cdot \Big( \lambda_0 \ket{w u_0} + \sum_n c_n \lambda_n \ket{w u_n} \Big)} {\displaystyle \Big( \bra{u_0} + \sum_n c_n^* \bra{u_n} \Big) \cdot \Big( \ket{w u_0} + \sum_n c_n \ket{w u_n} \Big)} \\ &= \frac{\displaystyle \lambda_0 \braket{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \braket{u_n}{w u_0} + \sum_{n = 1}^\infty c_n \lambda_n \braket{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \braket{u_m}{w u_n}} {\displaystyle \braket{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \braket{u_n}{w u_0} + \sum_{n = 1}^\infty c_n \braket{u_0}{w u_n} + \sum_{m n} c_n c_m^* \braket{u_m}{w u_n}} \end{aligned}\]

Using orthogonality \(\braket{u_m}{w u_n} = S_n \delta_{mn}\), and the fact that \(n \neq 0\) by definition, we find:

\[\begin{aligned} R &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}} {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}} \\ &= \frac{\displaystyle \lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n} {\displaystyle S_0 + 0 + 0 + \sum_{n} c_n c_n^* S_n} = \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n} {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n} \end{aligned}\]

It is always possible to choose our normalizations such that \(S_n = S\) for all \(u_n\), leaving:

\[\begin{aligned} R &= \frac{\displaystyle \lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S} {\displaystyle S + \sum_{n} |c_n|^2 S} = \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}\]

And finally, after rearranging the numerator, we arrive at the following relation:

\[\begin{aligned} R &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} = \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}\]

Thus, if we improve our guess \(u\), then \(R[u]\) approaches the true eigenvalue \(\lambda_0\). For numerically finding \(u_0\) and \(\lambda_0\), this gives us a clear goal: minimize \(R\), because:

\[\begin{aligned} \boxed{ R[u] = \lambda_0 + \frac{\displaystyle \sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n = 1}^\infty |c_n|^2} \ge \lambda_0 } \end{aligned}\]

Note that the convergence to \(\lambda_0\) goes as \(|c_n|^2\), while \(u\) converges to \(u_0\) as \(|c_n|\) by definition, so even a fairly bad guess \(u\) will give a decent estimate for \(\lambda_0\).

The method

In the following, we stick to Dirac notation, since the results hold for both continuous functions \(u(x)\) and discrete vectors \(\vb{u}\), as long as the operator \(\hat{H}\) is self-adjoint. Suppose we express our guess \(\ket{u}\) as a linear combination of known basis vectors \(\ket{f_n}\) with weights \(a_n \in \mathbb{C}\):

\[\begin{aligned} \ket{u} = \sum_{n = 0}^\infty c_n \ket{u_n} = \sum_{n = 0}^\infty a_n \ket{f_n} \approx \sum_{n = 0}^{N - 1} a_n \ket{f_n} \end{aligned}\]

For numerical tractability, we truncate the sum at \(N\) terms, and for generality, we allow \(\ket{f_n}\) to be non-orthogonal, as described by an overlap matrix with elements \(S_{mn}\):

\[\begin{aligned} \braket{f_m}{w f_n} = S_{m n} \end{aligned}\]

From the discussion above, we know that the ground-state eigenvalue \(\lambda_0\) is estimated by:

\[\begin{aligned} \lambda_0 \approx \lambda = R[u] = \frac{\braket*{u}{\hat{H} u}}{\braket{u}{w u}} = \frac{\displaystyle \sum_{m n} a_m^* a_n \braket*{f_m}{\hat{H} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \braket{f_m}{w f_n}} \equiv \frac{\displaystyle \sum_{m n} a_m^* a_n H_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}} \end{aligned}\]

And we also know that our goal is to minimize \(R[u]\), so we vary \(a_k^*\) to find its extremum:

\[\begin{aligned} 0 = \pdv{R}{a_k^*} &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big) - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)} {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2} \\ &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\braket{u}{w u}} = \frac{\displaystyle \sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\braket{u}{w u}} \end{aligned}\]

Clearly, this is only satisfied if the following holds for all \(k = 0, 1, ..., N\!-\!1\):

\[\begin{aligned} 0 = \sum_{n = 0}^{N - 1} a_n \big(H_{k n} - \lambda S_{k n}\big) \end{aligned}\]

For illustrative purposes, we can write this as a matrix equation with \(M_{k n} \equiv H_{k n} - \lambda S_{k n}\):

\[\begin{aligned} \begin{bmatrix} M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\ M_{1,0} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1} \end{bmatrix} \cdot \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \end{aligned}\]

Note that this looks like an eigenvalue problem for \(\lambda\). Indeed, demanding that \(\overline{M}\) cannot simply be inverted (i.e. the solution is non-trivial) yields a characteristic polynomial for \(\lambda\):

\[\begin{aligned} 0 = \det\!\Big[ \overline{M} \Big] = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big] \end{aligned}\]

This gives a set of \(\lambda\), which are the exact eigenvalues of \(\overline{H}\), and the estimated eigenvalues of \(\hat{H}\) (recall that \(\overline{H}\) is \(\hat{H}\) expressed in a truncated basis). The eigenvector \(\big[ a_0, a_1, ..., a_{N-1} \big]\) of the lowest \(\lambda\) gives the optimal weights to approximate \(\ket{u_0}\) in the basis \(\{\ket{f_n}\}\). Likewise, the higher \(\lambda\)’s eigenvectors approximate excited (i.e. non-ground) eigenstates of \(\hat{H}\), although in practice the results are less accurate the higher we go.

The overall accuracy is determined by how good our truncated basis is, i.e. how large a subspace it spans of the Hilbert space in which the true \(\ket{u_0}\) resides. Clearly, adding more basis vectors will improve the results, at the cost of computation. For example, if \(\hat{H}\) represents a helium atom, a good choice for \(\{\ket{f_n}\}\) would be hydrogen orbitals, since those are qualitatively similar.

You may find this result unsurprising; it makes some intuitive sense that approximating \(\hat{H}\) in a limited basis would yield a matrix \(\overline{H}\) giving rough eigenvalues. The point of this discussion is to rigorously show the validity of this approach.

Nowadays, there exist many other methods to calculate eigenvalues of complicated operators \(\hat{H}\), but an attractive feature of the Ritz method is that it is single-step, whereas its competitors tend to be iterative.

References

  1. G.B. Arfken, H.J. Weber, Mathematical methods for physicists, 6th edition, 2005, Elsevier.
  2. O. Bang, Applied mathematics for physicists: lecture notes, 2019, unpublished.

© Marcus R.A. Newman, CC BY-SA 4.0.
Privacy-friendly statistics by GoatCounter.