Categories: Mathematics, Numerical methods, Perturbation, Physics, Quantum mechanics.

Ritz method

In various branches of physics, the Ritz method is a technique to approximately find the lowest solutions to an eigenvalue problem. Some call it the Rayleigh-Ritz method, the Ritz-Galerkin method, or simply the variational method.

Background

In the context of variational calculus, consider the following functional to be optimized:

R[u]1Sabp(x)ux(x)2q(x)u(x)2dx\begin{aligned} R[u] \equiv \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x} \end{aligned}

Where u(x)Cu(x) \in \mathbb{C} is the unknown function, and p(x),q(x)Rp(x), q(x) \in \mathbb{R} are given. In addition, SS is the norm of uu, which we take to be constant with respect to a weight function w(x)Rw(x) \in \mathbb{R}:

Sabw(x)u(x)2dx\begin{aligned} S \equiv \int_a^b w(x) \big|u(x)\big|^2 \dd{x} \end{aligned}

This normalization requirement acts as a constraint to the optimization problem for R[u]R[u], so we introduce a Lagrange multiplier λ\lambda, and define the Lagrangian L\mathcal{L} for the full problem as:

L1S((pux2qu2)λ(wu2))\begin{aligned} \mathcal{L} \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg) \end{aligned}

The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

0=Luddx(Lux)=1S(qu+λwu+ddx(pux))\begin{aligned} 0 &= \pdv{\mathcal{L}}{u^*} - \dv{}{x}\Big( \pdv{\mathcal{L}}{u_x^*} \Big) \\ &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg) \end{aligned}

Which is clearly satisfied if and only if the following equation is fulfilled:

ddx(pux)+qu=λwu\begin{aligned} \dv{}{x}\big( p u_x \big) + q u = - \lambda w u \end{aligned}

This has the familiar form of a Sturm-Liouville problem (SLP), with λ\lambda representing an eigenvalue. SLPs have useful properties, but before we can take advantage of those, we need to handle an important detail: the boundary conditions (BCs) on uu. The above equation is only a valid SLP for certain BCs, as seen in the derivation of Sturm-Liouville theory. Let us return to the definition of RR, and integrate it by parts:

R[u]=1Sabpuxuxquudx=1S[puxu]ab1Nabddx(pux)u+quudx\begin{aligned} R[u] &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x} \\ &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{N} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x} \end{aligned}

The boundary term vanishes for a subset of the BCs that make a valid SLP, including Dirichlet BCs u(a)=u(b)=0u(a) = u(b) = 0, Neumann BCs ux(a)=ux(b)=0u_x(a) = u_x(b) = 0, and periodic BCs. Therefore, we assume that this term does indeed vanish, such that we can use Sturm-Liouville theory later:

R[u]=1Sab(ddx(pux)+qu)udx1SabuL^udx\begin{aligned} R[u] &= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x} \\ &\equiv - \frac{1}{S} \int_a^b u^* \hat{L} u \dd{x} \end{aligned}

Where L^\hat{L} is the self-adjoint Sturm-Liouville operator. Because the constrained Euler-Lagrange equation is now an SLP, we know that it has an infinite number of real discrete eigenvalues λn\lambda_n with a lower bound, corresponding to mutually orthogonal eigenfunctions un(x)u_n(x).

To understand the significance of this result, suppose we have solved the SLP, and now insert one of the eigenfunctions unu_n into RR:

R[un]=1SnabunL^undx=1Snabλnwun2dx=SnSnλn\begin{aligned} R[u_n] &= - \frac{1}{S_n} \int_a^b u_n^* \hat{L} u_n \dd{x} \\ &= \frac{1}{S_n} \int_a^b \lambda_n w |u_n|^2 \dd{x} \\ &= \frac{S_n}{S_n} \lambda_n \end{aligned}

Where SnS_n is the normalization of unu_n. In other words, when given unu_n as input, the functional RR returns the corresponding eigenvalue λn\lambda_n:

R[un]=λn\begin{aligned} \boxed{ R[u_n] = \lambda_n } \end{aligned}

This powerful result was not at all clear from RR’s initial definition. Note that some authors use the opposite sign for λ\lambda in their SLP definition, in which case this result can still be obtained simply by also defining RR with the opposite sign. This sign choice is consistent with quantum mechanics, with the Hamiltonian H^=L^\hat{H} = - \hat{L}.

Justification

But what if we do not know the eigenfunctions? Is RR still useful? Yes, as we shall see. Suppose we make an educated guess u(x)u(x) for the ground state (i.e. lowest-eigenvalue) solution u0(x)u_0(x):

u(x)=u0(x)+n=1cnun(x)\begin{aligned} u(x) = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x) \end{aligned}

Here, we are using the fact that the eigenfunctions of an SLP form a complete set, so our (known) guess uu can be expanded in the true (unknown) eigenfunctions unu_n. Next, by definition:

R[u]=uL^udxuwudx\begin{aligned} \boxed{ R[u] = - \frac{\displaystyle\int u^* \hat{L} u \dd{x}}{\displaystyle\int u^* w u \dd{x}} } \end{aligned}

This quantity is known as the Rayleigh quotient, and again beware of the sign in its definition; see the remark above. Inserting our ansatz uu, and using that the true unu_n have corresponding eigenvalues λn\lambda_n, we have:

R[u]=(u0+ncnun)L^{u0+ncnun}dxw(u0+ncnun)(u0+ncnun)dx=(u0+ncnun)( ⁣ ⁣λ0wu0ncnλnwun)dxw(u0+ncnun)(u0+ncnun)dx\begin{aligned} R[u] &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{L} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}} {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}} \\ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}} {\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}} \end{aligned}

For convenience, we switch to Dirac notation before evaluating further:

R[u]=(u0+ncnun)(λ0wu0+ncnλnwun)(u0+ncnun)(wu0+ncnwun)=λ0u0wu0+λ0ncnunwu0+ncnλnu0wun+mncncmλnumwunu0wu0+ncnunwu0+ncnu0wun+mncncmumwun\begin{aligned} R[u] &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)} {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)} \\ &= \frac{\displaystyle \lambda_0 \inprod{u_0}{w u_0} + \lambda_0 \sum_{n} c_n^* \inprod{u_n}{w u_0} + \sum_{n} c_n \lambda_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \inprod{u_m}{w u_n}} {\displaystyle \inprod{u_0}{w u_0} + \sum_{n} c_n^* \inprod{u_n}{w u_0} + \sum_{n} c_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \inprod{u_m}{w u_n}} \end{aligned}

Using orthogonality umwun=Snδmn\inprod{u_m}{w u_n} = S_n \delta_{mn}, and the fact that n0n \neq 0 by definition, we find:

R[u]=λ0S0+λ0ncnSnδn0+ncnλnSnδn0+mncncmλnSnδmnS0+ncnSnδn0+ncnSnδn0+mncncmSnδmn=λ0S0+ncn2λnSnS0+ncn2Sn\begin{aligned} R[u] &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}} {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}} \\ &= \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n} {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n} \end{aligned}

It is always possible to choose our normalizations such that Sn=SS_n = S for all unu_n, leaving:

R[u]=λ0+ncn2λn1+ncn2\begin{aligned} R[u] &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}

And finally, after rearranging the numerator, we arrive at the following relation:

R[u]=λ0+ncn2λ0+ncn2(λnλ0)1+ncn2=λ0+ncn2(λnλ0)1+ncn2\begin{aligned} R[u] &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} \\ &= \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}

Thus, if we improve our guess uu (i.e. reduce cn|c_n|), then R[u]R[u] approaches the true eigenvalue λ0\lambda_0. For numerically finding u0u_0 and λ0\lambda_0, this gives us a clear goal: minimize RR, because:

R[u]=λ0+n=1cn2(λnλ0)1+n=1cn2λ0\begin{aligned} \boxed{ R[u] = \lambda_0 + \frac{\displaystyle \sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n = 1}^\infty |c_n|^2} \ge \lambda_0 } \end{aligned}

In the context of quantum mechanics, this is not surprising, since any superposition of multiple states is guaranteed to have a higher energy than the ground state.

As our guess uu is improved, λ0\lambda_0 converges as cn2|c_n|^2, while uu converges to u0u_0 as cn|c_n| by definition, so even a fairly bad ansatz uu gives a decent estimate for λ0\lambda_0.

The method

In the following, we stick to Dirac notation, since the results hold for both continuous functions u(x)u(x) and discrete vectors u\vb{u}, as long as the operator L^\hat{L} is self-adjoint. Suppose we express our guess u\Ket{u} as a linear combination of known basis vectors fn\Ket{f_n} with weights anCa_n \in \mathbb{C}, where fn\Ket{f_n} are not necessarily eigenvectors of L^\hat{L}:

u=n=0cnun=n=0anfnn=0N1anfn\begin{aligned} \Ket{u} = \sum_{n = 0}^\infty c_n \Ket{u_n} = \sum_{n = 0}^\infty a_n \Ket{f_n} \approx \sum_{n = 0}^{N - 1} a_n \Ket{f_n} \end{aligned}

For numerical tractability, we truncate the sum at NN terms, and for generality we allow fn\Ket{f_n} to be non-orthogonal, as described by an overlap matrix with elements SmnS_{mn}:

fmwfn=Smn\begin{aligned} \inprod{f_m}{w f_n} = S_{m n} \end{aligned}

From the discussion above, we know that the ground-state eigenvalue λ0\lambda_0 is estimated by:

λ0R[u]=uL^uuwu=mnamanfmL^fnmnamanfmwfnmnamanLmnmnamanSmn\begin{aligned} \lambda_0 \approx R[u] = - \frac{\inprod{u}{\hat{L} u}}{\inprod{u}{w u}} = - \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{L} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{w f_n}} \equiv - \frac{\displaystyle \sum_{m n} a_m^* a_n L_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}} \end{aligned}

And we also know that our goal is to minimize R[u]R[u], so we vary aka_k^* to find its extremum:

0=Rak=(nanLkn)(mnanamSmn)(nanSkn)(mnanamLmn)(mnanamSmn)2=(nanLkn)R[u](nanSkn)mnanamSmn=nan(LknλSkn)uwu\begin{aligned} 0 = - \pdv{R}{a_k^*} &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big) - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* L_{mn} \Big)} {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2} \\ &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)} {\displaystyle \sum_{m n} a_n a_m^* S_{mn}} \\ &= \sum_{n} a_n \frac{\big(L_{k n} - \lambda S_{k n}\big)}{\inprod{u}{w u}} \end{aligned}

Clearly, this is only satisfied if the following holds for all k=0,1,...,N ⁣ ⁣1k = 0, 1, ..., N\!-\!1:

0=n=0N1an(LknλSkn)\begin{aligned} 0 = \sum_{n = 0}^{N - 1} a_n \big(L_{k n} - \lambda S_{k n}\big) \end{aligned}

For illustrative purposes, we can write this as a matrix equation with MknLknλSknM_{k n} \equiv L_{k n} - \lambda S_{k n}:

[M0,0M0,1M0,N1M1,0MN1,0MN1,N1][a0a1aN1]=[000]\begin{aligned} \begin{bmatrix} M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\ M_{1,0} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1} \end{bmatrix} \cdot \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \end{aligned}

This looks like an eigenvalue problem for λ\lambda, so we demand that its determinant vanishes:

0=det ⁣[Mˉ]=det ⁣[LˉλSˉ]\begin{aligned} 0 = \det\!\Big[ \bar{M} \Big] = \det\!\Big[ \bar{L} - \lambda \bar{S} \Big] \end{aligned}

This gives a set of λ\lambda, which are exact eigenvalues of Lˉ\bar{L}, and estimated eigenvalues of L^\hat{L} (recall that Lˉ\bar{L} is L^\hat{L} expressed in a truncated basis). The eigenvector [a0,a1,...,aN1]\big[ a_0, a_1, ..., a_{N-1} \big] of the lowest λ\lambda gives the optimal weights ana_n to approximate u0\Ket{u_0} in the basis {fn}\{\Ket{f_n}\}. Likewise, the higher λ\lambdas’ eigenvectors approximate excited (i.e. non-ground) eigenstates of L^\hat{L}, although in practice the results become less accurate the higher we go. If we only care about the ground state, then we already know λ\lambda from R[u]R[u], so we just need to solve the matrix equation for ana_n.

You may find this result unsurprising: it makes some intuitive sense that approximating L^\hat{L} in a limited basis would yield a matrix Lˉ\bar{L} giving rough eigenvalues. The point of this discussion is to rigorously show the validity of this approach.

Nowadays, there exist many other methods to calculate eigenvalues of complicated operators L^\hat{L}, but an attractive feature of the Ritz method is that it is single-step, whereas its competitors tend to be iterative. That said, this method cannot recover from a poorly chosen basis {fn}\{\Ket{f_n}\}.

Indeed, the overall accuracy is determined by how good our truncated basis is, i.e. how large a subspace it spans of the Hilbert space in which the true u0\Ket{u_0} resides. Clearly, adding more basis vectors improves the results, but at a computational cost; it is usually more efficient to carefully choose which fn\ket{f_n} to use, rather than just how many.

References

  1. G.B. Arfken, H.J. Weber, Mathematical methods for physicists, 6th edition, 2005, Elsevier.
  2. O. Bang, Applied mathematics for physicists: lecture notes, 2019, unpublished.