---
title: "Ritz method"
sort_title: "Ritz method"
date: 2022-09-18
categories:
- Physics
- Mathematics
- Perturbation
- Quantum mechanics
- Numerical methods
layout: "concept"
---

In various branches of physics,
the **Ritz method** is a technique to approximately find the lowest solutions to an eigenvalue problem.
Some call it the **Rayleigh-Ritz method**, the **Ritz-Galerkin method**,
or simply the **variational method**.


## Background

In the context of [variational calculus](/know/concept/calculus-of-variations/),
consider the following functional to be optimized:

$$\begin{aligned}
    R[u]
    = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$

Where $$u(x) \in \mathbb{C}$$ is the unknown function,
and $$p(x), q(x) \in \mathbb{R}$$ are given.
In addition, $$S$$ is the norm of $$u$$, which we demand be constant
with respect to a weight function $$w(x) \in \mathbb{R}$$:

$$\begin{aligned}
    S
    = \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$

To handle this normalization requirement,
we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$,
and define the Lagrangian $$\Lambda$$ for the full constrained optimization problem as:

$$\begin{aligned}
    \Lambda
    \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg)
\end{aligned}$$

The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

$$\begin{aligned}
    0
    &= \pdv{\Lambda}{u^*} - \dv{}{x}\Big( \pdv{\Lambda}{u_x^*} \Big)
    \\
    &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg)
\end{aligned}$$

Which is clearly satisfied if and only if the following equation is fulfilled:

$$\begin{aligned}
    \dv{}{x}\big( p u_x \big) + q u
    = - \lambda w u
\end{aligned}$$

This has the familiar form of a [Sturm-Liouville problem](/know/concept/sturm-liouville-theory/) (SLP),
with $$\lambda$$ representing an eigenvalue.
SLPs have useful properties, but before we can take advantage of those,
we need to handle an important detail: the boundary conditions (BCs) on $$u$$.
The above equation is only a valid SLP for certain BCs,
as seen in the derivation of Sturm-Liouville theory.

Let us return to the definition of $$R[u]$$,
and integrate it by parts:

$$\begin{aligned}
    R[u]
    &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x}
    \\
    &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x}
\end{aligned}$$

The boundary term vanishes for a subset of the BCs that make a valid SLP,
including Dirichlet BCs $$u(a) = u(b) = 0$$, Neumann BCs $$u_x(a) = u_x(b) = 0$$, and periodic BCs.
Therefore, we assume that this term does indeed vanish,
such that we can use Sturm-Liouville theory later:

$$\begin{aligned}
    R[u]
    &= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x}
    \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x}
\end{aligned}$$

Where $$\hat{H}$$ is the self-adjoint Sturm-Liouville operator.
Because the constrained Euler-Lagrange equation is now an SLP,
we know that it has an infinite number of real discrete eigenvalues $$\lambda_n$$ with a lower bound,
corresponding to mutually orthogonal eigenfunctions $$u_n(x)$$.

To understand the significance of this result,
suppose we have solved the SLP,
and now insert one of the eigenfunctions $$u_n$$ into $$R$$:

$$\begin{aligned}
    R[u_n]
    &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x}
    = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x}
    \\
    &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x}
    = \frac{S_n}{S_n} \lambda_n
\end{aligned}$$

Where $$S_n$$ is the normalization of $$u_n$$.
In other words, when given $$u_n$$,
the functional $$R$$ yields the corresponding eigenvalue $$\lambda_n$$:

$$\begin{aligned}
    \boxed{
        R[u_n]
        = \lambda_n
    }
\end{aligned}$$

This powerful result was not at all clear from $$R$$'s initial definition.


## Justification

But what if we do not know the eigenfunctions? Is $$R$$ still useful?
Yes, as we shall see. Suppose we make an educated guess $$u(x)$$
for the ground state (i.e. lowest-eigenvalue) solution $$u_0(x)$$:

$$\begin{aligned}
    u(x)
    = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x)
\end{aligned}$$

Here, we are using the fact that the eigenfunctions of an SLP form a complete set,
so our (known) guess $$u$$ can be expanded in the true (unknown) eigenfunctions $$u_n$$.
We are assuming that $$u$$ is already quite close to its target $$u_0$$,
such that the (unknown) expansion coefficients $$c_n$$ are small;
specifically $$|c_n|^2 \ll 1$$.
Let us start from what we know:

$$\begin{aligned}
    \boxed{
        R[u]
        = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
    }
\end{aligned}$$

This quantity is known as the **Rayleigh quotient**.
Inserting our ansatz $$u$$,
and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$:

$$\begin{aligned}
    R[u]
    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
    {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}}
    \\
    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
    {\displaystyle\int w  \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}}
\end{aligned}$$

For convenience, we switch to [Dirac notation](/know/concept/dirac-notation/)
before evaluating further.

$$\begin{aligned}
    R
    &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)}
    {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)}
    \\
    &= \frac{\displaystyle \lambda_0 \Inprod{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
    + \sum_{n = 1}^\infty c_n \lambda_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \Inprod{u_m}{w u_n}}
    {\displaystyle \Inprod{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
    + \sum_{n = 1}^\infty c_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \Inprod{u_m}{w u_n}}
\end{aligned}$$

Using orthogonality $$\Inprod{u_m}{w u_n} = S_n \delta_{mn}$$,
and the fact that $$n \neq 0$$ by definition, we find:

$$\begin{aligned}
    R
    &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0}
    + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}}
    {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}}
    \\
    &= \frac{\displaystyle \lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n}
    {\displaystyle S_0 + 0 + 0 + \sum_{n} c_n c_n^* S_n}
    = \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
    {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n}
\end{aligned}$$

It is always possible to choose our normalizations such that $$S_n = S$$ for all $$u_n$$, leaving:

$$\begin{aligned}
    R
    &= \frac{\displaystyle \lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S}
    {\displaystyle S + \sum_{n} |c_n|^2 S}
    = \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$

And finally, after rearranging the numerator, we arrive at the following relation:

$$\begin{aligned}
    R
    &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
    = \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$

Thus, if we improve our guess $$u$$,
then $$R[u]$$ approaches the true eigenvalue $$\lambda_0$$.
For numerically finding $$u_0$$ and $$\lambda_0$$, this gives us a clear goal: minimize $$R$$, because:

$$\begin{aligned}
    \boxed{
        R[u]
        = \lambda_0 + \frac{\displaystyle \sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)}
        {\displaystyle 1 + \sum_{n = 1}^\infty |c_n|^2}
        \ge \lambda_0
    }
\end{aligned}$$

In the context of quantum mechanics, this is not surprising,
since any superposition of multiple states
is guaranteed to have a higher energy than the ground state.

Note that the convergence to $$\lambda_0$$ goes as $$|c_n|^2$$,
while $$u$$ converges to $$u_0$$ as $$|c_n|$$ by definition,
so even a fairly bad guess $$u$$ will give a decent estimate for $$\lambda_0$$.


## The method

In the following, we stick to Dirac notation,
since the results hold for both continuous functions $$u(x)$$ and discrete vectors $$\vb{u}$$,
as long as the operator $$\hat{H}$$ is self-adjoint.
Suppose we express our guess $$\Ket{u}$$ as a linear combination
of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$:

$$\begin{aligned}
    \Ket{u}
    = \sum_{n = 0}^\infty c_n \Ket{u_n}
    = \sum_{n = 0}^\infty a_n \Ket{f_n}
    \approx \sum_{n = 0}^{N - 1} a_n \Ket{f_n}
\end{aligned}$$

For numerical tractability, we truncate the sum at $$N$$ terms,
and for generality, we allow $$\Ket{f_n}$$ to be non-orthogonal,
as described by an *overlap matrix* with elements $$S_{mn}$$:

$$\begin{aligned}
    \Inprod{f_m}{w f_n} = S_{m n}
\end{aligned}$$

From the discussion above,
we know that the ground-state eigenvalue $$\lambda_0$$ is estimated by:

$$\begin{aligned}
    \lambda_0
    \approx \lambda
    = R[u]
    = \frac{\inprod{u}{\hat{H} u}}{\Inprod{u}{w u}}
    = \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{H} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \Inprod{f_m}{w f_n}}
    \equiv \frac{\displaystyle \sum_{m n} a_m^* a_n H_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
\end{aligned}$$

And we also know that our goal is to minimize $$R[u]$$,
so we vary $$a_k^*$$ to find its extremum:

$$\begin{aligned}
    0
    = \pdv{R}{a_k^*}
    &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
    - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)}
    {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2}
    \\
    &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\Inprod{u}{w u}}
    = \frac{\displaystyle \sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\Inprod{u}{w u}}
\end{aligned}$$

Clearly, this is only satisfied if the following holds for all $$k = 0, 1, ..., N\!-\!1$$:

$$\begin{aligned}
    0
    = \sum_{n = 0}^{N - 1} a_n \big(H_{k n} - \lambda S_{k n}\big)
\end{aligned}$$

For illustrative purposes,
we can write this as a matrix equation
with $$M_{k n} \equiv H_{k n} - \lambda S_{k n}$$:

$$\begin{aligned}
    \begin{bmatrix}
        M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\
        M_{1,0} & \ddots & & \vdots \\
        \vdots & & \ddots & \vdots \\
        M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1}
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
        a_0 \\ a_1 \\ \vdots \\ a_{N-1}
    \end{bmatrix}
    =
    \begin{bmatrix}
        0 \\ 0 \\ \vdots \\ 0
    \end{bmatrix}
\end{aligned}$$

Note that this looks like an eigenvalue problem for $$\lambda$$.
Indeed, demanding that $$\overline{M}$$ cannot simply be inverted
(i.e. the solution is non-trivial)
yields a characteristic polynomial for $$\lambda$$:

$$\begin{aligned}
    0
    = \det\!\Big[ \overline{M} \Big]
    = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big]
\end{aligned}$$

This gives a set of $$\lambda$$,
which are the exact eigenvalues of $$\overline{H}$$,
and the estimated eigenvalues of $$\hat{H}$$
(recall that $$\overline{H}$$ is $$\hat{H}$$ expressed in a truncated basis).
The eigenvector $$\big[ a_0, a_1, ..., a_{N-1} \big]$$ of the lowest $$\lambda$$
gives the optimal weights to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$.
Likewise, the higher $$\lambda$$'s eigenvectors approximate
excited (i.e. non-ground) eigenstates of $$\hat{H}$$,
although in practice the results are less accurate the higher we go.

The overall accuracy is determined by how good our truncated basis is,
i.e. how large a subspace it spans
of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides.
Clearly, adding more basis vectors will improve the results,
at the cost of computation.
For example, if $$\hat{H}$$ represents a helium atom,
a good choice for $$\{\Ket{f_n}\}$$ would be hydrogen orbitals,
since those are qualitatively similar.

You may find this result unsurprising;
it makes some intuitive sense that approximating $$\hat{H}$$
in a limited basis would yield a matrix $$\overline{H}$$ giving rough eigenvalues.
The point of this discussion is to rigorously show
the validity of this approach.

If we only care about the ground state,
then we already know $$\lambda$$ from $$R[u]$$,
so all we need to do is solve the above matrix equation for $$a_n$$.
Keep in mind that $$\overline{M}$$ is singular,
and $$a_n$$ are only defined up to a constant factor.

Nowadays, there exist many other methods to calculate eigenvalues
of complicated operators $$\hat{H}$$,
but an attractive feature of the Ritz method is that it is single-step,
whereas its competitors tend to be iterative.
That said, the Ritz method cannot recover from a poorly chosen basis.


## References
1.  G.B. Arfken, H.J. Weber,
    *Mathematical methods for physicists*, 6th edition, 2005,
    Elsevier.
2.  O. Bang,
    *Applied mathematics for physicists: lecture notes*, 2019,
    unpublished.