In various branches of physics,
the Ritz method is a technique to approximately find the lowest solutions to an eigenvalue problem.
Some call it the Rayleigh-Ritz method, the Ritz-Galerkin method,
or simply the variational method.
Background
In the context of variational calculus,
consider the following functional to be optimized:
$$R[u] \equiv \frac{1}{S} \int_a^b \Big( p(x) \big| u_x(x) \big|^2 - q(x) \big| u(x) \big|^2 \Big) \, dx$$

where $u(x) \in \mathbb{C}$ is the unknown function,
and $p(x), q(x) \in \mathbb{R}$ are given.
In addition, $S$ is the squared norm of $u$, which we take to be constant
with respect to a weight function $w(x) \in \mathbb{R}$:

$$S \equiv \int_a^b w(x) \big| u(x) \big|^2 \, dx$$
This normalization requirement acts as a constraint
on the optimization problem for $R[u]$,
so we introduce a Lagrange multiplier $\lambda$,
and define the Lagrangian $L$ for the full problem as:

$$L \equiv \frac{1}{S} \Big( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \, w |u|^2 \Big)$$
The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

$$0 = \frac{\partial L}{\partial u^*} - \frac{d}{dx} \bigg( \frac{\partial L}{\partial u_x^*} \bigg) = -\frac{1}{S} \bigg( q u + \lambda w u + \frac{d}{dx} \big( p u_x \big) \bigg)$$
Which is clearly satisfied if and only if the following equation is fulfilled:
$$\frac{d}{dx} \big( p u_x \big) + q u = -\lambda w u$$
This has the familiar form of a Sturm-Liouville problem (SLP),
with λ representing an eigenvalue.
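As a quick sanity check of this correspondence, the SLP can be verified symbolically for a simple concrete choice of coefficients. The choices below ($p = 1$, $q = 0$, $w = 1$, and the trial solution $\sin(nx)$) are illustrative assumptions, not part of the general theory:

```python
import sympy as sp

# Illustrative assumed coefficients: p = 1, q = 0, w = 1.
# The SLP then reduces to u'' = -lambda u, solved by u = sin(n x).
x, n = sp.symbols('x n', positive=True)
p, q, w = 1, 0, 1

u = sp.sin(n * x)

# Compute d/dx(p u_x) + q u, which should equal -lambda w u:
lhs = sp.diff(p * sp.diff(u, x), x) + q * u
lam = sp.simplify(-lhs / (w * u))
print(lam)  # the eigenvalue lambda = n**2
```

As expected for this example, each $\sin(nx)$ is an eigenfunction with eigenvalue $\lambda = n^2$.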
SLPs have useful properties, but before we can take advantage of those,
we need to handle an important detail: the boundary conditions (BCs) on u.
The above equation is only a valid SLP for certain BCs,
as seen in the derivation of Sturm-Liouville theory.
Let us return to the definition of R,
and integrate it by parts:

$$R[u] = \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \bigg( \frac{d}{dx} \big( p u_x \big) + q u \bigg) u^* \, dx$$

The boundary term vanishes for a subset of the BCs that make a valid SLP,
including Dirichlet BCs $u(a) = u(b) = 0$, Neumann BCs $u_x(a) = u_x(b) = 0$, and periodic BCs.
Therefore, we assume that this term does indeed vanish,
such that we can use Sturm-Liouville theory later:

$$R[u] = -\frac{1}{S} \int_a^b u^* \hat{L} u \, dx, \qquad \hat{L} u \equiv \frac{d}{dx} \big( p u_x \big) + q u$$

where $\hat{L}$ is the self-adjoint Sturm-Liouville operator.
Because the constrained Euler-Lagrange equation is now an SLP,
we know that it has an infinite number of real discrete eigenvalues λn with a lower bound,
corresponding to mutually orthogonal eigenfunctions un(x).
To understand the significance of this result,
suppose we have solved the SLP,
and now insert one of the eigenfunctions $u_n$ into $R$:

$$R[u_n] = -\frac{1}{S_n} \int_a^b u_n^* \hat{L} u_n \, dx = \frac{1}{S_n} \int_a^b \lambda_n \, w \big| u_n \big|^2 \, dx$$

where $S_n$ is the normalization of $u_n$,
and we have used that $\hat{L} u_n = -\lambda_n w u_n$.
In other words, when given un as input,
the functional R returns the corresponding eigenvalue λn:
$$R[u_n] = \lambda_n$$
This powerful result was not at all clear from R’s initial definition.
Note that some authors use the opposite sign for λ in their SLP definition,
in which case this result can still be obtained
simply by also defining R with the opposite sign.
This sign choice is consistent with quantum mechanics,
with the Hamiltonian $\hat{H} = -\hat{L}$.
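This eigenvalue property is easy to check numerically. The sketch below assumes a concrete example SLP ($p = w = 1$, $q = 0$ on $[0, \pi]$ with Dirichlet BCs, so $u_n = \sin(nx)$ and $\lambda_n = n^2$) and evaluates $R$ on a grid:

```python
import numpy as np

# Assumed example SLP: p = w = 1, q = 0 on [0, pi] with Dirichlet BCs,
# so u_n(x) = sin(n x) and lambda_n = n^2.
x = np.linspace(0.0, np.pi, 20001)
dx = x[1] - x[0]

def integrate(y):
    """Trapezoidal rule on the uniform grid."""
    return np.sum((y[:-1] + y[1:]) / 2) * dx

def R(u):
    """Rayleigh quotient R[u] = (∫ p|u_x|^2 - q|u|^2 dx) / (∫ w|u|^2 dx)."""
    ux = np.gradient(u, x)
    return integrate(ux**2) / integrate(u**2)

for n in range(1, 4):
    print(n, R(np.sin(n * x)))  # R[u_n] should approach lambda_n = n^2
```

Up to discretization error, $R[\sin(nx)]$ indeed returns $n^2$.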
Justification
But what if we do not know the eigenfunctions? Is R still useful?
Yes, as we shall see. Suppose we make an educated guess u(x)
for the ground state (i.e. lowest-eigenvalue) solution u0(x):
$$u(x) = u_0(x) + \sum_{n=1}^{\infty} c_n u_n(x)$$
Here, we are using the fact that the eigenfunctions of an SLP form a complete set,
so our (known) guess u can be expanded in the true (unknown) eigenfunctions un.
Next, by definition:
$$R[u] = -\frac{\int u^* \hat{L} u \, dx}{\int u^* w u \, dx}$$
This quantity is known as the Rayleigh quotient,
and again beware of the sign in its definition; see the remark above.
Inserting our ansatz $u$,
and using that the true $u_n$ have corresponding eigenvalues $\lambda_n$
and are mutually orthogonal, we have:

$$R[u] = \frac{\lambda_0 S_0 + \sum_{n=1}^{\infty} \lambda_n |c_n|^2 S_n}{S_0 + \sum_{n=1}^{\infty} |c_n|^2 S_n}$$
Thus, if we improve our guess u (i.e. reduce ∣cn∣),
then R[u] approaches the true eigenvalue λ0.
For numerically finding $u_0$ and $\lambda_0$, this gives us a clear goal: minimize $R$, because:

$$R[u] - \lambda_0 = \frac{\sum_{n=1}^{\infty} (\lambda_n - \lambda_0) |c_n|^2 S_n}{S_0 + \sum_{n=1}^{\infty} |c_n|^2 S_n} \ge 0$$

since $\lambda_n \ge \lambda_0$ for all $n \ge 1$.
In the context of quantum mechanics, this is not surprising,
since any superposition of multiple states
is guaranteed to have a higher energy than the ground state.
As our guess $u$ is improved, the eigenvalue estimate $R[u]$ converges to $\lambda_0$ as $|c_n|^2$,
while $u$ converges to $u_0$ as $|c_n|$ by definition,
so even a fairly bad ansatz $u$ gives a decent estimate for $\lambda_0$.
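Both points can be illustrated numerically. Assuming the example SLP $p = w = 1$, $q = 0$ on $[0, \pi]$ with Dirichlet BCs (so $u_0 = \sin(x)$ and $\lambda_0 = 1$), the sketch below evaluates $R$ for a crude polynomial guess and for a deliberately contaminated ground state:

```python
import numpy as np

# Assumed example SLP: p = w = 1, q = 0 on [0, pi] with Dirichlet BCs,
# so the true ground state is u_0 = sin(x) with lambda_0 = 1.
x = np.linspace(0.0, np.pi, 20001)
dx = x[1] - x[0]

def integrate(y):
    return np.sum((y[:-1] + y[1:]) / 2) * dx  # trapezoidal rule

def R(u):
    ux = np.gradient(u, x)
    return integrate(ux**2) / integrate(u**2)

# Even a crude polynomial ansatz gives a decent eigenvalue estimate:
print(R(x * (np.pi - x)))  # 10/pi^2 ≈ 1.0132, just above lambda_0 = 1

# Contaminating u_0 with c * u_1 shifts R by O(c^2), not O(c):
for c in (0.1, 0.01):
    print(c, R(np.sin(x) + c * np.sin(2 * x)) - 1.0)  # ≈ 3 c^2
```

An ansatz that is wrong at first order in $c$ thus yields an eigenvalue that is wrong only at second order, and always from above.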
The method
In the following, we stick to Dirac notation,
since the results hold for both continuous functions u(x)
and discrete vectors u,
as long as the operator L^ is self-adjoint.
Suppose we express our guess $|u\rangle$ as a linear combination
of known basis vectors $|f_n\rangle$ with weights $a_n \in \mathbb{C}$,
where the $|f_n\rangle$ are not necessarily eigenvectors of $\hat{L}$.
For numerical tractability, we truncate the sum at $N$ terms:

$$|u\rangle = \sum_{n=0}^{N-1} a_n |f_n\rangle$$

For generality, we allow the $|f_n\rangle$ to be non-orthogonal,
as described by an overlap matrix with elements $S_{mn}$:

$$\langle f_m | w f_n \rangle = S_{mn}$$
From the discussion above,
we know that the ground-state eigenvalue $\lambda_0$ is estimated by:

$$\lambda_0 \approx \lambda = -\frac{\langle u | \hat{L} u \rangle}{\langle u | w u \rangle} = -\frac{\sum_{m,n} a_m^* a_n \langle f_m | \hat{L} f_n \rangle}{\sum_{m,n} a_m^* a_n S_{mn}}$$

Defining a matrix $\bar{L}$ with elements $L_{mn} \equiv -\langle f_m | \hat{L} f_n \rangle$,
and demanding that $\lambda$ be stationary with respect to each weight $a_m^*$,
we arrive at the matrix equation:

$$\bar{M} \vec{a} = \big( \bar{L} - \lambda \bar{S} \big) \vec{a} = \vec{0}$$

This looks like an eigenvalue problem for $\lambda$,
so for a nontrivial $\vec{a}$ we demand that its determinant vanishes:

$$0 = \det\!\big[ \bar{M} \big] = \det\!\big[ \bar{L} - \lambda \bar{S} \big]$$
This gives a set of λ,
which are exact eigenvalues of Lˉ,
and estimated eigenvalues of L^
(recall that Lˉ is L^ expressed in a truncated basis).
The eigenvector $\vec{a} = [a_0, a_1, \ldots, a_{N-1}]$ of the lowest $\lambda$
gives the optimal weights $a_n$ to approximate $|u_0\rangle$ in the basis $\{|f_n\rangle\}$.
Likewise, the eigenvectors of the higher $\lambda$s approximate
excited (i.e. non-ground) eigenstates of L^,
although in practice the results become less accurate the higher we go.
If we only care about the ground state,
then we already know λ from R[u],
so we just need to solve the matrix equation for an.
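The whole procedure fits in a few lines of code. The sketch below assumes the example SLP $u'' = -\lambda u$ on $[0, \pi]$ with Dirichlet BCs (true eigenvalues $n^2$), and an illustrative non-orthogonal polynomial basis $f_n(x) = x^{n+1}(\pi - x)$ that satisfies the BCs but contains no eigenfunctions:

```python
import numpy as np
from scipy.linalg import eigh

# Assumed example SLP: u'' = -lambda u on [0, pi] with Dirichlet BCs,
# so the true eigenvalues are 1, 4, 9, ...
# Illustrative basis choice: non-orthogonal polynomials
# f_n(x) = x^(n+1) * (pi - x), n = 0 .. N-1, which satisfy the BCs.
x = np.linspace(0.0, np.pi, 20001)
dx = x[1] - x[0]
N = 6

def integrate(y):
    return np.sum((y[:-1] + y[1:]) / 2) * dx  # trapezoidal rule

f = [x**(n + 1) * (np.pi - x) for n in range(N)]
fx = [np.gradient(fn, x) for fn in f]

# L_mn = -<f_m|L f_n> = ∫ f_m' f_n' dx (integrated by parts; the BCs
# kill the boundary term), and S_mn = <f_m|w f_n> = ∫ f_m f_n dx.
Lbar = np.array([[integrate(fx[m] * fx[n]) for n in range(N)] for m in range(N)])
Sbar = np.array([[integrate(f[m] * f[n]) for n in range(N)] for m in range(N)])

# Solve the generalized eigenvalue problem (Lbar - lambda Sbar) a = 0:
lam, a = eigh(Lbar, Sbar)
print(lam[:3])  # estimates of the lowest eigenvalues, close to [1, 4, 9]
```

Consistent with the remark about accuracy, the lowest $\lambda$ is by far the best estimate, and the quality degrades for the higher eigenvalues.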
You may find this result unsurprising:
it makes some intuitive sense that approximating L^
in a limited basis would yield a matrix Lˉ giving rough eigenvalues.
The point of this discussion is to rigorously show
the validity of this approach.
Nowadays, there exist many other methods to calculate eigenvalues
of complicated operators L^,
but an attractive feature of the Ritz method is that it is single-step,
whereas its competitors tend to be iterative.
That said, this method cannot recover from a poorly chosen basis {∣fn⟩}.
Indeed, the overall accuracy is determined by how good our truncated basis is,
i.e. how large a subspace it spans
of the Hilbert space in which the true ∣u0⟩ resides.
Clearly, adding more basis vectors improves the results,
but at a computational cost;
it is usually more efficient to carefully choose which $|f_n\rangle$ to use,
rather than just how many.