---
title: "Ritz method"
sort_title: "Ritz method"
date: 2022-09-18
categories:
- Physics
- Mathematics
- Perturbation
- Quantum mechanics
- Numerical methods
layout: "concept"
---

In various branches of physics,
the **Ritz method** is a technique to approximately find the lowest-eigenvalue solutions of an eigenvalue problem.
Some call it the **Rayleigh-Ritz method**, the **Ritz-Galerkin method**,
or simply the **variational method**.



## Background

In the context of [variational calculus](/know/concept/calculus-of-variations/),
consider the following functional to be optimized:

$$\begin{aligned}
    R[u]
    \equiv \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$

Where $$u(x) \in \mathbb{C}$$ is the unknown function,
and $$p(x), q(x) \in \mathbb{R}$$ are given.
In addition, $$S$$ is the norm of $$u$$, which we take to be constant
with respect to a weight function $$w(x) \in \mathbb{R}$$:

$$\begin{aligned}
    S
    \equiv \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$

This normalization requirement acts as a constraint
to the optimization problem for $$R[u]$$,
so we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$,
and define the Lagrangian $$\mathcal{L}$$ for the full problem as:

$$\begin{aligned}
    \mathcal{L}
    \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg)
\end{aligned}$$

The resulting Euler-Lagrange equation is then calculated in the standard way, yielding:

$$\begin{aligned}
    0
    &= \pdv{\mathcal{L}}{u^*} - \dv{}{x}\Big( \pdv{\mathcal{L}}{u_x^*} \Big)
    \\
    &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg)
\end{aligned}$$

Which is clearly satisfied if and only if the following equation is fulfilled:

$$\begin{aligned}
    \dv{}{x}\big( p u_x \big) + q u
    = - \lambda w u
\end{aligned}$$
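
As a concrete example of such an SLP (purely for illustration),
take $$p = w = 1$$ and $$q = 0$$ on the interval $$[0, \pi]$$
with Dirichlet BCs $$u(0) = u(\pi) = 0$$:
then the equation reduces to $$u_{xx} = -\lambda u$$,
whose solutions are the familiar standing waves:

$$\begin{aligned}
    u_n(x)
    = \sin(n x)
    \qquad \quad
    \lambda_n
    = n^2
    \qquad \quad
    n = 1, 2, 3, ...
\end{aligned}$$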

This has the familiar form of a [Sturm-Liouville problem](/know/concept/sturm-liouville-theory/) (SLP),
with $$\lambda$$ representing an eigenvalue.
SLPs have useful properties, but before we can take advantage of those,
we need to handle an important detail: the boundary conditions (BCs) on $$u$$.
The above equation is only a valid SLP for certain BCs,
as seen in the derivation of Sturm-Liouville theory.
Let us return to the definition of $$R$$,
and integrate it by parts:

$$\begin{aligned}
    R[u]
    &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x}
    \\
    &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x}
\end{aligned}$$

The boundary term vanishes for a subset of the BCs that make a valid SLP,
including Dirichlet BCs $$u(a) = u(b) = 0$$, Neumann BCs $$u_x(a) = u_x(b) = 0$$, and periodic BCs.
Therefore, we assume that this term does indeed vanish,
such that we can use Sturm-Liouville theory later:

$$\begin{aligned}
    R[u]
    &= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x}
    \\
    &\equiv - \frac{1}{S} \int_a^b u^* \hat{L} u \dd{x}
\end{aligned}$$

Where $$\hat{L}$$ is the self-adjoint Sturm-Liouville operator.
Because the constrained Euler-Lagrange equation is now an SLP,
we know that it has an infinite number of real discrete eigenvalues $$\lambda_n$$ with a lower bound,
corresponding to mutually orthogonal eigenfunctions $$u_n(x)$$.

To understand the significance of this result,
suppose we have solved the SLP,
and now insert one of the eigenfunctions $$u_n$$ into $$R$$:

$$\begin{aligned}
    R[u_n]
    &= - \frac{1}{S_n} \int_a^b u_n^* \hat{L} u_n \dd{x}
    \\
    &= \frac{1}{S_n} \int_a^b \lambda_n w |u_n|^2 \dd{x}
    \\
    &= \frac{S_n}{S_n} \lambda_n
\end{aligned}$$

Where $$S_n$$ is the normalization of $$u_n$$.
In other words, when given $$u_n$$ as input,
the functional $$R$$ returns the corresponding eigenvalue $$\lambda_n$$:

$$\begin{aligned}
    \boxed{
        R[u_n]
        = \lambda_n
    }
\end{aligned}$$
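
For instance, with $$p = w = 1$$ and $$q = 0$$ on $$[0, \pi]$$,
and Dirichlet BCs $$u(0) = u(\pi) = 0$$,
the eigenfunctions are $$u_n = \sin(n x)$$ with $$\lambda_n = n^2$$,
and indeed:

$$\begin{aligned}
    R[\sin(n x)]
    = \frac{\displaystyle \int_0^\pi n^2 \cos^2(n x) \dd{x}}{\displaystyle \int_0^\pi \sin^2(n x) \dd{x}}
    = \frac{n^2 \pi / 2}{\pi / 2}
    = n^2
    = \lambda_n
\end{aligned}$$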

This powerful result was not at all clear from $$R$$'s initial definition.
Note that some authors use the opposite sign for $$\lambda$$ in their SLP definition,
in which case this result can still be obtained
simply by also defining $$R$$ with the opposite sign.
This sign choice is consistent with quantum mechanics,
with the Hamiltonian $$\hat{H} = - \hat{L}$$.



## Justification

But what if we do not know the eigenfunctions? Is $$R$$ still useful?
Yes, as we shall see. Suppose we make an educated guess $$u(x)$$
for the ground state (i.e. lowest-eigenvalue) solution $$u_0(x)$$:

$$\begin{aligned}
    u(x)
    = u_0(x) + \sum_{n = 1}^\infty c_n u_n(x)
\end{aligned}$$

Here, we are using the fact that the eigenfunctions of an SLP form a complete set,
so our (known) guess $$u$$ can be expanded in the true (unknown) eigenfunctions $$u_n$$.
Next, by definition:

$$\begin{aligned}
    \boxed{
        R[u]
        = - \frac{\displaystyle\int u^* \hat{L} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
    }
\end{aligned}$$

This quantity is known as the **Rayleigh quotient**,
and again beware of the sign in its definition; see the remark above.
Inserting our ansatz $$u$$,
and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$,
we have:

$$\begin{aligned}
    R[u]
    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{L} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
    {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}}
    \\
    &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big)
    \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
    {\displaystyle\int w  \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}}
\end{aligned}$$

For convenience, we switch to [Dirac notation](/know/concept/dirac-notation/)
before evaluating further:

$$\begin{aligned}
    R[u]
    &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big)
    \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)}
    {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)}
    \\
    &= \frac{\displaystyle \lambda_0 \inprod{u_0}{w u_0} + \lambda_0 \sum_{n} c_n^* \inprod{u_n}{w u_0}
    + \sum_{n} c_n \lambda_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \inprod{u_m}{w u_n}}
    {\displaystyle \inprod{u_0}{w u_0} + \sum_{n} c_n^* \inprod{u_n}{w u_0}
    + \sum_{n} c_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \inprod{u_m}{w u_n}}
\end{aligned}$$

Using orthogonality $$\inprod{u_m}{w u_n} = S_n \delta_{mn}$$,
and the fact that $$n \neq 0$$ by definition, we find:

$$\begin{aligned}
    R[u]
    &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0}
    + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}}
    {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}}
    \\
    &= \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
    {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n}
\end{aligned}$$

It is always possible to choose our normalizations such that $$S_n = S$$ for all $$u_n$$, leaving:

$$\begin{aligned}
    R[u]
    &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$

And finally, after rearranging the numerator, we arrive at the following relation:

$$\begin{aligned}
    R[u]
    &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
    \\
    &= \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
    {\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$

Thus, if we improve our guess $$u$$ (i.e. reduce $$|c_n|$$),
then $$R[u]$$ approaches the true eigenvalue $$\lambda_0$$.
For numerically finding $$u_0$$ and $$\lambda_0$$, this gives us a clear goal: minimize $$R$$, because:

$$\begin{aligned}
    \boxed{
        R[u]
        = \lambda_0 + \frac{\displaystyle \sum_{n = 1}^\infty |c_n|^2 (\lambda_n - \lambda_0)}
        {\displaystyle 1 + \sum_{n = 1}^\infty |c_n|^2}
        \ge \lambda_0
    }
\end{aligned}$$

In the context of quantum mechanics, this is not surprising,
since any superposition of multiple states
is guaranteed to have a higher energy than the ground state.

As our guess $$u$$ is improved (i.e. as the $$|c_n|$$ shrink),
the error in the estimate of $$\lambda_0$$ decreases as $$|c_n|^2$$,
while the error in $$u$$ itself only decreases as $$|c_n|$$,
so even a fairly bad ansatz $$u$$ gives a decent estimate for $$\lambda_0$$.
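
This behavior is easy to demonstrate numerically.
The following sketch (assuming the illustrative case $$p = w = 1$$, $$q = 0$$
on $$[0, \pi]$$ with $$u(0) = u(\pi) = 0$$,
whose true ground state is $$u_0 = \sin(x)$$ with $$\lambda_0 = 1$$)
evaluates the Rayleigh quotient of the crude polynomial ansatz $$u = x (\pi - x)$$:

```python
import numpy as np

# Rayleigh quotient  R[u] = ∫|u'|² dx / ∫|u|² dx  for p = w = 1, q = 0,
# applied to the SLP  -u'' = λu  on [0, π] with u(0) = u(π) = 0,
# whose exact ground state is u0 = sin(x) with λ0 = 1.
x = np.linspace(0.0, np.pi, 100_001)
dx = x[1] - x[0]

def trapezoid(y):
    """Trapezoidal rule on the uniform grid x."""
    return float(np.sum(y[1:] + y[:-1]) * 0.5 * dx)

u = x * (np.pi - x)    # crude ansatz satisfying the BCs
ux = np.pi - 2.0 * x   # its exact derivative

R = trapezoid(ux**2) / trapezoid(u**2)
print(R)  # ≈ 10/π² ≈ 1.0132: only 1.3% above λ0 = 1, despite the rough guess
```

The exact value here is $$R = 10 / \pi^2$$, illustrating both the upper-bound
property $$R[u] \ge \lambda_0$$ and the fact that a first-order error in $$u$$
causes only a second-order error in the estimate of $$\lambda_0$$.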



## The method

In the following, we stick to Dirac notation,
since the results hold for both continuous functions $$u(x)$$
and discrete vectors $$\vb{u}$$,
as long as the operator $$\hat{L}$$ is self-adjoint.
Suppose we express our guess $$\Ket{u}$$ as a linear combination
of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$,
where $$\Ket{f_n}$$ are not necessarily eigenvectors of $$\hat{L}$$:

$$\begin{aligned}
    \Ket{u}
    = \sum_{n = 0}^\infty c_n \Ket{u_n}
    = \sum_{n = 0}^\infty a_n \Ket{f_n}
    \approx \sum_{n = 0}^{N - 1} a_n \Ket{f_n}
\end{aligned}$$

For numerical tractability, we truncate the sum at $$N$$ terms,
and for generality we allow $$\Ket{f_n}$$ to be non-orthogonal,
as described by an *overlap matrix* with elements $$S_{mn}$$:

$$\begin{aligned}
    \inprod{f_m}{w f_n} = S_{m n}
\end{aligned}$$

From the discussion above,
we know that the ground-state eigenvalue $$\lambda_0$$ is estimated by:

$$\begin{aligned}
    \lambda_0
    \approx R[u]
    = - \frac{\inprod{u}{\hat{L} u}}{\inprod{u}{w u}}
    = - \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{L} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{w f_n}}
    \equiv - \frac{\displaystyle \sum_{m n} a_m^* a_n L_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
\end{aligned}$$

And we also know that our goal is to minimize $$R[u]$$,
so we vary $$a_k^*$$ to find its extremum:

$$\begin{aligned}
    0
    = - \pdv{R}{a_k^*}
    &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
    - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* L_{mn} \Big)}
    {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2}
    \\
    &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) + R[u] \Big( \sum_{n} a_n S_{k n}\Big)}
    {\displaystyle \sum_{m n} a_n a_m^* S_{mn}}
    \\
    &= \sum_{n} a_n \frac{\big(L_{k n} + \lambda S_{k n}\big)}{\inprod{u}{w u}}
\end{aligned}$$

Where in the second step we used $$\sum_{m n} a_m^* a_n L_{mn} = - R[u] \sum_{m n} a_m^* a_n S_{mn}$$.
Clearly, this is only satisfied if the following holds for all $$k = 0, 1, ..., N\!-\!1$$:

$$\begin{aligned}
    0
    = \sum_{n = 0}^{N - 1} a_n \big(L_{k n} + \lambda S_{k n}\big)
\end{aligned}$$

For illustrative purposes,
we can write this as a matrix equation
with $$M_{k n} \equiv L_{k n} + \lambda S_{k n}$$:

$$\begin{aligned}
    \begin{bmatrix}
        M_{0,0} & M_{0,1} & \cdots & M_{0,N-1} \\
        M_{1,0} & \ddots & & \vdots \\
        \vdots & & \ddots & \vdots \\
        M_{N-1,0} & \cdots & \cdots & M_{N-1,N-1}
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
        a_0 \\ a_1 \\ \vdots \\ a_{N-1}
    \end{bmatrix}
    =
    \begin{bmatrix}
        0 \\ 0 \\ \vdots \\ 0
    \end{bmatrix}
\end{aligned}$$

This looks like a generalized eigenvalue problem for $$\lambda$$,
so we demand that its determinant vanishes:

$$\begin{aligned}
    0
    = \det\!\Big[ \bar{M} \Big]
    = \det\!\Big[ \bar{L} + \lambda \bar{S} \Big]
\end{aligned}$$

This gives a set of $$\lambda$$,
which are exact generalized eigenvalues of $$-\bar{L}$$ with respect to $$\bar{S}$$,
and estimates of the true eigenvalues of the SLP
(recall that $$\bar{L}$$ is $$\hat{L}$$ expressed in the truncated basis,
and that the SLP reads $$\hat{L} u = - \lambda w u$$).
The eigenvector $$\big[ a_0, a_1, ..., a_{N-1} \big]$$ of the lowest $$\lambda$$
gives the optimal weights $$a_n$$ to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$.
Likewise, the eigenvectors of the higher $$\lambda$$s approximate
excited (i.e. non-ground) eigenstates of $$\hat{L}$$,
although in practice the results become less accurate the higher we go.
If we only care about the ground state,
then we already know $$\lambda$$ from $$R[u]$$,
so we just need to solve the matrix equation for $$a_n$$.
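
To make the recipe concrete, here is a minimal numerical sketch
(all specific choices are illustrative assumptions):
we solve $$-u_{xx} = \lambda u$$ on $$[0, \pi]$$ with $$u(0) = u(\pi) = 0$$
(true eigenvalues $$\lambda = 1, 4, 9, ...$$)
using the deliberately non-orthogonal polynomial basis $$f_n(x) = x^{n+1} (\pi - x)$$.
Integrating by parts gives $$L_{mn} = \inprod{f_m}{\hat{L} f_n} = - \int f_m' \: f_n' \dd{x}$$,
so the stationarity condition is equivalent to the generalized eigenvalue problem
$$\bar{K} \vb{a} = \lambda \bar{S} \vb{a}$$ with the stiffness matrix $$K_{mn} \equiv - L_{mn}$$:

```python
import numpy as np

# Ritz method for the SLP  -u'' = λu  on [0, π] with u(0) = u(π) = 0,
# whose exact eigenvalues are 1, 4, 9, ...
# Basis: non-orthogonal polynomials f_n(x) = x^(n+1) (π - x), n = 0 .. N-1,
# all of which satisfy the Dirichlet boundary conditions.
N = 5
x = np.linspace(0.0, np.pi, 20_001)
dx = x[1] - x[0]

f  = np.array([x**(n + 1) * (np.pi - x) for n in range(N)])
fx = np.array([(n + 1) * x**n * (np.pi - x) - x**(n + 1) for n in range(N)])

def gram(g, h):
    """All pairwise integrals ∫ g_m h_n dx via the trapezoidal rule."""
    w = np.full_like(x, dx)
    w[0] = w[-1] = dx / 2
    return np.einsum('mi,ni,i->mn', g, h, w)

S = gram(f, f)    # overlap matrix    S_mn = <f_m|f_n>           (w = 1)
K = gram(fx, fx)  # stiffness matrix  K_mn = ∫ f_m' f_n' dx = -L_mn

# Solve the generalized eigenvalue problem  K a = λ S a  by reducing it
# to an ordinary symmetric one with a Cholesky factorization of S:
C = np.linalg.cholesky(S)
Cinv = np.linalg.inv(C)
lam, V = np.linalg.eigh(Cinv @ K @ Cinv.T)
a = Cinv.T @ V  # column k holds the optimal weights a_n for eigenvalue lam[k]

print(lam[:3])  # ascending estimates of 1, 4, 9; accuracy decreases upward
```

With only five basis functions, the ground-state estimate is already accurate
to several decimal places; equivalently, one could call
`scipy.linalg.eigh(K, S)` to solve the generalized problem directly.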

You may find this result unsurprising:
it makes some intuitive sense that approximating $$\hat{L}$$
in a limited basis would yield a matrix $$\bar{L}$$ giving rough eigenvalues.
The point of this discussion is to rigorously show
the validity of this approach.

Nowadays, there exist many other methods to calculate eigenvalues
of complicated operators $$\hat{L}$$,
but an attractive feature of the Ritz method is that it is single-step,
whereas its competitors tend to be iterative.
That said, this method cannot recover from a poorly chosen basis $$\{\Ket{f_n}\}$$.

Indeed, the overall accuracy is determined by how good our truncated basis is,
i.e. how large a subspace it spans
of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides.
Clearly, adding more basis vectors improves the results,
but at a computational cost;
it is usually more efficient to carefully choose *which* $$\Ket{f_n}$$ to use,
rather than just *how many*.



## References
1.  G.B. Arfken, H.J. Weber,
    *Mathematical methods for physicists*, 6th edition, 2005,
    Elsevier.
2.  O. Bang,
    *Applied mathematics for physicists: lecture notes*, 2019,
    unpublished.