-rw-r--r--  source/know/concept/electromagnetic-wave-equation/index.md | 347
-rw-r--r--  source/know/concept/martingale/index.md                    |   2
-rw-r--r--  source/know/concept/ritz-method/index.md                   | 201
3 files changed, 298 insertions, 252 deletions
diff --git a/source/know/concept/electromagnetic-wave-equation/index.md b/source/know/concept/electromagnetic-wave-equation/index.md
index a27fe6f..559d943 100644
--- a/source/know/concept/electromagnetic-wave-equation/index.md
+++ b/source/know/concept/electromagnetic-wave-equation/index.md
@@ -1,7 +1,7 @@
---
title: "Electromagnetic wave equation"
sort_title: "Electromagnetic wave equation"
-date: 2021-09-09
+date: 2024-09-08 # Originally 2021-09-09, major rewrite
categories:
- Physics
- Electromagnetism
@@ -9,236 +9,281 @@ categories:
layout: "concept"
---
-The electromagnetic wave equation describes
-the propagation of light through various media.
-Since an electromagnetic (light) wave consists of
+Light, i.e. **electromagnetic waves**, consists of
an [electric field](/know/concept/electric-field/)
and a [magnetic field](/know/concept/magnetic-field/),
-we need [Maxwell's equations](/know/concept/maxwells-equations/)
-in order to derive the wave equation.
+one inducing the other and vice versa.
+The existence and classical behavior of such waves
+can be derived using only [Maxwell's equations](/know/concept/maxwells-equations/),
+as we will demonstrate here.
-
-## Uniform medium
-
-We will use all of Maxwell's equations,
-but we start with Ampère's circuital law for the "free" fields $$\vb{H}$$ and $$\vb{D}$$,
-in the absence of a free current $$\vb{J}_\mathrm{free} = 0$$:
-
-$$\begin{aligned}
- \nabla \cross \vb{H}
- = \pdv{\vb{D}}{t}
-\end{aligned}$$
-
-We assume that the medium is isotropic, linear,
-and uniform in all of space, such that:
+We start from Faraday's law of induction,
+where we assume that the system consists of materials
+with well-known (linear) relative magnetic permeabilities $$\mu_r(\vb{r})$$,
+such that $$\vb{B} = \mu_0 \mu_r \vb{H}$$:
$$\begin{aligned}
- \vb{D} = \varepsilon_0 \varepsilon_r \vb{E}
- \qquad \quad
- \vb{H} = \frac{1}{\mu_0 \mu_r} \vb{B}
+ \nabla \cross \vb{E}
+ = - \pdv{\vb{B}}{t}
+ = - \mu_0 \mu_r \pdv{\vb{H}}{t}
\end{aligned}$$
-Which, upon insertion into Ampère's law,
-yields an equation relating $$\vb{B}$$ and $$\vb{E}$$.
-This may seem to contradict Ampère's "total" law,
-but keep in mind that $$\vb{J}_\mathrm{bound} \neq 0$$ here:
+We move $$\mu_r(\vb{r})$$ to the other side,
+take the curl, and insert Ampère's circuital law:
$$\begin{aligned}
- \nabla \cross \vb{B}
- = \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdv{\vb{E}}{t}
+ \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg)
+ &= - \mu_0 \pdv{}{t} \big( \nabla \cross \vb{H} \big)
+ \\
+ &= - \mu_0 \bigg( \pdv{\vb{J}_\mathrm{free}}{t} + \pdvn{2}{\vb{D}}{t} \bigg)
\end{aligned}$$
-Now we take the curl, rearrange,
-and substitute $$\nabla \cross \vb{E}$$ according to Faraday's law:
+For simplicity, we only consider insulating materials,
+since light propagation in conductors is a complex beast.
+We thus assume that there are no free currents $$\vb{J}_\mathrm{free} = 0$$, leaving:
$$\begin{aligned}
- \nabla \cross (\nabla \cross \vb{B})
- = \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdv{}{t}(\nabla \cross \vb{E})
- = - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{B}}{t}
+ \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg)
+ &= - \mu_0 \pdvn{2}{\vb{D}}{t}
\end{aligned}$$
-Using a vector identity, we rewrite the leftmost expression,
-which can then be reduced thanks to Gauss' law for magnetism $$\nabla \cdot \vb{B} = 0$$:
+Having $$\vb{E}$$ and $$\vb{D}$$ in the same equation is not ideal,
+so we should make a choice:
+do we restrict ourselves to linear media
+(so $$\vb{D} = \varepsilon_0 \varepsilon_r \vb{E}$$),
+or do we allow materials with more complicated responses
+(so $$\vb{D} = \varepsilon_0 \vb{E} + \vb{P}$$, with $$\vb{P}$$ unspecified)?
+The former is usually sufficient:
$$\begin{aligned}
- - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{B}}{t}
- &= \nabla (\nabla \cdot \vb{B}) - \nabla^2 \vb{B}
- = - \nabla^2 \vb{B}
+ \boxed{
+ \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg)
+ = - \mu_0 \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t}
+ }
\end{aligned}$$
-This describes $$\vb{B}$$.
-Next, we repeat the process for $$\vb{E}$$:
-taking the curl of Faraday's law yields:
+This is the general linear form of the **electromagnetic wave equation**,
+where $$\mu_r$$ and $$\varepsilon_r$$
+both depend on $$\vb{r}$$ in order to describe the structure of the system.
+We can obtain a similar equation for $$\vb{H}$$,
+by starting from Ampère's law under the same assumptions:
$$\begin{aligned}
- \nabla \cross (\nabla \cross \vb{E})
- = - \pdv{}{t}(\nabla \cross \vb{B})
- = - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t}
+ \nabla \cross \vb{H}
+ = \pdv{\vb{D}}{t}
+ = \varepsilon_0 \varepsilon_r \pdv{\vb{E}}{t}
\end{aligned}$$
-Which can be rewritten using same vector identity as before,
-and then reduced by assuming that there is no net charge density $$\rho = 0$$
-in Gauss' law, such that $$\nabla \cdot \vb{E} = 0$$:
+Taking the curl and substituting Faraday's law on the right yields:
$$\begin{aligned}
- - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t}
- &= \nabla (\nabla \cdot \vb{E}) - \nabla^2 \vb{E}
- = - \nabla^2 \vb{E}
+ \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg)
+ &= \varepsilon_0 \pdv{}{t} \big( \nabla \cross \vb{E} \big)
+ = - \varepsilon_0 \pdvn{2}{\vb{B}}{t}
\end{aligned}$$
-We thus arrive at the following two (implicitly coupled)
-wave equations for $$\vb{E}$$ and $$\vb{B}$$,
-where we have defined the phase velocity $$v \equiv 1 / \sqrt{\mu_0 \mu_r \varepsilon_0 \varepsilon_r}$$:
+And then we insert $$\vb{B} = \mu_0 \mu_r \vb{H}$$ to get the analogous
+electromagnetic wave equation for $$\vb{H}$$:
$$\begin{aligned}
\boxed{
- \pdvn{2}{\vb{E}}{t} - \frac{1}{v^2} \nabla^2 \vb{E}
- = 0
- }
- \qquad \quad
- \boxed{
- \pdvn{2}{\vb{B}}{t} - \frac{1}{v^2} \nabla^2 \vb{B}
- = 0
+ \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg)
+ = - \mu_0 \varepsilon_0 \mu_r \pdvn{2}{\vb{H}}{t}
}
\end{aligned}$$
-Traditionally, it is said that the solutions are as follows,
-where the wavenumber $$|\vb{k}| = \omega / v$$:
-
-$$\begin{aligned}
- \vb{E}(\vb{r}, t)
- &= \vb{E}_0 \exp(i \vb{k} \cdot \vb{r} - i \omega t)
- \\
- \vb{B}(\vb{r}, t)
- &= \vb{B}_0 \exp(i \vb{k} \cdot \vb{r} - i \omega t)
-\end{aligned}$$
-
-In fact, thanks to linearity, these **plane waves** can be treated as
-terms in a Fourier series, meaning that virtually
-*any* function $$f(\vb{k} \cdot \vb{r} - \omega t)$$ is a valid solution.
+This is equivalent to the problem for $$\vb{E}$$,
+since they are coupled by Maxwell's equations.
+By solving either, subject to Gauss's laws
+$$\nabla \cdot (\varepsilon_r \vb{E}) = 0$$ and $$\nabla \cdot (\mu_r \vb{H}) = 0$$,
+the behavior of light in a given system can be deduced.
+Note that in a uniform medium, Gauss's laws enforce that the waves are transverse,
+i.e. the fields must be perpendicular to the propagation direction.
-Keep in mind that in reality $$\vb{E}$$ and $$\vb{B}$$ are real,
-so although it is mathematically convenient to use plane waves,
-in the end you will need to take the real part.
-## Non-uniform medium
+## Homogeneous linear media
-A useful generalization is to allow spatial change
-in the relative permittivity $$\varepsilon_r(\vb{r})$$
-and the relative permeability $$\mu_r(\vb{r})$$.
-We still assume that the medium is linear and isotropic, so:
+In the special case where the medium is completely uniform,
+$$\mu_r$$ and $$\varepsilon_r$$ no longer depend on $$\vb{r}$$,
+so they can be moved outside the curls:
$$\begin{aligned}
- \vb{D}
- = \varepsilon_0 \varepsilon_r(\vb{r}) \vb{E}
- \qquad \quad
- \vb{B}
- = \mu_0 \mu_r(\vb{r}) \vb{H}
+ \nabla \cross \big( \nabla \cross \vb{E} \big)
+ &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t}
+ \\
+ \nabla \cross \big( \nabla \cross \vb{H} \big)
+ &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{H}}{t}
\end{aligned}$$
-Inserting these expressions into Faraday's and Ampère's laws
-respectively yields:
+This can be rewritten using the vector identity
+$$\nabla \cross (\nabla \cross \vb{V}) = \nabla (\nabla \cdot \vb{V}) - \nabla^2 \vb{V}$$:
$$\begin{aligned}
- \nabla \cross \vb{E}
- = - \mu_0 \mu_r(\vb{r}) \pdv{\vb{H}}{t}
- \qquad \quad
- \nabla \cross \vb{H}
- = \varepsilon_0 \varepsilon_r(\vb{r}) \pdv{\vb{E}}{t}
+ \nabla (\nabla \cdot \vb{E}) - \nabla^2 \vb{E}
+ &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t}
+ \\
+ \nabla (\nabla \cdot \vb{H}) - \nabla^2 \vb{H}
+ &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{H}}{t}
\end{aligned}$$
-We then divide Ampère's law by $$\varepsilon_r(\vb{r})$$,
-take the curl, and substitute Faraday's law, giving:
+Which can be reduced using Gauss's laws
+$$\nabla \cdot \vb{E} = 0$$ and $$\nabla \cdot \vb{H} = 0$$
+thanks to the fact that $$\varepsilon_r$$ and $$\mu_r$$ are constants in this case.
+We therefore arrive at:
$$\begin{aligned}
- \nabla \cross \Big( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \Big)
- = \varepsilon_0 \pdv{}{t}(\nabla \cross \vb{E})
- = - \mu_0 \mu_r \varepsilon_0 \pdvn{2}{\vb{H}}{t}
+ \boxed{
+ \nabla^2 \vb{E} - \frac{n^2}{c^2} \pdvn{2}{\vb{E}}{t}
+ = 0
+ }
\end{aligned}$$
-Next, we exploit linearity by decomposing $$\vb{H}$$ and $$\vb{E}$$
-into Fourier series, with terms given by:
-
$$\begin{aligned}
- \vb{H}(\vb{r}, t)
- = \vb{H}(\vb{r}) \exp(- i \omega t)
- \qquad \quad
- \vb{E}(\vb{r}, t)
- = \vb{E}(\vb{r}) \exp(- i \omega t)
+ \boxed{
+ \nabla^2 \vb{H} - \frac{n^2}{c^2} \pdvn{2}{\vb{H}}{t}
+ = 0
+ }
\end{aligned}$$
-By inserting this ansatz into the equation,
-we can remove the explicit time dependence:
+Where $$c = 1 / \sqrt{\mu_0 \varepsilon_0}$$ is the speed of light in a vacuum,
+and $$n = \sqrt{\mu_r \varepsilon_r}$$ is the refractive index of the medium.
+Note that most authors write the magnetic equation with $$\vb{B}$$ instead of $$\vb{H}$$;
+both are correct thanks to linearity.
+
+In a vacuum, where $$n = 1$$, these equations are sometimes written as
+$$\square \vb{E} = 0$$ and $$\square \vb{H} = 0$$,
+where $$\square$$ is the **d'Alembert operator**, defined as follows:
$$\begin{aligned}
- \nabla \cross \Big( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \Big) \exp(- i \omega t)
- = \mu_0 \varepsilon_0 \omega^2 \mu_r \vb{H} \exp(- i \omega t)
+ \boxed{
+ \square
+ \equiv \nabla^2 - \frac{1}{c^2} \pdvn{2}{}{t}
+ }
\end{aligned}$$
-Dividing out $$\exp(- i \omega t)$$,
-we arrive at an eigenvalue problem for $$\omega^2$$,
-with $$c = 1 / \sqrt{\mu_0 \varepsilon_0}$$:
+Note that some authors define it with the opposite sign.
+In any case, the d'Alembert operator is important for special relativity.
+
+The solutions to the homogeneous electromagnetic wave equation
+are traditionally said to be the **plane waves** given by:
$$\begin{aligned}
- \boxed{
- \nabla \cross \Big( \frac{1}{\varepsilon_r(\vb{r})} \nabla \cross \vb{H}(\vb{r}) \Big)
- = \Big( \frac{\omega}{c} \Big)^2 \mu_r(\vb{r}) \vb{H}(\vb{r})
- }
+ \vb{E}(\vb{r}, t)
+ &= \vb{E}_0 e^{i \vb{k} \cdot \vb{r} - i \omega t}
+ \\
+ \vb{B}(\vb{r}, t)
+ &= \vb{B}_0 e^{i \vb{k} \cdot \vb{r} - i \omega t}
\end{aligned}$$
-Compared to a uniform medium, $$\omega$$ is often not arbitrary here:
-there are discrete eigenvalues $$\omega$$,
-corresponding to discrete **modes** $$\vb{H}(\vb{r})$$.
+Where the wavevector $$\vb{k}$$ is arbitrary,
+and the angular frequency $$\omega = c |\vb{k}| / n$$.
+We also often talk about the wavelength, which is $$\lambda = 2 \pi / |\vb{k}|$$.
+The appearance of $$\vb{k}$$ in the exponent
+tells us that these waves are propagating through space,
+as you would expect.
+
+In fact, because the wave equations are linear,
+any superposition of plane waves,
+i.e. any function of the form $$f(\vb{k} \cdot \vb{r} - \omega t)$$,
+is also a valid solution.
+Just remember that $$\vb{E}$$ and $$\vb{H}$$ are real-valued,
+so it may be necessary to take the real part at the end of a calculation.
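
As a quick sanity check of the dispersion relation $$\omega = c |\vb{k}| / n$$, the following sketch (my own illustration, not part of the original text; the values of $$n$$ and the wavelength are arbitrary choices) verifies numerically that such a plane wave satisfies the 1D form of the wave equation:

```python
import numpy as np

# Sanity check: u = sin(k x - w t) with w = c k / n should satisfy
# the 1D wave equation  d^2u/dx^2 - (n^2/c^2) d^2u/dt^2 = 0.
c = 3.0e8                 # vacuum speed of light (m/s)
n = 1.5                   # refractive index, e.g. glass (arbitrary choice)
k = 2 * np.pi / 500e-9    # wavenumber of a 500 nm wave
w = c * k / n             # dispersion relation from the text

def u(x, t):
    return np.sin(k * x - w * t)

# Second derivatives via central finite differences
x0, t0 = 1e-7, 1e-16
dx, dt = 1e-10, 1e-18
d2x = (u(x0 + dx, t0) - 2 * u(x0, t0) + u(x0 - dx, t0)) / dx**2
d2t = (u(x0, t0 + dt) - 2 * u(x0, t0) + u(x0, t0 - dt)) / dt**2

residual = d2x - (n**2 / c**2) * d2t
print(abs(residual) / k**2)  # relative residual: tiny
```

The residual is not exactly zero only because of the finite-difference truncation error.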
-Next, we go through the same process to find an equation for $$\vb{E}$$.
-Starting from Faraday's law, we divide by $$\mu_r(\vb{r})$$,
-take the curl, and insert Ampère's law:
+
+
+## Inhomogeneous linear media
+
+But suppose the medium is not uniform, i.e. it contains structures
+described by $$\varepsilon_r(\vb{r})$$ and $$\mu_r(\vb{r})$$.
+If the structures are much larger than the light's wavelength,
+the homogeneous equation is still a very good approximation
+away from any material boundaries;
+anywhere else, however, it breaks down.
+Recall the general equations from before we assumed homogeneity:
$$\begin{aligned}
- \nabla \cross \Big( \frac{1}{\mu_r} \nabla \cross \vb{E} \Big)
- = - \mu_0 \pdv{}{t}(\nabla \cross \vb{H})
- = - \mu_0 \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t}
+ \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg)
+ &= - \frac{\varepsilon_r}{c^2} \pdvn{2}{\vb{E}}{t}
+ \\
+ \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg)
+ &= - \frac{\mu_r}{c^2} \pdvn{2}{\vb{H}}{t}
\end{aligned}$$
-Then, by replacing $$\vb{E}(\vb{r}, t)$$ with our plane-wave ansatz,
-we remove the time dependence:
+In theory, this is everything we need,
+but in most cases a better approach is possible:
+the trick is that we only rarely need to explicitly calculate
+the $$t$$-dependence of $$\vb{E}$$ or $$\vb{H}$$.
+Instead, we can first solve an easier time-independent version
+of this problem, and then approximate the dynamics
+with [coupled mode theory](/know/concept/coupled-mode-theory/) later.
+
+To eliminate $$t$$, we make an ansatz for $$\vb{E}$$ and $$\vb{H}$$, shown below.
+No generality is lost by doing this;
+this is effectively a kind of [Fourier transform](/know/concept/fourier-transform/):
$$\begin{aligned}
- \nabla \cross \Big( \frac{1}{\mu_r} \nabla \cross \vb{E} \Big) \exp(- i \omega t)
- = - \mu_0 \varepsilon_0 \omega^2 \varepsilon_r \vb{E} \exp(- i \omega t)
+ \vb{E}(\vb{r}, t)
+ &= \vb{E}(\vb{r}) e^{- i \omega t}
+ \\
+ \vb{H}(\vb{r}, t)
+ &= \vb{H}(\vb{r}) e^{- i \omega t}
\end{aligned}$$
-Which, after dividing out $$\exp(- i \omega t)$$,
-yields an analogous eigenvalue problem with $$\vb{E}(r)$$:
+Inserting this ansatz and dividing out $$e^{-i \omega t}$$
+yields the time-independent forms:
$$\begin{aligned}
\boxed{
- \nabla \cross \Big( \frac{1}{\mu_r(\vb{r})} \nabla \cross \vb{E}(\vb{r}) \Big)
- = \Big( \frac{\omega}{c} \Big)^2 \varepsilon_r(\vb{r}) \vb{E}(\vb{r})
+ \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg)
+ = \Big( \frac{\omega}{c} \Big)^2 \varepsilon_r \vb{E}
}
\end{aligned}$$
-Usually, it is a reasonable approximation
-to say $$\mu_r(\vb{r}) = 1$$,
-in which case the equation for $$\vb{H}(\vb{r})$$
-becomes a Hermitian eigenvalue problem,
-and is thus easier to solve than for $$\vb{E}(\vb{r})$$.
+$$\begin{aligned}
+ \boxed{
+ \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg)
+ = \Big( \frac{\omega}{c} \Big)^2 \mu_r \vb{H}
+ }
+\end{aligned}$$
-Keep in mind, however, that in any case,
-the solutions $$\vb{H}(\vb{r})$$ and/or $$\vb{E}(\vb{r})$$
-must satisfy the two Maxwell's equations that were not explicitly used:
+These are eigenvalue problems for $$\omega^2$$,
+which can be solved subject to Gauss's laws and suitable boundary conditions.
+The resulting allowed values of $$\omega$$ may consist of
+continuous ranges and/or discrete resonances,
+analogous to *scattering* and *bound* quantum states, respectively.
+It can be shown that the operators on both sides of each equation
+are Hermitian, meaning these are well-behaved problems
+yielding real eigenvalues and orthogonal eigenfields.
+
+Both equations are still equivalent:
+we only need to solve one. But which one?
+In practice, one is usually easier than the other,
+due to the common approximation that $$\mu_r \approx 1$$ for many dielectric materials,
+in which case the equations reduce to:
$$\begin{aligned}
- \nabla \cdot (\varepsilon_r \vb{E}) = 0
- \qquad \quad
- \nabla \cdot (\mu_r \vb{H}) = 0
+ \nabla \cross \big( \nabla \cross \vb{E} \big)
+ &= \Big( \frac{\omega}{c} \Big)^2 \varepsilon_r \vb{E}
+ \\
+ \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg)
+ &= \Big( \frac{\omega}{c} \Big)^2 \vb{H}
\end{aligned}$$
-This is equivalent to demanding that the resulting waves are *transverse*,
-or in other words,
-the wavevector $$\vb{k}$$ must be perpendicular to
-the amplitudes $$\vb{H}_0$$ and $$\vb{E}_0$$.
+Now the equation for $$\vb{H}$$ is starting to look simpler,
+because it only has an operator on *one* side.
+We could "fix" the equation for $$\vb{E}$$ by dividing it by $$\varepsilon_r$$,
+but the resulting operator would no longer be Hermitian,
+and hence not well-behaved.
+To get an idea of how to handle $$\varepsilon_r$$ in the $$\vb{E}$$-equation,
+notice its similarity to the weight function $$w$$
+in [Sturm-Liouville theory](/know/concept/sturm-liouville-theory/).
+
+Gauss's magnetic law $$\nabla \cdot \vb{H} = 0$$
+is also significantly easier for numerical calculations
+than its electric counterpart $$\nabla \cdot (\varepsilon_r \vb{E}) = 0$$,
+so we usually prefer to solve the equation for $$\vb{H}$$.
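
To make the time-independent eigenvalue problem concrete, here is a minimal numerical sketch (my own toy example, not from the text): a 1D mirror cavity containing two dielectric layers with $$\mu_r = 1$$, where the $$\vb{E}$$-equation reduces to $$-E''(x) = (\omega/c)^2 \varepsilon_r(x) E(x)$$, i.e. a generalized Hermitian eigenvalue problem with $$\varepsilon_r$$ as the weight:

```python
import numpy as np
from scipy.linalg import eigh

# 1D sketch: -E''(x) = (w/c)^2 eps_r(x) E(x), discretized by finite
# differences, becomes the generalized problem  A E = (w/c)^2 B E.
N, L = 400, 1.0
x = np.linspace(0.0, L, N)
h = x[1] - x[0]
eps_r = np.where(x < L / 2, 1.0, 4.0)  # two dielectric layers

# A = -d^2/dx^2 with E = 0 at the walls (perfect mirrors)
A = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
B = np.diag(eps_r)

vals, modes = eigh(A, B)           # Hermitian: real eigenvalues,
omega_over_c = np.sqrt(vals[:3])   # orthogonal modes; lowest resonances
print(omega_over_c)
```

Because the cavity is closed, the allowed $$\omega$$ are discrete resonances; with open boundaries they would instead form continuous ranges.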
+
## References
diff --git a/source/know/concept/martingale/index.md b/source/know/concept/martingale/index.md
index 53a346a..7daebea 100644
--- a/source/know/concept/martingale/index.md
+++ b/source/know/concept/martingale/index.md
@@ -20,7 +20,7 @@ then $$M_t$$ is a martingale if it satisfies all of the following:
1. $$M_t$$ is $$\mathcal{F}_t$$-adapted, meaning
the filtration $$\mathcal{F}_t$$ contains enough information
to reconstruct the current and all past values of $$M_t$$.
-2. For all times $$t \ge 0$$, the expectation value exists $$\mathbf{E}(M_t) < \infty$$.
+2. For all times $$t \ge 0$$, the expectation value $$\mathbf{E}(M_t)$$ is finite.
3. For all $$s, t$$ satisfying $$0 \le s \le t$$,
the [conditional expectation](/know/concept/conditional-expectation/)
$$\mathbf{E}(M_t | \mathcal{F}_s) = M_s$$,
diff --git a/source/know/concept/ritz-method/index.md b/source/know/concept/ritz-method/index.md
index 902b7cf..ef694da 100644
--- a/source/know/concept/ritz-method/index.md
+++ b/source/know/concept/ritz-method/index.md
@@ -25,25 +25,26 @@ consider the following functional to be optimized:
$$\begin{aligned}
R[u]
- = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
+ \equiv \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$
Where $$u(x) \in \mathbb{C}$$ is the unknown function,
and $$p(x), q(x) \in \mathbb{R}$$ are given.
-In addition, $$S$$ is the norm of $$u$$, which we demand be constant
+In addition, $$S$$ is the norm of $$u$$, which we take to be constant
with respect to a weight function $$w(x) \in \mathbb{R}$$:
$$\begin{aligned}
S
- = \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
+ \equiv \int_a^b w(x) \big|u(x)\big|^2 \dd{x}
\end{aligned}$$
-To handle this normalization requirement,
-we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$,
-and define the Lagrangian $$\Lambda$$ for the full constrained optimization problem as:
+This normalization requirement acts as a constraint
+to the optimization problem for $$R[u]$$,
+so we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$,
+and define the Lagrangian $$\mathcal{L}$$ for the full problem as:
$$\begin{aligned}
- \Lambda
+ \mathcal{L}
\equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg)
\end{aligned}$$
@@ -51,7 +52,7 @@ The resulting Euler-Lagrange equation is then calculated in the standard way, yi
$$\begin{aligned}
0
- &= \pdv{\Lambda}{u^*} - \dv{}{x}\Big( \pdv{\Lambda}{u_x^*} \Big)
+ &= \pdv{\mathcal{L}}{u^*} - \dv{}{x}\Big( \pdv{\mathcal{L}}{u_x^*} \Big)
\\
&= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg)
\end{aligned}$$
@@ -69,15 +70,14 @@ SLPs have useful properties, but before we can take advantage of those,
we need to handle an important detail: the boundary conditions (BCs) on $$u$$.
The above equation is only a valid SLP for certain BCs,
as seen in the derivation of Sturm-Liouville theory.
-
-Let us return to the definition of $$R[u]$$,
+Let us return to the definition of $$R$$,
and integrate it by parts:
$$\begin{aligned}
R[u]
&= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x}
\\
- &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x}
+ &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x}
\end{aligned}$$
The boundary term vanishes for a subset of the BCs that make a valid SLP,
@@ -88,10 +88,11 @@ such that we can use Sturm-Liouville theory later:
$$\begin{aligned}
R[u]
&= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x}
- \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x}
+ \\
+ &\equiv - \frac{1}{S} \int_a^b u^* \hat{L} u \dd{x}
\end{aligned}$$
-Where $$\hat{H}$$ is the self-adjoint Sturm-Liouville operator.
+Where $$\hat{L}$$ is the self-adjoint Sturm-Liouville operator.
Because the constrained Euler-Lagrange equation is now an SLP,
we know that it has an infinite number of real discrete eigenvalues $$\lambda_n$$ with a lower bound,
corresponding to mutually orthogonal eigenfunctions $$u_n(x)$$.
@@ -102,16 +103,16 @@ and now insert one of the eigenfunctions $$u_n$$ into $$R$$:
$$\begin{aligned}
R[u_n]
- &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x}
- = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x}
+ &= - \frac{1}{S_n} \int_a^b u_n^* \hat{L} u_n \dd{x}
+ \\
+ &= \frac{1}{S_n} \int_a^b \lambda_n w |u_n|^2 \dd{x}
\\
- &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x}
- = \frac{S_n}{S_n} \lambda_n
+ &= \frac{S_n}{S_n} \lambda_n
\end{aligned}$$
Where $$S_n$$ is the normalization of $$u_n$$.
-In other words, when given $$u_n$$,
-the functional $$R$$ yields the corresponding eigenvalue $$\lambda_n$$:
+In other words, when given $$u_n$$ as input,
+the functional $$R$$ returns the corresponding eigenvalue $$\lambda_n$$:
$$\begin{aligned}
\boxed{
@@ -121,6 +122,11 @@ $$\begin{aligned}
\end{aligned}$$
This powerful result was not at all clear from $$R$$'s initial definition.
+Note that some authors use the opposite sign for $$\lambda$$ in their SLP definition,
+in which case this result can still be obtained
+simply by also defining $$R$$ with the opposite sign.
+This sign choice is consistent with quantum mechanics,
+with the Hamiltonian $$\hat{H} = - \hat{L}$$.
@@ -137,81 +143,79 @@ $$\begin{aligned}
Here, we are using the fact that the eigenfunctions of an SLP form a complete set,
so our (known) guess $$u$$ can be expanded in the true (unknown) eigenfunctions $$u_n$$.
-We are assuming that $$u$$ is already quite close to its target $$u_0$$,
-such that the (unknown) expansion coefficients $$c_n$$ are small;
-specifically $$|c_n|^2 \ll 1$$.
-Let us start from what we know:
+Next, by definition:
$$\begin{aligned}
\boxed{
R[u]
- = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
+ = - \frac{\displaystyle\int u^* \hat{L} u \dd{x}}{\displaystyle\int u^* w u \dd{x}}
}
\end{aligned}$$
-This quantity is known as the **Rayleigh quotient**.
+This quantity is known as the **Rayleigh quotient**,
+and again beware of the sign in its definition; see the remark above.
Inserting our ansatz $$u$$,
-and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$:
+and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$,
+we have:
$$\begin{aligned}
R[u]
- &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
+ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{L} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}}
{\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}}
\\
- &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
+ &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big)
+ \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}}
{\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}}
\end{aligned}$$
For convenience, we switch to [Dirac notation](/know/concept/dirac-notation/)
-before evaluating further.
+before evaluating further:
$$\begin{aligned}
- R
- &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)}
- {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)}
+ R[u]
+ &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big)
+ \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)}
+ {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)}
\\
- &= \frac{\displaystyle \lambda_0 \Inprod{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
- + \sum_{n = 1}^\infty c_n \lambda_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \Inprod{u_m}{w u_n}}
- {\displaystyle \Inprod{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0}
- + \sum_{n = 1}^\infty c_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \Inprod{u_m}{w u_n}}
+ &= \frac{\displaystyle \lambda_0 \inprod{u_0}{w u_0} + \lambda_0 \sum_{n} c_n^* \inprod{u_n}{w u_0}
+ + \sum_{n} c_n \lambda_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \inprod{u_m}{w u_n}}
+ {\displaystyle \inprod{u_0}{w u_0} + \sum_{n} c_n^* \inprod{u_n}{w u_0}
+ + \sum_{n} c_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \inprod{u_m}{w u_n}}
\end{aligned}$$
-Using orthogonality $$\Inprod{u_m}{w u_n} = S_n \delta_{mn}$$,
+Using orthogonality $$\inprod{u_m}{w u_n} = S_n \delta_{mn}$$,
and the fact that $$n \neq 0$$ by definition, we find:
$$\begin{aligned}
- R
+ R[u]
&= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0}
+ \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}}
{\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}}
\\
- &= \frac{\displaystyle \lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n}
- {\displaystyle S_0 + 0 + 0 + \sum_{n} c_n c_n^* S_n}
- = \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
+ &= \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n}
{\displaystyle S_0 + \sum_{n} |c_n|^2 S_n}
\end{aligned}$$
It is always possible to choose our normalizations such that $$S_n = S$$ for all $$u_n$$, leaving:
$$\begin{aligned}
- R
- &= \frac{\displaystyle \lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S}
- {\displaystyle S + \sum_{n} |c_n|^2 S}
- = \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
+ R[u]
+ &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n}
{\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$
And finally, after rearranging the numerator, we arrive at the following relation:
$$\begin{aligned}
- R
+ R[u]
&= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
{\displaystyle 1 + \sum_{n} |c_n|^2}
- = \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
+ \\
+ &= \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)}
{\displaystyle 1 + \sum_{n} |c_n|^2}
\end{aligned}$$
-Thus, if we improve our guess $$u$$,
+Thus, if we improve our guess $$u$$ (i.e. reduce $$|c_n|$$),
then $$R[u]$$ approaches the true eigenvalue $$\lambda_0$$.
For numerically finding $$u_0$$ and $$\lambda_0$$, this gives us a clear goal: minimize $$R$$, because:
@@ -228,19 +232,21 @@ In the context of quantum mechanics, this is not surprising,
since any superposition of multiple states
is guaranteed to have a higher energy than the ground state.
-Note that the convergence to $$\lambda_0$$ goes as $$|c_n|^2$$,
+As our guess $$u$$ is improved, $$\lambda_0$$ converges as $$|c_n|^2$$,
while $$u$$ converges to $$u_0$$ as $$|c_n|$$ by definition,
-so even a fairly bad guess $$u$$ will give a decent estimate for $$\lambda_0$$.
+so even a fairly bad ansatz $$u$$ gives a decent estimate for $$\lambda_0$$.
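
This quadratic convergence is easy to demonstrate numerically. In the sketch below (my own toy example), we take the exact sine modes of $$-\mathrm{d}^2/\mathrm{d}x^2$$ on $$[0, 1]$$ with $$u(0) = u(1) = 0$$ and $$w = 1$$, contaminate the ground state $$u_0$$ with a small amount $$c$$ of the first excited state, and watch $$R[u] - \lambda_0$$ shrink as $$|c|^2$$:

```python
import numpy as np

# Modes of -d^2/dx^2 on [0, 1] with Dirichlet BCs: u_m = sqrt(2) sin(m pi x),
# eigenvalues (m pi)^2. Guess u = u0 + c*u1, then R[u] - pi^2 ~ |c|^2.
x = np.linspace(0.0, 1.0, 2001)
h = x[1] - x[0]
u0 = np.sqrt(2) * np.sin(np.pi * x)        # lambda_0 = pi^2
u1 = np.sqrt(2) * np.sin(2 * np.pi * x)    # lambda_1 = 4 pi^2

def integrate(y):
    """Trapezoidal rule on the grid x."""
    return np.sum((y[:-1] + y[1:]) / 2) * h

def R(u):
    """Rayleigh quotient for -d^2/dx^2 with weight w = 1."""
    du = np.gradient(u, x)
    return integrate(du**2) / integrate(u**2)

errors = [R(u0 + c * u1) - np.pi**2 for c in (0.1, 0.01)]
print(errors)  # the second error is ~100x smaller than the first
```

Shrinking $$c$$ by a factor of 10 shrinks the eigenvalue error by a factor of about 100, exactly as the $$|c_n|^2$$ dependence predicts.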
## The method
In the following, we stick to Dirac notation,
-since the results hold for both continuous functions $$u(x)$$ and discrete vectors $$\vb{u}$$,
-as long as the operator $$\hat{H}$$ is self-adjoint.
+since the results hold for both continuous functions $$u(x)$$
+and discrete vectors $$\vb{u}$$,
+as long as the operator $$\hat{L}$$ is self-adjoint.
Suppose we express our guess $$\Ket{u}$$ as a linear combination
-of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$:
+of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$,
+where $$\Ket{f_n}$$ are not necessarily eigenvectors of $$\hat{L}$$:
$$\begin{aligned}
\Ket{u}
@@ -250,11 +256,11 @@ $$\begin{aligned}
\end{aligned}$$
For numerical tractability, we truncate the sum at $$N$$ terms,
-and for generality, we allow $$\Ket{f_n}$$ to be non-orthogonal,
+and for generality we allow $$\Ket{f_n}$$ to be non-orthogonal,
as described by an *overlap matrix* with elements $$S_{mn}$$:
$$\begin{aligned}
- \Inprod{f_m}{w f_n} = S_{m n}
+ \inprod{f_m}{w f_n} = S_{m n}
\end{aligned}$$
From the discussion above,
@@ -262,11 +268,10 @@ we know that the ground-state eigenvalue $$\lambda_0$$ is estimated by:
$$\begin{aligned}
\lambda_0
- \approx \lambda
- = R[u]
- = \frac{\inprod{u}{\hat{H} u}}{\Inprod{u}{w u}}
- = \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{H} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \Inprod{f_m}{w f_n}}
- \equiv \frac{\displaystyle \sum_{m n} a_m^* a_n H_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
+ \approx R[u]
+ = - \frac{\inprod{u}{\hat{L} u}}{\inprod{u}{w u}}
+ = - \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{L} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{w f_n}}
+ \equiv - \frac{\displaystyle \sum_{m n} a_m^* a_n L_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}}
\end{aligned}$$
And we also know that our goal is to minimize $$R[u]$$,
@@ -274,25 +279,27 @@ so we vary $$a_k^*$$ to find its extremum:
$$\begin{aligned}
0
- = \pdv{R}{a_k^*}
- &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
- - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)}
+ = - \pdv{R}{a_k^*}
+ &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big)
+ - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* L_{mn} \Big)}
{\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2}
\\
- &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\Inprod{u}{w u}}
- = \frac{\displaystyle \sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\Inprod{u}{w u}}
+ &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) + R[u] \Big( \sum_{n} a_n S_{k n}\Big)}
+ {\displaystyle \sum_{m n} a_n a_m^* S_{mn}}
+ \\
+ &= \frac{\displaystyle \sum_{n} a_n \big(L_{k n} + \lambda S_{k n}\big)}{\inprod{u}{w u}}
\end{aligned}$$
Clearly, this is only satisfied if the following holds for all $$k = 0, 1, ..., N\!-\!1$$:
$$\begin{aligned}
0
- = \sum_{n = 0}^{N - 1} a_n \big(H_{k n} - \lambda S_{k n}\big)
+ = \sum_{n = 0}^{N - 1} a_n \big(L_{k n} + \lambda S_{k n}\big)
\end{aligned}$$
For illustrative purposes,
we can write this as a matrix equation
-with $$M_{k n} \equiv H_{k n} - \lambda S_{k n}$$:
+with $$M_{k n} \equiv L_{k n} + \lambda S_{k n}$$:
$$\begin{aligned}
\begin{bmatrix}
@@ -311,53 +318,47 @@ $$\begin{aligned}
\end{bmatrix}
\end{aligned}$$
-Note that this looks like an eigenvalue problem for $$\lambda$$.
-Indeed, demanding that $$\overline{M}$$ cannot simply be inverted
-(i.e. the solution is non-trivial)
-yields a characteristic polynomial for $$\lambda$$:
+This looks like an eigenvalue problem for $$\lambda$$,
+so we demand that its determinant vanishes:
$$\begin{aligned}
0
- = \det\!\Big[ \overline{M} \Big]
- = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big]
+ = \det\!\Big[ \bar{M} \Big]
+ = \det\!\Big[ \bar{L} + \lambda \bar{S} \Big]
\end{aligned}$$
This gives a set of $$\lambda$$,
-which are the exact eigenvalues of $$\overline{H}$$,
-and the estimated eigenvalues of $$\hat{H}$$
-(recall that $$\overline{H}$$ is $$\hat{H}$$ expressed in a truncated basis).
+which are exact eigenvalues of the truncated matrix problem,
+and estimated eigenvalues of the full SLP
+(recall that $$\bar{L}$$ is $$\hat{L}$$ expressed in a truncated basis).
The eigenvector $$\big[ a_0, a_1, ..., a_{N-1} \big]$$ of the lowest $$\lambda$$
-gives the optimal weights to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$.
-Likewise, the higher $$\lambda$$'s eigenvectors approximate
-excited (i.e. non-ground) eigenstates of $$\hat{H}$$,
-although in practice the results are less accurate the higher we go.
+gives the optimal weights $$a_n$$ to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$.
+Likewise, the higher $$\lambda$$s' eigenvectors approximate
+excited (i.e. non-ground) eigenstates of $$\hat{L}$$,
+although in practice the results become less accurate the higher we go.
+If we only care about the ground state,
+then we already know $$\lambda$$ from $$R[u]$$,
+so we just need to solve the matrix equation for $$a_n$$.
-The overall accuracy is determined by how good our truncated basis is,
-i.e. how large a subspace it spans
-of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides.
-Clearly, adding more basis vectors will improve the results,
-at the cost of computation.
-For example, if $$\hat{H}$$ represents a helium atom,
-a good choice for $$\{\Ket{f_n}\}$$ would be hydrogen orbitals,
-since those are qualitatively similar.
-
-You may find this result unsurprising;
-it makes some intuitive sense that approximating $$\hat{H}$$
-in a limited basis would yield a matrix $$\overline{H}$$ giving rough eigenvalues.
+You may find this result unsurprising:
+it makes some intuitive sense that approximating $$\hat{L}$$
+in a limited basis would yield a matrix $$\bar{L}$$ giving rough eigenvalues.
The point of this discussion is to rigorously show
the validity of this approach.
-If we only care about the ground state,
-then we already know $$\lambda$$ from $$R[u]$$,
-so all we need to do is solve the above matrix equation for $$a_n$$.
-Keep in mind that $$\overline{M}$$ is singular,
-and $$a_n$$ are only defined up to a constant factor.
-
Nowadays, there exist many other methods to calculate eigenvalues
-of complicated operators $$\hat{H}$$,
+of complicated operators $$\hat{L}$$,
but an attractive feature of the Ritz method is that it is single-step,
whereas its competitors tend to be iterative.
-That said, the Ritz method cannot recover from a poorly chosen basis.
+That said, this method cannot recover from a poorly chosen basis $$\{\Ket{f_n}\}$$.
+
+Indeed, the overall accuracy is determined by how good our truncated basis is,
+i.e. how large a subspace it spans
+of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides.
+Clearly, adding more basis vectors improves the results,
+but at a computational cost;
+it is usually more efficient to carefully choose *which* $$\ket{f_n}$$ to use,
+rather than just *how many*.
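
As a concrete sketch of the whole recipe (my own toy example, using the quantum-style sign convention $$\hat{H} = -\hat{L}$$): estimate the lowest eigenvalue of $$-\mathrm{d}^2/\mathrm{d}x^2$$ on $$[0, 1]$$ with $$u(0) = u(1) = 0$$ and $$w = 1$$, whose exact ground state has $$\lambda_0 = \pi^2$$, using a small non-orthogonal polynomial basis:

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import eigh

# Ritz method sketch: basis f_n(x) = x^(n+1) (1 - x), which satisfies the
# BCs but is neither orthogonal nor an eigenbasis. True lambda_0 = pi^2.
N = 5
f  = lambda n, x: x**(n + 1) * (1 - x)
df = lambda n, x: (n + 1) * x**n - (n + 2) * x**(n + 1)

H = np.empty((N, N))  # H_mn = <f_m| H f_n> = int f_m' f_n' dx (by parts)
S = np.empty((N, N))  # overlap matrix S_mn = <f_m| w f_n>
for m in range(N):
    for n in range(N):
        H[m, n] = quad(lambda x: df(m, x) * df(n, x), 0, 1)[0]
        S[m, n] = quad(lambda x: f(m, x) * f(n, x), 0, 1)[0]

lam, a = eigh(H, S)      # generalized problem: det(H - lambda S) = 0
print(lam[0], np.pi**2)  # Ritz estimate vs the exact pi^2
```

Here `eigh` solves the generalized eigenvalue problem directly instead of finding roots of the characteristic polynomial; the columns of `a` hold the optimal weights $$a_n$$, and the variational bound guarantees `lam[0]` lies above $$\pi^2$$.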