From dee3ce1536168c9ed5c8c90d8008073afdb51cb9 Mon Sep 17 00:00:00 2001 From: Prefetch Date: Sun, 8 Sep 2024 21:56:52 +0200 Subject: Improve knowledge base --- .../concept/electromagnetic-wave-equation/index.md | 347 ++++++++++++--------- source/know/concept/martingale/index.md | 2 +- source/know/concept/ritz-method/index.md | 201 ++++++------ 3 files changed, 298 insertions(+), 252 deletions(-) diff --git a/source/know/concept/electromagnetic-wave-equation/index.md b/source/know/concept/electromagnetic-wave-equation/index.md index a27fe6f..559d943 100644 --- a/source/know/concept/electromagnetic-wave-equation/index.md +++ b/source/know/concept/electromagnetic-wave-equation/index.md @@ -1,7 +1,7 @@ --- title: "Electromagnetic wave equation" sort_title: "Electromagnetic wave equation" -date: 2021-09-09 +date: 2024-09-08 # Originally 2021-09-09, major rewrite categories: - Physics - Electromagnetism @@ -9,236 +9,281 @@ categories: layout: "concept" --- -The electromagnetic wave equation describes -the propagation of light through various media. -Since an electromagnetic (light) wave consists of +Light, i.e. **electromagnetic waves**, consists of an [electric field](/know/concept/electric-field/) and a [magnetic field](/know/concept/magnetic-field/), -we need [Maxwell's equations](/know/concept/maxwells-equations/) -in order to derive the wave equation. +each inducing the other. +The existence and classical behavior of such waves +can be derived using only [Maxwell's equations](/know/concept/maxwells-equations/), +as we will demonstrate here.
- -## Uniform medium - -We will use all of Maxwell's equations, -but we start with Ampère's circuital law for the "free" fields $$\vb{H}$$ and $$\vb{D}$$, -in the absence of a free current $$\vb{J}_\mathrm{free} = 0$$: - -$$\begin{aligned} - \nabla \cross \vb{H} - = \pdv{\vb{D}}{t} -\end{aligned}$$ - -We assume that the medium is isotropic, linear, -and uniform in all of space, such that: +We start from Faraday's law of induction, +where we assume that the system consists of materials +with well-known (linear) relative magnetic permeabilities $$\mu_r(\vb{r})$$, +such that $$\vb{B} = \mu_0 \mu_r \vb{H}$$: $$\begin{aligned} - \vb{D} = \varepsilon_0 \varepsilon_r \vb{E} - \qquad \quad - \vb{H} = \frac{1}{\mu_0 \mu_r} \vb{B} + \nabla \cross \vb{E} + = - \pdv{\vb{B}}{t} + = - \mu_0 \mu_r \pdv{\vb{H}}{t} \end{aligned}$$ -Which, upon insertion into Ampère's law, -yields an equation relating $$\vb{B}$$ and $$\vb{E}$$. -This may seem to contradict Ampère's "total" law, -but keep in mind that $$\vb{J}_\mathrm{bound} \neq 0$$ here: +We move $$\mu_r(\vb{r})$$ to the other side, +take the curl, and insert Ampère's circuital law: $$\begin{aligned} - \nabla \cross \vb{B} - = \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdv{\vb{E}}{t} + \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg) + &= - \mu_0 \pdv{}{t} \big( \nabla \cross \vb{H} \big) + \\ + &= - \mu_0 \bigg( \pdv{\vb{J}_\mathrm{free}}{t} + \pdvn{2}{\vb{D}}{t} \bigg) \end{aligned}$$ -Now we take the curl, rearrange, -and substitute $$\nabla \cross \vb{E}$$ according to Faraday's law: +For simplicity, we only consider insulating materials, +since light propagation in conductors is a complex beast. 
+We thus assume that there are no free currents $$\vb{J}_\mathrm{free} = 0$$, leaving: $$\begin{aligned} - \nabla \cross (\nabla \cross \vb{B}) - = \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdv{}{t}(\nabla \cross \vb{E}) - = - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{B}}{t} + \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg) + &= - \mu_0 \pdvn{2}{\vb{D}}{t} \end{aligned}$$ -Using a vector identity, we rewrite the leftmost expression, -which can then be reduced thanks to Gauss' law for magnetism $$\nabla \cdot \vb{B} = 0$$: +Having $$\vb{E}$$ and $$\vb{D}$$ in the same equation is not ideal, +so we should make a choice: +do we restrict ourselves to linear media +(so $$\vb{D} = \varepsilon_0 \varepsilon_r \vb{E}$$), +or do we allow materials with more complicated responses +(so $$\vb{D} = \varepsilon_0 \vb{E} + \vb{P}$$, with $$\vb{P}$$ unspecified)? +The former is usually sufficient: $$\begin{aligned} - - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{B}}{t} - &= \nabla (\nabla \cdot \vb{B}) - \nabla^2 \vb{B} - = - \nabla^2 \vb{B} + \boxed{ + \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg) + = - \mu_0 \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t} + } \end{aligned}$$ -This describes $$\vb{B}$$. -Next, we repeat the process for $$\vb{E}$$: -taking the curl of Faraday's law yields: +This is the general linear form of the **electromagnetic wave equation**, +where $$\mu_r$$ and $$\varepsilon_r$$ +both depend on $$\vb{r}$$ in order to describe the structure of the system. 
+We can obtain a similar equation for $$\vb{H}$$, +by starting from Ampère's law under the same assumptions: $$\begin{aligned} - \nabla \cross (\nabla \cross \vb{E}) - = - \pdv{}{t}(\nabla \cross \vb{B}) - = - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t} + \nabla \cross \vb{H} + = \pdv{\vb{D}}{t} + = \varepsilon_0 \varepsilon_r \pdv{\vb{E}}{t} \end{aligned}$$ -Which can be rewritten using same vector identity as before, -and then reduced by assuming that there is no net charge density $$\rho = 0$$ -in Gauss' law, such that $$\nabla \cdot \vb{E} = 0$$: +Dividing by $$\varepsilon_r(\vb{r})$$, taking the curl, and substituting Faraday's law on the right yields: $$\begin{aligned} - - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t} - &= \nabla (\nabla \cdot \vb{E}) - \nabla^2 \vb{E} - = - \nabla^2 \vb{E} + \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg) + &= \varepsilon_0 \pdv{}{t} \big( \nabla \cross \vb{E} \big) + = - \varepsilon_0 \pdvn{2}{\vb{B}}{t} \end{aligned}$$ -We thus arrive at the following two (implicitly coupled) -wave equations for $$\vb{E}$$ and $$\vb{B}$$, -where we have defined the phase velocity $$v \equiv 1 / \sqrt{\mu_0 \mu_r \varepsilon_0 \varepsilon_r}$$: +Finally, we insert $$\vb{B} = \mu_0 \mu_r \vb{H}$$ to get the analogous +electromagnetic wave equation for $$\vb{H}$$: $$\begin{aligned} \boxed{ - \pdvn{2}{\vb{E}}{t} - \frac{1}{v^2} \nabla^2 \vb{E} - = 0 - } - \qquad \quad - \boxed{ - \pdvn{2}{\vb{B}}{t} - \frac{1}{v^2} \nabla^2 \vb{B} - = 0 + \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg) + = - \mu_0 \varepsilon_0 \mu_r \pdvn{2}{\vb{H}}{t} } \end{aligned}$$ -Traditionally, it is said that the solutions are as follows, -where the wavenumber $$|\vb{k}| = \omega / v$$: - $$\begin{aligned} - \vb{E}(\vb{r}, t) - &= \vb{E}_0 \exp(i \vb{k} \cdot \vb{r} - i \omega t) - \\ - \vb{B}(\vb{r}, t) - &= \vb{B}_0 \exp(i \vb{k} \cdot \vb{r} - i \omega t) -\end{aligned}$$ - -In fact, thanks to linearity, these **plane
waves** can be treated as -terms in a Fourier series, meaning that virtually -*any* function $$f(\vb{k} \cdot \vb{r} - \omega t)$$ is a valid solution. +This is equivalent to the problem for $$\vb{E}$$, +since they are coupled by Maxwell's equations. +By solving either, subject to Gauss's laws +$$\nabla \cdot (\varepsilon_r \vb{E}) = 0$$ and $$\nabla \cdot (\mu_r \vb{H}) = 0$$, +the behavior of light in a given system can be deduced. +Note that Gauss's laws enforce that the wave's fields are transverse, +i.e. they must be perpendicular to the propagation direction. -Keep in mind that in reality $$\vb{E}$$ and $$\vb{B}$$ are real, -so although it is mathematically convenient to use plane waves, -in the end you will need to take the real part. -## Non-uniform medium +## Homogeneous linear media -A useful generalization is to allow spatial change -in the relative permittivity $$\varepsilon_r(\vb{r})$$ -and the relative permeability $$\mu_r(\vb{r})$$. -We still assume that the medium is linear and isotropic, so: +In the special case where the medium is completely uniform, +$$\mu_r$$ and $$\varepsilon_r$$ no longer depend on $$\vb{r}$$, +so they can be moved to the other side: $$\begin{aligned} - \vb{D} - = \varepsilon_0 \varepsilon_r(\vb{r}) \vb{E} - \qquad \quad - \vb{B} - = \mu_0 \mu_r(\vb{r}) \vb{H} + \nabla \cross \big( \nabla \cross \vb{E} \big) + &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t} + \\ + \nabla \cross \big( \nabla \cross \vb{H} \big) + &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{H}}{t} \end{aligned}$$ -Inserting these expressions into Faraday's and Ampère's laws -respectively yields: +This can be rewritten using the vector identity +$$\nabla \cross (\nabla \cross \vb{V}) = \nabla (\nabla \cdot \vb{V}) - \nabla^2 \vb{V}$$: $$\begin{aligned} - \nabla \cross \vb{E} - = - \mu_0 \mu_r(\vb{r}) \pdv{\vb{H}}{t} - \qquad \quad - \nabla \cross \vb{H} - = \varepsilon_0 \varepsilon_r(\vb{r}) \pdv{\vb{E}}{t} + \nabla (\nabla \cdot 
\vb{E}) - \nabla^2 \vb{E} + &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t} + \\ + \nabla (\nabla \cdot \vb{H}) - \nabla^2 \vb{H} + &= - \mu_0 \mu_r \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{H}}{t} \end{aligned}$$ -We then divide Ampère's law by $$\varepsilon_r(\vb{r})$$, -take the curl, and substitute Faraday's law, giving: +Which can be reduced using Gauss's laws +$$\nabla \cdot \vb{E} = 0$$ and $$\nabla \cdot \vb{H} = 0$$ +thanks to the fact that $$\varepsilon_r$$ and $$\mu_r$$ are constants in this case. +We therefore arrive at: $$\begin{aligned} - \nabla \cross \Big( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \Big) - = \varepsilon_0 \pdv{}{t}(\nabla \cross \vb{E}) - = - \mu_0 \mu_r \varepsilon_0 \pdvn{2}{\vb{H}}{t} + \boxed{ + \nabla^2 \vb{E} - \frac{n^2}{c^2} \pdvn{2}{\vb{E}}{t} + = 0 + } \end{aligned}$$ -Next, we exploit linearity by decomposing $$\vb{H}$$ and $$\vb{E}$$ -into Fourier series, with terms given by: - $$\begin{aligned} - \vb{H}(\vb{r}, t) - = \vb{H}(\vb{r}) \exp(- i \omega t) - \qquad \quad - \vb{E}(\vb{r}, t) - = \vb{E}(\vb{r}) \exp(- i \omega t) + \boxed{ + \nabla^2 \vb{H} - \frac{n^2}{c^2} \pdvn{2}{\vb{H}}{t} + = 0 + } \end{aligned}$$ -By inserting this ansatz into the equation, -we can remove the explicit time dependence: +Where $$c = 1 / \sqrt{\mu_0 \varepsilon_0}$$ is the speed of light in a vacuum, +and $$n = \sqrt{\mu_r \varepsilon_r}$$ is the refractive index of the medium. +Note that most authors write the magnetic equation with $$\vb{B}$$ instead of $$\vb{H}$$; +both are correct, since $$\vb{B} = \mu_0 \mu_r \vb{H}$$ with constant $$\mu_r$$.
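As a numerical sanity check of the homogeneous equation, the Python sketch below takes one field component in 1D as a Gaussian pulse $$f(x - v t)$$ in a medium with $$\varepsilon_r = 2.25$$ and $$\mu_r = 1$$ (so $$n = 1.5$$), and estimates $$\partial_x^2 E - (n^2/c^2) \partial_t^2 E$$ with central finite differences. The pulse shape, its width, the step sizes and the evaluation point are all illustrative assumptions, not part of the derivation.

```python
import math

C = 299_792_458.0  # speed of light in vacuum [m/s]


def refractive_index(eps_r: float, mu_r: float = 1.0) -> float:
    """n = sqrt(mu_r * eps_r) for a homogeneous, linear, lossless medium."""
    return math.sqrt(mu_r * eps_r)


def pulse(x: float, t: float, speed: float, width: float = 1e-6) -> float:
    """Illustrative 1D field component: a Gaussian profile f(x - speed * t)."""
    s = x - speed * t
    return math.exp(-((s / width) ** 2))


def relative_residual(x: float, t: float, speed: float, n: float, dx: float = 1e-8) -> float:
    """Central-difference estimate of |E_xx - (n/c)^2 E_tt| / |E_xx| at (x, t)."""
    # Time step deliberately not matched to dx, so the check is not exact by construction.
    dt = 0.5 * dx * n / C

    def e(xx: float, tt: float) -> float:
        return pulse(xx, tt, speed)

    e_xx = (e(x + dx, t) - 2.0 * e(x, t) + e(x - dx, t)) / dx**2
    e_tt = (e(x, t + dt) - 2.0 * e(x, t) + e(x, t - dt)) / dt**2
    return abs(e_xx - (n / C) ** 2 * e_tt) / abs(e_xx)


n = refractive_index(eps_r=2.25)   # 1.5 for eps_r = 2.25, mu_r = 1
x0 = 0.5e-6                        # a point on the flank of the pulse [m]
print(relative_residual(x0, 0.0, C / n, n))  # small: only finite-difference error
print(relative_residual(x0, 0.0, C, n))      # about |1 - n^2| = 1.25: wrong speed
```

A pulse moving at $$c/n$$ leaves only the finite-difference truncation error, whereas the same pulse moving at $$c$$ leaves a relative residual of about $$|1 - n^2|$$, i.e. it does not solve the equation in this medium.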
+ +In a vacuum, where $$n = 1$$, these equations are sometimes written as +$$\square \vb{E} = 0$$ and $$\square \vb{H} = 0$$, +where $$\square$$ is the **d'Alembert operator**, defined as follows: $$\begin{aligned} - \nabla \cross \Big( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \Big) \exp(- i \omega t) - = \mu_0 \varepsilon_0 \omega^2 \mu_r \vb{H} \exp(- i \omega t) + \boxed{ + \square + \equiv \nabla^2 - \frac{1}{c^2} \pdvn{2}{}{t} + } \end{aligned}$$ -Dividing out $$\exp(- i \omega t)$$, -we arrive at an eigenvalue problem for $$\omega^2$$, -with $$c = 1 / \sqrt{\mu_0 \varepsilon_0}$$: +Note that some authors define it with the opposite sign. +In any case, the d'Alembert operator is important for special relativity. + +The solutions to the homogeneous electromagnetic wave equation +are traditionally said to be the so-called **plane waves** given by: $$\begin{aligned} - \boxed{ - \nabla \cross \Big( \frac{1}{\varepsilon_r(\vb{r})} \nabla \cross \vb{H}(\vb{r}) \Big) - = \Big( \frac{\omega}{c} \Big)^2 \mu_r(\vb{r}) \vb{H}(\vb{r}) - } + \vb{E}(\vb{r}, t) + &= \vb{E}_0 e^{i \vb{k} \cdot \vb{r} - i \omega t} + \\ + \vb{B}(\vb{r}, t) + &= \vb{B}_0 e^{i \vb{k} \cdot \vb{r} - i \omega t} \end{aligned}$$ -Compared to a uniform medium, $$\omega$$ is often not arbitrary here: -there are discrete eigenvalues $$\omega$$, -corresponding to discrete **modes** $$\vb{H}(\vb{r})$$. +Where the wavevector $$\vb{k}$$ is arbitrary, +and the angular frequency $$\omega = c |\vb{k}| / n$$. +We also often talk about the wavelength, which is $$\lambda = 2 \pi / |\vb{k}|$$. +The appearance of $$\vb{k}$$ in the exponent +tells us that these waves are propagating through space, +as you would expect. + +Moreover, because the wave equations are linear, +any superposition of plane waves is also a valid solution; +this includes any twice-differentiable function of the form $$f(\vb{k} \cdot \vb{r} - \omega t)$$ +with $$\omega = c |\vb{k}| / n$$.
+Just remember that $$\vb{E}$$ and $$\vb{H}$$ are real-valued, +so it may be necessary to take the real part at the end of a calculation. -Next, we go through the same process to find an equation for $$\vb{E}$$. -Starting from Faraday's law, we divide by $$\mu_r(\vb{r})$$, -take the curl, and insert Ampère's law: + + +## Inhomogeneous linear media + +But suppose the medium is not uniform, i.e. it contains structures +described by $$\varepsilon_r(\vb{r})$$ and $$\mu_r(\vb{r})$$. +If the structures are much larger than the light's wavelength, +the homogeneous equations are still a very good approximation +away from any material boundaries; +near the boundaries, however, they break down. +Recall the general equations from before we assumed homogeneity: $$\begin{aligned} - \nabla \cross \Big( \frac{1}{\mu_r} \nabla \cross \vb{E} \Big) - = - \mu_0 \pdv{}{t}(\nabla \cross \vb{H}) - = - \mu_0 \varepsilon_0 \varepsilon_r \pdvn{2}{\vb{E}}{t} + \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg) + &= - \frac{\varepsilon_r}{c^2} \pdvn{2}{\vb{E}}{t} + \\ + \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg) + &= - \frac{\mu_r}{c^2} \pdvn{2}{\vb{H}}{t} \end{aligned}$$ -Then, by replacing $$\vb{E}(\vb{r}, t)$$ with our plane-wave ansatz, -we remove the time dependence: +In theory, this is everything we need, +but in most cases a better approach is possible: +the trick is that we only rarely need to explicitly calculate +the $$t$$-dependence of $$\vb{E}$$ or $$\vb{H}$$. +Instead, we can first solve an easier time-independent version +of this problem, and then approximate the dynamics +with [coupled mode theory](/know/concept/coupled-mode-theory/) later. + +To eliminate $$t$$, we make an ansatz for $$\vb{E}$$ and $$\vb{H}$$, shown below.
+No generality is lost by doing this; +this is effectively a kind of [Fourier transform](/know/concept/fourier-transform/): $$\begin{aligned} - \nabla \cross \Big( \frac{1}{\mu_r} \nabla \cross \vb{E} \Big) \exp(- i \omega t) - = - \mu_0 \varepsilon_0 \omega^2 \varepsilon_r \vb{E} \exp(- i \omega t) + \vb{E}(\vb{r}, t) + &= \vb{E}(\vb{r}) e^{- i \omega t} + \\ + \vb{H}(\vb{r}, t) + &= \vb{H}(\vb{r}) e^{- i \omega t} \end{aligned}$$ -Which, after dividing out $$\exp(- i \omega t)$$, -yields an analogous eigenvalue problem with $$\vb{E}(r)$$: +Inserting this ansatz and dividing out $$e^{-i \omega t}$$ +yields the time-independent forms: $$\begin{aligned} \boxed{ - \nabla \cross \Big( \frac{1}{\mu_r(\vb{r})} \nabla \cross \vb{E}(\vb{r}) \Big) - = \Big( \frac{\omega}{c} \Big)^2 \varepsilon_r(\vb{r}) \vb{E}(\vb{r}) + \nabla \cross \bigg( \frac{1}{\mu_r} \nabla \cross \vb{E} \bigg) + = \Big( \frac{\omega}{c} \Big)^2 \varepsilon_r \vb{E} } \end{aligned}$$ -Usually, it is a reasonable approximation -to say $$\mu_r(\vb{r}) = 1$$, -in which case the equation for $$\vb{H}(\vb{r})$$ -becomes a Hermitian eigenvalue problem, -and is thus easier to solve than for $$\vb{E}(\vb{r})$$. +$$\begin{aligned} + \boxed{ + \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg) + = \Big( \frac{\omega}{c} \Big)^2 \mu_r \vb{H} + } +\end{aligned}$$ -Keep in mind, however, that in any case, -the solutions $$\vb{H}(\vb{r})$$ and/or $$\vb{E}(\vb{r})$$ -must satisfy the two Maxwell's equations that were not explicitly used: +These are eigenvalue problems for $$\omega^2$$, +which can be solved subject to Gauss's laws and suitable boundary conditions. +The resulting allowed values of $$\omega$$ may consist of +continuous ranges and/or discrete resonances, +analogous to *scattering* and *bound* quantum states, respectively. 
+It can be shown that the operators on both sides of each equation +are Hermitian, meaning these are well-behaved problems +yielding real eigenvalues and eigenfields that are orthogonal with respect to the weight $$\varepsilon_r$$ (for $$\vb{E}$$) or $$\mu_r$$ (for $$\vb{H}$$). + +Both equations are still equivalent: +we only need to solve one. But which one? +In practice, one is usually easier than the other, +due to the common approximation that $$\mu_r \approx 1$$ for many dielectric materials, +in which case the equations reduce to: $$\begin{aligned} - \nabla \cdot (\varepsilon_r \vb{E}) = 0 - \qquad \quad - \nabla \cdot (\mu_r \vb{H}) = 0 + \nabla \cross \big( \nabla \cross \vb{E} \big) + &= \Big( \frac{\omega}{c} \Big)^2 \varepsilon_r \vb{E} + \\ + \nabla \cross \bigg( \frac{1}{\varepsilon_r} \nabla \cross \vb{H} \bigg) + &= \Big( \frac{\omega}{c} \Big)^2 \vb{H} \end{aligned}$$ -This is equivalent to demanding that the resulting waves are *transverse*, -or in other words, -the wavevector $$\vb{k}$$ must be perpendicular to -the amplitudes $$\vb{H}_0$$ and $$\vb{E}_0$$. +Now the equation for $$\vb{H}$$ is starting to look simpler, +because it only has an operator on *one* side. +We could "fix" the equation for $$\vb{E}$$ by dividing it by $$\varepsilon_r$$, +but the resulting operator would no longer be Hermitian, +at least not with respect to the ordinary inner product. +To get an idea of how to handle $$\varepsilon_r$$ in the $$\vb{E}$$-equation, +notice its similarity to the weight function $$w$$ +in [Sturm-Liouville theory](/know/concept/sturm-liouville-theory/). + +Gauss's magnetic law $$\nabla \cdot \vb{H} = 0$$ +is also significantly easier for numerical calculations +than its electric counterpart $$\nabla \cdot (\varepsilon_r \vb{E}) = 0$$, +so we usually prefer to solve the equation for $$\vb{H}$$.
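To see both time-independent forms side by side, here is a minimal 1D sketch. It assumes $$\mu_r = 1$$, a field component $$E_y(x)$$ and $$H_z(x)$$, and perfectly conducting walls where $$E_y = 0$$ (all illustrative choices). In 1D the curl-curl operators reduce to $$-E_y''$$ and $$-(H_z'/\varepsilon_r)'$$. Sampling $$E_y$$ on grid nodes and $$H_z$$ on cell midpoints, with a difference matrix $$D$$, the $$\vb{E}$$-form becomes the weighted problem $$D^T D \, \vb{e} = (\omega/c)^2 \, \mathrm{diag}(\varepsilon_r) \, \vb{e}$$, while the $$\vb{H}$$-form is the ordinary problem $$D \, \mathrm{diag}(1/\varepsilon_r) \, D^T \vb{h} = (\omega/c)^2 \vb{h}$$. The dielectric slab is hypothetical.

```python
import numpy as np


def cavity_spectra(eps_nodes: np.ndarray, h: float):
    """Return sorted (w/c)^2 from the E-form and the H-form of a 1D cavity.

    Illustrative assumptions: mu_r = 1, fields E_y(x) and H_z(x),
    E_y = 0 at both walls, E_y on interior grid nodes, H_z on cell midpoints.
    """
    n_cells = len(eps_nodes) + 1
    # Discrete d/dx: row j gives (E_{j+1} - E_j) / h at midpoint j + 1/2,
    # with the wall values E_0 = E_N = 0 left out.
    D = (np.eye(n_cells, n_cells - 1) - np.eye(n_cells, n_cells - 1, k=-1)) / h

    # E-form: D^T D e = (w/c)^2 diag(eps) e is a generalized Hermitian problem
    # with weight eps; substituting y = sqrt(eps) e turns it into an ordinary one.
    A = D / np.sqrt(eps_nodes)  # = D @ diag(eps^(-1/2)), scaling the columns
    k2_E = np.linalg.eigvalsh(A.T @ A)

    # H-form: D diag(1/eps) D^T h = (w/c)^2 h is already an ordinary Hermitian problem.
    k2_H = np.linalg.eigvalsh(D @ np.diag(1.0 / eps_nodes) @ D.T)
    return k2_E, k2_H


N, L = 200, 1.0                        # number of cells and cavity length (arbitrary units)
x = np.linspace(0.0, L, N + 1)[1:-1]   # interior nodes
eps = np.where(np.abs(x - 0.5 * L) < 0.15 * L, 12.0, 1.0)  # hypothetical dielectric slab

k2_E, k2_H = cavity_spectra(eps, L / N)
print(k2_H[0])                         # ~0: a static, constant H_z, not a wave
print(np.allclose(k2_E, k2_H[1:]))     # True: both forms give the same frequencies
```

Writing $$A = D \, \mathrm{diag}(\varepsilon_r)^{-1/2}$$, the $$\vb{H}$$-operator is $$A A^T$$ and the symmetrized $$\vb{E}$$-operator is $$A^T A$$, which share all their nonzero eigenvalues. The $$\sqrt{\varepsilon_r}$$ substitution is one standard way of absorbing a weight function, in the spirit of the Sturm-Liouville analogy above.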
+ ## References diff --git a/source/know/concept/martingale/index.md b/source/know/concept/martingale/index.md index 53a346a..7daebea 100644 --- a/source/know/concept/martingale/index.md +++ b/source/know/concept/martingale/index.md @@ -20,7 +20,7 @@ then $$M_t$$ is a martingale if it satisfies all of the following: 1. $$M_t$$ is $$\mathcal{F}_t$$-adapted, meaning the filtration $$\mathcal{F}_t$$ contains enough information to reconstruct the current and all past values of $$M_t$$. -2. For all times $$t \ge 0$$, the expectation value exists $$\mathbf{E}(M_t) < \infty$$. +2. For all times $$t \ge 0$$, $$M_t$$ is integrable, i.e. $$\mathbf{E}(|M_t|) < \infty$$. 3. For all $$s, t$$ satisfying $$0 \le s \le t$$, the [conditional expectation](/know/concept/conditional-expectation/) $$\mathbf{E}(M_t | \mathcal{F}_s) = M_s$$, diff --git a/source/know/concept/ritz-method/index.md b/source/know/concept/ritz-method/index.md index 902b7cf..ef694da 100644 --- a/source/know/concept/ritz-method/index.md +++ b/source/know/concept/ritz-method/index.md @@ -25,25 +25,26 @@ consider the following functional to be optimized: $$\begin{aligned} R[u] - = \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x} + \equiv \frac{1}{S} \int_a^b p(x) \big|u_x(x)\big|^2 - q(x) \big|u(x)\big|^2 \dd{x} \end{aligned}$$ Where $$u(x) \in \mathbb{C}$$ is the unknown function, and $$p(x), q(x) \in \mathbb{R}$$ are given.
-In addition, $$S$$ is the norm of $$u$$, which we demand be constant +In addition, $$S$$ is the norm of $$u$$, which we take to be constant with respect to a weight function $$w(x) \in \mathbb{R}$$: $$\begin{aligned} S - = \int_a^b w(x) \big|u(x)\big|^2 \dd{x} + \equiv \int_a^b w(x) \big|u(x)\big|^2 \dd{x} \end{aligned}$$ -To handle this normalization requirement, -we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$, -and define the Lagrangian $$\Lambda$$ for the full constrained optimization problem as: +This normalization requirement acts as a constraint +to the optimization problem for $$R[u]$$, +so we introduce a [Lagrange multiplier](/know/concept/lagrange-multiplier/) $$\lambda$$, +and define the Lagrangian $$\mathcal{L}$$ for the full problem as: $$\begin{aligned} - \Lambda + \mathcal{L} \equiv \frac{1}{S} \bigg( \big( p |u_x|^2 - q |u|^2 \big) - \lambda \big( w |u|^2 \big) \bigg) \end{aligned}$$ @@ -51,7 +52,7 @@ The resulting Euler-Lagrange equation is then calculated in the standard way, yi $$\begin{aligned} 0 - &= \pdv{\Lambda}{u^*} - \dv{}{x}\Big( \pdv{\Lambda}{u_x^*} \Big) + &= \pdv{\mathcal{L}}{u^*} - \dv{}{x}\Big( \pdv{\mathcal{L}}{u_x^*} \Big) \\ &= - \frac{1}{S} \bigg( q u + \lambda w u + \dv{}{x}\big( p u_x \big) \bigg) \end{aligned}$$ @@ -69,15 +70,14 @@ SLPs have useful properties, but before we can take advantage of those, we need to handle an important detail: the boundary conditions (BCs) on $$u$$. The above equation is only a valid SLP for certain BCs, as seen in the derivation of Sturm-Liouville theory. 
- -Let us return to the definition of $$R[u]$$, +Let us return to the definition of $$R$$, and integrate it by parts: $$\begin{aligned} R[u] &= \frac{1}{S} \int_a^b p u_x u_x^* - q u u^* \dd{x} \\ - &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x} + &= \frac{1}{S} \Big[ p u_x u^* \Big]_a^b - \frac{1}{S} \int_a^b \dv{}{x}\Big(p u_x\Big) u^* + q u u^* \dd{x} \end{aligned}$$ The boundary term vanishes for a subset of the BCs that make a valid SLP, @@ -88,10 +88,11 @@ such that we can use Sturm-Liouville theory later: $$\begin{aligned} R[u] &= - \frac{1}{S} \int_a^b \bigg( \dv{}{x}\Big(p u_x\Big) + q u \bigg) u^* \dd{x} - \equiv - \frac{1}{S} \int_a^b u^* \hat{H} u \dd{x} + \\ + &\equiv - \frac{1}{S} \int_a^b u^* \hat{L} u \dd{x} \end{aligned}$$ -Where $$\hat{H}$$ is the self-adjoint Sturm-Liouville operator. +Where $$\hat{L}$$ is the self-adjoint Sturm-Liouville operator. Because the constrained Euler-Lagrange equation is now an SLP, we know that it has an infinite number of real discrete eigenvalues $$\lambda_n$$ with a lower bound, corresponding to mutually orthogonal eigenfunctions $$u_n(x)$$. @@ -102,16 +103,16 @@ and now insert one of the eigenfunctions $$u_n$$ into $$R$$: $$\begin{aligned} R[u_n] - &= - \frac{1}{S_n} \int_a^b u_n^* \hat{H} u_n \dd{x} - = \frac{1}{S_n} \int_a^b u_n^* \lambda_n w u_n \dd{x} + &= - \frac{1}{S_n} \int_a^b u_n^* \hat{L} u_n \dd{x} + \\ + &= \frac{1}{S_n} \int_a^b \lambda_n w |u_n|^2 \dd{x} \\ - &= \frac{1}{S_n} \lambda_n \int_a^b w |u_n|^2 \dd{x} - = \frac{S_n}{S_n} \lambda_n + &= \frac{S_n}{S_n} \lambda_n \end{aligned}$$ Where $$S_n$$ is the normalization of $$u_n$$.
-In other words, when given $$u_n$$, -the functional $$R$$ yields the corresponding eigenvalue $$\lambda_n$$: +In other words, when given $$u_n$$ as input, +the functional $$R$$ returns the corresponding eigenvalue $$\lambda_n$$: $$\begin{aligned} \boxed{ @@ -121,6 +122,11 @@ $$\begin{aligned} \end{aligned}$$ This powerful result was not at all clear from $$R$$'s initial definition. +Note that some authors use the opposite sign for $$\lambda$$ in their SLP definition, +in which case this result can still be obtained +simply by also defining $$R$$ with the opposite sign. +This sign choice is consistent with quantum mechanics, +with the Hamiltonian $$\hat{H} = - \hat{L}$$. @@ -137,81 +143,79 @@ $$\begin{aligned} Here, we are using the fact that the eigenfunctions of an SLP form a complete set, so our (known) guess $$u$$ can be expanded in the true (unknown) eigenfunctions $$u_n$$. -We are assuming that $$u$$ is already quite close to its target $$u_0$$, -such that the (unknown) expansion coefficients $$c_n$$ are small; -specifically $$|c_n|^2 \ll 1$$. -Let us start from what we know: +Next, by definition: $$\begin{aligned} \boxed{ R[u] - = - \frac{\displaystyle\int u^* \hat{H} u \dd{x}}{\displaystyle\int u^* w u \dd{x}} + = - \frac{\displaystyle\int u^* \hat{L} u \dd{x}}{\displaystyle\int u^* w u \dd{x}} } \end{aligned}$$ -This quantity is known as the **Rayleigh quotient**. +This quantity is known as the **Rayleigh quotient**, +and again beware of the sign in its definition; see the remark above. 
Inserting our ansatz $$u$$, -and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$: +and using that the true $$u_n$$ have corresponding eigenvalues $$\lambda_n$$, +we have: $$\begin{aligned} R[u] - &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{H} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}} + &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \: \hat{L} \Big\{ u_0 + \sum_n c_n u_n \Big\} \dd{x}} {\displaystyle\int w \Big( u_0 + \sum_n c_n u_n \Big) \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \dd{x}} \\ - &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}} + &= - \frac{\displaystyle\int \Big( u_0^* + \sum_n c_n^* u_n^* \Big) + \Big( \!-\! \lambda_0 w u_0 - \sum_n c_n \lambda_n w u_n \Big) \dd{x}} {\displaystyle\int w \Big( u_0^* + \sum_n c_n^* u_n^* \Big) \Big( u_0 + \sum_n c_n u_n \Big) \dd{x}} \end{aligned}$$ For convenience, we switch to [Dirac notation](/know/concept/dirac-notation/) -before evaluating further. 
+before evaluating further: $$\begin{aligned} - R - &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)} - {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \cdot \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)} + R[u] + &= \frac{\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) + \Big( \lambda_0 \Ket{w u_0} + \sum_n c_n \lambda_n \Ket{w u_n} \Big)} + {\displaystyle \Big( \Bra{u_0} + \sum_n c_n^* \Bra{u_n} \Big) \Big( \Ket{w u_0} + \sum_n c_n \Ket{w u_n} \Big)} \\ - &= \frac{\displaystyle \lambda_0 \Inprod{u_0}{w u_0} + \lambda_0 \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0} - + \sum_{n = 1}^\infty c_n \lambda_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \Inprod{u_m}{w u_n}} - {\displaystyle \Inprod{u_0}{w u_0} + \sum_{n = 1}^\infty c_n^* \Inprod{u_n}{w u_0} - + \sum_{n = 1}^\infty c_n \Inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \Inprod{u_m}{w u_n}} + &= \frac{\displaystyle \lambda_0 \inprod{u_0}{w u_0} + \lambda_0 \sum_{n} c_n^* \inprod{u_n}{w u_0} + + \sum_{n} c_n \lambda_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \lambda_n \inprod{u_m}{w u_n}} + {\displaystyle \inprod{u_0}{w u_0} + \sum_{n} c_n^* \inprod{u_n}{w u_0} + + \sum_{n} c_n \inprod{u_0}{w u_n} + \sum_{m n} c_n c_m^* \inprod{u_m}{w u_n}} \end{aligned}$$ -Using orthogonality $$\Inprod{u_m}{w u_n} = S_n \delta_{mn}$$, +Using orthogonality $$\inprod{u_m}{w u_n} = S_n \delta_{mn}$$, and the fact that $$n \neq 0$$ by definition, we find: $$\begin{aligned} - R + R[u] &= \frac{\displaystyle \lambda_0 S_0 + \lambda_0 \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n \lambda_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* \lambda_n S_n \delta_{mn}} {\displaystyle S_0 + \sum_n c_n^* S_n \delta_{n0} + \sum_n c_n S_n \delta_{n0} + \sum_{m n} c_n c_m^* S_n \delta_{mn}} \\ - &= \frac{\displaystyle \lambda_0 S_0 + 0 + 0 + \sum_{n} c_n c_n^* \lambda_n S_n} - {\displaystyle S_0 + 0 + 0 + \sum_{n} c_n c_n^* 
S_n} - = \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n} + &= \frac{\displaystyle \lambda_0 S_0 + \sum_{n} |c_n|^2 \lambda_n S_n} {\displaystyle S_0 + \sum_{n} |c_n|^2 S_n} \end{aligned}$$ It is always possible to choose our normalizations such that $$S_n = S$$ for all $$u_n$$, leaving: $$\begin{aligned} - R - &= \frac{\displaystyle \lambda_0 S + \sum_{n} |c_n|^2 \lambda_n S} - {\displaystyle S + \sum_{n} |c_n|^2 S} - = \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n} + R[u] + &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_n} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}$$ And finally, after rearranging the numerator, we arrive at the following relation: $$\begin{aligned} - R + R[u] &= \frac{\displaystyle \lambda_0 + \sum_{n} |c_n|^2 \lambda_0 + \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} - = \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} + \\ + &= \lambda_0 + \frac{\displaystyle \sum_{n} |c_n|^2 (\lambda_n - \lambda_0)} {\displaystyle 1 + \sum_{n} |c_n|^2} \end{aligned}$$ -Thus, if we improve our guess $$u$$, +Thus, if we improve our guess $$u$$ (i.e. reduce $$|c_n|$$), then $$R[u]$$ approaches the true eigenvalue $$\lambda_0$$. For numerically finding $$u_0$$ and $$\lambda_0$$, this gives us a clear goal: minimize $$R$$, because: @@ -228,19 +232,21 @@ In the context of quantum mechanics, this is not surprising, since any superposition of multiple states is guaranteed to have a higher energy than the ground state. -Note that the convergence to $$\lambda_0$$ goes as $$|c_n|^2$$, +As our guess $$u$$ is improved, $$R[u]$$ converges to $$\lambda_0$$ as $$|c_n|^2$$, while $$u$$ converges to $$u_0$$ as $$|c_n|$$ by definition, -so even a fairly bad guess $$u$$ will give a decent estimate for $$\lambda_0$$. +so even a fairly bad ansatz $$u$$ gives a decent estimate for $$\lambda_0$$.
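The quadratic convergence can be checked numerically. The sketch below assumes the simplest possible SLP, $$p = w = 1$$ and $$q = 0$$ on $$[0, 1]$$ with $$u(0) = u(1) = 0$$, whose eigenfunctions are $$u_n = \sin((n\!+\!1) \pi x)$$ with $$\lambda_n = (n\!+\!1)^2 \pi^2$$. It contaminates the exact ground state with $$c$$ times $$u_1$$ and evaluates $$R[u] = \int |u_x|^2 \dd{x} / \int |u|^2 \dd{x}$$ with the trapezoidal rule. Since $$S_0 = S_1 = 1/2$$, the relation above predicts $$R[u] - \lambda_0 = |c|^2 (\lambda_1 - \lambda_0) / (1 + |c|^2)$$.

```python
import numpy as np


def rayleigh_quotient(u: np.ndarray, du: np.ndarray, x: np.ndarray) -> float:
    """R[u] = int |u_x|^2 dx / int |u|^2 dx, i.e. p = w = 1 and q = 0, on a uniform grid x."""
    h = x[1] - x[0]

    def integrate(f: np.ndarray) -> float:  # composite trapezoidal rule
        return h * (f.sum() - 0.5 * (f[0] + f[-1]))

    return integrate(np.abs(du) ** 2) / integrate(np.abs(u) ** 2)


x = np.linspace(0.0, 1.0, 20001)
lam0, lam1 = np.pi**2, (2 * np.pi) ** 2  # exact eigenvalues of sin(pi x) and sin(2 pi x)

for c in [1e-1, 1e-2, 1e-3]:
    # Trial function: exact ground state u_0 contaminated by c times u_1.
    u = np.sin(np.pi * x) + c * np.sin(2 * np.pi * x)
    du = np.pi * np.cos(np.pi * x) + 2 * np.pi * c * np.cos(2 * np.pi * x)
    err = rayleigh_quotient(u, du, x) - lam0
    print(f"c = {c:g}:  R - lambda_0 = {err:.3e},  (R - lambda_0) / c^2 = {err / c**2:.4f}")
```

Each tenfold reduction of $$c$$ shrinks the error in $$R[u]$$ roughly a hundredfold, and $$(R - \lambda_0) / c^2$$ approaches $$\lambda_1 - \lambda_0 = 3 \pi^2 \approx 29.6$$, even though the error in $$u$$ itself only shrinks tenfold.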
## The method In the following, we stick to Dirac notation, -since the results hold for both continuous functions $$u(x)$$ and discrete vectors $$\vb{u}$$, -as long as the operator $$\hat{H}$$ is self-adjoint. +since the results hold for both continuous functions $$u(x)$$ +and discrete vectors $$\vb{u}$$, +as long as the operator $$\hat{L}$$ is self-adjoint. Suppose we express our guess $$\Ket{u}$$ as a linear combination -of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$: +of *known* basis vectors $$\Ket{f_n}$$ with weights $$a_n \in \mathbb{C}$$, +where $$\Ket{f_n}$$ are not necessarily eigenvectors of $$\hat{L}$$: $$\begin{aligned} \Ket{u} @@ -250,11 +256,11 @@ $$\begin{aligned} \end{aligned}$$ For numerical tractability, we truncate the sum at $$N$$ terms, -and for generality, we allow $$\Ket{f_n}$$ to be non-orthogonal, +and for generality we allow $$\Ket{f_n}$$ to be non-orthogonal, as described by an *overlap matrix* with elements $$S_{mn}$$: $$\begin{aligned} - \Inprod{f_m}{w f_n} = S_{m n} + \inprod{f_m}{w f_n} = S_{m n} \end{aligned}$$ From the discussion above, @@ -262,11 +268,10 @@ we know that the ground-state eigenvalue $$\lambda_0$$ is estimated by: $$\begin{aligned} \lambda_0 - \approx \lambda - = R[u] - = \frac{\inprod{u}{\hat{H} u}}{\Inprod{u}{w u}} - = \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{H} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \Inprod{f_m}{w f_n}} - \equiv \frac{\displaystyle \sum_{m n} a_m^* a_n H_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}} + \approx R[u] + = - \frac{\inprod{u}{\hat{L} u}}{\inprod{u}{w u}} + = - \frac{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{\hat{L} f_n}}{\displaystyle \sum_{m n} a_m^* a_n \inprod{f_m}{w f_n}} + \equiv - \frac{\displaystyle \sum_{m n} a_m^* a_n L_{m n}}{\displaystyle \sum_{m n} a_m^* a_n S_{mn}} \end{aligned}$$ And we also know that our goal is to minimize $$R[u]$$, @@ -274,25 +279,27 @@ so we vary $$a_k^*$$ to find its extremum: $$\begin{aligned} 
0 - = \pdv{R}{a_k^*} - &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big) - - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* H_{mn} \Big)} + = - \pdv{R}{a_k^*} + &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) \Big( \sum_{m n} a_n a_m^* S_{mn} \Big) + - \Big( \sum_{n} a_n S_{k n} \Big) \Big( \sum_{m n} a_n a_m^* L_{mn} \Big)} {\Big( \displaystyle \sum_{m n} a_n a_m^* S_{mn} \Big)^2} \\ - &= \frac{\displaystyle \Big( \sum_{n} a_n H_{k n} \Big) - R[u] \Big( \sum_{n} a_n S_{k n}\Big)}{\Inprod{u}{w u}} - = \frac{\displaystyle \sum_{n} a_n \big(H_{k n} - \lambda S_{k n}\big)}{\Inprod{u}{w u}} + &= \frac{\displaystyle \Big( \sum_{n} a_n L_{k n} \Big) + R[u] \Big( \sum_{n} a_n S_{k n}\Big)} + {\displaystyle \sum_{m n} a_n a_m^* S_{mn}} + \\ + &= \sum_{n} a_n \frac{\big(L_{k n} + \lambda S_{k n}\big)}{\inprod{u}{w u}}, \qquad \lambda \equiv R[u] \end{aligned}$$ Clearly, this is only satisfied if the following holds for all $$k = 0, 1, ..., N\!-\!1$$: $$\begin{aligned} 0 - = \sum_{n = 0}^{N - 1} a_n \big(H_{k n} - \lambda S_{k n}\big) + = \sum_{n = 0}^{N - 1} a_n \big(L_{k n} + \lambda S_{k n}\big) \end{aligned}$$ For illustrative purposes, we can write this as a matrix equation -with $$M_{k n} \equiv H_{k n} - \lambda S_{k n}$$: +with $$M_{k n} \equiv L_{k n} + \lambda S_{k n}$$: $$\begin{aligned} \begin{bmatrix} @@ -311,53 +318,47 @@ $$\begin{aligned} \end{bmatrix} \end{aligned}$$ -Note that this looks like an eigenvalue problem for $$\lambda$$. -Indeed, demanding that $$\overline{M}$$ cannot simply be inverted -(i.e.
the solution is non-trivial) -yields a characteristic polynomial for $$\lambda$$: +This is a generalized eigenvalue problem for $$\lambda$$; a non-trivial solution $$a_n$$ +exists only if $$\bar{M}$$ is singular, so we demand that its determinant vanishes: $$\begin{aligned} 0 - = \det\!\Big[ \overline{M} \Big] - = \det\!\Big[ \overline{H} - \lambda \overline{S} \Big] + = \det\!\Big[ \bar{M} \Big] + = \det\!\Big[ \bar{L} + \lambda \bar{S} \Big] \end{aligned}$$ This gives a set of $$\lambda$$, -which are the exact eigenvalues of $$\overline{H}$$, -and the estimated eigenvalues of $$\hat{H}$$ -(recall that $$\overline{H}$$ is $$\hat{H}$$ expressed in a truncated basis). +which are the exact generalized eigenvalues of $$-\bar{L}$$ with respect to $$\bar{S}$$, +and estimates of the SLP's true eigenvalues $$\lambda_n$$ +(recall that $$\bar{L}$$ is $$\hat{L}$$ expressed in a truncated basis). The eigenvector $$\big[ a_0, a_1, ..., a_{N-1} \big]$$ of the lowest $$\lambda$$ -gives the optimal weights to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$. -Likewise, the higher $$\lambda$$'s eigenvectors approximate -excited (i.e. non-ground) eigenstates of $$\hat{H}$$, -although in practice the results are less accurate the higher we go. +gives the optimal weights $$a_n$$ to approximate $$\Ket{u_0}$$ in the basis $$\{\Ket{f_n}\}$$. +Likewise, the eigenvectors of the higher $$\lambda$$ approximate +excited (i.e. non-ground) eigenstates of $$\hat{L}$$, +although in practice the results become less accurate the higher we go. +If we only care about the ground state, +then we already know $$\lambda$$ from $$R[u]$$, +so we just need to solve the matrix equation for $$a_n$$. -The overall accuracy is determined by how good our truncated basis is, -i.e. how large a subspace it spans -of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides. -Clearly, adding more basis vectors will improve the results, -at the cost of computation.
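As an illustration, here is a minimal Python sketch of the method for a toy SLP: $$\hat{L} u = u_{xx}$$ on $$[0, 1]$$ with $$u(0) = u(1) = 0$$ and $$w = 1$$, whose exact eigenvalues are $$\lambda_n = (n\!+\!1)^2 \pi^2$$. The non-orthogonal basis $$f_n(x) = x^{n+1} (1 - x)$$ is a hypothetical choice. With this article's sign convention $$\hat{L} u_n = - \lambda_n w u_n$$, the matrix whose generalized eigenvalues are the $$\lambda$$ is $$K = -\bar{L}$$, where $$K_{mn} = \int f_m' f_n' \dd{x}$$ after integrating by parts. Instead of searching for roots of the determinant, the sketch hands the equivalent problem $$K \vb{a} = \lambda \bar{S} \vb{a}$$ to a symmetric eigensolver via a Cholesky factorization of the overlap matrix.

```python
import numpy as np
from numpy.polynomial import Polynomial as P


def ritz_eigenvalues(n_basis: int) -> np.ndarray:
    """Ritz estimates of lambda for u'' = -lambda u on [0, 1], u(0) = u(1) = 0.

    Illustrative choices: p = 1, q = 0, w = 1 and the hypothetical
    non-orthogonal basis f_n(x) = x^(n+1) (1 - x) for n = 0, ..., N-1.
    """
    basis = [P([0.0] * (n + 1) + [1.0]) * P([1.0, -1.0]) for n in range(n_basis)]

    def integral01(poly: P) -> float:  # exact integral of a polynomial over [0, 1]
        anti = poly.integ()
        return anti(1.0) - anti(0.0)

    # K_mn = -<f_m|L f_n> = int f_m' f_n' dx (boundary term vanishes since f_n(0) = f_n(1) = 0)
    K = np.array([[integral01(fm.deriv() * fn.deriv()) for fn in basis] for fm in basis])
    # Overlap matrix S_mn = <f_m|w f_n> = int f_m f_n dx
    S = np.array([[integral01(fm * fn) for fn in basis] for fm in basis])

    # Solve K a = lambda S a: with S = C C^T, this becomes C^-1 K C^-T b = lambda b, b = C^T a
    C = np.linalg.cholesky(S)
    C_inv = np.linalg.inv(C)
    return np.linalg.eigvalsh(C_inv @ K @ C_inv.T)


for n_basis in range(1, 5):
    lam = ritz_eigenvalues(n_basis)
    print(n_basis, lam[0], np.pi**2)  # lowest Ritz value versus the exact lambda_0
```

With one basis function this reproduces the classic estimate $$\lambda_0 \approx 10$$ (versus $$\pi^2 \approx 9.8696$$). Because each basis contains the previous one, adding basis functions can never raise the estimate, and it always stays above the true value. For larger $$N$$ this monomial-type basis becomes increasingly ill-conditioned, which is one practical reason why the choice of basis matters.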
-For example, if $$\hat{H}$$ represents a helium atom, -a good choice for $$\{\Ket{f_n}\}$$ would be hydrogen orbitals, -since those are qualitatively similar. - -You may find this result unsurprising; -it makes some intuitive sense that approximating $$\hat{H}$$ -in a limited basis would yield a matrix $$\overline{H}$$ giving rough eigenvalues. +You may find this result unsurprising: +it makes some intuitive sense that approximating $$\hat{L}$$ +in a limited basis would yield a matrix $$\bar{L}$$ giving rough eigenvalues. The point of this discussion is to rigorously show the validity of this approach. -If we only care about the ground state, -then we already know $$\lambda$$ from $$R[u]$$, -so all we need to do is solve the above matrix equation for $$a_n$$. -Keep in mind that $$\overline{M}$$ is singular, -and $$a_n$$ are only defined up to a constant factor. - Nowadays, there exist many other methods to calculate eigenvalues -of complicated operators $$\hat{H}$$, +of complicated operators $$\hat{L}$$, but an attractive feature of the Ritz method is that it is single-step, whereas its competitors tend to be iterative. -That said, the Ritz method cannot recover from a poorly chosen basis. +That said, this method cannot recover from a poorly chosen basis $$\{\Ket{f_n}\}$$. + +Indeed, the overall accuracy is determined by how good our truncated basis is, +i.e. how large a subspace it spans +of the [Hilbert space](/know/concept/hilbert-space/) in which the true $$\Ket{u_0}$$ resides. +Clearly, adding more basis vectors improves the results, +but at a computational cost; +it is usually more efficient to carefully choose *which* $$\ket{f_n}$$ to use, +rather than just *how many*. -- cgit v1.2.3