Consider the following general Itō diffusion \(X_t \in \mathbb{R}\), which is assumed to satisfy the conditions for the existence and uniqueness of its solution on the entire time axis:

\[\begin{aligned} \dd{X}_t = f(X_t, t) \dd{t} + g(X_t, t) \dd{B_t} \end{aligned}\]
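To make this concrete, such an SDE can be simulated numerically. Below is a minimal sketch using the Euler-Maruyama scheme, which is not part of the derivation; the function name `euler_maruyama`, the choice of an Ornstein-Uhlenbeck process (\(f = -\theta x\), \(g = \sigma\)) and all parameter values are assumptions made for illustration. It estimates the mean of the bounded function \(\cos(X_t)\) by Monte Carlo and compares it to the closed-form value for this particular process:

```python
import math
import random


def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Simulate one path of dX = f(X, t) dt + g(X, t) dB and return X at time t1."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        # Brownian increment over one time step: dB ~ N(0, dt)
        db = rng.gauss(0.0, math.sqrt(dt))
        x += f(x, t) * dt + g(x, t) * db
        t += dt
    return x


# Assumed example: Ornstein-Uhlenbeck process with f = -theta * x and g = sigma
theta, sigma = 1.0, 1.0


def drift(x, t):
    return -theta * x


def noise(x, t):
    return sigma


rng = random.Random(0)
x0, t1, n_paths = 1.0, 1.0, 10_000
estimate = sum(
    math.cos(euler_maruyama(drift, noise, x0, 0.0, t1, 50, rng))
    for _ in range(n_paths)
) / n_paths

# For this example X_t is Gaussian with known mean and variance,
# so E[cos(X_t)] = cos(mean) * exp(-variance / 2)
mean = x0 * math.exp(-theta * t1)
var = sigma**2 * (1.0 - math.exp(-2.0 * theta * t1)) / (2.0 * theta)
exact = math.cos(mean) * math.exp(-var / 2.0)
print(estimate, exact)
```

The two printed numbers should agree up to the Monte Carlo and time-discretization errors, which shrink as the number of paths and steps grows.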

Let \(\mathcal{F}_t\) be the filtration to which \(X_t\) is adapted. We then define \(Y_s\) as shown below, namely as the conditional expectation of \(h(X_t)\), for an arbitrary bounded function \(h(x)\), given the information \(\mathcal{F}_s\) available at time \(s \le t\). Because \(X_t\) is a Markov process, \(Y_s\) must be \(X_s\)-measurable, so it is a function \(k\) of \(X_s\) and \(s\):

\[\begin{aligned} Y_s \equiv \mathbf{E}[h(X_t) | \mathcal{F}_s] = \mathbf{E}[h(X_t) | X_s] = k(X_s, s) \end{aligned}\]

Consequently, we can apply Itō’s lemma to find \(\dd{Y_s}\) in terms of \(k\), \(f\) and \(g\):

\[\begin{aligned} \dd{Y_s} &= \bigg( \pdv{k}{s} + \pdv{k}{x} f + \frac{1}{2} \pdv[2]{k}{x} g^2 \bigg) \dd{s} + \pdv{k}{x} g \dd{B_s} \\ &= \bigg( \pdv{k}{s} + \hat{L} k \bigg) \dd{s} + \pdv{k}{x} g \dd{B_s} \end{aligned}\]

Where we have defined the linear operator \(\hat{L}\) to have the following action on \(k\):

\[\begin{aligned} \hat{L} k \equiv \pdv{k}{x} f + \frac{1}{2} \pdv[2]{k}{x} g^2 \end{aligned}\]
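As an illustration, consider two common processes, chosen here purely as examples: the Ornstein-Uhlenbeck process with \(f = -\theta x\) and \(g = \sigma\), and geometric Brownian motion with \(f = \mu x\) and \(g = \sigma x\), where \(\theta\), \(\mu\) and \(\sigma\) are assumed constants. For these, \(\hat{L} k\) respectively becomes:

\[\begin{aligned} \hat{L} k = - \theta x \pdv{k}{x} + \frac{\sigma^2}{2} \pdv[2]{k}{x} \qquad \quad \hat{L} k = \mu x \pdv{k}{x} + \frac{\sigma^2 x^2}{2} \pdv[2]{k}{x} \end{aligned}\]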

At this point, we need to realize that \(Y_s\) is a martingale with respect to \(\mathcal{F}_s\), since \(Y_s\) is \(\mathcal{F}_s\)-adapted and integrable (because \(h\) is bounded), and it satisfies the martingale property for \(r \le s \le t\):

\[\begin{aligned} \mathbf{E}[Y_s | \mathcal{F}_r] = \mathbf{E}\Big[ \mathbf{E}[h(X_t) | \mathcal{F}_s] \Big| \mathcal{F}_r \Big] = \mathbf{E}\big[ h(X_t) \big| \mathcal{F}_r \big] = Y_r \end{aligned}\]

Where we used the tower property of conditional expectations, because \(\mathcal{F}_r \subset \mathcal{F}_s\). Now, an Itō process can only be a martingale if its drift term (the one containing \(\dd{s}\)) vanishes, so, looking at \(\dd{Y_s}\), we must demand that:

\[\begin{aligned} \pdv{k}{s} + \hat{L} k = 0 \end{aligned}\]

Because \(X_t\) is a Markov process, it has a transition density \(p(s, X_s; t, X_t)\), in terms of which we can write \(k\). In this case \(s\) and \(X_s\) are given initial conditions, \(t\) is a parameter, and the terminal state \(X_t\) is a random variable. We thus have:

\[\begin{aligned} k(x, s) = \int_{-\infty}^\infty p(s, x; t, y) \: h(y) \dd{y} \end{aligned}\]

We insert this into the equation that we just derived for \(k\), yielding:

\[\begin{aligned} 0 = \int_{-\infty}^\infty \!\! \Big( \pdv{s} p(s, x; t, y) + \hat{L} p(s, x; t, y) \Big) h(y) \dd{y} \end{aligned}\]

Because this must be satisfied for every bounded function \(h\), the factor in parentheses must vanish, so the transition density \(p\) fulfills:

\[\begin{aligned} 0 = \pdv{s} p(s, x; t, y) + \hat{L} p(s, x; t, y) \end{aligned}\]

Here, \(t\) is a known parameter and \(y\) is a “known” integration variable, leaving only \(s\) and \(x\) as free variables for us to choose. We therefore define the **likelihood function** \(\psi(s, x)\), which gives the likelihood of an initial condition \((s, x)\) given that the terminal condition is \((t, y)\):

\[\begin{aligned} \boxed{ \psi(s, x) \equiv p(s, x; t, y) } \end{aligned}\]

And from the above derivation, we conclude that \(\psi\) satisfies the following PDE, known as the **backward Kolmogorov equation**:

\[\begin{aligned} \boxed{ - \pdv{\psi}{s} = \hat{L} \psi = f \pdv{\psi}{x} + \frac{1}{2} g^2 \pdv[2]{\psi}{x} } \end{aligned}\]
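As a sanity check, the backward equation can be tested numerically in a case where \(p\) is known in closed form. The sketch below assumes an Ornstein-Uhlenbeck process (\(f = -\theta x\), \(g = \sigma\)), whose transition density is Gaussian; the function names, parameter values and evaluation point are illustrative choices. It evaluates \(\pdv*{\psi}{s} + \hat{L} \psi\) using central finite differences:

```python
import math


def transition_density(s, x, t, y, theta=1.0, sigma=1.0):
    """Gaussian transition density p(s, x; t, y) of the Ornstein-Uhlenbeck process."""
    tau = t - s
    mean = x * math.exp(-theta * tau)
    var = sigma**2 * (1.0 - math.exp(-2.0 * theta * tau)) / (2.0 * theta)
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def backward_residual(s, x, t, y, theta=1.0, sigma=1.0, eps=1e-3):
    """Return dpsi/ds + L psi at (s, x), approximated by central finite differences."""

    def psi(s_, x_):
        # Likelihood function: the terminal condition (t, y) is held fixed
        return transition_density(s_, x_, t, y, theta, sigma)

    dpsi_ds = (psi(s + eps, x) - psi(s - eps, x)) / (2.0 * eps)
    dpsi_dx = (psi(s, x + eps) - psi(s, x - eps)) / (2.0 * eps)
    d2psi_dx2 = (psi(s, x + eps) - 2.0 * psi(s, x) + psi(s, x - eps)) / eps**2

    f = -theta * x  # drift of the assumed example
    g = sigma       # noise intensity of the assumed example
    return dpsi_ds + f * dpsi_dx + 0.5 * g**2 * d2psi_dx2


residual = backward_residual(s=0.0, x=0.8, t=1.0, y=0.3)
print(residual)  # expected to be close to zero, up to finite-difference error
```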

Moving on, we can define the traditional **probability density function** \(\phi(t, y)\) from the transition density \(p\), by fixing the initial \((s, x)\) and leaving the terminal \((t, y)\) free:

\[\begin{aligned} \boxed{ \phi(t, y) \equiv p(s, x; t, y) } \end{aligned}\]

With this in mind, for \((s, x) = (0, X_0)\), the unconditional expectation \(\mathbf{E}[Y_t]\) (i.e. the conditional expectation without information) will be constant in time, because \(Y_t\) is a martingale:

\[\begin{aligned} \mathbf{E}[Y_t] = \mathbf{E}[k(X_t, t)] = \int_{-\infty}^\infty k(y, t) \: \phi(t, y) \dd{y} = \braket{k}{\phi} = \mathrm{const} \end{aligned}\]

This integral has the form of an inner product, so we switch to Dirac notation. We differentiate with respect to \(t\), and use the backward equation \(\pdv*{k}{t} + \hat{L} k = 0\):

\[\begin{aligned} 0 = \pdv{t} \braket{k}{\phi} = \braket{k}{\pdv{\phi}{t}} + \braket{\pdv{k}{t}}{\phi} = \braket{k}{\pdv{\phi}{t}} - \braket{\hat{L} k}{\phi} = \braket{k}{\pdv{\phi}{t} - \hat{L}{}^\dagger \phi} \end{aligned}\]

Where \(\hat{L}{}^\dagger\) is by definition the adjoint operator of \(\hat{L}\), which we calculate using partial integration, where all boundary terms vanish thanks to the *existence* of \(X_t\); in other words, \(X_t\) cannot reach infinity at any finite \(t\), so the integrand must decay to zero for \(|y| \to \infty\):

\[\begin{aligned} \braket{\hat{L} k}{\phi} &= \int_{-\infty}^\infty \bigg( \pdv{k}{y} f \phi + \frac{1}{2} \pdv[2]{k}{y} g^2 \phi \bigg) \dd{y} \\ &= \bigg[ k f \phi + \frac{1}{2} \pdv{k}{y} g^2 \phi \bigg]_{-\infty}^\infty - \int_{-\infty}^\infty \bigg( k \pdv{y}(f \phi) + \frac{1}{2} \pdv{k}{y} \pdv{y}(g^2 \phi) \bigg) \dd{y} \\ &= \bigg[ -\frac{1}{2} k \pdv{y}(g^2 \phi) \bigg]_{-\infty}^\infty + \int_{-\infty}^\infty \bigg( - k \pdv{y}(f \phi) + \frac{1}{2} k \pdv[2]{y}(g^2 \phi) \bigg) \dd{y} \\ &= \int_{-\infty}^\infty k \: \big( \hat{L}{}^\dagger \phi \big) \dd{y} = \braket{k}{\hat{L}{}^\dagger \phi} \end{aligned}\]

Since \(k\) is arbitrary, and \(\pdv*{\braket{k}{\phi}}{t} = 0\) for all \(k\), we thus arrive at the **forward Kolmogorov equation**, describing the evolution of the probability density \(\phi(t, y)\):

\[\begin{aligned} \boxed{ \pdv{\phi}{t} = \hat{L}{}^\dagger \phi = - \pdv{y}(f \phi) + \frac{1}{2} \pdv[2]{y}(g^2 \phi) } \end{aligned}\]
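The forward equation can be checked numerically in the same spirit. The sketch below assumes geometric Brownian motion (\(f = \mu y\), \(g = \sigma y\)), chosen because its state-dependent \(g\) makes the placement of \(g^2\) inside the derivative matter; its transition density is log-normal. The function names, parameter values and evaluation point are illustrative assumptions:

```python
import math


def gbm_density(t, y, s=0.0, x=1.0, mu=0.1, sigma=0.3):
    """Log-normal transition density p(s, x; t, y) of geometric Brownian motion."""
    tau = t - s
    z = math.log(y / x) - (mu - 0.5 * sigma**2) * tau
    norm = y * sigma * math.sqrt(2.0 * math.pi * tau)
    return math.exp(-(z**2) / (2.0 * sigma**2 * tau)) / norm


def forward_residual(t, y, mu=0.1, sigma=0.3, eps=1e-3):
    """Return dphi/dt - L^dagger phi at (t, y), using central finite differences."""

    def phi(t_, y_):
        # Probability density: the initial condition (s, x) is held fixed
        return gbm_density(t_, y_, mu=mu, sigma=sigma)

    def f_phi(y_):
        return mu * y_ * phi(t, y_)  # f * phi

    def g2_phi(y_):
        return (sigma * y_) ** 2 * phi(t, y_)  # g^2 * phi

    dphi_dt = (phi(t + eps, y) - phi(t - eps, y)) / (2.0 * eps)
    d_f_phi = (f_phi(y + eps) - f_phi(y - eps)) / (2.0 * eps)
    d2_g2_phi = (g2_phi(y + eps) - 2.0 * g2_phi(y) + g2_phi(y - eps)) / eps**2
    return dphi_dt - (-d_f_phi + 0.5 * d2_g2_phi)


residual = forward_residual(t=1.0, y=1.2)
print(residual)  # expected to be close to zero, up to finite-difference error
```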

This can be rewritten in a way that highlights the connection between Itō diffusions and physical diffusion, if we rename the state variable \(y\) to \(x\) and define the **diffusivity** \(D\), **advection** \(u\), and **probability flux** \(J\):

\[\begin{aligned} D \equiv \frac{1}{2} g^2 \qquad \quad u \equiv f - \pdv{D}{x} \qquad \quad J \equiv u \phi - D \pdv{\phi}{x} \end{aligned}\]

Such that the forward Kolmogorov equation takes the following **conservative form**, so called because it looks like a physical continuity equation:

\[\begin{aligned} \boxed{ \pdv{\phi}{t} = - \pdv{J}{x} = - \pdv{x} \Big( u \phi - D \pdv{\phi}{x} \Big) } \end{aligned}\]
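The conservative form lends itself to finite-volume discretization, where each cell's probability changes only through the fluxes at its faces, so the total probability is preserved by construction. Below is a minimal sketch of this idea, assuming an Ornstein-Uhlenbeck process (so \(u = f = -\theta x\) and \(D\) is constant), an explicit Euler time step, and zero flux at the edges of a truncated domain; the grid and all parameter values are illustrative choices:

```python
# Assumed example: Ornstein-Uhlenbeck process, so u = f = -theta * x and D is constant
theta, sigma = 1.0, 1.0
D = 0.5 * sigma**2

n_cells, x_min, x_max = 100, -5.0, 5.0
dx = (x_max - x_min) / n_cells
centers = [x_min + (i + 0.5) * dx for i in range(n_cells)]
faces = [x_min + i * dx for i in range(n_cells + 1)]

# Start with all probability in a single cell (centred near x = 1.05)
phi = [0.0] * n_cells
phi[60] = 1.0 / dx

dt, t_end = 0.002, 5.0  # dt is well below the explicit stability limit dx^2 / (2 D)
for _ in range(round(t_end / dt)):
    # Flux J = u phi - D dphi/dx at each interior face; zero at the two outer faces
    J = [0.0] * (n_cells + 1)
    for i in range(1, n_cells):
        u = -theta * faces[i]
        J[i] = u * 0.5 * (phi[i - 1] + phi[i]) - D * (phi[i] - phi[i - 1]) / dx
    # Conservative update: each cell changes by the difference of its face fluxes
    phi = [phi[i] - dt * (J[i + 1] - J[i]) / dx for i in range(n_cells)]

mass = sum(phi) * dx
mean = sum(x * p for x, p in zip(centers, phi)) * dx
var = sum((x - mean) ** 2 * p for x, p in zip(centers, phi)) * dx
print(mass, var)
```

The total probability `mass` should stay at 1 up to rounding, and for this example `var` should approach the stationary variance \(\sigma^2 / (2 \theta)\) of the Ornstein-Uhlenbeck process.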

Note that if \(u = 0\), then this reduces to Fick’s second law. The backward Kolmogorov equation can also be rewritten analogously, although it is less noteworthy:

\[\begin{aligned} \boxed{ - \pdv{\psi}{s} = u \pdv{\psi}{x} + \pdv{x} \Big( D \pdv{\psi}{x} \Big) } \end{aligned}\]
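As a quick check, we substitute \(f = u + \pdv*{D}{x}\) and \(g^2 = 2 D\) into the right-hand side of the backward Kolmogorov equation, and recognize the product rule:

\[\begin{aligned} f \pdv{\psi}{x} + \frac{1}{2} g^2 \pdv[2]{\psi}{x} = \Big( u + \pdv{D}{x} \Big) \pdv{\psi}{x} + D \pdv[2]{\psi}{x} = u \pdv{\psi}{x} + \pdv{x} \Big( D \pdv{\psi}{x} \Big) \end{aligned}\]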

Notice that the diffusivity term looks the same in both the forward and backward equations; we say that diffusion is self-adjoint.
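To see this explicitly, we can integrate by parts, assuming as before that all boundary terms vanish. The diffusivity term then keeps its form when moved to the other side of the inner product, whereas the advection term does not:

\[\begin{aligned} \braket{k}{\pdv{x} \Big( D \pdv{\phi}{x} \Big)} &= - \int_{-\infty}^\infty D \pdv{k}{x} \pdv{\phi}{x} \dd{x} = \braket{\pdv{x} \Big( D \pdv{k}{x} \Big)}{\phi} \\ \braket{k}{- \pdv{x} (u \phi)} &= \int_{-\infty}^\infty u \pdv{k}{x} \phi \dd{x} = \braket{u \pdv{k}{x}}{\phi} \end{aligned}\]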

- U.H. Thygesen,
*Lecture notes on diffusions and stochastic differential equations*, 2021, Polyteknisk Kompendie.

© Marcus R.A. Newman, a.k.a. "Prefetch".
Available under CC BY-SA 4.0.