author     Prefetch  2021-11-06 21:47:08 +0100
committer  Prefetch  2021-11-06 21:47:08 +0100
commit     f091bf0922c26238d16bf175a8ea916a16d11fba (patch)
tree       307ace9fde0b408f45fdc55bc8926fc15d8df7c6
parent     a17363fa734518ada98fc3e79c9fd20f70e42f1b (diff)
Expand knowledge base
-rw-r--r--  content/know/concept/conditional-expectation/index.pdc   69
-rw-r--r--  content/know/concept/ito-calculus/index.pdc             215
-rw-r--r--  content/know/concept/ito-integral/index.pdc             274
-rw-r--r--  content/know/concept/martingale/index.pdc                 2
-rw-r--r--  content/know/concept/random-variable/index.pdc           22
-rw-r--r--  content/know/concept/sigma-algebra/index.pdc              2
-rw-r--r--  content/know/concept/wiener-process/index.pdc             2
7 files changed, 539 insertions, 47 deletions
diff --git a/content/know/concept/conditional-expectation/index.pdc b/content/know/concept/conditional-expectation/index.pdc
index 7da7660..5bcc152 100644
--- a/content/know/concept/conditional-expectation/index.pdc
+++ b/content/know/concept/conditional-expectation/index.pdc
@@ -13,17 +13,17 @@ markup: pandoc
# Conditional expectation
-Recall that the expectation value $\mathbf{E}(X)$
+Recall that the expectation value $\mathbf{E}[X]$
of a [random variable](/know/concept/random-variable/) $X$
is a function of the probability space $(\Omega, \mathcal{F}, P)$
on which $X$ is defined, and the definition of $X$ itself.
-The **conditional expectation** $\mathbf{E}(X|A)$
+The **conditional expectation** $\mathbf{E}[X|A]$
is the expectation value of $X$ given that an event $A$ has occurred,
i.e. only the outcomes $\omega \in \Omega$
satisfying $\omega \in A$ should be considered.
-If $A$ is obtained by observing another variable,
-then $\mathbf{E}(X|A)$ is a random variable in its own right.
+If $A$ is obtained by observing a variable,
+then $\mathbf{E}[X|A]$ is a random variable in its own right.
Consider two random variables $X$ and $Y$
on the same probability space $(\Omega, \mathcal{F}, P)$,
@@ -33,7 +33,7 @@ then the conditional expectation of $X$
given the event $Y = y$ is as follows:
$$\begin{aligned}
- \mathbf{E}(X | Y \!=\! y)
+ \mathbf{E}[X | Y \!=\! y]
= \sum_{x} x \: Q(X \!=\! x)
\qquad \quad
Q(X \!=\! x)
@@ -43,12 +43,12 @@ $$\begin{aligned}
Where $Q$ is a renormalized probability function,
which assigns zero to all events incompatible with $Y = y$.
If we allow $\Omega$ to be continuous,
-then from the definition $\mathbf{E}(X)$,
+then from the definition $\mathbf{E}[X]$,
we know that the following Lebesgue integral can be used,
which we call $f(y)$:
$$\begin{aligned}
- \mathbf{E}(X | Y \!=\! y)
+ \mathbf{E}[X | Y \!=\! y]
= f(y)
= \int_\Omega X(\omega) \dd{Q(\omega)}
\end{aligned}$$
@@ -60,7 +60,7 @@ Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
$$\begin{aligned}
f(y)
= \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)}
- = \frac{\mathbf{E}(X \cdot I(Y \!=\! y))}{P(Y \!=\! y)}
+ = \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
\end{aligned}$$
Where $I$ is the indicator function,
@@ -68,33 +68,33 @@ equal to $1$ if its argument is true, and $0$ if not.
Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:
$$\begin{aligned}
- \mathbf{E}(X \cdot I(Y \!=\! y))
+ \mathbf{E}[X \cdot I(Y \!=\! y)]
&= f(y) \cdot P(Y \!=\! y)
\\
- &= \mathbf{E}(f(Y) \cdot I(Y \!=\! y))
+ &= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
\end{aligned}$$
Recall that because $Y$ is a random variable,
-$\mathbf{E}(X|Y) = f(Y)$ is too.
+$\mathbf{E}[X|Y] = f(Y)$ is too.
In other words, $f$ maps $Y$ to another random variable,
which, due to the *Doob-Dynkin lemma*
(see [$\sigma$-algebra](/know/concept/sigma-algebra/)),
-must mean that $\mathbf{E}(X|Y)$ is measurable with respect to $\sigma(Y)$.
+must mean that $\mathbf{E}[X|Y]$ is measurable with respect to $\sigma(Y)$.
Intuitively, this makes some sense:
-$\mathbf{E}(X|Y)$ cannot contain more information about events
+$\mathbf{E}[X|Y]$ cannot contain more information about events
than the $Y$ it was calculated from.
This suggests a straightforward generalization of the above:
instead of a specific value $Y = y$,
we can condition on *any* information from $Y$.
If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
-then the conditional expectation $\mathbf{E}(X|\mathcal{H}) = Z$
+then the conditional expectation $\mathbf{E}[X|\mathcal{H}] = Z$
is $\mathcal{H}$-measurable, and given by a $Z$ satisfying:
$$\begin{aligned}
\boxed{
- \mathbf{E}\big(X \cdot I(H)\big)
- = \mathbf{E}\big(Z \cdot I(H)\big)
+ \mathbf{E}\big[X \cdot I(H)\big]
+ = \mathbf{E}\big[Z \cdot I(H)\big]
}
\end{aligned}$$
@@ -102,52 +102,55 @@ For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique:
*almost* because it could take any value
for an event $A$ with zero probability $P(A) = 0$.
Fortunately, if there exists a continuous $f$
-such that $\mathbf{E}(X | \sigma(Y)) = f(Y)$,
-then $Z = \mathbf{E}(X | \sigma(Y))$ is unique.
+such that $\mathbf{E}[X | \sigma(Y)] = f(Y)$,
+then $Z = \mathbf{E}[X | \sigma(Y)]$ is unique.
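This abstract definition is easy to check numerically. The sketch below (Python with NumPy; the discrete distributions of $X$ and $Y$ are hypothetical choices) builds $Z = \mathbf{E}[X|Y]$ as a piecewise mean, and verifies the defining property on every event in $\sigma(Y)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete example: Y is a fair coin, X adds independent noise.
n = 100_000
y = rng.integers(0, 2, size=n)
x = y + rng.integers(0, 2, size=n)        # X takes values in {0, 1, 2}

# Z = E[X|Y]: on each event {Y = k}, Z equals the mean of X over that event.
z = np.where(y == 0, x[y == 0].mean(), x[y == 1].mean())

# Defining property: E[X I(H)] = E[Z I(H)] for every H in sigma(Y).
for h_event in (y == 0, y == 1, np.ones(n, dtype=bool)):
    assert abs((x * h_event).mean() - (z * h_event).mean()) < 1e-9
```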
+
+
+## Properties
A conditional expectation defined in this way has many useful properties,
most notably linearity:
-$\mathbf{E}(aX \!+\! bY | \mathcal{H}) = a \mathbf{E}(X|\mathcal{H}) + b \mathbf{E}(Y|\mathcal{H})$
+$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$
for any $a, b \in \mathbb{R}$.
The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
-then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H}) = \mathbf{E}(X|\mathcal{H})$.
+then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$.
Intuitively, this works as follows:
suppose person $G$ knows more about $X$ than person $H$,
-then $\mathbf{E}(X | \mathcal{H})$ is $H$'s expectation,
-$\mathbf{E}(X | \mathcal{G})$ is $G$'s "better" expectation,
-and then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H})$
+then $\mathbf{E}[X | \mathcal{H}]$ is $H$'s expectation,
+$\mathbf{E}[X | \mathcal{G}]$ is $G$'s "better" expectation,
+and then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$
is $H$'s prediction about what $G$'s expectation will be.
However, $H$ does not have access to $G$'s extra information,
-so $H$'s best prediction is simply $\mathbf{E}(X | \mathcal{H})$.
+so $H$'s best prediction is simply $\mathbf{E}[X | \mathcal{H}]$.
The **law of total expectation** says that
-$\mathbf{E}(\mathbf{E}(X | \mathcal{G})) = \mathbf{E}(X)$,
+$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$,
and follows from the above tower property
by choosing $\mathcal{H}$ to contain no information:
$\mathcal{H} = \{ \varnothing, \Omega \}$.
-Another useful property is that $\mathbf{E}(X | \mathcal{H}) = X$
+Another useful property is that $\mathbf{E}[X | \mathcal{H}] = X$
if $X$ is $\mathcal{H}$-measurable.
In other words, if $\mathcal{H}$ already contains
all the information extractable from $X$,
then we know $X$'s exact value.
Conveniently, this can easily be generalized to products:
-$\mathbf{E}(XY | \mathcal{H}) = X \mathbf{E}(Y | \mathcal{H})$
+$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$
if $X$ is $\mathcal{H}$-measurable:
since $X$'s value is known, it can simply be factored out.
Armed with this definition of conditional expectation,
we can define other conditional quantities,
-such as the **conditional variance** $\mathbf{V}(X | \mathcal{H})$:
+such as the **conditional variance** $\mathbf{V}[X | \mathcal{H}]$:
$$\begin{aligned}
- \mathbf{V}(X | \mathcal{H})
- = \mathbf{E}(X^2 | \mathcal{H}) - \big(\mathbf{E}(X | \mathcal{H})\big)^2
+ \mathbf{V}[X | \mathcal{H}]
+    = \mathbf{E}[X^2 | \mathcal{H}] - \big(\mathbf{E}[X | \mathcal{H}]\big)^2
\end{aligned}$$
The **law of total variance** then states that
-$\mathbf{V}(X) = \mathbf{E}(\mathbf{V}(X | \mathcal{H})) + \mathbf{V}(\mathbf{E}(X | \mathcal{H}))$.
+$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$.
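Both the law of total expectation and the law of total variance can be verified with a short numerical sketch (Python with NumPy; the distributions are hypothetical choices), where $\mathcal{H} = \sigma(Y)$ for a discrete $Y$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: condition X on a three-valued Y.
n = 200_000
y = rng.integers(0, 3, size=n)
x = 2.0 * y + rng.normal(size=n)

# E[X|H] and V[X|H] with H = sigma(Y): piecewise means/variances over Y's events.
cond_mean = np.empty_like(x)
cond_var = np.empty_like(x)
for k in range(3):
    cond_mean[y == k] = x[y == k].mean()
    cond_var[y == k] = x[y == k].var()

# Law of total expectation: E[E[X|H]] = E[X]
assert abs(cond_mean.mean() - x.mean()) < 1e-9
# Law of total variance: V[X] = E[V[X|H]] + V[E[X|H]]
assert abs(x.var() - (cond_var.mean() + cond_mean.var())) < 1e-6
```

With sample moments this is the exact within/between (ANOVA) decomposition, so it holds up to floating-point error.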
Likewise, we can define the **conditional probability** $P$,
**conditional distribution function** $F_{X|\mathcal{H}}$,
@@ -156,7 +159,7 @@ like their non-conditional counterparts:
$$\begin{aligned}
P(A | \mathcal{H})
- = \mathbf{E}(I(A) | \mathcal{H})
+ = \mathbf{E}[I(A) | \mathcal{H}]
\qquad
F_{X|\mathcal{H}}(x)
= P(X \le x | \mathcal{H})
@@ -168,6 +171,6 @@ $$\begin{aligned}
## References
-1. U.F. Thygesen,
+1. U.H. Thygesen,
*Lecture notes on diffusions and stochastic differential equations*,
2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/ito-calculus/index.pdc b/content/know/concept/ito-calculus/index.pdc
new file mode 100644
index 0000000..576e09a
--- /dev/null
+++ b/content/know/concept/ito-calculus/index.pdc
@@ -0,0 +1,215 @@
+---
+title: "Itō calculus"
+firstLetter: "I"
+publishDate: 2021-11-06
+categories:
+- Mathematics
+
+date: 2021-11-06T14:34:00+01:00
+draft: false
+markup: pandoc
+---
+
+# Itō calculus
+
+Given two time-indexed [random variables](/know/concept/random-variable/)
+(i.e. stochastic processes) $F_t$ and $G_t$,
+then consider the following random variable $X_t$,
+where $B_t$ is the [Wiener process](/know/concept/wiener-process/):
+
+$$\begin{aligned}
+ X_t
+ = X_0 + \int_0^t F_s \dd{s} + \int_0^t G_s \dd{B_s}
+\end{aligned}$$
+
+Where the latter is an [Itō integral](/know/concept/ito-integral/),
+assuming $G_t$ is Itō-integrable.
+We call $X_t$ an **Itō process** if $F_t$ is locally integrable,
+and the initial condition $X_0$ is known,
+i.e. $X_0$ is $\mathcal{F}_0$-measurable,
+where $\mathcal{F}_t$ is the [filtration](/know/concept/sigma-algebra/)
+to which $F_t$, $G_t$ and $B_t$ are adapted.
+The above definition of $X_t$ is often abbreviated as follows,
+where $X_0$ is implicit:
+
+$$\begin{aligned}
+ \dd{X_t}
+ = F_t \dd{t} + G_t \dd{B_t}
+\end{aligned}$$
+
+Typically, $F_t$ is referred to as the **drift** of $X_t$,
+and $G_t$ as its **intensity**.
+Now, consider the following **Itō stochastic differential equation** (SDE),
+where $\xi_t = \dv*{B_t}{t}$ is white noise:
+
+$$\begin{aligned}
+ \dv{X_t}{t}
+ = f(X_t, t) + g(X_t, t) \: \xi_t
+\end{aligned}$$
+
+An Itō process $X_t$ is said to satisfy this equation
+if $f(X_t, t) = F_t$ and $g(X_t, t) = G_t$,
+in which case $X_t$ is also called an **Itō diffusion**.
+
+Because the Itō integral of $G_t$ is a
+[martingale](/know/concept/martingale/),
+it does not contribute to the mean of $X_t$:
+
+$$\begin{aligned}
+ \mathbf{E}[X_t]
+ = \int_0^t \mathbf{E}[F_s] \dd{s}
+\end{aligned}$$
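This can be checked numerically with a simple simulation sketch (Python with NumPy; the choices $F_t = t$ and $G_t = X_t$ are hypothetical): the mean of $X_t$ should follow the drift alone, here $\mathbf{E}[X_t] = \int_0^t s \dd{s} = t^2/2$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical Itō process dX = t dt + X dB, simulated step by step.
# The noise term has zero mean, so E[X_1] should be 1/2.
n_paths, n_steps, dt = 50_000, 200, 0.005   # t runs from 0 to 1
x = np.zeros(n_paths)
t = 0.0
for _ in range(n_steps):
    db = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    x += t * dt + x * db    # drift contributes, noise averages out
    t += dt

assert abs(x.mean() - 0.5) < 0.05
```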
+
+
+## Itō's lemma
+
+Classically, given $y \equiv h(x(t), t)$,
+the chain rule of differentiation states that:
+
+$$\begin{aligned}
+ \dd{y}
+ = \pdv{h}{t} \dd{t} + \pdv{h}{x} \dd{x}
+\end{aligned}$$
+
+However, for a stochastic process $Y_t \equiv h(X_t, t)$,
+where $X_t$ is an Itō process,
+the chain rule is modified to the following,
+known as **Itō's lemma**:
+
+$$\begin{aligned}
+ \boxed{
+ \dd{Y_t}
+ = \pdv{h}{t} \dd{t} + \bigg( \pdv{h}{x} F_t + \frac{1}{2} G_t^2 \pdv[2]{h}{x} \bigg) \dd{t} + \pdv{h}{x} G_t \dd{B_t}
+ }
+\end{aligned}$$
+
+<div class="accordion">
+<input type="checkbox" id="proof-lemma"/>
+<label for="proof-lemma">Proof</label>
+<div class="hidden">
+<label for="proof-lemma">Proof.</label>
+We start by applying the classical chain rule,
+but we go to second order in $x$.
+This is also valid classically,
+but there we would neglect all higher-order infinitesimals:
+
+$$\begin{aligned}
+ \dd{Y_t}
+ = \pdv{h}{t} \dd{t} + \pdv{h}{x} \dd{X_t} + \frac{1}{2} \pdv[2]{h}{x} \dd{X_t}^2
+\end{aligned}$$
+
+But here we cannot neglect $\dd{X_t}^2$.
+We insert the definition of an Itō process:
+
+$$\begin{aligned}
+ \dd{Y_t}
+ &= \pdv{h}{t} \dd{t} + \pdv{h}{x} \Big( F_t \dd{t} + G_t \dd{B_t} \Big) + \frac{1}{2} \pdv[2]{h}{x} \Big( F_t \dd{t} + G_t \dd{B_t} \Big)^2
+ \\
+ &= \pdv{h}{t} \dd{t} + \pdv{h}{x} \Big( F_t \dd{t} + G_t \dd{B_t} \Big)
+ + \frac{1}{2} \pdv[2]{h}{x} \Big( F_t^2 \dd{t}^2 + 2 F_t G_t \dd{t} \dd{B_t} + G_t^2 \dd{B_t}^2 \Big)
+\end{aligned}$$
+
+In the limit of small $\dd{t}$, we can neglect $\dd{t}^2$,
+and as it turns out, $\dd{t} \dd{B_t}$ too:
+
+$$\begin{aligned}
+ \dd{t} \dd{B_t}
+ &= (B_{t + \dd{t}} - B_t) \dd{t}
+ \sim \dd{t} \mathcal{N}(0, \dd{t})
+ \sim \mathcal{N}(0, \dd{t}^3)
+ \longrightarrow 0
+\end{aligned}$$
+
+However, due to the scaling property of $B_t$,
+we cannot ignore $\dd{B_t}^2$, which has order $\dd{t}$:
+
+$$\begin{aligned}
+ \dd{B_t}^2
+ &= (B_{t + \dd{t}} - B_t)^2
+ \sim \big( \mathcal{N}(0, \dd{t}) \big)^2
+ \sim \chi^2_1(\dd{t})
+ \longrightarrow \dd{t}
+\end{aligned}$$
+
+Where $\chi_1^2(\dd{t})$ is the generalized chi-squared distribution
+with one term of variance $\dd{t}$.
+</div>
+</div>
+
+The most important application of Itō's lemma
+is to perform coordinate transformations,
+to make the solution of a given Itō SDE easier.
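Before doing so, Itō's lemma itself can be sanity-checked by simulation (Python with NumPy; the SDE $\dd{X_t} = \mu X_t \dd{t} + \sigma X_t \dd{B_t}$ and its parameters are hypothetical choices). With $h(x) = \ln(x)$, the lemma predicts $\dd{(\ln X_t)} = (\mu - \sigma^2/2) \dd{t} + \sigma \dd{B_t}$, i.e. a correction $-\sigma^2/2$ that the classical chain rule would miss:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical SDE: dX = mu X dt + sigma X dB, started at X_0 = 1.
# Itō's lemma predicts E[ln X_T] = (mu - sigma^2 / 2) T.
mu, sigma, T = 0.1, 0.5, 1.0
n_paths, n_steps = 20_000, 1_000
dt = T / n_steps

x = np.ones(n_paths)
for _ in range(n_steps):
    db = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    x *= 1.0 + mu * dt + sigma * db

predicted = (mu - 0.5 * sigma**2) * T   # note the -sigma^2/2 correction
assert abs(np.log(x).mean() - predicted) < 0.02
```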
+
+
+## Coordinate transformations
+
+The simplest coordinate transformation is a scaling of the time axis.
+Defining $s \equiv \alpha t$, the goal is that the result is still an Itō process.
+We know how to scale $B_t$: by setting $W_s \equiv \sqrt{\alpha} B_{s / \alpha}$.
+Let $Y_s \equiv X_t$ be the new variable on the rescaled axis, then:
+
+$$\begin{aligned}
+ \dd{Y_s}
+ = \dd{X_t}
+ &= f(X_t) \dd{t} + g(X_t) \dd{B_t}
+ \\
+ &= \frac{1}{\alpha} f(Y_s) \dd{s} + \frac{1}{\sqrt{\alpha}} g(Y_s) \dd{W_s}
+\end{aligned}$$
+
+$W_s$ is a valid Wiener process,
+and the other changes are small,
+so this is still an Itō process.
+
+To solve SDEs analytically, it is usually best
+to have additive noise, i.e. $g = 1$.
+This can be achieved using the **Lamperti transform**:
+define $Y_t \equiv h(X_t)$, where $h$ is given by:
+
+$$\begin{aligned}
+ \boxed{
+ h(x)
+ = \int_{x_0}^x \frac{1}{g(y)} \dd{y}
+ }
+\end{aligned}$$
+
+Then, using Itō's lemma, it is straightforward
+to show that the intensity becomes $1$.
+Note that the lower integration limit $x_0$ does not enter:
+
+$$\begin{aligned}
+ \dd{Y_t}
+ &= \bigg( f(X_t) \: h'(X_t) + \frac{1}{2} g^2(X_t) \: h''(X_t) \bigg) \dd{t} + g(X_t) \: h'(X_t) \dd{B_t}
+ \\
+ &= \bigg( \frac{f(X_t)}{g(X_t)} - \frac{1}{2} g^2(X_t) \frac{g'(X_t)}{g^2(X_t)} \bigg) \dd{t} + \frac{g(X_t)}{g(X_t)} \dd{B_t}
+ \\
+ &= \bigg( \frac{f(X_t)}{g(X_t)} - \frac{1}{2} g'(X_t) \bigg) \dd{t} + \dd{B_t}
+\end{aligned}$$
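As a quick symbolic sketch (Python with SymPy; the multiplicative intensity $g(x) = \sigma x$ is a hypothetical choice), the Lamperti transform gives $h(x) = \ln(x) / \sigma$, and the new intensity $g(x) \: h'(x)$ indeed reduces to $1$:

```python
import sympy as sp

x = sp.Symbol('x', positive=True)
sigma = sp.Symbol('sigma', positive=True)

# Hypothetical intensity g(x) = sigma * x (multiplicative noise).
g = sigma * x

# Lamperti transform: h(x) = integral of 1/g (the limit x0 only shifts h).
h = sp.integrate(1 / g, x)          # log(x)/sigma

# Itō's lemma gives the new intensity g(x) h'(x), which should be 1.
assert sp.simplify(g * sp.diff(h, x)) == 1
```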
+
+Similarly, we can eliminate the drift $f = 0$,
+thereby making the Itō process a martingale.
+This is done by defining $Y_t \equiv h(X_t)$, with $h(x)$ given by:
+
+$$\begin{aligned}
+ \boxed{
+ h(x)
+    = \int_{x_0}^x \exp\!\bigg( \!-\!\! \int_{x_1}^{y} \frac{2 f(z)}{g^2(z)} \dd{z} \bigg) \dd{y}
+ }
+\end{aligned}$$
+
+The goal is to make the parenthesized first term (see above)
+of Itō's lemma disappear, which this $h(x)$ indeed does.
+Note that $x_0$ and $x_1$ do not enter:
+
+$$\begin{aligned}
+ 0
+ &= f(x) \: h'(x) + \frac{1}{2} g^2(x) \: h''(x)
+ \\
+  &= \Big( f(x) - \frac{1}{2} g^2(x) \frac{2 f(x)}{g^2(x)} \Big) \exp\!\bigg( \!-\!\! \int_{x_1}^x \frac{2 f(y)}{g^2(y)} \dd{y} \bigg)
+\end{aligned}$$
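This cancellation can also be confirmed symbolically (Python with SymPy; the drift $f(x) = \mu x$ and intensity $g(x) = \sigma x$ are hypothetical choices). Working with $h'(x)$ directly avoids the outer integral, since $x_0$ and $x_1$ only shift and scale $h$:

```python
import sympy as sp

x = sp.Symbol('x', positive=True)
mu, sigma = sp.symbols('mu sigma', positive=True)

# Hypothetical drift and intensity: f(x) = mu x, g(x) = sigma x.
f = mu * x
g = sigma * x

# From the boxed formula, h'(x) = exp(- int 2f/g^2 dz).
h_prime = sp.exp(-sp.integrate(2 * f / g**2, x))

# The drift term of Itō's lemma must vanish: f h' + (1/2) g^2 h'' = 0.
drift = f * h_prime + sp.Rational(1, 2) * g**2 * sp.diff(h_prime, x)
assert sp.simplify(drift) == 0
```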
+
+
+
+## References
+1. U.H. Thygesen,
+ *Lecture notes on diffusions and stochastic differential equations*,
+ 2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/ito-integral/index.pdc b/content/know/concept/ito-integral/index.pdc
new file mode 100644
index 0000000..ec49189
--- /dev/null
+++ b/content/know/concept/ito-integral/index.pdc
@@ -0,0 +1,274 @@
+---
+title: "Itō integral"
+firstLetter: "I"
+publishDate: 2021-11-06
+categories:
+- Mathematics
+
+date: 2021-10-21T19:41:58+02:00
+draft: false
+markup: pandoc
+---
+
+# Itō integral
+
+The **Itō integral** offers a way to integrate
+a time-indexed [random variable](/know/concept/random-variable/)
+$G_t$ (i.e. a stochastic process) with respect
+to a [Wiener process](/know/concept/wiener-process/) $B_t$,
+which is also a stochastic process.
+The Itō integral $I_t$ of $G_t$ is defined as follows:
+
+$$\begin{aligned}
+ \boxed{
+ I_t
+ \equiv \int_a^b G_t \dd{B_t}
+ \equiv \lim_{h \to 0} \sum_{t = a}^{t = b} G_t \big(B_{t + h} - B_t\big)
+ }
+\end{aligned}$$
+
+Where we have partitioned the time interval $[a, b]$ into steps of size $h$.
+The above integral exists if $G_t$ and $B_t$ are adapted
+to a common [filtration](/know/concept/sigma-algebra) $\mathcal{F}_t$,
+and $\mathbf{E}[G_t^2]$ is integrable for $t \in [a, b]$.
+If $I_t$ exists, $G_t$ is said to be **Itō-integrable** with respect to $B_t$.
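The defining sum is easy to evaluate numerically (a sketch in Python with NumPy; the integrand $G_t = B_t$ is a hypothetical choice). A standard result, derivable with Itō calculus, is $\int_0^T B_t \dd{B_t} = (B_T^2 - T)/2$, which differs from the classical $B_T^2 / 2$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Left-endpoint sum defining the Itō integral, here with G_t = B_t on [0, 1].
n_steps, h = 100_000, 1e-5   # T = n_steps * h = 1
db = rng.normal(0.0, np.sqrt(h), size=n_steps)
b = np.concatenate(([0.0], np.cumsum(db)))   # B_t sampled on the grid

ito_sum = np.sum(b[:-1] * db)       # sum of B_t (B_{t+h} - B_t)
predicted = 0.5 * (b[-1]**2 - 1.0)  # standard result (B_T^2 - T)/2
assert abs(ito_sum - predicted) < 0.05
```

Crucially, $G_t$ is evaluated at the *left* endpoint of each step; that choice is what makes the limit an Itō integral.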
+
+
+## Motivation
+
+Consider the following simple first-order differential equation for $X_t$,
+for some function $f$:
+
+$$\begin{aligned}
+ \dv{X_t}{t}
+ = f(X_t)
+\end{aligned}$$
+
+This can be solved numerically using the explicit Euler scheme:
+discretize it with step size $h$,
+and apply the resulting update rule recursively, leading to:
+
+$$\begin{aligned}
+ X_{t+h}
+ \approx X_{t} + f(X_t) \: h
+ \quad \implies \quad
+ X_t
+ \approx X_0 + \sum_{s = 0}^{s = t} f(X_s) \: h
+\end{aligned}$$
+
+In the limit $h \to 0$, this leads to the following unsurprising integral for $X_t$:
+
+$$\begin{aligned}
+ \int_0^t f(X_s) \dd{s}
+ = \lim_{h \to 0} \sum_{s = 0}^{s = t} f(X_s) \: h
+\end{aligned}$$
+
+In contrast, consider the *stochastic differential equation* below,
+where $\xi_t$ represents white noise,
+which is informally the $t$-derivative
+of the Wiener process $\xi_t = \dv*{B_t}{t}$:
+
+$$\begin{aligned}
+ \dv{X_t}{t}
+ = g(X_t) \: \xi_t
+\end{aligned}$$
+
+Now $X_t$ is not deterministic,
+since $\xi_t$ is derived from a random variable $B_t$.
+If $g = 1$, we expect $X_t = X_0 + B_t$.
+With this in mind, we introduce the **Euler-Maruyama scheme**:
+
+$$\begin{aligned}
+ X_{t+h}
+  &= X_t + g(X_t) \: \xi_t \: h
+ \\
+ &= X_t + g(X_t) \: (B_{t+h} - B_t)
+\end{aligned}$$
+
+We would like to turn this into an integral for $X_t$, as we did above.
+Therefore, we state:
+
+$$\begin{aligned}
+ X_t
+ = X_0 + \int_0^t g(X_s) \dd{B_s}
+\end{aligned}$$
+
+This integral is *defined* as below,
+analogously to the first, but with $h$ replaced by
+the increment $B_{t+h} \!-\! B_t$ of a Wiener process.
+This is an Itō integral:
+
+$$\begin{aligned}
+ \int_0^t g(X_s) \dd{B_s}
+ \equiv \lim_{h \to 0} \sum_{s = 0}^{s = t} g(X_s) \big(B_{s + h} - B_s\big)
+\end{aligned}$$
+
+For more information about applying the Itō integral in this way,
+see the [Itō calculus](/know/concept/ito-calculus/).
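The Euler-Maruyama scheme above can be sketched in a few lines (Python with NumPy; the step count, step size and initial value are hypothetical choices). For $g = 1$ the updates telescope, recovering $X_t = X_0 + B_t$ as expected:

```python
import numpy as np

rng = np.random.default_rng(5)

def euler_maruyama(g, x0, n_steps, h, rng):
    """One Euler-Maruyama path of dX/dt = g(X) xi_t; also returns B_t."""
    x, b = x0, 0.0
    for _ in range(n_steps):
        db = rng.normal(0.0, np.sqrt(h))   # Wiener increment B_{t+h} - B_t
        x += g(x) * db
        b += db
    return x, b

x0 = 2.0
x, b = euler_maruyama(lambda x: 1.0, x0, 1_000, 0.001, rng)
assert abs(x - (x0 + b)) < 1e-9   # g = 1 gives X_t = X_0 + B_t
```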
+
+
+## Properties
+
+Since $G_t$ and $B_t$ must be known (i.e. $\mathcal{F}_t$-adapted)
+in order to evaluate the Itō integral $I_t$ at any given $t$,
+it logically follows that $I_t$ is also $\mathcal{F}_t$-adapted.
+
+Because the Itō integral is defined as the limit of a sum of linear terms,
+it inherits this linearity.
+Consider two Itō-integrable processes $G_t$ and $H_t$,
+and two constants $v, w \in \mathbb{R}$:
+
+$$\begin{aligned}
+  \int_a^b \big( v G_t + w H_t \big) \dd{B_t}
+ = v\! \int_a^b G_t \dd{B_t} +\: w\! \int_a^b H_t \dd{B_t}
+\end{aligned}$$
+
+By adding multiple summations,
+the Itō integral clearly satisfies, for $a < b < c$:
+
+$$\begin{aligned}
+ \int_a^c G_t \dd{B_t}
+ = \int_a^b G_t \dd{B_t} + \int_b^c G_t \dd{B_t}
+\end{aligned}$$
+
+A more interesting property is the **Itō isometry**,
+which expresses the expectation of the square of an Itō integral of $G_t$
+as a simpler "ordinary" integral of the expectation of $G_t^2$
+(which exists by the definition of Itō-integrability):
+
+$$\begin{aligned}
+ \boxed{
+    \mathbf{E} \bigg[ \bigg( \int_a^b G_t \dd{B_t} \bigg)^{\!2} \: \bigg]
+ = \int_a^b \mathbf{E} \big[ G_t^2 \big] \dd{t}
+ }
+\end{aligned}$$
+
+<div class="accordion">
+<input type="checkbox" id="proof-isometry"/>
+<label for="proof-isometry">Proof</label>
+<div class="hidden">
+<label for="proof-isometry">Proof.</label>
+We write out the left-hand side of the Itō isometry,
+where eventually $h \to 0$:
+
+$$\begin{aligned}
+  \mathbf{E} \bigg[ \bigg( \sum_{t = a}^{t = b} G_t (B_{t + h} \!-\! B_t) \bigg)^{\!2} \bigg]
+ &= \sum_{t = a}^{t = b} \sum_{s = a}^{s = b} \mathbf{E} \bigg[ G_t G_s (B_{t + h} \!-\! B_t) (B_{s + h} \!-\! B_s) \bigg]
+\end{aligned}$$
+
+In the particular case $t \ge s \!+\! h$,
+a given term of this summation can be rewritten
+as follows using the *law of total expectation*
+(see [conditional expectation](/know/concept/conditional-expectation/)):
+
+$$\begin{aligned}
+ \mathbf{E} \Big[ G_t G_s (B_{t + h} \!-\! B_t) (B_{s + h} \!-\! B_s) \Big]
+ = \mathbf{E} \bigg[ \mathbf{E} \Big[ G_t G_s (B_{t + h} \!-\! B_t) (B_{s + h} \!-\! B_s) \Big| \mathcal{F}_t \Big] \bigg]
+\end{aligned}$$
+
+Recall that $G_t$ and $B_t$ are adapted to $\mathcal{F}_t$:
+at time $t$, we have information $\mathcal{F}_t$,
+which includes knowledge of the realized values $G_t$ and $B_t$.
+Since $t \ge s \!+\! h$ by assumption, we can simply factor out the known quantities:
+
+$$\begin{aligned}
+ \mathbf{E} \Big[ G_t G_s (B_{t + h} \!-\! B_t) (B_{s + h} \!-\! B_s) \Big]
+ = \mathbf{E} \bigg[ G_t G_s (B_{s + h} \!-\! B_s) \: \mathbf{E} \Big[ (B_{t + h} \!-\! B_t) \Big| \mathcal{F}_t \Big] \bigg]
+\end{aligned}$$
+
+However, $\mathcal{F}_t$ says nothing about
+the increment $(B_{t + h} \!-\! B_t) \sim \mathcal{N}(0, h)$,
+meaning that the conditional expectation is zero:
+
+$$\begin{aligned}
+ \mathbf{E} \Big[ G_t G_s (B_{t + h} \!-\! B_t) (B_{s + h} \!-\! B_s) \Big]
+ = 0
+ \qquad \mathrm{for}\; t \ge s + h
+\end{aligned}$$
+
+By swapping $s$ and $t$, the exact same result can be obtained for $s \ge t \!+\! h$:
+
+$$\begin{aligned}
+ \mathbf{E} \Big[ G_t G_s (B_{t + h} \!-\! B_t) (B_{s + h} \!-\! B_s) \Big]
+ = 0
+ \qquad \mathrm{for}\; s \ge t + h
+\end{aligned}$$
+
+This leaves only one case which can be nonzero: $[t, t\!+\!h] = [s, s\!+\!h]$.
+Applying the law of total expectation again yields:
+
+$$\begin{aligned}
+  \mathbf{E} \bigg[ \bigg( \sum_{t = a}^{t = b} G_t (B_{t + h} \!-\! B_t) \bigg)^{\!2} \bigg]
+ &= \sum_{t = a}^{t = b} \mathbf{E} \Big[ G_t^2 (B_{t + h} \!-\! B_t)^2 \Big]
+ \\
+ &= \sum_{t = a}^{t = b} \mathbf{E} \bigg[ \mathbf{E} \Big[ G_t^2 (B_{t + h} \!-\! B_t)^2 \Big| \mathcal{F}_t \Big] \bigg]
+\end{aligned}$$
+
+We know $G_t$, and since the increment is normally distributed,
+the expectation value of $(B_{t+h} \!-\! B_t)^2$ is simply its variance $h$:
+
+$$\begin{aligned}
+  \mathbf{E} \bigg[ \bigg( \sum_{t = a}^{t = b} G_t (B_{t + h} \!-\! B_t) \bigg)^{\!2} \bigg]
+ &= \sum_{t = a}^{t = b} \mathbf{E} \big[ G_t^2 \big] h
+ \longrightarrow
+ \int_a^b \mathbf{E} \big[ G_t^2 \big] \dd{t}
+\end{aligned}$$
+</div>
+</div>
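A Monte Carlo sketch of the isometry (Python with NumPy; the choice $G_t = B_t$ on $[0, 1]$ is hypothetical): the prediction is $\mathbf{E} \big[ \big( \int_0^1 B_t \dd{B_t} \big)^2 \big] = \int_0^1 \mathbf{E}[B_t^2] \dd{t} = \int_0^1 t \dd{t} = 1/2$, and the sample mean of the Itō sums should be close to zero:

```python
import numpy as np

rng = np.random.default_rng(6)

# Many independent left-endpoint Itō sums of G_t = B_t over [0, 1].
n_paths, n_steps = 50_000, 400
h = 1.0 / n_steps
b = np.zeros(n_paths)
ito = np.zeros(n_paths)
for _ in range(n_steps):
    db = rng.normal(0.0, np.sqrt(h), size=n_paths)
    ito += b * db   # G_t evaluated at the left endpoint
    b += db

assert abs(np.mean(ito**2) - 0.5) < 0.04   # Itō isometry: should be 1/2
assert abs(ito.mean()) < 0.02              # zero-mean sanity check
```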
+
+Furthermore, Itō integrals are [martingales](/know/concept/martingale/),
+meaning that the average noise contribution is zero,
+which makes intuitive sense,
+since true white noise cannot be biased.
+
+<div class="accordion">
+<input type="checkbox" id="proof-martingale"/>
+<label for="proof-martingale">Proof</label>
+<div class="hidden">
+<label for="proof-martingale">Proof.</label>
+We will prove that an arbitrary Itō integral $I_t$ is a martingale.
+Using additivity, we know that the increment $I_t \!-\! I_s$
+is as follows, given information $\mathcal{F}_s$:
+
+$$\begin{aligned}
+ \mathbf{E} \big[ I_t \!-\! I_s | \mathcal{F}_s \big]
+ = \mathbf{E} \bigg[ \int_s^t G_u \dd{B_u} \bigg| \mathcal{F}_s \bigg]
+ = \lim_{h \to 0} \sum_{u = s}^{u = t} \mathbf{E} \Big[ G_u (B_{u + h} \!-\! B_u) \Big| \mathcal{F}_s \Big]
+\end{aligned}$$
+
+We rewrite this [conditional expectation](/know/concept/conditional-expectation/)
+using the *tower property* for some $\mathcal{F}_u \supset \mathcal{F}_s$,
+such that $G_u$ and $B_u$ are known, but $B_{u+h} \!-\! B_u$ is not:
+
+$$\begin{aligned}
+ \mathbf{E} \big[ I_t \!-\! I_s | \mathcal{F}_s \big]
+ &= \lim_{h \to 0} \sum_{u = s}^{u = t}
+ \mathbf{E} \bigg[ \mathbf{E} \Big[ G_u (B_{u + h} \!-\! B_u) \Big| \mathcal{F}_u \Big] \bigg| \mathcal{F}_s \bigg]
+ = 0
+\end{aligned}$$
+
+We now have everything we need to calculate $\mathbf{E} [ I_t | \mathcal{F}_s ]$,
+giving the martingale property:
+
+$$\begin{aligned}
+ \mathbf{E} \big[ I_t | \mathcal{F}_s \big]
+ = \mathbf{E} \big[ I_s | \mathcal{F}_s \big] + \mathbf{E} \big[ I_t \!-\! I_s | \mathcal{F}_s \big]
+ = I_s + \mathbf{E} \big[ I_t \!-\! I_s | \mathcal{F}_s \big]
+ = I_s
+\end{aligned}$$
+
+For the existence of $I_t$,
+we need $\mathbf{E}[G_t^2]$ to be integrable over the target interval,
+so from the Itō isometry we have $\mathbf{E}[I_t^2] < \infty$,
+and therefore $\mathbf{E}[|I_t|] < \infty$,
+so $I_t$ has all the properties of a martingale,
+since it is trivially $\mathcal{F}_t$-adapted.
+</div>
+</div>
+
+
+
+## References
+1. U.H. Thygesen,
+ *Lecture notes on diffusions and stochastic differential equations*,
+ 2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/martingale/index.pdc b/content/know/concept/martingale/index.pdc
index ffc286b..07ed1a4 100644
--- a/content/know/concept/martingale/index.pdc
+++ b/content/know/concept/martingale/index.pdc
@@ -56,6 +56,6 @@ since they will tend to increase and decrease with time, respectively.
## References
-1. U.F. Thygesen,
+1. U.H. Thygesen,
*Lecture notes on diffusions and stochastic differential equations*,
2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/random-variable/index.pdc b/content/know/concept/random-variable/index.pdc
index fe50b60..2a8643e 100644
--- a/content/know/concept/random-variable/index.pdc
+++ b/content/know/concept/random-variable/index.pdc
@@ -119,27 +119,27 @@ $$\begin{aligned}
## Expectation value
-The **expectation value** $\mathbf{E}(X)$ of a random variable $X$
+The **expectation value** $\mathbf{E}[X]$ of a random variable $X$
can be defined in the familiar way, as the sum/integral
of every possible value of $X$ multiplied by the corresponding probability (density).
For continuous and discrete sample spaces $\Omega$, respectively:
$$\begin{aligned}
- \mathbf{E}(X)
+ \mathbf{E}[X]
= \int_{-\infty}^\infty x \: f_X(x) \dd{x}
\qquad \mathrm{or} \qquad
- \mathbf{E}(X)
+ \mathbf{E}[X]
= \sum_{i = 1}^N x_i \: P(X \!=\! x_i)
\end{aligned}$$
However, $f_X(x)$ is not guaranteed to exist,
and the distinction between continuous and discrete is cumbersome.
-A more general definition of $\mathbf{E}(X)$
+A more general definition of $\mathbf{E}[X]$
is the following Lebesgue-Stieltjes integral,
since $F_X(x)$ always exists:
$$\begin{aligned}
- \mathbf{E}(X)
+ \mathbf{E}[X]
= \int_{-\infty}^\infty x \dd{F_X(x)}
\end{aligned}$$
@@ -147,25 +147,25 @@ This is valid for any sample space $\Omega$.
Or, equivalently, a Lebesgue integral can be used:
$$\begin{aligned}
- \mathbf{E}(X)
+ \mathbf{E}[X]
= \int_\Omega X(\omega) \dd{P(\omega)}
\end{aligned}$$
An expectation value defined in this way has many useful properties,
most notably linearity.
-We can also define the familiar **variance** $\mathbf{V}(X)$
+We can also define the familiar **variance** $\mathbf{V}[X]$
of a random variable $X$ as follows:
$$\begin{aligned}
- \mathbf{V}(X)
- = \mathbf{E}\big( (X - \mathbf{E}(X))^2 \big)
- = \mathbf{E}(X^2) - \big(\mathbf{E}(X)\big)^2
+ \mathbf{V}[X]
+ = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big]
+ = \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2
\end{aligned}$$
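The equality of these two expressions for $\mathbf{V}[X]$ is easy to check on samples (a sketch in Python with NumPy; the exponential distribution and its parameters are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical example: exponential random variable with mean 2, variance 4.
x = rng.exponential(scale=2.0, size=1_000_000)

lhs = np.mean((x - x.mean())**2)        # E[(X - E[X])^2]
rhs = np.mean(x**2) - x.mean()**2       # E[X^2] - (E[X])^2

assert abs(lhs - rhs) < 1e-8    # the two forms agree up to rounding
assert abs(lhs - 4.0) < 0.1     # close to the true variance
```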
## References
-1. U.F. Thygesen,
+1. U.H. Thygesen,
*Lecture notes on diffusions and stochastic differential equations*,
2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/sigma-algebra/index.pdc b/content/know/concept/sigma-algebra/index.pdc
index 1a459ea..96240ff 100644
--- a/content/know/concept/sigma-algebra/index.pdc
+++ b/content/know/concept/sigma-algebra/index.pdc
@@ -115,6 +115,6 @@ Clearly, $X_t$ is always adapted to its own filtration.
## References
-1. U.F. Thygesen,
+1. U.H. Thygesen,
*Lecture notes on diffusions and stochastic differential equations*,
2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/wiener-process/index.pdc b/content/know/concept/wiener-process/index.pdc
index 49aebfb..3602b44 100644
--- a/content/know/concept/wiener-process/index.pdc
+++ b/content/know/concept/wiener-process/index.pdc
@@ -85,6 +85,6 @@ $$\begin{aligned}
## References
-1. U.F. Thygesen,
+1. U.H. Thygesen,
*Lecture notes on diffusions and stochastic differential equations*,
2021, Polyteknisk Kompendie.