Diffstat (limited to 'source/know/concept/conditional-expectation')
-rw-r--r-- source/know/concept/conditional-expectation/index.md | 172
1 files changed, 172 insertions, 0 deletions
diff --git a/source/know/concept/conditional-expectation/index.md b/source/know/concept/conditional-expectation/index.md
new file mode 100644
index 0000000..c545cef
--- /dev/null
+++ b/source/know/concept/conditional-expectation/index.md
@@ -0,0 +1,172 @@
+---
+title: "Conditional expectation"
+date: 2021-10-23
+categories:
+- Mathematics
+- Statistics
+- Measure theory
+- Stochastic analysis
+layout: "concept"
+---
+
+Recall that the expectation value $\mathbf{E}[X]$
+of a [random variable](/know/concept/random-variable/) $X$
+is determined by the probability space $(\Omega, \mathcal{F}, P)$
+on which $X$ is defined, and by the definition of $X$ itself.
+
+The **conditional expectation** $\mathbf{E}[X|A]$
+is the expectation value of $X$ given that an event $A$ has occurred,
+i.e. only the outcomes $\omega \in \Omega$
+satisfying $\omega \in A$ should be considered.
+If $A$ is obtained by observing a variable,
+then $\mathbf{E}[X|A]$ depends on the observed value,
+and is hence a random variable in its own right.
+
+Consider two random variables $X$ and $Y$
+on the same probability space $(\Omega, \mathcal{F}, P)$,
+and suppose that $\Omega$ is discrete.
+If $Y = y$ has been observed,
+then the conditional expectation of $X$
+given the event $Y = y$ is as follows:
+
+$$\begin{aligned}
+ \mathbf{E}[X | Y \!=\! y]
+ = \sum_{x} x \: Q(X \!=\! x)
+ \qquad \quad
+ Q(X \!=\! x)
+ = \frac{P(X \!=\! x \cap Y \!=\! y)}{P(Y \!=\! y)}
+\end{aligned}$$
+
+Where $Q$ is a renormalized probability function,
+which assigns zero to all events incompatible with $Y = y$.
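+
+As a small worked example (an illustration, not part of the derivation),
+let $X$ be the outcome of a fair six-sided die,
+and let $Y = 1$ if $X$ is even and $Y = 0$ otherwise.
+Then $P(Y \!=\! 1) = 1/2$, so $Q(X \!=\! x) = (1/6) / (1/2) = 1/3$
+for $x \in \{2, 4, 6\}$ and $Q(X \!=\! x) = 0$ otherwise.
+The same reasoning applies to $Y = 0$, giving:
+
+$$\begin{aligned}
+    \mathbf{E}[X | Y \!=\! 1]
+    = \frac{2 + 4 + 6}{3}
+    = 4
+    \qquad \quad
+    \mathbf{E}[X | Y \!=\! 0]
+    = \frac{1 + 3 + 5}{3}
+    = 3
+\end{aligned}$$
+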
+If we allow $\Omega$ to be continuous,
+then from the definition of $\mathbf{E}[X]$,
+we know that the following Lebesgue integral can be used,
+which we call $f(y)$:
+
+$$\begin{aligned}
+ \mathbf{E}[X | Y \!=\! y]
+ = f(y)
+ = \int_\Omega X(\omega) \dd{Q(\omega)}
+\end{aligned}$$
+
+However, this is only valid if $P(Y \!=\! y) > 0$,
+which is a problem for continuous sample spaces $\Omega$,
+where $P(Y \!=\! y) = 0$ is typical, e.g. whenever $Y$ has a probability density.
+Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
+
+$$\begin{aligned}
+ f(y)
+ = \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)}
+ = \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
+\end{aligned}$$
+
+Where $I$ is the indicator function,
+equal to $1$ if its argument is true, and $0$ if not.
+Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:
+
+$$\begin{aligned}
+ \mathbf{E}[X \cdot I(Y \!=\! y)]
+ &= f(y) \cdot P(Y \!=\! y)
+ \\
+ &= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
+\end{aligned}$$
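+
+The last step may deserve spelling out.
+On the event $Y = y$ we have $f(Y) = f(y)$, and outside it the indicator vanishes,
+so $f(Y) \cdot I(Y \!=\! y) = f(y) \cdot I(Y \!=\! y)$,
+where $f(y)$ is a constant that can be taken out of the expectation:
+
+$$\begin{aligned}
+    \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
+    = f(y) \: \mathbf{E}[I(Y \!=\! y)]
+    = f(y) \cdot P(Y \!=\! y)
+\end{aligned}$$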
+
+Recall that because $Y$ is a random variable,
+$\mathbf{E}[X|Y] = f(Y)$ is too.
+In other words, $f$ maps $Y$ to another random variable,
+which, thanks to the *Doob-Dynkin lemma*
+(see [random variable](/know/concept/random-variable/)),
+means that $\mathbf{E}[X|Y]$ is measurable with respect to $\sigma(Y)$.
+Intuitively, this makes sense:
+$\mathbf{E}[X|Y]$ cannot contain more information about events
+than the $Y$ it was calculated from.
+
+This suggests a straightforward generalization of the above:
+instead of a specific value $Y = y$,
+we can condition on *any* information from $Y$.
+If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
+then the conditional expectation $\mathbf{E}[X|\mathcal{H}]$
+is defined as an $\mathcal{H}$-measurable random variable $Z$ satisfying:
+
+$$\begin{aligned}
+ \boxed{
+ \mathbf{E}\big[X \cdot I(H)\big]
+ = \mathbf{E}\big[Z \cdot I(H)\big]
+ }
+\end{aligned}$$
+
+For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique:
+*almost* because it could take any value
+for an event $A$ with zero probability $P(A) = 0$.
+Fortunately, if there exists a continuous $f$
+such that $\mathbf{E}[X | \sigma(Y)] = f(Y)$,
+then $Z = \mathbf{E}[X | \sigma(Y)]$ is unique.
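+
+To illustrate the boxed condition with a hypothetical example,
+let $X$ be the outcome of a fair six-sided die, so $\Omega = \{1, ..., 6\}$,
+and let $Y = 1$ if $X$ is even and $Y = 0$ otherwise,
+such that $\sigma(Y) = \{ \varnothing, \{2, 4, 6\}, \{1, 3, 5\}, \Omega \}$.
+The candidate $Z = 3 + Y$, i.e. $4$ on even outcomes and $3$ on odd ones,
+is $\sigma(Y)$-measurable, and for $H = \{2, 4, 6\}$ we find:
+
+$$\begin{aligned}
+    \mathbf{E}\big[X \cdot I(H)\big]
+    = \frac{2 + 4 + 6}{6}
+    = 2
+    \qquad \quad
+    \mathbf{E}\big[Z \cdot I(H)\big]
+    = 4 \cdot \frac{1}{2}
+    = 2
+\end{aligned}$$
+
+The other three sets in $\sigma(Y)$ can be checked in the same way:
+$H = \{1, 3, 5\}$ gives $3/2$ on both sides, $H = \Omega$ gives $7/2$,
+and $H = \varnothing$ gives $0$.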
+
+
+## Properties
+
+A conditional expectation defined in this way has many useful properties,
+most notably linearity:
+$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$
+for any $a, b \in \mathbb{R}$.
+
+The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
+then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$.
+Intuitively, this works as follows:
+suppose person $G$ knows more about $X$ than person $H$,
+then $\mathbf{E}[X | \mathcal{H}]$ is $H$'s expectation,
+$\mathbf{E}[X | \mathcal{G}]$ is $G$'s "better" expectation,
+and then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$
+is $H$'s prediction about what $G$'s expectation will be.
+However, $H$ does not have access to $G$'s extra information,
+so $H$'s best prediction is simply $\mathbf{E}[X | \mathcal{H}]$.
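+
+A sketch of why this holds, using only the defining property above:
+any $H \in \mathcal{H}$ also satisfies $H \in \mathcal{G}$,
+so the definition can be applied for both $\mathcal{G}$ and $\mathcal{H}$:
+
+$$\begin{aligned}
+    \mathbf{E}\big[ \mathbf{E}[X|\mathcal{G}] \cdot I(H) \big]
+    = \mathbf{E}\big[ X \cdot I(H) \big]
+    = \mathbf{E}\big[ \mathbf{E}[X|\mathcal{H}] \cdot I(H) \big]
+\end{aligned}$$
+
+Since $\mathbf{E}[X|\mathcal{H}]$ is $\mathcal{H}$-measurable,
+it therefore satisfies the definition of the conditional expectation
+of $\mathbf{E}[X|\mathcal{G}]$ given $\mathcal{H}$, up to almost sure uniqueness.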
+
+The **law of total expectation** says that
+$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$,
+and follows from the above tower property
+by choosing $\mathcal{H}$ to contain no information:
+$\mathcal{H} = \{ \varnothing, \Omega \}$.
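+
+To spell out this step: a random variable that is measurable
+with respect to $\{ \varnothing, \Omega \}$ must be constant,
+and setting $H = \Omega$ in the defining property fixes that constant
+to be the ordinary expectation value.
+Applying this to both sides of the tower property yields:
+
+$$\begin{aligned}
+    \mathbf{E}\big[ \mathbf{E}[X|\mathcal{G}] \big]
+    = \mathbf{E}\big[ \mathbf{E}[X|\mathcal{G}] \big| \{ \varnothing, \Omega \} \big]
+    = \mathbf{E}\big[ X \big| \{ \varnothing, \Omega \} \big]
+    = \mathbf{E}[X]
+\end{aligned}$$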
+
+Another useful property is that $\mathbf{E}[X | \mathcal{H}] = X$
+if $X$ is $\mathcal{H}$-measurable.
+In other words, if $\mathcal{H}$ already contains
+all the information extractable from $X$,
+then we know $X$'s exact value.
+Conveniently, this can easily be generalized to products:
+$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$
+if $X$ is $\mathcal{H}$-measurable:
+since $X$'s value is known, it can simply be factored out.
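+
+A partial sketch of the argument, for the special case $X = I(A)$
+with $A \in \mathcal{H}$ and assuming all expectations exist
+(the general case would then follow by linearity and a limiting argument,
+which is omitted here): for any $H \in \mathcal{H}$
+we have $A \cap H \in \mathcal{H}$ too, so:
+
+$$\begin{aligned}
+    \mathbf{E}\big[ X Y \cdot I(H) \big]
+    = \mathbf{E}\big[ Y \cdot I(A \cap H) \big]
+    = \mathbf{E}\big[ \mathbf{E}[Y|\mathcal{H}] \cdot I(A \cap H) \big]
+    = \mathbf{E}\big[ X \mathbf{E}[Y|\mathcal{H}] \cdot I(H) \big]
+\end{aligned}$$
+
+And $X \mathbf{E}[Y|\mathcal{H}]$ is $\mathcal{H}$-measurable,
+so it satisfies the definition of $\mathbf{E}[XY|\mathcal{H}]$.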
+
+Armed with this definition of conditional expectation,
+we can define other conditional quantities,
+such as the **conditional variance** $\mathbf{V}[X | \mathcal{H}]$:
+
+$$\begin{aligned}
+ \mathbf{V}[X | \mathcal{H}]
+ = \mathbf{E}[X^2 | \mathcal{H}] - \big[\mathbf{E}[X | \mathcal{H}]\big]^2
+\end{aligned}$$
+
+The **law of total variance** then states that
+$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$.
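+
+This can be verified using the law of total expectation.
+Writing $Z = \mathbf{E}[X | \mathcal{H}]$ for brevity
+(a shorthand introduced only for this check),
+and noting that $\mathbf{E}[Z] = \mathbf{E}[X]$, the two terms are:
+
+$$\begin{aligned}
+    \mathbf{E}\big[ \mathbf{V}[X | \mathcal{H}] \big]
+    &= \mathbf{E}\big[ \mathbf{E}[X^2 | \mathcal{H}] \big] - \mathbf{E}[Z^2]
+    = \mathbf{E}[X^2] - \mathbf{E}[Z^2]
+    \\
+    \mathbf{V}[Z]
+    &= \mathbf{E}[Z^2] - \big( \mathbf{E}[Z] \big)^2
+    = \mathbf{E}[Z^2] - \big( \mathbf{E}[X] \big)^2
+\end{aligned}$$
+
+Adding these, the $\mathbf{E}[Z^2]$ terms cancel, leaving
+$\mathbf{E}[X^2] - (\mathbf{E}[X])^2 = \mathbf{V}[X]$.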
+
+Likewise, we can define the **conditional probability** $P$,
+**conditional distribution function** $F_{X|\mathcal{H}}$,
+and **conditional density function** $f_{X|\mathcal{H}}$
+like their non-conditional counterparts:
+
+$$\begin{aligned}
+ P(A | \mathcal{H})
+ = \mathbf{E}[I(A) | \mathcal{H}]
+ \qquad
+ F_{X|\mathcal{H}}(x)
+ = P(X \le x | \mathcal{H})
+ \qquad
+ f_{X|\mathcal{H}}(x)
+ = \dv{F_{X|\mathcal{H}}}{x}
+\end{aligned}$$
+
+
+
+## References
+1. U.H. Thygesen,
+ *Lecture notes on diffusions and stochastic differential equations*,
+ 2021, Polyteknisk Kompendie.