author | Prefetch | 2022-10-14 23:25:28 +0200 |
---|---|---|
committer | Prefetch | 2022-10-14 23:25:28 +0200 |
commit | 6ce0bb9a8f9fd7d169cbb414a9537d68c5290aae (patch) | |
tree | a0abb6b22f77c0e84ed38277d14662412ce14f39 /source/know/concept/conditional-expectation |
Initial commit after migration from Hugo
Diffstat (limited to 'source/know/concept/conditional-expectation')
-rw-r--r-- | source/know/concept/conditional-expectation/index.md | 172 |
1 files changed, 172 insertions, 0 deletions
diff --git a/source/know/concept/conditional-expectation/index.md b/source/know/concept/conditional-expectation/index.md
new file mode 100644
index 0000000..c545cef
--- /dev/null
+++ b/source/know/concept/conditional-expectation/index.md
@@ -0,0 +1,172 @@
+---
+title: "Conditional expectation"
+date: 2021-10-23
+categories:
+- Mathematics
+- Statistics
+- Measure theory
+- Stochastic analysis
+layout: "concept"
+---
+
+Recall that the expectation value $\mathbf{E}[X]$
+of a [random variable](/know/concept/random-variable/) $X$
+is a function of the probability space $(\Omega, \mathcal{F}, P)$
+on which $X$ is defined, and the definition of $X$ itself.
+
+The **conditional expectation** $\mathbf{E}[X|A]$
+is the expectation value of $X$ given that an event $A$ has occurred,
+i.e. only the outcomes $\omega \in \Omega$
+satisfying $\omega \in A$ should be considered.
+If $A$ is obtained by observing a variable,
+then $\mathbf{E}[X|A]$ is a random variable in its own right.
+
+Consider two random variables $X$ and $Y$
+on the same probability space $(\Omega, \mathcal{F}, P)$,
+and suppose that $\Omega$ is discrete.
+If $Y = y$ has been observed,
+then the conditional expectation of $X$
+given the event $Y = y$ is as follows:
+
+$$\begin{aligned}
+    \mathbf{E}[X | Y \!=\! y]
+    = \sum_{x} x \: Q(X \!=\! x)
+    \qquad \quad
+    Q(X \!=\! x)
+    = \frac{P(X \!=\! x \cap Y \!=\! y)}{P(Y \!=\! y)}
+\end{aligned}$$
+
+Where $Q$ is a renormalized probability function,
+which assigns zero to all events incompatible with $Y = y$.
+If we allow $\Omega$ to be continuous,
+then from the definition of $\mathbf{E}[X]$,
+we know that the following Lebesgue integral can be used,
+which we call $f(y)$:
+
+$$\begin{aligned}
+    \mathbf{E}[X | Y \!=\! y]
+    = f(y)
+    = \int_\Omega X(\omega) \dd{Q(\omega)}
+\end{aligned}$$
+
+However, this is only valid if $P(Y \!=\! y) > 0$,
+which is a problem for continuous sample spaces $\Omega$.
+Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
+
+$$\begin{aligned}
+    f(y)
+    = \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)}
+    = \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
+\end{aligned}$$
+
+Where $I$ is the indicator function,
+equal to $1$ if its argument is true, and $0$ if not.
+Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:
+
+$$\begin{aligned}
+    \mathbf{E}[X \cdot I(Y \!=\! y)]
+    &= f(y) \cdot P(Y \!=\! y)
+    \\
+    &= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
+\end{aligned}$$
+
+Recall that because $Y$ is a random variable,
+$\mathbf{E}[X|Y] = f(Y)$ is too.
+In other words, $f$ maps $Y$ to another random variable,
+which, thanks to the *Doob-Dynkin lemma*
+(see [random variable](/know/concept/random-variable/)),
+means that $\mathbf{E}[X|Y]$ is measurable with respect to $\sigma(Y)$.
+Intuitively, this makes sense:
+$\mathbf{E}[X|Y]$ cannot contain more information about events
+than the $Y$ it was calculated from.
+
+This suggests a straightforward generalization of the above:
+instead of a specific value $Y = y$,
+we can condition on *any* information from $Y$.
+If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
+then the conditional expectation $\mathbf{E}[X|\mathcal{H}] = Z$
+is $\mathcal{H}$-measurable, and given by a $Z$ satisfying:
+
+$$\begin{aligned}
+    \boxed{
+        \mathbf{E}\big[X \cdot I(H)\big]
+        = \mathbf{E}\big[Z \cdot I(H)\big]
+    }
+\end{aligned}$$
+
+For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique:
+*almost* because it could take any value
+for an event $A$ with zero probability $P(A) = 0$.
+Fortunately, if there exists a continuous $f$
+such that $\mathbf{E}[X | \sigma(Y)] = f(Y)$,
+then $Z = \mathbf{E}[X | \sigma(Y)]$ is unique.
+
+
+## Properties
+
+A conditional expectation defined in this way has many useful properties,
+most notably linearity:
+$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$
+for any $a, b \in \mathbb{R}$.
+
+The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
+then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$.
+Intuitively, this works as follows:
+suppose person $G$ knows more about $X$ than person $H$,
+then $\mathbf{E}[X | \mathcal{H}]$ is $H$'s expectation,
+$\mathbf{E}[X | \mathcal{G}]$ is $G$'s "better" expectation,
+and then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$
+is $H$'s prediction about what $G$'s expectation will be.
+However, $H$ does not have access to $G$'s extra information,
+so $H$'s best prediction is simply $\mathbf{E}[X | \mathcal{H}]$.
+
+The **law of total expectation** says that
+$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$,
+and follows from the above tower property
+by choosing $\mathcal{H}$ to contain no information:
+$\mathcal{H} = \{ \varnothing, \Omega \}$.
+
+Another useful property is that $\mathbf{E}[X | \mathcal{H}] = X$
+if $X$ is $\mathcal{H}$-measurable.
+In other words, if $\mathcal{H}$ already contains
+all the information extractable from $X$,
+then we know $X$'s exact value.
+Conveniently, this can easily be generalized to products:
+$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$
+if $X$ is $\mathcal{H}$-measurable:
+since $X$'s value is known, it can simply be factored out.
+
+Armed with this definition of conditional expectation,
+we can define other conditional quantities,
+such as the **conditional variance** $\mathbf{V}[X | \mathcal{H}]$:
+
+$$\begin{aligned}
+    \mathbf{V}[X | \mathcal{H}]
+    = \mathbf{E}[X^2 | \mathcal{H}] - \big[\mathbf{E}[X | \mathcal{H}]\big]^2
+\end{aligned}$$
+
+The **law of total variance** then states that
+$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$.
+
+Likewise, we can define the **conditional probability** $P$,
+**conditional distribution function** $F_{X|\mathcal{H}}$,
+and **conditional density function** $f_{X|\mathcal{H}}$
+like their non-conditional counterparts:
+
+$$\begin{aligned}
+    P(A | \mathcal{H})
+    = \mathbf{E}[I(A) | \mathcal{H}]
+    \qquad
+    F_{X|\mathcal{H}}(x)
+    = P(X \le x | \mathcal{H})
+    \qquad
+    f_{X|\mathcal{H}}(x)
+    = \dv{F_{X|\mathcal{H}}}{x}
+\end{aligned}$$
+
+
+
+## References
+1. U.H. Thygesen,
+    *Lecture notes on diffusions and stochastic differential equations*,
+    2021, Polyteknisk Kompendie.