From f091bf0922c26238d16bf175a8ea916a16d11fba Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Sat, 6 Nov 2021 21:47:08 +0100
Subject: Expand knowledge base

---
 .../know/concept/conditional-expectation/index.pdc | 69 +++++++++++-----------
 1 file changed, 36 insertions(+), 33 deletions(-)

(limited to 'content/know/concept/conditional-expectation')

diff --git a/content/know/concept/conditional-expectation/index.pdc b/content/know/concept/conditional-expectation/index.pdc
index 7da7660..5bcc152 100644
--- a/content/know/concept/conditional-expectation/index.pdc
+++ b/content/know/concept/conditional-expectation/index.pdc
@@ -13,17 +13,17 @@ markup: pandoc

 # Conditional expectation

-Recall that the expectation value $\mathbf{E}(X)$
+Recall that the expectation value $\mathbf{E}[X]$
 of a [random variable](/know/concept/random-variable/) $X$
 is a function of the probability space $(\Omega, \mathcal{F}, P)$
 on which $X$ is defined, and the definition of $X$ itself.

-The **conditional expectation** $\mathbf{E}(X|A)$
+The **conditional expectation** $\mathbf{E}[X|A]$
 is the expectation value of $X$ given that an event $A$ has occurred,
 i.e. only the outcomes $\omega \in \Omega$
 satisfying $\omega \in A$ should be considered.
-If $A$ is obtained by observing another variable,
-then $\mathbf{E}(X|A)$ is a random variable in its own right.
+If $A$ is obtained by observing a variable,
+then $\mathbf{E}[X|A]$ is a random variable in its own right.

 Consider two random variables $X$ and $Y$
 on the same probability space $(\Omega, \mathcal{F}, P)$,
@@ -33,7 +33,7 @@ then the conditional expectation of $X$ given the event $Y = y$
 is as follows:

 $$\begin{aligned}
-    \mathbf{E}(X | Y \!=\! y)
+    \mathbf{E}[X | Y \!=\! y]
     = \sum_{x} x \: Q(X \!=\! x)
     \qquad \quad
     Q(X \!=\! x)
@@ -43,12 +43,12 @@ $$\begin{aligned}
 Where $Q$ is a renormalized probability function,
 which assigns zero to all events incompatible with $Y = y$.
 If we allow $\Omega$ to be continuous,
-then from the definition $\mathbf{E}(X)$,
+then from the definition $\mathbf{E}[X]$,
 we know that the following Lebesgue integral can be used,
 which we call $f(y)$:

 $$\begin{aligned}
-    \mathbf{E}(X | Y \!=\! y)
+    \mathbf{E}[X | Y \!=\! y]
     = f(y)
     = \int_\Omega X(\omega) \dd{Q(\omega)}
 \end{aligned}$$
@@ -60,7 +60,7 @@ Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
 $$\begin{aligned}
     f(y)
     = \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)}
-    = \frac{\mathbf{E}(X \cdot I(Y \!=\! y))}{P(Y \!=\! y)}
+    = \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
 \end{aligned}$$

 Where $I$ is the indicator function,
@@ -68,33 +68,33 @@ equal to $1$ if its argument is true, and $0$ if not.
 Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:

 $$\begin{aligned}
-    \mathbf{E}(X \cdot I(Y \!=\! y))
+    \mathbf{E}[X \cdot I(Y \!=\! y)]
     &= f(y) \cdot P(Y \!=\! y)
     \\
-    &= \mathbf{E}(f(Y) \cdot I(Y \!=\! y))
+    &= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
 \end{aligned}$$

 Recall that because $Y$ is a random variable,
-$\mathbf{E}(X|Y) = f(Y)$ is too.
+$\mathbf{E}[X|Y] = f(Y)$ is too.
 In other words, $f$ maps $Y$ to another random variable,
 which, due to the *Doob-Dynkin lemma*
 (see [$\sigma$-algebra](/know/concept/sigma-algebra/)),
-must mean that $\mathbf{E}(X|Y)$ is measurable with respect to $\sigma(Y)$.
+must mean that $\mathbf{E}[X|Y]$ is measurable with respect to $\sigma(Y)$.
 Intuitively, this makes some sense:
-$\mathbf{E}(X|Y)$ cannot contain more information about events
+$\mathbf{E}[X|Y]$ cannot contain more information about events
 than the $Y$ it was calculated from.

 This suggests a straightforward generalization of the above:
 instead of a specific value $Y = y$,
 we can condition on *any* information from $Y$.
 If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
-then the conditional expectation $\mathbf{E}(X|\mathcal{H}) = Z$
+then the conditional expectation $\mathbf{E}[X|\mathcal{H}] = Z$
 is $\mathcal{H}$-measurable, and given by a $Z$ satisfying:

 $$\begin{aligned}
     \boxed{
-        \mathbf{E}\big(X \cdot I(H)\big)
-        = \mathbf{E}\big(Z \cdot I(H)\big)
+        \mathbf{E}\big[X \cdot I(H)\big]
+        = \mathbf{E}\big[Z \cdot I(H)\big]
     }
 \end{aligned}$$

@@ -102,52 +102,55 @@ For any $H \in \mathcal{H}$.
 Note that $Z$ is almost surely unique:
 *almost* because it could take any value for an event $A$ with zero probability $P(A) = 0$.
 Fortunately, if there exists a continuous $f$
-such that $\mathbf{E}(X | \sigma(Y)) = f(Y)$,
-then $Z = \mathbf{E}(X | \sigma(Y))$ is unique.
+such that $\mathbf{E}[X | \sigma(Y)] = f(Y)$,
+then $Z = \mathbf{E}[X | \sigma(Y)]$ is unique.
+
+
+## Properties

 A conditional expectation defined in this way has many useful properties,
 most notably linearity:
-$\mathbf{E}(aX \!+\! bY | \mathcal{H}) = a \mathbf{E}(X|\mathcal{H}) + b \mathbf{E}(Y|\mathcal{H})$
+$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$
 for any $a, b \in \mathbb{R}$.

 The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
-then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H}) = \mathbf{E}(X|\mathcal{H})$.
+then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$.
 Intuitively, this works as follows:
 suppose person $G$ knows more about $X$ than person $H$,
-then $\mathbf{E}(X | \mathcal{H})$ is $H$'s expectation,
-$\mathbf{E}(X | \mathcal{G})$ is $G$'s "better" expectation,
-and then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H})$
+then $\mathbf{E}[X | \mathcal{H}]$ is $H$'s expectation,
+$\mathbf{E}[X | \mathcal{G}]$ is $G$'s "better" expectation,
+and then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$
 is $H$'s prediction about what $G$'s expectation will be.
 However, $H$ does not have access to $G$'s extra information,
-so $H$'s best prediction is simply $\mathbf{E}(X | \mathcal{H})$.
+so $H$'s best prediction is simply $\mathbf{E}[X | \mathcal{H}]$.

 The **law of total expectation** says that
-$\mathbf{E}(\mathbf{E}(X | \mathcal{G})) = \mathbf{E}(X)$,
+$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$,
 and follows from the above tower property
 by choosing $\mathcal{H}$ to contain no information:
 $\mathcal{H} = \{ \varnothing, \Omega \}$.

-Another useful property is that $\mathbf{E}(X | \mathcal{H}) = X$
+Another useful property is that $\mathbf{E}[X | \mathcal{H}] = X$
 if $X$ is $\mathcal{H}$-measurable.
 In other words, if $\mathcal{H}$ already contains
 all the information extractable from $X$,
 then we know $X$'s exact value.
 Conveniently, this can easily be generalized to products:
-$\mathbf{E}(XY | \mathcal{H}) = X \mathbf{E}(Y | \mathcal{H})$
+$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$
 if $X$ is $\mathcal{H}$-measurable:
 since $X$'s value is known, it can simply be factored out.

 Armed with this definition of conditional expectation,
 we can define other conditional quantities,
-such as the **conditional variance** $\mathbf{V}(X | \mathcal{H})$:
+such as the **conditional variance** $\mathbf{V}[X | \mathcal{H}]$:

 $$\begin{aligned}
-    \mathbf{V}(X | \mathcal{H})
-    = \mathbf{E}(X^2 | \mathcal{H}) - \big(\mathbf{E}(X | \mathcal{H})\big)^2
+    \mathbf{V}[X | \mathcal{H}]
+    = \mathbf{E}[X^2 | \mathcal{H}] - \big[\mathbf{E}[X | \mathcal{H}]\big]^2
 \end{aligned}$$

 The **law of total variance** then states that
-$\mathbf{V}(X) = \mathbf{E}(\mathbf{V}(X | \mathcal{H})) + \mathbf{V}(\mathbf{E}(X | \mathcal{H}))$.
+$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$.

 Likewise, we can define the **conditional probability** $P$,
 **conditional distribution function** $F_{X|\mathcal{H}}$,
@@ -156,7 +159,7 @@ like their non-conditional counterparts:

 $$\begin{aligned}
     P(A | \mathcal{H})
-    = \mathbf{E}(I(A) | \mathcal{H})
+    = \mathbf{E}[I(A) | \mathcal{H}]
     \qquad
     F_{X|\mathcal{H}}(x)
     = P(X \le x | \mathcal{H})
@@ -168,6 +171,6 @@ $$\begin{aligned}


 ## References
-1. U.F. Thygesen,
+1. U.H. Thygesen,
     *Lecture notes on diffusions and stochastic differential equations*,
     2021, Polyteknisk Kompendie.
-- 
cgit v1.2.3
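
Note on the patched text: the law of total expectation and the law of total variance stated in `index.pdc` can be sanity-checked numerically on a small discrete example. The sketch below is not part of the commit. The joint distribution `joint` and the helper names `expect` and `cond_expect` are hypothetical, chosen only for illustration, and $\mathcal{H} = \sigma(Y)$ is assumed, so that conditioning reduces to $f(y) = \mathbf{E}[X \cdot I(Y \!=\! y)] / P(Y \!=\! y)$ as in the text.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y); the values are illustrative only.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def expect(g):
    """E[g(X, Y)] as a sum over the finite sample space."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def cond_expect(g, y0):
    """E[g(X) | Y = y0] = E[g(X) I(Y = y0)] / P(Y = y0), assuming P(Y = y0) > 0."""
    p_y0 = expect(lambda x, y: int(y == y0))
    return expect(lambda x, y: g(x) * int(y == y0)) / p_y0

ys = sorted({y for (_, y) in joint})
f = {y: cond_expect(lambda x: x, y) for y in ys}        # f(y) = E[X | Y = y]
f2 = {y: cond_expect(lambda x: x * x, y) for y in ys}   # E[X^2 | Y = y]
cond_var = {y: f2[y] - f[y] ** 2 for y in ys}           # V[X | Y = y]

e_x = expect(lambda x, y: x)                            # E[X]
v_x = expect(lambda x, y: x * x) - e_x ** 2             # V[X]
e_f = expect(lambda x, y: f[y])                         # E[E[X | Y]]
v_f = expect(lambda x, y: f[y] ** 2) - e_f ** 2         # V[E[X | Y]]
e_cond_var = expect(lambda x, y: cond_var[y])           # E[V[X | Y]]

assert e_f == e_x                  # law of total expectation
assert v_x == e_cond_var + v_f     # law of total variance
```

Exact rational arithmetic via `fractions.Fraction` is used so that both identities hold with `==` rather than up to floating-point tolerance.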