author    Prefetch    2022-10-20 18:25:31 +0200
committer Prefetch    2022-10-20 18:25:31 +0200
commit    16555851b6514a736c5c9d8e73de7da7fc9b6288 (patch)
tree      76b8bfd30f8941d0d85365990bcdbc5d0643cabc /source/know/concept/conditional-expectation
parent    e5b9bce79b68a68ddd2e51daa16d2fea73b84fdb (diff)
Migrate from 'jekyll-katex' to 'kramdown-math-sskatex'
Diffstat (limited to 'source/know/concept/conditional-expectation')
-rw-r--r--  source/know/concept/conditional-expectation/index.md  146
1 file changed, 73 insertions, 73 deletions
diff --git a/source/know/concept/conditional-expectation/index.md b/source/know/concept/conditional-expectation/index.md
index 7b13a4a..f64fa72 100644
--- a/source/know/concept/conditional-expectation/index.md
+++ b/source/know/concept/conditional-expectation/index.md
@@ -10,24 +10,24 @@ categories:
layout: "concept"
---
-Recall that the expectation value $\mathbf{E}[X]$
-of a [random variable](/know/concept/random-variable/) $X$
-is a function of the probability space $(\Omega, \mathcal{F}, P)$
-on which $X$ is defined, and the definition of $X$ itself.
-
-The **conditional expectation** $\mathbf{E}[X|A]$
-is the expectation value of $X$ given that an event $A$ has occurred,
-i.e. only the outcomes $\omega \in \Omega$
-satisfying $\omega \in A$ should be considered.
-If $A$ is obtained by observing a variable,
-then $\mathbf{E}[X|A]$ is a random variable in its own right.
-
-Consider two random variables $X$ and $Y$
-on the same probability space $(\Omega, \mathcal{F}, P)$,
-and suppose that $\Omega$ is discrete.
-If $Y = y$ has been observed,
-then the conditional expectation of $X$
-given the event $Y = y$ is as follows:
+Recall that the expectation value $$\mathbf{E}[X]$$
+of a [random variable](/know/concept/random-variable/) $$X$$
+is a function of the probability space $$(\Omega, \mathcal{F}, P)$$
+on which $$X$$ is defined, and the definition of $$X$$ itself.
+
+The **conditional expectation** $$\mathbf{E}[X|A]$$
+is the expectation value of $$X$$ given that an event $$A$$ has occurred,
+i.e. only the outcomes $$\omega \in \Omega$$
+satisfying $$\omega \in A$$ should be considered.
+If $$A$$ is obtained by observing a variable,
+then $$\mathbf{E}[X|A]$$ is a random variable in its own right.
+
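For example (a fair die is assumed here purely for illustration):
let $$X$$ be the value of a fair six-sided die,
and let $$A$$ be the event that the roll is at least $$5$$.
Then only $$\omega \in \{5, 6\}$$ contribute,
each with renormalized probability $$1/2$$:

$$\begin{aligned}
\mathbf{E}[X | A]
= \frac{5 + 6}{2}
= \frac{11}{2}
\end{aligned}$$
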
+Consider two random variables $$X$$ and $$Y$$
+on the same probability space $$(\Omega, \mathcal{F}, P)$$,
+and suppose that $$\Omega$$ is discrete.
+If $$Y = y$$ has been observed,
+then the conditional expectation of $$X$$
+given the event $$Y = y$$ is as follows:
$$\begin{aligned}
\mathbf{E}[X | Y \!=\! y]
@@ -37,12 +37,12 @@ $$\begin{aligned}
= \frac{P(X \!=\! x \cap Y \!=\! y)}{P(Y \!=\! y)}
\end{aligned}$$
-Where $Q$ is a renormalized probability function,
-which assigns zero to all events incompatible with $Y = y$.
-If we allow $\Omega$ to be continuous,
-then from the definition $\mathbf{E}[X]$,
+Where $$Q$$ is a renormalized probability function,
+which assigns zero to all events incompatible with $$Y = y$$.
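
For instance, with the die from before,
let $$Y = 1$$ if the roll is even and $$Y = 0$$ if it is odd.
Given $$Y = 1$$, this $$Q$$ assigns $$1/3$$ to each even outcome
and $$0$$ to the odd ones, so:

$$\begin{aligned}
\mathbf{E}[X | Y \!=\! 1]
= \frac{2 + 4 + 6}{3}
= 4
\end{aligned}$$
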
+If we allow $$\Omega$$ to be continuous,
+then from the definition of $$\mathbf{E}[X]$$,
we know that the following Lebesgue integral can be used,
-which we call $f(y)$:
+which we call $$f(y)$$:
$$\begin{aligned}
\mathbf{E}[X | Y \!=\! y]
@@ -50,9 +50,9 @@ $$\begin{aligned}
= \int_\Omega X(\omega) \dd{Q(\omega)}
\end{aligned}$$
-However, this is only valid if $P(Y \!=\! y) > 0$,
-which is a problem for continuous sample spaces $\Omega$.
-Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
+However, this is only valid if $$P(Y \!=\! y) > 0$$,
+which is a problem for continuous sample spaces $$\Omega$$.
+Sticking with the assumption $$P(Y \!=\! y) > 0$$, notice that:
$$\begin{aligned}
f(y)
@@ -60,9 +60,9 @@ $$\begin{aligned}
= \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
\end{aligned}$$
-Where $I$ is the indicator function,
-equal to $1$ if its argument is true, and $0$ if not.
-Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:
+Where $$I$$ is the indicator function,
+equal to $$1$$ if its argument is true, and $$0$$ if not.
+Multiplying the definition of $$f(y)$$ by $$P(Y \!=\! y)$$ then leads us to:
$$\begin{aligned}
\mathbf{E}[X \cdot I(Y \!=\! y)]
@@ -71,22 +71,22 @@ $$\begin{aligned}
&= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
\end{aligned}$$
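
Continuing the die illustration with $$y = 1$$:
both sides indeed equal $$2$$, since:

$$\begin{aligned}
\mathbf{E}[X \cdot I(Y \!=\! 1)]
= \frac{2 + 4 + 6}{6}
= 2
= 4 \cdot \frac{1}{2}
= f(1) \cdot P(Y \!=\! 1)
\end{aligned}$$
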
-Recall that because $Y$ is a random variable,
-$\mathbf{E}[X|Y] = f(Y)$ is too.
-In other words, $f$ maps $Y$ to another random variable,
+Recall that because $$Y$$ is a random variable,
+$$\mathbf{E}[X|Y] = f(Y)$$ is too.
+In other words, $$f$$ maps $$Y$$ to another random variable,
which, thanks to the *Doob-Dynkin lemma*
(see [random variable](/know/concept/random-variable/)),
-means that $\mathbf{E}[X|Y]$ is measurable with respect to $\sigma(Y)$.
+means that $$\mathbf{E}[X|Y]$$ is measurable with respect to $$\sigma(Y)$$.
Intuitively, this makes sense:
-$\mathbf{E}[X|Y]$ cannot contain more information about events
-than the $Y$ it was calculated from.
+$$\mathbf{E}[X|Y]$$ cannot contain more information about events
+than the $$Y$$ it was calculated from.
This suggests a straightforward generalization of the above:
-instead of a specific value $Y = y$,
-we can condition on *any* information from $Y$.
-If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
-then the conditional expectation $\mathbf{E}[X|\mathcal{H}] = Z$
-is $\mathcal{H}$-measurable, and given by a $Z$ satisfying:
+instead of a specific value $$Y = y$$,
+we can condition on *any* information from $$Y$$.
+If $$\mathcal{H} = \sigma(Y)$$ is the information generated by $$Y$$,
+then the conditional expectation $$\mathbf{E}[X|\mathcal{H}] = Z$$
+is $$\mathcal{H}$$-measurable, and given by a $$Z$$ satisfying:
$$\begin{aligned}
\boxed{
@@ -95,51 +95,51 @@ $$\begin{aligned}
}
\end{aligned}$$
-For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique:
+For any $$H \in \mathcal{H}$$. Note that $$Z$$ is almost surely unique:
*almost* because it could take any value
-for an event $A$ with zero probability $P(A) = 0$.
-Fortunately, if there exists a continuous $f$
-such that $\mathbf{E}[X | \sigma(Y)] = f(Y)$,
-then $Z = \mathbf{E}[X | \sigma(Y)]$ is unique.
+for an event $$A$$ with zero probability $$P(A) = 0$$.
+Fortunately, if there exists a continuous $$f$$
+such that $$\mathbf{E}[X | \sigma(Y)] = f(Y)$$,
+then $$Z = \mathbf{E}[X | \sigma(Y)]$$ is unique.
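
In the die illustration, with $$\mathcal{H} = \sigma(Y)$$:
$$Z$$ equals $$4$$ on the even outcomes and $$3$$ on the odd ones,
and the boxed property (reading it as the usual partial-averaging identity)
is easy to verify for e.g. $$H = \{2, 4, 6\}$$:

$$\begin{aligned}
\int_H Z \dd{P}
= 4 \cdot \frac{1}{2}
= \frac{2 + 4 + 6}{6}
= \int_H X \dd{P}
\end{aligned}$$
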
## Properties
A conditional expectation defined in this way has many useful properties,
most notably linearity:
-$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$
-for any $a, b \in \mathbb{R}$.
+$$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$$
+for any $$a, b \in \mathbb{R}$$.
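
For instance, in the die illustration:
$$\mathbf{E}[X \!+\! Y | \sigma(Y)] = f(Y) + Y$$,
which equals $$5$$ on even rolls and $$3$$ on odd rolls.
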
-The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
-then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$.
+The **tower property** states that if $$\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$$,
+then $$\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$$.
Intuitively, this works as follows:
-suppose person $G$ knows more about $X$ than person $H$,
-then $\mathbf{E}[X | \mathcal{H}]$ is $H$'s expectation,
-$\mathbf{E}[X | \mathcal{G}]$ is $G$'s "better" expectation,
-and then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$
-is $H$'s prediction about what $G$'s expectation will be.
-However, $H$ does not have access to $G$'s extra information,
-so $H$'s best prediction is simply $\mathbf{E}[X | \mathcal{H}]$.
+suppose person $$G$$ knows more about $$X$$ than person $$H$$,
+then $$\mathbf{E}[X | \mathcal{H}]$$ is $$H$$'s expectation,
+$$\mathbf{E}[X | \mathcal{G}]$$ is $$G$$'s "better" expectation,
+and then $$\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$$
+is $$H$$'s prediction about what $$G$$'s expectation will be.
+However, $$H$$ does not have access to $$G$$'s extra information,
+so $$H$$'s best prediction is simply $$\mathbf{E}[X | \mathcal{H}]$$.
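
As a check, let $$\mathcal{G}$$ be generated by
the partition $$\{1,2\}, \{3,4\}, \{5,6\}$$ of the die's outcomes,
and $$\mathcal{H}$$ by the coarser partition $$\{1,2,3,4\}, \{5,6\}$$,
so that $$\mathcal{G} \supset \mathcal{H}$$.
Then $$\mathbf{E}[X|\mathcal{G}]$$ takes the values $$3/2$$, $$7/2$$ and $$11/2$$
on the three cells, and averaging the first two over $$\{1,2,3,4\}$$
recovers $$\mathbf{E}[X|\mathcal{H}]$$ there:

$$\begin{aligned}
\mathbf{E}\big[ \mathbf{E}[X|\mathcal{G}] \,\big|\, \mathcal{H} \big]
= \frac{3/2 + 7/2}{2}
= \frac{5}{2}
= \frac{1 + 2 + 3 + 4}{4}
= \mathbf{E}[X|\mathcal{H}]
\end{aligned}$$
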
The **law of total expectation** says that
-$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$,
+$$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$$,
and follows from the above tower property
-by choosing $\mathcal{H}$ to contain no information:
-$\mathcal{H} = \{ \varnothing, \Omega \}$.
-
-Another useful property is that $\mathbf{E}[X | \mathcal{H}] = X$
-if $X$ is $\mathcal{H}$-measurable.
-In other words, if $\mathcal{H}$ already contains
-all the information extractable from $X$,
-then we know $X$'s exact value.
+by choosing $$\mathcal{H}$$ to contain no information:
+$$\mathcal{H} = \{ \varnothing, \Omega \}$$.
+
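In the die illustration, with $$\mathcal{G} = \sigma(Y)$$:

$$\begin{aligned}
\mathbf{E}\big[ \mathbf{E}[X | \sigma(Y)] \big]
= \frac{4 + 3}{2}
= \frac{7}{2}
= \mathbf{E}[X]
\end{aligned}$$
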
+Another useful property is that $$\mathbf{E}[X | \mathcal{H}] = X$$
+if $$X$$ is $$\mathcal{H}$$-measurable.
+In other words, if $$\mathcal{H}$$ already contains
+all the information extractable from $$X$$,
+then we know $$X$$'s exact value.
Conveniently, this can easily be generalized to products:
-$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$
-if $X$ is $\mathcal{H}$-measurable:
-since $X$'s value is known, it can simply be factored out.
+$$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$$
+if $$X$$ is $$\mathcal{H}$$-measurable:
+since $$X$$'s value is known, it can simply be factored out.
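
In the die illustration, $$Y$$ is $$\sigma(Y)$$-measurable, so
$$\mathbf{E}[XY | \sigma(Y)] = Y \cdot \mathbf{E}[X | \sigma(Y)] = Y f(Y)$$,
which equals $$4$$ on the even outcomes ($$Y = 1$$)
and $$0$$ on the odd ones ($$Y = 0$$).
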
Armed with this definition of conditional expectation,
we can define other conditional quantities,
-such as the **conditional variance** $\mathbf{V}[X | \mathcal{H}]$:
+such as the **conditional variance** $$\mathbf{V}[X | \mathcal{H}]$$:
$$\begin{aligned}
\mathbf{V}[X | \mathcal{H}]
@@ -147,11 +147,11 @@ $$\begin{aligned}
\end{aligned}$$
The **law of total variance** then states that
-$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$.
+$$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$$.
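
The die illustration confirms this for $$\mathcal{H} = \sigma(Y)$$:
$$\mathbf{V}[X|\mathcal{H}] = 8/3$$ on both parities,
and $$\mathbf{E}[X|\mathcal{H}]$$ is $$3$$ or $$4$$ with equal probability,
giving a variance of $$1/4$$, so:

$$\begin{aligned}
\mathbf{E}\big[ \mathbf{V}[X | \mathcal{H}] \big] + \mathbf{V}\big[ \mathbf{E}[X | \mathcal{H}] \big]
= \frac{8}{3} + \frac{1}{4}
= \frac{35}{12}
= \mathbf{V}[X]
\end{aligned}$$
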
-Likewise, we can define the **conditional probability** $P$,
-**conditional distribution function** $F_{X|\mathcal{H}}$,
-and **conditional density function** $f_{X|\mathcal{H}}$
+Likewise, we can define the **conditional probability** $$P$$,
+**conditional distribution function** $$F_{X|\mathcal{H}}$$,
+and **conditional density function** $$f_{X|\mathcal{H}}$$
like their non-conditional counterparts:
$$\begin{aligned}