From 16555851b6514a736c5c9d8e73de7da7fc9b6288 Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Thu, 20 Oct 2022 18:25:31 +0200
Subject: Migrate from 'jekyll-katex' to 'kramdown-math-sskatex'

---
 .../know/concept/conditional-expectation/index.md | 146 ++++++++++-----------
 1 file changed, 73 insertions(+), 73 deletions(-)

(limited to 'source/know/concept/conditional-expectation')

diff --git a/source/know/concept/conditional-expectation/index.md b/source/know/concept/conditional-expectation/index.md
index 7b13a4a..f64fa72 100644
--- a/source/know/concept/conditional-expectation/index.md
+++ b/source/know/concept/conditional-expectation/index.md
@@ -10,24 +10,24 @@ categories:
 layout: "concept"
 ---
 
-Recall that the expectation value $\mathbf{E}[X]$
-of a [random variable](/know/concept/random-variable/) $X$
-is a function of the probability space $(\Omega, \mathcal{F}, P)$
-on which $X$ is defined, and the definition of $X$ itself.
-
-The **conditional expectation** $\mathbf{E}[X|A]$
-is the expectation value of $X$ given that an event $A$ has occurred,
-i.e. only the outcomes $\omega \in \Omega$
-satisfying $\omega \in A$ should be considered.
-If $A$ is obtained by observing a variable,
-then $\mathbf{E}[X|A]$ is a random variable in its own right.
-
-Consider two random variables $X$ and $Y$
-on the same probability space $(\Omega, \mathcal{F}, P)$,
-and suppose that $\Omega$ is discrete.
-If $Y = y$ has been observed,
-then the conditional expectation of $X$
-given the event $Y = y$ is as follows:
+Recall that the expectation value $$\mathbf{E}[X]$$
+of a [random variable](/know/concept/random-variable/) $$X$$
+is a function of the probability space $$(\Omega, \mathcal{F}, P)$$
+on which $$X$$ is defined, and the definition of $$X$$ itself.
+
+The **conditional expectation** $$\mathbf{E}[X|A]$$
+is the expectation value of $$X$$ given that an event $$A$$ has occurred,
+i.e. only the outcomes $$\omega \in \Omega$$
+satisfying $$\omega \in A$$ should be considered.
+If $$A$$ is obtained by observing a variable,
+then $$\mathbf{E}[X|A]$$ is a random variable in its own right.
+
+Consider two random variables $$X$$ and $$Y$$
+on the same probability space $$(\Omega, \mathcal{F}, P)$$,
+and suppose that $$\Omega$$ is discrete.
+If $$Y = y$$ has been observed,
+then the conditional expectation of $$X$$
+given the event $$Y = y$$ is as follows:
 
 $$\begin{aligned}
 \mathbf{E}[X | Y \!=\! y]
@@ -37,12 +37,12 @@ $$\begin{aligned}
 = \frac{P(X \!=\! x \cap Y \!=\! y)}{P(Y \!=\! y)}
 \end{aligned}$$
 
-Where $Q$ is a renormalized probability function,
-which assigns zero to all events incompatible with $Y = y$.
-If we allow $\Omega$ to be continuous,
-then from the definition $\mathbf{E}[X]$,
+Where $$Q$$ is a renormalized probability function,
+which assigns zero to all events incompatible with $$Y = y$$.
+If we allow $$\Omega$$ to be continuous,
+then from the definition of $$\mathbf{E}[X]$$,
 we know that the following Lebesgue integral can be used,
-which we call $f(y)$:
+which we call $$f(y)$$:
 
 $$\begin{aligned}
 \mathbf{E}[X | Y \!=\! y]
@@ -50,9 +50,9 @@ $$\begin{aligned}
 = \int_\Omega X(\omega) \dd{Q(\omega)}
 \end{aligned}$$
 
-However, this is only valid if $P(Y \!=\! y) > 0$,
-which is a problem for continuous sample spaces $\Omega$.
-Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
+However, this is only valid if $$P(Y \!=\! y) > 0$$,
+which is a problem for continuous sample spaces $$\Omega$$.
+Sticking with the assumption $$P(Y \!=\! y) > 0$$, notice that:
 
 $$\begin{aligned}
 f(y)
@@ -60,9 +60,9 @@ $$\begin{aligned}
 = \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
 \end{aligned}$$
 
-Where $I$ is the indicator function,
-equal to $1$ if its argument is true, and $0$ if not.
-Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:
+Where $$I$$ is the indicator function,
+equal to $$1$$ if its argument is true, and $$0$$ if not.
+Multiplying the definition of $$f(y)$$ by $$P(Y \!=\! y)$$ then leads us to:
 
 $$\begin{aligned}
 \mathbf{E}[X \cdot I(Y \!=\! y)]
@@ -71,22 +71,22 @@ $$\begin{aligned}
 &= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
 \end{aligned}$$
 
-Recall that because $Y$ is a random variable,
-$\mathbf{E}[X|Y] = f(Y)$ is too.
-In other words, $f$ maps $Y$ to another random variable,
+Recall that because $$Y$$ is a random variable,
+$$\mathbf{E}[X|Y] = f(Y)$$ is too.
+In other words, $$f$$ maps $$Y$$ to another random variable,
 which, thanks to the *Doob-Dynkin lemma*
 (see [random variable](/know/concept/random-variable/)),
-means that $\mathbf{E}[X|Y]$ is measurable with respect to $\sigma(Y)$.
+means that $$\mathbf{E}[X|Y]$$ is measurable with respect to $$\sigma(Y)$$.
 Intuitively, this makes sense:
-$\mathbf{E}[X|Y]$ cannot contain more information about events
-than the $Y$ it was calculated from.
+$$\mathbf{E}[X|Y]$$ cannot contain more information about events
+than the $$Y$$ it was calculated from.
 
 This suggests a straightforward generalization of the above:
-instead of a specific value $Y = y$,
-we can condition on *any* information from $Y$.
-If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
-then the conditional expectation $\mathbf{E}[X|\mathcal{H}] = Z$
-is $\mathcal{H}$-measurable, and given by a $Z$ satisfying:
+instead of a specific value $$Y = y$$,
+we can condition on *any* information from $$Y$$.
+If $$\mathcal{H} = \sigma(Y)$$ is the information generated by $$Y$$,
+then the conditional expectation $$\mathbf{E}[X|\mathcal{H}] = Z$$
+is $$\mathcal{H}$$-measurable, and given by a $$Z$$ satisfying:
 
 $$\begin{aligned}
 \boxed{
@@ -95,51 +95,51 @@ $$\begin{aligned}
 }
 \end{aligned}$$
 
-For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique:
+For any $$H \in \mathcal{H}$$. Note that $$Z$$ is almost surely unique:
 *almost* because it could take any value
-for an event $A$ with zero probability $P(A) = 0$.
-Fortunately, if there exists a continuous $f$
-such that $\mathbf{E}[X | \sigma(Y)] = f(Y)$,
-then $Z = \mathbf{E}[X | \sigma(Y)]$ is unique.
+for an event $$A$$ with zero probability $$P(A) = 0$$.
+Fortunately, if there exists a continuous $$f$$
+such that $$\mathbf{E}[X | \sigma(Y)] = f(Y)$$,
+then $$Z = \mathbf{E}[X | \sigma(Y)]$$ is unique.
 
 
 ## Properties
 
 A conditional expectation defined in this way
 has many useful properties,
 most notably linearity:
-$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$
-for any $a, b \in \mathbb{R}$.
+$$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$$
+for any $$a, b \in \mathbb{R}$$.
 
-The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
-then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$.
+The **tower property** states that if $$\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$$,
+then $$\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$$.
 Intuitively, this works as follows:
-suppose person $G$ knows more about $X$ than person $H$,
-then $\mathbf{E}[X | \mathcal{H}]$ is $H$'s expectation,
-$\mathbf{E}[X | \mathcal{G}]$ is $G$'s "better" expectation,
-and then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$
-is $H$'s prediction about what $G$'s expectation will be.
-However, $H$ does not have access to $G$'s extra information,
-so $H$'s best prediction is simply $\mathbf{E}[X | \mathcal{H}]$.
+suppose person $$G$$ knows more about $$X$$ than person $$H$$,
+then $$\mathbf{E}[X | \mathcal{H}]$$ is $$H$$'s expectation,
+$$\mathbf{E}[X | \mathcal{G}]$$ is $$G$$'s "better" expectation,
+and then $$\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$$
+is $$H$$'s prediction about what $$G$$'s expectation will be.
+However, $$H$$ does not have access to $$G$$'s extra information,
+so $$H$$'s best prediction is simply $$\mathbf{E}[X | \mathcal{H}]$$.
 
 The **law of total expectation** says that
-$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$,
+$$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$$,
 and follows from the above tower property
-by choosing $\mathcal{H}$ to contain no information:
-$\mathcal{H} = \{ \varnothing, \Omega \}$.
-
-Another useful property is that $\mathbf{E}[X | \mathcal{H}] = X$
-if $X$ is $\mathcal{H}$-measurable.
-In other words, if $\mathcal{H}$ already contains
-all the information extractable from $X$,
-then we know $X$'s exact value.
+by choosing $$\mathcal{H}$$ to contain no information:
+$$\mathcal{H} = \{ \varnothing, \Omega \}$$.
+
+Another useful property is that $$\mathbf{E}[X | \mathcal{H}] = X$$
+if $$X$$ is $$\mathcal{H}$$-measurable.
+In other words, if $$\mathcal{H}$$ already contains
+all the information extractable from $$X$$,
+then we know $$X$$'s exact value.
 Conveniently, this can easily be generalized to products:
-$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$
-if $X$ is $\mathcal{H}$-measurable:
-since $X$'s value is known, it can simply be factored out.
+$$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$$
+if $$X$$ is $$\mathcal{H}$$-measurable:
+since $$X$$'s value is known, it can simply be factored out.
 
 Armed with this definition of conditional expectation,
 we can define other conditional quantities,
-such as the **conditional variance** $\mathbf{V}[X | \mathcal{H}]$:
+such as the **conditional variance** $$\mathbf{V}[X | \mathcal{H}]$$:
 
 $$\begin{aligned}
 \mathbf{V}[X | \mathcal{H}]
@@ -147,11 +147,11 @@ $$\begin{aligned}
 \end{aligned}$$
 
 The **law of total variance** then states that
-$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$.
+$$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$$.
 
-Likewise, we can define the **conditional probability** $P$,
-**conditional distribution function** $F_{X|\mathcal{H}}$,
-and **conditional density function** $f_{X|\mathcal{H}}$
+Likewise, we can define the **conditional probability** $$P$$,
+**conditional distribution function** $$F_{X|\mathcal{H}}$$,
+and **conditional density function** $$f_{X|\mathcal{H}}$$
 like their non-conditional counterparts:
 
 $$\begin{aligned}
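The two identities quoted near the end of the patched article, the law of total expectation and the law of total variance, are easy to sanity-check numerically. The sketch below is not part of the commit above; it is a minimal illustration in plain Python, assuming a toy probability space of two fair dice, and the helper names (`expectation`, `cond_expectation`, `X`, `Y`) are invented here rather than taken from the article.

```python
# Minimal sanity check of the law of total expectation and the law of
# total variance on a tiny discrete probability space (two fair dice).
# Plain Python, no external dependencies; all names are illustrative.

import itertools

# Sample space Omega: ordered pairs of dice rolls, each with probability 1/36.
omega = list(itertools.product(range(1, 7), repeat=2))
P = {w: 1.0 / 36.0 for w in omega}

def X(w):
    return w[0] + w[1]   # X = sum of both dice

def Y(w):
    return w[0]          # Y = first die only, so sigma(Y) is coarser than F

def expectation(f):
    """E[f], summed over the whole sample space."""
    return sum(f(w) * P[w] for w in omega)

def cond_expectation(f, y):
    """E[f | Y = y]: renormalize P to the outcomes with Y(w) = y."""
    p_y = sum(P[w] for w in omega if Y(w) == y)
    return sum(f(w) * P[w] for w in omega if Y(w) == y) / p_y

def variance(f):
    m = expectation(f)
    return expectation(lambda w: (f(w) - m) ** 2)

def cond_variance(f, y):
    m = cond_expectation(f, y)
    return cond_expectation(lambda w: (f(w) - m) ** 2, y)

# E[X | Y] is itself a random variable: a function of Y alone (Doob-Dynkin).
def EX_given_Y(w):
    return cond_expectation(X, Y(w))

def VX_given_Y(w):
    return cond_variance(X, Y(w))

# Law of total expectation: E[ E[X|Y] ] == E[X]  (both equal 7 here).
assert abs(expectation(EX_given_Y) - expectation(X)) < 1e-9

# Law of total variance: V[X] == E[ V[X|Y] ] + V[ E[X|Y] ]  (35/6 == 35/12 + 35/12).
assert abs(variance(X) - (expectation(VX_given_Y) + variance(EX_given_Y))) < 1e-9

print("E[X] =", expectation(X))
print("E[X | Y=y] for y = 1..6:", [cond_expectation(X, y) for y in range(1, 7)])
```

In this toy space $$\mathbf{E}[X | Y \!=\! y] = y + 3.5$$, so averaging over $$Y$$ gives $$3.5 + 3.5 = 7 = \mathbf{E}[X]$$, which is exactly what the renormalized-probability definition at the start of the article predicts.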