---
title: "Conditional expectation"
sort_title: "Conditional expectation"
date: 2021-10-23
categories:
- Mathematics
- Statistics
- Measure theory
- Stochastic analysis
layout: "concept"
---

Recall that the expectation value $$\mathbf{E}[X]$$ of a [random variable](/know/concept/random-variable/) $$X$$ is a function of the probability space $$(\Omega, \mathcal{F}, P)$$ on which $$X$$ is defined, and the definition of $$X$$ itself. The **conditional expectation** $$\mathbf{E}[X|A]$$ is the expectation value of $$X$$ given that an event $$A$$ has occurred, i.e. only the outcomes $$\omega \in \Omega$$ satisfying $$\omega \in A$$ should be considered. If $$A$$ is obtained by observing a variable, then $$\mathbf{E}[X|A]$$ is a random variable in its own right.

Consider two random variables $$X$$ and $$Y$$ on the same probability space $$(\Omega, \mathcal{F}, P)$$, and suppose that $$\Omega$$ is discrete. If $$Y = y$$ has been observed, then the conditional expectation of $$X$$ given the event $$Y = y$$ is as follows:

$$\begin{aligned}
\mathbf{E}[X | Y \!=\! y]
= \sum_{x} x \: Q(X \!=\! x)
\qquad \quad
Q(X \!=\! x)
= \frac{P(X \!=\! x \cap Y \!=\! y)}{P(Y \!=\! y)}
\end{aligned}$$

Where $$Q$$ is a renormalized probability function, which assigns zero to all events incompatible with $$Y = y$$. If we allow $$\Omega$$ to be continuous, then from the definition of $$\mathbf{E}[X]$$, we know that the following Lebesgue integral can be used, which we call $$f(y)$$:

$$\begin{aligned}
\mathbf{E}[X | Y \!=\! y]
= f(y)
= \int_\Omega X(\omega) \dd{Q(\omega)}
\end{aligned}$$

However, this is only valid if $$P(Y \!=\! y) > 0$$, which is a problem for continuous sample spaces $$\Omega$$. Sticking with the assumption $$P(Y \!=\! y) > 0$$, notice that:

$$\begin{aligned}
f(y)
= \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)}
= \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
\end{aligned}$$

Where $$I$$ is the indicator function, equal to $$1$$ if its argument is true, and $$0$$ if not. Multiplying the definition of $$f(y)$$ by $$P(Y \!=\! y)$$ then leads us to:

$$\begin{aligned}
\mathbf{E}[X \cdot I(Y \!=\! y)]
&= f(y) \cdot P(Y \!=\! y)
\\
&= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
\end{aligned}$$

Recall that because $$Y$$ is a random variable, $$\mathbf{E}[X|Y] = f(Y)$$ is too. In other words, $$f$$ maps $$Y$$ to another random variable, which, thanks to the *Doob-Dynkin lemma* (see [random variable](/know/concept/random-variable/)), means that $$\mathbf{E}[X|Y]$$ is measurable with respect to $$\sigma(Y)$$. Intuitively, this makes sense: $$\mathbf{E}[X|Y]$$ cannot contain more information about events than the $$Y$$ it was calculated from.

This suggests a straightforward generalization of the above: instead of a specific value $$Y = y$$, we can condition on *any* information from $$Y$$. If $$\mathcal{H} = \sigma(Y)$$ is the information generated by $$Y$$, then the conditional expectation $$\mathbf{E}[X|\mathcal{H}] = Z$$ is $$\mathcal{H}$$-measurable, and given by a $$Z$$ satisfying:

$$\begin{aligned}
\boxed{
\mathbf{E}\big[X \cdot I(H)\big]
= \mathbf{E}\big[Z \cdot I(H)\big]
}
\end{aligned}$$

For any $$H \in \mathcal{H}$$. Note that $$Z$$ is almost surely unique: *almost* because it could take any value for an event $$A$$ with zero probability $$P(A) = 0$$. Fortunately, if there exists a continuous $$f$$ such that $$\mathbf{E}[X | \sigma(Y)] = f(Y)$$, then $$Z = \mathbf{E}[X | \sigma(Y)]$$ is unique.
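To make the defining property more concrete, the following sketch checks it by brute force on a small discrete probability space. The specific choices of $$\Omega$$, $$P$$, $$X$$ and $$Y$$ are arbitrary illustrations, not taken from the source text: $$Z = \mathbf{E}[X | \sigma(Y)]$$ is computed cell-by-cell with the renormalized sum from above, and the partial-averaging identity is then verified for the generating events $$H = \{Y \!=\! y\}$$.

```python
# Minimal numerical sketch (illustrative choices, not from the source):
# a uniform discrete probability space, on which we verify the defining
# property E[X * I(H)] = E[Z * I(H)] for Z = E[X | sigma(Y)].
from fractions import Fraction

omega = range(6)
P = {w: Fraction(1, 6) for w in omega}   # uniform probability measure

def X(w): return w**2          # some random variable X(omega)
def Y(w): return w % 2         # Y splits Omega into even and odd outcomes

def expect(f):
    """E[f] = sum of f(omega) * P(omega) over the discrete sample space."""
    return sum(f(w) * P[w] for w in omega)

def Z(w):
    """Z(omega) = E[X | Y = Y(omega)]: average X over the cell {Y = Y(omega)}."""
    cell = [v for v in omega if Y(v) == Y(w)]
    return sum(X(v) * P[v] for v in cell) / sum(P[v] for v in cell)

# Partial-averaging check for the generating events H = {Y = y}:
for y in (0, 1):
    lhs = expect(lambda w: X(w) * (Y(w) == y))   # E[X * I(Y = y)]
    rhs = expect(lambda w: Z(w) * (Y(w) == y))   # E[Z * I(Y = y)]
    print(y, lhs, rhs, lhs == rhs)
```

Since exact fractions are used, the two sides agree exactly for every generating event, as the defining property demands.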
## Properties

A conditional expectation defined in this way has many useful properties, most notably linearity: $$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$$ for any $$a, b \in \mathbb{R}$$.

The **tower property** states that if $$\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$$, then $$\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$$. Intuitively, this works as follows: suppose person $$G$$ knows more about $$X$$ than person $$H$$; then $$\mathbf{E}[X | \mathcal{H}]$$ is $$H$$'s expectation, $$\mathbf{E}[X | \mathcal{G}]$$ is $$G$$'s "better" expectation, and $$\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$$ is $$H$$'s prediction of what $$G$$'s expectation will be. However, $$H$$ does not have access to $$G$$'s extra information, so $$H$$'s best prediction is simply $$\mathbf{E}[X | \mathcal{H}]$$.

The **law of total expectation** says that $$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$$, and follows from the above tower property by choosing $$\mathcal{H}$$ to contain no information: $$\mathcal{H} = \{ \varnothing, \Omega \}$$.

Another useful property is that $$\mathbf{E}[X | \mathcal{H}] = X$$ if $$X$$ is $$\mathcal{H}$$-measurable. In other words, if $$\mathcal{H}$$ already contains all the information extractable from $$X$$, then we know $$X$$'s exact value. Conveniently, this generalizes to products: $$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$$ if $$X$$ is $$\mathcal{H}$$-measurable, since $$X$$'s value is known and can simply be factored out.

Armed with this definition of conditional expectation, we can define other conditional quantities, such as the **conditional variance** $$\mathbf{V}[X | \mathcal{H}]$$:

$$\begin{aligned}
\mathbf{V}[X | \mathcal{H}]
= \mathbf{E}[X^2 | \mathcal{H}] - \big[\mathbf{E}[X | \mathcal{H}]\big]^2
\end{aligned}$$

The **law of total variance** then states that $$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$$. This, together with the tower property and the law of total expectation, is checked numerically in the sketch at the end of this page.

Likewise, we can define the **conditional probability** $$P$$, **conditional distribution function** $$F_{X|\mathcal{H}}$$, and **conditional density function** $$f_{X|\mathcal{H}}$$ analogously to their non-conditional counterparts:

$$\begin{aligned}
P(A | \mathcal{H})
= \mathbf{E}[I(A) | \mathcal{H}]
\qquad
F_{X|\mathcal{H}}(x)
= P(X \le x | \mathcal{H})
\qquad
f_{X|\mathcal{H}}(x)
= \dv{F_{X|\mathcal{H}}}{x}
\end{aligned}$$

## References
1.  U.H. Thygesen,
    *Lecture notes on diffusions and stochastic differential equations*,
    2021, Polyteknisk Kompendie.
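As a sanity check of the properties above, the following sketch verifies the tower property, the law of total expectation, and the law of total variance exactly on a small discrete space. The choices of $$\Omega$$, $$X$$, $$Y$$ and $$W$$ below are again arbitrary illustrations; the nesting $$\sigma(W) \subset \sigma(Y)$$ holds by construction, because $$W$$ is a function of $$Y$$.

```python
# Minimal numerical sketch (illustrative choices, not from the source):
# exact checks of the tower property, the law of total expectation, and
# the law of total variance, with G = sigma(Y) finer than H = sigma(W).
from fractions import Fraction

omega = range(12)
P = {w: Fraction(1, 12) for w in omega}   # uniform probability measure

def X(w): return (w - 5)**2   # some random variable
def Y(w): return w % 4        # generates G = sigma(Y)
def W(w): return w % 2        # W = Y mod 2, so sigma(W) is a sub-sigma-algebra of sigma(Y)

def expect(f):
    """E[f] = sum of f(omega) * P(omega) over the discrete sample space."""
    return sum(f(w) * P[w] for w in omega)

def cond_expect(f, label):
    """Return the map w -> E[f | sigma(label)](w): average f over the cell of w."""
    def g(w):
        cell = [v for v in omega if label(v) == label(w)]
        return sum(f(v) * P[v] for v in cell) / sum(P[v] for v in cell)
    return g

def var(f):
    """V[f] = E[f^2] - (E[f])^2."""
    return expect(lambda w: f(w)**2) - expect(f)**2

E_X_given_G = cond_expect(X, Y)            # E[X | G]
E_X_given_H = cond_expect(X, W)            # E[X | H]
tower       = cond_expect(E_X_given_G, W)  # E[ E[X | G] | H ]

def V_X_given_H(w):
    """Conditional variance V[X | H] = E[X^2 | H] - (E[X | H])^2."""
    return cond_expect(lambda v: X(v)**2, W)(w) - E_X_given_H(w)**2

# Tower property: E[E[X|G]|H] = E[X|H], pointwise on Omega.
print(all(tower(w) == E_X_given_H(w) for w in omega))

# Law of total expectation: E[E[X|G]] = E[X].
print(expect(E_X_given_G) == expect(X))

# Law of total variance: V[X] = E[V[X|H]] + V[E[X|H]].
print(var(X) == expect(V_X_given_H) + var(E_X_given_H))
```

All three checks print `True`, since on a finite space with exact fractions the identities hold without any rounding error.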