Recall that the expectation value $\mathbb{E}[X]$ of a random variable $X$ is a function of the probability space $(\Omega, \mathcal{F}, P)$ on which $X$ is defined, and the definition of $X$ itself.
The conditional expectation $\mathbb{E}[X | A]$ is the expectation value of $X$ given that an event $A$ has occurred, i.e. only the outcomes $\omega \in A$ should be considered. If $A$ is obtained by observing a variable, then $\mathbb{E}[X | A]$ is a random variable in its own right.
Consider two random variables $X$ and $Y$ on the same probability space $(\Omega, \mathcal{F}, P)$, and suppose that $\Omega$ is discrete. If $Y = y$ has been observed, then the conditional expectation of $X$ given the event $Y = y$ is as follows:

$$\mathbb{E}[X | Y = y] = \sum_{x} x \: Q(X = x)$$
Where $Q(A) \equiv P(A | Y = y)$ is a renormalized probability function, which assigns zero to all events incompatible with $Y = y$. If we allow $\Omega$ to be continuous, then from the definition of $\mathbb{E}[X]$, we know that the following Lebesgue integral can be used, which we call $f(y)$:

$$f(y) \equiv \mathbb{E}[X | Y = y] = \int_\Omega X(\omega) \: dQ(\omega)$$
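As a concrete example (added here for illustration, not part of the original text): roll two fair dice, let $Y$ be the first die and $X$ the sum of both. Conditioning on $Y = y$ renormalizes the probability onto the six outcomes compatible with $Y = y$, each with weight $1/6$, so:

$$\mathbb{E}[X | Y = y] = \sum_{k=1}^{6} (y + k) \: Q(X = y + k) = \sum_{k=1}^{6} (y + k) \: \frac{1}{6} = y + 3.5$$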
However, this renormalization is only valid if $P(Y = y) > 0$, which is a problem for continuous sample spaces $\Omega$. Sticking with the assumption $P(Y = y) > 0$ for now, notice that:

$$f(y) = \mathbb{E}[X | Y = y] = \frac{1}{P(Y = y)} \int_{Y = y} X(\omega) \: dP(\omega) = \frac{\mathbb{E}[X \: I(Y = y)]}{\mathbb{E}[I(Y = y)]}$$
Where $I$ is the indicator function, equal to $1$ if its argument is true, and $0$ if not. Multiplying the definition of $f(y)$ by $\mathbb{E}[I(Y = y)]$, and using that $f(Y) \: I(Y = y) = f(y) \: I(Y = y)$, then leads us to:

$$\mathbb{E}\big[f(Y) \: I(Y = y)\big] = \mathbb{E}\big[X \: I(Y = y)\big]$$
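As a sanity check of the indicator form (a sketch not present in the original text; the two-dice setup is just an assumed example), the following Python snippet estimates $\mathbb{E}[X \: I(Y = 3)] / \mathbb{E}[I(Y = 3)]$ by simulation, where $Y$ is the first die and $X$ is the sum of two dice, and compares it to the exact value $3 + 3.5 = 6.5$:

```python
import numpy as np

# Sketch (not from the text): estimate E[X | Y = 3] via the indicator form
# E[X I(Y = 3)] / E[I(Y = 3)], with Y = first die and X = sum of two dice.
rng = np.random.default_rng(0)
n = 1_000_000
die1 = rng.integers(1, 7, size=n)        # Y
die2 = rng.integers(1, 7, size=n)
X = die1 + die2                          # X = sum of the two dice
indicator = (die1 == 3).astype(float)    # I(Y = 3)

estimate = np.mean(X * indicator) / np.mean(indicator)
print(estimate)  # close to 6.5 = 3 + 3.5
```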
Recall that because $Y$ is a random variable, $f(Y) \equiv \mathbb{E}[X | Y]$ is too. In other words, $f$ maps $Y$ to another random variable, which, thanks to the Doob-Dynkin lemma (see random variable), means that $\mathbb{E}[X | Y]$ is measurable with respect to $\sigma(Y)$. Intuitively, this makes sense: $\mathbb{E}[X | Y]$ cannot contain more information about events than the $Y$ it was calculated from.
This suggests a straightforward generalization of the above: instead of a specific value $Y = y$, we can condition on any information from $Y$. If $\mathcal{H} \equiv \sigma(Y)$ is the information generated by $Y$, then the conditional expectation $\mathbb{E}[X | \mathcal{H}]$ is $\mathcal{H}$-measurable, and given by a random variable $Z$ satisfying:

$$\mathbb{E}\big[Z \: I(A)\big] = \mathbb{E}\big[X \: I(A)\big]$$
For any event $A \in \mathcal{H}$. Note that $Z$ is almost surely unique: "almost" because it could take any value on an event $A$ with zero probability $P(A) = 0$. Fortunately, if there exists a continuous function $f$ such that $Z = f(Y)$, then $f$ is unique.
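The defining property can be checked numerically. Below is an illustrative sketch (not from the original text) for the same two-dice setup as above, where $Z = \mathbb{E}[X | \sigma(Y)] = Y + 3.5$ and $A = \{Y \le 2\} \in \sigma(Y)$; both sides should be close to $\tfrac{1}{6}(4.5 + 5.5) \approx 1.67$:

```python
import numpy as np

# Sketch (not from the text): verify E[Z I(A)] ~= E[X I(A)] for Z = E[X | sigma(Y)],
# with Y = first die, X = sum of two dice, and the event A = {Y <= 2}.
rng = np.random.default_rng(1)
n = 1_000_000
Y = rng.integers(1, 7, size=n)
X = Y + rng.integers(1, 7, size=n)
Z = Y + 3.5                   # E[X | sigma(Y)] written as a function of Y
A = (Y <= 2).astype(float)    # indicator of an event in sigma(Y)

print(np.mean(Z * A), np.mean(X * A))  # both close to (4.5 + 5.5)/6 ~= 1.67
```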
A conditional expectation defined in this way has many useful properties, most notably linearity: $\mathbb{E}[aX + bY | \mathcal{H}] = a \, \mathbb{E}[X | \mathcal{H}] + b \, \mathbb{E}[Y | \mathcal{H}]$ for any constants $a, b \in \mathbb{R}$.
The tower property states that if $\mathcal{G} \supseteq \mathcal{H}$, then $\mathbb{E}\big[\mathbb{E}[X | \mathcal{G}] \big| \mathcal{H}\big] = \mathbb{E}[X | \mathcal{H}]$. Intuitively, this works as follows: suppose person $G$ knows more about $X$ than person $H$; then $\mathbb{E}[X | \mathcal{H}]$ is $H$'s expectation, $\mathbb{E}[X | \mathcal{G}]$ is $G$'s "better" expectation, and then $\mathbb{E}\big[\mathbb{E}[X | \mathcal{G}] \big| \mathcal{H}\big]$ is $H$'s prediction about what $G$'s expectation will be. However, $H$ does not have access to $G$'s extra information, so $H$'s best prediction is simply $\mathbb{E}[X | \mathcal{H}]$.
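As a concrete illustration (an example assumed here, not taken from the original text), let $X = Y_1 + Y_2 + Y_3$ be the sum of three fair dice, with $\mathcal{G} = \sigma(Y_1, Y_2)$ and $\mathcal{H} = \sigma(Y_1)$, so that $\mathcal{G} \supseteq \mathcal{H}$:

$$\mathbb{E}\big[\mathbb{E}[X | \mathcal{G}] \big| \mathcal{H}\big] = \mathbb{E}\big[Y_1 + Y_2 + 3.5 \big| \mathcal{H}\big] = Y_1 + 3.5 + 3.5 = \mathbb{E}[X | \mathcal{H}]$$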
The law of total expectation says that $\mathbb{E}\big[\mathbb{E}[X | \mathcal{G}]\big] = \mathbb{E}[X]$, and follows from the above tower property by choosing $\mathcal{H}$ to contain no information: $\mathcal{H} = \{\varnothing, \Omega\}$.
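For example (an illustration not in the original text): roll a fair die $Y$ and then flip $Y$ fair coins, letting $X$ be the number of heads. Then $\mathbb{E}[X | \sigma(Y)] = Y/2$, and the law of total expectation gives the overall mean without needing the joint distribution:

$$\mathbb{E}[X] = \mathbb{E}\big[\mathbb{E}[X | \sigma(Y)]\big] = \mathbb{E}[Y / 2] = \frac{3.5}{2} = 1.75$$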
Another useful property is that $\mathbb{E}[X | \mathcal{H}] = X$ if $X$ is $\mathcal{H}$-measurable. In other words, if $\mathcal{H}$ already contains all the information extractable from $X$, then we know $X$'s exact value. Conveniently, this can easily be generalized to products: $\mathbb{E}[X Y | \mathcal{H}] = X \, \mathbb{E}[Y | \mathcal{H}]$ if $X$ is $\mathcal{H}$-measurable: since $X$'s value is known, it can simply be factored out.
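A quick numerical sketch of this factoring-out rule (again an assumed two-dice example, not from the original text), conditioning on the single event $Y = 4$:

```python
import numpy as np

# Sketch (not from the text): check E[Y X | Y = 4] = 4 * E[X | Y = 4], i.e. the
# sigma(Y)-measurable factor Y can be pulled out of the conditional expectation.
rng = np.random.default_rng(2)
n = 1_000_000
Y = rng.integers(1, 7, size=n)
X = Y + rng.integers(1, 7, size=n)

mask = (Y == 4)
print(np.mean((Y * X)[mask]))  # E[Y X | Y = 4], close to 4 * 7.5 = 30
print(4 * np.mean(X[mask]))    # 4 * E[X | Y = 4], also close to 30
```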
Armed with this definition of conditional expectation, we can define other conditional quantities, such as the conditional variance $\mathbb{V}[X | \mathcal{H}]$:

$$\mathbb{V}[X | \mathcal{H}] \equiv \mathbb{E}\Big[\big(X - \mathbb{E}[X | \mathcal{H}]\big)^2 \Big| \mathcal{H}\Big] = \mathbb{E}[X^2 | \mathcal{H}] - \big(\mathbb{E}[X | \mathcal{H}]\big)^2$$
The law of total variance then states that $\mathbb{V}[X] = \mathbb{E}\big[\mathbb{V}[X | \mathcal{H}]\big] + \mathbb{V}\big[\mathbb{E}[X | \mathcal{H}]\big]$.
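The following sketch (an assumed example, not from the original text) verifies this numerically for $X$ the sum of two dice and $\mathcal{H} = \sigma(Y)$ with $Y$ the first die; both sides should be close to $35/6 \approx 5.83$:

```python
import numpy as np

# Sketch (not from the text): law of total variance for X = sum of two dice,
# conditioning on the first die Y (so H = sigma(Y)).
rng = np.random.default_rng(3)
n = 1_000_000
Y = rng.integers(1, 7, size=n)
X = Y + rng.integers(1, 7, size=n)

cond_var = [np.var(X[Y == y]) for y in range(1, 7)]    # V[X | Y = y]
cond_mean = [np.mean(X[Y == y]) for y in range(1, 7)]  # E[X | Y = y]
lhs = np.var(X)                                        # V[X]
rhs = np.mean(cond_var) + np.var(cond_mean)            # equal weights, since P(Y=y) = 1/6

print(lhs, rhs)  # both close to 35/6 ~= 5.83
```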
Likewise, we can define the conditional probability $P(A | \mathcal{H})$, conditional distribution function $F_{X | \mathcal{H}}(x)$, and conditional density function $f_{X | \mathcal{H}}(x)$ like their non-conditional counterparts:

$$P(A | \mathcal{H}) = \mathbb{E}\big[I(A) \big| \mathcal{H}\big] \qquad F_{X | \mathcal{H}}(x) = P(X \le x | \mathcal{H}) \qquad f_{X | \mathcal{H}}(x) = \frac{d F_{X | \mathcal{H}}(x)}{d x}$$
- U.H. Thygesen, Lecture notes on diffusions and stochastic differential equations, 2021, Polyteknisk Kompendie.