Random variable
Random variables are the bread and butter of probability theory and statistics, and are simply variables whose value depends on the outcome of a random experiment. Here, we will describe the formal mathematical definition of a random variable.
Probability space
A probability space or probability triple $(\Omega, \mathcal{F}, P)$ is the formal mathematical model of a given stochastic experiment, i.e. a process with a random outcome.
The sample space $\Omega$ is the set of all possible outcomes $\omega$ of the stochastic experiment. Those $\omega$ are selected randomly according to certain criteria. A subset $A \subseteq \Omega$ is called an event, and can be regarded as a statement that is true for all $\omega$ in that $A$.
The event space $\mathcal{F}$ is a set of events $A$ that are interesting to us, i.e. we have subjectively chosen $\mathcal{F}$ based on the problem at hand. Since events $A$ represent statements about outcomes $\omega$, and we would like to use logic on those statements, we demand that $\mathcal{F}$ is a $\sigma$-algebra.
Finally, the probability measure or probability function $P$ is a function that maps events $A$ to probabilities $P(A)$. Formally, $P : \mathcal{F} \to \mathbb{R}$ is defined to satisfy:
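- If $A \in \mathcal{F}$, then $P(A) \in [0, 1]$.
- If $A, B \in \mathcal{F}$ do not overlap, i.e. $A \cap B = \varnothing$, then $P(A \cup B) = P(A) + P(B)$, and the same holds for any countable family of pairwise disjoint events.
- The total probability $P(\Omega) = 1$.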
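As a minimal sketch of these three requirements, consider a small finite probability space. The fair six-sided die, the choice of the power set as event space, and the names `omega`, `event_space` and `prob` are all assumptions made for illustration:

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space of one roll of a fair six-sided die (illustrative assumption).
omega = frozenset(range(1, 7))

def power_set(s):
    """Return all subsets of s as frozensets: the largest possible event space."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

event_space = power_set(omega)

def prob(event):
    """Uniform probability measure: P(A) = |A| / |Omega|."""
    if event not in event_space:
        raise ValueError("P is only defined on events in the event space")
    return Fraction(len(event), len(omega))

even = frozenset({2, 4, 6})
low = frozenset({1})

# Check the three requirements on this example.
assert all(0 <= prob(a) <= 1 for a in event_space)   # probabilities lie in [0, 1]
assert not (even & low)                              # the two events are disjoint
assert prob(even | low) == prob(even) + prob(low)    # additivity
assert prob(omega) == 1                              # total probability
```

Exact fractions are used instead of floats so that the additivity check is an equality rather than an approximate comparison.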
The reason we only assign probability to events $A$ rather than individual outcomes $\omega$ is that if $\Omega$ is continuous, all $\omega$ have zero probability, while intervals $A$ can have nonzero probability.
Random variable
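Once we have a probability space $(\Omega, \mathcal{F}, P)$, we can define a random variable $X$ as a function that maps outcomes $\omega$ to another set, usually the real numbers.
To be a valid real-valued random variable, a function $X : \Omega \to \mathbb{R}$ must satisfy the following condition, in which case $X$ is said to be measurable from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$:
$$\{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \quad \text{for any } B \in \mathcal{B}(\mathbb{R})$$
In other words, for a given Borel set (see $\sigma$-algebra) $B \in \mathcal{B}(\mathbb{R})$, the set of all outcomes $\omega \in \Omega$ that satisfy $X(\omega) \in B$ must form a valid event; this set must be in $\mathcal{F}$. The point is that we need to be able to assign probabilities to statements of the form $X \in [a, b]$ for all $a < b$, which is only possible if that statement corresponds to an event in $\mathcal{F}$, since $P$'s domain is $\mathcal{F}$.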
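Given such an $X$, and a set $B \subseteq \mathbb{R}$, the preimage or inverse image $X^{-1}(B)$ is defined as:
$$X^{-1}(B) = \{ \omega \in \Omega : X(\omega) \in B \}$$
As suggested by the notation, $X^{-1}$ can be regarded as the inverse of $X$: it maps $B$ to the event for which $X \in B$. With this, our earlier requirement that $X$ be measurable can be written as: $X^{-1}(B) \in \mathcal{F}$ for any $B \in \mathcal{B}(\mathbb{R})$. This is often stated as “$X$ is $\mathcal{F}$-measurable”.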
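The measurability requirement can be made concrete on a finite sample space, where preimages can be enumerated directly. The following is a sketch under assumptions: the die-roll sample space, the coarse event space that only distinguishes odd from even, and the helper names `preimage` and `is_measurable` are all hypothetical.

```python
# Sample space: one die roll. All names here are illustrative assumptions.
omega = frozenset(range(1, 7))

# A coarse event space that can only tell "odd" from "even".
# It is closed under complement and union, so it is a sigma-algebra.
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})
coarse_events = {frozenset(), odd, even, omega}

def preimage(X, B):
    """The preimage of B under X: all outcomes whose value lies in B."""
    return frozenset(w for w in omega if X(w) in B)

def is_measurable(X, events):
    """On a finite space it suffices to test the preimage of each single
    attained value: every other preimage is a finite union of these, and
    a sigma-algebra is closed under unions."""
    values = {X(w) for w in omega}
    return all(preimage(X, {v}) in events for v in values)

def parity(w):
    return w % 2   # 0 for even outcomes, 1 for odd outcomes

def face(w):
    return w       # the number shown on the die

print(is_measurable(parity, coarse_events))
print(is_measurable(face, coarse_events))
```

This prints `True` and then `False`: `parity` only needs the events "odd" and "even", whereas `face` would need events such as `{3}`, which the coarse event space does not contain, so no probability could be assigned to the statement that the die shows 3.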
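Related to $\mathcal{F}$ is the information obtained by observing a random variable $X$. Let $\sigma(X)$ be the information generated by observing $X$, i.e. the events whose occurrence can be deduced from the value of $X$, or, more formally:
$$\sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R})) = \{ A \in \mathcal{F} : A = X^{-1}(B) \text{ for some } B \in \mathcal{B}(\mathbb{R}) \}$$
In other words, if the realized value of $X$ is found to be in a certain Borel set $B \in \mathcal{B}(\mathbb{R})$, then the preimage $X^{-1}(B)$ (i.e. the event yielding this $B$) is known to have occurred.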
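On a finite sample space, the information generated by a random variable can be listed explicitly, since only the finitely many attained values matter and it is enough to take preimages of all subsets of the variable's range. A minimal sketch, assuming a die-roll sample space and a hypothetical helper `generated_sigma_algebra`:

```python
from itertools import chain, combinations

# Sample space: one die roll (illustrative assumption).
omega = frozenset(range(1, 7))

def preimage(X, B):
    """All outcomes whose value under X lies in B."""
    return frozenset(w for w in omega if X(w) in B)

def generated_sigma_algebra(X):
    """Events generated by X on a finite space: the preimages of all
    subsets of the set of values that X actually attains."""
    values = sorted({X(w) for w in omega})
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )
    return {preimage(X, set(B)) for B in subsets}

def parity(w):
    return w % 2   # observing this reveals only odd versus even

info = generated_sigma_algebra(parity)
for event in sorted(info, key=lambda e: (len(e), sorted(e))):
    print(sorted(event))
```

Observing `parity` yields only four events: the empty set, the odd outcomes, the even outcomes, and the whole sample space. An event such as `{3}` is absent, which reflects that its occurrence cannot be deduced from the parity alone.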
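In general, given any $\sigma$-algebra $\mathcal{H}$, a variable $Y$ is said to be $\mathcal{H}$-measurable if $\sigma(Y) \subseteq \mathcal{H}$, so that $\mathcal{H}$ contains at least all information extractable from $Y$.
Note that $\mathcal{H}$ can be generated by another random variable $X$, i.e. $\mathcal{H} = \sigma(X)$. In that case, the Doob-Dynkin lemma states that $Y$ is $\sigma(X)$-measurable if and only if $Y$ can always be computed from $X$, i.e. there exists a Borel-measurable function $f$ such that $Y(\omega) = f(X(\omega))$ for all $\omega \in \Omega$.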
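The content of the Doob-Dynkin lemma can be illustrated on a finite sample space, where "can be computed from" amounts to one variable being constant wherever the other is constant. The sketch below tries to build the function explicitly as a lookup table; the die-roll sample space and the helper name `factor_through` are assumptions for illustration, not part of the lemma.

```python
# Sample space: one die roll (illustrative assumption).
omega = range(1, 7)

def factor_through(Y, X):
    """Try to build a table f with Y(w) == f[X(w)] for every outcome w.
    Return None if two outcomes share a value of X but differ in Y,
    in which case Y cannot be computed from X alone."""
    f = {}
    for w in omega:
        x, y = X(w), Y(w)
        if f.setdefault(x, y) != y:
            return None
    return f

def parity(w):
    return w % 2

def parity_label(w):
    return "odd" if w % 2 else "even"

def face(w):
    return w

print(factor_through(parity_label, parity))  # a table exists
print(factor_through(face, parity))          # no table exists
```

Here `parity_label` is a function of `parity`, so a table is found. In contrast, `face` takes different values on outcomes with the same parity, so it carries information that `parity` does not, and the search returns `None`.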
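Now, we are ready to define some familiar concepts from probability theory. The cumulative distribution function $F_X(x)$ is the probability of the event where the realized value of $X$ is less than or equal to some given $x \in \mathbb{R}$:
$$F_X(x) = P(X \le x) = P(\{ \omega \in \Omega : X(\omega) \le x \}) = P(X^{-1}((-\infty, x]))$$
If $F_X(x)$ is differentiable, then the probability density function $f_X(x)$ is defined as:
$$f_X(x) = \frac{\mathrm{d} F_X}{\mathrm{d} x}$$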
Expectation value
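The expectation value $\mathbf{E}[X]$ of a random variable $X$ can be defined in the familiar way, as the sum/integral of every possible value of $X$ multiplied by the corresponding probability (density). For continuous and discrete sample spaces $\Omega$, respectively:
$$\mathbf{E}[X] = \int_{-\infty}^\infty x \, f_X(x) \, \mathrm{d}x \qquad \text{or} \qquad \mathbf{E}[X] = \sum_{i = 1}^N x_i \, P(X = x_i)$$
However, $f_X(x)$ is not guaranteed to exist, and the distinction between continuous and discrete is cumbersome. A more general definition of $\mathbf{E}[X]$ is the following Lebesgue-Stieltjes integral, since $F_X(x)$ always exists:
$$\mathbf{E}[X] = \int_{-\infty}^\infty x \, \mathrm{d}F_X(x)$$
This is valid for any sample space $\Omega$. Or, equivalently, a Lebesgue integral can be used:
$$\mathbf{E}[X] = \int_\Omega X(\omega) \, \mathrm{d}P(\omega)$$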
An expectation value defined in this way has many useful properties, most notably linearity.
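For a finite sample space, all of these definitions reduce to finite sums, so their agreement can be checked directly. The sketch below computes the expectation value in three ways: as a sum over outcomes (the Lebesgue form), as a sum over attained values weighted by their probabilities (the discrete form), and as a sum over the jumps of the cumulative distribution function (the Stieltjes form). It then checks linearity on one example. The fair die, the two example variables and all function names are assumptions for illustration.

```python
from fractions import Fraction

# Fair die: P({w}) = 1/6 for each outcome (illustrative assumption).
omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}

def X(w):
    return (w - 3) ** 2   # not injective, so some values repeat

def Y(w):
    return w

def expect_lebesgue(Z):
    """Sum over outcomes: Z(w) weighted by P({w})."""
    return sum(Z(w) * p[w] for w in omega)

def pmf(Z):
    """P(Z = z) for every attained value z."""
    out = {}
    for w in omega:
        out[Z(w)] = out.get(Z(w), 0) + p[w]
    return out

def expect_discrete(Z):
    """Sum over attained values: z weighted by P(Z = z)."""
    return sum(z * pz for z, pz in pmf(Z).items())

def cdf(Z, z):
    """P(Z <= z)."""
    return sum(p[w] for w in omega if Z(w) <= z)

def expect_stieltjes(Z):
    """Sum over attained values: z weighted by the jump of the CDF at z."""
    total, previous = 0, 0
    for z in sorted(pmf(Z)):
        current = cdf(Z, z)
        total += z * (current - previous)
        previous = current
    return total

print(expect_lebesgue(X), expect_discrete(X), expect_stieltjes(X))

# Linearity, checked for one choice of coefficients.
lhs = expect_lebesgue(lambda w: 2 * X(w) + 3 * Y(w))
rhs = 2 * expect_lebesgue(X) + 3 * expect_lebesgue(Y)
print(lhs == rhs)
```

All three computations give 19/6 for this `X`, and the linearity check prints `True`. The sum over outcomes never needs the distribution of the variable, which mirrors why the Lebesgue form applies to any sample space.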
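We can also define the familiar variance $\mathbf{V}[X]$ of a random variable $X$ as follows:
$$\mathbf{V}[X] = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big] = \mathbf{E}[X^2] - \big( \mathbf{E}[X] \big)^2$$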
It is also possible to calculate expectation values and variances adjusted to some given event information: see conditional expectation.
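As a small numerical check of the (unconditional) variance, the sketch below computes it on a finite sample space in two ways: as the expected squared deviation from the mean, and via the standard shortcut of the expected square minus the squared expectation. The fair die and the helper names `expect` and `variance` are assumptions for illustration.

```python
from fractions import Fraction

# Fair die: P({w}) = 1/6 for each outcome (illustrative assumption).
omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}

def expect(Z):
    """Expectation value as a sum over outcomes."""
    return sum(Z(w) * p[w] for w in omega)

def variance(Z):
    """Expected squared deviation from the mean."""
    mean = expect(Z)
    return expect(lambda w: (Z(w) - mean) ** 2)

def face(w):
    return w

var_definition = variance(face)
var_shortcut = expect(lambda w: face(w) ** 2) - expect(face) ** 2
print(var_definition, var_shortcut)
```

Both expressions give 35/12 for the die, as expected from expanding the square and applying linearity of the expectation value.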
References
- U.H. Thygesen, Lecture notes on diffusions and stochastic differential equations, 2021, Polyteknisk Kompendie.