Categories: Mathematics, Measure theory, Statistics.

Random variable

Random variables are the bread and butter of probability theory and statistics, and are simply variables whose value depends on the outcome of a random experiment. Here, we will describe the formal mathematical definition of a random variable.

Probability space

A probability space or probability triple $$(\Omega, \mathcal{F}, P)$$ is the formal mathematical model of a given stochastic experiment, i.e. a process with a random outcome.

The sample space $$\Omega$$ is the set of all possible outcomes $$\omega$$ of the experiment. In each run of the experiment, one $$\omega$$ is selected randomly, in a manner described by the probability measure $$P$$ below. A subset $$A \subset \Omega$$ is called an event, and can be regarded as a statement that is true for all $$\omega$$ in that $$A$$.

The event space $$\mathcal{F}$$ is a set of events $$A$$ that are interesting to us, i.e. we have subjectively chosen $$\mathcal{F}$$ based on the problem at hand. Since events $$A$$ represent statements about outcomes $$\omega$$, and we would like to use logic on those statements, we demand that $$\mathcal{F}$$ is a $$\sigma$$-algebra.

Finally, the probability measure or probability function $$P$$ is a function that maps events $$A$$ to probabilities $$P(A)$$. Formally, $$P : \mathcal{F} \to \mathbb{R}$$ is defined to satisfy:

1. If $$A \in \mathcal{F}$$, then $$P(A) \in [0, 1]$$.
2. If $$A, B \in \mathcal{F}$$ do not overlap, i.e. $$A \cap B = \varnothing$$, then $$P(A \cup B) = P(A) + P(B)$$, and likewise for any countable collection of pairwise disjoint events.
3. The total probability $$P(\Omega) = 1$$.

The reason we only assign probability to events $$A$$ rather than individual outcomes $$\omega$$ is that if $$\Omega$$ is continuous, all $$\omega$$ have zero probability, while intervals $$A$$ can have nonzero probability.
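
To make the triple $$(\Omega, \mathcal{F}, P)$$ concrete, here is a minimal Python sketch for a finite sample space. The fair six-sided die, the choice of the power set as $$\mathcal{F}$$, and the names `omega`, `event_space`, `prob` and `is_sigma_algebra` are all illustrative assumptions, not part of the definitions above.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space of one roll of a fair six-sided die (an illustrative choice).
omega = frozenset(range(1, 7))

def power_set(s):
    """All subsets of s as frozensets; the largest sigma-algebra on s."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

# Event space: here we simply take every subset of omega.
event_space = power_set(omega)

def prob(event):
    """Uniform probability measure: each outcome carries weight 1/6."""
    return Fraction(len(event), len(omega))

def is_sigma_algebra(family, sample_space):
    """Finite case: contains the sample space, closed under complement and union."""
    if sample_space not in family:
        return False
    closed_complement = all(sample_space - a in family for a in family)
    closed_union = all(a | b in family for a in family for b in family)
    return closed_complement and closed_union

even = frozenset({2, 4, 6})  # event "the roll is even"
low_odd = frozenset({1, 3})  # event "the roll is 1 or 3", disjoint from even

print(is_sigma_algebra(event_space, omega))                 # closure properties hold
print(prob(even | low_odd) == prob(even) + prob(low_odd))   # additivity
print(prob(omega))                                          # total probability
```

In a finite space, closure under pairwise unions already implies closure under countable unions, which is why the check above only loops over pairs.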

Random variable

Once we have a probability space $$(\Omega, \mathcal{F}, P)$$, we can define a random variable $$X$$ as a function that maps outcomes $$\omega$$ to another set, usually the real numbers.

To be a valid real-valued random variable, a function $$X : \Omega \to \mathbb{R}^n$$ must satisfy the following condition, in which case $$X$$ is said to be measurable from $$(\Omega, \mathcal{F})$$ to $$(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$$:

\begin{aligned} \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n) \end{aligned}

In other words, for a given Borel set (see $$\sigma$$-algebra) $$B \in \mathcal{B}(\mathbb{R}^n)$$, the set of all outcomes $$\omega \in \Omega$$ that satisfy $$X(\omega) \in B$$ must form a valid event; this set must be in $$\mathcal{F}$$. The point is that we need to be able to assign probabilities to statements of the form $$X \in [a, b]$$ for all $$a < b$$, which is only possible if that statement corresponds to an event in $$\mathcal{F}$$, since $$P$$’s domain is $$\mathcal{F}$$.

Given such an $$X$$, and a set $$B \subseteq \mathbb{R}^n$$, the preimage or inverse image $$X^{-1}$$ is defined as:

\begin{aligned} X^{-1}(B) = \{ \omega \in \Omega : X(\omega) \in B \} \end{aligned}

As suggested by the notation, $$X^{-1}$$ can be regarded as the inverse of $$X$$: it maps $$B$$ to the event for which $$X \in B$$. With this, our earlier requirement that $$X$$ be measurable can be written as: $$X^{-1}(B) \in \mathcal{F}$$ for any $$B \in \mathcal{B}(\mathbb{R}^n)$$. This is also often stated as “$$X$$ is $$\mathcal{F}$$-measurable”.
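
The measurability condition can be checked by brute force on a finite sample space. The sketch below assumes a die roll as the experiment and a deliberately coarse event space that can only tell even rolls from odd ones; the helper names `preimage` and `is_measurable` are hypothetical.

```python
# Sample space of a single die roll (illustrative assumption).
omega = frozenset(range(1, 7))

# A coarse sigma-algebra that only distinguishes even from odd rolls.
parity_events = {
    frozenset(),
    frozenset({1, 3, 5}),
    frozenset({2, 4, 6}),
    omega,
}

def preimage(X, B, sample_space):
    """X^{-1}(B): all outcomes w whose value X(w) lies in B."""
    return frozenset(w for w in sample_space if X(w) in B)

def is_measurable(X, event_space, sample_space):
    """Check that X^{-1}(B) is an event for every B.

    X takes finitely many values here, so every preimage is a finite union
    of preimages of single values. Because event_space is assumed to be a
    sigma-algebra, checking those single values is enough.
    """
    values = {X(w) for w in sample_space}
    return all(preimage(X, {v}, sample_space) in event_space for v in values)

def parity(w):
    return w % 2  # 0 for even rolls, 1 for odd rolls

def identity(w):
    return w      # reports the exact roll

print(is_measurable(parity, parity_events, omega))    # True
print(is_measurable(identity, parity_events, omega))  # False
```

The identity map fails because, for example, its preimage of $$\{1\}$$ is the set $$\{1\}$$, which is not one of the four events in the coarse event space, so no probability could be assigned to the statement "the roll is 1".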

Related to $$\mathcal{F}$$ is the information obtained by observing a random variable $$X$$. Let $$\sigma(X)$$ be the information generated by observing $$X$$, i.e. the events whose occurrence can be deduced from the value of $$X$$, or, more formally:

\begin{aligned} \sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R}^n)) = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \} \end{aligned}

In other words, if the realized value of $$X$$ is found to be in a certain Borel set $$B \in \mathcal{B}(\mathbb{R}^n)$$, then the preimage $$X^{-1}(B)$$ (i.e. the event yielding this $$B$$) is known to have occurred.

In general, given any $$\sigma$$-algebra $$\mathcal{H}$$, a variable $$Y$$ is said to be “$$\mathcal{H}$$-measurable” if $$\sigma(Y) \subseteq \mathcal{H}$$, so that $$\mathcal{H}$$ contains at least all information extractable from $$Y$$.

Note that $$\mathcal{H}$$ can be generated by another random variable $$X$$, i.e. $$\mathcal{H} = \sigma(X)$$. In that case, the Doob-Dynkin lemma states that $$Y$$ is $$\sigma(X)$$-measurable if and only if $$Y$$ can always be computed from $$X$$, i.e. there exists a (Borel-measurable) function $$f$$ such that $$Y(\omega) = f(X(\omega))$$ for all $$\omega \in \Omega$$.
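
On a finite sample space, both $$\sigma(X)$$ and the function $$f$$ from the Doob-Dynkin lemma can be constructed explicitly. The following sketch assumes a die roll as the experiment; the variables `X`, `Y`, `Z` and the helpers `generated_sigma_algebra` and `factor_through` are hypothetical names chosen for illustration.

```python
from itertools import chain, combinations

# Sample space of a single die roll (illustrative assumption).
omega = frozenset(range(1, 7))

def preimage(X, B):
    """X^{-1}(B): all outcomes w whose value X(w) lies in B."""
    return frozenset(w for w in omega if X(w) in B)

def generated_sigma_algebra(X):
    """sigma(X) on a finite space: preimages of every subset of the range of X."""
    values = sorted({X(w) for w in omega})
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )
    return {preimage(X, set(B)) for B in subsets}

def factor_through(Y, X):
    """Return a dict f with Y(w) == f[X(w)] for all w, or None if none exists."""
    f = {}
    for w in omega:
        # setdefault stores Y(w) the first time X(w) is seen; afterwards it
        # returns the stored value, which must agree with Y(w).
        if f.setdefault(X(w), Y(w)) != Y(w):
            return None
    return f

def X(w):
    return w % 3         # remainder of the roll modulo 3

def Y(w):
    return (w % 3) ** 2  # can be computed from X

def Z(w):
    return w % 2         # parity, which X does not reveal

sigma_X = generated_sigma_algebra(X)

print(generated_sigma_algebra(Y) <= sigma_X, factor_through(Y, X))
print(generated_sigma_algebra(Z) <= sigma_X, factor_through(Z, X))
```

For `Y`, the inclusion $$\sigma(Y) \subseteq \sigma(X)$$ holds and a lookup table for $$f$$ is found. For `Z`, both checks fail together, because the rolls 1 and 4 give the same value of `X` but different parities.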

Now, we are ready to define some familiar concepts from probability theory. The cumulative distribution function $$F_X(x)$$ is the probability of the event where the realized value of $$X$$ is less than or equal to some given $$x \in \mathbb{R}$$:

\begin{aligned} F_X(x) = P(X \le x) = P(\{ \omega \in \Omega : X(\omega) \le x \}) = P(X^{-1}(]\!-\!\infty, x])) \end{aligned}

If $$F_X(x)$$ is differentiable, then the probability density function $$f_X(x)$$ is defined as:

\begin{aligned} f_X(x) = \dv{F_X}{x} \end{aligned}
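
As a rough illustration of both definitions, the sketch below evaluates $$F_X$$ for a fair die directly from the event $$\{ \omega : X(\omega) \le x \}$$, and approximates $$f_X$$ by a finite difference for a differentiable example. The die, the exponential distribution with rate 1, and the step size `h` are assumptions made only for this example.

```python
import math
from fractions import Fraction

# Fair die with X(w) = w and a uniform measure (illustrative assumption).
omega = range(1, 7)

def cdf_die(x):
    """F_X(x) = P({w : X(w) <= x}) for X(w) = w."""
    event = [w for w in omega if w <= x]
    return Fraction(len(event), len(omega))

# Continuous example: exponential distribution with rate 1 (assumed),
# for which F(x) = 1 - exp(-x) when x >= 0, with density exp(-x).
def cdf_exp(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def pdf_numeric(F, x, h=1e-5):
    """Central finite difference approximating dF/dx at x."""
    return (F(x + h) - F(x - h)) / (2 * h)

print(cdf_die(3.5))                # probability of rolling at most 3
print(pdf_numeric(cdf_exp, 1.0))   # numerical estimate of the density at x = 1
print(math.exp(-1.0))              # exact density at x = 1, for comparison
```

The die's $$F_X$$ is a step function, so it has no density in the above sense, whereas the exponential example is differentiable for $$x > 0$$.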

Expectation value

The expectation value $$\mathbf{E}[X]$$ of a random variable $$X$$ can be defined in the familiar way, as the sum/integral of every possible value of $$X$$ multiplied by the corresponding probability (density). For continuous and discrete sample spaces $$\Omega$$, respectively:

\begin{aligned} \mathbf{E}[X] = \int_{-\infty}^\infty x \: f_X(x) \dd{x} \qquad \mathrm{or} \qquad \mathbf{E}[X] = \sum_{i = 1}^N x_i \: P(X \!=\! x_i) \end{aligned}

However, $$f_X(x)$$ is not guaranteed to exist, and the distinction between continuous and discrete is cumbersome. A more general definition of $$\mathbf{E}[X]$$ is the following Lebesgue-Stieltjes integral, since $$F_X(x)$$ always exists:

\begin{aligned} \mathbf{E}[X] = \int_{-\infty}^\infty x \dd{F_X(x)} \end{aligned}

This is valid for any sample space $$\Omega$$. Or, equivalently, a Lebesgue integral can be used:

\begin{aligned} \mathbf{E}[X] = \int_\Omega X(\omega) \dd{P(\omega)} \end{aligned}

An expectation value defined in this way has many useful properties, most notably linearity.
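
For a finite sample space, the Lebesgue integral over $$\Omega$$ reduces to a sum over outcomes, so its agreement with the sum over values can be checked directly. This is only a sketch: the fair die, the particular random variable `X`, and the function names are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Fair die with a uniform measure (illustrative assumption).
omega = range(1, 7)

def P_outcome(w):
    return Fraction(1, 6)

def X(w):
    return (w - 3) ** 2  # a hypothetical random variable

def expectation_over_omega(X):
    """Sum of X(w) P({w}) over outcomes: the integral over Omega, finite case."""
    return sum(X(w) * P_outcome(w) for w in omega)

def expectation_over_values(X):
    """Sum of x_i P(X = x_i) over the distinct values x_i of X."""
    pmf = defaultdict(Fraction)  # Fraction() is zero
    for w in omega:
        pmf[X(w)] += P_outcome(w)
    return sum(x * p for x, p in pmf.items())

E_X = expectation_over_omega(X)
print(E_X, expectation_over_values(X))

# Linearity: E[2X + 3] should equal 2 E[X] + 3.
print(expectation_over_omega(lambda w: 2 * X(w) + 3) == 2 * E_X + 3)
```

Exact fractions are used so that the two sums can be compared without floating-point error.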

We can also define the familiar variance $$\mathbf{V}[X]$$ of a random variable $$X$$ as follows:

\begin{aligned} \mathbf{V}[X] = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big] = \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2 \end{aligned}
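
The second equality follows from expanding the square and using the linearity of $$\mathbf{E}$$, keeping in mind that $$\mathbf{E}[X]$$ is a constant:

\begin{aligned} \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big] &= \mathbf{E}\big[ X^2 - 2 X \mathbf{E}[X] + (\mathbf{E}[X])^2 \big] \\ &= \mathbf{E}[X^2] - 2 \mathbf{E}[X] \: \mathbf{E}[X] + \big(\mathbf{E}[X]\big)^2 \\ &= \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2 \end{aligned}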

It is also possible to calculate expectation values and variances conditioned on a given event or on given information: see conditional expectation.

1. U.H. Thygesen, Lecture notes on diffusions and stochastic differential equations, 2021, Polyteknisk Kompendie.

© Marcus R.A. Newman, a.k.a. "Prefetch". Available under CC BY-SA 4.0.