Categories: Mathematics, Measure theory, Statistics.

**Random variables** are the bread and butter of probability theory and statistics, and are simply variables whose value depends on the outcome of a random experiment. Here, we will describe the formal mathematical definition of a random variable.

A **probability space** or **probability triple** \((\Omega, \mathcal{F}, P)\) is the formal mathematical model of a given **stochastic experiment**, i.e. a process with a random outcome.

The **sample space** \(\Omega\) is the set of all possible outcomes \(\omega\) of the experiment; each run of the experiment randomly realizes one such \(\omega\). A subset \(A \subset \Omega\) is called an **event**, and can be regarded as a true statement about all \(\omega\) in that \(A\).

The **event space** \(\mathcal{F}\) is a set of events \(A\) that are interesting to us, i.e. we have subjectively chosen \(\mathcal{F}\) based on the problem at hand. Since events \(A\) represent statements about outcomes \(\omega\), and we would like to use logic on those statements, we demand that \(\mathcal{F}\) is a \(\sigma\)-algebra.

Finally, the **probability measure** or **probability function** \(P\) is a function that maps events \(A\) to probabilities \(P(A)\). Formally, \(P : \mathcal{F} \to \mathbb{R}\) is defined to satisfy:

- If \(A \in \mathcal{F}\), then \(P(A) \in [0, 1]\).
- If \(A_1, A_2, \ldots \in \mathcal{F}\) are pairwise disjoint, i.e. \(A_i \cap A_j = \varnothing\) for \(i \neq j\), then \(P\big(\bigcup_i A_i\big) = \sum_i P(A_i)\).
- The total probability \(P(\Omega) = 1\).
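These axioms are easy to check explicitly on a finite sample space. As a minimal sketch (the fair-die space below is an assumption chosen purely for illustration), take \(\Omega = \{1, \ldots, 6\}\) with the uniform measure:

```python
from fractions import Fraction

# Fair six-sided die: Omega = {1, ..., 6}, each outcome equally likely.
omega = frozenset(range(1, 7))

def P(A):
    """Probability measure: uniform weight on the outcomes of a fair die."""
    assert A <= omega, "P is only defined on events, i.e. subsets of omega"
    return Fraction(len(A), len(omega))

# Check the axioms on a pair of disjoint events:
A = frozenset({1, 2})
B = frozenset({5, 6})
assert 0 <= P(A) <= 1               # probabilities lie in [0, 1]
assert P(A | B) == P(A) + P(B)      # additivity for disjoint events
assert P(omega) == 1                # total probability is 1
```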

The reason we only assign probability to events \(A\) rather than individual outcomes \(\omega\) is that if \(\Omega\) is continuous, each individual \(\omega\) typically has probability zero, while intervals \(A\) can have nonzero probability.

Once we have a probability space \((\Omega, \mathcal{F}, P)\), we can define a **random variable** \(X\) as a function that maps outcomes \(\omega\) to another set, usually the real numbers.

To be a valid random variable taking values in \(\mathbb{R}^n\), a function \(X : \Omega \to \mathbb{R}^n\) must satisfy the following condition, in which case \(X\) is said to be **measurable** from \((\Omega, \mathcal{F})\) to \((\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))\):

\[\begin{aligned} \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n) \end{aligned}\]

In other words, for a given Borel set (see \(\sigma\)-algebra) \(B \in \mathcal{B}(\mathbb{R}^n)\), the set of all outcomes \(\omega \in \Omega\) that satisfy \(X(\omega) \in B\) must form a valid event; this set must be in \(\mathcal{F}\). The point is that we need to be able to assign probabilities to statements of the form \(X \in [a, b]\) for all \(a < b\), which is only possible if that statement corresponds to an event in \(\mathcal{F}\), since \(P\)’s domain is \(\mathcal{F}\).

Given such an \(X\) and a set \(B \subseteq \mathbb{R}^n\), the **preimage** or **inverse image** \(X^{-1}(B)\) is defined as:

\[\begin{aligned} X^{-1}(B) = \{ \omega \in \Omega : X(\omega) \in B \} \end{aligned}\]

As suggested by the notation, \(X^{-1}\) can be regarded as a generalized inverse of \(X\): it maps \(B\) to the event for which \(X \in B\). With this, our earlier requirement that \(X\) be measurable can be written as: \(X^{-1}(B) \in \mathcal{F}\) for any \(B \in \mathcal{B}(\mathbb{R}^n)\). This is also often stated as “\(X\) is \(\mathcal{F}\)-measurable”.
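On a finite sample space, the preimage is simply a set comprehension. A small sketch (the two-coin-flip experiment is an assumed example, not from the text above):

```python
# Outcomes: two coin flips; X counts the number of heads.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
X = lambda w: sum(1 for c in w if c == "H")

def preimage(X, B, omega):
    """X^{-1}(B): all outcomes whose image under X lies in B."""
    return {w for w in omega if X(w) in B}

# The event "at least one head" is the preimage of B = {1, 2}:
assert preimage(X, {1, 2}, omega) == {("H", "H"), ("H", "T"), ("T", "H")}
```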

Related to \(\mathcal{F}\) is the **information** obtained by observing a random variable \(X\). Let \(\sigma(X)\) be the information generated by observing \(X\), i.e. the events whose occurrence can be deduced from the value of \(X\), or, more formally:

\[\begin{aligned} \sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R}^n)) = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \} \end{aligned}\]

In other words, if the realized value of \(X\) is found to be in a certain Borel set \(B \in \mathcal{B}(\mathbb{R}^n)\), then the preimage \(X^{-1}(B)\) (i.e. the event yielding this \(B\)) is known to have occurred.
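When \(\Omega\) and the range of \(X\) are finite, \(\sigma(X)\) can be enumerated directly by taking the preimage of every subset of \(X\)'s range. A sketch, reusing the assumed two-coin-flip example:

```python
from itertools import chain, combinations

# Two coin flips; X counts heads, so X takes values in {0, 1, 2}.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
X = lambda w: sum(1 for c in w if c == "H")

def sigma(X, omega):
    """sigma(X): the preimages of all subsets of X's (finite) range."""
    values = sorted({X(w) for w in omega})
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1))
    return {frozenset(w for w in omega if X(w) in B) for B in subsets}

events = sigma(X, omega)
# sigma(X) cannot distinguish ("H","T") from ("T","H"): both give X = 1,
# so only their union is an event in sigma(X).
assert frozenset({("H", "T")}) not in events
assert frozenset({("H", "T"), ("T", "H")}) in events
```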

In general, given any \(\sigma\)-algebra \(\mathcal{H}\), a variable \(Y\) is said to be *“\(\mathcal{H}\)-measurable”* if \(\sigma(Y) \subseteq \mathcal{H}\), so that \(\mathcal{H}\) contains at least all information extractable from \(Y\).

Note that \(\mathcal{H}\) can be generated by another random variable \(X\), i.e. \(\mathcal{H} = \sigma(X)\). In that case, the **Doob-Dynkin lemma** states that \(Y\) is \(\sigma(X)\)-measurable if and only if \(Y\) can always be computed from \(X\), i.e. there exists a measurable function \(f\) such that \(Y(\omega) = f(X(\omega))\) for all \(\omega \in \Omega\).
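On a finite sample space the lemma is constructive: \(f\) can be read off as a lookup table from values of \(X\) to values of \(Y\), and building that table succeeds exactly when \(Y\) is constant on each level set of \(X\). A sketch with assumed example variables:

```python
# Two coin flips; X counts heads, Y indicates whether the flips agree.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
X = lambda w: sum(1 for c in w if c == "H")
Y = lambda w: 1 if w[0] == w[1] else 0

# Build f as a table: this only succeeds if Y is constant wherever X is,
# i.e. if Y is sigma(X)-measurable (here it is: X = 1 always gives Y = 0).
f = {}
for w in omega:
    assert f.setdefault(X(w), Y(w)) == Y(w), "Y is not sigma(X)-measurable"

assert all(f[X(w)] == Y(w) for w in omega)  # Y = f(X) on all of omega
```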

Now, we are ready to define some familiar concepts from probability theory. The **cumulative distribution function** \(F_X(x)\) is the probability of the event that the realized value of \(X\) is at most some given \(x \in \mathbb{R}\):

\[\begin{aligned} F_X(x) = P(X \le x) = P(\{ \omega \in \Omega : X(\omega) \le x \}) = P(X^{-1}(]\!-\!\infty, x])) \end{aligned}\]
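This chain of equalities can be traced concretely on a finite space: the CDF is just the measure of the preimage of \(]\!-\!\infty, x]\). A sketch for an assumed fair die, where \(X(\omega) = \omega\):

```python
from fractions import Fraction

# Fair die: X(w) = w on omega = {1, ..., 6}.
omega = range(1, 7)

def F(x):
    """CDF of a fair die: P(X <= x), computed as P(X^{-1}((-inf, x]))."""
    event = [w for w in omega if w <= x]   # the preimage of (-inf, x]
    return Fraction(len(event), 6)

assert F(0) == 0
assert F(3) == Fraction(1, 2)
assert F(6) == 1
assert F(3.5) == Fraction(1, 2)   # F is a step function between outcomes
```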

If \(F_X(x)\) is differentiable, then the **probability density function** \(f_X(x)\) is defined as:

\[\begin{aligned} f_X(x) = \dv{F_X}{x} \end{aligned}\]

The **expectation value** \(\mathbf{E}[X]\) of a random variable \(X\) can be defined in the familiar way, as the sum/integral of every possible value of \(X\) multiplied by the corresponding probability (density). For continuous and discrete sample spaces \(\Omega\), respectively:

\[\begin{aligned} \mathbf{E}[X] = \int_{-\infty}^\infty x \: f_X(x) \dd{x} \qquad \mathrm{or} \qquad \mathbf{E}[X] = \sum_{i = 1}^N x_i \: P(X \!=\! x_i) \end{aligned}\]
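The discrete sum is straightforward to evaluate; for the assumed fair-die example used above:

```python
from fractions import Fraction

# Fair die: E[X] = sum of x_i * P(X = x_i) over the six faces.
values = range(1, 7)
p = Fraction(1, 6)            # uniform probability of each face
E = sum(x * p for x in values)
assert E == Fraction(7, 2)    # the familiar 3.5
```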

However, \(f_X(x)\) is not guaranteed to exist, and the distinction between continuous and discrete is cumbersome. A more general definition of \(\mathbf{E}[X]\) is the following Lebesgue-Stieltjes integral, since \(F_X(x)\) always exists:

\[\begin{aligned} \mathbf{E}[X] = \int_{-\infty}^\infty x \dd{F_X(x)} \end{aligned}\]

This is valid for any sample space \(\Omega\). Or, equivalently, a Lebesgue integral can be used:

\[\begin{aligned} \mathbf{E}[X] = \int_\Omega X(\omega) \dd{P(\omega)} \end{aligned}\]

An expectation value defined in this way has many useful properties, most notably linearity.
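For a finite \(\Omega\), the Lebesgue integral reduces to a weighted sum over outcomes, which makes it easy to verify that it agrees with the value-weighted sum given earlier. A sketch with an assumed non-uniform measure on three outcomes:

```python
from fractions import Fraction

# A biased three-outcome experiment (probabilities assumed for illustration).
P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
X = {"a": 1, "b": 2, "c": 6}

# Lebesgue form: integrate X(w) against the measure dP(w), outcome by outcome.
E_lebesgue = sum(X[w] * P[w] for w in P)

# Value-weighted form: group outcomes by the value x that X assigns to them.
E_values = sum(x * sum(P[w] for w in P if X[w] == x)
               for x in set(X.values()))

assert E_lebesgue == E_values == Fraction(13, 6)
```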

We can also define the familiar **variance** \(\mathbf{V}[X]\) of a random variable \(X\) as follows:

\[\begin{aligned} \mathbf{V}[X] = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big] = \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2 \end{aligned}\]
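The equality of these two expressions follows from linearity of \(\mathbf{E}\), and can be checked numerically; for the assumed fair die:

```python
from fractions import Fraction

# Fair die: verify that the two variance formulas agree.
values = range(1, 7)
p = Fraction(1, 6)

E_X  = sum(x * p for x in values)       # E[X]
E_X2 = sum(x**2 * p for x in values)    # E[X^2]

var_def      = sum((x - E_X)**2 * p for x in values)  # E[(X - E[X])^2]
var_shortcut = E_X2 - E_X**2                          # E[X^2] - (E[X])^2

assert var_def == var_shortcut == Fraction(35, 12)
```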

It is also possible to calculate expectation values and variances adjusted to some given event information: see conditional expectation.


© Marcus R.A. Newman, a.k.a. "Prefetch".
Available under CC BY-SA 4.0.