--- title: "Random variable" date: 2021-10-22 categories: - Mathematics - Statistics - Measure theory layout: "concept" --- **Random variables** are the bread and butter of probability theory and statistics, and are simply variables whose value depends on the outcome of a random experiment. Here, we will describe the formal mathematical definition of a random variable. ## Probability space A **probability space** or **probability triple** $(\Omega, \mathcal{F}, P)$ is the formal mathematical model of a given **stochastic experiment**, i.e. a process with a random outcome. The **sample space** $\Omega$ is the set of all possible outcomes $\omega$ of the experimement. Those $\omega$ are selected randomly according to certain criteria. A subset $A \subset \Omega$ is called an **event**, and can be regarded as a true statement about all $\omega$ in that $A$. The **event space** $\mathcal{F}$ is a set of events $A$ that are interesting to us, i.e. we have subjectively chosen $\mathcal{F}$ based on the problem at hand. Since events $A$ represent statements about outcomes $\omega$, and we would like to use logic on those statemenets, we demand that $\mathcal{F}$ is a [$\sigma$-algebra](/know/concept/sigma-algebra/). Finally, the **probability measure** or **probability function** $P$ is a function that maps $A$ events to probabilities $P(A)$. Formally, $P : \mathcal{F} \to \mathbb{R}$ is defined to satisfy: 1. If $A \in \mathcal{F}$, then $P(A) \in [0, 1]$. 2. If $A, B \in \mathcal{F}$ do not overlap $A \cap B = \varnothing$, then $P(A \cup B) = P(A) + P(B)$. 3. The total probability $P(\Omega) = 1$. The reason we only assign probability to events $A$ rather than individual outcomes $\omega$ is that if $\Omega$ is continuous, all $\omega$ have zero probability, while intervals $A$ can have nonzero probability. ## Random variable Once we have a probability space $(\Omega, \mathcal{F}, P)$, we can define a **random variable** $X$ as a function that maps outcomes $\omega$ to another set, usually the real numbers. To be a valid real-valued random variable, a function $X : \Omega \to \mathbb{R}^n$ must satisfy the following condition, in which case $X$ is said to be **measurable** from $(\Omega, \mathcal{F})$ to $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$: $$\begin{aligned} \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n) \end{aligned}$$ In other words, for a given Borel set (see [$\sigma$-algebra](/know/concept/sigma-algebra/)) $B \in \mathcal{B}(\mathbb{R}^n)$, the set of all outcomes $\omega \in \Omega$ that satisfy $X(\omega) \in B$ must form a valid event; this set must be in $\mathcal{F}$. The point is that we need to be able to assign probabilities to statements of the form $X \in [a, b]$ for all $a < b$, which is only possible if that statement corresponds to an event in $\mathcal{F}$, since $P$'s domain is $\mathcal{F}$. Given such an $X$, and a set $B \subseteq \mathbb{R}$, the **preimage** or **inverse image** $X^{-1}$ is defined as: $$\begin{aligned} X^{-1}(B) = \{ \omega \in \Omega : X(\omega) \in B \} \end{aligned}$$ As suggested by the notation, $X^{-1}$ can be regarded as the inverse of $X$: it maps $B$ to the event for which $X \in B$. With this, our earlier requirement that $X$ be measurable can be written as: $X^{-1}(B) \in \mathcal{F}$ for any $B \in \mathcal{B}(\mathbb{R}^n)$. This is also often stated as "$X$ is *$\mathcal{F}$-measurable"*. 
Related to $\mathcal{F}$ is the **information** obtained by observing a random variable $X$. Let $\sigma(X)$ be the information generated by observing $X$, i.e. the events whose occurrence can be deduced from the value of $X$. More formally:

$$\begin{aligned}
    \sigma(X)
    = X^{-1}(\mathcal{B}(\mathbb{R}^n))
    = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \}
\end{aligned}$$

In other words, if the realized value of $X$ is found to be in a certain Borel set $B \in \mathcal{B}(\mathbb{R}^n)$, then the preimage $X^{-1}(B)$ (i.e. the event yielding this $B$) is known to have occurred.

In general, given any $\sigma$-algebra $\mathcal{H}$, a variable $Y$ is said to be *"$\mathcal{H}$-measurable"* if $\sigma(Y) \subseteq \mathcal{H}$, so that $\mathcal{H}$ contains at least all information extractable from $Y$.

Note that $\mathcal{H}$ can be generated by another random variable $X$, i.e. $\mathcal{H} = \sigma(X)$. In that case, the **Doob-Dynkin lemma** states that $Y$ is $\sigma(X)$-measurable if and only if $Y$ can always be computed from $X$, i.e. there exists a function $f$ such that $Y(\omega) = f(X(\omega))$ for all $\omega \in \Omega$.

Now, we are ready to define some familiar concepts from probability theory. The **cumulative distribution function** $F_X(x)$ is the probability of the event where the realized value of $X$ is smaller than some given $x \in \mathbb{R}$:

$$\begin{aligned}
    F_X(x)
    = P(X \le x)
    = P(\{ \omega \in \Omega : X(\omega) \le x \})
    = P(X^{-1}(]\!-\!\infty, x]))
\end{aligned}$$

If $F_X(x)$ is differentiable, then the **probability density function** $f_X(x)$ is defined as:

$$\begin{aligned}
    f_X(x)
    = \dv{F_X}{x}
\end{aligned}$$

## Expectation value

The **expectation value** $\mathbf{E}[X]$ of a random variable $X$ can be defined in the familiar way, as the sum/integral of every possible value of $X$ multiplied by the corresponding probability (density). For continuous and discrete sample spaces $\Omega$, respectively:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_{-\infty}^\infty x \: f_X(x) \dd{x}
    \qquad \mathrm{or} \qquad
    \mathbf{E}[X]
    = \sum_{i = 1}^N x_i \: P(X \!=\! x_i)
\end{aligned}$$

However, $f_X(x)$ is not guaranteed to exist, and the distinction between continuous and discrete is cumbersome. A more general definition of $\mathbf{E}[X]$ is the following Lebesgue-Stieltjes integral, since $F_X(x)$ always exists:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_{-\infty}^\infty x \dd{F_X(x)}
\end{aligned}$$

This is valid for any sample space $\Omega$. Equivalently, a Lebesgue integral over $\Omega$ itself can be used:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_\Omega X(\omega) \dd{P(\omega)}
\end{aligned}$$

An expectation value defined in this way has many useful properties, most notably linearity.

We can also define the familiar **variance** $\mathbf{V}[X]$ of a random variable $X$ as follows:

$$\begin{aligned}
    \mathbf{V}[X]
    = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big]
    = \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2
\end{aligned}$$

It is also possible to calculate expectation values and variances adjusted to some given event information: see [conditional expectation](/know/concept/conditional-expectation/).
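As a quick numerical check, here is a minimal Python sketch continuing the hypothetical die model from before (again, all names are our own): it computes $\mathbf{E}[X]$ both as a sum over outcomes $\omega \in \Omega$, mirroring the Lebesgue integral, and as a sum over distinct values $x_i$ weighted by $P(X \!=\! x_i)$, plus the variance.

```python
import random

# Hypothetical finite model again: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
p = {w: 1 / len(omega) for w in omega}   # P({w}) for each outcome

def X(w):
    return float(w)   # the identity: X is just the face value

# Lebesgue-style expectation: sum of X(w) dP(w) over the sample space.
E_lebesgue = sum(X(w) * p[w] for w in omega)

# Familiar discrete formula: sum of x_i * P(X = x_i) over distinct values.
values = {X(w) for w in omega}
E_discrete = sum(x * sum(p[w] for w in omega if X(w) == x) for x in values)

# Variance via E[X^2] - (E[X])^2.
E_X2 = sum(X(w)**2 * p[w] for w in omega)
V = E_X2 - E_lebesgue**2

print(E_lebesgue, E_discrete)   # both 3.5
print(V)                        # 35/12, roughly 2.9167

# Monte Carlo sanity check: the sample mean converges to E[X].
samples = [X(random.choice(omega)) for _ in range(100_000)]
print(sum(samples) / len(samples))   # approximately 3.5
```

Both formulas agree here because, for a discrete $\Omega$, the Lebesgue integral over $\Omega$ reduces to a plain sum over outcomes.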