--- title: "Random variable" date: 2021-10-22 categories: - Mathematics - Statistics - Measure theory layout: "concept" --- **Random variables** are the bread and butter of probability theory and statistics, and are simply variables whose value depends on the outcome of a random experiment. Here, we will describe the formal mathematical definition of a random variable. ## Probability space A **probability space** or **probability triple** $(\Omega, \mathcal{F}, P)$ is the formal mathematical model of a given **stochastic experiment**, i.e. a process with a random outcome. The **sample space** $\Omega$ is the set of all possible outcomes $\omega$ of the experimement. Those $\omega$ are selected randomly according to certain criteria. A subset $A \subset \Omega$ is called an **event**, and can be regarded as a true statement about all $\omega$ in that $A$. The **event space** $\mathcal{F}$ is a set of events $A$ that are interesting to us, i.e. we have subjectively chosen $\mathcal{F}$ based on the problem at hand. Since events $A$ represent statements about outcomes $\omega$, and we would like to use logic on those statemenets, we demand that $\mathcal{F}$ is a [$\sigma$-algebra](/know/concept/sigma-algebra/). Finally, the **probability measure** or **probability function** $P$ is a function that maps $A$ events to probabilities $P(A)$. Formally, $P : \mathcal{F} \to \mathbb{R}$ is defined to satisfy: 1. If $A \in \mathcal{F}$, then $P(A) \in [0, 1]$. 2. If $A, B \in \mathcal{F}$ do not overlap $A \cap B = \varnothing$, then $P(A \cup B) = P(A) + P(B)$. 3. The total probability $P(\Omega) = 1$. The reason we only assign probability to events $A$ rather than individual outcomes $\omega$ is that if $\Omega$ is continuous, all $\omega$ have zero probability, while intervals $A$ can have nonzero probability. ## Random variable Once we have a probability space $(\Omega, \mathcal{F}, P)$, we can define a **random variable** $X$ as a function that maps outcomes $\omega$ to another set, usually the real numbers. To be a valid real-valued random variable, a function $X : \Omega \to \mathbb{R}^n$ must satisfy the following condition, in which case $X$ is said to be **measurable** from $(\Omega, \mathcal{F})$ to $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$: $$\begin{aligned} \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n) \end{aligned}$$ In other words, for a given Borel set (see [$\sigma$-algebra](/know/concept/sigma-algebra/)) $B \in \mathcal{B}(\mathbb{R}^n)$, the set of all outcomes $\omega \in \Omega$ that satisfy $X(\omega) \in B$ must form a valid event; this set must be in $\mathcal{F}$. The point is that we need to be able to assign probabilities to statements of the form $X \in [a, b]$ for all $a < b$, which is only possible if that statement corresponds to an event in $\mathcal{F}$, since $P$'s domain is $\mathcal{F}$. Given such an $X$, and a set $B \subseteq \mathbb{R}$, the **preimage** or **inverse image** $X^{-1}$ is defined as: $$\begin{aligned} X^{-1}(B) = \{ \omega \in \Omega : X(\omega) \in B \} \end{aligned}$$ As suggested by the notation, $X^{-1}$ can be regarded as the inverse of $X$: it maps $B$ to the event for which $X \in B$. With this, our earlier requirement that $X$ be measurable can be written as: $X^{-1}(B) \in \mathcal{F}$ for any $B \in \mathcal{B}(\mathbb{R}^n)$. This is also often stated as "$X$ is *$\mathcal{F}$-measurable"*. 
Related to $\mathcal{F}$ is the **information** obtained by observing a random variable $X$. Let $\sigma(X)$ be the information generated by observing $X$, i.e. the events whose occurrence can be deduced from the value of $X$. More formally:

$$\begin{aligned}
    \sigma(X)
    = X^{-1}(\mathcal{B}(\mathbb{R}^n))
    = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \}
\end{aligned}$$

In other words, if the realized value of $X$ is found to be in a certain Borel set $B \in \mathcal{B}(\mathbb{R}^n)$, then the preimage $X^{-1}(B)$ (i.e. the event yielding this $B$) is known to have occurred.

In general, given any $\sigma$-algebra $\mathcal{H}$, a variable $Y$ is said to be *"$\mathcal{H}$-measurable"* if $\sigma(Y) \subseteq \mathcal{H}$, so that $\mathcal{H}$ contains at least all information extractable from $Y$.

Note that $\mathcal{H}$ can be generated by another random variable $X$, i.e. $\mathcal{H} = \sigma(X)$. In that case, the **Doob-Dynkin lemma** states that $Y$ is $\sigma(X)$-measurable if and only if $Y$ can always be computed from $X$, i.e. there exists a function $f$ such that $Y(\omega) = f(X(\omega))$ for all $\omega \in \Omega$.

Now, we are ready to define some familiar concepts from probability theory. The **cumulative distribution function** $F_X(x)$ is the probability of the event where the realized value of $X$ is smaller than some given $x \in \mathbb{R}$:

$$\begin{aligned}
    F_X(x)
    = P(X \le x)
    = P(\{ \omega \in \Omega : X(\omega) \le x \})
    = P(X^{-1}(]\!-\!\infty, x]))
\end{aligned}$$

If $F_X(x)$ is differentiable, then the **probability density function** $f_X(x)$ is defined as:

$$\begin{aligned}
    f_X(x)
    = \dv{F_X}{x}
\end{aligned}$$

## Expectation value

The **expectation value** $\mathbf{E}[X]$ of a random variable $X$ can be defined in the familiar way, as the sum/integral of every possible value of $X$ multiplied by the corresponding probability (density). For continuous and discrete sample spaces $\Omega$, respectively:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_{-\infty}^\infty x \: f_X(x) \dd{x}
    \qquad \mathrm{or} \qquad
    \mathbf{E}[X]
    = \sum_{i = 1}^N x_i \: P(X \!=\! x_i)
\end{aligned}$$

However, $f_X(x)$ is not guaranteed to exist, and the distinction between continuous and discrete is cumbersome. A more general definition of $\mathbf{E}[X]$ is the following Lebesgue-Stieltjes integral, since $F_X(x)$ always exists:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_{-\infty}^\infty x \dd{F_X(x)}
\end{aligned}$$

This is valid for any sample space $\Omega$. Equivalently, a Lebesgue integral over $\Omega$ itself can be used:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_\Omega X(\omega) \dd{P(\omega)}
\end{aligned}$$

An expectation value defined in this way has many useful properties, most notably linearity.

We can also define the familiar **variance** $\mathbf{V}[X]$ of a random variable $X$ as follows:

$$\begin{aligned}
    \mathbf{V}[X]
    = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big]
    = \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2
\end{aligned}$$

It is also possible to calculate expectation values and variances adjusted to some given event information: see [conditional expectation](/know/concept/conditional-expectation/).
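As a quick numerical check, here is a minimal Python sketch continuing the hypothetical die model from before (again, all names are our own): it computes $\mathbf{E}[X]$ both as a sum over outcomes $\omega \in \Omega$, mirroring the Lebesgue integral, and as a sum over distinct values $x_i$ weighted by $P(X \!=\! x_i)$, plus the variance.

```python
import random

# Hypothetical finite model again: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
p = {w: 1 / len(omega) for w in omega}   # P({w}) for each outcome

def X(w):
    return float(w)   # the identity: X is just the face value

# Lebesgue-style expectation: sum of X(w) dP(w) over the sample space.
E_lebesgue = sum(X(w) * p[w] for w in omega)

# Familiar discrete formula: sum of x_i * P(X = x_i) over distinct values.
values = {X(w) for w in omega}
E_discrete = sum(x * sum(p[w] for w in omega if X(w) == x) for x in values)

# Variance via E[X^2] - (E[X])^2.
E_X2 = sum(X(w)**2 * p[w] for w in omega)
V = E_X2 - E_lebesgue**2

print(E_lebesgue, E_discrete)   # both 3.5
print(V)                        # 35/12, roughly 2.9167

# Monte Carlo sanity check: the sample mean converges to E[X].
samples = [X(random.choice(omega)) for _ in range(100_000)]
print(sum(samples) / len(samples))   # approximately 3.5
```

Both formulas agree here because, for a discrete $\Omega$, the Lebesgue integral over $\Omega$ reduces to a plain sum over outcomes.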