
Random variable

Random variables are the bread and butter of probability theory and statistics, and are simply variables whose value depends on the outcome of a random experiment. Here, we will describe the formal mathematical definition of a random variable.

Probability space

A probability space or probability triple $(\Omega, \mathcal{F}, P)$ is the formal mathematical model of a given stochastic experiment, i.e. a process with a random outcome.

The sample space $\Omega$ is the set of all possible outcomes $\omega$ of the stochastic experiment. Those $\omega$ are selected randomly according to certain criteria. A subset $A \subset \Omega$ is called an event, and can be regarded as a true statement about all $\omega$ in that $A$.

The event space $\mathcal{F}$ is a set of events $A$ that are interesting to us, i.e. we have subjectively chosen $\mathcal{F}$ based on the problem at hand. Since events $A$ represent statements about outcomes $\omega$, and we would like to use logic on those statements, we demand that $\mathcal{F}$ is a $\sigma$-algebra.

Finally, the probability measure or probability function $P$ is a function that maps events $A$ to probabilities $P(A)$. Formally, $P : \mathcal{F} \to \mathbb{R}$ is defined to satisfy:

  1. If $A \in \mathcal{F}$, then $P(A) \in [0, 1]$.
  2. If $A, B \in \mathcal{F}$ are disjoint, i.e. $A \cap B = \varnothing$, then $P(A \cup B) = P(A) + P(B)$. In fact, this additivity is required to hold for any countable sequence of pairwise disjoint events.
  3. The total probability $P(\Omega) = 1$.
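For concreteness, these axioms can be checked on a small example. The following sketch (a fair six-sided die, an assumed example not from the text) models $P$ as the uniform measure on subsets of $\Omega$:

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, with P uniform over outcomes.
omega = frozenset(range(1, 7))

def P(A):
    """Probability measure: P(A) = |A| / |omega| for any event A."""
    assert A <= omega, "P is only defined on events, i.e. subsets of omega"
    return Fraction(len(A), len(omega))

A = frozenset({1, 2})  # event "roll is 1 or 2"
B = frozenset({5, 6})  # event "roll is 5 or 6", disjoint from A

assert 0 <= P(A) <= 1           # axiom 1: probabilities lie in [0, 1]
assert P(A | B) == P(A) + P(B)  # axiom 2: additivity for disjoint events
assert P(omega) == 1            # axiom 3: total probability is 1
```

Using `Fraction` keeps the arithmetic exact, so the axioms hold with equality rather than up to floating-point error.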

The reason we only assign probability to events $A$ rather than individual outcomes $\omega$ is that if $\Omega$ is continuous, all $\omega$ have zero probability, while intervals $A$ can have nonzero probability.

Random variable

Once we have a probability space $(\Omega, \mathcal{F}, P)$, we can define a random variable $X$ as a function that maps outcomes $\omega$ to another set, usually the real numbers.

To be a valid random variable, a function $X : \Omega \to \mathbb{R}^n$ must satisfy the following condition, in which case $X$ is said to be measurable from $(\Omega, \mathcal{F})$ to $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$:

$$\begin{aligned} \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n) \end{aligned}$$

In other words, for a given Borel set (see $\sigma$-algebra) $B \in \mathcal{B}(\mathbb{R}^n)$, the set of all outcomes $\omega \in \Omega$ that satisfy $X(\omega) \in B$ must form a valid event; this set must be in $\mathcal{F}$. The point is that we need to be able to assign probabilities to statements of the form $X \in [a, b]$ for all $a < b$, which is only possible if that statement corresponds to an event in $\mathcal{F}$, since $P$'s domain is $\mathcal{F}$.

Given such an $X$, and a set $B \subseteq \mathbb{R}^n$, the preimage or inverse image $X^{-1}$ is defined as:

$$\begin{aligned} X^{-1}(B) = \{ \omega \in \Omega : X(\omega) \in B \} \end{aligned}$$

As suggested by the notation, $X^{-1}$ can be regarded as the inverse of $X$: it maps $B$ to the event for which $X \in B$. With this, our earlier requirement that $X$ be measurable can be written as: $X^{-1}(B) \in \mathcal{F}$ for any $B \in \mathcal{B}(\mathbb{R}^n)$. This is often stated as "$X$ is $\mathcal{F}$-measurable".
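To make the preimage concrete, here is a minimal sketch on a finite sample space (two coin flips, an assumed example); with $\mathcal{F}$ taken as the full power set, every preimage is an event, so $X$ is trivially $\mathcal{F}$-measurable:

```python
from itertools import chain, combinations

# Assumed example: two coin flips; X counts the number of heads.
omega = frozenset({"HH", "HT", "TH", "TT"})

def X(w):
    return w.count("H")

def preimage(B):
    """X^{-1}(B): the set of outcomes whose image under X lies in B."""
    return frozenset(w for w in omega if X(w) in B)

# F = power set of omega, the largest possible sigma-algebra on omega.
F = {frozenset(s) for s in
     chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))}

assert preimage({2}) == frozenset({"HH"})
assert all(preimage(B) in F for B in [{0}, {1}, {0, 2}])  # X is F-measurable
```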

Related to $\mathcal{F}$ is the information obtained by observing a random variable $X$. Let $\sigma(X)$ be the information generated by observing $X$, i.e. the events whose occurrence can be deduced from the value of $X$, or, more formally:

$$\begin{aligned} \sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R}^n)) = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \} \end{aligned}$$

In other words, if the realized value of $X$ is found to be in a certain Borel set $B \in \mathcal{B}(\mathbb{R}^n)$, then the preimage $X^{-1}(B)$ (i.e. the event yielding this $B$) is known to have occurred.

In general, given any $\sigma$-algebra $\mathcal{H}$, a variable $Y$ is said to be $\mathcal{H}$-measurable if $\sigma(Y) \subseteq \mathcal{H}$, so that $\mathcal{H}$ contains at least all information extractable from $Y$.

Note that $\mathcal{H}$ can be generated by another random variable $X$, i.e. $\mathcal{H} = \sigma(X)$. In that case, the Doob-Dynkin lemma states that $Y$ is $\sigma(X)$-measurable if and only if $Y$ can always be computed from $X$, i.e. there exists a measurable function $f$ such that $Y(\omega) = f(X(\omega))$ for all $\omega \in \Omega$.
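On a finite sample space, $\sigma(X)$ can be generated explicitly by taking preimages of all subsets of $X$'s range. The sketch below (two coin flips, an assumed example) takes $Y$ to be the parity of the number of heads, a function of $X$, and confirms that $\sigma(Y) \subseteq \sigma(X)$, consistent with the Doob-Dynkin lemma:

```python
from itertools import chain, combinations

omega = frozenset({"HH", "HT", "TH", "TT"})  # two coin flips (assumed example)
X = lambda w: w.count("H")       # number of heads
Y = lambda w: w.count("H") % 2   # parity of heads: Y = f(X) with f(x) = x % 2

def sigma(Z):
    """sigma(Z) on a finite omega: preimages of all subsets of Z's range."""
    values = {Z(w) for w in omega}
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1))
    return {frozenset(w for w in omega if Z(w) in B) for B in subsets}

# Y is sigma(X)-measurable: every event deducible from Y is already in sigma(X).
assert sigma(Y) <= sigma(X)
# The converse fails: X cannot be computed from Y alone.
assert not (sigma(X) <= sigma(Y))
```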

Now, we are ready to define some familiar concepts from probability theory. The cumulative distribution function $F_X(x)$ is the probability of the event where the realized value of $X$ is at most some given $x \in \mathbb{R}$:

$$\begin{aligned} F_X(x) = P(X \le x) = P(\{ \omega \in \Omega : X(\omega) \le x \}) = P(X^{-1}(]\!-\!\infty, x])) \end{aligned}$$

If $F_X(x)$ is differentiable, then the probability density function $f_X(x)$ is defined as:

$$\begin{aligned} f_X(x) = \dv{F_X}{x} \end{aligned}$$
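As a sanity check of this relation, the following sketch takes an exponential distribution (rate $\lambda = 0.5$, an assumed example) and verifies numerically that the known density equals the derivative of the CDF:

```python
import math

lam = 0.5  # assumed rate parameter of an exponential distribution

def F(x):
    """CDF of the exponential distribution: F_X(x) = 1 - exp(-lam x), x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def f(x):
    """Density f_X(x) = dF_X/dx = lam * exp(-lam x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# A central finite difference approximates the derivative of the CDF.
x, h = 2.0, 1e-6
assert abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) < 1e-6
```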

Expectation value

The expectation value $\mathbf{E}[X]$ of a random variable $X$ can be defined in the familiar way, as the sum/integral of every possible value of $X$ multiplied by the corresponding probability (density). For continuous and discrete random variables, respectively:

$$\begin{aligned} \mathbf{E}[X] = \int_{-\infty}^\infty x \: f_X(x) \dd{x} \qquad \mathrm{or} \qquad \mathbf{E}[X] = \sum_{i = 1}^N x_i \: P(X \!=\! x_i) \end{aligned}$$
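Both formulas can be evaluated directly in a short sketch: the discrete sum for a fair die and a Riemann-sum approximation of the integral for an exponential density (both assumed examples):

```python
import math

# Discrete: fair six-sided die, E[X] = sum over x of x * P(X = x) = 3.5.
E_discrete = sum(x * (1 / 6) for x in range(1, 7))
assert abs(E_discrete - 3.5) < 1e-12

# Continuous: exponential density f_X(x) = lam * exp(-lam * x), E[X] = 1 / lam.
lam = 0.5
dx = 1e-3
E_continuous = sum(x * lam * math.exp(-lam * x) * dx
                   for x in (i * dx for i in range(int(60 / dx))))
assert abs(E_continuous - 1 / lam) < 1e-2  # Riemann sum approximates the integral
```

Truncating the integral at $x = 60$ is harmless here, since the exponential tail beyond that point is negligible.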

However, $f_X(x)$ is not guaranteed to exist, and the distinction between continuous and discrete is cumbersome. A more general definition of $\mathbf{E}[X]$ is the following Lebesgue-Stieltjes integral, since $F_X(x)$ always exists:

$$\begin{aligned} \mathbf{E}[X] = \int_{-\infty}^\infty x \dd{F_X(x)} \end{aligned}$$

This is valid for any sample space $\Omega$. Or, equivalently, a Lebesgue integral can be used:

$$\begin{aligned} \mathbf{E}[X] = \int_\Omega X(\omega) \dd{P(\omega)} \end{aligned}$$
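For a finite $\Omega$, this Lebesgue integral reduces to a weighted sum over outcomes, as in this sketch (fair die, assumed example):

```python
# For finite Omega, the integral of X over Omega with respect to P becomes
# a sum of X(omega) weighted by the point probabilities P({omega}).
omega = range(1, 7)                   # fair six-sided die (assumed example)
P_point = {w: 1 / 6 for w in omega}   # P({w}) for each individual outcome
X = lambda w: w                       # the identity random variable

E = sum(X(w) * P_point[w] for w in omega)
assert abs(E - 3.5) < 1e-12
```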

An expectation value defined in this way has many useful properties, most notably linearity.

We can also define the familiar variance $\mathbf{V}[X]$ of a random variable $X$ as follows:

$$\begin{aligned} \mathbf{V}[X] = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big] = \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2 \end{aligned}$$
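The equality of the two forms can be checked numerically; the sketch below samples from a normal distribution (mean 1, standard deviation 2, an assumed example) and computes the variance both ways:

```python
import random

random.seed(42)
# Assumed example: samples from a normal distribution with mean 1 and std 2.
samples = [random.gauss(1.0, 2.0) for _ in range(100_000)]
n = len(samples)

mean = sum(samples) / n
var_def = sum((x - mean) ** 2 for x in samples) / n    # E[(X - E[X])^2]
var_alt = sum(x * x for x in samples) / n - mean ** 2  # E[X^2] - (E[X])^2

assert abs(var_def - var_alt) < 1e-6  # both forms agree (up to rounding)
assert abs(var_def - 4.0) < 0.15      # close to the true variance 2^2 = 4
```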

It is also possible to calculate expectation values and variances adjusted to some given event information: see conditional expectation.
