---
title: "Random variable"
sort_title: "Random variable"
date: 2021-10-22
categories:
- Mathematics
- Statistics
- Measure theory
layout: "concept"
---

**Random variables** are the bread and butter
of probability theory and statistics,
and are simply variables whose value depends
on the outcome of a random experiment.
Here, we will describe the formal mathematical definition
of a random variable.


## Probability space

A **probability space** or **probability triple** $$(\Omega, \mathcal{F}, P)$$
is the formal mathematical model of a given **stochastic experiment**,
i.e. a process with a random outcome.

The **sample space** $$\Omega$$ is the set
of all possible outcomes $$\omega$$ of the experiment.
Those $$\omega$$ are selected randomly according to certain criteria.
A subset $$A \subset \Omega$$ is called an **event**,
and can be regarded as a statement that is true for all $$\omega$$ in that $$A$$.

The **event space** $$\mathcal{F}$$ is a set of events $$A$$
that are interesting to us,
i.e. we have subjectively chosen $$\mathcal{F}$$
based on the problem at hand.
Since events $$A$$ represent statements about outcomes $$\omega$$,
and we would like to use logic on those statements,
we demand that $$\mathcal{F}$$ is a [$$\sigma$$-algebra](/know/concept/sigma-algebra/).

Finally, the **probability measure** or **probability function** $$P$$
is a function that maps events $$A$$ to probabilities $$P(A)$$.
Formally, $$P : \mathcal{F} \to \mathbb{R}$$ is defined to satisfy:

1.  If $$A \in \mathcal{F}$$, then $$P(A) \in [0, 1]$$.
2.  If $$A, B \in \mathcal{F}$$ do not overlap, i.e. $$A \cap B = \varnothing$$,
    then $$P(A \cup B) = P(A) + P(B)$$.
3.  The total probability $$P(\Omega) = 1$$.

The reason we only assign probability to events $$A$$
rather than individual outcomes $$\omega$$ is that
if $$\Omega$$ is continuous, all $$\omega$$ have zero probability,
while intervals $$A$$ can have nonzero probability.
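
For example, a roll of a fair six-sided die is modelled by
the sample space $$\Omega = \{1, 2, 3, 4, 5, 6\}$$,
the event space $$\mathcal{F} = 2^\Omega$$ (all subsets of $$\Omega$$),
and the probability measure $$P(A) = |A| / 6$$.
The event *"the roll is even"* is then $$A = \{2, 4, 6\}$$,
with probability $$P(A) = 3/6 = 1/2$$.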


## Random variable

Once we have a probability space $$(\Omega, \mathcal{F}, P)$$,
we can define a **random variable** $$X$$
as a function that maps outcomes $$\omega$$
to another set, usually the real numbers.

To be a valid random variable taking values in $$\mathbb{R}^n$$,
a function $$X : \Omega \to \mathbb{R}^n$$ must satisfy the following condition,
in which case $$X$$ is said to be **measurable**
from $$(\Omega, \mathcal{F})$$ to $$(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$$:

$$\begin{aligned}
    \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F}
    \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n)
\end{aligned}$$

In other words, for a given Borel set
(see [$$\sigma$$-algebra](/know/concept/sigma-algebra/)) $$B \in \mathcal{B}(\mathbb{R}^n)$$,
the set of all outcomes $$\omega \in \Omega$$ that satisfy $$X(\omega) \in B$$
must form a valid event; this set must be in $$\mathcal{F}$$.
The point is that we need to be able to assign probabilities
to statements of the form $$X \in [a, b]$$ for all $$a < b$$,
which is only possible if that statement corresponds to an event in $$\mathcal{F}$$,
since $$P$$'s domain is $$\mathcal{F}$$.

Given such an $$X$$, and a set $$B \subseteq \mathbb{R}^n$$,
the **preimage** or **inverse image** $$X^{-1}(B)$$ is defined as:

$$\begin{aligned}
    X^{-1}(B)
    = \{ \omega \in \Omega : X(\omega) \in B \}
\end{aligned}$$

As suggested by the notation,
$$X^{-1}$$ can be regarded as the inverse of $$X$$:
it maps $$B$$ to the event for which $$X \in B$$.
With this, our earlier requirement that $$X$$ be measurable
can be written as: $$X^{-1}(B) \in \mathcal{F}$$ for any $$B \in \mathcal{B}(\mathbb{R}^n)$$.
This is also often stated as *"$$X$$ is $$\mathcal{F}$$-measurable"*.
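
For example, model a fair die roll as $$X(\omega) = \omega$$
on $$\Omega = \{1, 2, 3, 4, 5, 6\}$$ with $$\mathcal{F} = 2^\Omega$$.
Then for the Borel set $$B = \{2, 4, 6\}$$,
the preimage $$X^{-1}(B) = \{2, 4, 6\}$$
is simply the event *"the roll is even"*, which is indeed in $$\mathcal{F}$$.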

Related to $$\mathcal{F}$$ is the **information**
obtained by observing a random variable $$X$$.
Let $$\sigma(X)$$ be the information generated by observing $$X$$,
i.e. the events whose occurrence can be deduced from the value of $$X$$,
or, more formally:

$$\begin{aligned}
    \sigma(X)
    = X^{-1}(\mathcal{B}(\mathbb{R}^n))
    = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \}
\end{aligned}$$

In other words, if the realized value of $$X$$ is
found to be in a certain Borel set $$B \in \mathcal{B}(\mathbb{R}^n)$$,
then the preimage $$X^{-1}(B)$$ (i.e. the event yielding this $$B$$)
is known to have occurred.
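
For example, let $$X$$ be the indicator of an event $$A$$,
i.e. $$X(\omega) = 1$$ if $$\omega \in A$$ and $$X(\omega) = 0$$ otherwise.
Observing $$X$$ only tells us whether or not $$A$$ occurred,
and indeed $$\sigma(X) = \{ \varnothing, A, \Omega \!\setminus\! A, \Omega \}$$.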

In general, given any $$\sigma$$-algebra $$\mathcal{H}$$,
a variable $$Y$$ is said to be *"$$\mathcal{H}$$-measurable"*
if $$\sigma(Y) \subseteq \mathcal{H}$$,
so that $$\mathcal{H}$$ contains at least
all information extractable from $$Y$$.

Note that $$\mathcal{H}$$ can be generated by another random variable $$X$$,
i.e. $$\mathcal{H} = \sigma(X)$$.
In that case, the **Doob-Dynkin lemma** states
that $$Y$$ is $$\sigma(X)$$-measurable if and only if
$$Y$$ can always be computed from $$X$$,
i.e. there exists a measurable function $$f$$ such that
$$Y(\omega) = f(X(\omega))$$ for all $$\omega \in \Omega$$.
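
For example, $$Y = X^2$$ is always $$\sigma(X)$$-measurable,
since it is computed from $$X$$ via $$f(x) = x^2$$.
The converse fails: $$X$$ need not be $$\sigma(Y)$$-measurable,
since the sign of $$X$$ cannot be recovered from $$Y = X^2$$.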

Now, we are ready to define some familiar concepts from probability theory.
The **cumulative distribution function** $$F_X(x)$$ is
the probability of the event that the realized value of $$X$$
is at most some given $$x \in \mathbb{R}$$:

$$\begin{aligned}
    F_X(x)
    = P(X \le x)
    = P(\{ \omega \in \Omega : X(\omega) \le x \})
    = P(X^{-1}(]\!-\!\infty, x]))
\end{aligned}$$

If $$F_X(x)$$ is differentiable,
then the **probability density function** $$f_X(x)$$ is defined as:

$$\begin{aligned}
    f_X(x)
    = \dv{F_X}{x}
\end{aligned}$$
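
To make this concrete, here is a minimal Python sketch
(not part of the original text; it assumes NumPy and SciPy are available)
that estimates $$F_X(x)$$ for a standard normal $$X$$
as the fraction of simulated outcomes with $$X(\omega) \le x$$,
and compares it to the exact CDF:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=0)
x_samples = rng.standard_normal(100_000)  # realizations of X ~ N(0, 1)

# Empirical CDF: the fraction of outcomes with X(omega) <= x,
# which estimates P(X <= x) = F_X(x).
def empirical_cdf(samples, x):
    return np.mean(samples <= x)

for x in (-1.0, 0.0, 1.0):
    print(f"F_X({x:+.1f}): empirical = {empirical_cdf(x_samples, x):.4f}, "
          f"exact = {norm.cdf(x):.4f}")
```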


## Expectation value

The **expectation value** $$\mathbf{E}[X]$$ of a random variable $$X$$
can be defined in the familiar way, as the sum/integral
of every possible value of $$X$$ multiplied by the corresponding probability (density).
For continuous and discrete sample spaces $$\Omega$$, respectively:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_{-\infty}^\infty x \: f_X(x) \dd{x}
    \qquad \mathrm{or} \qquad
    \mathbf{E}[X]
    = \sum_{i = 1}^N x_i \: P(X \!=\! x_i)
\end{aligned}$$
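
For example, for a fair die roll $$X$$:
$$\mathbf{E}[X] = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5$$,
which is notably not itself a possible outcome.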

However, $$f_X(x)$$ is not guaranteed to exist,
and the distinction between continuous and discrete is cumbersome.
A more general definition of $$\mathbf{E}[X]$$
is the following Lebesgue-Stieltjes integral,
since $$F_X(x)$$ always exists:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_{-\infty}^\infty x \dd{F_X(x)}
\end{aligned}$$

This is valid for any sample space $$\Omega$$.
Or, equivalently, a Lebesgue integral can be used:

$$\begin{aligned}
    \mathbf{E}[X]
    = \int_\Omega X(\omega) \dd{P(\omega)}
\end{aligned}$$
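
This last form also suggests a numerical approximation:
drawing outcomes $$\omega$$ according to $$P$$
and averaging $$X(\omega)$$ estimates the integral over $$\Omega$$.
Below is a minimal Python sketch of this Monte Carlo idea,
using the fair die as an assumed example model:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Sample outcomes omega from Omega = {1, ..., 6} according to P
# (uniform for a fair die), with X(omega) = omega.
omegas = rng.integers(1, 7, size=100_000)

# The sample mean of X(omega) approximates int_Omega X dP.
print(f"Monte Carlo E[X] = {omegas.mean():.4f}  (exact: 3.5)")
```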

An expectation value defined in this way has many useful properties,
most notably linearity.

We can also define the familiar **variance** $$\mathbf{V}[X]$$
of a random variable $$X$$ as follows:

$$\begin{aligned}
    \mathbf{V}[X]
    = \mathbf{E}\big[ (X - \mathbf{E}[X])^2 \big]
    = \mathbf{E}[X^2] - \big(\mathbf{E}[X]\big)^2
\end{aligned}$$
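
For the fair die roll, $$\mathbf{E}[X^2] = (1 + 4 + 9 + 16 + 25 + 36) / 6 = 91/6$$,
so that $$\mathbf{V}[X] = 91/6 - (3.5)^2 = 35/12 \approx 2.92$$.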

It is also possible to calculate expectation values and variances
adjusted to some given event information:
see [conditional expectation](/know/concept/conditional-expectation/).



## References
1.  U.H. Thygesen,
    *Lecture notes on diffusions and stochastic differential equations*,
    2021, Polyteknisk Kompendie.