-rw-r--r-- | content/know/concept/conditional-expectation/index.pdc | 173 |
-rw-r--r-- | content/know/concept/random-variable/index.pdc | 171 |
-rw-r--r-- | content/know/concept/sigma-algebra/index.pdc | 89 |
3 files changed, 433 insertions, 0 deletions
diff --git a/content/know/concept/conditional-expectation/index.pdc b/content/know/concept/conditional-expectation/index.pdc new file mode 100644 index 0000000..7da7660 --- /dev/null +++ b/content/know/concept/conditional-expectation/index.pdc @@ -0,0 +1,173 @@ +--- +title: "Conditional expectation" +firstLetter: "C" +publishDate: 2021-10-23 +categories: +- Mathematics +- Statistics + +date: 2021-10-22T15:19:23+02:00 +draft: false +markup: pandoc +--- + +# Conditional expectation + +Recall that the expectation value $\mathbf{E}(X)$ +of a [random variable](/know/concept/random-variable/) $X$ +is a function of the probability space $(\Omega, \mathcal{F}, P)$ +on which $X$ is defined, and the definition of $X$ itself. + +The **conditional expectation** $\mathbf{E}(X|A)$ +is the expectation value of $X$ given that an event $A$ has occurred, +i.e. only the outcomes $\omega \in \Omega$ +satisfying $\omega \in A$ should be considered. +If $A$ is obtained by observing another variable, +then $\mathbf{E}(X|A)$ is a random variable in its own right. + +Consider two random variables $X$ and $Y$ +on the same probability space $(\Omega, \mathcal{F}, P)$, +and suppose that $\Omega$ is discrete. +If $Y = y$ has been observed, +then the conditional expectation of $X$ +given the event $Y = y$ is as follows: + +$$\begin{aligned} + \mathbf{E}(X | Y \!=\! y) + = \sum_{x} x \: Q(X \!=\! x) + \qquad \quad + Q(X \!=\! x) + = \frac{P(X \!=\! x \cap Y \!=\! y)}{P(Y \!=\! y)} +\end{aligned}$$ + +Where $Q$ is a renormalized probability function, +which assigns zero to all events incompatible with $Y = y$. +If we allow $\Omega$ to be continuous, +then from the definition of $\mathbf{E}(X)$, +we know that the following Lebesgue integral can be used, +which we call $f(y)$: + +$$\begin{aligned} + \mathbf{E}(X | Y \!=\! y) + = f(y) + = \int_\Omega X(\omega) \dd{Q(\omega)} +\end{aligned}$$ + +However, this is only valid if $P(Y \!=\! y) > 0$, +which is a problem for continuous sample spaces $\Omega$. 
+Sticking with the assumption $P(Y \!=\! y) > 0$, notice that: + +$$\begin{aligned} + f(y) + = \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)} + = \frac{\mathbf{E}(X \cdot I(Y \!=\! y))}{P(Y \!=\! y)} +\end{aligned}$$ + +Where $I$ is the indicator function, +equal to $1$ if its argument is true, and $0$ if not. +Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to: + +$$\begin{aligned} + \mathbf{E}(X \cdot I(Y \!=\! y)) + &= f(y) \cdot P(Y \!=\! y) + \\ + &= \mathbf{E}(f(Y) \cdot I(Y \!=\! y)) +\end{aligned}$$ + +Recall that because $Y$ is a random variable, +$\mathbf{E}(X|Y) = f(Y)$ is too. +In other words, $f$ maps $Y$ to another random variable, +which, due to the *Doob-Dynkin lemma* +(see [$\sigma$-algebra](/know/concept/sigma-algebra/)), +must mean that $\mathbf{E}(X|Y)$ is measurable with respect to $\sigma(Y)$. +Intuitively, this makes some sense: +$\mathbf{E}(X|Y)$ cannot contain more information about events +than the $Y$ it was calculated from. + +This suggests a straightforward generalization of the above: +instead of a specific value $Y = y$, +we can condition on *any* information from $Y$. +If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$, +then the conditional expectation $\mathbf{E}(X|\mathcal{H}) = Z$ +is $\mathcal{H}$-measurable, and given by a $Z$ satisfying: + +$$\begin{aligned} + \boxed{ + \mathbf{E}\big(X \cdot I(H)\big) + = \mathbf{E}\big(Z \cdot I(H)\big) + } +\end{aligned}$$ + +For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique: +*almost* because it could take any value +for an event $A$ with zero probability $P(A) = 0$. +Fortunately, if there exists a continuous $f$ +such that $\mathbf{E}(X | \sigma(Y)) = f(Y)$, +then $Z = \mathbf{E}(X | \sigma(Y))$ is unique. + +A conditional expectation defined in this way has many useful properties, +most notably linearity: +$\mathbf{E}(aX \!+\! 
bY | \mathcal{H}) = a \mathbf{E}(X|\mathcal{H}) + b \mathbf{E}(Y|\mathcal{H})$ +for any $a, b \in \mathbb{R}$. + +The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$, +then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H}) = \mathbf{E}(X|\mathcal{H})$. +Intuitively, this works as follows: +suppose person $G$ knows more about $X$ than person $H$, +then $\mathbf{E}(X | \mathcal{H})$ is $H$'s expectation, +$\mathbf{E}(X | \mathcal{G})$ is $G$'s "better" expectation, +and then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H})$ +is $H$'s prediction about what $G$'s expectation will be. +However, $H$ does not have access to $G$'s extra information, +so $H$'s best prediction is simply $\mathbf{E}(X | \mathcal{H})$. + +The **law of total expectation** says that +$\mathbf{E}(\mathbf{E}(X | \mathcal{G})) = \mathbf{E}(X)$, +and follows from the above tower property +by choosing $\mathcal{H}$ to contain no information: +$\mathcal{H} = \{ \varnothing, \Omega \}$. + +Another useful property is that $\mathbf{E}(X | \mathcal{H}) = X$ +if $X$ is $\mathcal{H}$-measurable. +In other words, if $\mathcal{H}$ already contains +all the information extractable from $X$, +then we know $X$'s exact value. +Conveniently, this can easily be generalized to products: +$\mathbf{E}(XY | \mathcal{H}) = X \mathbf{E}(Y | \mathcal{H})$ +if $X$ is $\mathcal{H}$-measurable: +since $X$'s value is known, it can simply be factored out. + +Armed with this definition of conditional expectation, +we can define other conditional quantities, +such as the **conditional variance** $\mathbf{V}(X | \mathcal{H})$: + +$$\begin{aligned} + \mathbf{V}(X | \mathcal{H}) + = \mathbf{E}(X^2 | \mathcal{H}) - \big(\mathbf{E}(X | \mathcal{H})\big)^2 +\end{aligned}$$ + +The **law of total variance** then states that +$\mathbf{V}(X) = \mathbf{E}(\mathbf{V}(X | \mathcal{H})) + \mathbf{V}(\mathbf{E}(X | \mathcal{H}))$. 
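The tower property, the law of total expectation and the law of total variance can be checked numerically on a small discrete sample space. The following is only a minimal sketch under assumptions that are not part of the text: two fair dice with uniform probability, $X$ the sum of both dice, $Y$ the first die, and $W$ the parity of the first die, so that $\sigma(W) \subset \sigma(Y)$. All function names are hypothetical, and exact fractions are used to avoid rounding errors.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice, uniform probability on the 36 outcomes.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    return w[0] + w[1]   # sum of both dice

def Y(w):
    return w[0]          # first die only

def W(w):
    return w[0] % 2      # parity of the first die: coarser information than Y

def expect(Z):
    """E(Z) as a sum over the discrete sample space."""
    return sum(Z(w) * P[w] for w in omega)

def variance(Z):
    """V(Z) = E(Z^2) - E(Z)^2."""
    return expect(lambda w: Z(w) ** 2) - expect(Z) ** 2

def cond_expect(Z, G):
    """E(Z | sigma(G)) as a random variable, i.e. a function on omega."""
    def Z_given_G(w):
        # Outcomes that cannot be told apart from w by observing G.
        atom = [v for v in omega if G(v) == G(w)]
        p_atom = sum(P[v] for v in atom)
        return sum(Z(v) * P[v] for v in atom) / p_atom
    return Z_given_G

EX_given_Y = cond_expect(X, Y)
EX2_given_Y = cond_expect(lambda w: X(w) ** 2, Y)

def VX_given_Y(w):
    """Conditional variance V(X | sigma(Y))."""
    return EX2_given_Y(w) - EX_given_Y(w) ** 2

# Tower property: E(E(X|Y) | W) equals E(X | W) for every outcome.
lhs = cond_expect(EX_given_Y, W)
rhs = cond_expect(X, W)
tower_holds = all(lhs(w) == rhs(w) for w in omega)

# Law of total expectation and law of total variance.
total_expectation_holds = expect(EX_given_Y) == expect(X)
total_variance_holds = variance(X) == expect(VX_given_Y) + variance(EX_given_Y)
```

In this sketch, conditioning on $\sigma(Y)$ amounts to averaging over the outcomes that share the same value of $Y$, which is the renormalized probability $Q$ from the discrete case applied to each possible observation.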
+ +Likewise, we can define the **conditional probability** $P$, +**conditional distribution function** $F_{X|\mathcal{H}}$, +and **conditional density function** $f_{X|\mathcal{H}}$ +like their non-conditional counterparts: + +$$\begin{aligned} + P(A | \mathcal{H}) + = \mathbf{E}(I(A) | \mathcal{H}) + \qquad + F_{X|\mathcal{H}}(x) + = P(X \le x | \mathcal{H}) + \qquad + f_{X|\mathcal{H}}(x) + = \dv{F_{X|\mathcal{H}}}{x} +\end{aligned}$$ + + + +## References +1. U.F. Thygesen, + *Lecture notes on diffusions and stochastic differential equations*, + 2021, Polyteknisk Kompendie. diff --git a/content/know/concept/random-variable/index.pdc b/content/know/concept/random-variable/index.pdc new file mode 100644 index 0000000..fe50b60 --- /dev/null +++ b/content/know/concept/random-variable/index.pdc @@ -0,0 +1,171 @@ +--- +title: "Random variable" +firstLetter: "R" +publishDate: 2021-10-22 +categories: +- Mathematics +- Statistics + +date: 2021-10-21T20:40:42+02:00 +draft: false +markup: pandoc +--- + +# Random variable + +**Random variables** are the bread and butter +of probability theory and statistics, +and are simply variables whose value depends +on the outcome of a random experiment. +Here, we will describe the formal mathematical definition +of a random variable. + + +## Probability space + +A **probability space** or **probability triple** $(\Omega, \mathcal{F}, P)$ +is the formal mathematical model of a given **stochastic experiment**, +i.e. a process with a random outcome. + +The **sample space** $\Omega$ is the set +of all possible outcomes $\omega$ of the experiment. +Those $\omega$ are selected randomly according to certain criteria. +A subset $A \subset \Omega$ is called an **event**, +and can be regarded as a true statement about all $\omega$ in that $A$. + +The **event space** $\mathcal{F}$ is a set of events $A$ +that are interesting to us, +i.e. we have subjectively chosen $\mathcal{F}$ +based on the problem at hand. 
+Since events $A$ represent statements about outcomes $\omega$, +and we would like to use logic on those statements, +we demand that $\mathcal{F}$ is a [$\sigma$-algebra](/know/concept/sigma-algebra/). + +Finally, the **probability measure** or **probability function** $P$ +is a function that maps events $A$ to probabilities $P(A)$. +Formally, $P : \mathcal{F} \to \mathbb{R}$ is defined to satisfy: + +1. If $A \in \mathcal{F}$, then $P(A) \in [0, 1]$. +2. If $A, B \in \mathcal{F}$ do not overlap, i.e. $A \cap B = \varnothing$, + then $P(A \cup B) = P(A) + P(B)$. +3. The total probability $P(\Omega) = 1$. + +The reason we only assign probability to events $A$ +rather than individual outcomes $\omega$ is that +if $\Omega$ is continuous, all $\omega$ have zero probability, +while intervals $A$ can have nonzero probability. + + +## Random variable + +Once we have a probability space $(\Omega, \mathcal{F}, P)$, +we can define a **random variable** $X$ +as a function that maps outcomes $\omega$ +to another set, usually the real numbers. + +To be a valid real-valued random variable, +a function $X : \Omega \to \mathbb{R}^n$ must satisfy the following condition, +in which case $X$ is said to be **measurable** +from $(\Omega, \mathcal{F})$ to $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$: + +$$\begin{aligned} + \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} + \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n) +\end{aligned}$$ + +In other words, for a given Borel set (see $\sigma$-algebra) $B \in \mathcal{B}(\mathbb{R}^n)$, +the set of all outcomes $\omega \in \Omega$ that satisfy $X(\omega) \in B$ +must form a valid event; this set must be in $\mathcal{F}$. +The point is that we need to be able to assign probabilities +to statements of the form $X \in [a, b]$ for all $a < b$, +which is only possible if that statement corresponds to an event in $\mathcal{F}$, +since $P$'s domain is $\mathcal{F}$. 
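On a finite sample space, the measurability condition can be tested by brute force. The sketch below is an illustration under assumptions not found in the text: a single die with $\Omega = \{1, ..., 6\}$, and an event space that only distinguishes odd from even outcomes. Because the function takes finitely many values, checking the preimage of every subset of its range stands in for checking every Borel set. The names `parity` and `face` are hypothetical.

```python
from itertools import chain, combinations

# Assumed example: one die, with an event space that only knows odd vs. even.
omega = frozenset(range(1, 7))
odd = frozenset({1, 3, 5})
even = frozenset({2, 4, 6})
F = {frozenset(), odd, even, omega}

def preimage(X, B):
    """All outcomes w for which X(w) lies in the set B."""
    return frozenset(w for w in omega if X(w) in B)

def subsets(values):
    """Every subset of a finite collection of values, as tuples."""
    values = list(values)
    return chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )

def is_measurable(X, F):
    """True if the preimage of every subset of X's range is an event in F."""
    values = {X(w) for w in omega}
    return all(preimage(X, set(B)) in F for B in subsets(values))

def parity(w):
    return w % 2   # only depends on odd/even

def face(w):
    return w       # needs to distinguish all six outcomes

parity_is_measurable = is_measurable(parity, F)
face_is_measurable = is_measurable(face, F)
```

Here `parity` is measurable because each of its preimages is one of the four events in `F`, while `face` is not: the statement "the die shows 1" does not correspond to any event in `F`, so no probability could be assigned to it.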
+ +Given such an $X$, and a set $B \subseteq \mathbb{R}^n$, +the **preimage** or **inverse image** $X^{-1}$ is defined as: + +$$\begin{aligned} + X^{-1}(B) + = \{ \omega \in \Omega : X(\omega) \in B \} +\end{aligned}$$ + +As suggested by the notation, +$X^{-1}$ can be regarded as the inverse of $X$: +it maps $B$ to the event for which $X \in B$. +With this, our earlier requirement that $X$ be measurable +can be written as: $X^{-1}(B) \in \mathcal{F}$ for any $B \in \mathcal{B}(\mathbb{R}^n)$. +This is also often stated as *"$X$ is $\mathcal{F}$-measurable"*. + +Now, we are ready to define some familiar concepts from probability theory. +The **cumulative distribution function** $F_X(x)$ is +the probability of the event where the realized value of $X$ +is less than or equal to some given $x \in \mathbb{R}$: + +$$\begin{aligned} + F_X(x) + = P(X \le x) + = P(\{ \omega \in \Omega : X(\omega) \le x \}) + = P(X^{-1}(]\!-\!\infty, x])) +\end{aligned}$$ + +If $F_X(x)$ is differentiable, +then the **probability density function** $f_X(x)$ is defined as: + +$$\begin{aligned} + f_X(x) + = \dv{F_X}{x} +\end{aligned}$$ + + +## Expectation value + +The **expectation value** $\mathbf{E}(X)$ of a random variable $X$ +can be defined in the familiar way, as the sum/integral +of every possible value of $X$ multiplied by the corresponding probability (density). +For continuous and discrete sample spaces $\Omega$, respectively: + +$$\begin{aligned} + \mathbf{E}(X) + = \int_{-\infty}^\infty x \: f_X(x) \dd{x} + \qquad \mathrm{or} \qquad + \mathbf{E}(X) + = \sum_{i = 1}^N x_i \: P(X \!=\! x_i) +\end{aligned}$$ + +However, $f_X(x)$ is not guaranteed to exist, +and the distinction between continuous and discrete is cumbersome. +A more general definition of $\mathbf{E}(X)$ +is the following Lebesgue-Stieltjes integral, +since $F_X(x)$ always exists: + +$$\begin{aligned} + \mathbf{E}(X) + = \int_{-\infty}^\infty x \dd{F_X(x)} +\end{aligned}$$ + +This is valid for any sample space $\Omega$. 
+Or, equivalently, a Lebesgue integral can be used: + +$$\begin{aligned} + \mathbf{E}(X) + = \int_\Omega X(\omega) \dd{P(\omega)} +\end{aligned}$$ + +An expectation value defined in this way has many useful properties, +most notably linearity. + +We can also define the familiar **variance** $\mathbf{V}(X)$ +of a random variable $X$ as follows: + +$$\begin{aligned} + \mathbf{V}(X) + = \mathbf{E}\big( (X - \mathbf{E}(X))^2 \big) + = \mathbf{E}(X^2) - \big(\mathbf{E}(X)\big)^2 +\end{aligned}$$ + + + +## References +1. U.F. Thygesen, + *Lecture notes on diffusions and stochastic differential equations*, + 2021, Polyteknisk Kompendie. diff --git a/content/know/concept/sigma-algebra/index.pdc b/content/know/concept/sigma-algebra/index.pdc new file mode 100644 index 0000000..6e90fcb --- /dev/null +++ b/content/know/concept/sigma-algebra/index.pdc @@ -0,0 +1,89 @@ +--- +title: "σ-algebra" +firstLetter: "S" +publishDate: 2021-10-22 +categories: +- Mathematics + +date: 2021-10-18T10:01:35+02:00 +draft: false +markup: pandoc +--- + +# $\sigma$-algebra + +In set theory, given a set $\Omega$, a $\sigma$**-algebra** +is a family $\mathcal{F}$ of subsets of $\Omega$ +with these properties: + +1. The full set is included: $\Omega \in \mathcal{F}$. +2. For all subsets $A$, if $A \in \mathcal{F}$, + then its complement $\Omega \backslash A \in \mathcal{F}$ too. +3. If two events $A, B \in \mathcal{F}$, + then their union $A \cup B \in \mathcal{F}$ too (and likewise for any countable union of events). + +This forms a Boolean algebra: +property (1) represents TRUE, +(2) is NOT, and (3) is OR, +and that is all we need to define all logic. +For example, FALSE and AND follow from the above points: + +4. The empty set is included: $\varnothing \in \mathcal{F}$. +5. If two events $A, B \in \mathcal{F}$, + then their intersection $A \cap B \in \mathcal{F}$ too. + +For a given $\Omega$, there are typically multiple valid $\mathcal{F}$, +in which case you need to specify your choice. 
+Usually this would be the smallest $\mathcal{F}$ +(i.e. 
smallest family of subsets) +that contains all subsets of special interest +for the topic at hand. +Likewise, a **sub-$\sigma$-algebra** +is a sub-family of a certain $\mathcal{F}$, +which is a valid $\sigma$-algebra in its own right. + +A notable $\sigma$-algebra is the **Borel algebra** $\mathcal{B}(\Omega)$, +which is defined when $\Omega$ is a metric space, +such as the real numbers $\mathbb{R}$. +Using that as an example, the Borel algebra $\mathcal{B}(\mathbb{R})$ +is defined as the family of all open intervals of the real line, +and all the subsets of $\mathbb{R}$ obtained by countable sequences +of unions and intersections of those intervals. +The elements of $\mathcal{B}$ are **Borel sets**. + +Another example of a $\sigma$-algebra is the **information** +obtained by observing a [random variable](/know/concept/random-variable/) $X$. +Let $\sigma(X)$ be the information generated by observing $X$, +i.e. the events whose occurrence can be deduced from the value of $X$: + +$$\begin{aligned} + \sigma(X) + = X^{-1}(\mathcal{B}(\mathbb{R}^n)) + = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \} +\end{aligned}$$ + +In other words, if the realized value of $X$ is +found to be in a certain Borel set $B \in \mathcal{B}(\mathbb{R}^n)$, +then the preimage $X^{-1}(B)$ (i.e. the event yielding this $B$) +is known to have occurred. + +Given a $\sigma$-algebra $\mathcal{H}$, +a random variable $Y$ is said to be *"$\mathcal{H}$-measurable"* +if $\sigma(Y) \subseteq \mathcal{H}$, +meaning that $\mathcal{H}$ contains at least +all information extractable from $Y$. + +Note that $\mathcal{H}$ can be generated by another random variable $X$, +i.e. $\mathcal{H} = \sigma(X)$. +In that case, the **Doob-Dynkin lemma** states +that $Y$ is only $\sigma(X)$-measurable +if $Y$ can always be computed from $X$, +i.e. there exists a function $f$ such that +$Y(\omega) = f(X(\omega))$ for all $\omega \in \Omega$. + + + +## References +1. U.F. 
Thygesen, + *Lecture notes on diffusions and stochastic differential equations*, + 2021, Polyteknisk Kompendie.
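As an illustration of the information $\sigma(X)$ and the Doob-Dynkin lemma described in the $\sigma$-algebra article, here is a minimal sketch for a finite sample space. Everything specific in it is an assumption made for the example: $\Omega = \{1, ..., 6\}$, $X(\omega) = \omega \bmod 3$, and the hypothetical helper names. For a finite $\Omega$, $\sigma(X)$ consists of all unions of the level sets of $X$, so it can simply be enumerated.

```python
from itertools import chain, combinations

# Assumed example: a finite sample space with six outcomes.
omega = frozenset(range(1, 7))

def generated_sigma_algebra(X):
    """sigma(X) on a finite omega: all unions of the level sets of X."""
    level_sets = {}
    for w in omega:
        level_sets.setdefault(X(w), set()).add(w)
    atoms = [frozenset(a) for a in level_sets.values()]
    families = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1)
    )
    return {frozenset().union(*family) for family in families}

def is_sigma_algebra(F):
    """Check the defining properties: full set, complements, unions."""
    return (
        omega in F
        and all(omega - A in F for A in F)
        and all(A | B in F for A in F for B in F)
    )

def is_function_of(Y, X):
    """True if Y(w) = f(X(w)) for some f, i.e. Y is constant on X's level sets."""
    seen = {}
    for w in omega:
        if seen.setdefault(X(w), Y(w)) != Y(w):
            return False
    return True

def X(w):
    return w % 3

def Y(w):
    return 1 if w % 3 == 0 else 0   # computable from X alone

def Z(w):
    return w % 2                    # not computable from X

sigma_X = generated_sigma_algebra(X)

# Measurability with respect to sigma(X) means sigma(.) is contained in sigma(X).
Y_is_measurable = generated_sigma_algebra(Y) <= sigma_X
Z_is_measurable = generated_sigma_algebra(Z) <= sigma_X
```

In this example the two checks agree, as the lemma predicts: `Y` is both $\sigma(X)$-measurable and a function of `X`, whereas `Z` is neither, since observing $\omega \bmod 3$ does not reveal whether $\omega$ is odd or even.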