3 files changed, 433 insertions, 0 deletions
diff --git a/content/know/concept/conditional-expectation/index.pdc b/content/know/concept/conditional-expectation/index.pdc
new file mode 100644
index 0000000..7da7660
--- /dev/null
+++ b/content/know/concept/conditional-expectation/index.pdc
@@ -0,0 +1,173 @@
+---
+title: "Conditional expectation"
+firstLetter: "C"
+publishDate: 2021-10-23
+categories:
+- Mathematics
+- Statistics
+
+date: 2021-10-22T15:19:23+02:00
+draft: false
+markup: pandoc
+---
+
+# Conditional expectation
+
+Recall that the expectation value $\mathbf{E}(X)$
+of a [random variable](/know/concept/random-variable/) $X$
+is a function of the probability space $(\Omega, \mathcal{F}, P)$
+on which $X$ is defined, and the definition of $X$ itself.
+
+The **conditional expectation** $\mathbf{E}(X|A)$
+is the expectation value of $X$ given that an event $A$ has occurred,
+i.e. only the outcomes $\omega \in \Omega$
+satisfying $\omega \in A$ should be considered.
+If $A$ is obtained by observing another variable,
+then $\mathbf{E}(X|A)$ is a random variable in its own right.
+
+Consider two random variables $X$ and $Y$
+on the same probability space $(\Omega, \mathcal{F}, P)$,
+and suppose that $\Omega$ is discrete.
+If $Y = y$ has been observed,
+then the conditional expectation of $X$
+given the event $Y = y$ is as follows:
+
+$$\begin{aligned}
+    \mathbf{E}(X | Y \!=\! y)
+    = \sum_{x} x \: Q(X \!=\! x)
+    \qquad \quad
+    Q(X \!=\! x)
+    = \frac{P(X \!=\! x \cap Y \!=\! y)}{P(Y \!=\! y)}
+\end{aligned}$$
+
+Where $Q$ is a renormalized probability function,
+which assigns zero to all events incompatible with $Y = y$.
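+
+As a small illustration (a hypothetical example, not taken from the reference),
+suppose a fair six-sided die is rolled, let $X$ be the number shown,
+and let $Y = 1$ if that number is even and $Y = 0$ otherwise.
+Then $P(Y \!=\! 1) = 1/2$, so $Q(X \!=\! x) = (1/6) / (1/2) = 1/3$ for $x \in \{2, 4, 6\}$
+and $Q(X \!=\! x) = 0$ for odd $x$, which should give:
+
+$$\begin{aligned}
+    \mathbf{E}(X | Y \!=\! 1)
+    = \frac{2 + 4 + 6}{3}
+    = 4
+    \qquad \quad
+    \mathbf{E}(X | Y \!=\! 0)
+    = \frac{1 + 3 + 5}{3}
+    = 3
+\end{aligned}$$
+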
+If we allow $\Omega$ to be continuous,
+then from the definition of $\mathbf{E}(X)$,
+we know that the following Lebesgue integral can be used,
+which we call $f(y)$:
+
+$$\begin{aligned}
+    \mathbf{E}(X | Y \!=\! y)
+    = f(y)
+    = \int_\Omega X(\omega) \dd{Q(\omega)}
+\end{aligned}$$
+
+However, this is only valid if $P(Y \!=\! y) > 0$,
+which is a problem for continuous sample spaces $\Omega$.
+Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
+
+$$\begin{aligned}
+    f(y)
+    = \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)}
+    = \frac{\mathbf{E}(X \cdot I(Y \!=\! y))}{P(Y \!=\! y)}
+\end{aligned}$$
+
+Where $I$ is the indicator function,
+equal to $1$ if its argument is true, and $0$ if not.
+Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:
+
+$$\begin{aligned}
+    \mathbf{E}(X \cdot I(Y \!=\! y))
+    &= f(y) \cdot P(Y \!=\! y)
+    \\
+    &= \mathbf{E}(f(Y) \cdot I(Y \!=\! y))
+\end{aligned}$$
+
+Recall that because $Y$ is a random variable,
+$\mathbf{E}(X|Y) = f(Y)$ is too.
+In other words, $f$ maps $Y$ to another random variable,
+which, due to the *Doob-Dynkin lemma*
+(see [$\sigma$-algebra](/know/concept/sigma-algebra/)),
+must mean that $\mathbf{E}(X|Y)$ is measurable with respect to $\sigma(Y)$.
+Intuitively, this makes some sense:
+$\mathbf{E}(X|Y)$ cannot contain more information about events
+than the $Y$ it was calculated from.
+
+This suggests a straightforward generalization of the above:
+instead of a specific value $Y = y$,
+we can condition on *any* information from $Y$.
+If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
+then the conditional expectation $\mathbf{E}(X|\mathcal{H}) = Z$
+is $\mathcal{H}$-measurable, and given by a $Z$ satisfying:
+
+$$\begin{aligned}
+    \boxed{
+        \mathbf{E}\big(X \cdot I(H)\big)
+        = \mathbf{E}\big(Z \cdot I(H)\big)
+    }
+\end{aligned}$$
+
+For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique:
+*almost* because it could take any value
+for an event $A$ with zero probability $P(A) = 0$.
+Fortunately, if there exists a continuous $f$
+such that $\mathbf{E}(X | \sigma(Y)) = f(Y)$,
+then $Z = \mathbf{E}(X | \sigma(Y))$ is unique.
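+
+To see the boxed condition at work, consider a hypothetical fair six-sided die showing $X$,
+and let $Y$ indicate whether $X$ is even,
+so that $\mathcal{H} = \sigma(Y) = \{ \varnothing, \{1, 3, 5\}, \{2, 4, 6\}, \Omega \}$.
+A natural candidate $Z$ is the average of the even faces ($4$) on even outcomes,
+and the average of the odd faces ($3$) on odd outcomes.
+This $Z$ is $\mathcal{H}$-measurable, and for $H = \{2, 4, 6\}$ we can check:
+
+$$\begin{aligned}
+    \mathbf{E}\big(X \cdot I(H)\big)
+    &= \frac{2 + 4 + 6}{6}
+    = 2
+    \\
+    \mathbf{E}\big(Z \cdot I(H)\big)
+    &= 4 \cdot P(H)
+    = 4 \cdot \frac{1}{2}
+    = 2
+\end{aligned}$$
+
+The checks for $H = \{1, 3, 5\}$ (both sides equal $3/2$)
+and $H = \Omega$ (both sides equal $7/2$) work in the same way,
+and $H = \varnothing$ trivially gives zero on both sides.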
+
+A conditional expectation defined in this way has many useful properties,
+most notably linearity:
+$\mathbf{E}(aX \!+\! bY | \mathcal{H}) = a \mathbf{E}(X|\mathcal{H}) + b \mathbf{E}(Y|\mathcal{H})$
+for any $a, b \in \mathbb{R}$.
+
+The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
+then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H}) = \mathbf{E}(X|\mathcal{H})$.
+Intuitively, this works as follows:
+suppose person $G$ knows more about $X$ than person $H$,
+then $\mathbf{E}(X | \mathcal{H})$ is $H$'s expectation,
+$\mathbf{E}(X | \mathcal{G})$ is $G$'s "better" expectation,
+and then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H})$
+is $H$'s prediction about what $G$'s expectation will be.
+However, $H$ does not have access to $G$'s extra information,
+so $H$'s best prediction is simply $\mathbf{E}(X | \mathcal{H})$.
+
+The **law of total expectation** says that
+$\mathbf{E}(\mathbf{E}(X | \mathcal{G})) = \mathbf{E}(X)$,
+and follows from the above tower property
+by choosing $\mathcal{H}$ to contain no information:
+$\mathcal{H} = \{ \varnothing, \Omega \}$.
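+
+One way to spell out this step, as a sketch:
+a variable that is measurable with respect to the trivial $\mathcal{H} = \{ \varnothing, \Omega \}$
+must be a constant $c$, since its preimages can only be $\varnothing$ or $\Omega$.
+For an arbitrary random variable $W$ (a symbol introduced here for illustration),
+the defining condition with $H = \Omega$ then fixes that constant:
+
+$$\begin{aligned}
+    \mathbf{E}\big(W \cdot I(\Omega)\big)
+    = \mathbf{E}\big(c \cdot I(\Omega)\big)
+    \quad \implies \quad
+    \mathbf{E}(W | \mathcal{H})
+    = c
+    = \mathbf{E}(W)
+\end{aligned}$$
+
+Using this with $W = \mathbf{E}(X | \mathcal{G})$ on the left-hand side of the tower property
+and with $W = X$ on its right-hand side yields
+$\mathbf{E}(\mathbf{E}(X | \mathcal{G})) = \mathbf{E}(X)$.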
+
+Another useful property is that $\mathbf{E}(X | \mathcal{H}) = X$
+if $X$ is $\mathcal{H}$-measurable.
+In other words, if $\mathcal{H}$ already contains
+all the information extractable from $X$,
+then we know $X$'s exact value.
+Conveniently, this can easily be generalized to products:
+$\mathbf{E}(XY | \mathcal{H}) = X \mathbf{E}(Y | \mathcal{H})$
+if $X$ is $\mathcal{H}$-measurable:
+since $X$'s value is known, it can simply be factored out.
+
+Armed with this definition of conditional expectation,
+we can define other conditional quantities,
+such as the **conditional variance** $\mathbf{V}(X | \mathcal{H})$:
+
+$$\begin{aligned}
+    \mathbf{V}(X | \mathcal{H})
+    = \mathbf{E}(X^2 | \mathcal{H}) - \big(\mathbf{E}(X | \mathcal{H})\big)^2
+\end{aligned}$$
+
+The **law of total variance** then states that
+$\mathbf{V}(X) = \mathbf{E}(\mathbf{V}(X | \mathcal{H})) + \mathbf{V}(\mathbf{E}(X | \mathcal{H}))$.
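+
+As a sanity check on a hypothetical example,
+let $X$ be the number shown by a fair six-sided die,
+and let $\mathcal{H}$ be generated by the parity of $X$.
+Then $\mathbf{E}(X | \mathcal{H})$ equals $4$ on even outcomes and $3$ on odd outcomes,
+each with probability $1/2$,
+while $\mathbf{V}(X | \mathcal{H}) = 8/3$ in both cases
+(e.g. $56/3 - 4^2$ for the even faces).
+The two sides of the law should then agree:
+
+$$\begin{aligned}
+    \mathbf{E}(\mathbf{V}(X | \mathcal{H}))
+    &= \frac{8}{3}
+    \\
+    \mathbf{V}(\mathbf{E}(X | \mathcal{H}))
+    &= \frac{4^2 + 3^2}{2} - \Big( \frac{7}{2} \Big)^2
+    = \frac{1}{4}
+    \\
+    \mathbf{V}(X)
+    &= \frac{91}{6} - \Big( \frac{7}{2} \Big)^2
+    = \frac{35}{12}
+    = \frac{8}{3} + \frac{1}{4}
+\end{aligned}$$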
+
+Likewise, we can define the **conditional probability** $P$,
+**conditional distribution function** $F_{X|\mathcal{H}}$,
+and **conditional density function** $f_{X|\mathcal{H}}$
+like their non-conditional counterparts:
+
+$$\begin{aligned}
+    P(A | \mathcal{H})
+    = \mathbf{E}(I(A) | \mathcal{H})
+    \qquad
+    F_{X|\mathcal{H}}(x)
+    = P(X \le x | \mathcal{H})
+    \qquad
+    f_{X|\mathcal{H}}(x)
+    = \dv{F_{X|\mathcal{H}}}{x}
+\end{aligned}$$
+
+
+
+## References
+1.  U.F. Thygesen,
+    *Lecture notes on diffusions and stochastic differential equations*,
+    2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/random-variable/index.pdc b/content/know/concept/random-variable/index.pdc
new file mode 100644
index 0000000..fe50b60
--- /dev/null
+++ b/content/know/concept/random-variable/index.pdc
@@ -0,0 +1,171 @@
+---
+title: "Random variable"
+firstLetter: "R"
+publishDate: 2021-10-22
+categories:
+- Mathematics
+- Statistics
+
+date: 2021-10-21T20:40:42+02:00
+draft: false
+markup: pandoc
+---
+
+# Random variable
+
+**Random variables** are the bread and butter
+of probability theory and statistics,
+and are simply variables whose value depends
+on the outcome of a random experiment.
+Here, we will describe the formal mathematical definition
+of a random variable.
+
+
+## Probability space
+
+A **probability space** or **probability triple** $(\Omega, \mathcal{F}, P)$
+is the formal mathematical model of a given **stochastic experiment**,
+i.e. a process with a random outcome.
+
+The **sample space** $\Omega$ is the set
+of all possible outcomes $\omega$ of the experiment.
+Those $\omega$ are selected randomly according to certain criteria.
+A subset $A \subset \Omega$ is called an **event**,
+and can be regarded as a true statement about all $\omega$ in that $A$.
+
+The **event space** $\mathcal{F}$ is a set of events $A$
+that are interesting to us,
+i.e. we have subjectively chosen $\mathcal{F}$
+based on the problem at hand.
+Since events $A$ represent statements about outcomes $\omega$,
+and we would like to use logic on those statements,
+we demand that $\mathcal{F}$ is a [$\sigma$-algebra](/know/concept/sigma-algebra/).
+
+Finally, the **probability measure** or **probability function** $P$
+is a function that maps events $A$ to probabilities $P(A)$.
+Formally, $P : \mathcal{F} \to \mathbb{R}$ is defined to satisfy:
+
+1.  If $A \in \mathcal{F}$, then $P(A) \in [0, 1]$.
+2.  If $A, B \in \mathcal{F}$ do not overlap, i.e. $A \cap B = \varnothing$,
+    then $P(A \cup B) = P(A) + P(B)$,
+    and likewise for any countable collection of pairwise non-overlapping events.
+3.  The total probability $P(\Omega) = 1$.
+
+The reason we only assign probability to events $A$
+rather than individual outcomes $\omega$ is that
+if $\Omega$ is continuous, all $\omega$ have zero probability,
+while intervals $A$ can have nonzero probability.
+
+
+## Random variable
+
+Once we have a probability space $(\Omega, \mathcal{F}, P)$,
+we can define a **random variable** $X$
+as a function that maps outcomes $\omega$
+to another set, usually the real numbers.
+
+To be a valid real-valued random variable,
+a function $X : \Omega \to \mathbb{R}^n$ must satisfy the following condition,
+in which case $X$ is said to be **measurable**
+from $(\Omega, \mathcal{F})$ to $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$:
+
+$$\begin{aligned}
+    \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F}
+    \quad \mathrm{for\:any\:} B \in \mathcal{B}(\mathbb{R}^n)
+\end{aligned}$$
+
+In other words, for a given Borel set (see $\sigma$-algebra) $B \in \mathcal{B}(\mathbb{R}^n)$,
+the set of all outcomes $\omega \in \Omega$ that satisfy $X(\omega) \in B$
+must form a valid event; this set must be in $\mathcal{F}$.
+The point is that we need to be able to assign probabilities
+to statements of the form $X \in [a, b]$ for all $a < b$,
+which is only possible if that statement corresponds to an event in $\mathcal{F}$,
+since $P$'s domain is $\mathcal{F}$.
+
+Given such an $X$, and a set $B \subseteq \mathbb{R}^n$,
+the **preimage** or **inverse image** $X^{-1}$ is defined as:
+
+$$\begin{aligned}
+    X^{-1}(B)
+    = \{ \omega \in \Omega : X(\omega) \in B \}
+\end{aligned}$$
+
+As suggested by the notation,
+$X^{-1}$ can be regarded as the inverse of $X$:
+it maps $B$ to the event for which $X \in B$.
+With this, our earlier requirement that $X$ be measurable
+can be written as: $X^{-1}(B) \in \mathcal{F}$ for any $B \in \mathcal{B}(\mathbb{R}^n)$.
+This is also often stated as *"$X$ is $\mathcal{F}$-measurable"*.
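+
+For a concrete feel, here is a hypothetical example (the names are assumptions made for illustration):
+flip a coin twice, so $\Omega = \{ \mathrm{HH}, \mathrm{HT}, \mathrm{TH}, \mathrm{TT} \}$,
+let $\mathcal{F}$ contain all subsets of $\Omega$,
+and let $X$ count the number of heads.
+Some preimages are then:
+
+$$\begin{aligned}
+    X^{-1}(\{ 1 \})
+    = \{ \mathrm{HT}, \mathrm{TH} \}
+    \qquad
+    X^{-1}(]\!-\!\infty, 0])
+    = \{ \mathrm{TT} \}
+    \qquad
+    X^{-1}([3, 4])
+    = \varnothing
+\end{aligned}$$
+
+Each of these is in $\mathcal{F}$, as $\mathcal{F}$-measurability requires.
+Note that $X^{-1}$ maps sets to sets,
+so it exists even though $X$ is not invertible as a function:
+both $\mathrm{HT}$ and $\mathrm{TH}$ are mapped to $1$.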
+
+Now, we are ready to define some familiar concepts from probability theory.
+The **cumulative distribution function** $F_X(x)$ is
+the probability of the event where the realized value of $X$
+is less than or equal to some given $x \in \mathbb{R}$:
+
+$$\begin{aligned}
+    F_X(x)
+    = P(X \le x)
+    = P(\{ \omega \in \Omega : X(\omega) \le x \})
+    = P(X^{-1}(]\!-\!\infty, x]))
+\end{aligned}$$
+
+If $F_X(x)$ is differentiable,
+then the **probability density function** $f_X(x)$ is defined as:
+
+$$\begin{aligned}
+    f_X(x)
+    = \dv{F_X}{x}
+\end{aligned}$$
+
+
+## Expectation value
+
+The **expectation value** $\mathbf{E}(X)$ of a random variable $X$
+can be defined in the familiar way, as the sum/integral
+of every possible value of $X$ multiplied by the corresponding probability (density).
+For continuous and discrete sample spaces $\Omega$, respectively:
+
+$$\begin{aligned}
+    \mathbf{E}(X)
+    = \int_{-\infty}^\infty x \: f_X(x) \dd{x}
+    \qquad \mathrm{or} \qquad
+    \mathbf{E}(X)
+    = \sum_{i = 1}^N x_i \: P(X \!=\! x_i)
+\end{aligned}$$
+
+However, $f_X(x)$ is not guaranteed to exist,
+and the distinction between continuous and discrete is cumbersome.
+A more general definition of $\mathbf{E}(X)$
+is the following Lebesgue-Stieltjes integral,
+since $F_X(x)$ always exists:
+
+$$\begin{aligned}
+    \mathbf{E}(X)
+    = \int_{-\infty}^\infty x \dd{F_X(x)}
+\end{aligned}$$
+
+This is valid for any sample space $\Omega$.
+Or, equivalently, a Lebesgue integral can be used:
+
+$$\begin{aligned}
+    \mathbf{E}(X)
+    = \int_\Omega X(\omega) \dd{P(\omega)}
+\end{aligned}$$
+
+An expectation value defined in this way has many useful properties,
+most notably linearity.
+
+We can also define the familiar **variance** $\mathbf{V}(X)$
+of a random variable $X$ as follows:
+
+$$\begin{aligned}
+    \mathbf{V}(X)
+    = \mathbf{E}\big( (X - \mathbf{E}(X))^2 \big)
+    = \mathbf{E}(X^2) - \big(\mathbf{E}(X)\big)^2
+\end{aligned}$$
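+
+The second equality follows from linearity.
+As a short sketch, writing $\mu = \mathbf{E}(X)$ (a shorthand introduced here),
+and using that $\mu$ is a constant:
+
+$$\begin{aligned}
+    \mathbf{E}\big( (X - \mu)^2 \big)
+    &= \mathbf{E}(X^2 - 2 \mu X + \mu^2)
+    \\
+    &= \mathbf{E}(X^2) - 2 \mu \mathbf{E}(X) + \mu^2
+    \\
+    &= \mathbf{E}(X^2) - \mu^2
+\end{aligned}$$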
+
+
+
+## References
+1.  U.F. Thygesen,
+    *Lecture notes on diffusions and stochastic differential equations*,
+    2021, Polyteknisk Kompendie.
diff --git a/content/know/concept/sigma-algebra/index.pdc b/content/know/concept/sigma-algebra/index.pdc
new file mode 100644
index 0000000..6e90fcb
--- /dev/null
+++ b/content/know/concept/sigma-algebra/index.pdc
@@ -0,0 +1,89 @@
+---
+title: "σ-algebra"
+firstLetter: "S"
+publishDate: 2021-10-22
+categories:
+- Mathematics
+
+date: 2021-10-18T10:01:35+02:00
+draft: false
+markup: pandoc
+---
+
+# $\sigma$-algebra
+
+In set theory, given a set $\Omega$, a $\sigma$**-algebra**
+is a family $\mathcal{F}$ of subsets of $\Omega$
+with these properties:
+
+1.  The full set is included: $\Omega \in \mathcal{F}$.
+2.  For all subsets $A$, if $A \in \mathcal{F}$,
+    then its complement $\Omega \backslash A \in \mathcal{F}$ too.
+3.  If two events $A, B \in \mathcal{F}$,
+    then their union $A \cup B \in \mathcal{F}$ too,
+    and likewise for the union of any countable collection of events in $\mathcal{F}$.
+
+This forms a Boolean algebra:
+property (1) represents TRUE,
+(2) is NOT, and (3) is OR,
+and that is all we need to define all logic.
+For example, FALSE and AND follow from the above points:
+
+4.  The empty set is included: $\varnothing \in \mathcal{F}$.
+5.  If two events $A, B \in \mathcal{F}$,
+    then their intersection $A \cap B \in \mathcal{F}$ too.
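+
+Both of these can be derived in one line each, as sketched here:
+(4) is the complement of the full set,
+and (5) follows from De Morgan's law,
+using only complements and a union:
+
+$$\begin{aligned}
+    \varnothing
+    = \Omega \backslash \Omega
+    \qquad \quad
+    A \cap B
+    = \Omega \backslash \big( (\Omega \backslash A) \cup (\Omega \backslash B) \big)
+\end{aligned}$$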
+
+For a given $\Omega$, there are typically multiple valid $\mathcal{F}$,
+in which case you need to specify your choice.
+Usually this would be the smallest $\mathcal{F}$
+(i.e. smallest family of subsets)
+that contains all subsets of special interest
+for the topic at hand.
+Likewise, a **sub-$\sigma$-algebra**
+is a sub-family of a certain $\mathcal{F}$,
+which is a valid $\sigma$-algebra in its own right.
+
+A notable $\sigma$-algebra is the **Borel algebra** $\mathcal{B}(\Omega)$,
+which is defined when $\Omega$ is a metric space,
+such as the real numbers $\mathbb{R}$.
+Using that as an example, the Borel algebra $\mathcal{B}(\mathbb{R})$
+is defined as the family of all open intervals of the real line,
+and all the subsets of $\mathbb{R}$ obtained by countable sequences
+of unions and intersections of those intervals.
+The elements of $\mathcal{B}$ are **Borel sets**.
+
+Another example of a $\sigma$-algebra is the **information**
+obtained by observing a [random variable](/know/concept/random-variable/) $X$.
+Let $\sigma(X)$ be the information generated by observing $X$,
+i.e. the events whose occurrence can be deduced from the value of $X$:
+
+$$\begin{aligned}
+    \sigma(X)
+    = X^{-1}(\mathcal{B}(\mathbb{R}^n))
+    = \{ A \in \mathcal{F} : A = X^{-1}(B) \mathrm{\:for\:some\:} B \in \mathcal{B}(\mathbb{R}^n) \}
+\end{aligned}$$
+
+In other words, if the realized value of $X$ is
+found to be in a certain Borel set $B \in \mathcal{B}(\mathbb{R}^n)$,
+then the preimage $X^{-1}(B)$ (i.e. the event yielding this $B$)
+is known to have occurred.
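+
+As a hypothetical illustration (not from the reference),
+let $\Omega = \{ 1, 2, 3, 4, 5, 6 \}$ be the outcomes of a die roll,
+let $\mathcal{F}$ contain all subsets of $\Omega$,
+and let $X(\omega) = 1$ if $\omega$ is even and $X(\omega) = 0$ if $\omega$ is odd.
+Every Borel set contains either both, one, or neither of the values $0$ and $1$,
+so there are only four distinct preimages:
+
+$$\begin{aligned}
+    \sigma(X)
+    = \big\{ \varnothing, \{ 1, 3, 5 \}, \{ 2, 4, 6 \}, \Omega \big\}
+\end{aligned}$$
+
+An event such as $\{ 1, 2 \}$ is in $\mathcal{F}$ but not in $\sigma(X)$:
+knowing only the parity of the roll
+does not tell us whether $\{ 1, 2 \}$ has occurred.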
+
+Given a $\sigma$-algebra $\mathcal{H}$,
+a random variable $Y$ is said to be *"$\mathcal{H}$-measurable"*
+if $\sigma(Y) \subseteq \mathcal{H}$,
+meaning that $\mathcal{H}$ contains at least
+all information extractable from $Y$.
+
+Note that $\mathcal{H}$ can be generated by another random variable $X$,
+i.e. $\mathcal{H} = \sigma(X)$.
+In that case, the **Doob-Dynkin lemma** states
+that $Y$ is $\sigma(X)$-measurable
+if and only if $Y$ can always be computed from $X$,
+i.e. there exists a measurable function $f$ such that
+$Y(\omega) = f(X(\omega))$ for all $\omega \in \Omega$.
+
+
+
+## References
+1.  U.F. Thygesen,
+    *Lecture notes on diffusions and stochastic differential equations*,
+    2021, Polyteknisk Kompendie.