Diffstat (limited to 'content/know/concept/conditional-expectation')
-rw-r--r--  content/know/concept/conditional-expectation/index.pdc  69
1 files changed, 36 insertions, 33 deletions
diff --git a/content/know/concept/conditional-expectation/index.pdc b/content/know/concept/conditional-expectation/index.pdc
index 7da7660..5bcc152 100644
--- a/content/know/concept/conditional-expectation/index.pdc
+++ b/content/know/concept/conditional-expectation/index.pdc
@@ -13,17 +13,17 @@ markup: pandoc
# Conditional expectation
-Recall that the expectation value $\mathbf{E}(X)$
+Recall that the expectation value $\mathbf{E}[X]$
of a [random variable](/know/concept/random-variable/) $X$
is a function of the probability space $(\Omega, \mathcal{F}, P)$
on which $X$ is defined, and the definition of $X$ itself.
-The **conditional expectation** $\mathbf{E}(X|A)$
+The **conditional expectation** $\mathbf{E}[X|A]$
is the expectation value of $X$ given that an event $A$ has occurred,
i.e. only the outcomes $\omega \in \Omega$
satisfying $\omega \in A$ should be considered.
-If $A$ is obtained by observing another variable,
-then $\mathbf{E}(X|A)$ is a random variable in its own right.
+If $A$ is obtained by observing a variable,
+then $\mathbf{E}[X|A]$ is a random variable in its own right.
Consider two random variables $X$ and $Y$
on the same probability space $(\Omega, \mathcal{F}, P)$,
@@ -33,7 +33,7 @@ then the conditional expectation of $X$
given the event $Y = y$ is as follows:
$$\begin{aligned}
- \mathbf{E}(X | Y \!=\! y)
+ \mathbf{E}[X | Y \!=\! y]
= \sum_{x} x \: Q(X \!=\! x)
\qquad \quad
Q(X \!=\! x)
@@ -43,12 +43,12 @@ $$\begin{aligned}
Where $Q$ is a renormalized probability function,
which assigns zero to all events incompatible with $Y = y$.
If we allow $\Omega$ to be continuous,
-then from the definition $\mathbf{E}(X)$,
+then from the definition of $\mathbf{E}[X]$,
we know that the following Lebesgue integral can be used,
which we call $f(y)$:
$$\begin{aligned}
- \mathbf{E}(X | Y \!=\! y)
+ \mathbf{E}[X | Y \!=\! y]
= f(y)
= \int_\Omega X(\omega) \dd{Q(\omega)}
\end{aligned}$$
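In the discrete case, the renormalized law $Q$ can be computed directly. A small Python sketch (not part of the notes; the joint law and the name `cond_exp` are made up for illustration):

```python
from fractions import Fraction

# A toy joint distribution P(X = x, Y = y) on a finite sample space.
joint = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def cond_exp(joint, y0):
    """E[X | Y = y0]: renormalize the joint law to Q and average X under it."""
    p_y = sum(p for (_, y), p in joint.items() if y == y0)
    # Q assigns zero to all outcomes incompatible with Y = y0,
    # and rescales the rest by 1 / P(Y = y0).
    return sum(x * p / p_y for (x, y), p in joint.items() if y == y0)

print(cond_exp(joint, 1))  # 3/4
```

Exact rationals (`fractions.Fraction`) keep the renormalization free of floating-point noise.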
@@ -60,7 +60,7 @@ Sticking with the assumption $P(Y \!=\! y) > 0$, notice that:
$$\begin{aligned}
f(y)
= \frac{1}{P(Y \!=\! y)} \int_\Omega X(\omega) \dd{P(\omega \cap Y \!=\! y)}
- = \frac{\mathbf{E}(X \cdot I(Y \!=\! y))}{P(Y \!=\! y)}
+ = \frac{\mathbf{E}[X \cdot I(Y \!=\! y)]}{P(Y \!=\! y)}
\end{aligned}$$
Where $I$ is the indicator function,
@@ -68,33 +68,33 @@ equal to $1$ if its argument is true, and $0$ if not.
Multiplying the definition of $f(y)$ by $P(Y \!=\! y)$ then leads us to:
$$\begin{aligned}
- \mathbf{E}(X \cdot I(Y \!=\! y))
+ \mathbf{E}[X \cdot I(Y \!=\! y)]
&= f(y) \cdot P(Y \!=\! y)
\\
- &= \mathbf{E}(f(Y) \cdot I(Y \!=\! y))
+ &= \mathbf{E}[f(Y) \cdot I(Y \!=\! y)]
\end{aligned}$$
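This identity is easy to verify numerically in the discrete case. An illustrative Python check (the joint law is invented; it is not from the notes):

```python
from fractions import Fraction

# Toy joint distribution P(X = x, Y = y).
joint = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8),
}
y0 = 1
p_y = sum(p for (_, y), p in joint.items() if y == y0)        # P(Y = y0)
f_y = sum(x * p for (x, y), p in joint.items() if y == y0) / p_y  # E[X | Y = y0]

# Left-hand side: E[X * I(Y = y0)], i.e. sum only over outcomes with Y = y0.
lhs = sum(x * p for (x, y), p in joint.items() if y == y0)
# Right-hand side: f(y0) * P(Y = y0) = E[f(Y) * I(Y = y0)].
rhs = f_y * p_y
assert lhs == rhs
```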
Recall that because $Y$ is a random variable,
-$\mathbf{E}(X|Y) = f(Y)$ is too.
+$\mathbf{E}[X|Y] = f(Y)$ is too.
In other words, $f$ maps $Y$ to another random variable,
which, due to the *Doob-Dynkin lemma*
(see [$\sigma$-algebra](/know/concept/sigma-algebra/)),
-must mean that $\mathbf{E}(X|Y)$ is measurable with respect to $\sigma(Y)$.
+must mean that $\mathbf{E}[X|Y]$ is measurable with respect to $\sigma(Y)$.
Intuitively, this makes some sense:
-$\mathbf{E}(X|Y)$ cannot contain more information about events
+$\mathbf{E}[X|Y]$ cannot contain more information about events
than the $Y$ it was calculated from.
This suggests a straightforward generalization of the above:
instead of a specific value $Y = y$,
we can condition on *any* information from $Y$.
If $\mathcal{H} = \sigma(Y)$ is the information generated by $Y$,
-then the conditional expectation $\mathbf{E}(X|\mathcal{H}) = Z$
+then the conditional expectation $\mathbf{E}[X|\mathcal{H}] = Z$
is $\mathcal{H}$-measurable, and given by a $Z$ satisfying:
$$\begin{aligned}
\boxed{
- \mathbf{E}\big(X \cdot I(H)\big)
- = \mathbf{E}\big(Z \cdot I(H)\big)
+ \mathbf{E}\big[X \cdot I(H)\big]
+ = \mathbf{E}\big[Z \cdot I(H)\big]
}
\end{aligned}$$
@@ -102,52 +102,55 @@ For any $H \in \mathcal{H}$. Note that $Z$ is almost surely unique:
*almost* because it could take any value
for an event $A$ with zero probability $P(A) = 0$.
Fortunately, if there exists a continuous $f$
-such that $\mathbf{E}(X | \sigma(Y)) = f(Y)$,
-then $Z = \mathbf{E}(X | \sigma(Y))$ is unique.
+such that $\mathbf{E}[X | \sigma(Y)] = f(Y)$,
+then $Z = \mathbf{E}[X | \sigma(Y)]$ is unique.
+
+
+## Properties
A conditional expectation defined in this way has many useful properties,
most notably linearity:
-$\mathbf{E}(aX \!+\! bY | \mathcal{H}) = a \mathbf{E}(X|\mathcal{H}) + b \mathbf{E}(Y|\mathcal{H})$
+$\mathbf{E}[aX \!+\! bY | \mathcal{H}] = a \mathbf{E}[X|\mathcal{H}] + b \mathbf{E}[Y|\mathcal{H}]$
for any $a, b \in \mathbb{R}$.
The **tower property** states that if $\mathcal{F} \supset \mathcal{G} \supset \mathcal{H}$,
-then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H}) = \mathbf{E}(X|\mathcal{H})$.
+then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbf{E}[X|\mathcal{H}]$.
Intuitively, this works as follows:
suppose person $G$ knows more about $X$ than person $H$,
-then $\mathbf{E}(X | \mathcal{H})$ is $H$'s expectation,
-$\mathbf{E}(X | \mathcal{G})$ is $G$'s "better" expectation,
-and then $\mathbf{E}(\mathbf{E}(X|\mathcal{G})|\mathcal{H})$
+then $\mathbf{E}[X | \mathcal{H}]$ is $H$'s expectation,
+$\mathbf{E}[X | \mathcal{G}]$ is $G$'s "better" expectation,
+and then $\mathbf{E}[\mathbf{E}[X|\mathcal{G}]|\mathcal{H}]$
is $H$'s prediction about what $G$'s expectation will be.
However, $H$ does not have access to $G$'s extra information,
-so $H$'s best prediction is simply $\mathbf{E}(X | \mathcal{H})$.
+so $H$'s best prediction is simply $\mathbf{E}[X | \mathcal{H}]$.
The **law of total expectation** says that
-$\mathbf{E}(\mathbf{E}(X | \mathcal{G})) = \mathbf{E}(X)$,
+$\mathbf{E}[\mathbf{E}[X | \mathcal{G}]] = \mathbf{E}[X]$,
and follows from the above tower property
by choosing $\mathcal{H}$ to contain no information:
$\mathcal{H} = \{ \varnothing, \Omega \}$.
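The law of total expectation can be checked on a finite example, where $\mathbf{E}[\mathbf{E}[X|Y]]$ is a weighted average of conditional expectations over the law of $Y$. An illustrative Python sketch (the joint law is made up):

```python
from fractions import Fraction

# Toy joint distribution P(X = x, Y = y).
joint = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8),
}
ys = {y for (_, y) in joint}

def cond_exp(y0):
    """E[X | Y = y0] for the toy joint law above."""
    p_y = sum(p for (_, y), p in joint.items() if y == y0)
    return sum(x * p for (x, y), p in joint.items() if y == y0) / p_y

# E[E[X|Y]]: average the conditional expectations against the marginal of Y.
total = sum(cond_exp(y0) * sum(p for (_, y), p in joint.items() if y == y0)
            for y0 in ys)
e_x = sum(x * p for (x, _), p in joint.items())
assert total == e_x  # law of total expectation
```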
-Another useful property is that $\mathbf{E}(X | \mathcal{H}) = X$
+Another useful property is that $\mathbf{E}[X | \mathcal{H}] = X$
if $X$ is $\mathcal{H}$-measurable.
In other words, if $\mathcal{H}$ already contains
all the information extractable from $X$,
then we know $X$'s exact value.
Conveniently, this can easily be generalized to products:
-$\mathbf{E}(XY | \mathcal{H}) = X \mathbf{E}(Y | \mathcal{H})$
+$\mathbf{E}[XY | \mathcal{H}] = X \mathbf{E}[Y | \mathcal{H}]$
if $X$ is $\mathcal{H}$-measurable:
since $X$'s value is known, it can simply be factored out.
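For $\mathcal{H} = \sigma(Y)$, this "pulling out known factors" property says $\mathbf{E}[XY | Y \!=\! y] = y \: \mathbf{E}[X | Y \!=\! y]$, since $Y$ is $\sigma(Y)$-measurable. A small Python check (the joint law is invented for illustration):

```python
from fractions import Fraction

# Toy joint distribution P(X = x, Y = y), with X in {0, 2}.
joint = {
    (0, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8), (2, 1): Fraction(3, 8),
}

def cond_exp(g, y0):
    """E[g(X, Y) | Y = y0] for the toy joint law above."""
    p_y = sum(p for (_, y), p in joint.items() if y == y0)
    return sum(g(x, y) * p for (x, y), p in joint.items() if y == y0) / p_y

for y0 in (0, 1):
    # Y is sigma(Y)-measurable, so it factors out of the conditional expectation:
    assert cond_exp(lambda x, y: x * y, y0) == y0 * cond_exp(lambda x, y: x, y0)
```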
Armed with this definition of conditional expectation,
we can define other conditional quantities,
-such as the **conditional variance** $\mathbf{V}(X | \mathcal{H})$:
+such as the **conditional variance** $\mathbf{V}[X | \mathcal{H}]$:
$$\begin{aligned}
- \mathbf{V}(X | \mathcal{H})
- = \mathbf{E}(X^2 | \mathcal{H}) - \big(\mathbf{E}(X | \mathcal{H})\big)^2
+ \mathbf{V}[X | \mathcal{H}]
+ = \mathbf{E}[X^2 | \mathcal{H}] - \big(\mathbf{E}[X | \mathcal{H}]\big)^2
\end{aligned}$$
The **law of total variance** then states that
-$\mathbf{V}(X) = \mathbf{E}(\mathbf{V}(X | \mathcal{H})) + \mathbf{V}(\mathbf{E}(X | \mathcal{H}))$.
+$\mathbf{V}[X] = \mathbf{E}[\mathbf{V}[X | \mathcal{H}]] + \mathbf{V}[\mathbf{E}[X | \mathcal{H}]]$.
Likewise, we can define the **conditional probability** $P$,
**conditional distribution function** $F_{X|\mathcal{H}}$,
@@ -156,7 +159,7 @@ like their non-conditional counterparts:
$$\begin{aligned}
P(A | \mathcal{H})
- = \mathbf{E}(I(A) | \mathcal{H})
+ = \mathbf{E}[I(A) | \mathcal{H}]
\qquad
F_{X|\mathcal{H}}(x)
= P(X \le x | \mathcal{H})
@@ -168,6 +171,6 @@ $$\begin{aligned}
## References
-1. U.F. Thygesen,
+1. U.H. Thygesen,
*Lecture notes on diffusions and stochastic differential equations*,
2021, Polyteknisk Kompendie.