---
title: "Binomial distribution"
sort_title: "Binomial distribution"
date: 2021-02-26
categories:
- Statistics
- Mathematics
layout: "concept"
---

The **binomial distribution** is a discrete probability distribution
describing a **Bernoulli process**: a set of $$N$$ independent trials,
each of which has only two possible outcomes, "success" and "failure",
the former with probability $$p$$ and the latter with probability $$q = 1 - p$$.
The binomial distribution then gives the probability
that $$n$$ out of the $$N$$ trials succeed:

$$\begin{aligned}
    \boxed{
        P_N(n) = \binom{N}{n} \: p^n q^{N - n}
    }
\end{aligned}$$

The first factor is known as the **binomial coefficient**, which counts the
number of microstates (i.e. distinct orderings of successes and failures) that have $$n$$ successes out of $$N$$ trials.
These coefficients also appear in the expansion of the polynomial $$(a + b)^N$$,
and can be read off of Pascal's triangle.
The binomial coefficient is defined as follows:

$$\begin{aligned}
    \boxed{
        \binom{N}{n} = \frac{N!}{n! (N - n)!}
    }
\end{aligned}$$

The remaining factor $$p^n (1 - p)^{N - n}$$ is then just the
probability of attaining each microstate.
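
As a quick sanity check, the distribution is easy to evaluate directly;
here is a minimal Python sketch, with a small helper `binom_pmf` defined just for illustration,
using the standard library's `math.comb` for the binomial coefficient:

```python
from math import comb

def binom_pmf(n: int, N: int, p: float) -> float:
    """Probability of exactly n successes in N independent trials."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

N, p = 10, 0.3
pmf = [binom_pmf(n, N, p) for n in range(N + 1)]

# The probabilities of all possible outcomes n = 0..N must sum to 1.
assert abs(sum(pmf) - 1) < 1e-12
print(f"P_10(3) = {pmf[3]:.4f}")  # prints 0.2668
```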

The expected or mean number of successes $$\mu$$ after $$N$$ trials is as follows:

$$\begin{aligned}
    \boxed{
        \mu = N p
    }
\end{aligned}$$


{% include proof/start.html id="proof-mean" -%}
The trick is to treat $$p$$ and $$q$$ as independent variables and to introduce a derivative:

$$\begin{aligned}
    \mu
    &= \sum_{n = 0}^N n P_N(n)
    = \sum_{n = 0}^N n \binom{N}{n} p^n q^{N - n}
    = \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{(p^n)}{p} \bigg) q^{N - n}
\end{aligned}$$

Then, using the binomial theorem, i.e. the fact that these coefficients appear when expanding $$(p + q)^N$$:

$$\begin{aligned}
    \mu
    &= p \pdv{}{p}\sum_{n = 0}^N \binom{N}{n} p^n q^{N - n}
    = p \pdv{}{p}(p + q)^N
    = N p (p + q)^{N - 1}
\end{aligned}$$

Finally, inserting $$q = 1 - p$$ gives the desired result.
{% include proof/end.html id="proof-mean" %}
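
The result $$\mu = N p$$ can also be verified numerically by evaluating
the sum $$\sum_{n} n P_N(n)$$ directly; a minimal Python sketch,
with the same illustrative `binom_pmf` helper as above:

```python
from math import comb

def binom_pmf(n, N, p):
    return comb(N, n) * p**n * (1 - p)**(N - n)

N, p = 50, 0.2

# The mean as an exact weighted sum over all outcomes.
mu = sum(n * binom_pmf(n, N, p) for n in range(N + 1))
print(mu, N * p)  # both print 10.0 (up to floating-point rounding)
```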


Meanwhile, we find the following variance $$\sigma^2$$,
with $$\sigma$$ being the standard deviation:

$$\begin{aligned}
    \boxed{
        \sigma^2 = N p q
    }
\end{aligned}$$


{% include proof/start.html id="proof-var" -%}
We reuse the previous trick to find $$\overline{n^2}$$
(the mean of the squared number of successes):

$$\begin{aligned}
    \overline{n^2}
    &= \sum_{n = 0}^N n^2 \binom{N}{n} p^n q^{N - n}
    = \sum_{n = 0}^N n \binom{N}{n} \bigg( p \pdv{}{p} \bigg) p^n q^{N - n}
    \\
    &= \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{}{p} \bigg)^2 p^n q^{N - n}
    = \bigg( p \pdv{}{p} \bigg)^2 \sum_{n = 0}^N \binom{N}{n} p^n q^{N - n}
    \\
    &= \bigg( p \pdv{}{p} \bigg)^2 (p + q)^N
    = N p \pdv{}{p}p (p + q)^{N - 1}
    \\
    &= N p \big( (p + q)^{N - 1} + (N - 1) p (p + q)^{N - 2} \big)
    \\
    &= N p + N^2 p^2 - N p^2
\end{aligned}$$

Using this and the earlier expression $$\mu = N p$$, we find the variance $$\sigma^2$$:

$$\begin{aligned}
    \sigma^2
    &= \overline{n^2} - \mu^2
    = N p + N^2 p^2 - N p^2 - N^2 p^2
    = N p (1 - p)
\end{aligned}$$

By inserting $$q = 1 - p$$, we arrive at the desired expression.
{% include proof/end.html id="proof-var" %}
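
Like the mean, this can be checked by direct summation;
a minimal sketch, again redefining the illustrative `binom_pmf` helper so it runs standalone:

```python
from math import comb

def binom_pmf(n, N, p):
    return comb(N, n) * p**n * (1 - p)**(N - n)

N, p = 50, 0.2
q = 1 - p

n2 = sum(n**2 * binom_pmf(n, N, p) for n in range(N + 1))  # mean of n^2
var = n2 - (N * p)**2  # sigma^2 = mean(n^2) - mu^2
print(var, N * p * q)  # both print 8.0 (up to floating-point rounding)
```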


As $$N \to \infty$$, the binomial distribution
turns into the continuous normal distribution,
a fact that is sometimes called the **de Moivre-Laplace theorem**:

$$\begin{aligned}
    \boxed{
        \lim_{N \to \infty} P_N(n) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\bigg(\!-\!\frac{(n - \mu)^2}{2 \sigma^2} \bigg)
    }
\end{aligned}$$


{% include proof/start.html id="proof-normal" -%}
We take the Taylor expansion of $$\ln\!\big(P_N(n)\big)$$
around the mean $$\mu = Np$$:

$$\begin{aligned}
    \ln\!\big(P_N(n)\big)
    &= \sum_{m = 0}^\infty \frac{(n - \mu)^m}{m!} D_m(\mu)
    \quad \mathrm{where} \quad
    D_m(n)
    \equiv \dvn{m}{\ln\!\big(P_N(n)\big)}{n}
\end{aligned}$$

For future convenience while calculating the $$D_m$$, we write out $$\ln(P_N)$$ now:

$$\begin{aligned}
    \ln\!\big(P_N(n)\big)
    &= \ln(N!) - \ln(n!) - \ln\!\big((N \!-\! n)!\big) + n \ln(p) + (N \!-\! n) \ln(q)
\end{aligned}$$

For $$D_0(\mu)$$ specifically,
we need to use a strong version of *Stirling's approximation*,
namely $$\ln(n!) \approx n \ln(n) - n + \frac{1}{2} \ln(2 \pi n)$$,
to arrive at a nonzero result in the end.
We also use the fact that $$N - N p = N q$$:

$$\begin{aligned}
    D_0(\mu)
    &= \ln\!\big(P_N(n)\big) \big|_{n = \mu}
    \\
    &= \ln(N!) - \ln(\mu!) - \ln\!\big((N \!-\! \mu)!\big) + \mu \ln(p) + (N \!-\! \mu) \ln(q)
    \\
    &= \ln(N!) - \ln\!\big((N p)!\big) - \ln\!\big((N q)!\big) + N p \ln(p) + N q \ln(q)
    \\
    &\approx \Big( N \ln(N) - N + \frac{1}{2} \ln(2\pi N) \Big)
    - \Big( N p \ln(N p) - N p + \frac{1}{2} \ln(2\pi N p) \Big) \\
    &\qquad - \Big( N q \ln(N q) - N q + \frac{1}{2} \ln(2\pi N q) \Big)
    + N p \ln(p) + N q \ln(q)
    \\
    &= N \ln(N) - N (p \!+\! q) \ln(N) + N (p \!+\! q) - N - \frac{1}{2} \ln(2\pi N p q)
    \\
    &= - \frac{1}{2} \ln(2\pi N p q)
    = \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg)
\end{aligned}$$
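
This strong form of Stirling's approximation is already very accurate for modest $$n$$,
as a quick Python check shows, where `math.lgamma` provides the exact $$\ln(n!) = \ln \Gamma(n + 1)$$:

```python
from math import lgamma, log, pi

def stirling(n):
    """Strong Stirling approximation: ln(n!) ≈ n ln(n) - n + ln(2πn)/2."""
    return n * log(n) - n + 0.5 * log(2 * pi * n)

for n in (10, 100, 1000):
    exact = lgamma(n + 1)  # exact ln(n!) via the gamma function
    # The absolute error decays like 1/(12 n).
    print(n, exact - stirling(n))
```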

Next, for $$D_m(\mu)$$ with $$m \ge 1$$,
we can use a weaker version of Stirling's approximation:

$$\begin{aligned}
    \ln(P_N)
    &\approx \ln(N!) - n \big( \ln(n) \!-\! 1 \big) - (N \!-\! n) \big( \ln(N \!-\! n) \!-\! 1 \big) + n \ln(p) + (N \!-\! n) \ln(q)
    \\
    &\approx \ln(N!) - n \big( \ln(n) - \ln(p) - 1 \big) - (N\!-\!n) \big( \ln(N\!-\!n) - \ln(q) - 1 \big)
\end{aligned}$$

We expect that $$D_1(\mu) = 0$$, because $$P_N$$ is maximized at $$\mu$$.
Indeed it is:

$$\begin{aligned}
    D_1(n)
    &= \dv{}{n} \ln\!\big(P_N(n)\big)
    \\
    &= - \big( \ln(n) - \ln(p) - 1 \big) + \big( \ln(N\!-\!n) - \ln(q) - 1 \big) - \frac{n}{n} + \frac{N \!-\! n}{N \!-\! n}
    \\
    &= - \ln(n) + \ln(N \!-\! n) + \ln(p) - \ln(q)
    \\
    D_1(\mu)
    &= - \ln(\mu) + \ln(N \!-\! \mu) + \ln(p) - \ln(q)
    \\
    &= - \ln(N p q) + \ln(N p q)
    \\
    &= 0
\end{aligned}$$

For the same reason, we expect $$D_2(\mu)$$ to be negative.
We find the following expression:

$$\begin{aligned}
    D_2(n)
    &= \dvn{2}{}{n} \ln\!\big(P_N(n)\big)
    = \dv{}{n} D_1(n)
    = - \frac{1}{n} - \frac{1}{N - n}
    \\
    D_2(\mu)
    &= - \frac{1}{Np} - \frac{1}{Nq}
    = - \frac{p + q}{N p q}
    = - \frac{1}{\sigma^2}
\end{aligned}$$

The higher-order derivatives vanish much faster as $$N \to \infty$$:
at $$n = \mu$$, $$D_m(\mu)$$ is of order $$N^{1 - m}$$,
while the relevant deviations $$(n - \mu)^m$$ only grow like $$\sigma^m \sim N^{m/2}$$,
so all terms with $$m \ge 3$$ go to zero. We therefore discard them:

$$\begin{aligned}
    D_3(n)
    = \frac{1}{n^2} - \frac{1}{(N - n)^2}
    \qquad \quad
    D_4(n)
    = - \frac{2}{n^3} - \frac{2}{(N - n)^3}
    \qquad \quad
    \cdots
\end{aligned}$$

Putting everything together, for large $$N$$,
the Taylor series approximately becomes:

$$\begin{aligned}
    \ln\!\big(P_N(n)\big)
    \approx D_0(\mu) + \frac{(n - \mu)^2}{2} D_2(\mu)
    = \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg) - \frac{(n - \mu)^2}{2 \sigma^2}
\end{aligned}$$

Exponentiating this expression then yields a normalized Gaussian distribution.
{% include proof/end.html id="proof-normal" %}
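
To see the theorem in action, the following minimal Python sketch
compares the exact binomial probabilities to the limiting Gaussian for a fairly large $$N$$:

```python
from math import comb, exp, pi, sqrt

N, p = 1000, 0.3
q = 1 - p
mu, var = N * p, N * p * q  # mean and variance from the boxed results above

def binom_pmf(n):
    return comb(N, n) * p**n * q**(N - n)

def gauss_pdf(n):
    return exp(-(n - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

# Near the mean the two already agree to within about a percent.
for n in (280, 300, 320):
    print(n, binom_pmf(n), gauss_pdf(n))
```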



## References
1.  H. Gould, J. Tobochnik,
    *Statistical and thermal physics*, 2nd edition,
    Princeton.