--- title: "Binomial distribution" sort_title: "Binomial distribution" date: 2021-02-26 categories: - Statistics - Mathematics layout: "concept" --- The **binomial distribution** is a discrete probability distribution describing a **Bernoulli process**: a set of independent $$N$$ trials where each has only two possible outcomes, "success" and "failure", the former with probability $$p$$ and the latter with $$q = 1 - p$$. The binomial distribution then gives the probability that $$n$$ out of the $$N$$ trials succeed: $$\begin{aligned} \boxed{ P_N(n) = \binom{N}{n} \: p^n q^{N - n} } \end{aligned}$$ The first factor is known as the **binomial coefficient**, which describes the number of microstates (i.e. permutations) that have $$n$$ successes out of $$N$$ trials. These happen to be the coefficients in the polynomial $$(a + b)^N$$, and can be read off of Pascal's triangle. It is defined as follows: $$\begin{aligned} \boxed{ \binom{N}{n} = \frac{N!}{n! (N - n)!} } \end{aligned}$$ The remaining factor $$p^n (1 - p)^{N - n}$$ is then just the probability of attaining each microstate. The expected or mean number of successes $$\mu$$ after $$N$$ trials is as follows: $$\begin{aligned} \boxed{ \mu = N p } \end{aligned}$$ {% include proof/start.html id="proof-mean" -%} The trick is to treat $$p$$ and $$q$$ as independent and introduce a derivative: $$\begin{aligned} \mu &= \sum_{n = 0}^N n P_N(n) = \sum_{n = 0}^N n \binom{N}{n} p^n q^{N - n} = \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{(p^n)}{p} \bigg) q^{N - n} \end{aligned}$$ Then, using the fact that the binomial coefficients appear when writing out $$(p + q)^N$$: $$\begin{aligned} \mu &= p \pdv{}{p}\sum_{n = 0}^N \binom{N}{n} p^n q^{N - n} = p \pdv{}{p}(p + q)^N = N p (p + q)^{N - 1} \end{aligned}$$ Finally, inserting $$q = 1 - p$$ gives the desired result. {% include proof/end.html id="proof-mean" %} Meanwhile, we find the following variance $$\sigma^2$$, with $$\sigma$$ being the standard deviation: $$\begin{aligned} \boxed{ \sigma^2 = N p q } \end{aligned}$$ {% include proof/start.html id="proof-var" -%} We reuse the previous trick to find $$\overline{n^2}$$ (the mean squared number of successes): $$\begin{aligned} \overline{n^2} &= \sum_{n = 0}^N n^2 \binom{N}{n} p^n q^{N - n} = \sum_{n = 0}^N n \binom{N}{n} \bigg( p \pdv{}{p} \bigg) p^n q^{N - n} \\ &= \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{}{p} \bigg)^2 p^n q^{N - n} = \bigg( p \pdv{}{p} \bigg)^2 \sum_{n = 0}^N \binom{N}{n} p^n q^{N - n} \\ &= \bigg( p \pdv{}{p} \bigg)^2 (p + q)^N = N p \pdv{}{p}p (p + q)^{N - 1} \\ &= N p \big( (p + q)^{N - 1} + (N - 1) p (p + q)^{N - 2} \big) \\ &= N p + N^2 p^2 - N p^2 \end{aligned}$$ Using this and the earlier expression $$\mu = N p$$, we find the variance $$\sigma^2$$: $$\begin{aligned} \sigma^2 &= \overline{n^2} - \mu^2 = N p + N^2 p^2 - N p^2 - N^2 p^2 = N p (1 - p) \end{aligned}$$ By inserting $$q = 1 - p$$, we arrive at the desired expression. 
{% include proof/end.html id="proof-var" %} As $$N \to \infty$$, the binomial distribution turns into the continuous normal distribution, a fact that is sometimes called the **de Moivre-Laplace theorem**: $$\begin{aligned} \boxed{ \lim_{N \to \infty} P_N(n) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\bigg(\!-\!\frac{(n - \mu)^2}{2 \sigma^2} \bigg) } \end{aligned}$$ {% include proof/start.html id="proof-normal" -%} We take the Taylor expansion of $$\ln\!\big(P_N(n)\big)$$ around the mean $$\mu = Np$$: $$\begin{aligned} \ln\!\big(P_N(n)\big) &= \sum_{m = 0}^\infty \frac{(n - \mu)^m}{m!} D_m(\mu) \quad \mathrm{where} \quad D_m(n) \equiv \dvn{m}{\ln\!\big(P_N(n)\big)}{n} \end{aligned}$$ For future convenience while calculating the $$D_m$$, we write out $$\ln(P_N)$$ now: $$\begin{aligned} \ln\!\big(P_N(n)\big) &= \ln(N!) - \ln(n!) - \ln\!\big((N \!-\! n)!\big) + n \ln(p) + (N \!-\! n) \ln(q) \end{aligned}$$ For $$D_0(\mu)$$ specifically, we need to use a strong version of *Stirling's approximation* to arrive at a nonzero result in the end. We know that $$N - N p = N q$$: $$\begin{aligned} D_0(\mu) &= \ln\!\big(P_N(n)\big) \big|_{n = \mu} \\ &= \ln(N!) - \ln(\mu!) - \ln\!\big((N \!-\! \mu)!\big) + \mu \ln(p) + (N \!-\! \mu) \ln(q) \\ &= \ln(N!) - \ln\!\big((N p)!\big) - \ln\!\big((N q)!\big) + N p \ln(p) + N q \ln(q) \\ &\approx \Big( N \ln(N) - N + \frac{1}{2} \ln(2\pi N) \Big) - \Big( N p \ln(N p) - N p + \frac{1}{2} \ln(2\pi N p) \Big) \\ &\qquad - \Big( N q \ln(N q) - N q + \frac{1}{2} \ln(2\pi N q) \Big) + N p \ln(p) + N q \ln(q) \\ &= N \ln(N) - N (p \!+\! q) \ln(N) + N (p \!+\! q) - N - \frac{1}{2} \ln(2\pi N p q) \\ &= - \frac{1}{2} \ln(2\pi N p q) = \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg) \end{aligned}$$ Next, for $$D_m(\mu)$$ with $$m \ge 1$$, we can use a weaker version of Stirling's approximation: $$\begin{aligned} \ln(P_N) &\approx \ln(N!) - n \big( \ln(n) \!-\! 1 \big) - (N \!-\! n) \big( \ln(N \!-\! n) \!-\! 1 \big) + n \ln(p) + (N \!-\! n) \ln(q) \\ &\approx \ln(N!) - n \big( \ln(n) - \ln(p) - 1 \big) - (N\!-\!n) \big( \ln(N\!-\!n) - \ln(q) - 1 \big) \end{aligned}$$ We expect that $$D_1(\mu) = 0$$, because $$P_N$$ is maximized at $$\mu$$. Indeed it is: $$\begin{aligned} D_1(n) &= \dv{}{n} \ln\!\big((P_N(n)\big) \\ &= - \big( \ln(n) - \ln(p) - 1 \big) + \big( \ln(N\!-\!n) - \ln(q) - 1 \big) - \frac{n}{n} + \frac{N \!-\! n}{N \!-\! n} \\ &= - \ln(n) + \ln(N \!-\! n) + \ln(p) - \ln(q) \\ D_1(\mu) &= - \ln(\mu) + \ln(N \!-\! \mu) + \ln(p) - \ln(q) \\ &= - \ln(N p q) + \ln(N p q) \\ &= 0 \end{aligned}$$ For the same reason, we expect $$D_2(\mu)$$ to be negative. We find the following expression: $$\begin{aligned} D_2(n) &= \dvn{2}{}{n} \ln\!\big((P_N(n)\big) = \dv{}{n} D_1(n) = - \frac{1}{n} - \frac{1}{N - n} \\ D_2(\mu) &= - \frac{1}{Np} - \frac{1}{Nq} = - \frac{p + q}{N p q} = - \frac{1}{\sigma^2} \end{aligned}$$ The higher-order derivatives vanish much faster as $$N \to \infty$$, so we discard them: $$\begin{aligned} D_3(n) = \frac{1}{n^2} - \frac{1}{(N - n)^2} \qquad \quad D_4(n) = - \frac{2}{n^3} - \frac{2}{(N - n)^3} \qquad \quad \cdots \end{aligned}$$ Putting everything together, for large $$N$$, the Taylor series approximately becomes: $$\begin{aligned} \ln\!\big(P_N(n)\big) \approx D_0(\mu) + \frac{(n - \mu)^2}{2} D_2(\mu) = \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg) - \frac{(n - \mu)^2}{2 \sigma^2} \end{aligned}$$ Raising $$e$$ to this expression then yields a normalized Gaussian distribution. {% include proof/end.html id="proof-normal" %} ## References 1. H. 