Binomial distribution

The binomial distribution is a discrete probability distribution describing a Bernoulli process: a set of $N$ independent trials where each has only two possible outcomes, “success” and “failure”, the former with probability $p$ and the latter with $q = 1 - p$. The binomial distribution then gives the probability that $n$ out of the $N$ trials succeed:

$$ \begin{aligned} \boxed{ P_N(n) = \binom{N}{n} \: p^n q^{N - n} } \end{aligned} $$
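As a minimal sketch of how this formula can be evaluated, the following Python snippet computes $P_N(n)$ directly from its definition. The function name `binomial_pmf` and the coin-flip example are illustrative choices, not part of the original text:

```python
from math import comb


def binomial_pmf(n: int, N: int, p: float) -> float:
    """Probability of exactly n successes in N independent trials."""
    if not 0 <= n <= N:
        return 0.0
    q = 1.0 - p
    return comb(N, n) * p**n * q ** (N - n)


# Example: 10 fair coin flips (illustrative values)
N, p = 10, 0.5
probs = [binomial_pmf(n, N, p) for n in range(N + 1)]

print(probs[5])    # probability of exactly 5 successes, equal to 252/1024
print(sum(probs))  # the probabilities should sum to 1, up to rounding
```

Summing over all $n$ gives $(p + q)^N = 1$, so the distribution is normalized.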

The first factor is known as the binomial coefficient, which counts the microstates (i.e. the distinct orderings of successes and failures) that have $n$ successes out of $N$ trials. These happen to be the coefficients in the expansion of $(a + b)^N$, and can be read off Pascal’s triangle. It is defined as follows:

$$ \begin{aligned} \boxed{ \binom{N}{n} = \frac{N!}{n! (N - n)!} } \end{aligned} $$
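The link between the factorial formula and Pascal's triangle can be checked with a short sketch. The helper names `binomial_coefficient` and `pascal_row` are hypothetical and only serve this illustration:

```python
from math import factorial


def binomial_coefficient(N: int, n: int) -> int:
    """Binomial coefficient from the factorial definition."""
    return factorial(N) // (factorial(n) * factorial(N - n))


def pascal_row(N: int) -> list[int]:
    """Row N of Pascal's triangle, built by adding neighbouring entries."""
    row = [1]
    for _ in range(N):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row


N = 5
print(pascal_row(N))                                        # [1, 5, 10, 10, 5, 1]
print([binomial_coefficient(N, n) for n in range(N + 1)])   # same list
```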

The remaining factor $p^n (1 - p)^{N - n}$ is then just the probability of attaining each microstate.

The expected or mean number of successes $\mu$ after $N$ trials is as follows:

$$ \begin{aligned} \boxed{ \mu = N p } \end{aligned} $$

To prove this, the trick is to treat $p$ and $q$ as independent variables and introduce a derivative:

$$ \begin{aligned} \mu &= \sum_{n = 0}^N n P_N(n) = \sum_{n = 0}^N n \binom{N}{n} p^n q^{N - n} = \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{(p^n)}{p} \bigg) q^{N - n} \end{aligned} $$

Then, using the fact that the binomial coefficients appear when writing out $(p + q)^N$:

$$ \begin{aligned} \mu &= p \pdv{}{p}\sum_{n = 0}^N \binom{N}{n} p^n q^{N - n} = p \pdv{}{p}(p + q)^N = N p (p + q)^{N - 1} \end{aligned} $$

Finally, inserting $q = 1 - p$, so that $p + q = 1$, gives the desired result.
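The derivative trick can be checked numerically. The sketch below compares a direct summation of $n P_N(n)$ with a finite-difference estimate of $p \, \partial (p + q)^N / \partial p$, where $q$ is held fixed while $p$ is varied. The function names, the step size `h` and the example values are assumptions made for this illustration:

```python
from math import comb


def mean_by_summation(N: int, p: float) -> float:
    """Mean number of successes, summed directly over the distribution."""
    q = 1.0 - p
    return sum(n * comb(N, n) * p**n * q ** (N - n) for n in range(N + 1))


def mean_by_derivative(N: int, p: float, h: float = 1e-6) -> float:
    """Estimate p * d/dp of (p + q)^N with q held fixed (central difference)."""
    q = 1.0 - p  # fixed number: it is NOT varied together with p

    def f(x: float) -> float:
        return (x + q) ** N

    return p * (f(p + h) - f(p - h)) / (2.0 * h)


N, p = 20, 0.3
print(mean_by_summation(N, p))   # close to N * p = 6
print(mean_by_derivative(N, p))  # also close to 6
```

Holding `q` fixed inside `mean_by_derivative` is the numerical counterpart of treating $p$ and $q$ as independent. If $q = 1 - p$ were substituted first, $(p + q)^N$ would be constant and its derivative would vanish.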

Meanwhile, we find the following variance $\sigma^2$, with $\sigma$ being the standard deviation:

$$ \begin{aligned} \boxed{ \sigma^2 = N p q } \end{aligned} $$

We reuse the previous trick to find $\overline{n^2}$ (the mean of the squared number of successes), setting $p + q = 1$ in the last step:

$$
\begin{aligned}
    \overline{n^2}
    &= \sum_{n = 0}^N n^2 \binom{N}{n} p^n q^{N - n}
    = \sum_{n = 0}^N n \binom{N}{n} \bigg( p \pdv{}{p} \bigg) p^n q^{N - n}
    \\
    &= \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{}{p} \bigg)^2 p^n q^{N - n}
    = \bigg( p \pdv{}{p} \bigg)^2 \sum_{n = 0}^N \binom{N}{n} p^n q^{N - n}
    \\
    &= \bigg( p \pdv{}{p} \bigg)^2 (p + q)^N
    = N p \pdv{}{p} \Big( p (p + q)^{N - 1} \Big)
    \\
    &= N p \big( (p + q)^{N - 1} + (N - 1) p (p + q)^{N - 2} \big)
    \\
    &= N p + N^2 p^2 - N p^2
\end{aligned}
$$

Using this and the earlier expression $\mu = N p$, we find the variance $\sigma^2$:

$$ \begin{aligned} \sigma^2 &= \overline{n^2} - \mu^2 = N p + N^2 p^2 - N p^2 - N^2 p^2 = N p (1 - p) \end{aligned} $$

By recognizing that $1 - p = q$, we arrive at the desired expression.
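Both results can also be illustrated empirically. The following sketch simulates many Bernoulli processes and compares the sample mean and variance with $N p$ and $N p q$. The seed, the sample count and the helper name `simulate_successes` are arbitrary choices for this example:

```python
import random


def simulate_successes(N: int, p: float, rng: random.Random) -> int:
    """Count the successes in N independent trials with success probability p."""
    return sum(1 for _ in range(N) if rng.random() < p)


N, p = 20, 0.3
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
samples = [simulate_successes(N, p, rng) for _ in range(20_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((n - sample_mean) ** 2 for n in samples) / len(samples)

print(sample_mean, N * p)            # sample mean next to N p
print(sample_var, N * p * (1 - p))   # sample variance next to N p q
```

The sample values only agree with the formulas up to statistical fluctuations, which shrink as the number of simulated processes grows.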

As $N \to \infty$, the binomial distribution turns into the continuous normal distribution, a fact that is sometimes called the de Moivre-Laplace theorem:

$$ \begin{aligned} \boxed{ \lim_{N \to \infty} P_N(n) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\bigg(\!-\!\frac{(n - \mu)^2}{2 \sigma^2} \bigg) } \end{aligned} $$
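To get a feeling for how quickly the two distributions approach each other, the sketch below measures the largest pointwise difference between $P_N(n)$ and the Gaussian for increasing $N$. The probability is evaluated via `math.lgamma` to avoid overflow for large $N$. The function names and the choice $p = 0.3$ are assumptions of this example:

```python
from math import exp, lgamma, log, pi, sqrt


def binomial_pmf(n: int, N: int, p: float) -> float:
    """Binomial probability, computed via logarithms to avoid overflow."""
    log_coeff = lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)
    return exp(log_coeff + n * log(p) + (N - n) * log(1.0 - p))


def gaussian(n: float, mu: float, var: float) -> float:
    """Normal density with mean mu and variance var."""
    return exp(-((n - mu) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


def max_abs_error(N: int, p: float) -> float:
    """Largest difference between the two distributions over n = 0, ..., N."""
    mu, var = N * p, N * p * (1.0 - p)
    return max(
        abs(binomial_pmf(n, N, p) - gaussian(n, mu, var)) for n in range(N + 1)
    )


for N in (10, 100, 1000):
    print(N, max_abs_error(N, 0.3))  # the error shrinks as N grows
```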

Treating $n$ as a continuous variable, we take the Taylor expansion of $\ln\!\big(P_N(n)\big)$ around the mean $\mu = N p$:

$$ \begin{aligned} \ln\!\big(P_N(n)\big) &= \sum_{m = 0}^\infty \frac{(n - \mu)^m}{m!} D_m(\mu) \quad \mathrm{where} \quad D_m(n) \equiv \dvn{m}{\ln\!\big(P_N(n)\big)}{n} \end{aligned} $$

For future convenience while calculating the $D_m$, we write out $\ln(P_N)$ now:

$$ \begin{aligned} \ln\!\big(P_N(n)\big) &= \ln(N!) - \ln(n!) - \ln\!\big((N \!-\! n)!\big) + n \ln(p) + (N \!-\! n) \ln(q) \end{aligned} $$

For $D_0(\mu)$ specifically, we need to use a strong version of Stirling’s approximation to arrive at a nonzero result in the end. We know that $N - N p = N q$:

$$
\begin{aligned}
    D_0(\mu)
    &= \ln\!\big(P_N(n)\big) \big|_{n = \mu}
    \\
    &= \ln(N!) - \ln(\mu!) - \ln\!\big((N \!-\! \mu)!\big) + \mu \ln(p) + (N \!-\! \mu) \ln(q)
    \\
    &= \ln(N!) - \ln\!\big((N p)!\big) - \ln\!\big((N q)!\big) + N p \ln(p) + N q \ln(q)
    \\
    &\approx \Big( N \ln(N) - N + \frac{1}{2} \ln(2\pi N) \Big)
    - \Big( N p \ln(N p) - N p + \frac{1}{2} \ln(2\pi N p) \Big)
    \\
    &\qquad - \Big( N q \ln(N q) - N q + \frac{1}{2} \ln(2\pi N q) \Big)
    + N p \ln(p) + N q \ln(q)
    \\
    &= N \ln(N) - N (p \!+\! q) \ln(N) + N (p \!+\! q) - N - \frac{1}{2} \ln(2\pi N p q)
    \\
    &= - \frac{1}{2} \ln(2\pi N p q)
    = \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg)
\end{aligned}
$$
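The quality of this approximation can be checked numerically. In the sketch below, the exact value of $\ln\!\big(P_N(\mu)\big)$ is evaluated with `math.lgamma`, using $\ln(x!) = \ln \Gamma(x + 1)$, and compared with $-\frac{1}{2} \ln(2 \pi N p q)$. The function names and the value $p = 0.3$ are illustrative assumptions:

```python
from math import lgamma, log, pi


def log_pmf_at_mean(N: int, p: float) -> float:
    """ln P_N(mu) with mu = N p, using ln(x!) = lgamma(x + 1)."""
    q = 1.0 - p
    mu = N * p
    return (
        lgamma(N + 1) - lgamma(mu + 1) - lgamma(N - mu + 1)
        + mu * log(p) + (N - mu) * log(q)
    )


def stirling_estimate(N: int, p: float) -> float:
    """The result of the derivation above: -ln(2 pi N p q) / 2."""
    return -0.5 * log(2.0 * pi * N * p * (1.0 - p))


for N in (10, 100, 1000, 10000):
    print(N, log_pmf_at_mean(N, 0.3), stirling_estimate(N, 0.3))
```

The two columns should converge as $N$ grows, since the terms neglected by Stirling's approximation vanish in that limit.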

Next, for $D_m(\mu)$ with $m \ge 1$, we can use a weaker version of Stirling’s approximation, namely $\ln(x!) \approx x \ln(x) - x$:

$$
\begin{aligned}
    \ln(P_N)
    &\approx \ln(N!) - n \big( \ln(n) \!-\! 1 \big) - (N \!-\! n) \big( \ln(N \!-\! n) \!-\! 1 \big) + n \ln(p) + (N \!-\! n) \ln(q)
    \\
    &= \ln(N!) - n \big( \ln(n) - \ln(p) - 1 \big) - (N\!-\!n) \big( \ln(N\!-\!n) - \ln(q) - 1 \big)
\end{aligned}
$$

We expect that $D_1(\mu) = 0$, because $P_N$ is maximized at $\mu$. Indeed it is:

$$
\begin{aligned}
    D_1(n)
    &= \dv{}{n} \ln\!\big(P_N(n)\big)
    \\
    &= - \big( \ln(n) - \ln(p) - 1 \big) + \big( \ln(N\!-\!n) - \ln(q) - 1 \big) - \frac{n}{n} + \frac{N \!-\! n}{N \!-\! n}
    \\
    &= - \ln(n) + \ln(N \!-\! n) + \ln(p) - \ln(q)
    \\
    D_1(\mu)
    &= - \ln(\mu) + \ln(N \!-\! \mu) + \ln(p) - \ln(q)
    \\
    &= - \ln(N p q) + \ln(N p q)
    \\
    &= 0
\end{aligned}
$$

For the same reason, we expect $D_2(\mu)$ to be negative. We find the following expression:

$$
\begin{aligned}
    D_2(n)
    &= \dvn{2}{}{n} \ln\!\big(P_N(n)\big)
    = \dv{}{n} D_1(n)
    = - \frac{1}{n} - \frac{1}{N - n}
    \\
    D_2(\mu)
    &= - \frac{1}{Np} - \frac{1}{Nq}
    = - \frac{p + q}{N p q}
    = - \frac{1}{\sigma^2}
\end{aligned}
$$
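These two derivatives can be cross-checked against the exact distribution by replacing them with finite differences over neighbouring integers. This is only a rough sketch: the unit step size, the function name `log_pmf` and the values of $N$ and $p$ are assumptions of this example, and the agreement is expected to hold only up to corrections that vanish for large $N$:

```python
from math import lgamma, log


def log_pmf(n: int, N: int, p: float) -> float:
    """Exact ln P_N(n), using ln(x!) = lgamma(x + 1)."""
    return (
        lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)
        + n * log(p) + (N - n) * log(1.0 - p)
    )


N, p = 1000, 0.3
mu = round(N * p)                # mean, an integer for these values
variance = N * p * (1.0 - p)

# Central differences with a step of one trial
d1 = (log_pmf(mu + 1, N, p) - log_pmf(mu - 1, N, p)) / 2.0
d2 = log_pmf(mu + 1, N, p) - 2.0 * log_pmf(mu, N, p) + log_pmf(mu - 1, N, p)

print(d1)                    # close to zero
print(d2, -1.0 / variance)   # both close to -1 / sigma^2
```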

The higher-order derivatives vanish much faster as $N \to \infty$, so we discard them:

$$ \begin{aligned} D_3(n) = \frac{1}{n^2} - \frac{1}{(N - n)^2} \qquad \quad D_4(n) = - \frac{2}{n^3} - \frac{2}{(N - n)^3} \qquad \quad \cdots \end{aligned} $$
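A rough scaling estimate makes this plausible. It is sketched here under the assumption that the relevant values of $n$ lie within a few standard deviations of the mean, so that $|n - \mu| \sim \sigma \sim \sqrt{N}$. The expressions above show that $D_m(\mu)$ is at most of order $N^{1 - m}$ for $m \ge 2$, so the $m$th term of the Taylor series scales as:

$$
\begin{aligned}
    \frac{(n - \mu)^m}{m!} D_m(\mu)
    \sim N^{m / 2} \, N^{1 - m}
    = N^{1 - m / 2}
\end{aligned}
$$

This is of order one for $m = 2$, but decays as $N^{-1/2}$ for $m = 3$, as $N^{-1}$ for $m = 4$, and so on.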

Putting everything together, for large $N$, the Taylor series approximately becomes:

$$ \begin{aligned} \ln\!\big(P_N(n)\big) \approx D_0(\mu) + \frac{(n - \mu)^2}{2} D_2(\mu) = \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg) - \frac{(n - \mu)^2}{2 \sigma^2} \end{aligned} $$

Raising $e$ to the power of this expression then yields a normalized Gaussian distribution.

References

  1. H. Gould, J. Tobochnik, Statistical and thermal physics, 2nd edition, Princeton.