Binomial distribution
The binomial distribution is a discrete probability distribution
describing a Bernoulli process: a set of $N$ independent trials, where
each has only two possible outcomes, “success” and “failure”,
the former with probability $p$ and the latter with $q = 1 - p$.
The binomial distribution then gives the probability
that $n$ out of the $N$ trials succeed:
$$\begin{aligned}
\boxed{
P_N(n) = \binom{N}{n} \: p^n q^{N - n}
}
\end{aligned}$$
The first factor is known as the binomial coefficient, which counts the
number of microstates (i.e. permutations) that have $n$ successes out of $N$ trials.
These happen to be the coefficients in the expansion of $(a + b)^N$,
and can be read off of Pascal’s triangle.
It is defined as follows:
$$\begin{aligned}
\boxed{
\binom{N}{n} = \frac{N!}{n! (N - n)!}
}
\end{aligned}$$
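To illustrate both claims, the sketch below (plain Python, standard library only) builds row $N = 5$ of Pascal’s triangle by repeated addition, which is also how the coefficients of $(a + b)^N$ arise, and checks it against the factorial formula:

```python
from math import factorial

N = 5

# Definition: C(N, n) = N! / (n! (N - n)!)
coeffs = [factorial(N) // (factorial(n) * factorial(N - n)) for n in range(N + 1)]
print(coeffs)         # [1, 5, 10, 10, 5, 1]

# Row N of Pascal's triangle: each entry is the sum of the two entries above it.
row = [1]
for _ in range(N):
    row = [a + b for a, b in zip([0] + row, row + [0])]
print(row == coeffs)  # True
```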
The remaining factor $p^n (1 - p)^{N - n}$ is then just the
probability of attaining each microstate.
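Combining the two factors, the boxed formula can be evaluated directly. Below is a minimal Python sketch (the function name is our own) using the standard library’s `math.comb`, which also confirms that the probabilities sum to one:

```python
from math import comb

def binomial_pmf(n: int, N: int, p: float) -> float:
    """P_N(n) = C(N, n) p^n q^(N - n): probability of n successes in N trials."""
    q = 1.0 - p
    return comb(N, n) * p**n * q**(N - n)

# Example: 10 flips of a fair coin.
N, p = 10, 0.5
pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]
print(pmf[5])     # P_10(5) = 252/1024 ≈ 0.246
print(sum(pmf))   # normalization: 1.0 (up to floating-point error)
```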
The expected or mean number of successes $\mu$ after $N$ trials is as follows:
$$\begin{aligned}
\boxed{
\mu = N p
}
\end{aligned}$$
Proof.
The trick is to treat $p$ and $q$ as independent variables and introduce a derivative:
$$\begin{aligned}
\mu
&= \sum_{n = 0}^N n P_N(n)
= \sum_{n = 0}^N n \binom{N}{n} p^n q^{N - n}
= \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{(p^n)}{p} \bigg) q^{N - n}
\end{aligned}$$
Then, using the fact that the binomial coefficients appear when writing out $(p + q)^N$:
$$\begin{aligned}
\mu
&= p \pdv{}{p} \sum_{n = 0}^N \binom{N}{n} p^n q^{N - n}
= p \pdv{}{p} (p + q)^N
= N p (p + q)^{N - 1}
\end{aligned}$$
Finally, inserting $q = 1 - p$ gives the desired result.
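The closed form is easy to check numerically by brute-force summation over the distribution; a small sketch with arbitrarily chosen parameters $N = 20$, $p = 0.3$:

```python
from math import comb

N, p = 20, 0.3
q = 1.0 - p

# Mean by direct summation of n * P_N(n):
mu = sum(n * comb(N, n) * p**n * q**(N - n) for n in range(N + 1))
print(mu)   # ≈ 6.0 = N p
```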
Meanwhile, we find the following variance $\sigma^2$,
with $\sigma$ being the standard deviation:
$$\begin{aligned}
\boxed{
\sigma^2 = N p q
}
\end{aligned}$$
Proof.
We reuse the previous trick to find $\overline{n^2}$
(the mean squared number of successes):
$$\begin{aligned}
\overline{n^2}
&= \sum_{n = 0}^N n^2 \binom{N}{n} p^n q^{N - n}
= \sum_{n = 0}^N n \binom{N}{n} \bigg( p \pdv{}{p} \bigg) p^n q^{N - n}
\\
&= \sum_{n = 0}^N \binom{N}{n} \bigg( p \pdv{}{p} \bigg)^2 p^n q^{N - n}
= \bigg( p \pdv{}{p} \bigg)^2 \sum_{n = 0}^N \binom{N}{n} p^n q^{N - n}
\\
&= \bigg( p \pdv{}{p} \bigg)^2 (p + q)^N
= N p \pdv{}{p} p (p + q)^{N - 1}
\\
&= N p \big( (p + q)^{N - 1} + (N - 1) p (p + q)^{N - 2} \big)
\\
&= N p + N^2 p^2 - N p^2
\end{aligned}$$
Using this result (where $p + q = 1$ was inserted in the last step)
and the earlier expression $\mu = N p$, we find the variance $\sigma^2$:
$$\begin{aligned}
\sigma^2
&= \overline{n^2} - \mu^2
= N p + N^2 p^2 - N p^2 - N^2 p^2
= N p (1 - p)
\end{aligned}$$
By inserting $q = 1 - p$, we arrive at the desired expression.
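As before, a brute-force numerical check confirms the boxed result; the parameters $N = 20$, $p = 0.3$ are again arbitrary:

```python
from math import comb

N, p = 20, 0.3
q = 1.0 - p
pmf = [comb(N, n) * p**n * q**(N - n) for n in range(N + 1)]

mu = sum(n * P for n, P in enumerate(pmf))
mean_sq = sum(n**2 * P for n, P in enumerate(pmf))
print(mean_sq - mu**2)   # ≈ 4.2 = N p q
```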
As $N \to \infty$, the binomial distribution
turns into the continuous normal distribution,
a fact that is sometimes called the de Moivre-Laplace theorem:
$$\begin{aligned}
\boxed{
\lim_{N \to \infty} P_N(n) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\bigg(\!-\!\frac{(n - \mu)^2}{2 \sigma^2} \bigg)
}
\end{aligned}$$
Proof.
We take the Taylor expansion of $\ln\!\big(P_N(n)\big)$
around the mean $\mu = N p$:
$$\begin{aligned}
\ln\!\big(P_N(n)\big)
&= \sum_{m = 0}^\infty \frac{(n - \mu)^m}{m!} D_m(\mu)
\quad \mathrm{where} \quad
D_m(n)
\equiv \dvn{m}{\ln\!\big(P_N(n)\big)}{n}
\end{aligned}$$
For future convenience while calculating the $D_m$, we write out $\ln(P_N)$ now:
$$\begin{aligned}
\ln\!\big(P_N(n)\big)
&= \ln(N!) - \ln(n!) - \ln\!\big((N \!-\! n)!\big) + n \ln(p) + (N \!-\! n) \ln(q)
\end{aligned}$$
For $D_0(\mu)$ specifically,
we need to use a strong version of Stirling’s approximation
to arrive at a nonzero result in the end.
We know that $N - N p = N q$:
$$\begin{aligned}
D_0(\mu)
&= \ln\!\big(P_N(n)\big) \big|_{n = \mu}
\\
&= \ln(N!) - \ln(\mu!) - \ln\!\big((N \!-\! \mu)!\big) + \mu \ln(p) + (N \!-\! \mu) \ln(q)
\\
&= \ln(N!) - \ln\!\big((N p)!\big) - \ln\!\big((N q)!\big) + N p \ln(p) + N q \ln(q)
\\
&\approx \Big( N \ln(N) - N + \frac{1}{2} \ln(2\pi N) \Big)
- \Big( N p \ln(N p) - N p + \frac{1}{2} \ln(2\pi N p) \Big) \\
&\qquad - \Big( N q \ln(N q) - N q + \frac{1}{2} \ln(2\pi N q) \Big)
+ N p \ln(p) + N q \ln(q)
\\
&= N \ln(N) - N (p \!+\! q) \ln(N) + N (p \!+\! q) - N - \frac{1}{2} \ln(2\pi N p q)
\\
&= - \frac{1}{2} \ln(2\pi N p q)
= \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg)
\end{aligned}$$
Next, for $D_m(\mu)$ with $m \ge 1$,
we can use a weaker version of Stirling’s approximation:
$$\begin{aligned}
\ln(P_N)
&\approx \ln(N!) - n \big( \ln(n) \!-\! 1 \big) - (N \!-\! n) \big( \ln(N \!-\! n) \!-\! 1 \big) + n \ln(p) + (N \!-\! n) \ln(q)
\\
&\approx \ln(N!) - n \big( \ln(n) - \ln(p) - 1 \big) - (N\!-\!n) \big( \ln(N\!-\!n) - \ln(q) - 1 \big)
\end{aligned}$$
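The distinction matters: the sketch below compares both versions against the exact $\ln(N!)$ (via the standard library’s `math.lgamma`). The strong form’s error vanishes as $N$ grows, while the weak form is off by roughly $\frac{1}{2} \ln(2\pi N)$, exactly the kind of term that $D_0$ is made of:

```python
from math import lgamma, log, pi

def ln_factorial(N):
    return lgamma(N + 1)  # exact ln(N!)

def stirling_strong(N):
    return N * log(N) - N + 0.5 * log(2 * pi * N)

def stirling_weak(N):
    return N * (log(N) - 1)

for N in (10, 100, 1000):
    exact = ln_factorial(N)
    # strong error -> 0; weak error grows like (1/2) ln(2 pi N)
    print(N, exact - stirling_strong(N), exact - stirling_weak(N))
```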
We expect that $D_1(\mu) = 0$, because $P_N$ is maximized at $\mu$.
Indeed it is:
$$\begin{aligned}
D_1(n)
&= \dv{}{n} \ln\!\big(P_N(n)\big)
\\
&= - \big( \ln(n) - \ln(p) - 1 \big) + \big( \ln(N\!-\!n) - \ln(q) - 1 \big) - \frac{n}{n} + \frac{N \!-\! n}{N \!-\! n}
\\
&= - \ln(n) + \ln(N \!-\! n) + \ln(p) - \ln(q)
\\
D_1(\mu)
&= - \ln(\mu) + \ln(N \!-\! \mu) + \ln(p) - \ln(q)
\\
&= - \ln(N p q) + \ln(N p q)
\\
&= 0
\end{aligned}$$
For the same reason, we expect $D_2(\mu)$ to be negative.
We find the following expression:
$$\begin{aligned}
D_2(n)
&= \dvn{2}{}{n} \ln\!\big(P_N(n)\big)
= \dv{}{n} D_1(n)
= - \frac{1}{n} - \frac{1}{N - n}
\\
D_2(\mu)
&= - \frac{1}{N p} - \frac{1}{N q}
= - \frac{p + q}{N p q}
= - \frac{1}{\sigma^2}
\end{aligned}$$
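Both derivatives are mechanical enough to hand off to a computer algebra system. A sketch using `sympy`, evaluating at the mean with hypothetical values $N = 50$, $p = 3/10$:

```python
import sympy as sp

n, N, p = sp.symbols('n N p', positive=True)
q = 1 - p

# ln P_N(n) in the weak-Stirling form; the constant ln(N!) is omitted
# since it drops out of every derivative with respect to n.
lnP = (- n * (sp.log(n) - sp.log(p) - 1)
       - (N - n) * (sp.log(N - n) - sp.log(q) - 1))

D1 = sp.diff(lnP, n)     # -ln(n) + ln(N - n) + ln(p) - ln(q)
D2 = sp.diff(lnP, n, 2)  # -1/n - 1/(N - n)

vals = {N: 50, p: sp.Rational(3, 10)}
print(sp.simplify(D1.subs(n, N * p).subs(vals)))  # 0
print(D2.subs(n, N * p).subs(vals))               # -2/21 = -1/(N p q)
```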
The higher-order derivatives vanish much faster as $N \to \infty$, so we discard them:
$$\begin{aligned}
D_3(n)
= \frac{1}{n^2} - \frac{1}{(N - n)^2}
\qquad \quad
D_4(n)
= - \frac{2}{n^3} - \frac{2}{(N - n)^3}
\qquad \quad
\cdots
\end{aligned}$$
Putting everything together, for large $N$,
the Taylor series approximately becomes:
$$\begin{aligned}
\ln\!\big(P_N(n)\big)
\approx D_0(\mu) + \frac{(n - \mu)^2}{2} D_2(\mu)
= \ln\!\bigg( \frac{1}{\sqrt{2\pi \sigma^2}} \bigg) - \frac{(n - \mu)^2}{2 \sigma^2}
\end{aligned}$$
Raising $e$ to this expression then yields a normalized Gaussian distribution.
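As a sanity check of the limit, the sketch below compares exact binomial probabilities against the limiting Gaussian for an arbitrarily chosen large case, $N = 1000$, $p = 0.3$; near the mean the two agree closely:

```python
from math import comb, exp, pi, sqrt

N, p = 1000, 0.3
q = 1.0 - p
mu, var = N * p, N * p * q   # mu = Np, sigma^2 = Npq

def binomial_pmf(n):
    return comb(N, n) * p**n * q**(N - n)

def gaussian_pdf(n):
    return exp(-(n - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

for n in (280, 300, 320):
    print(n, binomial_pmf(n), gaussian_pdf(n))
```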