Categories: Mathematics, Statistics.
Central limit theorem
In statistics, the central limit theorem states that
the sum of many independent variables tends to a normal distribution,
even if the individual variables $x_n$ follow different distributions.
For example, by taking $M$ samples of size $N$ from a population,
and calculating $M$ averages $\mu_m$ (which involves summing over $N$),
the resulting means $\mu_m$ are normally distributed
across the $M$ samples if $N$ is sufficiently large.
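This sampling experiment is easy to reproduce numerically.
Below is a minimal sketch using NumPy, where the exponential population
and the values of $M$ and $N$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
M, N = 10_000, 100  # number of samples, and size of each sample

# Draw M samples of size N from a decidedly non-normal population
# (exponential with mean 1), and compute the M sample means.
samples = rng.exponential(scale=1.0, size=(M, N))
means = samples.mean(axis=1)

# The theorem predicts the means are approximately normal with
# mean 1 and standard deviation 1 / sqrt(N) = 0.1:
print(means.mean(), means.std())
```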
More formally, for $N$ independent variables $x_n$ with probability distributions $p(x_n)$,
we define the following totals of the variables themselves, their means, and their variances:
\begin{aligned}
    t \equiv \sum_{n = 1}^N x_n
    \qquad \qquad
    \mu_t \equiv \sum_{n = 1}^N \mu_n
    \qquad \qquad
    \sigma_t^2 \equiv \sum_{n = 1}^N \sigma_n^2
\end{aligned}
The central limit theorem then states that
the probability distribution $p_N(t)$ of $t$ for $N$ variables
will become a normal distribution when $N$ goes to infinity:
\begin{aligned}
    \boxed{
        \lim_{N \to \infty} \!\big(p_N(t)\big)
        = \frac{1}{\sigma_t \sqrt{2 \pi}} \exp\!\bigg( -\frac{(t - \mu_t)^2}{2 \sigma_t^2} \bigg)
    }
\end{aligned}
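Note that the $x_n$ need not be identically distributed.
As a numerical illustration (the particular mix of uniform and exponential
variables here is an arbitrary choice), we can sum $N$ variables with
different distributions and compare the result against the predicted normal
using SciPy's Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
N, M = 50, 100_000  # number of variables, and number of realizations

# Sum N independent variables with *different* distributions,
# alternating uniform on [0, 1] and exponential with mean 1,
# while accumulating the exact mean and variance of the sum.
t = np.zeros(M)
mu_t, var_t = 0.0, 0.0
for n in range(N):
    if n % 2 == 0:
        t += rng.uniform(0.0, 1.0, size=M)
        mu_t += 0.5
        var_t += 1.0 / 12.0
    else:
        t += rng.exponential(scale=1.0, size=M)
        mu_t += 1.0
        var_t += 1.0

# The Kolmogorov-Smirnov distance to the predicted normal
# distribution is small, and shrinks further as N grows:
print(stats.kstest(t, "norm", args=(mu_t, np.sqrt(var_t))).statistic)
```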
We prove this below,
but first we need to introduce some tools.
Given a probability density $p(x)$, its Fourier transform
is called the characteristic function $\phi(k)$:
\begin{aligned}
    \phi(k)
    \equiv \int_{-\infty}^\infty p(x) \exp(i k x) \dd{x}
\end{aligned}
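This definition can be checked numerically for a known case.
A minimal sketch, using the exponential distribution Exp(1) as an arbitrary
test case, since its characteristic function $1/(1 - i k)$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
x = rng.exponential(scale=1.0, size=1_000_000)

# As noted below, phi(k) is the average of exp(i k x),
# so a sample mean estimates it.
k = 1.7  # arbitrary evaluation point
phi_empirical = np.mean(np.exp(1j * k * x))
phi_exact = 1.0 / (1.0 - 1j * k)  # known result for Exp(1)
print(phi_empirical, phi_exact)   # agree to ~3 decimals
```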
Note that $\phi(k)$ can be interpreted as the average of $\exp(i k x)$.
We take its Taylor expansion in two separate ways,
where an overline denotes the mean:
\begin{aligned}
    \phi(k)
    = \sum_{n = 0}^\infty \frac{k^n}{n!} \bigg( \dvn{n}{\phi}{k} \Big|_{k = 0} \bigg)
    \qquad \qquad
    \phi(k)
    = \overline{\exp(i k x)}
    = \sum_{n = 0}^\infty \frac{(ik)^n}{n!} \overline{x^n}
\end{aligned}
By comparing the coefficients of these two power series,
we get a useful relation:
\begin{aligned}
    \dvn{n}{\phi}{k} \Big|_{k = 0}
    = i^n \: \overline{x^n}
\end{aligned}
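This relation can be verified symbolically, e.g. with SymPy.
A minimal sketch, again assuming Exp(1), whose moments are $\overline{x^n} = n!$:

```python
import sympy as sp

k = sp.symbols("k")
phi = 1 / (1 - sp.I * k)  # characteristic function of Exp(1)

# The n-th derivative of phi at k = 0 should equal i^n * n!,
# since the n-th moment of Exp(1) is n-factorial.
for n in range(1, 5):
    lhs = sp.simplify(sp.diff(phi, k, n).subs(k, 0))
    rhs = sp.simplify(sp.I**n * sp.factorial(n))
    print(n, lhs, rhs)  # the two columns match
```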
Next, the cumulants $C^{(n)}$ are defined from the Taylor expansion of $\ln\!\big(\phi(k)\big)$:
\begin{aligned}
    \ln\!\big( \phi(k) \big)
    = \sum_{n = 1}^\infty \frac{(ik)^n}{n!} C^{(n)}
    \quad \mathrm{where} \quad
    C^{(n)}
    \equiv \frac{1}{i^n} \: \dvn{n}{}{k} \ln\!\big(\phi(k)\big) \Big|_{k = 0}
\end{aligned}
The first two cumulants $C^{(1)}$ and $C^{(2)}$ are of particular interest,
since they turn out to be the mean and the variance, respectively.
Using our earlier relation:
\begin{aligned}
    C^{(1)}
    &= - i \dv{}{k} \ln\!\big(\phi(k)\big) \Big|_{k = 0}
    = - i \frac{\phi'(0)}{\exp(0)}
    = \overline{x}
    \\
    C^{(2)}
    &= - \dvn{2}{}{k} \ln\!\big(\phi(k)\big) \Big|_{k = 0}
    = \frac{\big(\phi'(0)\big)^2}{\exp(0)^2} - \frac{\phi''(0)}{\exp(0)}
    = - \overline{x}^2 + \overline{x^2}
    = \sigma^2
\end{aligned}
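The same kind of symbolic check confirms this.
A sketch for Exp(1), which has mean 1 and variance 1:

```python
import sympy as sp

k = sp.symbols("k")
phi = 1 / (1 - sp.I * k)  # characteristic function of Exp(1)

# The first two cumulants should equal the mean and the variance,
# both of which are 1 for the exponential distribution Exp(1).
for n in (1, 2):
    C_n = sp.simplify(sp.diff(sp.log(phi), k, n).subs(k, 0) / sp.I**n)
    print(n, C_n)  # prints 1 for both
```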
Now that we have introduced these tools,
we define $t$ as the sum
of $N$ independent variables $x_n$, in other words:
\begin{aligned}
    t
    \equiv \sum_{n = 1}^N x_n
    = x_1 + x_2 + ... + x_N
\end{aligned}
The probability density of $t$ is then as follows, where $p(x_n)$ are
the densities of all the individual variables and $\delta$ is
the Dirac delta function:
\begin{aligned}
    p(t)
    &= \int\cdots\int_{-\infty}^\infty \Big( \prod_{n = 1}^N p(x_n) \Big) \: \delta\Big( t - \sum_{n = 1}^N x_n \Big) \dd{x_1} \cdots \dd{x_N}
    \\
    &= \Big( p_1 * \big( p_2 * ( ... * (p_N * \delta))\big)\Big)(t)
\end{aligned}
In other words, the integrals pick out all combinations of $x_n$ that
add up to the desired $t$-value, and multiply the probabilities
$p(x_1) p(x_2) \cdots p(x_N)$ of each such case. This is a convolution,
so by the convolution theorem
it becomes a product in the Fourier domain:
\begin{aligned}
    \phi_t(k)
    = \prod_{n = 1}^N \phi_n(k)
\end{aligned}
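To see the convolution theorem in action numerically, a small sketch:
sum two Exp(1) variables, estimate the sum's characteristic function
from samples, and compare it to the product $1/(1 - ik)^2$:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x1 = rng.exponential(scale=1.0, size=500_000)
x2 = rng.exponential(scale=1.0, size=500_000)
t = x1 + x2  # sum of two independent Exp(1) variables

k = 0.9  # arbitrary evaluation point
phi_t = np.mean(np.exp(1j * k * t))   # empirical phi of the sum
phi_prod = (1.0 / (1.0 - 1j * k))**2  # product of the individual phis
print(phi_t, phi_prod)                # agree to ~3 decimals
```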
By taking the logarithm of both sides, the product becomes a sum,
which we further expand:
\begin{aligned}
    \ln\!\big(\phi_t(k)\big)
    = \sum_{n = 1}^N \ln\!\big(\phi_n(k)\big)
    = \sum_{n = 1}^N \sum_{m = 1}^{\infty} \frac{(ik)^m}{m!} C_n^{(m)}
\end{aligned}
Consequently, the cumulants $C^{(m)}$ stack additively for the sum $t$
of independent variables $x_n$, and therefore
the means $C^{(1)}$ and variances $C^{(2)}$ do too:
\begin{aligned}
    C_t^{(m)}
    = \sum_{n = 1}^N C_n^{(m)}
    = C_1^{(m)} + C_2^{(m)} + ... + C_N^{(m)}
\end{aligned}
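SciPy's `kstat` computes unbiased cumulant estimates (k-statistics) from
samples, which lets us check this additivity directly. A sketch using the
third cumulant, which is 2 for Exp(1) and 0 for a (symmetric) uniform distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
x1 = rng.exponential(scale=1.0, size=1_000_000)  # 3rd cumulant: 2
x2 = rng.uniform(0.0, 1.0, size=1_000_000)       # 3rd cumulant: 0

# The 3rd cumulant of the sum should equal the sum of the
# individual 3rd cumulants, i.e. roughly 2 + 0 = 2:
print(stats.kstat(x1 + x2, 3), stats.kstat(x1, 3) + stats.kstat(x2, 3))
```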
We now introduce the scaled sum $z$ as the new combined variable:
\begin{aligned}
    z
    \equiv \frac{t}{\sqrt{N}}
    = \frac{1}{\sqrt{N}} (x_1 + x_2 + ... + x_N)
\end{aligned}
Its characteristic function $\phi_z(k)$ is then as follows,
with $\sqrt{N}$ appearing in the arguments of $\phi_n$:
\begin{aligned}
    \phi_z(k)
    &= \int\cdots\int \Big( \prod_{n = 1}^N p(x_n) \Big) \: \delta\Big( z - \frac{1}{\sqrt{N}} \sum_{n = 1}^N x_n \Big) \exp(i k z) \dd{x_1} \cdots \dd{x_N}
    \\
    &= \int\cdots\int \Big( \prod_{n = 1}^N p(x_n) \Big) \exp\!\Big( i \frac{k}{\sqrt{N}} \sum_{n = 1}^N x_n \Big) \dd{x_1} \cdots \dd{x_N}
    \\
    &= \prod_{n = 1}^N \phi_n\Big(\frac{k}{\sqrt{N}}\Big)
\end{aligned}
By expanding $\ln\!\big(\phi_z(k)\big)$ in terms of its cumulants $C^{(m)}$
and introducing $\kappa = k / \sqrt{N}$, we see that the higher-order terms
become smaller for larger $N$:
\begin{gathered}
    \ln\!\big( \phi_z(k) \big)
    = \sum_{m = 1}^\infty \frac{(ik)^m}{m!} C^{(m)}
    \\
    C^{(m)}
    = \frac{1}{i^m} \dvn{m}{}{k} \sum_{n = 1}^N \ln\!\bigg( \phi_n\Big(\frac{k}{\sqrt{N}}\Big) \bigg)
    = \frac{1}{i^m N^{m/2}} \dvn{m}{}{\kappa} \sum_{n = 1}^N \ln\!\big( \phi_n(\kappa) \big)
\end{gathered}
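Concretely, the sum over $n$ contributes a factor of order $N$,
so $C^{(m)} \sim N^{1 - m/2}$, which vanishes as $N \to \infty$ for all $m \ge 3$.
A numerical sketch of this decay, using Exp(1) variables whose third cumulant
is 2, so that $C^{(3)} = 2/\sqrt{N}$ for $z$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
M = 50_000  # realizations of z per value of N

# The 3rd cumulant of z = t / sqrt(N) should decay as 2 / sqrt(N)
# for sums of Exp(1) variables: 1.0, 0.5, 0.25, 0.125, ...
for N in (4, 16, 64, 256):
    z = rng.exponential(scale=1.0, size=(M, N)).sum(axis=1) / np.sqrt(N)
    print(N, stats.kstat(z, 3))
```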
For sufficiently large $N$, we can therefore approximate it using just the first two terms:
\begin{aligned}
    \ln\!\big( \phi_z(k) \big)
    &\approx i k C^{(1)} - \frac{k^2}{2} C^{(2)}
    = i k \mu_z - \frac{k^2}{2} \sigma_z^2
    \\
    \implies \quad
    \phi_z(k)
    &\approx \exp(i k \mu_z) \exp(- k^2 \sigma_z^2 / 2)
\end{aligned}
We take its inverse Fourier transform to get the density $p(z)$,
which turns out to be a Gaussian normal distribution
and is even already normalized:
\begin{aligned}
    p(z)
    = \hat{\mathcal{F}}^{-1} \{\phi_z(k)\}
    &= \frac{1}{2 \pi} \int_{-\infty}^\infty \exp\!\big(\!-\! i k (z - \mu_z)\big) \exp(- k^2 \sigma_z^2 / 2) \dd{k}
    \\
    &= \frac{1}{\sqrt{2 \pi \sigma_z^2}} \exp\!\Big(\!-\! \frac{(z - \mu_z)^2}{2 \sigma_z^2} \Big)
\end{aligned}
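As a final sanity check, this inverse transform can also be done by numerical
quadrature. A minimal sketch, with arbitrary values of $\mu_z$ and $\sigma_z^2$:

```python
import numpy as np
from scipy import integrate

mu_z, var_z = 0.5, 2.0  # arbitrary mean and variance

def p_z(z):
    """Numerically invert the Gaussian characteristic function."""
    # The imaginary part of the integrand integrates to zero
    # by symmetry, so only the real part is kept.
    integrand = lambda k: np.real(
        np.exp(-1j * k * (z - mu_z) - k**2 * var_z / 2)
    ) / (2 * np.pi)
    return integrate.quad(integrand, -np.inf, np.inf)[0]

z = 1.3  # arbitrary evaluation point
exact = np.exp(-(z - mu_z)**2 / (2 * var_z)) / np.sqrt(2 * np.pi * var_z)
print(p_z(z), exact)  # agree to high precision
```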
Therefore, the sum of many independent variables tends to a normal distribution,
regardless of the densities of the individual variables.