Categories: Mathematics, Statistics.

Central limit theorem

In statistics, the central limit theorem states that the sum of many independent variables tends to a normal distribution, even if the individual variables $x_n$ follow different distributions.

For example, if we take $M$ samples of size $N$ from a population and calculate $M$ averages $\mu_m$ (each of which involves summing over $N$ values), then the resulting means $\mu_m$ are approximately normally distributed across the $M$ samples if $N$ is sufficiently large.
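This sampling experiment is easy to reproduce numerically. Below is a minimal sketch using NumPy, where the exponential population and the values of $M$ and $N$ are arbitrary illustrative choices:

```python
import numpy as np

# Draw M samples of size N from a decidedly non-normal population
# (an exponential distribution with mean 1, chosen arbitrarily).
rng = np.random.default_rng(seed=0)
M, N = 10_000, 1_000
samples = rng.exponential(scale=1.0, size=(M, N))

# One mean per sample: by the central limit theorem, these M means
# should be approximately normally distributed for large N.
means = samples.mean(axis=1)
print(f"mean of means: {means.mean():.4f}  (population mean: 1.0)")
print(f"std of means:  {means.std():.4f}  (predicted: {1.0 / np.sqrt(N):.4f})")
```

A histogram of `means` indeed looks like a bell curve centered on the population mean, with standard deviation $\sigma / \sqrt{N}$.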

More formally, consider $N$ independent variables $x_n$ with probability distributions $p(x_n)$, means $\mu_n$ and variances $\sigma_n^2$. We define the following total of all variables, total mean, and total variance:

$$\begin{aligned} t \equiv \sum_{n = 1}^N x_n \qquad\qquad \mu_t \equiv \sum_{n = 1}^N \mu_n \qquad\qquad \sigma_t^2 \equiv \sum_{n = 1}^N \sigma_n^2 \end{aligned}$$

The central limit theorem then states that the probability distribution $p_N(t)$ of $t$ for $N$ variables approaches a normal distribution as $N$ goes to infinity:

$$\begin{aligned} \boxed{ \lim_{N \to \infty} \!\big(p_N(t)\big) = \frac{1}{\sigma_t \sqrt{2 \pi}} \exp\!\bigg( -\frac{(t - \mu_t)^2}{2 \sigma_t^2} \bigg) } \end{aligned}$$
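Importantly, the $x_n$ need not be identically distributed. As a quick numerical sanity check (all distributions and parameters below are arbitrary choices), we can sum variables from three different families and compare the empirical mean and variance of $t$ against $\mu_t$ and $\sigma_t^2$:

```python
import numpy as np

# t is the sum of 30 independent variables from three different
# distributions (10 of each); their means and variances are known.
rng = np.random.default_rng(seed=1)
trials = 200_000
u = rng.uniform(0, 1, size=(trials, 10))     # mean 1/2, variance 1/12
e = rng.exponential(2.0, size=(trials, 10))  # mean 2,   variance 4
b = rng.binomial(1, 0.3, size=(trials, 10))  # mean 0.3, variance 0.21
t = u.sum(axis=1) + e.sum(axis=1) + b.sum(axis=1)

mu_t = 10 * (0.5 + 2.0 + 0.3)     # sum of all the means
var_t = 10 * (1/12 + 4.0 + 0.21)  # sum of all the variances
print(f"empirical mean {t.mean():.3f}  vs  mu_t {mu_t:.3f}")
print(f"empirical variance {t.var():.3f}  vs  sigma_t^2 {var_t:.3f}")
```

Even for only 30 terms, a histogram of `t` is already close to the boxed normal distribution with these parameters.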

We prove this below, but first we need to introduce some tools. Given a probability density $p(x)$, its Fourier transform is called the characteristic function $\phi(k)$:

$$\begin{aligned} \phi(k) \equiv \int_{-\infty}^\infty p(x) \exp(i k x) \dd{x} \end{aligned}$$
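For a concrete example, the standard normal density $p(x) = e^{-x^2/2} / \sqrt{2 \pi}$ has the known characteristic function $\phi(k) = e^{-k^2/2}$. The sketch below evaluates the defining integral by a simple Riemann sum and compares it to that closed form; the grid bounds and resolution are arbitrary choices:

```python
import numpy as np

# Characteristic function of the standard normal via numerical integration
# of p(x) * exp(ikx), compared to the closed form exp(-k^2 / 2).
x, dx = np.linspace(-10, 10, 20_001, retstep=True)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for k in (0.0, 0.5, 1.0, 2.0):
    phi = np.sum(p * np.exp(1j * k * x)) * dx
    print(f"k={k}: numeric {phi.real:+.6f}  exact {np.exp(-k**2 / 2):+.6f}")
```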

Note that $\phi(k)$ can be interpreted as the average of $\exp(i k x)$. We take its Taylor expansion in two separate ways, where an overline denotes the mean:

$$\begin{aligned} \phi(k) = \sum_{n = 0}^\infty \frac{k^n}{n!} \bigg( \dvn{n}{\phi}{k} \Big|_{k = 0} \bigg) \qquad\qquad \phi(k) = \overline{\exp(i k x)} = \sum_{n = 0}^\infty \frac{(ik)^n}{n!} \overline{x^n} \end{aligned}$$

By comparing the coefficients of these two power series, we get a useful relation:

$$\begin{aligned} \dvn{n}{\phi}{k} \Big|_{k = 0} = i^n \: \overline{x^n} \end{aligned}$$
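As a symbolic check of this relation, take the exponential distribution with rate $\lambda$, whose characteristic function $\phi(k) = \lambda / (\lambda - ik)$ and moments $\overline{x^n} = n! / \lambda^n$ are both known in closed form:

```python
import sympy as sp

# Verify d^n(phi)/dk^n at k=0 equals i^n * E[x^n] for Exp(lam),
# using phi(k) = lam / (lam - i*k) and E[x^n] = n! / lam^n.
k = sp.symbols('k', real=True)
lam = sp.symbols('lam', positive=True)
phi = lam / (lam - sp.I * k)

for n in range(1, 5):
    deriv = sp.diff(phi, k, n).subs(k, 0)
    moment = sp.factorial(n) / lam**n
    print(n, sp.simplify(deriv - sp.I**n * moment) == 0)  # True for all n
```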

Next, the cumulants $C^{(n)}$ are defined from the Taylor expansion of $\ln\!\big(\phi(k)\big)$:

$$\begin{aligned} \ln\!\big( \phi(k) \big) = \sum_{n = 1}^\infty \frac{(ik)^n}{n!} C^{(n)} \quad \mathrm{where} \quad C^{(n)} \equiv \frac{1}{i^n} \: \dvn{n}{}{k} \ln\!\big(\phi(k)\big) \Big|_{k = 0} \end{aligned}$$

The first two cumulants $C^{(1)}$ and $C^{(2)}$ are of particular interest, since they turn out to be the mean and the variance respectively. Using our earlier relation, and the fact that $\phi(0) = 1$ thanks to the normalization of $p(x)$:

$$\begin{aligned} C^{(1)} &= - i \dv{}{k} \ln\!\big(\phi(k)\big) \Big|_{k = 0} = - i \frac{\phi'(0)}{\phi(0)} = \overline{x} \\ C^{(2)} &= - \dvn{2}{}{k} \ln\!\big(\phi(k)\big) \Big|_{k = 0} = \frac{\big(\phi'(0)\big)^2}{\phi(0)^2} - \frac{\phi''(0)}{\phi(0)} = - \overline{x}^2 + \overline{x^2} = \sigma^2 \end{aligned}$$
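A neat symbolic confirmation uses the Poisson distribution, whose characteristic function is $\phi(k) = \exp\!\big(\lambda (e^{ik} - 1)\big)$, so $\ln\!\big(\phi(k)\big)$ takes a simple form and every cumulant equals $\lambda$, matching the well-known fact that the Poisson mean and variance are both $\lambda$:

```python
import sympy as sp

# Cumulants from ln(phi) for the Poisson distribution:
# ln(phi(k)) = lam * (exp(ik) - 1), so C^(n) = lam for every n.
k = sp.symbols('k', real=True)
lam = sp.symbols('lam', positive=True)
log_phi = lam * (sp.exp(sp.I * k) - 1)

for n in (1, 2):
    C_n = sp.simplify(sp.diff(log_phi, k, n).subs(k, 0) / sp.I**n)
    print(f"C^({n}) =", C_n)  # prints lam both times: the mean and variance
```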

Now that we have introduced these tools, we define $t$ as the sum of $N$ independent variables $x_n$, in other words:

$$\begin{aligned} t \equiv \sum_{n = 1}^N x_n = x_1 + x_2 + ... + x_N \end{aligned}$$

The probability density of $t$ is then as follows, where $p(x_n)$ are the densities of all the individual variables and $\delta$ is the Dirac delta function:

$$\begin{aligned} p(t) &= \int\cdots\int_{-\infty}^\infty \Big( \prod_{n = 1}^N p(x_n) \Big) \: \delta\Big( t - \sum_{n = 1}^N x_n \Big) \dd{x_1} \cdots \dd{x_N} \\ &= \Big( p_1 * \big( p_2 * ( ... * (p_N * \delta))\big)\Big)(t) \end{aligned}$$

In other words, the integrals pick out all combinations of $x_n$ that add up to the desired $t$-value, and multiply the probabilities $p(x_1) \, p(x_2) \cdots p(x_N)$ of each such case. This is a convolution, so the convolution theorem states that it is a product in the Fourier domain:

$$\begin{aligned} \phi_t(k) = \prod_{n = 1}^N \phi_n(k) \end{aligned}$$
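We can verify this numerically by comparing the empirical characteristic function of a sum against the product of the known individual ones. In the sketch below (the two distributions are arbitrary choices), $t = x_1 + x_2$ with $x_1 \sim \mathrm{Uniform}(0, 1)$ and $x_2 \sim \mathrm{Exp}(1)$:

```python
import numpy as np

# Convolution theorem check: the empirical characteristic function of
# t = x1 + x2 should match phi_1(k) * phi_2(k), using the closed forms
# phi_1(k) = (e^{ik} - 1) / (ik) and phi_2(k) = 1 / (1 - ik).
rng = np.random.default_rng(seed=2)
t = rng.uniform(0, 1, 500_000) + rng.exponential(1.0, 500_000)

for k in (0.5, 1.0, 2.0):
    empirical = np.mean(np.exp(1j * k * t))
    product = (np.exp(1j * k) - 1) / (1j * k) / (1 - 1j * k)
    print(f"k={k}: empirical {empirical:.4f}  product {product:.4f}")
```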

By taking the logarithm of both sides, the product becomes a sum, which we further expand:

$$\begin{aligned} \ln\!\big(\phi_t(k)\big) = \sum_{n = 1}^N \ln\!\big(\phi_n(k)\big) = \sum_{n = 1}^N \sum_{m = 1}^{\infty} \frac{(ik)^m}{m!} C_n^{(m)} \end{aligned}$$

Consequently, the cumulants $C^{(m)}$ are additive for the sum $t$ of independent variables $x_n$, and therefore so are the means $C^{(1)}$ and variances $C^{(2)}$:

$$\begin{aligned} C_t^{(m)} = \sum_{n = 1}^N C_n^{(m)} = C_1^{(m)} + C_2^{(m)} + ... + C_N^{(m)} \end{aligned}$$
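This additivity holds beyond the mean and variance. For instance, the exponential distribution with rate $1$ has $n$-th cumulant $(n - 1)!$, so a sum of $5$ independent copies should have $C_t^{(1)} = 5$, $C_t^{(2)} = 5$ and $C_t^{(3)} = 10$. A sketch using SciPy's unbiased cumulant estimators (`scipy.stats.kstat`):

```python
import numpy as np
from scipy.stats import kstat

# Cumulants add: for Exp(1) the n-th cumulant is (n-1)!, so summing
# 5 independent copies gives cumulants 5*1, 5*1 and 5*2.
rng = np.random.default_rng(seed=3)
t = rng.exponential(1.0, size=(1_000_000, 5)).sum(axis=1)

for n, predicted in ((1, 5), (2, 5), (3, 10)):
    print(f"C^({n}) = {kstat(t, n):.3f}  (predicted {predicted})")
```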

We now introduce the scaled sum $z$ as the new combined variable:

$$\begin{aligned} z \equiv \frac{t}{\sqrt{N}} = \frac{1}{\sqrt{N}} (x_1 + x_2 + ... + x_N) \end{aligned}$$

Its characteristic function $\phi_z(k)$ is then as follows, with $\sqrt{N}$ appearing in the arguments of $\phi_n$:

$$\begin{aligned} \phi_z(k) &= \int\cdots\int \Big( \prod_{n = 1}^N p(x_n) \Big) \: \delta\Big( z - \frac{1}{\sqrt{N}} \sum_{n = 1}^N x_n \Big) \exp(i k z) \dd{x_1} \cdots \dd{x_N} \dd{z} \\ &= \int\cdots\int \Big( \prod_{n = 1}^N p(x_n) \Big) \exp\!\Big( i \frac{k}{\sqrt{N}} \sum_{n = 1}^N x_n \Big) \dd{x_1} \cdots \dd{x_N} \\ &= \prod_{n = 1}^N \phi_n\Big(\frac{k}{\sqrt{N}}\Big) \end{aligned}$$

By expanding $\ln\!\big(\phi_z(k)\big)$ in terms of its cumulants $C^{(m)}$ and introducing $\kappa \equiv k / \sqrt{N}$, we see that the higher-order terms become smaller for larger $N$:

$$\begin{gathered} \ln\!\big( \phi_z(k) \big) = \sum_{m = 1}^\infty \frac{(ik)^m}{m!} C^{(m)} \\ C^{(m)} = \frac{1}{i^m} \dvn{m}{}{k} \sum_{n = 1}^N \ln\!\bigg( \phi_n\Big(\frac{k}{\sqrt{N}}\Big) \bigg) \bigg|_{k = 0} = \frac{1}{i^m N^{m/2}} \dvn{m}{}{\kappa} \sum_{n = 1}^N \ln\!\big( \phi_n(\kappa) \big) \bigg|_{\kappa = 0} \end{gathered}$$
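In particular, for comparable variables the inner sum grows like $N$, so $C^{(m)} \sim N^{1 - m/2}$: the third cumulant shrinks as $N^{-1/2}$, the fourth as $N^{-1}$, and so on. This is easy to observe through the standardized skewness and excess kurtosis of $z$, as in the sketch below (the exponential terms are an arbitrary choice):

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Higher cumulants of the scaled sum vanish with N: for i.i.d. terms,
# skewness decays as N^(-1/2) and excess kurtosis as N^(-1).
rng = np.random.default_rng(seed=4)
for N in (1, 10, 100):
    z = rng.exponential(1.0, size=(100_000, N)).sum(axis=1) / np.sqrt(N)
    print(f"N={N:3d}: skewness {skew(z):+.3f}  excess kurtosis {kurtosis(z):+.3f}")
```

For Exp(1) these start at $2$ and $6$ for $N = 1$, and decay at the predicted rates.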

For sufficiently large NN, we can therefore approximate it using just the first two terms:

$$\begin{aligned} \ln\!\big( \phi_z(k) \big) &\approx i k C^{(1)} - \frac{k^2}{2} C^{(2)} = i k \mu_z - \frac{k^2}{2} \sigma_z^2 \\ \implies \quad \phi_z(k) &\approx \exp(i k \mu_z) \exp(- k^2 \sigma_z^2 / 2) \end{aligned}$$

We take its inverse Fourier transform to get the density $p(z)$, which turns out to be a Gaussian normal distribution, and is conveniently already normalized:

$$\begin{aligned} p(z) = \hat{\mathcal{F}}^{-1} \{\phi_z(k)\} &= \frac{1}{2 \pi} \int_{-\infty}^\infty \exp\!\big(\!-\! i k (z - \mu_z)\big) \exp(- k^2 \sigma_z^2 / 2) \dd{k} \\ &= \frac{1}{\sqrt{2 \pi \sigma_z^2}} \exp\!\Big(\!-\! \frac{(z - \mu_z)^2}{2 \sigma_z^2} \Big) \end{aligned}$$
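This last step can be confirmed numerically by approximating the inverse transform with a Riemann sum, as sketched below for arbitrary test values of $\mu_z$ and $\sigma_z$:

```python
import numpy as np

# Inverse Fourier transform of phi_z(k) = exp(ik*mu - k^2 sigma^2 / 2)
# by direct quadrature, compared to the Gaussian density it should yield.
mu, sigma = 1.5, 0.8
k, dk = np.linspace(-40, 40, 80_001, retstep=True)
phi = np.exp(1j * k * mu - k**2 * sigma**2 / 2)

for z in (0.0, 1.5, 3.0):
    p_num = np.sum(np.exp(-1j * k * z) * phi).real * dk / (2 * np.pi)
    p_exact = np.exp(-(z - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    print(f"z={z}: numeric {p_num:.6f}  exact {p_exact:.6f}")
```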

Therefore, the sum of many independent variables tends to a normal distribution, regardless of the densities of the individual variables.

References

  1. H. Gould, J. Tobochnik, Statistical and thermal physics, 2nd edition, Princeton.