From 75636ed8772512bdf38e3dec431888837eaddc5d Mon Sep 17 00:00:00 2001 From: Prefetch Date: Mon, 20 Feb 2023 18:08:31 +0100 Subject: Improve knowledge base --- source/know/concept/bells-theorem/index.md | 266 +++------------------- source/know/concept/chsh-inequality/index.md | 273 +++++++++++++++++++++++ source/know/concept/lagrange-multiplier/index.md | 2 +- source/know/concept/pulay-mixing/index.md | 121 +++++----- 4 files changed, 365 insertions(+), 297 deletions(-) create mode 100644 source/know/concept/chsh-inequality/index.md (limited to 'source/know') diff --git a/source/know/concept/bells-theorem/index.md b/source/know/concept/bells-theorem/index.md index a01bf9e..1589a7a 100644 --- a/source/know/concept/bells-theorem/index.md +++ b/source/know/concept/bells-theorem/index.md @@ -17,13 +17,13 @@ Suppose that we have two spin-1/2 particles, called $$A$$ and $$B$$, in an entangled [Bell state](/know/concept/bell-state/): $$\begin{aligned} - \Ket{\Psi^{-}} - = \frac{1}{\sqrt{2}} \Big( \Ket{\uparrow \downarrow} - \Ket{\downarrow \uparrow} \Big) + \ket{\Psi^{-}} + = \frac{1}{\sqrt{2}} \Big( \ket{\uparrow \downarrow} - \ket{\downarrow \uparrow} \Big) \end{aligned}$$ Since they are entangled, -if we measure the $$z$$-spin of particle $$A$$, and find e.g. $$\Ket{\uparrow}$$, -then particle $$B$$ immediately takes the opposite state $$\Ket{\downarrow}$$. +if we measure the $$z$$-spin of particle $$A$$, and find e.g. $$\ket{\uparrow}$$, +then particle $$B$$ immediately takes the opposite state $$\ket{\downarrow}$$. The point is that this collapse is instant, regardless of the distance between $$A$$ and $$B$$. @@ -69,21 +69,29 @@ $$\begin{aligned} \end{aligned}$$ The product of the outcomes of $$A$$ and $$B$$ then has the following expectation value. 
-Note that we only multiply $$A$$ and $$B$$ for shared $$\lambda$$-values: -this is what makes it a **local** hidden variable: +Note that we multiply $$A$$ and $$B$$ at the same $$\lambda$$-value, +hence it is a *local* hidden variable: $$\begin{aligned} - \Expval{A_a B_b} - = \int \rho(\lambda) \: A(\vec{a}, \lambda) \: B(\vec{b}, \lambda) \dd{\lambda} + \expval{A_a B_b} + \equiv \int \rho(\lambda) \: A(\vec{a}, \lambda) \: B(\vec{b}, \lambda) \dd{\lambda} \end{aligned}$$ -From this, two inequalities can be derived, -which both prove Bell's theorem. +From this, we can make several predictions about LHV theories, +which turn out to disagree with various theoretical +and experimental results in quantum mechanics. +The two most famous LHV predictions are +the **Bell inequality** and +the [CHSH inequality](/know/concept/chsh-inequality/). + ## Bell inequality -If $$\vec{a} = \vec{b}$$, then we know that $$A$$ and $$B$$ always have opposite spins: +We present Bell's original proof of his theorem. 
+If $$\vec{a} = \vec{b}$$, then we know that +measuring $$A$$ and $$B$$ gives them opposite spins, +because they start in the entangled state $$\ket{\Psi^{-}}$$: $$\begin{aligned} A(\vec{a}, \lambda) @@ -94,7 +102,7 @@ $$\begin{aligned} The expectation value of the product can therefore be rewritten as follows: $$\begin{aligned} - \Expval{A_a B_b} + \expval{A_a B_b} = - \int \rho(\lambda) \: A(\vec{a}, \lambda) \: A(\vec{b}, \lambda) \dd{\lambda} \end{aligned}$$ @@ -102,7 +110,7 @@ Next, we introduce an arbitrary third direction $$\vec{c}$$, and use the fact that $$( A(\vec{b}, \lambda) )^2 = 1$$: $$\begin{aligned} - \Expval{A_a B_b} - \Expval{A_a B_c} + \expval{A_a B_b} - \expval{A_a B_c} &= - \int \rho(\lambda) \Big( A(\vec{a}, \lambda) \: A(\vec{b}, \lambda) - A(\vec{a}, \lambda) \: A(\vec{c}, \lambda) \Big) \dd{\lambda} \\ &= - \int \rho(\lambda) \Big( 1 - A(\vec{b}, \lambda) \: A(\vec{c}, \lambda) \Big) A(\vec{a}, \lambda) \: A(\vec{b}, \lambda) \dd{\lambda} @@ -114,7 +122,7 @@ Taking the absolute value of the whole left, and of the integrand on the right, we thus get: $$\begin{aligned} - \Big| \Expval{A_a B_b} - \Expval{A_a B_c} \Big| + \Big| \expval{A_a B_b} - \expval{A_a B_c} \Big| &\le \int \rho(\lambda) \Big( 1 - A(\vec{b}, \lambda) \: A(\vec{c}, \lambda) \Big) \: \Big| A(\vec{a}, \lambda) \: A(\vec{b}, \lambda) \Big| \dd{\lambda} \\ @@ -122,24 +130,24 @@ $$\begin{aligned} \end{aligned}$$ Since $$\rho(\lambda)$$ is a normalized probability density function, -we arrive at the **Bell inequality**: +we arrive at the Bell inequality: $$\begin{aligned} \boxed{ - \Big| \Expval{A_a B_b} - \Expval{A_a B_c} \Big| - \le 1 + \Expval{A_b B_c} + \Big| \expval{A_a B_b} - \expval{A_a B_c} \Big| + \le 1 + \expval{A_b B_c} } \end{aligned}$$ Any theory involving an LHV $$\lambda$$ must obey this inequality. 
-The problem, however, is that quantum mechanics dictates the expectation values -for the state $$\Ket{\Psi^{-}}$$: +The problem, however, is that quantum mechanics dictates +the expectation values for the state $$\ket{\Psi^{-}}$$: $$\begin{aligned} - \Expval{A_a B_b} = - \vec{a} \cdot \vec{b} + \expval{A_a B_b} = - \vec{a} \cdot \vec{b} \end{aligned}$$ -Finding directions which violate the Bell inequality is easy: +Finding directions that violate the Bell inequality is easy: for example, if $$\vec{a}$$ and $$\vec{b}$$ are orthogonal, and $$\vec{c}$$ is at a $$\pi/4$$ angle to both of them, then the left becomes $$0.707$$ and the right $$0.293$$, @@ -147,222 +155,6 @@ which clearly disagrees with the inequality, meaning that LHVs are impossible. -## CHSH inequality - -The **Clauser-Horne-Shimony-Holt** or simply **CHSH inequality** -takes a slightly different approach, and is more useful in practice. - -Consider four spin directions, two for $$A$$ called $$\vec{a}_1$$ and $$\vec{a}_2$$, -and two for $$B$$ called $$\vec{b}_1$$ and $$\vec{b}_2$$. 
-Let us introduce the following abbreviations: - -$$\begin{aligned} - A_1 &= A(\vec{a}_1, \lambda) - \qquad \quad - A_2 = A(\vec{a}_2, \lambda) - \\ - B_1 &= B(\vec{b}_1, \lambda) - \qquad \quad - B_2 = B(\vec{b}_2, \lambda) -\end{aligned}$$ - -From the definition of the expectation value, -we know that the difference is given by: - -$$\begin{aligned} - \Expval{A_1 B_1} - \Expval{A_1 B_2} - = \int \rho(\lambda) \Big( A_1 B_1 - A_1 B_2 \Big) \dd{\lambda} -\end{aligned}$$ - -We introduce some new terms and rearrange the resulting expression: - -$$\begin{aligned} - \Expval{A_1 B_1} - \Expval{A_1 B_2} - &= \int \rho(\lambda) \Big( A_1 B_1 - A_1 B_2 \pm A_1 B_1 A_2 B_2 \mp A_1 B_1 A_2 B_2 \Big) \dd{\lambda} - \\ - &= \int \rho(\lambda) A_1 B_1 \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} - - \!\int \rho(\lambda) A_1 B_2 \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} -\end{aligned}$$ - -Taking the absolute value of both sides -and invoking the triangle inequality then yields: - -$$\begin{aligned} - \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| - &= \bigg|\! \int \rho(\lambda) A_1 B_1 \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} - - \!\int \rho(\lambda) A_1 B_2 \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} \!\bigg| - \\ - &\le \bigg|\! \int \rho(\lambda) A_1 B_1 \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} \!\bigg| - + \bigg|\! 
\int \rho(\lambda) A_1 B_2 \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} \!\bigg| -\end{aligned}$$ - -Using the fact that the product of $$A$$ and $$B$$ is always either $$-1$$ or $$+1$$, -we can reduce this to: - -$$\begin{aligned} - \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| - &\le \int \rho(\lambda) \Big| A_1 B_1 \Big| \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} - + \!\int \rho(\lambda) \Big| A_1 B_2 \Big| \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} - \\ - &\le \int \rho(\lambda) \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} - + \!\int \rho(\lambda) \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} -\end{aligned}$$ - -Evaluating these integrals gives us the following inequality, -which holds for both choices of $$\pm$$: - -$$\begin{aligned} - \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| - &\le 2 \pm \Expval{A_2 B_2} \pm \Expval{A_2 B_1} -\end{aligned}$$ - -We should choose the signs such that the right-hand side is as small as possible, that is: - -$$\begin{aligned} - \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| - &\le 2 \pm \Big( \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big) - \\ - &\le 2 - \Big| \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big| -\end{aligned}$$ - -Rearranging this and once again using the triangle inequality, -we get the CHSH inequality: - -$$\begin{aligned} - 2 - &\ge \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| + \Big| \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big| - \\ - &\ge \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} + \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big| -\end{aligned}$$ - -The quantity on the right-hand side is sometimes called the **CHSH quantity** $$S$$, -and measures the correlation between the spins of $$A$$ and $$B$$: - -$$\begin{aligned} - \boxed{ - S \equiv \Expval{A_2 B_1} + \Expval{A_2 B_2} + \Expval{A_1 B_1} - \Expval{A_1 B_2} - } -\end{aligned}$$ - -The CHSH inequality places an upper bound on the magnitude of $$S$$ -for LHV-based theories: - -$$\begin{aligned} - \boxed{ - |S| \le 2 - } -\end{aligned}$$ - - -## Tsirelson's bound - -Quantum physics can violate 
the CHSH inequality, but by how much? -Consider the following two-particle operator, -whose expectation value is the CHSH quantity, i.e. $$S = \expval{\hat{S}}$$: - -$$\begin{aligned} - \hat{S} - = \hat{A}_2 \otimes \hat{B}_1 + \hat{A}_2 \otimes \hat{B}_2 + \hat{A}_1 \otimes \hat{B}_1 - \hat{A}_1 \otimes \hat{B}_2 -\end{aligned}$$ - -Where $$\otimes$$ is the tensor product, -and e.g. $$\hat{A}_1$$ is the Pauli matrix for the $$\vec{a}_1$$-direction. -The square of this operator is then given by: - -$$\begin{aligned} - \hat{S}^2 - = \quad &\hat{A}_2^2 \otimes \hat{B}_1^2 + \hat{A}_2^2 \otimes \hat{B}_1 \hat{B}_2 - + \hat{A}_2 \hat{A}_1 \otimes \hat{B}_1^2 - \hat{A}_2 \hat{A}_1 \otimes \hat{B}_1 \hat{B}_2 - \\ - + &\hat{A}_2^2 \otimes \hat{B}_2 \hat{B}_1 + \hat{A}_2^2 \otimes \hat{B}_2^2 - + \hat{A}_2 \hat{A}_1 \otimes \hat{B}_2 \hat{B}_1 - \hat{A}_2 \hat{A}_1 \otimes \hat{B}_2^2 - \\ - + &\hat{A}_1 \hat{A}_2 \otimes \hat{B}_1^2 + \hat{A}_1 \hat{A}_2 \otimes \hat{B}_1 \hat{B}_2 - + \hat{A}_1^2 \otimes \hat{B}_1^2 - \hat{A}_1^2 \otimes \hat{B}_1 \hat{B}_2 - \\ - - &\hat{A}_1 \hat{A}_2 \otimes \hat{B}_2 \hat{B}_1 - \hat{A}_1 \hat{A}_2 \otimes \hat{B}_2^2 - - \hat{A}_1^2 \otimes \hat{B}_2 \hat{B}_1 + \hat{A}_1^2 \otimes \hat{B}_2^2 - \\ - = \quad &\hat{A}_2^2 \otimes \hat{B}_1^2 + \hat{A}_2^2 \otimes \hat{B}_2^2 + \hat{A}_1^2 \otimes \hat{B}_1^2 + \hat{A}_1^2 \otimes \hat{B}_2^2 - \\ - + &\hat{A}_2^2 \otimes \acomm{\hat{B}_1}{\hat{B}_2} - \hat{A}_1^2 \otimes \acomm{\hat{B}_1}{\hat{B}_2} - + \acomm{\hat{A}_1}{\hat{A}_2} \otimes \hat{B}_1^2 - \acomm{\hat{A}_1}{\hat{A}_2} \otimes \hat{B}_2^2 - \\ - + &\hat{A}_1 \hat{A}_2 \otimes \comm{\hat{B}_1}{\hat{B}_2} - \hat{A}_2 \hat{A}_1 \otimes \comm{\hat{B}_1}{\hat{B}_2} -\end{aligned}$$ - -Spin operators are unitary, so their square is the identity, -e.g. $$\hat{A}_1^2 = \hat{I}$$. 
Therefore $$\hat{S}^2$$ reduces to: - -$$\begin{aligned} - \hat{S}^2 - &= 4 \: (\hat{I} \otimes \hat{I}) + \comm{\hat{A}_1}{\hat{A}_2} \otimes \comm{\hat{B}_1}{\hat{B}_2} -\end{aligned}$$ - -The *norm* $$\norm{\hat{S}^2}$$ of this operator -is the largest possible expectation value $$\expval{\hat{S}^2}$$, -which is the same as its largest eigenvalue. -It is given by: - -$$\begin{aligned} - \Norm{\hat{S}^2} - &= 4 + \Norm{\comm{\hat{A}_1}{\hat{A}_2} \otimes \comm{\hat{B}_1}{\hat{B}_2}} - \\ - &\le 4 + \Norm{\comm{\hat{A}_1}{\hat{A}_2}} \Norm{\comm{\hat{B}_1}{\hat{B}_2}} -\end{aligned}$$ - -We find a bound for the norm of the commutators by using the triangle inequality, such that: - -$$\begin{aligned} - \Norm{\comm{\hat{A}_1}{\hat{A}_2}} - = \Norm{\hat{A}_1 \hat{A}_2 - \hat{A}_2 \hat{A}_1} - \le \Norm{\hat{A}_1 \hat{A}_2} + \Norm{\hat{A}_2 \hat{A}_1} - \le 2 \Norm{\hat{A}_1 \hat{A}_2} - \le 2 -\end{aligned}$$ - -And $$\norm{\comm{\hat{B}_1}{\hat{B}_2}} \le 2$$ for the same reason. -The norm is the largest eigenvalue, therefore: - -$$\begin{aligned} - \Norm{\hat{S}^2} - \le 4 + 2 \cdot 2 - = 8 - \quad \implies \quad - \Norm{\hat{S}} - \le \sqrt{8} - = 2 \sqrt{2} -\end{aligned}$$ - -We thus arrive at **Tsirelson's bound**, -which states that quantum mechanics can violate -the CHSH inequality by a factor of $$\sqrt{2}$$: - -$$\begin{aligned} - \boxed{ - |S| - \le 2 \sqrt{2} - } -\end{aligned}$$ - -Importantly, this is a *tight* bound, -meaning that there exist certain spin measurement directions -for which Tsirelson's bound becomes an equality, for example: - -$$\begin{aligned} - \hat{A}_1 = \hat{\sigma}_z - \qquad - \hat{A}_2 = \hat{\sigma}_x - \qquad - \hat{B}_1 = \frac{\hat{\sigma}_z + \hat{\sigma}_x}{\sqrt{2}} - \qquad - \hat{B}_2 = \frac{\hat{\sigma}_z - \hat{\sigma}_x}{\sqrt{2}} -\end{aligned}$$ - -Using the fact that $$\Expval{A_a B_b} = - \vec{a} \cdot \vec{b}$$, -it can then be shown that $$S = 2 \sqrt{2}$$ in this case. - - ## References 1. D.J. 
Griffiths, D.F. Schroeter, diff --git a/source/know/concept/chsh-inequality/index.md b/source/know/concept/chsh-inequality/index.md new file mode 100644 index 0000000..984bae6 --- /dev/null +++ b/source/know/concept/chsh-inequality/index.md @@ -0,0 +1,273 @@ +--- +title: "CHSH inequality" +sort_title: "CHSH inequality" +date: 2023-02-05 +categories: +- Physics +- Quantum mechanics +- Quantum information +layout: "concept" +--- + +The **Clauser-Horne-Shimony-Holt (CHSH) inequality** +is an alternative proof of [Bell's theorem](/know/concept/bells-theorem/), +which takes a slightly different approach +and is more useful in practice. + +Suppose there is a local hidden variable (LHV) $$\lambda$$ +with an unknown probability density $$\rho$$: + +$$\begin{aligned} + \int \rho(\lambda) \dd{\lambda} = 1 + \qquad \quad + \rho(\lambda) \ge 0 +\end{aligned}$$ + +Given two spin-1/2 particles $$A$$ and $$B$$, +measuring their spins along arbitrary directions $$\vec{a}$$ and $$\vec{b}$$ +would give each an eigenvalue $$\pm 1$$. We write this as: + +$$\begin{aligned} + A(\vec{a}, \lambda) = \pm 1 + \qquad \quad + B(\vec{b}, \lambda) = \pm 1 +\end{aligned}$$ + +If $$A$$ and $$B$$ start in an entangled [Bell state](/know/concept/bell-state/), +e.g. $$\ket{\Psi^{-}}$$, then we expect a correlation between their measurement results. +The expectation value of the product of the outcomes of $$A$$ and $$B$$ is: + +$$\begin{aligned} + \Expval{A_a B_b} + \equiv \int \rho(\lambda) \: A(\vec{a}, \lambda) \: B(\vec{b}, \lambda) \dd{\lambda} +\end{aligned}$$ + +So far, we have taken the same path as for proving Bell's inequality, +but for the CHSH inequality we must now diverge. + + + +## Deriving the inequality + +Consider four spin directions, two for $$A$$ called $$\vec{a}_1$$ and $$\vec{a}_2$$, +and two for $$B$$ called $$\vec{b}_1$$ and $$\vec{b}_2$$. 
+Let us introduce the following abbreviations: + +$$\begin{aligned} + A_1 \equiv A(\vec{a}_1, \lambda) + \qquad \quad + A_2 \equiv A(\vec{a}_2, \lambda) + \qquad \quad + B_1 \equiv B(\vec{b}_1, \lambda) + \qquad \quad + B_2 \equiv B(\vec{b}_2, \lambda) +\end{aligned}$$ + +From the definition of the expectation value, +we know that the difference is given by: + +$$\begin{aligned} + \Expval{A_1 B_1} - \Expval{A_1 B_2} + = \int \rho(\lambda) \Big( A_1 B_1 - A_1 B_2 \Big) \dd{\lambda} +\end{aligned}$$ + +We introduce some new terms and rearrange the resulting expression: + +$$\begin{aligned} + \Expval{A_1 B_1} - \Expval{A_1 B_2} + &= \int \rho(\lambda) \Big( A_1 B_1 - A_1 B_2 \pm A_1 B_1 A_2 B_2 \mp A_1 B_1 A_2 B_2 \Big) \dd{\lambda} + \\ + &= \int \rho(\lambda) A_1 B_1 \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} + - \!\int \rho(\lambda) A_1 B_2 \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} +\end{aligned}$$ + +Taking the absolute value of both sides +and invoking the triangle inequality then yields: + +$$\begin{aligned} + \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| + &= \bigg|\! \int \rho(\lambda) A_1 B_1 \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} + - \!\int \rho(\lambda) A_1 B_2 \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} \!\bigg| + \\ + &\le \bigg|\! \int \rho(\lambda) A_1 B_1 \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} \!\bigg| + + \bigg|\! 
\int \rho(\lambda) A_1 B_2 \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} \!\bigg| +\end{aligned}$$ + +Using the fact that the product of the spin eigenvalues of $$A$$ and $$B$$ +is always either $$-1$$ or $$+1$$ for all directions, +we can reduce this to: + +$$\begin{aligned} + \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| + &\le \int \rho(\lambda) \Big| A_1 B_1 \Big| \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} + + \!\int \rho(\lambda) \Big| A_1 B_2 \Big| \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} + \\ + &\le \int \rho(\lambda) \Big( 1 \pm A_2 B_2 \Big) \dd{\lambda} + + \!\int \rho(\lambda) \Big( 1 \pm A_2 B_1 \Big) \dd{\lambda} +\end{aligned}$$ + +Evaluating these integrals gives us the following inequality, +which holds for both choices of $$\pm$$: + +$$\begin{aligned} + \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| + &\le 2 \pm \Expval{A_2 B_2} \pm \Expval{A_2 B_1} +\end{aligned}$$ + +We should choose the signs such that the right-hand side is as small as possible, that is: + +$$\begin{aligned} + \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| + &\le 2 \pm \Big( \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big) + \\ + &\le 2 - \Big| \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big| +\end{aligned}$$ + +Rearranging this and once again using the triangle inequality, +we get the CHSH inequality: + +$$\begin{aligned} + 2 + &\ge \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} \Big| + \Big| \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big| + \\ + &\ge \Big| \Expval{A_1 B_1} - \Expval{A_1 B_2} + \Expval{A_2 B_2} + \Expval{A_2 B_1} \Big| +\end{aligned}$$ + +The quantity on the right-hand side is sometimes called the **CHSH quantity** $$S$$, +and measures the correlation between the spins of $$A$$ and $$B$$: + +$$\begin{aligned} + \boxed{ + S \equiv \Expval{A_2 B_1} + \Expval{A_2 B_2} + \Expval{A_1 B_1} - \Expval{A_1 B_2} + } +\end{aligned}$$ + +The CHSH inequality places an upper bound on the magnitude of $$S$$ +for LHV-based theories: + +$$\begin{aligned} + \boxed{ + |S| \le 2 + } +\end{aligned}$$ + + + +## 
Tsirelson's bound + +Quantum physics can violate the CHSH inequality, but by how much? +Consider the following two-particle operator, +whose expectation value is the CHSH quantity, i.e. $$S = \expval{\hat{S}}$$: + +$$\begin{aligned} + \hat{S} + = \hat{A}_2 \otimes \hat{B}_1 + \hat{A}_2 \otimes \hat{B}_2 + \hat{A}_1 \otimes \hat{B}_1 - \hat{A}_1 \otimes \hat{B}_2 +\end{aligned}$$ + +Where $$\otimes$$ is the tensor product, +and e.g. $$\hat{A}_1$$ is the Pauli matrix for the $$\vec{a}_1$$-direction. +The square of this operator is then given by: + +$$\begin{aligned} + \hat{S}^2 + = \quad &\hat{A}_2^2 \otimes \hat{B}_1^2 + \hat{A}_2^2 \otimes \hat{B}_1 \hat{B}_2 + + \hat{A}_2 \hat{A}_1 \otimes \hat{B}_1^2 - \hat{A}_2 \hat{A}_1 \otimes \hat{B}_1 \hat{B}_2 + \\ + + &\hat{A}_2^2 \otimes \hat{B}_2 \hat{B}_1 + \hat{A}_2^2 \otimes \hat{B}_2^2 + + \hat{A}_2 \hat{A}_1 \otimes \hat{B}_2 \hat{B}_1 - \hat{A}_2 \hat{A}_1 \otimes \hat{B}_2^2 + \\ + + &\hat{A}_1 \hat{A}_2 \otimes \hat{B}_1^2 + \hat{A}_1 \hat{A}_2 \otimes \hat{B}_1 \hat{B}_2 + + \hat{A}_1^2 \otimes \hat{B}_1^2 - \hat{A}_1^2 \otimes \hat{B}_1 \hat{B}_2 + \\ + - &\hat{A}_1 \hat{A}_2 \otimes \hat{B}_2 \hat{B}_1 - \hat{A}_1 \hat{A}_2 \otimes \hat{B}_2^2 + - \hat{A}_1^2 \otimes \hat{B}_2 \hat{B}_1 + \hat{A}_1^2 \otimes \hat{B}_2^2 + \\ + = \quad &\hat{A}_2^2 \otimes \hat{B}_1^2 + \hat{A}_2^2 \otimes \hat{B}_2^2 + \hat{A}_1^2 \otimes \hat{B}_1^2 + \hat{A}_1^2 \otimes \hat{B}_2^2 + \\ + + &\hat{A}_2^2 \otimes \acomm{\hat{B}_1}{\hat{B}_2} - \hat{A}_1^2 \otimes \acomm{\hat{B}_1}{\hat{B}_2} + + \acomm{\hat{A}_1}{\hat{A}_2} \otimes \hat{B}_1^2 - \acomm{\hat{A}_1}{\hat{A}_2} \otimes \hat{B}_2^2 + \\ + + &\hat{A}_1 \hat{A}_2 \otimes \comm{\hat{B}_1}{\hat{B}_2} - \hat{A}_2 \hat{A}_1 \otimes \comm{\hat{B}_1}{\hat{B}_2} +\end{aligned}$$ + +Spin operators are both Hermitian and unitary, so their square is the identity, +e.g. $$\hat{A}_1^2 = \hat{I}$$. 
Therefore $$\hat{S}^2$$ reduces to: + +$$\begin{aligned} + \hat{S}^2 + &= 4 \: (\hat{I} \otimes \hat{I}) + \comm{\hat{A}_1}{\hat{A}_2} \otimes \comm{\hat{B}_1}{\hat{B}_2} +\end{aligned}$$ + +The *norm* $$\norm{\hat{S}^2}$$ of this operator +is the largest possible expectation value $$\expval{\hat{S}^2}$$, +which is the same as its largest eigenvalue. +It is given by: + +$$\begin{aligned} + \Norm{\hat{S}^2} + &\le 4 + \Norm{\comm{\hat{A}_1}{\hat{A}_2} \otimes \comm{\hat{B}_1}{\hat{B}_2}} + \\ + &\le 4 + \Norm{\comm{\hat{A}_1}{\hat{A}_2}} \Norm{\comm{\hat{B}_1}{\hat{B}_2}} +\end{aligned}$$ + +We find a bound for the norm of the commutators by using the triangle inequality, such that: + +$$\begin{aligned} + \Norm{\comm{\hat{A}_1}{\hat{A}_2}} + = \Norm{\hat{A}_1 \hat{A}_2 - \hat{A}_2 \hat{A}_1} + \le \Norm{\hat{A}_1 \hat{A}_2} + \Norm{\hat{A}_2 \hat{A}_1} + \le 2 \Norm{\hat{A}_1 \hat{A}_2} + \le 2 +\end{aligned}$$ + +And $$\norm{\comm{\hat{B}_1}{\hat{B}_2}} \le 2$$ for the same reason. +The norm is the largest eigenvalue, therefore: + +$$\begin{aligned} + \Norm{\hat{S}^2} + \le 4 + 2 \cdot 2 + = 8 + \quad \implies \quad + \Norm{\hat{S}} + \le \sqrt{8} + = 2 \sqrt{2} +\end{aligned}$$ + +We thus arrive at **Tsirelson's bound**, +which states that quantum mechanics can violate +the CHSH inequality by a factor of $$\sqrt{2}$$: + +$$\begin{aligned} + \boxed{ + |S| + \le 2 \sqrt{2} + } +\end{aligned}$$ + +Importantly, this is a *tight* bound, +meaning that there exist certain spin measurement directions +for which Tsirelson's bound becomes an equality, for example: + +$$\begin{aligned} + \hat{A}_1 = \hat{\sigma}_x + \qquad + \hat{A}_2 = \hat{\sigma}_z + \qquad + \hat{B}_1 = \frac{\hat{\sigma}_z + \hat{\sigma}_x}{\sqrt{2}} + \qquad + \hat{B}_2 = \frac{\hat{\sigma}_z - \hat{\sigma}_x}{\sqrt{2}} +\end{aligned}$$ + +Fundamental quantum mechanics says that +$$\Expval{A_a B_b} = - \vec{a} \cdot \vec{b}$$, +so $$S = - 2 \sqrt{2}$$ in this case, i.e. $$|S| = 2 \sqrt{2}$$. + + + +## References +1. D.J. 
Griffiths, D.F. Schroeter, + *Introduction to quantum mechanics*, 3rd edition, + Cambridge. +2. J.B. Brask, + *Quantum information: lecture notes*, + 2021, unpublished. diff --git a/source/know/concept/lagrange-multiplier/index.md b/source/know/concept/lagrange-multiplier/index.md index ce5418f..9fb61a8 100644 --- a/source/know/concept/lagrange-multiplier/index.md +++ b/source/know/concept/lagrange-multiplier/index.md @@ -102,7 +102,7 @@ by demanding it is stationary: $$\begin{aligned} 0 - = \nabla \mathcal{L} + = \nabla' \mathcal{L} &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg) \\ &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg) diff --git a/source/know/concept/pulay-mixing/index.md b/source/know/concept/pulay-mixing/index.md index 6e809dd..81051f1 100644 --- a/source/know/concept/pulay-mixing/index.md +++ b/source/know/concept/pulay-mixing/index.md @@ -8,68 +8,70 @@ layout: "concept" --- Some numerical problems are most easily solved *iteratively*, -by generating a series $$\rho_1$$, $$\rho_2$$, etc. -converging towards the desired solution $$\rho_*$$. +by generating a series of "solutions" $$f_1$$, $$f_2$$, etc. +converging towards the true $$f_\infty$$. **Pulay mixing**, also often called **direct inversion in the iterative subspace** (DIIS), can speed up the convergence for some types of problems, and also helps to avoid periodic divergences. The key concept it relies on is the **residual vector** $$R_n$$ -of the $$n$$th iteration, which in some way measures the error of the current $$\rho_n$$. -Its exact definition varies, -but is generally along the lines of the difference between -the input of the iteration and the raw resulting output: +of the $$n$$th iteration, which measures the error of the current $$f_n$$. 
+Its exact definition can vary, +but it is generally the difference between +the input $$f_n$$ of the $$n$$th iteration +and the raw resulting output $$f_n^\mathrm{new}$$: $$\begin{aligned} R_n - = R[\rho_n] - = \rho_n^\mathrm{new}[\rho_n] - \rho_n + \equiv R[f_n] + \equiv f_n^\mathrm{new}[f_n] - f_n \end{aligned}$$ -It is not always clear what to do with $$\rho_n^\mathrm{new}$$. -Directly using it as the next input ($$\rho_{n+1} = \rho_n^\mathrm{new}$$) +It is not always clear what to do with $$f_n^\mathrm{new}$$. +Directly using it as the next input ($$f_{n+1} = f_n^\mathrm{new}$$) often leads to oscillation, -and linear mixing ($$\rho_{n+1} = (1\!-\!f) \rho_n + f \rho_n^\mathrm{new}$$) +and linear mixing ($$f_{n+1} = (1\!-\!c) f_n + c f_n^\mathrm{new}$$) can take a very long time to converge properly. Pulay mixing offers an improvement. -The idea is to construct the next iteration's input $$\rho_{n+1}$$ -as a linear combination of the previous inputs $$\rho_1$$, $$\rho_2$$, ..., $$\rho_n$$, -such that it is as close as possible to the optimal $$\rho_*$$: +The idea is to construct the next iteration's input $$f_{n+1}$$ +as a linear combination of the previous inputs $$f_1$$, $$f_2$$, ..., $$f_n$$, +such that it is as close as possible to the optimal $$f_\infty$$: $$\begin{aligned} \boxed{ - \rho_{n+1} - = \sum_{m = 1}^n \alpha_m \rho_m + f_{n+1} + = \sum_{m = 1}^n \alpha_m f_m } \end{aligned}$$ To do so, we make two assumptions. -Firstly, the current $$\rho_n$$ is already close to $$\rho_*$$, -so that such a linear combination makes sense. -Secondly, the iteration is linear, -such that the raw output $$\rho_{n+1}^\mathrm{new}$$ -is also a linear combination with the *same coefficients*: +First, that the current $$f_n$$ is already close to $$f_\infty$$, +so such a linear combination makes sense. 
+Second, that the iteration is linear, +so the raw output $$f_{n+1}^\mathrm{new}$$ +is also a linear combination *with the same coefficients*: $$\begin{aligned} - \rho_{n+1}^\mathrm{new} - = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} + f_{n+1}^\mathrm{new} + = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new} \end{aligned}$$ -We will return to these assumptions later. -The point is that $$R_{n+1}$$ is also a linear combination: +We will revisit these assumptions later. +The point is that $$R_{n+1}$$ can now also be written +as a linear combination of old residuals $$R_m$$: $$\begin{aligned} R_{n+1} - = \rho_{n+1}^\mathrm{new} - \rho_{n+1} - = \sum_{m = 1}^n \alpha_m \rho_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m \rho_m + = f_{n+1}^\mathrm{new} - f_{n+1} + = \sum_{m = 1}^n \alpha_m f_m^\mathrm{new} - \sum_{m = 1}^n \alpha_m f_m = \sum_{m = 1}^n \alpha_m R_m \end{aligned}$$ The goal is to choose the coefficients $$\alpha_m$$ such that the norm of the error $$|R_{n+1}| \approx 0$$, -subject to the following constraint to preserve the normalization of $$\rho_{n+1}$$: +subject to the following constraint to preserve the normalization of $$f_{n+1}$$: $$\begin{aligned} \sum_{m=1}^n \alpha_m = 1 @@ -79,20 +81,19 @@ We thus want to minimize the following quantity, where $$\lambda$$ is a [Lagrange multiplier](/know/concept/lagrange-multiplier/): $$\begin{aligned} - \Inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^* - = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \Inprod{R_m}{R_k} + \lambda \Big) + \inprod{R_{n+1}}{R_{n+1}} + \lambda \sum_{m = 1}^n \alpha_m^* + = \sum_{m=1}^n \alpha_m^* \Big( \sum_{k=1}^n \alpha_k \inprod{R_m}{R_k} + \lambda \Big) \end{aligned}$$ By differentiating the right-hand side with respect to $$\alpha_m^*$$ and demanding that the result is zero, -we get a system of equations that we can write in matrix form, -which is cheap to solve: +we get a cheap-to-solve system of equations, in matrix form: $$\begin{aligned} \begin{bmatrix} - 
\Inprod{R_1}{R_1} & \cdots & \Inprod{R_1}{R_n} & 1 \\ + \inprod{R_1}{R_1} & \cdots & \inprod{R_1}{R_n} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ - \Inprod{R_n}{R_1} & \cdots & \Inprod{R_n}{R_n} & 1 \\ + \inprod{R_n}{R_1} & \cdots & \inprod{R_n}{R_n} & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \cdot @@ -106,48 +107,50 @@ $$\begin{aligned} \end{aligned}$$ From this, we can also see that the Lagrange multiplier -$$\lambda = - \Inprod{R_{n+1}}{R_{n+1}}$$, +$$\lambda = - \inprod{R_{n+1}}{R_{n+1}}$$, where $$R_{n+1}$$ is the *predicted* residual of the next iteration, subject to the two assumptions. +This fact makes $$\lambda$$ a useful measure of convergence. -However, in practice, the earlier inputs $$\rho_1$$, $$\rho_2$$, etc. -are much further from $$\rho_*$$ than $$\rho_n$$, -so usually only the most recent $$N\!+\!1$$ inputs $$\rho_{n - N}$$, ..., $$\rho_n$$ are used: +In practice, the earlier inputs $$f_1$$, $$f_2$$, etc. +are much further from $$f_\infty$$ than $$f_n$$, +so usually only the most recent $$N\!+\!1$$ inputs $$f_{n - N}, ..., f_n$$ are used. +This also keeps the matrix small: $$\begin{aligned} - \rho_{n+1} - = \sum_{m = n-N}^n \alpha_m \rho_m + f_{n+1} + = \sum_{m = n-N}^n \alpha_m f_m \end{aligned}$$ -You might be confused by the absence of any $$\rho_m^\mathrm{new}$$ -in the creation of $$\rho_{n+1}$$, as if the iteration's outputs are being ignored. +You might be confused by the absence of any $$f_m^\mathrm{new}$$ +in the creation of $$f_{n+1}$$, as if the iteration's outputs are being ignored. This is due to the first assumption, -which states that $$\rho_n^\mathrm{new}$$ and $$\rho_n$$ are already similar, +which states that $$f_n^\mathrm{new}$$ and $$f_n$$ are already similar, such that they are basically interchangeable. -Speaking of which, about those assumptions: -while they will clearly become more accurate as $$\rho_n$$ approaches $$\rho_*$$, -they might be very dubious in the beginning. 
-A consequence of this is that the early iterations might get "trapped" -in a suboptimal subspace spanned by $$\rho_1$$, $$\rho_2$$, etc. -To say it another way, we would be varying $$n$$ coefficients $$\alpha_m$$ -to try to optimize a $$D$$-dimensional $$\rho_{n+1}$$, -where in general $$D \gg n$$, at least in the beginning. - -There is an easy fix to this problem: -add a small amount of the raw residual $$R_m$$ -to "nudge" $$\rho_{n+1}$$ towards the right subspace, +Although those assumptions will clearly become more realistic as $$f_n \to f_\infty$$, +they might be very dubious at first. +Consequently, the early iterations may get "trapped" +in a suboptimal subspace spanned by $$f_1$$, $$f_2$$, etc. +Think of it like this: +we would be varying up to $$n$$ coefficients $$\alpha_m$$ +to try to optimize a $$D$$-dimensional $$f_{n+1}$$, where usually $$D \gg n$$. +It is almost impossible to find a decent optimum in this way! + +This problem is easy to fix, +by mixing in a small amount of the raw residuals $$R_m$$ +to "nudge" $$f_{n+1}$$ towards the right subspace, where $$\beta \in [0,1]$$ is a tunable parameter: $$\begin{aligned} \boxed{ - \rho_{n+1} - = \sum_{m = N}^n \alpha_m (\rho_m + \beta R_m) + f_{n+1} + = \sum_{m = n-N}^n \alpha_m (f_m + \beta R_m) } \end{aligned}$$ -In other words, we end up introducing a small amount of the raw outputs $$\rho_m^\mathrm{new}$$, -while still giving more weight to iterations with smaller residuals. +In this way, the raw outputs $$f_m^\mathrm{new}$$ are (rightfully) included via $$R_m$$, +but we still give more weight to iterations with smaller residuals. Pulay mixing is very effective for certain types of problems, e.g. density functional theory, -- cgit v1.2.3
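
The numbers quoted in the Bell and CHSH hunks of this patch can be checked with a few lines of Python. The sketch below is not part of the patch. It assumes the singlet-state prediction $$\expval{A_a B_b} = - \vec{a} \cdot \vec{b}$$ stated in the articles, and the helper names `unit`, `correlation` and `chsh` are hypothetical. The CHSH measurement directions are this sketch's own choice ($$\vec{a}_1$$ along $$x$$, $$\vec{a}_2$$ along $$z$$, and $$\vec{b}_{1,2}$$ along $$(z \pm x)/\sqrt{2}$$), combined using the definition of $$S$$ from the CHSH article.

```python
import math


def unit(angle):
    """Unit vector in the x-z plane, at `angle` radians from the z-axis."""
    return (math.sin(angle), 0.0, math.cos(angle))


def correlation(a, b):
    """Assumed quantum prediction for the singlet state: <A_a B_b> = -a.b"""
    return -sum(ai * bi for ai, bi in zip(a, b))


def chsh(a1, a2, b1, b2):
    """CHSH quantity S = <A2 B1> + <A2 B2> + <A1 B1> - <A1 B2>."""
    return (correlation(a2, b1) + correlation(a2, b2)
            + correlation(a1, b1) - correlation(a1, b2))


# Bell inequality: a and b orthogonal, c at an angle pi/4 to both of them.
a, b, c = unit(0.0), unit(math.pi / 2), unit(math.pi / 4)
bell_left = abs(correlation(a, b) - correlation(a, c))
bell_right = 1.0 + correlation(b, c)
print(f"Bell: {bell_left:.3f} <= {bell_right:.3f} ?")  # Bell: 0.707 <= 0.293 ?

# CHSH: directions chosen for this sketch (an assumption, see the lead-in).
a1, a2 = unit(math.pi / 2), unit(0.0)            # x-axis, z-axis
b1, b2 = unit(math.pi / 4), unit(-math.pi / 4)   # (z + x)/sqrt(2), (z - x)/sqrt(2)
S = chsh(a1, a2, b1, b2)
print(f"S = {S:.3f}")  # S = -2.828
```

With these inputs, the Bell inequality is violated (left side larger than right side), and $$|S|$$ reaches Tsirelson's bound $$2 \sqrt{2}$$, exceeding the LHV limit of $$2$$.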