From a39bb3b8aab1aeb4fceaedc54c756703819776c3 Mon Sep 17 00:00:00 2001
From: Prefetch
Date: Sat, 17 Dec 2022 18:19:26 +0100
Subject: Rewrite "Lagrange multiplier", various improvements

---
 source/know/concept/lagrange-multiplier/index.md | 187 ++++++++++++++---------
 1 file changed, 112 insertions(+), 75 deletions(-)

(limited to 'source/know/concept/lagrange-multiplier')

diff --git a/source/know/concept/lagrange-multiplier/index.md b/source/know/concept/lagrange-multiplier/index.md
index 8ee1054..a0b22aa 100644
--- a/source/know/concept/lagrange-multiplier/index.md
+++ b/source/know/concept/lagrange-multiplier/index.md
@@ -1,7 +1,7 @@
 ---
 title: "Lagrange multiplier"
 sort_title: "Lagrange multiplier"
-date: 2021-03-02
+date: 2022-12-17 # Originally 2021-03-02, major rewrite
 categories:
 - Mathematics
 - Physics
@@ -9,108 +9,145 @@ layout: "concept"
 ---
 
 The method of **Lagrange multipliers** or **undetermined multipliers**
-is a technique for optimizing (i.e. finding the extrema of)
-a function $$f(x, y, z)$$,
-subject to a given constraint $$\phi(x, y, z) = C$$,
-where $$C$$ is a constant.
-
-If we ignore the constraint $$\phi$$,
-optimizing $$f$$ simply comes down to finding stationary points:
+is a technique for optimizing (i.e. finding extrema of)
+a function $$f$$ subject to **equality constraints**.
+For example, in 2D, the goal is to maximize/minimize $$f(x, y)$$
+while satisfying $$g(x, y) = 0$$.
+We assume that $$f$$ and $$g$$ are both continuous
+and have continuous first derivatives,
+and that their domain is all of $$\mathbb{R}$$.
+
+Side note: many authors write that Lagrange multipliers
+can be used for constraints of the form $$g(x, y) = c$$ for a constant $$c$$.
+However, this method technically requires $$c = 0$$.
+This issue is easy to solve: given $$g = c$$,
+simply define $$\tilde{g} \equiv g - c = 0$$
+and use that as constraint instead.
+
+Before introducing $$g$$,
+optimizing $$f$$ comes down to finding its stationary points:
 
 $$\begin{aligned}
-    0 &= \dd{f} = f_x \dd{x} + f_y \dd{y} + f_z \dd{z}
+    0
+    &= \nabla f
+    = \bigg( \pdv{f}{x}, \pdv{f}{y} \bigg)
 \end{aligned}$$
 
-This problem is easy:
-$$\dd{x}$$, $$\dd{y}$$, and $$\dd{z}$$ are independent and arbitrary,
-so all we need to do is find the roots of
-the partial derivatives $$f_x$$, $$f_y$$ and $$f_z$$,
-which we respectively call $$x_0$$, $$y_0$$ and $$z_0$$,
-and then the extremum is simply $$(x_0, y_0, z_0)$$.
-
-But the constraint $$\phi$$, over which we have no control,
-adds a relation between $$\dd{x}$$, $$\dd{y}$$, and $$\dd{z}$$,
-so if two are known, the third is given by $$\phi = C$$.
-The problem is then a system of equations:
+This problem is easy: the two dimensions can be handled independently,
+so all we need to do is find the roots of the partial derivatives.
+
+However, adding $$g$$ makes the problem much more complicated:
+points with $$\nabla f = 0$$ might not satisfy $$g = 0$$,
+and points where $$g = 0$$ might not have $$\nabla f = 0$$.
+The dimensions also cannot be handled independently anymore,
+since they are implicitly related by $$g$$.
+
+Imagine a contour plot of $$g(x, y)$$.
+The trick is this: if we follow a contour of $$g = 0$$,
+the highest and lowest values of $$f$$ along the way
+are the desired local extrema.
+Recall our assumption that $$\nabla f$$ is continuous:
+hence *along our contour* $$f$$ is slowly-varying
+in the close vicinity of each such point,
+and stationary at the point itself.
+We thus have two categories of extrema:
+
+1.  $$\nabla f = 0$$ there,
+    i.e. $$f$$ is slowly-varying along *all* directions around the point.
+    In other words, a stationary point of $$f$$
+    coincidentally lies on a contour of $$g = 0$$.
+
+2.  The contours of $$f$$ and $$g$$ are parallel around the point.
+    By definition, $$f$$ is stationary along each of its contours,
+    so when we find that $$f$$ is stationary at a point on our $$g = 0$$ path,
+    it means we touched a contour of $$f$$.
+    Obviously, each point of $$f$$ lies on *some* contour,
+    but if they are not parallel,
+    then $$f$$ is increasing or decreasing along our path,
+    so this is not an extremum and we must continue our search.
+
+What about the edge case that $$g = 0$$ and $$\nabla g = 0$$ in the same point,
+i.e. we locally have no contour to follow?
+Do we just take whatever value $$f$$ has there?
+No, by convention, we do not,
+because this does not really count as *optimizing* $$f$$.
+
+Now, in the 2nd category, parallel contours imply parallel gradients,
+i.e. $$\nabla f$$ and $$\nabla g$$ differ only in magnitude, not direction.
+Formally:
 
 $$\begin{aligned}
-    0 &= \dd{f} = f_x \dd{x} + f_y \dd{y} + f_z \dd{z}
-    \\
-    0 &= \dd{\phi} = \phi_x \dd{x} + \phi_y \dd{y} + \phi_z \dd{z}
+    \nabla f = -\lambda \nabla g
 \end{aligned}$$
 
-Solving this directly would be a delicate balancing act
-of all the partial derivatives.
-
-To help us solve this, we introduce a "dummy" parameter $$\lambda$$,
-the so-called **Lagrange multiplier**,
-and contruct a new function $$L$$ given by:
-
-$$\begin{aligned}
-    L(x, y, z) = f(x, y, z) + \lambda \phi(x, y, z)
-\end{aligned}$$
+Where $$\lambda$$ is the **Lagrange multiplier**
+that quantifies the difference in magnitude between the gradients.
+By setting $$\lambda = 0$$, this equation also handles the 1st category $$\nabla f = 0$$.
+Some authors define $$\lambda$$ with the opposite sign.
 
-At the extremum, $$\dd{L} = \dd{f} + \lambda \dd{\phi} = 0$$,
-so now the problem is a "single" equation again:
+The method of Lagrange multipliers uses these facts
+to rewrite a constrained $$N$$-dimensional optimization problem
+as an unconstrained $$(N\!+\!1)$$-dimensional optimization problem
+by defining the **Lagrangian function** $$\mathcal{L}$$ as follows:
 
 $$\begin{aligned}
-    0 = \dd{L}
-    = (f_x + \lambda \phi_x) \dd{x} + (f_y + \lambda \phi_y) \dd{y} + (f_z + \lambda \phi_z) \dd{z}
+    \boxed{
+        \mathcal{L}(x, y, \lambda)
+        \equiv f(x, y) + \lambda g(x, y)
+    }
 \end{aligned}$$
 
-Assuming $$\phi_z \neq 0$$, we now choose $$\lambda$$ such that $$f_z + \lambda \phi_z = 0$$.
-This choice represents satisfying the constraint,
-so now the remaining $$\dd{x}$$ and $$\dd{y}$$ are independent again,
-and we simply have to find the roots of $$f_x + \lambda \phi_x$$ and $$f_y + \lambda \phi_y$$.
-
-In effect, after introducing $$\lambda$$,
-we have four unknowns $$(x, y, z, \lambda)$$,
-but also four equations:
+Let us do an unconstrained optimization of $$\mathcal{L}$$ as usual,
+by demanding it is stationary:
 
 $$\begin{aligned}
-    L_x = L_y = L_z = 0
-    \qquad \quad
-    \phi = C
+    0
+    = \nabla \mathcal{L}
+    &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg)
+    \\
+    &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg)
 \end{aligned}$$
 
-We are only really interested in the first three unknowns $$(x, y, z)$$,
-so $$\lambda$$ is sometimes called the **undetermined multiplier**,
-since it is just an algebraic helper whose value is irrelevant.
-
-This method generalizes nicely to multiple constraints or more variables:
-suppose that we want to find the extrema of $$f(x_1, ..., x_N)$$
+The last item in this vector represents $$g = 0$$,
+and the others $$\nabla f = -\lambda \nabla g$$ as discussed earlier.
+To solve this equation,
+we assign $$\lambda$$ a value that agrees with it
+(such a value exists for each local extremum
+according to our above discussion of the two categories),
+and then find the locations $$(x, y)$$ that satisfy it.
+However, as usual for optimization problems,
+this method only finds *local* extrema *and* saddle points;
+it is a necessary condition for optimality, but not sufficient.
+
+We often assign $$\lambda$$ an algebraic expression rather than a value,
+usually without even bothering to calculate its final actual value.
+In fact, in some cases, $$\lambda$$'s only function is to help us reason
+about the interdependence of a system of equations
+(see [example 3](https://en.wikipedia.org/wiki/Lagrange_multiplier#Example_3:_Entropy) on Wikipedia);
+then $$\lambda$$ is not even given an expression!
+Hence it is sometimes also called an *undetermined multiplier*.
+
+This method generalizes nicely to multiple constraints or more variables.
+Suppose that we want to find the extrema of $$f(x_1, ..., x_N)$$
 subject to $$M < N$$ conditions:
 
 $$\begin{aligned}
-    \phi_1(x_1, ..., x_N) = C_1 \qquad \cdots \qquad \phi_M(x_1, ..., x_N) = C_M
-\end{aligned}$$
-
-This once again turns into a delicate system of $$M+1$$ equations to solve:
-
-$$\begin{aligned}
-    0 &= \dd{f} = f_{x_1} \dd{x_1} + ... + f_{x_N} \dd{x_N}
-    \\
-    0 &= \dd{\phi_1} = \phi_{1, x_1} \dd{x_1} + ... + \phi_{1, x_N} \dd{x_N}
-    \\
-    &\vdots
-    \\
-    0 &= \dd{\phi_M} = \phi_{M, x_1} \dd{x_1} + ... + \phi_{M, x_N} \dd{x_N}
+    g_1(x_1, ..., x_N) = c_1
+    \qquad \cdots \qquad
+    g_M(x_1, ..., x_N) = c_M
 \end{aligned}$$
 
 Then we introduce $$M$$ Lagrange multipliers $$\lambda_1, ..., \lambda_M$$
-and define $$L(x_1, ..., x_N)$$:
+and define $$\mathcal{L}(x_1, ..., x_N)$$:
 
 $$\begin{aligned}
-    L = f + \sum_{m = 1}^M \lambda_m \phi_m
+    \mathcal{L} \equiv f + \sum_{m = 1}^M \lambda_m g_m
 \end{aligned}$$
 
-As before, we set $$\dd{L} = 0$$ and choose the multipliers $$\lambda_1, ..., \lambda_M$$
-to eliminate $$M$$ of its $$N$$ terms:
+As before, we set $$\nabla \mathcal{L} = 0$$ and choose the multipliers
+$$\lambda_1, ..., \lambda_M$$ to satisfy the resulting system of $$(N\!+\!M)$$ 1D equations,
+and then find the coordinates of the extrema.
 
-$$\begin{aligned}
-    0 = \dd{L}
-    = \sum_{n = 1}^N \Big( f_{x_n} + \sum_{m = 1}^M \lambda_m \phi_{x_n} \Big) \dd{x_n}
-\end{aligned}$$
 
 
 ## References
-- 
cgit v1.2.3