| author | Prefetch | 2022-12-17 18:19:26 +0100 |
|---|---|---|
| committer | Prefetch | 2022-12-17 18:20:50 +0100 |
| commit | a39bb3b8aab1aeb4fceaedc54c756703819776c3 (patch) | |
| tree | b21ecb4677745fb8c275e54f2ad9d4c2e775a3d8 /source/know/concept/lagrange-multiplier | |
| parent | 49cc36648b489f7d1c75e1fde79f0990e08dd514 (diff) | |
Rewrite "Lagrange multiplier", various improvements
Diffstat (limited to 'source/know/concept/lagrange-multiplier')
-rw-r--r-- | source/know/concept/lagrange-multiplier/index.md | 187 |
1 file changed, 112 insertions(+), 75 deletions(-)
diff --git a/source/know/concept/lagrange-multiplier/index.md b/source/know/concept/lagrange-multiplier/index.md
index 8ee1054..a0b22aa 100644
--- a/source/know/concept/lagrange-multiplier/index.md
+++ b/source/know/concept/lagrange-multiplier/index.md
@@ -1,7 +1,7 @@
 ---
 title: "Lagrange multiplier"
 sort_title: "Lagrange multiplier"
-date: 2021-03-02
+date: 2022-12-17 # Originally 2021-03-02, major rewrite
 categories:
 - Mathematics
 - Physics
@@ -9,108 +9,145 @@ layout: "concept"
 ---
 
 The method of **Lagrange multipliers** or **undetermined multipliers**
-is a technique for optimizing (i.e. finding the extrema of)
-a function $$f(x, y, z)$$,
-subject to a given constraint $$\phi(x, y, z) = C$$,
-where $$C$$ is a constant.
-
-If we ignore the constraint $$\phi$$,
-optimizing $$f$$ simply comes down to finding stationary points:
+is a technique for optimizing (i.e. finding extrema of)
+a function $$f$$ subject to **equality constraints**.
+For example, in 2D, the goal is to maximize/minimize $$f(x, y)$$
+while satisfying $$g(x, y) = 0$$.
+We assume that $$f$$ and $$g$$ are both continuous
+and have continuous first derivatives,
+and that their domain is all of $$\mathbb{R}^2$$.
+
+Side note: many authors write that Lagrange multipliers
+can be used for constraints of the form $$g(x, y) = c$$ for a constant $$c$$.
+However, this method technically requires $$c = 0$$.
+This issue is easy to solve: given $$g = c$$,
+simply define $$\tilde{g} \equiv g - c = 0$$
+and use that as the constraint instead.
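To make the constrained setup concrete, here is a small Python sketch. The specific problem is illustrative only, not from the commit: maximize $$f = x + y$$ on the unit circle, i.e. the constraint $$x^2 + y^2 = 1$$ rewritten as $$\tilde{g} = x^2 + y^2 - 1 = 0$$ per the side note above. Brute-forcing $$f$$ along the constraint curve gives a ground-truth answer:

```python
import math

def f(x, y):
    # Objective function of the illustrative example (not from the text)
    return x + y

# The constraint curve g~ = x^2 + y^2 - 1 = 0 is the unit circle,
# parametrized as (cos t, sin t); sample it densely and take the max of f.
samples = 100_000
best = max(
    f(math.cos(2 * math.pi * k / samples), math.sin(2 * math.pi * k / samples))
    for k in range(samples)
)
print(best)  # close to sqrt(2) ≈ 1.41421
```

The brute-force maximum agrees with the analytic value $$\sqrt{2}$$, attained at $$x = y = 1/\sqrt{2}$$.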
+
+Before introducing $$g$$,
+optimizing $$f$$ comes down to finding its stationary points:
 
 $$\begin{aligned}
-    0 &= \dd{f} = f_x \dd{x} + f_y \dd{y} + f_z \dd{z}
+    0
+    &= \nabla f
+    = \bigg( \pdv{f}{x}, \pdv{f}{y} \bigg)
 \end{aligned}$$
 
-This problem is easy:
-$$\dd{x}$$, $$\dd{y}$$, and $$\dd{z}$$ are independent and arbitrary,
-so all we need to do is find the roots of
-the partial derivatives $$f_x$$, $$f_y$$ and $$f_z$$,
-which we respectively call $$x_0$$, $$y_0$$ and $$z_0$$,
-and then the extremum is simply $$(x_0, y_0, z_0)$$.
-
-But the constraint $$\phi$$, over which we have no control,
-adds a relation between $$\dd{x}$$, $$\dd{y}$$, and $$\dd{z}$$,
-so if two are known, the third is given by $$\phi = C$$.
-The problem is then a system of equations:
+This problem is easy: the two dimensions can be handled independently,
+so all we need to do is find the roots of the partial derivatives.
+
+However, adding $$g$$ makes the problem much more complicated:
+points with $$\nabla f = 0$$ might not satisfy $$g = 0$$,
+and points where $$g = 0$$ might not have $$\nabla f = 0$$.
+The dimensions also cannot be handled independently anymore,
+since they are implicitly related by $$g$$.
+
+Imagine a contour plot of $$g(x, y)$$.
+The trick is this: if we follow a contour of $$g = 0$$,
+the highest and lowest values of $$f$$ along the way
+are the desired local extrema.
+Recall our assumption that $$\nabla f$$ is continuous:
+hence *along our contour* $$f$$ is slowly-varying
+in the close vicinity of each such point,
+and stationary at the point itself.
+We thus have two categories of extrema:
+
+1. $$\nabla f = 0$$ there,
+   i.e. $$f$$ is slowly-varying along *all* directions around the point.
+   In other words, a stationary point of $$f$$
+   coincidentally lies on a contour of $$g = 0$$.
+
+2. The contours of $$f$$ and $$g$$ are parallel around the point.
+   By definition, $$f$$ is stationary along each of its contours,
+   so when we find that $$f$$ is stationary at a point on our $$g = 0$$ path,
+   it means we have touched a contour of $$f$$.
+   Obviously, each point in the domain of $$f$$ lies on *some* contour,
+   but if the contours are not parallel there,
+   then $$f$$ is increasing or decreasing along our path,
+   so this is not an extremum and we must continue our search.
+
+What about the edge case that $$g = 0$$ and $$\nabla g = 0$$ at the same point,
+i.e. we locally have no contour to follow?
+Do we just take whatever value $$f$$ has there?
+No, by convention, we do not,
+because this does not really count as *optimizing* $$f$$.
+
+Now, in the 2nd category, parallel contours imply parallel gradients,
+i.e. $$\nabla f$$ and $$\nabla g$$ differ only in magnitude, not direction.
+Formally:
 
 $$\begin{aligned}
-    0 &= \dd{f} = f_x \dd{x} + f_y \dd{y} + f_z \dd{z}
-    \\
-    0 &= \dd{\phi} = \phi_x \dd{x} + \phi_y \dd{y} + \phi_z \dd{z}
+    \nabla f = -\lambda \nabla g
 \end{aligned}$$
 
-Solving this directly would be a delicate balancing act
-of all the partial derivatives.
-
-To help us solve this, we introduce a "dummy" parameter $$\lambda$$,
-the so-called **Lagrange multiplier**,
-and contruct a new function $$L$$ given by:
-
-$$\begin{aligned}
-    L(x, y, z) = f(x, y, z) + \lambda \phi(x, y, z)
-\end{aligned}$$
+Where $$\lambda$$ is the **Lagrange multiplier**
+that quantifies the difference in magnitude between the gradients.
+By setting $$\lambda = 0$$, this equation also handles the 1st category $$\nabla f = 0$$.
+Some authors define $$\lambda$$ with the opposite sign.
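The parallel-gradient condition $$\nabla f = -\lambda \nabla g$$ can be checked numerically at a known extremum. A minimal sketch, assuming an illustrative problem that is not from the text: $$f = x + y$$ constrained to the unit circle $$g = x^2 + y^2 - 1 = 0$$, whose constrained maximum sits at $$x = y = 1/\sqrt{2}$$:

```python
import math

def grad_f(x, y):
    return (1.0, 1.0)      # gradient of f = x + y

def grad_g(x, y):
    return (2 * x, 2 * y)  # gradient of g = x^2 + y^2 - 1

# Known constrained maximum of this illustrative problem
x = y = 1 / math.sqrt(2)
fx, fy = grad_f(x, y)
gx, gy = grad_g(x, y)

# Two vectors are parallel iff their 2D cross product vanishes
cross = fx * gy - fy * gx
print(abs(cross) < 1e-12)  # True

# The multiplier follows from any nonzero component: fx = -lam * gx
lam = -fx / gx
print(lam)  # ≈ -0.7071, i.e. -1/sqrt(2)
```

Note that $$\lambda$$ comes out negative here because $$\nabla f$$ and $$\nabla g$$ point in the *same* direction at the maximum, matching the sign convention used in the text.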
-At the extremum, $$\dd{L} = \dd{f} + \lambda \dd{\phi} = 0$$,
-so now the problem is a "single" equation again:
+The method of Lagrange multipliers uses these facts
+to rewrite a constrained $$N$$-dimensional optimization problem
+as an unconstrained $$(N\!+\!1)$$-dimensional optimization problem
+by defining the **Lagrangian function** $$\mathcal{L}$$ as follows:
 
 $$\begin{aligned}
-    0 = \dd{L}
-    = (f_x + \lambda \phi_x) \dd{x} + (f_y + \lambda \phi_y) \dd{y} + (f_z + \lambda \phi_z) \dd{z}
+    \boxed{
+        \mathcal{L}(x, y, \lambda)
+        \equiv f(x, y) + \lambda g(x, y)
+    }
 \end{aligned}$$
 
-Assuming $$\phi_z \neq 0$$, we now choose $$\lambda$$ such that $$f_z + \lambda \phi_z = 0$$.
-This choice represents satisfying the constraint,
-so now the remaining $$\dd{x}$$ and $$\dd{y}$$ are independent again,
-and we simply have to find the roots of $$f_x + \lambda \phi_x$$ and $$f_y + \lambda \phi_y$$.
-
-In effect, after introducing $$\lambda$$,
-we have four unknowns $$(x, y, z, \lambda)$$,
-but also four equations:
+Let us do an unconstrained optimization of $$\mathcal{L}$$ as usual,
+by demanding it is stationary:
 
 $$\begin{aligned}
-    L_x = L_y = L_z = 0
-    \qquad \quad
-    \phi = C
+    0
+    = \nabla \mathcal{L}
+    &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg)
+    \\
+    &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg)
 \end{aligned}$$
 
-We are only really interested in the first three unknowns $$(x, y, z)$$,
-so $$\lambda$$ is sometimes called the **undetermined multiplier**,
-since it is just an algebraic helper whose value is irrelevant.
-
-This method generalizes nicely to multiple constraints or more variables:
-suppose that we want to find the extrema of $$f(x_1, ..., x_N)$$
+The last item in this vector represents $$g = 0$$,
+and the others $$\nabla f = -\lambda \nabla g$$ as discussed earlier.
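To see the stationarity system $$\nabla \mathcal{L} = 0$$ in action, the sketch below solves its three components with a hand-rolled Newton iteration. The test problem ($$f = x + y$$, $$g = x^2 + y^2 - 1$$) and all helper names are illustrative assumptions, not from the text:

```python
def F(x, y, lam):
    # Components of grad L for L = f + lam*g, with f = x + y, g = x^2 + y^2 - 1:
    # (df/dx + lam*dg/dx, df/dy + lam*dg/dy, g)
    return (1 + 2 * lam * x, 1 + 2 * lam * y, x * x + y * y - 1)

def J(x, y, lam):
    # Jacobian of F with respect to (x, y, lam)
    return [[2 * lam, 0,       2 * x],
            [0,       2 * lam, 2 * y],
            [2 * x,   2 * y,   0    ]]

def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            k = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= k * M[c][j]
    xs = [0.0] * n
    for r in reversed(range(n)):
        xs[r] = (M[r][n] - sum(M[r][j] * xs[j] for j in range(r + 1, n))) / M[r][r]
    return xs

def newton(x, y, lam, steps=50):
    # Newton's method: repeatedly solve J * delta = -F and update
    for _ in range(steps):
        fx = F(x, y, lam)
        dx, dy, dl = solve3(J(x, y, lam), [-v for v in fx])
        x, y, lam = x + dx, y + dy, lam + dl
    return x, y, lam

x, y, lam = newton(0.5, 0.5, -1.0)  # this start converges to the maximum
print(round(x, 4), round(y, 4), round(lam, 4))  # 0.7071 0.7071 -0.7071
```

The iteration lands on $$x = y = 1/\sqrt{2}$$ with $$\lambda = -1/\sqrt{2}$$, the constrained maximum; other starting guesses find the minimum instead, illustrating that stationarity is only a necessary condition.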
+To solve this equation,
+we assign $$\lambda$$ a value that agrees with it
+(such a value exists for each local extremum
+according to our above discussion of the two categories),
+and then find the locations $$(x, y)$$ that satisfy it.
+However, as usual for optimization problems,
+this method only finds *local* extrema *and* saddle points;
+it is a necessary condition for optimality, but not sufficient.
+
+We often assign $$\lambda$$ an algebraic expression rather than a value,
+usually without even bothering to calculate its final actual value.
+In fact, in some cases, $$\lambda$$'s only function is to help us reason
+about the interdependence of a system of equations
+(see [example 3](https://en.wikipedia.org/wiki/Lagrange_multiplier#Example_3:_Entropy) on Wikipedia);
+then $$\lambda$$ is not even given an expression!
+Hence it is sometimes also called an *undetermined multiplier*.
+
+This method generalizes nicely to multiple constraints or more variables.
+Suppose that we want to find the extrema of $$f(x_1, ..., x_N)$$
 subject to $$M < N$$ conditions:
 
 $$\begin{aligned}
-    \phi_1(x_1, ..., x_N) = C_1 \qquad \cdots \qquad \phi_M(x_1, ..., x_N) = C_M
-\end{aligned}$$
-
-This once again turns into a delicate system of $$M+1$$ equations to solve:
-
-$$\begin{aligned}
-    0 &= \dd{f} = f_{x_1} \dd{x_1} + ... + f_{x_N} \dd{x_N}
-    \\
-    0 &= \dd{\phi_1} = \phi_{1, x_1} \dd{x_1} + ... + \phi_{1, x_N} \dd{x_N}
-    \\
-    &\vdots
-    \\
-    0 &= \dd{\phi_M} = \phi_{M, x_1} \dd{x_1} + ... + \phi_{M, x_N} \dd{x_N}
+    g_1(x_1, ..., x_N) = 0
+    \qquad \cdots \qquad
+    g_M(x_1, ..., x_N) = 0
 \end{aligned}$$
 
 Then we introduce $$M$$ Lagrange multipliers $$\lambda_1, ..., \lambda_M$$
-and define $$L(x_1, ..., x_N)$$:
+and define $$\mathcal{L}(x_1, ..., x_N)$$:
 
 $$\begin{aligned}
-    L = f + \sum_{m = 1}^M \lambda_m \phi_m
+    \mathcal{L} \equiv f + \sum_{m = 1}^M \lambda_m g_m
 \end{aligned}$$
 
-As before, we set $$\dd{L} = 0$$ and choose the multipliers $$\lambda_1, ..., \lambda_M$$
-to eliminate $$M$$ of its $$N$$ terms:
+As before, we set $$\nabla \mathcal{L} = 0$$ and choose the multipliers
+$$\lambda_1, ..., \lambda_M$$ to satisfy the resulting system of $$(N\!+\!M)$$ 1D equations,
+and then find the coordinates of the extrema.
 
-$$\begin{aligned}
-    0 = \dd{L}
-    = \sum_{n = 1}^N \Big( f_{x_n} + \sum_{m = 1}^M \lambda_m \phi_{x_n} \Big) \dd{x_n}
-\end{aligned}$$
 
 
 ## References