--- title: "Lagrange multiplier" sort_title: "Lagrange multiplier" date: 2022-12-17 # Originally 2021-03-02, major rewrite categories: - Mathematics - Physics layout: "concept" --- The method of **Lagrange multipliers** or **undetermined multipliers** is a technique for optimizing (i.e. finding extrema of) a function $$f$$ subject to **equality constraints**. For example, in 2D, the goal is to maximize/minimize $$f(x, y)$$ while satisfying $$g(x, y) = 0$$. We assume that $$f$$ and $$g$$ are both continuous and have continuous first derivatives, and that their domain is all of $$\mathbb{R}$$. Side note: many authors write that Lagrange multipliers can be used for constraints of the form $$g(x, y) = c$$ for a constant $$c$$. However, this method technically requires $$c = 0$$. This issue is easy to solve: given $$g = c$$, simply define $$\tilde{g} \equiv g - c = 0$$ and use that as constraint instead. Before introducing $$g$$, optimizing $$f$$ comes down to finding its stationary points: $$\begin{aligned} 0 &= \nabla f = \bigg( \pdv{f}{x}, \pdv{f}{y} \bigg) \end{aligned}$$ This problem is easy: the two dimensions can be handled independently, so all we need to do is find the roots of the partial derivatives. However, adding $$g$$ makes the problem much more complicated: points with $$\nabla f = 0$$ might not satisfy $$g = 0$$, and points where $$g = 0$$ might not have $$\nabla f = 0$$. The dimensions also cannot be handled independently anymore, since they are implicitly related by $$g$$. Imagine a contour plot of $$g(x, y)$$. The trick is this: if we follow a contour of $$g = 0$$, the highest and lowest values of $$f$$ along the way are the desired local extrema. Recall our assumption that $$\nabla f$$ is continuous: hence *along our contour* $$f$$ is slowly-varying in the close vicinity of each such point, and stationary at the point itself. We thus have two categories of extrema: 1. $$\nabla f = 0$$ there, i.e. $$f$$ is slowly-varying along *all* directions around the point. In other words, a stationary point of $$f$$ coincidentally lies on a contour of $$g = 0$$. 2. The contours of $$f$$ and $$g$$ are parallel around the point. By definition, $$f$$ is stationary along each of its contours, so when we find that $$f$$ is stationary at a point on our $$g = 0$$ path, it means we touched a contour of $$f$$. Obviously, each point of $$f$$ lies on *some* contour, but if they are not parallel, then $$f$$ is increasing or decreasing along our path, so this is not an extremum and we must continue our search. What about the edge case that $$g = 0$$ and $$\nabla g = 0$$ in the same point, i.e. we locally have no contour to follow? Do we just take whatever value $$f$$ has there? No, by convention, we do not, because this does not really count as *optimizing* $$f$$. Now, in the 2nd category, parallel contours imply parallel gradients, i.e. $$\nabla f$$ and $$\nabla g$$ differ only in magnitude, not direction. Formally: $$\begin{aligned} \nabla f = -\lambda \nabla g \end{aligned}$$ Where $$\lambda$$ is the **Lagrange multiplier** that quantifies the difference in magnitude between the gradients. By setting $$\lambda = 0$$, this equation also handles the 1st category $$\nabla f = 0$$. Some authors define $$\lambda$$ with the opposite sign. The method of Lagrange multipliers uses these facts to rewrite a constrained $$N$$-dimensional optimization problem as an unconstrained $$(N\!+\!1)$$-dimensional optimization problem by defining the **Lagrangian function** $$\mathcal{L}$$ as follows: $$\begin{aligned} \boxed{ \mathcal{L}(x, y, \lambda) \equiv f(x, y) + \lambda g(x, y) } \end{aligned}$$ Let us do an unconstrained optimization of $$\mathcal{L}$$ as usual, by demanding it is stationary: $$\begin{aligned} 0 = \nabla \mathcal{L} &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg) \\ &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg) \end{aligned}$$ The last item in this vector represents $$g = 0$$, and the others $$\nabla f = -\lambda \nabla g$$ as discussed earlier. To solve this equation, we assign $$\lambda$$ a value that agrees with it (such a value exists for each local extremum according to our above discussion of the two categories), and then find the locations $$(x, y)$$ that satisfy it. However, as usual for optimization problems, this method only finds *local* extrema *and* saddle points; it is a necessary condition for optimality, but not sufficient. We often assign $$\lambda$$ an algebraic expression rather than a value, usually without even bothering to calculate its final actual value. In fact, in some cases, $$\lambda$$'s only function is to help us reason about the interdependence of a system of equations (see [example 3](https://en.wikipedia.org/wiki/Lagrange_multiplier#Example_3:_Entropy) on Wikipedia); then $$\lambda$$ is not even given an expression! Hence it is sometimes also called an *undetermined multiplier*. This does not imply that $$\lambda$$ is meaningless; it often represents a quantity of interest. In general, defining $$h \equiv g + c$$ so that the constraint is $$h(x, y) = c$$, we see that the Lagrange multiplier represents the rate of change of $$\mathcal{L}$$ with respect to the value being constrained: $$\begin{aligned} \mathcal{L}(x, y, \lambda) = f(x, y) + \lambda (h(x, y) - c) \qquad \implies \qquad -\pdv{\mathcal{L}}{c} = \lambda \end{aligned}$$ The method of Lagrange multipliers generalizes nicely to more constraints or more variables. Suppose we want to find extrema of $$f(x_1, ..., x_N)$$ subject to $$M < N$$ conditions: $$\begin{aligned} g_1(x_1, ..., x_N) = c_1 \qquad \cdots \qquad g_M(x_1, ..., x_N) = c_M \end{aligned}$$ Then we introduce $$M$$ Lagrange multipliers $$\lambda_1, ..., \lambda_M$$ and define $$\mathcal{L}(x_1, ..., x_N)$$: $$\begin{aligned} \mathcal{L} \equiv f + \sum_{m = 1}^M \lambda_m g_m \end{aligned}$$ As before, we set $$\nabla \mathcal{L} = 0$$ and choose the multipliers $$\lambda_1, ..., \lambda_M$$ to satisfy the resulting system of $$(N\!+\!M)$$ 1D equations, and then find the coordinates of the extrema. ## References 1. G.B. Arfken, H.J. Weber, *Mathematical methods for physicists*, 6th edition, 2005, Elsevier. 2. O. Bang, *Applied mathematics for physicists: lecture notes*, 2019, unpublished.