Categories: Mathematics, Physics.

Lagrange multiplier

The method of Lagrange multipliers or undetermined multipliers is a technique for optimizing (i.e. finding extrema of) a function $f$ subject to equality constraints. For example, in 2D, the goal is to maximize/minimize $f(x, y)$ while satisfying $g(x, y) = 0$. We assume that $f$ and $g$ are both continuous and have continuous first derivatives on all of $\mathbb{R}^2$.

Note: many authors write that Lagrange multipliers can be used for constraints of the form $g(x, y) = c$ for a constant $c$. Actually, the method requires $c = 0$, but this issue is easy to solve: given $g = c$, simply define $\tilde{g} \equiv g - c = 0$ and use that as constraint instead.

So, we want to optimize $f$. If we ignore $g$, that just means finding its stationary points:

$$\begin{aligned} 0 &= \nabla f = \bigg( \pdv{f}{x}, \pdv{f}{y} \bigg) \end{aligned}$$

This problem is easy: the two dimensions can be handled independently, so all we need to do is find the roots of the partial derivatives.
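To make this concrete, here is a minimal sketch (the function is our own choice, and we use plain bisection for the root-finding): for $f(x, y) = (x - 1)^2 + (y + 2)^2$, the partial derivatives are $2(x - 1)$ and $2(y + 2)$, and each dimension can indeed be solved on its own.

```python
# Unconstrained stationary point of f(x, y) = (x - 1)^2 + (y + 2)^2:
# each partial derivative depends on only one variable, so we find
# the root of each one independently.

def df_dx(x):
    return 2.0 * (x - 1.0)

def df_dy(y):
    return 2.0 * (y + 2.0)

def bisect(fn, lo, hi, tol=1e-12):
    """Find a root of fn on [lo, hi] by bisection (fn must change sign there)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(lo) * fn(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x_star = bisect(df_dx, -10.0, 10.0)
y_star = bisect(df_dy, -10.0, 10.0)
print(x_star, y_star)  # ≈ 1.0, -2.0: the minimum of f
```

Any 1D root-finder works here; the point is only that, without a constraint, the problem decouples into independent scalar equations.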

However, a constraint $g = 0$ makes the problem much more complicated: points with $\nabla f = 0$ might not satisfy $g = 0$, and points where $g = 0$ might not have $\nabla f = 0$. The dimensions also cannot be handled independently anymore, since they are implicitly related via $g$.

Imagine a contour plot of $g(x, y)$. The trick is this: if we follow a contour of $g = 0$, the highest and lowest values of $f$ along the way are the desired local extrema. At each such extremum, $f$ must be stationary from the contour's point of view, and slowly-varying in its close vicinity since $\nabla f$ is continuous. We thus have two categories of extrema:

  1. $\nabla f = 0$ there, i.e. $f$ is slowly-varying along all directions around the point. In other words, a stationary point of $f$ coincidentally lies on a contour of $g = 0$.

  2. The contours of $f$ and $g$ are parallel at the point. By definition, $f$ is stationary along each of its contours, so when we find that $f$ is stationary at a point on our $g = 0$ path, it means we have touched a contour of $f$. Every point lies on some contour of $f$, but if the contours are not parallel there, then $f$ is increasing or decreasing along our path, so this is not an extremum and we must continue our search.

What about the edge case where $g = 0$ and $\nabla g = 0$ at the same point, i.e. we locally have no contour to follow? Do we just take whatever value $f$ has there? No, by convention, we do not, because that does not really count as optimizing $f$.

Now, in the 2nd category, parallel contours imply parallel gradients, i.e. $\nabla f$ and $\nabla g$ differ only in magnitude, not direction. Formally:

$$\begin{aligned} \nabla f = -\lambda \nabla g \end{aligned}$$

Where $\lambda$ is the Lagrange multiplier that quantifies the difference in magnitude between the gradients. By setting $\lambda = 0$, this equation also handles the 1st category $\nabla f = 0$. Note that some authors define $\lambda$ with the opposite sign.
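A quick numeric check of this condition, with an example of our own choosing: take $f(x, y) = xy$ constrained to the line $g(x, y) = x + y - 1 = 0$, whose constrained maximum is at $x = y = 1/2$.

```python
# Check grad f = -lam * grad g for f = x*y on the line g = x + y - 1 = 0.

def grad_f(x, y):
    return (y, x)        # gradient of f = x*y

def grad_g(x, y):
    return (1.0, 1.0)    # gradient of g = x + y - 1

# At the extremum (1/2, 1/2): (1/2, 1/2) = -lam * (1, 1), so lam = -1/2
# works for BOTH components simultaneously.
fx, fy = grad_f(0.5, 0.5)
lam = -fx / 1.0
print(fx + lam * 1.0, fy + lam * 1.0)  # 0.0 0.0

# At a non-extremal point on the constraint, the gradients are not parallel,
# so no single lam can satisfy both components: the 2D cross product is nonzero.
fx2, fy2 = grad_f(0.8, 0.2)            # gradients (0.2, 0.8) vs (1, 1)
print(fx2 * 1.0 - fy2 * 1.0)           # nonzero (≈ -0.6)
```

The vanishing cross product $f_x g_y - f_y g_x$ is exactly the 2D statement that the two gradients point along the same line.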

The method of Lagrange multipliers uses these facts to rewrite a constrained $N$-dimensional optimization problem as an unconstrained $(N\!+\!1)$-dimensional optimization problem by defining the Lagrangian function $\mathcal{L}$ as follows:

$$\begin{aligned} \boxed{ \mathcal{L}(x, y, \lambda) \equiv f(x, y) + \lambda g(x, y) } \end{aligned}$$

Look what happens when we do an unconstrained optimization of $\mathcal{L}$ in the usual way:

$$\begin{aligned} 0 = \nabla' \mathcal{L} &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg) \\ &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg) \end{aligned}$$

The last item in this vector represents $g = 0$, and the others $\nabla f = -\lambda \nabla g$ as discussed earlier. When this unconstrained problem is solved using standard methods, the resulting solutions also satisfy the constrained problem. However, as usual in the field of optimization, this method only finds local extrema and saddle points; it represents a necessary condition for optimality, but not a sufficient one.
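As a concrete illustration of such a "standard method" (the problem is our own choice: extremize $f = x + y$ on the unit circle $g = x^2 + y^2 - 1 = 0$), the sketch below solves the system $\nabla' \mathcal{L} = 0$ with a basic Newton iteration; any nonlinear root-finder would do.

```python
# Extremize f = x + y subject to g = x^2 + y^2 - 1 = 0.
# With L = f + lam*g, setting grad' L = 0 gives three equations:
#   dL/dx   = 1 + 2*lam*x    = 0
#   dL/dy   = 1 + 2*lam*y    = 0
#   dL/dlam = x^2 + y^2 - 1  = 0

def F(v):
    x, y, lam = v
    return [1 + 2*lam*x, 1 + 2*lam*y, x*x + y*y - 1]

def J(v):  # Jacobian of F, needed by Newton's method
    x, y, lam = v
    return [[2*lam, 0.0, 2*x],
            [0.0, 2*lam, 2*y],
            [2*x, 2*y, 0.0]]

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            fac = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= fac * M[i][c]
    s = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        s[i] = (M[i][3] - sum(M[i][c] * s[c] for c in range(i + 1, 3))) / M[i][i]
    return s

v = [1.0, 0.5, -1.0]  # initial guess near the maximum
for _ in range(20):   # Newton iteration: v <- v + J^{-1} (-F)
    step = solve3(J(v), [-fi for fi in F(v)])
    v = [vi + si for vi, si in zip(v, step)]

print(v)  # ≈ [0.7071, 0.7071, -0.7071]: the maximum at x = y = 1/sqrt(2)
```

Starting from a different initial guess would converge to the other solution, the minimum at $x = y = -1/\sqrt{2}$ with $\lambda = +1/\sqrt{2}$; as noted above, the method itself does not distinguish maxima, minima, and saddle points.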

We often assign $\lambda$ an algebraic expression rather than a value, usually without even bothering to calculate its final actual value. In fact, in some cases, $\lambda$'s only function is to help us reason about the interdependence of a system of equations (see Wikipedia's entropy example); then $\lambda$ is not even given an expression! Hence it is sometimes also called an undetermined multiplier.

This does not imply that $\lambda$ is meaningless; it often represents a quantity of interest. In general, defining $h \equiv g + c$ so that the constraint is $h(x, y) = c$, we see that the Lagrange multiplier represents the rate of change of $\mathcal{L}$ with respect to the value being constrained:

$$\begin{aligned} \mathcal{L}(x, y, \lambda) = f(x, y) + \lambda \big( h(x, y) - c \big) \qquad \implies \qquad -\pdv{\mathcal{L}}{c} = \lambda \end{aligned}$$
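As a sanity check of this interpretation, with an example of our own choosing: maximize $f(x, y) = x + y$ subject to $h(x, y) = x^2 + y^2 = c$ for $c > 0$. Solving $\nabla' \mathcal{L} = 0$ as before yields:

$$\begin{aligned} x = y = \sqrt{c/2} \qquad \lambda = -\frac{1}{\sqrt{2c}} \qquad f_{\max} = \sqrt{2c} \end{aligned}$$

Indeed, $\dv*{f_{\max}}{c} = 1/\sqrt{2c} = -\lambda$: loosening the constraint (increasing $c$) raises the achievable maximum at exactly the rate the multiplier predicts.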

The method of Lagrange multipliers generalizes nicely to more constraints or more variables. Suppose we want to find extrema of $f(x_1, ..., x_N)$ subject to $M < N$ conditions:

$$\begin{aligned} g_1(x_1, ..., x_N) = c_1 \qquad \cdots \qquad g_M(x_1, ..., x_N) = c_M \end{aligned}$$

Then we introduce $M$ Lagrange multipliers $\lambda_1, ..., \lambda_M$ and define $\mathcal{L}(x_1, ..., x_N, \lambda_1, ..., \lambda_M)$:

$$\begin{aligned} \mathcal{L} \equiv f + \sum_{m = 1}^{M} \lambda_m (g_m - c_m) \end{aligned}$$

As before, we set $\nabla \mathcal{L} = 0$ and choose the multipliers $\lambda_1, ..., \lambda_M$ to satisfy the resulting system of $(N\!+\!M)$ 1D equations, and then find the coordinates of the extrema.
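A sketch for $M = 2$ constraints in $N = 3$ variables (again an example of our own choosing): extremize $f = x^2 + y^2 + z^2$ subject to $g_1 = x + y + z - 1 = 0$ and $g_2 = x - y = 0$. Because $f$ is quadratic and the constraints are linear, $\nabla \mathcal{L} = 0$ is a linear system in $(x, y, z, \lambda_1, \lambda_2)$, solvable directly.

```python
# grad L = 0 for L = x^2 + y^2 + z^2 + lam1*(x + y + z - 1) + lam2*(x - y):
#   2x + lam1 + lam2 = 0
#   2y + lam1 - lam2 = 0
#   2z + lam1        = 0
#   x + y + z        = 1    (the two constraints reappear as
#   x - y            = 0     the lam1, lam2 derivatives)

def gauss_solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            fac = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= fac * M[i][c]
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        v[i] = (M[i][n] - sum(M[i][c] * v[c] for c in range(i + 1, n))) / M[i][i]
    return v

A = [[2,  0, 0, 1,  1],
     [0,  2, 0, 1, -1],
     [0,  0, 2, 1,  0],
     [1,  1, 1, 0,  0],
     [1, -1, 0, 0,  0]]
b = [0, 0, 0, 1, 0]

x, y, z, lam1, lam2 = gauss_solve(A, b)
print(x, y, z)  # ≈ 1/3, 1/3, 1/3: the point satisfying both constraints closest to the origin
```

Note that $\lambda_2$ comes out as $0$: the symmetric solution of the first constraint already satisfies $x = y$, so the second constraint is inactive there.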
