source/know/concept/lagrange-multiplier/index.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167

---
title: "Lagrange multiplier"
sort_title: "Lagrange multiplier"
date: 2022-12-17 # Originally 2021-03-02, major rewrite
categories:
- Mathematics
- Physics
layout: "concept"
---

The method of **Lagrange multipliers** or **undetermined multipliers**
is a technique for optimizing (i.e. finding extrema of)
a function $$f$$ subject to **equality constraints**.
For example, in 2D, the goal is to maximize/minimize $$f(x, y)$$
while satisfying $$g(x, y) = 0$$.
We assume that $$f$$ and $$g$$ are both continuous
and have continuous first derivatives
on all of $$\mathbb{R}^2$$.

Note: many authors write that Lagrange multipliers
can be used for constraints of the form $$g(x, y) = c$$ for a constant $$c$$.
Actually, the method requires $$c = 0$$,
but this issue is easy to solve: given $$g = c$$,
simply define $$\tilde{g} \equiv g - c = 0$$
and use that as constraint instead.

So, we want to optimize $$f$$.
If we ignore $$g$$, that just means finding its stationary points:

$$\begin{aligned}
    0
    &= \nabla f
    = \bigg( \pdv{f}{x}, \pdv{f}{y} \bigg)
\end{aligned}$$

This problem is easy: the two dimensions can be handled independently,
so all we need to do is find the roots of the partial derivatives.

However, a constraint $$g = 0$$ makes the problem much more complicated:
points with $$\nabla f = 0$$ might not satisfy $$g = 0$$,
and points where $$g = 0$$ might not have $$\nabla f = 0$$.
The dimensions also cannot be handled independently anymore,
since they are implicitly related via $$g$$.

Imagine a contour plot of $$g(x, y)$$.
The trick is this: if we follow a contour of $$g = 0$$,
the highest and lowest values of $$f$$ along the way
are the desired local extrema.
At each such extremum, $$f$$ must be stationary from the contour's point of view,
and slowly-varying in its close vicinity since $$\nabla f$$ is continuous.
We thus have two categories of extrema:

1.  $$\nabla f = 0$$ there,
    i.e. $$f$$ is slowly-varying along *all* directions around the point.
    In other words, a stationary point of $$f$$
    coincidentally lies on a contour of $$g = 0$$.

2.  The contours of $$f$$ and $$g$$ are parallel at the point.
    By definition, $$f$$ is stationary along each of its contours,
    so when we find that $$f$$ is stationary at a point on our $$g = 0$$ path,
    it means we touched a contour of $$f$$.
    Obviously, each point of $$f$$ lies on *some* contour,
    but if they are not parallel,
    then $$f$$ is increasing or decreasing along our path,
    so this is not an extremum and we must continue our search.

What about the edge case that $$g = 0$$ and $$\nabla g = 0$$ in the same point,
i.e. we locally have no contour to follow?
Do we just take whatever value $$f$$ has there?
No, by convention, we do not,
because this does not really count as *optimizing* $$f$$.

Now, in the 2nd category, parallel contours imply parallel gradients,
i.e. $$\nabla f$$ and $$\nabla g$$ differ only in magnitude, not direction.
Formally:

$$\begin{aligned}
    \nabla f = -\lambda \nabla g
\end{aligned}$$

Where $$\lambda$$ is the **Lagrange multiplier**
that quantifies the difference in magnitude between the gradients.
By setting $$\lambda = 0$$, this equation also handles the 1st category $$\nabla f = 0$$.
Note that some authors define $$\lambda$$ with the opposite sign.

The method of Lagrange multipliers uses these facts
to rewrite a constrained $$N$$-dimensional optimization problem
as an unconstrained $$(N\!+\!1)$$-dimensional optimization problem
by defining the **Lagrangian function** $$\mathcal{L}$$ as follows:

$$\begin{aligned}
    \boxed{
        \mathcal{L}(x, y, \lambda)
        \equiv f(x, y) + \lambda g(x, y)
    }
\end{aligned}$$

Look what happens when we do an unconstrained optimization of $$\mathcal{L}$$ in the usual way:

$$\begin{aligned}
    0
    = \nabla' \mathcal{L}
    &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg)
    \\
    &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg)
\end{aligned}$$

The last item in this vector represents $$g = 0$$,
and the others $$\nabla f = -\lambda \nabla g$$ as discussed earlier.
When this unconstrained problem is solved using standard methods,
the resulting solutions also satisfy the constrained problem.
However, as usual in the field of optimization,
this method only finds *local* extrema *and* saddle points;
it represents a necessary condition for optimality, but not a sufficient one.

We often assign $$\lambda$$ an algebraic expression rather than a value,
usually without even bothering to calculate its final actual value.
In fact, in some cases, $$\lambda$$'s only function is to help us reason
about the interdependence of a system of equations
(see [example 3](https://en.wikipedia.org/wiki/Lagrange_multiplier#Example_3:_Entropy) on Wikipedia);
then $$\lambda$$ is not even given an expression!
Hence it is sometimes also called an *undetermined multiplier*.

This does not imply that $$\lambda$$ is meaningless;
it often represents a quantity of interest.
In general, defining $$h \equiv g + c$$ so that the constraint is $$h(x, y) = c$$,
we see that the Lagrange multiplier represents the rate of change of $$\mathcal{L}$$
with respect to the value being constrained:

$$\begin{aligned}
    \mathcal{L}(x, y, \lambda)
    = f(x, y) + \lambda (h(x, y) - c)
    \qquad \implies \qquad
    -\pdv{\mathcal{L}}{c} = \lambda
\end{aligned}$$

The method of Lagrange multipliers
generalizes nicely to more constraints or more variables.
Suppose we want to find extrema of $$f(x_1, ..., x_N)$$
subject to $$M < N$$ conditions:

$$\begin{aligned}
    g_1(x_1, ..., x_N) = c_1
    \qquad \cdots \qquad
    g_M(x_1, ..., x_N) = c_M
\end{aligned}$$

Then we introduce $$M$$ Lagrange multipliers $$\lambda_1, ..., \lambda_M$$
and define $$\mathcal{L}(x_1, ..., x_N)$$:

$$\begin{aligned}
    \mathcal{L} \equiv f + \sum_{m = 1}^M \lambda_m g_m
\end{aligned}$$

As before, we set $$\nabla \mathcal{L} = 0$$ and choose the multipliers
$$\lambda_1, ..., \lambda_M$$ to satisfy the resulting system of $$(N\!+\!M)$$ 1D equations,
and then find the coordinates of the extrema.


## References
1.  G.B. Arfken, H.J. Weber,
    *Mathematical methods for physicists*, 6th edition, 2005,
    Elsevier.
2.  O. Bang,
    *Applied mathematics for physicists: lecture notes*, 2019,
    unpublished.