---
title: "Lagrange multiplier"
sort_title: "Lagrange multiplier"
date: 2022-12-17 # Originally 2021-03-02, major rewrite
categories:
- Mathematics
- Physics
layout: "concept"
---

The method of **Lagrange multipliers** or **undetermined multipliers**
is a technique for optimizing (i.e. finding extrema of)
a function $$f$$ subject to **equality constraints**.
For example, in 2D, the goal is to maximize/minimize $$f(x, y)$$
while satisfying $$g(x, y) = 0$$.
We assume that $$f$$ and $$g$$ are both continuous
and have continuous first derivatives,
and that their domain is all of $$\mathbb{R}^2$$.

Side note: many authors write that Lagrange multipliers
can be used for constraints of the form $$g(x, y) = c$$ for a constant $$c$$.
However, this method technically requires $$c = 0$$.
This issue is easy to solve: given $$g = c$$,
simply define $$\tilde{g} \equiv g - c = 0$$
and use that as the constraint instead.
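
For example, to restrict the optimization to the unit circle $$x^2 + y^2 = 1$$
(an arbitrary illustrative choice),
we would define the constraint function as:

$$\begin{aligned}
    \tilde{g}(x, y)
    \equiv x^2 + y^2 - 1
    = 0
\end{aligned}$$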

Before introducing $$g$$,
optimizing $$f$$ comes down to finding its stationary points:

$$\begin{aligned}
    0
    &= \nabla f
    = \bigg( \pdv{f}{x}, \pdv{f}{y} \bigg)
\end{aligned}$$

This problem is easy: the two dimensions can be handled independently,
so all we need to do is find the roots of the partial derivatives.
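
For instance, if $$f(x, y) = x^2 + y^2$$ (chosen purely for illustration),
the conditions become:

$$\begin{aligned}
    0 = \pdv{f}{x} = 2 x
    \qquad \quad
    0 = \pdv{f}{y} = 2 y
\end{aligned}$$

So the only stationary point (here a minimum) is the origin $$(0, 0)$$.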

However, adding $$g$$ makes the problem much more complicated:
points with $$\nabla f = 0$$ might not satisfy $$g = 0$$,
and points where $$g = 0$$ might not have $$\nabla f = 0$$.
The dimensions also cannot be handled independently anymore,
since they are implicitly related by $$g$$.

Imagine a contour plot of $$g(x, y)$$.
The trick is this: if we follow a contour of $$g = 0$$,
the highest and lowest values of $$f$$ along the way
are the desired local extrema.
Recall our assumption that $$\nabla f$$ is continuous:
hence *along our contour* $$f$$ is slowly-varying
in the close vicinity of each such point,
and stationary at the point itself.
We thus have two categories of extrema:

1.  $$\nabla f = 0$$ there,
    i.e. $$f$$ is slowly-varying along *all* directions around the point.
    In other words, a stationary point of $$f$$
    coincidentally lies on a contour of $$g = 0$$.

2.  The contours of $$f$$ and $$g$$ are parallel around the point.
    By definition, $$f$$ is stationary along each of its contours,
    so when we find that $$f$$ is stationary at a point on our $$g = 0$$ path,
    it means we touched a contour of $$f$$.
    Obviously, every point lies on *some* contour of $$f$$,
    but if that contour is not parallel to our $$g = 0$$ path there,
    then $$f$$ is increasing or decreasing along our path,
    so this is not an extremum and we must continue our search.

What about the edge case that $$g = 0$$ and $$\nabla g = 0$$ at the same point,
i.e. we locally have no contour to follow?
Do we just take whatever value $$f$$ has there?
No, by convention, we do not,
because this does not really count as *optimizing* $$f$$.
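
For instance, if $$g(x, y) = x^2 + y^2$$ (again just an illustration),
then $$g = 0$$ only at the origin, where also $$\nabla g = (2x, 2y) = 0$$:
the "contour" is a single point with no direction to follow,
so we simply exclude it.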

Now, in the 2nd category, parallel contours imply parallel gradients,
i.e. $$\nabla f$$ and $$\nabla g$$ lie along the same line,
differing only by a scalar factor.
Formally:

$$\begin{aligned}
    \nabla f = -\lambda \nabla g
\end{aligned}$$

Where $$\lambda$$ is the **Lagrange multiplier**
that quantifies the difference in magnitude between the gradients.
By setting $$\lambda = 0$$, this equation also handles the 1st category $$\nabla f = 0$$.
Some authors define $$\lambda$$ with the opposite sign.
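
As a simple example, consider optimizing $$f(x, y) = x + y$$
on the unit circle $$g(x, y) = x^2 + y^2 - 1 = 0$$.
The gradients are:

$$\begin{aligned}
    \nabla f = (1, 1)
    \qquad \quad
    \nabla g = (2 x, 2 y)
\end{aligned}$$

These are parallel only if $$x = y$$,
which combined with $$g = 0$$ gives the extrema $$\pm (1/\sqrt{2}, 1/\sqrt{2})$$,
with $$\lambda = \mp 1/\sqrt{2}$$ respectively.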

The method of Lagrange multipliers uses these facts
to rewrite a constrained $$N$$-dimensional optimization problem
as an unconstrained $$(N\!+\!1)$$-dimensional optimization problem
by defining the **Lagrangian function** $$\mathcal{L}$$ as follows:

$$\begin{aligned}
    \boxed{
        \mathcal{L}(x, y, \lambda)
        \equiv f(x, y) + \lambda g(x, y)
    }
\end{aligned}$$
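
For the example of $$f = x + y$$ on the unit circle from before, this would be:

$$\begin{aligned}
    \mathcal{L}(x, y, \lambda)
    = x + y + \lambda (x^2 + y^2 - 1)
\end{aligned}$$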

Let us do an unconstrained optimization of $$\mathcal{L}$$ as usual,
by demanding it is stationary:

$$\begin{aligned}
    0
    = \nabla \mathcal{L}
    &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg)
    \\
    &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg)
\end{aligned}$$

The last item in this vector represents $$g = 0$$,
and the others $$\nabla f = -\lambda \nabla g$$ as discussed earlier.
To solve this equation,
we assign $$\lambda$$ a value that agrees with it
(such a value exists for each local extremum
according to our above discussion of the two categories),
and then find the locations $$(x, y)$$ that satisfy it.
However, as usual for optimization problems,
this method only finds *local* extrema *and* saddle points;
it is a necessary condition for optimality, but not sufficient.
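
In practice, the resulting system can also be handed to a computer algebra package.
Below is a minimal sketch of the running example in SymPy (assuming it is available);
the specific choices of $$f$$ and $$g$$ are the illustrative ones from before:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

# Illustrative example from before: f(x, y) = x + y on the unit circle.
f = x + y
g = x**2 + y**2 - 1

# Lagrangian: L = f + lambda * g.
L = f + lam * g

# Demand stationarity of L with respect to x, y and lambda.
equations = [sp.diff(L, var) for var in (x, y, lam)]

# Solve the system; this yields the two candidate extrema on the circle.
for sol in sp.solve(equations, (x, y, lam), dict=True):
    print(sol, "  f =", f.subs(sol))
```

Running this should reproduce the two points $$\pm (1/\sqrt{2}, 1/\sqrt{2})$$ found earlier.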

We often assign $$\lambda$$ an algebraic expression rather than a value,
usually without even bothering to calculate its final actual value.
In fact, in some cases, $$\lambda$$'s only role is to help us reason
about the interdependence of a system of equations
(see [example 3](https://en.wikipedia.org/wiki/Lagrange_multiplier#Example_3:_Entropy) on Wikipedia);
then $$\lambda$$ is not even given an expression!
Hence it is sometimes also called an *undetermined multiplier*.

This method generalizes nicely to multiple constraints or more variables.
Suppose that we want to find the extrema of $$f(x_1, ..., x_N)$$
subject to $$M < N$$ conditions:

$$\begin{aligned}
    g_1(x_1, ..., x_N) = c_1
    \qquad \cdots \qquad
    g_M(x_1, ..., x_N) = c_M
\end{aligned}$$

Then, as in the side note above, we rewrite each constraint as $$\tilde{g}_m \equiv g_m - c_m = 0$$,
introduce $$M$$ Lagrange multipliers $$\lambda_1, ..., \lambda_M$$,
and define $$\mathcal{L}(x_1, ..., x_N, \lambda_1, ..., \lambda_M)$$:

$$\begin{aligned}
    \mathcal{L} \equiv f + \sum_{m = 1}^M \lambda_m \tilde{g}_m
\end{aligned}$$

As before, we set $$\nabla \mathcal{L} = 0$$ and choose the multipliers
$$\lambda_1, ..., \lambda_M$$ to satisfy the resulting system of $$(N\!+\!M)$$ 1D equations,
and then find the coordinates of the extrema.
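
As a sanity check of this recipe, here is a similar SymPy sketch
for an arbitrary illustrative problem with $$N = 3$$ and $$M = 2$$,
namely minimizing $$x^2 + y^2 + z^2$$ subject to $$x + y + z = 1$$ and $$x = y$$:

```python
import sympy as sp

x, y, z, lam1, lam2 = sp.symbols("x y z lambda_1 lambda_2", real=True)

# Illustrative choice: minimize the squared distance to the origin.
f = x**2 + y**2 + z**2

# The two constraints, each rewritten so that its right-hand side is zero.
g1 = x + y + z - 1  # from x + y + z = 1
g2 = x - y          # from x = y

# Lagrangian with one multiplier per constraint.
L = f + lam1 * g1 + lam2 * g2

# N + M = 5 equations: stationarity with respect to x, y, z, lambda_1, lambda_2.
equations = [sp.diff(L, var) for var in (x, y, z, lam1, lam2)]

print(sp.solve(equations, (x, y, z, lam1, lam2), dict=True))
```

The solution is $$x = y = z = 1/3$$,
i.e. the point on the constrained line that is closest to the origin.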


