Categories: Mathematics, Physics.
Lagrange multiplier
The method of Lagrange multipliers or undetermined multipliers is a technique for optimizing (i.e. finding extrema of) a function $f$ subject to equality constraints. For example, in 2D, the goal is to maximize/minimize $f(x, y)$ while satisfying $g(x, y) = 0$. We assume that $f$ and $g$ are both continuous and have continuous first derivatives on all of $\mathbb{R}^2$.
Note: many authors write that Lagrange multipliers can be used for constraints of the form $g(x, y) = c$ for a constant $c$. Actually, the method requires $c = 0$, but this issue is easy to solve: given $g = c$, simply define $\tilde{g} \equiv g  c = 0$ and use that as constraint instead.
So, we want to optimize $f$. If we ignore $g$, that just means finding its stationary points:
$\begin{aligned} 0 &= \nabla f = \bigg( \pdv{f}{x}, \pdv{f}{y} \bigg) \end{aligned}$This problem is easy: the two dimensions can be handled independently, so all we need to do is find the roots of the partial derivatives.
However, a constraint $g = 0$ makes the problem much more complicated: points with $\nabla f = 0$ might not satisfy $g = 0$, and points where $g = 0$ might not have $\nabla f = 0$. The dimensions also cannot be handled independently anymore, since they are implicitly related via $g$.
Imagine a contour plot of $g(x, y)$. The trick is this: if we follow a contour of $g = 0$, the highest and lowest values of $f$ along the way are the desired local extrema. At each such extremum, $f$ must be stationary from the contour’s point of view, and slowlyvarying in its close vicinity since $\nabla f$ is continuous. We thus have two categories of extrema:

$\nabla f = 0$ there, i.e. $f$ is slowlyvarying along all directions around the point. In other words, a stationary point of $f$ coincidentally lies on a contour of $g = 0$.

The contours of $f$ and $g$ are parallel at the point. By definition, $f$ is stationary along each of its contours, so when we find that $f$ is stationary at a point on our $g = 0$ path, it means we touched a contour of $f$. Obviously, each point of $f$ lies on some contour, but if they are not parallel, then $f$ is increasing or decreasing along our path, so this is not an extremum and we must continue our search.
What about the edge case that $g = 0$ and $\nabla g = 0$ in the same point, i.e. we locally have no contour to follow? Do we just take whatever value $f$ has there? No, by convention, we do not, because this does not really count as optimizing $f$.
Now, in the 2nd category, parallel contours imply parallel gradients, i.e. $\nabla f$ and $\nabla g$ differ only in magnitude, not direction. Formally:
$\begin{aligned} \nabla f = \lambda \nabla g \end{aligned}$Where $\lambda$ is the Lagrange multiplier that quantifies the difference in magnitude between the gradients. By setting $\lambda = 0$, this equation also handles the 1st category $\nabla f = 0$. Note that some authors define $\lambda$ with the opposite sign.
The method of Lagrange multipliers uses these facts to rewrite a constrained $N$dimensional optimization problem as an unconstrained $(N\!+\!1)$dimensional optimization problem by defining the Lagrangian function $\mathcal{L}$ as follows:
$\begin{aligned} \boxed{ \mathcal{L}(x, y, \lambda) \equiv f(x, y) + \lambda g(x, y) } \end{aligned}$Look what happens when we do an unconstrained optimization of $\mathcal{L}$ in the usual way:
$\begin{aligned} 0 = \nabla' \mathcal{L} &= \bigg( \pdv{\mathcal{L}}{x}, \pdv{\mathcal{L}}{y}, \pdv{\mathcal{L}}{\lambda} \bigg) \\ &= \bigg( \pdv{f}{x} + \lambda \pdv{g}{x}, \:\:\: \pdv{f}{y} + \lambda \pdv{g}{y}, \:\:\: g \bigg) \end{aligned}$The last item in this vector represents $g = 0$, and the others $\nabla f = \lambda \nabla g$ as discussed earlier. When this unconstrained problem is solved using standard methods, the resulting solutions also satisfy the constrained problem. However, as usual in the field of optimization, this method only finds local extrema and saddle points; it represents a necessary condition for optimality, but not a sufficient one.
We often assign $\lambda$ an algebraic expression rather than a value, usually without even bothering to calculate its final actual value. In fact, in some cases, $\lambda$’s only function is to help us reason about the interdependence of a system of equations (see Wikipedia’s entropy example); then $\lambda$ is not even given an expression! Hence it is sometimes also called an undetermined multiplier.
This does not imply that $\lambda$ is meaningless; it often represents a quantity of interest. In general, defining $h \equiv g + c$ so that the constraint is $h(x, y) = c$, we see that the Lagrange multiplier represents the rate of change of $\mathcal{L}$ with respect to the value being constrained:
$\begin{aligned} \mathcal{L}(x, y, \lambda) = f(x, y) + \lambda (h(x, y)  c) \qquad \implies \qquad \pdv{\mathcal{L}}{c} = \lambda \end{aligned}$The method of Lagrange multipliers generalizes nicely to more constraints or more variables. Suppose we want to find extrema of $f(x_1, ..., x_N)$ subject to $M < N$ conditions:
$\begin{aligned} g_1(x_1, ..., x_N) = c_1 \qquad \cdots \qquad g_M(x_1, ..., x_N) = c_M \end{aligned}$Then we introduce $M$ Lagrange multipliers $\lambda_1, ..., \lambda_M$ and define $\mathcal{L}(x_1, ..., x_N)$:
$\begin{aligned} \mathcal{L} \equiv f + \sum_{m = 1}^M \lambda_m g_m \end{aligned}$As before, we set $\nabla \mathcal{L} = 0$ and choose the multipliers $\lambda_1, ..., \lambda_M$ to satisfy the resulting system of $(N\!+\!M)$ 1D equations, and then find the coordinates of the extrema.
References
 G.B. Arfken, H.J. Weber, Mathematical methods for physicists, 6th edition, 2005, Elsevier.
 O. Bang, Applied mathematics for physicists: lecture notes, 2019, unpublished.