Skip to main content

Section 3.6 The Gradient and Max-Min Problems

Subsection 3.6.1 The gradient of a function of multiple variables

If \(f(x_{1}, x_{2}, \ldots, x_{n})\) is a real-valued function of \(n\) variables, i.e. \(x_{1}\text{,}\) \(x_{2}\text{,}\) …, \(x_{n}\text{,}\) then the gradient of \(f\) is the vector:
\begin{equation*} \nabla f(\mathbf{x}) = \left( \begin{array}{c}\frac{ \partial f}{ \partial x_{1}} (\mathbf{x} ) \\ \frac{ \partial f}{ \partial x_{2}} (\mathbf{x} ) \\ \vdots \\ \frac{ \partial f}{ \partial x_{n}} (\mathbf{x} )\end{array} \right)\text{,} \end{equation*}
where we use the vector notation \(\mathbf{x}= (x_{1}, x_{2}, \ldots, x_{n})'\text{.}\) For example, if \(f(x_{1},x_{2}) = x_{1}e^{-x_1^2-x_2^2+\sin(x_1x_2)}\text{,}\) then
\begin{equation*} \nabla f(\mathbf{x}) = \left( \begin{array}{c}(1+x_1(-2x_1+\cos(x_1x_2)x_2)) e^{-x_1^2-x_2^2+\sin(x_1x_2)}\\ x_1(-2x_2+\cos(x_1x_2)x_1) e^{-x_1^2-x_2^2+\sin(x_1x_2)}\end{array} \right)\text{.} \end{equation*}
Note that the gradient is a vector-valued function of multiple variables. This type of function is often called a vector field.
The gradient vector has important features with geometric meaning.
  • The gradient vector points in the direction in which \(f\) is increasing the most.
  • The length of the gradient vector is equal to the derivative of \(f\) in that direction.
  • The gradient vector is perpendicular to the level curve through that point.
We can visualize a gradient (or other vector field) using the MATLAB command quiver (makes arrows).
>> [x1,x2] = meshgrid(-2:0.2:2);
>> z = x1.*exp(-x1.^2-x2.^2+sin(x1.*x2));
>> dz_dx1 = (1+x1.*(-2*x1+cos(x1.*x2).*x2)).*exp(-x1.^2-x2.^2+sin(x1.*x2));
>> dz_dx2 = x1.*(-2*x2+cos(x1.*x2).*x1).*exp(-x1.^2-x2.^2+sin(x1.*x2));
>> contour(x1,x2,z,10)
>> hold on
>> quiver(x1,x2,dz_dx1,dz_dx2)
>> hold off
Compare this to the surface plot:
>> figure()
>> surf(x1,x2,z)

Subsection 3.6.2 Optimization in multiple variables

If a differentiable function of multiple variables, \(f\text{,}\) has a maximum or minimum at point \(\mathbf{x}^{*}\text{,}\) then we will have:
\begin{equation*} \nabla f(\mathbf{x}^{*}) = \mathbf{0}\quad \text{ (the zero vector)}\text{.} \end{equation*}
Thus one way to locate max or min points for a specific function is to solve \(\nabla f(\mathbf{x}^{*}) = \mathbf{0}\text{.}\) This will often lead to systems of non-linear equations as we encountered in Section 2.6. For example, for \(f(x_{1},x_{2}) = x_{1}e^{-x_1^2-x_.^2+\sin(x_1x_2)}\text{,}\)to attempt to find critical point we would need to try to solve:
\begin{equation*} (1+x_{1}(-2x_{1}+\cos(x_{1}x_{2})x_{2})) e^{-x_1^2-x_.^2+\sin(x_1x_2)}=0 \end{equation*}
and
\begin{equation*} x_{1}(-2x_{2}+\cos(x_{1}x_{2})x_{1}) e^{-x_1^2-x_.^2+\sin(x_1x_2)}=0\text{.} \end{equation*}
That, obviously, is going to be very hard and in fact is impossible using algebra. Thus we will have to use numerical methods to find any maximum or minimum points. The multiple variable Newton’s method that we learned in section Section 2.6 would be one option. In the next section we introduce another common algorithm.

Subsection 3.6.3 The gradient ascent or descent method

We will take advantage of the fact that the gradient vector points in the direction of greatest increase in the function \(f\text{.}\) If \(f\) is a function of \(x\) and \(y\text{,}\) then we can think of the graph of \(z = f(x,y)\) as a landscape and the gradient points ``uphill’’. That gives us a very handy way to find maximum points, just follow the gradient vectors as they will take you toward higher values of \(z\text{.}\) See Figure 3.6.1.
Quiver plot showing ascent path.
Surface plot.
Figure 3.6.1. Gradient ascent on \(f(x_{1},x_{2}) = x_{1}e^{-x_1^2-x_.^2+\sin(x_1x_2)}\text{.}\)
Since the gradient vector field may change from point to point, we can only ``follow’’ it numerically in finite steps. That is, starting from an initial guess \(\mathbf{x}_{0}\text{,}\) we generate a sequence of points by the formula
\begin{equation*} \mathbf{x}_{n+1}= \mathbf{x}_{n} + \epsilon \nabla f(\mathbf{x}_{n})\text{.} \end{equation*}
The hope then is that
\begin{equation*} \mathbf{x}_{n} \rightarrow \mathbf{x}^{*}\text{.} \end{equation*}
The number \(\epsilon\) is often called the ``learning rate’’. It should not be too small or too large. If it is too small the algorithm will be slow. If it is too large, the steps might skip over a critical point. There is ongoing research about how to choose it optimally.
Notice the similarities between this method and the Newton’s method for nonlinear systems in Section 2.6. In both cases we have a function of a vector and generate a sequence of vectors \(\mathbf{x}_{0},\mathbf{x}_{1},\dots\text{.}\)

Exercises 3.6.4 Exercises

1.

Find by hand the gradient of the function \(f(x_{1},x_{2}) = \sin(x_{1}) \ e^{-x_1^2 - x_2^2}\text{.}\) Plot it on the rectangle \(-3 \le x_{1} \le 3\text{,}\) \(-2 \le x_{2} \le 2\) using meshgrid and quiver. On the same figure, add the contour plot of \(f\text{.}\)

2.

Write a well-commented script program that uses the gradient descent method to find a minimum point for the function \(f(x_{1},x_{2}) = \sin(x_{1}) \ e^{-x_1^2 - x_2^2}\text{,}\) starting from \(\mathbf{x}_{0} = [-1;-2]\text{.}\) Compare with the plot in Exercise 3.6.4.1 to check that it worked. (Hint: Review the program mymultnewton.m in Section 2.6 for examples of making functions of vectors.)