2.1.1 The Wrong Iterative Method: Steepest Descent

Steepest Descent is a well-known iterative minimization method. It can be applied to any surface for which the gradient can be computed. The method computes the gradient at its current location, then travels in the direction opposite the gradient until it reaches a minimum along that line. Elementary calculus shows that at this minimum the new gradient is perpendicular to the previous gradient. When applied to a quadratic form with initial guess $x_0$, Steepest Descent may be summarized as

$$r_i = b - A x_i \qquad \text{(choose search direction)}$$
$$\alpha_i = \frac{r_i^T r_i}{r_i^T A r_i} \qquad \text{(compute distance to minimum for line search)}$$
$$x_{i+1} = x_i + \alpha_i r_i \qquad \text{(compute next location)}$$
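The three steps above can be sketched in a few lines of NumPy; the function name and the example matrix are illustrative choices, not part of the text:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize f(x) = 0.5 x^T A x - b^T x by Steepest Descent.

    Assumes A is symmetric positive definite, so the unique
    minimum is the solution of A x = b.
    """
    x = x0.astype(float)
    for i in range(max_iter):
        r = b - A @ x                    # search direction: the residual
        if np.linalg.norm(r) < tol:      # converged
            return x, i
        alpha = (r @ r) / (r @ (A @ r))  # exact line search along r
        x = x + alpha * r                # step to the minimum along r
    return x, max_iter

# Illustrative problem: the minimum is the solution of A x = b.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x, iters = steepest_descent(A, b, np.array([-2.0, -2.0]))
```

Note that the exact line-search step $\alpha_i$ is available in closed form only because the objective is a quadratic form; for general surfaces a one-dimensional search would be needed.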

Because we are only concerned with quadratic forms having a global
minimum and no local minima, Steepest Descent is guaranteed to
converge. However, there is no guarantee that it will converge
quickly. For example, Figure 2.1 shows Steepest Descent's path when
started from a worst-case starting point [41].
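The slowdown can be seen numerically: iteration counts grow with the condition number of $A$. The sketch below (function name and diagonal test matrices are my own, chosen so the eigenvalues are explicit) counts Steepest Descent iterations for increasingly ill-conditioned problems:

```python
import numpy as np

def sd_iterations(kappa, tol=1e-8, max_iter=100000):
    """Count Steepest Descent iterations on a diagonal quadratic form
    whose Hessian has condition number kappa."""
    A = np.diag([1.0, kappa])   # eigenvalues 1 and kappa
    b = np.zeros(2)             # minimum at the origin
    x = np.array([kappa, 1.0])  # start with error along both eigenvectors
    for i in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            return i
        alpha = (r @ r) / (r @ (A @ r))
        x = x + alpha * r
    return max_iter

# Iteration counts climb sharply as kappa grows, e.g. compare
# sd_iterations(2) with sd_iterations(100).
```

For this starting point the error shrinks by roughly a factor of $(\kappa - 1)/(\kappa + 1)$ per iteration, which approaches 1 as the conditioning worsens.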

The minimum point in Figure 2.1 is $x_*$, which by definition satisfies
$A x_* = b$. Let $e_i = x_i - x_*$ represent the error vector at
iteration $i$. We cannot compute $e_i$, since we do not know $x_*$;
computing $x_*$ would require the inverse or pseudo-inverse of $A$.
However, we can easily compute the image of $e_i$ under $A$, since
$$\begin{aligned}
A e_i &= A (x_i - x_*) &\qquad& (2.4) \\
      &= A x_i - A x_* && (2.5) \\
      &= A x_i - b. && (2.6)
\end{aligned}$$

Define the residual vector
$r_i = b - A x_i = -A e_i$, and note that Equation 2.3 implies
$r_i = -f'(x_i)$. Since this negative gradient is the
direction Steepest Descent will travel next, we may characterize
Steepest Descent as greedily reducing the size of the residual.
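The identities $r_i = b - A x_i = -A e_i = -f'(x_i)$ can be checked numerically; the matrix and iterate below are illustrative:

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])  # symmetric positive definite
b = np.array([2.0, -8.0])
x_star = np.linalg.solve(A, b)          # true minimum: A x* = b
x_i = np.array([-2.0, -2.0])            # an arbitrary iterate

e_i = x_i - x_star                      # error vector (needs x*, so
                                        # unknowable in practice)
r_i = b - A @ x_i                       # residual: computable without x*
grad = A @ x_i - b                      # gradient of the quadratic form

assert np.allclose(r_i, -(A @ e_i))     # r_i = -A e_i   (Eqs. 2.4-2.6)
assert np.allclose(r_i, -grad)          # r_i = -f'(x_i)
```

This is the practical point of Equations 2.4–2.6: the residual gives us $-A e_i$ without ever knowing $e_i$ itself.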

Because Steepest Descent follows the negative gradient $r_i = -f'(x_i)$
until it reaches a point $x_{i+1}$ for which
$r_{i+1}^T r_i = 0$, the algorithm can only make right-angle ("square")
turns. When applied to a two-dimensional quadratic form with
symmetric positive semidefinite Hessian $A$, this property implies
that search directions must be repeated if more than two iterations
are necessary. The only way to guarantee two or fewer iterations is
to choose a starting point $x_0$ for which $e_0$ lies along one
eigenvector of $A$ or the other. This cannot be done
without computing $e_0 = x_0 - x_*$, which requires knowing the answer
$x_*$.
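The right-angle-turn behavior follows directly from the exact line search: it stops where the new gradient is perpendicular to the old search direction, so consecutive residuals are orthogonal. A quick numerical check (illustrative data, same example matrix as before):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])  # symmetric positive definite
b = np.array([2.0, -8.0])
x = np.array([-2.0, -2.0])

residuals = []
for _ in range(5):
    r = b - A @ x                       # current negative gradient
    residuals.append(r)
    alpha = (r @ r) / (r @ (A @ r))     # exact line search
    x = x + alpha * r

# Each residual is orthogonal to the one before it: square turns.
for r_prev, r_next in zip(residuals, residuals[1:]):
    assert abs(r_prev @ r_next) < 1e-8
```

In two dimensions there are only two mutually orthogonal directions (up to sign), which is why the search directions must repeat after the second iteration.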