

2.1.1 The Wrong Iterative Method: Steepest Descent

Steepest Descent is a well-known iterative minimization method. It can be applied to any surface for which the gradient may be computed. The method of Steepest Descent computes the gradient at its current location, then travels in the opposite direction of the gradient until it reaches a minimum in this direction. Elementary calculus shows that at this minimum the new gradient is perpendicular to the previous gradient. When applied to a quadratic form $f$ with initial guess $\mathbf{x}_0$, Steepest Descent may be summarized as

$\mathbf{d}_i = -f'(\mathbf{x}_i) = \mathbf{b} - \mathbf{A}\mathbf{x}_i$ Choose search direction.
$\alpha_i = \frac{\mathbf{d}_i^T \mathbf{d}_i}{\mathbf{d}_i^T \mathbf{A} \mathbf{d}_i}$ Compute distance to minimum for line search.
$\mathbf{x}_{i+1} = \mathbf{x}_i + \alpha_i \mathbf{d}_i$ Compute next location.
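
These update rules translate directly into a few lines of NumPy. The sketch below is illustrative rather than code from this thesis; the function name, the tolerance, and the iteration limit are assumptions chosen for demonstration, and $\mathbf{A}$ is assumed symmetric positive definite so the exact line search is well defined.

\begin{verbatim}
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize f(x) = 0.5 x^T A x - b^T x + c by Steepest Descent.

    A is assumed symmetric positive definite so the exact line
    search step alpha_i = (d^T d) / (d^T A d) is well defined.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = b - A @ x                     # d_i = -f'(x_i) = b - A x_i
        if np.linalg.norm(d) < tol:       # residual small enough: stop
            break
        alpha = (d @ d) / (d @ (A @ d))   # exact minimizer along d_i
        x = x + alpha * d                 # x_{i+1} = x_i + alpha_i d_i
    return x
\end{verbatim}

Called with any symmetric positive definite $\mathbf{A}$ and a starting guess, this returns an approximation of the minimum point $\mathbf{x}^*$ satisfying $\mathbf{A}\mathbf{x}^* = \mathbf{b}$.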

Figure: Steepest Descent worst-case path. The large dot is $\mathbf{x}_0$, and the smaller dots are the iterates $\mathbf{x}_i$. Steepest Descent follows $-f'(\mathbf{x}_0)$ until it finds a perpendicular gradient. Then it repeats the process with the new gradient.
\includegraphics[width=\textwidth]{figures/sd_final.eps}
Figure: CG worst-case path. The large dot is $\mathbf{x}_0$, and the smaller dots are the iterates $\mathbf{x}_i$. Note that CG stops its first iteration such that $f'(\mathbf{x}_1) \perp f'(\mathbf{x}_0)$, but it does not follow the gradient for its second iteration.
\includegraphics[width=\textwidth]{figures/cg_final.eps}

Because we are only concerned with quadratic forms having a global minimum and no local minima, Steepest Descent is guaranteed to converge. However, there is no guarantee that Steepest Descent will converge quickly. For example, Figure 2.1 shows Steepest Descent's path when started from a worst-case starting point $\mathbf{x}_0$ [41].

The minimum point in Figure 2.1 is $\mathbf{x}^* = (0,0)^T$, which by definition satisfies $\mathbf{A}\mathbf{x}^* = \mathbf{b}$. Let $\mathbf{e}_i = \mathbf{x}^* - \mathbf{x}_i$ represent the error vector at iteration $i$. We cannot compute $\mathbf{e}_i$ since we do not know $\mathbf{x}^*$; finding $\mathbf{x}^*$ directly would require the inverse or pseudo-inverse of $\mathbf{A}$. However, we can easily compute the image of $\mathbf{e}_i$ under $\mathbf{A}$, since

\begin{align}
\mathbf{A}\mathbf{e}_i &= \mathbf{A}\left(\mathbf{x}^* - \mathbf{x}_i\right) \tag{2.4}\\
&= \mathbf{A}\mathbf{x}^* - \mathbf{A}\mathbf{x}_i \tag{2.5}\\
&= \mathbf{b} - \mathbf{A}\mathbf{x}_i \tag{2.6}
\end{align}

Define the residual vector $\mathbf{r}_i = \mathbf{b} - \mathbf{A}\mathbf{x}_i = \mathbf{A}\mathbf{e}_i$, and note that Equation 2.3 implies $\mathbf{r}_i = -f'(\mathbf{x}_i)$. Since this negative gradient is the direction Steepest Descent will travel next, we may characterize Steepest Descent as greedily reducing the size of the residual.
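
A short numerical check makes the identity concrete. The matrix, vector, and iterate below are illustrative values, not taken from the text; the point is only that the computable residual $\mathbf{b} - \mathbf{A}\mathbf{x}_i$ coincides with $\mathbf{A}\mathbf{e}_i$ whenever $\mathbf{A}\mathbf{x}^* = \mathbf{b}$.

\begin{verbatim}
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # illustrative SPD matrix
b = np.array([2.0, -8.0])

x_star = np.linalg.solve(A, b)           # pretend we know the answer
x_i = np.array([1.0, 1.0])               # some iterate
e_i = x_star - x_i                       # error vector (unknowable in practice)
r_i = b - A @ x_i                        # residual, computable without x*
assert np.allclose(r_i, A @ e_i)         # r_i = A e_i = -f'(x_i)
\end{verbatim}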

Because Steepest Descent follows the negative gradient $-f'(\mathbf{x}_i) = \mathbf{r}_i$ until it reaches a point $\mathbf{x}_{i+1}$ for which $\mathbf{r}_{i+1} \perp \mathbf{r}_i$, the algorithm can only make right-angle turns. When applied to a two-dimensional quadratic form with symmetric positive semidefinite Hessian $\mathbf{A}$, this property implies that search directions must be repeated if more than two iterations are necessary. The only way to guarantee two or fewer iterations is to choose a starting point $\mathbf{x}_0$ for which $f'(\mathbf{x}_0) \parallel \mathbf{e}_0$ or $f'(\mathbf{x}_0) \perp \mathbf{e}_0$. This cannot be done without computing $\mathbf{e}_0$, which requires knowing the answer $\mathbf{x}^*$.
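
This zigzag behavior is easy to observe numerically. The sketch below uses an illustrative matrix and starting point (not the worst-case example of Figure 2.1); with exact line searches each residual is orthogonal to its predecessor, so in two dimensions only two search directions ever alternate.

\begin{verbatim}
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # illustrative SPD matrix
b = np.array([2.0, -8.0])
x = np.array([-2.0, -2.0])               # arbitrary starting guess

r_prev = None
for i in range(6):
    r = b - A @ x                        # r_i = -f'(x_i)
    if r_prev is not None:
        print(i, float(r @ r_prev))      # inner products are (numerically) zero
    alpha = (r @ r) / (r @ (A @ r))      # exact line search step
    x = x + alpha * r                    # take the Steepest Descent step
    r_prev = r
\end{verbatim}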

