We would like to have a set of linearly independent direction vectors $d_i$ and scalars $\alpha_i$ for which
Because we cannot know $e_0$ without knowing the solution, we cannot solve Equation 2.7 directly. However, we can compute $A e_0$, as shown in Equation 2.6. Left-multiplying Equation 2.7 by $A$ produces
If we assume we can find a set of linearly independent directions $d_0, d_1, \ldots, d_{n-1}$ which are mutually $A$-orthogonal, we may solve for $\alpha_i$ by left-multiplying Equation 2.8 by $d_i^T$
Not only does there exist a set of linearly independent, mutually $A$-orthogonal direction vectors not in the null space of $A$, but each vector can be computed incrementally. Let $d_0 = r_0$. This allows computation of $\alpha_0$ and $x_1$, and all remaining direction vectors can be computed as needed via
It is worth emphasizing that each direction vector is a linear combination of the current residual and the previous direction vector. Because the direction vectors are linearly independent, a simple induction implies that $x_{i+1}$ is a linear combination of $x_0$ and the residuals $r_0, \ldots, r_i$.
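The incremental construction above can be sketched in a short pure-Python implementation of linear CG. This is an illustrative sketch, not the chapter's own code: the names `cg`, `matvec`, `dot`, and `axpy` are ours, and we use the common residual-norm form of the update coefficient for $d_{i+1}$.

```python
def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def axpy(a, x, y):
    """Return a*x + y elementwise."""
    return [a * x_i + y_i for x_i, y_i in zip(x, y)]

def cg(A, b, x0, tol=1e-12, max_iter=None):
    """Linear CG: start with d_0 = r_0, then take the minimizing step
    along each direction and build the next direction from the new
    residual and the previous direction."""
    if max_iter is None:
        max_iter = len(b)            # exact arithmetic finishes within n steps
    x = list(x0)
    r = axpy(-1.0, matvec(A, x), b)  # r = b - A x
    d = list(r)                      # d_0 = r_0
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol:
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)      # step length along d
        x = axpy(alpha, d, x)
        r = axpy(-alpha, Ad, r)      # new residual
        rr_new = dot(r, r)
        beta = rr_new / rr           # coefficient on the previous direction
        d = axpy(beta, d, r)         # d_{i+1} = r_{i+1} + beta * d_i
        rr = rr_new
    return x

# Example: a small symmetric positive definite system
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b, [0.0, 0.0])
```

On this $2 \times 2$ system the loop reaches the exact solution $(1/11, 7/11)$ in two iterations, matching the finite-termination property discussed below.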
Figure 2.2 shows the conjugate directions used by CG when started from the same point as Steepest Descent in Figure 2.1. Because CG converges within $n$ iterations, this is a worst-case starting point, requiring the full $n$ steps. CG does so much better than Steepest Descent in our example because it assumes, correctly, that $f$ is a quadratic form. If $f$ is not a quadratic form, then adjustments must be made. The resulting algorithm is often referred to as nonlinear CG, and is discussed briefly in Section 2.2. Predictably, the CG algorithm discussed so far is called linear CG.
We have omitted a demonstration of the existence of a linearly independent set of conjugate directions not in the null space of $A$, and have also omitted many interesting properties of CG. The value of $\alpha_i$ in Equation 2.11 would be the same had we defined it as the step length for a minimizing line search in direction $d_i$ starting at $x_i$. The minimizer is $x_{i+1}$; this is where the gradient, and hence the residual, is perpendicular to $d_i$. Thus $r_{i+1}$ is perpendicular to $d_i$. Equation 2.14 then implies that each new residual is perpendicular to the space already searched. It also follows that the residuals are mutually orthogonal and that $f$ is minimized over $x_0 + \operatorname{span}\{d_0, \ldots, d_i\}$. These properties help explain the per-iteration efficiency of CG.
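The line-search reading of $\alpha_i$ can be made explicit. Assuming the standard quadratic $f(x) = \tfrac{1}{2} x^T A x - b^T x$ with residual $r_i = b - A x_i$ (our notation, matching the derivation above), setting the directional derivative of $f$ along $d_i$ to zero gives
\[
\frac{d}{d\alpha} f(x_i + \alpha d_i)
  = -\, d_i^T r_i + \alpha \, d_i^T A d_i = 0
  \quad\Longrightarrow\quad
  \alpha_i = \frac{d_i^T r_i}{d_i^T A d_i},
\]
and since $r_{i+1} = r_i - \alpha_i A d_i$, this stationarity condition is exactly $d_i^T r_{i+1} = 0$: the new residual is perpendicular to the search direction, as claimed.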
One may also observe that the computations needed in each CG iteration depend only on $A$ and quantities from the previous iteration, with the exception of $\alpha_i$ as defined in Equation 2.11. In fact $\alpha_i$ can also be written in terms of the previous iteration [41,31]. Therefore we only need to store a few vectors and $A$. Because only matrix-vector computations are needed, storage and computation may exploit the structure or sparsity of $A$.
It was demonstrated above that CG should arrive at the exact solution in no more than $n$ iterations when $A$ is $n \times n$. This property does not hold when rounded arithmetic is used, for instance when CG is performed on a computer. Errors during computation can reduce the conjugacy of the direction vectors and decrease efficiency, so that many more than $n$ iterations may be required.
The convergence rate of CG is not entirely straightforward. In Shewchuk this rate is described as
There are many possibilities for deciding when to terminate CG besides the relative size of the error. Furthermore, the relation between the true error and the computed error is complex, and is the subject of ongoing research in numerical analysis. Termination alternatives include the absolute size of the error, the relative or absolute size of the residual, or context-dependent quantities such as the relative change in the likelihood when computing a maximum-likelihood estimate.
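A few of these alternatives can be combined into a single stopping test. The sketch below is ours, not from this chapter: the helper `norm` and the threshold names `rel_tol` and `abs_tol` are illustrative choices.

```python
import math

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(v_i * v_i for v_i in v))

def should_stop(r, b, x, x_prev, rel_tol=1e-6, abs_tol=1e-10):
    """Terminate when the residual is small relative to b, absolutely
    small, or the iterate has essentially stopped moving."""
    if norm(r) <= rel_tol * norm(b):   # relative residual test
        return True
    if norm(r) <= abs_tol:             # absolute residual test
        return True
    step = [xi - xp for xi, xp in zip(x, x_prev)]
    return norm(step) <= rel_tol * max(norm(x), 1.0)  # relative change in x
```

A context-dependent criterion, such as the relative change in a likelihood, would replace the last test with the analogous check on the quantity of interest.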
Finding a good termination criterion is important for minimizing wasted time while ensuring sufficient optimality. When we discuss our use of CG later in this thesis, the most important issue will be proper termination. Because the worst-case time complexity of CG depends on the spectrum of $A$, our CG termination methods should adapt to each dataset. We will also examine how the starting point affects our methods in the one case where an informed choice is possible. Overall, we aim to hide all of these issues from the users of our algorithms.