International Tables for Crystallography Volume C: Mathematical, physical and chemical tables. Edited by E. Prince. © International Union of Crystallography 2006.
International Tables for Crystallography (2006). Vol. C, ch. 8.1, pp. 686–687.

A numerical procedure that is applicable to large-scale problems that may not be sparse is the conjugate-gradient method. Conjugate-gradient methods were originally designed to solve the quadratic minimization problem: find the minimum of S(x) = (1/2)x^{T}Hx + c^{T}x, where H is a symmetric, positive-definite p × p matrix. The gradient of S is g(x) = Hx + c, and its Hessian matrix is H. Given an initial estimate, x_{0}, the conjugate-gradient algorithm is

d_{0} = −g_{0} = −(Hx_{0} + c),

and, for j = 0, 1, 2, …, p − 1,

α_{j} = g_{j}^{T}g_{j}/(d_{j}^{T}Hd_{j}),
x_{j+1} = x_{j} + α_{j}d_{j},
g_{j+1} = Hx_{j+1} + c,
β_{j} = g_{j+1}^{T}g_{j+1}/(g_{j}^{T}g_{j}),
d_{j+1} = −g_{j+1} + β_{j}d_{j}.

This algorithm finds the exact minimum of the quadratic function in not more than p steps.
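As a concrete illustration, the quadratic algorithm can be sketched as follows in Python with NumPy; the function name and the small test matrix are ours, not from the text:

```python
import numpy as np

def conjugate_gradient_quadratic(H, c, x0, tol=1e-12):
    """Minimize S(x) = 0.5 x^T H x + c^T x for symmetric positive-definite H,
    using exact step lengths alpha_j; terminates in at most p steps."""
    x = x0.astype(float)
    g = H @ x + c               # gradient of S at x
    d = -g                      # initial direction: steepest descent
    for _ in range(len(x0)):    # at most p steps for a quadratic
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (d @ H @ d)   # exact minimizer along d
        x = x + alpha * d
        g_new = H @ x + c
        beta = (g_new @ g_new) / (g @ g)
        d = -g_new + beta * d
        g = g_new
    return x

# Example: the minimum satisfies H x* = -c.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([-1.0, -2.0])
x_star = conjugate_gradient_quadratic(H, c, np.zeros(2))
```

With p = 2 variables, two iterations already give the exact minimizer, up to round-off.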
This algorithm cannot be used directly for the nonlinear case because it requires H to compute the step length α_{j}, and the goal is to solve the problem without computing the Hessian. To accomplish this, the exact computation of α_{j} is replaced by a numerical line search, and the termination after at most p steps is replaced by a convergence test. Thus, we obtain, for a given starting value x_{0} and a general, nonquadratic function S with gradient g(x):

d_{0} = −g_{0} = −g(x_{0}),

and, for j = 0, 1, 2, …, until convergence,

α_{j} chosen by a line search to reduce S(x_{j} + α_{j}d_{j}),
x_{j+1} = x_{j} + α_{j}d_{j},
g_{j+1} = g(x_{j+1}),
β_{j} = g_{j+1}^{T}g_{j+1}/(g_{j}^{T}g_{j}),
d_{j+1} = −g_{j+1} + β_{j}d_{j}.

Note that, as promised, H is not needed. In practice, it has been observed that the line search need not be exact, but that periodic restarts in the steepest-descent direction are often helpful. This procedure often requires more iterations and function evaluations than methods that store approximate Hessians, but the cost per iteration is small. Thus, it is often the overall least-expensive method for large problems.
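The nonlinear version might be sketched as follows, assuming NumPy. The Armijo backtracking rule, the restart-every-p-steps schedule and the descent-direction safeguard are common practical choices, not prescriptions from the text:

```python
import numpy as np

def nonlinear_cg(S, grad, x0, max_iter=5000, tol=1e-8):
    """Fletcher-Reeves-style nonlinear conjugate gradients (a sketch).
    The Hessian is never formed: the exact alpha of the quadratic case is
    replaced by an inexact backtracking line search, and the p-step
    termination by a gradient-norm convergence test."""
    x = np.asarray(x0, dtype=float)
    p = len(x)
    g = grad(x)
    d = -g
    for j in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:           # safeguard: restart in the steepest-descent
            d = -g               # direction if d is not a descent direction
        # Inexact backtracking line search (Armijo sufficient decrease).
        alpha, slope, Sx = 1.0, g @ d, S(x)
        while S(x + alpha * d) > Sx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x = x + alpha * d
        g_new = grad(x)
        if (j + 1) % p == 0:
            d = -g_new           # periodic steepest-descent restart
        else:
            beta = (g_new @ g_new) / (g @ g)
            d = -g_new + beta * d
        g = g_new
    return x

# Example: a smooth non-quadratic function with its minimum at (1, -2).
S = lambda x: (x[0] - 1.0)**4 + (x[1] + 2.0)**2
grad = lambda x: np.array([4.0 * (x[0] - 1.0)**3, 2.0 * (x[1] + 2.0)])
x_min = nonlinear_cg(S, grad, np.array([0.0, 0.0]))
```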
For the least-squares problem, recall that we are finding the minimum of S(x) = (1/2)(y − Zx)^{T}(y − Zx), for which g(x) = −Z^{T}(y − Zx) and H = Z^{T}Z. By using these definitions in the conjugate-gradient algorithm, it is possible to formulate a specific algorithm for linear least squares that requires only the calculation of Z times a vector and Z^{T} times a vector, and never requires the calculation or factorization of Z^{T}Z.
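A minimal sketch of such an algorithm (a variant often called CGLS or CGNR in the numerical literature), assuming NumPy; note that the only uses of the design matrix are the products Z @ v and Z.T @ v, and Z^{T}Z is never formed:

```python
import numpy as np

def cgls(Z, y, tol=1e-10, max_iter=None):
    """Conjugate gradients applied to the normal equations Z^T Z x = Z^T y,
    using only matrix-vector products with Z and Z^T."""
    m, p = Z.shape
    if max_iter is None:
        max_iter = 2 * p        # round-off can require more than p steps
    x = np.zeros(p)
    r = y.copy()                # residual y - Z x  (x = 0 initially)
    g = Z.T @ r                 # minus the gradient: Z^T (y - Z x)
    d = g.copy()
    gg = g @ g
    for _ in range(max_iter):
        if np.sqrt(gg) < tol:
            break
        Zd = Z @ d                      # the only use of Z
        alpha = gg / (Zd @ Zd)          # since d^T (Z^T Z) d = ||Z d||^2
        x += alpha * d
        r -= alpha * Zd
        g = Z.T @ r                     # the only use of Z^T
        gg_new = g @ g
        d = g + (gg_new / gg) * d
        gg = gg_new
    return x

# Example: an overdetermined system; the result satisfies Z^T Z x = Z^T y.
rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 4))
y = rng.standard_normal(20)
x_hat = cgls(Z, y)
```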
In practice, such an algorithm will, because of round-off error, sometimes require more than p iterations to reach a solution. A detailed examination of the performance of the procedure shows, however, that fewer than p iterations will be required if the eigenvalues of Z^{T}Z are bunched, that is, if there are sets of multiple eigenvalues. Specifically, if the eigenvalues are bunched into k distinct sets, then the conjugate-gradient method will converge in k iterations. Thus, significant improvements can be made if the problem can be transformed to one with bunched eigenvalues. Such a transformation leads to the so-called preconditioned conjugate-gradient method.
In order to analyse the situation, let C be a p × p matrix that transforms the variables, such that x′ = Cx. Then Zx = (ZC^{−1})x′, and the Hessian matrix of the transformed problem is C^{−T}Z^{T}ZC^{−1}. Therefore, C should be chosen so that the system Cx = x′ is easy to solve, and so that C^{−T}Z^{T}ZC^{−1} has bunched eigenvalues. The ideal choice would be C = R, where R is the upper triangular factor of the QR decomposition of Z, since R^{−T}Z^{T}ZR^{−1} = Q^{T}Q = I has all of its eigenvalues equal to one, and, since R is triangular, the system is easy to solve. If R were known, however, the problem would already be exactly solved, so this is not a useful alternative. Unfortunately, no universal best choice seems to exist, but one approach is to choose a sparse approximation to R, either by ignoring rows that cause too much fill-in or by making C a diagonal matrix whose elements are the Euclidean norms of the columns of Z. Bear in mind that, in the nonlinear case, an expensive computation to choose C in the first iteration may work very well in subsequent iterations with no further expense. One should be aware of the trade-off between the extra work per iteration of the preconditioned conjugate-gradient method and the reduction in the number of iterations. This is especially important in nonlinear problems.
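The diagonal column-norm choice of C can be illustrated numerically, assuming NumPy; the badly scaled test matrix below is our own construction, and the eigenvalue spread stands in for the bunching that governs the iteration count:

```python
import numpy as np

# Columns of Z on wildly different scales, so that Z^T Z has
# widely spread eigenvalues and unpreconditioned CG converges slowly.
rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 3)) * np.array([1.0, 1e3, 1e-3])

# The simplest preconditioner mentioned in the text: C diagonal with the
# Euclidean column norms of Z, so the system C x = x' is trivial to solve.
C_diag = np.linalg.norm(Z, axis=0)
Zs = Z / C_diag                    # Z C^{-1}: columns rescaled to unit norm

eig_orig = np.linalg.eigvalsh(Z.T @ Z)
eig_prec = np.linalg.eigvalsh(Zs.T @ Zs)   # C^{-T} Z^T Z C^{-1}

spread_orig = eig_orig.max() / eig_orig.min()
spread_prec = eig_prec.max() / eig_prec.min()
```

After rescaling, all eigenvalues of the transformed Hessian cluster near one, while the original spread spans many orders of magnitude.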
The solution of large least-squares problems is currently an active area of research, and we have certainly not given an exhaustive list of methods in this chapter. The choice of method or approach for any particular problem depends on many conditions. Some of these are:
