All the conjugate gradient algorithms start out by searching in the steepest descent
direction (negative of the gradient) on the first iteration.
$$p_0 = -g_0$$
A line search is then performed to determine the optimal distance to move along the
current search direction:
$$x_{k+1} = x_k + \alpha_k p_k$$
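How the step length $\alpha_k$ is found depends on the particular line search routine. As a minimal sketch (this is not the toolbox's line search; the backtracking strategy and the parameters rho and c are illustrative assumptions), a backtracking search that enforces the Armijo sufficient-decrease condition could be written in MATLAB as:

    function alpha = backtrack(f, x, g, p)
    % Illustrative backtracking (Armijo) line search for the step length
    % alpha_k.  Not the toolbox's line search; rho and c are assumed values.
    % f: objective function handle, x: current point,
    % g: gradient at x, p: (descent) search direction.
    rho = 0.5;  c = 1e-4;
    alpha = 1;  fx = f(x);
    % Shrink alpha until the sufficient-decrease condition holds.
    while f(x + alpha*p) > fx + c*alpha*(g'*p)
        alpha = rho*alpha;
    end
    end

Saved as a function file (e.g., a hypothetical backtrack.m), this returns a step length that gives a sufficient decrease in the objective along a descent direction p.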
Then the next search direction is determined so that it is conjugate to previous search
directions. The general procedure for determining the new search direction is to combine the
new steepest descent direction with the previous search direction:
$$p_k = -g_k + \beta_k p_{k-1}$$
The various versions of the conjugate gradient algorithm are distinguished by the manner
in which the constant $\beta_k$ is computed. For the
Fletcher-Reeves update the procedure is
$$\beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}$$
This is the ratio of the norm squared of the current gradient to the norm squared of the
previous gradient.
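Putting the pieces together, a minimal Fletcher-Reeves conjugate gradient loop can be sketched in MATLAB (a generic illustration, not the traincgf implementation; the quadratic test function, the tolerance, and the inline backtracking line search are assumptions):

    % Assumed quadratic test problem: f(x) = 0.5*x'*A*x - b'*x
    A = [3 1; 1 2];  b = [1; 1];
    f = @(x) 0.5*x'*A*x - b'*x;      % objective
    grad = @(x) A*x - b;             % gradient

    x = zeros(2,1);
    g = grad(x);
    p = -g;                          % first iteration: p0 = -g0
    for k = 1:100
        if norm(g) < 1e-6, break, end
        % Line search for alpha_k (simple backtracking, as sketched above)
        alpha = 1;  fx = f(x);
        while f(x + alpha*p) > fx + 1e-4*alpha*(g'*p)
            alpha = alpha/2;
        end
        x = x + alpha*p;             % x_{k+1} = x_k + alpha_k*p_k
        gnew = grad(x);
        beta = (gnew'*gnew)/(g'*g);  % Fletcher-Reeves: ratio of squared gradient norms
        p = -gnew + beta*p;          % p_k = -g_k + beta_k*p_{k-1}
        g = gnew;
    end
    disp(x)                          % approaches the minimizer A\b = [0.2; 0.4]

For this two-variable quadratic the loop converges in a few iterations; for the nonquadratic error surfaces of neural network training the same update rules apply, but convergence behavior varies from problem to problem.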
See [FlRe64] or [HDB96] for a discussion of the
Fletcher-Reeves conjugate gradient algorithm.
The conjugate gradient algorithms are usually much faster than variable learning rate
backpropagation, and are sometimes faster than trainrp, although the results
vary from one problem to another. The conjugate gradient algorithms require only a little more
storage than the simpler algorithms. Therefore, these algorithms are good for networks with a
large number of weights.
Try the Neural Network Design
demonstration nnd12cg [HDB96] for an illustration of the performance of a conjugate
gradient algorithm.