Genetic Algorithms in Search and Optimization
By Richard Baker
The past decade has witnessed a flurry of interest within the financial industry regarding artificial intelligence technologies, including neural networks, fuzzy systems, and genetic algorithms. In many ways, genetic algorithms, and the extension of genetic programming, offer an outstanding combination of flexibility, robustness, and simplicity. The following discussion highlights some of the key features of genetic algorithms (GAs), and illustrates an application of a particular GA in the search and estimation of global optima.
Virtually every technical discipline, from science and engineering to finance and economics, frequently encounters problems of optimization. Although optimization techniques abound, such techniques often involve identifying, in some fashion, the values of a sequence of explanatory parameters associated with the best performance of an underlying dependent, scalar-valued objective function. For example, in simple 3-D space, this amounts to finding the (x,y) point associated with the optimal z value above or below the x-y plane, where the scalar-valued z is a surface identified by the objective function f(x,y). Or it may involve estimating a large number of parameters of a more elaborate econometric model. For example, we might wish to estimate the coefficients of a generalized auto-regressive conditional heteroskedastic (GARCH) model, in which the log-likelihood function is the objective to maximize. In each case, the unknowns may be thought of as a parameter vector, V, and the objective function, z = f(V), as a transformation of a vector-valued input to a scalar-valued performance metric z.
Optimization may take the form of a minimization or maximization procedure. Throughout this article, optimization will refer to maximization without loss of generality, because maximizing f(V) is the same as minimizing -f(V). My preference for maximization is simply intuitive: Genetic algorithms are based on evolutionary processes and Darwin's concept of natural selection. In a GA context, the objective function is usually referred to as a fitness function, and the phrase survival of the fittest implies a maximization procedure.
Motivation for the Use of GAsA natural question for those unfamiliar with GAs might be: Given all the powerful numerical optimization methods available, such as deterministic gradient-based and simplex-based search methods, why would I be interested in a stochastic alternative like a GA?
The answer is, at least in part, dependent upon the application. If the optimization problems you encounter are reasonably well-behaved, then conventional deterministic techniques are clearly the best choice. But what if the problem is ill-behaved? Or, put another way, what are some of the features that make an optimization problem difficult?
Well, difficulties arise when the estimation involves many parameters that interact in highly non-linear ways. Objective functions characterized by many local optima, expansive flat planes in multi-dimensional space, points at which gradients are undefined, or when the objective function is discontinuous, pose difficulty for traditional mathematical techniques. In these situations, heuristic methods like GAs offer a powerful alternative, and can greatly enhance the set of tools available to researchers. Throughout the following discussion, try to think of GAs as enhancements to a suite of optimization techniques, regarding them as alternatives to more traditional methods, and not as replacements for conventional numerical optimizers. I other words, GAs should be used to complement familiar numerical techniques, and are not meant to replace or subsume them.
Common Features of GAsIt's important to note that, despite all the surrounding hype, the successful design and use of a GA is as much art as science. GA theory is elegant, fascinating, and full of subtle variations that may - or may not - be useful in certain applications. Since Holland's seminal work (Holland, 1975), numerous variations of the conventional GA have been proposed. Most of the variations are well beyond the scope of this article, so I will only discuss some the most common features of GAs. Interested readers are encouraged to consult the references listed below.
One of the most powerful features of GAs is their elegant simplicity. In fact, the simple power of a genetic approach may be harnessed with little more than a few lines of computer code and a random number generator. There are no closed form gradients to derive, and no differentiability or continuity requirements to satisfy. A GA is, in essence, nothing more than a repeated application of a few simple operations. Moreover, a GA knows nothing about the problem at hand, nor does it care. In fact, GAs need not know how to solve a problem, they just need to recognize a good solution when they see it!
GAs begin by randomly generating, or seeding, an initial population of candidate solutions. Each candidate is an individual member of a large population of size M. For the purposes of this discussion, think of each individual as a row vector composed of N elements. In GA parlance, individuals are often referred to as chromosomes, and the vector elements as genes. Each gene will provide storage for, and may be associated with, a specific parameter of the search space. As a simple example of two unknowns, we may think of each individual as a parameter vector V, in which the each V contains a point in the x-y plane, V = [x y]. With this example in mind, the entire population may be stored and manipulated as two-column matrix: The first column represents a population of x-axis values, and the second column a population of y-axis values, and each of the M rows of the matrix is a solution vector V of length N = 2.
The individuals (chromosomes) in genetic algorithms are usually constant-length character sequences (vectors V of constant size). In the traditional GA, these vectors are usually sequences of zeros and ones, but in practice may be anything, including a mix of integers and real numbers, or even a mix of numbers and character strings. The actual data encoded in the vectors is called the representation scheme. To keep the discussion simple and concrete, the chromosomes in this article will be real, continuous parameters with two elements, V = [x y]. Given these two-element chromosomes, the objective is to search for the (x,y) point that maximizes the scalar-valued fitness function z = f(V) = f(x,y). Starting with the initial random population of vectors, a GA then applies a sequence of operations to the population, guided only by the relative fitness of the individuals, and allows the population to evolve over a number of generations. The goal of the evolutionary process is to continually improve the fitness of the best solution vector, as well as the average population fitness, until some termination criteria is satisfied.
Conventional GAs usually apply a sequence of operations to the population based on the relative fitness of the members. The operations typically involve reproduction, in which individuals are randomly selected to survive into the next generation. In reproduction, highly fit individuals are more likely to be selected than unfit members. The idea behind reproduction is to allow relatively fit members to survive and procreate, with the hope that their genes are truly associated with better performance. Note that reproduction is asexual, in that only a single individual is involved.
The next operation, crossover, is a sexual operation involving two (or even more!) individuals. In crossover, highly fit individuals are more likely to be selected to mate and produce children than unfit members. In this manner, highly fit vectors are allowed to breed, with the hope that they will produce ever more fit offspring. Although the crossover operation may take many forms, it typically involves splitting each parent chromosome at a randomly-selected point within the interior of the chromosome, and rearranging the fragments so as to produce offspring of the same size as the parents. The children usually differ from each other, and are usually different from the parents. The effect of crossover is to build upon the success of the past, yet still explore new areas of the search space. Crossover is a slice-and-dice operation which efficiently shuffles genetic information from one generation the next.
The next operation is mutation, in which individuals are slightly changed. In our case, this means that the (x,y) parameters of the chromosome are slightly perturbed with some small probability. Mutation is an asexual operation, and usually plays a relatively minor role in the population. The idea behind mutation is to restore genetic diversity lost during the application of reproduction and crossover, both of which place relentless pressure on the population to converge.
The relentless pressure towards convergence is driven by fitness, and offers a convenient termination criteria. In fact, after many generations of evolution via the repeated application of reproduction, crossover, and mutation, the individuals in the population will often begin to look alike, so to speak. At this point, the GA typically terminates because additional evolution will produce little improvement in fitness. Many termination criteria may be used, in which the most simple is to just stop after some predetermined number of generations. See what I mean about part art, part science?
Pros & Cons of GAsIn contrast to more traditional numerical techniques, which iteratively refine a single solution vector as they search for optima in a multi-dimensional landscape, genetic algorithms operate on entire populations of candidate solutions in parallel. In fact, the parallel nature of a GA's stochastic search is one of the main strengths of the genetic approach. This parallel nature implies that GAs are much more likely to locate a global peak than traditional techniques, because they are much less likely to get stuck at local optima. Also, due to the parallel nature of the stochastic search, the performance is much less sensitive to initial conditions, and hence and a GA's convergence time is rather predictable. In fact, the problem of finding a local optimum is greatly minimized because GAs, in effect, make hundreds, or even thousands, of initial guesses. This implies that a GA's performance is at least as good as a purely random search. In fact, by simply seeding an initial population and stopping there, a GA without any evolutionary progression is essentially a Monte Carlo simulation.
As appealing as a GA may seem, the parallel nature of the stochastic search is not without consequences. Although the prospects of finding global optima make it robust, the convergence of a GA is usually slower than traditional techniques. In fact, with a good initial guess close to the global optimum, a numerical technique will likely be much faster, and more accurate, than a genetic search because, in essence, the GA will be wasting time testing the fitness of sub-optimal solutions.
Furthermore, due to the stochastic nature of a GA, the solution, although more likely to estimate the global optimum, will only be an estimate. Users must realize that GAs will only by chance find an exact optimum, whereas traditional methods will find it exactly . assuming, of course, they find it all. The user must then determine whether the solution found by a GA is close enough. In may cases it will be, but the question of 'How close is close enough?' is somewhat arbitrary and application-dependent.
An Optimization ExampleTo build on the notation developed earlier, this example involves the optimization of a real valued fitness function of two variables, z = f(V) = f(x,y). That is, we are to estimate the 2-D parameter vector V = [x y] of the (x,y) point associated with the maximum z value. This problem is very instructive, because it allows the user to visualize the entire evolutionary process as an animated sequence in 3-D space, and provides a very intuitive understanding of the adaptive, self-improving nature of a GA in action.
Furthermore, the example is motivated by a desire to help users integrate the stochastic search capabilities of a GA with traditional gradient-based search methods. Specifically, I want to emphasize the complementary nature of the two fundamentally different approaches.
The entire optimization example was developed in MATLAB. Although the analysis could have been performed with any of a host of software packages, MATLAB's integrated graphics and matrix-based processing capabilities make it ideal for GA work. In fact, since GAs are inherently parallel, simultaneously processing populations of parameter vectors in MATLAB is almost trivial and significantly reduces the development time and the amount of code required.
The particular GA I chose, specifically designed to optimize fitness functions of continuous parameters, was co-developed by Ken Price and Rainer Storn. Although the details of this particular GA cannot be addressed in this article, I strongly refer interested readers to the April 1997 edition of Dr. Dobb's Journal.
The fitness function for this example is the MATLAB demonstration function PEAKS shown in Figure 1; PEAKS is a function of two variables obtained by translating and scaling several Gaussian distributions. Although PEAKS is a smooth function of only two variables, it nevertheless has three local maxima, and thus represents a simple function for which the parameters estimated by a gradient-based search method are sensitive to the initial guess of the solution vector. The gradient-based optimization function used is the FMINU function of the MATLAB Optimization Toolbox. Note that FMINU is a minimization routine, so to maximize the PEAKS function, FMINU actually needs to minimize -PEAKS.
Figure 1 MATLAB PEAKS as the Fitness Function
To illustrate the example, set an initial vector estimate V0 = [-1 -1]. FMINU returns a solution vector V = [-0.4601 -0.6292] associated with the local maximum z = f(-0.4601,-0.6292) = 3.7766. If we start at V0 = [1 0], FMINU returns a solution vector V = [1.2857 -0.0048] associated with the local maximum z = f(1.2857,-0.0048) = 3.5925. Finally, if we start at V0 = [2 2], FMINU returns the solution vector V = [-0.0093 1.5814] associated with the true global maximum z = f(-0.0093,1.5814]) = 8.1062.
The gradient-based method works extremely well, provided it knows where to start. In this simple case we can visually determine the location of the true global peak, but in problems of much higher dimension there is frequently no such luxury.
Now consider first running the Price/Storn GA to provide FMINU with an initial estimate. The initial population was seeded with only 20 individual chromosomes (20 (x,y) points randomly generated such that -3 < x,y < 3), and allowed to evolve for only 10 generations, producing the parameter estimate V0 = [0.1205 1.6551]. This estimate is then passed to FMINU. The result, as expected, is now the true global maximum z = f(-0.0093,1.5814]) = 8.1062.
Stochastic techniques can expand the optimization tools now available to researchers in a way that enhances and complements the familiar, traditional methods. Genetic approaches, in particular, are now available to optimize difficult, ill-behaved objective functions that often prove difficult for many conventional methods. Furthermore, these genetic approaches are often simple to design and easy to code, and can be used in concert with traditional methods to greatly increase the probability of finding the true global optimum of multi-dimensional functions.
1. Dhar, V, and Stein, R.(1997). Seven Methods for Transforming Corporate Data into Business Intelligence.. Prentice-Hall.
2. Epstein, J.M., and Axtell, R. (1996). Growing Artificial Societies: Social Science from the Bottom Up. MIT Press.
3. Goldberg, D.E (1989). Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley Longman.
4. Holland, J.H. (1992). Adaptation in Natural and Artificial Systems. MIT Press.
5. Koza, J.R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.
6. Koza, J.R. (1994). Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press.
7. Price, K., and Storn, R (April 1997). Differential Evolution: A Simple Evolution Strategy for Fast Optimization. Dr. Dobb's Journal, 264, pp. 18-24.