GA Resources

On this page, we provide information on genetic algorithms and various genetic operators, as well as on evolutionary algorithms used for optimization. We discuss single- and multi-objective optimization and constraint handling techniques, and we also present details of some of the cutting-edge research topics being pursued at KanGAL.




There exist a number of real-parameter GA implementations, where crossover and mutation operators are applied directly to real parameter values. One of the early implementations was by Wright (1991), where a linear crossover operator created three solutions, $0.5(x_i^{(1,t)}+x_i^{(2,t)})$, $(1.5x_i^{(1,t)}-0.5x_i^{(2,t)})$, and $(-0.5x_i^{(1,t)}+1.5x_i^{(2,t)})$, from two parent solutions $x_i^{(1,t)}$ and $x_i^{(2,t)}$ at generation t and chose the best two as children solutions. Goldberg introduced the concept of virtual alphabets in the context of real-coded GAs (Goldberg, 1991). Eshelman and Schaffer (1993) introduced the notion of interval schemata for real-coded genetic algorithms and suggested a blend crossover (BLX-$\alpha$) operator. For two parent solutions $x_i^{(1,t)}$ and $x_i^{(2,t)}$ (assuming $x_i^{(1,t)}<x_i^{(2,t)}$), BLX-$\alpha$ randomly picks a solution in the range [ $x_i^{(1,t)}-\alpha(x_i^{(2,t)}-x_i^{(1,t)})$, $x_i^{(2,t)}+\alpha(x_i^{(2,t)}-x_i^{(1,t)})$]. Thus, if $u_i$ is a random number between 0 and 1, the following is a child solution:

 \begin{displaymath}x_i^{(1,t+1)} = (1-\gamma_i)x_i^{(1,t)}+ \gamma_i x_i^{(2,t)}, \end{displaymath} (1)

where $\gamma_i = (1+2\alpha)u_i - \alpha$. If $\alpha$ is zero, this crossover creates a random solution in the range ($x_i^{(1,t)}$, $x_i^{(2,t)}$). In a number of test problems, they reported that BLX-0.5 (with $\alpha=0.5$) performs better than BLX operators with any other $\alpha$ value. It is important to note, however, that the factor $\gamma_i$ is uniformly distributed for a fixed value of $\alpha$. Nevertheless, BLX-$\alpha$ has an interesting property: the location of the child solution depends on the difference between the parent solutions. This becomes clear if we rewrite equation (1) as follows:

 \begin{displaymath}\left(x_i^{(1,t+1)} - x_i^{(1,t)}\right) = \gamma_i\left(x_i^{(2,t)}-x_i^{(1,t)}\right). \end{displaymath} (2)

If the difference between the parent solutions is small, the difference between the children and parent solutions is also small. We believe that this is an essential property for any search algorithm to exhibit self-adaptation, because the spread of the current population then dictates the spread of solutions in the resulting population. In problems where self-adaptation should cause the population to either converge or diverge in search of better regions of the search space, making the spread of the children population proportional to that of the parent population may help achieve the task.
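As an illustration, a minimal per-variable sketch of the BLX-$\alpha$ child computation of equation (1) might look as follows in Python (the function name and the one-variable-at-a-time treatment are illustrative, not part of the original operator description):

```python
import random

def blx_alpha_child(x1, x2, alpha=0.5):
    """One BLX-alpha child for a single real variable (equation (1)).

    gamma_i is uniform in [-alpha, 1 + alpha), so the child falls in the
    extended interval around the two parent values."""
    u = random.random()                      # u_i in [0, 1)
    gamma = (1.0 + 2.0 * alpha) * u - alpha  # gamma_i = (1 + 2*alpha) u_i - alpha
    return (1.0 - gamma) * x1 + gamma * x2   # equation (1)
```

With alpha equal to zero, the child is simply a uniform random point between the two parent values, as noted above.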

Ono and Kobayashi (1997) suggested a unimodal normally distributed crossover (UNDX) operator, in which three parent solutions are used to create two or more children solutions. Children solutions are created from an ellipsoidal probability distribution, with one axis formed along the line joining two of the three parent solutions; the extent in the orthogonal direction is decided by the perpendicular distance of the third parent from that axis. Unlike the BLX operator, this operator assigns a higher probability to creating solutions near the center of the first two parents than near the parents themselves. Recently, a multi-parental UNDX operator with more than three parents has also been suggested (Kita, Ono, and Kobayashi, 1998). Like the BLX operator, this operator creates children solutions in proportion to the spread of the parent solutions, thereby giving a GA with this operator the potential to exhibit self-adaptation.

In 1995, the first author and his students developed the simulated binary crossover (SBX) operator, which works with two parent solutions and creates two children solutions (Deb and Agrawal, 1995; Deb and Kumar, 1995). As the name suggests, the SBX operator simulates the working principle of the single-point crossover operator on binary strings. In those studies, the authors showed that this crossover operator respects interval schemata processing, in the sense that common interval schemata between parents are preserved in the children. The procedure of computing the children solutions $x_i^{(1,t+1)}$ and $x_i^{(2,t+1)}$ from the parent solutions $x_i^{(1,t)}$ and $x_i^{(2,t)}$ is as follows. A spread factor $\beta_i$ is defined as the ratio of the absolute difference in children values to that of the parent values:

 \begin{displaymath}\beta_i = \left\vert\frac{x_i^{(2,t+1)}-x_i^{(1,t+1)}}{x_i^{(2,t)}-x_i^{(1,t)}}\right\vert. \end{displaymath} (3)

First, a random number ui between 0 and 1 is created. Thereafter, from a specified probability distribution function, the ordinate ${\beta_q}_i$ is found so that the area under the probability curve from 0 to ${\beta_q}_i$ is equal to the chosen random number ui. The probability distribution used to create a child solution is derived to have a similar search power as that in a single-point crossover in binary-coded GAs and is given as follows (Deb and Agrawal, 1995):

 \begin{displaymath}{\cal P}(\beta_i) = \left\{\begin{array}{ll} 0.5(\eta+1)\beta_i^{\eta}, & \mbox{if } \beta_i \leq 1; \\ 0.5(\eta+1)\frac{1}{\beta_i^{\eta+2}}, & \mbox{otherwise}. \end{array}\right. \end{displaymath} (4)

The figure below shows the above probability distribution with $\eta=2$ and 5 for creating children solutions from two parent solutions ($x_i^{(1,t)}=2.0$ and $x_i^{(2,t)}=5.0$) in the real space. In the above expression, the distribution index $\eta$ is any nonnegative real number. A large value of $\eta$ gives a higher probability of creating near-parent solutions, and a small value of $\eta$ allows distant solutions to be selected as children solutions. Using equation (4), we calculate ${\beta_q}_i$ by equating the area under the probability curve to $u_i$, as follows:

 \begin{displaymath}{\beta_q}_i = \left\{\begin{array}{ll} (2u_i)^{\frac{1}{\eta+1}}, & \mbox{if } u_i \leq 0.5; \\ \left(\frac{1}{2(1-u_i)}\right)^{\frac{1}{\eta+1}}, & \mbox{otherwise}. \end{array}\right. \end{displaymath} (5)

  
Figure: Probability distribution for creating children solutions of continuous variables. Parents are marked with an 'o'.

After obtaining ${\beta_q}_i$ from the above probability distribution, the children solutions are calculated as follows:

  
 \begin{displaymath}x_i^{(1,t+1)} = 0.5\left[(1+{\beta_q}_i)x_i^{(1,t)}+(1-{\beta_q}_i)x_i^{(2,t)}\right], \end{displaymath} (6)

 \begin{displaymath}x_i^{(2,t+1)} = 0.5\left[(1-{\beta_q}_i)x_i^{(1,t)}+(1+{\beta_q}_i)x_i^{(2,t)}\right]. \end{displaymath} (7)

Thus, the following step-by-step procedure is followed to create two children solutions ($x_i^{(1,t+1)}$ and $x_i^{(2,t+1)}$) from two parent solutions ($x_i^{(1,t)}$ and $x_i^{(2,t)}$):

Step 1: Choose a random number $u_i \in [0,1)$.
Step 2: Calculate ${\beta_q}_i$ using equation (5).
Step 3: Compute the children solutions using equations (6) and (7).
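Putting the three steps together, a minimal per-variable Python sketch of the SBX operator might look as follows (the function name is illustrative; in practice the operator is applied variable-wise, usually with some crossover probability):

```python
import random

def sbx_pair(x1, x2, eta=2.0):
    """Simulated binary crossover for one real variable, following
    equations (5)-(7); eta is the distribution index."""
    u = random.random()                                      # Step 1: u_i in [0, 1)
    if u <= 0.5:                                             # Step 2: beta_q from equation (5)
        beta_q = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta_q = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta_q) * x1 + (1.0 - beta_q) * x2)   # Step 3: equation (6)
    c2 = 0.5 * ((1.0 - beta_q) * x1 + (1.0 + beta_q) * x2)   # Step 3: equation (7)
    return c1, c2
```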

Note that the two children solutions are symmetric about the parent solutions. This symmetry is deliberate, so as to avoid any bias towards either parent in a single crossover operation. Another interesting aspect of this crossover operator is that, for a fixed $\eta$, the children solutions have a spread that is proportional to that of the parent solutions:

 \begin{displaymath}\left(x_i^{(2,t+1)}- x_i^{(1,t+1)}\right) = {\beta_q}_i \left(x_i^{(2,t)} - x_i^{(1,t)}\right). \end{displaymath} (8)

This has an important implication. Let us consider two scenarios: (i) the two parents are far away from each other, and (ii) the two parents are close to each other. For illustration, both cases (with parent solutions $x_i^{(1,t)}=2.0$ and $x_i^{(2,t)}=5.0$ in the first case, and $x_i^{(1,t)}=2.0$ and $x_i^{(2,t)}=2.5$ in the second case) and the corresponding probability distributions with $\eta=2$ are shown in the figures below.
  
Figure: Probability distribution of children solutions with distant parents.





Figure: Probability distribution of children solutions with closely spaced parents.



For an identical random number $u_i$, the parameter ${\beta_q}_i$ takes the same value in both cases. From equation (8) it is clear that in the first case the children are likely to be more widely spread than in the second case. The figures also show the corresponding children solutions (marked with a box) for $u_i=0.8$, or ${\beta_q}_i=2.5^{1/3}$. They clearly show that if the parent values are far from each other (the first case), solutions away from the parents can be created. Compare the child solution $x_i^{(1,t+1)} = 1.464$ with the parent $x_i^{(1,t)}=2.0$. But if the parent values are close to each other (the second case), distant children solutions are unlikely. Compare the child solution $x_i^{(1,t+1)} = 1.911$ with the parent $x_i^{(1,t)}=2.0$, created using the same random number as in the first case. In initial populations, where the solutions are randomly placed (as in the first case), this allows almost any value to be created as a child solution. But when the solutions tend to converge due to the action of genetic operators (as in the second case), distant solutions are not allowed, thereby focusing the search on a narrow region.
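As a quick check, the numbers quoted above follow directly from equations (5)-(7); the snippet below reproduces them:

```python
# u_i = 0.8 with eta = 2 gives beta_q = (1 / (2*(1 - 0.8)))**(1/3) = 2.5**(1/3) ~ 1.357
beta_q = (1.0 / (2.0 * (1.0 - 0.8))) ** (1.0 / 3.0)

# Distant parents (2.0, 5.0): first child ~ 1.464
print(0.5 * ((1 + beta_q) * 2.0 + (1 - beta_q) * 5.0))

# Close parents (2.0, 2.5): first child ~ 1.911, using the same random number
print(0.5 * ((1 + beta_q) * 2.0 + (1 - beta_q) * 2.5))
```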

It is interesting to note that both equations (6) and (7) can be written in the form of equation (1) with the relationship $\gamma_i = 0.5(1 \mp {\beta_q}_i)$. However, it is important to note that, unlike in the BLX-$\alpha$ operator, the equivalent $\gamma_i$ term in the SBX operator is not uniformly distributed (refer to equation (5)). The SBX operator favors solutions near each parent over solutions away from the parents. In essence, the SBX operator has two properties:

1. The spread of the children solutions is proportional to that of the parent solutions, and
2. solutions near the parents are monotonically more likely to be chosen as children solutions than solutions distant from the parents.



Multi-objective optimization deals with optimization problems involving multiple objectives. Most real-world search and optimization problems involve multiple objectives (such as minimizing fabrication cost and maximizing product reliability) and should ideally be formulated and solved as multi-objective optimization problems. However, the task of multi-objective optimization differs from that of single-objective optimization: in multi-objective optimization there is usually no single solution which is optimum with respect to all objectives. The resulting problem usually has a set of optimal solutions, known as Pareto-optimal solutions, non-inferior solutions, or effective solutions (Steuer, 1986). Since there exists more than one optimal solution, and since without further information no Pareto-optimal solution can be said to be better than any other, one of the goals of multi-objective optimization is to find as many Pareto-optimal solutions as possible.

Classical search and optimization methods usually work on a point-by-point basis and thus must be applied many times, each time finding one Pareto-optimal solution. Moreover, the efficacy of classical methods depends largely on the shape of the Pareto-optimal region, the discreteness of the search space, the presence of constraints, and so on. Over the past decade, population-based evolutionary algorithms (EAs), such as genetic algorithms (GAs) and evolution strategies (ESs), have been found quite useful in solving multi-objective optimization problems, simply because of their ability to find multiple optimal solutions in a single simulation run.

Although the possibility of using evolutionary algorithms to solve multi-objective optimization problems was proposed in the seventies, the first implementation emerged in 1984. Interest remained lukewarm for about a decade, but the field gained major popularity from 1993 onwards. A web site maintained by Carlos A. Coello Coello shows that at least 300 research papers have been written to date on this topic. A year-wise count of those papers, plotted in the figure, shows an exponential growth of interest in the field over the years.


Since evolutionary methods use a population-based approach, it is intuitive that a number of Pareto-optimal solutions can, in principle, be found in one single simulation run. Not surprisingly, there exist a number of past studies, and a growing number of current approaches, in this direction, giving the evolutionary computing field a new dimension.

Multi-Criterion Evolutionary Optimization: Past

As early as 1967, Rosenberg suggested, but did not simulate, a genetic search method for finding the chemistry of a population of single-celled organisms with multiple properties or objectives (Rosenberg, 1967).

However, the first implementation of a real multi-objective evolutionary algorithm (the vector-evaluated GA, or VEGA) was suggested by David Schaffer in 1984 (Schaffer, 1984). Schaffer modified the simple tripartite genetic algorithm (selection, crossover, and mutation) by performing independent selection cycles according to each objective. The selection method is repeated for each objective to fill up a portion of the mating pool, and then the entire population is thoroughly shuffled before the crossover and mutation operators are applied. This is done to achieve mating between individuals of different subpopulation groups. The algorithm worked well for some generations, but in some cases it suffered from a bias towards certain individuals or regions (mostly the champions of individual objectives). This does not fulfill the second goal of multi-objective evolutionary optimization, namely maintaining a diverse set of solutions.

Ironically, no significant study was performed for almost a decade after Schaffer's pioneering work, until a revolutionary 10-line sketch of a new non-dominated sorting procedure was suggested by David Goldberg (1989). Since an EA needs a fitness function for reproduction, the trick was to derive a single metric from a number of objective functions. Goldberg's suggestion was to use the concept of domination to assign more copies to non-dominated individuals in a population. Since diversity is another concern, he also suggested the use of a niching strategy among solutions of a non-dominated class. Taking this clue, at least three independent groups of researchers developed different versions of multi-objective evolutionary algorithms. These algorithms differ mainly in the way fitness is assigned to each individual.

Fonseca and Fleming (1993), in their multi-objective GA (MOGA), first classify the whole population into different non-dominated classes. Individuals of the first (best) class are all assigned rank 1. Every other individual is ranked by counting how many solutions (say k) dominate it; that solution is then assigned a rank of k+1. Therefore, at the end of this ranking procedure, many solutions may share the same rank. The selection procedure then uses these ranks to select or delete blocks of points to form the mating pool. MOGA also uses a niche-formation method to distribute the population over the Pareto-optimal region, but instead of performing niching on the parameter values, it performs niching on the objective function values.
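A small Python sketch of this ranking scheme is given below (the helper names are illustrative, and all objectives are assumed to be minimized):

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def moga_ranks(objs):
    """MOGA-style rank of every objective vector in objs: one plus the number
    of solutions dominating it, so all non-dominated solutions receive rank 1."""
    return [1 + sum(dominates(objs[j], objs[i]) for j in range(len(objs)) if j != i)
            for i in range(len(objs))]
```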

Horn, Nafpliotis, and Goldberg (1995) used Pareto domination tournaments instead of the non-dominated sorting and ranking selection method in their niched-Pareto GA (NPGA). In this method, a comparison set comprising a specific number (tdom) of individuals is picked at random from the population at the beginning of each selection process. Two random individuals are then picked from the population, and a winner is selected according to the following procedure. Both individuals are compared with the members of the comparison set for domination with respect to the objective functions. If one of them is non-dominated and the other is dominated, the non-dominated point is selected. If, on the other hand, both are non-dominated or both are dominated, a niche count is found for each individual in the entire population. The niche count is calculated by simply counting the number of points in the population within a certain distance ($\sigma_{\rm share}$) of an individual. The individual with the smaller niche count is selected. Since non-dominance is computed by comparing an individual with a randomly chosen subset of size tdom, the success of this algorithm depends strongly on the parameter tdom. If a proper size is not chosen, true non-dominated (Pareto-optimal) points may not be found.
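The tournament described above can be sketched as follows (a rough illustration only: the function and variable names are assumptions, dominates() is the helper from the previous sketch, and the niche count here is a plain cutoff count rather than a full sharing function):

```python
import math
import random

def npga_tournament(i, j, pop, objs, t_dom, sigma_share):
    """Pareto domination tournament between candidates i and j; pop holds the
    parameter vectors and objs the corresponding objective vectors."""
    others = [k for k in range(len(pop)) if k not in (i, j)]
    comparison = random.sample(others, t_dom)           # random comparison set
    i_dominated = any(dominates(objs[k], objs[i]) for k in comparison)
    j_dominated = any(dominates(objs[k], objs[j]) for k in comparison)
    if i_dominated != j_dominated:                      # exactly one is dominated
        return j if i_dominated else i
    # Tie: both dominated or both non-dominated -> prefer the less crowded one
    def niche_count(idx):
        return sum(1 for x in pop if math.dist(pop[idx], x) <= sigma_share)
    return i if niche_count(i) <= niche_count(j) else j
```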

Srinivas and Deb (1994) developed the non-dominated sorting GA (NSGA), which is similar to MOGA. NSGA differs from MOGA in two ways: in how fitness is assigned and in how niching is performed. After the best non-dominated class of the population is identified, its members are assigned a dummy fitness value equal to N (the population size). A sharing strategy is then used on the parameter values (instead of the objective function values) to find a niche count for each individual of the best class. The niche count measures, qualitatively, the number of individuals in the vicinity of a solution. For each individual, a shared fitness is found by dividing the assigned fitness N by its niche count. The smallest shared fitness value ${\cal F}_1^{\min}$ is noted for further use. Thereafter, the second class of non-dominated solutions is found, and a dummy fitness value equal to ${\cal F}_1^{\min}-\epsilon_1$ (where $\epsilon_1$ is a small positive number) is assigned to all its individuals. Niche counts of individuals within this class are found and shared fitness values are calculated. This process continues until all solutions are assigned a fitness value. The fitness assignment procedure ensures two things: (i) a dominated solution is assigned a smaller shared fitness value than any solution which dominates it, and (ii) within each non-dominated class, diversity is ensured. On a number of test problems and real-world optimization problems, NSGA has found widely spread Pareto-optimal or near-Pareto-optimal solutions. One difficulty of NSGA is choosing the niching parameter, which signifies the maximum distance between two neighboring Pareto-optimal solutions. Although most studies have used a fixed value of the niching parameter, there exists a study in which an adaptive sizing strategy has been suggested (Fonseca and Fleming, 1993).

Multi-Criterion Evolutionary Optimization: Present

The field of multi-objective evolutionary optimization is still too young for its studies to be classified neatly into past and present. Nevertheless, in this section we outline algorithms that have been suggested in the past couple of years or so.

Zitzler and Thiele (1998) suggested an elitist multi-criterion EA based on the concept of non-domination, the strength Pareto EA (SPEA). They suggested maintaining an external population at every generation, storing all non-dominated solutions discovered so far, beginning from the initial population. This external population participates in genetic operations. At each generation, a combined population of the external and current populations is first constructed. All non-dominated solutions in the combined population are assigned a fitness based on the number of solutions they dominate. To maintain diversity, and in the context of minimizing the fitness function, a higher fitness is assigned to a non-dominated solution that dominates more solutions in the combined population. Similarly, a higher fitness is assigned to a solution that is dominated by more solutions in the combined population. Care is taken to ensure that no non-dominated solution is assigned a fitness worse than that of the best dominated solution. This assignment of fitness makes sure that the search is directed towards the non-dominated solutions while diversity among dominated and non-dominated solutions is maintained. On knapsack problems, they reported better results than any other method used in that study. However, such comparisons of algorithms are not entirely appropriate, simply because the SPEA approach uses an inherent elitism mechanism (the best non-dominated solutions discovered up to the current generation), whereas the other algorithms do not use any such mechanism. Nevertheless, an interesting aspect of that study is that it shows the importance of introducing elitism in evolutionary multi-criterion optimization.

Knowles and Corne (1999) suggested possibly the simplest multi-objective EA, based on an evolution strategy (ES). In their Pareto-archived ES (PAES), with one parent and one child, the child is first compared with the parent. If the child dominates the parent, the child is accepted as the next parent and the iteration continues. If, on the other hand, the parent dominates the child, the child is discarded and a new mutated solution (a new child) is created. However, if the child and the parent do not dominate each other, the choice between them takes into account the second task of maintaining diversity among obtained solutions. To maintain diversity, an archive of non-dominated solutions found so far is kept. The child is compared with the archive to check whether it dominates any member of the archive. If it does, the child is accepted as the new parent and the dominated solutions are eliminated from the archive. If the child does not dominate any member of the archive, both parent and child are checked for their nearness to the solutions of the archive. If the child resides in the least crowded region in the parameter space among the members of the archive, it is accepted as the parent and a copy is added to the archive. It is interesting to note that both features, (i) emphasizing non-dominated solutions and (ii) maintaining diversity among non-dominated solutions, are present in this simple algorithm. Later, they suggested a multi-parent PAES based on similar principles.
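A rough sketch of one PAES acceptance decision, following the prose above, is given below (the names and the crowding helper are assumptions; the published algorithm additionally bounds the archive size and measures crowding differently):

```python
def paes_accept(parent, child, archive, f, crowding):
    """One (1+1)-PAES acceptance step. f maps a solution to its objective vector;
    crowding(x, archive) returns a crowdedness measure (smaller = less crowded);
    dominates() is the helper defined earlier."""
    if dominates(f(child), f(parent)):
        return child                               # child replaces the parent
    if dominates(f(parent), f(child)):
        return parent                              # child is discarded
    # Mutually non-dominated: consult the archive of non-dominated solutions
    beaten = [a for a in archive if dominates(f(child), f(a))]
    if beaten:
        for a in beaten:
            archive.remove(a)                      # drop newly dominated members
        archive.append(child)
        return child
    if any(dominates(f(a), f(child)) for a in archive):
        return parent                              # the archive dominates the child
    if crowding(child, archive) < crowding(parent, archive):
        archive.append(child)                      # child lies in a less crowded region
        return child
    return parent
```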

Van Veldhuizen and Lamont (1999) suggested a multi-objective messy GA (MOMGA), which works in two phases: (i) a primordial phase that finds important building blocks associated with the problem, and (ii) a juxtapositional phase that combines these building blocks to form optimal or near-optimal solutions (Goldberg, Korb, and Deb, 1989). In the primordial phase, partial strings are initialized. Unlike in the original messy GA, multiple templates are needed here, each corresponding to an individual objective function. MOMGA begins with random templates and then finds the best template for each objective from the best solutions obtained at the end of each era, thereby identifying the competitive template for each objective in an era, a matter which is important for the proper evaluation of partial solutions in an era. The selection operator is exactly the same as that of Horn et al. (1995), and the recombination operator is identical to that of the original messy GAs. Since the dominance measure is used for selection and the diversity-preserving mechanism of NPGA is used, the resulting algorithm has both of the desired properties of a multi-objective optimizer. The study also suggests a parallel version of MOMGA to overcome the computational burden of messy GAs.

Rudolph (1999) suggested, but did not simulate, a simple elitist multi-objective EA based on a systematic comparison of individuals from the parent and offspring populations. The non-dominated solutions of the offspring population are compared with those of the parent population to form an overall non-dominated set of solutions, which becomes the parent population of the next iteration. If the cardinality of this set is smaller than the desired population size, other individuals from the offspring population are included. With this strategy, he was able to prove the convergence of the algorithm to the Pareto-optimal set. Although this is an important achievement in its own right, the algorithm lacks a mechanism for the second task of maintaining diversity among Pareto-optimal solutions. An explicit diversity-preserving mechanism must be added to make it more usable in practice.

Laumanns, Rudolph, and Schwefel (1999) suggested a predator-prey approach that is closer to nature. On a toroidal graph, preys carrying the solution vectors are placed on the vertices of the graph, and predators move along the edges in search of preys. At least M predators chase a prey. Each predator is associated with a single objective function and catches the prey with the worst value of that objective. When a prey is caught, it is deleted from its vertex and replaced by a mutated solution. Although no explicit non-domination concept is used, this scheme has been shown to force the prey solutions to converge towards the Pareto-optimal solutions in a couple of test problems using a mutation-based evolution strategy. In order to obtain a better distribution of Pareto-optimal solutions, 100 predators per objective function are deployed in the neighborhood of each prey. The idea is elegant and different from the rest of the multi-objective evolutionary algorithms, but it remains to be investigated how the concept scales to more complex problems.

Multi-Criterion Evolutionary Optimization: Future

In this section, we discuss a number of issues that must be addressed in order to better understand the working principles of multi-objective evolutionary optimization and to make it more useful in practice. A list of these issues is outlined below:

1. Comparison of existing multi-objective GA implementations
2. Understanding the dynamics of GA populations over generations
3. Scalability of multi-objective GAs with the number of objectives
4. Introduction of elitism in multi-objective EAs
5. Convergence to the Pareto-optimal front
6. Definition of metrics for comparing two populations
7. Application to real-world problems

For a variety of research papers on multi-objective optimization using evolutionary algorithms, readers may visit the web site (http://www.lania.mx/~ccoello/EMOO/EMOObib.html) maintained by Carlos A. Coello Coello.




In order to solve constrained problems, we have suggested an efficient penalty function method suitable for use with any population-based technique such as GAs. The penalty function is as follows:

 \begin{displaymath}F(x) = \left\{\begin{array}{ll} f(x), & \mbox{if } g_j(x) \geq 0 \;\;\forall j, \\ f_{\max} + \sum_{j} \left\vert\langle g_j(x)\rangle\right\vert, & \mbox{otherwise}. \end{array}\right. \end{displaymath} (1)

In the above equation, the bracket operator $\langle\alpha\rangle$ returns the operand $\alpha$ if the operand is negative, and returns zero otherwise. The parameter $f_{\max}$ is the maximum function value of all feasible solutions in the population. A careful investigation of the above penalty function method reveals the following properties:
1. The fitness of a feasible solution is equal to its function value f(x).
2. The fitness of an infeasible solution in a population is always worse than that of any feasible solution.
3. Of two infeasible solutions, the one with the smaller constraint violation has the better fitness.
The above properties of the fitness function allow proper assignment of copies in the selection process. We illustrate this fitness function in the case of a single-variable constrained optimization problem. The figure below shows the objective function and two inequality constraints. The fitness function based on equation (1) is also shown in the figure. Although this fitness function is discontinuous, it improves monotonically towards the feasible region. Since an evolutionary search method does not use any gradient information, its performance does not depend on the discontinuity of the search space. Moreover, since the above fitness function uses the maximum function value of the feasible solutions, this type of penalty function can only be used with population-based algorithms such as evolutionary search methods.
  
 
Figure: The proposed penalty function method is demonstrated on a single-variable problem. The fitness of the feasible solutions is the same as f(x), and the fitness of any infeasible solution is worse than that of the worst feasible solution. Solutions in a population are also marked.
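A minimal sketch of the fitness computation in equation (1), assuming a minimization problem with inequality constraints written as $g_j(x) \geq 0$, might look as follows (the function and parameter names are illustrative):

```python
def constrained_fitness(x, f, constraints, f_max):
    """Penalty-based fitness of equation (1). `constraints` is a list of
    functions g_j; `f_max` is the worst (largest) objective value among the
    feasible members of the current population."""
    violations = [max(0.0, -g(x)) for g in constraints]  # |<g_j(x)>| for violated g_j
    if all(v == 0.0 for v in violations):
        return f(x)                  # feasible: fitness equals the objective value
    return f_max + sum(violations)   # infeasible: always worse than any feasible one
```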

In order to demonstrate the efficacy of the proposed constraint handling technique, we apply a real-coded GA with the SBX operator and this constraint handling technique to a two-variable multimodal optimization problem. The function has one optimal solution in the feasible search space, whereas the unconstrained function has two optimal solutions:

  (2)

The entire search space and the feasible region are shown in the figure below. When the real-coded GA with SBX and the above penalty function approach is applied to an initial population of 50 points (marked with open circles in the figure), most population members converge inside the narrow feasible region after 10 generations (marked with open boxes), and the population finally converges to the true optimum solution (2.491, 2.299) after 30 generations (marked with solid circles). This behaviour has been observed in many repeated runs of the GA. When the same function is solved using the same real-coded GA but with a different penalty function approach (the best-known approach in the GA literature (Michalewicz and Schoenauer, 1996)), the optimal solution is not found; instead the population converges to the unconstrained optimum, which is infeasible for the above constrained problem. The population after 10 generations is marked with a `+' in the figure. This simple comparison shows that the proposed constraint handling technique is superior to the best-known strategy and is intuitively a better strategy to implement with EC algorithms.

Figure: The efficacy of the proposed penalty function approach is illustrated on a two-variable constrained optimization problem. The points marked with open circles are the initial points. The points marked with open boxes are obtained after 10 generations of the proposed approach, and the points marked with solid circles are obtained after 30 generations, when the population has converged to the true optimum solution. The points marked with `+' are obtained after 10 generations of the best-known penalty function approach with GAs; this GA converges, after 50 iterations, to the unconstrained optimum (x1=3.0, x2=2.0), which is infeasible for the constrained problem.

To handle equality constraints, the fitness function presented in equation (1) can be suitably extended. This extension has been suggested as a part of the proposed research and is one of our primary current research interests.