9.4 Example Applications

Fig. 9.4 Comparing algorithms on problem instances with a scalable parameter

Furthermore, there can be few available sets of real data, and these data may be commercially sensitive and therefore difficult to publish and to allow others to compare.
Last, but not least, there might be so many application-specific aspects involved that the results are hard to generalise.
Despite these drawbacks it remains highly relevant to tackle real-world problems as the proof of the pudding is in the eating!

As mentioned in the introduction to this chapter, instead of presenting two case studies with implementation details, we next describe examples of good and bad practice, in order to illustrate some of our points.

9.4.1 Bad Practice
This section shows a hypothetical example of an experimental study following the template that can be found in many EC publications.¹ In this imaginary case a researcher has invented a new EA feature, e.g., “tricky mutation”, and assessed the value of this new feature by running a standard GA and a “tricky GA” 20 times independently on each of 10 objective functions chosen from the literature. The outcomes of these experiments showed the tricky GA to be better on seven, equal on one, and worse on two objective functions in terms of success rate (SR). On this basis it was concluded that the new feature is indeed valuable.

The main question here is what did we, the EC community, learn from this experience? We did learn a new feature (tricky mutation) and obtained some indication that it might be a promising idea to try in a GA. This can of course justify publishing a paper reporting this; however, there are also many things that we did not learn here, including:

• How relevant are these results, e.g., are the test functions typical of real-world problems, or important only from an academic perspective?
• What would have happened if a different performance metric had been used, or if the runs had been ended sooner, or later?
• What is the scope of claims about the superiority of the tricky GA?
• Is there a property distinguishing the seven good and two bad functions?
• Are these results generalisable? Alternatively, do some features of the tricky GA make it applicable for other specific problems, and if so which?
• How sensitive are these results to changes in the algorithm’s parameters?
• Are the performance differences as measured here statistically significant, or can they be just artifacts caused by random effects?
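
The last question in the list can be probed with a few lines of arithmetic. The sketch below applies an exact two-sided sign test to the reported tally (seven wins, two losses, one tie). This is only one possible check and is not part of the hypothetical study; the function name `sign_test_p_value` is illustrative, and treating the ten functions as independent samples is an assumption.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test; ties are dropped before calling this."""
    n = wins + losses
    k = max(wins, losses)
    # Probability of at least k successes in n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Tricky GA: better on 7 functions, worse on 2, equal on 1 (tie dropped).
p = sign_test_p_value(wins=7, losses=2)
print(f"p = {p:.4f}")  # p = 0.1797
```

Under this test a 7-to-2 split over nine functions would not be significant at the usual 5% level. The test also ignores the size of the differences and the variation between the 20 runs on each function, so it can only serve as a first sanity check.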

The next example explicitly addresses some of these issues and therefore forms a showcase for a better, albeit still not perfect, practice.

9.4.2 Better Practice
A better example of how to evaluate the behaviour of a new algorithm takes into account questions such as:
• What type of problem am I trying to solve?
• What would be a desirable property of an algorithm for this type of problem, for example: speed of finding good solutions, reliably locating good solutions, or occasional brilliance?
• What methods currently exist for this problem, and why am I trying to make a new one, i.e., when do they not perform well?

After considering these issues, a particular problem type can be chosen, a careful set of experiments can be designed, and the necessary data to collect can be identified. A typical process might proceed along the following lines:
• inventing a new EA (xEA) for solving problem X
• identifying three other EAs and a traditional benchmark heuristic for problem X in the literature
• asking when and why xEA could be better than the other four methods
• obtaining a problem instance generator for problem X with two parameters: n (problem size) and k (some problem-specific indicator)
• selecting five values for k and five values for n
• generating 100 random problem instances for all 25 combinations
• executing all algorithms on each instance 100 times (the benchmark heuristic is also stochastic)
• recording AES, SR, and MBF values, together with standard deviations for AES and MBF (not for SR)
• identifying appropriate tests based on the data and assessing the statistical significance of results
• putting the program code and the instances on the Web
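
The experimental loop in the steps above can be sketched as follows. This is a minimal, scaled-down illustration rather than a real study: `make_instance`, `random_search`, `summarise`, and `run_grid` are hypothetical names, the bit-matching problem and the meaning given to k are invented for the example, and random search merely stands in for a stochastic solver. SR is taken to be the fraction of successful runs, AES the average number of evaluations over successful runs only, and MBF the mean of the best fitness over all runs.

```python
import random
from statistics import mean, stdev


def make_instance(n, k, rng):
    """Hypothetical generator: a random n-bit target; k is the number of
    mismatched bits a solution may still have (smaller k is harder)."""
    return {"target": [rng.randint(0, 1) for _ in range(n)], "k": k}


def random_search(instance, rng, max_evals=200):
    """Toy stand-in for a stochastic solver; this is not a real EA."""
    target, k = instance["target"], instance["k"]
    n = len(target)
    best = 0.0
    for evals in range(1, max_evals + 1):
        guess = [rng.randint(0, 1) for _ in range(n)]
        matches = sum(g == t for g, t in zip(guess, target))
        best = max(best, matches / n)
        if n - matches <= k:  # success criterion reached
            return True, evals, best
    return False, max_evals, best


def summarise(runs):
    """runs: list of (success, evaluations, best_fitness) tuples."""
    evals_ok = [e for ok, e, _ in runs if ok]
    best = [b for _, _, b in runs]
    return {
        "SR": len(evals_ok) / len(runs),  # no standard deviation for SR
        "AES": mean(evals_ok) if evals_ok else None,
        "AES_sd": stdev(evals_ok) if len(evals_ok) > 1 else None,
        "MBF": mean(best),
        "MBF_sd": stdev(best) if len(best) > 1 else None,
    }


def run_grid(algorithms, n_values, k_values,
             instances_per_cell, runs_per_instance, seed=0):
    """Run every algorithm on every instance of every (n, k) cell."""
    rng = random.Random(seed)
    table = {}
    for n in n_values:
        for k in k_values:
            instances = [make_instance(n, k, rng)
                         for _ in range(instances_per_cell)]
            for name, algorithm in algorithms.items():
                runs = [algorithm(inst, rng)
                        for inst in instances
                        for _ in range(runs_per_instance)]
                table[(name, n, k)] = summarise(runs)
    return table


# Scaled-down grid so the sketch finishes quickly.
results = run_grid({"random_search": random_search},
                   n_values=[4, 8], k_values=[0, 1],
                   instances_per_cell=3, runs_per_instance=5)
for key, stats in sorted(results.items()):
    print(key, stats)
```

The full design described in the list would pass five values each for n and k, 100 instances per cell, 100 runs per instance, and five solvers (xEA, the three other EAs, and the benchmark heuristic). Fixing the seed and keeping the generated instances is one way to support the final step of publishing code and instances.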

The advantages of this template with respect to the one in the previous example are numerous:
• The results can be arranged in 3D: that is, as a performance landscape over the (n, k) plane with special attention to the effect of n on scale-up.
• The niche for xEA can be identified, e.g., weak with respect to other algorithms for (n, k) combinations of type 1, strong for (n, k) combinations of type 2, comparable otherwise. Thus the ‘when’ question can be answered.
• Analysing the specific features and the niches of each algorithm can shed light on the ‘why’ question.
• A lot of knowledge has been collected about problem X and its solvers.
• Generalisable results are achieved, or at least claims with well-identified scope based on solid data.
• Reproduction of the results, and further research elsewhere, is facilitated.
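
To make the idea of a niche over the (n, k) plane concrete, the sketch below labels each (n, k) cell as strong, weak, or comparable by comparing xEA with the best rival in that cell. Everything here is an assumption for illustration: the function name `classify_niches`, the fixed `tolerance` (a real study would rely on the statistical tests chosen for the data), and the scores, which are made-up numbers and not results.

```python
def classify_niches(xea, rivals, tolerance=0.05):
    """Label each (n, k) cell by comparing xEA with the best rival.

    xea:    {(n, k): score}, higher is better (e.g., SR)
    rivals: {name: {(n, k): score}}
    """
    labels = {}
    for cell, score in xea.items():
        best_rival = max(scores[cell] for scores in rivals.values())
        if score > best_rival + tolerance:
            labels[cell] = "strong"
        elif score < best_rival - tolerance:
            labels[cell] = "weak"
        else:
            labels[cell] = "comparable"
    return labels


# Made-up scores, purely to show the shape of the analysis.
xea = {(10, 1): 0.90, (10, 2): 0.60, (20, 1): 0.70, (20, 2): 0.40}
rivals = {
    "EA1": {(10, 1): 0.70, (10, 2): 0.62, (20, 1): 0.80, (20, 2): 0.30},
    "heuristic": {(10, 1): 0.80, (10, 2): 0.55, (20, 1): 0.95, (20, 2): 0.20},
}
for cell, label in sorted(classify_niches(xea, rivals).items()):
    print(cell, label)
# (10, 1) strong
# (10, 2) comparable
# (20, 1) weak
# (20, 2) strong
```

Reading the labels along the n axis gives a first impression of scale-up behaviour, while the pattern over both axes hints at where the niche of xEA lies.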

For exercises and recommended reading for this chapter, please visit www.evolutionarycomputation.org.

  1. The authors admit that some of their own papers also follow this template.





