Machine Learning (Tom M. Mitchell) Reading Notes, Part 10: Chapter 9

1. Introduction (about machine learning)


2. Concept Learning and the General-to-Specific Ordering

3. Decision Tree Learning

4. Artificial Neural Networks


5. Evaluating Hypotheses

6. Bayesian Learning

7. Computational Learning Theory


8. Instance-Based Learning

9. Genetic Algorithms

10. Learning Sets of Rules

11. Analytical Learning

12. Combining Inductive and Analytical Learning

13. Reinforcement Learning


9. Genetic Algorithms

This chapter covers both genetic algorithms, in which hypotheses are typically described by bit strings, and genetic programming, in which hypotheses are described by computer programs.

9.1 MOTIVATION

The popularity of GAs is motivated by a number of factors, including:

Evolution is known to be a successful, robust method for adaptation within biological systems.

GAs can search spaces of hypotheses containing complex interacting parts, where the impact of each part on overall hypothesis fitness may be difficult to model.

Genetic algorithms are easily parallelized and can take advantage of the decreasing costs of powerful computer hardware.

9.2 GENETIC ALGORITHMS

The problem addressed by GAs is to search a space of candidate hypotheses to identify the best hypothesis. In GAs the "best hypothesis" is defined as the one that optimizes a predefined numerical measure for the problem at hand, called the hypothesis fitness.

Although different implementations of genetic algorithms vary in their details, they typically share the following structure: the algorithm operates by iteratively updating a pool of hypotheses, called the population. On each iteration, all members of the population are evaluated according to the fitness function. A new population is then generated by probabilistically selecting the most fit individuals from the current population. Some of these selected individuals are carried forward into the next generation population intact. Others are used as the basis for creating new offspring individuals by applying genetic operators such as crossover and mutation.

A prototypical genetic algorithm is described in Table 9.1.
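The iterative structure described above can be sketched as a minimal GA loop. The parameter values and the OneMax fitness function below are illustrative, not from the book, and the sketch assumes fitness values are positive (as roulette wheel selection requires):

```python
import random

def genetic_algorithm(fitness, pop_size=20, bits=8, crossover_frac=0.6,
                      p_mutation=0.01, generations=50):
    """Prototypical GA over bit strings (assumes fitness is positive)."""
    population = [[random.randint(0, 1) for _ in range(bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(h) for h in population]
        total = sum(scores)

        def select():
            # Fitness-proportionate (roulette wheel) selection.
            r = random.uniform(0, total)
            acc = 0.0
            for h, s in zip(population, scores):
                acc += s
                if acc >= r:
                    return h
            return population[-1]

        next_pop = []
        # Create offspring by single-point crossover of selected parents.
        while len(next_pop) < int(crossover_frac * pop_size):
            p1, p2 = select(), select()
            point = random.randint(1, bits - 1)
            next_pop.append(p1[:point] + p2[point:])
        # Carry the remaining selected members forward intact.
        while len(next_pop) < pop_size:
            next_pop.append(list(select()))
        # Point mutation: flip each bit with small probability.
        for h in next_pop:
            for i in range(bits):
                if random.random() < p_mutation:
                    h[i] = 1 - h[i]
        population = next_pop
    return max(population, key=fitness)

# Example: maximize the number of 1s ("OneMax"); +1 keeps fitness positive.
random.seed(0)
best = genetic_algorithm(fitness=lambda h: sum(h) + 1)
```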

9.2.1 Representing Hypotheses 

Hypotheses in GAs are often represented by bit strings, so that they can be easily manipulated by genetic operators such as mutation and crossover. The hypotheses represented by these bit strings can be quite complex. For example, sets of if-then rules can easily be represented in this way, by choosing an encoding of rules that allocates specific substrings for each rule precondition and postcondition. Placing a 1 in some position indicates that the attribute is allowed to take on the corresponding value.

To pick an example, consider the attribute Outlook, which can take on any of the three values Sunny, Overcast, or Rain. We can represent it with a substring of three bits, one bit per value: the string 010 represents the constraint Outlook = Overcast, while 011 represents the weaker constraint Outlook = Overcast or Rain. Now consider a second attribute, Wind, that can take on the value Strong or Weak; the concatenated string 011 10 then represents the conjunctive constraint (Outlook = Overcast or Rain) and (Wind = Strong).
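This encoding can be sketched as a small decoder that maps bit strings back to readable rule preconditions; the layout follows the Outlook/Wind example, and the helper names are illustrative, not from the book:

```python
# Fixed substring layout: one bit per allowed attribute value.
ATTRIBUTES = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Wind", ["Strong", "Weak"]),
]

def decode(bits):
    """Translate a bit string into a conjunction of attribute constraints."""
    clauses, pos = [], 0
    for name, values in ATTRIBUTES:
        segment = bits[pos:pos + len(values)]
        pos += len(values)
        allowed = [v for v, b in zip(values, segment) if b == "1"]
        if 0 < len(allowed) < len(values):   # an all-1s segment means "don't care"
            clauses.append(f"{name} = {' v '.join(allowed)}")
    return " AND ".join(clauses) if clauses else "True"

decode("01110")  # "Outlook = Overcast v Rain AND Wind = Strong"
```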

9.2.2 Genetic Operators

The two most common operators are crossover and mutation.
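Table 9.2 in the book defines these operators over bit strings; a minimal sketch of single-point crossover and point mutation (two-point and uniform crossover follow the same pattern):

```python
import random

def single_point_crossover(p1, p2, point=None):
    """Exchange the tails of two parent bit strings at a crossover point."""
    if point is None:
        point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def point_mutation(h, p_mutation=0.01):
    """Flip each bit independently with small probability p_mutation."""
    return [b if random.random() >= p_mutation else 1 - b for b in h]

# Deterministic crossover at position 3:
c1, c2 = single_point_crossover([1, 1, 1, 1, 1], [0, 0, 0, 0, 0], point=3)
# c1 == [1, 1, 1, 0, 0], c2 == [0, 0, 0, 1, 1]
```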

9.2.3 Fitness Function and Selection

fitness proportionate selection, also called roulette wheel selection

tournament selection

rank selection
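The three selection schemes can be sketched as follows; the tournament probability and other parameter values are illustrative:

```python
import random

def roulette(population, fitness):
    """Fitness-proportionate selection: P(h) = f(h) / sum of all f."""
    total = sum(fitness(h) for h in population)
    r = random.uniform(0, total)
    acc = 0.0
    for h in population:
        acc += fitness(h)
        if acc >= r:
            return h
    return population[-1]

def tournament(population, fitness, p=0.9):
    """Pick two members at random; the fitter one wins with probability p."""
    a, b = random.sample(population, 2)
    fitter, weaker = (a, b) if fitness(a) >= fitness(b) else (b, a)
    return fitter if random.random() < p else weaker

def rank_select(population, fitness):
    """Selection probability proportional to rank, not raw fitness."""
    ranked = sorted(population, key=fitness)      # worst first
    weights = range(1, len(ranked) + 1)           # rank 1..n
    return random.choices(ranked, weights=weights, k=1)[0]

pop = [3, 1, 4, 1, 5]
pick = roulette(pop, fitness=lambda x: x)  # one of the members, biased to 5
```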

9.3 AN ILLUSTRATIVE EXAMPLE

Of limited value; skipped (pp. 256-258).

9.4 HYPOTHESIS SPACE SEARCH

In contrast to gradient methods, which move smoothly from one hypothesis to a very similar one, the GA search can move much more abruptly, replacing a parent hypothesis by an offspring that may be radically different from the parent. The GA search is therefore less likely to fall into the same kind of local minima that can plague gradient descent methods.


One practical difficulty in some GA applications is the problem of crowding. Crowding is a phenomenon in which some individual that is more highly fit than others in the population quickly reproduces, so that copies of this individual and very similar individuals take over a large fraction of the population. The negative impact of crowding is that it reduces the diversity of the population, thereby slowing further progress by the GA.

Several strategies have been explored for reducing crowding. One approach is to alter the selection function, using criteria such as tournament selection or rank selection in place of fitness proportionate roulette wheel selection. A related strategy is "fitness sharing," in which the measured fitness of an individual is reduced by the presence of other, similar individuals in the population. A third approach is to restrict the kinds of individuals allowed to recombine to form offspring. For example, by allowing only the most similar individuals to recombine, we can encourage the formation of clusters of similar individuals, or multiple "subspecies" within the population. A related approach is to spatially distribute individuals and allow only nearby individuals to recombine. Many of these techniques are inspired by the analogy to biological evolution.
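As a sketch, fitness sharing can be implemented by dividing raw fitness by a "niche count" that grows with the number of similar individuals. The triangular sharing function and the sharing radius below are illustrative choices, not prescribed by the book:

```python
def shared_fitness(population, fitness, distance, radius=2.0):
    """Fitness sharing: divide raw fitness by a niche count that grows
    with the number of similar individuals (those within `radius`)."""
    def sh(d):
        # Triangular sharing function: 1 at d = 0, falling to 0 at the radius.
        return max(0.0, 1.0 - d / radius)
    result = []
    for h in population:
        niche = sum(sh(distance(h, other)) for other in population)
        result.append(fitness(h) / niche)
    return result

def hamming(a, b):
    """Hamming distance between equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

# Two identical fit individuals split their fitness; the lone one keeps its own.
pop = [[1, 1, 1], [1, 1, 1], [0, 0, 0]]
shared = shared_fitness(pop, fitness=lambda h: sum(h) + 1, distance=hamming)
# shared == [2.0, 2.0, 1.0]
```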

9.4.1 Population Evolution and the Schema Theorem

It is interesting to ask whether one can mathematically characterize the evolution over time of the population within a GA.

The schema theorem of Holland (1975) provides one such characterization. It is based on the concept of schemas, or patterns that describe sets of bit strings. To be precise, a schema is any string composed of 0s, 1s, and *'s. Each schema represents the set of bit strings containing the indicated 0s and 1s, with each "*" interpreted as a "don't care." For example, the schema 0*10 represents the set of bit strings that includes exactly 0010 and 0110.

The schema theorem characterizes the evolution of the population within a GA in terms of the number of instances representing each schema. Let m(s, t) denote the number of instances of schema s in the population at time t (i.e., during the tth generation). The schema theorem describes the expected value of m(s, t + 1) in terms of m(s, t) and other properties of the schema, population, and GA algorithm parameters.
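The quantity m(s, t) can be computed directly by matching each population member against the schema; a small illustrative sketch:

```python
def matches(schema, bits):
    """A bit string instantiates a schema when it agrees with every
    position of the schema that is not '*' (don't care)."""
    return all(s == "*" or s == b for s, b in zip(schema, bits))

def m(schema, population):
    """m(s, t): the number of instances of schema s in the population."""
    return sum(matches(schema, h) for h in population)

pop = ["0010", "0110", "1010", "0011"]
m("0*10", pop)  # 2: the schema covers exactly 0010 and 0110
```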

Let f̄(t) denote the average fitness of all individuals in the population at time t, and let û(s, t) denote the average fitness of instances of schema s in the population at time t. The schema theorem then states:

E[m(s, t+1)] ≥ (û(s, t) / f̄(t)) · m(s, t) · (1 − pc · d(s)/(l − 1)) · (1 − pm)^o(s)    (9.3)

where pc and pm are the probabilities of single-point crossover and point mutation, l is the length of the individual bit strings, o(s) is the number of defined (non-"*") bits in schema s, and d(s) is the defining length of s (the distance between its leftmost and rightmost defined bits).

If we view the GA as performing a virtual parallel search through the space of possible schemas at the same time it performs its explicit parallel search through the space of individuals, then Equation (9.3) indicates that more fit schemas will grow in influence over time.

The schema theorem is perhaps the most widely cited characterization of population evolution within a GA. One way in which it is incomplete is that it fails to consider the (presumably) positive effects of crossover and mutation. Numerous more recent theoretical analyses have been proposed, including analyses based on Markov chain models and on statistical mechanics models. See, for example, Whitley and Vose (1995) and Mitchell (1996).

9.5 GENETIC PROGRAMMING

Genetic programming (GP) is a form of evolutionary computation in which the individuals in the evolving population are computer programs rather than bit strings.

9.5.1 Representing Programs 

Programs manipulated by a GP are typically represented by trees corresponding to the parse tree(解析树) of the program. Each function call is represented by a node in the tree, and the arguments to the function are given by its descendant nodes. For example, Figure 9.1 illustrates this tree representation for the function sin(x) + sqrt(x*x + y).

To apply genetic programming to a particular domain, the user must define the primitive functions to be considered (e.g., sin, cos, +, -, exponentials), as well as the terminals (e.g., x, y, constants such as 2). The genetic programming algorithm then uses an evolutionary search to explore the vast space of programs that can be described using these primitives.
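A minimal sketch of this tree representation, using nested tuples for function calls; the primitive set below is illustrative, chosen to cover the Figure 9.1 example:

```python
import math

# A program tree is a nested tuple (function_name, arg1, ...);
# leaves are variable names or numeric constants.
FUNCTIONS = {
    "sin": math.sin,
    "sqrt": math.sqrt,
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}

def evaluate(tree, env):
    """Recursively evaluate a program tree in a variable environment."""
    if isinstance(tree, tuple):
        name, *args = tree
        return FUNCTIONS[name](*(evaluate(a, env) for a in args))
    return env.get(tree, tree)  # variable lookup; otherwise a constant

# The parse tree of sin(x) + sqrt(x*x + y) from Figure 9.1:
program = ("+", ("sin", "x"), ("sqrt", ("+", ("*", "x", "x"), "y")))
evaluate(program, {"x": 0.0, "y": 4.0})  # sin(0) + sqrt(0 + 4) = 2.0
```

Crossover on this representation exchanges complete subtrees between two parent programs, which is why the tree (rather than a flat string) is the natural unit of manipulation in GP.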


As in a genetic algorithm, the prototypical genetic programming algorithm maintains a population of individuals (in this case, program trees). On each iteration, it produces a new generation of individuals using selection, crossover, and mutation. The fitness of a given individual program in the population is typically determined by executing the program on a set of training data.

9.5.2 Illustrative Example 

As in most GP applications, the choice of problem representation has a significant impact on the ease of solving the problem. In Koza's formulation of this block-stacking task, the programs are composed from primitive functions together with the following three terminal arguments:

CS (current stack), which refers to the name of the top block on the stack, or F if there is no current stack.

TB (top correct block), which refers to the name of the topmost block on the stack, such that it and those blocks beneath it are in the correct order.

NN (next necessary), which refers to the name of the next block needed above TB in the stack, in order to spell the word "universal" or F if no more blocks are needed. 

9.5.3 Remarks on Genetic Programming 

Despite the huge size of the hypothesis space it must search, genetic programming has been demonstrated to produce intriguing results in a number of applications.

In most cases, the performance of genetic programming depends crucially on the choice of representation and on the choice of fitness function. For this reason, an active area of current research is aimed at the automatic discovery and incorporation of subroutines that improve on the original set of primitive functions, thereby allowing the system to dynamically alter the primitives from which it constructs individuals. See, for example, Koza (1994). 

9.6 MODELS OF EVOLUTION AND LEARNING

One interesting question regarding evolutionary systems is "What is the relationship between learning during the lifetime of a single individual, and the longer time frame species-level learning afforded by evolution?"

9.6.1 Lamarckian Evolution

9.6.2 Baldwin Effect

9.7 PARALLELIZING GENETIC ALGORITHMS

GAs are naturally suited to parallel implementation, and a number of approaches to parallelization have been explored. Coarse-grained approaches to parallelization subdivide the population into somewhat distinct groups of individuals, called demes. Each deme is assigned to a different computational node, and a standard GA search is performed at each node. Communication and cross-fertilization between demes occurs on a less frequent basis than within demes. In contrast to coarse-grained parallel implementations of GAs, fine-grained implementations typically assign one processor per individual in the population. Recombination then takes place among neighboring individuals.
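A single-process sketch of the coarse-grained ("island") model: each deme evolves independently, with occasional migration of the best individuals around a ring of demes. In a real implementation each deme would run on its own node; all parameter values and the simple truncation-plus-mutation evolution step here are illustrative:

```python
import random

def island_model(fitness, islands=4, pop_size=10, bits=8,
                 epochs=5, generations_per_epoch=10):
    """Coarse-grained GA: independent demes with periodic ring migration."""
    demes = [[[random.randint(0, 1) for _ in range(bits)]
              for _ in range(pop_size)] for _ in range(islands)]

    def evolve(pop):
        # One generation: keep the fitter half, refill with mutated copies.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            i = random.randrange(bits)
            child[i] = 1 - child[i]          # point mutation
            children.append(child)
        return survivors + children

    for _ in range(epochs):
        for _ in range(generations_per_epoch):
            demes = [evolve(d) for d in demes]
        # Migration: each deme receives the best member of its ring neighbor.
        best = [max(d, key=fitness) for d in demes]
        for i, d in enumerate(demes):
            d[-1] = list(best[(i - 1) % islands])
    return max((max(d, key=fitness) for d in demes), key=fitness)

random.seed(1)
champion = island_model(fitness=sum)  # OneMax fitness as a toy example
```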
