Optimization Algorithms: the Genetic Algorithm (a basic Python implementation and a DEAP implementation)

一只黍离

Category: Optimization Algorithms. Tags: genetic algorithm, GA, Python

Article link: https://blog.csdn.net/weixin_42201701/article/details/102543886

This article implements a basic genetic algorithm in plain Python, then shows how to implement one with the Python library DEAP.

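1. Understanding genetic algorithms

Definition

The genetic algorithm (GA) originated from computer simulation studies of biological systems. It is a stochastic global search and optimization method modeled on the mechanisms of biological evolution in nature, drawing on Darwin's theory of evolution and Mendel's theory of heredity. In essence it is an efficient, parallel, global search method: it automatically acquires and accumulates knowledge about the search space as it runs, and adaptively controls the search process in pursuit of the best solution.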

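Problem statement

Consider the single-variable function (the same one implemented as aimFunction in Part 2):

$$f(x) = x + 5\sin(5x) + 2\cos(3x)$$

The goal is to find the maximum of this function. The implementation in Part 2 searches over $x \in [0, 9]$.

[Figure: plot of the function]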

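The "kangaroo jumping" problem

This problem can be thought of as a "kangaroo jumping" problem.

Think of the function curve as a mountain range made of peaks and valleys. Each candidate solution is then a kangaroo, and we want the kangaroos to keep jumping toward higher ground until one reaches the highest peak (even though the kangaroos themselves may not be keen on the idea). Finding the maximum thus becomes a "kangaroo jumping" process.

A genetic algorithm simulates evolution by natural selection. By maintaining a population of candidate solutions, it searches in many directions at once and supports the formation and exchange of information between those directions. Because it searches with a whole population rather than a single point, it is more likely to find the global optimum than a point-by-point search.

In genetic-algorithm terms: many kangaroos are dropped at arbitrary places in the Himalayas. They do not know that their task is to find Mount Everest. Every few years, some kangaroos at lower altitudes are shot, in the hope that the survivors are prolific and raise offspring where they live.

Or, to put it another way: once upon a time, a large group of kangaroos was inexplicably abandoned, scattered across the Himalayas, and had to scrape out a living there. A colorless, odorless poison gas hung over the low ground, thinning out as the altitude increased. The poor kangaroos were entirely unaware of it and kept hopping about as usual. So kangaroos at low altitude kept dying, while those higher up lived longer and had more chances to breed. After many years the kangaroos had, without realizing it, gathered on the various peaks. Of all of them, though, only those gathered on Mount Everest were taken back to beautiful Australia.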

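Chromosomes: how genes are encoded

Human chromosomes are built from four bases: adenine (A), guanine (G), cytosine (C), and thymine (T). They pair in two ways, A-T and C-G, and each base position carries 2 bits of information. With these simple combinations, no two people in the world are identical. Can we find a similar way to represent features in a computer?
Inspired by how human chromosomes encode information, and given how computers represent data, we can describe an individual's features with a binary code: 1000111011110101
Here '0' and '1' play the role of two kinds of bases, strung together in order on a single strand. Each position carries 1 bit of information, so as long as the string is long enough it can represent an individual's features.
Binary encoding is simple and intuitive, but when an individual's features are complex, a large number of bits is needed to describe them precisely, and the corresponding decoding step (analogous to translation in biology, that is, mapping the genotype to the phenotype) becomes cumbersome. Floating-point encoding was proposed to reduce the computational cost of genetic algorithms and improve their efficiency: 1.2 – 3.3 – 2.0 – 5.4 – 2.7 – 4.3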
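
As a minimal sketch of binary encoding, the snippet below maps a real value in an interval to a fixed-length bit string and back. The helper names `encode` and `decode_bits`, the 17-bit length, and the interval [0, 9] are assumptions chosen to match the decoding function used later in this article.

```python
N_BITS = 17            # assumed chromosome length
LOW, HIGH = 0.0, 9.0   # assumed search interval

def encode(x, n_bits=N_BITS, low=LOW, high=HIGH):
    """Map a real x in [low, high] to an n_bits-long binary string (genotype)."""
    levels = 2 ** n_bits - 1
    k = round((x - low) / (high - low) * levels)
    return format(k, "0{}b".format(n_bits))

def decode_bits(bits, low=LOW, high=HIGH):
    """Map a binary string back to a real value in [low, high] (phenotype)."""
    levels = 2 ** len(bits) - 1
    return low + int(bits, 2) / levels * (high - low)

chromosome = encode(4.5)
value = decode_bits(chromosome)

# Smallest step between two neighbouring decoded values
resolution = (HIGH - LOW) / (2 ** N_BITS - 1)
print(chromosome, value, resolution)
```

The round trip is not exact: the decoded value can differ from the original by up to half the resolution, which is the price of a fixed-length binary code. Adding bits shrinks that step.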

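Fitness

One chromosome defines one individual, and a collection of individuals defines a population. After the population is initialized, we need to estimate how well each individual survives in our environment (that is, the problem we posed: the objective function, which you can picture as the Mount Everest the kangaroos live on). For this we introduce the fitness f(X). The fitness function is usually something we define ourselves. For a maximization problem, the higher the fitness value, the greater the chance of survival. A minimization problem can be converted into a maximization problem (for example, by negating the objective) and handled the same way.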

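Selection

Selection is the survival-of-the-fittest step. Genetic algorithms commonly use roulette-wheel selection, though other methods can be used as well; DEAP provides all of these, and its API documentation lists them. We know the criterion for fitness in advance, just as we know that the higher a kangaroo lives, the fitter it is, and the higher the fitness, the higher the probability of being selected. The main idea of roulette-wheel selection is that the probability of an individual being selected is proportional to its fitness value:

$$p(x_i) = \frac{f(x_i)}{\sum_{j=1}^{N} f(x_j)}$$

where $N$ is the population size.

For the problem posed earlier, roulette-wheel selection is applied to the following population:

i	x	f(x)	p(x)
1	2 (00000010)	1.2	0.03
2	3 (00000011)	4.429	0.12
3	9 (00001001)	12.67	0.35
4	17 (00010001)	17.6	0.49

[Figure: roulette wheel]

As the table and the figure show, the individuals 9 (00001001) and 17 (00010001) have a higher survival rate in the "competition".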
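
The selection probabilities can be recomputed directly from the fitness column. The following sketch takes the four f(x) values from the table above and divides each by their sum; the variable names are illustrative only.

```python
# Fitness values f(x) from the table above
fitness_values = [1.2, 4.429, 12.67, 17.6]

total = sum(fitness_values)
probabilities = [f / total for f in fitness_values]

# Cumulative probabilities mark the boundaries of the wheel's sectors
cumulative = []
running = 0.0
for p in probabilities:
    running += p
    cumulative.append(running)

for i, (f, p) in enumerate(zip(fitness_values, probabilities), start=1):
    print("individual {}: f = {:<6} p = {:.2f}".format(i, f, p))
```

A uniform random number in [0, 1) then selects the first individual whose cumulative probability is at least that number, which is what the selection operator in Part 2 does.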

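Crossover

Crossover is analogous to genetic recombination in biology, where different genes recombine to produce new ones; it is a probabilistic event. In a genetic algorithm, different "parent" individuals are crossed at random to produce new "offspring" individuals, again with a certain probability. The basic genetic algorithm generally uses single-point crossover.
The crossover operator is a key part of a genetic algorithm.

Suppose we have the following two "parent" individuals, with the crossover point marked by a space:

10001011 111000111
00011101 101101010

Crossover produces the "offspring" individuals:

10001011 101101010
00011101 111000111

At this point the "parent" individuals are eliminated from the population.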
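
The example above can be reproduced with a few lines of string slicing. This is a minimal sketch; `single_point_crossover` is a hypothetical helper name, and the cut position 8 is read off the example rather than drawn at random.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings from index `point` on."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

parent_a = "10001011111000111"
parent_b = "00011101101101010"

# Cut after the 8th bit, as in the example
child_a, child_b = single_point_crossover(parent_a, parent_b, point=8)
print(child_a)
print(child_b)
```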

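Mutation

Mutation flips a randomly chosen bit of an individual with a small probability. For example, flipping the ninth bit of the first offspring above:

10001011101101010

gives

10001011001101010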
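
Because Python strings are immutable, a bit flip has to build a new string. The sketch below reproduces the example; `flip_bit` is a hypothetical helper name, and index 8 (the ninth bit, counting from zero) is fixed here, whereas a real mutation operator would pick it at random.

```python
def flip_bit(chromosome, position):
    """Return a copy of the bit string with the bit at `position` inverted."""
    flipped = "0" if chromosome[position] == "1" else "1"
    return chromosome[:position] + flipped + chromosome[position + 1:]

before = "10001011101101010"
after = flip_bit(before, 8)
print(before)
print(after)
```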

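2. Implementing the genetic algorithm in Python (without libraries)

Defining the objective function

The objective function is the problem posed in Part 1.

import math
import numpy as np  # used by the operators below

def aimFunction(x):
    y=x+5*math.sin(5*x)+2*math.cos(3*x)
    return y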

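Defining the decoding function

Each individual is converted from its binary encoding to a floating-point value. The 17-bit string is read as an integer in [0, 2^17 - 1] and scaled linearly onto the interval [0, 9].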

def decode(x):
    y=0+int(x,2)/(2**17-1)*9
    return y

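Defining the fitness function

Here the fitness of each individual is the value of the objective function at that individual.
Since we are maximizing the objective function, a higher value means higher fitness. Negative values are clamped to 0 so that every fitness can serve as a non-negative roulette-wheel weight.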

def fitness(population,aimFunction):
    value=[]
    for i in range(len(population)):
        value.append(aimFunction(decode(population[i])))
        if value[i]<0:
            value[i]=0
    return value

Defining the roulette-wheel selection operator

def selection(population,value):
    # Roulette-wheel selection
    total = sum(value)
    # cumulative selection probabilities (the last entry is 1 up to rounding)
    fitness_sum = []
    for i in range(len(value)):
        if i == 0:
            fitness_sum.append(value[i]/total)
        else:
            fitness_sum.append(fitness_sum[i-1]+value[i]/total)

    population_new = []
    for i in range(len(value)):
        rand = np.random.uniform(0,1)
        # pick the first individual whose cumulative probability reaches rand
        for j in range(len(value)):
            if rand<=fitness_sum[j]:
                population_new.append(population[j])
                break
        else:
            # guard against floating-point rounding in the last cumulative value
            population_new.append(population[-1])
    return population_new

Defining the crossover operator

def crossover(population_new,pc):
    """
    Crossover operator
    :param population_new: population after selection
    :param pc: crossover probability
    :return: offspring after crossover
    """
    half = int(len(population_new)/2)
    father = population_new[:half]
    mother = population_new[half:]
    np.random.shuffle(father)
    np.random.shuffle(mother)
    offspring = []
    for i in range(half):
        if np.random.uniform(0,1)<=pc:
            copint = np.random.randint(0,int(len(father[i])/2))
            son = father[i][:copint]+mother[i][copint:]
            daughter = mother[i][:copint]+father[i][copint:]
        else:
            son = father[i]
            daughter = mother[i]

        offspring.append(son)
        offspring.append(daughter)

    return offspring

Defining the mutation operator

def mutation(offspring,pm):
    """
    Mutation operator
    :param offspring: offspring after crossover
    :param pm: mutation probability
    :return: offspring after mutation
    """
    for i in range(len(offspring)):
        if np.random.uniform(0,1)<=pm:
            position = np.random.randint(0,len(offspring[i]))
            # str objects do not support item assignment, so rebuild the
            # string with the bit at `position` flipped (length is unchanged)
            flipped = '0' if offspring[i][position]=='1' else '1'
            offspring[i]=offspring[i][:position]+flipped+offspring[i][position+1:]
    return offspring
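
The operators above still need a driver loop that repeats evaluation, selection, crossover, and mutation. The following is a compact, self-contained sketch of such a loop. It uses the standard library's `random` module instead of NumPy, and the snake_case names, the population size, the number of generations, `pc`, `pm`, the seed, and the choice of a cut point anywhere inside the chromosome are all assumptions made for illustration, not values from the original article.

```python
import math
import random

N_BITS, LOW, HIGH = 17, 0.0, 9.0   # encoding used in this article

def aim_function(x):
    return x + 5 * math.sin(5 * x) + 2 * math.cos(3 * x)

def decode(bits):
    return LOW + int(bits, 2) / (2 ** N_BITS - 1) * (HIGH - LOW)

def fitness(population):
    # Negative objective values are clamped to 0 so they can act as weights
    return [max(aim_function(decode(ind)), 0.0) for ind in population]

def select(population, values):
    # Roulette wheel: draw with probability proportional to fitness
    return random.choices(population, weights=values, k=len(population))

def crossover(population, pc):
    random.shuffle(population)
    offspring = []
    for a, b in zip(population[::2], population[1::2]):
        if random.random() <= pc:
            point = random.randint(1, N_BITS - 1)
            a, b = a[:point] + b[point:], b[:point] + a[point:]
        offspring += [a, b]
    return offspring

def mutate(population, pm):
    mutated = []
    for ind in population:
        if random.random() <= pm:
            pos = random.randrange(N_BITS)
            flipped = "0" if ind[pos] == "1" else "1"
            ind = ind[:pos] + flipped + ind[pos + 1:]
        mutated.append(ind)
    return mutated

def objective(ind):
    return aim_function(decode(ind))

def run_ga(pop_size=50, generations=40, pc=0.8, pm=0.1, seed=0):
    """Run the loop and return (x, f(x)) of the best individual ever seen."""
    random.seed(seed)
    population = ["".join(random.choice("01") for _ in range(N_BITS))
                  for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        values = fitness(population)
        population = mutate(crossover(select(population, values), pc), pm)
        champion = max(population, key=objective)
        if objective(champion) > objective(best):
            best = champion
    return decode(best), objective(best)

best_x, best_f = run_ga()
print("best x = {:.4f}, f(x) = {:.4f}".format(best_x, best_f))
```

The sketch keeps track of the best individual seen so far, because plain roulette-wheel selection with full replacement can lose the current best from one generation to the next. The population size is kept even so that pairing individuals for crossover does not drop anyone.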

3. Implementing the genetic algorithm with DEAP

Importing the libraries and tools used

import random
from deap import creator
from deap import base
import math
from deap import tools
import numpy as np

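Defining the objective and decoding functions (the same as in Part 2, except that decode now takes a list of bits instead of a string)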

def decode(x_list):
    x_ = [str(i) for i in x_list]
    x = "".join(x_)
    y=0+int(x,2)/(2**17-1)*9
    return y

def aimFunction(x):
    y=x+5*math.sin(5*x)+2*math.cos(3*x)
    return y

Fitness function

def evaluate(population):
    value=[]
    for i in range(len(population)):
        value.append(aimFunction(decode(population[i])))
        if value[i]<0:
            value[i]=0
    return value

toolbox.register("evaluate", evaluate)

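The toolbox object is explained below. It is created in the "Defining the individual" step, so this register call has to run after that step.
The following steps are where DEAP is actually used.

Defining the fitness for each individual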

creator.create("FitnessMax", base.Fitness, weights=(1.0,))

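The Fitness class provided by DEAP is an abstract class that needs a weights attribute to work. Negative weights build a minimizing fitness, while a maximizing fitness has positive weights.
The create() function used here takes at least two arguments: the name of the newly created class and a base class. Any subsequent arguments become attributes of that class.

The call above sets up maximization of the objective. For minimization or for multi-objective optimization, use the following forms respectively: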

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("FitnessMulti", base.Fitness, weights=(-1.0, 1.0))
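
One way to picture the weights: DEAP compares fitnesses by their weighted values (each objective value multiplied by its weight), so a negative weight turns "smaller is better" into "larger is better". The sketch below mimics that idea in plain Python without importing DEAP; `weighted` and `better` are hypothetical helper names, not part of the library.

```python
def weighted(values, weights):
    """Multiply each objective value by its weight."""
    return tuple(v * w for v, w in zip(values, weights))

def better(values_a, values_b, weights):
    """True if fitness a beats fitness b under the given weights."""
    return weighted(values_a, weights) > weighted(values_b, weights)

# Maximization: the larger raw value wins
max_case = better((3.0,), (2.0,), weights=(1.0,))

# Minimization: the smaller raw value wins
min_case = better((2.0,), (3.0,), weights=(-1.0,))

# Two objectives: minimize the first, maximize the second
multi_case = better((1.0, 5.0), (2.0, 5.0), weights=(-1.0, 1.0))

print(max_case, min_case, multi_case)
```

Tuples are compared element by element from left to right, so in the two-objective case the first objective takes priority and the second only breaks ties.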

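Defining the individual

# FitnessMax was already created in the previous step.
# list is used as the container type: slices of an np.ndarray are views,
# so tools.cxTwoPoint would not swap segments correctly without extra copying.
creator.create("Individual", list, fitness=creator.FitnessMax)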

IND_SIZE=17

toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)  # random integer, 0 or 1 (both endpoints included)
toolbox.register("individual", tools.initRepeat, creator.Individual,toolbox.attr_bool, n=IND_SIZE)

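The register() method takes at least two arguments: an alias and the function assigned to that alias. Any further arguments are passed to the function each time the alias is called.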
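
The aliasing idea can be illustrated without DEAP by binding arguments with `functools.partial`. In the sketch below, `MiniToolbox` and `init_repeat` are hypothetical stand-ins written for this example; they imitate the behavior described above and are not DEAP's own classes or functions.

```python
import random
from functools import partial

class MiniToolbox:
    """Hypothetical stand-in for illustration; not DEAP's Toolbox."""
    def register(self, alias, function, *args, **kwargs):
        # Store the function with its arguments pre-bound under `alias`
        setattr(self, alias, partial(function, *args, **kwargs))

def init_repeat(container, func, n):
    """Fill a container by calling func() n times."""
    return container(func() for _ in range(n))

toolbox_demo = MiniToolbox()
toolbox_demo.register("attr_bool", random.randint, 0, 1)
toolbox_demo.register("individual", init_repeat, list,
                      toolbox_demo.attr_bool, n=17)

# Calling the alias runs init_repeat(list, attr_bool, n=17)
individual = toolbox_demo.individual()
print(individual)
```

Registering operators under fixed aliases such as mate, mutate, and select means the evolution loop can stay the same while the operators behind the aliases are swapped.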

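Defining the crossover operator

Two-point crossover (tools.cxTwoPoint) is used here, rather than the single-point crossover of Part 2.

toolbox.register("mate", tools.cxTwoPoint)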

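Defining the mutation operator

tools.mutFlipBit flips each bit of an individual independently with probability indpb.

toolbox.register("mutate", tools.mutFlipBit,indpb=0.02)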

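Defining the selection operator

Tournament selection with a tournament size of 3 is used here, rather than the roulette-wheel selection of Part 2.

toolbox.register("select", tools.selTournament, tournsize=3)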
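
Tournament selection repeatedly draws a few individuals at random and keeps the fittest of each draw. The following plain-Python sketch shows the idea; `tournament_select` is a hypothetical helper written for this example and is not DEAP's implementation, and the toy population and seed are assumptions.

```python
import random

def tournament_select(population, fitnesses, k, tournsize=3):
    """Select k individuals; each one is the fittest of `tournsize` random picks."""
    chosen = []
    for _ in range(k):
        contenders = [random.randrange(len(population)) for _ in range(tournsize)]
        winner = max(contenders, key=lambda idx: fitnesses[idx])
        chosen.append(population[winner])
    return chosen

random.seed(1)
population = ["A", "B", "C", "D"]
fitnesses = [1.0, 2.0, 3.0, 4.0]
selected = tournament_select(population, fitnesses, k=1000)

for name in population:
    print(name, selected.count(name))
```

Unlike roulette-wheel selection, a tournament only compares fitness values, so it also works when they are negative and needs no clamping to 0. A larger tournsize increases the selection pressure.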

Complete code

import random
from deap import creator
from deap import base
import math
from deap import tools
import numpy as np

def decode(x_list):
    x_ = [str(i) for i in x_list]
    x = "".join(x_)
    y=0+int(x,2)/(2**17-1)*9
    return y

def aimFunction(x):
    y=x+5*math.sin(5*x)+2*math.cos(3*x)
    return y

def evaluate(population):
    value=[]
    for i in range(len(population)):
        value.append(aimFunction(decode(population[i])))
        if value[i]<0:
            value[i]=0
    return value



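# base.Fitness is DEAP's base fitness class; weights=(1.0,) means a single objective to be maximized
creator.create("FitnessMax", base.Fitness, weights=(1.0,))

# list is the container type of an individual; fitness=... attaches a FitnessMax to each one.
# A plain list is used rather than np.ndarray, whose slices are views and are
# not swapped correctly by tools.cxTwoPoint without extra copying.
creator.create("Individual", list, fitness=creator.FitnessMax)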

IND_SIZE=17

toolbox = base.Toolbox()

toolbox.register("attr_bool", random.randint, 0, 1)  # random integer, 0 or 1 (both endpoints included)

toolbox.register("individual", tools.initRepeat, creator.Individual,toolbox.attr_bool, n=IND_SIZE)

toolbox.register("population", tools.initRepeat, list, toolbox.individual)
# toolbox.register("population", tools.initRepeat, np.ndarray, toolbox.individual)

toolbox.register("evaluate", evaluate)

toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit,indpb=0.02)
toolbox.register("select", tools.selTournament, tournsize=3)

def main():
    random.seed(63)
    # create an initial population of 300 individuals (where
    # each individual is a list of integers)
    pop = toolbox.population(n=300)
    print(pop)

    # CXPB  is the probability with which two individuals
    #       are crossed
    #
    # MUTPB is the probability for mutating an individual
    #
    # NGEN  is the number of generations for which the
    #       evolution runs; the loop stops after NGEN (here 40) generations
    CXPB, MUTPB, NGEN = 0.5, 0.2, 40
    print("Start of evolution")

    fitnesses = toolbox.evaluate(pop)
    print(fitnesses)
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = (fit,)

    print("Evaluated %i individuals" % len(pop))  # pop still has 300 individuals at this point

    for g in range(NGEN):
        print("-- Generation %i --" % g)

        offspring = toolbox.select(pop, len(pop))
        offspring = list(map(toolbox.clone, offspring))

        # print("population:")
        # print(offspring)

        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            # cross two individuals with probability CXPB
            if random.random() < CXPB:
                toolbox.mate(child1, child2)

                del child1.fitness.values
                del child2.fitness.values
        # print(offspring)
        # print("mate:")
        # print(offspring)

        # print(offspring[0] is offspring[1])
        # print(offspring[0] is offspring[2])
        # print(offspring[3] is offspring[4])
        for mutant in offspring:
            # mutate an individual with probability MUTPB
            if random.random() < MUTPB:
                toolbox.mutate(mutant)
                del mutant.fitness.values
        # print("mutant:")
        # print(offspring)
        # print(offspring)

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = toolbox.evaluate(invalid_ind)

        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = (fit,)
        fitnesses = [ind.fitness.values for ind in offspring]
        # print(fitnesses)
        print("Evaluated %i individuals" % len(invalid_ind))

        # The population is entirely replaced by the offspring
        pop[:] = offspring

        # Gather all the fitnesses in one list and print the stats
        fits = [ind.fitness.values[0] for ind in pop]
        length = len(pop)
        mean = sum(fits) / length
        sum2 = sum(x * x for x in fits)
        std = abs(sum2 / length - mean ** 2) ** 0.5

        print("  Min %s" % min(fits))
        print("  Max %s" % max(fits))
        print("  Avg %s" % mean)
        print("  Std %s" % std)

    print("-- End of (successful) evolution --")

    best_ind = tools.selBest(pop, 1)[0]
    print("Best individual is %s, %s" % (best_ind, best_ind.fitness.values))

if __name__ == '__main__':
    main()