[Genetic Algorithm GA] -- Sentence Matching (Python)

比奇堡咻飞兜

Published 2021-07-19 18:55:46

Article link: https://blog.csdn.net/weixin_46308081/article/details/118901744

1. Background

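The origin, definition, and characteristics of genetic algorithms are covered in the earlier post 【遗传算法GA】–计算函数最值（Python） ([Genetic Algorithm GA] -- Finding the Extremum of a Function (Python)).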

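First, the task for this post:

Given an English sentence, we use a genetic algorithm to make the computer reconstruct that sentence on its own. The procedure is the same as before: encode candidates as chromosomes, apply selection, crossover, and mutation according to each individual's fitness, and read off the result after many iterations.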

Key questions:

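$\bullet$ Encoding: the target consists of English characters, so the 0/1 encoding used previously does not apply directly. (A 0/1 encoding fits when the genes represent numbers or when each choice has only two outcomes.) For English characters we turn to the ASCII table:
[Figure: ASCII table]
Codes 0-31 and 127 are control characters that cannot be displayed, so every printable English character falls in the range 32-126. A chromosome can therefore be encoded as a sequence of integers from 32 to 126.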
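
As a minimal sketch of this encoding, a sentence can be mapped to a list of ASCII codes and back with Python's built-in `ord` and `chr`. The helper names `encode` and `decode` are illustrative assumptions and do not appear in the article's code.

```python
# Printable ASCII codes run from 32 (space) to 126 (~), inclusive.
LOW, HIGH = 32, 126

def encode(sentence):
    """Map each character to its ASCII code (the genes of a chromosome)."""
    codes = [ord(ch) for ch in sentence]
    # Reject anything outside the printable range used by the algorithm.
    if any(c < LOW or c > HIGH for c in codes):
        raise ValueError("sentence contains non-printable or non-ASCII characters")
    return codes

def decode(codes):
    """Map a list of ASCII codes back to the sentence it represents."""
    return "".join(chr(c) for c in codes)

codes = encode("Keep")
print(codes)          # [75, 101, 101, 112]
print(decode(codes))  # Keep
```

The range 32-126 contains 95 codes, so each gene has 95 possible values.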

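$\bullet$ Fitness function: the fitness function is the core of a genetic algorithm, because differences in fitness are what steer the search toward the best individual. For this problem we count matching characters: an individual scores one point for every position whose character equals the target's, so a higher fitness means a closer match to the original sentence.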
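
The counting rule can be illustrated in plain Python before any NumPy is involved. This is only a sketch of the idea; the function name `match_count` is chosen here for illustration.

```python
def match_count(candidate, target):
    """Number of positions at which candidate and target hold the same character."""
    return sum(c == t for c, t in zip(candidate, target))

target = "Keep head held high"
print(match_count("Keep head held high", target))  # 19 (perfect match)
print(match_count("Keep hxad hxld high", target))  # 17 (two characters differ)
print(match_count("z" * len(target), target))      # 0
```

The maximum possible fitness equals the sentence length, 19 for this target.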

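Parameters:

| Parameter | Meaning |
| --- | --- |
| target | Sentence to match; this post uses `Keep head held high` |
| popsize | Population size |
| pa | Crossover probability (passed to the GA class as `pc`) |
| pm | Mutation probability |
| iterNum | Number of iterations |
| dnaSize | Encoding length, i.e. the number of characters in the target sentence |
| tarAscii | ASCII codes of the target sentence |
| asciiBound | Range of candidate character codes, [32, 126] |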

2. Step-by-step implementation

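$\bullet$ Parameter definitions:

import numpy as np

target = 'Keep head held high'
popsize = 300
pa = 0.6
pm = 0.01
iterNum = 1000
dnaSize = len(target)
# ASCII code of each character in the target sentence
tarAscii = np.frombuffer(target.encode('ascii'), dtype=np.uint8)
asciiBound = [32, 126]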

tarAscii is:

[ 75 101 101 112 32 104 101 97 100 32 104 101 108 100 32 104 105 103
104]

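$\bullet$ Create the GA class. The initializer takes, in order, the DNA length, the range of integers a gene can take, the crossover probability, the mutation probability, and the population size.

def __init__(self,dnaSize,dnaBound,pc,pm,popsize):
    self.dnaSize=dnaSize
    # np.random.randint excludes the upper bound, so store it as upper+1;
    # building a new list leaves the caller's dnaBound unchanged
    self.dnaBound=[dnaBound[0],dnaBound[1]+1]
    self.pc=pc
    self.pm=pm
    self.popsize=popsize
    
    # one row per individual; int8 keeps one byte per gene so a row decodes directly as ASCII
    self.pop = np.random.randint(*self.dnaBound, size=(popsize, dnaSize)).astype(np.int8)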
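
The reason for adding 1 to the upper bound is that `np.random.randint` draws from a half-open interval. The following sketch, using the article's population size and DNA length as plain literals, checks that the initial population stays within the printable range:

```python
import numpy as np

low, high = 32, 126  # printable ASCII range, both ends inclusive

# np.random.randint excludes the upper bound, so pass high + 1
# to make code 126 ('~') reachable.
pop = np.random.randint(low, high + 1, size=(300, 19)).astype(np.int8)

print(pop.shape)          # (300, 19)
print(pop.min() >= low)   # True
print(pop.max() <= high)  # True
```

Since 126 is below 127, the largest value an int8 can hold, the cast to `np.int8` loses nothing.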

$\bullet$ translateDNA converts the numbers back to characters according to the ASCII table.

def translateDNA(self,DNA):
    return DNA.tobytes().decode('ascii')
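
This decoding relies on the chromosome being stored with one byte per gene. The sketch below, with illustrative variable names, shows the round trip for an int8 array and what would happen with a wider integer type:

```python
import numpy as np

dna = np.array([75, 101, 101, 112], dtype=np.int8)

# One byte per gene: the raw bytes are exactly the ASCII codes.
text = dna.tobytes().decode('ascii')
print(text)  # Keep

# With int64 each gene occupies 8 bytes, so the decoded string is
# padded with NUL characters and is 8 times longer than intended.
wide = dna.astype(np.int64).tobytes().decode('ascii')
print(len(wide))  # 32
```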

$\bullet$ getFitness computes the fitness: for each individual in the population, the number of characters that match the target.

def getFitness(self):
    match_count = (self.pop == tarAscii).sum(axis=1)
    return match_count
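
The single comparison line works through NumPy broadcasting. As a small sketch on a toy population (the three-letter target and the array names are assumptions made for illustration):

```python
import numpy as np

target_codes = np.frombuffer(b"abc", dtype=np.uint8)  # shape (3,)

# A toy population of three individuals, one per row.
toy_pop = np.array([
    [97, 98, 99],     # "abc": all three positions match
    [97, 120, 99],    # "axc": two positions match
    [120, 121, 122],  # "xyz": no position matches
], dtype=np.int8)

# The (3, 3) population is compared against the (3,) target row by row,
# giving a boolean matrix; summing along axis 1 counts matches per individual.
matches = (toy_pop == target_codes).sum(axis=1)
print(matches)  # [3 2 0]
```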

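$\bullet$ selection: individuals are drawn with replacement with probability proportional to their fitness, so fitter individuals tend to be picked several times and replace less fit ones while the population size stays the same. The small constant 1e-4 keeps the probabilities well-defined even when every individual has fitness 0.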

def selection(self):
    fitness = self.getFitness() + 1e-4
    idx = np.random.choice(np.arange(self.popsize),size=self.popsize,replace=True,p=fitness/fitness.sum())
    return self.pop[idx]
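
To see how fitness turns into selection probability, here is a sketch with four hypothetical fitness values (not taken from an actual run):

```python
import numpy as np

# Hypothetical match counts for four individuals, plus the same 1e-4 offset.
fitness = np.array([0, 1, 3, 6], dtype=float) + 1e-4
probs = fitness / fitness.sum()
print(probs.round(3))  # approximately 0, 0.1, 0.3, 0.6

# Draw 10000 indices with replacement and compare the observed frequencies.
idx = np.random.choice(np.arange(4), size=10000, replace=True, p=probs)
counts = np.bincount(idx, minlength=4)
print(counts / counts.sum())
```

The observed frequencies should land close to `probs`, with the fittest individual chosen roughly six times as often as the one with fitness 1.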

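$\bullet$ crossover: draw a random number; crossover happens when it is smaller than the crossover probability pc. (Testing for "greater than" instead would also work, but crossover would then occur with probability 1 - pc.) i is the index of a randomly chosen individual, and cpoint is a random boolean mask marking the positions to exchange; the parent's genes at those positions are replaced with the genes of individual i.

def crossover(self,parent,pop):
    if np.random.rand()<self.pc:
        i = np.random.randint(0,self.popsize,size=1)
        cpoint = np.random.randint(0,2,self.dnaSize).astype(bool)
        parent[cpoint] = pop[i,cpoint]
    return parent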
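
The mask-based exchange can be illustrated with a fixed mask. In the article's code the mask is random; it is hard-coded here only so the result is easy to follow, and the names `mate` and `mask` are illustrative.

```python
import numpy as np

parent = np.array([1, 1, 1, 1, 1, 1], dtype=np.int8)
mate = np.array([2, 2, 2, 2, 2, 2], dtype=np.int8)

# True marks the positions where the child takes the mate's gene.
mask = np.array([True, False, True, False, False, True])

child = parent.copy()
child[mask] = mate[mask]
print(child)  # [2 1 2 1 1 2]
```

Because every position is decided independently, this is a uniform crossover rather than a single-point one.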

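$\bullet$ mutation: every position of child is considered independently; whenever the random number drawn for a position is smaller than the mutation probability, that position is replaced with a random character code.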

def mutation(self,child):
    for point in range(self.dnaSize):
        if np.random.rand()<self.pm:
            child[point]=np.random.randint(*self.dnaBound)
    return child
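
A quick calculation shows how often mutation actually touches an individual with the parameters used in this post. The variable names below are illustrative.

```python
dna_size = 19  # len('Keep head held high')
pm = 0.01      # per-gene mutation probability

# Average number of genes resampled per individual per generation.
expected_mutations = dna_size * pm
# Probability that at least one gene of an individual is resampled.
p_at_least_one = 1 - (1 - pm) ** dna_size

print(round(expected_mutations, 2))  # 0.19
print(round(p_at_least_one, 3))      # 0.174
```

So roughly one individual in six has a gene resampled in each generation. A resampled gene may receive the same code it already had, so the share of individuals that visibly change is slightly lower.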

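$\bullet$ evolve: runs selection, then applies the crossover and mutation functions to every individual.

def evolve(self):
    pop = self.selection()
    # mates are taken from an unmodified copy of the selected population
    pop_copy = pop.copy()
    for parent in pop:
        child = self.crossover(parent,pop_copy)
        child = self.mutation(child)
        parent[:] = child
    self.pop = pop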

$\bullet$ Main program: create the GA object and iterate, leaving the loop as soon as an individual reproduces the original sentence.

if __name__=='__main__':
    ga = GA(dnaSize=dnaSize,dnaBound=asciiBound,pc=pa,pm=pm,popsize=popsize)
    
    for gen in range(iterNum):
        fitness = ga.getFitness()
        bestDna = ga.pop[np.argmax(fitness)]
        bestTar = ga.translateDNA(bestDna)
        print("Gen", gen," : ",bestTar)
        if bestTar == target:
            break
        ga.evolve()

3. Complete code

import numpy as np

TARGET = 'Keep head held high'
POPSIZE = 300
PC = 0.4
PM = 0.01
iterNum = 1000
DNASIZE = len(TARGET)
tarAscii = np.frombuffer(TARGET.encode('ascii'),dtype=np.uint8)
ASCIIBOUND = [32, 126]
# print(tarAscii)

class GA(object):
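    def __init__(self,dnaSize,dnaBound,pc,pm,popsize):
        self.dnaSize=dnaSize
        # np.random.randint excludes the upper bound, so store it as upper+1;
        # building a new list leaves the caller's dnaBound unchanged
        self.dnaBound=[dnaBound[0],dnaBound[1]+1]
        self.pc=pc
        self.pm=pm
        self.popsize=popsize
        
        # one row per individual; int8 keeps one byte per gene so a row decodes directly as ASCII
        self.pop = np.random.randint(*self.dnaBound, size=(popsize, dnaSize)).astype(np.int8)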
        
    def translateDNA(self,DNA):
        return DNA.tobytes().decode('ascii')
    
    def getFitness(self):
        match_count = (self.pop == tarAscii).sum(axis=1)
        return match_count
    
    def selection(self):
        fitness = self.getFitness() + 1e-4
        idx = np.random.choice(np.arange(self.popsize),size=self.popsize,replace=True,p=fitness/fitness.sum())
        return self.pop[idx]
    
    def crossover(self,parent,pop):
        if np.random.rand()<self.pc:
            i = np.random.randint(0,self.popsize,size=1)
            cpoint = np.random.randint(0,2,self.dnaSize).astype(bool)
            parent[cpoint] = pop[i,cpoint]
        return parent
    
    def mutation(self,child):
        for point in range(self.dnaSize):
            if np.random.rand()<self.pm:
                child[point]=np.random.randint(*self.dnaBound)
        return child
        
    def evolve(self):
        pop = self.selection()
        pop_copy = pop.copy()
        for parent in pop:
            child = self.crossover(parent,pop_copy)
            child = self.mutation(child)
            parent[:] = child
        self.pop = pop
        
if __name__=='__main__':
    ga = GA(dnaSize=DNASIZE,dnaBound=ASCIIBOUND,pc=PC,pm=PM,popsize=POPSIZE)
    
    for gen in range(iterNum):
        fitness = ga.getFitness()
        bestDna = ga.pop[np.argmax(fitness)]
        bestTar = ga.translateDNA(bestDna)
        print("Gen", gen," : ",bestTar)
        if bestTar == TARGET:
            break
        ga.evolve()