Optimizing the Initial Parameters of a BP Neural Network with a Genetic Algorithm

xiaokinL

Last modified 2023-12-02 17:12:13

Tags: neural networks, artificial intelligence, deep learning

First published 2023-11-17 16:55:04

Article link: https://blog.csdn.net/qq_46694178/article/details/134460186

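I. Idea

A recent course project asked us to solve a typical optimization problem with a genetic algorithm, and since I have also been getting started with deep learning, I wondered whether the two could be combined. The strength of a genetic algorithm (GA) is that, through crossover, mutation, and selection, it can in theory find the global optimum given a large enough population and enough generations; the drawback is that the iterations become expensive when there are many parameters. A BP neural network trained with gradient descent runs quickly on a GPU, but unless the problem is an ideal convex one, the solution it reaches is likely to be only a local optimum, and BP optimization problems are generally non-convex.

To address this, my idea is to let the GA first optimize the network's initial parameters, that is, the weights and biases of each layer. The evaluation metric is obtained by loading the candidate parameters into the network as its initial parameters, running the network once over the training set, and measuring the accuracy. Why optimize these with a GA? Because a BP network started from different initial parameters ends up with a different convergence speed and a different prediction accuracy. Below are the training curves obtained with initial parameters that are all zeros, drawn from a uniform distribution, and drawn from a normal distribution:

This shows that the initial parameters have a large effect on training. The parameters that gradient descent optimizes in a BP network are exactly the per-layer weights and biases. So if a GA pre-trains these parameters and produces initial values that happen to lie near the global optimum, gradient descent should then reach the global optimum quickly. In short: use the GA to get into the neighborhood of the global optimum, then use gradient descent to pin it down.

This raises a question, though. If the GA uses maximum prediction accuracy (or, put differently, minimum network loss) as its objective, how do we decide how far to run the GA pre-training so that its parameters land just in the neighborhood of the global optimum? Let's set this question aside for now.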

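II. Genetic Algorithm

2.1 Introduction to genetic algorithms

A genetic algorithm (GA) is an optimization algorithm that imitates natural selection and genetic mechanisms to search a space for an optimal or near-optimal solution. It is one kind of evolutionary algorithm and borrows the concepts of genetics and natural selection from biology. The basic idea is to improve the solution step by step by simulating the evolution of individuals: generation after generation, crossover, mutation, and selection evolve the individuals in the population so that they gradually approach the optimum of the problem.

The main inspiration is biological evolution: survival of the fittest. Suppose a population starts with 100 individuals. They pair up, which means their chromosomes cross over (with high probability), and then each individual's genes may mutate (with low probability). Each individual's phenotype is then used to judge how well it fits the current environment. Under artificial selection, individuals with poor fitness are very likely to be eliminated, and the result is a new population that still has 100 members (good individuals may be selected more than once). The process repeats until an individual whose fitness meets the requirement appears. Because the initial individuals are generated at random and every individual may also mutate, the genotypes in the population are diverse, and they all evolve toward higher fitness.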

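2.2 Steps of a genetic algorithm

1. Initialize the population: randomly generate a set of individuals, each representing a candidate solution. Together they form the initial population.

Encoding and decoding also come into play here. I consider this one of the most important parts of a GA, because it largely determines both the running speed and the final result. Binary encoding is the usual choice: for example, a 10-bit binary string stands for one decimal parameter. The raw range of such a string is rather small, so we specify an interval for the parameter's real values and use the binary value as a relative position inside that interval. The number of bits then determines the precision of the parameter. The binary string is the genotype and the real parameter value is the phenotype. Initializing the population means randomly initializing the genotype of every individual, so a random sequence of 0s and 1s is an initial genotype. The remaining operations are then applied to all individuals.

2. Fitness evaluation: compute the fitness of each individual, a measure of how well it solves the problem. The higher the fitness, the more likely the individual is to be selected for the subsequent genetic operations.

3. Selection: choose some individuals as parents according to fitness, usually favoring those with higher fitness so that good solutions become more likely. Common selection methods are:

        Roulette wheel selection: one of the most common methods. Each individual is selected with a probability proportional to its fitness. The fitness values are mapped onto an interval and individuals are picked by spinning a roulette wheel. For example, if the fitness values in a population are [0.4, 0.1, 0.6, 0.9], the selection probabilities are [0.2, 0.05, 0.3, 0.45] and the cumulative probabilities [0.2, 0.25, 0.55, 1] bound four intervals. A random number is drawn from [0, 1], and the individual whose interval contains it is selected. The larger an individual's fitness, the larger its share of the probability space and the more likely it is to be selected.

        Tournament selection: in each round, a fixed number of individuals (the tournament size) is drawn at random, and the one with the best fitness becomes a parent. This method is fairly effective at maintaining population diversity.

        Rank selection: individuals are ranked by fitness from high to low and selected according to rank. The higher the rank, the higher the selection probability. This is more stable than roulette wheel selection because it is not affected by small differences in fitness.

4. Crossover: combine the genetic information of parent individuals to produce new individuals. This imitates crossover in biology.

5. Mutation: mutate the newly produced individuals, introducing some randomness to widen the search. This imitates gene mutation.

6. Form the new population: the individuals obtained from selection, crossover, and mutation make up the next generation.

7. Iterate: repeat the steps above until a stopping condition is met, such as reaching the maximum number of generations, reaching a fitness threshold, or seeing no significant improvement for a given number of generations.

8. Output: return the final individual as the solution or approximate solution of the problem.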
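The roulette-wheel example from step 3 can be checked numerically. The following is a minimal sketch using the same fitness values; the helper name `roulette_pick` and the fixed random seed are assumptions made for illustration only.

```python
import numpy as np

fitness = np.array([0.4, 0.1, 0.6, 0.9])
probs = fitness / fitness.sum()      # selection probabilities: 0.2, 0.05, 0.3, 0.45
edges = np.cumsum(probs)             # upper edges of the intervals: 0.2, 0.25, 0.55, 1.0

def roulette_pick(r, edges):
    """Return the index of the interval that contains the random number r."""
    idx = int(np.searchsorted(edges, r, side="right"))
    return min(idx, len(edges) - 1)  # guard against rounding in the last edge

rng = np.random.default_rng(0)
picks = [roulette_pick(rng.random(), edges) for _ in range(10_000)]
freq = np.bincount(picks, minlength=len(fitness)) / len(picks)
print(freq)                          # empirical frequencies, close to probs
```

With enough draws, the empirical frequencies approach the selection probabilities, which is the sense in which fitter individuals are "more likely" to be chosen.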

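2.3 A simple genetic algorithm

We apply a GA to one convex and one non-convex problem: maximizing F(x,y) = -(x^2+y^2) and maximizing F(x,y) = sin(π/2·x)·cos(π/2·y).

1. Define the genes and set the hyperparameters

Each parameter is encoded with 24 bits, X and Y are both bounded to [-3, 3], the population size is 100, the crossover probability is 0.8, the mutation probability is 0.005, and the GA runs for 15 generations.

DNA_SIZE = 24            # bits per gene; one gene encodes one variable, so each variable uses 24 bits
                         # with two variables (x, y) a chromosome has 48 bits, with the x and y bits interleaved
POP_SIZE = 100           # population size
# the initial population is therefore a 100 x 48 matrix

CROSSOVER_RATE = 0.8    # crossover probability
MUTATION_RATE = 0.005   # mutation probability
N_GENERATIONS = 15      # number of generations
X_BOUND = [-3, 3]       # bounds of x
Y_BOUND = [-3, 3]       # bounds of y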

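2. Define the fitness function

Since we are looking for the maximum of the function, the fitness function is simply the objective function.

# Objective function (used as the fitness)
def F(x, y):
    return -(x**2+y**2)
    #return np.sin(math.pi/2*x)*np.cos(math.pi/2*y)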

3. Gene decoding

# Decode the binary genes into decimal values
def translateDNA(pop):     # pop is the population matrix: one binary-encoded chromosome per row, POP_SIZE rows
    x_pop = pop[:,1::2]   # odd-indexed columns encode x
    y_pop = pop[:,::2]    # even-indexed columns encode y
    
    # (POP_SIZE, DNA_SIZE) dot (DNA_SIZE,) --> (POP_SIZE,)
    x = x_pop.dot(2**np.arange(DNA_SIZE)[::-1])/float(2**DNA_SIZE-1)*(X_BOUND[1]-X_BOUND[0])+X_BOUND[0]
    y = y_pop.dot(2**np.arange(DNA_SIZE)[::-1])/float(2**DNA_SIZE-1)*(Y_BOUND[1]-Y_BOUND[0])+Y_BOUND[0]
    return x,y          # decimal x and y, each an array with one value per individual (length POP_SIZE)
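To make the decoding formula concrete, here is a minimal sketch for a single gene. The helper name `decode` is hypothetical, and the 10-bit length is chosen only to keep the numbers small; the mapping itself is the same as in translateDNA.

```python
import numpy as np

def decode(bits, low, high):
    """Map a binary gene (most significant bit first) to a real value in [low, high]."""
    bits = np.asarray(bits)
    weights = 2 ** np.arange(len(bits))[::-1]          # 512, 256, ..., 2, 1 for 10 bits
    fraction = bits.dot(weights) / (2 ** len(bits) - 1)  # relative position in [0, 1]
    return low + fraction * (high - low)

lowest = decode([0] * 10, -3, 3)     # all zeros map to the lower bound
highest = decode([1] * 10, -3, 3)    # all ones map to the upper bound
step = 6 / (2 ** 10 - 1)             # spacing between neighbouring representable values
print(lowest, highest, step)
```

The spacing `step` shrinks as the number of bits grows, which is why the bit length acts as the precision of the parameter.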

4. Computing the fitness of each individual

# Compute the fitness values that feed the roulette-wheel selection
def get_fitness(pop): 
    x,y = translateDNA(pop)
    pred = F(x, y)
    # Subtracting the minimum prevents negative fitness values and maps the fitness
    # into [0, np.max(pred)-np.min(pred)]; the small constant avoids a fitness of exactly 0
    return (pred - np.min(pred)) + 1e-3
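The shift in get_fitness matters because roulette-wheel selection needs non-negative weights. A minimal sketch with made-up objective values (the three numbers below are illustrative, not results from the article):

```python
import numpy as np

raw = np.array([-18.0, -2.0, -0.5])      # example values of F(x, y); all negative
fitness = (raw - raw.min()) + 1e-3       # the worst individual ends up at 1e-3, not 0
probs = fitness / fitness.sum()          # a valid probability vector for np.random.choice

print(fitness)
print(probs)
```

The ordering of the individuals is preserved, and the worst one keeps a tiny but non-zero chance of being selected.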

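5. Mutation

# Mutation: pick one random bit and flip it
# Note: crossover_and_mutation calls mutation(child) without a rate, so the default 0.003
# is what actually applies, not the global MUTATION_RATE of 0.005
def mutation(child, MUTATION_RATE=0.003):
    if np.random.rand() < MUTATION_RATE:                 # mutate with probability MUTATION_RATE
        mutate_point = np.random.randint(0, DNA_SIZE*2)    # random integer: position of the bit to mutate
        child[mutate_point] = child[mutate_point]^1      # flip the bit at that position (XOR with 1)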

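6. Crossover

# Crossover: this step produces the offspring
def crossover_and_mutation(pop, CROSSOVER_RATE = 0.8):
    new_pop = []
    for father in pop:                                   # every individual in turn acts as the father
        child = father.copy()                            # the child starts with all of the father's bits; copy so that pop is not modified in place
        if np.random.rand() < CROSSOVER_RATE:            # crossover is not certain; it happens with probability CROSSOVER_RATE
            mother = pop[np.random.randint(POP_SIZE)]    # pick another individual from the population as the mother
            cross_points = np.random.randint(low=0, high=DNA_SIZE*2)    # random crossover point
            child[cross_points:] = mother[cross_points:]                # the child takes the mother's bits after the crossover point
        mutation(child)                                  # each child mutates with a small probability
        new_pop.append(child)
    return new_pop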
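One NumPy detail is worth keeping in mind for this function: iterating over a 2-D array yields views of its rows, so writing into a row obtained that way also writes into the population unless the row is copied first. A minimal sketch (the array names are illustrative):

```python
import numpy as np

pop_view = np.zeros((2, 4), dtype=int)
child = pop_view[0]            # a view: shares memory with row 0
child[2:] = 1
print(pop_view[0])             # row 0 has changed as well

pop_copy = np.zeros((2, 4), dtype=int)
child = pop_copy[0].copy()     # an independent copy of row 0
child[2:] = 1
print(pop_copy[0])             # row 0 is untouched
```

In a GA this decides whether later fathers and mothers in the same generation are drawn from the original population or from one that earlier crossovers have already altered.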

7. Selection

# Selection: the higher the fitness, the higher the chance of being selected
def select(pop, fitness):    # nature selection wrt pop's fitness
    idx = np.random.choice(np.arange(POP_SIZE), size=POP_SIZE, replace=True,
                           p=(fitness)/(fitness.sum()) )
    # Draw indices with probabilities p; the larger p is, the more likely the corresponding index is drawn
    # print("*************************************")
    # print(fitness/fitness.sum())
    # print("*************************************")
    return pop[idx]

8. Printing

# Print the best individual and record it in the global lists fitness_list and function_list
def print_info(pop):
    fitness = get_fitness(pop)
    max_fitness_index = np.argmax(fitness)
    print("max_fitness:", fitness[max_fitness_index],"\n")
    fitness_list.append(fitness[max_fitness_index])
    x,y = translateDNA(pop)
    print("Best genotype:", pop[max_fitness_index],"\n")
    print("(x, y):", (x[max_fitness_index], y[max_fitness_index]),"\n")
    print('Maximum function value:', (F(x[max_fitness_index], y[max_fitness_index])),"\n")
    function_list.append(F(x[max_fitness_index], y[max_fitness_index]))

9. Plotting

Draw a 3D plot to see how the individuals move from generation to generation.

# Plot the objective surface
def plot_3d(ax):

    X = np.linspace(*X_BOUND, 100)
    Y = np.linspace(*Y_BOUND, 100)
    X,Y = np.meshgrid(X, Y)
    Z = F(X, Y)
    ax.plot_surface(X,Y,Z,rstride=1,cstride=1,cmap=cm.coolwarm)
    ax.set_zlim(-10,10)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('z')
    plt.pause(3)
    plt.show()

10. Main program

import math
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D   # registers the 3D projection on older Matplotlib versions

if __name__ == "__main__":
    fitness_list=[]
    function_list=[]
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')   # create the 3D axes
    plt.ion()   # interactive mode: plt.show() does not block, the program keeps running
    plot_3d(ax)

    pop = np.random.randint(2, size=(POP_SIZE, DNA_SIZE*2)) # matrix (POP_SIZE, DNA_SIZE*2)
    for _ in range(N_GENERATIONS):   # iterate for N generations
        x,y = translateDNA(pop)
        if 'sca' in locals(): 
            sca.remove()   # remove the scatter points of the previous generation
        sca = ax.scatter(x, y, F(x,y), c='black', marker='o');plt.show();plt.pause(0.1)
        pop = np.array(crossover_and_mutation(pop, CROSSOVER_RATE))
        #F_values = F(translateDNA(pop)[0], translateDNA(pop)[1])#x, y --> Z matrix
        fitness = get_fitness(pop)
        print_info(pop)
        pop = select(pop, fitness)   # selection produces the new population

 #   print_info(pop)
    plt.ioff()
    plot_3d(ax)
    x = list(range(1, N_GENERATIONS + 1))

    plt.figure()   # separate figure for the convergence curves
    plt.subplot(2, 1, 1)
    plt.plot(x, function_list, 'o-')
    plt.title('function')
    plt.ylabel('function_value')
    plt.subplot(2, 1, 2)
    plt.plot(x, fitness_list, 'o-')
    plt.title('fitness')
    plt.ylabel('fitness_value')

    plt.show()
    input("Press Enter to exit")   # keep the console open until the user confirms

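11. Results

Convex case: F(x,y)=-(x^2+y^2)

The maximum found is very close to 0, which is the global optimum.

Non-convex case: F(x,y)=sin(π/2·x)·cos(π/2·y)

The maximum function value found is very close to 1, which is the global optimum.

The code is adapted from: "遗传算法的原理与python实现" (principles of genetic algorithms and a Python implementation), steelDK's blog on CSDN.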

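III. Experiment Design

Now that we know the simplest GA operations, we can design the experiment.

The overall flow of the algorithm is:

3.1 Building the LeNet network

1. Load the MNIST dataset and split it into a training set and a test set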


def get_dataloader_workers():  #@save
    """Number of worker processes used to read the data (1 here)."""
    return 1
# Load the dataset as a training set and a test set
def load_data_mnist(batch_size, resize=None):  #@save
    """Download the MNIST dataset and load it into memory."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.MNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.MNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))

2. Define the accuracy function used during training

def evaluate_accuracy_gpu(net, data_iter, device=None): #@save
    """Compute the accuracy of the model on a dataset using the GPU."""
    if isinstance(net, nn.Module):
        net.eval()  # switch to evaluation mode
        if not device:
            device = next(iter(net.parameters())).device
    # number of correct predictions, total number of predictions
    metric = zs.Accumulator(2)
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                # needed for BERT fine-tuning
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            metric.add(zs.accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

3. The function that trains LeNet


def train_ch6(net, train_iter, test_iter, num_epochs, lr, device):
    """Train the model on a GPU (defined in chapter 6)."""
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    # the convolutional and fully connected layers are initialized with Xavier uniform initialization
    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    
    animator = zs.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    num_batches =len(train_iter)
    for epoch in range(num_epochs):
        # sum of training loss, number of correct training predictions, number of samples
        metric = zs.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l=loss(y_hat,y)
        
            #l = loss2(net[i].weight, y_hat, y, C)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], zs.accuracy(y_hat, y), X.shape[0])
            
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (train_l, train_acc, None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        
        animator.add(epoch + 1, (None, None, test_acc))
    # for name,params in net.named_parameters():
    #     print(name[-1])
    #     print(params[-1])
    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
    print(f'on {str(device)}')

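4. Main program

This is the complete LeNet network and its training procedure.

A typical LeNet structure is shown below:

To keep the cost down, the fully connected layers are 90*50, 50*20, and 20*10. Counting both weights and biases, that gives 5780 parameters to optimize.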
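The figure of 5780 can be reproduced with a few lines of arithmetic. This is only a check of the count, using the layer sizes given above:

```python
layers = [(90, 50), (50, 20), (20, 10)]   # (inputs, outputs) of each fully connected layer

total = 0
for fan_in, fan_out in layers:
    n = fan_in * fan_out + fan_out        # weights plus one bias per output
    print(f"{fan_in}x{fan_out}: {n} parameters")
    total += n
print("total:", total)                    # 4550 + 1020 + 210 = 5780
```

The running totals 4550 and 5570 are also the slice boundaries used later when a decoded individual is split into per-layer weights and biases.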

# Build the LeNet network
net=nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=3, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=3, stride=2),
    nn.Conv2d(6, 10, kernel_size=9), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(10 * 3 * 3, 50), nn.Sigmoid(),
    nn.Linear(50, 20), nn.Sigmoid(),
    nn.Linear(20, 10)
)
def test():
    batch_size = 256
    train_iter, test_iter = shw.load_data_mnist(batch_size=batch_size)
    lr, num_epochs = 0.9, 10
    train_ch6(net, train_iter, test_iter, num_epochs, lr, device='cuda')
    plt.show()

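3.2 Building the genetic algorithm

Following the same construction as above, we now build the GA for this experiment.

1. Decoding

# Decode binary genes into decimal values
def translateDNA(x_pop):     # x_pop holds binary-encoded genes of DNA_SIZE bits each
    # (n, DNA_SIZE) dot (DNA_SIZE,) --> (n,); a single gene of shape (DNA_SIZE,) gives a scalar
    x = x_pop.dot(2**np.arange(DNA_SIZE)[::-1])/float(2**DNA_SIZE-1)*(X_BOUND[1]-X_BOUND[0])+X_BOUND[0]
    return x      # the decoded decimal value(s)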

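2. Mutation

# Mutation: pick a random bit and flip it; up to 1500 attempts per child
def mutation(child, MUTATION_RATE):
    for i in range(1500):
        if np.random.rand() < MUTATION_RATE:                 # each attempt mutates with probability MUTATION_RATE
            mutate_point = np.random.randint(0, DNA_SIZE*all_params)    # random integer: position of the bit to mutate
            child[mutate_point] = child[mutate_point]^1      # flip the bit at that position (XOR with 1)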

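3. Crossover

# Crossover: this step produces the offspring
def crossover_and_mutation(pop, CROSSOVER_RATE ):
    new_pop = []
    for father in pop:                                   # every individual in turn acts as the father
        child = father.copy()                            # the child starts with all of the father's bits; copy so that pop is not modified in place
        if np.random.rand() < CROSSOVER_RATE:            # crossover is not certain; it happens with probability CROSSOVER_RATE
            mother = pop[np.random.randint(POP_SIZE)]    # pick another individual from the population as the mother
            cross_points = np.random.randint(low=0, high=DNA_SIZE*all_params)    # random crossover point
            child[cross_points:] = mother[cross_points:]                # the child takes the mother's bits after the crossover point
        mutation(child,MUTATION_RATE)                    # each child mutates with some probability
        new_pop.append(child)
    return new_pop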

4. Computing the fitness

# Compute the fitness of every individual
def get_fitness(pop):
    pred=[]
    accuracy=[]
    performance=[]
    for i in range(POP_SIZE):
        performance.append(apply_param_to_linear(pop,i))   # load individual i into net
        test_acc = evaluate_accuracy_gpu(net, train_iter)  # accuracy on the training set, despite the variable name
        pred.append(test_acc)
        accuracy.append(test_acc)
    print("Best accuracy:",max(pred),"\n")
    # Subtract the minimum so that the fitness is never negative and its range is narrowed
    pred = (np.array(pred) - np.min(pred)) + 1e-8
    return pred,accuracy

5. Selection: tournament method

# Tournament selection
def select(pop, fitness):    # nature selection wrt pop's fitness
    idx=[]
    for i in range(POP_SIZE):
        # Draw POP_SIZE competitors at random (with replacement); the tournament size equals the population size
        competers=[random.randrange(0,POP_SIZE,1) for j in range(POP_SIZE)]
        competers_fitness=fitness[competers]
        # The competitor with the highest fitness wins this tournament
        max_idx=competers[int(np.argmax(competers_fitness))]
        idx.append(max_idx)
    return pop[idx]
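In the listing above every tournament draws POP_SIZE competitors, so the winner is almost always the best individual of the generation and the selected population contains very little variety. A common variant uses a small tournament size. The sketch below is one way to write it; the function name `tournament_select`, the parameter `k`, and the toy data are assumptions for illustration and not part of the original experiment.

```python
import numpy as np

def tournament_select(pop, fitness, k=3, rng=None):
    """Select len(pop) individuals; each is the fittest of k randomly drawn competitors."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(pop)
    winners = np.empty(n, dtype=int)
    for i in range(n):
        competitors = rng.integers(0, n, size=k)                 # k indices, drawn with replacement
        winners[i] = competitors[np.argmax(fitness[competitors])]
    return pop[winners]

pop = np.arange(12).reshape(6, 2)                    # six toy individuals, two genes each
fitness = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.4])
selected = tournament_select(pop, fitness, k=3, rng=np.random.default_rng(0))
print(selected)
```

A smaller `k` lowers the selection pressure and keeps more distinct individuals alive; a larger `k` pushes the population toward copies of the current best.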

6. Loading an individual's parameters into the network

# Set the weights and bias of one layer
def init_net(net_number,weights,bias):
    weights=torch.tensor(weights).reshape(net[net_number].weight.shape)
    bias=torch.tensor(bias).reshape(net[net_number].bias.shape)
    net[net_number].weight.data.copy_(weights)
    net[net_number].bias.data.copy_(bias)
# Decode individual current_num of the population and load it into net
def apply_param_to_linear(pop,current_num):
    new_pop=[]
    for j in range(all_params):
        new_pop.append(translateDNA(pop[current_num][j*DNA_SIZE:j*DNA_SIZE+DNA_SIZE]))
    # net[7] is Linear(90, 50): 4500 weights and 50 biases
    weight_7=new_pop[0:4500] 
    bias_7=new_pop[4500:4550]
    init_net(7,weight_7,bias_7)
    # net[9] is Linear(50, 20): 1000 weights and 20 biases
    weight_9=new_pop[4550:5550]
    bias_9=new_pop[5550:5570]
    init_net(9,weight_9,bias_9)
    # net[11] is Linear(20, 10): 200 weights and 10 biases
    weight_11=new_pop[5570:5770]
    bias_11=new_pop[5770:5780]
    init_net(11,weight_11,bias_11)
    return new_pop
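The gene-by-gene loop in apply_param_to_linear can also be written as one matrix product by reshaping the chromosome so that each row holds one gene. The sketch below checks that both forms give the same values; the bounds, the five-parameter chromosome, and the variable names are chosen only for this example.

```python
import numpy as np

DNA_SIZE = 10
X_BOUND = [-3, 3]          # example bounds, chosen only for this sketch
n_params = 5               # a short chromosome to keep the example small

rng = np.random.default_rng(0)
chromosome = rng.integers(0, 2, size=n_params * DNA_SIZE)

# Vectorized form: one row per parameter, one dot product for all of them
genes = chromosome.reshape(n_params, DNA_SIZE)
weights = 2 ** np.arange(DNA_SIZE)[::-1]
scale = (X_BOUND[1] - X_BOUND[0]) / (2 ** DNA_SIZE - 1)
values = genes.dot(weights) * scale + X_BOUND[0]

# Loop form: decode each slice of DNA_SIZE bits separately
looped = [chromosome[j * DNA_SIZE:(j + 1) * DNA_SIZE].dot(weights) * scale + X_BOUND[0]
          for j in range(n_params)]

print(np.allclose(values, looped))
```

With thousands of parameters per individual, the reshaped form avoids a Python-level loop for every fitness evaluation, although the forward passes over the dataset are likely to remain the dominant cost.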

7. Main program

# Hyperparameters (defined at module level because the functions above read them as globals)
all_params=5780
DNA_SIZE = 10           # bits per gene
POP_SIZE = 200          # population size
CROSSOVER_RATE = 0.95    # crossover probability
MUTATION_RATE = 0.05   # mutation probability
N_GENERATIONS = 120      # number of generations
X_BOUND = [-2000, 2000]       # bounds of each parameter

def GA():
    # Initialize the population
    max_fitness=[]
    hero=np.array([])
    old_pop = np.random.randint(2, size=(POP_SIZE, DNA_SIZE*all_params)) # matrix (POP_SIZE, DNA_SIZE*all_params)
    for m in range(N_GENERATIONS):   # iterate for N generations
        print("Generation",m+1,":\n")
        start =time.time()
        pop_corss_mutation = np.array(crossover_and_mutation(old_pop, CROSSOVER_RATE))
        fitness,accuracy= get_fitness(pop_corss_mutation)
        max_fitness.append(max(accuracy))
        fitness=np.array(fitness)
        hero=pop_corss_mutation[fitness.argmax()]   # best individual of the current generation
        old_pop = select(pop_corss_mutation, fitness)   # selection produces the new population
        end=time.time()
        print("Generation",m+1,"took",int(end-start),"seconds\n")
    return hero,max_fitness
best_son,max_fitness_all=GA()
# Plot the best accuracy of each generation
print("GA training finished!")
x=list(range(1,N_GENERATIONS+1,1))
plt.plot(x,max_fitness_all,label="best accuracy")
plt.legend()
plt.show()


# Save the best individual to a file
np.save('best_son.npy', best_son)
# To load it back from the file:
#best_son = np.load('best_son.npy')

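In practice I ran with a population of 100 for 80 generations, which took 28 hours and still had not converged. A population of 200 with 150 generations would probably do better, but I had to hand in the assignment and had no time to run it. My final generation reached an accuracy of 50%.

However, I did not save the best_son array separately after this run. I fed it straight into the LeNet initialization further down and started training. The convolutional layers' parameters were not initialized at that point, so you can imagine the result......

In a word: brutal......

I will rerun it in a couple of days and see how it really performs.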
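A rough back-of-the-envelope estimate puts these numbers in perspective. It assumes that the run time is dominated by fitness evaluations and that each evaluation takes about the same time, which is a simplification rather than a measurement.

```python
hours, pop_size, generations = 28, 100, 80

evaluations = pop_size * generations               # one fitness evaluation per individual per generation
seconds_each = hours * 3600 / evaluations          # average time per evaluation
print(evaluations, seconds_each)                   # 8000 evaluations, 12.6 s each

# Projected time for the larger setting mentioned above, under the same assumption
projected_hours = 200 * 150 * seconds_each / 3600
print(projected_hours)                             # 105.0 hours
```

Under this assumption the larger configuration would take roughly four to five days, which explains why shrinking the dataset or the network is the more practical way to test the idea.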

———————————————————————————————————————————

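Update, 2023-12-02

The previous experiment had so much data that the GA ran too slowly and never converged. This time I switch to a simpler network to test whether GA-optimized initialization helps BP at all. I chose the simplest option, classification on the Iris dataset. Iris has 150 samples with 4 features and 3 classes, so a simple two-layer MLP is enough: the layers are 4*16 and 16*3.

3.3 Building a two-layer MLP for Iris classification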

import sklearn
import numpy as np
from sklearn.datasets import load_iris
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler
from torch import nn
import matplotlib.pyplot as plt
import d2l
from d2l import torch as d2l
d2l.use_svg_display()
from IPython import display
import time
import random


iris=load_iris()
data=iris.data
# data has shape (150, 4)
labels=iris.target
# 150 labels, 3 classes

scaler = StandardScaler()
data = scaler.fit_transform(data)
# preprocessing: standardize the input features

train_data=[]
train_labels=[]
test_data=[]
test_labels=[]
for i in range(150):
# of the 150 samples, roughly 80% go to the training set and the rest to the test set
    if torch.rand(1)<0.8:   # uniform draw in [0, 1); randn would sample a normal distribution instead
        train_data.append(data[i])
        train_labels.append(labels[i])
    else:
        test_data.append(data[i])
        test_labels.append(labels[i])

# Custom dataset class
class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # return a (data, label) tuple
        return self.data[index], self.labels[index]

# Convert to tensors (via np.array, which is faster than converting a list of arrays)
train_data=torch.tensor(np.array(train_data),dtype=torch.float32)
train_labels=torch.tensor(np.array(train_labels), dtype=torch.long)
test_data=torch.tensor(np.array(test_data),dtype=torch.float32)
test_labels=torch.tensor(np.array(test_labels), dtype=torch.long)

# Create the dataset instances
train_dataset = CustomDataset(train_data, train_labels)
test_dataset = CustomDataset(test_data, test_labels)
# Create the data loaders
batch_size = 32
train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)
# Build the network: flatten, linear layers, ReLU activation
net=nn.Sequential(nn.Flatten(),
                  nn.Linear(4,16),nn.ReLU(),
                  nn.Linear(16,3))

# Helper functions used during training
#****************************************************
# Count the correct predictions
def accuracy(y_hat,y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat=torch.argmax(y_hat,axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

# An accumulator: keeps running sums; call add() to add values and index it to read them back
class Accumulator:  
    """Accumulate sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

# Model accuracy on a dataset: correctly classified samples / total samples
def evaluate_accuracy(net,data_iter):
    if isinstance(net,torch.nn.Module):
        net.eval()   # evaluation mode; gradients are disabled by torch.no_grad() below
    metric=Accumulator(2)   # correct predictions, total predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
            # add this batch's number of correct predictions and its number of samples
    return metric[0] / metric[1]   # [0] is the number of correct predictions, [1] the total number of samples



# Training for one epoch
def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """Train the model for one epoch (defined in chapter 3)."""
    # set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()

    # sum of training loss, number of correct training predictions, number of samples
    metric = Accumulator(3)
    for X, y in train_iter:
        # compute the gradients and update the parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # use PyTorch's built-in optimizer and loss function
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # use a custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # return the training loss and the training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

# Animates the loss and accuracy curves during training
class Animator:  #@save
    """Plot data as an animation."""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(5, 4)):
        # draw several lines incrementally
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # capture the arguments with a lambda
        # note: the y-axis range is fixed to (0, 1) here; the ylim argument is not used
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, (0,1), xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # add data points to the plot
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        # plt.draw()
        # plt.pause(0.1)
        display.clear_output(wait=True)


def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    """Train the model."""
    # visualization of the training process
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    # one pass over all the data per epoch, num_epochs passes in total
    test_acc_all=[]
    for epoch in range(num_epochs):
        # training loss and accuracy of this epoch
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        # test-set accuracy after this epoch
        
        test_acc = evaluate_accuracy(net, test_iter)
        test_acc_all.append(test_acc)
        print("epoch:",epoch+1,"train_loss=",train_metrics[0]," ","train_acc=",train_metrics[1]," ","test_acc:",test_acc)
        animator.add(epoch + 1, train_metrics + (test_acc,))
        # if epoch<20:
        #     print("epoch:",epoch+1,"train_loss=",train_metrics[0]," ","train_acc=",train_metrics[1]," ","test_acc:",test_acc)
        #     animator.add(epoch + 1, train_metrics + (test_acc,))
        # else:
        #     if (epoch+1)%10==0:
        #         print("epoch:",epoch+1,"train_loss=",train_metrics[0]," ","train_acc=",train_metrics[1]," ","test_acc:",test_acc)
        #         animator.add(epoch + 1, train_metrics + (test_acc,))
    # return the training loss and accuracy of the last epoch, together with the last test accuracy
    train_loss, train_acc = train_metrics
    return train_metrics,test_acc

# Initialize the weights with xavier_uniform_: the weights are uniformly distributed in [-a, a]
def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

# Main program
net.apply(init_weights)
lr=0.03
num_epochs=20
trainer=torch.optim.Adam(net.parameters(),lr)
loss=nn.CrossEntropyLoss(reduction='none')
train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
#print(net.state_dict())
# prints the weights and biases of the network, i.e. the parameters of each layer
plt.show()

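Note that the initialization used here is xavier_uniform_: the initial weights are uniformly distributed in [-a, a], where $a=\sqrt{\frac{6}{fan_{in}+fan_{out}}}$, and fan_in and fan_out are the numbers of input and output neurons of the current layer.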
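For the two layers of this MLP the bound can be evaluated directly. The sketch below applies the formula with the default gain of 1; the helper name `xavier_uniform_bound` is hypothetical.

```python
import math

def xavier_uniform_bound(fan_in, fan_out):
    """Bound a of the uniform interval [-a, a] for the given layer sizes (gain 1)."""
    return math.sqrt(6.0 / (fan_in + fan_out))

for fan_in, fan_out in [(4, 16), (16, 3)]:
    a = xavier_uniform_bound(fan_in, fan_out)
    print(f"Linear({fan_in}, {fan_out}): a = {a:.3f}")
```

Both bounds come out at roughly 0.55, so the Xavier-initialized weights of this network start in a fairly narrow interval around zero.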

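The test accuracy settles at roughly 95% to 96% and stops improving, while the training accuracy reaches 100%. This suggests that the model is overfitting and has settled into what is, with respect to the test set, a local optimum.

3.4 Optimizing the initial parameters with a GA, then training the MLP initialized with them

GA settings:

DNA_SIZE = 10 # bits per gene

POP_SIZE = 250 # population size

CROSSOVER_RATE = 0.9 # crossover probability

MUTATION_RATE = 0.06 # mutation probability

N_GENERATIONS = 150 # number of generations

X_BOUND = [-3, 3] # bounds of each parameter

The bounds were chosen with the Xavier limit $a=\sqrt{\frac{6}{fan_{in}+fan_{out}}}$ as a reference and set to [-3, 3]. For this network a is about 0.55 for both layers, so [-3, 3] is a considerably wider interval that contains the Xavier range.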

# The imports, the Iris loading and train/test split, CustomDataset, the data loaders, the network net,
# and the helpers accuracy, Accumulator, evaluate_accuracy, train_epoch_ch3, Animator, and train_ch3
# are the same as in the section 3.3 listing and are not repeated here.





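all_params=131          # 4*16 + 16 + 16*3 + 3 weights and biases
DNA_SIZE = 10           # bits per gene
POP_SIZE = 250          # population size
CROSSOVER_RATE = 0.9    # crossover probability
MUTATION_RATE = 0.06   # mutation probability
N_GENERATIONS = 150    # number of generations
X_BOUND = [-3, 3]       # bounds of each parameter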

# Genetic algorithm (GA)
#****************************************************************************************************

# Copy the given weights and biases into layer net[net_number]
def init_net(net_number,weights,bias):
    #print("************************",len(weights))
    weights=torch.tensor(weights).reshape(net[net_number].weight.shape)
    bias=torch.tensor(bias).reshape(net[net_number].bias.shape)
    net[net_number].weight.data.copy_(weights)
    net[net_number].bias.data.copy_(bias)

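# Decode a binary gene into a real value within X_BOUND
def translateDNA(x_pop):     # x_pop: the DNA_SIZE bits of one parameter, most significant bit first
    # (DNA_SIZE,) dot (DNA_SIZE,) --> integer in [0, 2**DNA_SIZE-1], then rescaled linearly to X_BOUND
    x = x_pop.dot(2**np.arange(DNA_SIZE)[::-1])/float(2**DNA_SIZE-1)*(X_BOUND[1]-X_BOUND[0])+X_BOUND[0]
    return x      # the decoded decimal value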


# Mutation: pick a random bit and flip it
def mutation(child, MUTATION_RATE):
    for i in range(2):                                       # up to two bit flips per child
        if np.random.rand() < MUTATION_RATE:                 # mutate with probability MUTATION_RATE
            mutate_point = np.random.randint(0, DNA_SIZE*all_params)    # random integer index of the bit to mutate
            child[mutate_point] = child[mutate_point]^1      # flip the bit (XOR with 1)

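# Crossover: this step produces the offspring
def crossover_and_mutation(pop, CROSSOVER_RATE ):
    new_pop = []
    for father in pop:                                             # every individual takes a turn as the father
        child = father.copy()                                   # the child starts from a copy of the father's bits, so the parent row in pop is not overwritten
        if np.random.rand() < CROSSOVER_RATE:                             # crossover happens only with probability CROSSOVER_RATE
            mother = pop[np.random.randint(POP_SIZE)]                # pick another individual from the population as the mother
            cross_points = np.random.randint(low=0, high=DNA_SIZE*all_params)            # random crossover point
            child[cross_points:] = mother[cross_points:]                    # the child takes the mother's bits from the crossover point onward
        mutation(child,MUTATION_RATE)                                       # each child mutates with a small probability
        new_pop.append(child)
    return new_pop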

# Decode individual current_num of the population and load its parameters into net
def apply_param_to_linear(pop,current_num):
    new_pop=[]
    for j in range(all_params):
        new_pop.append(translateDNA(pop[current_num][j*DNA_SIZE:j*DNA_SIZE+DNA_SIZE]))
    # Parameters of net[1]: 64 weights followed by 16 biases
    weight_1=new_pop[0:64]
    bias_1=new_pop[64:80]
    init_net(1,weight_1,bias_1)
    # Parameters of net[3]: 48 weights followed by 3 biases
    weight_2=new_pop[80:128]
    bias_2=new_pop[128:131]
    init_net(3,weight_2,bias_2)

    return new_pop



# Compute the fitness of every individual
def get_fitness(pop):
    pred=[]
    accuracy=[]
    performance=[]
    for i in range(POP_SIZE):
        performance.append(apply_param_to_linear(pop,i))
        # Fitness is the accuracy on the training set with the decoded parameters,
        # before any gradient-descent training
        train_acc = evaluate_accuracy(net, train_iter)
        #print("Accuracy of individual",i+1,":",train_acc)
        pred.append(train_acc)
        accuracy.append(train_acc)
    #print("Accuracies of the current population:",pred)
    print("Best accuracy:",max(pred))
    # Subtract the minimum so that the fitness lies in [0, max(pred)-min(pred)],
    # then add a tiny constant so that no fitness is exactly 0
    pred= (pred - np.min(pred)) + 1e-8
    return pred,accuracy

# Selection: the higher the fitness, the higher the chance of being selected
# Roulette-wheel selection (disabled)
# def select(pop, fitness):    # nature selection wrt pop's fitness

#     idx = np.random.choice(np.arange(POP_SIZE), size=POP_SIZE, replace=True,
#                            p=(fitness)/(fitness.sum()) )
#     return pop[idx]
# Tournament selection
def select(pop, fitness):    # natural selection w.r.t. the population's fitness
    idx=[]
    for i in range(POP_SIZE):
        # Draw POP_SIZE competitors at random (with replacement).
        # Note: a tournament this large gives very strong selection pressure.
        competers=[]
        for j in range(POP_SIZE):
            competers.append(random.randrange(0,POP_SIZE,1))
        competers_fitness=fitness[competers]
        # The fittest competitor (first one in case of ties) wins the tournament
        max_idx=competers[int(np.argmax(competers_fitness))]
        idx.append(max_idx)

    return pop[idx]

def in_bestson(best_son):
    with open('best_son.txt', 'w') as file:
        for item in best_son:
            file.write(f"{item}\n")

def GA():
    # Initialize the population
    max_fitness=[]
    hero=np.array([])
    old_pop = np.random.randint(2, size=(POP_SIZE, DNA_SIZE*all_params)) # matrix (POP_SIZE, DNA_SIZE*all_params)
    for m in range(N_GENERATIONS):# iterate for N_GENERATIONS generations
        print("Generation",m+1,":")
        start =time.time()
        
        # #new_pop
        # for i in range(POP_SIZE):
        #     net.apply(init_weights)
        #     apply_param_to_linear(old_pop,i)
        pop_corss_mutation = np.array(crossover_and_mutation(old_pop, CROSSOVER_RATE))
        fitness,accuracy= get_fitness(pop_corss_mutation)
        max_fitness.append(max(accuracy))
        fitness=np.array(fitness)
        hero=pop_corss_mutation[fitness.argmax()]    # best individual of the current generation
        #print_info(pop_corss_mutation)
        old_pop = select(pop_corss_mutation, fitness) # selection produces the next population
        #print("old_pop_shape:",len(old_pop),"\n\n")
        end=time.time()
        #print("Generation",m+1,"took",int(end-start),"seconds\n")
    # hero is the best individual of the last generation, not necessarily the best one ever seen
    return hero,max_fitness


def after_GA():
    best_son,max_fitness_all=GA()
    # Save the best initial parameters found by the GA
    in_bestson(best_son)
    print("GA finished!")
    x=list(range(1,N_GENERATIONS+1,1))
    plt.plot(x,max_fitness_all,label="fitness")
    plt.xlabel('generation')
    plt.ylabel('fitness_value')
    plt.legend()
    plt.show()

after_GA()
# Train the MLP starting from the parameters found by the GA
#******************************************************************
def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

def MLP():
   # net.apply(init_weights)
   # '''
    with open('best_son.txt', 'r') as file:
        best_son = [int(line.strip()) for line in file]
    best_son=np.array(best_son)
    # Decode the saved bit string back into the 131 parameters
    new_pop=[]
    for j in range(all_params):
        new_pop.append(translateDNA(best_son[j*DNA_SIZE:j*DNA_SIZE+DNA_SIZE]))
    # Parameters of net[1]: 64 weights followed by 16 biases
    weight_1=new_pop[0:64]
    bias_1=new_pop[64:80]
    init_net(1,weight_1,bias_1)
    # Parameters of net[3]: 48 weights followed by 3 biases
    weight_2=new_pop[80:128]
    bias_2=new_pop[128:131]
    init_net(3,weight_2,bias_2)
    #'''
    lr=0.03
    num_epochs=20
    trainer=torch.optim.Adam(net.parameters(),lr)
    loss=nn.CrossEntropyLoss(reduction='none')
    train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
    #print(net.state_dict())
    # Uncomment the line above to print the weights and biases of every layer of the network
    plt.show()
MLP()

After 200 generations, the fittest individual reached a fitness (accuracy) of 98.46%.
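
The binary decoding done by `translateDNA` and the one-point crossover can be checked in isolation, without the network. The following is only a minimal sketch, not the code that produced the result above: `decode_gene` and `one_point_crossover` are hypothetical helper names introduced here for illustration, while `DNA_SIZE` and `X_BOUND` simply repeat the values from the listing. It shows that an all-zero gene decodes to the lower bound, an all-one gene decodes to the upper bound, and that copying the father before crossover leaves the parent untouched.

```python
import numpy as np

DNA_SIZE = 10        # bits per parameter, same value as in the listing
X_BOUND = [-3, 3]    # decoded value range, same value as in the listing


def decode_gene(bits):
    """Map a DNA_SIZE-bit gene (most significant bit first) to a float in X_BOUND."""
    place_values = 2 ** np.arange(DNA_SIZE)[::-1]                  # [512, 256, ..., 2, 1]
    fraction = bits.dot(place_values) / float(2 ** DNA_SIZE - 1)   # value in [0, 1]
    return fraction * (X_BOUND[1] - X_BOUND[0]) + X_BOUND[0]


def one_point_crossover(father, mother, point):
    """Child = father's bits before `point` + mother's bits from `point` onward."""
    child = father.copy()          # copy, so the father array is not modified
    child[point:] = mother[point:]
    return child


lowest = decode_gene(np.zeros(DNA_SIZE, dtype=int))
highest = decode_gene(np.ones(DNA_SIZE, dtype=int))
# Distance between two neighbouring representable values (the encoding precision)
step = (X_BOUND[1] - X_BOUND[0]) / (2 ** DNA_SIZE - 1)

father = np.zeros(DNA_SIZE, dtype=int)
mother = np.ones(DNA_SIZE, dtype=int)
child = one_point_crossover(father, mother, point=4)

print("range:", lowest, highest, "step:", step)
print("child:", child, "father:", father)
```

With 10 bits per parameter the step between neighbouring values is 6/1023, so increasing `DNA_SIZE` gives finer precision at the cost of a longer bit string per individual.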