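Genetic algorithms are a widely used optimization technique in machine learning and artificial intelligence. PyGAD is an open-source Python library for building genetic algorithms and for training machine learning algorithms with them. This article introduces PyGAD and demonstrates it through five applications.
Installing and Getting Started with PyGAD
PyGAD is available on PyPI (Python Package Index) and can be installed with pip. On Windows, run: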
pip install pygad
On Mac/Linux, use pip3 instead of pip in the terminal:
pip3 install pygad
After installation, verify it by importing the library from a Python shell:
import pygad
At the time of writing, the latest PyGAD release is 2.3.2, published on June 1, 2020. The __version__ special variable reports the installed version:
import pygad
print(pygad.__version__)
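Introduction to PyGAD
The main goal of PyGAD is to provide a simple implementation of the genetic algorithm. It exposes a range of parameters that let the user customize the genetic algorithm for a wide range of applications. This tutorial discusses five such applications.
PyGAD 2.3.2 has 5 modules: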
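- pygad: the main module, imported by default.
- pygad.nn: implements neural networks.
- pygad.gann: trains neural networks with the genetic algorithm.
- pygad.cnn: implements convolutional neural networks.
- pygad.gacnn: trains convolutional neural networks with the genetic algorithm.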
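Each module has its own repository on GitHub. The main module, pygad, has a class named GA; to use the genetic algorithm, create an instance of the pygad.GA class. The steps for using the pygad module are: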
- Create the fitness function.
- Prepare the parameters required by the pygad.GA class.
- Create an instance of the pygad.GA class.
- Run the genetic algorithm.
In PyGAD 2.3.2, the constructor of the pygad.GA class has 19 parameters, 16 of which are optional. The three required parameters are:
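- num_generations: the number of generations.
- num_parents_mating: the number of solutions selected as parents.
- fitness_func: the fitness function that computes the fitness value of a solution.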
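To make the roles of these three parameters concrete, here is a minimal pure-Python sketch of a generational loop. It is not PyGAD's implementation: the function name run_simple_ga, the selection scheme (keep the top solutions), the single-point crossover, and the mutation range are all assumptions chosen for illustration.

```python
import random

def run_simple_ga(fitness_func, num_generations, num_parents_mating,
                  sol_per_pop=8, num_genes=3, seed=0):
    """Toy generational loop (hypothetical helper, not PyGAD's code)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-4.0, 4.0) for _ in range(num_genes)]
                  for _ in range(sol_per_pop)]
    best_per_generation = []
    for _ in range(num_generations):
        # Score every solution; the function receives the solution and its index.
        scored = sorted(
            ((fitness_func(sol, idx), sol) for idx, sol in enumerate(population)),
            key=lambda pair: pair[0], reverse=True)
        best_per_generation.append(scored[0][0])
        # The fittest num_parents_mating solutions become parents and are kept.
        parents = [sol for _, sol in scored[:num_parents_mating]]
        children = []
        while len(children) < sol_per_pop - num_parents_mating:
            first, second = rng.sample(parents, 2)
            point = rng.randrange(1, num_genes)           # single-point crossover
            child = first[:point] + second[point:]
            child[rng.randrange(num_genes)] += rng.uniform(-1.0, 1.0)  # mutation
            children.append(child)
        population = parents + children
    return best_per_generation

history = run_simple_ga(lambda solution, idx: sum(solution),
                        num_generations=20, num_parents_mating=4)
print(history[0], history[-1])
```

Because the parents are carried over unchanged, the best fitness recorded in this sketch can never decrease from one generation to the next.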
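The fitness_func parameter is what lets the genetic algorithm be customized for different problems. It accepts a user-defined function that computes the fitness value of a single solution. That function must take two parameters: the solution and its index in the population.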
For example, suppose the population contains 3 solutions:
[221, 342, 213]
[675, 32, 242]
[452, 23, -212]
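The function assigned to the fitness_func parameter must return a single number representing the fitness of each solution. The following example returns the sum of the solution:
def fitness_function(solution, solution_idx):
    return sum(solution)
The fitness values of the 3 solutions are 776, 949, and 263, respectively. Parents are selected based on these fitness values: the higher the fitness value, the better the solution.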
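As a quick check of the numbers above, the sketch below applies the sum-based fitness function to the three example solutions and picks the fittest one. The variable names population, fitness_values, and best_idx are introduced here for illustration only.

```python
def fitness_function(solution, solution_idx):
    # Fitness is simply the sum of the genes in the solution.
    return sum(solution)

population = [[221, 342, 213], [675, 32, 242], [452, 23, -212]]
fitness_values = [fitness_function(sol, idx) for idx, sol in enumerate(population)]
# The solution with the highest fitness is the best parent candidate.
best_idx = max(range(len(population)), key=lambda idx: fitness_values[idx])

print(fitness_values)  # [776, 949, 263]
print(best_idx)        # 1
```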
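Applications of PyGAD
Fitting a Linear Model
Suppose there is an equation with 6 inputs, 1 output, and 6 parameters:
y = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
Suppose the inputs are (4, -2, 3.5, 5, -11, -4.7) and the output is 44. We can use the genetic algorithm to find values of the 6 parameters that satisfy the equation.
First, prepare the fitness function: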
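import numpy
function_inputs = [4, -2, 3.5, 5, -11, -4.7]
desired_output = 44
def fitness_func(solution, solution_idx):
    # Weighted sum of the inputs using the candidate parameters.
    output = numpy.sum(solution * function_inputs)
    # A smaller error gives a higher fitness; an exact match makes the error 0.
    fitness = 1.0 / numpy.abs(output - desired_output)
    return fitness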
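The reciprocal-of-error fitness can be checked by hand on a couple of candidates. The sketch below is a pure-Python variant; the small eps term is an assumption added here so that an exact solution does not divide by zero, and the names model_output and safe_fitness are hypothetical.

```python
function_inputs = [4, -2, 3.5, 5, -11, -4.7]
desired_output = 44

def model_output(solution):
    # y = w1*x1 + w2*x2 + ... + w6*x6
    return sum(w * x for w, x in zip(solution, function_inputs))

def safe_fitness(solution, solution_idx, eps=1e-9):
    # eps is an assumption of this sketch, not part of the original function.
    error = abs(model_output(solution) - desired_output)
    return 1.0 / (error + eps)

exact = [11, 0, 0, 0, 0, 0]   # 11 * 4 = 44, one of many exact solutions
rough = [1, 1, 1, 1, 1, 1]    # output is about -5.2, far from 44

print(model_output(exact))
print(safe_fitness(exact, 0) > safe_fitness(rough, 1))
```

The equation has one constraint and six unknowns, so many parameter vectors satisfy it; the genetic algorithm only needs to find one of them.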
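Next, prepare the other important parameters:
num_generations = 50  # assumed value; required by pygad.GA but missing from the original listing
num_parents_mating = 4  # assumed value, for the same reason
sol_per_pop = 50
num_genes = len(function_inputs)
init_range_low = -2
init_range_high = 5
mutation_percent_genes = 1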
Then instantiate the pygad.GA class:
import pygad
ga_instance = pygad.GA(num_generations = num_generations,
num_parents_mating = num_parents_mating,
fitness_func = fitness_func,
sol_per_pop = sol_per_pop,
num_genes = num_genes,
init_range_low = init_range_low,
init_range_high = init_range_high,
mutation_percent_genes = mutation_percent_genes)
Finally, call the run() method to start the generations, and use the plot_result() method to show the fitness value at each generation:
ga_instance.run()
ga_instance.plot_result()
The best_solution() method returns the best solution, its fitness, and its index in the population:
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Parameters of the best solution : {solution}".format(solution = solution))
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness = solution_fitness))
print("Index of the best solution : {solution_idx}".format(solution_idx = solution_idx))
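Reproducing Images
In this application, we start from a random image (random pixel values) and evolve the value of each pixel with the genetic algorithm. Because an image is 2D or 3D while the genetic algorithm expects each solution to be a 1D vector, the img2chromosome() function converts the image to a 1D vector: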
import functools
import operator
import numpy
def img2chromosome(img_arr):
    # Flatten the image into a 1D vector whose length is the product of its dimensions.
    return numpy.reshape(img_arr, (functools.reduce(operator.mul, img_arr.shape),))
The chromosome2img() function restores the vector to a 2D or 3D image:
def chromosome2img(vector, shape):
    # The vector length must equal the number of elements in the target shape.
    if len(vector) != functools.reduce(operator.mul, shape):
        raise ValueError("A vector of length {vector_length} cannot be reshaped into an array of shape {shape}.".format(vector_length = len(vector), shape = shape))
    return numpy.reshape(vector, shape)
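A round trip through the two helpers should return the original array unchanged. The sketch below checks this on a small synthetic 2×4×3 array, which stands in for a real image and is an assumption of this example.

```python
import functools
import operator
import numpy

def img2chromosome(img_arr):
    # Flatten to 1D; the length is the product of all dimensions.
    return numpy.reshape(img_arr, (functools.reduce(operator.mul, img_arr.shape),))

def chromosome2img(vector, shape):
    # Refuse shapes whose element count does not match the vector length.
    if len(vector) != functools.reduce(operator.mul, shape):
        raise ValueError("Vector length does not match the requested shape.")
    return numpy.reshape(vector, shape)

# Synthetic stand-in for an image with pixel values scaled to [0, 1].
image = numpy.arange(24, dtype=float).reshape(2, 4, 3) / 23.0
chromosome = img2chromosome(image)
restored = chromosome2img(chromosome, image.shape)

print(chromosome.shape)                    # (24,)
print(numpy.array_equal(image, restored))  # True
```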
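Besides the regular steps for using PyGAD, we also need to read the image:
import imageio
target_im = imageio.imread('fruit.jpg')
# numpy.float was removed in NumPy 1.24; the built-in float is equivalent here.
target_im = numpy.asarray(target_im / 255, dtype = float)
Prepare the fitness function, which measures the difference between the pixels of a solution and the target image: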
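# gari is assumed to be the module that holds the img2chromosome() and chromosome2img() helpers defined above.
import gari
target_chromosome = gari.img2chromosome(target_im)
def fitness_fun(solution, solution_idx):
    # Sum of absolute pixel differences between the target and the solution.
    fitness = numpy.sum(numpy.abs(target_chromosome - solution))
    # Subtract the error from a constant so that a higher fitness means a closer match.
    fitness = numpy.sum(target_chromosome) - fitness
    return fitness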
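The effect of this fitness function is easiest to see on a tiny example. In the sketch below, a four-pixel target (an invented stand-in for a real image) is compared against a perfect copy and against its inverse.

```python
import numpy

# Hypothetical 4-pixel target, already flattened and scaled to [0, 1].
target_chromosome = numpy.array([0.0, 0.25, 0.5, 1.0])

def fitness_fun(solution, solution_idx):
    error = numpy.sum(numpy.abs(target_chromosome - solution))
    # The maximum possible fitness is sum(target), reached when error is 0.
    return numpy.sum(target_chromosome) - error

perfect = target_chromosome.copy()
inverted = 1.0 - target_chromosome

print(fitness_fun(perfect, 0))   # 1.75
print(fitness_fun(inverted, 1))  # -0.75
```

Note that the fitness can become negative when the total pixel error exceeds the sum of the target pixels; only the ordering of fitness values matters for selection.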
Create an instance of the pygad.GA class:
import pygad
ga_instance = pygad.GA(num_generations = 20000,
num_parents_mating = 10,
fitness_func = fitness_fun,
sol_per_pop = 20,
num_genes = target_im.size,
init_range_low = 0.0,
init_range_high = 1.0,
mutation_percent_genes = 0.01,
mutation_type = "random",
mutation_by_replacement = True,
random_mutation_min_val = 0.0,
random_mutation_max_val = 1.0)
Run the genetic algorithm and show how the fitness value evolves:
ga_instance.run()
ga_instance.plot_result()
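Finally, retrieve the best solution, convert it to an image, and display it:
import matplotlib.pyplot as plt
solution, solution_fitness, solution_idx = ga_instance.best_solution()
result = gari.chromosome2img(solution, target_im.shape)
plt.imshow(result)
plt.show()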
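8 Queens Puzzle
The 8 queens puzzle involves placing 8 chess queens on an 8×8 board, one queen per row. The goal is to place them so that no queen can attack another vertically, horizontally, or diagonally. The genetic algorithm can be used to find a solution that satisfies these conditions.
The project is available on GitHub with a GUI built using Kivy that shows an 8×8 matrix. The GUI has three buttons at the bottom:
- "Initial Population" button: creates the initial population of the genetic algorithm.
- "Show Best Solution" button: shows the best solution from the last generation at which the genetic algorithm stopped.
- "Start GA" button: starts the iterations/generations of the genetic algorithm.
The population is initialized as follows: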
import numpy
def initialize_population(self, *args):
    self.num_solutions = 10
    self.reset_board_text()
    # One row per solution; each gene is the column index of the queen in that board row.
    self.population_1D_vector = numpy.zeros(shape = (self.num_solutions, 8))
    for solution_idx in range(self.num_solutions):
        initial_queens_y_indices = numpy.random.rand(8) * 8
        initial_queens_y_indices = initial_queens_y_indices.astype(numpy.uint8)
        self.population_1D_vector[solution_idx, :] = initial_queens_y_indices
    self.vector_to_matrix()
    self.pop_created = 1
    self.num_attacks_Label.text = "Initial Population Created."
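The encoding used here stores one column index per board row, so a solution is a vector of 8 integers in the range 0 to 7. The sketch below shows that encoding outside the GUI class. It swaps in numpy.random.default_rng(...).integers for the rand-and-cast approach above, and vector_to_board is a hypothetical stand-in for the project's vector_to_matrix() method, whose body is not shown in this article.

```python
import numpy

rng = numpy.random.default_rng(0)
num_solutions = 10
# Each row is one solution; gene i holds the column of the queen in board row i.
population = rng.integers(low=0, high=8, size=(num_solutions, 8))

def vector_to_board(solution_vector):
    """Expand a length-8 vector of column indices into an 8x8 0/1 board."""
    board = numpy.zeros((8, 8), dtype=int)
    for row_idx, col_idx in enumerate(solution_vector):
        board[row_idx, int(col_idx)] = 1
    return board

board = vector_to_board(population[0])
print(board)
```

With this encoding every board has exactly one queen per row by construction, which is why the fitness function only needs to look at columns and diagonals.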
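The fitness function counts the total number of column and diagonal attacks among the queens and returns the reciprocal of that count, so fewer attacks mean a higher fitness. A solution with no attacks gets the fixed value 1.1:
def fitness(solution_vector, solution_idx):
    # Accept either an 8x8 board or a length-8 vector of column indices.
    if solution_vector.ndim == 2:
        solution = solution_vector
    else:
        solution = numpy.zeros(shape = (8, 8))
        row_idx = 0
        for col_idx in solution_vector:
            solution[row_idx, int(col_idx)] = 1
            row_idx = row_idx + 1
    total_num_attacks_column = attacks_column(solution)
    total_num_attacks_diagonal = attacks_diagonal(solution)
    total_num_attacks = total_num_attacks_column + total_num_attacks_diagonal
    if total_num_attacks == 0:
        # No attacks: assign a fitness above the best attainable reciprocal (1.0).
        total_num_attacks = 1.1
    else:
        total_num_attacks = 1.0 / total_num_attacks
    return total_num_attacks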
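The attacks_column() and attacks_diagonal() helpers are not shown in this article. As a rough illustration of the idea, the sketch below counts attacking pairs directly on the vector encoding. The function count_attacking_pairs is hypothetical, and counting pairs (rather than attacks per queen) is an assumption that may differ from the project's own counting.

```python
from itertools import combinations

def count_attacking_pairs(solution_vector):
    """Count queen pairs sharing a column or a diagonal (one queen per row)."""
    pairs = 0
    for (r1, c1), (r2, c2) in combinations(enumerate(solution_vector), 2):
        same_column = c1 == c2
        same_diagonal = abs(r1 - r2) == abs(c1 - c2)
        if same_column or same_diagonal:
            pairs += 1
    return pairs

def fitness(solution_vector, solution_idx):
    attacks = count_attacking_pairs([int(col) for col in solution_vector])
    # Same scoring rule as above: 1.1 for no attacks, otherwise the reciprocal.
    return 1.1 if attacks == 0 else 1.0 / attacks

valid = [0, 4, 7, 5, 2, 6, 1, 3]   # a known solution of the puzzle
print(count_attacking_pairs(valid))    # 0
print(count_attacking_pairs([0] * 8))  # 28
print(fitness(valid, 0))               # 1.1
```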
When the "Start GA" button is pressed, an instance of the pygad.GA class is created and its run() method is called:
import pygad
ga_instance = pygad.GA(num_generations = 500,
num_parents_mating = 5,
fitness_func = fitness,
num_genes = 8,
initial_population = self.population_1D_vector,
mutation_percent_genes = 0.01,
mutation_type = "random",
mutation_num_genes = 3,
mutation_by_replacement = True,
random_mutation_min_val = 0.0,
random_mutation_max_val = 8.0,
callback_generation = callback)
ga_instance.run()
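Training Neural Networks
The genetic algorithm can be used to train neural networks. PyGAD supports this through the pygad.gann.GANN class, for solving classification problems.
First, prepare the training data. Here we build a network that simulates the XOR logic gate: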
import numpy
data_inputs = numpy.array([ [ 1, 1 ], [ 1, 0 ], [ 0, 1 ], [ 0, 0 ] ])
data_outputs = numpy.array([ 0, 1, 1, 0 ])
Create an instance of the pygad.gann.GANN class:
import pygad.gann
num_inputs = data_inputs.shape[1]
num_classes = 2
num_solutions = 6
GANN_instance = pygad.gann.GANN(num_solutions = num_solutions,
num_neurons_input = num_inputs,
num_neurons_hidden_layers = [ 2 ],
num_neurons_output = num_classes,
hidden_activations = [ "relu" ],
output_activation = "softmax")
Create the fitness function, which returns the classification accuracy of the passed solution:
import pygad.nn
import pygad.gann
def fitness_func(solution, sol_idx):
    global GANN_instance, data_inputs, data_outputs
    # Predict with the network that corresponds to this solution.
    predictions = pygad.nn.predict(last_layer = GANN_instance.population_networks[sol_idx], data_inputs = data_inputs)
    # Fitness is the percentage of correctly classified samples.
    correct_predictions = numpy.where(predictions == data_outputs)[0].size
    solution_fitness = (correct_predictions / data_outputs.size) * 100
    return solution_fitness
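The accuracy arithmetic in this fitness function can be tried without a trained network. In the sketch below, the predictions array is invented for illustration (it is not the output of any real network) and gets three of the four XOR samples right.

```python
import numpy

data_outputs = numpy.array([0, 1, 1, 0])   # XOR labels from the text
predictions = numpy.array([0, 1, 0, 0])    # hypothetical predictions

# numpy.where returns the indices where the comparison is True.
correct_predictions = numpy.where(predictions == data_outputs)[0].size
solution_fitness = (correct_predictions / data_outputs.size) * 100

print(correct_predictions)  # 3
print(solution_fitness)     # 75.0
```

With only four samples the fitness can take just five values (0, 25, 50, 75, 100), so many solutions tie and selection has little to distinguish them by.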
Prepare the other necessary parameters:
population_vectors = pygad.gann.population_as_vectors(population_networks = GANN_instance.population_networks)
initial_population = population_vectors.copy()
num_parents_mating = 4
num_generations = 500
mutation_percent_genes = 5
parent_selection_type = "sss"
crossover_type = "single_point"
mutation_type = "random"
keep_parents = 1
init_range_low = - 2
init_range_high = 5
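Create an instance of the pygad.GA class. The callback_generation function passed to it is defined in the next listing and must already exist when this call runs: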
import pygad
ga_instance = pygad.GA(num_generations = num_generations,
num_parents_mating = num_parents_mating,
initial_population = initial_population,
fitness_func = fitness_func,
mutation_percent_genes = mutation_percent_genes,
init_range_low = init_range_low,
init_range_high = init_range_high,
parent_selection_type = parent_selection_type,
crossover_type = crossover_type,
mutation_type = mutation_type,
keep_parents = keep_parents,
callback_generation = callback_generation)
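Define the callback_generation function, which updates the weights of all the neural networks after each generation:
def callback_generation(ga_instance):
    global GANN_instance
    # Convert the evolved 1D solution vectors back into per-layer weight matrices.
    population_matrices = pygad.gann.population_as_matrices(population_networks = GANN_instance.population_networks, population_vectors = ga_instance.population)
    # Store the new weights in the networks so the next fitness evaluation uses them.
    GANN_instance.update_population_trained_weights(population_trained_weights = population_matrices)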
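The conversion between weight matrices and 1D solution vectors is what lets the genetic algorithm operate on a network. The sketch below illustrates the idea with two hypothetical helpers, matrices_to_vector and vector_to_matrices; they are not PyGAD functions, and the example assumes a 2-2-2 network with no bias terms.

```python
import numpy

def matrices_to_vector(weight_matrices):
    # Concatenate every layer's weights into one flat chromosome.
    return numpy.concatenate([m.reshape(-1) for m in weight_matrices])

def vector_to_matrices(vector, shapes):
    # Slice the chromosome back into matrices, one per layer.
    matrices, start = [], 0
    for shape in shapes:
        size = int(numpy.prod(shape))
        matrices.append(vector[start:start + size].reshape(shape))
        start += size
    return matrices

# Assumed layer shapes: input->hidden and hidden->output, 2x2 each.
shapes = [(2, 2), (2, 2)]
weights = [numpy.arange(4.0).reshape(2, 2), numpy.arange(4.0, 8.0).reshape(2, 2)]

vector = matrices_to_vector(weights)
restored = vector_to_matrices(vector, shapes)
print(vector)  # [0. 1. 2. 3. 4. 5. 6. 7.]
```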
Run the genetic algorithm:
ga_instance.run()
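Training Convolutional Neural Networks
PyGAD also supports training convolutional neural networks with the genetic algorithm.
First, prepare the training data: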
import numpy
train_inputs = numpy.load("dataset_inputs.npy")
train_outputs = numpy.load("dataset_outputs.npy")
Build the CNN architecture with the pygad.cnn module:
import pygad.cnn
input_layer = pygad.cnn.Input2D(input_shape = (80, 80, 3))
conv_layer = pygad.cnn.Conv2D(num_filters = 2, kernel_size = 3, previous_layer = input_layer, activation_function = "relu")
average_pooling_layer = pygad.cnn.AveragePooling2D(pool_size = 5, previous_layer = conv_layer, stride = 3)
flatten_layer = pygad.cnn.Flatten(previous_layer = average_pooling_layer)
dense_layer = pygad.cnn.Dense(num_neurons = 4, previous_layer = flatten_layer, activation_function = "softmax")
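It can help to trace the tensor shapes through this architecture by hand. The sketch below uses the textbook formulas for an unpadded, stride-1 convolution and for strided pooling; these formulas and the helper names are assumptions of this example, and PyGAD's own size arithmetic may differ in edge cases, so model.summary() remains the authoritative source.

```python
def conv2d_output_shape(input_shape, num_filters, kernel_size):
    # Unpadded convolution with stride 1 (assumed).
    height, width, _ = input_shape
    return (height - kernel_size + 1, width - kernel_size + 1, num_filters)

def pool2d_output_shape(input_shape, pool_size, stride):
    # Standard strided pooling without padding (assumed).
    height, width, channels = input_shape
    return ((height - pool_size) // stride + 1,
            (width - pool_size) // stride + 1,
            channels)

def flatten_size(input_shape):
    height, width, channels = input_shape
    return height * width * channels

input_shape = (80, 80, 3)
conv_shape = conv2d_output_shape(input_shape, num_filters=2, kernel_size=3)
pool_shape = pool2d_output_shape(conv_shape, pool_size=5, stride=3)
flat = flatten_size(pool_shape)

print(conv_shape)  # (78, 78, 2)
print(pool_shape)  # (25, 25, 2)
print(flat)        # 1250
```

Under these assumptions the dense layer with 4 neurons receives 1250 inputs.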
Create the model:
model = pygad.cnn.Model(last_layer = dense_layer, epochs = 5, learning_rate = 0.01)
Use the summary() method to view a summary of the model architecture:
model.summary()
Instantiate the pygad.gacnn.GACNN class to create the initial population:
import pygad.gacnn
GACNN_instance = pygad.gacnn.GACNN(model = model, num_solutions = 4)
Prepare the fitness function:
def fitness_func(solution, sol_idx):
    # Uses the train_inputs and train_outputs arrays loaded earlier.
    global GACNN_instance, train_inputs, train_outputs
    predictions = GACNN_instance.population_networks[sol_idx].predict(data_inputs = train_inputs)
    # Fitness is the percentage of correctly classified samples.
    correct_predictions = numpy.where(predictions == train_outputs)[0].size
    solution_fitness = (correct_predictions / train_outputs.size) * 100
    return solution_fitness
Prepare the other parameters:
population_vectors = pygad.gacnn.population_as_vectors(population_networks = GACNN_instance.population_networks)
initial_population = population_vectors.copy()
num_parents_mating = 2
num_generations = 10
mutation_percent_genes = 0.1
parent_selection_type = "sss"
crossover_type = "single_point"
mutation_type = "random"
keep_parents = - 1
Create an instance of the pygad.GA class:
import pygad
ga_instance = pygad.GA(num_generations = num_generations,
num_parents_mating = num_parents_mating,
initial_population = initial_population,
fitness_func = fitness_func,
mutation_percent_genes = mutation_percent_genes,
parent_selection_type = parent_selection_type,
crossover_type = crossover_type,
mutation_type =