Large-scale evolution net: MLP fully-connected experiment log
Ⅰ. Experiment Details
Some details of the original paper can be found at: https://blog.csdn.net/qq_40690815/article/details/103617388
We use PyTorch (Python), with Fashion-MNIST as the dataset.
We define the classes Vertex and Edge, which are combined into a DNA that defines the structure of a network, and MadeData, which generates or imports the source data. The class Evolution_pop controls population evolution:
- each step picks two individuals, making sure both have already been trained;
- their fitness values are compared: if the population is too large, the worse one is killed; if it is too small, the better one reproduces.
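This selection scheme can be sketched in a few lines of plain Python; `evolve_step`, `mutate` and `train` are illustrative names for this sketch, not the repo's actual API, and individuals are plain dicts with a `fitness` key:

```python
import random

def evolve_step(population, setpoint, mutate, train):
    """One selection step: sample two already-trained individuals and
    regulate the population size by fitness."""
    a, b = random.sample(population, 2)
    worse, better = sorted((a, b), key=lambda ind: ind['fitness'])
    if len(population) > setpoint:
        population.remove(worse)      # population too large: kill the worse one
    else:
        child = mutate(better)        # population too small: reproduce the better one
        train(child)
        population.append(child)
    return population
```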
The class StructMutation mutates an individual; the operators are:
- change the learning rate α
- add a layer
- add a skip-connection
- set the activation to 'linear'/'relu'
- change a layer's size
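A hedged sketch of how these operators might be applied; the DNA here is a plain dict rather than the repo's DNA class, and the default probabilities follow the tables in section Ⅲ:

```python
import random

def mutate_dna(dna, p_structure=0.3, p_vertex=0.5, p_lr=0.5):
    """Apply each operator independently with its own probability.
    Returns a mutated copy; the parent dict is left untouched."""
    dna = {**dna, 'layers': list(dna['layers']),
           'skips': list(dna.get('skips', []))}
    if random.random() < p_lr:                  # change the learning rate alpha
        dna['lr'] *= 2 ** random.uniform(-1, 1)
    if random.random() < p_structure:           # structural mutation
        if random.random() < 0.5:               # add a hidden layer
            dna['layers'].insert(random.randrange(len(dna['layers']) + 1),
                                 random.randint(10, 784))
        else:                                   # add a skip-connection (src, dst)
            dna['skips'].append((0, len(dna['layers'])))
    if random.random() < p_vertex:              # mutate a vertex
        i = random.randrange(len(dna['layers']))
        if random.random() < 0.5:               # change the layer size
            dna['layers'][i] = random.randint(10, 784)
        else:                                   # set 'linear' / 'relu'
            dna['activation'] = random.choice(['linear', 'relu'])
    return dna
```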
Finally, we define the class Model, which uses torch to realize the network.
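Since skip-connections are merged by concatenating their source tensors, a node's in_features ends up as the sum of the widths of all nodes feeding it. A minimal sketch of such a decode, assuming a list-of-sources encoding (`SketchModel` is illustrative, not the repo's Model class):

```python
import torch
import torch.nn as nn

class SketchModel(nn.Module):
    """Decode a DNA into an nn.ModuleList. `sizes[i]` is node i's width
    (node 0 is the 784-dim input); `inputs[i]` lists the nodes feeding
    node i. Each node concatenates its inputs, so in_features is the
    sum of the source widths."""
    def __init__(self, sizes, inputs):
        super().__init__()
        self.inputs = inputs
        self.layer = nn.ModuleList(
            nn.Linear(sum(sizes[s] for s in inputs[i]), sizes[i])
            for i in range(1, len(sizes)))

    def forward(self, x):
        outs = [x]
        for i, fc in enumerate(self.layer, start=1):
            h = fc(torch.cat([outs[s] for s in self.inputs[i]], dim=1))
            outs.append(h if i == len(self.layer) else torch.relu(h))
        return outs[-1]
```

With experiment 1's DNA below (widths [784, 768, 605, 10], the last node fed by nodes 0, 2 and 1), the final layer's in_features is 784 + 605 + 768 = 2157, matching the decoded model in the log.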
Ⅱ. Method
- Vertex
- Edge
- DNA
- Evolution_pop
- MadeData
- Model
- StructMutation
Ⅲ. Experiment Record
Experiment 1 (brief)
_population_size_setpoint = 5
_max_layer_size = 4
_evolve_time = 50
EPOCH = 1
BATCH_SIZE = 50
| mutate operator | probability | mutate operator | probability |
|---|---|---|---|
| _mutate_structure | 0.3 | mutate_vertex | 0.5 |
| mutate_learningRate | 0.5 | | |
Experiment results (x-axis: individual ID, y-axis: fitness)
# The best network's parameters... are not complex; granted, this run's settings were fairly simple
[calculate_flow]->start Node 0: 784
Node 1 size: 768: [ 0 , 1 ]
Node 2 size: 605: [ 1 , 2 ]
Node 3 size: 10: [ 0 , 3 ] [ 2 , 3 ] [ 1 , 3 ]
[decode].[ 28 ] Model(
(layer): ModuleList(
(0): Linear(in_features=784, out_features=768, bias=True)
(1): Linear(in_features=768, out_features=605, bias=True)
(2): Linear(in_features=2157, out_features=10, bias=True)
)
)
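The in_features values above follow directly from concatenation: a node's input width is the sum of the widths of its source nodes. A quick check of that arithmetic (plain Python, names illustrative):

```python
def concat_in_features(sizes, sources):
    # skip-connections are merged by concatenation, so a node's
    # in_features is the sum of the widths of all nodes feeding it
    return sum(sizes[s] for s in sources)

sizes = [784, 768, 605, 10]                          # node widths in experiment 1
assert concat_in_features(sizes, [0]) == 784         # Linear(784, 768)
assert concat_in_features(sizes, [1]) == 768         # Linear(768, 605)
assert concat_in_features(sizes, [0, 2, 1]) == 2157  # Linear(2157, 10)
```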
Experiment 2
_population_size_setpoint = 11
_max_layer_size = 10
_evolve_time = 100
EPOCH = 3
BATCH_SIZE = 50
| mutate operator | probability | mutate operator | probability |
|---|---|---|---|
| _mutate_structure | 0.3 | mutate_vertex | 0.5 |
| mutate_learningRate | 0.5 | | |
# Best network structure
[calculate_flow]->start Node 0: 784
Node 1 size: 259: [ 0 , 1 ]
Node 2 size: 72: [ 1 , 2 ] [ 0 , 2 ]
Node 3 size: 10: [ 2 , 3 ] [ 1 , 3 ]
Node 4 size: 10: [ 0 , 4 ] [ 3 , 4 ] [ 2 , 4 ]
[decode].[ 53 ] Model(
(layer): ModuleList(
(0): Linear(in_features=784, out_features=259, bias=True)
(1): Linear(in_features=1043, out_features=72, bias=True)
(2): Linear(in_features=331, out_features=10, bias=True)
(3): Linear(in_features=866, out_features=10, bias=True)
)
)
Epoch: 0 step: 1100[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] loss: 870.8071 | accuracy: 0.7450
Epoch: 1 step: 1100[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] loss: 56.4895 | accuracy: 0.8085
Epoch: 2 step: 1100[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] loss: 45.0535 | accuracy: 0.8300
Experiment results (red: the best individual, blue: individuals still in the population)
Experiment 3
_population_size_setpoint = 16
_max_layer_size = 15
_evolve_time = 300
EPOCH = 3
BATCH_SIZE = 50
| mutate operator | probability | mutate operator | probability |
|---|---|---|---|
| _mutate_structure | 0.3 | mutate_vertex | 0.5 |
| mutate_learningRate | 0.5 | | |
# Best network structure, fitness: 0.8495
[calculate_flow 127 ]->start Node 0: 784
Node 1 type: bn_relu: [ 0 , 1 ]
Node 2 type: linear: [ 1 , 2 ]
Node 3 type: linear: [ 0 , 3 ] [ 2 , 3 ]
Node 4 type: bn_relu: [ 0 , 4 ] [ 2 , 4 ] [ 3 , 4 ]
Node 5 type: linear: [ 4 , 5 ] [ 3 , 5 ] [ 0 , 5 ] [ 2 , 5 ]
[decode].[ 127 ] Model(
(layer): ModuleList(
(0): Linear(in_features=784, out_features=479, bias=True)
(1): Linear(in_features=479, out_features=131, bias=True)
(2): Linear(in_features=915, out_features=72, bias=True)
(3): Linear(in_features=987, out_features=23, bias=True)
(4): Linear(in_features=1010, out_features=10, bias=True)
)
)
Experiment results
Network structure of the best individual
Experiment 4
_population_size_setpoint = 21
_max_layer_size = 20
_evolve_time = 500
EPOCH = 2
BATCH_SIZE = 50
| mutate operator | probability | mutate operator | probability |
|---|---|---|---|
| _mutate_structure | 0.2 | mutate_vertex | 0.3 |
| mutate_learningRate | 0.3 | | |
# BEST INDIVIDUAL
DNA [ 210 ] destroyed -> fitness 0.8
[calculate_flow 239 ]->start Node 0: 784
Node 1 type: bn_relu: [ 0 , 1 ]
Node 2 type: bn_relu: [ 1 , 2 ]
Node 3 type: linear: [ 2 , 3 ]
Node 4 type: linear: [ 3 , 4 ] [ 0 , 4 ] [ 2 , 4 ] [ 1 , 4 ]
[decode].[ 239 ] Model(
(layer): ModuleList(
(0): Linear(in_features=784, out_features=784, bias=True)
(1): Linear(in_features=784, out_features=784, bias=True)
(2): Linear(in_features=784, out_features=784, bias=True)
(3): Linear(in_features=3136, out_features=10, bias=True)
)
)
Epoch: 0 step: 1100[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] loss: 0.5242 | accuracy: 0.7535
Epoch: 1 step: 1100[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] loss: 0.6452 | accuracy: 0.8530
Experiment results
Network structure of the best individual
Ⅳ. Conclusions
From experiment to experiment we increased the number of evolution generations, producing more individuals, and lowered the probability of each mutation operator (since several mutation operators are likely to fire at once, individual network structures could diverge too far; if the offspring then keep networks with the same number of layers, diversity necessarily drops).
One can see that the population fitness in experiments 1 and 2 did not change very noticeably (possibly because Fashion-MNIST was used and this experiment only evolves simple fully-connected networks), but as the number of iterations grows, the individuals that survive in the population, and the best individual, still clearly appear at later positions.
As the network structures evolve, the depth of individuals keeps increasing, so the mutation space keeps expanding; even so, within this experiment's budget (300 generations in experiment 3, 500 in experiment 4) reaching 4-5 hidden layers was already difficult. Moreover, to save time, experiment 3 iterated over the training set only three times and experiment 4 only twice, which also makes deeper networks hard to train.
In fact, to save time, offspring should inherit part of the parent's weights along with the network structure, and serialization/deserialization should be added to the code to make record-keeping easier; these details have not yet been implemented in this experiment.
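The inheritance idea could be sketched with torch's state_dict: copy every parent tensor whose name and shape still match after mutation, and let newly added or resized layers keep their fresh initialization (a sketch only, under the assumption that matching names/shapes mean matching roles; this experiment does not implement it):

```python
import torch
import torch.nn as nn

def inherit_weights(parent, child):
    """Copy parameters from parent to child wherever name and shape
    match; mutated (new or resized) layers keep their own init."""
    src, dst = parent.state_dict(), child.state_dict()
    for name, tensor in src.items():
        if name in dst and dst[name].shape == tensor.shape:
            dst[name].copy_(tensor)
    child.load_state_dict(dst)
    return child
```

Serialization for record-keeping then reduces to `torch.save(model.state_dict(), path)` and `model.load_state_dict(torch.load(path))`.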
Looking at the best network structures, skip-connections from layer 0 (the input data) are the most common, and most of them land on layers near the end; this may indicate that the input retains the most complete information of the source data and contributes most to the later layers.
As for the structure found in experiment 3, every skip-connection happens to feed the output layer from an earlier layer, which may likewise indicate that layers near the end benefit most from receiving inputs from earlier in the network.
Before this experiment, in another course I used MATLAB to train a hand-designed five-layer fully-connected network without skip-connections on Fashion-MNIST, reaching over 88% accuracy on the test set. In this experiment, probably because of running-time considerations, the full training set was not used and few epochs were run, so the differences between individuals are not very pronounced.