Net2Net experiment on MNIST
Based on the paper 'Net2Net: Accelerating Learning via Knowledge Transfer'
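Both Net2Net methods rest on a function-preserving transformation: the student starts out computing exactly what the teacher computes. The simplest case is what the script below does for `fc1-deeper`. A new Dense layer with an identity kernel and zero bias, placed after a ReLU layer, changes nothing, because ReLU leaves its own non-negative outputs unchanged.

Here is a minimal NumPy sketch of that case. The weights (`w_fc1`, `b_fc1`) and the shapes are random stand-ins made up for illustration; they are not taken from the trained teacher.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))          # a batch of 5 hypothetical inputs
w_fc1 = rng.normal(size=(16, 64))     # stand-in for a trained fc1 kernel
b_fc1 = rng.normal(size=64)           # stand-in for a trained fc1 bias

h = relu(x @ w_fc1 + b_fc1)           # teacher: output of fc1 (ReLU)

# Net2DeeperNet for a Dense layer after ReLU: identity kernel, zero bias
w_new = np.eye(64)
b_new = np.zeros(64)
h_deeper = relu(h @ w_new + b_new)    # student: fc1 followed by fc1-deeper

print(np.allclose(h, h_deeper))       # True
```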
Annotated code
'''This is an implementation of Net2Net experiment with MNIST in
'Net2Net: Accelerating Learning via Knowledge Transfer'
by Tianqi Chen, Ian Goodfellow, and Jonathon Shlens
arXiv:1511.05641v4 [cs.LG] 23 Apr 2016
http://arxiv.org/abs/1511.05641

# Notes

- What:
  + Net2Net is a group of methods to transfer knowledge from a teacher neural
    net to a student net, so that the student net can be trained faster than
    from scratch.
  + The paper discusses two specific methods of Net2Net, i.e. Net2WiderNet
    and Net2DeeperNet.
  + Net2WiderNet replaces a model with an equivalent wider model that has
    more units in each hidden layer.
  + Net2DeeperNet replaces a model with an equivalent deeper model.
  + Both are based on the idea of 'function-preserving transformations of
    neural nets'.
- Why:
  + Enable fast exploration of multiple neural nets in the experimentation and
    design process, by creating a series of wider and deeper models with
    transferable knowledge.
  + Enable a 'lifelong learning system' by gradually adjusting model complexity
    to data availability, and reusing transferable knowledge.

# Experiments

- Teacher model: a basic CNN model trained on MNIST for 3 epochs.
- Net2WiderNet experiment:
  + Student model has a wider Conv2D layer and a wider FC layer.
  + Comparison of 'random-padding' vs 'net2wider' weight initialization.
  + With both methods, after 1 epoch, the student model should perform as
    well as the teacher model, but 'net2wider' is slightly better.
- Net2DeeperNet experiment:
  + Student model has an extra Conv2D layer and an extra FC layer.
  + Comparison of 'random-init' vs 'net2deeper' weight initialization.
  + After 1 epoch, performance of 'net2deeper' is better than 'random-init'.
- Hyper-parameters:
  + SGD with momentum=0.9 is used for training teacher and student models.
  + Learning rate adjustment: it's suggested to reduce the learning rate for
    student models to 1/10 of the teacher's (0.01 for the teacher vs 0.001
    for the students here).
  + Addition of noise in 'net2wider' is used to break weight symmetry
    and thus enable full capacity of student models. It is optional
    when a Dropout layer is used.

# Results

- Tested with TF backend and 'channels_last' image_data_format.
- Running on GPU GeForce GTX Titan X Maxwell
- Performance comparisons - validation loss values during the first 3 epochs:

                        epoch 1 epoch 2 epoch 3
Teacher model ...
(0) teacher_model:      0.0537  0.0354  0.0356

Experiment of Net2WiderNet ...
(1) wider_random_pad:   0.0320  0.0317  0.0289
(2) wider_net2wider:    0.0271  0.0274  0.0270

Experiment of Net2DeeperNet ...
(3) deeper_random_init: 0.0682  0.0506  0.0468
(4) deeper_net2deeper:  0.0292  0.0294  0.0286
'''
from __future__ import print_function
import numpy as np
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from keras.optimizers import SGD
from keras.datasets import mnist

if K.image_data_format() == 'channels_first':
    input_shape = (1, 28, 28)  # image shape
else:
    input_shape = (28, 28, 1)  # image shape
num_classes = 10  # number of classes
epochs = 3


# load and pre-process data
def preprocess_input(x):
    return x.astype('float32').reshape((-1,) + input_shape) / 255


def preprocess_output(y):
    return keras.utils.to_categorical(y)


print('Loading MNIST data...')
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = map(preprocess_input, [x_train, x_test])
y_train, y_test = map(preprocess_output, [y_train, y_test])
print('x_train shape:', x_train.shape, 'y_train shape:', y_train.shape)
print('x_test shape:', x_test.shape, 'y_test shape:', y_test.shape)


# knowledge transfer algorithms
def wider2net_conv2d(teacher_w1, teacher_b1, teacher_w2, new_width, init):
    '''Get initial weights for a wider conv2d layer with more filters,
    by 'random-padding' or 'net2wider'.

    # Arguments
        teacher_w1: `weight` of conv2d layer to become wider,
          of shape (kh1, kw1, num_channel1, filters1)
        teacher_b1: `bias` of conv2d layer to become wider,
          of shape (filters1, )
        teacher_w2: `weight` of next connected conv2d layer,
          of shape (kh2, kw2, num_channel2, filters2)
        new_width: new `filters` for the wider conv2d layer
        init: initialization algorithm for new weights,
          either 'random-pad' or 'net2wider'
    '''
    # Keras conv kernels are laid out as (kh, kw, in_channels, filters)
    assert teacher_w1.shape[3] == teacher_w2.shape[2], (
        'successive layers from teacher model should have compatible shapes')
    assert teacher_w1.shape[3] == teacher_b1.shape[0], (
        'weight and bias from same layer should have compatible shapes')
    assert new_width > teacher_w1.shape[3], (
        'new width (filters) should be bigger than the existing one')

    n = new_width - teacher_w1.shape[3]
    if init == 'random-pad':
        new_w1 = np.random.normal(0, 0.1, size=teacher_w1.shape[:3] + (n,))
        new_b1 = np.ones(n) * 0.1
        new_w2 = np.random.normal(
            0, 0.1, size=teacher_w2.shape[:2] + (n, teacher_w2.shape[3]))
    elif init == 'net2wider':
        # replicate n randomly chosen filters; each original filter and its
        # copies split its outgoing weights equally, so the next layer sees
        # the same summed input
        index = np.random.randint(teacher_w1.shape[3], size=n)
        factors = np.bincount(index)[index] + 1.
        new_w1 = teacher_w1[:, :, :, index]
        new_b1 = teacher_b1[index]
        new_w2 = teacher_w2[:, :, index, :] / factors.reshape((1, 1, -1, 1))
    else:
        raise ValueError('Unsupported weight initializer: %s' % init)

    student_w1 = np.concatenate((teacher_w1, new_w1), axis=3)
    if init == 'random-pad':
        student_w2 = np.concatenate((teacher_w2, new_w2), axis=2)
    elif init == 'net2wider':
        # add small noise to break symmetry, so that student model will have
        # full capacity later
        noise = np.random.normal(0, 5e-2 * new_w2.std(), size=new_w2.shape)
        student_w2 = np.concatenate((teacher_w2, new_w2 + noise), axis=2)
        student_w2[:, :, index, :] = new_w2
    student_b1 = np.concatenate((teacher_b1, new_b1), axis=0)

    return student_w1, student_b1, student_w2


def wider2net_fc(teacher_w1, teacher_b1, teacher_w2, new_width, init):
    '''Get initial weights for a wider fully connected (dense) layer
    with a bigger nout, by 'random-padding' or 'net2wider'.

    # Arguments
        teacher_w1: `weight` of fc layer to become wider,
          of shape (nin1, nout1)
        teacher_b1: `bias` of fc layer to become wider,
          of shape (nout1, )
        teacher_w2: `weight` of next connected fc layer,
          of shape (nin2, nout2)
        new_width: new `nout` for the wider fc layer
        init: initialization algorithm for new weights,
          either 'random-pad' or 'net2wider'
    '''
    assert teacher_w1.shape[1] == teacher_w2.shape[0], (
        'successive layers from teacher model should have compatible shapes')
    assert teacher_w1.shape[1] == teacher_b1.shape[0], (
        'weight and bias from same layer should have compatible shapes')
    assert new_width > teacher_w1.shape[1], (
        'new width (nout) should be bigger than the existing one')

    n = new_width - teacher_w1.shape[1]
    if init == 'random-pad':
        new_w1 = np.random.normal(0, 0.1, size=(teacher_w1.shape[0], n))
        new_b1 = np.ones(n) * 0.1
        new_w2 = np.random.normal(0, 0.1, size=(n, teacher_w2.shape[1]))
    elif init == 'net2wider':
        # replicate n randomly chosen units and split their outgoing weights
        index = np.random.randint(teacher_w1.shape[1], size=n)
        factors = np.bincount(index)[index] + 1.
        new_w1 = teacher_w1[:, index]
        new_b1 = teacher_b1[index]
        new_w2 = teacher_w2[index, :] / factors[:, np.newaxis]
    else:
        raise ValueError('Unsupported weight initializer: %s' % init)

    student_w1 = np.concatenate((teacher_w1, new_w1), axis=1)
    if init == 'random-pad':
        student_w2 = np.concatenate((teacher_w2, new_w2), axis=0)
    elif init == 'net2wider':
        # add small noise to break symmetry, so that student model will have
        # full capacity later
        noise = np.random.normal(0, 5e-2 * new_w2.std(), size=new_w2.shape)
        student_w2 = np.concatenate((teacher_w2, new_w2 + noise), axis=0)
        student_w2[index, :] = new_w2
    student_b1 = np.concatenate((teacher_b1, new_b1), axis=0)

    return student_w1, student_b1, student_w2


def deeper2net_conv2d(teacher_w):
    '''Get initial weights for a deeper conv2d layer by 'net2deeper'.

    # Arguments
        teacher_w: `weight` of previous conv2d layer,
          of shape (kh, kw, num_channel, filters)
          (only its shape is used; num_channel must equal filters)
    '''
    kh, kw, num_channel, filters = teacher_w.shape
    student_w = np.zeros_like(teacher_w)
    # identity kernel: the centre tap maps input channel i to output filter i
    for i in range(filters):
        student_w[(kh - 1) // 2, (kw - 1) // 2, i, i] = 1.
    student_b = np.zeros(filters)
    return student_w, student_b


def copy_weights(teacher_model, student_model, layer_names):
    '''Copy weights from teacher_model to student_model,
     for layers with names listed in layer_names
    '''
    for name in layer_names:
        weights = teacher_model.get_layer(name=name).get_weights()
        student_model.get_layer(name=name).set_weights(weights)


# methods to construct teacher_model and student_models
def make_teacher_model(x_train, y_train,
                       x_test, y_test,
                       epochs):
    '''Train and benchmark performance of a simple CNN.
    (0) Teacher model
    '''
    model = Sequential()
    model.add(Conv2D(64, 3, input_shape=input_shape,
                     padding='same', name='conv1'))
    model.add(MaxPooling2D(2, name='pool1'))
    model.add(Conv2D(64, 3, padding='same', name='conv2'))
    model.add(MaxPooling2D(2, name='pool2'))
    model.add(Flatten(name='flatten'))
    model.add(Dense(64, activation='relu', name='fc1'))
    model.add(Dense(num_classes, activation='softmax', name='fc2'))

    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(lr=0.01, momentum=0.9),
                  metrics=['accuracy'])

    model.fit(x_train, y_train,
              epochs=epochs,
              validation_data=(x_test, y_test))
    return model


def make_wider_student_model(teacher_model,
                             x_train, y_train,
                             x_test, y_test,
                             init, epochs):
    '''Train a wider student model based on teacher_model,
       with either 'random-pad' (baseline) or 'net2wider'
    '''
    new_conv1_width = 128
    new_fc1_width = 128

    model = Sequential()
    # a wider conv1 compared to teacher_model
    model.add(Conv2D(new_conv1_width, 3, input_shape=input_shape,
                     padding='same', name='conv1'))
    model.add(MaxPooling2D(2, name='pool1'))
    model.add(Conv2D(64, 3, padding='same', name='conv2'))
    model.add(MaxPooling2D(2, name='pool2'))
    model.add(Flatten(name='flatten'))
    # a wider fc1 compared to teacher model
    model.add(Dense(new_fc1_width, activation='relu', name='fc1'))
    model.add(Dense(num_classes, activation='softmax', name='fc2'))

    # The weights for other layers need to be copied from teacher_model
    # to student_model, except for widened layers
    # and their immediate downstream layers, which will be initialized
    # separately. For this example there are no other layers that need
    # to be copied.

    w_conv1, b_conv1 = teacher_model.get_layer('conv1').get_weights()
    w_conv2, b_conv2 = teacher_model.get_layer('conv2').get_weights()
    new_w_conv1, new_b_conv1, new_w_conv2 = wider2net_conv2d(
        w_conv1, b_conv1, w_conv2, new_conv1_width, init)
    model.get_layer('conv1').set_weights([new_w_conv1, new_b_conv1])
    model.get_layer('conv2').set_weights([new_w_conv2, b_conv2])

    w_fc1, b_fc1 = teacher_model.get_layer('fc1').get_weights()
    w_fc2, b_fc2 = teacher_model.get_layer('fc2').get_weights()
    new_w_fc1, new_b_fc1, new_w_fc2 = wider2net_fc(
        w_fc1, b_fc1, w_fc2, new_fc1_width, init)
    model.get_layer('fc1').set_weights([new_w_fc1, new_b_fc1])
    model.get_layer('fc2').set_weights([new_w_fc2, b_fc2])

    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(lr=0.001, momentum=0.9),
                  metrics=['accuracy'])

    model.fit(x_train, y_train,
              epochs=epochs,
              validation_data=(x_test, y_test))


def make_deeper_student_model(teacher_model,
                              x_train, y_train,
                              x_test, y_test,
                              init, epochs):
    '''Train a deeper student model based on teacher_model,
       with either 'random-init' (baseline) or 'net2deeper'
    '''
    model = Sequential()
    model.add(Conv2D(64, 3, input_shape=input_shape,
                     padding='same', name='conv1'))
    model.add(MaxPooling2D(2, name='pool1'))
    model.add(Conv2D(64, 3, padding='same', name='conv2'))
    # add another conv2d layer to make original conv2 deeper
    if init == 'net2deeper':
        # only the shape of conv2's kernel matters here; its values are
        # copied from teacher_model by copy_weights() below
        prev_w, _ = model.get_layer('conv2').get_weights()
        new_weights = deeper2net_conv2d(prev_w)
        model.add(Conv2D(64, 3, padding='same',
                         name='conv2-deeper', weights=new_weights))
    elif init == 'random-init':
        model.add(Conv2D(64, 3, padding='same', name='conv2-deeper'))
    else:
        raise ValueError('Unsupported weight initializer: %s' % init)
    model.add(MaxPooling2D(2, name='pool2'))
    model.add(Flatten(name='flatten'))
    model.add(Dense(64, activation='relu', name='fc1'))
    # add another fc layer to make original fc1 deeper
    if init == 'net2deeper':
        # for an fc layer after relu, net2deeper is just an identity kernel
        # (the bias keeps its default zeros initializer)
        model.add(Dense(64, kernel_initializer='identity',
                        activation='relu', name='fc1-deeper'))
    elif init == 'random-init':
        model.add(Dense(64, activation='relu', name='fc1-deeper'))
    else:
        raise ValueError('Unsupported weight initializer: %s' % init)
    model.add(Dense(num_classes, activation='softmax', name='fc2'))

    # copy weights for other layers
    copy_weights(teacher_model, model, layer_names=[
                 'conv1', 'conv2', 'fc1', 'fc2'])

    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(lr=0.001, momentum=0.9),
                  metrics=['accuracy'])

    model.fit(x_train, y_train,
              epochs=epochs,
              validation_data=(x_test, y_test))


# experiments setup
def net2wider_experiment():
    '''Benchmark performances of
    (1) a wider student model with `random_pad` initializer
    (2) a wider student model with `Net2WiderNet` initializer
    '''
    print('\nExperiment of Net2WiderNet ...')

    print('\n(1) building wider student model by random padding ...')
    make_wider_student_model(teacher_model,
                             x_train, y_train,
                             x_test, y_test,
                             init='random-pad',
                             epochs=epochs)
    print('\n(2) building wider student model by net2wider ...')
    make_wider_student_model(teacher_model,
                             x_train, y_train,
                             x_test, y_test,
                             init='net2wider',
                             epochs=epochs)


def net2deeper_experiment():
    '''Benchmark performances of
    (3) a deeper student model with `random_init` initializer
    (4) a deeper student model with `Net2DeeperNet` initializer
    '''
    print('\nExperiment of Net2DeeperNet ...')

    print('\n(3) building deeper student model by random init ...')
    make_deeper_student_model(teacher_model,
                              x_train, y_train,
                              x_test, y_test,
                              init='random-init',
                              epochs=epochs)
    print('\n(4) building deeper student model by net2deeper ...')
    make_deeper_student_model(teacher_model,
                              x_train, y_train,
                              x_test, y_test,
                              init='net2deeper',
                              epochs=epochs)


print('\n(0) building teacher model ...')
teacher_model = make_teacher_model(x_train, y_train,
                                   x_test, y_test,
                                   epochs=epochs)

# run the experiments
net2wider_experiment()
net2deeper_experiment()
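The function-preserving property behind the 'net2wider' branch of `wider2net_fc` can be checked in isolation. The NumPy sketch below re-implements that branch under a hypothetical helper name, `net2wider_fc`, and leaves out the noise term. It runs on random stand-in weights rather than the trained teacher, and confirms that the widened two-layer network produces the same outputs as the original.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def net2wider_fc(w1, b1, w2, new_width, rng):
    '''Hypothetical helper: widen a Dense layer (w1, b1) to new_width units
    and rescale the next layer's kernel w2, like the 'net2wider' branch of
    wider2net_fc but WITHOUT the symmetry-breaking noise.'''
    n = new_width - w1.shape[1]
    index = rng.integers(w1.shape[1], size=n)   # units to replicate
    factors = np.bincount(index)[index] + 1.    # copies per unit, incl. original
    new_w2 = w2[index, :] / factors[:, np.newaxis]
    student_w1 = np.concatenate((w1, w1[:, index]), axis=1)
    student_b1 = np.concatenate((b1, b1[index]))
    student_w2 = np.concatenate((w2, new_w2), axis=0)
    # the original rows get the same reduced weights as their copies
    student_w2[index, :] = new_w2
    return student_w1, student_b1, student_w2


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 20))                    # hypothetical inputs
w1, b1 = rng.normal(size=(20, 64)), rng.normal(size=64)
w2 = rng.normal(size=(64, 10))

teacher_out = relu(x @ w1 + b1) @ w2
sw1, sb1, sw2 = net2wider_fc(w1, b1, w2, 128, rng)
student_out = relu(x @ sw1 + sb1) @ sw2

print(sw1.shape, sw2.shape)                     # (20, 128) (128, 10)
print(np.allclose(teacher_out, student_out))    # True
```

With the noise term from `wider2net_fc` added back, the match becomes approximate rather than exact. That is the trade-off the script accepts in order to break the symmetry between copied units.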
Running the code
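The full script trains five models. Before launching it, the identity initialization that `deeper2net_conv2d` builds for `conv2-deeper` can be sanity-checked without Keras. This is a sketch under stated assumptions:

- `conv2d_same` is a hypothetical naive stride-1 convolution with 'same' padding. It is written in the cross-correlation form that Keras `Conv2D` uses.
- The feature map is random data, not real `conv2` output.

Because `conv2` in this script has no activation, an identity `conv2-deeper` leaves the network's function unchanged.

```python
import numpy as np


def conv2d_same(x, w, b):
    '''Hypothetical naive stride-1 'same' convolution for odd kernel sizes.
    x: (H, W, C), w: (kh, kw, C, F), b: (F,). Illustration only.'''
    kh, kw = w.shape[:2]
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)))
    H, W = x.shape[:2]
    out = np.empty((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + kh, j:j + kw, :]               # (kh, kw, C)
            out[i, j] = np.tensordot(patch, w, axes=3) + b  # (F,)
    return out


def deeper2net_conv2d(teacher_w):
    # same logic as the function of this name in the script above
    kh, kw, num_channel, filters = teacher_w.shape
    student_w = np.zeros_like(teacher_w)
    for i in range(filters):
        student_w[(kh - 1) // 2, (kw - 1) // 2, i, i] = 1.
    return student_w, np.zeros(filters)


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(7, 7, 8))     # stand-in for a conv2 output
w_id, b_id = deeper2net_conv2d(rng.normal(size=(3, 3, 8, 8)))
print(np.allclose(conv2d_same(feature_map, w_id, b_id), feature_map))  # True
```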
Keras documentation
Chinese: http://keras-cn.readthedocs.io/en/latest/
Example code
https://github.com/keras-team/keras
https://github.com/keras-team/keras/tree/master/examples