Net_surgery: how to reshape the convolution layers of a Caffe .caffemodel
1. Introduction
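This assumes you already have a trained or pretrained model and need to adjust the network, for example by changing the size of some convolution layers. There are many reasons to do this. To improve accuracy you might increase the number of channels in some convolution layers, or, if real-time performance (speed) matters more, reduce them. Once the size of any layer changes, the model obviously has to be retrained, and there are two ways to go about it:
- Modify the prototxt file and train from scratch in Caffe
- Modify the prototxt file and start training from the parameters of the existing caffemodel (or pretrained model)
The first option is the simplest but also the slowest. Convergence is slow, the learning rate has to be lowered repeatedly during training, which slows things down further, and there may be a risk of overfitting. The second option is usually better: the effort already spent on training is not wasted, and training converges faster. The catch is that the weight shapes defined by the new prototxt no longer match those stored in the trained caffemodel, so the old weights cannot be loaded directly. This post records how to deal with that, using a simple example.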
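2. Getting to the point
Prerequisites: a trained (or pretrained) model, old_deploy.prototxt and old.caffemodel,
plus new_deploy.prototxt, the network obtained after modifying some of its layers.
Get an initial new.caffemodel from new.prototxt:
In Caffe, train with new.prototxt for any number of iterations; a single iteration is enough, because the only goal is to obtain a new.caffemodel. How to train in Caffe is not covered here.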
#!/bin/sh
# mkdir -p snapshot
/home/alpha/ssd/build/tools/caffe train -solver="solver_train.prototxt" -gpu 0
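The solver_train.prototxt passed to the command above is not shown. As a minimal sketch, assuming the training prototxt is called new_train.prototxt and snapshots go under a snapshot/ directory (both names are hypothetical), the helper below writes a solver that stops after one iteration. Caffe takes one final snapshot when training ends (snapshot_after_train defaults to true), named <snapshot_prefix>_iter_1.caffemodel, which can then be renamed to new.caffemodel.

```python
import os


def write_one_iter_solver(path, train_net, snapshot_prefix, gpu=True):
    """Write a Caffe solver file that stops after a single iteration.

    Returns the file name of the .caffemodel Caffe should leave behind.
    All file names passed in are up to the caller (hypothetical here).
    """
    # Make sure the snapshot directory exists before training
    # (this is what the commented-out "mkdir -p snapshot" is for).
    snapshot_dir = os.path.dirname(snapshot_prefix)
    if snapshot_dir and not os.path.isdir(snapshot_dir):
        os.makedirs(snapshot_dir)
    lines = [
        'net: "%s"' % train_net,
        'base_lr: 0.0001',   # one step at a small rate barely moves the weights
        'lr_policy: "fixed"',
        'max_iter: 1',       # a single iteration is enough to get a .caffemodel
        'snapshot_prefix: "%s"' % snapshot_prefix,
        'solver_mode: %s' % ('GPU' if gpu else 'CPU'),
    ]
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    # snapshot_after_train defaults to true, so Caffe snapshots at iteration 1.
    return snapshot_prefix + '_iter_1.caffemodel'
```

For example, `write_one_iter_solver('solver_train.prototxt', 'new_train.prototxt', 'snapshot/new')` before running the script above. If you would rather skip training entirely, pycaffe can also build the net from the deploy file alone with `caffe.Net('new_deploy.prototxt', caffe.TEST)` and write it out with `.save('new.caffemodel')`; the weights are then only the prototxt fillers (constant 0 where no filler is given), which is harmless as long as every parameter is overwritten in the transfer step.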
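Transfer the parameters from old.caffemodel to new.caffemodel:
The official Caffe site has a Net_surgery example showing how to copy weights from one caffemodel into another of the same size (see the links at the end). But if the number of channels of some convolutions has changed, how do we transfer the parameters? First load both models and check which layers changed; of course, this can also be read directly from the prototxt files.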
#coding= utf-8
import sys
import os
caffe_root = '/home/alpha/ssd/'
sys.path.insert(0, caffe_root + 'python')
os.environ['GLOG_minloglevel'] = '2'  # suppress the log output printed while loading the models
import caffe
caffe.set_mode_cpu()
net_old = caffe.Net('old_deploy.prototxt', 'old.caffemodel', caffe.TEST)
net_new = caffe.Net('new_deploy.prototxt', 'new.caffemodel', caffe.TEST)
# print the output blob shape of every layer in the old model
for old_layername, blob in net_old.blobs.items():
    print(old_layername + '\t' + str(blob.data.shape))
print('---------------------------------------------------------------')
# print the output blob shape of every layer in the new model
for new_layername, blob in net_new.blobs.items():
    print(new_layername + '\t' + str(blob.data.shape))
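In my model the output looks like this. Layer names are the same in both models; only the convolution layers that changed are shown below (conv_10/project keeps its output shape, but its input shrinks from 384 to 288 channels):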
# old
... ...
conv_6/project (1, 64, 19, 19)
conv_7/expand (1, 384, 19, 19)
conv_7/depthwise (1, 384, 19, 19)
conv_7/project (1, 64, 19, 19)
conv_8/expand (1, 384, 19, 19)
conv_8/depthwise (1, 384, 19, 19)
conv_8/project (1, 64, 19, 19)
conv_9/expand (1, 384, 19, 19)
conv_9/depthwise (1, 384, 19, 19)
conv_9/project (1, 64, 19, 19)
conv_10/expand (1, 384, 19, 19)
conv_10/depthwise (1, 384, 19, 19)
conv_10/project (1, 96, 19, 19)
... ...
--------------------------------------------------------------------
# new
... ...
conv_6/project (1, 48, 19, 19)
conv_7/expand (1, 288, 19, 19)
conv_7/depthwise (1, 288, 19, 19)
conv_7/project (1, 48, 19, 19)
conv_8/expand (1, 288, 19, 19)
conv_8/depthwise (1, 288, 19, 19)
conv_8/project (1, 48, 19, 19)
conv_9/expand (1, 288, 19, 19)
conv_9/depthwise (1, 288, 19, 19)
conv_9/project (1, 48, 19, 19)
conv_10/expand (1, 288, 19, 19)
conv_10/depthwise (1, 288, 19, 19)
conv_10/project (1, 96, 19, 19)
... ...
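Comparing the two printouts by eye works, but the comparison can also be automated. A minimal sketch (the function name is hypothetical): it takes two plain dicts mapping layer names to shapes. With pycaffe they can be built from the weight blobs, e.g. `{name: p[0].data.shape for name, p in net_old.params.items()}`. Comparing weight shapes rather than output shapes has one advantage: conv_10/project keeps its (1, 96, 19, 19) output, yet its weights still change because its input channels drop from 384 to 288.

```python
def changed_layers(old_shapes, new_shapes):
    """Compare two {layer_name: weight_shape} maps.

    Returns a tuple (changed, only_old, only_new):
      changed:  {name: (old_shape, new_shape)} for layers whose shape differs
      only_old: sorted names present only in the old net
      only_new: sorted names present only in the new net
    """
    changed = {}
    for name, new_shape in new_shapes.items():
        if name in old_shapes and tuple(old_shapes[name]) != tuple(new_shape):
            changed[name] = (tuple(old_shapes[name]), tuple(new_shape))
    only_old = sorted(set(old_shapes) - set(new_shapes))
    only_new = sorted(set(new_shapes) - set(old_shapes))
    return changed, only_old, only_new
```

Every entry in `changed` needs a slice like the ones below; names in `only_old` or `only_new` indicate a renamed or added layer whose parameters cannot be copied by name.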
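Take conv_6/project as an example: its output changes from (1, 64, 19, 19) to (1, 48, 19, 19). In Caffe, net.blobs holds the layer outputs, not the learned weights; those are stored in net.params, where [0] is the weight blob and [1] the bias (if the layer has one). The first axis of a convolution weight blob is the output channel, so keeping 48 of the 64 output channels means slicing that axis: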
net_new.params['conv_6/project'][0].data[...] = net_old.params['conv_6/project'][0].data[:48,:,:,:]
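The slices follow from how Caffe lays out convolution weights: (num_output, channels / group, kernel_h, kernel_w). A depthwise layer has group equal to its channel count, so its second axis is 1 and only the first axis is cropped ([:288,:,:,:]), while a 1x1 expand or project layer is cropped on both axes ([:288,:48,:,:]). The sketch below assumes NumPy (pycaffe blob data are NumPy arrays), square kernels, and hypothetical helper names; it computes the expected shape and keeps the leading slice along every axis.

```python
import numpy as np


def conv_weight_shape(num_output, channels_in, kernel, group=1):
    """Shape of a Caffe Convolution weight blob (square kernel assumed)."""
    if channels_in % group != 0:
        raise ValueError('channels_in must be divisible by group')
    return (num_output, channels_in // group, kernel, kernel)


def crop_to(old, new_shape):
    """Keep the leading part of `old` along every axis so it fits `new_shape`.

    Only shrinking is supported: growing a layer would need freshly
    initialized values for the extra channels, which this helper does not do.
    """
    if old.ndim != len(new_shape):
        raise ValueError('rank mismatch: %s vs %s' % (old.shape, tuple(new_shape)))
    if any(n > o for n, o in zip(new_shape, old.shape)):
        raise ValueError('cannot grow %s to %s' % (old.shape, tuple(new_shape)))
    return old[tuple(slice(0, n) for n in new_shape)]
```

With pycaffe this becomes, for instance, `net_new.params['conv_7/expand'][0].data[...] = crop_to(net_old.params['conv_7/expand'][0].data, net_new.params['conv_7/expand'][0].data.shape)`. Taking leading channels on both axes is what keeps consecutive layers consistent: the 48 filters kept in conv_6/project are exactly the 48 input channels kept in conv_7/expand. Choosing channels by an importance score instead is a different technique (channel pruning); if you go that way, the same indices must also be applied to the next layer's input axis.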
Layers that did not change are copied directly, for example:
net_new.params['conv_1/expand'][0].data[...] = net_old.params['conv_1/expand'][0].data[...]
To get the names of all layers that have parameters (this includes more than just the convolution layers):
for layer_name, param in net_new.params.items():
    print(layer_name + '\t' + str(param[0].data.shape))
Save the new model:
net_new.save('newest.caffemodel')
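To check the result, newest.caffemodel can be reloaded with new_deploy.prototxt and compared blob by blob with the old model. A minimal sketch, assuming both models have been turned into plain dicts `{layer_name: [blob.data for blob in blobs]}` from net.params (the helper name is hypothetical): every blob of the new model should equal the leading slice of the matching old blob.

```python
import numpy as np


def verify_transfer(old_params, new_params):
    """List the blobs of new_params that are not the leading slice of old_params.

    Both arguments map layer names to lists of NumPy arrays. An empty result
    means every blob in the new model was taken from the old one.
    """
    mismatched = []
    for name, new_blobs in new_params.items():
        old_blobs = old_params.get(name)
        if old_blobs is None or len(old_blobs) != len(new_blobs):
            mismatched.append(name)
            continue
        for i, (old, new) in enumerate(zip(old_blobs, new_blobs)):
            # compare ranks first so the slice below is always valid
            if old.ndim != new.ndim or not np.array_equal(
                    old[tuple(slice(0, n) for n in new.shape)], new):
                mismatched.append('%s[%d]' % (name, i))
    return mismatched
```

Any blob that was deliberately left alone (for example a bias that was not copied) shows up in the result, which makes forgotten layers easy to spot.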
Putting it together, the complete parameter transfer for my own model is shown below, for reference only:
#coding= utf-8
import sys
import os
caffe_root = '/home/alpha/ssd/'
sys.path.insert(0, caffe_root + 'python')
# os.environ['GLOG_minloglevel'] = '2'  # suppress the log output printed while loading the models
import caffe
# convolution layers whose size is unchanged
CONV_LAYERS = ('Conv', 'conv/depthwise', 'conv/project',
'conv_1/expand', 'conv_1/depthwise', 'conv_1/project', 'conv_2/expand', 'conv_2/depthwise', 'conv_2/project',
'conv_3/expand', 'conv_3/depthwise', 'conv_3/project', 'conv_4/expand', 'conv_4/depthwise', 'conv_4/project',
'conv_5/expand', 'conv_5/depthwise', 'conv_5/project', 'conv_6/expand', 'conv_6/depthwise',
'conv_11/expand', 'conv_11/depthwise', 'conv_11/project', 'conv_12/expand', 'conv_12/depthwise', 'conv_12/project',
'conv_13/expand', 'conv_13/depthwise', 'conv_13/project', 'conv_14/expand', 'conv_14/depthwise', 'conv_14/project',
'conv_15/expand', 'conv_15/depthwise', 'conv_15/project', 'conv_16/expand', 'conv_16/depthwise', 'conv_16/project',
'Conv_1',
'layer_19_1_2', 'layer_19_2_2/depthwise', 'layer_19_2_2', 'layer_19_1_3', 'layer_19_2_3/depthwise', 'layer_19_2_3',
'layer_19_1_4', 'layer_19_2_4/depthwise', 'layer_19_2_4', 'layer_19_1_5', 'layer_19_2_5/depthwise', 'layer_19_2_5',
'conv_13/expand_mbox_loc/depthwise', 'conv_13/expand_mbox_loc', 'conv_13/expand_mbox_conf/depthwise', 'conv_13/expand_mbox_conf',
'Conv_1_mbox_loc/depthwise', 'Conv_1_mbox_loc', 'Conv_1_mbox_conf/depthwise', 'Conv_1_mbox_conf',
'layer_19_2_2_mbox_loc/depthwise', 'layer_19_2_2_mbox_loc', 'layer_19_2_2_mbox_conf/depthwise', 'layer_19_2_2_mbox_conf',
'layer_19_2_3_mbox_loc/depthwise', 'layer_19_2_3_mbox_loc', 'layer_19_2_3_mbox_conf/depthwise', 'layer_19_2_3_mbox_conf',
'layer_19_2_4_mbox_loc/depthwise', 'layer_19_2_4_mbox_loc', 'layer_19_2_4_mbox_conf/depthwise', 'layer_19_2_4_mbox_conf',
'layer_19_2_5_mbox_loc/depthwise', 'layer_19_2_5_mbox_loc', 'layer_19_2_5_mbox_conf/depthwise', 'layer_19_2_5_mbox_conf')
# load two caffemodel
caffe.set_mode_cpu()
net_old = caffe.Net('old_deploy.prototxt', 'old.caffemodel', caffe.TEST)
net_new = caffe.Net('new_deploy.prototxt', 'new.caffemodel', caffe.TEST)
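# Layers whose size is unchanged (the ones in CONV_LAYERS): copy the parameters directly.
# Only blob [0] (the weights) is copied; a layer with a bias also has blob [1].
for name in CONV_LAYERS:
    net_new.params[name][0].data[...] = net_old.params[name][0].data[...]
# Layers whose size changed: copy the leading slice that fits the new shape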
net_new.params['conv_6/project'][0].data[...] = net_old.params['conv_6/project'][0].data[:48,:,:,:]
net_new.params['conv_7/expand'][0].data[...] = net_old.params['conv_7/expand'][0].data[:288,:48,:,:]
net_new.params['conv_7/depthwise'][0].data[...] = net_old.params['conv_7/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_7/project'][0].data[...] = net_old.params['conv_7/project'][0].data[:48,:288,:,:]
net_new.params['conv_8/expand'][0].data[...] = net_old.params['conv_8/expand'][0].data[:288,:48,:,:]
net_new.params['conv_8/depthwise'][0].data[...] = net_old.params['conv_8/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_8/project'][0].data[...] = net_old.params['conv_8/project'][0].data[:48,:288,:,:]
net_new.params['conv_9/expand'][0].data[...] = net_old.params['conv_9/expand'][0].data[:288,:48,:,:]
net_new.params['conv_9/depthwise'][0].data[...] = net_old.params['conv_9/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_9/project'][0].data[...] = net_old.params['conv_9/project'][0].data[:48,:288,:,:]
net_new.params['conv_10/expand'][0].data[...] = net_old.params['conv_10/expand'][0].data[:288,:48,:,:]
net_new.params['conv_10/depthwise'][0].data[...] = net_old.params['conv_10/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_10/project'][0].data[...] = net_old.params['conv_10/project'][0].data[:,:288,:,:]
# Save the new model
net_new.save('newest.caffemodel')
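The hand-written layer list above has to be kept in sync with the prototxt, and it only copies the weight blob [0] of each layer. As an alternative (a sketch, not the script used for the model above), one can walk every layer of the new net and copy every parameter blob, cropping to the leading slice wherever the shape shrank. This also covers biases and, if the network has them, BatchNorm/Scale parameters, as long as layer names are unchanged. The inputs are plain dicts of NumPy arrays, and the function name is hypothetical.

```python
import numpy as np


def transfer_params(old_params, new_params):
    """Copy all parameter blobs from old_params into new_params, in place.

    Both arguments map layer names to lists of NumPy arrays (one per blob:
    weights, bias, BatchNorm statistics, ...). Equal shapes are copied as-is;
    a blob that shrank receives the leading slice along every axis, matching
    hand-written slices such as [:288, :48, :, :].
    Returns the names of new layers that had no usable counterpart.
    """
    skipped = []
    for name, new_blobs in new_params.items():
        old_blobs = old_params.get(name)
        if old_blobs is None or len(old_blobs) != len(new_blobs):
            skipped.append(name)  # renamed, new, or different number of blobs
            continue
        for i, (old, new) in enumerate(zip(old_blobs, new_blobs)):
            if old.ndim != new.ndim:
                raise ValueError('%s[%d]: rank %d vs %d' % (name, i, old.ndim, new.ndim))
            piece = old[tuple(slice(0, n) for n in new.shape)]
            if piece.shape != new.shape:  # the new blob is larger than the old one
                raise ValueError('%s[%d]: cannot grow %s to %s' % (name, i, old.shape, new.shape))
            new[...] = piece
    return skipped
```

With pycaffe, build `old = {k: [b.data for b in v] for k, v in net_old.params.items()}` and the same for net_new. Because `b.data` is a NumPy view of the blob memory (which is what the `data[...] =` assignments above rely on), `transfer_params(old, new)` writes straight into net_new, which is then saved with `net_new.save('newest.caffemodel')`. Any name it returns still needs manual handling.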
3. Links
Official Caffe Net_surgery notebook example
Others
- Email: yxiao2048@gmail.com