Net_surgery: how to reshape convolution layers in a Caffe .caffemodel


1. Introduction


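Suppose you have a trained model (or a pretrained one) and need to modify the network, for example by changing the size of some convolution layers. There are many reasons to do this: you may want to add channels to some convolution layers to improve accuracy, or remove channels because real-time performance (speed) matters more. Once the size of any layer has been reshaped, the model has to be retrained, and there are two ways to go about it: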

  • Modify the prototxt file and train from scratch in Caffe
  • Modify the prototxt file and train starting from the parameters of the existing caffemodel or pretrained model

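The first approach is the simplest but also the most time-consuming: convergence is slow, the learning rate has to be lowered repeatedly during training, which slows things down further, and there may be a risk of overfitting. The second approach is usually better: the effort already spent on training is not wasted and training is faster. The catch is that the weight sizes required by the new prototxt no longer match the weights stored in the trained caffemodel, so the old weights cannot be used directly. This post records how to deal with that problem and gives a simple example.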

2. The procedure


  • Prerequisites: a trained (or pretrained) model, old_deploy.prototxt and old.caffemodel,

    plus new_deploy.prototxt, the new model definition obtained after modifying some layers of the network

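Get an initial new.caffemodel from new.prototxt:

Train with new.prototxt in Caffe for any number of iterations. A single iteration is enough, because the only goal is to obtain a new.caffemodel whose parameter shapes match the new definition. How to train in Caffe is not covered here.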

#!/bin/sh
# mkdir -p snapshot
/home/alpha/ssd/build/tools/caffe train -solver="solver_train.prototxt"  -gpu 0
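If setting up a solver only to produce a placeholder new.caffemodel feels heavy, pycaffe may offer a shortcut: build the net from its prototxt and save it straight away. The sketch below is an alternative to the training step above, not the method used in this post. The helper name init_caffemodel is hypothetical, and it assumes your pycaffe build accepts the two-argument caffe.Net(prototxt, phase) constructor. The saved weights are whatever the fillers in the prototxt produce, which is fine here because they get overwritten later.

```python
def init_caffemodel(prototxt_path, caffemodel_path):
    """Build a net from its prototxt and save its initial (untrained) weights.

    Hypothetical helper: it only needs the network definition, no solver.
    """
    # Imported inside the function so the helper can be defined even when
    # pycaffe is not yet on sys.path.
    import caffe

    # No weights file is given, so every layer keeps its filler-initialised values.
    net = caffe.Net(prototxt_path, caffe.TEST)
    net.save(caffemodel_path)
    return caffemodel_path
```

With pycaffe on the path, the call would look like init_caffemodel('new_deploy.prototxt', 'new.caffemodel').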

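Transfer parameters from old.caffemodel to new.caffemodel:

The official Caffe site has a Net_surgery example that shows how to copy weights from one caffemodel into another of the same size (see the links at the end of this post). But how do you transfer parameters when the number of channels of some convolutions has changed? First load both models and check which layers changed; this can of course also be read directly from the prototxt files.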

# coding=utf-8
import sys
import os

caffe_root = '/home/alpha/ssd/'
sys.path.insert(0, caffe_root + 'python')
os.environ['GLOG_minloglevel'] = '2'  # suppress Caffe's log output while loading models; set before importing caffe
import caffe

caffe.set_mode_cpu()
net_old = caffe.Net('old_deploy.prototxt', 'old.caffemodel', caffe.TEST)
net_new = caffe.Net('new_deploy.prototxt', 'new.caffemodel', caffe.TEST)

# print the output blob shape of every layer in the old model
# (.items() works on both Python 2 and Python 3)
for old_layername, blob in net_old.blobs.items():
    print(old_layername + '\t' + str(blob.data.shape))
print('---------------------------------------------------------------')
# print the output blob shape of every layer in the new model
for new_layername, blob in net_new.blobs.items():
    print(new_layername + '\t' + str(blob.data.shape))

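For my model the output looks like this. I did not rename any layers between the two models, and only the convolution layers affected by the change are shown. Note that conv_10/project keeps its 96 output channels, but its input shrinks from 384 to 288 channels, so its weights change as well: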

# old
... ...
conv_6/project	(1, 64, 19, 19)
conv_7/expand	(1, 384, 19, 19)
conv_7/depthwise	(1, 384, 19, 19)
conv_7/project	(1, 64, 19, 19)
conv_8/expand	(1, 384, 19, 19)
conv_8/depthwise	(1, 384, 19, 19)
conv_8/project	(1, 64, 19, 19)
conv_9/expand	(1, 384, 19, 19)
conv_9/depthwise	(1, 384, 19, 19)
conv_9/project	(1, 64, 19, 19)
conv_10/expand	(1, 384, 19, 19)
conv_10/depthwise	(1, 384, 19, 19)
conv_10/project	(1, 96, 19, 19)
... ...
--------------------------------------------------------------------
# new
... ...
conv_6/project	(1, 48, 19, 19)
conv_7/expand	(1, 288, 19, 19)
conv_7/depthwise	(1, 288, 19, 19)
conv_7/project	(1, 48, 19, 19)
conv_8/expand	(1, 288, 19, 19)
conv_8/depthwise	(1, 288, 19, 19)
conv_8/project	(1, 48, 19, 19)
conv_9/expand	(1, 288, 19, 19)
conv_9/depthwise	(1, 288, 19, 19)
conv_9/project	(1, 48, 19, 19)
conv_10/expand	(1, 288, 19, 19)
conv_10/depthwise	(1, 288, 19, 19)
conv_10/project	(1, 96, 19, 19)
... ...

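Take the conv_6/project layer as an example: its output size changes from (1, 64, 19, 19) to (1, 48, 19, 19). In Caffe, net.blobs holds the layer outputs and does not store the learned weights; those live in net.params. So the parameters are transferred as follows, keeping the first 48 output channels: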

net_new.params['conv_6/project'][0].data[...] = net_old.params['conv_6/project'][0].data[:48,:,:,:]
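Why the line above slices only the first axis: a Caffe convolution weight blob is laid out as (num_output, input_channels / group, kernel_h, kernel_w). The input of conv_6/project is unchanged, so only axis 0 shrinks. For a layer whose input also changed, axis 1 has to be sliced too, and for a depthwise layer axis 1 has size 1. The small sketch below works this out for the channel counts listed above. The function name conv_weight_shape is hypothetical, and the kernel sizes (1 for expand/project, 3 for depthwise) are assumptions based on the layer names, not something the output above confirms.

```python
def conv_weight_shape(num_output, in_channels, kernel, group=1):
    """Shape of a Caffe convolution weight blob: (out, in / group, kh, kw)."""
    if in_channels % group != 0 or num_output % group != 0:
        raise ValueError('channel counts must be divisible by group')
    return (num_output, in_channels // group, kernel, kernel)


# conv_7/expand: 64 -> 384 channels in the old net, 48 -> 288 in the new one
# (1x1 kernel assumed), so both axis 0 and axis 1 shrink.
old_expand = conv_weight_shape(384, 64, 1)
new_expand = conv_weight_shape(288, 48, 1)

# conv_7/depthwise: group equals the channel count (3x3 kernel assumed),
# so axis 1 always has size 1 and only axis 0 shrinks.
old_depthwise = conv_weight_shape(384, 384, 3, group=384)
new_depthwise = conv_weight_shape(288, 288, 3, group=288)

print(old_expand, new_expand)
print(old_depthwise, new_depthwise)
```

This is why a layer such as conv_7/expand needs a slice like [:288, :48, :, :], while a depthwise layer needs only [:288, :, :, :].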

Layers that did not change are copied directly, for example:

net_new.params['conv_1/expand'][0].data[...] = net_old.params['conv_1/expand'][0].data[...]

To list the names of all layers that have parameters (this returns not only convolution layers but other layer types as well):

for layer_name, param in net_new.params.items():
    print(layer_name + '\t' + str(param[0].data.shape))
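Rather than comparing two printed listings by eye, the weight shapes can be compared programmatically. The following is a minimal sketch: param_shapes and changed_layers are hypothetical helper names, and the shape tables used in the demo are illustrative stand-ins (the input channel counts are made up), not values from my model.

```python
def param_shapes(net):
    """Map each parameterised layer name to the shape of its weight blob (index 0).

    `net` is expected to be a pycaffe Net, or anything with a similar .params mapping.
    """
    return {name: tuple(blobs[0].data.shape) for name, blobs in net.params.items()}


def changed_layers(old_shapes, new_shapes):
    """Return {name: (old_shape, new_shape)} for layers in both nets whose shapes differ."""
    return {name: (old_shapes[name], new_shapes[name])
            for name in old_shapes
            if name in new_shapes and old_shapes[name] != new_shapes[name]}


# Stand-in shape tables with made-up values; with real nets these would come
# from param_shapes(net_old) and param_shapes(net_new).
old_shapes = {'conv_6/expand': (192, 32, 1, 1), 'conv_6/project': (64, 192, 1, 1)}
new_shapes = {'conv_6/expand': (192, 32, 1, 1), 'conv_6/project': (48, 192, 1, 1)}

diff = changed_layers(old_shapes, new_shapes)
for name, (old, new) in diff.items():
    print(name + '\t' + str(old) + ' -> ' + str(new))
```

Layers that appear in the result need a sliced copy; all other shared layers can be copied as they are.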

Save the new model:

net_new.save('newest.caffemodel')

Putting it together, the complete parameter transfer for my own model looks like this (for reference only):

# coding=utf-8
import sys
import os

caffe_root = '/home/alpha/ssd/'
sys.path.insert(0, caffe_root + 'python')
# os.environ['GLOG_minloglevel'] = '2'  # uncomment to suppress Caffe's log output while loading models
import caffe

# names of the convolution layers whose size is unchanged
CONV_LAYERS = ('Conv', 'conv/depthwise', 'conv/project',
               'conv_1/expand', 'conv_1/depthwise', 'conv_1/project', 'conv_2/expand', 'conv_2/depthwise', 'conv_2/project',
               'conv_3/expand', 'conv_3/depthwise', 'conv_3/project', 'conv_4/expand', 'conv_4/depthwise', 'conv_4/project',
               'conv_5/expand', 'conv_5/depthwise', 'conv_5/project', 'conv_6/expand', 'conv_6/depthwise', 
              
               'conv_11/expand', 'conv_11/depthwise', 'conv_11/project', 'conv_12/expand', 'conv_12/depthwise', 'conv_12/project',
               'conv_13/expand', 'conv_13/depthwise', 'conv_13/project', 'conv_14/expand', 'conv_14/depthwise', 'conv_14/project',
               'conv_15/expand', 'conv_15/depthwise', 'conv_15/project', 'conv_16/expand', 'conv_16/depthwise', 'conv_16/project',
               
               'Conv_1', 
               'layer_19_1_2', 'layer_19_2_2/depthwise', 'layer_19_2_2', 'layer_19_1_3', 'layer_19_2_3/depthwise', 'layer_19_2_3',
               'layer_19_1_4', 'layer_19_2_4/depthwise', 'layer_19_2_4', 'layer_19_1_5', 'layer_19_2_5/depthwise', 'layer_19_2_5', 
               
               'conv_13/expand_mbox_loc/depthwise', 'conv_13/expand_mbox_loc', 'conv_13/expand_mbox_conf/depthwise', 'conv_13/expand_mbox_conf', 
               'Conv_1_mbox_loc/depthwise', 'Conv_1_mbox_loc', 'Conv_1_mbox_conf/depthwise', 'Conv_1_mbox_conf', 
              
               'layer_19_2_2_mbox_loc/depthwise', 'layer_19_2_2_mbox_loc', 'layer_19_2_2_mbox_conf/depthwise', 'layer_19_2_2_mbox_conf', 
               'layer_19_2_3_mbox_loc/depthwise', 'layer_19_2_3_mbox_loc', 'layer_19_2_3_mbox_conf/depthwise', 'layer_19_2_3_mbox_conf', 
               'layer_19_2_4_mbox_loc/depthwise', 'layer_19_2_4_mbox_loc', 'layer_19_2_4_mbox_conf/depthwise', 'layer_19_2_4_mbox_conf', 
               'layer_19_2_5_mbox_loc/depthwise', 'layer_19_2_5_mbox_loc', 'layer_19_2_5_mbox_conf/depthwise', 'layer_19_2_5_mbox_conf')


# load two caffemodel
caffe.set_mode_cpu()
net_old = caffe.Net('old_deploy.prototxt', 'old.caffemodel', caffe.TEST)
net_new = caffe.Net('new_deploy.prototxt', 'new.caffemodel', caffe.TEST)

# layers whose size is unchanged (those listed in CONV_LAYERS): copy the weights directly
for name in CONV_LAYERS:
    net_new.params[name][0].data[...] = net_old.params[name][0].data[...]

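# layers whose size changed: slice the old weights down to the new size
# (axis 0 = output channels, axis 1 = input channels / group)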
net_new.params['conv_6/project'][0].data[...] = net_old.params['conv_6/project'][0].data[:48,:,:,:]
net_new.params['conv_7/expand'][0].data[...] = net_old.params['conv_7/expand'][0].data[:288,:48,:,:]
net_new.params['conv_7/depthwise'][0].data[...] = net_old.params['conv_7/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_7/project'][0].data[...] = net_old.params['conv_7/project'][0].data[:48,:288,:,:]
net_new.params['conv_8/expand'][0].data[...] = net_old.params['conv_8/expand'][0].data[:288,:48,:,:]
net_new.params['conv_8/depthwise'][0].data[...] = net_old.params['conv_8/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_8/project'][0].data[...] = net_old.params['conv_8/project'][0].data[:48,:288,:,:]
net_new.params['conv_9/expand'][0].data[...] = net_old.params['conv_9/expand'][0].data[:288,:48,:,:]
net_new.params['conv_9/depthwise'][0].data[...] = net_old.params['conv_9/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_9/project'][0].data[...] = net_old.params['conv_9/project'][0].data[:48,:288,:,:]
net_new.params['conv_10/expand'][0].data[...] = net_old.params['conv_10/expand'][0].data[:288,:48,:,:]
net_new.params['conv_10/depthwise'][0].data[...] = net_old.params['conv_10/depthwise'][0].data[:288,:,:,:]
net_new.params['conv_10/project'][0].data[...] = net_old.params['conv_10/project'][0].data[:,:288,:,:]

# save the new model
net_new.save('newest.caffemodel')
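The script above copies only blob index 0 (the weights) and spells out every slice by hand. If your layers also carry a bias or other extra blobs, or if some layers grew instead of shrinking, a more generic loop may be easier to maintain. The sketch below is one possible way to do it, not the method used above: copy_overlap and transfer_params are hypothetical names, and they work on plain {layer_name: [array, ...]} mappings so they can be tried without Caffe. Keeping the leading channels is the same choice as in the script above; whether those are the best channels to keep is a separate question.

```python
import numpy as np


def copy_overlap(dst, src):
    """Copy the leading region shared by src and dst into dst, in place."""
    if dst.ndim != src.ndim:
        raise ValueError('blobs have a different number of axes')
    region = tuple(slice(0, min(d, s)) for d, s in zip(dst.shape, src.shape))
    dst[region] = src[region]


def transfer_params(old_params, new_params):
    """Copy every blob of every layer present in both mappings; return the copied names."""
    copied = []
    for name, new_blobs in new_params.items():
        if name not in old_params:
            continue  # layer only exists in the new net: keep its initial values
        for dst, src in zip(new_blobs, old_params[name]):
            copy_overlap(dst, src)
        copied.append(name)
    return copied


# Stand-in for one layer with a weight blob and a bias blob (illustrative data).
old = {'conv_7/project': [np.arange(64 * 384, dtype=np.float32).reshape(64, 384, 1, 1),
                          np.arange(64, dtype=np.float32)]}
new = {'conv_7/project': [np.zeros((48, 288, 1, 1), dtype=np.float32),
                          np.zeros(48, dtype=np.float32)]}
copied = transfer_params(old, new)
print(copied)
```

With pycaffe, the mappings could presumably be built as {name: [b.data for b in blobs] for name, blobs in net.params.items()}, since writing into blob.data in place is what the script above already relies on. Treat that as an assumption to check against your build before saving the result.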

3. Links

Official Caffe Net_surgery notebook example

others
