Everyone presumably knows the benefits of finetuning, so I won't dwell on them here. So how is it actually done in Caffe? Here is the command:
./build/tools/caffe train -solver xxx.prototxt -weights xxx.caffemodel
This means: initialize the network specified by the solver xxx.prototxt with the trained weights stored in xxx.caffemodel.
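For instance, with the stock MNIST demo from the Caffe repo (assuming you have already run the LeNet training there, so the final snapshot lenet_iter_10000.caffemodel exists; both paths below are the repo defaults):

./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel

The solver prototxt names the network to train; every layer in that network whose name matches a layer stored in the caffemodel is initialized from the stored weights, and every other layer is initialized by its fillers.

So how do the parameters in xxx.caffemodel get applied to your own model? The following points need attention: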
1. xxx.caffemodel holds parameters that have already been trained. You either download it from the web (for example, here) or take a snapshot saved during your own training.
2. In your own model, each layer you want to finetune must have the same name and the same type as the corresponding layer in xxx.caffemodel. Whether the bottom and top names match does not matter; Caffe copies weights purely by layer name.
3. A matching name alone is not enough: the bottom and top shapes of the finetuned layer must agree with those of the corresponding layer in the caffemodel, so that the parameter blobs have identical shapes; otherwise Caffe reports a shape-mismatch error (a pycaffe sketch for checking these names and shapes follows this list).
4. If a layer's bottom/top shapes in xxx.caffemodel do not match your model, rename that layer so its name differs from the one in xxx.caffemodel. For example, when finetuning a 2-class model from a LeNet trained on MNIST, the output of the final fully connected layer changes from 10 to 2, so that layer must not share LeNet's layer name.
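To check points 2 and 3 before launching training, you can list each net's learnable parameter blobs with pycaffe. A minimal sketch, assuming pycaffe is importable and the two networks shown below are saved as lenet_train_test.prototxt and my_finetune_net.prototxt (the second file name is my own placeholder; the Data layers also require the MNIST lmdbs from the Caffe examples to exist):

import caffe

caffe.set_mode_cpu()

# Pretrained net: prototxt plus its trained weights.
pretrained = caffe.Net('lenet_train_test.prototxt',
                       'lenet_iter_10000.caffemodel', caffe.TEST)
# The new net, initialized only by its weight/bias fillers.
mynet = caffe.Net('my_finetune_net.prototxt', caffe.TEST)

# net.params maps layer name -> [weight blob, bias blob, ...];
# only layers with learnable parameters appear here.
for name, blobs in pretrained.params.items():
    print(name, [b.data.shape for b in blobs])

# Layers present in both nets under the same name are the finetune
# candidates; their parameter shapes must match exactly.
for name in set(pretrained.params) & set(mynet.params):
    print(name,
          [b.data.shape for b in pretrained.params[name]],
          [b.data.shape for b in mynet.params[name]])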
To illustrate (leaving aside whether this particular finetuning is sensible), here is an example: a caffemodel trained with lenet_train_test.prototxt is used to finetune my own model.
The lenet_train_test network is as follows:
name: "LeNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
The network to be finetuned is as follows:
name: "MyFinetuneNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "data"
top: "fc1"
param {
lr_mult: 10
}
param {
lr_mult: 20
}
inner_product_param {
num_output: 800
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "fc1"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
In short, LeNet's structure is
mnist-conv1-pool1-conv2-pool2-ip1-relu1-ip2-loss
while my model is
mnist-fc1-relu3-ip1-relu4-ip2-loss
The layers being finetuned here are ip1 and ip2.
To make the point, ip1 in my model is deliberately finetuned from LeNet's ip1. Left untouched, however, the inputs of these two layers have different shapes, so the weights cannot be copied and Caffe reports an error at run time. The fix is the extra fc1 layer, chosen so that the shape of fc1's top equals the shape of LeNet's ip1 bottom: pool2 in LeNet outputs 50 feature maps of size 4x4, i.e. 50x4x4 = 800 values per image, which is exactly why fc1 has num_output: 800. With that in place, ip1's weight blob is 500x800 in both networks and the copy succeeds.
This shows that as long as name, type, and shape agree, any two layers from two networks can be finetuned against each other (whether doing so is sensible is of course another question).
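Finally, to convince yourself which parameters were actually copied, you can compare the blobs directly. Again a sketch, under the same file-name assumptions as the earlier one:

import numpy as np
import caffe

caffe.set_mode_cpu()

# Pretrained LeNet: prototxt plus its trained weights.
lenet = caffe.Net('lenet_train_test.prototxt',
                  'lenet_iter_10000.caffemodel', caffe.TEST)

# My net, randomly initialized, then weights copied in by name --
# this mirrors what the -weights flag does before training starts.
mynet = caffe.Net('my_finetune_net.prototxt', caffe.TEST)
mynet.copy_from('lenet_iter_10000.caffemodel')

# ip1 and ip2 share name, type, and parameter shape, so they are copied;
# fc1 has no counterpart in the caffemodel and keeps its random init.
for name in ('ip1', 'ip2'):
    same = np.array_equal(lenet.params[name][0].data,
                          mynet.params[name][0].data)
    print(name, 'weights copied:', same)  # expect: True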