The Convolution+BatchNorm+Scale+ReLU block normalizes the convolution output and then applies the ReLU non-linearity, which speeds up training convergence. At inference time, however, the BatchNorm computation adds noticeable overhead. Because BatchNorm+Scale amounts to a fixed per-channel linear transform once training is finished, its learned parameters can be folded into the convolution by rewriting the Convolution layer's weights and bias, speeding up inference without affecting accuracy.
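The folding itself is simple per-channel arithmetic: with α = γ / sqrt(σ² + ε), the fused weights become W' = α · W and the fused bias becomes b' = α · (b − μ) + β (here b = 0, since the convolution is trained with bias_term: false). Below is a minimal numpy sketch of that arithmetic, under the assumption that the BatchNorm blobs are Caffe's (mean, variance, moving-average factor) and the Scale blobs are (γ, β); the function and argument names are illustrative, not part of any Caffe API:

import numpy as np

def fuse_conv_bn_scale(weight, bias, bn_mean, bn_var, bn_scale_factor,
                       gamma, beta, eps=1e-3):
    # weight: (out_ch, in_ch, kh, kw) convolution kernels
    # bias:   (out_ch,) or None when the conv was trained with bias_term: false
    # bn_mean, bn_var, bn_scale_factor: the three blobs of the BatchNorm layer
    # gamma, beta: the two blobs of the Scale layer
    # Caffe's BatchNorm stores running sums; divide by the moving-average factor
    # to recover the effective mean and variance.
    sf = 1.0 / bn_scale_factor if bn_scale_factor != 0 else 0.0
    mean = bn_mean * sf
    var = bn_var * sf
    # BatchNorm followed by Scale is z = alpha * x + (beta - alpha * mean)
    alpha = gamma / np.sqrt(var + eps)
    if bias is None:
        bias = np.zeros(weight.shape[0], dtype=weight.dtype)
    fused_weight = weight * alpha.reshape(-1, 1, 1, 1)
    fused_bias = (bias - mean) * alpha + beta
    return fused_weight, fused_bias

In pycaffe, the inputs can be read from net.params of the corresponding Convolution, BatchNorm, and Scale layers (each blob's .data array), and the fused weight and bias written back into a deploy model whose Convolution layer has bias_term: true.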
1. Example Convolution+BatchNorm+Scale parameter settings in Caffe:
layer {
  name: "Conv2"
  type: "Convolution"
  bottom: "Conv1"
  top: "Conv2"
  convolution_param {
    num_output: 64
    kernel_h: 1
    kernel_w: 3
    pad_h: 0
    pad_w: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_term: false  # no conv bias; the following BatchNorm/Scale provides the shift
  }
}
layer {
  name: "Conv2/bn"
  type: "BatchNorm"
  bottom: "Conv2"
  top: "Conv2"
  batch_norm_param {
    use_global_stats: false  # use mini-batch statistics while training
    eps: 1e-3
  }
  # The three BatchNorm blobs (mean, variance, moving-average factor) are
  # maintained as running averages, not learned by gradients, so lr_mult is 0.
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  include {
    phase: TRAIN
  }
}
lay