LeNet: the MNIST Classification Model
We have defined the layers in $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt.
Notes:
(1) CAFFE_ROOT is the directory where Caffe is installed on this machine.
(2) lenet_train_test.prototxt already contains the complete network definition; the sections below walk through those definitions.
(3) Two files are analyzed here: the network definition file lenet_train_test.prototxt, which describes the layers and how they connect, and the solver file lenet_solver.prototxt, which sets the learning rate and other training parameters.
(4) Since the first-hand material is all in English, these notes are kept in English to build the habit of reading it.
Define the MNIST Network
This section explains the lenet_train_test.prototxt model definition that specifies the LeNet model for MNIST handwritten digit classification. We assume that you are familiar with Google Protobuf, and assume that you have read the protobuf definitions used by Caffe, which can be found at $CAFFE_ROOT/src/caffe/proto/caffe.proto.
We start by giving the network a name:
name: "LeNet"
Writing the Data Layer
layer {
  name: "mnist"
  type: "Data"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "mnist_train_lmdb"
    backend: LMDB
    batch_size: 64
  }
  top: "data"
  top: "label"
}
Note: scale rescales the input data; 0.00390625 = 1/256, so the incoming pixel values are mapped from [0, 255] into the range [0, 1).
Finally, this layer produces two blobs: the data blob and the label blob.
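For reference, the same file also contains a second data layer used only in the TEST phase; in the stock lenet_train_test.prototxt it looks roughly like this, reading from the test LMDB with a batch size of 100 (the include rule is explained further below):
layer {
  name: "mnist"
  type: "Data"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "mnist_test_lmdb"
    backend: LMDB
    batch_size: 100
  }
  top: "data"
  top: "label"
  include { phase: TEST }
}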
Writing the Convolution Layer
layer {
  name: "conv1"
  type: "Convolution"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "data"
  top: "conv1"
}
This layer takes the data blob (provided by the data layer) and produces the conv1 blob. It outputs 20 channels, using a convolution kernel of size 5 applied with stride 1.
The fillers allow us to randomly initialize the value of the weights and bias. For the weight filler, we will use the xavier algorithm that automatically determines the scale of initialization based on the number of input and output neurons. For the bias filler, we will simply initialize it as constant, with the default filling value 0.
lr_mults are the learning rate adjustments for the layer’s learnable parameters. In this case, we will set the weight learning rate to be the same as the learning rate given by the solver during runtime, and the bias learning rate to be twice as large as that - this usually leads to better convergence rates.
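For example, with the base_lr: 0.01 set in the solver shown later, the conv1 weights start training with an effective learning rate of 0.01 x 1 = 0.01 and the biases with 0.01 x 2 = 0.02 (before the learning rate policy begins decaying them).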
Writing the Pooling Layer
layer {
  name: "pool1"
  type: "Pooling"
  pooling_param {
    kernel_size: 2
    stride: 2
    pool: MAX
  }
  bottom: "conv1"
  top: "pool1"
}
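This performs max pooling with a 2x2 kernel and a stride of 2, so neighboring pooling regions do not overlap. The second convolution and pooling layers (conv2 and pool2, which the fully connected layer below takes as input) are written in the same way; in the stock lenet_train_test.prototxt they look roughly like this, with conv2 producing 50 output channels:
layer {
  name: "conv2"
  type: "Convolution"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
  bottom: "pool1"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  pooling_param {
    kernel_size: 2
    stride: 2
    pool: MAX
  }
  bottom: "conv2"
  top: "pool2"
}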
Writing the Fully Connected Layer
layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool2"
  top: "ip1"
}
Writing the ReLU Layer
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
Since ReLU is an element-wise operation, we can use in-place computation to save some memory. This is achieved simply by giving the bottom and top blobs the same name. Of course, do NOT use duplicated blob names for other layer types!
After the ReLU layer, we will write another InnerProduct layer:
layer {
  name: "ip2"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "ip1"
  top: "ip2"
}
Writing the Loss Layer
layer {
  name: "loss"      # the loss is also known as the error, cost, or objective function
  type: "SoftmaxWithLoss"
  bottom: "ip2"     # prediction
  bottom: "label"   # truth
}
The softmax_loss layer implements both the softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs, the first being the prediction and the second being the label provided by the data layer (remember it?). It does not produce any outputs; all it does is compute the loss function value, report it when backpropagation starts, and initiate the gradient with respect to ip2. This is where all the magic starts.
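Concretely, for a batch of N images the reported value is (up to normalization) the averaged multinomial logistic loss, loss = -(1/N) * sum_i log( softmax(ip2_i)[label_i] ), i.e. the mean negative log-probability that the network assigns to the correct digit.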
Additional Notes: Writing Layer Rules
Note: these rules are extra information added inside a layer definition. By default, every layer is used in both the train and test phases; a rule is only needed when a layer should be restricted to one of them, e.g. to the test set.
Layer definitions can include rules for whether and when they are included in the network definition, like the one below:
layer {
  // ...layer definition...
  include: { phase: TRAIN }
}
This is a rule, which controls layer inclusion in the network based on the current network state. You can refer to $CAFFE_ROOT/src/caffe/proto/caffe.proto for more information about layer rules and the model schema.
In the above example, this layer will be included only in the TRAIN phase. If we change TRAIN to TEST, then this layer will be used only in the test phase. By default, that is, without layer rules, a layer is always included in the network. Thus, lenet_train_test.prototxt defines two Data layers (with different batch_size values), one for the training phase and one for the testing phase. There is also an Accuracy layer, included only in the TEST phase, which reports the model accuracy every 500 training iterations (the test_interval defined in lenet_solver.prototxt).
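For example, that Accuracy layer is defined in the stock lenet_train_test.prototxt roughly as follows, taking the ip2 predictions and the labels and restricted to the TEST phase:
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}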
Define the MNIST Solver
Check out the comments explaining each line in the prototxt file $CAFFE_ROOT/examples/mnist/lenet_solver.prototxt:
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
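Note: with the "inv" learning rate policy, the effective learning rate at iteration iter is base_lr * (1 + gamma * iter)^(-power). With the values above it therefore decays smoothly from 0.01 at iteration 0 to roughly 0.01 * (1 + 0.0001 * 10000)^(-0.75) ≈ 0.0059 at iteration 10000.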
Training and Testing the Model
Error: ./build/tools/caffe: not found
This happens when the script is launched from a directory where that relative path does not exist; it must be run from the Caffe root directory.
Fix:
# Step 1: change back to the Caffe root directory
cd ~/home/tools/caffe
# Step 2: run the training script from there
./examples/mnist/train_lenet.sh
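For reference, train_lenet.sh in the stock Caffe repository is essentially a one-line wrapper:
#!/usr/bin/env sh
# launch the caffe binary with the LeNet solver (paths are relative to the Caffe root)
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
which is also why it must be launched from the Caffe root directory: both the caffe binary and the solver path are resolved relative to it.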
Questions
(1) Where is the trained model saved?
(2) How can the trained model be deployed on a server?