Caffe ImageNet Example

Brewing ImageNet
This guide is meant to get you ready to train your own model on your own data. If you just want an ImageNet-trained network, then note that since training takes a lot of energy and we hate global warming, we provide the CaffeNet model trained as described below in the model zoo.

Data Preparation
The guide specifies all paths and assumes all commands are executed from the root caffe directory.
By “ImageNet” we mean the ILSVRC12 challenge here, but you can easily train on the whole of ImageNet as well; it just takes more disk space and a little longer training time.
We assume that you already have downloaded the ImageNet training data and validation data, and they are stored on your disk like:

You will first need to prepare some auxiliary data for training. This data can be downloaded by:



The training and validation input are described in train.txt and val.txt as text listing all the files and their labels. Note that we use a different indexing for labels than the ILSVRC devkit: we sort the synset names in their ASCII order, and then label them from 0 to 999. See synset_words.txt for the synset/name mapping.
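
The label indexing described above can be sketched in a few lines of Python. This is only an illustration: the function name build_label_map and the three sample synset IDs are hypothetical placeholders, and a real run would use all 1000 synset names.

```python
def build_label_map(synsets):
    """Map each synset ID to an integer label by sorting the IDs in ASCII order."""
    # sorted() on plain ASCII strings compares by code point, i.e. ASCII order.
    return {synset: label for label, synset in enumerate(sorted(synsets))}


# Hypothetical sample of synset IDs, deliberately unsorted.
sample = ["n01530575", "n01440764", "n01443537"]
label_map = build_label_map(sample)
print(label_map["n01440764"])  # the smallest ID in the sample gets label 0
```

With all 1000 synsets as input, the same function would produce labels 0 to 999.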

You may want to resize the images to 256x256 in advance. By default, we do not explicitly do this because in a cluster environment, one may benefit from resizing images in parallel, using MapReduce. For example, Yangqing used his lightweight mincepie package. If you prefer things to be simpler, you can also use shell commands, something like:

for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! "$name" "$name"
done

Take a look at examples/imagenet/. Set the paths to the train and val dirs as needed, and set “RESIZE=true” to resize all images to 256x256 if you haven’t resized the images in advance. Now simply create the leveldbs with examples/imagenet/. Note that examples/imagenet/ilsvrc12_train_leveldb and examples/imagenet/ilsvrc12_val_leveldb should not exist before this execution; they will be created by the script. GLOG_logtostderr=1 simply dumps more information for you to inspect, and you can safely ignore it.

Compute Image Mean
The model requires us to subtract the image mean from each image, so we have to compute the mean. tools/compute_image_mean.cpp implements that; it is also a good example for familiarizing yourself with how to manipulate the multiple components, such as protocol buffers, leveldbs, and logging, if you are not familiar with them. Anyway, the mean computation can be carried out as:
which will make data/ilsvrc12/imagenet_mean.binaryproto.
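
To make the idea of the image mean concrete, here is a minimal pure-Python sketch. It is not what tools/compute_image_mean.cpp does internally (that tool reads a leveldb and writes a binaryproto); the function names mean_image and subtract_mean and the tiny 2x2 single-channel images are hypothetical, chosen only to show the per-pixel arithmetic.

```python
def mean_image(images):
    """Return the per-pixel mean of equally sized images (each a list of rows)."""
    count = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / count for c in range(width)]
        for r in range(height)
    ]


def subtract_mean(image, mean):
    """Subtract the mean image from one image, pixel by pixel."""
    return [
        [pixel - m for pixel, m in zip(img_row, mean_row)]
        for img_row, mean_row in zip(image, mean)
    ]


# Two hypothetical 2x2 single-channel images.
images = [
    [[0, 2], [4, 6]],
    [[2, 4], [6, 8]],
]
mean = mean_image(images)
centered = subtract_mean(images[0], mean)
print(mean)      # per-pixel average over both images
print(centered)  # first image after mean subtraction
```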

Model Definition
We are going to describe a reference implementation for the approach first proposed by Krizhevsky, Sutskever, and Hinton in their NIPS 2012 paper.
The network definition (models/bvlc_reference_caffenet/train_val.prototxt) follows the one in Krizhevsky et al. Note that if you deviated from the file paths suggested in this guide, you’ll need to adjust the relevant paths in the .prototxt files.
If you look carefully at models/bvlc_reference_caffenet/train_val.prototxt, you will notice several include sections specifying either phase: TRAIN or phase: TEST. These sections allow us to define two closely related networks in one file: the network used for training and the network used for testing. These two networks are almost identical, sharing all layers except for those marked with include { phase: TRAIN } or include { phase: TEST }. In this case, only the input layers and one output layer are different.
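
The way one file yields two networks can be pictured with a small Python sketch. This is a simplified, hypothetical model of the idea, not Caffe's actual parsing code: the LAYERS list, its layer names, and the layers_for helper are all invented for illustration.

```python
# Hypothetical, simplified stand-in for the layers in a train_val.prototxt file.
# A layer whose phase is None is shared; otherwise it belongs to that phase only.
LAYERS = [
    {"name": "data", "phase": "TRAIN"},
    {"name": "data", "phase": "TEST"},
    {"name": "conv1", "phase": None},
    {"name": "fc8", "phase": None},
    {"name": "accuracy", "phase": "TEST"},
    {"name": "loss", "phase": None},
]


def layers_for(phase):
    """Return the names of the layers that make up the network for one phase."""
    return [layer["name"] for layer in LAYERS if layer["phase"] in (None, phase)]


print(layers_for("TRAIN"))
print(layers_for("TEST"))
```

In this toy model the two networks share every layer except the phase-specific data layers and the extra accuracy output of the test network.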

Input layer differences: The training network’s data input layer draws its data from examples/imagenet/ilsvrc12_train_leveldb and randomly mirrors the input image. The testing network’s data layer takes data from examples/imagenet/ilsvrc12_val_leveldb and does not perform random mirroring.
Output layer differences: Both networks output the softmax_loss layer, which in training is used to compute the loss function and to initialize the backpropagation, while in validation this loss is simply reported. The testing network also has a second output layer, accuracy, which is used to report the accuracy on the test set. In the process of training, the test network will occasionally be instantiated and tested on the test set, producing lines like Test score #0: xxx and Test score #1: xxx. In this case score 0 is the accuracy (which will start around 1/1000 = 0.001 for an untrained network) and score 1 is the loss (which will start around 7 for an untrained network).
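
The starting values quoted above can be checked with a short calculation. The sketch assumes that an untrained network assigns roughly uniform probability to all 1000 classes, which is an idealization rather than something the guide states.

```python
import math

num_classes = 1000  # ILSVRC12 labels run from 0 to 999

# A network that guesses uniformly is right 1 time in num_classes.
chance_accuracy = 1.0 / num_classes

# Softmax loss is -log(probability assigned to the true class);
# under a uniform prediction that probability is 1 / num_classes.
chance_loss = -math.log(1.0 / num_classes)

print(f"accuracy ~ {chance_accuracy:.3f}")  # 0.001
print(f"loss ~ {chance_loss:.2f}")  # 6.91
```

So a loss near 7 at the first test is simply ln(1000), the loss of a uniform guess.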

We will also lay out a protocol buffer for running the solver. Let’s make a few plans:
* We will run in batches of 256, and run a total of 450,000 iterations (about 90 epochs).
* For every 1,000 iterations, we test the learned net on the validation data.
* We set the initial learning rate to 0.01, and decrease it every 100,000 iterations (about 20 epochs).
* Information will be displayed every 20 iterations.
* The network will be trained with momentum 0.9 and a weight decay of 0.0005.
* For every 10,000 iterations, we will take a snapshot of the current status.
Sound good? This is implemented in models/bvlc_reference_caffenet/solver.prototxt.
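
The epoch counts and the learning-rate schedule in the plan above can be reproduced with a short sketch. Two values are assumptions that this guide does not state: the training set size TRAIN_IMAGES (taken here as 1,281,167 images) and the decay factor GAMMA (taken here as 0.1). The helper names are hypothetical as well.

```python
BATCH_SIZE = 256
MAX_ITER = 450_000
BASE_LR = 0.01
STEP_SIZE = 100_000
TRAIN_IMAGES = 1_281_167  # assumption: ILSVRC12 training set size
GAMMA = 0.1  # assumption: factor applied at each learning-rate decrease


def epochs(iterations):
    """Number of passes over the training set after the given iterations."""
    return iterations * BATCH_SIZE / TRAIN_IMAGES


def learning_rate(iteration):
    """Step schedule: multiply the rate by GAMMA after every STEP_SIZE iterations."""
    return BASE_LR * GAMMA ** (iteration // STEP_SIZE)


print(round(epochs(MAX_ITER)))   # about 90 epochs in total
print(round(epochs(STEP_SIZE)))  # about 20 epochs between decreases
print(learning_rate(0), learning_rate(99_999), learning_rate(100_000))
```

The rate stays at 0.01 up to iteration 99,999 and drops by the assumed factor at iteration 100,000.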

Training ImageNet
Ready? Let’s train.

./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
Sit back and enjoy!

On a K40 machine, every 20 iterations take about 26.5 seconds to run (while on a K20 this takes 36 seconds), so effectively about 5.2 ms per image for the full forward-backward pass. About 2 ms of this is on forward, and the rest is backward. If you are interested in dissecting the computation time, you can run

./build/tools/caffe time --model=models/bvlc_reference_caffenet/train_val.prototxt
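
The per-image figure quoted for the K40 follows directly from the batch size and the iteration time. The sketch below redoes that arithmetic and also estimates total training time; the estimate assumes the 450,000-iteration plan and ignores testing and snapshot overhead, so it is only a rough lower bound.

```python
BATCH_SIZE = 256
ITERS_TIMED = 20
K40_SECONDS = 26.5  # time for 20 iterations on a K40, as quoted above
MAX_ITER = 450_000

# Time per image for one full forward-backward pass, in milliseconds.
ms_per_image = K40_SECONDS / (ITERS_TIMED * BATCH_SIZE) * 1000

# The guide attributes about 2 ms to the forward pass; the rest is backward.
forward_ms = 2.0
backward_ms = ms_per_image - forward_ms

# Rough total training time, ignoring testing and snapshot overhead.
training_days = MAX_ITER / ITERS_TIMED * K40_SECONDS / 86_400

print(f"{ms_per_image:.1f} ms per image")  # 5.2 ms per image
print(f"{backward_ms:.1f} ms backward")    # 3.2 ms backward
print(f"{training_days:.1f} days")         # 6.9 days
```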

Resume Training?
We all experience times when the power goes out, or we feel like rewarding ourselves a little by playing Battlefield (does anyone still remember Quake?). Since we are snapshotting intermediate results during training, we will be able to resume from snapshots. This can be done as easily as:

./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --snapshot=models/bvlc_reference_caffenet/caffenet_train_iter_10000.solverstate
where in the script caffenet_train_iter_10000.solverstate is the solver state snapshot that stores all necessary information to recover the exact solver state (including the parameters, momentum history, etc).
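
When several snapshots have accumulated, you usually want to resume from the most recent one. The helper below is a hypothetical convenience, not part of Caffe: it assumes snapshot files follow the naming seen above (an _iter_N.solverstate suffix) and picks the one with the highest iteration number from a list of filenames.

```python
import re

# Matches the iteration number in names such as caffenet_train_iter_10000.solverstate.
SNAPSHOT_PATTERN = re.compile(r"_iter_(\d+)\.solverstate$")


def latest_snapshot(filenames):
    """Return the .solverstate filename with the highest iteration, or None."""
    best_name, best_iter = None, -1
    for name in filenames:
        match = SNAPSHOT_PATTERN.search(name)
        if match and int(match.group(1)) > best_iter:
            best_name, best_iter = name, int(match.group(1))
    return best_name


# Hypothetical directory listing.
files = [
    "caffenet_train_iter_10000.solverstate",
    "caffenet_train_iter_20000.solverstate",
    "notes.txt",
]
print(latest_snapshot(files))
```

The returned name could then be passed to the --snapshot flag shown above.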

Parting Words
Hope you liked this recipe! Many researchers have gone further since the ILSVRC 2012 challenge, changing the network architecture and/or fine-tuning the various parameters in the network to address new data and tasks. Caffe lets you explore different network choices more easily by simply writing different prototxt files - isn’t that exciting?

And since now you have a trained network, check out how to use it with the Python interface for classifying ImageNet.
