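For a project I needed to recognize the environment a robot operates in. After some searching, Caffe looked relatively simple to use, so I made a first attempt with it. I am recording the main steps here so that I do not forget them. Setup: Ubuntu 16.04, Caffe, OpenCV 3.4, CPU only.
1. Compiling Caffe
For building Caffe on Ubuntu, see https://www.cnblogs.com/darkknightzh/p/5797526.html and https://blog.csdn.net/u013832707/article/details/53159071 . Be sure to install the dependency libraries listed at the start of that article. Oddly, the Makefile-based commands below kept failing with many errors for me, so in the end I built with CMake instead: create a build directory, then run cmake .., make, and make install, which finally worked. Note that when running the command below that generates Makefile.config, if you only use the CPU as I do, you need to uncomment CPU_ONLY := 1 in Makefile.config.example, i.e.: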
# CPU-only switch (uncomment to build without GPU support).
CPU_ONLY := 1
cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python, or if cuDNN is desired)
make all
make test
make runtest
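The CMake route mentioned above can be sketched as a small helper. This is only a sketch under assumptions: build_caffe_cpu_only is a hypothetical function name, the source path is whatever your checkout is, and -DCPU_ONLY=ON is the CMake counterpart of CPU_ONLY := 1 in Makefile.config.

```shell
#!/usr/bin/env sh
# Sketch: out-of-source, CPU-only CMake build of Caffe.
# build_caffe_cpu_only is a hypothetical helper name, not part of Caffe.
build_caffe_cpu_only() {
    src="$1"    # path to the Caffe source tree (assumed to contain CMakeLists.txt)
    if [ ! -f "$src/CMakeLists.txt" ]; then
        echo "No CMakeLists.txt found under: $src" >&2
        return 1
    fi
    mkdir -p "$src/build" || return 1
    # Run the build in a subshell so the caller's working directory is unchanged.
    (
        cd "$src/build" &&
        cmake -DCPU_ONLY=ON .. &&
        make -j"$(nproc)" &&
        make install
    )
}

# Example (uncomment and adjust the path to your own checkout):
# build_caffe_cpu_only /path/to/caffe
```

The function refuses to run when the given directory does not look like a CMake project, which avoids creating a stray build directory in the wrong place.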
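2. Training
2.1 Preparing the data and the network
After installing Caffe, make sure the handwritten digit recognition demo works. See readme.md under /home/***/CAFFE_ROOT/caffe/examples/mnist; the tutorial is very detailed, and running it confirms that Caffe works properly. Running this example also gives a rough idea of how Caffe operates. For training you first need to prepare:
(1) the data, in LMDB format; (2) train.txt and val.txt, which list the images and their labels; (3) train_val.prototxt, the network definition; (4) solver.prototxt, the solver configuration
My project directory is laid out as follows; my_data is located at /home/×××/CAFFE_ROOT/caffe/data/my_data/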
my_data:
    build_lmdb
    data_set
        train
        val
        train.txt
        val.txt
    solver.prototxt
The script that generates the label files is:
#!/usr/bin/env sh
DATA_TRAIN_GROUND=/home/×××/CAFFE_ROOT/caffe/data/my_data/data_set/train/ground # training images, class "ground"
DATA_TRAIN_STAIR=/home/×××/CAFFE_ROOT/caffe/data/my_data/data_set/train/stairs # training images, class "stairs"
DATA_VAL_STAIR=/home/×××/CAFFE_ROOT/caffe/data/my_data/data_set/val/stairs # validation images, class "stairs"
DATA_VAL_GROUND=/home/×××/CAFFE_ROOT/caffe/data/my_data/data_set/val/ground # validation images, class "ground"
DATASAVE=/home/×××/CAFFE_ROOT/caffe/data/my_data/data_set # where train.txt and val.txt are saved
echo "Create train.txt..."
# cut keeps path fields 8 to 12, i.e. everything from data_set/ onward. The full path
# /home/×××/CAFFE_ROOT/caffe/data/my_data/data_set/train/stairs/332.jpg is stored in train.txt as
# "data_set/train/stairs/332.jpg 0", where 0 is the label and 332.jpg the file name. The lines below work the same way.
find "$DATA_TRAIN_STAIR" -name '*.jpg' | cut -d '/' -f8-12 | sed "s/$/ 0/">>$DATASAVE/train.txt
find "$DATA_VAL_STAIR" -name '*.jpg' | cut -d '/' -f8-12 | sed "s/$/ 0/">>$DATASAVE/val.txt
find "$DATA_TRAIN_GROUND" -name '*.jpg' | cut -d '/' -f8-12 | sed "s/$/ 1/">>$DATASAVE/tmp1.txt
find "$DATA_VAL_GROUND" -name '*.jpg' | cut -d '/' -f8-12 | sed "s/$/ 1/">>$DATASAVE/tmp.txt
cat $DATASAVE/tmp1.txt>>$DATASAVE/train.txt
cat $DATASAVE/tmp.txt>>$DATASAVE/val.txt
#cat $DATASAVE/train.txt>>$DATASAVE/train.txt
rm -rf $DATASAVE/tmp.txt
rm -rf $DATASAVE/tmp1.txt
echo "Create train.txt & val.txt done."
The directories above are the ones on my machine and are given as absolute paths, so you will probably need to adapt them to your own.
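Since the cut -f8-12 field range in the script above only works for one specific directory depth, here is a sketch of a variant that strips a root prefix instead. make_listing is a hypothetical helper, and it assumes the root path contains no characters that are special to sed (such as | or *).

```shell
#!/usr/bin/env sh
# Sketch: print "relative/path label" lines for every .jpg under one class directory.
# make_listing is a hypothetical helper; root, subdir and label are supplied by the caller.
make_listing() {
    root="${1%/}"   # e.g. .../data/my_data (a trailing slash is removed)
    subdir="$2"     # e.g. data_set/train/stairs
    label="$3"      # e.g. 0
    # Strip the root prefix instead of counting path fields, then append the label.
    find "$root/$subdir" -name '*.jpg' | sort |
        sed -e "s|^$root/||" -e "s|\$| $label|"
}

# Example, assuming ROOT points at the my_data directory:
# make_listing "$ROOT" data_set/train/stairs 0 >  "$ROOT/data_set/train.txt"
# make_listing "$ROOT" data_set/train/ground 1 >> "$ROOT/data_set/train.txt"
```

Writing the first class with > rather than >> also means that rerunning the script does not append duplicate entries to an existing train.txt.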
The script that builds the LMDB files is:
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e
EXAMPLE=/home/×××/CAFFE_ROOT/caffe/data/my_data/build_lmdb
DATA=/home/×××/CAFFE_ROOT/caffe/data/my_data/data_set # directory containing train.txt and val.txt
TOOLS=/home/×××/CAFFE_ROOT/caffe/build/tools # directory of the conversion tools produced by the Caffe build
TRAIN_DATA_ROOT=/home/×××/CAFFE_ROOT/caffe/data/my_data/ # parent of the paths listed in train.txt: the entries there start with data_set/train/..., and this root plus an entry gives the full image path
VAL_DATA_ROOT=/home/×××/CAFFE_ROOT/caffe/data/my_data/ # same for val.txt: this root plus an entry gives the full image path
# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=false
if $RESIZE; then
echo "RESIZE_HEIGHT=256 RESIZE_WIDTH=256."
RESIZE_HEIGHT=256
RESIZE_WIDTH=256
else
RESIZE_HEIGHT=0
RESIZE_WIDTH=0
fi
if [ ! -d "$TRAIN_DATA_ROOT" ]; then
echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet training data is stored."
exit 1
fi
if [ ! -d "$VAL_DATA_ROOT" ]; then
echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet validation data is stored."
exit 1
fi
echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$TRAIN_DATA_ROOT \
$DATA/train.txt \
$EXAMPLE/ilsvrc12_train_lmdb
echo "Creating val lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$VAL_DATA_ROOT \
$DATA/val.txt \
$EXAMPLE/ilsvrc12_val_lmdb
echo "Done."
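Because convert_imageset joins the data root with each path in the listing file, a quick check that every listed image exists can save a failed conversion. The following is a minimal sketch; check_listing is a hypothetical helper, and it assumes image paths contain no spaces.

```shell
#!/usr/bin/env sh
# Sketch: count listing entries whose image file is missing under the data root.
# check_listing is a hypothetical helper, not a Caffe tool.
check_listing() {
    root="$1"     # e.g. the value of TRAIN_DATA_ROOT
    list="$2"     # e.g. .../data_set/train.txt
    missing=0
    # Each line is "relative/path label"; the extra test handles a last line without newline.
    while read -r path label || [ -n "$path" ]; do
        [ -n "$path" ] || continue
        if [ ! -f "$root/$path" ]; then
            echo "missing: $root/$path" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$missing"
}

# Example, with the variables from the script above:
# check_listing "$TRAIN_DATA_ROOT" "$DATA/train.txt"
```

The function prints the number of missing files on stdout and the offending paths on stderr, so a result of 0 means root plus entry resolves for every line.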
The script that computes the mean file:
#!/usr/bin/env sh
# Compute the mean image from the imagenet training lmdb
# N.B. this is available in data/ilsvrc12
EXAMPLE=/home/***/CAFFE_ROOT/caffe/data/my_data/build_lmdb # LMDB directory
DATA=/home/***/CAFFE_ROOT/caffe/data/my_data/build_lmdb # target directory for the mean file
TOOLS=/home/***/CAFFE_ROOT/caffe/build/tools
$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb \
$DATA/imagenet_mean.binaryproto
echo "Done."
The solver configuration for CaffeNet, solver.prototxt, is as follows:
net: "/home/***/CAFFE_ROOT/caffe/data/my_data/train_val.prototxt"
test_iter: 20
test_interval: 500
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 400
display: 20
max_iter: 15000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "/home//***//CAFFE_ROOT/caffe/data/my_data/caffenet_train"
solver_mode: CPU
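# test_iter: the number of iterations in one test pass; ideally
# test_iter * batch_size (of the test set) = size of the test set.
# The test batch_size is set in the network prototxt file.
# test_interval: "interval" as in spacing; a test pass is run every 500 training iterations.
# Caffe tests while it trains. Every 500 iterations (that is, after 500 * 8 = 4000 training
# samples have been processed with a training batch_size of 8) it computes the test error once.
# One test pass should cover all test images (200 here). With test_iter 20 and a test
# batch_size of 50, each pass actually reads 1000 images, so the 200 images are traversed five times.
# In one epoch every training sample is visited once, while the test set is traversed at least
# once per test pass; the exact number of traversals need not be an integer.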
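The relationship between test_iter, batch size and dataset size described in the solver comments can be worked through with a little integer arithmetic. This is a sketch only; test_iter_for and samples_seen are hypothetical helper names, and the numbers are the ones quoted in this post (200 validation images, test batch size 50, training batch size 8).

```shell
#!/usr/bin/env sh
# Sketch: arithmetic behind test_iter and test_interval.

# Smallest test_iter such that test_iter * batch_size covers all validation images.
test_iter_for() {
    n="$1"    # number of validation images
    b="$2"    # test batch_size
    echo $(( (n + b - 1) / b ))
}

# Number of samples processed in a given number of iterations.
samples_seen() {
    echo $(( $1 * $2 ))    # iterations * batch_size
}

test_iter_for 200 50    # prints 4: four iterations cover 200 images once
samples_seen 500 8      # prints 4000: training samples between two test passes
samples_seen 20 50      # prints 1000: images read per test pass with test_iter 20
```

With these numbers, a test_iter of 4 would cover the validation set exactly once per test pass, whereas 20 reads it five times.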
The CaffeNet network definition, train_val.prototxt, is as follows:
name: "CaffeNet"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 227
mean_file: "/home/***/CAFFE_ROOT/caffe/data/my_data/build_lmdb/imagenet_mean.binaryproto"
}
# mean pixel / channel-wise mean instead of mean image
# transform_param {
# crop_size: 227
# mean_value: 104
# mean_value: 117
# mean_value: 123
# mirror: true
# }
data_param {
source: "/home/***/CAFFE_ROOT/caffe/data/my_data/build_lmdb/ilsvrc12_train_lmdb"
batch_size: 8
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 227
mean_file: "/home/***/CAFFE_ROOT/caffe/data/my_data/build_lmdb/imagenet_mean.binaryproto"
}
# mean pixel / channel-wise mean instead of mean image
# transform_param {
# crop_size: 227
# mean_value: 104
# mean_value: 117
# mean_value: 123
# mirror: false
# }
data_param {
source: "/home/***/CAFFE_ROOT/caffe/data/my_data/build_lmdb/ilsvrc12_val_lmdb"
batch_size: 50
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "label"
top: "loss"
}
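To see why a 227x227 crop works with this network, the spatial size can be traced layer by layer with the usual formula out = (in + 2*pad - kernel) / stride + 1. The sketch below uses the kernel, stride and pad values from the listing above (stride 1 and pad 0 where the listing omits them). out_size is a hypothetical helper, and it uses integer division, which is exact for every layer here.

```shell
#!/usr/bin/env sh
# Sketch: trace the feature-map side length through CaffeNet's conv and pooling layers.
# Arguments of out_size: input size, kernel size, stride, pad.
out_size() {
    echo $(( ($1 + 2 * $4 - $2) / $3 + 1 ))
}

s=227                        # crop_size of the data layer
s=$(out_size "$s" 11 4 0)    # conv1: 55
s=$(out_size "$s" 3 2 0)     # pool1: 27
s=$(out_size "$s" 5 1 2)     # conv2: 27 (the LRN layers do not change the size)
s=$(out_size "$s" 3 2 0)     # pool2: 13
s=$(out_size "$s" 3 1 1)     # conv3: 13
s=$(out_size "$s" 3 1 1)     # conv4: 13
s=$(out_size "$s" 3 1 1)     # conv5: 13
s=$(out_size "$s" 3 2 0)     # pool5: 6
echo "pool5: ${s}x${s}, fc6 inputs: $(( 256 * s * s ))"
```

The last line prints pool5: 6x6, fc6 inputs: 9216, i.e. 256 channels of 6x6 feature maps feed the first fully connected layer.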
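A brief explanation of a few parameters:
test_iter: the number of iterations in one test pass, chosen so that test_iter * batch_size (of the test set) = size of the test set. The test batch_size is set in the network prototxt file.
test_interval: "interval" as in spacing; this parameter means a test pass is run every 500 training iterations.
Caffe tests while it trains. As a general example with a training batch_size of 64, every 500 iterations (that is, after 500 * 64 = 32000 training samples have been processed) it computes the test error once. One test pass should cover all test images (10000 in that example). In one epoch every training sample is visited once, while the test set is traversed at least once per test pass; the exact number of traversals need not be an integer and follows from test_iter and the test batch_size. A rough understanding of this process is enough.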
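The solver above also sets lr_policy: "step", for which Caffe computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). A small sketch of that formula, using awk for the floating-point arithmetic; lr_at is a hypothetical helper name and the values passed in are the ones from solver.prototxt.

```shell
#!/usr/bin/env sh
# Sketch: learning rate under the "step" policy,
# lr = base_lr * gamma ^ floor(iter / stepsize).
# Arguments of lr_at: iteration, base_lr, gamma, stepsize.
lr_at() {
    awk -v i="$1" -v b="$2" -v g="$3" -v s="$4" \
        'BEGIN { printf "%g\n", b * g ^ int(i / s) }'
}

lr_at 0 0.001 0.1 400       # first 400 iterations: base_lr itself
lr_at 400 0.001 0.1 400     # after one step: ten times smaller
lr_at 1000 0.001 0.1 400    # after two steps
```

With stepsize: 400 the rate drops by a factor of ten every 400 iterations, so it is worth checking how small it has become by the iteration at which you take a snapshot.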
2.2 Training
Use the following script to train:
#!/usr/bin/env sh
cd /home/***/CAFFE_ROOT/caffe
./build/tools/caffe train --solver=/home/***/CAFFE_ROOT/caffe/data/my_data/solver.prototxt -iterations=10000
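# Run this script to start training
The -iterations flag (default 50) sets the number of iterations for caffe test and caffe time. For caffe train the number of iterations comes from max_iter in solver.prototxt (15000 above), so -iterations=10000 does not limit the training run. After starting the script comes a long wait, whose length depends on the machine. During training the loss kept decreasing and eventually settled at around 0.2 to 0.3 for me. From what I read online, the main thing to watch during training is the accuracy, which settled at about 0.8 in my case. Test passes are run according to the test_interval set in solver.prototxt; mine is test_interval: 500, so a test pass is run every 500 iterations and the test accuracy is reported.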
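To follow the test accuracy over time without scrolling through the whole log, the accuracy values can be filtered out of a saved log. This is a sketch under an assumption: it expects test-pass lines containing text of the form "Test net output #0: accuracy = 0.8", so adjust the pattern if your log differs. extract_accuracy is a hypothetical helper name.

```shell
#!/usr/bin/env sh
# Sketch: print the accuracy value of every test pass found in a Caffe training log.
# Assumes log lines that contain "Test net output #<n>: accuracy = <value>".
extract_accuracy() {
    sed -n 's/.*Test net output #[0-9]*: accuracy = \([0-9.eE+-]*\).*/\1/p'
}

# Example: Caffe writes its log to stderr, so redirect it when saving.
# ./build/tools/caffe train --solver=/path/to/solver.prototxt 2>&1 | tee train.log
# extract_accuracy < train.log
```

Each printed line corresponds to one test pass, so with test_interval: 500 the n-th value belongs to iteration 500 * (n - 1), counting the initial test at iteration 0.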
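2.3 Testing
Testing uses the program ./build/examples/cpp_classification/classification.bin, a binary produced by the Caffe build. Its source is in /home/***/CAFFE_ROOT/caffe/examples/cpp_classification. From the source we can see that the Classifier constructor takes the following arguments: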
Classifier(const string& model_file,
const string& trained_file,
const string& mean_file,
const string& label_file);
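The first argument is the network definition used for inference (the deploy file). It is derived from the training network by removing the parts that are not needed, such as the training data input layers. My test model is as follows: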
name: "CaffeNet"
layer {
name: "data"
type: "Input"
top: "data"
input_param{shape:{dim:10 dim:3 dim:227 dim:227}} # must be consistent with the training model (crop_size 227)!
# dims in order: batch size (10), number of colour channels (3, RGB),
# height (227), width (227)
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
inner_product_param {
num_output: 2
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc8"
top: "prob"
}
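The second argument is the model file produced by training, e.g. caffenet_train_iter_1000.caffemodel. The third argument is the mean file generated during data preparation, imagenet_mean.binaryproto. The fourth argument is the label file (lable.txt in my case), and the image to classify, 1.jpg, is passed as the last command-line argument. It is best to preprocess the image by resizing it to 256×256; for details see https://blog.csdn.net/zchang81/article/details/73088042 . For example, my test script is as follows: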
#!/usr/bin/env sh
cd /home/***/CAFFE_ROOT/caffe
./build/examples/cpp_classification/classification.bin \
data/my_data/deploy.prototxt \
data/my_data/caffenet_train_iter_2000.caffemodel data/my_data/build_lmdb/imagenet_mean.binaryproto \
data/my_data/lable.txt data/my_data/1.jpg
# Run this script to classify a single image
Running the script gives the following result:
***@***-ThinkPad-X1-Carbon-3rd:~/CAFFE_ROOT/caffe/data/my_data$ ./test_image.sh
---------- Prediction for data/my_data/1.jpg ----------
0.5195 - "stair 0"
0.4805 - "ground 1"
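When the classifier is called from another script, only the top result is usually needed. A minimal sketch that picks it out of output in the format shown above, where each result line reads score - "label" and the best score comes first; top_prediction is a hypothetical helper name.

```shell
#!/usr/bin/env sh
# Sketch: print "score label" for the first result line of classification.bin output.
# Reads the program output on stdin; the header line does not match and is skipped.
top_prediction() {
    sed -n 's/^\([0-9.]*\) - "\(.*\)"$/\1 \2/p' | head -n 1
}

# Example, assuming test_image.sh is the test script from this post:
# ./test_image.sh | top_prediction
```

For the run shown above this yields the score 0.5195 together with the label text stair 0.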
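This post showed how to use CaffeNet for image recognition with your own dataset. It is only a follow-the-recipe walkthrough. In later work I will need to modify the classification source code and combine it with a camera for real-time video recognition. The next post will cover how to port recognition using an already trained model.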