[Caffe Source Code Study] Chapter 2: Hands-on (1): Character Recognition Project

FrankJingle

Article link: https://blog.csdn.net/fangjin_kl/article/details/53889190

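This hands-on character recognition project pulls together material already touched on in earlier chapters. It serves as a summary that walks through a complete project from start to finish: preparing the data, training, recognition, and analysis.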

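I. Preparing the dataset

We use the lmdb format here. For other formats, follow the methods in [Caffe Source Code Study] Chapter 2: Usage (1): Building a Dataset.

(1). The existing dataset

Our data is laid out as follows. trainData and testData each contain 10 folders named 0-9, corresponding to the digits 0-9. Part of the directory structure is shown below.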

/home/fangjin/CAFFE/DATA
│  list.txt
│  
├─testData
│  ├─0
│  │      0-3-033OJJ7KZA.jpg 
│  │      0-5-CV7UTRECKB.jpg
│  │      
│  ├─1
│  │      1-3-01VZAOCIPC.jpg
│  │      1-3-09GBY203S5.jpg
│  ...
│       
└─trainData
    │  train.txt
    │  
    ├─0
    │      0-3-00DUJ0RVR9.jpg
    │      0-3-0AWLKVU51V.jpg
    │      
    ├─1
    │      1-7-E3Y0H6X1TR.jpg
    │      1-7-E5DLYZ289T.jpg
    ...

(2). The data txt file

First create a txt file that contains the path and label of each image, in the following format

trainData/0/0-3-00DUJ0RVR9.jpg 0
trainData/0/0-3-0AWLKVU51V.jpg 0
trainData/0/0-3-0DS9V90EJ6.jpg 0
trainData/0/0-3-0DUO09DFPD.jpg 0
trainData/0/0-3-0F1UTHN9O9.jpg 0
trainData/0/0-3-0KBIEMMCYC.jpg 0
trainData/0/0-3-0QPBZLGTF7.jpg 0
trainData/0/0-3-0R5LZ0FG2H.jpg 0
trainData/0/0-3-0T1RBO2IMH.jpg 0
trainData/0/0-3-0TTN1FAFZY.jpg 0

A simple Python script does the job

import os

rootPath = './'

def write_list(subdir, out_name):
    # Write one "relative/path label" line per image.
    # The folder name (0-9) is used as the label.
    with open(rootPath + out_name, 'w') as f:
        for i in range(10):
            path = subdir + '/' + str(i)
            for listfile in os.listdir(rootPath + path):
                if listfile != 'Thumbs.db':  # skip the Windows thumbnail cache
                    f.write(path + '/' + listfile + ' ' + str(i) + '\n')

write_list('trainData', 'train.txt')
write_list('testData', 'test.txt')

This generates train.txt and test.txt.
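Before converting, it can be worth checking that the generated list has the expected "path label" shape and a sensible number of images per class. The following is a minimal sketch, not part of the original project; the helper name count_labels and the three sample lines are only for illustration.

```python
from collections import Counter


def count_labels(lines):
    """Count images per label in 'relative/path label' lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space so that the label is always the final field.
        path, label = line.rsplit(' ', 1)
        counts[int(label)] += 1
    return counts


# Illustrative sample in the same format as train.txt.
sample = [
    "trainData/0/0-3-00DUJ0RVR9.jpg 0",
    "trainData/0/0-3-0AWLKVU51V.jpg 0",
    "trainData/1/1-7-E3Y0H6X1TR.jpg 1",
]
label_counts = count_labels(sample)
print(sorted(label_counts.items()))
```

On a real list, pass the open file object instead of the sample, for example count_labels(open('train.txt')).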

(3). Data conversion

Use the convert_imageset tool to do the conversion.

The shell script is as follows

TOOLS=/home/users/fangjin/caffe/build/tools
RESIZE_HEIGHT=32
RESIZE_WIDTH=32
TRAIN_DATA_ROOT=/home/users/fangjin/test/number_data/

echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
   --resize_height=$RESIZE_HEIGHT \
   --resize_width=$RESIZE_WIDTH \
   --shuffle \
   $TRAIN_DATA_ROOT \
   train.txt \
   number_train_lmdb

echo "Creating test lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
   --resize_height=$RESIZE_HEIGHT \
   --resize_width=$RESIZE_WIDTH \
   --shuffle \
   $TRAIN_DATA_ROOT \
   test.txt \
   number_test_lmdb  # output lmdb directory

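Parameter notes

resize_height: optional, the image height after resizing.
resize_width: optional, the image width after resizing. Note that resize_height and resize_width must be set together; do not set only one of them.
shuffle: optional, shuffles the order of the entries.
$TRAIN_DATA_ROOT: the root directory that the relative paths in the txt file are resolved against. In other words, $TRAIN_DATA_ROOT + the path in the txt file gives the full path.
backend: to produce LevelDB instead, pass --backend=leveldb to convert_imageset.

Errors are usually caused by wrong paths. Before every re-run, delete the previously generated lmdb data first.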
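Both points above (paths are resolved as root + relative path, and old lmdb output must be removed before a re-run) can be checked from a small script before calling convert_imageset. This is a sketch under my own assumptions: the helper names missing_images and remove_old_db are hypothetical and are not part of Caffe.

```python
import os
import shutil


def missing_images(data_root, list_lines):
    """Return the relative paths for which data_root + path is not a file."""
    missing = []
    for line in list_lines:
        line = line.strip()
        if not line:
            continue
        rel_path = line.rsplit(' ', 1)[0]  # drop the trailing label
        if not os.path.isfile(os.path.join(data_root, rel_path)):
            missing.append(rel_path)
    return missing


def remove_old_db(db_dir):
    """Delete a previously generated lmdb directory; report whether one existed."""
    if os.path.isdir(db_dir):
        shutil.rmtree(db_dir)
        return True
    return False
```

A typical use would be to call missing_images(TRAIN_DATA_ROOT, open('train.txt')) and stop if the result is non-empty, then call remove_old_db('number_train_lmdb') right before the conversion.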

When the script finishes, it has created two folders that hold the lmdb data.

drwxr--r--   2 fangjin fangjin      4096 Dec 21 12:01 number_test_lmdb
drwxr--r--   2 fangjin fangjin      4096 Dec 21 12:01 number_train_lmdb

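(4). Computing the mean

Caffe can compute the mean first and then subtract it from every image. This section shows how to compute the mean with the tool that is built as part of Caffe.
Usage: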

compute_image_mean: Compute the mean_image of a set of images given by a leveldb/lmdb
Usage:
    compute_image_mean [FLAGS] INPUT_DB [OUTPUT_FILE]


  Flags from tools/compute_image_mean.cpp:
    -backend (The backend {leveldb, lmdb} containing the images) type: string
      default: "lmdb"
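Conceptually, the mean image is the per-position average over all images in the database. The sketch below illustrates only that arithmetic on toy data; it does not read an lmdb or write a binaryproto, and the function name mean_image and the flat-list image representation are assumptions made for illustration.

```python
def mean_image(images):
    """Average images of identical size, position by position.

    Each image is a flat list of pixel values (toy representation).
    """
    if not images:
        raise ValueError("need at least one image")
    size = len(images[0])
    sums = [0.0] * size
    for img in images:
        if len(img) != size:
            raise ValueError("all images must have the same size")
        for k, value in enumerate(img):
            sums[k] += value
    return [s / len(images) for s in sums]


# Two made-up 4-pixel "images".
toy = [[0, 100, 200, 50], [10, 110, 220, 150]]
mean = mean_image(toy)
print(mean)  # [5.0, 105.0, 210.0, 100.0]
```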

Write the script compute_image_mean.sh

#!/usr/bin/env sh
set -e

CAFFETOOL=/home/users/fangjin/caffe/build/tools

${CAFFETOOL}/compute_image_mean number_train_lmdb image_mean.binaryproto

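This generates the mean file image_mean.binaryproto.
To use it, add a mean_file entry to the transform_param of the Data layer. It must be added for both the training set and the validation set.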

transform_param {
    scale: 0.00390625
    mean_file: "image_mean.binaryproto"
  }

Then train in the same way as before.
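To make the effect of these two settings concrete: with a mean file and a scale, Caffe's data transformer computes (pixel - mean) * scale for each value, and 0.00390625 is exactly 1/256. The sketch below reproduces only that arithmetic; the function name transform and the sample values are made up for illustration.

```python
SCALE = 0.00390625  # same value as in transform_param, equal to 1/256


def transform(pixel, mean_pixel, scale=SCALE):
    """Subtract the mean first, then scale, as described above."""
    return (pixel - mean_pixel) * scale


# A pixel of 200 with a mean of 72 at that position: (200 - 72) / 256
print(transform(200, 72))  # 0.5
```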

II. Training

(1). Write the training script `train_lenet.sh`

#!/usr/bin/env sh
set -e
CAFFETOOL=/home/users/fangjin/caffe/build/tools

GLOG_logtostderr=1 $CAFFETOOL/caffe train \
   --solver=lenet_solver.prototxt 2>&1 | tee log_1st.txt

(2). Write the solver configuration file `lenet_solver.prototxt`

# The train/test net protocol buffer definition
net: "lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "lenet"
# solver mode: CPU or GPU
solver_mode: CPU
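With lr_policy "inv", Caffe decays the learning rate as base_lr * (1 + gamma * iter) ^ (-power). As a rough sketch, the values from the solver above can be plugged into a small function to see how the rate changes over the run; the function name inv_lr is mine, not a Caffe API.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the "inv" policy, using the solver values above."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


for it in (0, 5000, 10000):
    print(it, inv_lr(it))
```

At iteration 0 this returns base_lr itself, and the rate then decreases smoothly until max_iter.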

(3). Write the network definition file `lenet_train_test.prototxt`

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
    mean_file: "image_mean.binaryproto"
  }
  data_param {
    source: "number_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
    mean_file: "image_mean.binaryproto"
  }
  data_param {
    source: "number_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
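It helps to trace the blob shapes through this network by hand. The sketch below does that for a 32x32 input, assuming 3 input channels (the value used in the deploy file's dim: 3) and no padding. It uses the usual output-size formulas: floor for convolution, and ceil for pooling as Caffe does. The function names are illustrative only.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    """Output size of a pooling layer along one dimension (rounded up)."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1


def lenet_shapes(channels=3, height=32, width=32):
    """Per-image blob shapes (channels, height, width) for the net above."""
    shapes = [("data", (channels, height, width))]
    h, w = conv_out(height, 5), conv_out(width, 5)
    shapes.append(("conv1", (20, h, w)))
    h, w = pool_out(h, 2, 2), pool_out(w, 2, 2)
    shapes.append(("pool1", (20, h, w)))
    h, w = conv_out(h, 5), conv_out(w, 5)
    shapes.append(("conv2", (50, h, w)))
    h, w = pool_out(h, 2, 2), pool_out(w, 2, 2)
    shapes.append(("pool2", (50, h, w)))
    shapes.append(("ip1", (500,)))
    shapes.append(("ip2", (10,)))
    return shapes


for name, shape in lenet_shapes():
    print(name, shape)
```

So ip1 sees 50 * 5 * 5 = 1250 inputs per image, and ip2 produces one score for each of the 10 digits.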

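(4). Drawing the network structure

Paste the contents of lenet_train_test.prototxt into http://ethereon.github.io/netscope/#/editor to visualize the network structure, as shown in the figure below.

[Figure: network structure visualized in Netscope]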

The same can be done with the script that ships with the Python interface

python /home/users/fangjin/caffe/python/draw_net.py lenet_train_test.prototxt lenet_train_test_net.png

[Figure: network structure drawn by draw_net.py]

(5). Training

Just run the shell script

sh train_lenet.sh

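This produces the final model file lenet_iter_10000.caffemodel and the solver state file lenet_iter_10000.solverstate. Because the solver is configured to save a snapshot every 5000 iterations, lenet_iter_5000.caffemodel and lenet_iter_5000.solverstate are generated as well.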
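The file names follow from snapshot_prefix, snapshot and max_iter in the solver. As a small sketch, the expected files can be listed ahead of time. The naming pattern is inferred from the file names above; the extra snapshot at the end of training when max_iter is not a multiple of the interval reflects Caffe's default behaviour as I understand it, so treat that branch as an assumption. The function name snapshot_files is hypothetical.

```python
def snapshot_files(prefix, snapshot, max_iter):
    """List the snapshot files expected from a training run."""
    iters = list(range(snapshot, max_iter + 1, snapshot)) if snapshot > 0 else []
    if not iters or iters[-1] != max_iter:
        iters.append(max_iter)  # assumed: a final snapshot when training ends
    files = []
    for it in iters:
        base = "%s_iter_%d" % (prefix, it)
        files.append(base + ".caffemodel")
        files.append(base + ".solverstate")
    return files


for name in snapshot_files("lenet", 5000, 10000):
    print(name)
```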

III. Testing

You can measure the accuracy on a test set. Either use the test set above directly, or build a new test set and point the TEST data layer in lenet_train_test.prototxt at it.

#!/usr/bin/env sh
set -e
CAFFETOOL=/home/users/fangjin/caffe/build/tools

GLOG_logtostderr=1 $CAFFETOOL/caffe test \
    --model=lenet_train_test.prototxt \
    --weights=lenet_iter_10000.caffemodel \
    --iterations=100

Test output

I1222 10:31:07.233053 27436 caffe.cpp:308] Batch 45, accuracy = 0.97
I1222 10:31:07.233103 27436 caffe.cpp:308] Batch 45, loss = 0.165972
I1222 10:31:07.301012 27436 caffe.cpp:308] Batch 46, accuracy = 0.98
I1222 10:31:07.301057 27436 caffe.cpp:308] Batch 46, loss = 0.126905
I1222 10:31:07.369249 27436 caffe.cpp:308] Batch 47, accuracy = 0.97
I1222 10:31:07.369297 27436 caffe.cpp:308] Batch 47, loss = 0.161663
I1222 10:31:07.437602 27436 caffe.cpp:308] Batch 48, accuracy = 0.99
I1222 10:31:07.437649 27436 caffe.cpp:308] Batch 48, loss = 0.0438317
I1222 10:31:07.505509 27436 caffe.cpp:308] Batch 49, accuracy = 0.98
I1222 10:31:07.505556 27436 caffe.cpp:308] Batch 49, loss = 0.0786786
I1222 10:31:07.505565 27436 caffe.cpp:313] Loss: 0.134281
I1222 10:31:07.505623 27436 caffe.cpp:325] accuracy = 0.9752
I1222 10:31:07.505653 27436 caffe.cpp:325] loss = 0.134281 (* 1 = 0.134281 loss)
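The final accuracy reported by caffe test is the average of the per-batch accuracies over all iterations. As a sketch of how to recompute it from a saved log, the per-batch lines can be parsed with a regular expression. The pattern below is written against the log format shown above, and the helper names are mine. The sample contains only the five batches quoted above, so its mean differs from the overall 0.9752.

```python
import re

BATCH_ACC = re.compile(r"Batch (\d+), accuracy = ([0-9.]+)")


def batch_accuracies(log_lines):
    """Map batch index to accuracy for every 'Batch N, accuracy = X' line."""
    acc = {}
    for line in log_lines:
        match = BATCH_ACC.search(line)
        if match:
            acc[int(match.group(1))] = float(match.group(2))
    return acc


def mean_accuracy(acc):
    """Average the per-batch accuracies."""
    return sum(acc.values()) / len(acc)


log = [
    "I1222 10:31:07.233053 27436 caffe.cpp:308] Batch 45, accuracy = 0.97",
    "I1222 10:31:07.233103 27436 caffe.cpp:308] Batch 45, loss = 0.165972",
    "I1222 10:31:07.301012 27436 caffe.cpp:308] Batch 46, accuracy = 0.98",
    "I1222 10:31:07.369249 27436 caffe.cpp:308] Batch 47, accuracy = 0.97",
    "I1222 10:31:07.437602 27436 caffe.cpp:308] Batch 48, accuracy = 0.99",
    "I1222 10:31:07.505509 27436 caffe.cpp:308] Batch 49, accuracy = 0.98",
    "I1222 10:31:07.505623 27436 caffe.cpp:325] accuracy = 0.9752",
]
acc = batch_accuracies(log)
print(len(acc), round(mean_accuracy(acc), 4))
```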

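IV. Prediction

Prediction means taking a new image whose class is unknown and predicting its class.
Both the C++ interface and the Python interface are covered here.

Note first that the prediction network differs slightly from the training network, mainly in its input and output, so start by writing the prediction network definition lenet_deploy.prototxt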

name: "LeNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 64 dim: 3 dim: 32 dim: 32 } }      
  transform_param {
    scale: 0.00390625
    mean_file: "image_mean.binaryproto"
  }                                                                            
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}

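Note the difference in the data layer: it is now an Input layer that only declares the input shape. An Input layer does not apply transform_param by itself, so the same scaling and mean subtraction also have to be done by the code that feeds the image.
At the output, the Accuracy and SoftmaxWithLoss layers are removed and replaced by a Softmax layer.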
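The Softmax layer turns the 10 scores from ip2 into probabilities, and the predicted digit is the index with the highest probability. Here is a minimal sketch of that last step in plain Python, independent of Caffe; the scores are made-up numbers and the function names are only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return (predicted class index, its probability)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Hypothetical ip2 output for one image: one score per digit 0-9.
ip2 = [0.1, -1.2, 0.3, 6.5, 0.0, 1.1, -0.7, 0.2, 2.4, -0.3]
digit, prob = predict(ip2)
print(digit, prob)
```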

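(1). C++ interface

See [Caffe Source Code Study] Chapter 2: Usage (3): C++ Interface

(2). Python interface

See [Caffe Source Code Study] Chapter 2: Usage (4): Python Interface

V. Analysis

See [Caffe Source Code Study] Chapter 2: Usage (5): Model Visualization
See [Caffe Source Code Study] Chapter 2: Usage (6): Training Process Analysis Tools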
