Image Classification Tutorial with the Caffe Deep Learning Framework

Copyright notice: this is an original article by the author; do not reproduce without permission. https://blog.csdn.net/qq_31258245/article/details/75093380

Image classification with Caffe roughly breaks down into the following steps: prepare the dataset, convert it to LMDB or LEVELDB format, define the network model file, define the solver file that sets the training parameters, and deploy for prediction. These steps are described in detail below. Note: the dataset contains 1907 images in total, of which 200 are used for testing during the training phase and the rest for training. The images fall into 5 classes: bus, car, person, cat, and train. Throughout this tutorial, the "root directory" means the Caffe root directory.

  • Step 1: Prepare the dataset
    Create a new imagenet1907 folder under the data folder; it holds the raw data for this experiment. Inside it, create train and val directories to hold the images used for training and for testing during training, respectively. Also create train.txt and val.txt under imagenet1907; these two text files list the image file names together with their labels, in the following format:

train.txt

000201.jpg 1
000202.jpg 3
000203.jpg 4
000204.jpg 4
000205.jpg 4
000206.jpg 1
...

val.txt has the same format as train.txt. Write both files (generate them with a script, of course; a small generation sketch is given at the end of this step) and save them.
As mentioned at the start, the images fall into 5 classes. The labels here are the consecutive integers 0-4, so we need a map file that records which class each number stands for. Still in the same directory, write a map.txt file with the following content:

map.txt

0 bus
1 car
2 person
3 cat
4 train

This file will be needed again later.
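
If you would rather not write these list files by hand, a small Python sketch along the following lines can generate them. It assumes a hypothetical layout in which the images have already been sorted into per-class subfolders such as data/imagenet1907/train/bus/; if your images sit flat in train/ and val/, adapt the label lookup accordingly. convert_imageset accepts paths relative to the data root, so subfolder paths in the list file are fine.

import os

CLASSES = ['bus', 'car', 'person', 'cat', 'train']   # index = label, same order as map.txt
DATA_DIR = 'data/imagenet1907'

def write_list(split):
    # Each line is "<path relative to the split root> <label>", which is the
    # format convert_imageset expects.
    lines = []
    for label, cls in enumerate(CLASSES):
        folder = os.path.join(DATA_DIR, split, cls)
        for fname in sorted(os.listdir(folder)):
            lines.append('%s/%s %d\n' % (cls, fname, label))
    with open(os.path.join(DATA_DIR, split + '.txt'), 'w') as f:
        f.writelines(lines)

write_list('train')
write_list('val')

# map.txt: one "<label> <class name>" per line.
with open(os.path.join(DATA_DIR, 'map.txt'), 'w') as f:
    f.writelines('%d %s\n' % (i, c) for i, c in enumerate(CLASSES))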

  • Step 2: Convert the format
    Likewise create an imagenet1907 folder under the examples folder. Caffe already provides a conversion tool, convert_imageset; its source lives at caffe/tools/convert_imageset.cpp, and its usage is documented in the ImageNet example (caffe/examples/imagenet/readme.md). We write a script, create_imagenet.sh, to do the format conversion and to compute the mean files.

caffe/examples/imagenet1907/create_imagenet.sh

#!/usr/bin/env sh
set -e

EXAMPLE=examples/imagenet1907
DATA=data/imagenet1907
TOOLS=build/tools

TRAIN_DATA_ROOT=data/imagenet1907/train/
VAL_DATA_ROOT=data/imagenet1907/val/

# Set RESIZE=true to resize the images to 128x128. Leave as false if images have
# already been resized using another tool.
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=128
  RESIZE_WIDTH=128
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

# Set ENCODE=true to encode the images as compressed JPEGs stored in the LMDB.
# Leave as false for uncompressed (raw) images.
ENCODE=true
if $ENCODE; then
  ENCODE_FLAG='--encoded=true'
  ENCODE_TYPE_FLAG='--encode_type=jpg'
else
  ENCODE_FLAG='--encoded=false'
  ENCODE_TYPE_FLAG=''
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

rm -rf $EXAMPLE/imagenet1907_train_lmdb
rm -rf $EXAMPLE/imagenet1907_val_lmdb

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    $ENCODE_FLAG \
    $ENCODE_TYPE_FLAG \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/imagenet1907_train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    $ENCODE_FLAG \
    $ENCODE_TYPE_FLAG \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/imagenet1907_val_lmdb

echo "Compute train mean..."

$TOOLS/compute_image_mean $EXAMPLE/imagenet1907_train_lmdb \
  $DATA/train_mean.binaryproto

echo "Compute val mean..."

$TOOLS/compute_image_mean $EXAMPLE/imagenet1907_val_lmdb \
  $DATA/val_mean.binaryproto

echo "Done."

Run this script from the root directory (you may need sudo). It produces the two LMDB folders and the two mean files.

sudo ./examples/imagenet1907/create_imagenet.sh
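
As an optional sanity check, you can read a few records back from the newly created LMDB with pycaffe and the lmdb Python package. This is only a rough sketch; the paths match the script above.

import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open('examples/imagenet1907/imagenet1907_train_lmdb', readonly=True)
with env.begin() as txn:
    for i, (key, value) in enumerate(txn.cursor()):
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        # With --encoded=true the pixel data is a compressed JPEG blob, so only
        # the label and the encoded flag are meaningful to print here.
        print(key, 'label=%d' % datum.label, 'encoded=%s' % datum.encoded)
        if i >= 4:                      # just peek at the first five records
            break
env.close()
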
  • Step 3: Define the network model file
    There are plenty of image classification network definitions available online; we can take one, modify a few things, and use it. The main changes are the mean_file and source paths of the input layers, which must point to the two folders and two files generated in the previous step.

caffe/examples/imagenet1907/train_val.prototxt

name: "ImageNet1907"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    mean_file: "data/imagenet1907/train_mean.binaryproto"
    crop_size: 31
  }
  data_param {
    source: "examples/imagenet1907/imagenet1907_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    mean_file: "data/imagenet1907/val_mean.binaryproto"
    crop_size: 31
  }
  data_param {
    source: "examples/imagenet1907/imagenet1907_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 5
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

The num_output of the last fully connected layer must be changed to 5, because our classification has 5 possible results, so the final fully connected layer must output 5 values. I overlooked this at first, the trained model then failed at test time, and I had to retrain...
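
A quick way to catch this kind of mistake before training is to build the net with pycaffe and print the blob shapes; the ip2 blob should report 5 outputs. This is only a sketch and assumes the LMDBs and mean files from step 2 already exist, since the Data layers open them when the net is constructed.

import caffe

caffe.set_mode_cpu()
net = caffe.Net('examples/imagenet1907/train_val.prototxt', caffe.TEST)
for name, blob in net.blobs.items():
    # Print every blob's shape; ip2 should end up as (batch, 5), one output per class.
    print('%-10s %s' % (name, blob.data.shape))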

Next we need a solver file, which sets things like how many iterations pass between tests, the maximum number of iterations, and the prefix of the model snapshot files.

caffe/examples/imagenet1907/solver.prototxt

net: "examples/imagenet1907/train_val.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "fixed"
display: 100
max_iter: 5000
snapshot_after_train: true
snapshot_prefix: "examples/imagenet1907/imagenet1907_train"
solver_mode: GPU
# The two fields below are only understood by some distributed Caffe forks; remove them
# (or keep them commented out) when using standard BVLC Caffe.
# data_distribute_mode: MANUALLY
# model_average_iter_interval: 1

The net parameter is the path to the train_val.prototxt we just defined.
test_iter is the number of test batches run each time the network is tested; test_iter multiplied by the TEST-phase batch_size should cover the validation set (here 200 images / 50 per batch = 4 batches, so a value of 100 simply keeps cycling over the validation data).
test_interval is the number of training iterations between two tests, i.e. how often the test accuracy is reported.
max_iter is the maximum number of training iterations.
snapshot_prefix is the path prefix of the model files written after training; imagenet1907_train becomes the prefix of the generated model file names.
solver_mode selects whether training runs on the CPU or the GPU.

That is all we need to say about the solver for now.

  • Step 4: Train the model
    Steps 1-3 above are all preparation; training the model is the time-consuming part, and how long it takes depends on your machine's performance.
    Run the following command from the root directory to start training. If it runs without errors, you can sit back and enjoy; it takes quite a while...
./build/tools/caffe train --solver=examples/imagenet1907/solver.prototxt $@

When training finishes, a .caffemodel file is generated; this model is what we use to predict images.
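
If you prefer driving training from Python, roughly the same thing can be done with pycaffe. This is a sketch; run it from the Caffe root so the relative paths in solver.prototxt resolve.

import caffe

caffe.set_mode_gpu()                  # or caffe.set_mode_cpu() if no GPU is available
solver = caffe.SGDSolver('examples/imagenet1907/solver.prototxt')
solver.solve()                        # trains up to max_iter and writes the snapshot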

  • Step 5: Deploy and predict
    To predict on images we also need a deploy.prototxt. Its content is largely the same as the net definition file: copy the train_val.prototxt contents, delete the training and testing input layers, and replace them with a new Input layer (the accuracy layer is dropped and the final loss layer is replaced by a Softmax layer, as shown below).

caffe/examples/imagenet1907/deploy.prototxt

name: "ImageNet1907"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 10 dim: 3 dim: 31 dim: 31 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  inner_product_param {
    num_output: 64
  }
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 5
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}

Once this is done, we can take an image and run a prediction on it.
Run the following command from the root directory, substituting the name of the .caffemodel snapshot produced in step 4 (it follows from the solver's snapshot_prefix and max_iter):

sudo ./build/examples/cpp_classification/classification.bin \
examples/imagenet1907/deploy.prototxt \
examples/imagenet1907/your_iter_5000.caffemodel \
data/imagenet1907/train_mean.binaryproto \
data/imagenet1907/map.txt \
examples/images/cat.jpg

The classification.bin tool takes 5 command-line arguments:
1. the deploy file
2. the model (.caffemodel) file
3. the mean file
4. the map (labels) file
5. the image file

The terminal then prints something like the following:

---------- Prediction for examples/images/cat.jpg ----------
0.6014 - "1 car"
0.1483 - "4 train"
0.0903 - "2 person"
0.0841 - "0 bus"
0.0760 - "3 cat"
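
The same prediction can also be reproduced from pycaffe, which is handy if you want to embed the model in a larger script. The sketch below is a rough equivalent of classification.bin, not the official tool; the .caffemodel name is an assumption that follows from the solver's snapshot_prefix and max_iter.

import caffe
from caffe.proto import caffe_pb2

# Load the mean image written by compute_image_mean and reduce it to a
# per-channel mean (the deploy input is 31x31, the mean image is 128x128).
blob = caffe_pb2.BlobProto()
with open('data/imagenet1907/train_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]           # shape (3, H, W)
channel_mean = mean.mean(1).mean(1)                   # shape (3,)

net = caffe.Net('examples/imagenet1907/deploy.prototxt',
                'examples/imagenet1907/imagenet1907_train_iter_5000.caffemodel',
                caffe.TEST)

# Preprocess: HxWxC RGB float image in [0,1] -> CxHxW BGR in [0,255], mean-subtracted.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', channel_mean)
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

image = caffe.io.load_image('examples/images/cat.jpg')
net.blobs['data'].reshape(1, 3, 31, 31)               # predict a single 31x31 input
net.blobs['data'].data[...] = transformer.preprocess('data', image)
probs = net.forward()['prob'][0]

labels = [line.strip() for line in open('data/imagenet1907/map.txt')]
for idx in probs.argsort()[::-1][:5]:                 # top-5, like classification.bin
    print('%.4f - "%s"' % (probs[idx], labels[idx]))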

As a newcomer to this area I stepped into quite a few pitfalls along the way; if you run into problems, feel free to ask in the comments.
