caffe-SSD Source Code Walkthrough: Generating the Data Lists and the Dataset

MAUM

Categories: Deep Learning, caffe. Tags: caffe-SSD, dataset creation

Article link: https://blog.csdn.net/maum61/article/details/99606726


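Two .sh scripts are involved in generating the data:

caffe/data/VOC0712/create_list.sh

caffe/data/VOC0712/create_data.sh

Following the author's instructions on GitHub, run them in that order.

Let's go through the source in detail, since I have never learned shell scripting, have only just started with Python, and at most know some C++.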

#!/bin/bash

root_dir=$HOME/data/VOCdevkit/
sub_dir=ImageSets/Main
# directory of the currently executing script, i.e. $CAFFE_ROOT/data/VOC0712
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# dataset is trainval or test
for dataset in trainval test
do
  dst_file=$bash_dir/$dataset.txt
  if [ -f $dst_file ]
  then
    rm -f $dst_file
  fi
  for name in VOC2007 VOC2012
  do
    # continue skips the rest of the current iteration
    if [[ $dataset == "test" && $name == "VOC2012" ]]
    then
      continue
    fi
    echo "Create list for $name $dataset..."
    # path of this dataset's list file (trainval.txt or test.txt)
    dataset_file=$root_dir/$name/$sub_dir/$dataset.txt

    img_file=$bash_dir/$dataset"_img.txt"
    # copy the file contents to $img_file
    cp $dataset_file $img_file
    # prepend $name/JPEGImages/ to each line
    sed -i "s/^/$name\/JPEGImages\//g" $img_file
    # add .jpg at the end of each line
    sed -i "s/$/.jpg/g" $img_file
    # create label file
    label_file=$bash_dir/$dataset"_label.txt"
    cp $dataset_file $label_file
    # prepend $name/Annotations/ to each line
    sed -i "s/^/$name\/Annotations\//g" $label_file
    # add .xml at the end of each line
    sed -i "s/$/.xml/g" $label_file
    # join the two files line by line, separated by a single space
    paste -d' ' $img_file $label_file >> $dst_file

    rm -f $label_file
    rm -f $img_file
  done

  # Generate image name and size information.
  if [ $dataset == "test" ]
  then      #                              [root folder   list file   output file]
    $bash_dir/../../build/tools/get_image_size $root_dir $dst_file $bash_dir/$dataset"_name_size.txt"
  fi

  # Shuffle trainval file.
  if [ $dataset == "trainval" ]
  then
    rand_file=$dst_file.random
    cat $dst_file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' > $rand_file
    mv $rand_file $dst_file
  fi
done

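create_list.sh ultimately produces test.txt, trainval.txt, and test_name_size.txt. The script first sets the dataset root directory root_dir, the subdirectory sub_dir, and the directory of the script itself, bash_dir. It then looks inside the VOC2007 and VOC2012 dataset folders for the trainval and test .txt files shipped with the datasets, which list the names of all images (without the .jpg suffix). If an output file from a previous run already exists, it is deleted with rm.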
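The bash_dir line is the least obvious of the three assignments, so here is a minimal sketch of what it does. The throwaway script where.sh and the temporary directory are made up for the demonstration; only the bash_dir expression itself comes from create_list.sh.

```shell
#!/bin/bash
# Sketch: a throwaway script reports its own directory,
# no matter which working directory it is called from.
demo_dir=$(mktemp -d)
cat > "$demo_dir/where.sh" <<'EOF'
#!/bin/bash
# Same expression as in create_list.sh
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "$bash_dir"
EOF

# Call the script from a different working directory
reported=$(cd / && bash "$demo_dir/where.sh")
# Reference value: the absolute path of the directory holding where.sh
expected=$(cd "$demo_dir" && pwd)

echo "$reported"
[ "$reported" = "$expected" ] && echo "script located its own directory"
rm -rf "$demo_dir"
```

This is why create_list.sh can be started from any directory and still write its output next to itself in data/VOC0712.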

if [[ $dataset == "test" && $name == "VOC2012" ]]
then
  continue
fi

This check skips VOC2012 for the test split, because the VOC2012 data used here does not contain a test.txt.
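To see which combinations survive the continue, the two nested loops can be reduced to a sketch that only prints the pairs it would process. The list_pairs function is a hypothetical helper for illustration; the loop values and the condition are copied from the script.

```shell
#!/bin/bash
# Sketch: print the (dataset, name) pairs that the nested loops process.
list_pairs() {
  local dataset name
  for dataset in trainval test
  do
    for name in VOC2007 VOC2012
    do
      # Skip the one combination that has no list file
      if [[ $dataset == "test" && $name == "VOC2012" ]]
      then
        continue
      fi
      echo "$dataset $name"
    done
  done
}

list_pairs
```

It prints trainval VOC2007, trainval VOC2012, and test VOC2007, so trainval.txt is built from both years while test.txt comes from VOC2007 only.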

dataset_file=$root_dir/$name/$sub_dir/$dataset.txt

This line locates the .txt list file inside the dataset. Using dataset_file as a template, the script then creates an image list and an annotation list from it:

# prepend $name/JPEGImages/ to each line
sed -i "s/^/$name\/JPEGImages\//g" $img_file
# add .jpg at the end of each line
sed -i "s/$/.jpg/g" $img_file

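For img_file, VOC2007/JPEGImages/ or VOC2012/JPEGImages/ is prepended to every line to give the path, and the .jpg suffix is appended to every line. label_file is built the same way, except that the directory is Annotations/ and the suffix is .xml. The two files are then merged line by line with a single space in between: paste -d' ' $img_file $label_file >> $dst_file. Note that $ expands a variable to its value. The final outputs are caffe/data/VOC0712/test.txt and caffe/data/VOC0712/trainval.txt.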
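The sed and paste steps can be tried on a tiny list without touching the real dataset. The following is a sketch under assumptions: the two image IDs are made up, and build_pairs is a hypothetical helper that is not part of create_list.sh.

```shell
#!/bin/bash
# Sketch: build "image annotation" pairs from a made-up list of image IDs.
build_pairs() {
  local name=$1 id_file=$2
  local img_file label_file
  img_file=$(mktemp)
  label_file=$(mktemp)
  # Prepend <name>/JPEGImages/ to each ID and append .jpg
  sed -e "s/^/$name\/JPEGImages\//" -e 's/$/.jpg/' "$id_file" > "$img_file"
  # Prepend <name>/Annotations/ to each ID and append .xml
  sed -e "s/^/$name\/Annotations\//" -e 's/$/.xml/' "$id_file" > "$label_file"
  # Join the two files line by line, separated by one space
  paste -d' ' "$img_file" "$label_file"
  rm -f "$img_file" "$label_file"
}

ids=$(mktemp)
printf '000001\n000002\n' > "$ids"
pairs=$(build_pairs VOC2007 "$ids")
echo "$pairs"
rm -f "$ids"
```

For the two sample IDs this prints:

VOC2007/JPEGImages/000001.jpg VOC2007/Annotations/000001.xml
VOC2007/JPEGImages/000002.jpg VOC2007/Annotations/000002.xml

Unlike the original, the sketch redirects sed's output instead of using sed -i, so it behaves the same with GNU and BSD sed.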

For the test set, the script additionally calls the get_image_size tool (built from get_image_size.cpp) to obtain the size of every image.

$bash_dir/../../build/tools/get_image_size $root_dir $dst_file $bash_dir/$dataset"_name_size.txt"

The arguments are, in order: [root folder] [list file] [output file]. The result is caffe/data/VOC0712/test_name_size.txt.

trainval.txt is additionally shuffled into random order. That covers create_list.sh.

create_data.sh is shorter:

#!/bin/bash
cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
#root_dir=/home/nvidia/PycharmProjects/caffe_ssd/caffe
root_dir=$cur_dir/../../

cd $root_dir
redo=1
data_root_dir="$HOME/data/VOCdevkit"
dataset_name="VOC0712"
mapfile="$root_dir/data/$dataset_name/labelmap_voc.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in test trainval
do
  python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db examples/$dataset_name
done

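The script first gets the directory of the current file and derives the caffe-ssd root directory from it. It then sets the data directory, the LMDB settings, and so on, locates the VOC class description file labelmap_voc.prototxt, and calls $root_dir/scripts/create_annoset.py to generate the data. The call takes quite a few arguments. The list file it uses is the $root_dir/data/$dataset_name/$subset.txt generated in the previous step (with for subset in test trainval, data_root_dir="$HOME/data/VOCdevkit", and dataset_name="VOC0712").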

The next argument is the path of the generated database (with db="lmdb"):

$data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db
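To make the expansion concrete, here is a small sketch that prints the resulting path for each subset. /home/user is only a placeholder for $HOME, and db_path is a hypothetical helper. The variable values are the ones set in create_data.sh.

```shell
#!/bin/bash
# Sketch: expand the database path for a given subset.
# /home/user stands in for $HOME and is only a placeholder.
data_root_dir="/home/user/data/VOCdevkit"
dataset_name="VOC0712"
db="lmdb"

db_path() {
  local subset=$1
  # Adjacent quoted and unquoted pieces are concatenated into one word.
  # The underscores sit outside the variable names on purpose:
  # $dataset_name_ would be read as a variable called dataset_name_.
  echo "$data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db"
}

db_path test
db_path trainval
```

This prints /home/user/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb and /home/user/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb.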

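The last argument, examples/$dataset_name, is where a link to the database is created. It is a relative path, and since the script has already run cd $root_dir, it resolves to $root_dir/examples/$dataset_name rather than to the directory the .sh was launched from. The argument is described as: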

The directory to store the link of the database files.
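One detail of create_data.sh that is easy to misread is the redo flag. The test [ $redo ] only checks whether the value is a non-empty string, so setting redo=0 does not switch --redo off. A minimal sketch of that behavior, with redo_enabled as a hypothetical helper that wraps the same test:

```shell
#!/bin/bash
# Sketch: how the unquoted test from create_data.sh reacts to different values.
redo_enabled() {
  local redo=$1
  # Same test as in the script: true for any non-empty string
  if [ $redo ]
  then
    echo "yes"
  else
    echo "no"
  fi
}

redo_enabled 1    # prints "yes"
redo_enabled 0    # prints "yes" as well, because "0" is a non-empty string
redo_enabled ""   # prints "no"
```

To actually disable --redo, leave the variable empty (redo=), or change the condition to an explicit comparison such as [ "$redo" = 1 ].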
