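Put the data you want to convert in a folder named Caffe_DataMaker and split it, by a ratio of your choice, into a training-set folder train and a validation (test) set folder val. Inside train, put each class's images in its own subfolder, named 0, 1, … (numbering starts at 0). Do the same inside val.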
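The ratio split described above can be done by hand. If your images start out as one folder per class, a minimal Python sketch can do it instead. Its assumptions are:

- a hypothetical source layout `src_root/<label>/<image>`;
- an 80/20 default ratio;
- a fixed random seed.

`split_by_ratio` is an illustrative name, not part of Caffe.

```python
import os
import random
import shutil


def split_by_ratio(src_root, dst_root, train_ratio=0.8, seed=0):
    """Copy src_root/<label>/* into dst_root/train/<label>/ and dst_root/val/<label>/.

    Hypothetical helper: the src_root layout, the 0.8 default ratio and the
    seed are assumptions, not something Caffe requires.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for label in sorted(os.listdir(src_root)):
        class_dir = os.path.join(src_root, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)
        n_train = int(len(files) * train_ratio)
        for phase, subset in (("train", files[:n_train]), ("val", files[n_train:])):
            out_dir = os.path.join(dst_root, phase, label)
            os.makedirs(out_dir, exist_ok=True)
            for name in subset:
                shutil.copy2(os.path.join(class_dir, name), os.path.join(out_dir, name))
        counts[label] = (n_train, len(files) - n_train)  # (train, val) per class
    return counts
```

For example, `split_by_ratio('/path/to/raw_images', '/home/pcb/caffe/examples/Caffe_DataMaker')` would fill Caffe_DataMaker/train and Caffe_DataMaker/val. The source path is a placeholder. Files are copied rather than moved, so the raw data is left untouched.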
The list files train.txt and val.txt are generated with the following Python script:

```python
# -*- coding: utf-8 -*-
"""
Generate the image list files (train.txt / val.txt) that Caffe's
convert_imageset tool reads to build LMDB data for classification.
"""
import io
import os


def caffe_input_txt_maker(data_folder, outfile_name, phase='train'):
    file_cnt = 0   # number of images written
    class_cnt = 0  # number of class folders
    # Text mode: the lines are str, which a binary ('wb+') file rejects in Python 3
    with io.open(outfile_name, 'w', encoding='utf-8', newline='\n') as fobj:
        for folder_name in sorted(os.listdir(data_folder)):
            folder_path = os.path.join(data_folder, folder_name)
            if not os.path.isdir(folder_path):
                continue  # skip loose files next to the class folders
            # Label = folder name, or the part before "__" (e.g. "0__cat" -> "0")
            label = folder_name.split('__')[0]
            class_cnt += 1
            for file_name in sorted(os.listdir(folder_path)):
                file_cnt += 1
                if phase == 'train':
                    # keep the class folder in the path: "<class>/<image>"
                    file_path = folder_name + '/' + file_name
                else:
                    # phase 'test': bare file name, so val images must later sit directly in val/
                    file_path = file_name
                fobj.write(u'%s %s\n' % (file_path, label))
    print('Done: %d classes, %d images -> %s' % (class_cnt, file_cnt, outfile_name))


if __name__ == "__main__":
    caffe_input_txt_maker(data_folder='/home/pcb/caffe/examples/Caffe_DataMaker/train',
                          outfile_name="/home/pcb/caffe/examples/Caffe_DataMaker/train.txt", phase='train')
    caffe_input_txt_maker(data_folder='/home/pcb/caffe/examples/Caffe_DataMaker/val',
                          outfile_name="/home/pcb/caffe/examples/Caffe_DataMaker/val.txt", phase='test')
```
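data_folder is the folder that holds the images (train or val), and outfile_name is the list file to write.

- Each line of train.txt has the form `<class folder>/<image file> <label>`.
- Each line of val.txt has the form `<image file> <label>`, with no class folder in front.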
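Before running convert_imageset, it can help to check that every entry in a list file points to a real image. convert_imageset reads each image from the root folder passed on its command line, concatenated with the path in the list.

The sketch below is hypothetical (`check_list_file` is not a Caffe function). It assumes each line is split at its last space into path and integer label, which is how recent versions of convert_imageset parse the list.

```python
import os


def check_list_file(root, list_path):
    """Return (line number, problem) pairs for a convert_imageset list file.

    Hypothetical helper: assumes "<relative path> <integer label>" per line,
    split at the last space, with images stored under root.
    """
    problems = []
    with open(list_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            path, sep, label = line.rpartition(" ")
            if not sep or not label.isdigit():
                problems.append((lineno, "bad format: %r" % line))
            elif not os.path.isfile(os.path.join(root, path)):
                problems.append((lineno, "missing file: %s" % path))
    return problems
```

For instance, `check_list_file('/home/pcb/caffe/examples/Caffe_DataMaker/train/', '/home/pcb/caffe/examples/Caffe_DataMaker/train.txt')` should return an empty list when train.txt is consistent with the train folder.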
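Note: once val.txt has been generated, move all images in the validation folder val out of their class folders and directly into val/. The paths in val.txt carry no subdirectory prefix, so convert_imageset looks for each image directly under VAL_DATA_ROOT.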
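Moving the validation images can be scripted too. Below is a minimal Python 3 sketch; `flatten_val_folder` is a hypothetical name. It moves every file from `val/<class>/` up into `val/` and removes the emptied class folders.

It refuses to overwrite an existing file. Two classes containing the same file name would otherwise silently lose an image, and in that case val.txt would also list the name twice. Files moved before such a clash is found stay moved.

```python
import os
import shutil


def flatten_val_folder(val_root):
    """Move val_root/<class>/<image> to val_root/<image>; return the number moved.

    Hypothetical helper. Raises FileExistsError on a file-name clash.
    """
    moved = 0
    for class_name in sorted(os.listdir(val_root)):  # snapshot taken before any move
        class_dir = os.path.join(val_root, class_name)
        if not os.path.isdir(class_dir):
            continue  # images already at the top level stay where they are
        for name in sorted(os.listdir(class_dir)):
            target = os.path.join(val_root, name)
            if os.path.exists(target):
                raise FileExistsError(target)  # would overwrite another image
            shutil.move(os.path.join(class_dir, name), target)
            moved += 1
        os.rmdir(class_dir)  # succeeds only if the class folder is now empty
    return moved
```

Run it only after val.txt has been written, because the list script needs the class folders to derive the labels.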
Next, write create_imagenet.sh as follows:
```sh
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e

EXAMPLE=/home/pcb/caffe/examples/Caffe_DataMaker  # where train_lmdb and val_lmdb are written
DATA=/home/pcb/caffe/examples/Caffe_DataMaker     # where train.txt and val.txt are
TOOLS=/home/pcb/caffe/build/tools

TRAIN_DATA_ROOT=/home/pcb/caffe/examples/Caffe_DataMaker/train/  # training images
VAL_DATA_ROOT=/home/pcb/caffe/examples/Caffe_DataMaker/val/      # validation images

# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=false
if $RESIZE; then
  RESIZE_HEIGHT=256
  RESIZE_WIDTH=256
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/val_lmdb

echo "Done."
```
After saving the script, change to the Caffe_DataMaker directory in a terminal and run `sh create_imagenet.sh`. This creates train_lmdb and val_lmdb inside Caffe_DataMaker, and the data is ready.
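If you run the script a second time, delete the old train_lmdb and val_lmdb first. With the LMDB backend, convert_imageset creates the output directory itself and aborts if it already exists. A minimal Python sketch follows; `remove_stale_lmdb` is a hypothetical helper, and the default directory names match the ones used in create_imagenet.sh.

```python
import os
import shutil


def remove_stale_lmdb(example_dir, names=("train_lmdb", "val_lmdb")):
    """Delete LMDB output folders left by an earlier run; return the names removed.

    Hypothetical helper: names defaults to the outputs of create_imagenet.sh.
    """
    removed = []
    for name in names:
        path = os.path.join(example_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)  # removes data.mdb / lock.mdb along with the folder
            removed.append(name)
    return removed
```

Calling `remove_stale_lmdb('/home/pcb/caffe/examples/Caffe_DataMaker')` before `sh create_imagenet.sh` makes the run repeatable. The same effect can be had with `rm -rf` in the shell.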