Building lmdb data sources for Caffe with Python

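Put the data to be converted in the Caffe_DataMaker folder and split it by ratio into a training-set folder, train, and a validation (test) set folder, val. Inside train, place the images of each class in their own subfolder, named 0, 1, ... (numbering starts at 0), and do the same inside val. The Python code used to generate the image list files is: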

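# -*- coding: utf-8 -*-
"""
Generate the image list files (train.txt / val.txt) used to build
lmdb data sources for Caffe classification.
"""
import os
def caffe_input_txt_maker(data_folder,outfile_name, phase = 'train'):
    # Count the listed images and classes
    file_cnt = 0
    class_cnt = 0
    # Open in text mode: the lines written below are str, not bytes
    with open(outfile_name, 'w') as fobj:
        for folder_name in sorted(os.listdir(data_folder)):
            folder_path = os.path.join(data_folder, folder_name)
            if not os.path.isdir(folder_path):
                continue  # skip stray files next to the class folders
            # The label is the folder name (the part before '__', if any)
            label = folder_name.split('__')[0]
            class_cnt += 1
            for file_name in sorted(os.listdir(folder_path)):
                file_cnt += 1
                if phase == 'train':
                    # Keep the class folder name in the path
                    file_path = folder_name + '/' + file_name
                elif phase == 'test':
                    # File name only, without the class folder
                    file_path = file_name
                else:
                    raise ValueError("phase must be 'train' or 'test'")
                fobj.write(file_path + " " + str(label) + '\n')

    file_dir, base_name = os.path.split(outfile_name)
    file_name, ext = os.path.splitext(base_name)

    # Optional: rename the output so it carries the class and file counts
    #new_outfile_name = file_dir + '/' + file_name + '_%d_%d' % (class_cnt, file_cnt) + ext
    #if os.path.exists(new_outfile_name): os.remove(new_outfile_name)
    #os.rename(outfile_name, new_outfile_name)
    print('Done')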

if __name__ == "__main__":
    caffe_input_txt_maker(data_folder = '/home/pcb/caffe/examples/Caffe_DataMaker/train',
                          outfile_name = "/home/pcb/caffe/examples/Caffe_DataMaker/train.txt", phase = 'train')
    caffe_input_txt_maker(data_folder = '/home/pcb/caffe/examples/Caffe_DataMaker/val',
                          outfile_name = "/home/pcb/caffe/examples/Caffe_DataMaker/val.txt", phase = 'test')
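
The script assumes the train and val folders already exist. As a minimal sketch of how the split by ratio could be done, the helper below copies a fraction of each class into val and the rest into train. The function name split_dataset, the source layout (one subfolder per class under a src_root folder), and the default ratio of 0.2 are assumptions made for illustration; they are not part of the original workflow.

```python
import os
import random
import shutil


def split_dataset(src_root, dst_root, val_ratio=0.2, seed=0):
    """Copy src_root/<class>/* into dst_root/train/<class> and dst_root/val/<class>.

    Hypothetical helper: returns (n_train, n_val), the number of files copied.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = n_val = 0
    for class_name in sorted(os.listdir(src_root)):
        class_dir = os.path.join(src_root, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)
        n_for_val = int(len(files) * val_ratio)
        for i, file_name in enumerate(files):
            # The first n_for_val shuffled files go to val, the rest to train
            subset = 'val' if i < n_for_val else 'train'
            out_dir = os.path.join(dst_root, subset, class_name)
            if not os.path.isdir(out_dir):
                os.makedirs(out_dir)
            shutil.copy(os.path.join(class_dir, file_name), out_dir)
            if subset == 'val':
                n_val += 1
            else:
                n_train += 1
    return n_train, n_val
```

The files are copied rather than moved, so the source folder is left untouched and the split can be repeated with a different ratio or seed.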

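In caffe_input_txt_maker, data_folder is the folder that holds the images to list and outfile_name is the list file to write. For the training set the output is train.txt, in which each line is the path class_folder/file_name, then a space, then the label:
[Screenshot: contents of train.txt]
val.txt has the same layout, except that each line starts with the bare file name, with no class folder in front:
[Screenshot: contents of val.txt]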
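Note: once val.txt has been generated, move all the images in the validation-set folder val out of their class subfolders and directly into val, because the image paths in val.txt carry no subdirectory prefix.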
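
The move described in the note can also be scripted. The following is a minimal sketch, assuming it is run only after val.txt has been written; the function name flatten_val_folder is hypothetical. It refuses to overwrite a file when two classes contain the same file name, since val.txt would then be ambiguous.

```python
import os
import shutil


def flatten_val_folder(val_root):
    """Move val_root/<class>/<file> to val_root/<file>; return the number moved."""
    moved = 0
    for class_name in sorted(os.listdir(val_root)):
        class_dir = os.path.join(val_root, class_name)
        if not os.path.isdir(class_dir):
            continue  # files already at the top level are left alone
        for file_name in sorted(os.listdir(class_dir)):
            target = os.path.join(val_root, file_name)
            if os.path.exists(target):
                # Two classes share a file name: stop instead of overwriting
                raise ValueError('duplicate file name: ' + file_name)
            shutil.move(os.path.join(class_dir, file_name), target)
            moved += 1
        os.rmdir(class_dir)  # the class folder is empty at this point
    return moved
```

With the paths used in this post, the call would be flatten_val_folder('/home/pcb/caffe/examples/Caffe_DataMaker/val').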
Next, write create_imagenet.sh with the following contents:

#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e

EXAMPLE=/home/pcb/caffe/examples/Caffe_DataMaker  # where train_lmdb and val_lmdb are written
DATA=/home/pcb/caffe/examples/Caffe_DataMaker     # where train.txt and val.txt are stored
TOOLS=/home/pcb/caffe/build/tools

TRAIN_DATA_ROOT=/home/pcb/caffe/examples/Caffe_DataMaker/train/  # training images
VAL_DATA_ROOT=/home/pcb/caffe/examples/Caffe_DataMaker/val/      # validation images


# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=false
if $RESIZE; then
  RESIZE_HEIGHT=256
  RESIZE_WIDTH=256
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/val_lmdb

echo "Done."

Once the script is written, change to the Caffe_DataMaker directory in a terminal and run sh create_imagenet.sh. This generates train_lmdb and val_lmdb in the Caffe_DataMaker folder, and the job is done!
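
Before or after running the script, it can be useful to confirm that every path in train.txt and val.txt resolves to a file under the corresponding data root. The following is a minimal sketch of such a check; the function name find_missing_images is an assumption for illustration, and it relies only on the "path label" line format written by caffe_input_txt_maker.

```python
import os


def find_missing_images(list_file, data_root):
    """Return the relative paths in list_file that do not exist under data_root."""
    missing = []
    with open(list_file) as fobj:
        for line in fobj:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Each line is "<relative path> <label>"; drop the trailing label
            rel_path = line.rsplit(' ', 1)[0]
            if not os.path.isfile(os.path.join(data_root, rel_path)):
                missing.append(rel_path)
    return missing
```

An empty list for both find_missing_images(train.txt, TRAIN_DATA_ROOT) and find_missing_images(val.txt, VAL_DATA_ROOT) means the list files and the image folders agree; for val this only holds after the images have been moved out of their class subfolders.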
