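Create a Dataset Using RecordIO

RecordIO is a sequential record file format. MXNet recommends packing image files together and storing the images as records. The advantages include: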

  • Storing images in a compact format, such as JPEG, greatly reduces the size of the dataset on disk.
  • Packing data together allows continuous reading from disk.
  • RecordIO files are simple to partition, which simplifies distributed settings. We provide an example later.
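
To make the partitioning point concrete, here is a minimal sketch of how a sequence of records could be divided among workers. It illustrates the idea only and is not MXNet's implementation; the function name partition_range is hypothetical. The parameter names num_parts and part_index mirror the options that mx.io.ImageRecordIter exposes for this purpose, but the arithmetic below is an assumption made for illustration.

```python
def partition_range(num_records, num_parts, part_index):
    """Return the [start, end) record range owned by one worker.

    Hypothetical helper: it only illustrates splitting a sequential
    file into contiguous, non-overlapping parts.
    """
    if not 0 <= part_index < num_parts:
        raise ValueError("part_index must be in [0, num_parts)")
    # Integer arithmetic spreads any remainder across the parts.
    start = num_records * part_index // num_parts
    end = num_records * (part_index + 1) // num_parts
    return start, end


# Ten records shared by three workers.
ranges = [partition_range(10, 3, i) for i in range(3)]
```

Because each worker reads one contiguous range, it can still read its share of the file sequentially.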

 

Prerequisites

Download the data. You don't need to resize the images manually; im2rec can resize them automatically. For details, see the resize option in the sample command under "Step 2. Create the Binary File," later in this topic.

Step 1. Make an Image List File

  • Note that im2rec.py provides a --list parameter that generates the list for you, but im2rec.cc doesn't support it.

After you download the data, you need to make an image list file. The format is:

integer_image_index \t label_index \t path_to_image

Typically, a program takes the list of the names of all of the images, shuffles them, then separates them into two lists: a training filename list and a testing filename list. Each list must be written in the format shown above. This is an example file:

95099  464.000000     n04467665_17283.JPEG
10025081        412.000000     ILSVRC2010_val_00025082.JPEG
74181   789.000000     n01915811_2739.JPEG
10035553        859.000000     ILSVRC2010_val_00035554.JPEG
10048727        929.000000     ILSVRC2010_val_00048728.JPEG
94028   924.000000     n01980166_4956.JPEG
1080682 650.000000     n11807979_571.JPEG
972457  633.000000     n07723039_1627.JPEG
7534    11.000000      n01630670_4486.JPEG
1191261 249.000000     n12407079_5106.JPEG
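
The shuffle-and-split procedure described above can be sketched in Python as follows. This is a minimal illustration, not the im2rec.py implementation. The function names, the train_fraction parameter, and the choice to assign indices before shuffling are all assumptions made for this example.

```python
import random


def make_image_lists(image_labels, train_fraction=0.9, seed=0):
    """Shuffle (path, label) pairs and split them into train/test list lines.

    Each line follows: integer_image_index \t label_index \t path_to_image
    """
    # Assign a unique integer index to every image before shuffling.
    items = [(i, path, label) for i, (path, label) in enumerate(image_labels)]
    random.Random(seed).shuffle(items)
    lines = [f"{i}\t{float(label):f}\t{path}" for i, path, label in items]
    split = int(len(lines) * train_fraction)
    return lines[:split], lines[split:]


def write_list(lines, out_path):
    """Write the list lines to a file, one record per line."""
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")


# Hypothetical input: file names and labels taken from the example above.
sample = [
    ("n04467665_17283.JPEG", 464),
    ("n01915811_2739.JPEG", 789),
    ("n01980166_4956.JPEG", 924),
    ("n01630670_4486.JPEG", 11),
]
train_lines, test_lines = make_image_lists(sample, train_fraction=0.75)
```

Calling write_list(train_lines, "train.lst") would then produce a file that can be passed to im2rec; the file name here is only an example.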

Step 2. Create the Binary File

To generate the binary file, use im2rec in the tool folder. im2rec takes the path of the image list file you generated, the root path of the images, and the output file path as input. This process can take several hours for a large dataset, so be patient.

Sample command:

./bin/im2rec image.lst image_root_dir output.bin resize=256

For more details, run ./bin/im2rec.
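
If you drive im2rec from a script, the sample command can be assembled as in the sketch below. The helper build_im2rec_command is hypothetical and only rebuilds the argument layout shown in the sample command (three positional paths followed by key=value options); it does not validate the option names.

```python
def build_im2rec_command(list_path, image_root, output_path, **options):
    """Build the im2rec argument list in the layout of the sample command.

    Hypothetical helper: options are passed through as key=value strings
    without checking that im2rec supports them.
    """
    cmd = ["./bin/im2rec", list_path, image_root, output_path]
    cmd += [f"{key}={value}" for key, value in options.items()]
    return cmd


cmd = build_im2rec_command("image.lst", "image_root_dir", "output.bin", resize=256)
# To run the tool, pass the list to subprocess.run(cmd, check=True).
```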

Extension: Multiple Labels for a Single Image

The im2rec tool and mx.io.ImageRecordIter have multi-label support for a single image. For example, if you have four labels for a single image, you can use the following procedure to use the RecordIO tools.

  1. Write the image list file in the following format:
integer_image_index \t label_1 \t label_2 \t label_3 \t label_4 \t path_to_image
  2. Run im2rec with label_width=4 added to the command arguments, for example:
./bin/im2rec image.lst image_root_dir output.bin resize=256 label_width=4
  3. In the iterator generation code, set label_width=4 and set path_imglist to the path of your image.lst, for example:

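import mxnet as mx

# label_width must match the number of label columns in image.lst.
dataiter = mx.io.ImageRecordIter(
  path_imgrec="data/cifar/train.rec",
  data_shape=(3,28,28),
  path_imglist="data/cifar/image.lst",
  label_width=4,
  batch_size=100  # assumed value for illustration; not in the original example
)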
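
As a companion to step 1, the following sketch formats and parses multi-label list lines. Both functions are hypothetical helpers written for this example and are not part of MXNet. The six-decimal label format is an assumption copied from the single-label example file earlier in this topic.

```python
def format_multilabel_line(index, labels, path):
    """Format one list line: index \t label_1 ... label_n \t path."""
    fields = [str(index)] + [f"{float(label):f}" for label in labels] + [path]
    return "\t".join(fields)


def parse_multilabel_line(line, label_width):
    """Split a list line back into (index, labels, path), checking the width."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != label_width + 2:
        raise ValueError(f"expected {label_width} labels, got {len(fields) - 2}")
    return int(fields[0]), [float(x) for x in fields[1:-1]], fields[-1]


# Hypothetical image with four labels.
line = format_multilabel_line(7, [1, 0, 3, 2], "cat.jpg")
index, labels, path = parse_multilabel_line(line, label_width=4)
```

Checking the field count against label_width before running im2rec helps catch lines whose label count does not match the label_width=4 setting used in steps 2 and 3.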