Reading the CIFAR-10 dataset

chaixl_Hello_World

Article link: https://blog.csdn.net/chaianlove/article/details/84589113

Dataset homepage: https://www.cs.toronto.edu/~kriz/cifar.html

Reading with Python 2:

def unpickle(file):
    import cPickle
    with open(file, 'rb') as fo:
        dict = cPickle.load(fo)
    return dict

Reading with Python 3:

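def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        # With encoding='bytes', Python 2 str objects in the pickle
        # (including the dictionary keys) load as bytes, e.g. b'data'.
        dict = pickle.load(fo, encoding='bytes')
    return dict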
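As a quick check of the Python 3 loader, the sketch below writes a tiny stand-in file and reads it back. The byte string is a hand-written protocol-0 pickle in the form Python 2 would emit for `{'labels': [6, 9]}`; it and the `toy_batch` file name are assumptions for illustration, not real CIFAR-10 data. The point is that the keys come back as bytes, so entries are accessed as `batch[b'labels']` rather than `batch['labels']`.

```python
import os
import pickle
import tempfile


def unpickle(file):
    """Load a pickled batch file; Python 2 str objects come back as bytes."""
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='bytes')


# Hand-written protocol-0 pickle of {'labels': [6, 9]} in the form Python 2
# emits (an illustrative stand-in, not real CIFAR-10 data).
PY2_STYLE_PICKLE = b"(dp0\nS'labels'\np1\n(lp2\nI6\naI9\nas."

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_batch')  # hypothetical file name
    with open(path, 'wb') as fo:
        fo.write(PY2_STYLE_PICKLE)
    toy_batch = unpickle(path)

# Keys are bytes, so index with b'labels' rather than 'labels'.
toy_keys = list(toy_batch.keys())
toy_labels = toy_batch[b'labels']
print(toy_keys, toy_labels)
```

With a real batch file, the same pattern would apply to `b'data'` and `b'labels'`.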

Loaded in this way, each of the batch files contains a dictionary with the following elements:

data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.

labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
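The row layout described above can be turned back into an image with a reshape followed by an axis swap. The following is a minimal sketch on a synthetic row rather than real data; `row_to_image` is a hypothetical helper name, and NumPy is assumed to be installed.

```python
import numpy as np


def row_to_image(row):
    """Convert one 3072-entry row (R, G, B planes) to a 32x32x3 image."""
    # Split into 3 channel planes of 32 rows x 32 columns (row-major),
    # then move the channel axis last: (3, 32, 32) -> (32, 32, 3).
    return np.asarray(row, dtype=np.uint8).reshape(3, 32, 32).transpose(1, 2, 0)


# Synthetic row: red plane all 10, green plane all 20, blue plane all 30,
# except that the first image row of the red plane holds 0..31.
row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),
    np.full(1024, 20, dtype=np.uint8),
    np.full(1024, 30, dtype=np.uint8),
])
row[:32] = np.arange(32, dtype=np.uint8)

image = row_to_image(row)
print(image.shape)  # (32, 32, 3)
print(image[0, 5])  # pixel at row 0, column 5: red 5, green 20, blue 30
```

The channel-last shape is the one most image viewers expect; the label for this image would be the entry of `labels` at the same index as the row.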

The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:

label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.
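Mapping numeric labels to names is then a list lookup. The sketch below uses a stand-in dictionary instead of the real batches.meta file, and `label_name` is a hypothetical helper; it assumes the file was loaded with `encoding='bytes'`, in which case both the key and the names are bytes and need decoding for display.

```python
def label_name(meta, label):
    """Return the readable class name for a numeric label.

    Assumes `meta` was loaded with encoding='bytes', so both the key and
    the names are bytes.
    """
    return meta[b'label_names'][label].decode('utf-8')


# Stand-in for the batches.meta dictionary, truncated to the two names
# quoted in the description above (the real list has 10 entries).
toy_meta = {b'label_names': [b'airplane', b'automobile']}
toy_labels = [1, 0, 1]

names = [label_name(toy_meta, label) for label in toy_labels]
print(names)  # ['automobile', 'airplane', 'automobile']
```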

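To restate the above: loaded this way, each batch file yields a dictionary with the following entries:

data: a 10000x3072 array of uint8 values. Each row of the array holds one 32x32 colour image. The first 1024 entries are the red channel, the next 1024 the green channel, and the last 1024 the blue channel. The image is stored in row-major order, so the first 32 entries of a row are the red channel values of the first row of that image.

labels: a list of 10000 numbers in the range 0-9. The number at index i is the class of the i-th image in data.

batches.meta: a separate file that also contains a Python dictionary object. It has the following entry:

label_names: a 10-element list that gives readable names to the numeric labels. For example, label_names[0] == "airplane", label_names[1] == "automobile".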

I did not fully understand how this data is organized, so I ran a small test.