How to use CIFAR-10 in Python
Step 1: Download CIFAR-10 with a shell script
#!/usr/bin/env bash
if ! [ -d "cifar-10-batches-py" ]; then
    wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
    tar xvzf cifar-10-python.tar.gz
    rm -f cifar-10-python.tar.gz
fi
The first line (the shebang) marks the file as a bash shell script.
The second line checks whether a cifar-10-batches-py folder already exists in the current directory. The commands inside the if block run only when it does not.
The third line downloads the compressed dataset from the University of Toronto server.
The fourth line extracts the archive into the current folder.
The fifth line deletes the compressed file, which is no longer needed after extraction.
The last line, fi, closes the if block and ends the script.
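For readers without wget or tar, the same steps can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original script: the function name ensure_cifar10 and its root parameter are assumptions made for illustration.

```python
import os
import tarfile
import urllib.request

CIFAR10_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"


def ensure_cifar10(root=".", url=CIFAR10_URL):
    """Download and extract CIFAR-10 into `root` unless it is already there.

    Hypothetical helper that mirrors the shell script step by step.
    """
    target = os.path.join(root, "cifar-10-batches-py")
    if os.path.isdir(target):
        # Same guard as `if ! [ -d "cifar-10-batches-py" ]` in the script.
        return target

    os.makedirs(root, exist_ok=True)
    archive = os.path.join(root, "cifar-10-python.tar.gz")
    urllib.request.urlretrieve(url, archive)    # equivalent of wget
    with tarfile.open(archive, "r:gz") as tar:  # equivalent of tar xzf
        tar.extractall(root)
    os.remove(archive)                          # equivalent of rm -f
    return target
```

Calling ensure_cifar10("./data") would place the batches under ./data/cifar-10-batches-py, which is the path the loading code later in this post expects.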
Suppose you have created a folder named data and run the script inside it. If you then cd data, you will see the extracted cifar-10-batches-py folder.

If you cd into cifar-10-batches-py, you will see the batch files: data_batch_1 … data_batch_5 hold the training data, and test_batch holds the testing data.
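To see what one of these batch files contains before loading all of them, a small inspection helper can be useful. The sketch below is illustrative only: summarize_batch is a hypothetical name, and it assumes the batch is a pickled dict with the b'data' and b'labels' keys that the loading code in this post reads.

```python
import pickle


def summarize_batch(path):
    """Return the keys and basic sizes of one CIFAR-10 batch file."""
    with open(path, "rb") as f:
        # encoding='bytes' makes the dict keys bytes objects such as b'data'.
        batch = pickle.load(f, encoding="bytes")
    data = batch[b"data"]
    labels = batch[b"labels"]
    return {
        "keys": sorted(batch.keys()),
        "num_images": len(data),
        "values_per_image": len(data[0]),
        "num_labels": len(labels),
    }

# Example call, assuming the dataset was extracted under ./data:
# print(summarize_batch("./data/cifar-10-batches-py/data_batch_1"))
```

On a real training batch this should report 10000 images with 3072 values each (32 × 32 pixels × 3 color channels).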
How to split CIFAR-10 into training data and testing data
A model is fitted on the training data and evaluated on the testing data, so the dataset has to be split into these two parts. Fortunately, CIFAR-10 is already distributed as separate training and testing files, so all you need to do is read each part from its own files.
import pickle
import numpy as np

# There are five training files in the folder (data_batch_1 to data_batch_5), so nbbatch=5.
def load_cifar10_2(nbbatch=5):
    all_data = []     # training data
    all_labels = []   # training labels
    test_data = []    # testing data
    test_labels = []  # testing labels
    ########
    # This section reads the training data.
    for i in range(nbbatch):
        # Open the files in sequence. The mode is 'rb' because each file is a
        # binary pickle that is only read.
        with open("./data/cifar-10-batches-py/data_batch_%s" % (i + 1), 'rb') as f:
            # With encoding='bytes', pickle.load returns a dict whose keys are bytes.
            batch = pickle.load(f, encoding='bytes')
        data = batch[b'data']
        # Convert the label list to a column array of shape (N, 1).
        labels = np.asarray(batch[b'labels']).reshape((-1, 1))
        all_data.append(data)
        all_labels.append(labels)
    ########
    # This section reads the testing data.
    with open("./data/cifar-10-batches-py/test_batch", 'rb') as f:
        batch = pickle.load(f, encoding='bytes')
    data = batch[b'data']
    labels = np.asarray(batch[b'labels']).reshape((-1, 1))
    test_data.append(data)
    test_labels.append(labels)
    # Concatenate the per-batch data and labels into single arrays.
    all_data = np.concatenate(all_data, axis=0)
    all_labels = np.concatenate(all_labels, axis=0)
    test_data = np.concatenate(test_data, axis=0)
    test_labels = np.concatenate(test_labels, axis=0)
    return (all_data, all_labels, test_data, test_labels)
How to make the data more convenient to use
def cifar10_proper_array(data):
    all_red = data[:, :1024].reshape(-1, 32, 32)
    all_green = data[:, 1024:2048].reshape(-1, 32, 32)
    all_blue = data[:, 2048:].reshape(-1, 32, 32)
    return np.stack([all_red, all_green, all_blue], axis=1) / 255.0
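The snippet above does two things. It converts each flat row of 3072 values into a 3 × 32 × 32 array (the red, green, and blue planes, each 32 × 32 pixels), and it normalizes the pixel values to the range [0, 1] by dividing by 255.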
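The layout is easier to see on a tiny synthetic example. The sketch below builds one fake image whose red plane is fully on, applies a single reshape that gives the same result as cifar10_proper_array, and then produces a channels-last copy. The channels-last step is not in the original code; it is added here on the assumption that you may want to pass an image to a plotting routine that expects height × width × channel.

```python
import numpy as np

# One synthetic "image": red plane all 255, green and blue planes all 0.
row = np.zeros((1, 3072), dtype=np.uint8)
row[:, :1024] = 255

# Because the three planes are stored back to back, one reshape yields the
# same (N, 3, 32, 32) array as slicing and stacking the planes separately.
chw = row.reshape(-1, 3, 32, 32) / 255.0

# Channels-last copy of shape (N, 32, 32, 3), e.g. for image display.
hwc = chw.transpose(0, 2, 3, 1)

print(chw.shape, hwc.shape)
print(hwc[0, 0, 0])  # RGB values of the top-left pixel
```

Every pixel of this fake image is pure red, so the top-left pixel has a red value of 1.0 and green and blue values of 0.0.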
data, labels, test_data, test_label = load_cifar10_2()
labels = labels.reshape(-1)
test_label = test_label.reshape(-1)
data = cifar10_proper_array(data)
test_data = cifar10_proper_array(test_data)
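The code above is the main routine: it loads the four arrays, flattens the label arrays to one dimension, and converts the images to the normalized 3 × 32 × 32 layout.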
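After running the main routine it can be worth checking that the arrays have the shapes and value ranges you expect. The helper below is a hypothetical sanity check, not part of the original code. It assumes images are in the normalized (N, 3, 32, 32) layout and labels are integers from 0 to 9.

```python
import numpy as np


def check_cifar10_arrays(images, labels, expected_count):
    """Raise AssertionError if the arrays do not look like prepared CIFAR-10 data."""
    assert images.shape == (expected_count, 3, 32, 32), images.shape
    assert labels.shape == (expected_count,), labels.shape
    assert 0.0 <= images.min() and images.max() <= 1.0
    assert labels.min() >= 0 and labels.max() <= 9
    return True


# Demonstration on small synthetic arrays.
demo_images = np.zeros((4, 3, 32, 32))
demo_labels = np.array([0, 3, 9, 1])
check_cifar10_arrays(demo_images, demo_labels, expected_count=4)

# With the real arrays from the main routine, the calls would be:
# check_cifar10_arrays(data, labels, 50000)
# check_cifar10_arrays(test_data, test_label, 10000)
```

The counts 50000 and 10000 follow from five training batches and one test batch of 10000 images each.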
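In summary, this article describes how to download the CIFAR-10 dataset with a shell script and use it in Python, covering extraction of the archive, separation of the training and testing data, and data preprocessing. CIFAR-10 is already split into a training set and a test set, so both can be obtained easily with the script.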