最近开始学习Keras,尝试跑官方CIFAR10代码,碰到下载数据集过慢的问题。网上搜到的方法是把下载下来的数据集放到~/.keras/datasets/ 目录下。但我运行cifar10.load_data()还是出错,看了一下提示说明是cPickle不兼容Python3。你一官方代码竟然不让选择数据加载路径,而且还有兼容性问题。好吧还是自己写个加载数据集代码吧。
import numpy as np
import os
def load_batch(file):
import pickle
with open(file, 'rb') as fo:
d = pickle.load(fo, encoding='bytes')
d_decoded = {}
for k, v in d.items():
d_decoded[k.decode('utf8')] = v
d = d_decoded
data = d['data']
labels = d['labels']
data = data.reshape(data.shape[0], 3, 32, 32)
return data, labels
def load_data(path ='data/cifar-10-batches-py'):
"""Loads CIFAR10 dataset.
# Returns
Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.
"""
from keras import backend as K
num_train_samples = 50000
x_train = np.empty((num_train_samples, 3, 32, 32), dtype='uint8')
y_train = np.empty((num_train_samples,), dtype='uint8')
for i in range(1, 6):
fpath = os.path.join(path, 'data_batch_' + str(i))
(x_train[(i - 1) * 10000: i * 10000, :, :, :],
y_train[(i - 1) * 10000: i * 10000]) = load_batch(fpath)
fpath = os.path.join(path, 'test_batch')
x_test, y_test = load_batch(fpath)
y_train = np.reshape(y_train, (len(y_train), 1))
y_test = np.reshape(y_test, (len(y_test), 1))
if K.image_data_format() == 'channels_last':
x_train = x_train.transpose(0, 2, 3, 1)
x_test = x_test.transpose(0, 2, 3, 1)
return (x_train, y_train), (x_test, y_test)
使用方法
把网上下载的cifar-10-python.tar.gz包解压,解压后文件名通常为cifar-10-batches-py。
调用方法
(x_train, y_train), (x_test, y_test) = load_data(path)
path填你的cifar-10-batches-py文件夹路径