Deep Learning in Practice: An Analysis of Fashion-MNIST Dataset Preprocessing

算法与编程之美 (The Beauty of Algorithms and Programming)

Last modified 2022-04-01 22:26:12

Tags: deep learning

First published 2019-05-29 08:29:32

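The technical column 《算法与编程之美》 won the "Popular Author" Outstanding Column award of the Tencent Cloud+ Community in 2020 and the "CSDN Blog Expert" title in 2021, and has passed 1 million cumulative reads across all platforms.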

Article link: https://blog.csdn.net/gschen_cn/article/details/102795655


The Keras source code for loading the Fashion-MNIST dataset is:

# Imports needed to run this excerpt on its own (an assumption: the Keras
# module itself imports get_file from its own utils package).
import gzip
import os

import numpy as np
from keras.utils import get_file


def load_data():
    """Loads the Fashion-MNIST dataset.

    # Returns
        Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.
    """
    dirname = os.path.join('datasets', 'fashion-mnist')
    base = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/'
    files = ['train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
             't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz']

    # Download each file (or reuse the cached copy) and record its local path.
    paths = []
    for fname in files:
        paths.append(get_file(fname,
                              origin=base + fname,
                              cache_subdir=dirname))

    # Training labels: skip the first 8 bytes, then one uint8 per label.
    with gzip.open(paths[0], 'rb') as lbpath:
        y_train = np.frombuffer(lbpath.read(), np.uint8, offset=8)

    # Training images: skip the first 16 bytes, then reshape to (n, 28, 28).
    with gzip.open(paths[1], 'rb') as imgpath:
        x_train = np.frombuffer(imgpath.read(), np.uint8,
                                offset=16).reshape(len(y_train), 28, 28)

    # Test labels and images are read the same way.
    with gzip.open(paths[2], 'rb') as lbpath:
        y_test = np.frombuffer(lbpath.read(), np.uint8, offset=8)

    with gzip.open(paths[3], 'rb') as imgpath:
        x_test = np.frombuffer(imgpath.read(), np.uint8,
                               offset=16).reshape(len(y_test), 28, 28)

    return (x_train, y_train), (x_test, y_test)

The Fashion-MNIST dataset is stored on a remote server as four gzip-compressed files, which Keras's get_file() downloads into the local Keras cache directory.

Each file is then opened with gzip.open(), and NumPy's frombuffer() loads its bytes directly into a NumPy array. For image data, a reshape is also required.
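The open, frombuffer, reshape sequence can be tried without downloading anything. The following is a minimal sketch, assuming a tiny synthetic file written in the same layout as the real image files (a 16-byte header followed by raw uint8 pixels). The helpers `write_idx_images` and `load_idx_images` and the file name are hypothetical and not part of Keras.

```python
import gzip
import os
import struct
import tempfile

import numpy as np


def write_idx_images(path, images):
    """Write a uint8 array of shape (n, rows, cols) as a gzip-compressed IDX file."""
    n, rows, cols = images.shape
    # Header: magic number 0x00000803, then count, rows, cols (big-endian).
    header = struct.pack('>IIII', 0x00000803, n, rows, cols)
    with gzip.open(path, 'wb') as f:
        f.write(header + images.tobytes())


def load_idx_images(path, n, rows, cols):
    """Mirror the Keras loader: skip the 16-byte header, then reshape."""
    with gzip.open(path, 'rb') as f:
        flat = np.frombuffer(f.read(), np.uint8, offset=16)
    # frombuffer returns a flat 1-D array; reshape restores the image grid.
    return flat, flat.reshape(n, rows, cols)


# Synthetic stand-in for real image data: 3 "images" of 28x28 pixels.
fake_images = (np.arange(3 * 28 * 28) % 256).astype(np.uint8).reshape(3, 28, 28)

with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, 'demo-images-idx3-ubyte.gz')
    write_idx_images(demo_path, fake_images)
    flat, loaded = load_idx_images(demo_path, 3, 28, 28)

print(flat.shape)    # (2352,)
print(loaded.shape)  # (3, 28, 28)
```

Label data is already one value per sample, which is why only the image arrays need the reshape.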

Here, why does loading the image data need offset=16, while loading the label data needs offset=8?
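The offsets follow from the IDX file format used by MNIST-style files: a 4-byte magic number (two zero bytes, a data-type code, and the number of dimensions), then one 4-byte big-endian integer per dimension. A label file has one dimension, so its header is 4 + 4 = 8 bytes; an image file has three (count, rows, columns), so its header is 4 + 3 * 4 = 16 bytes. A minimal sketch that derives this from the header itself, using a hypothetical helper `idx_header_size` and synthetic headers rather than the real files:

```python
import struct


def idx_header_size(raw):
    """Return (header_size_in_bytes, dims) for a bytes object in IDX format."""
    # Magic number: bytes 0-1 are zero, byte 2 is the data type, byte 3 is ndim.
    zero, dtype_code, ndim = struct.unpack('>HBB', raw[:4])
    if zero != 0:
        raise ValueError('not an IDX file')
    # One big-endian 32-bit size per dimension follows the magic number.
    dims = struct.unpack('>' + 'I' * ndim, raw[4:4 + 4 * ndim])
    return 4 + 4 * ndim, dims


# Synthetic headers with the same layout as the training files
# (0x08 is the IDX type code for unsigned byte).
label_header = struct.pack('>HBBI', 0, 0x08, 1, 60000)
image_header = struct.pack('>HBBIII', 0, 0x08, 3, 60000, 28, 28)

print(idx_header_size(label_header))  # (8, (60000,))
print(idx_header_size(image_header))  # (16, (60000, 28, 28))
```

Passing these sizes as `offset` to `np.frombuffer` skips the header so that only the payload bytes are interpreted as data.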

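The Fashion-MNIST image dataset is preprocessed quite differently from MNIST: four gz files hold x_train, y_train, x_test, and y_test separately, and each file is read and loaded with np.frombuffer(). This is somewhat more involved than the MNIST approach. Why is it handled this way?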
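For comparison, Keras's MNIST loader reads a single `mnist.npz` archive with `np.load`, while the Fashion-MNIST files keep the original gzip-compressed IDX layout, presumably so that they can be used wherever raw MNIST files are read (an interpretation, not something stated in the source above). A minimal sketch of the single-archive style, using synthetic arrays and a hypothetical file name:

```python
import os
import tempfile

import numpy as np

# Synthetic stand-ins for the four dataset parts.
x_train = np.zeros((6, 28, 28), dtype=np.uint8)
y_train = np.arange(6, dtype=np.uint8)
x_test = np.ones((2, 28, 28), dtype=np.uint8)
y_test = np.array([3, 7], dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmpdir:
    archive = os.path.join(tmpdir, 'demo.npz')
    # All four arrays go into one file, each stored under its own key.
    np.savez(archive, x_train=x_train, y_train=y_train,
             x_test=x_test, y_test=y_test)
    # np.load restores shapes and dtypes, so no offset or reshape is needed.
    with np.load(archive) as f:
        loaded = {name: f[name] for name in f.files}

print(sorted(loaded))
print(loaded['x_train'].shape, loaded['y_test'].tolist())
```

With the .npz style the array metadata travels with the file; with raw IDX files the loader has to skip the header and restore the shape itself.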
