TensorFlow 2.0: datasets.shuffle(buffer_size).batch(batch_size)

This post walks through preprocessing the MNIST dataset with TensorFlow: selecting a subset of the raw samples, reshaping the data, and building a TensorFlow Dataset for batch processing. Using the shuffle and batch operations, it shows how to keep the data randomized and efficiently consumed during training.

Some personal notes, recorded for reference.

import tensorflow as tf

# Load MNIST; only the training split is used here
(train_images, train_labels), (_, _) = tf.keras.datasets.mnist.load_data()
print('Loaded raw training images, train_images.shape:', train_images.shape)

# Keep only the first 5000 samples
train_images = train_images[0:5000, :, :]
print('Took the first 5000 training images, train_images.shape:', train_images.shape)

# Add a channel dimension, (N, 28, 28) -> (N, 28, 28, 1), and cast to float32
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
print('After adding the 4th dimension, train_images.shape:', train_images.shape)

Output:

Loaded raw training images, train_images.shape: (60000, 28, 28)
Took the first 5000 training images, train_images.shape: (5000, 28, 28)
After adding the 4th dimension, train_images.shape: (5000, 28, 28, 1)
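
As an aside, the same channel dimension can be added without hard-coding the 28x28 image size. A minimal equivalent sketch (assuming train_images is the (N, 28, 28) array loaded above):

import numpy as np

# Same result as the reshape above: (N, 28, 28) -> (N, 28, 28, 1)
train_images = np.expand_dims(train_images, axis=-1).astype('float32')
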
BATCH_SIZE = 512
BUFFER_SIZE = 5000

print('==========')
# datasets = tf.data.Dataset.from_tensor_slices((train_images, train_labels))  # pass a tuple to pair images with labels
datasets = tf.data.Dataset.from_tensor_slices(train_images)
print('datasets:',datasets)

datasets = datasets.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
print('datasets:',datasets)

times = 1  # batch counter
total = 0  # image counter
for item in datasets:  # iterate over the training data, i.e. one epoch
    print(f'======= Batch {times} ========')
    print(item.shape)

    batch_count = item.shape[0]  # batch_size is 512, but 5000 is not evenly divisible, so the last batch is smaller
    total += batch_count
    times += 1

print('Total images seen:', total)

Output:

datasets: <TensorSliceDataset shapes: (28, 28, 1), types: tf.float32>
datasets: <BatchDataset shapes: (None, 28, 28, 1), types: tf.float32>
======= Batch 1 ========
(512, 28, 28, 1)
======= Batch 2 ========
(512, 28, 28, 1)
======= Batch 3 ========
(512, 28, 28, 1)
======= Batch 4 ========
(512, 28, 28, 1)
======= Batch 5 ========
(512, 28, 28, 1)
======= Batch 6 ========
(512, 28, 28, 1)
======= Batch 7 ========
(512, 28, 28, 1)
======= Batch 8 ========
(512, 28, 28, 1)
======= Batch 9 ========
(512, 28, 28, 1)
======= Batch 10 ========
(392, 28, 28, 1)
Total images seen: 5000

Process finished with exit code 0
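
The last batch having only 392 images is expected: 5000 = 9 * 512 + 392. If every batch must be exactly BATCH_SIZE (some models require static batch shapes), batch supports the drop_remainder flag; a minimal sketch:

# Drop the final partial batch so every batch has exactly BATCH_SIZE elements
datasets = tf.data.Dataset.from_tensor_slices(train_images)
datasets = datasets.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
# One epoch now yields 9 batches of shape (512, 28, 28, 1); the leftover 392 images are skipped
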

My understanding: in the step datasets = datasets.shuffle(BUFFER_SIZE).batch(BATCH_SIZE), shuffle first randomizes the order of elements within a buffer of BUFFER_SIZE items, then batch groups the training data into batches, each with shape (batch_size, 28, 28, 1). The total number of elements in datasets is still 5000. In the loop "for item in datasets:", each item is one batch of training data. Note that because BUFFER_SIZE here equals the dataset size (5000), the shuffle is a full uniform shuffle; a smaller buffer only shuffles elements locally.
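
To see the buffer effect concretely, here is a small toy experiment of my own (not from the run above) on the integers 0-9. With buffer_size equal to the dataset size, any permutation is possible; with buffer_size=3, an element can appear at most two positions earlier than its original position, so the stream stays roughly ordered (the printed outputs below are illustrative examples):

import tensorflow as tf

toy = tf.data.Dataset.range(10)

# Full shuffle: the buffer covers the whole dataset, so any permutation is possible
print(list(toy.shuffle(10).as_numpy_iterator()))  # e.g. [7, 2, 9, 0, 5, 1, 8, 3, 6, 4]

# Local shuffle: buffer_size=3 only mixes nearby elements
print(list(toy.shuffle(3).as_numpy_iterator()))   # e.g. [2, 0, 3, 1, 5, 4, 7, 6, 9, 8]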
