A summary of some tf.data.Dataset methods

1.from_tensor_slices()

It returns a Dataset.

This function slices the input into dataset elements along the first dimension (axis 0). All input tensors must have the same size in their first dimension. For example, a = tf.constant([[1, 2, 3], [2, 5], [8, 9, 6, 3]]) cannot be sliced, because its rows have different lengths (in fact, tf.constant cannot even build such a ragged tensor).

# Slicing a 1D tensor produces scalar tensor elements.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
list(dataset.as_numpy_iterator())

[1, 2, 3]

# Slicing a 2D tensor produces 1D tensor elements.
dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
list(dataset.as_numpy_iterator())
[array([1, 2]), array([3, 4])]

If the input is a tuple, the slicing works a bit differently: each tensor in the tuple is sliced along its first dimension, and the i-th dataset element is a tuple made of the i-th slice of each tensor.

# Slicing a tuple of 1D tensors produces tuple elements containing
# scalar tensors.
dataset = tf.data.Dataset.from_tensor_slices(([1, 2,7], [3, 4,8], [5, 6,3]))
list(dataset.as_numpy_iterator())
[(1, 3, 5), (2, 4, 6), (7, 8, 3)]

 

Slicing a dictionary:

# Dictionary structure is also preserved.
dataset = tf.data.Dataset.from_tensor_slices({"a": [1, 2], "b": [3, 4]})
list(dataset.as_numpy_iterator()) == [{'a': 1, 'b': 3},
                                      {'a': 2, 'b': 4}]
True

 

Slicing features together with their labels:

batched_features = tf.constant([[[1, 3], [2, 3]],
                                [[2, 1], [1, 2]],
                                [[3, 3], [3, 2]]], shape=(3, 2, 2))
batched_labels = tf.constant([['A', 'A'],
                              ['B', 'B'],
                              ['A', 'B']], shape=(3, 2, 1))
dataset = tf.data.Dataset.from_tensor_slices((batched_features, batched_labels))
for element in dataset.as_numpy_iterator():
  print(element)
(array([[1, 3],
       [2, 3]]), array([[b'A'],
       [b'A']], dtype=object))
(array([[2, 1],
       [1, 2]]), array([[b'B'],
       [b'B']], dtype=object))
(array([[3, 3],
       [3, 2]]), array([[b'A'],
       [b'B']], dtype=object))

2.from_tensors()

Creates a Dataset containing the given tensor(s) as a single element. from_tensors produces a dataset with exactly one element; to slice the input tensor into multiple elements, use from_tensor_slices instead.

dataset = tf.data.Dataset.from_tensors([[1, 2, 3],[4,5,6]])
list(dataset.as_numpy_iterator())
[array([[1, 2, 3],
        [4, 5, 6]])]
dataset = tf.data.Dataset.from_tensors(([1, 2, 3], 'A'))
list(dataset.as_numpy_iterator())
[(array([1, 2, 3]), b'A')]
# You can use `from_tensors` to produce a dataset which repeats
# the same example many times.
example = tf.constant([1,2,3])
dataset = tf.data.Dataset.from_tensors(example).repeat(2)
list(dataset.as_numpy_iterator())
[array([1, 2, 3]), array([1, 2, 3])]

3.batch()

batch(
    batch_size, drop_remainder=False, num_parallel_calls=None, deterministic=None,
    name=None
)

batch_size: the number of consecutive elements of this dataset to combine into a single batch.

drop_remainder: whether the last batch should be dropped if it contains fewer than batch_size elements; the default behavior is to keep the smaller final batch (see the example below).
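A small sketch of how batch_size and drop_remainder interact, using a simple range dataset (the printed output assumes the default int64 elements of tf.data.Dataset.range):

# Batching 8 elements in groups of 3 leaves a smaller final batch of 2.
dataset = tf.data.Dataset.range(8)
dataset = dataset.batch(3)
list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]
# With drop_remainder=True the incomplete final batch is discarded.
dataset = tf.data.Dataset.range(8).batch(3, drop_remainder=True)
list(dataset.as_numpy_iterator())
[array([0, 1, 2]), array([3, 4, 5])]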

4.shuffle()
 

shuffle(
    buffer_size, seed=None, reshuffle_each_iteration=None, name=None
)

buffer_size is the size of the shuffle buffer. shuffle() randomizes the order of the dataset's elements: it first fills the buffer with the first buffer_size elements, and every time an element is randomly drawn from the buffer, it is replaced with the next element that has not yet entered the buffer.

seed makes the shuffled order reproducible: restarting the program and shuffling again yields the same dataset order. See https://tensorflow.google.cn/api_docs/python/tf/random/set_seed
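A minimal sketch of a reproducible shuffle (the particular order in the final comment is only illustrative; the seed fixes whatever permutation is produced across runs):

dataset = tf.data.Dataset.range(5)
# With a fixed seed the same permutation is produced every run;
# reshuffle_each_iteration=False also keeps it identical across epochs.
dataset = dataset.shuffle(buffer_size=5, seed=42, reshuffle_each_iteration=False)
list(dataset.as_numpy_iterator())
# e.g. some fixed permutation of [0, 1, 2, 3, 4], such as [3, 0, 4, 1, 2]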

5.as_numpy_iterator()

Returns an iterator that converts all elements of the dataset to NumPy. The returned object is an iterator whose elements are NumPy values (an ndarray for each element, or a plain scalar for 0-D elements).

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
print(list(dataset.as_numpy_iterator()))
[1, 2, 3]
dataset = tf.data.Dataset.from_tensor_slices([[1, 2, 3],[3,6,9]])
print(list(dataset.as_numpy_iterator()))
[array([1, 2, 3]), array([3, 6, 9])]

6.map() and apply()

Both apply a function to the dataset: map() applies the function to each element of the dataset one at a time, while apply() applies a transformation function to the dataset as a whole.
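A small sketch contrasting the two (the doubling lambda and the even-number filter are illustrative choices, not from the original text):

# map() transforms each element individually.
dataset = tf.data.Dataset.range(5).map(lambda x: x * 2)
list(dataset.as_numpy_iterator())
[0, 2, 4, 6, 8]
# apply() takes a function that receives the whole Dataset and returns a new Dataset.
dataset = tf.data.Dataset.range(5).apply(lambda ds: ds.filter(lambda x: x % 2 == 0))
list(dataset.as_numpy_iterator())
[0, 2, 4]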

7.padded_batch()

padded_batch(
    batch_size, padded_shapes=None, padding_values=None, drop_remainder=False,
    name=None
)

For a more detailed walkthrough, see https://blog.csdn.net/cqupt0901/article/details/108030260
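padded_batch() works like batch(), but first pads each component of the elements to a common shape: by default the longest shape in the batch, or padded_shapes if given, with padding_values defaulting to 0 (or the empty string). A minimal sketch with variable-length elements built via tf.fill (the values here are illustrative):

# Elements of lengths 1..4; each batch is padded with 0 to its longest element.
dataset = tf.data.Dataset.range(1, 5)
dataset = dataset.map(lambda x: tf.fill([x], x))
dataset = dataset.padded_batch(2)
for batch in dataset.as_numpy_iterator():
  print(batch)
[[1 0]
 [2 2]]
[[3 3 3 0]
 [4 4 4 4]]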

