# TensorFlow Datasets

TensorFlow Datasets provides many public datasets as `tf.data.Datasets`.

[![Kokoro](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.svg)](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.html)
[![PyPI version](https://badge.fury.io/py/tensorflow-datasets.svg)](https://badge.fury.io/py/tensorflow-datasets)

* [List of datasets](https://github.com/tensorflow/datasets/tree/master/docs/datasets.md)
* [Try it in Colab](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb)
* [API docs](https://www.tensorflow.org/datasets/api_docs/python/tfds)
* [Add a dataset](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md)



### Installation

```sh
pip install tensorflow-datasets

# Requires TF 1.12+ to be installed.
# Some datasets require additional libraries; see setup.py extras_require
pip install tensorflow
# or:
pip install tensorflow-gpu
```

### Usage

```python
import tensorflow_datasets as tfds
import tensorflow as tf

# tfds works in both Eager and Graph modes.
# TF 1.x only: eager execution is already the default in TF 2.x,
# where this call should be removed.
tf.enable_eager_execution()

# See available datasets
print(tfds.list_builders())

# Construct a tf.data.Dataset for each requested split
ds_train, ds_test = tfds.load(name="mnist", split=["train", "test"])

# Build your input pipeline
ds_train = ds_train.shuffle(1000).batch(128).prefetch(10)
for features in ds_train.take(1):
  image, label = features["image"], features["label"]
```

Try it interactively in a
[Colab notebook](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb).

### `DatasetBuilder`

All datasets are implemented as subclasses of `DatasetBuilder`, and
`tfds.load` is a thin convenience wrapper. `DatasetInfo` documents the
dataset.
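
To picture what "thin convenience wrapper" means, here is a toy sketch of the pattern in plain Python. Everything in it (`ToyBuilder`, `ToyDigits`, the `_REGISTRY` dict) is hypothetical and written only for illustration; it borrows the method names used in this README but is not how TFDS is implemented internally.

```python
# Toy illustration only: these classes are hypothetical, NOT TFDS internals.
_REGISTRY = {}


def register(cls):
    """Class decorator that records a builder class under its `name`."""
    _REGISTRY[cls.name] = cls
    return cls


class ToyBuilder:
    """Hypothetical base class mirroring the method names in this README."""
    name = None

    def __init__(self):
        self.prepared = False

    def download_and_prepare(self):
        # A real builder would download and serialize data; this sets a flag.
        self.prepared = True

    def as_dataset(self, split):
        if not self.prepared:
            raise RuntimeError("call download_and_prepare() first")
        return self._examples()[split]

    def _examples(self):
        raise NotImplementedError


@register
class ToyDigits(ToyBuilder):
    name = "toy_digits"

    def _examples(self):
        return {"train": [0, 1, 2], "test": [3]}


def builder(name):
    """Look up a builder class by its string name and instantiate it."""
    return _REGISTRY[name]()


def load(name, split):
    """Thin wrapper: fetch the builder, prepare it, return the split."""
    b = builder(name)
    b.download_and_prepare()
    return b.as_dataset(split)


print(load("toy_digits", "train"))  # [0, 1, 2]
```

The point of the sketch is only the division of labour: the builder owns preparation and split access, while `load` chains those calls for the common case.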

```python
import tensorflow_datasets as tfds

# The following is the equivalent of the `load` call above.

# You can fetch the DatasetBuilder class by string
mnist_builder = tfds.builder("mnist")

# Download the dataset
mnist_builder.download_and_prepare()

# Construct a tf.data.Dataset
ds = mnist_builder.as_dataset(split=tfds.Split.TRAIN)

# Get the `DatasetInfo` object, which contains useful information about the
# dataset and its features
info = mnist_builder.info
print(info)
```

    tfds.core.DatasetInfo(
        name='mnist',
        version=1.0.0,
        description='The MNIST database of handwritten digits.',
        urls=[u'http://yann.lecun.com/exdb/mnist/'],
        features=FeaturesDict({
            'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
            'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10)
        }),
        total_num_examples=70000,
        splits={
            u'test': <tfds.core.SplitInfo num_examples=10000>,
            u'train': <tfds.core.SplitInfo num_examples=60000>
        },
        supervised_keys=(u'image', u'label'),
        citation='"""
            @article{lecun2010mnist,
              title={MNIST handwritten digit database},
              author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
              journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
              volume={2},
              year={2010}
            }
        """',
    )
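
The split sizes reported by `DatasetInfo` are handy for sizing a pipeline. As a small worked example, the sketch below computes how many batches one pass over the 60,000-example `train` split yields with the `batch(128)` call from the Usage section. The helper names are made up for this illustration; the only library behaviour assumed is that `tf.data`'s `batch` keeps the final partial batch unless `drop_remainder=True` is passed.

```python
import math


def batches_per_epoch(num_examples, batch_size, drop_remainder=False):
    """Number of batches produced by one pass over `num_examples` items."""
    if drop_remainder:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


def last_batch_size(num_examples, batch_size):
    """Size of the final (possibly partial) batch."""
    remainder = num_examples % batch_size
    return remainder or batch_size


train_examples = 60000  # num_examples of the 'train' split shown above

print(batches_per_epoch(train_examples, 128))        # 469
print(batches_per_epoch(train_examples, 128, True))  # 468
print(last_batch_size(train_examples, 128))          # 96
```

In other words, 468 full batches of 128 cover 59,904 examples and the remaining 96 form a shorter final batch, which matters for models that assume a fixed batch dimension.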

### NumPy Usage with `tfds.as_numpy`

As a convenience for users who want simple NumPy arrays in their programs, you
can use `tfds.as_numpy` to return a generator that yields NumPy array records
out of a `tf.data.Dataset`. This allows you to build high-performance input
pipelines with `tf.data` but use whatever you’d like for your model
components.

```python
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN)
train_ds = train_ds.shuffle(1024).batch(128).repeat(5).prefetch(10)
for example in tfds.as_numpy(train_ds):
  numpy_images, numpy_labels = example["image"], example["label"]
```

You can also use `tfds.as_numpy` in conjunction with `batch_size=-1` to
get the full dataset in NumPy arrays from the returned `tf.Tensor` object:

```python
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN, batch_size=-1)
numpy_ds = tfds.as_numpy(train_ds)
numpy_images, numpy_labels = numpy_ds["image"], numpy_ds["label"]
```

Note that the library still requires tensorflow as an internal dependency.
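
Once the data is in NumPy form, any non-TensorFlow code can consume it. The sketch below shows one possible post-processing step: flattening and scaling image arrays for a model that expects feature rows. It uses synthetic arrays with the MNIST shapes and dtypes from the `DatasetInfo` above as a stand-in, so it runs without downloading anything. `flatten_and_scale` is a hypothetical helper, not part of TFDS.

```python
import numpy as np


def flatten_and_scale(images):
    """Turn uint8 images of shape (N, H, W, C) into float32 rows in [0, 1]."""
    images = np.asarray(images)
    flat = images.reshape(images.shape[0], -1)
    return flat.astype(np.float32) / 255.0


# Synthetic stand-in for the arrays that tfds.as_numpy(...) would return;
# the shapes and dtypes follow the MNIST feature spec (assumption for the demo).
rng = np.random.default_rng(0)
numpy_images = rng.integers(0, 256, size=(8, 28, 28, 1), dtype=np.uint8)
numpy_labels = rng.integers(0, 10, size=(8,), dtype=np.int64)

features = flatten_and_scale(numpy_images)
print(features.shape, features.dtype)  # (8, 784) float32
```

In real use, `numpy_images` and `numpy_labels` would come from the `tfds.as_numpy` calls shown earlier in this section.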

### Want a certain dataset?

Adding a dataset is really straightforward by following
[our guide](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md).

Request a dataset by opening a Dataset request GitHub issue.

And vote on the current set of requests by adding a thumbs-up reaction to the
issue.

### Disclaimers

This is a utility library that downloads and prepares public datasets. We do
not host or distribute these datasets, vouch for their quality or fairness, or
claim that you have license to use the dataset. It is your responsibility to
determine whether you have permission to use the dataset under the dataset’s
license.

If you’re a dataset owner and wish to update any part of it (description,
citation, etc.), or do not want your dataset to be included in this
library, please get in touch through a GitHub issue. Thanks for your
contribution to the ML community!

If you’re interested in learning more about responsible AI practices, including
fairness, please see Google AI’s Responsible AI Practices.

tensorflow/datasets is Apache 2.0 licensed. See the LICENSE file.


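### Answer 1

TensorFlow Datasets is an open-source dataset library that provides many commonly used machine learning datasets, covering areas such as image classification, natural language processing, and speech recognition. These datasets can be used directly in TensorFlow, which makes model training and evaluation convenient. Because the datasets are exposed as `tf.data.Dataset` objects, preprocessing and augmentation steps can be applied with the usual `tf.data` transformations. TensorFlow Datasets also supports adding custom datasets, so users can work with their own data.

### Answer 2

TensorFlow Datasets (TFDS) is a collection of open datasets for deep learning provided by the TensorFlow team. It aims to offer standardized datasets, data pipelines for deep learning, and an easy-to-use API. Using TFDS removes much of the work of obtaining, cleaning, formatting, and storing data, which simplifies data handling and saves time.

TFDS includes many commonly used datasets, such as ImageNet, CIFAR-10, and MNIST. They are prepared in a standard format and can be accessed quickly through the API. TFDS also includes datasets from many different fields, such as natural language processing, machine translation, and object recognition, to cover a range of use cases.

In addition, TFDS provides dataset loading, dataset transformation, and dataset metadata. Loading retrieves the datasets that TFDS provides; transformation processes a dataset and changes its format; metadata exposes a dataset's information and attributes.

In short, TFDS provides many prepared datasets and an easy-to-use API for reading and managing them. It can greatly simplify data handling for deep learning and lets researchers focus on developing and training models.

### Answer 3

TensorFlow Datasets (TFDS) is a rich, easy-to-use open-source dataset library for use with TensorFlow. It provides many prepared datasets suited to research in machine learning and related fields.

The strength of TFDS is that it offers widely used datasets and a uniform way of loading them, so developers can start a machine learning project quickly. Beyond common image and audio datasets, it provides other types of data such as text, structured data, and sequences. These datasets can be used for tasks such as classification, regression, clustering, and generation.

Using TFDS has several benefits. First, its datasets are prepared in a standard format; further steps such as normalization or scaling are applied in the `tf.data` pipeline. Second, the uniform loading interface means users do not need to handle parsing and conversion themselves, which saves considerable time and effort. TFDS also works with convenient operations such as dataset splitting, shuffling, batching, and preprocessing.

In short, TFDS is a useful tool that helps machine learning practitioners prepare data and train models more efficiently. With TFDS, many types of datasets can be obtained, loaded, and applied to machine learning projects with little effort, and the catalog is expected to keep growing.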