TensorFlow Dataset API Explained

流水空间

Category: tensorflow. Tags: tensorflow, DataSet API

Article link: https://blog.csdn.net/liushuikong/article/details/79213632

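TensorFlow is a very popular deep learning framework. To improve performance and ease of use, TensorFlow has gradually added many high-level APIs across its releases. Some of them are higher-level wrappers around the original APIs; others are new APIs developed to improve performance and replace older ones. Among these, the Dataset API and the Estimator API are high-level APIs introduced in TensorFlow 1.3, and the official documentation recommends using them to build models.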

Datasets: a new way to create input pipelines for TensorFlow models. The Dataset API has methods to load and manipulate data, and feed it into your model. The Datasets API meshes well with the Estimators API.
Estimators: represent a complete TensorFlow model. The Estimator API provides methods to train the model, to judge the model's accuracy, and to generate predictions.

The figure below shows the overall architecture of the TensorFlow API:

Before TensorFlow 1.3, there were broadly two ways to read data:

Use placeholder and feed_dict to read data held in memory
Use a queue pipeline to read data from disk (for an introduction to how it works, see the article 十图详解tensorflow数据读取机制)
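For comparison with what follows, here is a minimal sketch of the first approach, feeding in-memory data through a placeholder. It assumes a TensorFlow installation that provides the `tf.compat.v1` module (in TensorFlow 1.x the same calls are written `tf.placeholder` and `tf.Session`); the function name `mean_with_feed_dict` and the sample values are made up for illustration.

```python
import numpy as np
import tensorflow as tf


def mean_with_feed_dict(batch):
    """Feed an in-memory NumPy batch through a placeholder and return its mean."""
    graph = tf.Graph()
    with graph.as_default():
        # A placeholder is a graph input whose value is supplied at run time.
        x = tf.compat.v1.placeholder(tf.float32, shape=[None, 2], name="x")
        mean = tf.reduce_mean(x)
    with tf.compat.v1.Session(graph=graph) as sess:
        # feed_dict maps the placeholder to the actual data for this run.
        return sess.run(mean, feed_dict={x: batch})


# Illustrative data only.
result = mean_with_feed_dict(np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32))
print(result)
```

Every call to `sess.run` has to receive its data from Python through `feed_dict`, which is the step the Dataset API moves into the graph.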

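The Dataset API is a new input pipeline added starting with TensorFlow 1.3. It performs considerably better than feed_dict or queue-based pipelines, and it is also more concise and easier to use. In TensorFlow 1.3 the Dataset API lives in the contrib package as tf.contrib.data.Dataset, while in TensorFlow 1.4 it is tf.data.Dataset.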
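A small sketch of what such a pipeline can look like: build a dataset from in-memory values, transform it, batch it, and pull batches through an iterator. It assumes the `tf.data.Dataset` location (TensorFlow 1.4 or later) and the `tf.compat.v1` shim so that the graph-and-session style also runs on TensorFlow 2.x; the function name `run_pipeline` and the numbers are illustrative only.

```python
import tensorflow as tf


def run_pipeline(values, batch_size):
    """Double each value, group the results into batches, and collect them."""
    graph = tf.Graph()
    with graph.as_default():
        dataset = tf.data.Dataset.from_tensor_slices(values)
        dataset = dataset.map(lambda v: v * 2).batch(batch_size)
        # A one-shot iterator walks the dataset exactly once.
        iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
        next_batch = iterator.get_next()

    batches = []
    with tf.compat.v1.Session(graph=graph) as sess:
        while True:
            try:
                batches.append(sess.run(next_batch).tolist())
            except tf.errors.OutOfRangeError:
                # Raised once the dataset is exhausted.
                break
    return batches


batches = run_pipeline([1, 2, 3, 4, 5, 6], batch_size=2)
print(batches)
```

No `feed_dict` is involved here: the data source, the transformation, and the batching are all part of the graph.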

The Datasets API consists of the classes shown in the figure below:

Where:

Dataset: Base class containing methods to create and transform datasets. Also allows you to initialize a dataset from data in memory, or from a Python generator.
TextLineDataset: Reads lines from text files (txt, csv...).
TFRecordDataset: Reads records from TFRecord files.
FixedLengthRecordDataset: Reads fixed size records from binary files.
Iterator: Provides a way to access one data set element at a time.
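As an illustration of two of these classes working together, the sketch below reads a small CSV file with TextLineDataset and walks it one element at a time with an iterator. It assumes `tf.data.TextLineDataset` (TensorFlow 1.4 or later) and the `tf.compat.v1` shim for the session-based style; the file name `demo.csv`, its contents, and the helper `read_lines` are hypothetical.

```python
import os
import tempfile

import tensorflow as tf


def read_lines(path):
    """Return the lines of a text file, skipping the first (header) line."""
    graph = tf.Graph()
    with graph.as_default():
        # Each dataset element is one line of the file, as a string tensor.
        dataset = tf.data.TextLineDataset(path).skip(1)
        iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
        next_line = iterator.get_next()

    lines = []
    with tf.compat.v1.Session(graph=graph) as sess:
        while True:
            try:
                # String tensors come back as Python bytes.
                lines.append(sess.run(next_line).decode("utf-8"))
            except tf.errors.OutOfRangeError:
                break
    return lines


# Write a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo.csv")
    with open(csv_path, "w") as f:
        f.write("x,y\n1,2\n3,4\n")
    lines = read_lines(csv_path)

print(lines)
```

TFRecordDataset and FixedLengthRecordDataset are constructed from file names in a similar way, but they yield serialized records rather than text lines.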