TensorFlow 2.0 in Depth and in Practice — The Dataset Class Explained


Knowledge Tree

1. Operations on the Dataset Class

1.1 Creating Datasets with the Dataset Class

Datasets are created and instantiated through the tf.data.Dataset class.
The most commonly used constructors are:

  • tf.data.Dataset.from_tensors(): creates a Dataset object by combining the entire input into a single element, returning a dataset that contains exactly one element.
  • tf.data.Dataset.from_tensor_slices(): creates a Dataset object by slicing the input along its first dimension; the input can be one or more tensors, and multiple tensors must be packed together as a tuple or a dict.
  • tf.data.Dataset.from_generator(): produces the elements of the dataset by iterating over a generator, typically used when the data is too large to hold in memory at once.

Note: a Dataset can be viewed as an ordered list of "elements" of the same type. In practice a single "element" can be a vector, a string, an image, or even a tuple or a dict. A minimal sketch of the three constructors is shown below.
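
A minimal sketch of the three constructors; the toy arrays and the generator below are made up purely for illustration:

import tensorflow as tf
import numpy as np

# from_tensors: the whole input becomes ONE element.
ds1 = tf.data.Dataset.from_tensors([[1, 2], [3, 4]])
for item in ds1:
    print(item.shape)            # (2, 2) -- a single 2x2 element

# from_tensor_slices: the input is sliced along the first dimension.
features = np.array([[1, 2], [3, 4], [5, 6]])
labels = np.array([0, 1, 0])
ds2 = tf.data.Dataset.from_tensor_slices((features, labels))
for x, y in ds2:
    print(x.numpy(), y.numpy())  # three (feature, label) pairs

# from_generator: elements are produced lazily by a Python generator.
def gen():                       # hypothetical generator, for illustration only
    for i in range(3):
        yield i
ds3 = tf.data.Dataset.from_generator(gen, output_types=tf.int32)
for item in ds3:
    print(item.numpy())          # 0, 1, 2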

1.2 Dataset Transformations

A Dataset provides a rich set of transformation methods:

  • map(f): applies the function f to every element of the dataset and returns a new dataset (often combined with tf.io for reading and decoding files and tf.image for image processing).
  • shuffle(buffer_size): shuffles the dataset. A buffer of fixed size buffer_size is filled with the first buffer_size elements; elements are then sampled at random from the buffer, and each sampled element is replaced by the next element from the input.
  • repeat(count): repeats the dataset count times.
  • batch(batch_size): groups the dataset into batches, i.e. every batch_size consecutive elements are stacked along a new 0th dimension with tf.stack() to form a single element (see the pipeline sketch after this list).
  • flat_map(f): maps the function f over every element of the dataset and flattens the resulting nested Datasets into one.
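
A minimal sketch chaining map, shuffle, repeat, and batch; the toy range and the lambda are assumptions chosen only for illustration:

import tensorflow as tf

ds = tf.data.Dataset.range(10)     # ==> [0, 1, ..., 9]
ds = ds.map(lambda x: x * 2)       # double each element
ds = ds.shuffle(buffer_size=5)     # shuffle using a 5-element buffer
ds = ds.repeat(2)                  # iterate over the data twice
ds = ds.batch(4)                   # group every 4 elements into one batch
for batch in ds:
    print(batch.numpy())           # e.g. [ 6  0 10  4 ] (order is random)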

flat_map() example 1:

import tensorflow as tf

a = tf.data.Dataset.range(1, 6)  # ==> [1, 2, 3, 4, 5]
# Each element x is mapped to a 6-element dataset [x, x, ..., x]; flat_map flattens them.
b = a.flat_map(lambda x: tf.data.Dataset.from_tensors(x).repeat(6))
for item in b:
    print(item.numpy(), end=', ')

Program output:

1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 

flat_map() example 2:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dataset_flat = dataset.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
# Iterate over the ORIGINAL dataset: each element is still a 3-element vector.
for line in dataset:
    print(line)

Program output:

tf.Tensor([1 2 3], shape=(3,), dtype=int32)
tf.Tensor([4 5 6], shape=(3,), dtype=int32)
tf.Tensor([7 8 9], shape=(3,), dtype=int32)

flat_map() example 3:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dataset_flat = dataset.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
# Iterate over the FLATTENED dataset: each element is now a scalar.
for line in dataset_flat:
    print(line)

Program output:

tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
tf.Tensor(9, shape=(), dtype=int32)
  • interleave(): similar in effect to flat_map(), but can interleave data drawn from different sources (explained in detail, with a sketch, at the end of this section).
  • take(): takes the first few elements of the dataset.
  • filter(): drops elements that do not satisfy a predicate (see the sketch below).
  • zip(): joins two Datasets of the same length element-wise into pairs.
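
A minimal sketch of take() and filter(); the toy range and the predicate are assumptions for illustration:

import tensorflow as tf

ds = tf.data.Dataset.range(10)             # ==> [0, 1, ..., 9]
first_three = ds.take(3)                   # keep only the first 3 elements
evens = ds.filter(lambda x: x % 2 == 0)    # keep only even elements
print([x.numpy() for x in first_three])    # [0, 1, 2]
print([x.numpy() for x in evens])          # [0, 2, 4, 6, 8]
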
zip() example 1:
import tensorflow as tf

a = tf.data.Dataset.range(1, 4)   # ==> [1, 2, 3]
b = tf.data.Dataset.range(4, 7)   # ==> [4, 5, 6]
ds = tf.data.Dataset.zip((a, b))  # pair up the elements of a and b
for line in ds:
    print(line)

Program output:

(<tf.Tensor: id=182, shape=(), dtype=int64, numpy=1>, <tf.Tensor: id=183, shape=(), dtype=int64, numpy=4>)
(<tf.Tensor: id=184, shape=(), dtype=int64, numpy=2>, <tf.Tensor: id=185, shape=(), dtype=int64, numpy=5>)
(<tf.Tensor: id=186, shape=(), dtype=int64, numpy=3>, <tf.Tensor: id=187, shape=(), dtype=int64, numpy=6>)

zip() example 2:

import tensorflow as tf

a = tf.data.Dataset.range(1, 4)   # ==> [1, 2, 3]
b = tf.data.Dataset.range(4, 7)   # ==> [4, 5, 6]
ds = tf.data.Dataset.zip((b, a))  # swapping the tuple order swaps the pair order
for line in ds:
    print(line)

Program output:

(<tf.Tensor: id=194, shape=(), dtype=int64, numpy=4>, <tf.Tensor: id=195, shape=(), dtype=int64, numpy=1>)
(<tf.Tensor: id=196, shape=(), dtype=int64, numpy=5>, <tf.Tensor: id=197, shape=(), dtype=int64, numpy=2>)
(<tf.Tensor: id=198, shape=(), dtype=int64, numpy=6>, <tf.Tensor: id=199, shape=(), dtype=int64, numpy=3>)
  • concatenate(): concatenates two Datasets sequentially, one after the other.

concatenate() example:

import tensorflow as tf

a = tf.data.Dataset.range(1, 4)  # ==> [1, 2, 3]
b = tf.data.Dataset.range(4, 7)  # ==> [4, 5, 6]
ds = a.concatenate(b)            # elements of a followed by elements of b
for line in ds:
    print(line)

Program output:

tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(5, shape=(), dtype=int64)
tf.Tensor(6, shape=(), dtype=int64)
  • reduce(): performs a reduction over all elements of the dataset (see the sketch below).
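
A minimal sketch of reduce(), summing all elements; the initial state and the lambda are assumptions for illustration (Dataset.range yields int64, so the initial state is created with that dtype):

import tensorflow as tf

ds = tf.data.Dataset.range(1, 6)                   # ==> [1, 2, 3, 4, 5]
total = ds.reduce(tf.constant(0, dtype=tf.int64),  # initial accumulator state
                  lambda state, x: state + x)      # fold in each element
print(total.numpy())                               # 15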

interleave() is a method of Dataset, so it always acts on an existing Dataset.
The method first takes cycle_length elements from the source Dataset and applies map_func to each of them, producing cycle_length new Dataset objects. It then pulls data from these new Datasets in turn, taking block_length consecutive elements from each per round. When one of the generated Datasets is exhausted, the next element is taken from the source Dataset, map_func is applied to it, and the process continues in the same way.
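
A minimal sketch of interleave(); the values cycle_length=2 and block_length=2 are chosen here purely for illustration:

import tensorflow as tf

a = tf.data.Dataset.range(1, 4)  # ==> [1, 2, 3]
# Each element x becomes a dataset [x, x, x, x]; two of these are consumed
# alternately, two elements at a time.
b = a.interleave(lambda x: tf.data.Dataset.from_tensors(x).repeat(4),
                 cycle_length=2, block_length=2)
print([item.numpy() for item in b])
# [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]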
