TensorFlow Basics 1: Basic Tensor Operations and Building Pipelines with the Dataset API

鲸鱼在dn

Published on 2023-11-19 15:12:31


Category: Machine Learning and Deep Learning Fundamentals · Tags: tensorflow, artificial intelligence, python

Article link: https://blog.csdn.net/qq_41697157/article/details/134490926


Contents

0. What Is TensorFlow
1. Learning TensorFlow
2. Building Pipelines with TensorFlow's Dataset API

0. What Is TensorFlow

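TensorFlow is a programming interface for implementing and running machine learning algorithms. It is scalable and cross-platform, and it includes convenient wrappers designed specifically for deep learning.

TensorFlow is built around a computation graph made up of nodes. Each node represents an operation that can have zero or more inputs and outputs, and tensors are created as symbolic handles that represent these inputs and outputs. You can think of a tensor as a generalization of the mathematical scalar (rank-0 tensor), vector (rank-1 tensor), matrix (rank-2 tensor), and so on. In TensorFlow 2.x, operations execute eagerly by default; a graph is built when a Python function is wrapped in tf.function.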
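A minimal sketch, assuming TensorFlow 2.x is installed, of how tensor rank maps onto scalars, vectors, and matrices, and of how tf.function traces a Python function into a graph whose inputs and outputs are tensors. The function name scaled_sum is hypothetical, chosen only for illustration.

```python
import tensorflow as tf

# Rank 0: a scalar
scalar = tf.constant(3.0)
# Rank 1: a vector
vector = tf.constant([1.0, 2.0, 3.0])
# Rank 2: a matrix
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # tf.rank returns a 0-d int32 tensor holding the number of dimensions
    print(name, "rank:", int(tf.rank(t).numpy()), "shape:", t.shape)

# tf.function traces this Python function into a computation graph:
# x and y are the graph's input tensors, the returned tensor is its output
@tf.function
def scaled_sum(x, y):
    return 2.0 * x + y

result = scaled_sum(vector, vector)
print(result)
```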

1. Learning TensorFlow

1.1 Installing TensorFlow

pip install tensorflow

# install a specific version
pip install tensorflow==[desired-version]
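After installing, a quick check like the following (a sketch assuming TensorFlow 2.1 or later, where tf.config.list_physical_devices is available) confirms which version was installed and whether a GPU is visible.

```python
import tensorflow as tf

# Print the installed TensorFlow version to confirm the install worked
print(tf.__version__)

# List the GPUs TensorFlow can see; an empty list means it will run on the CPU
print(tf.config.list_physical_devices("GPU"))
```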

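1.2 Creating Tensors in TensorFlow

1) tf.convert_to_tensor() creates a tensor from a list or a NumPy array. Call the .numpy() method on a tensor to access its values as a NumPy array.

2) tf.constant() creates a constant tensor.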

import numpy as np
import tensorflow as tf

# Print floats with 3 decimal places, matching the outputs shown in this article
np.set_printoptions(precision=3)

a = np.array([1, 2, 3], dtype=np.int32)
b = [4, 5, 6]

t_a = tf.convert_to_tensor(a)
t_b = tf.convert_to_tensor(b)

print(t_a)
print(t_b)

# Output:
# tf.Tensor([1 2 3], shape=(3,), dtype=int32)
# tf.Tensor([4 5 6], shape=(3,), dtype=int32)


const_tensor = tf.constant([1.2, 5, np.pi], dtype=tf.float32)
print(const_tensor)

# Constant tensor, output: tf.Tensor([1.2   5.    3.142], shape=(3,), dtype=float32)
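The code above prints whole tensors; the sketch below, assuming TensorFlow 2.x, shows the .numpy() method mentioned earlier and the default dtypes TensorFlow picks when none is given. The variable names t_int and t_float are illustrative.

```python
import numpy as np
import tensorflow as tf

t_a = tf.convert_to_tensor(np.array([1, 2, 3], dtype=np.int32))

# .numpy() returns the tensor's values as a NumPy array
arr = t_a.numpy()
print(type(arr), arr)

# Without an explicit dtype, Python ints become int32 and Python floats become float32
t_int = tf.constant([4, 5, 6])
t_float = tf.constant([1.0, 2.0])
print(t_int.dtype, t_float.dtype)
```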

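1.3 Manipulating the Data Type and Shape of a Tensor

1) Change a tensor's data type with tf.cast().

2) Transpose a tensor with tf.transpose(), reshape it with tf.reshape(), and remove unnecessary size-1 dimensions with tf.squeeze().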

import tensorflow as tf

t_a = tf.constant([1, 2, 3], dtype=tf.int32)  # same t_a as in section 1.2

# Change the data type
t_a_new = tf.cast(t_a, tf.int64)
print(t_a_new.dtype)
# Output:
# <dtype: 'int64'>

# Transpose
t = tf.random.uniform(shape=(3, 5))

t_tr = tf.transpose(t)
print(t.shape, ' --> ', t_tr.shape)
# (3, 5)  -->  (5, 3)

# Reshape
t = tf.zeros((30,))
print(t)
t_reshape = tf.reshape(t, shape=(5, 6))
print(t_reshape)
print(t_reshape.shape)
# Output:
# tf.Tensor(
# [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
#  0. 0. 0. 0. 0. 0.], shape=(30,), dtype=float32)
# tf.Tensor(
# [[0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0.]], shape=(5, 6), dtype=float32)
# (5, 6)

# Remove the size-1 dimensions at axes 2 and 4
t = tf.zeros((1, 2, 1, 4, 1))
t_sqz = tf.squeeze(t, axis=(2, 4))
print(t.shape, ' --> ', t_sqz.shape)

# Output:
# (1, 2, 1, 4, 1)  -->  (1, 2, 4)
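Two conveniences not used above, sketched here assuming TensorFlow 2.x: tf.reshape accepts -1 for one dimension and infers its size, and tf.squeeze without an axis argument removes every size-1 dimension. The last part shows that, in eager mode, a reshape whose element count does not match raises an error at run time.

```python
import tensorflow as tf

t = tf.zeros((30,))

# -1 lets tf.reshape infer that dimension from the element count (30 / 5 = 6)
t_inferred = tf.reshape(t, shape=(5, -1))
print(t_inferred.shape)

# Without an axis argument, tf.squeeze removes every size-1 dimension
t2 = tf.zeros((1, 2, 1, 4, 1))
print(tf.squeeze(t2).shape)

# The element count must match: 30 elements cannot form a (4, 8) tensor
try:
    tf.reshape(t, shape=(4, 8))
except tf.errors.InvalidArgumentError as e:
    print("reshape failed:", type(e).__name__)
```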

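1.4 Mathematical Operations on Tensors

1) Element-wise multiplication: tf.multiply(t1, t2)

2) To compute the mean, sum, or standard deviation along one or more axes, use:
tf.math.reduce_mean(t1, axis=0)
tf.math.reduce_sum()
tf.math.reduce_std()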
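A small sketch, assuming TensorFlow 2.x and a hypothetical 2×2 tensor t1, of how the axis argument changes what these reductions compute. Note that tf.math.reduce_std computes the population standard deviation (dividing by N, not N - 1).

```python
import tensorflow as tf

t1 = tf.constant([[1.0, 2.0], [3.0, 4.0]])
t2 = tf.fill((2, 2), 2.0)  # 2x2 tensor filled with 2.0

# Element-wise product, equivalent to t1 * t2
prod = tf.multiply(t1, t2)

# axis=0 reduces down each column
col_mean = tf.math.reduce_mean(t1, axis=0)
col_sum = tf.math.reduce_sum(t1, axis=0)
col_std = tf.math.reduce_std(t1, axis=0)

# axis=1 reduces across each row
row_sum = tf.math.reduce_sum(t1, axis=1)

# With no axis, the reduction covers every element
total = tf.math.reduce_sum(t1)

print(prod.numpy(), col_mean.numpy(), col_sum.numpy(), col_std.numpy(), row_sum.numpy(), total.numpy())
```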

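1.5 Splitting, Stacking, and Concatenating Tensors

tf.split() splits one tensor into two or more tensors. When the number of splits is given as an integer, the size of the input tensor along the split dimension must be divisible by that number.
tf.stack() stacks tensors of the same shape along a new axis.
tf.concat() concatenates tensors along an existing axis.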

>>> import numpy as np
>>> import tensorflow as tf
>>> np.set_printoptions(precision=3)

>>> tf.random.set_seed(1)
>>> t = tf.random.uniform((6,))
>>> print(t.numpy())
[0.165 0.901 0.631 0.435 0.292 0.643]
>>> t_splits = tf.split(t, 3)
>>> [item.numpy() for item in t_splits]
[array([0.165, 0.901], dtype=float32),
 array([0.631, 0.435], dtype=float32),
 array([0.292, 0.643], dtype=float32)]

>>> A = tf.ones((3,))
>>> B = tf.zeros((2,))
>>> C = tf.concat([A, B], axis=0)
>>> print(C.numpy())
[1. 1. 1. 0. 0.]

>>> A = tf.ones((3,))
>>> B = tf.zeros((3,))
>>> S = tf.stack([A, B], axis=1)
>>> print(S.numpy())
[[1. 0.]
 [1. 0.]
 [1. 0.]]
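If the size along the split axis is not divisible by the number of pieces, tf.split also accepts a list of piece sizes that must add up to that size. The sketch below, assuming TensorFlow 2.x, shows that form and contrasts the output shapes of tf.concat and tf.stack.

```python
import tensorflow as tf

t = tf.range(5)  # 5 elements: not divisible by 2

# A list of sizes gives uneven pieces; the sizes must sum to 5
parts = tf.split(t, num_or_size_splits=[3, 2])
print([p.numpy() for p in parts])
# tf.split(t, 2) would raise an error here, since 5 is not divisible by 2

A = tf.ones((3,))
B = tf.zeros((3,))

# tf.concat joins along an existing axis: (3,) + (3,) -> (6,)
print(tf.concat([A, B], axis=0).shape)

# tf.stack adds a new axis: two (3,) tensors -> (2, 3) with axis=0, (3, 2) with axis=1
print(tf.stack([A, B], axis=0).shape)
print(tf.stack([A, B], axis=1).shape)
```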

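2. Building Pipelines with TensorFlow's Dataset API

This section outlines different ways to build TensorFlow datasets, including dataset transformations and common preprocessing steps.

2.1 Creating a TensorFlow Dataset from Existing Tensors

tf.data.Dataset.from_tensor_slices() creates a dataset from an existing list, NumPy array, or tensor. It returns a Dataset object, and calling the object's .batch() method groups consecutive elements into batches of a chosen size.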

# Create the dataset ds
>>> import tensorflow as tf
>>> a = [1.2, 3.4, 7.5, 4.1, 5.0, 1.0]
>>> ds = tf.data.Dataset.from_tensor_slices(a)
>>> print(ds)

<_TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.float32, name=None)>

# Iterate over ds
>>> for item in ds:
...     print(item)
tf.Tensor(1.2, shape=(), dtype=float32)
tf.Tensor(3.4, shape=(), dtype=float32)
tf.Tensor(7.5, shape=(), dtype=float32)
tf.Tensor(4.1, shape=(), dtype=float32)
tf.Tensor(5.0, shape=(), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)

# Create batches of size 3
>>> ds_batch = ds.batch(3)
>>> for i, elem in enumerate(ds_batch, 1):
...     print('batch {}:'.format(i), elem.numpy())
batch 1: [1.2 3.4 7.5]
batch 2: [4.1 5.  1. ]
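from_tensor_slices also works on multi-dimensional input: it slices along the first axis, so each row becomes one element. A sketch assuming TensorFlow 2.1+ (for as_numpy_iterator) and a hypothetical 3×2 NumPy array named features:

```python
import numpy as np
import tensorflow as tf

# A (3, 2) array is sliced along axis 0 into 3 elements of shape (2,)
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
ds_rows = tf.data.Dataset.from_tensor_slices(features)
print(ds_rows.element_spec)

# as_numpy_iterator() yields each element as a NumPy array
rows = list(ds_rows.as_numpy_iterator())

# batch(2) gives one full batch of 2 rows and a final partial batch of 1 row
batch_shapes = [b.shape.as_list() for b in ds_rows.batch(2)]
print(batch_shapes)
```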

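2.2 Combining Two Tensors into a Joint Dataset

Suppose we have two tensors: t_x holds the features and t_y holds the labels. There are two ways to build a dataset that pairs them up:
1) tf.data.Dataset.zip
2) tf.data.Dataset.from_tensor_slices

To apply an operation to every element of a dataset, use the .map() method.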

# Create the datasets
>>> import numpy as np
>>> import tensorflow as tf
>>> np.set_printoptions(precision=3)
>>> tf.random.set_seed(1)
>>> t_x = tf.random.uniform([4, 3], dtype=tf.float32)
>>> t_y = tf.range(4)
>>> ds_x = tf.data.Dataset.from_tensor_slices(t_x)
>>> ds_y = tf.data.Dataset.from_tensor_slices(t_y)

## Method 1: tf.data.Dataset.zip
>>> ds_joint = tf.data.Dataset.zip((ds_x, ds_y))
>>> for example in ds_joint:
...     print('  x: ', example[0].numpy(),
...           '  y: ', example[1].numpy())

  x:  [0.165 0.901 0.631]   y:  0
  x:  [0.435 0.292 0.643]   y:  1
  x:  [0.976 0.435 0.66 ]   y:  2
  x:  [0.605 0.637 0.614]   y:  3

## Method 2: tf.data.Dataset.from_tensor_slices
>>> ds_joint = tf.data.Dataset.from_tensor_slices((t_x, t_y))
>>> for example in ds_joint:
...     print('  x: ', example[0].numpy(),
...           '  y: ', example[1].numpy())
  x:  [0.165 0.901 0.631]   y:  0
  x:  [0.435 0.292 0.643]   y:  1
  x:  [0.976 0.435 0.66 ]   y:  2
  x:  [0.605 0.637 0.614]   y:  3

# Apply an operation to each element: rescale x from [0, 1) to [-1, 1), keep y unchanged
>>> ds_trans = ds_joint.map(lambda x, y: (x*2-1.0, y))
>>> for example in ds_trans:
...     print('  x: ', example[0].numpy(),
...           '  y: ', example[1].numpy())
  x:  [-0.67   0.803  0.262]   y:  0
  x:  [-0.131 -0.416  0.285]   y:  1
  x:  [ 0.952 -0.13   0.32 ]   y:  2
  x:  [0.21  0.273 0.229]   y:  3
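For anything longer than a one-liner, .map() can take a named function instead of a lambda. The sketch below assumes TensorFlow 2.4+ (for tf.data.AUTOTUNE) and uses small hypothetical tensors so the result is easy to check by hand; rescale is an illustrative name.

```python
import tensorflow as tf

# Hypothetical features in [0, 1] and integer labels
t_x = tf.constant([[0.0, 0.5, 1.0], [0.25, 0.75, 0.5]])
t_y = tf.constant([0, 1])
ds_joint = tf.data.Dataset.from_tensor_slices((t_x, t_y))

# Each (x, y) tuple element is unpacked into the function's two arguments
def rescale(x, y):
    # Map features from [0, 1] to [-1, 1]; leave the label unchanged
    return x * 2.0 - 1.0, y

# AUTOTUNE lets tf.data choose how many elements to process in parallel;
# the output order is still preserved by default
ds_trans = ds_joint.map(rescale, num_parallel_calls=tf.data.AUTOTUNE)

for x, y in ds_trans:
    print(x.numpy(), y.numpy())
```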

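2.3 Shuffle, Batch, and Repeat

1) .shuffle() shuffles the dataset. Its buffer_size parameter sets how many elements are held in a buffer from which the next element is drawn at random; a buffer at least as large as the dataset (as below, with buffer_size=len(t_x)) gives a full uniform shuffle.
2) .batch() groups elements into batches.
3) .repeat() repeats the dataset a given number of times (indefinitely if count is omitted).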

# Shuffle example (ds_joint and t_x come from section 2.2)
>>> tf.random.set_seed(1)
>>> ds = ds_joint.shuffle(buffer_size=len(t_x))
>>> for example in ds:
...     print('  x: ', example[0].numpy(),
...           '  y: ', example[1].numpy())

  x:  [0.976 0.435 0.66 ]   y:  2
  x:  [0.435 0.292 0.643]   y:  1
  x:  [0.165 0.901 0.631]   y:  0
  x:  [0.605 0.637 0.614]   y:  3

# Batching example
>>> ds = ds_joint.batch(batch_size=3,
...                     drop_remainder=False)
>>> batch_x, batch_y = next(iter(ds))
>>> print('Batch-x: \n', batch_x.numpy())
>>> print('Batch-y:   ', batch_y.numpy())

Batch-x: 
 [[0.165 0.901 0.631]
 [0.435 0.292 0.643]
 [0.976 0.435 0.66 ]]
Batch-y:    [0 1 2]

# Repeat example
>>> ds = ds_joint.batch(3).repeat(count=2)
>>> for i, (batch_x, batch_y) in enumerate(ds):
...     print(i, batch_x.shape, batch_y.numpy())

0 (3, 3) [0 1 2]
1 (1, 3) [3]
2 (3, 3) [0 1 2]
3 (1, 3) [3]
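In a training loop these three methods are usually chained. The sketch below, assuming TensorFlow 2.x and a hypothetical 4-example dataset, uses the order shuffle, then batch, then repeat: examples are shuffled individually, grouped into batches of 2, and the whole pass is repeated 3 times. Because reshuffle_each_iteration defaults to True, each epoch gets a new order, yet every epoch still contains each example exactly once. Calling .batch() before .shuffle() would instead shuffle whole batches, so the same examples would always stay together.

```python
import tensorflow as tf

# Hypothetical 4-example dataset: random features, labels 0..3
t_x = tf.random.uniform([4, 3], dtype=tf.float32, seed=1)
t_y = tf.range(4)
ds_joint = tf.data.Dataset.from_tensor_slices((t_x, t_y))

# Shuffle individual examples, batch them in pairs, repeat for 3 epochs
ds = ds_joint.shuffle(buffer_size=4).batch(2).repeat(3)

# 6 batches of 2 labels each; the order depends on the random shuffle
batch_labels = [batch_y.numpy().tolist() for _, batch_y in ds]
print(batch_labels)

# Two consecutive batches form one epoch, which covers every label once
for epoch in range(3):
    labels = batch_labels[2 * epoch] + batch_labels[2 * epoch + 1]
    print("epoch", epoch, sorted(labels))
```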
