[TF2.0] [Notes] TensorFlow Data Type Basics 2

Samanii

Category: Tensorflow2.0. Tags: python, deep learning, numpy, tensorflow, data analysis

Article link: https://blog.csdn.net/lyx0708/article/details/104587950

Advanced TensorFlow

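Merging and splitting

Merging

tf.concat (concatenate) merges tensors along an existing dimension without creating a new one. It works on any dimension; the only constraint is that all the other dimensions must have matching lengths.

tf.stack creates a new dimension, and requires every input tensor to have exactly the same shape.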

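import tensorflow as tf

a = tf.random.normal([4, 35, 8])
b = tf.random.normal([4, 35, 8])
# concat joins along an existing axis: 35 + 35 + 35 = 105
c = tf.concat([a, b, b], axis=1)
print(c.shape)  # (4, 105, 8)
# stack inserts a new axis at the given position
d = tf.stack([a, b], axis=0)
e = tf.stack([a, b], axis=-1)
print(f'd:{d.shape} e:{e.shape}')  # d:(2, 4, 35, 8) e:(4, 35, 8, 2)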
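
To make the shape constraints concrete, here is a minimal sketch of what happens when they are violated. The tensor shapes are chosen only for illustration, and the exact exception type may vary between TensorFlow versions, so both likely candidates are caught.

```python
import tensorflow as tf

a = tf.zeros([4, 35, 8])
b = tf.zeros([6, 35, 8])

# Concatenating along axis 0 works: the other dimensions (35, 8) match.
merged = tf.concat([a, b], axis=0)
print(merged.shape)  # (10, 35, 8)

# Concatenating along axis 1 fails: dimension 0 differs (4 vs 6).
try:
    tf.concat([a, b], axis=1)
    concat_failed = False
except (tf.errors.InvalidArgumentError, ValueError):
    concat_failed = True

# Stacking fails as well: stack needs identical shapes.
try:
    tf.stack([a, b], axis=0)
    stack_failed = False
except (tf.errors.InvalidArgumentError, ValueError):
    stack_failed = True

print(concat_failed, stack_failed)
```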

Splitting

tf.split and tf.unstack

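a = tf.random.normal([4, 35, 8])
# Split axis 0 into 4 equal parts, each of shape [1, 35, 8]
r1 = tf.split(a, axis=0, num_or_size_splits=4)
# Split axis 1 into parts of length 30 and 5
r2 = tf.split(a, axis=1, num_or_size_splits=[30, 5])
# unstack removes axis 0, giving 4 tensors of shape [35, 8]
r3 = tf.unstack(a)
print(type(r1))  # <class 'list'>
print(tf.convert_to_tensor(r1).shape)  # (4, 1, 35, 8)
print(len(r2))  # 2
print(tf.convert_to_tensor(r3).shape)  # (4, 35, 8)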

Both tf.split and tf.unstack return a Python list of tensors.
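
Splitting and merging are inverses of each other. The sketch below (variable names are illustrative) checks that tf.concat undoes tf.split and tf.stack undoes tf.unstack, which also shows that split keeps the split axis while unstack drops it.

```python
import tensorflow as tf

a = tf.random.normal([4, 35, 8])

# split keeps the axis: the parts have shapes [4, 30, 8] and [4, 5, 8]
parts = tf.split(a, num_or_size_splits=[30, 5], axis=1)
restored = tf.concat(parts, axis=1)

# unstack drops the axis: 4 tensors of shape [35, 8]
slices = tf.unstack(a, axis=0)
restacked = tf.stack(slices, axis=0)

split_ok = bool(tf.reduce_all(tf.equal(restored, a)))
stack_ok = bool(tf.reduce_all(tf.equal(restacked, a)))
print(split_ok, stack_ok)
```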

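Data statistics

Vector norms

A vector norm measures the "length" of a vector. In neural networks it is commonly used to gauge the magnitude of a tensor's weights or gradients. The most common are the L1, L2, and ∞-norms.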
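
For reference, the standard definitions of the L1, L2, and ∞-norms for a vector x with elements x_i are:

```latex
\|x\|_1 = \sum_i |x_i|, \qquad
\|x\|_2 = \sqrt{\sum_i x_i^2}, \qquad
\|x\|_\infty = \max_i |x_i|
```

For a tensor of four ones, such as tf.ones([2, 2]) treated as a flat vector, these evaluate to 4, 2, and 1 respectively.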

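import numpy as np

x = tf.ones([2, 2])
# With axis=None, tf.norm treats the tensor as one flattened vector.
print(tf.norm(x, ord=1))       # 4.0
print(tf.norm(x, ord=2))       # 2.0
print(tf.norm(x, ord=np.inf))  # 1.0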

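Maximum, minimum, mean, and sum

The reduction functions are tf.reduce_max, tf.reduce_min, tf.reduce_mean, and tf.reduce_sum. Without an axis argument they reduce over all elements; with axis they reduce along that dimension only.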

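x = tf.random.normal([4, 10])
print(tf.reduce_max(x))            # global maximum, a scalar
print(tf.reduce_min(x, axis=0))    # per-column minimum, shape [10]
print(tf.reduce_mean(x))           # global mean
print(tf.reduce_mean(x, axis=0))   # per-column mean, shape [10]
print(tf.reduce_sum(x, axis=-1))   # per-row sum, shape [4]
a = tf.nn.softmax(x, axis=1)       # each row now sums to 1
print(a)
print(tf.argmax(a, axis=1))        # index of the largest probability in each row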
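
Because the snippet above uses random data, the effect of the axis argument is easier to verify on a small fixed tensor. The values below are made up for illustration.

```python
import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

total = tf.reduce_sum(x)               # all elements: 21
col_max = tf.reduce_max(x, axis=0)     # down the rows: [4, 5, 6]
row_mean = tf.reduce_mean(x, axis=1)   # across the columns: [2, 5]
row_argmax = tf.argmax(x, axis=1)      # position of each row's maximum: [2, 2]

print(total.numpy(), col_max.numpy(), row_mean.numpy(), row_argmax.numpy())
```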

Tensor comparison

Computing the accuracy of a classifier

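out = tf.random.normal([100, 10])   # fake logits: 100 samples, 10 classes
out = tf.nn.softmax(out, axis=1)    # convert to probabilities
pred = tf.argmax(out, axis=1)       # predicted class per sample (int64)
y = tf.random.uniform([100], dtype=tf.int64, maxval=10)  # fake labels
print(pred)
print(y)
compare = tf.equal(pred, y)         # True where the prediction is correct
print(compare)
correct = tf.cast(compare, dtype=tf.float32)
acc = tf.reduce_sum(correct) / correct.shape[0]  # fraction of correct predictions
print(acc)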
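
With random predictions and labels the accuracy hovers around 10%, which makes the computation hard to check by eye. A small hand-made example (the predictions and labels are invented) shows the same steps with a known answer; tf.reduce_mean is used here as an equivalent shortcut for sum divided by count.

```python
import tensorflow as tf

pred = tf.constant([0, 2, 1, 3], dtype=tf.int64)  # hypothetical predictions
y = tf.constant([0, 1, 1, 3], dtype=tf.int64)     # hypothetical labels

correct = tf.cast(tf.equal(pred, y), tf.float32)  # [1, 0, 1, 1]
acc = tf.reduce_mean(correct)                     # 3 of 4 correct
print(float(acc))  # 0.75
```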

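Padding and tiling

Padding

Padding extends data of different lengths to a common length, for example when sentences are encoded as sequences of word indices.

Padding is done with tf.pad(x, paddings), where paddings is a nested list of [left padding, right padding] pairs, one pair per dimension. For example, [[0,0],[2,1],[1,2]] means: do not pad the first dimension; pad the second dimension with two units on the left (start) and one on the right (end); pad the third dimension with one unit on the left and two on the right.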
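
As a quick check of that paddings scheme, the sketch below applies [[0,0],[2,1],[1,2]] to a tensor of ones and then pads a short 1-D "sentence" on the right. The tensors are illustrative; tf.pad fills with zeros by default.

```python
import tensorflow as tf

x = tf.ones([2, 3, 4])
paddings = [[0, 0], [2, 1], [1, 2]]
padded = tf.pad(x, paddings)
# Dimension 1 grows by 2 + 1, dimension 2 by 1 + 2
print(padded.shape)  # (2, 6, 7)

# A 1-D sequence padded with two zeros at the end
sentence = tf.constant([7, 8, 9])
padded_sentence = tf.pad(sentence, [[0, 2]])
print(padded_sentence.numpy())  # [7 8 9 0 0]
```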

total_words = 10000      # vocabulary size
max_review_len = 80      # every review is padded or truncated to this length
embedding_len = 100      # embedding dimension (not used in this snippet)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=total_words)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len, truncating='post', padding='post')

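The truncating and padding arguments of pad_sequences each accept only two values, 'pre' and 'post': 'pre' truncates (or pads) at the start of the sequence, and 'post' truncates (or pads) at the end.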
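
The difference between 'pre' and 'post' is easiest to see on two toy sequences, one longer and one shorter than maxlen. The sequences are made up; the call is the same one used on the IMDB data above.

```python
import tensorflow as tf

pad_sequences = tf.keras.preprocessing.sequence.pad_sequences
seqs = [[1, 2, 3, 4, 5], [6, 7]]

# 'post': cut the tail of long sequences, append zeros to short ones
post = pad_sequences(seqs, maxlen=4, truncating='post', padding='post')
# 'pre': cut the head of long sequences, prepend zeros to short ones
pre = pad_sequences(seqs, maxlen=4, truncating='pre', padding='pre')

print(post.tolist())  # [[1, 2, 3, 4], [6, 7, 0, 0]]
print(pre.tolist())   # [[2, 3, 4, 5], [0, 0, 6, 7]]
```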

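Tiling

tf.tile repeats data any number of times along any dimension. For data of shape [4,32,32,3] and multiples=[2,3,3,1], the channel dimension is left as is, the height and width are each repeated 3 times (the original plus two copies), and the batch dimension is repeated twice (the original plus one copy), giving shape [8,96,96,3].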

x = tf.random.normal([4, 32, 32, 3])
print(tf.tile(x, multiples=[2, 3, 3, 1]).shape)  # (8, 96, 96, 3)
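
On a tiny matrix the repetition pattern itself is visible, not just the resulting shape. A minimal sketch with an illustrative 2x2 tensor:

```python
import tensorflow as tf

x = tf.constant([[1, 2],
                 [3, 4]])

rows_twice = tf.tile(x, multiples=[2, 1])  # repeat along axis 0
cols_twice = tf.tile(x, multiples=[1, 2])  # repeat along axis 1

print(rows_twice.numpy().tolist())  # [[1, 2], [3, 4], [1, 2], [3, 4]]
print(cols_twice.numpy().tolist())  # [[1, 2, 1, 2], [3, 4, 3, 4]]
```

Each output dimension is the input dimension multiplied by the corresponding entry of multiples.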

Clipping

Implementing the ReLU function with clipping operations

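x = tf.range(9)
print(tf.maximum(x, 3))  # lower bound: values below 3 become 3
print(tf.minimum(x, 6))  # upper bound: values above 6 become 6
def relu(x):
    # ReLU clips from below at 0, so it needs maximum, not minimum.
    # Expects a float tensor because the bound is the float 0.
    return tf.maximum(x, 0.)
print(tf.clip_by_value(x, 3, 7))  # clip to the range [3, 7]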
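
The following sketch checks, on a small float tensor with illustrative values, that ReLU written as tf.maximum(x, 0.) agrees with the built-in tf.nn.relu, and that tf.clip_by_value is the composition of a maximum and a minimum.

```python
import tensorflow as tf

x = tf.constant([-2., -1., 0., 1., 2.])

relu_manual = tf.maximum(x, 0.)       # negatives become 0
relu_builtin = tf.nn.relu(x)

clipped = tf.clip_by_value(x, -1., 1.)
clipped_manual = tf.minimum(tf.maximum(x, -1.), 1.)

print(relu_manual.numpy().tolist())   # [0.0, 0.0, 0.0, 1.0, 2.0]
print(clipped.numpy().tolist())       # [-1.0, -1.0, 0.0, 1.0, 1.0]
```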

Advanced operations

These are mainly sampling (indexing) operations.

tf.gather, tf.gather_nd, tf.boolean_mask

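x = tf.random.uniform([4, 5, 8], maxval=100, dtype=tf.int32)
print(x)
# Take entries 1 and 2 along axis 0; the result has shape [2, 5, 8]
print(tf.gather(x, [1, 2], axis=0))
# Each index triple picks one scalar: x[1,1,1], x[2,2,1], x[3,3,1]
print(tf.gather_nd(x, [[1,1,1],[2,2,1],[3,3,1]]))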
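
tf.boolean_mask is listed above but not demonstrated. Here is a minimal sketch on an illustrative 4x3 tensor, selecting whole rows with a 1-D mask and individual elements with a full-shape mask.

```python
import tensorflow as tf

x = tf.reshape(tf.range(12), [4, 3])  # rows: [0,1,2], [3,4,5], [6,7,8], [9,10,11]

# A 1-D mask over axis 0 keeps whole rows
row_mask = [True, False, False, True]
rows = tf.boolean_mask(x, row_mask, axis=0)

# A mask with the same shape as x keeps single elements, flattened to 1-D
even_mask = tf.equal(x % 2, 0)
evens = tf.boolean_mask(x, even_mask)

print(rows.numpy().tolist())   # [[0, 1, 2], [9, 10, 11]]
print(evens.numpy().tolist())  # [0, 2, 4, 6, 8, 10]
```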

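tf.where

tf.where(cond, a, b) takes elements from a where cond is True and from b elsewhere. Called with only a condition, tf.where(cond) returns the indices of the True entries. The example below uses the second form to extract the positive elements of a tensor:

Compare against zero to get a boolean mask of the positive elements.

Call tf.where on the mask to get the indices of its True entries.

Pass those indices to tf.gather_nd to recover all the positive elements.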

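a = tf.zeros([3,3])
b = tf.ones([3,3])
cond = tf.constant([[True, False, True],[False, True, False],[True, False, True]])
# Three-argument form: take from a where cond is True, otherwise from b
print(tf.where(cond, a, b))
x = tf.random.normal([3, 3])
print(x)
mask = x > 0                 # boolean mask of the positive elements
indices = tf.where(mask)     # one [row, col] index per True entry
print(indices)
print(tf.gather_nd(x, indices))  # the positive elements themselves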

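tf.scatter_nd

tf.scatter_nd(indices, updates, shape) efficiently writes updates into selected positions of a tensor, but it can only write onto a blank, all-zero tensor of the given shape. Updating part of an existing tensor therefore requires combining it with other operations.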

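indices = tf.constant([[1],[3]])   # write into positions 1 and 3 along axis 0
update = tf.constant([             # two [4, 4] blocks, one per index
    [[5,5,5,5],[6,6,6,6],[7,7,7,7],[8,8,8,8]],
    [[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]]
])
print(tf.scatter_nd(indices, update, [4,4,4]))  # blocks 0 and 2 stay all zero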
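
One possible way to update an existing, non-zero tensor is sketched below. It is not taken from the original notes: option A uses tf.tensor_scatter_nd_update, and option B builds the same result from tf.scatter_nd and a mask. The tensor values are illustrative.

```python
import tensorflow as tf

x = tf.ones([4, 3])                      # existing tensor to update
indices = tf.constant([[1], [3]])        # rows to overwrite
updates = tf.constant([[5., 5., 5.],
                       [7., 7., 7.]])

# Option A: dedicated op that copies x and overwrites the indexed rows
updated = tf.tensor_scatter_nd_update(x, indices, updates)

# Option B: zero the target rows with a mask, then add the scattered updates
mask = tf.scatter_nd(indices, tf.ones([2, 3]), [4, 3])
manual = x * (1. - mask) + tf.scatter_nd(indices, updates, [4, 3])

print(updated.numpy().tolist())
```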

tf.meshgrid

Plotting a function of two variables in 3D

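import matplotlib.pyplot as plt

x = tf.linspace(-16., 16., 100)
y = tf.linspace(-16., 16., 100)
x, y = tf.meshgrid(x, y)      # two [100, 100] coordinate grids
z = tf.sqrt(x**2 + y**2)
z = tf.sin(z) / z             # no grid point falls exactly on the origin
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself in recent Matplotlib
ax.contour3D(x.numpy(), y.numpy(), z.numpy(), 100)
plt.show()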
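
What tf.meshgrid returns is easier to see on a tiny grid. In this sketch (the coordinate values are illustrative) three x values and two y values produce two [2, 3] tensors that together enumerate every (x, y) pair.

```python
import tensorflow as tf

x = tf.constant([1., 2., 3.])
y = tf.constant([10., 20.])

X, Y = tf.meshgrid(x, y)   # default 'xy' indexing: shape [len(y), len(x)]
print(X.numpy().tolist())  # [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
print(Y.numpy().tolist())  # [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]

# Pair the coordinates up: points[i, j] == (X[i, j], Y[i, j])
points = tf.stack([X, Y], axis=-1)
print(points.shape)  # (2, 3, 2)
```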

Loading classic datasets

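(x, y), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
train_db = tf.data.Dataset.from_tensor_slices((x, y))
train_db = train_db.shuffle(10000)   # shuffle with a 10000-element buffer
train_db = train_db.batch(128)

def preprocess(x, y):
    # Runs on whole batches because it is mapped after batch()
    x = tf.cast(x, dtype=tf.float32)
    x = tf.reshape(x, [-1, 28*28])   # flatten each image to 784 values
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)      # one-hot encode the 10 digit classes
    return x, y
train_db = train_db.map(preprocess)
train_db = train_db.repeat(20)       # iterate over the data for 20 epochs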
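
To inspect what this pipeline yields without downloading MNIST, the sketch below runs the same steps on synthetic arrays. The array contents, the sample count of 300, and the 2 epochs are assumptions chosen to keep the example small.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for MNIST: 300 blank 28x28 images with labels 0..9
x = np.zeros([300, 28, 28], dtype=np.uint8)
y = (np.arange(300) % 10).astype(np.uint8)

def preprocess(x, y):
    x = tf.reshape(tf.cast(x, tf.float32), [-1, 28 * 28])
    y = tf.one_hot(tf.cast(y, tf.int32), depth=10)
    return x, y

db = (tf.data.Dataset.from_tensor_slices((x, y))
      .shuffle(300)
      .batch(128)
      .map(preprocess)
      .repeat(2))

# 300 samples in batches of 128 give 3 batches per epoch (128, 128, 44)
shapes = [(xb.shape.as_list(), yb.shape.as_list()) for xb, yb in db]
print(len(shapes), shapes[0], shapes[2])
```

The last batch of each epoch is smaller because batch() keeps the remainder by default.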