TensorFlow 2.x: Commonly Used APIs

License: CC 4.0 BY-SA

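This article walks through the key APIs of TensorFlow 2.x, covering model creation, network layers, activation functions, optimizers, loss functions, and evaluation metrics, to help readers get up to speed with the core components of TensorFlow 2.x.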

1. Introduction
2. Creating a Model
3. Network layers (tf.keras.layers)
4. tf.keras.activations
5. tf.keras.optimizers
6. tf.keras.losses functions
7. tf.keras.metrics
References

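1. Introduction

Layer: a layer encapsulates computation steps and variables (for example, a basic fully connected layer, or the convolution and pooling layers of a CNN).
Model: a model organizes and connects layers and wraps them into a single unit that describes how input data is transformed, through the layers and their operations, into the output.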

2. Creating a Model

2.1 tf.keras.Model

Purpose: groups a linear or non-linear stack of layers into a single object (tf.keras.Model) for training and inference
Definition

tf.keras.Model(
    *args, **kwargs
)

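Parameter	Type	Description
inputs	keras.Input object or list of them	Input(s) of the model
outputs	Output tensor(s) produced by layers	Output(s) of the model
name	String	Name of the model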

2.1.1 Common Model methods

Method	Function	Definition
compile	Configure	compile(optimizer='rmsprop', loss=None, metrics=None, loss_weights=None, weighted_metrics=None, run_eagerly=None, **kwargs)
fit	Train	fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_batch_size=None, validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False)
evaluate	Evaluate	evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None, steps=None, callbacks=None, max_queue_size=10, workers=1, use_multiprocessing=False, return_dict=False)
predict	Predict	predict(x, batch_size=None, verbose=0, steps=None, callbacks=None, max_queue_size=10, workers=1, use_multiprocessing=False)
save	Save	save(filepath, overwrite=True, include_optimizer=True, save_format=None, signatures=None, options=None)
load_model	Load	tf.keras.models.load_model(filepath, custom_objects=None, compile=True, options=None)

Note that load_model is a module-level function in tf.keras.models, not a method of Model.

2.1.2 Creating a Model with the functional API

Example

import tensorflow as tf

inputs = tf.keras.Input(shape=(3,))

# The activation can be given as a callable or by name; both forms are equivalent.
# x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
x = tf.keras.layers.Dense(4, activation='relu')(inputs)

outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
# outputs = tf.keras.layers.Dense(5, activation='softmax')(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Show model information
model.summary()
# plot_model needs the optional pydot and graphviz packages
tf.keras.utils.plot_model(model, "model.jpg", show_shapes=True)

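2.1.3 Custom Model (subclassing)

Keras models are Python classes, so we can define our own model by subclassing tf.keras.Model.
In the subclass we override two methods: __init__() (the constructor, which creates the layers) and call(inputs) (the forward pass).
Additional custom methods can be added as needed.
Notes on the subclass:
- Use super() to call the parent-class constructor
- Calling the model instance, as in model(x), invokes call()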

import numpy as np
import tensorflow as tf

data = np.random.random((1000, 32))
labels = np.random.random((1000, 5))

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
        self.dropout = tf.keras.layers.Dropout(0.5)
        
    def call(self, inputs, training = False):
        x = self.dense1(inputs)
        if training:
            x = self.dropout(x, training=training)
        return self.dense2(x)
    
model = MyModel()

model.compile(optimizer=tf.keras.optimizers.RMSprop(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(data, labels, batch_size=32, epochs=5)

# Show model information
model.summary()
tf.keras.utils.plot_model(model, "model.jpg", show_shapes=True)

2.1.4 Important parameters of tf.keras.Model.compile

Definition

compile(
    optimizer='rmsprop', loss=None, metrics=None, loss_weights=None,
    weighted_metrics=None, run_eagerly=None, **kwargs
)

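Important parameters

Parameter	Function	Notes
optimizer	Optimizer	Chosen from tf.keras.optimizers (name string or instance)
loss	Loss function	Chosen from tf.keras.losses (name string or instance)
metrics	Evaluation metrics	Chosen from tf.keras.metrics (name strings or instances)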

2.2 tf.keras.Sequential

Purpose: groups a linear stack of layers into a tf.keras.Model for training and inference
Definition

tf.keras.Sequential(
    layers=None, name=None
)

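Parameter	Type	Description
layers	List of layers (optional)	Layers to add to the model
name	String (optional)	Name of the model

2.2.1 Common Sequential methods

Sequential has one extra method, add
compile, fit, evaluate, and predict are the same as the tf.keras.Model methods

Method	Function	Definition
add	Adds a layer	add(layer)

2.2.2 Example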

model = tf.keras.Sequential()

# Statement 1 and statement 2 are essentially equivalent.

## statement 1
model.add(tf.keras.layers.Dense(8, input_shape=(32,)))

## statement 2
# model.add(tf.keras.Input(shape=(32,)))
# model.add(tf.keras.layers.Dense(8))

# Afterwards, we do automatic shape inference:
model.add(tf.keras.layers.Dense(4))
model.summary()
tf.keras.utils.plot_model(model, "model.jpg", show_shapes=True)

3. Network layers (tf.keras.layers)

4D tensor shapes

data_format	4D tensor shape
'channels_last'	(batch_size, height, width, channels)
'channels_first'	(batch_size, channels, height, width)

3.1 Common network layers

3.2 tf.keras.Input (network input)

Purpose: instantiates a Keras tensor that serves as the input of a Model
Definition:

tf.keras.Input(
    shape=None, batch_size=None, name=None, dtype=None, sparse=False, tensor=None,
    ragged=False, **kwargs
)

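Parameter	Description
shape	Shape tuple of integers, excluding the batch dimension, e.g. shape=(32,) is a 32-dimensional vector and shape=(32, 20) is a 32 (rows) x 20 (columns) matrix
batch_size	Static batch size (optional)
name	Name of the layer (optional)
dtype	Expected data type of the input (string: float32, float64, int32, ...)

3.3 tf.keras.layers.Dense (fully connected layer)

Purpose: a regular fully connected layer; combined with an activation function it produces non-linear combinations of features
Operation: a linear transformation followed by an optional activation
output = activation(dot(input, kernel) + bias)
Definition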

tf.keras.layers.Dense(
    units, activation=None, use_bias=True, kernel_initializer='glorot_uniform',
    bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None,
    activity_regularizer=None, kernel_constraint=None, bias_constraint=None,
    **kwargs
)

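Parameter	Description
units	Number of neurons in this layer (positive integer)
activation	Activation function; if not specified, a(x) = x
use_bias	Whether to use a bias term (True or False)
kernel_initializer	Initializer for the kernel weight matrix
bias_initializer	Initializer for the bias vector
kernel_regularizer	Regularizer applied to the kernel weight matrix
bias_regularizer	Regularizer applied to the bias vector
activity_regularizer	Regularizer applied to the output of this layer
kernel_constraint	Constraint applied to the kernel weight matrix
bias_constraint	Constraint applied to the bias vector

Example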

# Create a `Sequential` model and add a Dense layer as the first layer.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(16,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
# Now the model will take as input arrays of shape (None, 16)
# and output arrays of shape (None, 32).
# Note that after the first layer, you don't need to specify
# the size of the input anymore:
model.add(tf.keras.layers.Dense(32))
model.output_shape

Output

(None, 32)

3.4 tf.keras.layers.Reshape

Purpose: reshapes the input into the requested shape
Definition

tf.keras.layers.Reshape(
    target_shape, **kwargs
)

Output shape: (batch_size,) + target_shape
input_shape / target_shape: neither includes the sample (batch_size) dimension; they describe the shape of a single sample
Example

# as first layer in a Sequential model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Reshape((3, 4), input_shape=(12,)))
# model.output_shape == (None, 3, 4), `None` is the batch size.
model.output_shape

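Output

(None, 3, 4)

3.5 tf.keras.layers.Flatten

Purpose: flattens the input (turns a multi-dimensional input into one dimension per sample) without affecting the batch size
Definition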

tf.keras.layers.Flatten(
    data_format=None, **kwargs
)

Parameter	Description
data_format	One of channels_last (default) or channels_first (string)

Example

model = tf.keras.Sequential()
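# The original positional arguments (64, 3, 3) mean filters=64, kernel_size=3, strides=3.
# The input is read as channels_last, i.e. height=3, width=32, channels=32.
model.add(tf.keras.layers.Conv2D(64, kernel_size=3, strides=3, input_shape=(3, 32, 32)))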
print(model.output_shape)

model.add(tf.keras.layers.Flatten())
print(model.output_shape)

Output

(None, 1, 10, 64)
(None, 640)

3.6 tf.keras.layers.Conv2D

Purpose: performs 2D spatial convolution over images
Definition

tf.keras.layers.Conv2D(
    filters, kernel_size, strides=(1, 1), padding='valid', data_format=None,
    dilation_rate=(1, 1), groups=1, activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, bias_constraint=None, **kwargs
)

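Parameters

Parameter	Type	Description
filters	Integer	Number of convolution kernels (filters)
kernel_size	Integer or tuple of 2 integers	Size of the convolution kernel; 3 is equivalent to (3, 3)
strides	Integer or tuple of 2 integers	Stride along height and width
padding	String	'valid' or 'same'
data_format	String	'channels_last': input shape (batch_size, height, width, channels); 'channels_first': input shape (batch_size, channels, height, width)
dilation_rate	Integer or tuple of 2 integers	Dilation rate for dilated convolution
groups	Positive integer	Number of groups the input is split into along the channel axis. Each group is convolved separately with filters / groups filters, and the output is the concatenation of all group results along the channel axis. Both the input channels and filters must be divisible by groups. Used, for example, for the depthwise separable convolutions in MobileNetV3
activation	String or activation function	If not specified, no activation is applied

Example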

# The inputs are 28x28 RGB images with `channels_last` and the batch
# size is 4.
input_shape = (4, 28, 28, 3)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(2, 3, activation='relu', input_shape=input_shape[1:])(x)
print(input_shape[1:])
print(y.shape)
##################
## output
#(28, 28, 3)
#(4, 26, 26, 2)

# With `dilation_rate` as 2.
input_shape = (4, 28, 28, 3)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(2, 3, activation='relu', dilation_rate=2, input_shape=input_shape[1:])(x)
print(y.shape)
##################
#(4, 24, 24, 2)

# With `padding` as "same".
input_shape = (4, 28, 28, 3)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(
2, 3, activation='relu', padding="same", input_shape=input_shape[1:])(x)
print(y.shape)
##################
#(4, 28, 28, 2)

# With extended batch shape [4, 7]:
input_shape = (4, 7, 28, 28, 3)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(
2, 3, activation='relu', input_shape=input_shape[2:])(x)
print(y.shape)
##################
#(4, 7, 26, 26, 2)
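
The output shapes printed above follow from a simple per-axis formula. The sketch below is plain Python rather than TensorFlow code; the helper name conv_output_length is an assumption made for illustration, and it covers only the 'valid' and 'same' padding modes discussed here.

```python
import math

def conv_output_length(n, kernel_size, stride=1, padding="valid", dilation_rate=1):
    """Output length of one spatial axis (hypothetical helper, not a TensorFlow API)."""
    # A dilated kernel spans (kernel_size - 1) * dilation_rate + 1 input positions.
    effective_kernel = (kernel_size - 1) * dilation_rate + 1
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - effective_kernel) // stride + 1
    raise ValueError("padding must be 'valid' or 'same'")

print(conv_output_length(28, 3))                   # 26
print(conv_output_length(28, 3, dilation_rate=2))  # 24
print(conv_output_length(28, 3, padding="same"))   # 28
print(conv_output_length(32, 3, stride=3))         # 10
```

The last call matches the width of 10 seen in the Flatten example, where the stride is 3.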

3.7 tf.keras.layers.Conv1D

Purpose: 1D convolution layer (e.g. temporal convolution)
Definition:

tf.keras.layers.Conv1D(
    filters, kernel_size, strides=1, padding='valid', data_format='channels_last',
    dilation_rate=1, groups=1, activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, bias_constraint=None, **kwargs
)

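Parameters

Parameter	Type	Description
filters	Integer	Number of convolution kernels (filters)
kernel_size	Integer or tuple of 1 integer	Length of the 1D convolution window; 3 is equivalent to (3,)
strides	Integer or tuple of 1 integer	Stride of the convolution

Example: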

# The inputs are 128-length vectors with 10 timesteps, and the batch size
# is 4.
input_shape = (4, 10, 128)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv1D(32, 3, activation='relu',input_shape=input_shape[1:])(x)
print(y.shape)
#####################
# (10 + 2*0 - 3) / 1 + 1 = 8
#(4, 8, 32)

# With extended batch shape [4, 7] (e.g. weather data where batch
# dimensions correspond to spatial location and the third dimension
# corresponds to time.)
input_shape = (4, 7, 10, 128)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv1D(
32, 3, activation='relu', input_shape=input_shape[2:])(x)
print(y.shape)
#####################
# (4, 7, 8, 32)

3.8 tf.keras.layers.Conv3D

Purpose: 3D convolution layer (e.g. spatial convolution over volumes)
Definition:

tf.keras.layers.Conv3D(
    filters, kernel_size, strides=(1, 1, 1), padding='valid', data_format=None,
    dilation_rate=(1, 1, 1), groups=1, activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, bias_constraint=None, **kwargs
)

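Parameters

Parameter	Type	Description
filters	Integer	Number of convolution kernels (filters)
kernel_size	Integer or tuple of 3 integers	Size of the convolution kernel; 3 is equivalent to (3, 3, 3)
strides	Integer or tuple of 3 integers	Stride along depth, height, and width

Example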

# The inputs are 28x28x28 volumes with a single channel, and the
# batch size is 4
input_shape =(4, 28, 28, 28, 1)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv3D(2, 3, activation='relu', input_shape=input_shape[1:])(x)
print(y.shape)
#####################
#(4, 26, 26, 26, 2)

# With extended batch shape [4, 7], e.g. a batch of 4 videos of 3D frames,
# with 7 frames per video.
input_shape = (4, 7, 28, 28, 28, 1)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv3D(
2, 3, activation='relu', input_shape=input_shape[2:])(x)
print(y.shape)
#####################
# (4, 7, 26, 26, 26, 2)

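3.9 tf.keras.layers.MaxPool2D

tf.keras.layers.MaxPool2D is an alias of tf.keras.layers.MaxPooling2D
Purpose: max pooling for 2D spatial data, i.e. downsamples the input image by taking the maximum value in each pool_size window
Definition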

tf.keras.layers.MaxPool2D(
    pool_size=(2, 2), strides=None, padding='valid', data_format=None, **kwargs
)

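Parameter	Type	Description
pool_size	Integer or tuple of 2 integers	Window over which the maximum is taken
strides	Integer or tuple of 2 integers	How far the pooling window moves at each step; defaults to pool_size when None

Output shape (per spatial dimension)

padding	output_shape
'valid'	output_shape = floor((input_shape - pool_size) / strides) + 1
'same'	output_shape = floor((input_shape - 1) / strides) + 1

Example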

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.],
                 [7., 8., 9.]])
x = tf.reshape(x, [1, 3, 3, 1])
print('x=',x)
max_pool_2d = tf.keras.layers.MaxPooling2D(pool_size=(2, 2),strides=(1, 1), padding='valid')
print('max_pool_2d(x)=', max_pool_2d(x))

Output

x= tf.Tensor(
[[[[1.]
   [2.]
   [3.]]

  [[4.]
   [5.]
   [6.]]

  [[7.]
   [8.]
   [9.]]]], shape=(1, 3, 3, 1), dtype=float32)
max_pool_2d(x)= tf.Tensor(
[[[[5.]
   [6.]]

  [[8.]
   [9.]]]], shape=(1, 2, 2, 1), dtype=float32)

Example

x = tf.constant([[1., 2., 3., 4.],
                 [5., 6., 7., 8.],
                 [9., 10., 11., 12.], 
                 [13., 14., 15., 16.]])
x = tf.reshape(x, [1, 4, 4, 1])
print('x=',x)
max_pool_2d = tf.keras.layers.MaxPooling2D(pool_size=(2, 2),strides=(2, 2), padding='valid')
print('max_pool_2d(x)=', max_pool_2d(x))

Output

x= tf.Tensor(
[[[[ 1.]
   [ 2.]
   [ 3.]
   [ 4.]]

  [[ 5.]
   [ 6.]
   [ 7.]
   [ 8.]]

  [[ 9.]
   [10.]
   [11.]
   [12.]]

  [[13.]
   [14.]
   [15.]
   [16.]]]], shape=(1, 4, 4, 1), dtype=float32)
max_pool_2d(x)= tf.Tensor(
[[[[ 6.]
   [ 8.]]

  [[14.]
   [16.]]]], shape=(1, 2, 2, 1), dtype=float32)
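
To make the windowing explicit, here is a minimal pure-Python sketch of 'valid' max pooling on a single-channel image stored as a list of lists. The function max_pool_2d below is an illustrative stand-in, not the Keras layer, and it assumes a square window and equal strides in both directions.

```python
def max_pool_2d(image, pool_size=2, stride=2):
    """'valid' max pooling over a 2D list of numbers (illustrative only)."""
    rows, cols = len(image), len(image[0])
    out_rows = (rows - pool_size) // stride + 1
    out_cols = (cols - pool_size) // stride + 1
    return [
        [
            # Maximum over the pool_size x pool_size window anchored at (r, c).
            max(
                image[r * stride + i][c * stride + j]
                for i in range(pool_size)
                for j in range(pool_size)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

image_4x4 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
image_3x3 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(max_pool_2d(image_4x4, pool_size=2, stride=2))  # [[6, 8], [14, 16]]
print(max_pool_2d(image_3x3, pool_size=2, stride=1))  # [[5, 6], [8, 9]]
```

Both results agree with the two TensorFlow outputs shown in this section.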

3.10 tf.keras.layers.AveragePooling2D

Purpose: average pooling for 2D spatial data
Definition:

tf.keras.layers.AveragePooling2D(
    pool_size=(2, 2), strides=None, padding='valid', data_format=None, **kwargs
)

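4. tf.keras.activations

4.1 Saturating activation functions

Sigmoid and tanh are both saturating activation functions: when the input is very large or very small, the output is no longer sensitive to changes in the input.

4.1.1 tf.keras.activations.sigmoid

Purpose: sigmoid activation function, generally used only in the final output layer of a binary classifier or in fully connected layers
Function definition:
$S(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{1+e^x}$
Derivative:
$S'(x) = S(x)(1 - S(x))$
Properties
- Interpretable: a value between 0 and 1 can be read as the firing rate of a neuron
Drawbacks
- It has saturation regions (soft saturation): for large positive or negative inputs the gradient approaches zero, so the neuron essentially stops updating
- The output is always positive (not zero-centered), which leads to the so-called zigzag effect during gradient updates
Output:
- Output shape equals the input shape; each element is computed independently
- Output range: (0, 1)
- x < -5: output is close to 0
- x > +5: output is close to 1
- Equivalent to a two-class softmax in which the second logit is fixed at 0
Definition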

tf.keras.activations.sigmoid(
    x
)

Example:

a = tf.constant([-20, -5, -1.0, 0.0, 1.0,5,20], dtype = tf.float32)
tf.keras.activations.sigmoid(a).numpy()

Output:

array([2.0611537e-09, 6.6928566e-03, 2.6894143e-01, 5.0000000e-01,
       7.3105860e-01, 9.9330711e-01, 1.0000000e+00], dtype=float32)
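
The formula and its derivative can be checked without TensorFlow. The following is a small sketch in plain Python; sigmoid and sigmoid_grad are names chosen here for illustration, and the branch on the sign of x is one common way to avoid overflow, not necessarily what TensorFlow does internally.

```python
import math

def sigmoid(x):
    """Sigmoid written in a numerically stable form (illustrative helper)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative S'(x) = S(x) * (1 - S(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

values = [sigmoid(x) for x in [-20, -5, -1.0, 0.0, 1.0, 5, 20]]
print(values)             # same values as the TensorFlow output, in float64
print(sigmoid_grad(0.0))  # 0.25, the largest gradient sigmoid can produce
print(sigmoid_grad(20))   # close to 0: the saturation region
```

The tiny gradient at x = 20 is the saturation effect listed among the drawbacks.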

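4.1.2 tf.keras.activations.tanh

Purpose:
- Hyperbolic tangent activation function
- Generally used in fully connected (hidden) layers
- sigmoid and tanh are still used in RNN-style structures (LSTM, attention mechanisms, etc.) as gates or probability values
Function definition:
$\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
Derivative:
$\tanh'(x) = 1 - \tanh^2(x) = \frac{4}{(e^x + e^{-x})^2}$
Properties
- The vanishing-gradient problem of tanh(x) is milder than that of sigmoid
Output:
- Output shape equals the input shape; each element is computed independently
- Output range: (-1, 1)
- x < -3: output is close to -1
- x > +3: output is close to 1
- Related to sigmoid by tanh(x) = 2 * sigmoid(2x) - 1
Definition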

tf.keras.activations.tanh(
    x
)

Example

a = tf.constant([-3.0,-1.0, 0.0,1.0,3.0], dtype = tf.float32)
tf.keras.activations.tanh(a).numpy()

Output

array([-0.9950547, -0.7615942,  0.       ,  0.7615942,  0.9950547],
      dtype=float32)
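
As a sanity check on the derivative of tanh, the sketch below compares the closed form 1 - tanh(x)^2 with a central finite difference. It uses only the Python standard library; tanh_grad and numeric_grad are illustrative names, and the step size h is an arbitrary choice.

```python
import math

def tanh_grad(x):
    """Closed-form derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def numeric_grad(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    # The two columns should agree to many decimal places.
    print(x, tanh_grad(x), numeric_grad(math.tanh, x))
```

The gradient is 1 at x = 0 and already small at |x| = 3, which is why tanh is grouped with the saturating functions.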

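4.1.3 tf.keras.activations.softmax

Purpose: softmax activation function; converts a real vector into a vector of categorical probabilities
- Generally used only in the final output layer of a multi-class classifier, where the output is interpreted as a probability distribution (categorical probabilities)
Function definition:
$S(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}$, where $x$ is an $N$-dimensional vector
Output:
- Output shape equals the input shape; the normalization is applied along axis (the last axis by default), so each output depends on every element of its vector
- Each output element lies in (0, 1)
- The elements of each output vector sum to 1
Definition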

tf.keras.activations.softmax(
    x, axis=-1
)

Example

a = tf.constant([[-3.0, 2.0],[-1.0, 2.0], [0.0, 2.0],[1.0, 2.0],[3.0, 2.0]], dtype = tf.float32)
tf.keras.activations.softmax(a).numpy()

Output

array([[0.00669285, 0.9933072 ],
       [0.04742587, 0.95257413],
       [0.11920291, 0.880797  ],
       [0.26894143, 0.7310586 ],
       [0.7310586 , 0.26894143]], dtype=float32)
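
The softmax formula can be reproduced row by row in plain Python. This is a sketch, not TensorFlow's implementation; subtracting the row maximum before exponentiating is a standard stability trick that leaves the result unchanged, and the name softmax here refers to the local helper.

```python
import math

def softmax(row):
    """Softmax of one vector, shifted by its maximum for numerical stability."""
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

rows = [[-3.0, 2.0], [-1.0, 2.0], [0.0, 2.0], [1.0, 2.0], [3.0, 2.0]]
probs = [softmax(r) for r in rows]
for p in probs:
    print(p, sum(p))  # each row sums to 1 (up to rounding)
```

Each row depends only on the difference between its two logits, which is why the first row equals sigmoid(-5) and 1 - sigmoid(-5).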

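4.2 Non-saturating activation functions

Non-saturating activation functions, represented by ReLU and its variants, are much better at preventing vanishing gradients, and saturating functions such as sigmoid are now rarely used in the hidden layers of a network.

4.2.1 tf.keras.activations.relu

Purpose: rectified linear unit activation function, mostly used in convolutional layers
ReLU definition:
$\mathrm{ReLU}(x) = \max(x, 0)$
ReLU6 definition:
$\mathrm{ReLU6}(x) = \min(6, \max(0, x))$
Why ReLU6
- Mainly to keep good numerical resolution with low-precision float16 on mobile devices. If the ReLU output is unbounded, it ranges from 0 to positive infinity, which low-precision float16 cannot represent accurately, causing a loss of precision
Output
- Output shape equals the input shape; each element is computed independently
Definition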

tf.keras.activations.relu(
    x, alpha=0.0, max_value=None, threshold=0
)

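Parameter	Type	Description
x	Tensor or variable	Input tensor or variable
alpha	float	Slope for values below the threshold. If alpha is non-zero, values below threshold are not set to 0 but are scaled by alpha
max_value	float	Saturation threshold (the largest value the function returns)
threshold	float	Threshold of the activation; values below it are damped or set to zero. Defaults to 0

Example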

foo = tf.constant([-10, -5, 0.0, 4, 10], dtype = tf.float32)
tf.keras.activations.relu(foo,alpha=0.5,threshold=5,max_value=8).numpy()

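Output

array([-7.5, -5. , -2.5, -0.5,  8. ], dtype=float32)

Explanation:
- When the input equals threshold (5) the output is 0, so for inputs not greater than 5 the output follows a line through that point:
  $y = \alpha x + b = 0.5x + b$, and $0 = 0.5 \cdot 5 + b$ gives $b = -2.5$
- That is, for inputs not greater than 5: $y = 0.5x - 2.5$
- Inputs above 5 pass through unchanged, capped at max_value (8)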
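
The piecewise rule described above can be written as a scalar function. The sketch below is plain Python and only mirrors the behaviour shown in this example (including an output of 0 when the input equals the threshold); it is an assumption-based illustration, not the Keras source.

```python
def relu(x, alpha=0.0, max_value=None, threshold=0.0):
    """Scalar sketch of relu with alpha, max_value and threshold (illustrative)."""
    if x > threshold:
        # Above the threshold the input passes through, optionally capped.
        return x if max_value is None else min(x, max_value)
    # At or below the threshold: a line with slope alpha that is 0 at the threshold.
    return alpha * (x - threshold)

inputs = [-10, -5, 0.0, 4, 10]
outputs = [relu(v, alpha=0.5, max_value=8, threshold=5) for v in inputs]
print(outputs)  # [-7.5, -5.0, -2.5, -0.5, 8]
```

With the default arguments the same function reduces to max(x, 0).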

5. tf.keras.optimizers

5.1 tf.keras.optimizers.Adagrad

Purpose: optimizer that implements the Adagrad algorithm
Definition:

tf.keras.optimizers.Adagrad(
    learning_rate=0.001, initial_accumulator_value=0.1, epsilon=1e-07,
    name='Adagrad', **kwargs
)

5.2 tf.keras.optimizers.Adam

Purpose: optimizer that implements the Adam algorithm
Definition:

tf.keras.optimizers.Adam(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False,
    name='Adam', **kwargs
)

Example

opt = tf.keras.optimizers.Adam(learning_rate=0.1)
var1 = tf.Variable(10.0)
loss = lambda: (var1 ** 2) / 2.0       # d(loss)/d(var1) == var1
step_count = opt.minimize(loss, [var1]).numpy()
# The first step is `-learning_rate * sign(grad)`
print(var1.numpy())

step_count = opt.minimize(loss, [var1]).numpy()
# The second step is slightly smaller than the learning rate
print(var1.numpy())

Output

9.9
9.800028
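
The two printed values can be reproduced by hand with the textbook Adam update. The sketch below is plain Python using the bias-corrected form of the algorithm; adam_steps is a hypothetical helper, and small differences from TensorFlow in how epsilon is applied are assumed to be negligible here.

```python
import math

def adam_steps(var, grad_fn, steps, learning_rate=0.001,
               beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Run a few textbook Adam updates on a scalar and return the trajectory."""
    m, v = 0.0, 0.0
    history = []
    for t in range(1, steps + 1):
        g = grad_fn(var)
        m = beta_1 * m + (1 - beta_1) * g      # first-moment estimate
        v = beta_2 * v + (1 - beta_2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta_1 ** t)          # bias correction
        v_hat = v / (1 - beta_2 ** t)
        var -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
        history.append(var)
    return history

# loss = var**2 / 2, so the gradient is simply var
history = adam_steps(10.0, lambda w: w, steps=2, learning_rate=0.1)
print(history)  # approximately [9.9, 9.800028]
```

On the first step the bias-corrected ratio is g / |g|, so the variable moves by almost exactly the learning rate regardless of the gradient's size.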

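5.3 tf.keras.optimizers.RMSprop

Purpose: optimizer that implements the RMSProp algorithm
Characteristics:
- Maintains a moving (discounted) average of the squared gradients
- Divides the gradient by the square root of this average
- This implementation of RMSprop uses plain momentum, not Nesterov momentum
Definition: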

tf.keras.optimizers.RMSprop(
    learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-07, centered=False,
    name='RMSprop', **kwargs
)

Example

opt = tf.keras.optimizers.RMSprop(learning_rate=0.1)
var1 = tf.Variable(10.0)
loss = lambda: (var1 ** 2) / 2.0    # d(loss) / d(var1) = var1
step_count = opt.minimize(loss, [var1]).numpy()
print(var1.numpy())
step_count = opt.minimize(loss, [var1]).numpy()
print(var1.numpy())

Output

9.683772
9.45788

5.4 tf.keras.optimizers.SGD

Purpose: gradient descent optimizer (with optional momentum)
Definition:

tf.keras.optimizers.SGD(
    learning_rate=0.01, momentum=0.0, nesterov=False, name='SGD', **kwargs
)

Example 1

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
var = tf.Variable(1.0)
loss = lambda: (var ** 2)/2.0         # d(loss)/d(var) = var
step_count = opt.minimize(loss, [var]).numpy()
# Step is `- learning_rate * grad`
print(var.numpy())
step_count = opt.minimize(loss, [var]).numpy()
# Step is `- learning_rate * grad`
print(var.numpy())

Output 1

0.9
0.81

Example 2

opt = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
var = tf.Variable(1.0)
val0 = var.value()
loss = lambda: (var ** 2)/2.0         # d(loss)/d(var) = var
# First step is `- learning_rate * grad`
step_count = opt.minimize(loss, [var]).numpy()
val1 = var.value()
print(val1)
print((val0 - val1).numpy())

# On later steps, step-size increases because of momentum
step_count = opt.minimize(loss, [var]).numpy()
val2 = var.value()
print(val2)
print((val1 - val2).numpy())

Output 2

tf.Tensor(0.9, shape=(), dtype=float32)
0.100000024
tf.Tensor(0.71999997, shape=(), dtype=float32)
0.18
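
Both SGD examples follow from the usual momentum update, velocity = momentum * velocity - learning_rate * grad followed by var += velocity. The sketch below replays them in plain Python; sgd_momentum_steps is an illustrative helper and covers only the non-Nesterov case.

```python
def sgd_momentum_steps(var, grad_fn, steps, learning_rate=0.01, momentum=0.0):
    """Run a few SGD updates with classical momentum on a scalar."""
    velocity = 0.0
    history = []
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad_fn(var)
        var += velocity
        history.append(var)
    return history

# loss = var**2 / 2, so the gradient is simply var
plain = sgd_momentum_steps(1.0, lambda w: w, steps=2, learning_rate=0.1)
with_momentum = sgd_momentum_steps(1.0, lambda w: w, steps=2,
                                   learning_rate=0.1, momentum=0.9)
print(plain)          # approximately [0.9, 0.81]
print(with_momentum)  # approximately [0.9, 0.72]
```

With momentum the second step is 0.18 instead of 0.09, because 0.9 times the previous step is added to the fresh gradient step.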

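6. tf.keras.losses functions

Loss functions measure the difference between predicted and true values
- The closer the prediction is to the ground truth, the smaller the loss; the more they differ, the larger the loss
- Using backpropagation (BP), the network's parameters are optimized step by step to reduce the loss
Cross entropy
- Categorical Cross Entropy (CCE): for multi-class classification
- Binary Cross Entropy (BCE): for binary classification

6.1 tf.keras.losses.binary_crossentropy

Purpose:
- Computes the binary cross-entropy loss
- Generally used for binary classification
Definition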

tf.keras.losses.binary_crossentropy(
    y_true, y_pred, from_logits=False, label_smoothing=0
)

Example

y_true = [[0, 1], [0, 0]]  # each row is one sample; each entry is an independent 0/1 label
y_pred = [[0.6, 0.4], [0.4, 0.6]]
loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
# Sample 1: -((1-0)*math.log(1-0.6) + 1*math.log(0.4))/2
# Sample 2: -((1-0)*math.log(1-0.4) + (1-0)*math.log(1-0.6))/2
loss.numpy()

Output

array([0.9162905 , 0.71355796], dtype=float32)
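
The per-sample values above are the mean of the element-wise binary cross entropy over the last axis. A plain-Python sketch follows; the function name shadows nothing from TensorFlow, and the clipping constant eps is an assumption used only to keep log() finite.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy over the entries of one sample (illustrative)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
losses = [binary_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
print(losses)  # approximately [0.9163, 0.7136]
```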

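6.2 tf.keras.losses.categorical_crossentropy

Purpose: computes the categorical cross-entropy loss
When to use: few classes, with y_true one-hot encoded (y_true is a matrix)
Label format: one-hot encoding, i.e. each row contains exactly one 1 and 0 elsewhere, and the index of the 1 is the class (this is the difference from sparse_categorical_crossentropy). For example:

[[0, 1, 0],
 [1, 0, 0],
 [0, 0, 1]]

Definition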

tf.keras.losses.categorical_crossentropy(
    y_true, y_pred, from_logits=False, label_smoothing=0
)

Example

y_true = [[0, 1, 0], [0, 0, 1]]  # one-hot labels: single-label, multi-class
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
# Sample 1: -1 * math.log(0.95)
# Sample 2: -1 * math.log(0.1)
loss.numpy()

Output

array([0.05129331, 2.3025851 ], dtype=float32)
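
With one-hot labels only the predicted probability of the true class contributes to the loss, so the one-hot and integer-label variants give the same number. The sketch below shows both in plain Python; the helper names and the eps clipping are assumptions for illustration, and the predictions are assumed to already sum to 1.

```python
import math

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Cross entropy for one sample with a one-hot label (illustrative)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs, eps=1e-7):
    """Same loss, but the label is given as a class index."""
    return -math.log(max(probs[label], eps))

cce = [
    categorical_crossentropy([0, 1, 0], [0.05, 0.95, 0]),
    categorical_crossentropy([0, 0, 1], [0.1, 0.8, 0.1]),
]
print(cce)  # approximately [0.0513, 2.3026]
print(sparse_categorical_crossentropy(2, [0.1, 0.8, 0.1]))  # same as cce[1]
```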

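6.3 tf.keras.losses.sparse_categorical_crossentropy

Purpose: computes the sparse categorical cross-entropy loss
When to use: many classes (hundreds or thousands), with y_true holding class indices (y_true is a vector)
Label format: integer encoding rather than one-hot; each number is a class index (this is the difference from categorical_crossentropy) and corresponds to one vector in y_pred, selecting one position in that vector. For example:

[2, 0, 1, 5, 19]

Definition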

tf.keras.losses.sparse_categorical_crossentropy(
    y_true, y_pred, from_logits=False, axis=-1
)

Example

import math

y_true = [1, 2, 3]  # class indices; 1 selects index 1 of [0.05, 0.90, 0, 0.05], i.e. one-hot [0, 1, 0, 0]
y_pred = [[0.05, 0.90, 0, 0.05], [0.1, 0.8, 0.05, 0.05], [0.15, 0.1, 0.5, 0.25]]
loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
print(-(1*math.log(0.90)))
print(-(1*math.log(0.05)))
print(-(1*math.log(0.25)))
loss.numpy()

Output

0.10536051565782628
2.995732273553991
1.3862943611198906
array([0.10536067, 2.9957323 , 1.3862944 ], dtype=float32)

6.4 tf.keras.losses.MSE

Purpose: computes the mean squared error between labels and predictions
Definition

tf.keras.losses.MSE(
    y_true, y_pred
)

Example

y_true = tf.constant([[0, 1, 0], [0, 0, 1]])  # two samples with three target values each
y_pred = tf.constant([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
print(y_true.numpy())
print(y_pred.numpy())
loss = tf.keras.losses.mean_squared_error(y_true, y_pred)  # MSE is an alias of mean_squared_error
# Sample 1: (math.pow(0-0.05,2) + math.pow(1-0.95,2) + math.pow(0-0,2))/3
# Sample 2: (math.pow(0-0.1,2) + math.pow(0-0.8,2) + math.pow(1-0.1,2))/3
print(loss.numpy())

Output

[[0 1 0]
 [0 0 1]]
[[0.05 0.95 0.  ]
 [0.1  0.8  0.1 ]]
[0.00166667 0.48666668]

6.5 tf.keras.losses.MAE

Purpose: computes the mean absolute error between labels and predictions
Definition

tf.keras.losses.MAE(
    y_true, y_pred
)

Example

y_true = tf.constant([[0, 1, 0], [0, 0, 1]])  # two samples with three target values each
y_pred = tf.constant([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
print(y_true.numpy())
print(y_pred.numpy())
# Sample 1: (abs(0-0.05) + abs(1-0.95) + abs(0-0))/3
# Sample 2: (abs(0-0.1) + abs(0-0.8) + abs(1-0.1))/3
loss = tf.keras.losses.MAE(y_true, y_pred)
print(loss.numpy())

Output

[[0 1 0]
 [0 0 1]]
[[0.05 0.95 0.  ]
 [0.1  0.8  0.1 ]]
[0.03333334 0.59999996]
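
The hand calculations in the MSE and MAE comments can be wrapped into two short functions. This is a plain-Python sketch operating on one sample at a time; the function names are chosen for readability and are local helpers, not the TensorFlow functions.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences for one sample."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences for one sample."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
mse = [mean_squared_error(t, p) for t, p in zip(y_true, y_pred)]
mae = [mean_absolute_error(t, p) for t, p in zip(y_true, y_pred)]
print(mse)  # approximately [0.00167, 0.48667]
print(mae)  # approximately [0.03333, 0.6]
```

Squaring makes MSE penalize the badly wrong second sample less than linearly in comparison to MAE only when errors are below 1, which is the case for probabilities.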

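7. tf.keras.metrics

Purpose: defines evaluation metrics
Accuracy: the simplest metric in machine learning for judging how good a model is

7.1 tf.keras.metrics.Accuracy

Purpose: computes how often predictions equal labels
Definition: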

tf.keras.metrics.Accuracy(
    name='accuracy', dtype=None
)

Example:

m = tf.keras.metrics.Accuracy()
m.update_state([[1], [2], [3], [4]], [[0], [2], [3], [4]])
print('accuracy={}'.format(3/4)) # number of matches / total count
m.result().numpy()

Output:

accuracy=0.75
0.75

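7.2 tf.keras.metrics.BinaryAccuracy

Purpose: computes how often predictions match binary labels
Compared with Accuracy:
- Besides y_true and y_pred, binary accuracy takes a threshold parameter (default 0.5); y_pred is thresholded before the accuracy is computed
How it is computed:
- Each value in y_pred is compared with threshold: values greater than threshold become 1, values less than or equal to threshold become 0
- result = number of positions where the thresholded y_pred equals y_true / total count
Definition: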

tf.keras.metrics.BinaryAccuracy(
    name='binary_accuracy', dtype=None, threshold=0.5
)

Example:

m = tf.keras.metrics.BinaryAccuracy()
# threshold=0.5, [[0.98], [1], [0], [0.6]] => [[1], [1], [0], [1]]
m.update_state([[1], [1], [0], [0]], [[0.98], [1], [0], [0.6]])
print('accuracy={}'.format(3/4)) # number of matches / total count
m.result().numpy()

Output:

accuracy=0.75
0.75
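
The thresholding step is easy to reproduce in plain Python. The sketch below uses flat lists instead of the nested lists in the TensorFlow example; binary_accuracy is an illustrative helper that follows the rule stated above (strictly greater than the threshold maps to 1).

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    matches = sum(int(p > threshold) == t for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

y_true = [1, 1, 0, 0]
y_pred = [0.98, 1, 0, 0.6]
print(binary_accuracy(y_true, y_pred))                 # 0.75
print(binary_accuracy(y_true, y_pred, threshold=0.7))  # 1.0
```

Raising the threshold to 0.7 turns the last prediction (0.6) into 0, so all four samples match.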

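7.3 tf.keras.metrics.categorical_accuracy

Purpose: computes how often predictions match one-hot labels
Compared with Accuracy:
- Accuracy applies when both y_true and y_pred are concrete labels
- categorical_accuracy applies when y_true is one-hot and y_pred is a vector of scores
How it is computed:
- Example: take 4 samples with y_true = [[0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]] and y_pred = [[0.1, 0.6, 0.3], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.9, 0, 0.1]]; the categorical_accuracy is 75%
- Convert y_true to its non-one-hot form: y_true_new = [2, 1, 1, 0]
- Take the index of the highest score in each sample of y_pred: y_pred_new = [1, 1, 1, 0]
- Number of matches between y_true_new and y_pred_new / total count = 3/4 = 0.75
Definition: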

tf.keras.metrics.categorical_accuracy(
    y_true, y_pred
)

Example

y_true = [[0, 0, 1], [0, 1, 0]]  # => [2, 1]
y_pred = [[0.1, 0.9, 0.8], [0.05, 0.95, 0]]  # => [1, 1]
m = tf.keras.metrics.categorical_accuracy(y_true, y_pred)
m.numpy()

Output

array([0., 1.], dtype=float32)

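7.4 tf.keras.metrics.sparse_categorical_accuracy

Purpose: computes how often predictions match integer labels
Compared with categorical_accuracy:
- Same function as categorical_accuracy, except that y_true is not one-hot
How it is computed:
- Take 4 samples with y_true = [2, 1, 1, 0] and y_pred = [[0.1, 0.6, 0.3], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.9, 0, 0.1]]; the sparse_categorical_accuracy is 75%
- Take the index of the highest score in each sample of y_pred: y_pred_new = [1, 1, 1, 0]
- Number of matches between y_true and y_pred_new / total count = 3/4 = 0.75
Definition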

tf.keras.metrics.sparse_categorical_accuracy(
    y_true, y_pred
)

Example:

y_true = [2, 1]
y_pred = [[0.1, 0.9, 0.8], [0.05, 0.95, 0]]
m = tf.keras.metrics.sparse_categorical_accuracy(y_true, y_pred)
m.numpy()

Output:

array([0., 1.], dtype=float32)

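7.5 tf.keras.metrics.top_k_categorical_accuracy

Purpose: computes how often the target is among the top K predictions
Compared with categorical_accuracy:
- It adds top_k on top of categorical_accuracy
- categorical_accuracy counts a sample as correct only if the score of the true class is the highest of all class scores
- top_k_categorical_accuracy only requires the score of the true class to rank among the top k of all class scores
How it is computed:
- Take 4 samples with y_true = [[0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]] and y_pred = [[0.3, 0.6, 0.1], [0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.9, 0, 0.1]]. From the previous sections, categorical_accuracy = 50%. The top_k_categorical_accuracy depends on k.
- If k is 3 or more, top_k_categorical_accuracy is 100%, because there are only 3 classes. If k is less than 3 it has to be computed; for k = 2, top_k_categorical_accuracy = 75%.
- Steps:
  - 1) Convert y_true to its non-one-hot form: y_true_new = [2, 1, 1, 0]
  - 2) Compute the top-k labels of y_pred; for k = 2, y_pred_new = [[0, 1], [0, 1], [0, 1], [0, 2]]
  - 3) Count a sample as correct if its true label is among its top-k predicted labels. For the 4 samples above: 2 is not in [0, 1], 1 is in [0, 1], 1 is in [0, 1], 0 is in [0, 2]. Three of the 4 samples are correct, so for k = 2, top_k_categorical_accuracy = 75%.
  - In tf.keras the default k for top_k_categorical_accuracy is 5.
Definition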

tf.keras.metrics.top_k_categorical_accuracy(
    y_true, y_pred, k=5
)

Example

y_true = [[0, 0, 1], [0, 1, 0]]  # => [2, 1]
y_pred = [[0.1, 0.9, 0.8], [0.05, 0.95, 0]]  # top-2 labels: [[1, 2], [0, 1]]
m = tf.keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=2)
m.numpy()

Output

array([1., 1.], dtype=float32)
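
The four-sample walkthrough in the steps above can be replayed in plain Python. In this sketch the labels are already in index form, top_k_accuracy is a hypothetical helper, and ties between equal scores are broken by index order, which may differ from TensorFlow.

```python
def top_k_accuracy(labels, scores, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for label, row in zip(labels, scores):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

labels = [2, 1, 1, 0]
scores = [[0.3, 0.6, 0.1], [0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.9, 0, 0.1]]
print(top_k_accuracy(labels, scores, k=1))  # 0.5, same as categorical accuracy
print(top_k_accuracy(labels, scores, k=2))  # 0.75
print(top_k_accuracy(labels, scores, k=3))  # 1.0
```

Unlike the TensorFlow function, which returns one 0/1 value per sample, this helper returns the averaged accuracy directly.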

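7.6 tf.keras.metrics.sparse_top_k_categorical_accuracy

Purpose: computes how often the integer target is among the top K predictions
Compared with top_k_categorical_accuracy:
- Same function as top_k_categorical_accuracy, except that y_true is not one-hot
How it is computed:
- Take 4 samples with y_true = [2, 1, 1, 0] and y_pred = [[0.3, 0.6, 0.1], [0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.9, 0, 0.1]]
- Steps:
  - 1) Compute the top-k labels of y_pred; for k = 2, y_pred_new = [[0, 1], [0, 1], [0, 1], [0, 2]]
  - 2) Count a sample as correct if its true label is among its top-k predicted labels. For the 4 samples above: 2 is not in [0, 1], 1 is in [0, 1], 1 is in [0, 1], 0 is in [0, 2]. Three of the 4 samples are correct, so for k = 2, sparse_top_k_categorical_accuracy = 75%.
Example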

y_true = [2, 1] 
y_pred = [[0.1, 0.9, 0.8], [0.05, 0.95, 0]]  # [[1,2], [0,1]]
m = tf.keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=2)
m.numpy()

Output

array([1., 1.], dtype=float32)

7.7 Accuracy summary

When both labels and predictions are concrete label indices (e.g. y_true=[1, 2, 1], y_pred=[0, 1, 1]), use tf.keras.metrics.Accuracy.
When labels are concrete label indices and predictions are score vectors (e.g. y_true=[1, 2, 1], y_pred=[[0.2, 0.3, 0.5], [0.9, 0.1, 0], [0, 0.4, 0.6]]), use tf.keras.metrics.sparse_categorical_accuracy.
When labels are one-hot and predictions are score vectors (e.g. y_true=[[0, 1, 0], [0, 0, 1], [0, 1, 0]], y_pred=[[0.2, 0.3, 0.5], [0.9, 0.1, 0], [0, 0.4, 0.6]]), use tf.keras.metrics.categorical_accuracy.