Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow (2nd Edition), Chapter 12: Custom Models and Training with TensorFlow

0. Importing the Required Libraries

import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn

for i in (tf, np, mpl, sklearn):
    print(i.__name__,": ",i.__version__,sep="")

Output:

tensorflow: 2.2.0
numpy: 1.17.4
matplotlib: 3.1.2
sklearn: 0.21.3

1. A Quick Tour of TensorFlow

TensorFlow API:

  1. High-level deep learning APIs: tf.keras, tf.estimator
  2. Low-level deep learning APIs: tf.nn, tf.losses, tf.metrics, tf.optimizers, tf.train, tf.initializers
  3. Automatic differentiation: tf.GradientTape, tf.gradients()
  4. I/O and preprocessing: tf.data, tf.feature_column, tf.audio, tf.image, tf.io, tf.queue
  5. TensorBoard visualization: tf.summary
  6. Deployment and optimization: tf.distribute, tf.saved_model, tf.autograph, tf.graph_util, tf.lite, tf.quantization, tf.tpu, tf.xla
  7. Special data structures: tf.lookup, tf.nest, tf.ragged, tf.sets, tf.sparse, tf.strings
  8. Math functions (including linear algebra and signal processing): tf.math, tf.linalg, tf.signal, tf.random, tf.bitwise
  9. Miscellaneous: tf.compat, tf.config, etc.

TensorFlow's low-level APIs are implemented in highly efficient C++.

GPUs can dramatically speed up computation by splitting tasks into small chunks and running them in parallel across many GPU threads. TPUs are custom ASIC chips built specifically for deep learning workloads, and they are faster still.

TensorFlow is more than these libraries: TensorBoard provides visualization, TensorFlow Extended (TFX) is a set of libraries for productionizing TensorFlow, and TensorFlow Hub makes it easy to download and reuse pretrained models.

Model resources:

  1. https://github.com/tensorflow/models/
  2. TensorFlow Resources:https://www.tensorflow.org/resources
  3. https://github.com/jtoy/awesome-tensorflow
  4. https://paperswithcode.com/

2. Using TensorFlow like NumPy

2.1 Tensors and Operations

Create tensors with tf.constant():

tf.constant([[1.0, 2.0, 3.0],[4.0, 5.0, 6.0]])

Output:

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>
tf.constant(42)

Output:

<tf.Tensor: shape=(), dtype=int32, numpy=42>

A tensor has shape and dtype attributes:

t = tf.constant([[1.0, 2.0, 3.0],[4.0, 5.0, 6.0]])

print(t.shape, t.dtype)

Output:

(2, 3) <dtype: 'float32'>

Tensor indexing works much like NumPy's:

t[:, 1:]

Output:

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>
t[..., 1, tf.newaxis]

Output:

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

Tensor operations:

t + 10

Output:

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>
tf.add(t,10)

Output:

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>
tf.square(t)

Output:

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>
t @ tf.transpose(t)

Output:

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>
tf.matmul(t, tf.transpose(t))

Output:

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

TensorFlow provides many other operations:

  1. Basic math operations: tf.add(), tf.multiply(), tf.square(), tf.exp(), tf.sqrt()
  2. NumPy-like operations: tf.reshape(), tf.squeeze(), tf.tile()
  3. TF-specific operations: tf.reduce_mean(), tf.reduce_sum(), tf.reduce_max(), tf.math.log(), etc.

Note: many functions and classes have aliases; for example, tf.add() and tf.math.add() are the same function.
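Since an alias is just a second exported name for the same function object, you can check this directly (a minimal sketch; the identity check is an assumption about how the aliases are exported):

print(tf.add is tf.math.add)  # expected: True, one function with two names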

Keras's low-level API: it lives in keras.backend and includes square(), exp(), sqrt(), and more. For example:

from tensorflow import keras

k = keras.backend
k.square(k.transpose(t)) + 10

Output:

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[11., 26.],
       [14., 35.],
       [19., 46.]], dtype=float32)>

2.2 Tensors and NumPy

Tensors and NumPy interoperate smoothly: you can create one from the other, and you can mix the two in the same operations:

a = np.array([2., 4., 5.])
tf.constant(a)

Output:

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>
t.numpy()

Output:

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
np.array(t)

Output:

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
tf.square(a)

Output:

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>
np.square(t)

Output:

array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)

Note: NumPy uses 64-bit precision by default, while TF defaults to 32-bit, because 32-bit precision is generally plenty for neural networks, takes less memory, and runs faster.
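You can see the differing defaults directly; note that tf.constant() preserves a NumPy array's dtype (as in the float64 example above):

print(np.array([1.0]).dtype)     # float64 (NumPy defaults to 64-bit)
print(tf.constant([1.0]).dtype)  # <dtype: 'float32'> (TF defaults to 32-bit)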

2.3 Type Conversions

Type conversions can seriously hurt performance, so TF never performs any type conversion automatically. If the types don't match, it raises an error:

try:
    tf.constant(2.0) + tf.constant(40)
except tf.errors.InvalidArgumentError as ex:
    print(ex)

Output:

cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a int32 tensor [Op:AddV2]
try:
    tf.constant(2.0) + tf.constant(40.0, dtype=tf.float64)
except tf.errors.InvalidArgumentError as ex:
    print(ex)

Output:

cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:AddV2]
t2 = tf.constant(40., dtype=tf.float64)
tf.constant(2.0) + tf.cast(t2, tf.float32)

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=42.0>

To convert types explicitly, use tf.cast().

2.4 Variables

tf.Variable:

v = tf.Variable([[1., 2., 3.],[4., 5., 6.]])

v.assign(2 * v)

Output:

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>
v[0,1].assign(42)

Output:

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>
v[:, 2].assign([0., 1.])

Output:

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>
try:
    v[1] = [7., 8., 9.]
except TypeError as ex:
    print(ex)

Output:

'ResourceVariable' object does not support item assignment
v.scatter_nd_add(indices=[[0,0],[1,2]], updates=[100.,200.])

Output:

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[102.,  42.,   0.],
       [  8.,  10., 201.]], dtype=float32)>
sparse_delta = tf.IndexedSlices(values=[[1., 2., 3.],[4., 5., 6.]], indices=[1,0])
v.scatter_update(sparse_delta)

Output:

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[4., 5., 6.],
       [1., 2., 3.]], dtype=float32)>

2.5 Other Data Structures

TF supports several other data structures:

Sparse tensors: tf.SparseTensor represents tensors that contain mostly zeros; only the indices and values of the nonzero entries are stored. The tf.sparse package provides operations on sparse tensors.

s = tf.SparseTensor(indices=[[0,1],[1,0],[2,3]], values=[1., 2., 3.], dense_shape=[3,4])

print(s)

Output:

SparseTensor(indices=tf.Tensor(
[[0 1]
 [1 0]
 [2 3]], shape=(3, 2), dtype=int64), values=tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32), dense_shape=tf.Tensor([3 4], shape=(2,), dtype=int64))
tf.sparse.to_dense(s)

Output:

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0., 1., 0., 0.],
       [2., 0., 0., 0.],
       [0., 0., 0., 3.]], dtype=float32)>
s2 = s*2.0
print(s2)
print()
print(tf.sparse.to_dense(s2))

Output:

SparseTensor(indices=tf.Tensor(
[[0 1]
 [1 0]
 [2 3]], shape=(3, 2), dtype=int64), values=tf.Tensor([2. 4. 6.], shape=(3,), dtype=float32), dense_shape=tf.Tensor([3 4], shape=(2,), dtype=int64))

tf.Tensor(
[[0. 2. 0. 0.]
 [4. 0. 0. 0.]
 [0. 0. 0. 6.]], shape=(3, 4), dtype=float32)
try:
    s3 = s+1.0
except TypeError as ex:
    print(ex)

Output:

unsupported operand type(s) for +: 'SparseTensor' and 'float'
s4 = tf.constant([[10., 20.],[30., 40.],[50., 60.],[70., 80.]])
tf.sparse.sparse_dense_matmul(s, s4)

Output:

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ 30.,  40.],
       [ 20.,  40.],
       [210., 240.]], dtype=float32)>
s5 = tf.SparseTensor(indices=[[0, 2], [0, 1]], values=[1., 2.], dense_shape=[3, 4])
print(s5)

Output:

SparseTensor(indices=tf.Tensor(
[[0 2]
 [0 1]], shape=(2, 2), dtype=int64), values=tf.Tensor([1. 2.], shape=(2,), dtype=float32), dense_shape=tf.Tensor([3 4], shape=(2,), dtype=int64))
try:
    tf.sparse.to_dense(s5)
except tf.errors.InvalidArgumentError as ex:
    print(ex)

Output:

indices[1] = [0,1] is out of order. Many sparse ops require sorted indices.
    Use `tf.sparse.reorder` to create a correctly ordered copy.

 [Op:SparseToDense]
s6 = tf.sparse.reorder(s5)
tf.sparse.to_dense(s6)

Output:

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0., 2., 1., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]], dtype=float32)>

Tensor arrays: tf.TensorArray is a list of tensors; all entries must have the same shape and dtype, and the array's size is fixed by default.

array = tf.TensorArray(dtype=tf.float32, size=3)
array = array.write(0, tf.constant([1., 2.]))
array = array.write(1, tf.constant([3., 10.]))
array = array.write(2, tf.constant([5., 7.]))

array.read(1)

Output:

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([ 3., 10.], dtype=float32)>
array.stack()

Output:

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[1., 2.],
       [0., 0.],
       [5., 7.]], dtype=float32)>
mean, variance = tf.nn.moments(array.stack(), axes=0)

print(mean)
print(variance)

Output:

tf.Tensor([2. 3.], shape=(2,), dtype=float32)
tf.Tensor([4.6666665 8.666667 ], shape=(2,), dtype=float32)

String tensors: tf.string holds byte strings, not Unicode strings. If you build a string tensor from a Unicode string, it is automatically encoded as UTF-8. You can also represent Unicode strings as tensors of type tf.int32, where each item is a Unicode code point. The tf.strings package provides operations on both byte strings and Unicode strings, including conversions between the two. Note that tf.string values are atomic: the string's length does not appear in the tensor's shape; it only gets a length once you decode it to Unicode code points.

p = tf.constant(["Cafe","Coffee","caffe","咖啡"])

print(p)

Output:

tf.Tensor([b'Cafe' b'Coffee' b'caffe' b'\xe5\x92\x96\xe5\x95\xa1'], shape=(4,), dtype=string)
tf.strings.length(p, unit="UTF8_CHAR")

Output:

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([4, 6, 5, 2])>
r = tf.strings.unicode_decode(p, "UTF8")
r

Output:

<tf.RaggedTensor [[67, 97, 102, 101], [67, 111, 102, 102, 101, 101], [99, 97, 102, 102, 101], [21654, 21857]]>

Ragged tensors: tf.RaggedTensor represents lists of lists of tensors, where the rows may have different lengths; the tf.ragged package provides the related operations.

print(r[1])

Output:

tf.Tensor([ 67 111 102 102 101 101], shape=(6,), dtype=int32)
print(r[1:3])

Output:

<tf.RaggedTensor [[67, 111, 102, 102, 101, 101], [99, 97, 102, 102, 101]]>
r2 = tf.ragged.constant([[65, 66],[],[67]])
print(tf.concat([r,r2], axis=0))

Output:

<tf.RaggedTensor [[67, 97, 102, 101], [67, 111, 102, 102, 101, 101], [99, 97, 102, 102, 101], [21654, 21857], [65, 66], [], [67]]>
r3 = tf.ragged.constant([[68, 69, 70], [71], [], [72, 73]])
print(tf.concat([r, r3], axis=1))

Output:

<tf.RaggedTensor [[67, 97, 102, 101, 68, 69, 70], [67, 111, 102, 102, 101, 101, 71], [99, 97, 102, 102, 101], [21654, 21857, 72, 73]]>
tf.strings.unicode_encode(r3, "UTF-8")

Output:

<tf.Tensor: shape=(4,), dtype=string, numpy=array([b'DEF', b'G', b'', b'HI'], dtype=object)>
r.to_tensor()

Output:

<tf.Tensor: shape=(4, 6), dtype=int32, numpy=
array([[   67,    97,   102,   101,     0,     0],
       [   67,   111,   102,   102,   101,   101],
       [   99,    97,   102,   102,   101,     0],
       [21654, 21857,     0,     0,     0,     0]])>

Sets: represented as regular tensors or sparse tensors.

set1 = tf.constant([[2,3,5,7],[7,9,0,0]])
set2 = tf.constant([[4,5,6],[9,10,0]])

tf.sparse.to_dense(tf.sets.union(set1, set2))

Output:

<tf.Tensor: shape=(2, 6), dtype=int32, numpy=
array([[ 2,  3,  4,  5,  6,  7],
       [ 0,  7,  9, 10,  0,  0]])>
tf.sparse.to_dense(tf.sets.difference(set1, set2))

Output:

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[2, 3, 7],
       [7, 0, 0]])>
tf.sparse.to_dense(tf.sets.intersection(set1, set2))

Output:

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[5, 0],
       [0, 9]])>

Queues: hold tensors across multiple steps. TF provides several kinds: the first-in, first-out FIFOQueue, the priority queue PriorityQueue, the shuffling queue RandomShuffleQueue, and PaddingFIFOQueue.
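A minimal FIFOQueue sketch (using the tf.queue package; each record here holds an int32 and a byte string):

q = tf.queue.FIFOQueue(3, [tf.int32, tf.string], shapes=[(), ()])
q.enqueue([10, b"windy"])
q.enqueue([15, b"sunny"])
print(q.size())     # tf.Tensor(2, shape=(), dtype=int32)
print(q.dequeue())  # the first record enqueued: [10, b'windy']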

3. Custom Models and Training Algorithms

3.1 Custom Loss Functions

The mean squared error may penalize large errors too heavily, which can make the model imprecise, while the mean absolute error may take a long time to converge and can also leave the model imprecise. The Huber loss is a good compromise between the two.
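For reference, with error e = y_true - y_pred and threshold δ (δ = 1 in the huber_fn below), the Huber loss is quadratic near zero like the MSE but grows only linearly like the MAE:

    L(e) = e²/2             if |e| < δ
    L(e) = δ·|e| - δ²/2     otherwise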

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target.reshape(-1, 1),
                                                              random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) <  1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

plt.figure(figsize=(12, 5))
z = np.linspace(-4, 4, 200)
plt.plot(z, huber_fn(0, z), "b-", linewidth=2, label="huber($z$)")
plt.plot(z, z**2 / 2, "b:", linewidth=1, label=r"$\frac{1}{2}z^2$")
plt.plot([-1, -1], [0, huber_fn(0., -1.)], "r--")
plt.plot([1, 1], [0, huber_fn(0., 1.)], "r--")
plt.gca().axhline(y=0, color='k')
plt.gca().axvline(x=0, color='k')
plt.axis([-4, 4, 0, 4])
plt.grid(True)
plt.xlabel("$z$")
plt.legend(fontsize=14)
plt.title("Huber loss", fontsize=14)
plt.show()

Output: [Figure: Huber loss (solid) vs. the squared loss z²/2 (dotted); dashed lines mark the quadratic-to-linear cutoffs at z = ±1]

input_shape = X_train.shape[1:]

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",input_shape=input_shape),
    tf.keras.layers.Dense(1)
])

model.compile(loss=huber_fn, optimizer="nadam", metrics=["mae"])
model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 1s 4ms/step - loss: 0.5597 - mae: 0.9113 - val_loss: 0.3763 - val_mae: 0.6760
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.2193 - mae: 0.5157 - val_loss: 0.3128 - val_mae: 0.5996

<tensorflow.python.keras.callbacks.History at 0x208b3da0>

3.2 Saving and Loading Models That Contain Custom Components

# Save the model
model.save("my_model_with_a_custom_loss.h5")

model = tf.keras.models.load_model("my_model_with_a_custom_loss.h5", custom_objects={"huber_fn":huber_fn})

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 1s 3ms/step - loss: 0.1999 - mean_absolute_error: 0.4883 - val_loss: 0.2136 - val_mean_absolute_error: 0.4925
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.1940 - mean_absolute_error: 0.4810 - val_loss: 0.1793 - val_mean_absolute_error: 0.4563

<tensorflow.python.keras.callbacks.History at 0x615462b0>

In the code above, any error between -1 and 1 is treated as "small". To make this threshold configurable:

def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error)/2
        linear_loss = threshold*tf.abs(error) - threshold**2/2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

model.compile(loss=create_huber(2.0), optimizer="nadam", metrics=["mae"])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 1s 4ms/step - loss: 0.8442 - mae: 0.9523 - val_loss: 0.3565 - val_mae: 0.5651
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.2431 - mae: 0.5133 - val_loss: 0.2223 - val_mae: 0.4913

<tensorflow.python.keras.callbacks.History at 0x20c13ba8>

When the model is saved, the threshold is not saved with it, so you have to specify it again when loading the model:

model.save("my_model_with_a_custom_loss_threshold_2.h5")

model = tf.keras.models.load_model("my_model_with_a_custom_loss_threshold_2.h5",custom_objects={"huber_fn": create_huber(2.0)})

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 1s 4ms/step - loss: 0.2029 - mean_absolute_error: 0.4649 - val_loss: 0.2330 - val_mean_absolute_error: 0.4664
Epoch 2/2
363/363 [==============================] - 1s 4ms/step - loss: 0.2011 - mean_absolute_error: 0.4625 - val_loss: 0.1842 - val_mean_absolute_error: 0.4414

<tensorflow.python.keras.callbacks.History at 0x62353438>

To fix this, create a subclass of keras.losses.Loss and implement its get_config() method:

class HuberLoss(tf.keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss  = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}
    
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",input_shape=input_shape),
    tf.keras.layers.Dense(1)
])

model.compile(loss=HuberLoss(2.), optimizer="nadam", metrics=["mae"])

model.fit(X_train_scaled, y_train, epochs=2,validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 1s 4ms/step - loss: 0.8382 - mae: 0.9633 - val_loss: 0.4488 - val_mae: 0.6093
Epoch 2/2
363/363 [==============================] - 2s 5ms/step - loss: 0.2443 - mae: 0.5109 - val_loss: 0.3325 - val_mae: 0.5488

<tensorflow.python.keras.callbacks.History at 0x63882160>

This time, the threshold is saved along with the model:

model.save("my_model_with_a_custom_loss_class.h5")

model = tf.keras.models.load_model("my_model_with_a_custom_loss_class.h5",custom_objects={"HuberLoss": HuberLoss})

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 1s 4ms/step - loss: 0.2323 - mean_absolute_error: 0.5003 - val_loss: 0.2461 - val_mean_absolute_error: 0.4959
Epoch 2/2
363/363 [==============================] - 1s 4ms/step - loss: 0.2224 - mean_absolute_error: 0.4912 - val_loss: 0.2066 - val_mean_absolute_error: 0.4746

<tensorflow.python.keras.callbacks.History at 0x61c11668>
model.loss.threshold

Output:

2.0

3.3 Custom Activation Functions, Initializers, Regularizers, and Constraints

# equivalent to keras.activations.softplus() or tf.nn.softplus()
def my_softplus(z):
    return tf.math.log(tf.exp(z)+1.0)

# equivalent to keras.initializers.glorot_normal()
def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2./(shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

# equivalent to keras.regularizers.l1(0.01)
def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01*weights))

# equivalent to keras.constraints.nonneg() or tf.nn.relu()
def my_positive_weights(weights):
    return tf.where(weights<0., tf.zeros_like(weights), weights)

layer = tf.keras.layers.Dense(1, activation=my_softplus, kernel_initializer=my_glorot_initializer,
                             kernel_regularizer=my_l1_regularizer, kernel_constraint=my_positive_weights)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30,activation="selu",kernel_initializer="lecun_normal",input_shape=input_shape),
    tf.keras.layers.Dense(1,activation=my_softplus,kernel_regularizer=my_l1_regularizer,
                         kernel_constraint=my_positive_weights,kernel_initializer=my_glorot_initializer)
])

model.compile(loss="mse", optimizer="nadam", metrics=["mae"])

model.fit(X_train_scaled, y_train, epochs=2,validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 2s 4ms/step - loss: 1.4884 - mae: 0.8813 - val_loss: inf - val_mae: inf
Epoch 2/2
363/363 [==============================] - 2s 4ms/step - loss: 0.5808 - mae: 0.5139 - val_loss: 2.6715 - val_mae: 0.5219

<tensorflow.python.keras.callbacks.History at 0x65025588>

Save the model:

model.save("my_model_with_many_custom_parts.h5")

Load the model:

model = tf.keras.models.load_model("my_model_with_many_custom_parts.h5",
                                  custom_objects={"my_l1_regularizer":my_l1_regularizer,
                                                 "my_positive_weights":lambda:my_positive_weights,
                                                 "my_glorot_initializer":my_glorot_initializer,
                                                 "my_softplus":my_softplus})

class MyL1Regularizer(tf.keras.regularizers.Regularizer):
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))
    def get_config(self):
        return {"factor": self.factor}

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",
                       input_shape=input_shape),
    keras.layers.Dense(1, activation=my_softplus,
                       kernel_regularizer=MyL1Regularizer(0.01),
                       kernel_constraint=my_positive_weights,
                       kernel_initializer=my_glorot_initializer),
])

model.compile(loss="mse", optimizer="nadam", metrics=["mae"])

model.fit(X_train_scaled, y_train, epochs=2,
          validation_data=(X_valid_scaled, y_valid))

Output:

Epoch 1/2
363/363 [==============================] - 1s 4ms/step - loss: 1.6796 - mae: 0.8995 - val_loss: inf - val_mae: inf
Epoch 2/2
363/363 [==============================] - 1s 4ms/step - loss: 0.6537 - mae: 0.5356 - val_loss: inf - val_mae: inf

<tensorflow.python.keras.callbacks.History at 0x664c25f8>
model.save("my_model_with_many_custom_parts.h5")

model = keras.models.load_model(
    "my_model_with_many_custom_parts.h5",
    custom_objects={
       "MyL1Regularizer": MyL1Regularizer,
       "my_positive_weights": lambda: my_positive_weights,
       "my_glorot_initializer": my_glorot_initializer,
       "my_softplus": my_softplus,
    })

3.4 Custom Metrics

Losses and metrics serve different purposes: a loss is used by gradient descent to train the model, so it must be differentiable, while a metric is used to evaluate the model, so it must be easy to interpret and may be non-differentiable.

In most cases, defining a custom metric function works exactly like defining a custom loss function.

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",input_shape=input_shape),
    tf.keras.layers.Dense(1)
])

model.compile(loss="mse", optimizer="nadam", metrics=[create_huber(2.0)])

model.fit(X_train_scaled, y_train, epochs=2)

Output:

Epoch 1/2
363/363 [==============================] - 1s 3ms/step - loss: 3.3241 - huber_fn: 1.0429
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.9869 - huber_fn: 0.3078

<tensorflow.python.keras.callbacks.History at 0x221a9160>
model.compile(loss=create_huber(2.0), optimizer="nadam", metrics=[create_huber(2.0)])

sample_weight = np.random.rand(len(y_train))
history = model.fit(X_train_scaled, y_train, epochs=2, sample_weight=sample_weight)

Output:

Epoch 1/2
363/363 [==============================] - 1s 3ms/step - loss: 0.1298 - huber_fn: 0.2582
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.1172 - huber_fn: 0.2328
history.history["loss"][0], history.history["huber_fn"][0] * sample_weight.mean()

Output:

(0.12980547547340393, 0.1294616864812135)
precision = tf.keras.metrics.Precision()
precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=0.8>
precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=0.5>
precision.result()

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=0.5>
precision.variables

Output:

[<tf.Variable 'true_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>,
 <tf.Variable 'false_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>]
precision.reset_states()
class HuberMetric(tf.keras.metrics.Metric):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs) # handles base args (e.g., dtype)
        self.threshold = threshold
        #self.huber_fn = create_huber(threshold) # TODO: investigate why this fails
        self.total = self.add_weight("total", initializer="zeros")
        self.count = self.add_weight("count", initializer="zeros")
    def huber_fn(self, y_true, y_pred): # workaround
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss  = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(metric))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))
    def result(self):
        return self.total / self.count
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

m = HuberMetric(2.)

# total = 2 * |10 - 2| - 2²/2 = 14
# count = 1
# result = 14 / 1 = 14
m(tf.constant([[2.]]), tf.constant([[10.]])) 

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=14.0>
# total = total + (|1 - 0|² / 2) + (2 * |9.25 - 5| - 2² / 2) = 14 + 7 = 21
# count = count + 2 = 3
# result = total / count = 21 / 3 = 7
m(tf.constant([[0.], [5.]]), tf.constant([[1.], [9.25]]))

m.result()

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=7.0>
m.variables

Output:

[<tf.Variable 'total:0' shape=() dtype=float32, numpy=21.0>,
 <tf.Variable 'count:0' shape=() dtype=float32, numpy=3.0>]
m.reset_states()
m.variables

Output:

[<tf.Variable 'total:0' shape=() dtype=float32, numpy=0.0>,
 <tf.Variable 'count:0' shape=() dtype=float32, numpy=0.0>]
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",input_shape=input_shape),
    tf.keras.layers.Dense(1)
])

model.compile(loss=create_huber(2.0), optimizer="nadam", metrics=[HuberMetric(2.0)])

model.fit(X_train_scaled.astype(np.float32), y_train.astype(np.float32), epochs=2)

Output:

Epoch 1/2
363/363 [==============================] - 1s 3ms/step - loss: 0.8021 - huber_metric_3: 0.8021
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.2392 - huber_metric_3: 0.2392

<tensorflow.python.keras.callbacks.History at 0x2062b208>
model.save("my_model_with_a_custom_metric.h5")

model = tf.keras.models.load_model("my_model_with_a_custom_metric.h5",
                                custom_objects={"huber_fn": create_huber(2.0),"HuberMetric": HuberMetric})

model.fit(X_train_scaled.astype(np.float32), y_train.astype(np.float32), epochs=2)

Output:

Epoch 1/2
363/363 [==============================] - 1s 3ms/step - loss: 0.2273 - huber_metric_3: 0.2273
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.2214 - huber_metric_3: 0.2214

<tensorflow.python.keras.callbacks.History at 0x25ac0518>
class HuberMetric(tf.keras.metrics.Mean):
    def __init__(self, threshold=1.0, name='HuberMetric', dtype=None):
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        super().__init__(name=name, dtype=dtype)
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        super(HuberMetric, self).update_state(metric, sample_weight)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}   

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",input_shape=input_shape),
    tf.keras.layers.Dense(1),
])

model.compile(loss=tf.keras.losses.Huber(2.0), optimizer="nadam", weighted_metrics=[HuberMetric(2.0)])

sample_weight = np.random.rand(len(y_train))
history = model.fit(X_train_scaled, y_train, epochs=2, sample_weight=sample_weight)

Output:

Epoch 1/2
363/363 [==============================] - 1s 4ms/step - loss: 0.4151 - HuberMetric: 0.8278
Epoch 2/2
363/363 [==============================] - 1s 4ms/step - loss: 0.1203 - HuberMetric: 0.2399
history.history["loss"][0], history.history["HuberMetric"][0] * sample_weight.mean()

Output:

(0.4150879383087158, 0.4150878116273832)

3.5 Custom Layers

For a custom layer without any weights, write a function and wrap it in tf.keras.layers.Lambda:

exponential_layer = tf.keras.layers.Lambda(lambda x: tf.exp(x))

exponential_layer([-1., 0., 1.])

Output:

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.36787945, 1.        , 2.7182817 ], dtype=float32)>

An exponential layer can be useful at the output of a regression model, for example when the values to predict are positive and span very different scales:

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", input_shape=input_shape),
    tf.keras.layers.Dense(1),
    exponential_layer
])

model.compile(loss="mse", optimizer="nadam")
model.fit(X_train_scaled, y_train, epochs=5, validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)

Output:

Epoch 1/5
363/363 [==============================] - 1s 3ms/step - loss: 1.0172 - val_loss: 0.5744
Epoch 2/5
363/363 [==============================] - 1s 3ms/step - loss: 0.5303 - val_loss: 0.4132
Epoch 3/5
363/363 [==============================] - 1s 3ms/step - loss: 0.4393 - val_loss: 0.3729
Epoch 4/5
363/363 [==============================] - 1s 3ms/step - loss: 0.4092 - val_loss: 0.3631
Epoch 5/5
363/363 [==============================] - 1s 3ms/step - loss: 0.4055 - val_loss: 0.3536
162/162 [==============================] - 0s 1ms/step - loss: 0.3817

0.3816811144351959

For a custom layer with weights, create a subclass of keras.layers.Layer:

class MyDense(tf.keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        self.kernel = self.add_weight(name="kernel", shape=[batch_input_shape[-1], self.units],
                                      initializer="glorot_normal")
        self.bias = self.add_weight(name="bias", shape=[self.units], initializer="zeros")
        super().build(batch_input_shape) # must be at the end

    def call(self, X):
        return self.activation(X @ self.kernel + self.bias)

    def compute_output_shape(self, batch_input_shape):
        return tf.TensorShape(batch_input_shape.as_list()[:-1] + [self.units])

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,"activation": keras.activations.serialize(self.activation)}
    
model = tf.keras.models.Sequential([
    MyDense(30, activation="relu", input_shape=input_shape),
    MyDense(1)
])

model.compile(loss="mse", optimizer="nadam")

model.fit(X_train_scaled, y_train, epochs=2,validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)

Output:

Epoch 1/2
363/363 [==============================] - 1s 3ms/step - loss: 1.3062 - val_loss: 0.9271
Epoch 2/2
363/363 [==============================] - 1s 3ms/step - loss: 0.5722 - val_loss: 0.5473
162/162 [==============================] - 0s 1ms/step - loss: 0.4995

0.49953779578208923
model.save("my_model_with_a_custom_layer.h5")

model = keras.models.load_model("my_model_with_a_custom_layer.h5",
                                custom_objects={"MyDense": MyDense})

class MyMultiLayer(keras.layers.Layer):
    def call(self, X):
        X1, X2 = X
        return X1 + X2, X1 * X2

    def compute_output_shape(self, batch_input_shape):
        batch_input_shape1, batch_input_shape2 = batch_input_shape
        return [batch_input_shape1, batch_input_shape2]
    
inputs1 = keras.layers.Input(shape=[2])
inputs2 = keras.layers.Input(shape=[2])
outputs1, outputs2 = MyMultiLayer()((inputs1, inputs2))

print(outputs1, outputs2)

Output:

Tensor("my_multi_layer/Identity:0", shape=(None, 2), dtype=float32) Tensor("my_multi_layer/Identity_1:0", shape=(None, 2), dtype=float32)
tf.keras.activations.get(tf.nn.relu)

Output:

<function tensorflow.python.ops.gen_nn_ops.relu(features, name=None)>

A layer that behaves differently during training and testing:

class AddGaussianNoise(keras.layers.Layer):
    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, X, training=None):
        if training:
            noise = tf.random.normal(tf.shape(X), stddev=self.stddev)
            return X + noise
        else:
            return X

    def compute_output_shape(self, batch_input_shape):
        return batch_input_shape
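A usage sketch (for illustration; this mirrors the built-in keras.layers.GaussianNoise): fit() calls the layer with training=True, so noise is injected during training, while evaluate() and predict() leave the inputs untouched:

model = keras.models.Sequential([
    AddGaussianNoise(stddev=0.1, input_shape=input_shape),
    keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal"),
    keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer="nadam")
model.fit(X_train_scaled, y_train, epochs=2)  # noise added here, but not in predict()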

3.6 Custom Models

X_new_scaled = X_test_scaled

class ResidualBlock(keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(n_neurons, activation="elu",kernel_initializer="he_normal")
                       for _ in range(n_layers)]

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z
class ResidualRegressor(keras.models.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(30, activation="elu",kernel_initializer="he_normal")
        self.block1 = ResidualBlock(2, 30)
        self.block2 = ResidualBlock(2, 30)
        self.out = keras.layers.Dense(output_dim)

    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(1 + 3):
            Z = self.block1(Z)
        Z = self.block2(Z)
        return self.out(Z)
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = ResidualRegressor(1)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_train_scaled, y_train, epochs=5)
score = model.evaluate(X_test_scaled, y_test)
y_pred = model.predict(X_new_scaled)

Output:

Epoch 1/5
363/363 [==============================] - 2s 6ms/step - loss: 9.1324
Epoch 2/5
363/363 [==============================] - 2s 6ms/step - loss: 1.0581
Epoch 3/5
363/363 [==============================] - 2s 6ms/step - loss: 0.8870
Epoch 4/5
363/363 [==============================] - 2s 6ms/step - loss: 0.5835
Epoch 5/5
363/363 [==============================] - 2s 6ms/step - loss: 0.6458
162/162 [==============================] - 0s 2ms/step - loss: 0.6501
model.save("my_custom_model.ckpt")

Output:

INFO:tensorflow:Assets written to: my_custom_model.ckpt\assets
model = keras.models.load_model("my_custom_model.ckpt")

history = model.fit(X_train_scaled, y_train, epochs=5)

Output:

Epoch 1/5
363/363 [==============================] - 2s 6ms/step - loss: 0.7998
Epoch 2/5
363/363 [==============================] - 2s 6ms/step - loss: 0.4894
Epoch 3/5
363/363 [==============================] - 2s 6ms/step - loss: 0.4649
Epoch 4/5
363/363 [==============================] - 2s 6ms/step - loss: 0.4497
Epoch 5/5
363/363 [==============================] - 2s 6ms/step - loss: 0.5049

You can also build this kind of model with the Sequential API:

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

block1 = ResidualBlock(2, 30)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="elu", kernel_initializer="he_normal"),
    block1, block1, block1, block1,
    ResidualBlock(2, 30),
    keras.layers.Dense(1)
])

model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_train_scaled, y_train, epochs=5)
score = model.evaluate(X_test_scaled, y_test)
y_pred = model.predict(X_new_scaled)

Output:

Epoch 1/5
363/363 [==============================] - 2s 5ms/step - loss: 0.8695
Epoch 2/5
363/363 [==============================] - 2s 5ms/step - loss: 0.4720
Epoch 3/5
363/363 [==============================] - 2s 5ms/step - loss: 0.5537
Epoch 4/5
363/363 [==============================] - 2s 5ms/step - loss: 0.3809
Epoch 5/5
363/363 [==============================] - 2s 5ms/step - loss: 0.4012
162/162 [==============================] - 0s 1ms/step - loss: 0.4852

3.7 Losses and Metrics Based on Model Internals

The custom losses and metrics so far have all been based on the labels and the predictions. Sometimes you may want to define losses based on other parts of the model, such as the weights or activations of its hidden layers, typically for regularization purposes.

To define a loss based on model internals, compute it from the relevant model components, then pass the result to the add_loss() method.

class ReconstructingRegressor(keras.models.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation="selu",
                                          kernel_initializer="lecun_normal")
                       for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)
        # TODO: check https://github.com/tensorflow/tensorflow/issues/26260
        #self.reconstruction_mean = keras.metrics.Mean(name="reconstruction_error")

    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        super().build(batch_input_shape)

    def call(self, inputs, training=None):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)
        #if training:
        #    result = self.reconstruction_mean(recon_loss)
        #    self.add_metric(result)
        return self.out(Z)
    
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = ReconstructingRegressor(1)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_train_scaled, y_train, epochs=2)
y_pred = model.predict(X_test_scaled)

3.8 Computing Gradients Using Autodiff

def f(w1, w2):
    return 3 * w1 ** 2 + 2*w1*w2

w1, w2 = 5,3
eps = 1e-6
(f(w1+eps, w2) - f(w1, w2))/eps

Output:

36.000003007075065
(f(w1, eps+w2) - f(w1, w2))/eps

Output:

10.000000003174137
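These finite-difference estimates match the analytic partial derivatives of f(w1, w2) = 3·w1² + 2·w1·w2 at (w1, w2) = (5, 3):

    ∂f/∂w1 = 6·w1 + 2·w2 = 6·5 + 2·3 = 36
    ∂f/∂w2 = 2·w1 = 2·5 = 10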

The same gradients computed with TensorFlow's autodiff:

w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

gradients = tape.gradient(z, [w1, w2])
gradients

Output:

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]
with tf.GradientTape() as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)
print(dz_dw1)
try:
    dz_dw2 = tape.gradient(z, w2)
except RuntimeError as ex:
    print(ex)

Output:

tf.Tensor(36.0, shape=(), dtype=float32)
GradientTape.gradient can only be called once on non-persistent tapes.

Pass persistent=True, and delete the tape to release its resources once you are done:

with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)
print(dz_dw1)
dz_dw2 = tape.gradient(z, w2) # works now!
print(dz_dw2)
del tape

Output:

tf.Tensor(36.0, shape=(), dtype=float32)
tf.Tensor(10.0, shape=(), dtype=float32)
c1, c2 = tf.constant(5.), tf.constant(3.)
with tf.GradientTape() as tape:
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])
gradients

Output:

[None, None]

Force the tape to watch the tensors you are interested in:

with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])
gradients

Output:

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]
with tf.GradientTape() as tape:
    z1 = f(w1, w2 + 2.)
    z2 = f(w1, w2 + 5.)
    z3 = f(w1, w2 + 7.)

tape.gradient([z1, z2, z3], [w1, w2])

Output:

[<tf.Tensor: shape=(), dtype=float32, numpy=136.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=30.0>]
with tf.GradientTape(persistent=True) as tape:
    z1 = f(w1, w2 + 2.)
    z2 = f(w1, w2 + 5.)
    z3 = f(w1, w2 + 7.)

result = tf.reduce_sum(tf.stack([tape.gradient(z, [w1, w2]) for z in (z1, z2, z3)]), axis=0)
print(result)
del tape

Output:

tf.Tensor([136.  30.], shape=(2,), dtype=float32)

Computing second-order derivatives:

with tf.GradientTape(persistent=True) as hessian_tape:
    with tf.GradientTape() as jacobian_tape:
        z = f(w1, w2)
    jacobians = jacobian_tape.gradient(z, [w1, w2])
hessians = [hessian_tape.gradient(jacobian, [w1, w2]) for jacobian in jacobians]
del hessian_tape

for hessian in hessians:
    print(hessian)

Output:

[<tf.Tensor: shape=(), dtype=float32, numpy=6.0>, <tf.Tensor: shape=(), dtype=float32, numpy=2.0>]
[<tf.Tensor: shape=(), dtype=float32, numpy=2.0>, None]

Stopping gradients from propagating through part of the graph:

def f(w1, w2):
    return 3 * w1 ** 2 + tf.stop_gradient(2 * w1 * w2)

with tf.GradientTape() as tape:
    z = f(w1, w2)

tape.gradient(z, [w1, w2])

Output:

[<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, None]

For large inputs, computing the gradient of my_softplus() returns NaN:

x = tf.Variable(100.)
with tf.GradientTape() as tape:
    z = my_softplus(x)

tape.gradient(z, [x])

Output:

[<tf.Tensor: shape=(), dtype=float32, numpy=nan>]

This is because at x = 100.0 the softplus derivative is exp(100)/(exp(100) + 1), and exp(100) overflows in float32, so autodiff ends up computing infinity divided by infinity, which is NaN.
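Analytically the derivative is perfectly tame: for softplus(z) = log(1 + exp(z)),

    d softplus/dz = exp(z)/(exp(z) + 1) = 1/(1 + exp(-z))

which is the sigmoid of z and simply approaches 1 for large z. The NaN comes purely from evaluating the overflow-prone form exp(z)/(exp(z) + 1); the custom-gradient version below uses the equivalent grad/(1 + 1/exp(z)), which stays finite.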

tf.math.log(tf.exp(tf.constant(30., dtype=tf.float32)) + 1.)

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=30.0>
# Option 1: fix the gradient with a custom gradient function
@tf.custom_gradient
def my_better_softplus(z):
    exp = tf.exp(z)
    def my_softplus_gradients(grad):
        return grad / (1 + 1 / exp)
    return tf.math.log(exp + 1), my_softplus_gradients


# Option 2 (shadows option 1): fix the value with tf.where; note that autodiff
# still evaluates the exp branch, so the gradient below is still NaN for large z
def my_better_softplus(z):
    return tf.where(z > 30., z, tf.math.log(tf.exp(z) + 1.))


x = tf.Variable([1000.])
with tf.GradientTape() as tape:
    z = my_better_softplus(x)

z, tape.gradient(z, [x])

Output:

(<tf.Tensor: shape=(1,), dtype=float32, numpy=array([1000.], dtype=float32)>,
 [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([nan], dtype=float32)>])

3.9 Custom Training Loops

Note: unless you really need the extra flexibility, it is best to stick with the fit() method.

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

l2_reg = keras.regularizers.l2(0.05)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="elu", kernel_initializer="he_normal",kernel_regularizer=l2_reg),
    keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

def print_status_bar(iteration, total, loss, metrics=None):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result()) for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    print("\r{}/{} - ".format(iteration, total) + metrics, end=end)
    
import time

mean_loss = keras.metrics.Mean(name="loss")
mean_square = keras.metrics.Mean(name="mean_square")
for i in range(1, 50 + 1):
    loss = 1 / i
    mean_loss(loss)
    mean_square(i ** 2)
    print_status_bar(i, 50, mean_loss, [mean_square])
    time.sleep(0.05)

Output:

50/50 - loss: 0.0900 - mean_square: 858.5000
def progress_bar(iteration, total, size=30):
    running = iteration < total
    c = ">" if running else "="
    p = (size - 1) * iteration // total
    fmt = "{{:-{}d}}/{{}} [{{}}]".format(len(str(total)))
    params = [iteration, total, "=" * p + c + "." * (size - p - 1)]
    return fmt.format(*params)

progress_bar(3500, 10000, size=6)

Output:

' 3500/10000 [=>....]'
def print_status_bar(iteration, total, loss, metrics=None, size=30):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                         for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    print("\r{} - {}".format(progress_bar(iteration, total), metrics), end=end)
    
mean_loss = keras.metrics.Mean(name="loss")
mean_square = keras.metrics.Mean(name="mean_square")
for i in range(1, 50 + 1):
    loss = 1 / i
    mean_loss(loss)
    mean_square(i ** 2)
    print_status_bar(i, 50, mean_loss, [mean_square])
    time.sleep(0.05)

Output:

50/50 [==============================] - loss: 0.0900 - mean_square: 858.5000
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(lr=0.01)
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]

for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        with tf.GradientTape() as tape:
            y_pred = model(X_batch)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        for variable in model.variables:
            if variable.constraint is not None:
                variable.assign(variable.constraint(variable))
        mean_loss(loss)
        for metric in metrics:
            metric(y_batch, y_pred)
        print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
    print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
    for metric in [mean_loss] + metrics:
        metric.reset_states()

Output:

Epoch 1/5
11610/11610 [==============================] - mean: 1.3955 - mean_absolute_error: 0.5722
Epoch 2/5
11610/11610 [==============================] - mean: 0.6774 - mean_absolute_error: 0.5280
Epoch 3/5
11610/11610 [==============================] - mean: 0.6351 - mean_absolute_error: 0.5177
Epoch 4/5
11610/11610 [==============================] - mean: 0.6384 - mean_absolute_error: 0.5181
Epoch 5/5
11610/11610 [==============================] - mean: 0.6440 - mean_absolute_error: 0.5222
try:
    from tqdm.notebook import trange
    from collections import OrderedDict
    with trange(1, n_epochs + 1, desc="All epochs") as epochs:
        for epoch in epochs:
            with trange(1, n_steps + 1, desc="Epoch {}/{}".format(epoch, n_epochs)) as steps:
                for step in steps:
                    X_batch, y_batch = random_batch(X_train_scaled, y_train)
                    with tf.GradientTape() as tape:
                        y_pred = model(X_batch)
                        main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
                        loss = tf.add_n([main_loss] + model.losses)
                    gradients = tape.gradient(loss, model.trainable_variables)
                    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
                    for variable in model.variables:
                        if variable.constraint is not None:
                            variable.assign(variable.constraint(variable))                    
                    status = OrderedDict()
                    mean_loss(loss)
                    status["loss"] = mean_loss.result().numpy()
                    for metric in metrics:
                        metric(y_batch, y_pred)
                        status[metric.name] = metric.result().numpy()
                    steps.set_postfix(status)
            for metric in [mean_loss] + metrics:
                metric.reset_states()
except ImportError as ex:
    print("To run this cell, please install tqdm, ipywidgets and restart Jupyter")

Output:

HBox(children=(FloatProgress(value=0.0, description='All epochs', max=5.0, style=ProgressStyle(description_wid…
HBox(children=(FloatProgress(value=0.0, description='Epoch 1/5', max=362.0, style=ProgressStyle(description_wi…

HBox(children=(FloatProgress(value=0.0, description='Epoch 2/5', max=362.0, style=ProgressStyle(description_wi…

HBox(children=(FloatProgress(value=0.0, description='Epoch 3/5', max=362.0, style=ProgressStyle(description_wi…

HBox(children=(FloatProgress(value=0.0, description='Epoch 4/5', max=362.0, style=ProgressStyle(description_wi…

HBox(children=(FloatProgress(value=0.0, description='Epoch 5/5', max=362.0, style=ProgressStyle(description_wi…

3.10 TensorFlow Functions and Graphs

def cube(x):
    return x ** 3

cube(2)

Output:

8
cube(tf.constant(2.0))

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=8.0>

Use tf.function() to turn a Python function into a TensorFlow function:

tf_cube = tf.function(cube)
tf_cube

Output:

<tensorflow.python.eager.def_function.Function at 0x6286a588>
tf_cube(2)

Output:

<tf.Tensor: shape=(), dtype=int32, numpy=8>
tf_cube(tf.constant(2.0))

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=8.0>
concrete_function = tf_cube.get_concrete_function(tf.constant(2.0))
concrete_function.graph

Output:

<tensorflow.python.framework.func_graph.FuncGraph at 0x62bfed30>
concrete_function(tf.constant(2.0))

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=8.0>
concrete_function is tf_cube.get_concrete_function(tf.constant(2.0))

Output:

True

TensorFlow optimizes the computation graph: it prunes unused nodes and simplifies expressions. A TF function usually runs much faster than the original Python function, especially for complex computations.

When you use a custom loss function, metric, layer, or any other custom component in a Keras model, Keras automatically converts it into a TF function; there is no need to call tf.function() yourself.
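Outside Keras, for example in the custom loop of section 3.9, you can apply the conversion yourself. A minimal sketch (reusing model, loss_fn and optimizer as defined in that section) traces the per-batch step once and then runs it as a graph:

@tf.function
def train_step(X_batch, y_batch):
    with tf.GradientTape() as tape:
        y_pred = model(X_batch, training=True)
        main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
        loss = tf.add_n([main_loss] + model.losses)  # include regularization losses
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss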

concrete_function.graph

Output:

<tensorflow.python.framework.func_graph.FuncGraph at 0x62bfed30>
ops = concrete_function.graph.get_operations()
ops

Output:

[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'pow/y' type=Const>,
 <tf.Operation 'pow' type=Pow>,
 <tf.Operation 'Identity' type=Identity>]
pow_op = ops[2]
list(pow_op.inputs)

Output:

[<tf.Tensor 'x:0' shape=() dtype=float32>,
 <tf.Tensor 'pow/y:0' shape=() dtype=float32>]
pow_op.outputs

Output:

[<tf.Tensor 'pow:0' shape=() dtype=float32>]
concrete_function.graph.get_operation_by_name('x')

Output:

<tf.Operation 'x' type=Placeholder>
concrete_function.graph.get_tensor_by_name('Identity:0')

Output:

<tf.Tensor 'Identity:0' shape=() dtype=float32>
concrete_function.function_def.signature

Output:

name: "__inference_cube_1099596"
input_arg {
  name: "x"
  type: DT_FLOAT
}
output_arg {
  name: "identity"
  type: DT_FLOAT
}

3.11 AutoGraph and Tracing

How TensorFlow generates a graph:

  1. AutoGraph: TF analyzes the Python function's source code to find all the control-flow statements.
  2. It then outputs an upgraded, TF version of that code.
  3. TF calls the upgraded function, but instead of passing real arguments it passes symbolic tensors (tensors that have a name, a dtype, and a shape, but no value), so the operations get recorded into a graph.
@tf.function
def tf_cube(x):
    print("print:", x)
    return x ** 3

result = tf_cube(tf.constant(2.0))

Output:

print: Tensor("x:0", shape=(), dtype=float32)
result

Output:

<tf.Tensor: shape=(), dtype=float32, numpy=8.0>
result = tf_cube(2)
result = tf_cube(3)
result = tf_cube(tf.constant([[1., 2.]])) # New shape: trace!
result = tf_cube(tf.constant([[3., 4.], [5., 6.]])) # New shape: trace!
result = tf_cube(tf.constant([[7., 8.], [9., 10.], [11., 12.]])) # no trace

Output:

print: 2
print: 3
print: Tensor("x:0", shape=(1, 2), dtype=float32)
print: Tensor("x:0", shape=(2, 2), dtype=float32)
print: Tensor("x:0", shape=(3, 2), dtype=float32)
@tf.function(input_signature=[tf.TensorSpec([None, 28, 28], tf.float32)])
def shrink(images):
    print("Tracing", images)
    return images[:, ::2, ::2] # drop half the rows and columns


keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

img_batch_1 = tf.random.uniform(shape=[100, 28, 28])
img_batch_2 = tf.random.uniform(shape=[50, 28, 28])
preprocessed_images = shrink(img_batch_1) # Traces the function.
preprocessed_images = shrink(img_batch_2) # Reuses the same concrete function.

img_batch_3 = tf.random.uniform(shape=[2, 2, 2])
try:
    preprocessed_images = shrink(img_batch_3)  # rejects unexpected types or shapes
except ValueError as ex:
    print(ex)

Output:

Tracing Tensor("images:0", shape=(None, 28, 28), dtype=float32)
Python inputs incompatible with input_signature:
  inputs: (
    tf.Tensor(
[[[0.7413678  0.62854624]
  [0.01738465 0.3431449 ]]

 [[0.51063764 0.3777541 ]
  [0.07321596 0.02137029]]], shape=(2, 2, 2), dtype=float32))
  input_signature: (
    TensorSpec(shape=(None, 28, 28), dtype=tf.float32, name=None))
@tf.function
def add_10(x):
    for i in range(10):
        x += 1
    return x

add_10(tf.constant(5))

Output:

<tf.Tensor: shape=(), dtype=int32, numpy=15>
add_10.get_concrete_function(tf.constant(5)).graph.get_operations()

Output:

[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'add/y' type=Const>,
 <tf.Operation 'add' type=AddV2>,
 <tf.Operation 'add_1/y' type=Const>,
 <tf.Operation 'add_1' type=AddV2>,
 <tf.Operation 'add_2/y' type=Const>,
 <tf.Operation 'add_2' type=AddV2>,
 <tf.Operation 'add_3/y' type=Const>,
 <tf.Operation 'add_3' type=AddV2>,
 <tf.Operation 'add_4/y' type=Const>,
 <tf.Operation 'add_4' type=AddV2>,
 <tf.Operation 'add_5/y' type=Const>,
 <tf.Operation 'add_5' type=AddV2>,
 <tf.Operation 'add_6/y' type=Const>,
 <tf.Operation 'add_6' type=AddV2>,
 <tf.Operation 'add_7/y' type=Const>,
 <tf.Operation 'add_7' type=AddV2>,
 <tf.Operation 'add_8/y' type=Const>,
 <tf.Operation 'add_8' type=AddV2>,
 <tf.Operation 'add_9/y' type=Const>,
 <tf.Operation 'add_9' type=AddV2>,
 <tf.Operation 'Identity' type=Identity>]
@tf.function
def add_10(x):
    condition = lambda i, x: tf.less(i, 10)
    body = lambda i, x: (tf.add(i, 1), tf.add(x, 1))
    final_i, final_x = tf.while_loop(condition, body, [tf.constant(0), x])
    return final_x

add_10(tf.constant(5))

Output:

<tf.Tensor: shape=(), dtype=int32, numpy=15>
add_10.get_concrete_function(tf.constant(5)).graph.get_operations()

Output:

[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'Const' type=Const>,
 <tf.Operation 'while/maximum_iterations' type=Const>,
 <tf.Operation 'while/loop_counter' type=Const>,
 <tf.Operation 'while' type=StatelessWhile>,
 <tf.Operation 'Identity' type=Identity>]
@tf.function
def add_10(x):
    for i in tf.range(10):
        x = x + 1
    return x

add_10.get_concrete_function(tf.constant(0)).graph.get_operations()

Output:

[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'range/start' type=Const>,
 <tf.Operation 'range/limit' type=Const>,
 <tf.Operation 'range/delta' type=Const>,
 <tf.Operation 'range' type=Range>,
 <tf.Operation 'sub' type=Sub>,
 <tf.Operation 'floordiv' type=FloorDiv>,
 <tf.Operation 'mod' type=FloorMod>,
 <tf.Operation 'zeros_like' type=Const>,
 <tf.Operation 'NotEqual' type=NotEqual>,
 <tf.Operation 'Cast' type=Cast>,
 <tf.Operation 'add' type=AddV2>,
 <tf.Operation 'zeros_like_1' type=Const>,
 <tf.Operation 'Maximum' type=Maximum>,
 <tf.Operation 'while/loop_counter' type=Const>,
 <tf.Operation 'while' type=StatelessWhile>,
 <tf.Operation 'Identity' type=Identity>]
counter = tf.Variable(0)

@tf.function
def increment(counter, c=1):
    return counter.assign_add(c)

increment(counter)
increment(counter)

Output:

<tf.Tensor: shape=(), dtype=int32, numpy=2>
function_def = increment.get_concrete_function(counter).function_def
function_def.signature.input_arg[0]

Output:

name: "counter"
type: DT_RESOURCE
counter = tf.Variable(0)

@tf.function
def increment(c=1):
    return counter.assign_add(c)

increment()
increment()

Output:

<tf.Tensor: shape=(), dtype=int32, numpy=2>
function_def = increment.get_concrete_function().function_def
function_def.signature.input_arg[0]

Output:

name: "assignaddvariableop_resource"
type: DT_RESOURCE
class Counter:
    def __init__(self):
        self.counter = tf.Variable(0)

    @tf.function
    def increment(self, c=1):
        return self.counter.assign_add(c)
    
c = Counter()
c.increment()
c.increment()

Output:

<tf.Tensor: shape=(), dtype=int32, numpy=2>
@tf.function
def add_10(x):
    for i in tf.range(10):
        x += 1
    return x

print(tf.autograph.to_code(add_10.python_function))

Output:

def tf__add_10(x):
    do_return = False
    retval_ = ag__.UndefinedReturnValue()
    with ag__.FunctionScope('add_10', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:

        def get_state():
            return (x,)

        def set_state(loop_vars):
            nonlocal x
            (x,) = loop_vars

        def loop_body(itr):
            nonlocal x
            i = itr
            x += 1
        ag__.for_stmt(ag__.converted_call(tf.range, (10,), None, fscope), None, loop_body, get_state, set_state, ('x',), {})
        try:
            do_return = True
            retval_ = fscope.mark_return_value(x)
        except:
            do_return = False
            raise
    (do_return,)
    return ag__.retval(retval_)
def display_tf_code(func):
    from IPython.display import display, Markdown
    if hasattr(func, "python_function"):
        func = func.python_function
    code = tf.autograph.to_code(func)
    display(Markdown('```python\n{}\n```'.format(code)))
    
display_tf_code(add_10)

Output:

<IPython.core.display.Markdown object>
# Custom loss function
def my_mse(y_true, y_pred):
    print("Tracing loss my_mse()")
    return tf.reduce_mean(tf.square(y_pred - y_true))

# Custom metric function
def my_mae(y_true, y_pred):
    print("Tracing metric my_mae()")
    return tf.reduce_mean(tf.abs(y_pred - y_true))

# Custom layer
class MyDense(keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', 
                                      shape=(input_shape[1], self.units),
                                      initializer='uniform',
                                      trainable=True)
        self.biases = self.add_weight(name='bias', 
                                      shape=(self.units,),
                                      initializer='zeros',
                                      trainable=True)
        super().build(input_shape)

    def call(self, X):
        print("Tracing MyDense.call()")
        return self.activation(X @ self.kernel + self.biases)
    
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

# Custom model
class MyModel(keras.models.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = MyDense(30, activation="relu")
        self.hidden2 = MyDense(30, activation="relu")
        self.output_ = MyDense(1)

    def call(self, input):
        print("Tracing MyModel.call()")
        hidden1 = self.hidden1(input)
        hidden2 = self.hidden2(hidden1)
        concat = keras.layers.concatenate([input, hidden2])
        output = self.output_(concat)
        return output

model = MyModel()

model.compile(loss=my_mse, optimizer="nadam", metrics=[my_mae])

model.fit(X_train_scaled, y_train, epochs=2, validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)

Output:

Epoch 1/2
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
362/363 [============================>.] - ETA: 0s - loss: 1.3270 - my_mae: 0.7905Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
363/363 [==============================] - 1s 4ms/step - loss: 1.3255 - my_mae: 0.7900 - val_loss: 0.5569 - val_my_mae: 0.4819
Epoch 2/2
363/363 [==============================] - 1s 4ms/step - loss: 0.4419 - my_mae: 0.4767 - val_loss: 0.4664 - val_my_mae: 0.4576
162/162 [==============================] - 0s 1ms/step - loss: 0.4164 - my_mae: 0.4639

[0.41635245084762573, 0.4639027416706085]
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = MyModel(dynamic=True)

model.compile(loss=my_mse, optimizer="nadam", metrics=[my_mae])

model.fit(X_train_scaled[:64], y_train[:64], epochs=1,
          validation_data=(X_valid_scaled[:64], y_valid[:64]), verbose=0)
model.evaluate(X_test_scaled[:64], y_test[:64], verbose=0)

Output:

Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()

[5.507259368896484, 2.0566811561584473]
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = MyModel()

model.compile(loss=my_mse, optimizer="nadam", metrics=[my_mae], run_eagerly=True)

model.fit(X_train_scaled[:64], y_train[:64], epochs=1,
          validation_data=(X_valid_scaled[:64], y_valid[:64]), verbose=0)
model.evaluate(X_test_scaled[:64], y_test[:64], verbose=0)

Output:

Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()
Tracing MyModel.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing MyDense.call()
Tracing loss my_mse()
Tracing metric my_mae()

[5.507259368896484, 2.0566811561584473]

3.12 TF Function Rules

A Python function becomes a TF function either when you decorate it with @tf.function or when Keras converts it for you, but a few rules apply:

  1. Calls to any external library (NumPy, the standard library, etc.) run only during tracing and are not part of the graph (see the sketch after this list).
  2. Only the Python source code can be converted; if only compiled bytecode (e.g., a .pyc file) is available, the graph cannot be generated.
  3. TF can only capture for loops that iterate over tensors or datasets: write for i in tf.range(x), not for i in range(x).
  4. Prefer vectorized implementations over loops, otherwise performance suffers.
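A sketch illustrating rule 1: the NumPy call runs only while the function is traced, so its result is frozen into the graph as a constant, whereas the tf.random op executes on every call:

@tf.function
def frozen_random():
    return np.random.rand()       # runs once, at tracing time

@tf.function
def graph_random():
    return tf.random.uniform([])  # a graph op, runs at every call

print(frozen_random(), frozen_random())  # the same value twice
print(graph_random(), graph_random())    # two different values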

A custom optimizer:

class MyMomentumOptimizer(keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.001, momentum=0.9, name="MyMomentumOptimizer", **kwargs):
        """Call super().__init__() and use _set_hyper() to store hyperparameters"""
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", kwargs.get("lr", learning_rate)) # handle lr=learning_rate
        self._set_hyper("decay", self._initial_decay) # 
        self._set_hyper("momentum", momentum)
    
    def _create_slots(self, var_list):
        """For each model variable, create the optimizer variable associated with it.
        TensorFlow calls these optimizer variables "slots".
        For momentum optimization, we need one momentum slot per model variable.
        """
        for var in var_list:
            self.add_slot(var, "momentum")

    @tf.function
    def _resource_apply_dense(self, grad, var):
        """Update the slots and perform one optimization step for one model variable
        """
        var_dtype = var.dtype.base_dtype
        lr_t = self._decayed_lr(var_dtype) # handle learning rate decay
        momentum_var = self.get_slot(var, "momentum")
        momentum_hyper = self._get_hyper("momentum", var_dtype)
        momentum_var.assign(momentum_var * momentum_hyper - (1. - momentum_hyper)* grad)
        var.assign_add(momentum_var * lr_t)

    def _resource_apply_sparse(self, grad, var):
        raise NotImplementedError

    def get_config(self):
        base_config = super().get_config()
        return {
            **base_config,
            "learning_rate": self._serialize_hyperparameter("learning_rate"),
            "decay": self._serialize_hyperparameter("decay"),
            "momentum": self._serialize_hyperparameter("momentum"),
        }
    
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([keras.layers.Dense(1, input_shape=[8])])
model.compile(loss="mse", optimizer=MyMomentumOptimizer())
model.fit(X_train_scaled, y_train, epochs=5)

Output:

Epoch 1/5
363/363 [==============================] - 1s 2ms/step - loss: 3.8128
Epoch 2/5
363/363 [==============================] - 1s 2ms/step - loss: 1.4877
Epoch 3/5
363/363 [==============================] - 1s 2ms/step - loss: 0.9162
Epoch 4/5
363/363 [==============================] - 1s 2ms/step - loss: 0.7587
Epoch 5/5
363/363 [==============================] - 1s 2ms/step - loss: 0.7050

<tensorflow.python.keras.callbacks.History at 0x4ff57048>

 
