Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow, 2nd Edition — Chapter 10. Introduction to Artificial Neural Networks with Keras

ANNs (artificial neural networks) are inspired by the biological neurons in the human brain.

Artificial neural networks are at the core of deep learning: they are versatile, powerful, and scalable. Deep learning is deployed throughout the big IT companies, for example Google Images, Apple's Siri, YouTube video recommendations, and DeepMind's AlphaGo.

0. Importing the Required Libraries

import tensorflow as tf
import matplotlib as mpl
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os

for i in (tf, mpl, np, pd):
    print(i.__name__,": ",i.__version__,sep="")

Output:

tensorflow: 2.2.0
matplotlib: 3.1.2
numpy: 1.17.4
pandas: 0.25.3

1. From Biological to Artificial Neurons

1.1 Overview

Artificial neural networks were first introduced in 1943 by the neurophysiologist Warren McCulloch and the mathematician Walter Pitts.

There are several reasons why ANNs have a bright future:

  1. We live in the era of big data: huge quantities of data are available to train neural networks, and ANNs are powerful enough to exploit them on large, complex problems.
  2. Computing power has grown enormously (roughly doubling every two years by Moore's law), driven in part by the gaming industry's demand for GPUs; cloud computing has also played a large role.
  3. In practice, some of the theoretical limitations of ANNs have turned out to be benign.

1.2 Biological Neurons

A biological neuron consists of a cell body with many branching dendrites and one very long axon; the axon can be anywhere from a few times to tens of thousands of times longer than the cell body.

1.3 Logical Computations with Neurons

McCulloch and Pitts proposed a very simple model of the biological neuron, the artificial neuron: it has one or more binary inputs and produces one binary output. Networks of such neurons can compute basic logical operations, as the sketch below illustrates.
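
A minimal sketch (assuming the common convention that a neuron fires when at least two of its input connections are active; the helper name mcculloch_pitts is made up for illustration):

def mcculloch_pitts(inputs, threshold=2):
    # binary inputs in, binary output out: fire (1) when enough inputs are active
    return 1 if sum(inputs) >= threshold else 0

a, b = 1, 0
print(mcculloch_pitts([a, a]))        # identity: fires iff neuron a fires
print(mcculloch_pitts([a, b]))        # logical AND: fires only if a and b both fire
print(mcculloch_pitts([a, a, b, b]))  # logical OR: fires if a or b fires (two connections each)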

1.4 The Perceptron

The Perceptron is one of the simplest ANN architectures; it was invented by Frank Rosenblatt in 1957.

The Perceptron is based on an artificial neuron called a threshold logic unit (TLU): its inputs and output are numbers, and each input connection has an associated weight. The TLU computes a weighted sum of its inputs and then applies a step function to that sum to produce the output. In formula form: z = w1*x1 + w2*x2 + ... + wn*xn = wᵀx, and h_w(x) = step(z).

The step function is usually the Heaviside step function (also called the unit step function) or the sign function.

Heaviside step function: heaviside(z) = 1 if z >= 0, otherwise 0

sign function: sgn(z) = 1 if z > 0, -1 if z < 0, 0 if z = 0

When every neuron in a layer is connected to every neuron in the previous layer, the layer is called a fully connected layer (also known as a dense layer). A fully connected layer computes h_{W,b}(X) = 𝜙(XW + b), where:

  • X: the input feature matrix, one row per sample and one column per feature
  • W: the weight matrix
  • b: the bias vector
  • 𝜙: the activation function; for artificial neurons (TLUs) it is the step function

The Perceptron learning rule updates the weights as w_{i,j} ← w_{i,j} + η (y_j − ŷ_j) x_i, where:

  • η: the learning rate
  • x_i: the i-th input value of the current training instance
  • y_j, ŷ_j: the target and predicted outputs of the j-th output neuron

A minimal sketch of this rule follows.
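
A NumPy sketch of the Perceptron learning rule under these definitions (the function name perceptron_fit is made up for illustration; scikit-learn's implementation below is what you would normally use):

def perceptron_fit(X, y, eta=0.1, n_epochs=50):
    X_b = np.c_[np.ones(len(X)), X]            # prepend the bias input x0 = 1
    w = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        for xi, target in zip(X_b, y):
            y_hat = 1 if xi @ w >= 0 else 0    # Heaviside step of the weighted sum
            w += eta * (target - y_hat) * xi   # w <- w + eta * (y - y_hat) * x
    return w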

scikit-learn provides a Perceptron class:

from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:,(2,3)]
y = (iris.target==0).astype(np.int)

per_clf = Perceptron(max_iter=1000, tol=1e-3, random_state=42)
per_clf.fit(X, y)

Output:

Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
           fit_intercept=True, max_iter=1000, n_iter_no_change=5, n_jobs=None,
           penalty=None, random_state=42, shuffle=True, tol=0.001,
           validation_fraction=0.1, verbose=0, warm_start=False)

The Perceptron class in sklearn is trained with stochastic gradient descent; it is therefore equivalent to an SGDClassifier configured with loss="perceptron", learning_rate="constant", eta0=1, and penalty=None.
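
As a sketch of that equivalence (same hyperparameters as stated above):

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(loss="perceptron", learning_rate="constant", eta0=1,
                        penalty=None, max_iter=1000, tol=1e-3, random_state=42)
sgd_clf.fit(X, y)  # learns essentially the same decision boundary as per_clf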

y_pred = per_clf.predict([[2, 0.5]])
y_pred

Output:

array([1])

As the output shows, the Perceptron class gives hard class predictions rather than class probabilities; this is one reason Logistic Regression is usually preferred over Perceptrons in practice.

a = - per_clf.coef_[0][0] / per_clf.coef_[0][1]
b = - per_clf.intercept_ / per_clf.coef_[0][1]

axes = [0, 5, 0, 2]

x0, x1 = np.meshgrid(
        np.linspace(axes[0], axes[1], 500).reshape(-1, 1),
        np.linspace(axes[2], axes[3], 200).reshape(-1, 1),
    )
X_new = np.c_[x0.ravel(), x1.ravel()]
y_predict = per_clf.predict(X_new)
zz = y_predict.reshape(x0.shape)

plt.figure(figsize=(12, 5))
plt.plot(X[y==0, 0], X[y==0, 1], "bs", label="Not Iris-Setosa")
plt.plot(X[y==1, 0], X[y==1, 1], "yo", label="Iris-Setosa")

plt.plot([axes[0], axes[1]], [a * axes[0] + b, a * axes[1] + b], "k-", linewidth=3)
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#9898ff', '#fafab0'])

plt.contourf(x0, x1, zz, cmap=custom_cmap)
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.legend(loc="lower right", fontsize=14)
plt.axis(axes)

plt.show()

Output:

The figure above shows the Perceptron's decision boundary.

The Perceptron has a fatal weakness: it cannot solve even the trivial XOR (exclusive or) problem. When researchers realized this, many of them abandoned neural network research and turned to other fields.

1.5 Multilayer Perceptrons and Backpropagation

The XOR limitation above applies to the single-layer Perceptron; a multilayer Perceptron (MLP) can solve the XOR problem, as the small sketch below shows.
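
A minimal sketch with hand-picked weights (not trained): two threshold units in a hidden layer, one computing OR and one computing AND, feed an output unit that fires for "OR and not AND", which is exactly XOR.

def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden unit 1: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # output: OR and not AND, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_mlp(a, b))  # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0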

An MLP is composed of an input layer, one or more hidden layers, and an output layer. Every layer except the output layer includes a bias neuron and is fully connected to the next layer. Because information flows in only one direction, from the inputs towards the outputs, this architecture is also called a feedforward neural network.

Deep neural network: an ANN with a deep stack of hidden layers.

For many years researchers struggled to find a way to train MLPs, until 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams published their groundbreaking paper on the backpropagation algorithm.

Gradient descent with backpropagation: the network performs two passes. A forward pass propagates the weighted sums from the inputs to the outputs, and a backward pass propagates the gradient of the error (the loss between predictions and targets) back through the network; the parameters are then updated along those gradients, and the forward and backward passes repeat until the network converges.

In general, the weights W are initialized randomly, while the biases b are initialized to zeros.

Because the step function provides no useful gradient for backpropagation (its derivative is zero everywhere except at 0, where it is undefined), it is replaced with other, more commonly used activation functions.

Why use activation functions at all? To add nonlinearity. In a multilayer fully connected network with no activation functions, the output is y = (((XW1)W2)...)Wn = X(W1W2...Wn); letting W = W1W2...Wn, the whole network is equivalent to a single layer. In other words, without activation functions, any number of stacked layers collapses to one linear layer. Conversely, a large enough network with nonlinear activations can in theory approximate any continuous function. The sketch below demonstrates the collapse numerically.
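
A small NumPy check of the collapse argument (shapes chosen arbitrarily for illustration):

rng = np.random.default_rng(42)
X  = rng.normal(size=(4, 3))   # a batch of 4 samples with 3 features
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

two_linear_layers = (X @ W1) @ W2   # two stacked layers without activations...
one_linear_layer  = X @ (W1 @ W2)   # ...equal a single layer with W = W1 @ W2
print(np.allclose(two_linear_layers, one_linear_layer))  # True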

Commonly used activation functions:

  1. Sigmoid: sigmoid(x) = 1/(1+exp(-x)); an S-shaped curve, continuous and differentiable over the reals, with range (0,1). Because the outputs lie between 0 and 1, they can be interpreted as probabilities.
  2. Hyperbolic tangent: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)); an S-shaped curve, continuous and differentiable over the reals, with range (-1,1). Its outputs tend to be centered around 0 early in training, which helps the model converge.
  3. ReLU: relu(x) = max(0,x); continuous over the reals but not differentiable at 0. In practice the derivative at 0 is simply defined to be 0, which sidesteps the problem. A key advantage of ReLU is that its derivative is cheap to compute: 1 when x > 0 and 0 when x < 0.
  4. softplus: a smooth variant of ReLU, softplus(x) = log(1+exp(x)).
  5. softmax: commonly used for multiclass classification tasks.

For more activation functions, see: https://mp.csdn.net/console/editor/html/107147043

mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['font.serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

def sigmoid(x):
    return 1/(1+np.exp(-x))

def relu(x):
    return np.maximum(0,x)

def derivative(f, x, eps=0.000001):
    return (f(x + eps) - f(x - eps))/(2 * eps)

z = np.linspace(-5, 5, 200)

plt.figure(figsize=(12,5))

plt.subplot(121)
plt.plot(z, np.sign(z), "r-", linewidth=1, label="Step")     # step function
plt.plot(z, sigmoid(z), "g--", linewidth=2, label="Sigmoid") # sigmoid function
plt.plot(z, np.tanh(z), "b-", linewidth=2, label="Tanh")     # tanh function
plt.plot(z, relu(z), "m-.", linewidth=2, label="ReLU")       # ReLU function
plt.grid(True)
plt.legend(loc="center right", fontsize=14)
plt.title("Activation functions", fontsize=14)
plt.axis([-5, 5, -1.2, 1.2])

plt.subplot(122)
plt.plot(z, derivative(np.sign, z), "r-", linewidth=1, label="Step")
plt.plot(0, 0, "ro", markersize=5)
plt.plot(0, 0, "rx", markersize=10)
plt.plot(z, derivative(sigmoid, z), "g--", linewidth=2, label="Sigmoid")
plt.plot(z, derivative(np.tanh, z), "b-", linewidth=2, label="Tanh")
plt.plot(z, derivative(relu, z), "m-.", linewidth=2, label="ReLU")
plt.grid(True)
plt.legend(fontsize=14)
plt.title("Derivatives", fontsize=14)
plt.axis([-5, 5, -0.2, 1.2])

plt.tight_layout()
plt.show()

Output:

1.6 Regression MLPs

Huber loss: quadratic when the error is smaller than a threshold delta (typically 1), linear (absolute) beyond it. The Huber loss is thus a blend of the mean squared error (MSE) and the mean absolute error (MAE). A small sketch follows.
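
A NumPy sketch of one common parameterization of the Huber loss (in tf.keras the built-in tf.keras.losses.Huber can be used instead):

def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    small_error = np.abs(error) <= delta
    squared = 0.5 * error ** 2                      # quadratic part for small errors
    linear = delta * (np.abs(error) - 0.5 * delta)  # linear part for large errors
    return np.where(small_error, squared, linear)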

1.7 Classification MLPs

The softmax activation function is commonly used for multiclass classification tasks: it turns a vector of scores into a probability distribution over the classes, as sketched below.
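
A NumPy sketch of softmax (subtracting the maximum is a standard trick for numerical stability and does not change the result):

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

print(softmax(np.array([[2.0, 1.0, 0.1]])).round(3))  # [[0.659 0.242 0.099]]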

2. Implementing MLPs with Keras

2.1 Keras

Keras is a high-level API for building, training, evaluating, and running all kinds of neural networks. Official site: https://keras.io/

Keras was released as an open source project by François Chollet in 2015. As a high-level API, it supports several computation backends: TensorFlow, Microsoft Cognitive Toolkit (CNTK), and Theano.

TensorFlow 2 ships its own Keras implementation, tf.keras, which adds some features that plain Keras lacks, such as support for TensorFlow's Data API; accordingly, tf.keras only supports the TensorFlow backend.

Besides Keras and TensorFlow, PyTorch (from Facebook) is also very popular, and its API closely resembles Keras. All of these APIs drew inspiration from the Python libraries scikit-learn and Chainer.

TensorFlow 1.x had many shortcomings: it was hard to get started with and not very user-friendly, which helped PyTorch take off in 2018. TensorFlow 2 was essentially a rewrite rather than a continuation of 1.x: it made major adjustments, adopted Keras as the official high-level API, and trimmed a large amount of redundant API surface. Likewise, PyTorch 1.0 addressed PyTorch's main weaknesses.

2.2 Installing TensorFlow 2

Install the CPU-only version:

pip install tensorflow -i https://pypi.douban.com/simple

The -i flag specifies the download index (mirror). A plain pip install tensorflow can be extremely slow, or even time out, because the default index is hosted overseas; pointing pip at a domestic mirror (such as the Douban index shown above) speeds up the download considerably.

For installing the GPU version of TensorFlow 2, see: https://blog.csdn.net/Jwenxue/article/details/89300028

import tensorflow as tf

print(tf.__version__)
print(tf.keras.__version__)

Output:

2.2.0
2.3.0-tf

2.3 Building an Image Classifier with the Sequential API

Fashion MNIST: like MNIST, it contains 70,000 grayscale images of size 28×28 in 10 classes, but each image shows a fashion item rather than a digit. Fashion MNIST is noticeably harder than MNIST: a simple linear model reaches about 92% accuracy on MNIST but only about 83% on Fashion MNIST.

fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train_full, y_train_full),(X_test, y_test) = fashion_mnist.load_data()

for i in (X_train_full, y_train_full, X_test, y_test):
    print(i.shape, i.dtype)

Output:

(60000, 28, 28) uint8
(60000,) uint8
(10000, 28, 28) uint8
(10000,) uint8

As the output shows, when MNIST is loaded through sklearn each sample is a 784-dimensional vector whose values are floats from 0.0 to 255.0, whereas the MNIST and Fashion MNIST datasets loaded through tf.keras represent each sample as a 28×28 matrix of integers from 0 to 255.

X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0 # scale the pixel values to the 0.0-1.0 range
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.0

for i in (X_train, y_train, X_valid, y_valid, X_test, y_test):
    print(i.shape, i.dtype)

Output:

(55000, 28, 28) float64
(55000,) uint8
(5000, 28, 28) float64
(5000,) uint8
(10000, 28, 28) float64
(10000,) uint8

The 10 classes of fashion items in Fashion MNIST:

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
# display the first 10 training images
plt.figure(figsize=(12,5))

for i in range(10):
    plt.subplot(2,5,i+1) 
    plt.imshow(X_train[i], cmap="binary")
    plt.axis('off')
    plt.title(str(y_train[i])+": "+class_names[y_train[i]])

plt.tight_layout()
plt.show()

Output:

The figure shows the first 10 images of the training set; each image's label is an integer between 0 and 9.

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=[28, 28]))  # flattens each 28x28 input image into a vector of 784 values
model.add(tf.keras.layers.Dense(300, activation="relu")) # "relu" is equivalent to tf.keras.activations.relu
model.add(tf.keras.layers.Dense(100, activation="relu"))
model.add(tf.keras.layers.Dense(10, activation="softmax"))

tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

Alternatively, you can pass all the layers as a list when creating the Sequential model:

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
model.layers

Output:

[<tensorflow.python.keras.layers.core.Flatten at 0x2af3e588>,
 <tensorflow.python.keras.layers.core.Dense at 0x24b49358>,
 <tensorflow.python.keras.layers.core.Dense at 0x24dd25f8>,
 <tensorflow.python.keras.layers.core.Dense at 0x2465a6d8>]
model.summary()

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________

As shown above, the layer names are generated automatically by TensorFlow (they can also be set explicitly); None stands for the batch size, which can be anything. Notice how many parameters fully connected layers have.
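
For example, the first Dense layer connects 784 inputs to 300 neurons, so it has 784 × 300 = 235,200 weights plus 300 biases, i.e. the 235,500 parameters reported in the summary.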

tf.keras.utils.plot_model(model, "my_mnist_model.png",show_shapes=True)

Output:

hidden1 = model.layers[1]
hidden1.name

Output:

'dense'
model.get_layer(hidden1.name) is hidden1

Output:

True
weights, biases = hidden1.get_weights()

print(weights.shape)
print(weights)

Output:

(784, 300)
[[ 0.01613547  0.03505557 -0.03335547 ...  0.04602281  0.05135202
  -0.07207453]
 [-0.05218205  0.01972669 -0.01451721 ...  0.01652082 -0.05650191
  -0.04502674]
 [ 0.04127794  0.06563474 -0.00815298 ... -0.07359543 -0.02140902
  -0.07401732]
 ...
 [ 0.06348912 -0.0053833  -0.03209457 ... -0.02296805 -0.05252977
   0.0024286 ]
 [-0.01588191 -0.02908474  0.05714309 ...  0.06348763  0.05528355
   0.04353456]
 [-0.03586832 -0.03909247  0.0099389  ... -0.02353308 -0.00736362
  -0.03326848]]
print(biases.shape)
print(biases)

Output:

(300,)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

As shown above, the weights W were randomly initialized, while the biases b were initialized to zeros. To choose different initialization schemes, use the kernel_initializer and bias_initializer arguments when creating the layer.

model.compile(loss="sparse_categorical_crossentropy",optimizer="sgd",metrics=["accuracy"])

The compile step above configures the model: it specifies the loss function, the optimizer, and the metrics to track.

The loss is sparse_categorical_crossentropy because the labels are sparse: each sample's label is a single integer from 0 to 9. With one-hot labels you would use categorical_crossentropy instead; tf.keras.utils.to_categorical() converts integer labels to one-hot encoding, as sketched below.
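
A quick sketch of the conversion (y_example is a made-up toy array):

y_example = np.array([3, 0, 9])
tf.keras.utils.to_categorical(y_example, num_classes=10)
# each label becomes a length-10 one-hot row, e.g. 3 -> [0,0,0,1,0,0,0,0,0,0]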

The optimizer is sgd, i.e. stochastic gradient descent, whose default learning rate is 0.01.

history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))

Output:

Epoch 1/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.7146 - accuracy: 0.7649 - val_loss: 0.5156 - val_accuracy: 0.8232
Epoch 2/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.4915 - accuracy: 0.8295 - val_loss: 0.4821 - val_accuracy: 0.8302
Epoch 3/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.4455 - accuracy: 0.8452 - val_loss: 0.4759 - val_accuracy: 0.8238
Epoch 4/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.4165 - accuracy: 0.8552 - val_loss: 0.4037 - val_accuracy: 0.8604
Epoch 5/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3971 - accuracy: 0.8602 - val_loss: 0.3886 - val_accuracy: 0.8684
Epoch 6/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3806 - accuracy: 0.8660 - val_loss: 0.3813 - val_accuracy: 0.8696
Epoch 7/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3664 - accuracy: 0.8710 - val_loss: 0.3918 - val_accuracy: 0.8668
Epoch 8/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3547 - accuracy: 0.8739 - val_loss: 0.3637 - val_accuracy: 0.8728
Epoch 9/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3445 - accuracy: 0.8785 - val_loss: 0.3517 - val_accuracy: 0.8770
Epoch 10/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3358 - accuracy: 0.8812 - val_loss: 0.3424 - val_accuracy: 0.8778
Epoch 11/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3274 - accuracy: 0.8838 - val_loss: 0.3428 - val_accuracy: 0.8788
Epoch 12/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3195 - accuracy: 0.8867 - val_loss: 0.3481 - val_accuracy: 0.8764
Epoch 13/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3127 - accuracy: 0.8886 - val_loss: 0.3323 - val_accuracy: 0.8820
Epoch 14/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.3042 - accuracy: 0.8909 - val_loss: 0.3246 - val_accuracy: 0.8844
Epoch 15/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2992 - accuracy: 0.8927 - val_loss: 0.3179 - val_accuracy: 0.8878
Epoch 16/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2925 - accuracy: 0.8945 - val_loss: 0.3179 - val_accuracy: 0.8878
Epoch 17/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2869 - accuracy: 0.8973 - val_loss: 0.3201 - val_accuracy: 0.8858
Epoch 18/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2822 - accuracy: 0.8986 - val_loss: 0.3196 - val_accuracy: 0.8844
Epoch 19/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2769 - accuracy: 0.9006 - val_loss: 0.3356 - val_accuracy: 0.8764
Epoch 20/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2727 - accuracy: 0.9029 - val_loss: 0.3286 - val_accuracy: 0.8826
Epoch 21/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2667 - accuracy: 0.9047 - val_loss: 0.3069 - val_accuracy: 0.8884
Epoch 22/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2612 - accuracy: 0.9060 - val_loss: 0.3109 - val_accuracy: 0.8890
Epoch 23/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2567 - accuracy: 0.9073 - val_loss: 0.3172 - val_accuracy: 0.8826
Epoch 24/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2531 - accuracy: 0.9091 - val_loss: 0.3078 - val_accuracy: 0.8934
Epoch 25/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2494 - accuracy: 0.9095 - val_loss: 0.3025 - val_accuracy: 0.8940
Epoch 26/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2451 - accuracy: 0.9117 - val_loss: 0.3074 - val_accuracy: 0.8874
Epoch 27/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2409 - accuracy: 0.9135 - val_loss: 0.3001 - val_accuracy: 0.8924
Epoch 28/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2370 - accuracy: 0.9148 - val_loss: 0.2903 - val_accuracy: 0.8944
Epoch 29/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2324 - accuracy: 0.9169 - val_loss: 0.2959 - val_accuracy: 0.8928
Epoch 30/30
1719/1719 [==============================] - 4s 2ms/step - loss: 0.2294 - accuracy: 0.9175 - val_loss: 0.3083 - val_accuracy: 0.8898

In model.fit(), epochs defaults to 1 and batch_size defaults to 32, hence 55000/32 = 1718.75, rounded up to the 1719 steps per epoch shown above.

Instead of passing a separate validation set, you can pass validation_split=0.1 to let Keras hold out 10% of the training data for validation (a sketch follows).
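
A sketch of the alternative call (note that Keras takes the validation samples from the end of the training arrays, before any shuffling):

history = model.fit(X_train, y_train, epochs=30, validation_split=0.1)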

history.params

Output:

{'verbose': 1, 'epochs': 30, 'steps': 1719}
print(history.epoch)

Output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
history.history.keys()

Output:

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
pd.DataFrame(history.history).plot(figsize=(8,5))
plt.grid(True)
plt.gca().set_ylim(0,1)
plt.show()

Output:

Model evaluation:

model.evaluate(X_test, y_test)

Output:

313/313 [==============================] - 1s 2ms/step - loss: 0.3415 - accuracy: 0.8767

[0.34149253368377686, 0.8766999840736389]

As shown, evaluate() returns a list: the first value is the loss and the second is the accuracy.

Now the model can be used to make predictions:

X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba.round(2)

Output:

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.02, 0.  , 0.98],
       [0.  , 0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]],
      dtype=float32)
y_pred = model.predict_classes(X_new)
y_pred

Output:

array([9, 2, 1], dtype=int64)
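
Note that predict_classes() only exists on Sequential models and was deprecated in later TensorFlow releases; a version-independent equivalent is to take the argmax of the predicted probabilities:

y_pred = np.argmax(model.predict(X_new), axis=1)  # also gives array([9, 2, 1])
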
np.array(class_names)[y_pred]

Output:

array(['Ankle boot', 'Pullover', 'Trouser'], dtype='<U11')
y_new = y_test[:3]
y_new

Output:

array([9, 2, 1], dtype=uint8)
plt.figure(figsize=(8, 5))
for index, image in enumerate(X_new):
    plt.subplot(1, 3, index + 1)
    plt.imshow(image, cmap="binary", interpolation="nearest")
    plt.axis('off')
    plt.title(class_names[y_test[index]], fontsize=12)

plt.tight_layout()
plt.show()

Output:

2.4 Building a Regression MLP with the Sequential API

We use the California housing dataset (fetch_california_housing) for this task:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

for i in (X_train, X_valid, X_test, y_train, y_valid, y_test):
    print(i.shape)

Output:

(11610, 8)
(3870, 8)
(5160, 8)
(11610,)
(3870,)
(5160,)
np.random.seed(42)
tf.random.set_seed(42)

# build the model architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu",input_shape=(X_train.shape[1:])),
    tf.keras.layers.Dense(1)
])

# compile the model
model.compile(loss="mean_squared_error",optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))

model.summary()

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 30)                270       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 31        
=================================================================
Total params: 301
Trainable params: 301
Non-trainable params: 0
_________________________________________________________________
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))

Output:

Epoch 1/20
363/363 [==============================] - 1s 3ms/step - loss: 1.6419 - val_loss: 0.8560
Epoch 2/20
363/363 [==============================] - 1s 2ms/step - loss: 0.7047 - val_loss: 0.6531
Epoch 3/20
363/363 [==============================] - 1s 2ms/step - loss: 0.6345 - val_loss: 0.6099
Epoch 4/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5977 - val_loss: 0.5658
Epoch 5/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5706 - val_loss: 0.5355
Epoch 6/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5472 - val_loss: 0.5173
Epoch 7/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5288 - val_loss: 0.5081
Epoch 8/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5130 - val_loss: 0.4799
Epoch 9/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4992 - val_loss: 0.4690
Epoch 10/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4875 - val_loss: 0.4656
Epoch 11/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4777 - val_loss: 0.4482
Epoch 12/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4688 - val_loss: 0.4479
Epoch 13/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4615 - val_loss: 0.4296
Epoch 14/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4547 - val_loss: 0.4233
Epoch 15/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4488 - val_loss: 0.4176
Epoch 16/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4435 - val_loss: 0.4123
Epoch 17/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4389 - val_loss: 0.4071
Epoch 18/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4347 - val_loss: 0.4037
Epoch 19/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4306 - val_loss: 0.4000
Epoch 20/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4273 - val_loss: 0.3969
mse_test = model.evaluate(X_test, y_test)

Output:

162/162 [==============================] - 0s 1ms/step - loss: 0.4212
X_new = X_test[:5]
y_pred = model.predict(X_new)
print(y_pred)

Output:

[[0.38856655]
 [1.6792021 ]
 [3.1022797 ]
 [2.6324043 ]
 [2.6914022 ]]
y_test[:5]

Output:

array([0.477  , 0.458  , 5.00001, 2.186  , 2.78   ])

As shown, the predictions still differ noticeably from the true values.

plt.plot(pd.DataFrame(history.history))
plt.grid(True)
plt.gca().set_ylim(0,1)
plt.show()

Output:

2.5 Building Complex Models with the Keras Functional API

np.random.seed(42)
tf.random.set_seed(42)

input_ = tf.keras.layers.Input(shape=X_train.shape[1:])
hidden1 = tf.keras.layers.Dense(30, activation="relu")(input_) # the layer is called like a function on input_, hence the name "functional" API
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)
concat = tf.keras.layers.concatenate([input_, hidden2])
output = tf.keras.layers.Dense(1)(concat)

model = tf.keras.models.Model(inputs=[input_], outputs=[output])

model.summary()

Output:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 8)]          0                                            
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 30)           270         input_1[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 30)           930         dense_2[0][0]                    
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 38)           0           input_1[0][0]                    
                                                                 dense_3[0][0]                    
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 1)            39          concatenate[0][0]                
==================================================================================================
Total params: 1,239
Trainable params: 1,239
Non-trainable params: 0
__________________________________________________________________________________________________
tf.keras.utils.plot_model(model, "wide_deep_model.png", show_shapes=True)

Output:

# compile the model
model.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid,y_valid))

Output:

Epoch 1/20
363/363 [==============================] - 1s 3ms/step - loss: 1.2611 - val_loss: 3.3940
Epoch 2/20
363/363 [==============================] - 1s 2ms/step - loss: 0.6580 - val_loss: 0.9360
Epoch 3/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5878 - val_loss: 0.5649
Epoch 4/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5582 - val_loss: 0.5712
Epoch 5/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5347 - val_loss: 0.5045
Epoch 6/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5158 - val_loss: 0.4831
Epoch 7/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5002 - val_loss: 0.4639
Epoch 8/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4876 - val_loss: 0.4638
Epoch 9/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4760 - val_loss: 0.4421
Epoch 10/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4659 - val_loss: 0.4313
Epoch 11/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4577 - val_loss: 0.4345
Epoch 12/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4498 - val_loss: 0.4168
Epoch 13/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4428 - val_loss: 0.4230
Epoch 14/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4366 - val_loss: 0.4047
Epoch 15/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4307 - val_loss: 0.4078
Epoch 16/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4257 - val_loss: 0.3938
Epoch 17/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4210 - val_loss: 0.3952
Epoch 18/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4167 - val_loss: 0.3860
Epoch 19/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4121 - val_loss: 0.3827
Epoch 20/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4088 - val_loss: 0.4054
mse_test = model.evaluate(X_test, y_test)
mse_test

Output:

162/162 [==============================] - 0s 1ms/step - loss: 0.4032

0.4031672179698944
y_pred = model.predict(X_new)
y_pred

Output:

array([[0.47010738],
       [1.8735046 ],
       [3.3798232 ],
       [2.7344117 ],
       [2.8069582 ]], dtype=float32)

To send different subsets of the features through the wide path and the deep path, use multiple inputs, as in the following example:

np.random.seed(42)
tf.random.set_seed(42)

input_A = tf.keras.layers.Input(shape=[5],name="wide_input")
input_B = tf.keras.layers.Input(shape=[6],name="deep_input")
hidden1 = tf.keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)
concat = tf.keras.layers.concatenate([input_A, hidden2])
output = tf.keras.layers.Dense(1, name="output")(concat)
model = tf.keras.models.Model(inputs=[input_A, input_B], outputs=[output])

model.summary()

Output:

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
deep_input (InputLayer)         [(None, 6)]          0                                            
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 30)           210         deep_input[0][0]                 
__________________________________________________________________________________________________
wide_input (InputLayer)         [(None, 5)]          0                                            
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 30)           930         dense_5[0][0]                    
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 35)           0           wide_input[0][0]                 
                                                                 dense_6[0][0]                    
__________________________________________________________________________________________________
output (Dense)                  (None, 1)            36          concatenate_1[0][0]              
==================================================================================================
Total params: 1,176
Trainable params: 1,176
Non-trainable params: 0
__________________________________________________________________________________________________
tf.keras.utils.plot_model(model, "wide_deep_model2.png",show_shapes=True)

Output:

model.compile(loss="mse",optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))

X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]

history = model.fit((X_train_A,X_train_B),y_train,epochs=20,validation_data=((X_valid_A,X_valid_B),y_valid))

Output:

Epoch 1/20
363/363 [==============================] - 1s 3ms/step - loss: 1.8145 - val_loss: 0.8072
Epoch 2/20
363/363 [==============================] - 1s 3ms/step - loss: 0.6771 - val_loss: 0.6658
Epoch 3/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5979 - val_loss: 0.5687
Epoch 4/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5584 - val_loss: 0.5296
Epoch 5/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5334 - val_loss: 0.4993
Epoch 6/20
363/363 [==============================] - 1s 2ms/step - loss: 0.5120 - val_loss: 0.4811
Epoch 7/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4970 - val_loss: 0.4696
Epoch 8/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4843 - val_loss: 0.4496
Epoch 9/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4730 - val_loss: 0.4404
Epoch 10/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4644 - val_loss: 0.4315
Epoch 11/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4570 - val_loss: 0.4268
Epoch 12/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4510 - val_loss: 0.4166
Epoch 13/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4462 - val_loss: 0.4125
Epoch 14/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4421 - val_loss: 0.4074
Epoch 15/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4385 - val_loss: 0.4044
Epoch 16/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4356 - val_loss: 0.4007
Epoch 17/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4322 - val_loss: 0.4013
Epoch 18/20
363/363 [==============================] - 1s 3ms/step - loss: 0.4305 - val_loss: 0.3987
Epoch 19/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4274 - val_loss: 0.3934
Epoch 20/20
363/363 [==============================] - 1s 2ms/step - loss: 0.4261 - val_loss: 0.4204
mse_test = model.evaluate((X_test_A, X_test_B), y_test)
mse_test

Output:

162/162 [==============================] - 0s 1ms/step - loss: 0.4219

0.42188265919685364
y_pred = model.predict((X_new_A, X_new_B))
y_pred

Output:

array([[0.30591238],
       [1.9540673 ],
       [3.4426107 ]], dtype=float32)

Adding an auxiliary output for regularization:

np.random.seed(42)
tf.random.set_seed(42)

input_A = tf.keras.layers.Input(shape=[5],name="wide_input")
input_B = tf.keras.layers.Input(shape=[6],name="deep_input")
hidden1 = tf.keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)
concat = tf.keras.layers.concatenate([input_A, hidden2])
output = tf.keras.layers.Dense(1, name="output")(concat)
aux_output = tf.keras.layers.Dense(1,name="aux_output")(hidden2)
model = tf.keras.models.Model(inputs=[input_A, input_B], outputs=[output, aux_output])

model.summary()

Output:

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
deep_input (InputLayer)         [(None, 6)]          0                                            
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 30)           210         deep_input[0][0]                 
__________________________________________________________________________________________________
wide_input (InputLayer)         [(None, 5)]          0                                            
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 30)           930         dense_7[0][0]                    
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 35)           0           wide_input[0][0]                 
                                                                 dense_8[0][0]                    
__________________________________________________________________________________________________
output (Dense)                  (None, 1)            36          concatenate_2[0][0]              
__________________________________________________________________________________________________
aux_output (Dense)              (None, 1)            31          dense_8[0][0]                    
==================================================================================================
Total params: 1,207
Trainable params: 1,207
Non-trainable params: 0
__________________________________________________________________________________________________
tf.keras.utils.plot_model(model, "wide_deep_model_aux_output.png",show_shapes=True)

Output:

model.compile(loss=["mse","mse"],loss_weights=[0.9,0.1],optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))

history = model.fit([X_train_A, X_train_B],[y_train, y_train],epochs=20,validation_data=([X_valid_A,X_valid_B],[y_valid,y_valid]))

Output:

Epoch 1/20
363/363 [==============================] - 1s 4ms/step - loss: 2.1365 - output_loss: 1.9196 - aux_output_loss: 4.0890 - val_loss: 1.6233 - val_output_loss: 0.8468 - val_aux_output_loss: 8.6117
Epoch 2/20
363/363 [==============================] - 1s 4ms/step - loss: 0.8905 - output_loss: 0.6969 - aux_output_loss: 2.6326 - val_loss: 1.5163 - val_output_loss: 0.6836 - val_aux_output_loss: 9.0109
Epoch 3/20
363/363 [==============================] - 1s 4ms/step - loss: 0.7429 - output_loss: 0.6088 - aux_output_loss: 1.9499 - val_loss: 1.4639 - val_output_loss: 0.6229 - val_aux_output_loss: 9.0326
Epoch 4/20
363/363 [==============================] - 1s 4ms/step - loss: 0.6771 - output_loss: 0.5691 - aux_output_loss: 1.6485 - val_loss: 1.3388 - val_output_loss: 0.5481 - val_aux_output_loss: 8.4552
Epoch 5/20
363/363 [==============================] - 1s 4ms/step - loss: 0.6381 - output_loss: 0.5434 - aux_output_loss: 1.4911 - val_loss: 1.2177 - val_output_loss: 0.5194 - val_aux_output_loss: 7.5030
Epoch 6/20
363/363 [==============================] - 1s 4ms/step - loss: 0.6079 - output_loss: 0.5207 - aux_output_loss: 1.3923 - val_loss: 1.0935 - val_output_loss: 0.5106 - val_aux_output_loss: 6.3396
Epoch 7/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5853 - output_loss: 0.5040 - aux_output_loss: 1.3175 - val_loss: 0.9918 - val_output_loss: 0.5115 - val_aux_output_loss: 5.3151
Epoch 8/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5666 - output_loss: 0.4898 - aux_output_loss: 1.2572 - val_loss: 0.8733 - val_output_loss: 0.4733 - val_aux_output_loss: 4.4740
Epoch 9/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5504 - output_loss: 0.4771 - aux_output_loss: 1.2101 - val_loss: 0.7832 - val_output_loss: 0.4555 - val_aux_output_loss: 3.7323
Epoch 10/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5373 - output_loss: 0.4671 - aux_output_loss: 1.1695 - val_loss: 0.7170 - val_output_loss: 0.4604 - val_aux_output_loss: 3.0262
Epoch 11/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5266 - output_loss: 0.4591 - aux_output_loss: 1.1344 - val_loss: 0.6510 - val_output_loss: 0.4293 - val_aux_output_loss: 2.6468
Epoch 12/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5173 - output_loss: 0.4520 - aux_output_loss: 1.1048 - val_loss: 0.6051 - val_output_loss: 0.4310 - val_aux_output_loss: 2.1722
Epoch 13/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5095 - output_loss: 0.4465 - aux_output_loss: 1.0765 - val_loss: 0.5644 - val_output_loss: 0.4161 - val_aux_output_loss: 1.8992
Epoch 14/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5027 - output_loss: 0.4417 - aux_output_loss: 1.0511 - val_loss: 0.5354 - val_output_loss: 0.4119 - val_aux_output_loss: 1.6466
Epoch 15/20
363/363 [==============================] - 1s 4ms/step - loss: 0.4967 - output_loss: 0.4376 - aux_output_loss: 1.0280 - val_loss: 0.5124 - val_output_loss: 0.4047 - val_aux_output_loss: 1.4812
Epoch 16/20
363/363 [==============================] - 1s 4ms/step - loss: 0.4916 - output_loss: 0.4343 - aux_output_loss: 1.0070 - val_loss: 0.4934 - val_output_loss: 0.4034 - val_aux_output_loss: 1.3035
Epoch 17/20
363/363 [==============================] - 1s 4ms/step - loss: 0.4867 - output_loss: 0.4311 - aux_output_loss: 0.9872 - val_loss: 0.4801 - val_output_loss: 0.3984 - val_aux_output_loss: 1.2150
Epoch 18/20
363/363 [==============================] - 1s 4ms/step - loss: 0.4829 - output_loss: 0.4289 - aux_output_loss: 0.9686 - val_loss: 0.4694 - val_output_loss: 0.3962 - val_aux_output_loss: 1.1279
Epoch 19/20
363/363 [==============================] - 1s 4ms/step - loss: 0.4785 - output_loss: 0.4260 - aux_output_loss: 0.9510 - val_loss: 0.4580 - val_output_loss: 0.3936 - val_aux_output_loss: 1.0372
Epoch 20/20
363/363 [==============================] - 1s 4ms/step - loss: 0.4756 - output_loss: 0.4246 - aux_output_loss: 0.9344 - val_loss: 0.4655 - val_output_loss: 0.4048 - val_aux_output_loss: 1.0118
total_loss, main_loss, aux_loss = model.evaluate([X_test_A,X_test_B],[y_test,y_test])

Output:

162/162 [==============================] - 0s 2ms/step - loss: 0.4668 - output_loss: 0.4178 - aux_output_loss: 0.9082
y_pred_main, y_pred_aux = model.predict([X_new_A,X_new_B])
print(y_pred_main)
print()
print(y_pred_aux)

Output:

[[0.2676243]
 [1.980763 ]
 [3.3396282]]

[[0.9593649]
 [1.9240991]
 [2.515281 ]]

2.6 Building Dynamic Models with the Subclassing API

Both the Sequential API and the Functional API are declarative: you declare up front which layers to use and how they are connected, and only then feed the model data for training or inference. The advantages: the model is easy to save, clone, and share; its structure can be displayed and analyzed; shapes and types can be checked ahead of time; debugging is relatively easy. The drawback is that the model is static.

For other kinds of models, for example ones involving loops, conditional branching, or other dynamic behavior, you need the Subclassing API:

class WideAndDeepModel(tf.keras.models.Model):
    def __init__(self, units=30, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = tf.keras.layers.Dense(units, activation=activation)
        self.hidden2 = tf.keras.layers.Dense(units, activation=activation)
        self.main_output = tf.keras.layers.Dense(1)
        self.aux_output = tf.keras.layers.Dense(1)
    def call(self, inputs):
        input_A, input_B = inputs
        hidden1 = self.hidden1(input_B)
        hidden2 = self.hidden2(hidden1)
        concat = tf.keras.layers.concatenate([input_A, hidden2])
        main_output = self.main_output(concat)
        aux_output = self.aux_output(hidden2)
        return main_output, aux_output
    
model = WideAndDeepModel(30, activation="relu")
model.compile(loss="mse",loss_weights=[0.9,0.1],optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit((X_train_A,X_train_B),(y_train,y_train),epochs=20,validation_data=((X_valid_A,X_valid_B),(y_valid,y_valid)))

Output:

Epoch 1/20
363/363 [==============================] - 1s 4ms/step - loss: 2.3298 - output_1_loss: 2.2186 - output_2_loss: 3.3304 - val_loss: 2.1435 - val_output_1_loss: 1.1581 - val_output_2_loss: 11.0117
Epoch 2/20
363/363 [==============================] - 1s 4ms/step - loss: 0.9714 - output_1_loss: 0.8543 - output_2_loss: 2.0252 - val_loss: 1.7567 - val_output_1_loss: 0.8205 - val_output_2_loss: 10.1825
Epoch 3/20
363/363 [==============================] - 1s 4ms/step - loss: 0.8268 - output_1_loss: 0.7289 - output_2_loss: 1.7082 - val_loss: 1.5664 - val_output_1_loss: 0.7913 - val_output_2_loss: 8.5419
Epoch 4/20
363/363 [==============================] - 1s 3ms/step - loss: 0.7636 - output_1_loss: 0.6764 - output_2_loss: 1.5477 - val_loss: 1.3088 - val_output_1_loss: 0.6549 - val_output_2_loss: 7.1933
Epoch 5/20
363/363 [==============================] - 1s 3ms/step - loss: 0.7211 - output_1_loss: 0.6402 - output_2_loss: 1.4489 - val_loss: 1.1357 - val_output_1_loss: 0.5964 - val_output_2_loss: 5.9898
Epoch 6/20
363/363 [==============================] - 1s 4ms/step - loss: 0.6895 - output_1_loss: 0.6124 - output_2_loss: 1.3833 - val_loss: 1.0036 - val_output_1_loss: 0.5937 - val_output_2_loss: 4.6933
Epoch 7/20
363/363 [==============================] - 1s 3ms/step - loss: 0.6632 - output_1_loss: 0.5894 - output_2_loss: 1.3274 - val_loss: 0.8904 - val_output_1_loss: 0.5591 - val_output_2_loss: 3.8714
Epoch 8/20
363/363 [==============================] - 1s 4ms/step - loss: 0.6410 - output_1_loss: 0.5701 - output_2_loss: 1.2796 - val_loss: 0.8009 - val_output_1_loss: 0.5243 - val_output_2_loss: 3.2903
Epoch 9/20
363/363 [==============================] - 1s 4ms/step - loss: 0.6204 - output_1_loss: 0.5514 - output_2_loss: 1.2416 - val_loss: 0.7357 - val_output_1_loss: 0.5144 - val_output_2_loss: 2.7275
Epoch 10/20
363/363 [==============================] - 1s 4ms/step - loss: 0.6024 - output_1_loss: 0.5355 - output_2_loss: 1.2043 - val_loss: 0.6849 - val_output_1_loss: 0.5014 - val_output_2_loss: 2.3370
Epoch 11/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5870 - output_1_loss: 0.5223 - output_2_loss: 1.1692 - val_loss: 0.6522 - val_output_1_loss: 0.4842 - val_output_2_loss: 2.1641
Epoch 12/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5729 - output_1_loss: 0.5098 - output_2_loss: 1.1408 - val_loss: 0.6092 - val_output_1_loss: 0.4764 - val_output_2_loss: 1.8042
Epoch 13/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5604 - output_1_loss: 0.4993 - output_2_loss: 1.1107 - val_loss: 0.5781 - val_output_1_loss: 0.4581 - val_output_2_loss: 1.6587
Epoch 14/20
363/363 [==============================] - 1s 4ms/step - loss: 0.5492 - output_1_loss: 0.4897 - output_2_loss: 1.0849 - val_loss: 0.5545 - val_output_1_loss: 0.4498 - val_output_2_loss: 1.4962
Epoch 15/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5396 - output_1_loss: 0.4817 - output_2_loss: 1.0604 - val_loss: 0.5372 - val_output_1_loss: 0.4416 - val_output_2_loss: 1.3972
Epoch 16/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5314 - output_1_loss: 0.4751 - output_2_loss: 1.0380 - val_loss: 0.5203 - val_output_1_loss: 0.4369 - val_output_2_loss: 1.2707
Epoch 17/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5242 - output_1_loss: 0.4694 - output_2_loss: 1.0168 - val_loss: 0.5075 - val_output_1_loss: 0.4302 - val_output_2_loss: 1.2035
Epoch 18/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5175 - output_1_loss: 0.4644 - output_2_loss: 0.9957 - val_loss: 0.5036 - val_output_1_loss: 0.4289 - val_output_2_loss: 1.1752
Epoch 19/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5111 - output_1_loss: 0.4593 - output_2_loss: 0.9768 - val_loss: 0.4892 - val_output_1_loss: 0.4242 - val_output_2_loss: 1.0741
Epoch 20/20
363/363 [==============================] - 1s 3ms/step - loss: 0.5062 - output_1_loss: 0.4559 - output_2_loss: 0.9592 - val_loss: 0.4896 - val_output_1_loss: 0.4251 - val_output_2_loss: 1.0702
total_loss, main_loss, aux_loss = model.evaluate((X_test_A,X_test_B),(y_test,y_test))
for i in (total_loss, main_loss, aux_loss):
    print(i)

Output:

162/162 [==============================] - 0s 2ms/step - loss: 0.4939 - output_1_loss: 0.4455 - output_2_loss: 0.9296
0.4939037561416626
0.4454963207244873
0.9295728802680969
y_pred_main, y_pred_aux = model.predict((X_new_A, X_new_B))

for i in (y_pred_main,"", y_pred_aux):
    print(i)

Output:

[[0.36133328]
 [1.8392311 ]
 [3.2579417 ]]

[[1.1205194]
 [1.9981728]
 [2.6419516]]
model.summary()

Output:

Model: "wide_and_deep_model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_13 (Dense)             multiple                  210       
_________________________________________________________________
dense_14 (Dense)             multiple                  930       
_________________________________________________________________
dense_15 (Dense)             multiple                  36        
_________________________________________________________________
dense_16 (Dense)             multiple                  31        
=================================================================
Total params: 1,207
Trainable params: 1,207
Non-trainable params: 0
_________________________________________________________________

As shown above, summary() for a subclassed model is much less informative: the output shapes are listed only as "multiple".

Because Keras cannot inspect a subclassed model before it runs, mistakes are easier to make and harder to catch. Unless you really need the extra flexibility, prefer the Sequential or Functional API.

3. Saving and Restoring a Model

For models built with the Sequential API or the Functional API, use save() to save the model and tf.keras.models.load_model() to load it back:

np.random.seed(42)
tf.random.set_seed(42)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu",input_shape=[8]),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1)
])

model.compile(loss="mse",optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))

history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

Output:

Epoch 1/10
363/363 [==============================] - 1s 2ms/step - loss: 1.8866 - val_loss: 0.7126
Epoch 2/10
363/363 [==============================] - 1s 2ms/step - loss: 0.6577 - val_loss: 0.6880
Epoch 3/10
363/363 [==============================] - 1s 2ms/step - loss: 0.5934 - val_loss: 0.5803
Epoch 4/10
363/363 [==============================] - 1s 2ms/step - loss: 0.5557 - val_loss: 0.5166
Epoch 5/10
363/363 [==============================] - 1s 2ms/step - loss: 0.5272 - val_loss: 0.4895
Epoch 6/10
363/363 [==============================] - 1s 2ms/step - loss: 0.5033 - val_loss: 0.4951
Epoch 7/10
363/363 [==============================] - 1s 2ms/step - loss: 0.4854 - val_loss: 0.4861
Epoch 8/10
363/363 [==============================] - 1s 2ms/step - loss: 0.4709 - val_loss: 0.4554
Epoch 9/10
363/363 [==============================] - 1s 2ms/step - loss: 0.4578 - val_loss: 0.4413
Epoch 10/10
363/363 [==============================] - 1s 2ms/step - loss: 0.4474 - val_loss: 0.4379

Saving in HDF5 format stores the model's architecture, every layer's hyperparameters, the model parameters (weights and biases), and the optimizer state:

model.save("my_keras_model.h5")
model_loaded = tf.keras.models.load_model("my_keras_model.h5")

model_loaded.predict(X_new)

Output:

array([[0.5400236],
       [1.6505971],
       [3.0098243],
       [2.6485033],
       [2.535491 ]], dtype=float32)
X_new.shape

Output:

(5, 8)

If you only want to save the parameters, use save_weights(); load them back with load_weights():

model.save_weights("my_keras_weights.ckpt")
model_loaded_weights = model.load_weights("my_keras_weights.ckpt")

Note: when loading only the weights, the model architecture must match the one that produced the checkpoint, otherwise loading will fail; see the sketch below.
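
For example, a freshly built model with the same architecture can load the saved weights (a sketch):

model_clone = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", input_shape=[8]),  # must match the saved architecture
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1)
])
model_clone.load_weights("my_keras_weights.ckpt")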

4. Using Callbacks

The fit() method accepts a callbacks argument, which lets Keras call objects at the start and end of training, at the start and end of each epoch, and even before and after processing each batch.

ModelCheckpoint saves the model at regular intervals during training, by default at the end of every epoch:

tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("my_keras_model.h5", save_best_only=True)
history = model.fit(X_train, y_train, epochs=10,validation_data=(X_valid, y_valid),callbacks=[checkpoint_cb])
model = tf.keras.models.load_model("my_keras_model.h5") # rollback to best model
mse_test = model.evaluate(X_test, y_test)
mse_test

Output:

Epoch 1/10
363/363 [==============================] - 2s 5ms/step - loss: 0.4393 - val_loss: 0.4110
Epoch 2/10
363/363 [==============================] - 1s 3ms/step - loss: 0.4315 - val_loss: 0.4266
Epoch 3/10
363/363 [==============================] - 1s 3ms/step - loss: 0.4259 - val_loss: 0.3996
Epoch 4/10
363/363 [==============================] - 1s 2ms/step - loss: 0.4201 - val_loss: 0.3939
Epoch 5/10
363/363 [==============================] - 1s 3ms/step - loss: 0.4154 - val_loss: 0.3889
Epoch 6/10
363/363 [==============================] - 1s 3ms/step - loss: 0.4111 - val_loss: 0.3866
Epoch 7/10
363/363 [==============================] - 1s 3ms/step - loss: 0.4074 - val_loss: 0.3860
Epoch 8/10
363/363 [==============================] - 1s 3ms/step - loss: 0.4040 - val_loss: 0.3793
Epoch 9/10
363/363 [==============================] - 1s 2ms/step - loss: 0.4008 - val_loss: 0.3746
Epoch 10/10
363/363 [==============================] - 1s 2ms/step - loss: 0.3976 - val_loss: 0.3723
162/162 [==============================] - 0s 1ms/step - loss: 0.3951

0.3950933814048767

With save_best_only=True, only the model that performs best on the validation set is saved, so there is no need to worry about training too long and overfitting the training set.

The EarlyStopping callback interrupts training when the model's performance stops improving. Several callbacks can be combined:

model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3))
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=10,restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=100,validation_data=(X_valid, y_valid),callbacks=[checkpoint_cb, early_stopping_cb])
mse_test = model.evaluate(X_test, y_test)

Output:

Epoch 1/100
363/363 [==============================] - 8s 23ms/step - loss: 0.3949 - val_loss: 0.3695
Epoch 2/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3923 - val_loss: 0.3684
Epoch 3/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3898 - val_loss: 0.3650
Epoch 4/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3873 - val_loss: 0.3632
Epoch 5/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3851 - val_loss: 0.3608
Epoch 6/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3829 - val_loss: 0.3585
Epoch 7/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3808 - val_loss: 0.3564
Epoch 8/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3788 - val_loss: 0.3561
Epoch 9/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3768 - val_loss: 0.3552
Epoch 10/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3749 - val_loss: 0.3527
Epoch 11/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3733 - val_loss: 0.3495
Epoch 12/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3716 - val_loss: 0.3549
Epoch 13/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3700 - val_loss: 0.3516
Epoch 14/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3685 - val_loss: 0.3466
Epoch 15/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3671 - val_loss: 0.3660
Epoch 16/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3659 - val_loss: 0.3437
Epoch 17/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3648 - val_loss: 0.3585
Epoch 18/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3635 - val_loss: 0.3506
Epoch 19/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3622 - val_loss: 0.3452
Epoch 20/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3613 - val_loss: 0.3849
Epoch 21/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3602 - val_loss: 0.3531
Epoch 22/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3590 - val_loss: 0.3838
Epoch 23/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3583 - val_loss: 0.3376
Epoch 24/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3571 - val_loss: 0.3572
Epoch 25/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3562 - val_loss: 0.3526
Epoch 26/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3553 - val_loss: 0.3672
Epoch 27/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3547 - val_loss: 0.3339
Epoch 28/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3538 - val_loss: 0.3598
Epoch 29/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3532 - val_loss: 0.3427
Epoch 30/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3523 - val_loss: 0.3629
Epoch 31/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3516 - val_loss: 0.3360
Epoch 32/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3507 - val_loss: 0.3772
Epoch 33/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3505 - val_loss: 0.3300
Epoch 34/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3496 - val_loss: 0.3353
Epoch 35/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3490 - val_loss: 0.3579
Epoch 36/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3485 - val_loss: 0.3292
Epoch 37/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3478 - val_loss: 0.3551
Epoch 38/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3472 - val_loss: 0.3303
Epoch 39/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3466 - val_loss: 0.3382
Epoch 40/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3460 - val_loss: 0.3322
Epoch 41/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3454 - val_loss: 0.3493
Epoch 42/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3451 - val_loss: 0.3389
Epoch 43/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3442 - val_loss: 0.3669
Epoch 44/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3439 - val_loss: 0.3529
Epoch 45/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3435 - val_loss: 0.3292
Epoch 46/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3427 - val_loss: 0.3422
162/162 [==============================] - 0s 1ms/step - loss: 0.3476
class PrintValTrainRatioCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        print("\nval/train: {:.2f}".format(logs["val_loss"] / logs["loss"]))
        
val_train_ratio_cb = PrintValTrainRatioCallback()
history = model.fit(X_train, y_train, epochs=1, validation_data=(X_valid, y_valid),callbacks=[val_train_ratio_cb])

Output:

359/363 [============================>.] - ETA: 0s - loss: 0.3469
val/train: 1.05
363/363 [==============================] - 1s 2ms/step - loss: 0.3476 - val_loss: 0.3654

As shown above, you can also define your own callback classes; besides on_epoch_end(), you can implement on_train_begin(), on_train_end(), on_epoch_begin(), on_batch_begin(), and on_batch_end() as needed.

Callbacks can also be used during evaluation by implementing on_test_begin(), on_test_end(), on_test_batch_begin(), and on_test_batch_end().

And during prediction, by implementing on_predict_begin(), on_predict_end(), on_predict_batch_begin(), and on_predict_batch_end().

5. Visualization with TensorBoard

TensorBoard is installed automatically along with TensorFlow. It provides real-time monitoring of training by reading binary log files (event files):

root_logdir = os.path.join(os.curdir, "my_logs")

def get_run_logdir():
    import time
    run_id = time.strftime("run_%Y_%m_%d_%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
run_logdir

Output:

'.\\my_logs\\run_2020_07_06_21_09_09'
tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", input_shape=[8]),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1)
])    
model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(lr=1e-3))

tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir)  # write event files to run_logdir
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb, tensorboard_cb])

输出:

Epoch 1/30
  2/363 [..............................] - ETA: 2:17 - loss: 7.0195WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.406500). Check your callbacks.
363/363 [==============================] - 2s 6ms/step - loss: 1.8866 - val_loss: 0.7126
Epoch 2/30
363/363 [==============================] - 1s 3ms/step - loss: 0.6577 - val_loss: 0.6880
Epoch 3/30
363/363 [==============================] - 1s 3ms/step - loss: 0.5934 - val_loss: 0.5803
Epoch 4/30
363/363 [==============================] - 1s 3ms/step - loss: 0.5557 - val_loss: 0.5166
Epoch 5/30
363/363 [==============================] - 1s 3ms/step - loss: 0.5272 - val_loss: 0.4895
Epoch 6/30
363/363 [==============================] - 1s 3ms/step - loss: 0.5033 - val_loss: 0.4951
Epoch 7/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4854 - val_loss: 0.4861
Epoch 8/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4709 - val_loss: 0.4554
Epoch 9/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4578 - val_loss: 0.4413
Epoch 10/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4474 - val_loss: 0.4379
Epoch 11/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4393 - val_loss: 0.4396
Epoch 12/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4318 - val_loss: 0.4507
Epoch 13/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4261 - val_loss: 0.3997
Epoch 14/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4202 - val_loss: 0.3956
Epoch 15/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4155 - val_loss: 0.3916
Epoch 16/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4112 - val_loss: 0.3937
Epoch 17/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4077 - val_loss: 0.3809
Epoch 18/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4040 - val_loss: 0.3793
Epoch 19/30
363/363 [==============================] - 1s 3ms/step - loss: 0.4004 - val_loss: 0.3850
Epoch 20/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3980 - val_loss: 0.3809
Epoch 21/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3949 - val_loss: 0.3701
Epoch 22/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3924 - val_loss: 0.3781
Epoch 23/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3898 - val_loss: 0.3650
Epoch 24/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3874 - val_loss: 0.3655
Epoch 25/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3851 - val_loss: 0.3611
Epoch 26/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3829 - val_loss: 0.3626
Epoch 27/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3809 - val_loss: 0.3564
Epoch 28/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3788 - val_loss: 0.3579
Epoch 29/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3769 - val_loss: 0.3561
Epoch 30/30
363/363 [==============================] - 1s 3ms/step - loss: 0.3750 - val_loss: 0.3548

Start the TensorBoard server from a command-line window: tensorboard --logdir=.\my_logs --port=6006

Once the server is up, open http://localhost:6006 in a browser to view the TensorBoard dashboard; when you are done, press Ctrl+C in the terminal to shut the server down. (The screenshot of the TensorBoard UI is not reproduced here.)

You can also load TensorBoard directly inside a Jupyter notebook with the following magic commands:

%load_ext tensorboard
%tensorboard --logdir=./my_logs --port=6006

输出:

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Reusing TensorBoard on port 6006 (pid 176), started 0:04:17 ago. (Use '!kill 176' to kill it.)

run_logdir2 = get_run_logdir()
run_logdir2

输出:

'.\\my_logs\\run_2020_07_06_21_20_30'
tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", input_shape=[8]),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1)
])    
model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(lr=0.05))

# Log this second run to a separate directory so both runs can be compared in TensorBoard
tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir2)
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb, tensorboard_cb])

输出:

Epoch 1/30
  2/363 [..............................] - ETA: 47s - loss: 5.0901WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.133500). Check your callbacks.
363/363 [==============================] - 1s 4ms/step - loss: 0.5530 - val_loss: 302.8539
Epoch 2/30
363/363 [==============================] - 1s 3ms/step - loss: 68.3936 - val_loss: 0.9455
Epoch 3/30
363/363 [==============================] - 1s 3ms/step - loss: 0.9222 - val_loss: 0.9269
Epoch 4/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8909 - val_loss: 0.8038
Epoch 5/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8633 - val_loss: 0.8012
Epoch 6/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8928 - val_loss: 1.0847
Epoch 7/30
363/363 [==============================] - 1s 3ms/step - loss: 1.0071 - val_loss: 0.8835
Epoch 8/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8886 - val_loss: 0.8735
Epoch 9/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8679 - val_loss: 0.8426
Epoch 10/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8451 - val_loss: 0.8143
Epoch 11/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8152 - val_loss: 0.8351
Epoch 12/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8382 - val_loss: 0.8246
Epoch 13/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8119 - val_loss: 0.7439
Epoch 14/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8031 - val_loss: 0.7638
Epoch 15/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8888 - val_loss: 0.7946
Epoch 16/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8115 - val_loss: 0.7601
Epoch 17/30
363/363 [==============================] - 1s 3ms/step - loss: 0.7800 - val_loss: 0.7490
Epoch 18/30
363/363 [==============================] - 1s 3ms/step - loss: 0.9172 - val_loss: 0.8195
Epoch 19/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8281 - val_loss: 0.8023
Epoch 20/30
363/363 [==============================] - 1s 3ms/step - loss: 0.7862 - val_loss: 0.7283
Epoch 21/30
363/363 [==============================] - 1s 3ms/step - loss: 1.1389 - val_loss: 0.8288
Epoch 22/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8178 - val_loss: 0.7605
Epoch 23/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8155 - val_loss: 0.8096
Epoch 24/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8188 - val_loss: 0.7906
Epoch 25/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8082 - val_loss: 0.9338
Epoch 26/30
363/363 [==============================] - 1s 3ms/step - loss: 0.8033 - val_loss: 0.7659
Epoch 27/30
363/363 [==============================] - 1s 3ms/step - loss: 0.7860 - val_loss: 0.7604
Epoch 28/30
363/363 [==============================] - 1s 3ms/step - loss: 0.7764 - val_loss: 0.7345
Epoch 29/30
363/363 [==============================] - 1s 3ms/step - loss: 0.7490 - val_loss: 0.7405
Epoch 30/30
363/363 [==============================] - 1s 3ms/step - loss: 0.7403 - val_loss: 0.6817
help(tf.keras.callbacks.TensorBoard.__init__)

输出:

Help on function __init__ in module tensorflow.python.keras.callbacks:

__init__(self, log_dir='logs', histogram_freq=0, write_graph=True, write_images=False, update_freq='epoch', profile_batch=2, embeddings_freq=0, embeddings_metadata=None, **kwargs)
    Initialize self.  See help(type(self)) for accurate signature.
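
The constructor arguments shown above control what gets logged. For example (a short sketch; the specific values are illustrative, but the argument names come from the signature above), you could log weight histograms after every epoch and disable the profiler:

tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir,
                                                histogram_freq=1,  # log weight histograms after every epoch
                                                profile_batch=0)   # 0 disables the profiler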

TensorFlow also exposes a lower-level API, tf.summary, for writing your own scalars, histograms, images, text and audio to the log files so that they show up in TensorBoard, as in the following example:

test_logdir = get_run_logdir()

writer = tf.summary.create_file_writer(test_logdir)  # create a writer for the event files

with writer.as_default():
    for step in range(1, 1000 + 1):
        tf.summary.scalar("my_scalar", np.sin(step / 10), step=step)  # log a scalar

        data = (np.random.randn(100) + 2) * step / 100
        tf.summary.histogram("my_hist", data, buckets=50, step=step)  # log a histogram

        images = np.random.rand(2, 32, 32, 3)
        tf.summary.image("my_images", images * step / 1000, step=step)  # log images

        texts = ["The step is " + str(step), "Its square is " + str(step**2)]
        tf.summary.text("my_text", texts, step=step)  # log text

        sine_wave = tf.math.sin(tf.range(12000) / 48000 * 2 * np.pi * step)
        audio = tf.reshape(tf.cast(sine_wave, tf.float32), [1, -1, 1])
        tf.summary.audio("my_audio", audio, sample_rate=48000, step=step)  # log audio

6. Fine-tuning neural network hyperparameters

The flexibility of neural networks is also one of their main drawbacks: there are many hyperparameters to tweak.

Hyperparameters are usually tuned with grid search, random search or similar methods. To use Scikit-Learn's search tools, first wrap the Keras model in an object that mimics a Scikit-Learn regressor:

tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[8]):
    # Build and compile a regression MLP with the given hyperparameters
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, activation="relu"))
    model.add(tf.keras.layers.Dense(1))
    optimizer = tf.keras.optimizers.SGD(lr=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

# Wrap the build function so the model can be used like a Scikit-Learn regressor
keras_reg = tf.keras.wrappers.scikit_learn.KerasRegressor(build_model)

keras_reg.fit(X_train, y_train, epochs=100,
              validation_data=(X_valid, y_valid),
              callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])

输出:

Epoch 1/100
363/363 [==============================] - 1s 2ms/step - loss: 1.0896 - val_loss: 20.7721
Epoch 2/100
363/363 [==============================] - 1s 2ms/step - loss: 0.7606 - val_loss: 5.0266
Epoch 3/100
363/363 [==============================] - 1s 2ms/step - loss: 0.5456 - val_loss: 0.5490
Epoch 4/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4732 - val_loss: 0.4529
Epoch 5/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4503 - val_loss: 0.4188
Epoch 6/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4338 - val_loss: 0.4129
Epoch 7/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4241 - val_loss: 0.4004
Epoch 8/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4168 - val_loss: 0.3944
Epoch 9/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4108 - val_loss: 0.3961
Epoch 10/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4060 - val_loss: 0.4071
Epoch 11/100
363/363 [==============================] - 1s 2ms/step - loss: 0.4021 - val_loss: 0.3855
Epoch 12/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3984 - val_loss: 0.4136
Epoch 13/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3951 - val_loss: 0.3997
Epoch 14/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3921 - val_loss: 0.3818
Epoch 15/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3894 - val_loss: 0.3829
Epoch 16/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3869 - val_loss: 0.3739
Epoch 17/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3848 - val_loss: 0.4022
Epoch 18/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3829 - val_loss: 0.3873
Epoch 19/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3807 - val_loss: 0.3768
Epoch 20/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3791 - val_loss: 0.4191
Epoch 21/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3774 - val_loss: 0.3927
Epoch 22/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3756 - val_loss: 0.4237
Epoch 23/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3742 - val_loss: 0.3523
Epoch 24/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3725 - val_loss: 0.3842
Epoch 25/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3710 - val_loss: 0.4162
Epoch 26/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3700 - val_loss: 0.3980
Epoch 27/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3691 - val_loss: 0.3473
Epoch 28/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3677 - val_loss: 0.3921
Epoch 29/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3670 - val_loss: 0.3566
Epoch 30/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3653 - val_loss: 0.4191
Epoch 31/100
363/363 [==============================] - 1s 3ms/step - loss: 0.3647 - val_loss: 0.3722
Epoch 32/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3633 - val_loss: 0.3948
Epoch 33/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3632 - val_loss: 0.3423
Epoch 34/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3617 - val_loss: 0.3454
Epoch 35/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3610 - val_loss: 0.4068
Epoch 36/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3608 - val_loss: 0.3417
Epoch 37/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3596 - val_loss: 0.3787
Epoch 38/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3589 - val_loss: 0.3379
Epoch 39/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3582 - val_loss: 0.3419
Epoch 40/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3572 - val_loss: 0.3705
Epoch 41/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3570 - val_loss: 0.3660
Epoch 42/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3563 - val_loss: 0.3803
Epoch 43/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3552 - val_loss: 0.3766
Epoch 44/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3548 - val_loss: 0.3814
Epoch 45/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3543 - val_loss: 0.3326
Epoch 46/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3532 - val_loss: 0.3385
Epoch 47/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3527 - val_loss: 0.3657
Epoch 48/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3521 - val_loss: 0.3576
Epoch 49/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3525 - val_loss: 0.3358
Epoch 50/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3510 - val_loss: 0.3317
Epoch 51/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3504 - val_loss: 0.3564
Epoch 52/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3502 - val_loss: 0.3522
Epoch 53/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3496 - val_loss: 0.4581
Epoch 54/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3497 - val_loss: 0.3808
Epoch 55/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3490 - val_loss: 0.3539
Epoch 56/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3485 - val_loss: 0.3721
Epoch 57/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3479 - val_loss: 0.3336
Epoch 58/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3470 - val_loss: 0.4011
Epoch 59/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3475 - val_loss: 0.3263
Epoch 60/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3465 - val_loss: 0.3271
Epoch 61/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3452 - val_loss: 0.3348
Epoch 62/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3453 - val_loss: 0.3492
Epoch 63/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3444 - val_loss: 0.3401
Epoch 64/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3450 - val_loss: 0.3274
Epoch 65/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3437 - val_loss: 0.3296
Epoch 66/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3431 - val_loss: 0.3307
Epoch 67/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3428 - val_loss: 0.3252
Epoch 68/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3423 - val_loss: 0.3242
Epoch 69/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3419 - val_loss: 0.3254
Epoch 70/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3413 - val_loss: 0.3659
Epoch 71/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3414 - val_loss: 0.3379
Epoch 72/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3405 - val_loss: 0.3272
Epoch 73/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3399 - val_loss: 0.3242
Epoch 74/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3402 - val_loss: 0.3661
Epoch 75/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3397 - val_loss: 0.3284
Epoch 76/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3394 - val_loss: 0.3243
Epoch 77/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3383 - val_loss: 0.3372
Epoch 78/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3384 - val_loss: 0.3366
mse_test = keras_reg.score(X_test, y_test)  # note: score is the negative of the MSE, because Scikit-Learn expects scores (higher is better)
mse_test

输出:

162/162 [==============================] - 0s 1ms/step - loss: 0.3412

-0.34119632840156555
y_pred = keras_reg.predict(X_new)
y_pred

输出:

array([0.65143514, 1.6107547 , 4.071347  , 2.594837  , 2.9076238 ],
      dtype=float32)

Because there are so many hyperparameters (and values) to try, random search is usually preferable to grid search. Below we search over three hyperparameters: the number of hidden layers, the number of neurons per layer and the learning rate. This code can take a long time to run; depending on your hardware, it may take from one to several hours:

np.random.seed(42)
tf.random.set_seed(42)

from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden": [0, 1, 2, 3],                # number of hidden layers
    "n_neurons": np.arange(1, 100),          # number of neurons per hidden layer
    "learning_rate": reciprocal(3e-4, 3e-2), # learning rate, sampled from a log-uniform distribution
}

rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3, verbose=2)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])

输出:

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] learning_rate=0.001683454924600351, n_hidden=0, n_neurons=15 ....
Epoch 1/100
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
242/242 [==============================] - 1s 3ms/step - loss: 3.5557 - val_loss: 1.8752
Epoch 2/100
242/242 [==============================] - 1s 2ms/step - loss: 1.3347 - val_loss: 0.9522
Epoch 3/100
242/242 [==============================] - 1s 2ms/step - loss: 0.8591 - val_loss: 0.7820
...
Epoch 52/100
242/242 [==============================] - 1s 2ms/step - loss: 0.5294 - val_loss: 0.5996
Epoch 53/100
242/242 [==============================] - 1s 2ms/step - loss: 0.5282 - val_loss: 0.6414
121/121 [==============================] - 0s 1ms/step - loss: 0.5368
[CV]  learning_rate=0.001683454924600351, n_hidden=0, n_neurons=15, total=  29.9s
[CV] learning_rate=0.001683454924600351, n_hidden=0, n_neurons=15 ....
Epoch 1/100
  1/242 [..............................] - ETA: 0s - loss: 7.3553
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   29.8s remaining:    0.0s
242/242 [==============================] - 1s 2ms/step - loss: 3.5605 - val_loss: 23.0855
Epoch 2/100
242/242 [==============================] - 1s 2ms/step - loss: 1.4777 - val_loss: 10.8387
Epoch 3/100
242/242 [==============================] - 1s 2ms/step - loss: 1.0149 - val_loss: 4.4392
...
Epoch 15/100
242/242 [==============================] - 1s 2ms/step - loss: 0.5683 - val_loss: 15.4321
121/121 [==============================] - 0s 1ms/step - loss: 0.9198
[CV]  learning_rate=0.001683454924600351, n_hidden=0, n_neurons=15, total=   8.3s
[CV] learning_rate=0.001683454924600351, n_hidden=0, n_neurons=15 ....
Epoch 1/100
242/242 [==============================] - 1s 3ms/step - loss: 3.2972 - val_loss: 1.3307
Epoch 2/100
242/242 [==============================] - 1s 2ms/step - loss: 0.9648 - val_loss: 0.6934
Epoch 3/100
242/242 [==============================] - 1s 2ms/step - loss: 0.6150 - val_loss: 0.5469
...
Epoch 45/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3169 - val_loss: 0.9337
Epoch 46/100
363/363 [==============================] - 1s 2ms/step - loss: 0.3180 - val_loss: 0.6996

RandomizedSearchCV(cv=3, error_score='raise-deprecating',
                   estimator=<tensorflow.python.keras.wrappers.scikit_learn.KerasRegressor object at 0x000000005E24B630>,
                   iid='warn', n_iter=10, n_jobs=None,
                   param_distributions={'learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x000000005E50D7B8>,
                                        'n_hidden': [0, 1, 2, 3],
                                        'n_neurons': array([ 1,  2,  3,  4,  5,  6,  7...
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
       69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
       86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])},
                   pre_dispatch='2*n_jobs', random_state=None, refit=True,
                   return_train_score=False, scoring=None, verbose=2)
rnd_search_cv.best_params_

输出:

{'learning_rate': 0.0033625641252688094, 'n_hidden': 2, 'n_neurons': 42}
rnd_search_cv.best_score_

输出:

-0.35000195105870563
rnd_search_cv.score(X_test, y_test)

输出:

162/162 [==============================] - 0s 1ms/step - loss: 0.3268

-0.3267730176448822
model = rnd_search_cv.best_estimator_.model

model.evaluate(X_test, y_test)

输出:

162/162 [==============================] - 0s 1ms/step - loss: 0.3268

0.3267730176448822
model.summary()

输出:

Model: "sequential_31"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_68 (Dense)             (None, 42)                378       
_________________________________________________________________
dense_69 (Dense)             (None, 42)                1806      
_________________________________________________________________
dense_70 (Dense)             (None, 1)                 43        
=================================================================
Total params: 2,227
Trainable params: 2,227
Non-trainable params: 0
_________________________________________________________________
tf.keras.utils.plot_model(model, "random_search_model.png", show_shapes=True)

输出:

(model architecture diagram, also saved to random_search_model.png)

As the output above shows, the random search found a best configuration of 2 hidden layers, 42 neurons per layer and a learning rate of about 0.0034.

Some Python libraries you can use for hyperparameter optimization:

  1. Hyperopt: a popular library for optimizing over all kinds of complex search spaces.
  2. Hyperas, kopt and Talos: hyperparameter optimization for Keras models (the first two are based on Hyperopt).
  3. Keras Tuner: a hyperparameter optimization library for Keras models developed by Google, with a hosted service for visualization and analysis (a sketch follows this list).
  4. Scikit-Optimize (skopt): a general-purpose optimization library.
  5. Spearmint: a Bayesian optimization library.
  6. Hyperband: a fast hyperparameter tuning library.
  7. Sklearn-Deap: a hyperparameter optimization library based on evolutionary algorithms.
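
As an illustration of the Keras Tuner entry above, here is a minimal sketch of a random search over the same three hyperparameters (it assumes the keras-tuner package is installed; the tuner settings, directory names and reuse of the housing data are assumptions, not code from this chapter):

import kerastuner as kt  # newer versions of the package are imported as keras_tuner

def build_model_kt(hp):
    # Sample the hyperparameters from the search space
    n_hidden = hp.Int("n_hidden", min_value=0, max_value=3)
    n_neurons = hp.Int("n_neurons", min_value=1, max_value=100)
    learning_rate = hp.Float("learning_rate", min_value=3e-4, max_value=3e-2, sampling="log")
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=[8]))
    for _ in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, activation="relu"))
    model.add(tf.keras.layers.Dense(1))
    model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate))
    return model

tuner = kt.RandomSearch(build_model_kt, objective="val_loss", max_trials=10,
                        directory="my_kt_logs", project_name="housing")
tuner.search(X_train, y_train, epochs=100,
             validation_data=(X_valid, y_valid),
             callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])
best_hp = tuner.get_best_hyperparameters()[0]
print(best_hp.values)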

7. Number of hidden layers

In most cases you can start with just one or two hidden layers. For more complex problems, gradually add hidden layers until the model starts to overfit. Very complex problems such as large-scale image classification or speech recognition may require dozens or even hundreds of layers (though usually not fully connected ones, since fully connected layers at that depth would have far too many parameters).

8. Number of neurons per hidden layer

The numbers of neurons in the input and output layers are determined by the shape of the input data and by the type of output the task requires.

For the hidden layers, a common practice used to be to size them as a pyramid, with fewer and fewer neurons in each successive layer, the rationale being that many low-level features merge into far fewer high-level features.

As with the number of hidden layers, you can gradually increase the number of neurons until the model starts to overfit.

In practice, for both the number of layers and the number of neurons, a simpler approach often works well: pick a model with more layers and neurons than you actually need, then rely on early stopping to keep it from overfitting (which also avoids underfitting caused by making the network too small). A short sketch follows.
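
Here is a short sketch of that approach (the layer sizes, patience value and restore_best_weights setting are illustrative assumptions, not code from this chapter):

big_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(100, activation="relu", input_shape=[8]),  # deliberately oversized hidden layers
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1)
])
big_model.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(lr=1e-3))

# Stop once the validation loss has not improved for 10 epochs,
# then roll back to the weights of the best epoch
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = big_model.fit(X_train, y_train, epochs=100,
                        validation_data=(X_valid, y_valid),
                        callbacks=[early_stopping_cb])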

9. Learning rate, batch size and other hyperparameters

Learning rate: the learning rate is arguably the most important hyperparameter. As a rule of thumb, the optimal learning rate is about half of the maximum learning rate, i.e. the largest rate above which training starts to diverge. One common way to estimate it is sketched below.
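
A minimal sketch of that procedure follows (the ExponentialLearningRate callback, the growth factor and the fresh probe model are illustrative assumptions, not code from this chapter): train for a single epoch while multiplying the learning rate by a constant factor after every batch, record the loss at each step, then pick a rate somewhat below the point where the loss starts to shoot up:

class ExponentialLearningRate(tf.keras.callbacks.Callback):
    def __init__(self, factor):
        self.factor = factor
        self.rates = []
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # Record the current learning rate and loss, then scale the rate up
        lr = tf.keras.backend.get_value(self.model.optimizer.lr)
        self.rates.append(lr)
        self.losses.append(logs["loss"])
        tf.keras.backend.set_value(self.model.optimizer.lr, lr * self.factor)

# A fresh model to probe (its architecture is illustrative)
model_lr = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", input_shape=[8]),
    tf.keras.layers.Dense(1)
])
model_lr.compile(loss="mse", optimizer=tf.keras.optimizers.SGD(lr=1e-5))  # start from a tiny learning rate

# Factor chosen so the rate grows by several orders of magnitude over one epoch of 363 batches
expon_lr = ExponentialLearningRate(factor=1.04)
history = model_lr.fit(X_train, y_train, epochs=1, callbacks=[expon_lr])

plt.plot(expon_lr.rates, expon_lr.losses)  # loss as a function of the learning rate
plt.xscale("log")
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.show()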

Optimizer: choosing a good optimizer (and tuning its hyperparameters) is also important.

Batch size: the batch size has a significant impact on training time. A large batch size can make training unstable at the beginning, and the final model may generalize worse than one trained with a smaller batch size.

Activation function: ReLU is a good default for the hidden layers in most cases.

Number of training epochs: in most cases this does not need to be tuned; just use early stopping instead.