TensorFlow 2.0 Study Notes (Part 1)
Loading and Visualizing Data
First, import the required libraries:
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
print(sys.version_info)
for module in mpl, np, pd, sklearn, tf, keras:
    print(module.__name__, module.__version__)
My environment configuration:
2.0.1
sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)
matplotlib 3.2.1
numpy 1.18.2
pandas 1.0.3
sklearn 0.22.2.post1
tensorflow 2.0.1
tensorflow_core.keras 2.2.4-tf
Download and load the data, then split the training data: the first 5,000 samples form the validation set and the rest the training set.
fashion_mnist = keras.datasets.fashion_mnist
(x_train_all, y_train_all), (x_test, y_test) = fashion_mnist.load_data()
x_valid, x_train = x_train_all[:5000], x_train_all[5000:]
y_valid, y_train = y_train_all[:5000], y_train_all[5000:]
print(x_valid.shape, y_valid.shape)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
(5000, 28, 28) (5000,)
(55000, 28, 28) (55000,)
(10000, 28, 28) (10000,)
Display a single image:
def show_single_image(img_arr):
    plt.imshow(img_arr, cmap="binary")
    plt.show()
show_single_image(x_train[0])
The output is as follows:
Display the images in a 3×5 grid, labeling each image with its class name:
def show_imgs(n_rows, n_cols, x_data, y_data, class_names):
    assert len(x_data) == len(y_data)
    assert n_rows * n_cols < len(x_data)
    plt.figure(figsize = (n_cols * 1.4, n_rows * 1.6))
    for row in range(n_rows):
        for col in range(n_cols):
            index = n_cols * row + col
            plt.subplot(n_rows, n_cols, index + 1)
            plt.imshow(x_data[index], cmap="binary",
                       interpolation = 'nearest')
            plt.axis('off')
            plt.title(class_names[y_data[index]])
    plt.show()
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress',
               'Coat', 'Sandal', 'Shirt', 'Sneaker',
               'Bag', 'Ankle boot']
show_imgs(3, 5, x_train, y_train, class_names)
The result is as follows:
Building the Model
There are two ways to build the model; choose whichever suits you:
# tf.keras.models.Sequential()
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape = [28, 28]))  # flattens each 28x28 input into a 784-vector
model.add(keras.layers.Dense(300, activation='relu'))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))
# relu: y = max(0, x)
# softmax: turns a vector into a probability distribution. x = [x1, x2, x3]
# y = [e^x1/sum, e^x2/sum, e^x3/sum]
# sum = e^x1 + e^x2 + e^x3
# The second way to use Sequential:
# model = keras.models.Sequential([
#     keras.layers.Flatten(input_shape = [28, 28]),
#     keras.layers.Dense(300, activation='relu'),
#     keras.layers.Dense(100, activation='relu'),
#     keras.layers.Dense(10, activation='softmax')
# ])
# Why 'sparse': y is an index, i.e. an integer class label rather than a one-hot vector; if the label is class 1, y is simply 1
model.compile(loss="sparse_categorical_crossentropy",
optimizer = 'sgd',
metrics = ['accuracy'])
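The formulas in the comments above can be checked with a tiny pure-Python sketch (no TensorFlow needed; the logits and label below are made up for illustration):

```python
import math

def softmax(xs):
    # e^xi / sum(e^xj): turns a vector of logits into a probability distribution
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs)        # three probabilities summing to 1

# sparse categorical crossentropy: the label is an integer class index,
# and the loss is -log of the probability the model assigns to that index
y_true = 0
loss = -math.log(probs[y_true])
print(loss)
```

With a one-hot label you would use categorical_crossentropy instead; sparse_categorical_crossentropy takes the integer index directly.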
Print the model we just built:
# First layer: [None, 784] * W + b -> [None, 300]
# W.shape = [784, 300], b.shape = [300]
model.summary()
The model structure is as follows:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 300) 235500
_________________________________________________________________
dense_1 (Dense) (None, 100) 30100
_________________________________________________________________
dense_2 (Dense) (None, 10) 1010
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
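The parameter counts in the summary can be verified by hand: a Dense layer has n_in × n_out weights plus n_out biases. A quick check:

```python
def dense_params(n_in, n_out):
    # weights W: [n_in, n_out], bias b: [n_out]
    return n_in * n_out + n_out

layers = [(784, 300), (300, 100), (100, 10)]
counts = [dense_params(i, o) for i, o in layers]
print(counts)       # [235500, 30100, 1010]
print(sum(counts))  # 266610, matching "Total params" above
```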
Train the model; the arguments are the training data and labels, the number of epochs, and the validation data:
history = model.fit(x_train, y_train, epochs = 10,
validation_data = (x_valid, y_valid))
The training log:
Train on 55000 samples, validate on 5000 samples
Epoch 1/10
55000/55000 [==============================] - 6s 114us/sample - loss: 2.0620 - accuracy: 0.7345 - val_loss: 0.6053 - val_accuracy: 0.8018
Epoch 2/10
55000/55000 [==============================] - 5s 93us/sample - loss: 0.5567 - accuracy: 0.7990 - val_loss: 0.5311 - val_accuracy: 0.8220
Epoch 3/10
55000/55000 [==============================] - 5s 93us/sample - loss: 0.4939 - accuracy: 0.8215 - val_loss: 0.4981 - val_accuracy: 0.8324
Epoch 4/10
55000/55000 [==============================] - 5s 93us/sample - loss: 0.4561 - accuracy: 0.8331 - val_loss: 0.4771 - val_accuracy: 0.8416
Epoch 5/10
55000/55000 [==============================] - 6s 115us/sample - loss: 0.4260 - accuracy: 0.8430 - val_loss: 0.4728 - val_accuracy: 0.8396
Epoch 6/10
55000/55000 [==============================] - 6s 107us/sample - loss: 0.4054 - accuracy: 0.8496 - val_loss: 0.4586 - val_accuracy: 0.8510
Epoch 7/10
55000/55000 [==============================] - 5s 91us/sample - loss: 0.3888 - accuracy: 0.8560 - val_loss: 0.4454 - val_accuracy: 0.8522
Epoch 8/10
55000/55000 [==============================] - 5s 91us/sample - loss: 0.3749 - accuracy: 0.8624 - val_loss: 0.4365 - val_accuracy: 0.8530
Epoch 9/10
55000/55000 [==============================] - 5s 91us/sample - loss: 0.3631 - accuracy: 0.8648 - val_loss: 0.4520 - val_accuracy: 0.8474
Epoch 10/10
55000/55000 [==============================] - 5s 91us/sample - loss: 0.3533 - accuracy: 0.8684 - val_loss: 0.4298 - val_accuracy: 0.8574
Next, check the type of history and the training/validation accuracy and loss:
type(history)
history.history
The results:
tensorflow.python.keras.callbacks.History
{'accuracy': [0.73447275,
0.799,
0.8214545,
0.83314544,
0.8429818,
0.8496,
0.85603637,
0.8623818,
0.86483634,
0.8684],
'loss': [2.062013185561787,
0.5567055275223471,
0.4938889234282754,
0.4560928507761522,
0.42596423987475307,
0.40540996001850477,
0.38881416422453796,
0.3748693872906945,
0.36312224663387643,
0.35331400035077876],
'val_accuracy': [0.8018,
0.822,
0.8324,
0.8416,
0.8396,
0.851,
0.8522,
0.853,
0.8474,
0.8574],
'val_loss': [0.6053233849525451,
0.5311240508079529,
0.49814177188873293,
0.4771014800071716,
0.472820001745224,
0.45861260561943057,
0.4453620403766632,
0.43645666875839234,
0.4519670714616775,
0.42980154151916505]}
Next, plot the loss and accuracy curves:
def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.gca().set_ylim(0, 1)
    plt.show()
plot_learning_curves(history)
Finally, compute the accuracy on the test set:
model.evaluate(x_test, y_test)
10000/10000 [==============================] - 0s 50us/sample - loss: 0.4591 - accuracy: 0.8440
[0.45911563657522203, 0.844]
Data Normalization
In the experiment above, the pixel values lie in [0, 255]. We now standardize the data so that it has zero mean and unit variance.
(Strictly speaking, min-max normalization scales features into [0, 1] using each feature's minimum and maximum; what we apply below is z-score standardization.)
reshape converts the array to 2-D, where each row is one sample and each column one feature, and StandardScaler standardizes each column.
# x = (x - u) / std
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# x_train: [None, 28, 28]; reshape(-1, 1) flattens all pixels into one column,
# so a single global mean/std is computed, then reshape back to [None, 28, 28]
x_train_scaled = scaler.fit_transform(
    x_train.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)
x_valid_scaled = scaler.transform(
    x_valid.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)
x_test_scaled = scaler.transform(
    x_test.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)
With the standardized data, the loss curve drops faster and more smoothly, and with the same number of epochs the accuracy improves.
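What StandardScaler computes is just the z-score transform; a minimal pure-Python sketch with made-up pixel values:

```python
import math

pixels = [0.0, 51.0, 102.0, 204.0, 255.0]  # made-up pixel values in [0, 255]

mean = sum(pixels) / len(pixels)
std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))

scaled = [(p - mean) / std for p in pixels]  # x = (x - u) / std

# the standardized values now have zero mean and unit variance
print(sum(scaled) / len(scaled))
print(sum(s * s for s in scaled) / len(scaled))
```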
Callbacks
Callbacks let us perform actions during training, such as stopping early (EarlyStopping), saving the best model (ModelCheckpoint), and logging curves for visualization (TensorBoard).
Define a callbacks list containing the actions we need; here we use the three above as examples.
# TensorBoard, EarlyStopping, ModelCheckpoint
logdir = './callbacks'
if not os.path.exists(logdir):
    os.mkdir(logdir)
# path for saving the best model
output_model_file = os.path.join(logdir,
                                 "fashion_mnist_model.h5")
# EarlyStopping: min_delta is the minimum improvement between epochs that counts;
# patience is how many epochs below min_delta to tolerate before stopping
callbacks = [
    keras.callbacks.TensorBoard(logdir),
    keras.callbacks.ModelCheckpoint(output_model_file,
                                    save_best_only=True),
    keras.callbacks.EarlyStopping(patience=5, min_delta=1e-3),
]
history = model.fit(x_train_scaled, y_train, epochs = 10,
validation_data = (x_valid_scaled, y_valid),
callbacks = callbacks)
Regression Model
We use the California housing dataset. The training data has shape (20640, 8) and the target (the house prices) has shape (20640,); the 8 refers to the eight features that influence the price.
Load and split the data:
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
from sklearn.model_selection import train_test_split
# test_size defaults to 0.25: 75% for training, 25% for testing
# random_state: the random seed
x_train_all, x_test, y_train_all, y_test = train_test_split(
housing.data, housing.target, random_state = 7, test_size = 0.25)
x_train, x_valid, y_train, y_valid = train_test_split(
x_train_all, y_train_all, random_state = 11)
print(x_train.shape, y_train.shape)
print(x_valid.shape, y_valid.shape)
print(x_test.shape, y_test.shape)
Output:
(11610, 8) (11610,)
(3870, 8) (3870,)
(5160, 8) (5160,)
Standardize the data:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled = scaler.transform(x_test)
Build the model, print its summary, compile it, define the callbacks, and train:
model = keras.models.Sequential([
keras.layers.Dense(30, activation='relu',
input_shape=x_train.shape[1:]),
keras.layers.Dense(1),
])
model.summary()
model.compile(loss="mean_squared_error", optimizer = "sgd")
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-3)]
history = model.fit(x_train_scaled, y_train,
validation_data = (x_valid_scaled, y_valid),
epochs = 100,
callbacks = callbacks)
Plot the loss curve and evaluate on the test set.
def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.gca().set_ylim(0, 1)
    plt.show()
plot_learning_curves(history)
model.evaluate(x_test_scaled, y_test)
5160/5160 [==============================] - 0s 23us/sample - loss: 0.3925
0.3925278560135716
Deep Neural Network for Classification
Compared with the earlier classification model, the main difference is in the model-building part: the network now needs many layers, for example 20.
# tf.keras.models.Sequential()
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape = [28, 28]))
for _ in range(20):
    model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))
# 'sparse' because y is an integer class index
model.compile(loss="sparse_categorical_crossentropy",
              optimizer = 'sgd',
              metrics = ['accuracy'])
Batch Normalization
Again we modify the model-building part, adding a batch-normalization layer after each Dense layer.
# tf.keras.models.Sequential()
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape = [28, 28]))
for _ in range(20):
    model.add(keras.layers.Dense(100, activation="relu"))
    model.add(keras.layers.BatchNormalization())
    """
    The approach above applies the relu activation before batch normalization.
    The alternative below puts BN before the activation:
    model.add(keras.layers.Dense(100))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Activation("relu"))
    """
model.add(keras.layers.Dense(10, activation="softmax"))
# 'sparse' because y is an integer class index
model.compile(loss="sparse_categorical_crossentropy",
              optimizer = 'sgd',
              metrics = ['accuracy'])
The effect of batch normalization: it effectively mitigates vanishing gradients and speeds up training convergence.
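What a BN layer computes at training time can be sketched in plain Python; this sketch ignores the learned scale gamma and shift beta (which default to 1 and 0) and the running statistics used at inference time:

```python
import math

def batch_norm(batch, eps=1e-5):
    # normalize a batch of scalars: (x - mean) / sqrt(var + eps)
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

batch = [1.0, 2.0, 3.0, 4.0]
normed = batch_norm(batch)
print(normed)  # zero mean, roughly unit variance, ordering preserved
```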
Activation Functions
To change the activation function, simply replace "relu", for example with "selu".
The selu activation can be understood as combining relu with the effect of batch normalization; it converges faster and performs better, so selu is recommended.
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape = [28, 28]))
for _ in range(20):
    model.add(keras.layers.Dense(100, activation="selu"))
model.add(keras.layers.Dense(10, activation="softmax"))
# 'sparse' because y is an integer class index
model.compile(loss="sparse_categorical_crossentropy",
              optimizer = 'sgd',
              metrics = ['accuracy'])
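As a cross-check of what this activation computes, a pure-Python sketch of selu using the standard constants from the SELU paper (alpha ≈ 1.6733, scale ≈ 1.0507):

```python
import math

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # scale * x for x > 0, scale * alpha * (e^x - 1) otherwise
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1)

print(selu(1.0))   # ~1.0507, slightly amplified compared with relu
print(selu(0.0))   # 0.0
print(selu(-5.0))  # saturates near -scale * alpha
```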
Dropout
Used to prevent overfitting. It is generally applied only near the last layers of a deep network.
# tf.keras.models.Sequential()
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape = [28, 28]))
for _ in range(20):
    model.add(keras.layers.Dense(100, activation="selu"))
model.add(keras.layers.AlphaDropout(rate=0.5))
# AlphaDropout (better than plain Dropout; rate is typically set to 0.5):
# 1. keeps the mean and variance of its input unchanged
# 2. therefore preserves the self-normalizing property
model.add(keras.layers.Dense(10, activation="softmax"))
# 'sparse' because y is an integer class index
model.compile(loss="sparse_categorical_crossentropy",
              optimizer = 'sgd',
              metrics = ['accuracy'])
Wide & Deep Model
In the model-building part, the key structure is that the outputs of the wide and deep branches are concatenated at the last layer.
This requires the functional API.
# Functional API
input = keras.layers.Input(shape=x_train.shape[1:])
hidden1 = keras.layers.Dense(30, activation='relu')(input)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
# function composition: f(x) = h(g(x))
concat = keras.layers.concatenate([input, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.models.Model(inputs = [input],
                           outputs = [output])
model.summary()
model.compile(loss="mean_squared_error", optimizer = "sgd")
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-3)]
The printed model structure:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 8)] 0
__________________________________________________________________________________________________
dense_2 (Dense) (None, 30) 270 input_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 30) 930 dense_2[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 38) 0 input_2[0][0]
dense_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 1) 39 concatenate[0][0]
==================================================================================================
Total params: 1,239
Trainable params: 1,239
Non-trainable params: 0
__________________________________________________________________________________________________
Implementing Wide & Deep with the Subclassing API
We define a class that implements this network model:
# Subclassing API
class WideDeepModel(keras.models.Model):
    def __init__(self):
        super(WideDeepModel, self).__init__()
        # define the model's layers
        self.hidden1_layer = keras.layers.Dense(30, activation='relu')
        self.hidden2_layer = keras.layers.Dense(30, activation='relu')
        self.output_layer = keras.layers.Dense(1)

    def call(self, input):
        # the forward pass
        hidden1 = self.hidden1_layer(input)
        hidden2 = self.hidden2_layer(hidden1)
        concat = keras.layers.concatenate([input, hidden2])
        output = self.output_layer(concat)
        return output

# model = WideDeepModel()
model = keras.models.Sequential([
    WideDeepModel(),
])
model.build(input_shape=(None, 8))
model.summary()
model.compile(loss="mean_squared_error", optimizer = "sgd")
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-3)]
Multiple Inputs and Outputs
Multiple inputs: we use the first 5 features of the original input as the wide input and the last 6 features as the deep input.
Multiple outputs: as you can see, there are two output tensors.
Both the inputs and outputs of keras.models.Model are now lists with several elements.
# multiple inputs
input_wide = keras.layers.Input(shape=[5])
input_deep = keras.layers.Input(shape=[6])
hidden1 = keras.layers.Dense(30, activation='relu')(input_deep)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_wide, hidden2])
output = keras.layers.Dense(1)(concat)
output2 = keras.layers.Dense(1)(hidden2)
model = keras.models.Model(inputs = [input_wide, input_deep],
outputs = [output, output2])
model.summary()
model.compile(loss="mean_squared_error", optimizer = "sgd")
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-3)]
The model structure:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 6)] 0
__________________________________________________________________________________________________
dense (Dense) (None, 30) 210 input_2[0][0]
__________________________________________________________________________________________________
input_1 (InputLayer) [(None, 5)] 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, 30) 930 dense[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 35) 0 input_1[0][0]
dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 36 concatenate[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 1) 31 dense_1[0][0]
==================================================================================================
Total params: 1,207
Trainable params: 1,207
Non-trainable params: 0
Since the model takes two inputs, we split the original data accordingly:
x_train_scaled_wide = x_train_scaled[:, :5]
x_train_scaled_deep = x_train_scaled[:, 2:]
x_valid_scaled_wide = x_valid_scaled[:, :5]
x_valid_scaled_deep = x_valid_scaled[:, 2:]
x_test_scaled_wide = x_test_scaled[:, :5]
x_test_scaled_deep = x_test_scaled[:, 2:]
history = model.fit([x_train_scaled_wide, x_train_scaled_deep], [y_train, y_train],
validation_data = ([x_valid_scaled_wide, x_valid_scaled_deep], [y_valid, y_valid]),
epochs = 100,
callbacks = callbacks)
Because there are two outputs, there are also two losses; the total loss is the weighted sum of the two, here with coefficients 1 and 1.
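The weighted sum is simple arithmetic; the sketch below uses made-up loss values. In Keras, the coefficients can be set explicitly via the loss_weights argument of model.compile:

```python
def total_loss(losses, loss_weights):
    # total = sum over outputs of (loss * weight)
    return sum(l * w for l, w in zip(losses, loss_weights))

# e.g. the two output losses of the wide & deep model, each weighted by 1
print(total_loss([0.40, 0.45], [1.0, 1.0]))
```

Passing, say, loss_weights=[1, 1] (or any other coefficients) to model.compile makes this weighting explicit.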
Hyperparameter Search
The most basic approach: loop over every candidate learning_rate.
# learning_rate: [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]
# W = W - grad * learning_rate
learning_rates = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]
histories = []
for lr in learning_rates:
    model = keras.models.Sequential([
        keras.layers.Dense(30, activation='relu',
                           input_shape=x_train.shape[1:]),
        keras.layers.Dense(1),
    ])
    optimizer = keras.optimizers.SGD(lr)
    model.compile(loss="mean_squared_error", optimizer = optimizer)
    callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-2)]
    history = model.fit(x_train_scaled, y_train,
                        validation_data = (x_valid_scaled, y_valid),
                        epochs = 100,
                        callbacks = callbacks)
    # save the history for every learning_rate
    histories.append(history)
Hyperparameter Search with sklearn
Wrap the keras model as an sklearn estimator.
The three hyperparameters to randomize are hidden_layers, layer_size, and learning_rate.
# RandomizedSearchCV
# 1. convert the keras model to an sklearn model
# 2. define the parameter space
# 3. search the parameters
def build_model(hidden_layers = 1,
                layer_size = 30,
                learning_rate = 3e-3):
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(layer_size, activation='relu',
                                 input_shape=x_train.shape[1:]))
    for _ in range(hidden_layers - 1):
        model.add(keras.layers.Dense(layer_size, activation='relu'))
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(learning_rate)
    model.compile(loss='mse', optimizer=optimizer)
    return model

sklearn_model = keras.wrappers.scikit_learn.KerasRegressor(build_model)
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-2)]
history = sklearn_model.fit(x_train_scaled, y_train,
epochs=100,
validation_data = (x_valid_scaled, y_valid),
callbacks = callbacks)
Now define the search space from which these three parameters are sampled:
from scipy.stats import reciprocal
# density: f(x) = 1/(x*log(b/a)), a <= x <= b
param_distribution = {
    "hidden_layers": [1, 2, 3, 4],
    "layer_size": np.arange(1, 100),
    "learning_rate": reciprocal(1e-4, 1e-2),
}
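The reciprocal (log-uniform) distribution can also be sampled by hand with inverse-transform sampling: if u is uniform in [0, 1), then a*(b/a)**u has the density above. A sketch:

```python
import math
import random

rng = random.Random(0)

def sample_reciprocal(a, b):
    # inverse CDF of f(x) = 1/(x*log(b/a)) on [a, b]
    u = rng.random()
    return a * (b / a) ** u

a, b = 1e-4, 1e-2
samples = [sample_reciprocal(a, b) for _ in range(1000)]
print(min(samples), max(samples))  # all within [a, b]

# log-uniform: log(x) is uniform, so roughly half the
# samples fall below the geometric mean sqrt(a*b)
below_geo_mean = sum(s < math.sqrt(a * b) for s in samples)
print(below_geo_mean)
```

scipy's reciprocal(a, b).rvs() draws samples from the same distribution.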
from sklearn.model_selection import RandomizedSearchCV
# n_iter: how many parameter combinations to sample from the space above
# n_jobs: how many jobs to run in parallel
random_search_cv = RandomizedSearchCV(sklearn_model,
                                      param_distribution,
                                      n_iter = 10,
                                      n_jobs = 1)
"""
cross_validation: 训练集分成n份,n-1份训练,最后一份验证
默认cv = 3
random_search_cv = RandomizedSearchCV(sklearn_model,
param_distribution,
n_iter = 10,
cv = 3,
n_jobs = 1)
"""
random_search_cv.fit(x_train_scaled, y_train,
epochs=100,
validation_data = (x_valid_scaled, y_valid),
callbacks = callbacks)
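The cross-validation split described above can be sketched in plain Python: with cv = 3 the training indices are cut into 3 folds, and each fold serves once as the validation set:

```python
def kfold_indices(n_samples, n_folds=3):
    # yield (train_indices, valid_indices) once per fold
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        valid = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, valid

# 9 samples, 3 folds: each fold trains on 6 samples and validates on 3
for train_idx, valid_idx in kfold_indices(9, n_folds=3):
    print(len(train_idx), len(valid_idx))
```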
Finally, look at the best parameters, the best score, and the best model, then evaluate the best model on the test set.
print(random_search_cv.best_params_)
print(random_search_cv.best_score_)
print(random_search_cv.best_estimator_)
model = random_search_cv.best_estimator_.model
model.evaluate(x_test_scaled, y_test)
Output:
{'hidden_layers': 2, 'layer_size': 64, 'learning_rate': 0.0074971092331801445}
-0.34148263905604886
<tensorflow.python.keras.wrappers.scikit_learn.KerasRegressor object at 0x7f2d38b72d30>
5160/5160 [==============================] - 0s 17us/sample - loss: 0.3413
0.3412971150043399