These are study notes for the book Deep Learning with Python.
Chapter 3: Getting Started with Neural Networks
3.1.1 Layers: the building blocks of deep learning  3.1.2 Models: networks of layers  3.1.3 Loss functions and optimizers: keys to configuring the learning process
3.2.1 Keras, TensorFlow, Theano, and CNTK
3.4.1 The IMDB dataset  3.4.2 Preparing the data  3.4.3 Building your network  3.4.4 Validating your approach
3.4.5 Using a trained network to generate predictions on new data  3.4.6 Further experiments
3.5.1 The Reuters dataset  3.5.2 Preparing the data  3.5.3 Building your network  3.5.4 Validating your approach  Complete code
3.5.5 Generating predictions on new data  3.5.6 A different way to handle the labels and the loss
3.5.7 The importance of having sufficiently large intermediate layers  3.5.8 Further experiments  3.5.9 Wrapping up
3.6.1 The Boston Housing Price dataset  3.6.2 Preparing the data  3.6.3 Building your network  3.6.4 Validating your approach using K-fold validation
Chapter 3: Getting Started with Neural Networks
3.1 Anatomy of a neural network
Training a neural network revolves around four objects:
- Layers, which are combined into a network (model)
- The input data and the corresponding targets
- The loss function, which defines the feedback signal used for learning
- The optimizer, which determines how learning proceeds
3.1.1 Layers: the building blocks of deep learning
from keras import models
from keras import layers

model = models.Sequential()
# A Dense layer with 32 output units that expects 784-dimensional input vectors
model.add(layers.Dense(32, input_shape=(784,)))
# No input_shape needed here: the layer infers it from the previous layer's output
model.add(layers.Dense(32))
3.1.2 Models: networks of layers
Common network topologies include:
- Two-branch networks
- Multihead networks
- Inception blocks
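The functional API (shown below in section 3.2.1) makes such topologies possible. As a flavor, here is a minimal hypothetical sketch of a two-branch network that merges two inputs; the input names and sizes are invented for illustration:
# Sketch (assumed names and sizes): a two-branch model built with the functional API
from keras import layers, models
input_a = layers.Input(shape=(64,))
input_b = layers.Input(shape=(32,))
branch_a = layers.Dense(16, activation='relu')(input_a)
branch_b = layers.Dense(16, activation='relu')(input_b)
# Merge the two branches, then add a single sigmoid output
merged = layers.concatenate([branch_a, branch_b])
output = layers.Dense(1, activation='sigmoid')(merged)
model = models.Model(inputs=[input_a, input_b], outputs=output)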
3.1.3 Loss functions and optimizers: keys to configuring the learning process
Loss function (objective function): the quantity to be minimized during training; it measures whether the task at hand is being solved successfully.
Optimizer: determines how the network is updated based on the loss function.
For a binary classification problem, use the binary crossentropy (binary_crossentropy) loss;
for a multiclass classification problem, use the categorical crossentropy (categorical_crossentropy) loss;
for a sequence-learning problem, use the connectionist temporal classification (CTC) loss.
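Concretely, each of these choices is just a different loss argument passed at compile time; a small sketch, where the three model variables are hypothetical and assumed to be defined elsewhere:
# Sketch: matching the loss to the task at compile time (hypothetical model names)
binary_model.compile(optimizer='rmsprop', loss='binary_crossentropy')
multiclass_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
regression_model.compile(optimizer='rmsprop', loss='mse')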
3.2 Introduction to Keras
Key features:
- The same code runs seamlessly on CPU or GPU
- A user-friendly API that makes it easy to prototype deep-learning models quickly
- Built-in support for convolutional networks, recurrent networks, and any combination of the two
- Support for arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, and so on
3.2.1 Keras, TensorFlow, Theano, and CNTK
Keras currently has three backend implementations: TensorFlow (Google), Theano (Université de Montréal), and CNTK (Microsoft). When running on GPU, TensorFlow itself wraps a well-optimized NVIDIA library, the CUDA Deep Neural Network library (cuDNN).
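You can check which backend an installation is using, and switch backends via the KERAS_BACKEND environment variable or the ~/.keras/keras.json config file; a quick check:
# Print the name of the active backend: 'tensorflow', 'theano', or 'cntk'
from keras import backend as K
print(K.backend())
The typical Keras workflow follows: define a model (with Sequential or the functional API), compile it, then fit it, as the listings below show.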
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(32, activation = 'relu', input_shape=(784,)))
model.add(layers.Dense(10, activation = 'softmax'))
# The same model defined with the functional API
input_tensor = layers.Input(shape = (784, ))
x = layers.Dense(32, activation = 'relu')(input_tensor)
output_tensor = layers.Dense(10, activation = 'softmax')(x)
model = models.Model(inputs = input_tensor, outputs = output_tensor)
Once the model is defined, the learning process is configured in the compilation step, where you specify the optimizer and the loss function the model will use.
from keras import optimizers
model.compile(optimizer = optimizers.RMSprop(lr = 0.001),
              loss = 'mse',
              metrics = ['accuracy'])
Finally, the learning process consists of passing Numpy arrays of input data (and the corresponding targets) to the model via the fit() method, much as in Scikit-Learn.
model.fit(input_tensor, target_tensor, batch_size = 128, epochs = 10)
3.3 Setting up a deep-learning workstation
For GPU setup, I recommend my other blog post: win10 + Anaconda3 + tensorflow(gpu) + cuda9.0 + cudnn7.1 + ide(sublime) + a handy scientific-package setup.
Jupyter
Jupyter is the first recommendation here; for details, see my other set of study notes on Google's framework, Tensorflow.
3.4 Classifying movie reviews: a binary classification example
3.4.1 The IMDB dataset
50,000 highly polarized reviews from the Internet Movie Database: 25,000 for training and 25,000 for testing.
# 3-1 Loading the IMDB dataset
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
print(train_data[0])
print(train_labels[0])
# Quickly decode a review back into English words
word_index = imdb.get_word_index()
reverse_word_index = dict(
    [(value, key) for (key, value) in word_index.items()])
# Indices are offset by 3 because 0, 1, and 2 are reserved
# for "padding", "start of sequence", and "unknown"
decoded_review = ' '.join(
    [reverse_word_index.get(i - 3, '?') for i in train_data[0]])
# Output
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
1
3.4.2 Preparing the data
The integer sequences can't be fed into the network directly; there are two options:
- Pad the lists so they all have the same length, giving an integer tensor of shape (samples, word_indices) — see the sketch after this list
- One-hot encode the lists, turning each into a vector of 0s and 1s
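These notes take the second route; for completeness, here is a minimal sketch of the first option using Keras's built-in pad_sequences (the maxlen of 256 is an arbitrary illustrative choice; this representation would then be fed to an Embedding layer, covered later in the book):
# Sketch of option 1: pad every review to the same length (maxlen chosen arbitrarily)
from keras.preprocessing.sequence import pad_sequences
padded_train = pad_sequences(train_data, maxlen=256)
print(padded_train.shape)  # (25000, 256)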
# 3-2 Encoding the integer sequences into a binary matrix
import numpy as np

def vectorize_sequences(sequences, dimension = 10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set the indices that appear in this review to 1
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
print(x_train[0])
# Output: [0. 1. 1. ... 0. 0. 0.]
# Vectorize the labels as well
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
3.4.3 Building your network
# 3-3 Model definition
from keras import models
from keras import layers

# A Sequential (linear) stack of layers
model = models.Sequential()
# Two intermediate layers with 16 hidden units each
model.add(layers.Dense(16, activation = 'relu', input_shape = (10000,) ))
model.add(layers.Dense(16, activation = 'relu'))
# A sigmoid on the final layer outputs a probability between 0 and 1
model.add(layers.Dense(1, activation = 'sigmoid'))
At the compilation step you choose the optimizer and the loss function; here the model is configured with the rmsprop optimizer and the binary_crossentropy loss function.
# 3-4 Compiling the model
model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])
You can also pass function objects via the loss and metrics arguments:
# 3-5 Configuring the optimizer
from keras import optimizers
model.compile(optimizer = optimizers.RMSprop(lr = 0.001),
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])
# 3-6 Using custom losses and metrics
from keras import losses
from keras import metrics
model.compile(optimizer = optimizers.RMSprop(lr = 0.001),
              loss = losses.binary_crossentropy,
              metrics = [metrics.binary_accuracy])
3.4.4 Validating your approach
To monitor the model's accuracy on previously unseen data during training, set aside 10,000 samples from the original training data as a validation set.
# 3-7 Setting aside a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
# 3-8 Training the model
model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['acc'])
# The History object records the loss and metric values for every epoch
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs = 20,
                    batch_size = 512,
                    validation_data = (x_val, y_val))
history_dict = history.history
history_dict.keys()
# dict_keys(['val_acc', 'acc', 'val_loss', 'loss'])
- Plot the training and validation loss
# 3-9 Plotting the training and validation loss
import matplotlib.pyplot as plt
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss_values, 'b', label = 'Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
- Plot the training and validation accuracy
# 3-10 Plotting the training and validation accuracy
plt.clf()
acc = history_dict['acc']
val_acc = history_dict['val_acc']
plt.plot(epochs, acc, 'bo', label = 'Training acc')
plt.plot(epochs, val_acc, 'b', label = 'Validation acc')
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
- Notice that after the second epoch the model performs better and better on the training data, but the validation accuracy does not keep up: the model is overfitting.
# 3-11 Retraining a model from scratch
model = models.Sequential()
model.add(layers.Dense(16, activation = 'relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics = ['accuracy'])
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)
print(results)
# [0.28807684420585633, 0.88504]
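Here the stopping point (4 epochs) was read off the validation curves by hand. As an alternative, a small sketch using Keras's EarlyStopping callback, which halts training once the validation loss stops improving:
# Sketch: stop automatically when val_loss has not improved for 2 epochs
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(partial_x_train, partial_y_train,
          epochs=20,
          batch_size=512,
          validation_data=(x_val, y_val),
          callbacks=[early_stopping])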
3.4.5 Using a trained network to generate predictions on new data
model.predict(x_test)
array([[0.23583308],
[0.99974436],
[0.85348594],
...,
[0.12692562],
[0.06978337],
[0.6001638 ]], dtype=float32)
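predict() returns probabilities; to obtain hard 0/1 labels, threshold them at 0.5, for example:
# Turn predicted probabilities into class labels (0 = negative, 1 = positive)
predicted_labels = (model.predict(x_test) > 0.5).astype('int32')
print(predicted_labels[:3])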
3.4.6 Further experiments
- Try using one or three hidden layers instead of two (two variants below)
# Three hidden layers
model1 = models.Sequential()
model1.add(layers.Dense(16, activation = 'relu', input_shape=(10000,)))
model1.add(layers.Dense(16, activation='relu'))
model1.add(layers.Dense(16, activation='relu'))
model1.add(layers.Dense(1, activation='sigmoid'))
model1.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics = ['accuracy'])
model1.fit(x_train, y_train, epochs=4, batch_size=512)
results1 = model1.evaluate(x_test, y_test)
print(results1)
# One hidden layer
model2 = models.Sequential()
model2.add(layers.Dense(16, activation = 'relu', input_shape=(10000,)))
model2.add(layers.Dense(1, activation='sigmoid'))
model2.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics = ['accuracy'])
model2.fit(x_train, y_train, epochs=4, batch_size=512)
results2 = model2.evaluate(x_test, y_test)
print(results2)
- Try using more or fewer hidden units, e.g. 32 or 64
# 32 hidden units per layer
model1 = models.Sequential()
model1.add(layers.Dense(32, activation = 'relu', input_shape=(10000,)))
model1.add(layers.Dense(32, activation='relu'))
model1.add(layers.Dense(1, activation='sigmoid'))
model1.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics = ['accuracy'])
model1.fit(x_train, y_train, epochs=4, batch_size=512)
results1 = model1.evaluate(x_test, y_test)
print(results1)
# 64 hidden units per layer
model2 = models.Sequential()
model2.add(layers.Dense(64, activation = 'relu', input_shape=(10000,)))
model2.add(layers.Dense(64, activation='relu'))
model2.add(layers.Dense(1, activation='sigmoid'))
model2.compile(optimizer='rmsprop',
               loss='binary_crossentropy',
               metrics = ['accuracy'])
model2.fit(x_train, y_train, epochs=4, batch_size=512)
results2 = model2.evaluate(x_test, y_test)
print(results2)
- Try using the mse loss function instead of binary_crossentropy
model = models.Sequential()
model.add(layers.Dense(16, activation = 'relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics = ['accuracy'])
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)
print(results)
- Try using the tanh activation instead of relu
model = models.Sequential()
model.add(layers.Dense(16, activation = 'tanh', input_shape=(10000,)))
model.add(layers.Dense(16, activation='tanh'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics = ['accuracy'])
model.fit(x_train, y_train, epochs=4, batch_size=512)
results = model.evaluate(x_test, y_test)
print(results)
3.5 Classifying newswires: a multiclass classification example
3.5.1 The Reuters dataset
The dataset contains many short newswires and their topics, spread over 46 mutually exclusive classes.
# 3-12 Loading the Reuters dataset
from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
print(len(train_data))
print(len(test_data))
# 3-13 Decoding the indices back into newswire text
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
decoded_newswire = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
3.5.2 Preparing the data
Vectorize the training and test data:
# 3-14 Encoding the data
import numpy as np

def vectorize_sequences(sequences, dimension = 10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
One-hot encode the labels to vectorize them:
def to_one_hot(labels, dimension = 46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

# Categorical encoding of the label lists via one-hot
one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
# Equivalently, vectorize the labels with Keras's built-in to_categorical
'''
from keras.utils.np_utils import to_categorical
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
'''
3.5.3 Building your network
Going from IMDB's 2 classes to 46 makes the output space much larger, so each intermediate layer now uses 64 units instead of 16.
# 3-15 Model definition
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(64, activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
# 3-16 Compiling the model
model.compile(optimizer = 'rmsprop',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
3.5.4 Validating your approach
# 3-17 Setting aside a validation set
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
# 3-18 Training the model
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs = 20,
                    batch_size = 512,
                    validation_data=(x_val, y_val))
- Plot the training and validation loss
# 3-19 Plotting the training and validation loss
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss, 'b', label = 'Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
- Plot the training and validation accuracy
# 3-20 Plotting the training and validation accuracy
import matplotlib.pyplot as plt
acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label = 'Training acc')
plt.plot(epochs, val_acc, 'b', label = 'Validation acc')
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Complete code
# 3-21 Retraining a model from scratch
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer = 'rmsprop',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
model.fit(partial_x_train,
          partial_y_train,
          epochs=9,
          batch_size=512,
          validation_data=(x_val, y_val))
results = model.evaluate(x_test, one_hot_test_labels)
3.5.5 Generating predictions on new data
Use the model's predict() method:
# 3-22 Generating predictions on new data
predictions = model.predict(x_test)
print(predictions[0].shape)
# (46,)
print(np.sum(predictions[0]))
# 0.9999998
print(np.argmax(predictions[0]))
# 3
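np.argmax on a single sample gives its most likely class; applied along axis 1, it yields a predicted topic for every test newswire at once:
# Predicted class index for each test sample
predicted_classes = np.argmax(predictions, axis=1)
print(predicted_classes[:10])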
3.5.6 A different way to handle the labels and the loss
Instead of one-hot encoding, you can cast the labels as an integer tensor; the only other change needed is the loss function, sparse_categorical_crossentropy:
y_train = np.array(train_labels)
y_test = np.array(test_labels)
model.compile(optimizer='rmsprop',
              loss = 'sparse_categorical_crossentropy',
              metrics=['acc'])
3.5.7 The importance of having sufficiently large intermediate layers
Intermediate layers should not have fewer units than the number of output classes; otherwise they become an information bottleneck. The model below forces 46-way class information through a 4-dimensional layer:
# 3-23 A model with an information bottleneck
model = models.Sequential()
model.add(layers.Dense(64, activation = 'relu', input_shape=(10000,)))
model.add(layers.Dense(4, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop',
              loss = 'categorical_crossentropy',
              metrics=['accuracy'])
model.fit(partial_x_train,
          partial_y_train,
          epochs = 20,
          batch_size = 128,
          validation_data = (x_val, y_val))
# acc: 0.8582 - val_loss: 1.5473 - val_acc: 0.7200
Validation accuracy tops out around 72%, nearly 10 percentage points below the earlier model.
3.5.8 Further experiments
- Try using more or fewer hidden units
- Try using one or three hidden layers
3.5.9 Wrapping up
- To classify data points among N classes, the network's last layer should be a Dense layer of size N
- For a single-label, multiclass problem, the last layer should use a softmax activation
- The loss should almost always be categorical crossentropy (categorical_crossentropy)
- Two ways to handle labels in multiclass classification:
  one-hot encode the labels and use categorical_crossentropy as the loss
  encode the labels as integers and use sparse_categorical_crossentropy as the loss
- With many classes, avoid making the intermediate layers too small, or you will create an information bottleneck
3.6 Predicting house prices: a regression example
Here the goal is to predict a continuous value rather than a discrete label.
3.6.1 The Boston Housing Price dataset
Each input feature has a different scale (range of values).
# 3-24 Loading the Boston housing dataset
from keras.datasets import boston_housing
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
print(train_data.shape)
# (404, 13)
print(test_data.shape)
# (102, 13)
print(train_targets)
3.6.2 Preparing the data
Normalize the features: for each feature in the input data, subtract the feature's mean and divide by its standard deviation. The mean and standard deviation are computed on the training data only and then reused for the test data.
# 3-25 Normalizing the data
mean = train_data.mean(axis=0)
train_data -= mean
std = train_data.std(axis=0)
train_data /= std
# Normalize the test data with the quantities computed on the training data
test_data -= mean
test_data /= std
3.6.3 Building your network
Because so few samples are available, use a small network to limit overfitting.
# 3-26 Model definition
from keras import models
from keras import layers

def build_model():
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(64, activation='relu'))
    # No activation on the last layer: a linear output suits scalar regression
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model
3.6.4 Validating your approach using K-fold validation
K-fold cross-validation: split the data into K partitions, instantiate K identical models, and train each one on K-1 partitions while evaluating it on the remaining partition.
# 3-27 K-fold validation
import numpy as np

k = 4
num_val_samples = len(train_data) // k
num_epochs = 100
all_scores = []
for i in range(k):
    print('processing fold #', i)
    # Data from partition i becomes the validation data
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    # concatenate() joins the remaining partitions into the training data
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis = 0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis = 0)
    model = build_model()
    model.fit(partial_train_data, partial_train_targets,
              epochs=num_epochs, batch_size=1, verbose=0)
    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    all_scores.append(val_mae)
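The per-fold scores can vary noticeably, so the mean across the folds is the more reliable number to look at:
# Average the validation MAE over the K folds
print(all_scores)
print(np.mean(all_scores))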
Record the validation score for each epoch. (Note: as printed in the book, this lookup can fail; depending on the Keras version the MAE history key is 'val_mean_absolute_error' or 'val_mae' — inspect history.history.keys() if needed.)
# 3-28 Saving the validation logs for each fold
import numpy as np

k = 4
num_val_samples = len(train_data) // k
num_epochs = 500
all_mae_histories = []
for i in range(k):
    print('processing fold #', i)
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    # concatenate() joins the remaining partitions into the training data
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis = 0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis = 0)
    model = build_model()
    history = model.fit(partial_train_data, partial_train_targets,
                        validation_data = (val_data, val_targets),
                        epochs=num_epochs, batch_size=1, verbose=0)
    mae_history = history.history['val_mean_absolute_error']  # or 'val_mae', depending on the Keras version
    all_mae_histories.append(mae_history)
# 3-29 Computing the average of the per-epoch MAE scores over all folds
average_mae_history = [np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
# 3-30 Plotting validation scores
import matplotlib.pyplot as plt
plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Because the first few epochs dominate the y-axis range, drop the first 10 data points and replot the rest with exponential smoothing:
# 3-31 Plotting smoothed validation scores, excluding the first 10 points
def smooth_curve(points, factor = 0.9):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            # Exponential moving average: weight the running value by `factor`
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points
smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Finally, train a production model on all of the training data with the settings found above (about 80 epochs), and check its performance on the test set.
# 3-32 Training the final model
model = build_model()
model.fit(train_data, train_targets, epochs = 80, batch_size = 16, verbose=0)
test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)
print(test_mae_score)
# 102/102 [==============================] - 1s 5ms/step
# 19.081563463398055
# Note: this logged value is the test MSE; with MAE as the metric, the book reports about 2.55, i.e. predictions off by roughly $2,550 on average.
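The trained model can now produce a price estimate (in thousands of dollars) for new samples via predict(); a quick check on the test data:
# Predicted median house price for the first test sample, in thousands of dollars
predictions = model.predict(test_data)
print(predictions[0])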