Deep Learning with Python: Getting Started with Neural Networks

CDFMLR

Article link: https://blog.csdn.net/u012419550/article/details/107202126

Deep Learning with Python

This post is one of a series of notes I wrote while studying Deep Learning with Python (2nd edition, by François Chollet). The content was converted from Jupyter notebooks to Markdown; once I have finished all the posts, I will publish all of my Jupyter notebooks on GitHub.

You can read the official English text of the book online here: https://livebook.manning.com/book/deep-learning-with-python

The book's author also provides a set of Jupyter notebooks: https://github.com/fchollet/deep-learning-with-python-notebooks

This post collects my notes on Chapter 3, Getting started with neural networks.

Classifying movie reviews: a binary classification problem

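The IMDB dataset

The IMDB dataset contains 50,000 movie reviews, half for training and half for testing.
Half of the reviews are positive and half are negative.

Keras ships a preprocessed version of the IMDB dataset in which each review, a sequence of words, has already been turned into a sequence of integers (each integer stands for a word in a dictionary):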

from tensorflow.keras.datasets import imdb

# Load the dataset, keeping only the 10,000 most frequent words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(
    num_words=10000)

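num_words=10000 keeps only the 10,000 most frequently occurring words.

Let's look at an arbitrary review first; this one is positive. (The decoded text comes out scrambled because this direct lookup ignores the offset of 3 that load_data applies to word indices.)

# Dictionary mapping integer indices back to words
index_word = {
    v: k for k, v in imdb.get_word_index().items()}

# Decode one review to take a look
text = ' '.join([index_word[i] for i in train_data[0]])

print(f"{train_labels[0]}:", text)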

1: the as you with out themselves powerful lets loves their becomes reaching had journalist of lot from anyone to have after out atmosphere never more room and it so heart shows to years of every never going and help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but and to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other and in of seen over landed for anyone of and br show's to whether from than out themselves history he name half some br of and odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while history he heart to real at barrel but when from one bit then have two of script their with her nobody most that with wasn't to with armed acting watch an for with heartfelt film want an
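
The decoded text above reads like word salad because the stored indices are shifted. With the defaults of `imdb.load_data` (`index_from=3`), indices 0, 1, and 2 are reserved for padding, start of sequence, and unknown words, so a word's dictionary index has to be recovered as `i - 3`. Below is a minimal sketch of offset-aware decoding under that assumption. `decode_review` and the tiny `toy_word_index` are hypothetical names and data used only for illustration; with the real dataset you would pass `imdb.get_word_index()` and `train_data[0]`.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map an encoded review back to words, undoing the index offset."""
    # Invert word -> index into index -> word.
    index_word = {index: word for word, index in word_index.items()}
    # Reserved indices (below index_from) have no word, so they become '?'.
    return ' '.join(index_word.get(i - index_from, '?') for i in encoded)


# Hypothetical miniature vocabulary standing in for imdb.get_word_index().
toy_word_index = {'the': 1, 'film': 2, 'was': 3, 'brilliant': 4}

# 1 is the start-of-sequence marker; each word is stored as its index + 3.
encoded = [1, 4, 5, 6, 7]
print(decode_review(encoded, toy_word_index))  # ? the film was brilliant
```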

Preparing the data

First, look at the current shape of train_data:

train_data.shape

Output:

(25000,)

We want to turn it into a tensor of shape (samples, word_indices), roughly like this:

[[0, 0, ..., 1, ..., 0, ..., 1],
 [0, 1, ..., 0, ..., 1, ..., 0],
 ...
]

An entry is 1 if the word occurs in the review and 0 if it does not.

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

x_train

Output:

array([[0., 1., 1., ..., 0., 0., 0.],
       [0., 1., 1., ..., 0., 0., 0.],
       [0., 1., 1., ..., 0., 0., 0.],
       ...,
       [0., 1., 1., ..., 0., 0., 0.],
       [0., 1., 1., ..., 0., 0., 0.],
       [0., 1., 1., ..., 0., 0., 0.]])
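
To make the encoding easier to inspect, here is a small sketch that runs the same function on toy input with only 8 dimensions. The function is repeated so the block runs on its own; `toy_sequences` is made-up data for illustration. Note that a repeated index (3 appears twice in the first sequence) still produces a single 1, so this encoding discards both word order and word counts.

```python
import numpy as np


def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, one column per possible word index.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set every column listed in the sequence to 1 for row i.
        results[i, sequence] = 1.
    return results


# Made-up sequences of word indices, each index smaller than 8.
toy_sequences = [[1, 3, 3], [0, 7]]
toy_vectors = vectorize_sequences(toy_sequences, dimension=8)
print(toy_vectors)
```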

Let's take a quick look at the labels as well:

train_labels

Output:

array([1, 0, 0, ..., 0, 1, 0])

Convert them to float arrays:

y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

y_train

Output:

array([1., 0., 0., ..., 0., 1., 0.], dtype=float32)

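The data is now ready to be fed into the neural network we are about to build.

Building the network

For a problem like this, where the inputs are vectors and the labels are scalars (just 0 or 1),
a stack of Dense (fully connected) layers with relu activations works well: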

Dense(16, activation='relu')

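A layer like this computes output = relu(dot(W, input) + b).

16 is the number of hidden units in the layer. A hidden unit is one dimension in the layer's representation space.
W has shape (input_dimension, 16), so the dot product yields a 16-dimensional vector; in other words, the data is projected onto a 16-dimensional representation space.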
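
The shapes can be checked with a small NumPy sketch of one such layer. This is an illustration under stated assumptions rather than how Keras implements Dense internally: the weights are random placeholders instead of trained values, and with W shaped (input_dimension, 16) the product has to be written as `x @ W` (that is, dot(input, W)) for the shapes to line up.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dimension, units = 10000, 16
x = rng.random(input_dimension)                  # one input vector
W = rng.normal(size=(input_dimension, units))    # placeholder weights
b = np.zeros(units)                              # one bias per hidden unit

# relu(dot(x, W) + b): project to 16 dimensions, then zero out negatives.
output = np.maximum(x @ W + b, 0.0)
print(output.shape)  # (16,)
```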

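This dimensionality (the number of hidden units) can be seen as controlling how much freedom the network has when learning.
The higher the dimensionality, the more complex the representations it can learn, but the more computation it costs, and the more likely it is to pick up unimportant patterns and overfit.

Here we use two such layers with 16 hidden units each,
followed by a final sigmoid-activated layer that outputs the result (a value in $[0, 1]$),
which is the predicted probability that the sample's label is 1, that is, a positive review.

relu filters out negative values (negative inputs become 0), while sigmoid squashes values into [0, 1]: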

[Figure: relu and sigmoid]
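
For readers without the figure, the two functions can be sketched directly in NumPy. These are the standard textbook definitions, written here for illustration; they are not taken from the Keras source.

```python
import numpy as np


def relu(x):
    # Negative inputs become 0, non-negative inputs pass through unchanged.
    return np.maximum(x, 0.0)


def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-2.0, 0.0, 2.0])
print(relu(x))
print(sigmoid(x))
```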

Implementing this network in Keras:

from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000, )))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

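What activation functions do

We already used the relu activation with MNIST, so what does an activation function actually do?

A Dense layer without an activation function is just a linear (affine) transformation:

output = dot(W, input) + b

If every layer is a linear transformation like this, stacking several of them does not enlarge the hypothesis space, so what the network can learn is very limited.

An activation function is a function wrapped around dot(W, input) + b; with relu, for example, output = relu(dot(W, input) + b).
Such activations extend the representation space, which lets the network learn more complex "knowledge".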
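
The claim that stacked linear layers add nothing can be verified numerically. The sketch below uses small hand-picked matrices (arbitrary values chosen for illustration) to show that two layers without activations equal a single layer with W = W1 @ W2 and b = b1 @ W2 + b2, while inserting a relu between them gives a different result.

```python
import numpy as np

x = np.array([1.0, -1.0])

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
b1 = np.array([3.0, 0.0])
W2 = np.array([[1.0], [1.0]])
b2 = np.array([0.5])

# Two stacked layers with no activation function.
two_linear = (x @ W1 + b1) @ W2 + b2

# The same mapping folded into one layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_linear = x @ W + b
print(np.allclose(two_linear, one_linear))  # True

# With a relu in between, the stack no longer collapses to that single layer.
with_relu = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
print(two_linear, with_relu)
```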

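Compiling the model

When compiling the model, we also need to choose a loss function, an optimizer, and metrics.

For a binary classification problem like this, where the final output is 0 or 1, binary_crossentropy is a suitable loss function (as the name suggests).

Crossentropy comes from information theory, where it measures the distance between probability distributions.
That is why models that output probabilities are often trained with a crossentropy loss.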
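
To get a feel for the loss, here is a plain-Python sketch of binary crossentropy averaged over a batch, following the usual formula -(y log p + (1 - y) log(1 - p)). The clipping constant `eps` and the example predictions are assumptions made for illustration; the Keras implementation may differ in its details.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1]
mostly_right = [0.9, 0.1, 0.8]   # made-up predictions close to the labels
mostly_wrong = [0.1, 0.9, 0.2]   # made-up predictions far from the labels

print(binary_crossentropy(labels, mostly_right))
print(binary_crossentropy(labels, mostly_wrong))
```

Predictions close to the labels give a small loss, and confident wrong predictions are penalized heavily.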

As for the optimizer, we use rmsprop as with MNIST (the book has not yet explained why), and the metric is again accuracy:

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

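Because these optimizers, losses, and metrics are so common, Keras has them built in and you can simply pass their names as strings.
You can also pass objects instead, for example an optimizer instance, to customize parameters such as the learning rate: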

from tensorflow.keras import optimizers
from tensorflow.keras import losses
from tensorflow.keras import metrics

model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss=losses.binary_crossentropy,
              metrics=[metrics.binary_accuracy])

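Training the model

To monitor how accurate the model is on data it has never seen during training, we set aside 10,000 samples from the original training data: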

x_val = x_train[:10000]
partial_x_train = x_train[10000:]

y_val = y_train[:10000]
partial_y_train = y_train[10000:]

We train in mini-batches of 512 samples for 20 epochs (one epoch is one full pass over the data in partial_x_train),
and use the 10,000 samples we just set aside for validation:

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))

Train on 15000 samples, validate on 10000 samples
Epoch 1/20
15000/15000 [==============================] - 3s 205us/sample - loss: 0.5340 - acc: 0.7867 - val_loss: 0.4386 - val_acc: 0.8340
.......
Epoch 20/20
15000/15000 [==============================] - 1s 74us/sample - loss: 0.0053 - acc: 0.9995 - val_loss: 0.7030 - val_acc: 0.8675
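
As a quick check of the arithmetic behind one epoch, the sketch below works out how many mini-batches the 15,000 training samples are split into at a batch size of 512. The variable names are chosen for illustration only.

```python
import math

num_samples = 15000   # size of partial_x_train
batch_size = 512

# The last batch is smaller when the sample count is not a multiple of 512.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch_size)  # 30 152
```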

fit returns a History object whose history attribute records what happened in every epoch of training:

history_dict = history.history
history_dict.keys()

Output:

dict_keys(['loss', 'acc', 'val_loss', 'val_acc'])

We can plot these values:

# Plot the training and validation loss

import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']

epochs = range(1, len(loss_values) + 1)

plt.plot(epochs, loss_values, 'ro-', label='Training loss')
plt.plot(epochs, val_loss_values, 'bs-', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

[Figure: training and validation loss]

# Plot the training and validation accuracy

plt.clf()

acc = history_dict['acc']
val_acc = history_dict['val_acc']

plt.plot(epochs, acc, 'ro-', label='Training acc')
plt.plot(epochs, val_acc, 'bs-', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

[Figure: training and validation accuracy]

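We can see that accuracy on the training set keeps rising (and the loss keeps falling),
but on the validation set the loss goes back up later on; the model is at its best around the 4th epoch.

This is overfitting, and it actually sets in as early as the second epoch. So training for 3 or 4 epochs is enough.
If we kept going, the model would only "master" the training set and fail to recognize data it has never seen.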
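
Instead of reading the best epoch off the plot, it can be picked programmatically as the epoch with the lowest validation loss. The numbers below are hypothetical values made up for illustration, not the results of the run above; with a real run you would use `history.history['val_loss']` instead.

```python
# Hypothetical validation losses, one per epoch (illustration only).
val_loss = [0.44, 0.33, 0.29, 0.28, 0.30, 0.34, 0.39]

# Index of the smallest loss, converted to a 1-based epoch number.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch)  # 4
```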

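So we train a new model from scratch (the network has to be rebuilt, otherwise fit would continue from the training already done) and then evaluate it on the test set: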

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000, )))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
             loss='binary_crossentropy',
             metrics=['accuracy'])

model.fit(x_train, y_train, epochs=4, batch_size=512)
result = model.evaluate(x_test, y_test, verbose=2)    # verbose=2 to avoid a looooong progress bar that fills the screen with '='. https://github.com/tensorflow/tensorflow/issues/32286

Train on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 2s 69us/sample - loss: 0.4829 - accuracy: 0.8179
Epoch 2/4
25000/25000 [==============================] - 1s 42us/sample - loss: 0.2827 - accuracy: 0.9054
Epoch 3/4
25000/25000 [==============================] - 1s 42us/sample - loss: 0.2109 - accuracy: 0.9253
Epoch 4/4
25000/25000 [==============================] - 1s 43us/sample - loss: 0.1750 - accuracy: 0.9380
25000/1 - 3s - loss: 0.2819 - accuracy: 0.8836

Let's print the result: