Introductory Python Code for Deep Learning: Movie Review Classification, a Binary Classification Problem

咳咳~~

Last modified: 2022-11-19 11:06:10

First published: 2022-11-19 11:05:10

Article link: https://blog.csdn.net/weixin_60805452/article/details/127934153

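1. Load the dataset

# Suppress TensorFlow info and warning log messages
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# 1. Load the dataset, keeping only the 10,000 most frequent words
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = 10000)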
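
Each review returned by load_data is a list of integer word indices, which is hard to read directly. The sketch below shows one way to turn such a list back into text. The helper name decode_review and the toy vocabulary are hypothetical and used only for illustration; the offset of 3 reflects the default index_from of imdb.load_data, where the lowest indices are reserved for padding, the start marker, and unknown words.

```python
def decode_review(encoded_review, word_index, index_from=3):
    """Map a list of integer word indices back to a readable string."""
    # Invert the word -> index mapping so that indices can be looked up.
    reverse_word_index = {index: word for word, index in word_index.items()}
    # Reserved indices have no entry after shifting, so they are shown as "?".
    return " ".join(
        reverse_word_index.get(i - index_from, "?") for i in encoded_review
    )


# Hypothetical toy vocabulary, for illustration only.
toy_word_index = {"the": 1, "film": 2, "was": 3, "great": 4}
decoded = decode_review([1, 4, 5, 6, 7], toy_word_index)
print(decoded)
```

With the real dataset, the vocabulary would come from imdb.get_word_index(), and a call such as decode_review(train_data[0], imdb.get_word_index()) should print the first training review.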

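2. Convert the dataset to tensors

# 2. Turn the data into tensors, either in the first layer of the network or by encoding it
# Here the integer sequences of the training data are encoded as a binary matrix,
# i.e. each sequence of integers is converted into a tensor
import numpy as np

# Convert a list of integer sequences into a binary (multi-hot) matrix
def vectorize_sequences(sequences, dimension = 10000):
    # One row per sequence, one column per word index, initialized to 0
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set the columns of the words that occur in sequence i to 1
        results[i, sequence] = 1
    return results

# Vectorize the training data and the test data
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

# Vectorize the training labels and the test labels
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')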
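
To make the encoding concrete, here is a small sketch that applies the same vectorize_sequences function to two toy sequences with a vocabulary of only 6 words. The toy data and the reduced dimension are assumptions chosen so that the result is easy to read.

```python
import numpy as np


def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, one column per word index.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Mark every word index that occurs in sequence i.
        results[i, sequence] = 1
    return results


# Hypothetical toy input: two "reviews" over a 6-word vocabulary.
toy_sequences = [[1, 3], [0, 2, 2, 5]]
toy_matrix = vectorize_sequences(toy_sequences, dimension=6)
print(toy_matrix)
```

The first row has ones in columns 1 and 3, the second in columns 0, 2 and 5. Note that the repeated index 2 still produces a single 1: this encoding records which words occur, not how often or in what order.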

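3. Build the neural network

# 3. Build the neural network: choose the model type, the number of layers,
#    the kind of layers and the activation functions
#    Three fully connected (Dense) layers are used here: the first two use "relu",
#    the last one uses "sigmoid". A Dropout layer follows each of the first two.
from keras import models
from keras import layers
from keras import optimizers
from keras import regularizers

def build_model_1():
    model = models.Sequential()
    model.add(layers.Dense(16, activation= 'relu', input_shape = (10000,)))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(16, activation= 'relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1, activation= 'sigmoid'))
    # compile() is missing from the original listing. The optimizer and loss below
    # are assumptions (a common choice for binary classification); the 'accuracy'
    # metric matches the history keys read in step 7.
    model.compile(optimizer= 'rmsprop',
                  loss= 'binary_crossentropy',
                  metrics= ['accuracy'])
    return model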
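
As a rough sanity check on the size of this network, the number of trainable parameters can be worked out by hand: a Dense layer has one weight per input-output pair plus one bias per output, and Dropout layers add no parameters. The helper name dense_param_count is hypothetical; the layer sizes are taken from the model above.

```python
def dense_param_count(input_units, output_units):
    """Weights (input_units * output_units) plus one bias per output unit."""
    return input_units * output_units + output_units


# Input size followed by the width of each Dense layer in build_model_1.
layer_sizes = [10000, 16, 16, 1]
per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(per_layer)
print(per_layer, total_params)
```

Almost all of the parameters sit in the first layer, which connects the 10,000-dimensional input to 16 units. The same totals should appear in the output of model.summary().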

4. Set aside a validation set

# 4. Set aside the first 10,000 training samples as a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

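5. Train the model

model_1 = build_model_1()
# Train on the remaining training samples and monitor the held-out validation set
history_1 = model_1.fit(partial_x_train,
                    partial_y_train,
                    epochs= 12,
                    batch_size= 512,
                    validation_data= (x_val, y_val))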

6. Plot the training and validation loss

import matplotlib.pyplot as plt

# The History object stores one value per epoch for each tracked quantity
history_dict_1 = history_1.history
loss = history_dict_1['loss']
val_loss = history_dict_1['val_loss']

epochs = range(1, len(loss) + 1)

# 'bo' draws blue dots (training), 'b' draws a solid blue line (validation)
plt.plot(epochs, loss, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss, 'b', label = 'Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()
plt.show()

7. Plot the training and validation accuracy

# Clear the previous figure before drawing the new one
plt.clf()
acc = history_dict_1['accuracy']
val_acc = history_dict_1['val_accuracy']

plt.plot(epochs, acc, 'bo', label = 'Training accuracy')
plt.plot(epochs, val_acc, 'b', label = 'Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

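The plots show that the model starts to overfit after the third epoch, so for the final run that is evaluated on the test data, training can stop after the third epoch.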
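
Instead of reading the stopping point off the plot, it can also be picked programmatically as the epoch with the lowest validation loss. The sketch below illustrates the idea; the helper name best_epoch is hypothetical, and the loss values are made up for illustration and are not the results of this article.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Hypothetical values for illustration only, not measured results.
example_val_loss = [0.42, 0.33, 0.29, 0.31, 0.35, 0.40]
stop_at = best_epoch(example_val_loss)
print(stop_at)
```

With a real run, the list stored under history_dict_1['val_loss'] would be passed in instead of the example values. Keras also provides an EarlyStopping callback that can stop training automatically once a monitored quantity such as val_loss stops improving, which may be preferable to retraining by hand.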

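8. Retrain the model (this time using the test data directly)

# Build a fresh model and train it on the full training set for 3 epochs
model_1 = build_model_1()
history_1 = model_1.fit(x_train,
                    y_train,
                    epochs= 3,
                    batch_size= 512,
                    validation_data= (x_test, y_test))
# evaluate() returns the loss followed by the compiled metrics
results = model_1.evaluate(x_test, y_test)
print(results)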

9. Generate predictions

# Use the trained network to generate predictions on new data
predict = model_1.predict(x_test)
print(predict)
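
Because the last layer is a sigmoid, each prediction is a probability between 0 and 1 rather than a class label. A minimal sketch of turning such probabilities into labels and an accuracy figure follows; the helper name to_labels, the 0.5 threshold, and the example arrays are assumptions for illustration.

```python
import numpy as np


def to_labels(probabilities, threshold=0.5):
    """Convert sigmoid outputs of shape (n, 1) or (n,) into 0/1 labels."""
    probabilities = np.asarray(probabilities).reshape(-1)
    return (probabilities > threshold).astype("int32")


# Hypothetical model outputs and true labels, for illustration only.
example_probs = np.array([[0.91], [0.08], [0.55], [0.49]])
example_truth = np.array([1, 0, 0, 0])

example_labels = to_labels(example_probs)
example_accuracy = float(np.mean(example_labels == example_truth))
print(example_labels, example_accuracy)
```

Applied to the real output, to_labels(predict) compared against y_test should give roughly the same accuracy that evaluate() reports.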

The final accuracy reaches 88.8%. The model can be optimized further by tuning its hyperparameters.
