IMDB Binary Classification

Binary classification on IMDB with a fully connected (Dense) network

The IMDB dataset contains 50,000 highly polarized reviews from the Internet Movie Database.
The data has already been preprocessed: each review (a sequence of words) has been turned into a sequence of integers, where each integer stands for a specific word in a dictionary.

Loading the IMDB dataset

from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)  # num_words=10000: keep only the 10,000 most frequent words in the training data
Using TensorFlow backend.

The data looks like this:

train_data[0]
[1, 14, 22, 16, ..., 19, 178, 32]
train_labels[0]  # 0 stands for a negative review, 1 for a positive one
1
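
To read a review back as text, the integer indices can be mapped to words with imdb.get_word_index(). A minimal sketch (the offset of 3 is needed because indices 0, 1 and 2 are reserved for padding, start-of-sequence and unknown):

word_index = imdb.get_word_index()  # dict mapping words to integer indices
reverse_word_index = {value: key for key, value in word_index.items()}
decoded_review = ' '.join(reverse_word_index.get(i - 3, '?') for i in train_data[0])
print(decoded_review)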

Encoding the integer sequences into a binary matrix

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))  # all-zero matrix of shape (len(sequences), dimension)
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set the indices of the words present in this review to 1
    return results

x_train = vectorize_sequences(train_data)  # vectorize the training data
x_test = vectorize_sequences(test_data)    # vectorize the test data

y_train = np.asarray(train_labels).astype('float32')  # labels as float32 arrays
y_test = np.asarray(test_labels).astype('float32')
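
A quick sanity check on the shapes: the IMDB training set has 25,000 reviews, and dimension=10000 above, so:

print(x_train.shape)  # (25000, 10000)
print(y_train.shape)  # (25000,)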

After vectorization the data looks like this:

x_train[0]
array([0., 1., 1., ..., 0., 0., 0.])
y_train[0]
1.0

Defining the model

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))  # first hidden layer: 16 units
model.add(layers.Dense(16, activation='relu'))                        # second hidden layer: 16 units
model.add(layers.Dense(1, activation='sigmoid'))                      # output: probability that the review is positive
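
model.summary() is a handy check on the architecture; the parameter counts follow directly from the layer sizes (10000*16 + 16 = 160016, 16*16 + 16 = 272, 16*1 + 1 = 17):

model.summary()
# Dense  (None, 16)  160016 params
# Dense  (None, 16)  272 params
# Dense  (None, 1)   17 params
# (the layer names in the real output depend on the session)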

Compiling the model

from keras import optimizers
model.compile(optimizer=optimizers.RMSprop(lr=0.001),  # equivalent to optimizer='rmsprop'
              loss='binary_crossentropy',              # binary cross-entropy; equivalent to loss=losses.binary_crossentropy
              metrics=['accuracy'])

Setting aside a validation set

x_val = x_train[:5000]       # validation set: the first 5,000 samples
partial_x = x_train[20000:]  # training set: the last 5,000 samples (samples 5,000-19,999 are left unused here)

y_val = y_train[:5000]
partial_y = y_train[20000:]
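
Note that this split uses only 10,000 of the 25,000 training reviews; a quick check confirms the sizes reported in the training log below:

print(len(partial_x), len(x_val))  # 5000 5000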

Training the model

Train for 20 epochs in mini-batches of 512 samples, and monitor loss and accuracy on the validation set.

history = model.fit(partial_x,
                   partial_y,
                   epochs=20,
                   batch_size=512,
                   validation_data=(x_val,y_val))
Train on 5000 samples, validate on 5000 samples
Epoch 1/20
5000/5000 [==============================] - 1s 160us/step - loss: 0.6254 - acc: 0.6744 - val_loss: 0.5463 - val_acc: 0.7770
Epoch 2/20
5000/5000 [==============================] - 1s 127us/step - loss: 0.4486 - acc: 0.8744 - val_loss: 0.4641 - val_acc: 0.8222
Epoch 3/20
5000/5000 [==============================] - 1s 101us/step - loss: 0.3429 - acc: 0.9172 - val_loss: 0.3895 - val_acc: 0.8602
Epoch 4/20
5000/5000 [==============================] - 1s 126us/step - loss: 0.2680 - acc: 0.9366 - val_loss: 0.4053 - val_acc: 0.8276
Epoch 5/20
5000/5000 [==============================] - 1s 104us/step - loss: 0.2055 - acc: 0.9612 - val_loss: 0.3356 - val_acc: 0.8658
Epoch 6/20
5000/5000 [==============================] - 1s 127us/step - loss: 0.1692 - acc: 0.9678 - val_loss: 0.3563 - val_acc: 0.8516
Epoch 7/20
5000/5000 [==============================] - 0s 95us/step - loss: 0.1319 - acc: 0.9784 - val_loss: 0.3378 - val_acc: 0.8592
Epoch 8/20
5000/5000 [==============================] - 1s 117us/step - loss: 0.1030 - acc: 0.9878 - val_loss: 0.3214 - val_acc: 0.8666
Epoch 9/20
5000/5000 [==============================] - 0s 95us/step - loss: 0.0845 - acc: 0.9896 - val_loss: 0.3687 - val_acc: 0.8542
Epoch 10/20
5000/5000 [==============================] - 1s 127us/step - loss: 0.0640 - acc: 0.9946 - val_loss: 0.3287 - val_acc: 0.8668
Epoch 11/20
5000/5000 [==============================] - 0s 98us/step - loss: 0.0484 - acc: 0.9970 - val_loss: 0.3365 - val_acc: 0.8672
Epoch 12/20
5000/5000 [==============================] - 1s 122us/step - loss: 0.0402 - acc: 0.9974 - val_loss: 0.3928 - val_acc: 0.8542
Epoch 13/20
5000/5000 [==============================] - 1s 101us/step - loss: 0.0300 - acc: 0.9986 - val_loss: 0.3823 - val_acc: 0.8550
Epoch 14/20
5000/5000 [==============================] - 1s 121us/step - loss: 0.0214 - acc: 0.9992 - val_loss: 0.3926 - val_acc: 0.8572
Epoch 15/20
5000/5000 [==============================] - 0s 93us/step - loss: 0.0161 - acc: 1.0000 - val_loss: 0.3999 - val_acc: 0.8600
Epoch 16/20
5000/5000 [==============================] - 1s 121us/step - loss: 0.0132 - acc: 1.0000 - val_loss: 0.5017 - val_acc: 0.8406
Epoch 17/20
5000/5000 [==============================] - 0s 95us/step - loss: 0.0096 - acc: 1.0000 - val_loss: 0.4391 - val_acc: 0.8570
Epoch 18/20
5000/5000 [==============================] - 1s 124us/step - loss: 0.0064 - acc: 1.0000 - val_loss: 0.4587 - val_acc: 0.8570
Epoch 19/20
5000/5000 [==============================] - 0s 92us/step - loss: 0.0047 - acc: 1.0000 - val_loss: 0.4774 - val_acc: 0.8584
Epoch 20/20
5000/5000 [==============================] - 1s 123us/step - loss: 0.0040 - acc: 1.0000 - val_loss: 0.6021 - val_acc: 0.8436

Visualizing the results

import matplotlib.pyplot as plt
%matplotlib inline

loss_values = history.history['loss']
val_loss_values = history.history['val_loss']
epochs = range(1,len(loss_values)+1)

plt.plot(epochs, loss_values, 'bo', label='Training loss')       # 'bo' = blue dots
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')  # 'b' = solid blue line
plt.title('Training and validation loss')

plt.xlabel('Epochs')
plt.ylabel('Loss')

plt.legend()  # show the legend

plt.show()

[Figure: training and validation loss over 20 epochs]

plt.clf()  # clear the previous figure
acc = history.history['acc']
val_acc = history.history['val_acc']

plt.plot(epochs,acc,'bo',label = 'Training acc')
plt.plot(epochs,val_acc,'b',label = 'Validation acc')
plt.title('Training and validation accuracy')

plt.xlabel('Epochs')
plt.ylabel('Accuracy')

plt.legend()  # show the legend

plt.show()

[Figure: training and validation accuracy over 20 epochs]

Validation loss and validation accuracy are at their best around epoch 8; after that, training accuracy keeps climbing toward 1.0 while validation loss rises, i.e. the model starts to overfit.
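
Instead of reading the stopping point off the curves by hand, Keras can stop training automatically with an EarlyStopping callback. A sketch (the monitor and patience values here are illustrative choices, not part of the original code):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss',  # watch the validation loss
                           patience=2)          # stop after 2 epochs without improvement

history = model.fit(partial_x, partial_y,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    callbacks=[early_stop])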

Based on this analysis, retrain the model from scratch for the optimal number of epochs (8) and check the result on the test set.

model = models.Sequential()
model.add(layers.Dense(16,activation='relu',input_shape=(10000,)))
model.add(layers.Dense(16,activation='relu'))
model.add(layers.Dense(1,activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(lr=0.001),
             loss='binary_crossentropy',
             metrics=['accuracy'])

model.fit(x_train,y_train,epochs=8,batch_size=512)
results = model.evaluate(x_test,y_test)

Epoch 1/8
25000/25000 [==============================] - 1s 55us/step - loss: 0.4774 - acc: 0.8181
Epoch 2/8
25000/25000 [==============================] - 1s 50us/step - loss: 0.2760 - acc: 0.9049
Epoch 3/8
25000/25000 [==============================] - 1s 47us/step - loss: 0.2119 - acc: 0.9252
Epoch 4/8
25000/25000 [==============================] - 1s 45us/step - loss: 0.1768 - acc: 0.9371
Epoch 5/8
25000/25000 [==============================] - 1s 46us/step - loss: 0.1528 - acc: 0.9471
Epoch 6/8
25000/25000 [==============================] - 1s 46us/step - loss: 0.1351 - acc: 0.9538
Epoch 7/8
25000/25000 [==============================] - 1s 45us/step - loss: 0.1209 - acc: 0.9595
Epoch 8/8
25000/25000 [==============================] - 1s 46us/step - loss: 0.1062 - acc: 0.9647
25000/25000 [==============================] - 3s 108us/step
results
[0.36312115608215334, 0.87224]
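
So the final model reaches a test loss of about 0.363 and a test accuracy of about 87.2%. The trained model can also produce a probability for each review with predict; values close to 0 or 1 are confident predictions:

predictions = model.predict(x_test)
print(predictions[:3])  # probability of a positive review, one row per sample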