Binary classification - IMDB - fully connected network
The IMDB dataset contains 50,000 highly polarized reviews from the Internet Movie Database.
The data has already been preprocessed: each review (a sequence of words) has been turned into a sequence of integers, where each integer stands for a specific word in a dictionary.
Loading the IMDB dataset
from keras.datasets import imdb
(train_data,train_labels),(test_data,test_labels) = imdb.load_data(num_words=10000)  # num_words=10000: keep only the 10,000 most frequent words in the training data
Using TensorFlow backend.
The data looks like this:
train_data[0]
[1,14,22,16,........,19,178,32]
train_labels[0]  # 0 stands for negative, 1 for positive
1
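Each integer indexes into a word dictionary, which Keras exposes via imdb.get_word_index(). The decoding step can be sketched with a tiny made-up index (the real one maps tens of thousands of words; indices 0-2 are reserved for padding/start-of-sequence/unknown, hence the offset of 3):

```python
# made-up fragment of a word index: word -> integer rank (hypothetical values)
word_index = {'the': 1, 'movie': 14, 'was': 19, 'great': 22}

# invert it: integer -> word
reverse_word_index = {v: k for k, v in word_index.items()}

# indices in the encoded reviews are offset by 3 (0=padding, 1=start, 2=unknown)
encoded = [1, 17, 22, 25]  # hypothetical review fragment
decoded = ' '.join(reverse_word_index.get(i - 3, '?') for i in encoded)
print(decoded)  # → ? movie was great
```

The same `' '.join(...)` pattern works with the real dictionary returned by imdb.get_word_index().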
Encoding the integer sequences into a binary matrix
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))  # all-zero matrix of shape (len(sequences), dimension)
    for i, sequence in enumerate(sequences):  # use `sequence`, not `sequences`, to avoid shadowing the argument
        results[i, sequence] = 1.  # set the indices of the words present in this review to 1
    return results
x_train = vectorize_sequences(train_data)  # vectorize the data
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
The data now looks like this:
x_train[0]
array([0., 1., 1., ..., 0., 0., 0.])
y_train[0]
1.0
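The effect of vectorize_sequences can be checked on a toy example; a minimal self-contained sketch with a small `dimension` and made-up sequences:

```python
import numpy as np

def vectorize_sequences(sequences, dimension=8):
    # multi-hot encoding: one row per sequence, a 1 at every word index it contains
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

toy = [[1, 3, 3], [0, 7]]  # hypothetical integer sequences
encoded = vectorize_sequences(toy)
print(encoded)
# row 0 has 1s at columns 1 and 3 (the duplicate 3 still produces a single 1);
# row 1 has 1s at columns 0 and 7
```

Note that word order and word counts are discarded: only presence/absence of each word survives the encoding.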
Model definition
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(16,activation='relu',input_shape=(10000,)))
model.add(layers.Dense(16,activation='relu'))
model.add(layers.Dense(1,activation='sigmoid'))
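To see how the shapes flow through this stack and why the single sigmoid unit yields a probability, the forward pass can be sketched by hand in NumPy (the weights below are random placeholders with the shapes Keras would create, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 10000))  # a batch of 4 multi-hot review vectors

# random placeholder weights, shaped like the three Dense layers above
W1, b1 = rng.normal(0, 0.01, (10000, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.01, (16, 16)), np.zeros(16)
W3, b3 = rng.normal(0, 0.01, (16, 1)), np.zeros(1)

relu = lambda z: np.maximum(z, 0)           # zeroes out negative activations
sigmoid = lambda z: 1 / (1 + np.exp(-z))    # squashes into (0, 1)

h1 = relu(x @ W1 + b1)   # (4, 16)
h2 = relu(h1 @ W2 + b2)  # (4, 16)
p = sigmoid(h2 @ W3 + b3)  # (4, 1): one probability per review
print(p.shape)  # → (4, 1)
```

Each output lies strictly between 0 and 1 and can be read directly as the probability of a positive review.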
Compiling the model
from keras import optimizers
model.compile(optimizer=optimizers.RMSprop(lr=0.001),  # equivalent: optimizer='rmsprop'
              loss='binary_crossentropy',  # binary crossentropy; equivalent: loss=losses.binary_crossentropy (with `from keras import losses`)
              metrics=['accuracy'])
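binary_crossentropy can be written out in a few lines to make the loss concrete; a minimal NumPy sketch (a simplified stand-in, not the Keras implementation itself):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # mean of -[y*log(p) + (1-y)*log(1-p)], clipped for numerical stability
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1., 0., 1.])
good = binary_crossentropy(y_true, np.array([0.9, 0.1, 0.8]))  # confident, correct
bad = binary_crossentropy(y_true, np.array([0.5, 0.5, 0.5]))   # uninformative
print(good, bad)  # confident correct predictions give a much lower loss
```

The loss penalizes a confident wrong prediction far more heavily than an unsure one, which is exactly the gradient signal the sigmoid output needs.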
Setting aside a validation set
x_val = x_train[:5000]  # validation set
partial_x = x_train[20000:]  # the 5,000 training samples actually fed to the model
y_val = y_train[:5000]
partial_y = y_train[20000:]
Training the model
Train for 20 epochs in mini-batches of 512 samples, monitoring the loss and accuracy on both the training and validation data
history = model.fit(partial_x,
                    partial_y,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val,y_val))
Train on 5000 samples, validate on 5000 samples
Epoch 1/20
5000/5000 [==============================] - 1s 160us/step - loss: 0.6254 - acc: 0.6744 - val_loss: 0.5463 - val_acc: 0.7770
Epoch 2/20
5000/5000 [==============================] - 1s 127us/step - loss: 0.4486 - acc: 0.8744 - val_loss: 0.4641 - val_acc: 0.8222
Epoch 3/20
5000/5000 [==============================] - 1s 101us/step - loss: 0.3429 - acc: 0.9172 - val_loss: 0.3895 - val_acc: 0.8602
Epoch 4/20
5000/5000 [==============================] - 1s 126us/step - loss: 0.2680 - acc: 0.9366 - val_loss: 0.4053 - val_acc: 0.8276
Epoch 5/20
5000/5000 [==============================] - 1s 104us/step - loss: 0.2055 - acc: 0.9612 - val_loss: 0.3356 - val_acc: 0.8658
Epoch 6/20
5000/5000 [==============================] - 1s 127us/step - loss: 0.1692 - acc: 0.9678 - val_loss: 0.3563 - val_acc: 0.8516
Epoch 7/20
5000/5000 [==============================] - 0s 95us/step - loss: 0.1319 - acc: 0.9784 - val_loss: 0.3378 - val_acc: 0.8592
Epoch 8/20
5000/5000 [==============================] - 1s 117us/step - loss: 0.1030 - acc: 0.9878 - val_loss: 0.3214 - val_acc: 0.8666
Epoch 9/20
5000/5000 [==============================] - 0s 95us/step - loss: 0.0845 - acc: 0.9896 - val_loss: 0.3687 - val_acc: 0.8542
Epoch 10/20
5000/5000 [==============================] - 1s 127us/step - loss: 0.0640 - acc: 0.9946 - val_loss: 0.3287 - val_acc: 0.8668
Epoch 11/20
5000/5000 [==============================] - 0s 98us/step - loss: 0.0484 - acc: 0.9970 - val_loss: 0.3365 - val_acc: 0.8672
Epoch 12/20
5000/5000 [==============================] - 1s 122us/step - loss: 0.0402 - acc: 0.9974 - val_loss: 0.3928 - val_acc: 0.8542
Epoch 13/20
5000/5000 [==============================] - 1s 101us/step - loss: 0.0300 - acc: 0.9986 - val_loss: 0.3823 - val_acc: 0.8550
Epoch 14/20
5000/5000 [==============================] - 1s 121us/step - loss: 0.0214 - acc: 0.9992 - val_loss: 0.3926 - val_acc: 0.8572
Epoch 15/20
5000/5000 [==============================] - 0s 93us/step - loss: 0.0161 - acc: 1.0000 - val_loss: 0.3999 - val_acc: 0.8600
Epoch 16/20
5000/5000 [==============================] - 1s 121us/step - loss: 0.0132 - acc: 1.0000 - val_loss: 0.5017 - val_acc: 0.8406
Epoch 17/20
5000/5000 [==============================] - 0s 95us/step - loss: 0.0096 - acc: 1.0000 - val_loss: 0.4391 - val_acc: 0.8570
Epoch 18/20
5000/5000 [==============================] - 1s 124us/step - loss: 0.0064 - acc: 1.0000 - val_loss: 0.4587 - val_acc: 0.8570
Epoch 19/20
5000/5000 [==============================] - 0s 92us/step - loss: 0.0047 - acc: 1.0000 - val_loss: 0.4774 - val_acc: 0.8584
Epoch 20/20
5000/5000 [==============================] - 1s 123us/step - loss: 0.0040 - acc: 1.0000 - val_loss: 0.6021 - val_acc: 0.8436
Visualizing the results
import matplotlib.pyplot as plt
%matplotlib inline
loss_values = history.history['loss']
val_loss_values = history.history['val_loss']
epochs = range(1,len(loss_values)+1)
plt.plot(epochs,loss_values,'bo',label = 'Training loss')
plt.plot(epochs,val_loss_values,'b',label = 'Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()  # show the legend
plt.show()
plt.clf()  # clear the figure before the accuracy plot
acc = history.history['acc']
val_acc = history.history['val_acc']
plt.plot(epochs,acc,'bo',label = 'Training acc')
plt.plot(epochs,val_acc,'b',label = 'Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()  # show the legend
plt.show()
The validation loss and validation accuracy are at their best around epoch 8.
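The best epoch can also be read off programmatically rather than eyeballed from the curves; a minimal sketch over the val_loss history, with values abbreviated to the first 10 epochs of the log above:

```python
# history.history['val_loss'] from the run above, first 10 epochs
val_loss = [0.5463, 0.4641, 0.3895, 0.4053, 0.3356,
            0.3563, 0.3378, 0.3214, 0.3687, 0.3287]

# epoch numbers are 1-based, list indices 0-based
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch)  # → 8, matching the analysis above
```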
Based on this analysis, retrain from scratch with this optimal number of epochs and check the performance on the test set
model = models.Sequential()
model.add(layers.Dense(16,activation='relu',input_shape=(10000,)))
model.add(layers.Dense(16,activation='relu'))
model.add(layers.Dense(1,activation='sigmoid'))
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(x_train,y_train,epochs=8,batch_size=512)
results = model.evaluate(x_test,y_test)
Epoch 1/8
25000/25000 [==============================] - 1s 55us/step - loss: 0.4774 - acc: 0.8181
Epoch 2/8
25000/25000 [==============================] - 1s 50us/step - loss: 0.2760 - acc: 0.9049
Epoch 3/8
25000/25000 [==============================] - 1s 47us/step - loss: 0.2119 - acc: 0.9252
Epoch 4/8
25000/25000 [==============================] - 1s 45us/step - loss: 0.1768 - acc: 0.9371
Epoch 5/8
25000/25000 [==============================] - 1s 46us/step - loss: 0.1528 - acc: 0.9471
Epoch 6/8
25000/25000 [==============================] - 1s 46us/step - loss: 0.1351 - acc: 0.9538
Epoch 7/8
25000/25000 [==============================] - 1s 45us/step - loss: 0.1209 - acc: 0.9595
Epoch 8/8
25000/25000 [==============================] - 1s 46us/step - loss: 0.1062 - acc: 0.9647
25000/25000 [==============================] - 3s 108us/step
results
[0.36312115608215334, 0.87224]
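Beyond the ~87% test accuracy above, model.predict(x_test) returns one probability per review; turning those into class labels is a simple threshold at 0.5. A minimal sketch on made-up probabilities (predict itself needs the trained model and data):

```python
import numpy as np

# hypothetical sigmoid outputs, standing in for model.predict(x_test)
probs = np.array([[0.92], [0.13], [0.55], [0.49]])

# threshold at 0.5: 1 = positive review, 0 = negative review
labels = (probs > 0.5).astype('int32').ravel()
print(labels)  # → [1 0 1 0]
```

Values near 0.5 (such as 0.55 and 0.49 here) are the reviews the model is least sure about.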