Movie Review Classification with TensorFlow 2.3

The IMDB dataset is bundled with Keras. The first load downloads it; after that it can be used directly.

The IMDB dataset contains 50,000 highly polarized reviews from the internet, split into 25,000 reviews for training and 25,000 for testing; both the training set and the test set contain 50% positive and 50% negative reviews. The dataset has already been preprocessed: the reviews (sequences of words) have been converted into sequences of integers, where each integer stands for a specific word in a dictionary.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

Check the TensorFlow version:

print('Tensorflow version: {}'.format(tf.__version__))
Tensorflow version: 2.3

Get a handle to the built-in dataset:

data = keras.datasets.imdb

Data preprocessing: set the maximum vocabulary size (the number of distinct words to keep):

max_word = 10000

Read the data with load_data:

(x_train, y_train), (x_test, y_test) = data.load_data(num_words=max_word)

num_words=10000 means that only the 10,000 most frequently occurring words in the training data are kept; lower-frequency words are discarded. This keeps the resulting vector data at a manageable size.
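
As a quick sanity check, no encoded word index should reach that cap; a minimal sketch:

# Largest word index anywhere in the training data; should be below num_words.
print(max(max(sequence) for sequence in x_train))  # expected: 9999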

x_train.shape, y_train.shape
((25000,), (25000,))
x_train[0]
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
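
The integers can be mapped back to words with the index returned by get_word_index(). A minimal sketch; per the Keras IMDB loader's documented encoding, indices are offset by 3 because 0, 1 and 2 are reserved for padding, start-of-sequence and unknown tokens:

word_index = data.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}
# Offset of 3: indices 0, 1, 2 are reserved markers, not real words.
decoded = ' '.join(reverse_word_index.get(i - 3, '?') for i in x_train[0])
print(decoded[:60])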

Normalize every review to a fixed length of 300: shorter sequences are padded, longer ones are truncated.

x_train = keras.preprocessing.sequence.pad_sequences(x_train, 300)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, 300)
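
The calls above rely on pad_sequences' defaults: zeros are padded at the front of short sequences, and long sequences are truncated from the front. A toy sketch of that behavior:

demo = keras.preprocessing.sequence.pad_sequences([[1, 2], [1, 2, 3, 4]], maxlen=3)
print(demo)
# [[0 1 2]
#  [2 3 4]]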

Build the model:

model = keras.models.Sequential()
# Map each of the 10,000 word indices to a 50-dimensional vector.
model.add(layers.Embedding(10000, 50, input_length=300))
# model.add(layers.Flatten())  # alternative: flatten to one 300*50 vector
# Average the embeddings over the 300 time steps -> (None, 50).
model.add(layers.GlobalAveragePooling1D())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.5))
# A single sigmoid unit outputs the probability that the review is positive.
model.add(layers.Dense(1, activation='sigmoid'))
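
GlobalAveragePooling1D collapses the 300 time steps by averaging the embedding vectors, turning (None, 300, 50) into (None, 50) with no extra parameters, whereas the commented-out Flatten would produce a 15,000-dimensional vector. A minimal sketch on random data:

demo_input = tf.random.normal((2, 300, 50))            # (batch, steps, features)
demo_output = layers.GlobalAveragePooling1D()(demo_input)
print(demo_output.shape)                               # (2, 50)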

Compile and train the model

We need to choose a loss function and an optimizer. Since this is a binary classification problem and the network outputs a single probability, binary_crossentropy (binary cross-entropy) is the natural choice. For models that output probabilities, cross-entropy is usually the best loss.
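
For a single predicted probability p and true label y, binary cross-entropy is -(y*log(p) + (1-y)*log(1-p)), averaged over the examples. A minimal sketch of what the loss computes:

bce = tf.keras.losses.binary_crossentropy([1., 0., 1.], [0.9, 0.2, 0.6])
print(bce.numpy())  # mean of the three per-example cross-entropy terms, ~0.28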

model.summary()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['acc'])
history = model.fit(x_train, y_train, epochs=15, batch_size=256, validation_data=(x_test, y_test))
plt.plot(history.epoch, history.history.get('acc'), label='acc')
plt.plot(history.epoch, history.history.get('val_acc'), label='val_acc')
plt.legend()
plt.show()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 300, 50)           500000    
_________________________________________________________________
global_average_pooling1d (Gl (None, 50)                0         
_________________________________________________________________
dense (Dense)                (None, 128)               6528      
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
=================================================================
Total params: 506,657
Trainable params: 506,657
Non-trainable params: 0
_________________________________________________________________
Epoch 1/15
98/98 [==============================] - 4s 26ms/step - loss: 0.6886 - acc: 0.5772 - val_loss: 0.6055 - val_acc: 0.7828
Epoch 2/15
98/98 [==============================] - 2s 22ms/step - loss: 0.5216 - acc: 0.8064 - val_loss: 0.3599 - val_acc: 0.8545
Epoch 3/15
98/98 [==============================] - 2s 20ms/step - loss: 0.3041 - acc: 0.8826 - val_loss: 0.3047 - val_acc: 0.8738
Epoch 4/15
98/98 [==============================] - 2s 21ms/step - loss: 0.2397 - acc: 0.9116 - val_loss: 0.2845 - val_acc: 0.8842
Epoch 5/15
98/98 [==============================] - 2s 21ms/step - loss: 0.2011 - acc: 0.9271 - val_loss: 0.2807 - val_acc: 0.8847
Epoch 6/15
98/98 [==============================] - 2s 21ms/step - loss: 0.1780 - acc: 0.9349 - val_loss: 0.2849 - val_acc: 0.8840
Epoch 7/15
98/98 [==============================] - 2s 20ms/step - loss: 0.1552 - acc: 0.9477 - val_loss: 0.2961 - val_acc: 0.8821
Epoch 8/15
98/98 [==============================] - 2s 20ms/step - loss: 0.1400 - acc: 0.9523 - val_loss: 0.3057 - val_acc: 0.8803
Epoch 9/15
98/98 [==============================] - 2s 21ms/step - loss: 0.1258 - acc: 0.9591 - val_loss: 0.3235 - val_acc: 0.8764
Epoch 10/15
98/98 [==============================] - 2s 20ms/step - loss: 0.1172 - acc: 0.9631 - val_loss: 0.3445 - val_acc: 0.8727
Epoch 11/15
98/98 [==============================] - 2s 21ms/step - loss: 0.1050 - acc: 0.9667 - val_loss: 0.3538 - val_acc: 0.8736
Epoch 12/15
98/98 [==============================] - 2s 20ms/step - loss: 0.0961 - acc: 0.9728 - val_loss: 0.3691 - val_acc: 0.8704
Epoch 13/15
98/98 [==============================] - 2s 20ms/step - loss: 0.0887 - acc: 0.9754 - val_loss: 0.3950 - val_acc: 0.8670
Epoch 14/15
98/98 [==============================] - 2s 21ms/step - loss: 0.0820 - acc: 0.9787 - val_loss: 0.4239 - val_acc: 0.8628
Epoch 15/15
98/98 [==============================] - 2s 20ms/step - loss: 0.0743 - acc: 0.9799 - val_loss: 0.4447 - val_acc: 0.8612

[Figure: training accuracy (acc) vs. validation accuracy (val_acc) over the 15 epochs]
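
The curves show classic overfitting: training accuracy keeps climbing while validation accuracy peaks around epoch 5 (val_acc 0.8847) and then slowly declines. One common remedy, sketched here with Keras' EarlyStopping callback, is to stop once val_acc stops improving and keep the best weights:

early_stop = keras.callbacks.EarlyStopping(monitor='val_acc', patience=3, restore_best_weights=True)
history = model.fit(x_train, y_train, epochs=15, batch_size=256, validation_data=(x_test, y_test), callbacks=[early_stop])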
