[Python · Into NLP] A Keras sentiment analysis example

Sentiment analysis is an important direction in natural language processing; its goal is to let computers understand the sentiment expressed in text. Here we use the IMDB movie review dataset to decide whether a review considers a movie good or bad, and use this task to explore the sentiment analysis problem.

1. We load the data directly with Keras's imdb.load_data() function, as the short check below shows.
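
As a quick look at what imdb.load_data() returns, the sketch below prints the first review as word indices and decodes it back to text; the offset of 3 follows the reserved indices in the Keras IMDB dataset (0 = padding, 1 = start of sequence, 2 = unknown).

from keras.datasets import imdb

# Each review is a list of word indices; each label is 0 (negative) or 1 (positive)
(x_train, y_train), _ = imdb.load_data(num_words=5000)
print(len(x_train), 'training reviews')
print(x_train[0][:10], y_train[0])

# Decode a review back to text; indices are offset by 3 because of the reserved tokens
word_index = imdb.get_word_index()
index_word = {i + 3: w for w, i in word_index.items()}
print(' '.join(index_word.get(i, '?') for i in x_train[0][:30]))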

2. Keras converts the integer representation of words into word embeddings through an embedding layer (Embedding). The embedding layer needs to be given the expected maximum vocabulary size and the dimension of each output word vector, as in the small sketch below.
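
To make the shapes concrete, here is a minimal standalone sketch (separate from the model below) of an embedding layer mapping a batch of index sequences to word vectors:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# Vocabulary of 5000 words, 32-dimensional embeddings, sequences of length 500
demo = Sequential()
demo.add(Embedding(input_dim=5000, output_dim=32, input_length=500))
demo.compile(optimizer='rmsprop', loss='mse')

# A batch of 2 toy "reviews" made of random word indices
batch = np.random.randint(0, 5000, size=(2, 500))
print(demo.predict(batch).shape)  # (2, 500, 32): one 32-dimensional vector per word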


# -*- coding: utf-8 -*-
from keras.datasets import imdb
import numpy as np
from keras.preprocessing import sequence
from keras.layers import Dense, Flatten, Embedding, Conv1D, MaxPooling1D
from keras.models import Sequential

seed = 7
top_words = 5000
max_words = 500
out_dimension = 32
batch_size = 128
epochs = 10

def create_model():
    model = Sequential()
    # Embedding layer
    model.add(Embedding(top_words, out_dimension, input_length=max_words))
    # 1D convolutional layer
    model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(250, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model

if __name__ == '__main__':
    np.random.seed(seed=seed)
    # Load the data
    (x_train, y_train), (x_validation, y_validation) = imdb.load_data(num_words=top_words)
    # Pad/truncate each review to a fixed length
    x_train = sequence.pad_sequences(x_train, maxlen=max_words)
    x_validation = sequence.pad_sequences(x_validation, maxlen=max_words)

    # Build and train the model
    model = create_model()
    model.fit(x_train, y_train, validation_data=(x_validation, y_validation),
              batch_size=batch_size, epochs=epochs, verbose=2)

Output:

Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 500, 32)           3104      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 250, 32)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 8000)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 250)               2000250   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 251       
=================================================================
Total params: 2,163,605
Trainable params: 2,163,605
Non-trainable params: 0
_________________________________________________________________
Train on 25000 samples, validate on 25000 samples
Epoch 1/200
 - 31s - loss: 0.4808 - acc: 0.7374 - val_loss: 0.2800 - val_acc: 0.8843
Epoch 2/200
 - 31s - loss: 0.2234 - acc: 0.9118 - val_loss: 0.2727 - val_acc: 0.8858
Epoch 3/200
 - 33s - loss: 0.1737 - acc: 0.9339 - val_loss: 0.2918 - val_acc: 0.8807
Epoch 4/200
 - 33s - loss: 0.1293 - acc: 0.9540 - val_loss: 0.3168 - val_acc: 0.8777
Epoch 5/200
 - 35s - loss: 0.0841 - acc: 0.9744 - val_loss: 0.3721 - val_acc: 0.8751
Epoch 6/200
 - 33s - loss: 0.0450 - acc: 0.9904 - val_loss: 0.4340 - val_acc: 0.8730
Epoch 7/200
 - 32s - loss: 0.0212 - acc: 0.9966 - val_loss: 0.5029 - val_acc: 0.8703
Epoch 8/200
 - 31s - loss: 0.0085 - acc: 0.9993 - val_loss: 0.5897 - val_acc: 0.8688
Epoch 9/200
 - 31s - loss: 0.0027 - acc: 0.9998 - val_loss: 0.6597 - val_acc: 0.8694
Epoch 10/200
 - 31s - loss: 0.0013 - acc: 0.9999 - val_loss: 0.7108 - val_acc: 0.8697
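
After training, the model can be used to score a new review. The sketch below is one possible way to do it, assuming it runs in the same session as the script above; the review string is made up, and the encoding follows the same word-index convention as imdb.load_data().

# Score a hypothetical new review with the trained model
word_index = imdb.get_word_index()
review = "this movie was wonderful and the acting was great"
encoded = []
for w in review.lower().split():
    i = word_index.get(w)
    # indices are offset by 3; unknown or out-of-vocabulary words map to index 2
    encoded.append(i + 3 if i is not None and i + 3 < top_words else 2)
padded = sequence.pad_sequences([encoded], maxlen=max_words)
print(model.predict(padded))  # close to 1.0 means predicted positive, close to 0.0 negative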