Sentiment Analysis with an RNN in TensorFlow 2.x

We use an RNN (recurrent neural network) to classify the sentiment of movie reviews; the result is either positive or negative. An RNN is used instead of an ordinary feed-forward network because it can retain information about the sequence of words, which yields more accurate results.
The RNN model architecture used is as follows:

(Figure: the RNN model architecture)
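Since the diagram itself is not reproduced here, the same stack can also be written as a `keras.Sequential` model. This is only a sketch mirroring the subclassed model in the code below (Embedding → two stacked LSTM layers → a single-unit Dense output producing a logit):

```python
from tensorflow import keras

# Sketch of the architecture in the figure: word indices are embedded,
# passed through two stacked LSTM layers, and reduced to one logit per review.
sketch = keras.Sequential([
    keras.layers.Embedding(10000, 100),            # 10,000-word vocabulary, 100-dim embeddings
    keras.layers.LSTM(64, return_sequences=True),
    keras.layers.LSTM(64),
    keras.layers.Dense(1),                         # raw logit; sigmoid is applied inside the loss
])
```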

Here is the code:

```python
import os
import tensorflow as tf
import numpy as np
from tensorflow import keras


"""在这里我们将使用RNN(循环神经网络)对电影评论进行情感分析,结果为positive或negative,分别代表积极和消极的评论。
至于为什么使用RNN而不是普通的前馈神经网络,是因为RNN能够存储序列单词信息,得到的结果更为准确。这里我们将使用一个带有标签的影评数据集进行训练模型。"""


tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')



# fix random seed for reproducibility
np.random.seed(7)
# load the dataset, keeping only the 10,000 most frequent words
top_words = 10000
# maximum (padded) review length
max_review_length = 80
(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data('F:/PycharmProjects/TensorFlow-2.0-Tutorials/09-RNN-Sentiment-Analysis/data/imdb.npz', num_words=top_words)
# X_train = tf.convert_to_tensor(X_train)
# y_train = tf.one_hot(y_train, depth=2)

# pad all sequences to the same length: 80
x_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=max_review_length)
x_test = keras.preprocessing.sequence.pad_sequences(X_test, maxlen=max_review_length)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)


class RNN(keras.Model):

    def __init__(self, units, num_classes, num_layers):
        super(RNN, self).__init__()


        # (an equivalent alternative: a list of keras.layers.LSTMCell wrapped in keras.layers.RNN)
        # two stacked LSTM layers; the first returns the full sequence so the second can consume it
        self.rnn = keras.layers.LSTM(units, return_sequences=True)
        self.rnn2 = keras.layers.LSTM(units)

        # 10000 words in total, every word is embedded into a 100-dimensional vector;
        # the max sentence length is 80 words
        self.embedding = keras.layers.Embedding(top_words, 100, input_length=max_review_length)
        self.fc = keras.layers.Dense(1)

    def call(self, inputs, training=None, mask=None):

        print('x', inputs.shape)
        """首先,将单词传入embedding层,之所以使用嵌入层,是因为单词数量太多,使用嵌入式方式词向量来表示单词更有效率"""
        # [b, sentence len] => [b, sentence len, word embedding]
        x = self.embedding(inputs)
        """其次,通过embedding层,新的单词表示传入LSTM cells。这将是一个递归链接网络,所以单词的序列信息会在网络之间传递"""
        x = self.rnn(x) 
        x = self.rnn2(x) 
        # print('rnn', x.shape)
        """最后,LSTM cells连接一个sigmoid output layer。使用sigmoid可以预测该文本是积极的还是消极的情感"""
        x = self.fc(x)
        print(x.shape)

        return x


def main():

    units = 64
    num_classes = 2
    batch_size = 32
    epochs = 5

    model = RNN(units, num_classes, num_layers=2)


    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    """fit()用于使用给定输入训练模型."""
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
              validation_data=(x_test, y_test), verbose=1)

    """model.predict只返回y_pred"""
    out = model.predict(x_train)
    print("out:", out)
    """evaluate用于评估您训练的模型。它的输出是准确度或损失,而不是对输入数据的预测。"""
    scores = model.evaluate(x_test, y_test, batch_size, verbose=1)
    print("Final test loss and accuracy :", scores)




if __name__ == '__main__':
    main()
```
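One detail worth noting: because the final `Dense(1)` layer has no activation and the loss is built with `from_logits=True`, `model.predict` returns raw logits rather than probabilities. A minimal sketch of turning those logits into labels (meant to run after training, e.g. right after `model.fit` inside `main()`; the 0.5 threshold is just the usual convention):

```python
logits = model.predict(x_test)                # shape (num_reviews, 1), raw logits
probs = tf.sigmoid(logits)                    # squash the logits into probabilities in [0, 1]
labels = tf.cast(probs > 0.5, tf.int32)       # 1 = positive, 0 = negative
print(labels[:10].numpy().flatten())
```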
The same task can also be done end to end from a CSV file of labelled texts, using Keras' Tokenizer and a SimpleRNN stack:

1. Import the necessary libraries
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
```
2. Load the dataset
```python
df = pd.read_csv('data.csv')
```
3. Preprocess the data
```python
# drop columns that are not needed
df.drop(columns=['id', 'date', 'query', 'user'], inplace=True)

# rename the columns
df.columns = ['sentiment', 'text']

# in the sentiment column, 0 stands for negative and 4 for positive sentiment
df['sentiment'] = df['sentiment'].replace({0: 'negative', 4: 'positive'})

# convert the sentiment values to 0 or 1 (0 = negative, 1 = positive)
df['sentiment'] = df['sentiment'].replace({'negative': 0, 'positive': 1})

# split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['sentiment'], test_size=0.2, random_state=42)

# create a tokenizer that converts text into integer sequences
tokenizer = Tokenizer(num_words=10000, oov_token='<OOV>')
tokenizer.fit_on_texts(X_train)

# convert the training and test texts into integer sequences
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)

# pad the integer sequences so they all have the same length
max_len = 50
X_train_seq = pad_sequences(X_train_seq, maxlen=max_len, padding='post', truncating='post')
X_test_seq = pad_sequences(X_test_seq, maxlen=max_len, padding='post', truncating='post')

# print the shapes of the training and test sets
print(X_train_seq.shape, y_train.shape)
print(X_test_seq.shape, y_test.shape)
```
4. Build the RNN model (an LSTM variant is sketched after this list)
```python
model = keras.Sequential([
    keras.layers.Embedding(input_dim=10000, output_dim=32, input_length=max_len),
    keras.layers.SimpleRNN(units=32, return_sequences=True),
    keras.layers.SimpleRNN(units=32),
    keras.layers.Dense(units=1, activation='sigmoid')
])

model.summary()
```
5. Compile and train the model
```python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(X_train_seq, y_train, validation_split=0.2, epochs=5, batch_size=128)
```
6. Evaluate the model
```python
# plot the training/validation accuracy and loss curves
plt.plot(history.history['accuracy'], label='train_acc')
plt.plot(history.history['val_accuracy'], label='val_acc')
plt.plot(history.history['loss'], label='train_loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.legend()
plt.show()

# evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test_seq, y_test)
print('Test Accuracy:', test_acc)
```
7. Make predictions
```python
# classify the sentiment of a piece of text
text = "I hate this movie, it's so boring!"
text_seq = tokenizer.texts_to_sequences([text])
text_seq = pad_sequences(text_seq, maxlen=max_len, padding='post', truncating='post')
pred = model.predict(text_seq)
sentiment = 'positive' if pred > 0.5 else 'negative'
print('Text:', text)
print('Sentiment:', sentiment)
```
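If you want this second pipeline to use the same kind of recurrent layers as the subclassed model in the first part of the post, the `Sequential` definition from step 4 can be swapped for an LSTM stack. This is only a sketch, keeping the layer sizes from step 4 and assuming `max_len` from step 3; everything else stays unchanged:

```python
# Same pipeline as step 4, but with LSTM layers instead of SimpleRNN.
model = keras.Sequential([
    keras.layers.Embedding(input_dim=10000, output_dim=32, input_length=max_len),
    keras.layers.LSTM(units=32, return_sequences=True),
    keras.layers.LSTM(units=32),
    keras.layers.Dense(units=1, activation='sigmoid')
])
```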
