2.1 TextCNN - Binary Sentiment Classification - Convolutional Neural Networks for Text Classification

Meruz

Category: NLP. Tags: NLP, TextCNN

Article link: https://blog.csdn.net/weixin_43002202/article/details/95173643

Reference: Convolutional Neural Networks for Sentence Classification (2014)

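The idea is to apply a convolutional neural network (CNN) to text classification: several kernels of different sizes extract the key information in a sentence (much like n-grams with multiple window sizes), which captures local correlations better.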

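TextCNN is an algorithm that classifies text with a convolutional neural network. It was proposed by Yoon Kim in 2014 in the paper "Convolutional Neural Networks for Sentence Classification". Each word in a sentence is represented by a k-dimensional vector.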

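Unlike an image filter, a text filter cannot slide left and right; it can only slide up and down. The reason is simple: each row of the input matrix is the vector of one word, and a word should not be split apart during training. If a filter covered only part of a word vector, the convolution output would carry little meaning. A filter applied to text therefore spans the full embedding width and slides only along the word axis. The architecture figure from the original post is described layer by layer in the next section.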
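A minimal sketch of this constraint in plain Python, assuming a toy sentence of 4 words with 3-dimensional word vectors. The values and the helper name `conv_over_words` are illustrative and do not come from the original post. Because the kernel is as wide as the embedding, the only direction left to move in is along the word axis:

```python
def conv_over_words(sentence, kernel):
    """Slide a kernel of shape (h, k) down a sentence matrix of shape (n, k).

    The kernel spans the full embedding width k, so it only moves along
    the word axis and yields n - h + 1 values (stride 1, no padding).
    """
    n, h = len(sentence), len(kernel)
    feature_map = []
    for start in range(n - h + 1):
        window = sentence[start:start + h]
        total = sum(
            w * x
            for kernel_row, word_vec in zip(kernel, window)
            for w, x in zip(kernel_row, word_vec)
        )
        feature_map.append(total)
    return feature_map


# Toy data: 4 words, 3-dimensional embeddings (made-up values).
sentence = [
    [1, 0, 2],
    [0, 1, 1],
    [2, 1, 0],
    [1, 1, 1],
]
# Kernel height 2 corresponds to a bigram window.
kernel = [
    [1, 0, 1],
    [0, 1, 0],
]

feature_map = conv_over_words(sentence, kernel)
print(feature_map)  # [4, 2, 3]
```

Each output value summarizes one window of two whole words, which is why the result reads like a score per bigram.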

Structure in detail

Layer 1

The first layer is the input: a 7×5 word-vector matrix, that is, 7 words with a word-vector dimension of 5.

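Layer 2

The second layer is the convolutional layer. It has 6 kernels of sizes 2×5, 3×5, and 4×5, two of each size. The input is convolved with each of the 6 kernels and the result is passed through an activation function, so every kernel produces its own feature map.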
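The sizes in this layer can be checked with a little arithmetic. The sketch below assumes stride 1, no padding, and one bias term per kernel; the bias is an assumption, since the description above does not mention it.

```python
sequence_length = 7        # words in the example sentence
embedding_dim = 5          # every kernel is this wide
kernel_heights = [2, 3, 4]
kernels_per_height = 2

# A kernel of height h fits into n words at n - h + 1 positions.
feature_map_lengths = {h: sequence_length - h + 1 for h in kernel_heights}
total_kernels = len(kernel_heights) * kernels_per_height

# h * embedding_dim weights plus one (assumed) bias per kernel.
conv_params = sum(
    (h * embedding_dim + 1) * kernels_per_height for h in kernel_heights
)

print(feature_map_lengths)  # {2: 6, 3: 5, 4: 4}
print(total_kernels)        # 6
print(conv_params)          # 96
```

So the six feature maps have lengths 6, 6, 5, 5, 4, and 4 before pooling.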

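Layer 3

The third layer is the pooling layer. 1-max pooling takes the maximum of each feature map, and the results are concatenated into a 6-dimensional feature representation.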
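As a rough illustration, 1-max pooling followed by concatenation reduces feature maps of different lengths to one fixed-length vector. The activation values below are made up for the example:

```python
# Six feature maps, two per kernel height (illustrative values).
feature_maps = [
    [0.1, 0.7, 0.3, 0.0, 0.2, 0.5],  # height 2 -> length 6
    [0.4, 0.0, 0.9, 0.1, 0.3, 0.2],  # height 2 -> length 6
    [0.6, 0.2, 0.1, 0.0, 0.3],       # height 3 -> length 5
    [0.0, 0.0, 0.8, 0.5, 0.1],       # height 3 -> length 5
    [0.3, 0.2, 0.1, 0.4],            # height 4 -> length 4
    [1.2, 0.0, 0.7, 0.1],            # height 4 -> length 4
]

# 1-max pooling keeps one value per feature map; putting those values
# side by side gives a vector with one entry per kernel.
sentence_vector = [max(feature_map) for feature_map in feature_maps]
print(sentence_vector)  # [0.7, 0.9, 0.6, 0.8, 0.4, 1.2]
```

The length of the result depends only on the number of kernels, not on the sentence length, which is what lets a fixed-size output layer follow.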

Layer 4

The fourth layer is the output layer, which classifies with a softmax activation. Regularization (L2 regularization) can be applied at this layer.
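A small sketch of what the output layer computes, assuming a 6-dimensional pooled vector, two classes, and a hypothetical regularization strength `lam`. All numbers are illustrative, and the L2 term is shown as a penalty that would be added to the loss:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_penalty(weights, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for row in weights for w in row)


# Illustrative values: pooled features (6) and a 6x2 weight matrix.
features = [0.7, 0.9, 0.6, 0.8, 0.4, 1.2]
weights = [
    [0.5, -0.5],
    [0.1, 0.3],
    [-0.2, 0.4],
    [0.0, 0.2],
    [0.3, -0.1],
    [0.6, -0.6],
]
bias = [0.0, 0.0]

# logits[c] = features . weights[:, c] + bias[c]
logits = [
    sum(f * row[c] for f, row in zip(features, weights)) + bias[c]
    for c in range(2)
]
probs = softmax(logits)
penalty = l2_penalty(weights, lam=0.01)
print(probs, penalty)
```

The class with the larger logit receives the larger probability, and the penalty grows with the magnitude of the output weights.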

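Details

feature

The features here are the word vectors, which can be static or non-static. Static vectors are pre-trained and kept fixed; non-static vectors are updated during training. The non-static fine-tuning approach is generally recommended: initialize with pre-trained word vectors and adjust them during training, which speeds up convergence.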
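The difference between the two modes comes down to whether the optimizer is allowed to touch the embedding table. The sketch below uses a plain SGD step and a hypothetical `trainable` flag to show the idea; the vectors and gradients are made up, and a real framework would handle this through its own variable settings.

```python
def sgd_step(embedding, grads, lr, trainable):
    """Return the embedding table after one SGD step.

    Static (trainable=False): the pre-trained vectors are returned unchanged.
    Non-static (trainable=True): every vector moves against its gradient.
    """
    if not trainable:
        return [row[:] for row in embedding]
    return [
        [w - lr * g for w, g in zip(row, grad_row)]
        for row, grad_row in zip(embedding, grads)
    ]


# Illustrative pre-trained vectors for a 2-word vocabulary, with made-up gradients.
pretrained = [[0.5, -0.5], [1.0, 0.0]]
grads = [[0.1, 0.2], [0.0, -0.4]]

static_table = sgd_step(pretrained, grads, lr=0.5, trainable=False)
fine_tuned_table = sgd_step(pretrained, grads, lr=0.5, trainable=True)
print(static_table)
print(fine_tuned_table)
```

In both modes the table starts from the same pre-trained values; only the fine-tuned one drifts toward the task.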

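channel

In images, (R, G, B) serve as the channels. For text, the input channels are usually different embeddings (for example word2vec or GloVe); in practice, static word vectors and fine-tuned word vectors are also used as separate channels.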
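One way to picture a two-channel input is as two sentence matrices of the same shape, one per embedding. The sketch below follows the usual multi-channel convolution convention, where each channel has its own kernel slice and the per-channel results are summed at every position. The helper names and all values are illustrative assumptions.

```python
def window_score(window, kernel):
    """Sum of element-wise products between a window and a kernel slice."""
    return sum(
        w * x
        for kernel_row, word_vec in zip(kernel, window)
        for w, x in zip(kernel_row, word_vec)
    )


def multichannel_conv(channels, kernels):
    """Convolve along the word axis, summing the contribution of each channel."""
    n, h = len(channels[0]), len(kernels[0])
    return [
        sum(
            window_score(channel[start:start + h], kernel)
            for channel, kernel in zip(channels, kernels)
        )
        for start in range(n - h + 1)
    ]


# Two embeddings of the same 3-word sentence (made-up 2-dimensional vectors).
channel_a = [[1, 0], [0, 1], [1, 1]]  # e.g. one pre-trained embedding
channel_b = [[0, 2], [1, 0], [2, 1]]  # e.g. a second embedding
kernel_a = [[1, 1], [1, 1]]
kernel_b = [[1, 0], [0, 1]]

feature_map = multichannel_conv([channel_a, channel_b], [kernel_a, kernel_b])
print(feature_map)  # [2, 5]
```

The feature map has the same length as in the single-channel case; the extra channel only changes what goes into each value.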

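conv-1d

TextCNN uses one-dimensional convolution (conv-1d): the kernel moves in one direction only, along the sequence. The consequence is that filters of several different sizes have to be designed in order to obtain receptive fields of different widths.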

1-max pooling

TextCNN uses 1-max pooling. (Dynamic) k-max pooling is also an option: it keeps the k largest values in the pooling stage and so retains more global information.
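A minimal sketch of k-max pooling with a fixed k, assuming the common formulation in which the k largest activations are kept in their original order. The dynamic variant, where k depends on sentence length, is not shown, and the helper name and values are illustrative.

```python
def k_max_pooling(feature_map, k):
    """Keep the k largest values of a feature map, preserving their order."""
    top_indices = sorted(
        range(len(feature_map)),
        key=lambda i: feature_map[i],
        reverse=True,
    )[:k]
    return [feature_map[i] for i in sorted(top_indices)]


feature_map = [0.1, 0.9, 0.3, 0.7, 0.2]

print(k_max_pooling(feature_map, 1))  # [0.9]            (same as 1-max pooling)
print(k_max_pooling(feature_map, 2))  # [0.9, 0.7]
print(k_max_pooling(feature_map, 3))  # [0.9, 0.3, 0.7]
```

With k = 1 this reduces to the 1-max pooling used in TextCNN; larger k keeps some information about where strong activations occur relative to each other.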

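Parameter settings

Sequence length: usually set to the length of the longest sentence
Number of classes: the number of classes to predict
Vocabulary size: the number of words in the vocabulary
Embedding size: the length of the vector that represents each word; word vectors can be trained with tools such as word2vec, fastText, or GloVe
Kernel size: corresponds to the n in n-gram
Number of kernels: how many kernels are used for each kernel size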
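The first few settings can be derived from the data. The sketch below assumes whitespace tokenization and a hypothetical `<pad>` token with id 0 for sentences shorter than the longest one; the toy sentences are illustrative and differ from those in the training script.

```python
PAD = "<pad>"  # hypothetical padding token, not part of the original script

sentences = ["i love you", "this is really awful", "sorry"]
tokenized = [s.split() for s in sentences]

# Sequence length: the length of the longest sentence.
sequence_length = max(len(tokens) for tokens in tokenized)

# Vocabulary: the padding token first, then every distinct word.
vocab = [PAD] + sorted({w for tokens in tokenized for w in tokens})
word_to_id = {w: i for i, w in enumerate(vocab)}
vocab_size = len(vocab)

# Pad every sentence on the right so all inputs share one shape.
padded_ids = [
    [word_to_id[w] for w in tokens]
    + [word_to_id[PAD]] * (sequence_length - len(tokens))
    for tokens in tokenized
]

print(sequence_length)  # 4
print(vocab_size)       # 9
print(padded_ids)       # [[2, 4, 8, 0], [7, 3, 5, 1], [6, 0, 0, 0]]
```

The remaining settings (embedding size, kernel sizes, number of kernels) are hyperparameters chosen by hand rather than read off the data.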

# Requires TensorFlow 1.x (uses tf.placeholder, tf.Session and tf.contrib).
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

# Text-CNN Parameter
embedding_size = 2 # dimension of each word vector
sequence_length = 3
num_classes = 2 # 0 or 1
filter_sizes = [2,2,2] # n-gram window
num_filters = 3

# 3 words sentences (=sequence_length is 3)
sentences = ["i love you","he loves me", "she likes baseball", "i hate you","sorry for that", "this is awful"]
labels = [1,1,1,0,0,0] # 1 is good, 0 is not good.

word_list = " ".join(sentences).split()
word_list = list(set(word_list))
word_dict = {w: i for i, w in enumerate(word_list)}
vocab_size = len(word_dict)

inputs = []
for sen in sentences:
    inputs.append(np.asarray([word_dict[n] for n in sen.split()]))

outputs = []
for out in labels:
    outputs.append(np.eye(num_classes)[out]) # one-hot labels, as expected by the softmax cross-entropy loss

# Model
X = tf.placeholder(tf.int32, [None, sequence_length])
Y = tf.placeholder(tf.int32, [None, num_classes])

embedding = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0)) # randomly initialized, trainable word vectors
embedded_chars = tf.nn.embedding_lookup(embedding, X) # [batch_size, sequence_length, embedding_size]
embedded_chars = tf.expand_dims(embedded_chars, -1) # add channel(=1) [batch_size, sequence_length, embedding_size, 1]

pooled_outputs = []
for i, filter_size in enumerate(filter_sizes):
    filter_shape = [filter_size, embedding_size, 1, num_filters]
    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=[num_filters]))

    conv = tf.nn.conv2d(embedded_chars, # [batch_size, sequence_length, embedding_size, 1]
                        W,              # [filter_size(n-gram window), embedding_size, 1, num_filters(=3)]
                        strides=[1, 1, 1, 1],
                        padding='VALID')
    h = tf.nn.relu(tf.nn.bias_add(conv, b))
    pooled = tf.nn.max_pool(h,
                            ksize=[1, sequence_length - filter_size + 1, 1, 1], # [batch_size, filter_height, filter_width, channel]
                            strides=[1, 1, 1, 1],
                            padding='VALID')
    pooled_outputs.append(pooled) # dim of pooled : [batch_size(=6), output_height(=1), output_width(=1), num_filters(=3)]

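num_filters_total = num_filters * len(filter_sizes)
# Concatenate along the channel axis (axis 3). The original passed num_filters as the axis, which only worked because num_filters happens to be 3.
h_pool = tf.concat(pooled_outputs, axis=3) # h_pool : [batch_size(=6), output_height(=1), output_width(=1), num_filters_total(=9)]
h_pool_flat = tf.reshape(h_pool, [-1, num_filters_total]) # [batch_size, num_filters_total]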

# Model-Training
Weight = tf.get_variable('W', shape=[num_filters_total, num_classes], 
                    initializer=tf.contrib.layers.xavier_initializer())
Bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
model = tf.nn.xw_plus_b(h_pool_flat, Weight, Bias)  
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=model, labels=Y))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)

# Model-Predict
hypothesis = tf.nn.softmax(model)
predictions = tf.argmax(hypothesis, 1)
# Training
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for epoch in range(5000):
    _, loss = sess.run([optimizer, cost], feed_dict={X: inputs, Y: outputs})
    if (epoch + 1)%1000 == 0:
        print('Epoch:', '%06d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

# Test
# test_text = 'sorry hate you'
test_text = 'she loves you'
tests = []
tests.append(np.asarray([word_dict[n] for n in test_text.split()]))

predict = sess.run([predictions], feed_dict={X: tests})
result = predict[0][0]
if result == 0:
    print(test_text,"is Bad Mean...")
else:
    print(test_text,"is Good Mean!!")
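One limitation of the test step above is that `word_dict[n]` raises a `KeyError` for any word that did not appear in the training sentences. A possible workaround, sketched here with a hypothetical `encode` helper and a hypothetical `unk_id`, is to map unseen words to a reserved id. This assumes the embedding matrix is given one extra row for that id, which the script above does not do.

```python
def encode(text, word_dict, unk_id):
    """Map each word to its id, falling back to unk_id for unseen words."""
    return [word_dict.get(word, unk_id) for word in text.split()]


# Illustrative vocabulary; unk_id is one past the last real word id.
word_dict = {"i": 0, "love": 1, "you": 2}
unk_id = len(word_dict)

known = encode("i love you", word_dict, unk_id)
unseen = encode("i adore you", word_dict, unk_id)
print(known)   # [0, 1, 2]
print(unseen)  # [0, 3, 2]
```

The test sentence still has to contain exactly `sequence_length` words, since the input placeholder has a fixed width.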