TextCNN applies a convolutional neural network (CNN) to text classification. It uses multiple kernels of different sizes to extract key information from a sentence (much like n-grams with multiple window sizes), which lets the model capture local correlations more effectively.
1 Network Structure
Embedding: maps a batch of sentences of shape [batch_size, sentence_len] to an embedding representation of shape [batch_size, sentence_len, embedding_dim]
Convolution: convolution kernels of different sizes (3, 4, 5) extract features; with the default 'valid' padding, each kernel size produces intermediate features of shape [batch_size, sentence_len - kernel_size + 1, filter_num]
Max-pooling: max-pooling over the sentence-length dimension yields a fixed-length sentence representation of shape [batch_size, filter_num] (when several kernel sizes are used, the pooled vectors are concatenated)
Fully connected + softmax: a fully connected layer followed by softmax computes the probability of each class (a shape sketch follows below)
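To make the shape bookkeeping concrete, here is a minimal runnable sketch of the four stages in TensorFlow 1.x. The dimensions (batch of 32, vocabulary of 5000, 10 classes, a single kernel size of 3) are illustrative assumptions, not values from the original setup:

```python
import tensorflow as tf

batch_size, sentence_len, embedding_dim = 32, 50, 300
num_filters, kernel_size, n_classes = 128, 3, 10

x = tf.placeholder(tf.int32, [batch_size, sentence_len])  # token ids
table = tf.get_variable('emb', [5000, embedding_dim])     # embedding table
emb = tf.nn.embedding_lookup(table, x)                    # [32, 50, 300]
conv = tf.layers.conv1d(emb, num_filters, kernel_size)    # [32, 48, 128], 48 = 50 - 3 + 1
gmp = tf.reduce_max(conv, axis=1)                         # [32, 128], max over time
logits = tf.layers.dense(gmp, n_classes)                  # [32, 10]
probs = tf.nn.softmax(logits)                             # per-class probabilities
print(conv.shape, gmp.shape, probs.shape)
```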
Advantages of TextCNN
- TextCNN's biggest advantage is its simple network structure. Even with such a simple architecture, initializing it with pre-trained word vectors still gives very good results, beating the benchmark on multiple datasets.
- The simple structure means few parameters (see the rough count below), little computation, and fast training: on a single machine with one V100 GPU, training on 1.65 million examples for 260,000 steps converges in about half an hour.
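As a rough, back-of-the-envelope illustration of "few parameters" (using the default hyperparameters of the implementation below and an assumed 10 classes; these are my numbers, not measurements from the original run), the count is dominated by the embedding table:

```python
vocab_size, embedding_dim = 5000, 300
num_filters, hidden_dim, n_classes = 128, 128, 10
kernel_sizes = [2, 3, 4]

emb = vocab_size * embedding_dim                                 # 1,500,000
conv = sum(k * embedding_dim * num_filters + num_filters         # weights + biases per kernel size
           for k in kernel_sizes)                                # 345,984
fc1 = len(kernel_sizes) * num_filters * hidden_dim + hidden_dim  # 49,280 (concat of pooled vectors -> hidden)
fc2 = hidden_dim * n_classes + n_classes                         # 1,290
print(emb + conv + fc1 + fc2)                                    # ~1.9M parameters in total
```

Everything outside the embedding table amounts to only a few hundred thousand parameters, which is why training is cheap.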
2 TextCNN Implementation
```python
import tensorflow as tf


class TextCnn(object):
    def __init__(self, graph_name, param, embedding=None):
        # Optional pre-trained embedding matrix used to initialize the lookup table.
        self.init_lookup_table = embedding
        self.use_multi_kernel = param.get('use_multi_kernel', False)
        self.embedding_dim = param.get('embedding_dim', 300)
        self.seq_length = param.get('seq_length', 50)
        self.num_filters = param.get('num_filters', 128)
        self.kernel_size = param.get('kernel_size', 3)            # used when use_multi_kernel is False
        self.kernel_sizes = param.get('kernel_sizes', [2, 3, 4])  # used when use_multi_kernel is True
        self.vocab_size = param.get('vocab_size', 5000)
        self.hidden_dim = param.get('hidden_dim', 128)
        self.learning_rate = 1e-3
        self.dropout_keep_prob = param.get('dropout_keep_prob', 0.5)
        self.num_classes = param.get('n_classes')
        self.use_language_model = param.get('use_language_model', False)
        self.build_graph(graph_name)

    def build_graph(self, graph_name):
        with tf.name_scope(str(graph_name)):
            self.input_x = tf.placeholder(tf.int32, [None, self.seq_length], name='input_x')
            self.input_y = tf.placeholder(tf.float32, [None, self.num_classes], name='input_y')
            self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')
            self.global_steps = tf.Variable(0, trainable=False)

            # Initialize from the pre-trained embedding when one is provided;
            # otherwise fall back to the default (Glorot uniform) initializer.
            initializer = (tf.constant_initializer(self.init_lookup_table)
                           if self.init_lookup_table is not None else None)
            self.lookup_table = tf.get_variable('embedding',
                                                [self.vocab_size, self.embedding_dim],
                                                initializer=initializer,
                                                trainable=True)
            self.word_embeddings = tf.nn.embedding_lookup(self.lookup_table, self.input_x)

        with tf.name_scope("cnn"):
            if self.use_multi_kernel:
                # One convolution + max-over-time pooling branch per kernel size;
                # the pooled vectors are concatenated into one fixed-length representation.
                gmps = []
                for i, kernel_size in enumerate(self.kernel_sizes):
                    conv = tf.layers.conv1d(self.word_embeddings, self.num_filters,
                                            kernel_size, name='conv%s' % (i + 1))
                    gmp = tf.reduce_max(conv, axis=1, name='gmp%s' % (i + 1))
                    gmps.append(gmp)
                gmp = tf.concat(gmps, axis=1)
            else:
                conv = tf.layers.conv1d(self.word_embeddings, self.num_filters,
                                        self.kernel_size, name='conv')
                gmp = tf.reduce_max(conv, axis=1, name='gmp')

        with tf.name_scope("score"):
            fc = tf.layers.dense(gmp, self.hidden_dim, name='fc1')
            fc = tf.nn.dropout(fc, keep_prob=self.keep_prob)
            fc = tf.nn.relu(fc)
            self.logits = tf.layers.dense(fc, self.num_classes, name='fc2')
            self.softmax = tf.nn.softmax(self.logits, name='softmax')
            self.y_pred_cls = tf.argmax(self.softmax, 1, name='y_pred_cls')

        with tf.name_scope("optimize"):
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
                labels=self.input_y, logits=self.logits)
            self.loss = tf.reduce_mean(cross_entropy)
            self.optim = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(
                self.loss, global_step=self.global_steps)

        with tf.name_scope("accuracy"):
            correct_pred = tf.equal(tf.argmax(self.input_y, 1), self.y_pred_cls)
            self.acc = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    def get_trainable_variables(self):
        return tf.trainable_variables()
```
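A minimal usage sketch, continuing from the class above (the param values, batch size, and random data are illustrative assumptions, not the original training setup):

```python
import numpy as np

param = {'embedding_dim': 300, 'seq_length': 50, 'num_filters': 128,
         'kernel_sizes': [2, 3, 4], 'use_multi_kernel': True,
         'vocab_size': 5000, 'hidden_dim': 128, 'n_classes': 10}
model = TextCnn('textcnn', param)

# A random batch standing in for real tokenized sentences and one-hot labels.
x = np.random.randint(0, param['vocab_size'], size=(32, param['seq_length']))
y = np.eye(param['n_classes'])[np.random.randint(0, param['n_classes'], size=32)]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, loss, acc = sess.run([model.optim, model.loss, model.acc],
                            feed_dict={model.input_x: x,
                                       model.input_y: y,
                                       model.keep_prob: model.dropout_keep_prob})
    print('loss=%.4f acc=%.4f' % (loss, acc))
```

At inference time, keep_prob should be fed as 1.0 so that dropout is disabled.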