0. Background:
- TextCNN is very good at extracting shallow text features. On short text (e.g. search and dialogue, especially intent classification) it works well, is widely used, and is fast, so it is usually the first choice. On long text, TextCNN relies on fixed filter windows to extract features, so its long-range modeling ability is limited and it is insensitive to word order.
paper:Convolutional Neural Networks for Sentence Classification
paper: A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification
github:https://github.com/alexander-rakhlin/CNN-for-Sentence-Classification-in-Keras
word2vec: https://code.google.com/p/word2vec/
- Evaluation tasks: sentiment analysis, question classification
1. Model architecture
- Word vectors pre-trained with word2vec, with a CNN layer on top.
- Model structure:
A filter w applied to a window of h words produces a feature c_i = f(w · x_{i:i+h-1} + b), where f is a nonlinear function, e.g. the hyperbolic tangent.
(1) The filter slides over the sentence with a fixed window size, generating a feature map c.
(2) Max-over-time pooling takes the maximum of the feature map as the feature for this filter; the idea is to capture the most important feature.
(3) The model uses multiple filters with different window sizes.
(4) The pooled filter features are fed into a fully connected layer.
(5) Two 'channels' of word vectors: the model input contains a static vector channel and a fine-tuned (non-static) channel.
(6) Each filter is applied to both channels, and the two results are added together.
(7) Dropout: a Bernoulli random-variable mask is applied to the penultimate layer, and backpropagation only updates the unmasked units. At test time the learned weights are scaled by the dropout probability, ŵ = p·w, when scoring unseen sentences.
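Steps (1), (2) and (7) above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation; the shapes (k = 8, n = 10) and variable names are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 8          # word-vector dimensionality (assumed)
n = 10         # sentence length (assumed)
h = 3          # filter window size
p = 0.5        # dropout probability

X = rng.normal(size=(n, k))   # sentence matrix: one row per word
w = rng.normal(size=(h, k))   # one filter spanning h words
b = 0.0

# (1) slide the window over the sentence to build the feature map c
c = np.array([np.tanh(np.sum(w * X[i:i + h]) + b) for i in range(n - h + 1)])

# (2) max-over-time pooling: keep only the strongest feature per filter
c_hat = c.max()

# (7) training time: Bernoulli mask on the penultimate feature vector;
# only the unmasked units receive gradient updates
z = rng.normal(size=300)                  # penultimate layer (assumed size)
mask = rng.binomial(1, p, size=z.shape)
z_train = z * mask

# test time: no mask, weights are rescaled instead: w_hat = p * w
w_hat = p * w
```

Note that a full model repeats step (1)-(2) for every filter (the paper uses 300 of them) and concatenates the pooled values into the vector z before the softmax layer.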
2. Model evaluation
- Datasets
Datasets collected here for later use:
MR: Movie reviews with one sentence per review. Classification involves detecting positive/negative reviews (Pang and Lee, 2005). [3]
SST-1: Stanford Sentiment Treebank, an extension of MR but with train/dev/test splits provided and fine-grained labels (very positive, positive, neutral, negative, very negative), re-labeled by Socher et al. (2013). [4]
SST-2: Same as SST-1 but with neutral reviews removed and binary labels.
Subj: Subjectivity dataset where the task is to classify a sentence as being subjective or objective (Pang and Lee, 2004).
TREC: TREC question dataset; the task involves classifying a question into 6 question types (whether the question is about person, location, numeric information, etc.). [5]
CR: Customer reviews of various products (cameras, MP3s etc.). The task is to predict positive/negative reviews (Hu and Liu, 2004). [6]
MPQA: Opinion polarity detection subtask of the MPQA dataset (Wiebe et al., 2005). [7]
[3] https://www.cs.cornell.edu/people/pabo/movie-review-data/
[4] http://nlp.stanford.edu/sentiment/ (data is actually provided at the phrase level, so the model is trained on both phrases and sentences but scored only on sentences at test time, as in Socher et al. (2013), Kalchbrenner et al. (2014), and Le and Mikolov (2014); thus the training set is an order of magnitude larger than listed in Table 1)
[5] http://cogcomp.cs.illinois.edu/Data/QA/QC/
[6] http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
[7] http://www.cs.pitt.edu/mpqa/
- Hyperparameters
filter windows (h) of 3, 4, 5 with 100 feature maps each
dropout rate (p) of 0.5
l2 constraint (s) of 3
mini-batch size of 50
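As a quick consistency check, these hyperparameters fix the size of the convolutional layer, and the parameter count can be computed directly. This assumes k = 300 (the Google News word2vec dimensionality; an assumption, since k is not listed above).

```python
# Parameter count of the convolutional layer under the hyperparameters above.
k = 300                       # embedding dimension (assumed: Google News word2vec)
filter_windows = [3, 4, 5]    # filter window sizes h
feature_maps = 100            # feature maps per window size

# each filter covers an h x k window and has one bias term
conv_params = sum(feature_maps * (h * k + 1) for h in filter_windows)
print(conv_params)  # 360300
```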
- Pre-trained vectors
Google News word2vec vectors.
Words not in the word2vec vocabulary are initialized randomly, with the same variance as the pre-trained word2vec vectors.
- Results
* Difference between static and non-static (fine-tuned) vectors