Sentence Classification

Learning sentence classification: classifying sentence datasets with deep learning methods.

Problem

Sentence classification is the task of assigning a given sentence one of several predefined categories.

It covers tasks such as sentiment analysis and question classification. Sentiment analysis, also known as opinion extraction, opinion mining, sentiment mining, or subjectivity analysis, is the process of analyzing, processing, summarizing, and reasoning over subjective text that carries emotional tone. For example, from review text one can extract a user's sentiment toward attributes of a digital camera such as zoom, price, size, weight, flash, and ease of use.

Applications

Understanding positive and negative opinions about movies, products, Twitter posts, and so on, in order to improve products and services, discover competitors' strengths and weaknesses, predict stock movements, and more.

Datasets

| Data  | c | l  | N     | \|V\| | \|V_pre\| | Test |
|-------|---|----|-------|-------|-----------|------|
| MR    | 2 | 20 | 10662 | 18765 | 16448     | CV   |
| SST-1 | 5 | 18 | 11855 | 17836 | 16262     | 2210 |
| SST-2 | 2 | 19 | 9613  | 16185 | 14838     | 1821 |
| Subj  | 2 | 23 | 10000 | 21323 | 17913     | CV   |
| TREC  | 6 | 10 | 5952  | 9592  | 9125      | 500  |
| CR    | 2 | 19 | 3775  | 5340  | 5046      | CV   |
| MPQA  | 2 | 3  | 10606 | 6246  | 6083      | CV   |

c: number of target classes. l: average sentence length. N: dataset size. |V|: vocabulary size. |V_pre|: number of words also present in the pre-trained word vectors. Test: test set size (CV means no standard train/test split, so 10-fold cross-validation is used).

- MR: Movie reviews, with one sentence per review. [1]

- SST-1: Stanford Sentiment Treebank, an extension of MR with train/dev/test splits and five fine-grained labels (very positive, positive, neutral, negative, very negative). [2]

- SST-2: Same as SST-1, but with neutral reviews removed and binary labels. [2]

- Subj: Subjectivity dataset, where the task is to classify a sentence as subjective or objective. [3]

- TREC: TREC question dataset, where the task is to classify a question into one of six types (about a person, a location, numeric information, etc.). [4]

- CR: Customer reviews of various products, where the task is to predict positive/negative reviews. [5]

- MPQA: Opinion polarity detection subtask of the MPQA dataset. [6]

Methods

The task is usually broken down into several subtasks:

  1. Tokenization

    Split the sentence into words according to meaning; sometimes this also involves removing stop words, part-of-speech tagging, converting words to word vectors, and similar operations.

  2. Feature extraction

    We do not always classify directly on the tokenized words; in that case we extract features that make classification easier.

    Common features: TF-IDF, LDA, LSI.

  3. Building a classifier

    Feed the features or word vectors into a model that classifies the sentence.
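As an illustration of the first two steps, here is a minimal pure-Python sketch of tokenization plus TF-IDF feature extraction. The toy corpus and the whitespace tokenizer are invented for the example and are not part of any dataset above.

```python
import math
from collections import Counter

# Hypothetical toy corpus; in practice these would be dataset sentences.
corpus = [
    "the camera zoom is great",
    "the price is too high",
    "great camera , bad price",
]

# Step 1: tokenization (a simple whitespace split here; real pipelines may
# also remove stop words, tag parts of speech, or map words to vectors).
docs = [s.split() for s in corpus]

# Step 2: feature extraction with TF-IDF.
#   tf(t, d) = count of t in d / length of d
#   idf(t)   = log(N / df(t)), where df(t) = number of docs containing t
N = len(docs)
df = Counter(t for d in docs for t in set(d))

def tfidf(doc):
    counts = Counter(doc)
    return {t: (c / len(doc)) * math.log(N / df[t]) for t, c in counts.items()}

features = [tfidf(d) for d in docs]
```

Words that occur in every document get an idf of log(1) = 0, so they contribute nothing, which is the intended down-weighting of uninformative terms.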

Naive Bayes

NBSVM: Naive Bayes SVM

MNB: Multinomial Naive Bayes [7]

combine-skip

combine-skip + NB [8]

| Model             | MR   | SST-1 | SST-2 | Subj | TREC | CR   | MPQA |
|-------------------|------|-------|-------|------|------|------|------|
| NBSVM             | 79.4 | -     | -     | 93.2 | -    | 81.8 | 86.3 |
| MNB               | 79.0 | -     | -     | 93.6 | -    | 80.0 | 86.3 |
| combine-skip      | 76.5 | -     | -     | 93.6 | 92.2 | 80.1 | 87.1 |
| combine-skip + NB | 80.4 | -     | -     | 93.6 | -    | 81.3 | 87.5 |
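For reference, Multinomial Naive Bayes itself fits in a few lines. This is a toy unigram version with add-one (Laplace) smoothing; the MNB baseline in the table also uses bigram features, and the example documents are made up.

```python
import math
from collections import Counter

def train_mnb(docs, labels):
    """Estimate log priors and smoothed log likelihoods per class."""
    classes = set(labels)
    vocab = {t for d in docs for t in d}
    priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)  # add-one smoothing
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    return priors, loglik

def predict(priors, loglik, doc):
    # Unknown words are skipped (contribute 0), a common simplification.
    scores = {
        c: priors[c] + sum(loglik[c].get(t, 0.0) for t in doc)
        for c in priors
    }
    return max(scores, key=scores.get)

docs = [["good", "great"], ["bad", "awful"], ["great", "fun"], ["awful", "boring"]]
labels = ["pos", "neg", "pos", "neg"]
priors, loglik = train_mnb(docs, labels)
print(predict(priors, loglik, ["great", "good"]))  # → pos
```

Despite its simplicity, this kind of model is the strong baseline that NBSVM and combine-skip + NB build on.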

RNN

RCNN: Recurrent Convolutional Neural Networks [9]

S-LSTM: Long Short-Term Memory Over Recursive Structures [10]

LSTM: Long Short-Term Memory

BLSTM: Bidirectional Long Short-Term Memory

Tree-LSTM: Tree-structured Long Short-Term Memory [11]

LSTMN: Long Short-Term Memory-Network [12]

Multi-Task: Recurrent Neural Network for Text Classification with Multi-Task Learning [13]

BLSTM-Att: Bidirectional Long Short-Term Memory, attention-based model

BLSTM-2DPooling: Bidirectional Long Short-Term Memory Networks with Two-Dimensional Max Pooling

BLSTM-2DCNN: Bidirectional Long Short-Term Memory Networks with 2D convolution [14]

| Model           | MR   | SST-1 | SST-2 | Subj | TREC | CR | MPQA |
|-----------------|------|-------|-------|------|------|----|------|
| RCNN            | -    | 47.21 | -     | -    | -    | -  | -    |
| S-LSTM          | -    | -     | 81.9  | -    | -    | -  | -    |
| LSTM            | -    | 46.4  | 84.9  | -    | -    | -  | -    |
| BLSTM           | -    | 49.1  | 87.5  | -    | -    | -  | -    |
| Tree-LSTM       | -    | 51.0  | 88.0  | -    | -    | -  | -    |
| LSTMN           | -    | 49.3  | 87.3  | -    | -    | -  | -    |
| Multi-Task      | -    | 49.6  | 87.9  | 94.1 | -    | -  | -    |
| BLSTM           | 80.0 | 49.1  | 87.6  | 92.1 | 93.0 | -  | -    |
| BLSTM-Att       | 81.0 | 49.8  | 88.2  | 93.5 | 93.8 | -  | -    |
| BLSTM-2DPooling | 81.5 | 50.5  | 88.3  | 93.7 | 94.8 | -  | -    |
| BLSTM-2DCNN     | 82.3 | 52.4  | 89.5  | 94.0 | 96.1 | -  | -    |

CNN

DCNN: Dynamic Convolutional Neural Network [15]

CNN-non-static: Convolutional Neural Networks, the pretrained vectors are fine-tuned for each task

CNN-multichannel: Convolutional Neural Networks with two sets of word vectors [16]

TBCNN: Tree-based Convolutional Neural Network [17]

Molding-CNN: Molding Convolutional Neural Networks [18]

CNN-Ana: Non-static GloVe+word2vec CNN [19]

MVCNN: Multichannel Variable-Size Convolution [20]

DSCNN: Dependency Sensitive Convolutional Neural Networks [21]

| Model            | MR    | SST-1 | SST-2 | Subj  | TREC  | CR    | MPQA  |
|------------------|-------|-------|-------|-------|-------|-------|-------|
| DCNN             | -     | 48.5  | 86.8  | -     | 93.0  | -     | -     |
| CNN-non-static   | 81.5  | 48.0  | 87.2  | 93.4  | 93.6  | 84.3  | 89.5  |
| CNN-multichannel | 81.1  | 47.4  | 88.1  | 93.2  | 92.2  | 85.0  | 89.4  |
| TBCNN            | -     | 51.4  | 87.9  | -     | 96.0  | -     | -     |
| Molding-CNN      | -     | 51.2  | 88.6  | -     | -     | -     | -     |
| CNN-Ana          | 81.02 | 45.98 | 85.45 | 93.66 | 91.37 | 84.65 | 89.55 |
| MVCNN            | -     | 49.6  | 89.4  | -     | -     | -     | -     |
| DSCNN            | 81.5  | 49.7  | 89.1  | 93.2  | 95.4  | -     | -     |
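The CNN models above share one basic shape: convolutions of several widths slide over a sentence's word-vector matrix, each feature map is max-pooled over time, and the concatenated pooled vector feeds a softmax layer. Here is a rough numpy sketch of that forward pass; all dimensions and the random weights are purely illustrative, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_classes = 8, 2                   # word-vector size, number of classes
sentence = rng.normal(size=(7, d))    # a 7-word sentence as a matrix

def convolve(sent, filters, width):
    """Slide a window of `width` words, apply ReLU, max-pool over time.

    filters: array of shape (n_filters, width * d).
    """
    windows = np.stack([sent[i:i + width].ravel()
                        for i in range(len(sent) - width + 1)])
    feats = np.maximum(windows @ filters.T, 0)   # ReLU feature maps
    return feats.max(axis=0)                     # max-over-time pooling

# 4 filters for each of the widths 3, 4, 5 → a 12-dim pooled vector.
pooled = np.concatenate([
    convolve(sentence, rng.normal(size=(4, w * d)), w)
    for w in (3, 4, 5)
])

# Final linear + softmax layer over the pooled features.
W = rng.normal(size=(n_classes, pooled.size))
logits = W @ pooled
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (2,)
```

A trained model would learn the filter and softmax weights by backpropagation; the sketch only shows the data flow that the variants in the table build on.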

Others

RAE: Recursive Autoencoders with pre-trained word vectors from Wikipedia [22]

AdaSent: self-adaptive hierarchical sentence model [23]

RNTN: Recursive Neural Tensor Network [24]

DRNN: Deep Recursive Neural Networks [25]

| Model   | MR   | SST-1 | SST-2 | Subj | TREC | CR   | MPQA |
|---------|------|-------|-------|------|------|------|------|
| RAE     | 77.7 | 43.2  | 82.4  | -    | -    | -    | 86.4 |
| AdaSent | 83.1 | -     | -     | 95.5 | 92.4 | 86.3 | 93.3 |
| RNTN    | -    | 45.7  | 85.4  | -    | -    | -    | -    |
| DRNN    | -    | 49.8  | 86.6  | -    | -    | -    | -    |

References


  1. (ACL 2005) Seeing Stars: Exploiting Class Relationships For Sentiment Categorization With Respect To Rating Scales https://www.cs.cornell.edu/people/pabo/movie-review-data/
  2. (EMNLP 2013) Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank https://nlp.stanford.edu/sentiment/
  3. (ACL 2004) A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts http://www.cs.cornell.edu/people/pabo/movie-review-data
  4. (ACL 2002) Learning Question Classifiers http://cogcomp.org/Data/QA/QC/
  5. (SIGKDD 2004) Mining and Summarizing Customer Reviews http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
  6. (Language Resources and Evaluation 2005) Annotating Expressions Of Opinions And Emotions In Language http://mpqa.cs.pitt.edu/
  7. (ACL 2012) Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
  8. (NIPS 2015) Skip-Thought Vectors
  9. (AAAI 2015) Recurrent Convolutional Neural Networks for Text Classification
  10. (ICML 2015) Long Short-Term Memory Over Recursive Structures
  11. (ACL 2015) Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
  12. (EMNLP 2016) Long Short-Term Memory-Networks for Machine Reading
  13. (IJCAI 2016) Recurrent Neural Network for Text Classification with Multi-Task Learning
  14. (COLING 2016) Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling
  15. (ACL 2014) A Convolutional Neural Network for Modelling Sentences
  16. (EMNLP 2014) Convolutional Neural Networks for Sentence Classification
  17. (EMNLP 2015) Discriminative Neural Sentence Modeling by Tree-Based Convolution
  18. (EMNLP 2015) Molding CNNs for text: non-linear, non-consecutive convolutions
  19. (IJCNLP 2017) A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification
  20. (CoNLL 2015) Multichannel Variable-Size Convolution for Sentence Classification
  21. (NAACL 2016) Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
  22. (EMNLP 2011) Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
  23. (IJCAI 2015) Self-adaptive hierarchical sentence model
  24. (EMNLP 2013) Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
  25. (NIPS 2014) Deep Recursive Neural Networks for Compositionality in Language