SemEval-2016 Sentiment Analysis
Introduction
Concepts to distinguish
Ordinal Classification
The innovation here is extending the binary classification problem to a three-class or five-class (ordinal) one.
Quantification
The application is not limited to sentiment analysis of a particular person's posts; instead, sentiment is analyzed over all tweets about a given topic.
Task Definition (definitions of the subtasks)
- Subtask A: Given a tweet, predict whether it is of positive, negative, or neutral sentiment.
- Subtask B: Given a tweet known to be about a given topic, predict whether it conveys a positive or a negative sentiment towards the topic.
- Subtask C: Given a tweet known to be about a given topic, estimate the sentiment it conveys towards the topic on a five-point scale ranging from HIGHLYNEGATIVE to HIGHLYPOSITIVE.
- Subtask D: Given a set of tweets known to be about a given topic, estimate the distribution of the tweets in the POSITIVE and NEGATIVE classes.
- Subtask E: Given a set of tweets known to be about a given topic, estimate the distribution of the tweets across the five classes of a five-point scale, ranging from HIGHLYNEGATIVE to HIGHLYPOSITIVE.
Evaluation Measures
Participants and Results
- Subtask A (34 teams)
- Subtask B (19 teams)
- Subtask C (11 teams)
- Subtask D (14 teams)
- Subtask E (10 teams)
Subtask A: Message polarity classification
- The top-scoring team (SwissCheese1) used an ensemble of convolutional neural networks, differing in their choice of filter shapes, pooling shapes, and usage of hidden layers. Word embeddings generated via word2vec were also used, and the neural networks were trained using distant supervision.
- Out of the 10 top-ranked teams, 5 teams (SwissCheese1, SENSEI-LIF2, UNIMELB3, INESC-ID4, INSIGHT-18) used deep NNs of some sort, and 7 teams (SwissCheese1, SENSEI-LIF2, UNIMELB3, INESC-ID4, aueb.twitter.sentiment5, I2RNTU7, INSIGHT-18) used either general-purpose or task-specific word embeddings, generated via word2vec or GloVe.
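The soft-voting idea behind such an ensemble can be sketched as follows. This is a minimal illustration, not the paper's method: the per-model probability tables are made up, standing in for the outputs of trained CNNs.

```python
# Illustrative soft-voting over an ensemble of classifiers: average the
# per-model class probabilities and predict the argmax. The probability
# tables below are made up; in a real system each row would come from a
# trained CNN with its own filter/pooling configuration.

LABELS = ["positive", "negative", "neutral"]

def ensemble_predict(per_model_probs):
    """Average class probabilities across models; return (label, averaged)."""
    n = len(per_model_probs)
    avg = [sum(p[i] for p in per_model_probs) / n for i in range(len(LABELS))]
    return LABELS[max(range(len(LABELS)), key=avg.__getitem__)], avg

# Stubbed outputs of three "CNNs" for a single tweet:
per_model_probs = [
    [0.6, 0.1, 0.3],
    [0.5, 0.2, 0.3],
    [0.3, 0.3, 0.4],
]
label, avg = ensemble_predict(per_model_probs)
```

Averaging probabilities (soft voting) lets models that disagree on the top label still contribute their full distributions, which is why ensembles of diversely configured networks tend to beat any single member.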
Subtask B: Tweet classification according to a two-point scale
The top-scoring team (Tweester1) used a combination of convolutional neural networks, topic modeling, and word embeddings generated via word2vec. Similar to Subtask A, the main trend among all participants was the widespread use of deep learning techniques. Out of the 10 top-ranked participating teams, 5 teams (Tweester1, LYS2, INSIGHT15, UNIMELB7, Finki10) used convolutional neural networks; 3 teams (thecerealkiller3, UNIMELB7, Finki10) submitted systems using recurrent neural networks; and 7 teams (Tweester1, LYS2, INSIGHT-15, UNIMELB7, Finki10) incorporated either general-purpose or task-specific word embeddings (generated via toolkits such as GloVe or word2vec).
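A common, simple way to turn pretrained word vectors into a tweet-level feature is to average them. A toy sketch with made-up 3-dimensional embeddings (real systems load pretrained word2vec or GloVe vectors of a few hundred dimensions):

```python
# Toy sketch: represent a tweet as the mean of its word embeddings.
# The vectors below are invented for illustration; real systems would
# load pretrained word2vec or GloVe vectors.

EMB = {  # hypothetical 3-dimensional embeddings
    "great": [0.9, 0.1, 0.0],
    "movie": [0.2, 0.5, 0.3],
    "awful": [-0.8, 0.1, 0.1],
}

def tweet_vector(tokens, dim=3):
    """Mean of the embeddings of known tokens (zero vector if none known)."""
    known = [EMB[t] for t in tokens if t in EMB]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

vec = tweet_vector(["great", "movie"])
```

Mean pooling discards word order; the CNN- and RNN-based systems above instead consume the embedding sequence directly, which is the main reason they outperform such bag-of-vectors baselines.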
Subtask C: Tweet classification according to a five-point scale
- The top-scoring team (TwiSE1) used a single-label multi-class classifier to classify the tweets according to their overall polarity. In particular, they used logistic regression minimizing the multinomial loss across the classes, with class weights to cope with class imbalance. Note that they ignored the given topics altogether.
- Only 2 of the 11 participating teams tuned their systems to exploit the ordinal (as opposed to binary, or single-label multi-class) nature of this subtask. These two teams are PUT3, which used an ensemble of ordinal regression approaches, and ISTI-CNR7, which used a tree-based approach to ordinal regression. All other teams used general-purpose approaches for single-label multi-class classification, in many cases relying (as for Subtask B) on convolutional neural networks, recurrent neural networks, and word embeddings.
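Why the ordinal structure matters becomes visible in the evaluation: predicting NEGATIVE when the truth is HIGHLYNEGATIVE is a smaller error than predicting HIGHLYPOSITIVE. A minimal sketch of macro-averaged mean absolute error, a standard measure for ordinal classification, assuming the five classes are coded as integers -2..+2:

```python
# Macro-averaged mean absolute error for ordinal classification:
# compute the MAE within each gold class, then average over classes,
# so that rare classes weigh as much as frequent ones.
from collections import defaultdict

def macro_mae(gold, pred):
    """Average, over gold classes, of the MAE within each class."""
    per_class = defaultdict(list)
    for g, p in zip(gold, pred):
        per_class[g].append(abs(g - p))
    return sum(sum(v) / len(v) for v in per_class.values()) / len(per_class)

gold = [-2, -2, 0, 1, 2]   # five-point scale coded -2..+2
pred = [-1, -2, 0, 2, 2]
score = macro_mae(gold, pred)
```

Under this measure, a system that exploits class order (ordinal regression) can trade distant errors for adjacent ones, which a plain multi-class classifier has no incentive to do.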
Subtask D: Tweet quantification according to a two-point scale
- The top-scoring team (Finki1) adopted an approach based on "classify and count", a classification-oriented (instead of quantification-oriented) approach, using recurrent and convolutional neural networks, and GloVe word embeddings.
- Indeed, only 5 of the 14 participating teams tuned their systems to the fact that the subtask deals with quantification (as opposed to classification). Among the teams who relied on quantification-oriented approaches, teams LYS2 and HSENN14 used an existing structured prediction method that directly optimizes KLD; teams QCRI5 and ISTI-CNR11 used existing probabilistic quantification methods; team NRU-HSE7 used an existing iterative quantification method based on cost-sensitive learning. Interestingly, team TwiSE2 used a "classify and count" approach after comparing it with a quantification-oriented method (similar to the one used by teams LYS2 and HSENN14) on the development set and concluding that the former works better than the latter.
- All other teams used "classify and count" approaches, mostly based on convolutional neural networks and word embeddings.
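"Classify and count" and the KLD it is scored by can both be sketched in a few lines. This is a simplified illustration with plain (unsmoothed) KLD; the data are made up:

```python
# "Classify and count" (CC) quantification: predict a label per tweet,
# then report class frequencies as the prevalence estimate. KLD measures
# how far the estimated distribution is from the true one (lower = better).
import math

def classify_and_count(pred_labels, classes=("POSITIVE", "NEGATIVE")):
    """Estimated prevalence of each class from per-tweet predictions."""
    n = len(pred_labels)
    return {c: pred_labels.count(c) / n for c in classes}

def kld(true_prev, est_prev, eps=1e-12):
    """KL divergence of the estimated from the true distribution."""
    return sum(p * math.log(p / max(est_prev[c], eps))
               for c, p in true_prev.items() if p > 0)

est = classify_and_count(["POSITIVE"] * 7 + ["NEGATIVE"] * 3)
true = {"POSITIVE": 0.6, "NEGATIVE": 0.4}
```

The weakness of CC is that per-tweet misclassifications do not cancel out: a classifier biased towards POSITIVE systematically overestimates its prevalence, which is what the quantification-oriented methods above try to correct.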
Subtask E: Tweet quantification according to a five-point scale
- Only 3 of the 10 participants tuned their systems to the specific characteristics of this subtask, i.e., to the fact that it deals with quantification (as opposed to classification) and to the fact that it has an ordinal (as opposed to binary) nature.
- The top-scoring team (QCRI1) used a novel algorithm explicitly designed for ordinal quantification, which leverages an ordinal hierarchy of binary probabilistic quantifiers.
- Team NRU-HSE4 used an existing quantification approach based on cost-sensitive learning and adapted it to the ordinal case.
- Team ISTI-CNR6 instead used a novel adaptation to quantification of a tree-based approach to ordinal regression.
- Teams LYS7 and HSENN9 also used an existing quantification approach, but did not exploit the ordinal nature of the problem.
- The other teams mostly used approaches based on "classify and count" (see Section 5.4) and viewed the problem as single-label multi-class (instead of ordinal) classification; some of these teams (notably, team Finki2) obtained very good results, which testifies to the quality of the (general-purpose) features and learning algorithm they used.
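For distributions over ordered classes, a natural comparison is the Earth Mover's Distance: mass misplaced into an adjacent class costs less than mass misplaced to the far end of the scale. A minimal sketch, assuming unit-spaced ordinal classes (where EMD reduces to summing absolute differences of the cumulative distributions); the example distributions are made up:

```python
# Earth Mover's Distance between two distributions over the same ordered
# classes; for unit-spaced ordinal classes it equals the sum of absolute
# differences of the cumulative distributions.

def emd_ordinal(p, q):
    """EMD between distributions p and q over unit-spaced ordered classes."""
    dist, cum_p, cum_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cum_p += pi
        cum_q += qi
        dist += abs(cum_p - cum_q)
    return dist

true = [0.1, 0.2, 0.4, 0.2, 0.1]  # HIGHLYNEGATIVE .. HIGHLYPOSITIVE
est  = [0.0, 0.3, 0.4, 0.2, 0.1]  # 0.1 of mass shifted by one class
```

Because only adjacent mass was shifted, the distance here is small; shifting the same 0.1 of mass from HIGHLYNEGATIVE to HIGHLYPOSITIVE would cost four times as much, which is exactly the ordinal sensitivity that bin-by-bin measures like KLD lack.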
Conclusion
Papers worth studying:
Top-scoring system of each subtask
* A: SwissCheese (Deriu et al., 2016)
* B: Tweester (Palogiannidi et al., 2016)
* C: TwiSE (Balikas and Amini, 2016)
* D: Finki (Stojanovski et al., 2016)
* E: QCRI (Da San Martino et al., 2016)
Distinctive approaches
* PUT3 (Lango et al., 2016)
* ISTI-CNR (Esuli, 2016)
* LYS (Vilares et al., 2016)
* QCRI5 (Da San Martino et al., 2016)
* NRU-HSE (Karpov et al., 2016)

Although there are many related papers, few merit deep study: most systems apply deep learning and differ mainly in how their networks are constructed.