Tweet Sentiment Analysis by Incorporating Sentiment-Specific ~~: I. Introduction

“Mining sentiment from tweets has several applications in marketing, research and brand analytics. Organizations use tweet sentiment analysis to discover or analyze user feedback about products and services. Analysts in various domains may use it to conduct research on topics of interest. Companies may apply it to business or branding strategy. Tweets are short, noisy and cover a variety of topics. Tweeters may use different vocabularies, misspelled words, incorrect syntax and incomplete sentences, all of which pose challenges for sentiment analysis. In this study, we try to classify a tweet’s sentiment polarity into one of three types: positive, neutral and negative. Neutral implies either a tweet does not express any opinion or the opinion is neutral.”
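Mining sentiment from tweets has applications in marketing, research, and brand analytics: organizations use it to analyze user feedback on products and services, analysts use it to study topics of interest, and companies apply it to business or branding strategy. Tweets are short and noisy and span many topics, and their varied vocabulary, misspellings, incorrect syntax, and incomplete sentences all make sentiment analysis harder. The study classifies each tweet's polarity as positive, neutral, or negative, where neutral means the tweet either expresses no opinion or expresses a neutral one.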

“Current approaches to tweet polarity analysis are either lexicon-based or machine learning based. Most learning based approaches focus on generating effective features. Mohammad et al. [10] implemented the top-performing system, NRC, in the tweet sentiment analysis task track of SemEval 2013 and 2014 [12, 15]. NRC uses hundreds of hand-crafted features and five different sentiment lexicons, including two curated by the authors from about 2.5 million tweets. The feature design and generation is very intense. For example, features used by NRC are: word n-grams, character n-grams, cap words, hashtags, punctuations, emotions, part-of-speech tags, more than one hundred handcrafted features generated from five lexicons, and 1000 topic clusters built from 56 million tweets.”
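Current approaches to tweet polarity analysis are either lexicon-based or machine-learning-based, and most learning-based approaches concentrate on generating effective features. Mohammad et al. [10] built NRC, the top-performing system in the tweet sentiment analysis task of SemEval 2013 and 2014 [12, 15]. NRC relies on hundreds of hand-crafted features and five sentiment lexicons, two of which the authors curated from about 2.5 million tweets. Designing and generating these features is labor-intensive: they include word n-grams, character n-grams, all-caps words, hashtags, punctuation, emotions, part-of-speech tags, more than one hundred hand-crafted features derived from the five lexicons, and 1000 topic clusters built from 56 million tweets.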
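
To give a feel for what a few of the surface features listed above look like in practice, here is a minimal Python sketch. It is not the NRC implementation: the function names (`word_ngrams`, `char_ngrams`, `extract_features`), the feature-key format, whitespace tokenization, and the choice of unigrams, bigrams, and character trigrams are all assumptions made for illustration. Lexicon, part-of-speech, and topic-cluster features are left out because they need external resources.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return contiguous word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_features(tweet):
    """Build a sparse feature dictionary for one tweet (illustrative only)."""
    tokens = tweet.split()  # assumption: plain whitespace tokenization
    lowered = [t.lower() for t in tokens]
    feats = Counter()

    # Word unigrams and bigrams over lowercased tokens.
    for n in (1, 2):
        for gram in word_ngrams(lowered, n):
            feats[f"w{n}={gram}"] += 1

    # Character trigrams over the lowercased raw text.
    for gram in char_ngrams(tweet.lower(), 3):
        feats[f"c3={gram}"] += 1

    # Count of purely alphabetic tokens written entirely in capitals.
    feats["num_allcaps"] = sum(
        1 for t in tokens if len(t) > 1 and t.isalpha() and t.isupper()
    )
    # Count of hashtag tokens.
    feats["num_hashtags"] = sum(
        1 for t in tokens if t.startswith("#") and len(t) > 1
    )
    # Simple punctuation counts.
    feats["num_exclaim"] = tweet.count("!")
    feats["num_question"] = tweet.count("?")
    return feats


if __name__ == "__main__":
    features = extract_features("LOVE this phone!!! #happy")
    for name in ("num_allcaps", "num_hashtags", "num_exclaim", "num_question"):
        print(name, features[name])
```

Even this handful of features produces a large, sparse dictionary per tweet, which hints at why scaling the design up to hundreds of hand-crafted features is costly.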

References:
[10] Saif M. Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets. International Workshop on Semantic Evaluation.
[12] P. Nakov, S. Rosenthal, Z. Kozareva, V. Stoyanov, A. Ritter, and T. Wilson. 2013. SemEval-2013 Task 2: Sentiment Analysis in Twitter.
[15] Sara Rosenthal, Alan Ritter, Preslav Nakov, and Veselin Stoyanov. SemEval-2014: Sentiment Analysis in Twitter.
