Weibo sentiment analysis method

 

  1. How the project relates to the course material

We chose social networks as our topic and focus on Sina Weibo, one of the largest social networking sites in China.

The volume of data on Weibo is very large. To extract valuable information from it and put it to meaningful use, we apply knowledge from data management. Our project aims to improve an existing sentiment analysis method for classifying Weibo comments.

  2. Why we chose this topic

We are interested in social networks, and the topic is currently very popular. As technology develops, more and more people post their feelings, thoughts, and comments on social websites. As the old saying goes, information is treasure, so how to mine data from social websites and use it to create value is especially important.

In China, Weibo is a typical social website. It is very popular and still has a lot of room for improvement. That is why we chose this topic and want to work on Weibo.

  3. Motivation

There is less research on sentiment analysis of Chinese Weibo than on Twitter [3]. Compared with work abroad, sentiment analysis technology in China is immature and still has a long way to go [2]. The accuracy of text sentiment analysis is not yet high enough, so we put forward some ideas for improving the existing methods.

  4. Questions we want to address

On Weibo, when a user with many followers publishes a post, it may receive a large number of comments. Finding a particular type of comment (such as agreement, suggestion, or opposition) is then difficult, because comments are sorted only by time and by popularity, as measured by follower clicks.

To help the user find what they want more easily, we use sentiment analysis to classify the comments and make the comment area clearer.

  5. Overview

5.1 Sentiment Lexicon

How to classify text when sentiment lexicons are lacking needs further study. The sentiment lexicon is the foundation of classification, and a better lexicon benefits the implementation of sentiment classification. There are generally three ways to construct a sentiment dictionary: manual construction, lexicon-based methods, and corpus-based methods. We use a hybrid approach to build a specialized dictionary for Weibo comments.

5.1.1 Introduction to the Weibo sentiment lexicon

The sentiment lexicon has four parts: a comprehensive basic sentiment lexicon, a network sentiment lexicon, a Weibo sentiment lexicon, and a Weibo emoticon sentiment lexicon. The structure of the lexicon is shown below.

Figure 5.1.1    Structure of the lexicon

The comprehensive basic sentiment lexicon is made up of a general sentiment lexicon and words from Weibo.

The network sentiment lexicon draws on the three input methods with the largest numbers of current network users: "Sogou Input Method", "Baidu Input Method" and "QQ Input Method". From the lexicons of these three input methods, the most commonly used new network sentiment words are sorted out manually to create the network sentiment lexicon.

The Weibo sentiment lexicon is built with an improved method that selects reference words and candidate words from a Sina Weibo corpus, then uses an extended SO-PMI algorithm to create a new lexicon.

The Weibo emoticon sentiment lexicon is built as follows. First, the Weibo text data is pre-processed, and word frequency statistics are computed on it. Microblog emoticons are then selected manually in order from high frequency to low frequency. Finally, the words or emoticons with a suitable frequency are added to the lexicon.
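The frequency statistics step for emoticons can be sketched as follows. This is only a minimal illustration, assuming that emoticons appear in the text as names wrapped in square brackets (as described in section 5.2.2); the function name `emoticon_frequencies` and the sample posts are hypothetical. The output is a ranked list intended for the manual selection step, not a finished lexicon.

```python
import re
from collections import Counter

# Assumption: an emoticon is written as a name inside square brackets, e.g. [哈哈].
EMOTICON_RE = re.compile(r"\[([^\[\]]+)\]")


def emoticon_frequencies(posts):
    """Return (emoticon name, count) pairs ordered from most to least frequent."""
    counts = Counter()
    for post in posts:
        counts.update(EMOTICON_RE.findall(post))
    return counts.most_common()


# Hypothetical sample posts, used only to show the call.
sample_posts = ["今天真开心[哈哈][哈哈]", "太难了[泪]", "[哈哈]不错"]
for name, count in emoticon_frequencies(sample_posts):
    print(name, count)
```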

5.1.2 The process of creating a Weibo sentiment lexicon

Algorithm for creating the Weibo sentiment lexicon:

  Step 1: Pre-process the Weibo data.

  Step 2: Select the reference words and the candidate words.

  Step 3: For each candidate word:

        If it is an existing sentiment word

            Stop;

        Else, calculate the SO-PMI of the candidate word;

            If it is a sentiment word

                Add it to the sentiment lexicon;

            Else stop;

        End;
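Step 3 can be sketched in Python as follows. This is a rough illustration of plain SO-PMI, not the extended variant the lexicon actually uses, whose details are not given here. It assumes the posts are already segmented into word lists, that co-occurrence is counted per post, and that a candidate counts as a sentiment word when the absolute value of its SO-PMI reaches a threshold. The function names, the smoothing constant `eps`, and the `threshold` value are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(docs):
    """Count, per word and per word pair, the number of posts containing them."""
    word_df, pair_df = Counter(), Counter()
    for tokens in docs:
        unique = sorted(set(tokens))
        word_df.update(unique)
        pair_df.update(combinations(unique, 2))  # each pair is stored in sorted order
    return word_df, pair_df


def pmi(a, b, word_df, pair_df, n_docs, eps=0.5):
    """Pointwise mutual information; the additive smoothing eps is an assumption."""
    pair = tuple(sorted((a, b)))
    p_ab = (pair_df[pair] + eps) / (n_docs + eps)
    p_a = (word_df[a] + eps) / (n_docs + eps)
    p_b = (word_df[b] + eps) / (n_docs + eps)
    return math.log2(p_ab / (p_a * p_b))


def so_pmi(word, pos_refs, neg_refs, word_df, pair_df, n_docs):
    """Association with positive reference words minus association with negative ones."""
    pos = sum(pmi(word, ref, word_df, pair_df, n_docs) for ref in pos_refs)
    neg = sum(pmi(word, ref, word_df, pair_df, n_docs) for ref in neg_refs)
    return pos - neg


def extend_lexicon(candidates, lexicon, pos_refs, neg_refs, docs, threshold=1.0):
    """Return new {word: SO-PMI} entries for candidates judged to be sentiment words."""
    word_df, pair_df = build_counts(docs)
    new_entries = {}
    for word in candidates:
        if word in lexicon:  # already an existing sentiment word: skip it
            continue
        score = so_pmi(word, pos_refs, neg_refs, word_df, pair_df, len(docs))
        if abs(score) >= threshold:  # assumed test for "is a sentiment word"
            new_entries[word] = score
    return new_entries
```

A positive score suggests the candidate leans toward the positive reference words, a negative score toward the negative ones. On a real corpus the threshold would need to be tuned.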

 

5.2 Sentiment analysis methods based on the sentiment dictionary

5.2.1 Sentiment analysis with semantic rules

During preprocessing, the Weibo short text is split on “。”, “;”, “?” and “!” to obtain complex sentences. The sentence type of each complex sentence is determined, and each is given a suitable weight.

Each complex sentence is then split on “,” to obtain simple sentences. Keywords are used to determine the relationships between the simple sentences, and each simple sentence is given a suitable weight.
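The two splitting steps above can be sketched as follows. This is a minimal illustration that assumes the full-width punctuation marks listed above are the only delimiters; the function name `split_post` is hypothetical. The terminating mark is kept with each complex sentence, since it is one possible cue for the sentence type. Assigning the weights is left out.

```python
import re

# A complex sentence is a run of text ending at one of the full-width marks 。;?!
SENTENCE_RE = re.compile(r"([^。;?!]+)([。;?!]?)")
CLAUSE_SEPARATOR = ","


def split_post(text):
    """Split a post into (simple_sentences, terminator) pairs, one per complex sentence."""
    sentences = []
    for body, terminator in SENTENCE_RE.findall(text):
        clauses = [c.strip() for c in body.split(CLAUSE_SEPARATOR) if c.strip()]
        if clauses:
            sentences.append((clauses, terminator))
    return sentences


# Hypothetical example post.
for clauses, terminator in split_post("今天天气好,心情不错。你呢?"):
    print(clauses, terminator)
```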

Within each sentence, the polarity and value of the sentiment words are determined by looking them up in the sentiment dictionary, and the values are adjusted according to the degree adverbs and negation adverbs that appear before the sentiment words.

The final result is obtained by weighting the values obtained from each part.
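The dictionary lookup and adverb correction can be sketched as follows. This is only an illustration: it assumes each simple sentence is already segmented into words, that a degree adverb multiplies the value of the next sentiment word, and that a negation adverb flips its sign. The tiny dictionaries, their values, and the function names are hypothetical placeholders, not the lexicon described in section 5.1.

```python
# Hypothetical toy dictionaries; a real system would load the lexicons from section 5.1.
SENTIMENT = {"喜欢": 1.0, "讨厌": -1.0}
DEGREE = {"非常": 2.0, "有点": 0.5}
NEGATION = {"不", "没"}


def clause_score(tokens, sentiment=SENTIMENT, degree=DEGREE, negation=NEGATION):
    """Score one simple sentence given as a list of words."""
    total = 0.0
    modifier = 1.0  # combined effect of adverbs seen since the last sentiment word
    for token in tokens:
        if token in negation:
            modifier *= -1.0
        elif token in degree:
            modifier *= degree[token]
        elif token in sentiment:
            total += modifier * sentiment[token]
            modifier = 1.0  # adverbs apply only to the next sentiment word
    return total


def post_score(weighted_clauses):
    """Combine simple sentences given as (weight, tokens) pairs into one value."""
    return sum(weight * clause_score(tokens) for weight, tokens in weighted_clauses)


print(clause_score(["我", "非常", "喜欢"]))  # 2.0
print(clause_score(["我", "不", "喜欢"]))  # -1.0
```

How the weights are chosen for each sentence type and sentence relationship is not shown; here they are simply passed in by the caller.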

5.2.2 Sentiment analysis with emoticon weighting

During preprocessing, regular expression matching is used to extract the text representation of each emoticon from inside the “[ ]” symbols.

A set of emoticons is built from the extracted names.

The value of each emoticon is looked up in the emoticon database or dictionary, and the values are summed to obtain the emoticon score.
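These three steps can be sketched as follows. The emoticon names and values in `EMOTICON_VALUES` are hypothetical placeholders for the emoticon dictionary, and the function name is an assumption. The sketch also assumes that a repeated emoticon counts each time it appears and that an emoticon missing from the dictionary contributes nothing.

```python
import re

# An emoticon appears in the text as a name inside square brackets, e.g. [哈哈].
EMOTICON_RE = re.compile(r"\[([^\[\]]+)\]")

# Hypothetical values standing in for the emoticon sentiment lexicon.
EMOTICON_VALUES = {"哈哈": 1.0, "泪": -1.0, "怒": -2.0}


def emoticon_score(text, values=EMOTICON_VALUES):
    """Sum the dictionary values of all emoticons found in the text."""
    names = EMOTICON_RE.findall(text)
    return sum(values.get(name, 0.0) for name in names)


print(emoticon_score("太好笑了[哈哈][哈哈][泪]"))  # 1.0
```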

  6. Planning

Our project is mainly based on the sentiment analysis method that combines semantic rules and emoticon weighting [1]. Its sentiment dictionary is incomplete, and its method of analyzing emoticons is simple.

We will mainly improve on these two points.

6.1 Lexicon

Weibo text is concise, its sentence patterns vary, and it contains special symbols such as emoticons. Because of the resulting problems of complicated text processing and low classification accuracy, we do not consider machine learning a good fit for sentiment analysis here, and we use a lexicon-based approach instead. However, the lexicon-based approach still has weaknesses that deserve attention. For example, pictures and some web links cannot be analyzed directly, and with this lexicon we cannot yet make good use of them for sentiment analysis.

  1. Special syntactic rules and expressions

Computers cannot recognize rhetorical devices such as rhetorical questions and metonymy. Researchers therefore need to consider more special sentence patterns and rhetorical devices when analyzing the sentiment of Weibo corpora, and adjust the corresponding weights, which helps improve the results of sentiment analysis on Weibo text.

  2. The particularity of Weibo

Researchers need to take into account features specific to Weibo, such as ‘@’, ‘#’ and so on.

  3. Building a broader and more accurate Weibo sentiment dictionary is a never-ending task

Weibo is constantly developing and changing, so researchers need to keep updating their research methods.
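As one possible way to handle the Weibo-specific symbols mentioned above, the sketch below pulls ‘@’ mentions and ‘#’ topics out of a comment before sentiment scoring. It assumes that a mention is ‘@’ followed by a user name made of word characters or hyphens, and that a topic is wrapped in a pair of ‘#’ marks. Both patterns and the function name are assumptions for illustration, not rules taken from paper [1].

```python
import re

# Assumed patterns: @username, and a topic wrapped in a pair of # marks.
MENTION_RE = re.compile(r"@[\w\-]+")
TOPIC_RE = re.compile(r"#([^#]+)#")


def extract_weibo_markup(text):
    """Return (mentions, topics, cleaned_text) for one comment."""
    mentions = [m[1:] for m in MENTION_RE.findall(text)]  # drop the leading @
    topics = TOPIC_RE.findall(text)
    cleaned = TOPIC_RE.sub(" ", MENTION_RE.sub(" ", text))
    cleaned = " ".join(cleaned.split())  # collapse the whitespace left behind
    return mentions, topics, cleaned


# Hypothetical example comment.
print(extract_weibo_markup("#世界杯# @小明 这场比赛太精彩了"))
```

Whether topics should be removed or scored as ordinary text is a design choice; this sketch removes them only to keep the example simple.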

6.2 Sentiment analysis method for emoticons

We think the method of analyzing emoticons in paper [1] is simple. After observing Weibo comments, we found that some emoticons have nearly opposite meanings in different sentences. For example, a positive emoticon can be used as a negative one, as shown in Figure 6-2-1.

Figure 6-2-1

We think emoticon sentiment analysis needs to take the context into account, and we plan to do some detailed work on this.

6.3 Memes in comments

Weibo comments can be accompanied by pictures, and these pictures are very likely to be memes. An example is shown in Figure 6-2-2.

Figure 6-2-2

Some memes contain clear text. Recognizing the text in the image may help with sentiment analysis, and we may add this to our project.

References 

[1] ZHAN, T., YAO, H., FANG, C., ZHANG, J. and ZHANG, P. (2017). Microblogging sentiment analysis method with the combination of semantic rules and emoticon weighting. [online] Beijing: CNKI. Available at: http://big5.oversea.cnki.net.libezproxy.must.edu.mo/kcms/detail/detail.aspx?recid=&FileName=1017290819.nh&DbName=CMFD201801&DbCode=CMFD&uid=WEEvREcwSlJHSldRa1Fhb09jSnU2c2JVemZmQThiUkI3bk5WZ2FsWkRlTT0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!! [Accessed 28 Nov. 2018].

[2] Wu, D. (2018). 微博熱門話題情感分析及實證研究. [online] 黑龍江: 中国知网, pp.2018年 09期. Available at: http://big5.oversea.cnki.net.libezproxy.must.edu.mo/kcms/detail/detail.aspx?recid=&FileName=1018069133.nh&DbName=CMFD201802&DbCode=CMFD&uid=WEEvREcwSlJHSldRa1Fhb09jSnU2c2JVemZmQThiUkI3bk5WZ2FsWkRlTT0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!! [Accessed 28 Nov. 2018].

[3] Kaur, H., Mangat, V. and Nidhi (2017). A survey of sentiment analysis techniques. In: 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). [online] IEEE, pp.921 - 925. Available at: https://ieeexplore-ieee-org.libezproxy.must.edu.mo/document/8058315 [Accessed 28 Nov. 2018].

[4] Wang, Z. and Qin, S. (2017). A sentiment analysis method of Chinese specialized field short commentary. In: 2017 3rd IEEE International Conference on Computer and Communications (ICCC).

[5] 肖江, 丁星 and 何荣杰 (2015). 基于领域情感词典的中文微博情感分析. 电子设计工程, 2015(12), pp.18-21.
