Using Emoji for Text Sentiment Analysis: Improving sentiment analysis accuracy with emoji embedding

Author: 无知的研究生

Last modified on 2022-05-19 21:10:06

Article tags: machine learning, deep learning, artificial intelligence, algorithms

First published on 2022-05-19 20:05:21

Article link: https://blog.csdn.net/weixin_37996254/article/details/124861315

Abstract:
Due to the diversity and variability of Chinese syntax and semantics, accurately identifying and distinguishing individual emotions in online texts is challenging. To overcome this limitation, we incorporate a new source of individual sentiment: emojis, a set of thousands of graphic symbols that are increasingly used to express emotion in online conversations. We examined popular sentiment analysis algorithms, including rule-based and classification algorithms, to evaluate whether supplementing them with emojis as additional features improves their performance. When constructing features, emojis were also translated into corresponding sentiment words, and these features were compared with those generated directly from emoji label words. In addition, considering the different functions of emojis in texts, we classified all posts in the dataset by their emoji usage and examined the resulting changes in algorithm performance. We found that emojis are effective as expanded features for improving the accuracy of sentiment analysis algorithms, and that performance can be further increased by taking different emoji usages into account. In this study, we developed an improved emoji-embedding model based on Bi-LSTM, named CEmo-LSTM, which achieves the highest accuracy (around 0.95) among the compared models when analyzing online Chinese texts. We applied CEmo-LSTM to a large dataset collected from Weibo between December 1, 2019 and March 20, 2020 to understand how the sentiment of online users evolved during the COVID-19 pandemic. We found that the pandemic markedly affected individual sentiment and led to more negative emotions (e.g., horror and sadness). Our novel emoji-embedding algorithm combines emojis and emoji usage with the sentiment analysis model and handles emotion-mining tasks more effectively and efficiently.
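The abstract compares two ways of turning emojis into extra features for a rule-based scorer: scoring the emoji label words directly, or first translating each emoji into a sentiment word. The following is a minimal sketch of both, not the paper's implementation. It assumes that posts are already word-segmented, that emojis appear as bracketed label words such as `[泪]`, and that `WORD_LEXICON`, `EMOJI_LEXICON` and `EMOJI_TO_WORD` are tiny hypothetical lexicons made up for illustration.

```python
import re

# Hypothetical toy lexicons for illustration only (not the paper's resources).
WORD_LEXICON = {"开心": 1.0, "好": 0.5, "难过": -1.0, "害怕": -1.0}  # happy, good, sad, afraid
# Strategy 1: score emoji label words directly.
EMOJI_LEXICON = {"[哈哈]": 1.0, "[泪]": -1.0, "[怒]": -1.0}  # haha, tears, angry
# Strategy 2: translate each emoji into a sentiment word, then reuse WORD_LEXICON.
EMOJI_TO_WORD = {"[哈哈]": "开心", "[泪]": "难过", "[怒]": "愤怒"}  # 愤怒 = anger

# Assumption: an emoji token is a bracketed label word such as "[泪]".
EMOJI_PATTERN = re.compile(r"\[[^\[\]]+\]")


def split_emojis(tokens):
    """Separate ordinary word tokens from emoji label tokens."""
    words = [t for t in tokens if not EMOJI_PATTERN.fullmatch(t)]
    emojis = [t for t in tokens if EMOJI_PATTERN.fullmatch(t)]
    return words, emojis


def rule_based_score(tokens, use_emojis=True, translate=False):
    """Sum lexicon scores; optionally add emojis as extra features."""
    words, emojis = split_emojis(tokens)
    score = sum(WORD_LEXICON.get(w, 0.0) for w in words)
    if use_emojis:
        if translate:
            # Emoji -> sentiment word -> word lexicon (0 if the word is not covered).
            score += sum(WORD_LEXICON.get(EMOJI_TO_WORD.get(e, ""), 0.0) for e in emojis)
        else:
            score += sum(EMOJI_LEXICON.get(e, 0.0) for e in emojis)
    return score


def polarity(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    post = ["今天", "又", "要", "上班", "[泪]"]  # "Have to go to work again today [tears]"
    print(polarity(rule_based_score(post, use_emojis=False)))  # neutral
    print(polarity(rule_based_score(post)))                    # negative
    print(polarity(rule_based_score(post, translate=True)))    # negative
```

In this toy example the text alone carries no lexicon sentiment, so only the emoji feature reveals the negative mood. The two strategies can also diverge: `[怒]` scores -1.0 directly but 0.0 after translation, because `愤怒` is missing from the toy word lexicon, so the translation route depends on the coverage of the word lexicon.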

Main Work:
However, previous studies mainly treated emojis as a single feature and did not examine the sentiment effect of emojis on the text as a whole. Little attention has been paid to sentiment analysis (SA) models that account for the different ways emojis are used in texts.
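Taking different emoji usages into account means grouping posts by how their emojis relate to the text and then checking accuracy per group. The excerpt does not list the paper's actual categories, so the ones below (`no_emoji`, `emoji_only`, `emoji_neutral`, `emoji_supplements`, `emoji_reinforces`, `emoji_contradicts`) are hypothetical illustrations, and the lexicons are passed in as arguments so that any resource can be plugged in.

```python
from collections import defaultdict


def sign(x):
    return (x > 0) - (x < 0)


def emoji_usage_category(words, emojis, word_lexicon, emoji_lexicon):
    """Assign a post to a hypothetical emoji-usage category."""
    if not emojis:
        return "no_emoji"
    if not words:
        return "emoji_only"
    text_pol = sign(sum(word_lexicon.get(w, 0.0) for w in words))
    emoji_pol = sign(sum(emoji_lexicon.get(e, 0.0) for e in emojis))
    if emoji_pol == 0:
        return "emoji_neutral"        # emojis carry no lexicon polarity
    if text_pol == 0:
        return "emoji_supplements"    # emoji supplies sentiment the text lacks
    if emoji_pol == text_pol:
        return "emoji_reinforces"
    return "emoji_contradicts"        # e.g. sarcasm


def accuracy_by_usage(samples, word_lexicon, emoji_lexicon):
    """samples: iterable of (words, emojis, gold_label, predicted_label)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for words, emojis, gold, pred in samples:
        category = emoji_usage_category(words, emojis, word_lexicon, emoji_lexicon)
        totals[category] += 1
        hits[category] += int(gold == pred)
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    WL = {"开心": 1.0, "难过": -1.0}      # happy, sad
    EL = {"[哈哈]": 1.0, "[泪]": -1.0}    # haha, tears
    print(emoji_usage_category(["开心"], ["[泪]"], WL, EL))  # emoji_contradicts
```

Reporting accuracy per category, rather than one global number, is what makes it possible to see for which kinds of emoji usage an algorithm gains or loses accuracy.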
In this study, we proposed an emoji-embedding architecture named CEmo-LSTM to improve the accuracy of sentiment identification and classification in SA tasks. We further evaluated how introducing emojis benefits SA accuracy in both traditional rule-based and supervised learning algorithms. We also examined the most effective approach for embedding emojis in SA algorithms. We compared the performance of CEmo-LSTM with that of other mainstream SA models under different experimental settings. Finally, by collecting all posts and embedded emojis published by users on Weibo during the COVID-19 outbreak, we used CEmo-LSTM to analyze how the sentiment of online users evolved during the pandemic.
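The excerpt does not describe CEmo-LSTM's internals, so the following is only a generic sketch of the idea it builds on: words and emoji label tokens share one embedding table, a forward and a backward LSTM read the sequence, and their final hidden states are concatenated and fed to a logistic output layer. The class names, dimensions and random initialization are hypothetical, the model is untrained (forward pass only), and everything is plain Python to stay self-contained; a real implementation would use a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    # Plain-Python matrix-vector product.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    GATES = ("i", "f", "o", "g")

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate over the concatenation [x; h_prev], plus a bias.
        self.W = {
            name: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for name in self.GATES
        }
        self.b = {name: [0.0] * hidden_dim for name in self.GATES}

    def step(self, x, h, c):
        z = x + h  # list concatenation [x; h_prev]
        pre = {name: [a + b for a, b in zip(matvec(self.W[name], z), self.b[name])]
               for name in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        # Return the final hidden state after reading the whole sequence.
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        for x in xs:
            h, c = self.step(x, h, c)
        return h


class EmojiBiLSTMClassifier:
    """Hypothetical sketch: shared word/emoji embeddings feeding a Bi-LSTM."""

    def __init__(self, vocab, emb_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        # Index 0 is reserved for unknown tokens; duplicates keep their first index.
        self.index = {"<unk>": 0}
        for tok in vocab:
            self.index.setdefault(tok, len(self.index))
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
                    for _ in range(len(self.index))]
        self.fwd = LSTMCell(emb_dim, hidden_dim, rng)
        self.bwd = LSTMCell(emb_dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def encode(self, tokens):
        return [self.index.get(t, 0) for t in tokens]

    def predict_proba(self, tokens):
        xs = [self.emb[i] for i in self.encode(tokens)]
        h = self.fwd.run(xs) + self.bwd.run(xs[::-1])  # concatenate both directions
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


if __name__ == "__main__":
    model = EmojiBiLSTMClassifier(["今天", "上班", "[泪]", "[哈哈]"])
    print(model.encode(["今天", "[泪]", "未知"]))        # [1, 3, 0]
    print(model.predict_proba(["今天", "上班", "[泪]"]))  # untrained: near 0.5
```

Because emoji labels get their own rows in the embedding table, training would let the model learn emoji representations jointly with the word representations, instead of reducing each emoji to a fixed lexicon score.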
