Language model evaluation:
I recently received a new paper titled “Evaluation of Sentiment Analysis in Finance: From Lexicons to Transformers,” published on July 16, 2020 in IEEE. The authors, Kostadin Mishev, Ana Gjorgjevikj, Irena Vodenska, Lubomir T. Chitkushev, and Dimitar Trajanov, compared more than a hundred sentiment algorithms on two well-known financial sentiment datasets and evaluated their effectiveness. Although the purpose of the study was to test the effectiveness of different Natural Language Processing (NLP) models, its findings tell us much more about the progress of NLP over the last decade, and in particular about which elements contributed the most to the sentiment prediction task.
So let’s start with the definition of the sentiment prediction task. Given a collection of paragraphs, the model classifies each paragraph into one of three possible categories: positive sentiment, negative sentiment, or neutral. The model is then evaluated using a 3×3 confusion matrix built from the counts of predicted sentiment versus the ground truth (the true label of each paragraph).
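To make the evaluation setup concrete, here is a minimal sketch of how a 3×3 confusion matrix can be built from true and predicted labels. The label order, the function name `confusion_matrix`, and the toy data are assumptions made for illustration; they are not taken from the paper. Rows correspond to true labels and columns to predicted labels.

```python
# Hypothetical label order; the paper does not prescribe one.
LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[pred]] += 1
    return matrix


# Toy data, invented for this example only.
y_true = ["positive", "negative", "neutral", "positive", "neutral", "negative"]
y_pred = ["positive", "neutral", "neutral", "positive", "negative", "negative"]

cm = confusion_matrix(y_true, y_pred)
for label, row in zip(LABELS, cm):
    print(f"{label:>8}: {row}")
```

The diagonal cells hold the correctly classified paragraphs; every off-diagonal cell counts one specific kind of mistake, for example a negative paragraph predicted as neutral.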
The evaluation metric used by the authors is the Matthews correlation coefficient (MCC), which serves as a measure of the quality of binary (two-class) classifications (Matthews, 1975). As originally defined, the MCC applies only to the binary case, and the authors do not mention how they applied it in the multi-class case (3 sentiment classes).
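For reference, the binary MCC is computed from the four cells of a 2×2 confusion matrix as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below implements that formula and, as an illustration only, a commonly used generalization that works directly on a K×K confusion matrix. Since the paper does not say how its 3-class MCC was obtained, the multi-class function here is an assumption and not a description of the authors' method. The function names are hypothetical, and returning 0.0 when the denominator is zero is a convention chosen for this example.

```python
import math


def binary_mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a two-class problem."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention for this sketch: return 0.0 when the score is undefined.
    return numerator / denominator if denominator else 0.0


def multiclass_mcc(matrix):
    """Generalized MCC from a K x K confusion matrix (rows = true, columns = predicted).

    Assumption: this generalization is shown for illustration; the paper
    does not state which multi-class variant, if any, it used.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    true_counts = [sum(matrix[i]) for i in range(k)]
    pred_counts = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0


# Toy 3-class matrix (positive, negative, neutral), invented for this example.
toy_matrix = [[2, 0, 0], [0, 1, 1], [0, 1, 1]]
print(multiclass_mcc(toy_matrix))
```

On a 2×2 matrix the generalized form gives the same value as the binary formula, which makes it a natural candidate for the 3-class setting, but other choices (such as averaging one-vs-rest binary scores) would yield different numbers.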