“This is the simplest form of sentiment analysis and it is assumed that the document contains an opinion on one main object expressed by the author of the document. Numerous papers have been written on this topic. There are two main approaches to document-level sentiment analysis: supervised learning and unsupervised learning.”
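In other words, document-level analysis is the simplest form of sentiment analysis: it assumes that the document expresses its author's opinion about one main object. Many papers have been written on the topic. There are two main approaches to document-level sentiment analysis: supervised learning and unsupervised learning.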
“The supervised approach assumes that there is a finite set of classes into which the document should be classified and training data is available for each class.”
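That is, the supervised approach assumes a finite set of classes (for example, positive and negative) into which each document is to be classified, with training data available for every class.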
“Research[28] has shown that good accuracy is achieved even when each document is represented as a simple bag of words. More advanced representations utilize TFIDF, POS (Part of Speech) information, sentiment lexicons, and parse structures.”
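Research [28] has shown that good accuracy can be achieved even when each document is represented as a simple bag of words, that is, as word counts with word order discarded. More advanced representations use TF-IDF, POS (part-of-speech) information, sentiment lexicons, and parse structures.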
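To make the bag-of-words idea concrete, here is a minimal sketch of a supervised document-level classifier. It uses multinomial Naive Bayes with add-one smoothing, which is one of the classifiers evaluated in [28]; it is not a reproduction of that paper's experimental setup. The function names (`bag_of_words`, `train`, `classify`) and the four-sentence training set are hypothetical and made up purely for illustration.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Represent a document as word counts, ignoring word order."""
    return Counter(text.lower().split())


def train(labeled_docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        bow = bag_of_words(text)
        class_docs[label] += 1
        word_counts[label].update(bow)
        vocab.update(bow)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-posterior under Naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # class prior
        for word, count in bag_of_words(text).items():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing keeps unseen class/word pairs above zero.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += count * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
TRAINING_DATA = [
    ("a wonderful moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a dull boring film", "negative"),
    ("terrible acting and a weak story", "negative"),
]

model = train(TRAINING_DATA)
print(classify("a great film", model))  # positive
```

The two assumptions of the supervised approach are visible here: the set of classes is fixed by the labels in `TRAINING_DATA`, and each class needs its own training examples. A real system would need far more data and a proper tokenizer than this whitespace split.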
References:
[28] Pang, B., Lee, L. and Vaithyanathan, S. Thumbs up? Sentiment Classification using machine learning techniques. In Proceedings of EMNLP-02, 7th Conference on Empirical Methods in Natural Language Processing (Philadelphia, PA, 2002). Association for Computational Linguistics, Morristown, NJ, 79–86.