NLP-Text Classifiers for Sentiment Analysis

Sengo_1993

Category: Machine Learning. Tags: NLP

Article link: https://blog.csdn.net/sengo_gwu/article/details/83841997

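This document describes how to build text classifiers for sentiment analysis with the scikit-learn library, using Naïve Bayes and Logistic Regression on binary features and evaluating performance with 10-fold cross-validation. It covers accuracy and F1 scores for different features (bag-of-words, n-grams) and models.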

Overview

1. Text Classification:

In this assignment, you will use scikit-learn, a machine learning toolkit in Python, to implement text classifiers for sentiment analysis. Please read all instructions below carefully.

2. Datasets and evaluation:

You are given a customer reviews dataset, CR.zip, which contains positive and negative reviews. CR is a small dataset with no predefined train/test split, so you are required to evaluate performance using 10-fold cross-validation. Please use the following scikit-learn modules in your implementation:

scikit-learn documentation:

Bag-of-words (or n-gram) feature extraction using CountVectorizer, with binary features (1/0 rather than counts, i.e. binary=True): http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html

Naïve Bayes classifier: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html

Logistic Regression classifier: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Cross validation: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html

Classification report: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
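As a rough illustration of how the modules listed above might fit together, here is a minimal sketch. It is not the assignment's reference solution. The toy reviews stand in for CR.zip, whose file layout is not described here, and the helper name build_pipeline, the model labels, the max_iter value and the random seed are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for CR.zip (the real file layout is not given here).
positive = [
    "great phone with a good camera", "i love this player", "excellent sound quality",
    "battery life is great", "very easy to use", "good value for the price",
    "the screen is excellent", "works great and looks good", "i love the design",
    "fast and easy setup",
]
negative = [
    "terrible battery life", "i hate this phone", "poor sound quality",
    "the screen is bad", "very hard to use", "bad value for the price",
    "stopped working after a week", "slow and hard setup", "i hate the design",
    "poor camera and terrible software",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# With binary=True a repeated word still produces a 1, not a count of 2.
binary_demo = CountVectorizer(binary=True).fit_transform(["good good phone"]).toarray()


def build_pipeline(classifier, ngram_range=(1, 1)):
    """Binary bag-of-words (or n-gram) features followed by a classifier.

    Hypothetical helper for this sketch. Keeping the vectorizer inside the
    pipeline means the vocabulary is refit on the training part of each fold.
    """
    vectorizer = CountVectorizer(binary=True, ngram_range=ngram_range)
    return make_pipeline(vectorizer, classifier)


models = {
    "NB unigrams": build_pipeline(MultinomialNB()),
    "LR unigrams": build_pipeline(LogisticRegression(max_iter=1000)),
    "LR unigrams+bigrams": build_pipeline(
        LogisticRegression(max_iter=1000), ngram_range=(1, 2)
    ),
}

# 10-fold cross-validation; stratification keeps both classes in every fold.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(
        model, texts, labels, cv=folds, scoring=["accuracy", "f1_macro"]
    )
    results[name] = scores
    print(name, scores["test_accuracy"].mean(), scores["test_f1_macro"].mean())

# Out-of-fold predictions give one prediction per review, which
# classification_report can summarise as per-class precision, recall and F1.
predictions = cross_val_predict(models["NB unigrams"], texts, labels, cv=folds)
report = classification_report(
    labels, predictions, target_names=["negative", "positive"]
)
print(report)
```

On the real dataset, texts and labels would be loaded from the CR files instead of the toy lists. With only two reviews per test fold, as here, the per-fold scores are very noisy and scikit-learn may warn about undefined metrics in some folds.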