NLP: Text Classifiers for Sentiment Analysis

This document describes how to build text classifiers for sentiment analysis with the scikit-learn library, using Naïve Bayes and Logistic Regression over binary features and evaluating performance with 10-fold cross-validation. It covers accuracy and F1 scores for different features (bag-of-words, n-grams) and models.

Overview

1. Text Classification:

In this assignment, you will use scikit-learn, a machine learning toolkit in Python, to implement text classifiers for sentiment analysis. Please read all instructions below carefully.

2. Datasets and Evaluation:

You are given the following customer reviews dataset: CR.zip, which includes positive and negative reviews. CR is a small dataset that doesn't have train/test divisions, so you are required to evaluate performance using 10-fold cross-validation. Please use the following scikit-learn modules in your implementation (a sketch combining them follows the list):

scikit-learn documentation:

Bag-of-words (or n-gram) feature extraction using CountVectorizer: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html. Use binary features (1/0 rather than counts).

Naïve Bayes classifier: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html

Logistic Regression classifier: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Cross-validation: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html

Classification report: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
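
As a minimal sketch of how these modules fit together, the snippet below wires binary bag-of-words features into a Naïve Bayes pipeline and evaluates it with 10-fold cross-validation. The texts/labels placeholders are assumptions made here for illustration, since the exact file layout of CR.zip is not described above; macro-averaged F1 is used as one reasonable reading of "F1".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Placeholder data so the sketch runs on its own; replace with the reviews
# and 0/1 labels loaded from the extracted CR dataset. Note that 10-fold
# cross-validation needs at least 10 examples per class.
texts = ["great product, works as advertised",
         "poor quality, broke within a week"] * 10
labels = [1, 0] * 10

# binary=True records token presence (1/0) instead of raw counts.
vectorizer = CountVectorizer(binary=True)

# Putting the vectorizer inside a pipeline ensures the vocabulary is built
# only from the training split of each cross-validation fold.
model = make_pipeline(vectorizer, MultinomialNB())

# 10-fold cross-validation with accuracy and macro-averaged F1.
scores = cross_validate(model, texts, labels, cv=10,
                        scoring=["accuracy", "f1_macro"])
print("accuracy:", scores["test_accuracy"].mean())
print("macro F1:", scores["test_f1_macro"].mean())
```

For a per-class precision/recall/F1 breakdown, classification_report can be applied to out-of-fold predictions obtained with sklearn.model_selection.cross_val_predict.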

Targets

Part 1:

Using the scikit-learn modules described above, implement the following models and report the performance (accuracy and F1) on the CR dataset (a sketch covering all four follows this list):

a) A Naïve Bayes classifier with add-1 smoothing using binary bag-of-words features.

b) A Naïve Bayes classifier with add-1 smoothing using binary bag-of-n-grams features (with unigrams and bigrams).

c) A Logistic Regression classifier with L2 regularization (and default parameters) using binary bag-of-words features.

d) A Logistic Regression classifier with L2 regularization using binary bag-of-n-grams features (with unigrams and bigrams).
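
One way to sketch all four configurations (a)-(d) is shown below; the placeholder data and the evaluate helper are illustrative, not part of the assignment. alpha=1.0 in MultinomialNB corresponds to add-1 (Laplace) smoothing, and penalty="l2" is scikit-learn's default regularization for LogisticRegression.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder data; replace with the real CR reviews and 0/1 labels.
texts = ["great product, works as advertised",
         "poor quality, broke within a week"] * 10
labels = [1, 0] * 10

def evaluate(model):
    """Mean accuracy and macro F1 over 10-fold cross-validation."""
    scores = cross_validate(model, texts, labels, cv=10,
                            scoring=["accuracy", "f1_macro"])
    return scores["test_accuracy"].mean(), scores["test_f1_macro"].mean()

# a)-d): binary unigram features vs. binary unigram+bigram features,
# Naïve Bayes with add-1 smoothing vs. L2-regularized logistic regression.
models = {
    "a) NB, bag-of-words":
        make_pipeline(CountVectorizer(binary=True),
                      MultinomialNB(alpha=1.0)),
    "b) NB, unigrams+bigrams":
        make_pipeline(CountVectorizer(binary=True, ngram_range=(1, 2)),
                      MultinomialNB(alpha=1.0)),
    "c) LR (L2), bag-of-words":
        make_pipeline(CountVectorizer(binary=True),
                      LogisticRegression(penalty="l2")),
    "d) LR (L2), unigrams+bigrams":
        make_pipeline(CountVectorizer(binary=True, ngram_range=(1, 2)),
                      LogisticRegression(penalty="l2")),
}

for name, model in models.items():
    acc, f1 = evaluate(model)
    print(f"{name}: accuracy={acc:.3f}  macro-F1={f1:.3f}")
```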

Part 2:

[Optional]
