Movie Review Classification: Naive Bayes

mmい

Category: Machine Learning

Article link: https://blog.csdn.net/zm714981790/article/details/51295107

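This article shows how to use a naive Bayes classification model for sentiment analysis of movie reviews. It computes the probability that a sample belongs to the positive or the negative class using Bayes' formula with Laplace smoothing, makes predictions for this text classification problem, and computes the prediction error.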

Before We Classify

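Given a movie review (a piece of text), we want to know whether its tone is positive (+1) or negative (-1). This article uses a naive Bayes classification model to solve this problem. Naive Bayes works by computing the probability that a sample belongs to each class. The computation is based on Bayes' theorem: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, which gives the probability of A given B.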

# Here's a running history for the past week.
# For each day, it contains whether or not the person ran, and whether or not they were tired.
days = [["ran", "was tired"], ["ran", "was not tired"], ["didn't run", "was tired"], ["ran", "was tired"], ["didn't run", "was not tired"], ["ran", "was not tired"], ["ran", "was tired"]]


# This is P(A):the probability of being tired
prob_tired = len([d for d in days if d[1] == "was tired"]) / len(days)
# This is P(B):the probability of running
prob_ran = len([d for d in days if d[0] == "ran"]) / len(days)
# This is P(B|A):the probability of running given that you are tired
prob_ran_given_tired = len([d for d in days if d[0] == "ran" and d[1] == "was tired"]) / len([d for d in days if d[1] == "was tired"])

# Now we can calculate P(A|B).
prob_tired_given_ran = (prob_ran_given_tired * prob_tired) / prob_ran

print("Probability of being tired given that you ran: {0}".format(prob_tired_given_ran))
'''
Probability of being tired given that you ran: 0.6
'''
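
As a check on the printed result, the same computation can be worked by hand from the counts in `days` (4 tired days out of 7, 5 running days out of 7, and 3 running days among the 4 tired days). Here A stands for "was tired" and B for "ran", matching the comments in the code:

```latex
P(A) = \frac{4}{7}, \qquad P(B) = \frac{5}{7}, \qquad P(B \mid A) = \frac{3}{4}

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
            = \frac{\frac{3}{4} \cdot \frac{4}{7}}{\frac{5}{7}}
            = \frac{3/7}{5/7}
            = \frac{3}{5} = 0.6
```

This agrees with counting directly: of the 5 days with a run, 3 were tired days.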

Naive Bayes Intro

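The previous example has only one attribute, whether the person ran, and whether they were tired is the variable to predict, so Bayes' formula applies directly: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. With more than one attribute, however, this formula becomes hard to evaluate, because the probability of every combination of attribute values would have to be estimated. This is where naive Bayes comes in. Naive Bayes makes a conditional independence assumption; the formula is as follows: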

[Figure: the naive Bayes formula under the conditional independence assumption]
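
The formula image is not available here. The standard statement of naive Bayes under the conditional independence assumption is sketched below; its notation may differ from the original figure. The symbols $y$ (the class) and $x_1, \dots, x_n$ (the attributes) are assumptions introduced for this sketch:

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
```

Because the denominator is the same for every class, comparing classes only requires the numerator $P(y)\,\prod_{i} P(x_i \mid y)$.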

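The following example has two attributes: whether the person ran and whether they woke up early. Given the sample ["ran", "didn't wake up early"], predict whether the person was tired: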

# Here's our data, but with "woke up early" or "didn't wake up early" added.
days = [["ran", "was tired",
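
The listing above is cut off in the source. The following is a minimal sketch of the kind of computation it sets up, not the author's code. The data is hypothetical: the first two fields repeat the running history from the first example, and the third field (woke up early or not) is invented for illustration. The helper name `class_score` is also an assumption.

```python
# Hypothetical data: columns are [ran?, tired?, woke up early?].
# The third column is invented for illustration; it is not from the source.
days = [
    ["ran", "was tired", "woke up early"],
    ["ran", "was not tired", "didn't wake up early"],
    ["didn't run", "was tired", "woke up early"],
    ["ran", "was tired", "didn't wake up early"],
    ["didn't run", "was not tired", "didn't wake up early"],
    ["ran", "was not tired", "woke up early"],
    ["ran", "was tired", "woke up early"],
]


def class_score(rows, sample, label):
    """Return P(label) * P(x1 | label) * P(x2 | label).

    This is the naive Bayes numerator. The shared denominator
    P(x1, x2) is omitted because it does not affect which class wins.
    """
    in_class = [r for r in rows if r[1] == label]
    score = len(in_class) / len(rows)  # prior P(label)
    # Attribute columns are 0 and 2; column 1 holds the label.
    for col, value in zip((0, 2), sample):
        matches = [r for r in in_class if r[col] == value]
        score *= len(matches) / len(in_class)  # P(attribute value | label)
    return score


sample = ["ran", "didn't wake up early"]
tired_score = class_score(days, sample, "was tired")
not_tired_score = class_score(days, sample, "was not tired")

# Pick the class with the larger score.
prediction = "was tired" if tired_score > not_tired_score else "was not tired"
print("Score for tired: {0}".format(tired_score))
print("Score for not tired: {0}".format(not_tired_score))
print("Prediction: {0}".format(prediction))
```

With this made-up data the scores are 3/28 for "was tired" and 4/21 for "was not tired", so the sketch predicts "was not tired". A count of zero for any attribute value would drive the whole product to zero, which is the problem Laplace smoothing addresses.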
