Movie Review Classification: Naive Bayes

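This article shows how to use a naive Bayes classifier for sentiment analysis of movie reviews. It computes the probability that a sample belongs to the positive or negative class using Bayes' theorem with Laplace smoothing, predicts labels for the text classification problem, and computes the prediction error.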

Before We Classify

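  • Given a movie review (text), we want to know whether its tone is positive (+1) or negative (-1). This article uses the naive Bayes classification model to solve this problem. Naive Bayes works by computing the probability that a sample belongs to each class. The computation is based on Bayes' theorem: P(A∣B)=P(B∣A)P(A)/P(B), which gives the probability of A given B.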
# Here's a running history for the past week.
# For each day, it contains whether or not the person ran, and whether or not they were tired.
days = [["ran", "was tired"], ["ran", "was not tired"], ["didn't run", "was tired"], ["ran", "was tired"], ["didn't run", "was not tired"], ["ran", "was not tired"], ["ran", "was tired"]]


# This is P(A):the probability of being tired
prob_tired = len([d for d in days if d[1] == "was tired"]) / len(days)
# This is P(B):the probability of running
prob_ran = len([d for d in days if d[0] == "ran"]) / len(days)
# This is P(B|A):the probability of running given that you are tired
prob_ran_given_tired = len([d for d in days if d[0] == "ran" and d[1] == "was tired"]) / len([d for d in days if d[1] == "was tired"])

# Now we can calculate P(A|B).
prob_tired_given_ran = (prob_ran_given_tired * prob_tired) / prob_ran

print("Probability of being tired given that you ran: {0}".format(prob_tired_given_ran))
'''
Probability of being tired given that you ran: 0.6
'''
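
As a sanity check on the result above, P(tired | ran) can also be counted directly from the days on which the person ran, and it should agree with the value obtained through Bayes' theorem. The sketch below reuses the same seven days; the variable names (`ran_days`, `tired_days`, `direct`, `via_bayes`) are illustrative and not from the original, and `Fraction` is used only to avoid floating-point rounding.

```python
from fractions import Fraction

# Same running history as in the example above.
days = [["ran", "was tired"], ["ran", "was not tired"], ["didn't run", "was tired"],
        ["ran", "was tired"], ["didn't run", "was not tired"], ["ran", "was not tired"],
        ["ran", "was tired"]]

ran_days = [d for d in days if d[0] == "ran"]
tired_days = [d for d in days if d[1] == "was tired"]

# Direct estimate: among the days with a run, the fraction that ended tired.
direct = Fraction(sum(1 for d in ran_days if d[1] == "was tired"), len(ran_days))

# Bayes' theorem: P(tired | ran) = P(ran | tired) * P(tired) / P(ran).
p_tired = Fraction(len(tired_days), len(days))
p_ran = Fraction(len(ran_days), len(days))
p_ran_given_tired = Fraction(sum(1 for d in tired_days if d[0] == "ran"), len(tired_days))
via_bayes = p_ran_given_tired * p_tired / p_ran

print(direct, via_bayes)  # 3/5 3/5
```

Both routes give 3/5, which matches the 0.6 printed earlier.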

Naive Bayes Intro

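  • The previous example has only one attribute, whether the person ran, and being tired is the variable to predict, so Bayes' formula P(A∣B)=P(B∣A)P(A)/P(B) can be applied directly. With more than one attribute, this formula becomes hard to compute, which is where naive Bayes comes in. Naive Bayes makes a conditional independence assumption, giving the following formula: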

$$P(A \mid B_1, \dots, B_n) = \frac{P(A)\,\prod_{i=1}^{n} P(B_i \mid A)}{P(B_1, \dots, B_n)}$$
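
To make the conditional independence assumption concrete, here is a minimal sketch that scores each class as the prior times the product of per-attribute conditional probabilities. The `rows` data is hypothetical (made up for illustration, not the article's dataset), and `class_score` is an illustrative helper name. The denominator P(B_1, …, B_n) is omitted because it is the same for every class and does not affect which class scores highest. No smoothing is applied here.

```python
from fractions import Fraction

# Hypothetical data, NOT the article's dataset:
# each row is (ran?, woke up early?, label).
rows = [
    ("ran", "woke up early", "was tired"),
    ("ran", "didn't wake up early", "was not tired"),
    ("didn't run", "woke up early", "was tired"),
    ("ran", "woke up early", "was tired"),
    ("didn't run", "didn't wake up early", "was not tired"),
    ("ran", "didn't wake up early", "was not tired"),
    ("ran", "didn't wake up early", "was tired"),
]

def class_score(features, label):
    """Unnormalized naive Bayes score: P(label) * product of P(feature_i | label)."""
    in_class = [r for r in rows if r[-1] == label]
    score = Fraction(len(in_class), len(rows))  # prior P(label)
    for i, value in enumerate(features):
        matches = sum(1 for r in in_class if r[i] == value)
        score *= Fraction(matches, len(in_class))  # P(feature_i = value | label)
    return score

sample = ("ran", "didn't wake up early")
scores = {label: class_score(sample, label)
          for label in ("was tired", "was not tired")}
prediction = max(scores, key=scores.get)

print(scores["was tired"], scores["was not tired"], prediction)
# 3/28 2/7 was not tired
```

With this made-up data the "was not tired" class has the larger score (2/7 versus 3/28), so it is the predicted label for the sample.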

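  • The following example has two attributes, whether the person ran and whether they woke up early. Given the sample ["ran", "didn't wake up early"], predict whether the person was tired: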
# Here's our data, but with "woke up early" or "didn't wake up early" added.
days = [["ran", "was tired", 