C1W2.LAB.Visualizing Naive Bayes

Lecture notes: C1W2.Sentiment Analysis with Naïve Bayes



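Importing packages

In this exercise, we visually inspect a dataset of tweets using Naive Bayes features. The focus is on the log-likelihood ratio, represented here as a pair of numerical features that can be fed into a machine learning algorithm.

At the end, we introduce confidence ellipses as a tool for representing a Naive Bayes model visually.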

import numpy as np # Library for linear algebra and math utils
import pandas as pd # Dataframe library

import matplotlib.pyplot as plt # Library for plots
from utils import confidence_ellipse # Function to add confidence ellipses to charts

Calculate the likelihoods for each tweet

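For each tweet, we have computed its log-likelihood under the positive class and under the negative class. These are the numerator and the denominator of the likelihood ratio, given below.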
$$\log \frac{P(tweet \mid pos)}{P(tweet \mid neg)} = \log(P(tweet \mid pos)) - \log(P(tweet \mid neg))$$
$$positive = \log(P(tweet \mid pos)) = \sum_{i=0}^{n} \log P(W_i \mid pos)$$
$$negative = \log(P(tweet \mid neg)) = \sum_{i=0}^{n} \log P(W_i \mid neg)$$
Writing the code for these formulas is not required in this lab; the precomputed results are stored in the file 'bayes_features.csv'.
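
Although the lab does not require this code, a minimal sketch may make the sums above concrete. The word probabilities are made up for illustration (they are not taken from the course data), and `tweet_features` is a hypothetical helper name.

```python
import math

# Hypothetical per-word probabilities P(w | class); illustrative values only.
P_WORD_GIVEN_POS = {"happy": 0.20, "great": 0.10, "sad": 0.01}
P_WORD_GIVEN_NEG = {"happy": 0.02, "great": 0.02, "sad": 0.25}


def tweet_features(words):
    """Return (positive, negative, log-likelihood ratio) for a tokenized tweet."""
    # Sum of log P(W_i | pos) over the words of the tweet
    positive = sum(math.log(P_WORD_GIVEN_POS[w]) for w in words)
    # Sum of log P(W_i | neg) over the words of the tweet
    negative = sum(math.log(P_WORD_GIVEN_NEG[w]) for w in words)
    return positive, negative, positive - negative


positive, negative, ratio = tweet_features(["happy", "great"])
print(positive, negative, ratio)
```

Ignoring class priors, a ratio above zero means the tweet is more likely under the positive class than under the negative class.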

data = pd.read_csv('./data/bayes_features.csv') # Load the data from the csv file

data.head(5) # Print the first 5 tweets features. Each row represents a tweet

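The result is a table with one row per tweet; the code below uses its `positive`, `negative`, and `sentiment` columns.

Plot the samples: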

# Plot the samples using columns 1 and 2 of the matrix
fig, ax = plt.subplots(figsize = (8, 8)) #Create a new figure with a custom size

colors = ['red', 'green'] # Define a color palette
sentiments = ['negative', 'positive'] 

index = data.index

# Color based on sentiment
for sentiment in data.sentiment.unique():
    ix = index[data.sentiment == sentiment]
    ax.scatter(data.iloc[ix].positive, data.iloc[ix].negative, c=colors[int(sentiment)], s=0.1, marker='*', label=sentiments[int(sentiment)])

ax.legend(loc='best')    
    
# Custom limits for this chart
plt.xlim(-250,0)
plt.ylim(-250,0)

plt.xlabel("Positive") # x-axis label
plt.ylabel("Negative") # y-axis label
plt.show()


Using Confidence Ellipses to interpret Naïve Bayes

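In this section, we use confidence ellipses to analyze the results of Naive Bayes.

A confidence ellipse is a way to visualize a two-dimensional random variable. It works better than plotting individual points on the Cartesian plane because, on large datasets, the points overlap heavily and hide the true distribution of the data. A confidence ellipse summarizes the dataset with just four parameters: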

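  • Center: the mean of each attribute.
  • Height and width: related to the variance of each attribute. The user must specify how many standard deviations the ellipse should span.
  • Angle: related to the covariance between the attributes.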
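
As a rough illustration of how these four parameters can be derived from data, the sketch below computes them from the sample mean and covariance of a handful of made-up points. It is not the `confidence_ellipse` helper imported from `utils`, whose implementation is not shown here; `ellipse_params` is a hypothetical name, and the eigenvalue-based construction is only one common choice.

```python
import math


def ellipse_params(xs, ys, n_std=2.0):
    """Return (center, width, height, angle_degrees) of a covariance ellipse."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample variances and covariance (n - 1 in the denominator).
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix: variances along the principal axes.
    half_trace = (var_x + var_y) / 2
    spread = math.hypot((var_x - var_y) / 2, cov_xy)
    lam_major = half_trace + spread
    lam_minor = max(half_trace - spread, 0.0)
    # Orientation of the major axis, measured from the x-axis.
    angle = 0.5 * math.degrees(math.atan2(2 * cov_xy, var_x - var_y))
    width = 2 * n_std * math.sqrt(lam_major)   # full length of the major axis
    height = 2 * n_std * math.sqrt(lam_minor)  # full length of the minor axis
    return (mean_x, mean_y), width, height, angle


# Made-up points with a positive covariance between x and y.
center, width, height, angle = ellipse_params(
    [-3.0, -1.0, 0.0, 1.0, 3.0], [-2.0, -1.5, 0.0, 1.5, 2.0]
)
print(center, width, height, angle)
```

With positively correlated attributes, as in this toy input, the major axis tilts upward (an angle between 0 and 90 degrees); a larger `n_std` scales the width and height without moving the center.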

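The parameter n_std is the number of standard deviations at which the ellipse boundary is drawn. Recall that for a normal distribution:

  • About 68% of the area under the curve lies within 1 standard deviation of the mean.
  • About 95% of the area under the curve lies within 2 standard deviations of the mean.
  • About 99.7% of the area under the curve lies within 3 standard deviations of the mean.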
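
These percentages can be checked numerically. For a one-dimensional normal variable, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)); the short sketch below evaluates it with the standard library (`mass_within` is a hypothetical helper name).

```python
import math


def mass_within(n_std):
    """Probability that a 1-D normal variable lies within n_std standard deviations of its mean."""
    return math.erf(n_std / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(mass_within(k), 4))
```

These figures describe a single variable. The share of a two-dimensional normal distribution that falls inside an n_std ellipse is smaller, so the ellipses are best read as a visual summary rather than as exact 95% or 99.7% regions.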

The code below draws the ellipses for 2 and 3 standard deviations:

# Plot the samples using columns 1 and 2 of the matrix
fig, ax = plt.subplots(figsize = (8, 8))

colors = ['red', 'green'] # Define a color palette
sentiments = ['negative', 'positive'] 
index = data.index

# Color based on sentiment
for sentiment in data.sentiment.unique():
    ix = index[data.sentiment == sentiment]
    ax.scatter(data.iloc[ix].positive, data.iloc[ix].negative, c=colors[int(sentiment)], s=0.1, marker='*', label=sentiments[int(sentiment)])

# Custom limits for this chart
plt.xlim(-200,40)  
plt.ylim(-200,40)

plt.xlabel("Positive") # x-axis label
plt.ylabel("Negative") # y-axis label

data_pos = data[data.sentiment == 1] # Filter only the positive samples
data_neg = data[data.sentiment == 0] # Filter only the negative samples

# Print confidence ellipses of 2 std
confidence_ellipse(data_pos.positive, data_pos.negative, ax, n_std=2, edgecolor='black', label=r'$2\sigma$' )
confidence_ellipse(data_neg.positive, data_neg.negative, ax, n_std=2, edgecolor='orange')

# Print confidence ellipses of 3 std
confidence_ellipse(data_pos.positive, data_pos.negative, ax, n_std=3, edgecolor='black', linestyle=':', label=r'$3\sigma$')
confidence_ellipse(data_neg.positive, data_neg.negative, ax, n_std=3, edgecolor='orange', linestyle=':')
ax.legend(loc='lower right')

plt.show()

Next, modify the features of the positive tweets so that they overlap with the negative ones:

data2 = data.copy() # Copy the whole data frame

# Only modify the rows where sentiment == 1. Assign through .loc with a boolean
# mask rather than chained indexing (data2.negative[mask] = ...), which pandas
# may apply to a temporary copy instead of data2.
pos_mask = data2.sentiment == 1
data2.loc[pos_mask, 'negative'] = data2.loc[pos_mask, 'negative'] * 1.5 + 50 # Modify the negative attribute
data2.loc[pos_mask, 'positive'] = data2.loc[pos_mask, 'positive'] / 1.5 - 50 # Modify the positive attribute
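
To see why this transformation moves and resizes the positive ellipse, note that a linear map a*x + b shifts the mean to a*mean + b and multiplies the standard deviation by |a|. The sketch below checks this on a few made-up values (not taken from bayes_features.csv), using the same factor and offset applied to the negative attribute.

```python
import statistics

# Illustrative values only; not taken from bayes_features.csv.
negative = [-120.0, -90.0, -60.0, -30.0]
scaled = [v * 1.5 + 50 for v in negative]  # same map as for the 'negative' column

mean_before = statistics.mean(negative)
mean_after = statistics.mean(scaled)
std_before = statistics.stdev(negative)
std_after = statistics.stdev(scaled)

print(mean_before, mean_after)  # the mean is scaled by 1.5 and then shifted by 50
print(std_before, std_after)    # the spread is scaled by 1.5; the shift has no effect
```

Multiplying by 1.5 stretches the ellipse along the negative axis, dividing by 1.5 shrinks it along the positive axis, and the offsets of +50 and -50 move its center.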

Redraw the plot:

# Plot the samples using columns 1 and 2 of the matrix
fig, ax = plt.subplots(figsize = (8, 8))

colors = ['red', 'green'] # Define a color palette
sentiments = ['negative', 'positive'] 
index = data2.index

# Color based on sentiment
for sentiment in data2.sentiment.unique():
    ix = index[data2.sentiment == sentiment]
    ax.scatter(data2.iloc[ix].positive, data2.iloc[ix].negative, c=colors[int(sentiment)], s=0.1, marker='*', label=sentiments[int(sentiment)])

#ax.scatter(data2.positive, data2.negative, c=[colors[int(k)] for k in data2.sentiment], s = 0.1, marker='*')  # Plot a dot for tweet
# Custom limits for this chart
plt.xlim(-200,40)  
plt.ylim(-200,40)

plt.xlabel("Positive") # x-axis label
plt.ylabel("Negative") # y-axis label

data_pos = data2[data2.sentiment == 1] # Filter only the positive samples
data_neg = data2[data2.sentiment == 0] # Filter only the negative samples

# Print confidence ellipses of 2 std
confidence_ellipse(data_pos.positive, data_pos.negative, ax, n_std=2, edgecolor='black', label=r'$2\sigma$' )
confidence_ellipse(data_neg.positive, data_neg.negative, ax, n_std=2, edgecolor='orange')

# Print confidence ellipses of 3 std
confidence_ellipse(data_pos.positive, data_pos.negative, ax, n_std=3, edgecolor='black', linestyle=':', label=r'$3\sigma$')
confidence_ellipse(data_neg.positive, data_neg.negative, ax, n_std=3, edgecolor='orange', linestyle=':')
ax.legend(loc='lower right')

plt.show()


After the modification, the two distributions begin to overlap.
