Bayesian Classifier

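For continuous attributes, we can use a probability density function (for discrete attributes, simply counting frequencies is enough).
Bayesian statistics rests on Bayes' theorem:
$p(y \mid x) = \frac{p(y)\,p(x \mid y)}{p(x)}$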

1) Continuous attributes

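Example 1: The data below are measurements of children and adults; in each pair the first number is the height and the second is the weight. Using these data, decide whether the new samples (120, 120) and (165, 110) are adults or children.
[Figure: height/weight data for children and adults]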
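First, we assume that height and weight are independent of each other given the class, i.e., each influences the decision on its own. This lets us model each attribute with a Gaussian distribution in a naive Bayes classifier: because the attributes are assumed conditionally independent given the class, the likelihood of a class is the product of the likelihoods of its individual attributes:
$p(x \mid y) = \prod_i p(x_i \mid y), \qquad p(x_i \mid y) = \mathcal{N}(x_i;\ \mu_{i,y}, \sigma^2_{i,y})$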
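As a minimal sketch, assuming the per-class means and variances have already been estimated, the factorization above can be written as follows. `class_likelihood` is a hypothetical helper, not part of the original program; it uses the standard library's `statistics.NormalDist` (Python 3.8+) for each univariate density.

```python
import math
from statistics import NormalDist


def class_likelihood(x, means, variances):
    """Naive Bayes class-conditional likelihood p(x | y).

    Assuming the attributes are conditionally independent given the class,
    p(x | y) is the product of one univariate Gaussian density per attribute.
    """
    likelihood = 1.0
    for xi, mu, var in zip(x, means, variances):
        # NormalDist takes the standard deviation, so pass the square root of the variance
        likelihood *= NormalDist(mu, math.sqrt(var)).pdf(xi)
    return likelihood
```

For the height/weight example, `x` would be `(height, weight)` and `means`/`variances` the two per-attribute estimates for one class.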
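Solution: our goal is the posterior distribution, so we first compute the prior and the likelihood.
(1) Compute the prior directly by counting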
$p(y=a) = 4/(4+12) = 0.25, \quad p(y=c) = 1 - 0.25 = 0.75$ (4 adult samples and 12 child samples; $a$ = adult, $c$ = child)
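Counting is all the prior needs. A small sketch with a hypothetical helper `class_priors`, using the labels `a`/`c` for the 4 adult and 12 child samples:

```python
from collections import Counter


def class_priors(labels):
    """Estimate p(y) as each class's share of the training samples."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


# 4 adults ("a") and 12 children ("c"), matching the counts used above
priors = class_priors(["a"] * 4 + ["c"] * 12)
print(priors)  # {'a': 0.25, 'c': 0.75}
```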
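(2) Use the data set to estimate the two parameters of each Gaussian, namely the mean and the variance.
They are computed with the following program: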

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 22 20:01:46 2018

@author: wudl
"""
import numpy as np
import xlrd

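def mean_function(ob):
    """Arithmetic mean of a list of numbers."""
    mean = sum(ob)/len(ob)
    return mean

def variance_function(ob):
    """Population variance: divides by n (the maximum-likelihood estimate)."""
    mean = mean_function(ob)
    ob_array = np.array(ob)
    variance = np.sum((ob_array-mean)**2)/len(ob)
    return variance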
    
if __name__=="__main__":
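    # xlrd >= 2.0 reads only legacy .xls files; opening this .xlsx requires xlrd < 2.0 (or switching to openpyxl)
    workbook = xlrd.open_workbook('C:/Users/Lenovo/Desktop/bayes.xlsx')
    sheet = workbook.sheet_by_name('Sheet1')
    # rows 0-1: children's heights and weights; rows 3-4: adults' heights and weights
    height_c = sheet.row_values(0)
    weight_c = sheet.row_values(1)
    height_a = sheet.row_values(3)
    height_a = [i for i in height_a if i !='']   # the list comprehension drops the empty cells ('') from the row
    weight_a = sheet.row_values(4)
    weight_a = [i for i in weight_a if i !='']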
    mean_height_c = mean_function(height_c)
    variance_height_c = variance_function(height_c)
    mean_weight_c = mean_function(weight_c)
    variance_weight_c = variance_function(weight_c)
    mean_height_a = mean_function(height_a)
    variance_height_a = variance_function(height_a)
    mean_weight_a = mean_function(weight_a)
    variance_weight_a = variance_function(weight_a)
    print('mean_height_c ==>> %.2f;   variance_height_c ==>> %.2f' %(mean_height_c,variance_height_c),'\n' )
    print('mean_weight_c ==>> %.2f;   variance_weight_c ==>> %.2f' %(mean_weight_c,variance_weight_c),'\n' )
    print('mean_height_a ==>> %.2f;   variance_height_a ==>> %.2f' %(mean_height_a,variance_height_a),'\n' )
    print('mean_weight_a ==>> %.2f;   variance_weight_a ==>> %.2f' %(mean_weight_a,variance_weight_a),'\n' )

This gives:

mean_height_c ==>> 59.17;   variance_height_c ==>> 424.31 

mean_weight_c ==>> 59.17;   variance_weight_c ==>> 424.31 

mean_height_a ==>> 170.00;   variance_height_a ==>> 50.00 

mean_weight_a ==>> 170.00;   variance_weight_a ==>> 50.00 
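The two hand-written helpers can also be expressed with NumPy directly. The hypothetical `mean_and_variance` helper below is a sketch, not the author's code; it also drops the empty cells that `row_values` returns for short rows.

```python
import numpy as np


def mean_and_variance(values):
    """Mean and population variance of the non-empty entries of one sheet row."""
    # Drop the '' entries returned for blank cells, then convert to a float array
    arr = np.array([v for v in values if v != ""], dtype=float)
    # ddof=0 divides by n, the same estimate variance_function computes (maximum likelihood);
    # ddof=1 would divide by n - 1 and give the unbiased sample variance instead
    return arr.mean(), arr.var(ddof=0)
```

For example, `mean_height_a, variance_height_a = mean_and_variance(sheet.row_values(3))`.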

Excel data:
[Figure: contents of Sheet1 in bayes.xlsx]
Rounding these values gives:
[Figure: table of rounded means and variances]
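(3) Compute the posterior
With the means and variances in hand, we can evaluate the class-conditional likelihoods for adults and children, and from them the posteriors.
For example, the likelihood of the height under the adult class is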
$p(x_h \mid y=a) = \frac{1}{\sqrt{2\pi\sigma^2_{h,a}}}\exp\left(-\frac{(x_h-\mu_{h,a})^2}{2\sigma^2_{h,a}}\right)$
By Bayes' rule, the posterior is
$p(y=a \mid x) = \frac{p(y=a)\,p(x \mid y=a)}{p(x)}$
where
$p(x \mid y=a) = p(x_h \mid y=a)\,p(x_w \mid y=a)$
$p(x) = p(y=a)\,p(x \mid y=a) + p(y=c)\,p(x \mid y=c) = 0.25\,p(x \mid y=a) + 0.75\,p(x \mid y=c)$
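Step (3) can be sketched with two small hypothetical helpers: `gaussian_pdf` transcribes the density formula for $p(x_h \mid y=a)$, and `posterior` applies Bayes' rule with the evidence $p(x)$ as the normalizer. The assertion cross-checks the density against Python's `statistics.NormalDist` (3.8+); 170 and 50 are the adult height mean and variance printed earlier, and 175 is an arbitrary test point.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mean, var):
    """Normal density N(mean, var) at x, written exactly as in the formula for p(x_h | y=a)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of classes.

    priors[k] = p(y=k) and likelihoods[k] = p(x | y=k); the evidence p(x) is the
    sum of prior * likelihood over all classes, so the returned values sum to 1.
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())  # p(x)
    return {k: j / evidence for k, j in joint.items()}


# Sanity check: the hand-written density agrees with the standard library
# (NormalDist takes the standard deviation, hence the square root of the variance).
assert math.isclose(gaussian_pdf(175.0, 170.0, 50.0), NormalDist(170.0, math.sqrt(50.0)).pdf(175.0))
```

For instance, `posterior({'a': 0.25, 'c': 0.75}, {'a': 2e-4, 'c': 1e-4})` gives about 0.4 for `a` and 0.6 for `c`; the likelihood values there are made up purely for illustration.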
Complete program:

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 22 20:01:46 2018

@author: wudl
"""
import numpy as np
import xlrd

def mean_function(ob):  
    mean = sum(ob)/len(ob)
    return mean

def variance_function(ob):
    mean = mean_function(ob)
    ob_array = np.array(ob)
    variance = sum((ob_array-mean)**2)/len(ob)
    return variance

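def prior_distribution(h,w,ob1,ob2):
    """Class-conditional likelihood p(x|y) = p(x_h|y) * p(x_w|y).

    Despite its name, this returns the likelihood, not the prior.
    h, w: height and weight of the new sample; ob1, ob2: that class's height and weight data.
    """
    prior_h = 1/np.sqrt(2*np.pi*variance_function(ob1))*np.exp(-(h-mean_function(ob1))**2/(2*variance_function(ob1)))
    prior_w = 1/np.sqrt(2*np.pi*variance_function(ob2))*np.exp(-(w-mean_function(ob2))**2/(2*variance_function(ob2)))
    return prior_h*prior_w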

if __name__=="__main__":
    height,weight = map(int,input('Enter height and weight(separated by space):').split())
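    # xlrd >= 2.0 reads only legacy .xls files; opening this .xlsx requires xlrd < 2.0 (or switching to openpyxl)
    workbook = xlrd.open_workbook('C:/Users/Lenovo/Desktop/bayes.xlsx')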
    sheet = workbook.sheet_by_name('Sheet1')
    height_c = sheet.row_values(0)
    weight_c = sheet.row_values(1)
    height_a = sheet.row_values(3)
    height_a = [i for i in height_a if i !='']
    weight_a = sheet.row_values(4)
    weight_a = [i for i in weight_a if i !='']
    mean_height_c = mean_function(height_c)
    variance_height_c = variance_function(height_c)
    mean_weight_c = mean_function(weight_c)
    variance_weight_c = variance_function(weight_c)
    mean_height_a = mean_function(height_a)
    variance_height_a = variance_function(height_a)
    mean_weight_a = mean_function(weight_a)
    variance_weight_a = variance_function(weight_a)
    print('mean_height_c ==>> %.2f;   variance_height_c ==>> %.2f' %(mean_height_c,variance_height_c),'\n' )
    print('mean_weight_c ==>> %.2f;   variance_weight_c ==>> %.2f' %(mean_weight_c,variance_weight_c),'\n' )
    print('mean_height_a ==>> %.2f;   variance_height_a ==>> %.2f' %(mean_height_a,variance_height_a),'\n' )
    print('mean_weight_a ==>> %.2f;   variance_weight_a ==>> %.2f' %(mean_weight_a,variance_weight_a),'\n' )

    # ---- forecasting ----

    # prior p(y), estimated by counting the samples in each class
    prior_a = len(height_a)/(len(height_a)+len(height_c))
    prior_c = len(height_c)/(len(height_a)+len(height_c))
    # class-conditional likelihood p(x|y): product of the height and weight Gaussians
    like_a = prior_distribution(height,weight,height_a,weight_a)
    like_c = prior_distribution(height,weight,height_c,weight_c)

    # posterior p(y|x) via Bayes' rule; the denominator is the evidence p(x)
    p_a = prior_a*like_a/(prior_a*like_a+prior_c*like_c)
    p_c = prior_c*like_c/(prior_a*like_a+prior_c*like_c)
    # scientific notation, so that very small probabilities are not rounded to zero
    print('p(y=a|x) ==>> %.4e' % p_a)
    print('p(y=c|x) ==>> %.4e' % p_c)
    print('prediction:', 'adult' if p_a > p_c else 'child')

Entering (120 120) classifies the sample as a child.
Entering (165 110) also classifies the sample as a child.
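The program's concern about small values rounding to zero points at a real numerical issue: when a point lies far from both classes, both likelihood products can underflow to exactly 0.0, and the posterior formula then divides zero by zero. A common remedy, not used in the original program, is to work with log-densities and normalize with the log-sum-exp trick. The sketch below assumes a hypothetical `params` layout mapping each class to a list of `(mean, variance)` pairs, one per attribute.

```python
import math


def log_gaussian(x, mean, var):
    """Logarithm of the normal density N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_space_posterior(x, priors, params):
    """Posterior p(y | x) computed in log space to avoid underflow.

    priors: {class: p(y)}
    params: {class: [(mean, var), ...]}, one (mean, var) pair per attribute (hypothetical layout)
    """
    log_joint = {}
    for k, prior in priors.items():
        # log p(y) + sum_i log p(x_i | y) replaces the product of small densities
        log_joint[k] = math.log(prior) + sum(
            log_gaussian(xi, mu, var) for xi, (mu, var) in zip(x, params[k])
        )
    # log-sum-exp: subtract the maximum so the largest term becomes exp(0) = 1
    m = max(log_joint.values())
    log_evidence = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return {k: math.exp(v - log_evidence) for k, v in log_joint.items()}
```

With the program's variables, `priors` would be `{'a': prior_a, 'c': prior_c}` and `params` would be `{'a': [(mean_height_a, variance_height_a), (mean_weight_a, variance_weight_a)], 'c': [(mean_height_c, variance_height_c), (mean_weight_c, variance_weight_c)]}`.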

2) Discrete attributes (here you essentially just count frequencies and multiply them; to be covered in a later update)
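For discrete attributes the same recipe applies with counts in place of densities: $p(y)$ is the class frequency and $p(x_i \mid y)$ is the fraction of class-$y$ samples having that attribute value. A minimal sketch with hypothetical function names and made-up toy data:

```python
from collections import Counter, defaultdict


def fit_categorical_nb(samples, labels):
    """Count-based naive Bayes for discrete attributes.

    Returns the class counts and, for each (class, attribute index), a Counter of attribute values.
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for x, y in zip(samples, labels):
        for i, value in enumerate(x):
            value_counts[(y, i)][value] += 1
    return class_counts, value_counts


def predict_categorical_nb(x, class_counts, value_counts):
    """Return the class maximizing p(y) * prod_i p(x_i | y), all probabilities estimated by counting."""
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # p(y)
        for i, value in enumerate(x):
            score *= value_counts[(y, i)][value] / n_y  # p(x_i | y); 0 if never seen
        scores[y] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: (height category, weight category) with labels a = adult, c = child
samples = [("tall", "heavy"), ("tall", "heavy"), ("short", "light"), ("short", "light"), ("short", "heavy")]
labels = ["a", "a", "c", "c", "c"]
cc, vc = fit_categorical_nb(samples, labels)
print(predict_categorical_nb(("tall", "heavy"), cc, vc))  # a
```

An attribute value never seen with a class gets probability 0 and zeroes out that class; Laplace (add-one) smoothing is the usual fix, but it is not part of this sketch.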