A Worked Code Example of the Naive Bayes Algorithm (Python)

This article is my own original work and serves only as a personal study record.

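Data: suppose we have the course data below, with three columns: price A, class hours B, and sales C. The rows are the seven courses used in the code later in this article, which stores them under their original Chinese labels (低/中/高 for low/medium/high, and 少/中/多 for few/medium/many).

Price A    Class hours B    Sales C
low        many             high
high       medium           high
low        few              high
low        medium           low
medium     medium           medium
high       many             high
low        few              medium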

Now the school is offering a new course with price A = high and class hours B = many, and we need to predict its sales.

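The task is to predict an outcome from what has already been observed, which is exactly what naive Bayes is for. Most examples online simply call a library API to make the prediction, but it is worth implementing naive Bayes yourself. Bayes' theorem states P(B|A) = P(A|B)P(B)/P(A). For this problem it becomes P(C|AB) = P(AB|C)P(C)/P(AB) = P(A|C)P(B|C)P(C)/P(AB). The approach is to compute this probability for C = low, medium, and high sales, compare the three, and report the class with the largest probability as the prediction. Because P(AB) is the same for all three classes, it does not affect the comparison, so only the numerator P(A|C)P(B|C)P(C) has to be computed.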
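As a minimal sketch of the comparison just described, the snippet below computes the numerator P(A|C)P(B|C)P(C) for each sales class by counting rows. The function name `class_scores` and the English labels are illustrative assumptions rather than part of the original code. The rows are the seven example courses from this article, and exact fractions are used so the arithmetic stays visible.

```python
from collections import Counter
from fractions import Fraction

# The seven example courses as (price A, class hours B, sales C).
ROWS = [
    ("low", "many", "high"),
    ("high", "medium", "high"),
    ("low", "few", "high"),
    ("low", "medium", "low"),
    ("medium", "medium", "medium"),
    ("high", "many", "high"),
    ("low", "few", "medium"),
]


def class_scores(rows, price, hours):
    """Return {class: P(A|C) * P(B|C) * P(C)}, estimated by counting rows."""
    class_counts = Counter(c for _, _, c in rows)
    scores = {}
    for c, n_c in class_counts.items():
        # Rows of class c that match the queried price / class hours.
        n_price = sum(1 for a, _, cc in rows if cc == c and a == price)
        n_hours = sum(1 for _, b, cc in rows if cc == c and b == hours)
        scores[c] = (Fraction(n_price, n_c) * Fraction(n_hours, n_c)
                     * Fraction(n_c, len(rows)))
    return scores


scores = class_scores(ROWS, "high", "many")
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

For the new course (price high, hours many) the high-sales score is (2/4)(2/4)(4/7) = 1/7, and the other two scores are 0, so the predicted class is high.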

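Incidentally, the word "naive" means that A and B are treated as independent of each other (more precisely, conditionally independent given C). In reality price and class hours are related to some degree, but naive Bayes treats them as independent in order to compute the predicted sales probabilities.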
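To make the independence assumption concrete, the sketch below looks only at the high-sales class and compares the joint frequency of a (price, hours) pair with the product of the two separate frequencies that naive Bayes uses in its place. The helper names and English labels are illustrative assumptions; the rows are the four high-sales courses from this article's data.

```python
from fractions import Fraction

# (price A, class hours B) for the four courses whose sales C are high.
HIGH_SALES = [("low", "many"), ("high", "medium"), ("low", "few"), ("high", "many")]


def joint(rows, price, hours):
    """Direct estimate of P(A=price, B=hours | C=high)."""
    hits = sum(1 for a, b in rows if (a, b) == (price, hours))
    return Fraction(hits, len(rows))


def naive_product(rows, price, hours):
    """Naive Bayes estimate: P(A=price | C=high) * P(B=hours | C=high)."""
    p_a = Fraction(sum(1 for a, _ in rows if a == price), len(rows))
    p_b = Fraction(sum(1 for _, b in rows if b == hours), len(rows))
    return p_a * p_b


for pair in [("high", "many"), ("low", "medium")]:
    print(pair, joint(HIGH_SALES, *pair), naive_product(HIGH_SALES, *pair))
```

For (high, many) both estimates are 1/4. For (low, medium) the joint estimate is 0 while the naive product is 1/8, which is the kind of discrepancy the naive assumption accepts.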

My code is given below:

# coding=utf-8
# Runs under both Python 2.7 and Python 3.
from __future__ import division, print_function

# Integer codes for the category labels.
PRICE_CODES = {"低": 0, "中": 1, "高": 2}  # low, medium, high
TIME_CODES = {"少": 0, "中": 1, "多": 2}   # few, medium, many
SALE_CODES = {"低": 0, "中": 1, "高": 2}   # low, medium, high
SALE_NAMES = ["low", "medium", "high"]


def set_data(price, time, sale):
    """Encode the three label lists as lists of integer codes 0, 1, 2.

    An unknown label raises KeyError instead of being skipped silently,
    because skipping would leave the three lists misaligned.
    """
    price_number = [PRICE_CODES[p] for p in price]
    time_number = [TIME_CODES[t] for t in time]
    sale_number = [SALE_CODES[s] for s in sale]
    return price_number, time_number, sale_number


def conditional_p(feature_number, sale_number, expected, sale_class):
    """Estimate P(feature == expected | sale == sale_class) by counting."""
    class_count = sale_number.count(sale_class)
    if class_count == 0:
        return 0.0  # this class never occurs; avoid dividing by zero
    hits = sum(1 for f, s in zip(feature_number, sale_number)
               if f == expected and s == sale_class)
    return hits / class_count


def naive_bs(price_number, time_number, sale_number, expected_price, expected_time):
    n = len(sale_number)
    # P(A|C) and P(B|C) for C = low, medium, high
    p_price = [conditional_p(price_number, sale_number, expected_price, c)
               for c in range(3)]
    p_time = [conditional_p(time_number, sale_number, expected_time, c)
              for c in range(3)]
    print("p(a|c):%s ,%s,%s" % tuple(p_price))
    print("p(b|c): %s,%s,%s" % tuple(p_time))
    # Score P(A|C) * P(B|C) * P(C) for each class. P(AB) is identical for
    # every class, so it is left out; the factor 1000 only rescales the scores.
    final_list = [p_price[c] * p_time[c] * sale_number.count(c) / n * 1000
                  for c in range(3)]
    print(final_list)
    final_index = final_list.index(max(final_list))
    print("Predicted sales: %s" % SALE_NAMES[final_index])
    return final_index


if __name__ == "__main__":
    price = ["低", "高", "低", "低", "中", "高", "低"]
    time = ["多", "中", "少", "中", "中", "多", "少"]
    sale = ["高", "高", "高", "低", "中", "高", "中"]

    expected_price = "高"  # the new course has a high price
    expected_time = "多"   # the new course has many class hours
    price_number, time_number, sale_number = set_data(price, time, sale)
    print(price_number, time_number, sale_number)
    naive_bs(price_number, time_number, sale_number,
             PRICE_CODES[expected_price], TIME_CODES[expected_time])
   

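The code encodes the values of each of the three attributes as 0, 1, and 2. It assumes that the price, class-hours, and sales lists all have the same length. Real data will usually not be this clean, so it should be preprocessed first (mainly by handling missing values and outliers).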
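A minimal sketch of such a preprocessing step follows, assuming that missing values arrive as None and that any label outside the known categories counts as invalid. The function name `clean_and_encode` and the sample rows are hypothetical, chosen only for illustration. A row with a problem in any column is dropped as a whole, so the three encoded lists stay aligned.

```python
# coding=utf-8
PRICE_CODES = {"低": 0, "中": 1, "高": 2}  # low, medium, high
TIME_CODES = {"少": 0, "中": 1, "多": 2}   # few, medium, many
SALE_CODES = {"低": 0, "中": 1, "高": 2}   # low, medium, high


def clean_and_encode(price, time, sale):
    """Drop rows with a missing or unknown label, then encode as 0, 1, 2."""
    price_number, time_number, sale_number = [], [], []
    # zip stops at the shortest list, so trailing unmatched values are ignored.
    for p, t, s in zip(price, time, sale):
        if p in PRICE_CODES and t in TIME_CODES and s in SALE_CODES:
            price_number.append(PRICE_CODES[p])
            time_number.append(TIME_CODES[t])
            sale_number.append(SALE_CODES[s])
    return price_number, time_number, sale_number


# Hypothetical dirty input: one missing price and one unrecognized label.
raw_price = ["低", None, "高", "中"]
raw_time = ["多", "中", "??", "少"]
raw_sale = ["高", "高", "低", "中"]
encoded = clean_and_encode(raw_price, raw_time, raw_sale)
print(encoded)
```

Only the first and last rows survive here. Dropping rows is the simplest policy; whether to drop or impute depends on the data.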

Below is the output from running it in Eclipse:

[0, 2, 0, 0, 1, 2, 0] [2, 1, 0, 1, 1, 2, 0] [2, 2, 2, 0, 1, 2, 1]
p(a|c):0.0 ,0.0,0.5
p(b|c): 0.0,0.0,0.5
[0.0, 0.0, 142.85714285714286]
Predicted sales: high
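The three numbers in the printed list are unnormalized scores, not probabilities. As a small sketch, dividing each score by their sum recovers the posterior P(C|AB) under the naive model, since that sum plays the role of P(AB) and the constant factor 1000 cancels. The variable names are illustrative; the scores are copied from the run shown above.

```python
# Scores for low, medium, and high sales, copied from the printed list.
final_list = [0.0, 0.0, 142.85714285714286]

total = sum(final_list)
if total == 0:
    raise ValueError("all scores are zero, so the posterior is undefined")

# Normalizing divides out P(AB) and the constant scale factor.
posteriors = [score / total for score in final_list]
print(posteriors)
```

With only seven rows, a posterior of exactly 1 for the high class reflects the zero counts for the other two classes rather than real certainty.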

This article is only a record of my own study and may well have many shortcomings.

 
