[Machine Learning Algorithms] Implementing Naive Bayes

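To deepen my understanding of machine learning algorithms and to get more familiar with Python, pandas, and scikit-learn, I am implementing the main machine learning algorithms myself. The code is recorded below: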

The Naive Bayes implementation:

def loadDataSet():
    postingList=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                 ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                 ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                 ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                 ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                 ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0,1,0,1,0,1]    #1 is abusive, 0 not
    return postingList,classVec

def gen_label_prob(label):
    # Estimate the prior P(c) as the fraction of samples carrying label c
    sample_len = len(label)
    label_dic = {}
    for label_val in label:
        label_dic[label_val] = label_dic.get(label_val,0)+1
    for key in label_dic.keys():
        label_dic[key]=float(label_dic[key])/sample_len
    return label_dic

def gen_condi_prob(train_data,label,label_dic):
    # Estimate P(word | c): count each word only under the label of the post it occurs in
    data_len = len(train_data)
    label_set = set(label)
    res_dic={}
    for data_list, label_val in zip(train_data, label):
        for curr_x in data_list:
            key = (curr_x,label_val)
            res_dic[key] = res_dic.get(key,0)+1

    # Normalize by the number of posts in class c, i.e. data_len * P(c)
    for key in res_dic.keys():
        res_dic[key] = float(res_dic[key])/(data_len*label_dic[key[1]])
    return res_dic,label_set

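def predict(test,res_dic,label_set,label_dic):
    # Score each class as P(c) * product of P(word | c) over the test words;
    # a (word, class) pair never seen in training contributes probability 0
    prob = {}
    for label in label_set:
        prob[label] = label_dic[label]
        for curr_x in test:
            key = (curr_x,label)
            prob[label] = prob[label]*res_dic.get(key,0)
    # Return the highest-scoring class; starting below 0 guarantees that
    # some real label is returned even when every score is 0
    max_prob = -1.0
    max_label = None
    for key in prob.keys():
        if prob[key]>max_prob:
            max_label=key
            max_prob=prob[key]
    return max_label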

def model_test():
    train_data,train_label = loadDataSet()
    label_dic=gen_label_prob(train_label)
    res_dic,label_set=gen_condi_prob(train_data,train_label,label_dic)
    #x=['quit', 'buying', 'worthless', 'food', 'stupid']
    x=['stop']
    res_label = predict(x,res_dic,label_set,label_dic)
    print(res_label)

if __name__ == '__main__':
    model_test()
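The three functions implement the usual naive Bayes decision rule. Writing N for the number of training posts, n_c for the number of posts labeled c, and count_c(w) for the number of times word w occurs in posts labeled c (notation introduced here for illustration), gen_label_prob estimates the prior, gen_condi_prob the per-word likelihood, and predict takes the arg max:

$$
\hat P(c) = \frac{n_c}{N}, \qquad
\hat P(w \mid c) = \frac{\operatorname{count}_c(w)}{n_c}, \qquad
\hat c(x) = \arg\max_{c}\; \hat P(c) \prod_{w \in x} \hat P(w \mid c)
$$

With the toy data, N = 6 and n_0 = n_1 = 3. For x = ['stop'], the word occurs once in a class-0 post and once in a class-1 post, so under these estimates both classes score 1/2 · 1/3 = 1/6 and the result comes down to tie-breaking. For the commented-out input ['quit', 'buying', 'worthless', 'food', 'stupid'], 'quit' never occurs in a class-0 post, so class 0 scores 0, while class 1 scores

$$
\tfrac{1}{2}\cdot\tfrac{1}{3}\cdot\tfrac{1}{3}\cdot\tfrac{2}{3}\cdot\tfrac{1}{3}\cdot\tfrac{3}{3} = \tfrac{1}{81}.
$$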

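One weakness of the estimates in gen_condi_prob is that any (word, class) pair unseen in training gets probability 0, so a single such word zeroes out that class, and long posts multiply many small numbers that can underflow. A common remedy is sketched below as an assumption, not as the original author's method: it swaps in the multinomial event model (P(w | c) normalized by the total number of tokens in class c) with add-alpha (Laplace) smoothing, and sums log-probabilities instead of multiplying. The names train_smoothed and predict_smoothed, and the choice to skip words absent from the training vocabulary, are my own.

```python
import math
from collections import Counter, defaultdict

# Same toy corpus as loadDataSet(): 1 = abusive, 0 = not
posts = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
         ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
         ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
         ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
         ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
         ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
labels = [0, 1, 0, 1, 0, 1]


def train_smoothed(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing, stored as log-probabilities."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[c][w]: occurrences of w in class-c docs
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)

    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Denominator: all tokens in class c plus alpha for every vocabulary word
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_smoothed(doc, log_prior, log_lik):
    """Return the class maximizing log P(c) + sum of log P(w | c)."""
    scores = {}
    for c, lp in log_prior.items():
        # Words never seen in training are skipped (an assumption; other choices exist)
        scores[c] = lp + sum(log_lik[c][w] for w in doc if w in log_lik[c])
    return max(scores, key=scores.get)


if __name__ == "__main__":
    log_prior, log_lik = train_smoothed(posts, labels)
    print(predict_smoothed(['stop'], log_prior, log_lik))
    print(predict_smoothed(['quit', 'buying', 'worthless', 'food', 'stupid'], log_prior, log_lik))
```

On the toy corpus this returns 1 for both ['stop'] and the commented-out input. With smoothing, 'stop' is slightly more likely under class 1 (2/51 versus 2/56) because class-1 posts contain fewer tokens in total (19 versus 24), so each occurrence carries more weight.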