Last time I wrote up the implementation of discrete Naive Bayes, so this time it's the continuous version's turn. Since there's also the mixed case, where a dataset contains both discrete and continuous features (and since I'm lazy), I'll just handle them all together o(*￣▽￣*)ブ
Last time we covered the idea behind Naive Bayes: essentially we assume the features' conditional probabilities are mutually independent, and for each continuous feature we model its conditional distribution with a normal (Gaussian) distribution.
So for a continuous feature, we can use its mean and variance on the training set to estimate the conditional probability (density) of a new sample's value,
and from there everything works just like the discrete case~
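Before the full class, here is a minimal sketch of that idea: fit a Gaussian to one continuous feature's values within one class, then evaluate the density at a new sample's value. The function name `gaussian_pdf` and the toy values are my own illustration, not part of the author's code.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# values of one continuous feature among the training samples of one class
values = [5.0, 5.2, 4.8, 5.1]
mu = sum(values) / len(values)
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

# conditional "probability" (really a density) of a new sample's feature value
p = gaussian_pdf(5.05, mu, sigma2)
```

Note that `p` is a density, not a probability, so it can exceed 1; that is fine, because Naive Bayes only compares these quantities across classes.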
import math

import numpy as np

pi = math.pi


class NaiveBayesClassifier(object):
    def __init__(self):
        # note: "self.x = self.y = []" would make x and y alias one list
        self.x, self.y = [], []
        self.feat_dics = self.label_dic = self.dic_label = self.pri = None
        self.con = []
        self.is_continue = self.cont_con = None

    def pre(self, x, y):
        # transpose the data so that each row holds one feature's values
        xt = list(map(list, zip(*x)))
        features = [set(feat) for feat in xt]
        # map each distinct value of each feature to an integer index
        self.feat_dics = [{v: i for i, v in enumerate(feats)}
                          for feats in features]
        # re-encode every sample with those indices
        x = [[self.feat_dics[i][v] for i, v in enumerate(sample)]
             for sample in x]
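To make the preprocessing concrete, here is a toy run of the same steps `pre` performs on discrete features: transpose, collect distinct values, build the value-to-index dictionaries, and re-encode the samples. The weather data is made up for illustration, and `sorted` is added only so the indices are deterministic.

```python
x = [["sunny", "hot"], ["rainy", "mild"], ["sunny", "mild"]]

xt = list(map(list, zip(*x)))          # transpose: one list per feature
features = [set(feat) for feat in xt]  # distinct values of each feature
feat_dics = [{v: i for i, v in enumerate(sorted(feats))}
             for feats in features]
# encode every sample as integer indices into those dictionaries
encoded = [[feat_dics[i][v] for i, v in enumerate(sample)] for sample in x]
# encoded == [[1, 0], [0, 1], [1, 1]]
```

For continuous features this lookup would fail (every value is distinct), which is exactly why the class keeps an `is_continue` flag to skip the dictionary encoding for those columns.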