Naive Bayes
Project link:
Formula Derivation
The principle of the naive Bayes algorithm is proved in detail in Li Hang's book (*Statistical Learning Methods*); below we prove only the key points.
1 Why posterior probability maximization in naive Bayes is equivalent to expected risk minimization
- Let $L(y, f(x))$ be the loss function. Integrating it against the joint distribution gives the expected risk:

$$
\begin{aligned}
R_{\exp}(f) &= \int_{x} \int_{y} L(y, f(x)) \, P(x, y) \, dx \, dy \\
&= \int_{x} \int_{y} L(y, f(x)) \, P(y \mid x) \, P(x) \, dx \, dy \\
&= \int_{x} P(x) \, dx \int_{y} L(y, f(x)) \, P(y \mid x) \, dy
\end{aligned}
$$

Since $P(x) \ge 0$, minimizing the expected risk reduces to minimizing the inner term pointwise for each $x$; when $y$ is discrete, the integral becomes a sum, first over sample values and then over the $K$ classes $c_k$:

$$
\min \left( \int_{y} L(y, f(x)) \, P(y \mid x) \, dy \right)
\rightarrow \min \left( \sum_{i=1}^{n} L\left(y_{i}, f\left(x_{i}\right)\right) \, P\left(Y=y_{i} \mid X=x_{i}\right) \right)
\rightarrow \min \left( \sum_{k=1}^{K} L\left(c_{k}, y\right) \, P\left(c_{k} \mid X=x\right) \right)
$$

- Taking the loss function to be the 0-1 indicator function gives the equivalent form
$$
\begin{aligned}
\min \left( \sum_{k=1}^{K} L\left(c_{k}, y\right) \, P\left(c_{k} \mid X=x\right) \right)
&= \min \left( \sum_{k=1}^{K} P\left(c_{k} \neq y \mid X=x\right) \right) \\
&= \min \left( \sum_{k=1}^{K} \bigl( 1 - P\left(c_{k} = y \mid X=x\right) \bigr) \right) \\
&= \max \left( P\left(c_{k} = y \mid X=x\right) \right)
\end{aligned}
$$
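The equivalence above can be checked numerically: with 0-1 loss, the expected loss of predicting class $c_k$ is $1 - P(c_k \mid x)$, so the argmin of the loss and the argmax of the posterior coincide. A minimal sketch with made-up posterior values:

```python
import numpy as np

# Hypothetical posterior P(c_k | X = x) over K = 3 classes (made-up numbers).
posterior = np.array([0.2, 0.5, 0.3])

# With 0-1 loss, the expected loss of predicting class c_k is
# sum_{j != k} P(c_j | x) = 1 - P(c_k | x).
expected_loss = 1.0 - posterior

# Minimizing the expected loss selects the same class as maximizing the posterior.
best_by_loss = int(np.argmin(expected_loss))
best_by_posterior = int(np.argmax(posterior))
print(best_by_loss, best_by_posterior)  # both are class index 1
```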
2 The Bayes formula
$$
P\left(Y=c_{k} \mid X=x\right) = \frac{P\left(X=x \mid Y=c_{k}\right) P\left(Y=c_{k}\right)}{\sum_{k} P\left(X=x \mid Y=c_{k}\right) P\left(Y=c_{k}\right)}
$$
With the conditional-independence ("naive") assumption, the likelihood factorizes over the features:

$$
P\left(Y=c_{k} \mid X=x\right) = \frac{P\left(Y=c_{k}\right) \prod_{j} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)}{\sum_{k} P\left(Y=c_{k}\right) \prod_{j} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)}
$$
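A minimal numeric sketch of the factorized formula, with made-up priors and per-feature likelihoods for a two-class, two-feature problem:

```python
import numpy as np

# Toy two-class problem with two features; all probabilities are made up.
prior = {1: 0.6, -1: 0.4}                       # P(Y = c_k)
likelihood = {1: [0.5, 0.2], -1: [0.3, 0.7]}    # P(X^(j) = x^(j) | Y = c_k)

# Numerator of the posterior for each class: prior times product of likelihoods.
num = {c: prior[c] * np.prod(likelihood[c]) for c in prior}
evidence = sum(num.values())                    # denominator: sum over k
posterior = {c: num[c] / evidence for c in num}
print(posterior)  # sums to 1; class -1 wins since 0.4*0.21 > 0.6*0.10
```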
```python
import numpy as np


class NavieBayesClassifier:
    def __init__(self, lamb=0):
        self.prior_prob_y = {}
        self.prior_prob_x = {}
        self.x_dim = 0
        # Laplace smoothing coefficient
        self.lamb = lamb

    def fit(self, x, y):
        '''
        x is a 2-D ndarray, y is a 1-D ndarray,
        and x and y have the same length.
        '''
        self.x_dim = len(x[0])
        y_list = y.tolist()
        y_unique = np.unique(y)
        for val in y_unique:
            self.prior_prob_y[val] = y_list.count(val) / len(y_list)
        y = np.array([y_list])
        xy = np.hstack((x, y.T))
        for d in range(self.x_dim):
            # handle each dimension of x
            x_and_y = xy[:, (d, -1)]
            x_unique = np.unique(xy[:, d])
            laplace = len(x_unique)
            self.prior_prob_x[d] = {}
            for yy in y_unique:
                # handle each value of y
                x_when_yy = x_and_y[x_and_y[:, -1] == yy]
                x_list = x_when_yy[:, 0].tolist()
                self.prior_prob_x[d][yy] = {}
                for xx in x_unique:
                    # probability of each x value under this fixed y
                    self.prior_prob_x[d][yy][xx] = \
                        (x_list.count(xx) + self.lamb) / (len(x_list) + laplace * self.lamb)

    def predict(self, x):
        '''
        x is a 1-D array; returns the normalized posterior for each class.
        '''
        res = {}
        all_pro = 0
        for y_val in self.prior_prob_y:
            res[y_val] = self.prior_prob_y[y_val]
            px_y = 1
            for d in range(self.x_dim):
                px_y *= self.prior_prob_x[d][y_val][x[d]]
            res[y_val] *= px_y
            all_pro += res[y_val]
        for y_val in res:
            res[y_val] /= all_pro
        return res


# Test with the example from the book
if __name__ == '__main__':
    xy = [[1, 4, -1],
          [1, 5, -1],
          [1, 5, 1],
          [1, 4, 1],
          [1, 4, -1],
          [2, 4, -1],
          [2, 5, -1],
          [2, 5, 1],
          [2, 6, 1],
          [2, 6, 1],
          [3, 6, 1],
          [3, 5, 1],
          [3, 5, 1],
          [3, 6, 1],
          [3, 6, -1]]
    xy = np.array(xy)
    sb_clf = NavieBayesClassifier(1)
    sb_clf.fit(xy[:, (0, 1)], xy[:, -1])
    print('x prob', sb_clf.prior_prob_x)
    print('y prob', sb_clf.prior_prob_y)
    # query the book's test point (2, S), assuming S is encoded as 4 here
    print('posterior', sb_clf.predict(np.array([2, 4])))
```
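As a hand-check of the Laplace smoothing in `fit`, the sketch below recomputes the smoothed scores for the book's query point $(2, S)$ directly with numpy, assuming $S$ is encoded as 4 in the data above. Note that the classifier leaves the prior $P(Y=c_k)$ unsmoothed, while the book smooths it as well; the sketch follows the classifier.

```python
import numpy as np

# The 15 samples from the book; columns are x1, x2 (S/M/L encoded as 4/5/6), y.
xy = np.array([[1, 4, -1], [1, 5, -1], [1, 5, 1], [1, 4, 1], [1, 4, -1],
               [2, 4, -1], [2, 5, -1], [2, 5, 1], [2, 6, 1], [2, 6, 1],
               [3, 6, 1], [3, 5, 1], [3, 5, 1], [3, 6, 1], [3, 6, -1]])
x, y = xy[:, :2], xy[:, 2]
lamb = 1  # Laplace smoothing coefficient

def smoothed(count, total, n_values):
    # (count + lamb) / (total + n_values * lamb), as in fit above
    return (count + lamb) / (total + n_values * lamb)

query = (2, 4)  # the book's query (2, S), assuming S is encoded as 4
scores = {}
for c in (1, -1):
    mask = (y == c)
    score = mask.sum() / len(y)  # unsmoothed prior, matching the classifier
    for j, v in enumerate(query):
        # smoothed conditional P(X^(j) = v | Y = c)
        score *= smoothed((x[mask, j] == v).sum(), mask.sum(),
                          len(np.unique(x[:, j])))
    scores[c] = score
print(scores)  # class -1 has the larger score, as in the book
```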