[Statistical Learning Methods] Chapter 4: Naive Bayes

Naive Bayes is a classification method based on Bayes' theorem and the assumption of conditional independence among features. For a given training data set, it first learns the joint probability distribution of input and output under the conditional independence assumption; then, based on this model, for a given input $x$ it uses Bayes' theorem to find the output $y$ with the largest posterior probability. Naive Bayes is simple to implement, and both learning and prediction are highly efficient, which makes it a commonly used method.

1. Naive Bayes

The training data set

$$T = \left\{ (x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N) \right\}$$

is assumed to be generated independently and identically distributed according to the joint distribution $P(X, Y)$. Here $x_i \in \mathcal{X} \subseteq \mathbf{R}^n$, $y_i \in \mathcal{Y} = \{c_1, c_2, \cdots, c_K\}$, $i = 1, 2, \cdots, N$; $x_i$ is the feature vector of the $i$-th instance, $y_i$ is the class label of $x_i$, $X$ is a random vector defined on the input space $\mathcal{X}$, $Y$ is a random variable defined on the output space $\mathcal{Y}$, and $P(X, Y)$ is the joint probability distribution of $X$ and $Y$.

Derivation

Conditional independence assumption

$$\begin{aligned} P(X = x \mid Y = c_k) &= P\left(X^{(1)} = x^{(1)}, \cdots, X^{(n)} = x^{(n)} \mid Y = c_k\right) \\ &= \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right) \end{aligned}$$

That is, the features used for classification are conditionally independent given the class.

$$\begin{aligned} P(X = x, Y = c_k) &= P(X = x \mid Y = c_k)\, P(Y = c_k) \\ P(X = x, Y = c_k) &= P(Y = c_k \mid X = x)\, P(X = x) \end{aligned}$$

$$\begin{aligned} P(X = x \mid Y = c_k)\, P(Y = c_k) &= P(Y = c_k \mid X = x)\, P(X = x) \\ P(Y = c_k \mid X = x) &= \dfrac{P(X = x \mid Y = c_k)\, P(Y = c_k)}{P(X = x)} \\ &= \dfrac{P(X = x \mid Y = c_k)\, P(Y = c_k)}{\sum_{k} P(X = x, Y = c_k)} \\ &= \dfrac{P(X = x \mid Y = c_k)\, P(Y = c_k)}{\sum_{k} P(X = x \mid Y = c_k)\, P(Y = c_k)} \\ &= \dfrac{P(Y = c_k)\, \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)}{\sum_{k} P(Y = c_k)\, \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)} \end{aligned}$$

The naive Bayes classifier can therefore be written as

$$\begin{aligned} y = f(x) &= \arg\max_{c_k} \dfrac{P(Y = c_k)\, \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)}{\sum_{k} P(Y = c_k)\, \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)} \\ &= \arg\max_{c_k} P(Y = c_k)\, \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right) \end{aligned}$$

The denominator is the same for every $c_k$, so it can be dropped when taking the arg max.
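As a quick sanity check, here is a minimal Python sketch (with made-up probabilities, not values from the text) showing that dividing by the shared denominator does not change which class attains the maximum:

```python
# Hypothetical values: P(Y = c_k) and prod_j P(X^(j) = x^(j) | Y = c_k) for two classes.
priors = {"c1": 0.6, "c2": 0.4}
likelihoods = {"c1": 0.05, "c2": 0.20}

scores = {c: priors[c] * likelihoods[c] for c in priors}        # numerators only
evidence = sum(scores.values())                                 # P(X = x), the shared denominator
posteriors = {c: s / evidence for c, s in scores.items()}       # fully normalized

print(max(scores, key=scores.get))           # 'c2'
print(max(posteriors, key=posteriors.get))   # 'c2' -- the arg max is identical
```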

2. Parameter Estimation

Maximum likelihood estimation

Maximum likelihood estimates of the naive Bayes model parameters

The maximum likelihood estimate of the prior probability $P(Y = c_k)$ is

$$P(Y = c_k) = \dfrac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \cdots, K$$

Let the set of possible values of the $j$-th feature $x^{(j)}$ be $\{a_{j1}, a_{j2}, \cdots, a_{jS_j}\}$. The maximum likelihood estimate of the conditional probability $P\left(X^{(j)} = a_{jl} \mid Y = c_k\right)$ is

$$\begin{aligned} & P\left(X^{(j)} = a_{jl} \mid Y = c_k\right) = \dfrac{\sum_{i=1}^{N} I\left(x_i^{(j)} = a_{jl},\, y_i = c_k\right)}{\sum_{i=1}^{N} I(y_i = c_k)} \\ & j = 1, 2, \cdots, n; \quad l = 1, 2, \cdots, S_j; \quad k = 1, 2, \cdots, K \end{aligned}$$

Here, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and $I$ is the indicator function.
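A minimal Python sketch of these two counting estimates, using a small illustrative data set (the data, variable names, and shapes are assumptions for demonstration, not taken from the text):

```python
from collections import Counter, defaultdict

# Illustrative training data: each x_i has two categorical features (x^(1), x^(2)).
X = [(1, "S"), (1, "M"), (2, "S"), (2, "L"), (3, "L")]
y = [-1, -1, 1, 1, 1]
N = len(y)

# Prior: P(Y = c_k) = sum_i I(y_i = c_k) / N
class_counts = Counter(y)
prior = {c: class_counts[c] / N for c in class_counts}

# Conditional: P(X^(j) = a_jl | Y = c_k)
#            = sum_i I(x_i^(j) = a_jl, y_i = c_k) / sum_i I(y_i = c_k)
joint_counts = defaultdict(int)
for xi, yi in zip(X, y):
    for j, v in enumerate(xi):
        joint_counts[(j, v, yi)] += 1
cond = {key: cnt / class_counts[key[2]] for key, cnt in joint_counts.items()}

print(prior)               # {-1: 0.4, 1: 0.6}
print(cond[(1, "S", -1)])  # P(X^(2) = "S" | Y = -1) = 1/2 = 0.5
```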

Bayesian estimation

If some value of a feature of $x$ never occurs together with a given class in the training data, its maximum likelihood estimate is 0, and the whole product for that class becomes 0. This usually happens simply because the training set cannot cover the entire sample space; the standard way to handle such unseen cases is smoothing.

Naive Bayes Algorithm
Naive Bayes is a classification method based on Bayes' theorem and the conditional independence assumption on the features.

  • Input: training data set $T = \left\{ (x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N) \right\}$, where $x_i = \left( x_i^{(1)}, x_i^{(2)}, \cdots, x_i^{(n)} \right)^T$, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)} \in \{a_{j1}, a_{j2}, \cdots, a_{jS_j}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j = 1, 2, \cdots, n$; $l = 1, 2, \cdots, S_j$; $y_i \in \{c_1, c_2, \cdots, c_K\}$; and an instance $x$
  • Output: the class of the instance $x$
  1. Compute the prior and conditional probabilities:
    $$\begin{aligned} & P(Y = c_k) = \dfrac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \cdots, K \\ & P\left(X^{(j)} = a_{jl} \mid Y = c_k\right) = \dfrac{\sum_{i=1}^{N} I\left(x_i^{(j)} = a_{jl},\, y_i = c_k\right)}{\sum_{i=1}^{N} I(y_i = c_k)} \\ & j = 1, 2, \cdots, n; \quad l = 1, 2, \cdots, S_j; \quad k = 1, 2, \cdots, K \end{aligned}$$

  2. For the given instance $x = \left( x^{(1)}, x^{(2)}, \cdots, x^{(n)} \right)^T$, compute

$$P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right), \quad k = 1, 2, \cdots, K$$

  3. Determine the class of the instance $x$ (a code sketch of this prediction step follows the algorithm):
    $$y = f(x) = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)$$
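A minimal sketch of the prediction step, reusing the `prior` and `cond` dictionaries estimated in the earlier counting sketch (all names and data are illustrative assumptions):

```python
def predict(x, prior, cond):
    """Return arg max_k P(Y = c_k) * prod_j P(X^(j) = x^(j) | Y = c_k)."""
    best_class, best_score = None, -1.0
    for c, p_c in prior.items():
        score = p_c
        for j, v in enumerate(x):
            # Under plain MLE, an unseen (feature value, class) pair contributes probability 0.
            score *= cond.get((j, v, c), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict((2, "S"), prior, cond))  # 1 for the illustrative data above
```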

Bayesian estimates of the naive Bayes model parameters

The Bayesian estimate of the conditional probability is

$$P_{\lambda}\left(X^{(j)} = a_{jl} \mid Y = c_k\right) = \dfrac{\sum_{i=1}^{N} I\left(x_i^{(j)} = a_{jl},\, y_i = c_k\right) + \lambda}{\sum_{i=1}^{N} I(y_i = c_k) + S_j \lambda}$$

Here $\lambda \geq 0$. When $\lambda = 0$ this reduces to the maximum likelihood estimate; when $\lambda = 1$ it is called Laplace smoothing.
The Bayesian estimate of the prior probability is

$$P_{\lambda}(Y = c_k) = \dfrac{\sum_{i=1}^{N} I(y_i = c_k) + \lambda}{N + K\lambda}$$
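The earlier counting sketch adapts directly to these smoothed estimates; below is a minimal version with $\lambda = 1$ (Laplace smoothing), again on the illustrative data used above:

```python
lam = 1.0                                   # lambda; lam = 0 recovers the MLE
K = len(class_counts)                       # number of classes
S = {j: len({xi[j] for xi in X}) for j in range(len(X[0]))}   # S_j: # of values of feature j

# Smoothed prior: (count + lambda) / (N + K * lambda)
prior_smoothed = {c: (class_counts[c] + lam) / (N + K * lam) for c in class_counts}

def cond_smoothed(j, v, c):
    """Smoothed P(X^(j) = v | Y = c) = (count + lambda) / (class count + S_j * lambda)."""
    cnt = sum(1 for xi, yi in zip(X, y) if xi[j] == v and yi == c)
    return (cnt + lam) / (class_counts[c] + S[j] * lam)

print(prior_smoothed)            # {-1: 3/7 ≈ 0.429, 1: 4/7 ≈ 0.571}
print(cond_smoothed(0, 2, -1))   # (0 + 1) / (2 + 3*1) = 0.2 -- never exactly zero
```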

3. Summary

1. Naive Bayes is a typical generative learning method. A generative method learns the joint probability distribution $P(X, Y)$ from the training data and then obtains the posterior distribution $P(Y \mid X)$. Concretely, it estimates $P(X \mid Y)$ and $P(Y)$ from the training data and forms the joint distribution:

$$P(X, Y) = P(Y)\, P(X \mid Y)$$

The probabilities can be estimated by maximum likelihood estimation or Bayesian estimation.

2. The basic assumption of naive Bayes is conditional independence:

$$\begin{aligned} P(X = x \mid Y = c_k) &= P\left(X^{(1)} = x^{(1)}, \cdots, X^{(n)} = x^{(n)} \mid Y = c_k\right) \\ &= \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right) \end{aligned}$$

This is a fairly strong assumption. Because of it, the number of conditional probabilities the model has to estimate is greatly reduced, and both learning and prediction become much simpler. Naive Bayes is therefore efficient and easy to implement; its drawback is that classification performance is not always high.

3. Naive Bayes uses Bayes' theorem together with the learned joint probability model to make predictions:

$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{P(Y)\, P(X \mid Y)}{\sum_{Y} P(Y)\, P(X \mid Y)}$$
The input $x$ is assigned to the class $y$ with the largest posterior probability:

$$y = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)$$

Maximizing the posterior probability is equivalent to minimizing the expected risk under the 0-1 loss function.
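A brief sketch of that equivalence (the standard argument, conditioning on $X = x$): choosing the class that minimizes the conditional expected 0-1 loss gives

$$\begin{aligned} f(x) &= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^{K} L(c_k, y)\, P(Y = c_k \mid X = x) \\ &= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^{K} I(y \neq c_k)\, P(Y = c_k \mid X = x) \\ &= \arg\min_{y \in \mathcal{Y}} \bigl(1 - P(Y = y \mid X = x)\bigr) \\ &= \arg\max_{y \in \mathcal{Y}} P(Y = y \mid X = x) \end{aligned}$$

so minimizing the expected risk pointwise is exactly posterior maximization.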
