Naive Bayes and Its Mathematical Derivation

1. A Brief Introduction to Naive Bayes

Naive Bayes rests on the conditional independence assumption: given the class, the features of a sample $x_i$ are mutually independent. In formula form:

$$
\begin{aligned}
P(X=x_i \mid Y=c_k) &= P(X^{(1)}=x_i^{(1)}, X^{(2)}=x_i^{(2)}, \cdots, X^{(n)}=x_i^{(n)} \mid Y=c_k) \\
&= \prod_{j=1}^{n} P(X^{(j)}=x_i^{(j)} \mid Y=c_k)
\end{aligned}
$$

where $c_k$ is a class label (there are $K$ classes in total), $n$ is the number of features, and $x_i$ is an input sample.

The Naive Bayes classifier is then

$$
y = \arg\max_{c_k} P(Y=c_k) \prod_{j=1}^{n} P(X^{(j)}=x_i^{(j)} \mid Y=c_k)
$$
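
To make the decision rule concrete, here is a minimal sketch of it in Python. The probability tables `prior` and `cond` are made-up numbers for a hypothetical 2-class, 2-feature problem, not estimates from real data, and the product is computed as a sum of logarithms to avoid numerical underflow.

```python
import math

# Hypothetical probability tables for a 2-class, 2-feature toy problem.
# prior[c]      ~ P(Y = c)
# cond[c][j][v] ~ P(X^(j) = v | Y = c)
prior = {"c1": 0.6, "c2": 0.4}
cond = {
    "c1": [{"a": 0.7, "b": 0.3}, {"s": 0.2, "t": 0.8}],
    "c2": [{"a": 0.1, "b": 0.9}, {"s": 0.5, "t": 0.5}],
}

def predict(x):
    """Return argmax_c P(Y=c) * prod_j P(X^(j)=x[j] | Y=c), computed in log space."""
    best_class, best_score = None, float("-inf")
    for c in prior:
        score = math.log(prior[c])
        for j, value in enumerate(x):
            score += math.log(cond[c][j][value])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict(["a", "t"]))  # -> "c1" with these made-up tables
```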

2. Bayesian Decision Theory

This section explains where the posterior-maximization rule in Naive Bayes comes from.

Naive Bayes uses the 0-1 loss function as its evaluation criterion:

$$
L(Y, f(X)) =
\begin{cases}
0, & Y = f(X) \\
1, & Y \neq f(X)
\end{cases}
$$

where $f(X)$ is the classification decision function.

The expected loss is $R_{exp}(f) = E\big[L(Y, f(X))\big]$. Minimizing the conditional risk at every sample $x$ minimizes the expected loss, so we now show that minimizing the expected loss is equivalent to maximizing the posterior probability. For a given $x$,

$$
\begin{aligned}
f(x) &= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^{K} L(c_k, y)\, P(Y=c_k \mid X=x) \\
&= \arg\min_{y \in \mathcal{Y}} \sum_{c_k \neq y} P(Y=c_k \mid X=x) \\
&= \arg\min_{y \in \mathcal{Y}} \big(1 - P(Y=y \mid X=x)\big) \\
&= \arg\max_{y \in \mathcal{Y}} P(Y=y \mid X=x)
\end{aligned}
$$
This yields the posterior maximization criterion:

$$
f(x) = \arg\max_{c_k \in \mathcal{Y}} P(Y=c_k \mid X=x)
$$

where $\mathcal{Y} = \{c_1, c_2, \cdots, c_K\}$ and $K$ is the number of classes.
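
As a quick numerical sanity check of this equivalence: the expected 0-1 loss of predicting class $y$ is $1 - P(Y=y \mid X=x)$, so the prediction that minimizes the expected loss is exactly the one that maximizes the posterior. The sketch below uses an arbitrary made-up posterior vector purely for illustration.

```python
# Hypothetical posterior P(Y=c_k | X=x) for K = 3 classes (made-up numbers).
posterior = [0.2, 0.5, 0.3]

# Expected 0-1 loss of predicting class k is the total posterior mass of the
# other classes, i.e. 1 - P(c_k | x).
expected_loss = [1.0 - p for p in posterior]

argmin_loss = min(range(len(posterior)), key=lambda k: expected_loss[k])
argmax_post = max(range(len(posterior)), key=lambda k: posterior[k])
assert argmin_loss == argmax_post  # both pick class index 1 here
print(argmin_loss, argmax_post)    # 1 1
```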

3. Parameter Estimation

Method 1: Maximum Likelihood Estimation

We first state the results and then prove them. Let $N$ be the number of training samples.

Prior probability estimate:

$$
P(Y=c_k) = \frac{\sum_{i=1}^{N} I(y_i=c_k)}{N}
$$

Conditional probability estimate:

$$
P(X^{(j)}=a_{jl} \mid Y=c_k) = \frac{\sum_{i=1}^{N} I\big(x_i^{(j)}=a_{jl},\, y_i=c_k\big)}{\sum_{i=1}^{N} I(y_i=c_k)}
$$

where $k=1,2,\cdots,K$, $j=1,2,\cdots,n$, $l=1,2,\cdots,S_j$, and $x^{(j)} \in \{a_{j1}, a_{j2}, \cdots, a_{jS_j}\}$ is the set of possible values of the $j$-th feature.
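
Both estimates are just relative frequencies, so they can be computed by counting. The sketch below does this on a tiny hand-made dataset (the samples and feature values are assumptions for illustration only).

```python
from collections import Counter, defaultdict

# Toy dataset: each sample is (features, label); features take discrete values.
data = [
    (("a", "s"), "c1"), (("a", "t"), "c1"), (("b", "t"), "c1"),
    (("b", "s"), "c2"), (("b", "t"), "c2"),
]
N = len(data)

# Prior: P(Y = c_k) = (# samples with label c_k) / N
label_counts = Counter(y for _, y in data)
prior = {c: cnt / N for c, cnt in label_counts.items()}

# Conditional: P(X^(j) = a_jl | Y = c_k)
#   = (# samples of class c_k whose j-th feature equals a_jl) / (# samples of class c_k)
cond_counts = defaultdict(Counter)          # (c_k, j) -> Counter over feature values
for x, y in data:
    for j, value in enumerate(x):
        cond_counts[(y, j)][value] += 1
cond = {key: {v: cnt / label_counts[key[0]] for v, cnt in ctr.items()}
        for key, ctr in cond_counts.items()}

print(prior)            # {'c1': 0.6, 'c2': 0.4}
print(cond[("c1", 0)])  # {'a': 0.666..., 'b': 0.333...}
```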

Proof:

1. Estimating the prior probability $P(Y=c_k)$

Let $P(Y=c_k) = \theta_k$, $k \in \{1, 2, \cdots, K\}$, so that

$$
P(Y) = \prod_{k=1}^{K} \theta_k^{I(Y=c_k)}
$$

The log-likelihood function is then

$$
\begin{aligned}
L(\theta) &= \log\Big(\prod_{i=1}^{N} P(Y=y_i)\Big) \\
&= \log\Big(\prod_{i=1}^{N}\prod_{k=1}^{K} \theta_k^{I(y_i=c_k)}\Big) \\
&= \log\Big(\prod_{k=1}^{K} \theta_k^{N_k}\Big) \\
&= \sum_{k=1}^{K} N_k \log\theta_k
\end{aligned}
$$

where $N_k$ is the number of samples whose class label is $c_k$.

Since $\sum_{k=1}^{K} \theta_k = 1$, the Lagrangian can be written as

$$
L(\theta, \lambda) = \sum_{k=1}^{K} N_k \log\theta_k + \lambda\Big(\sum_{k=1}^{K}\theta_k - 1\Big)
$$

Taking the partial derivative with respect to $\theta_k$ and setting it to zero gives

$$
\frac{\partial L(\theta, \lambda)}{\partial \theta_k} = \frac{N_k}{\theta_k} + \lambda = 0 \;\Rightarrow\; N_k = -\lambda\,\theta_k
$$

Summing over $k$:

$$
\sum_{k=1}^{K} N_k = -\lambda \sum_{k=1}^{K}\theta_k \;\Rightarrow\; N = -\lambda \;\Rightarrow\; \theta_k = \frac{N_k}{N}
$$

which proves the prior estimate.
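
As a quick numerical check of this closed form (under assumed toy class counts): the log-likelihood at $\theta_k = N_k/N$ is at least as large as at any randomly drawn distribution on the simplex.

```python
import math
import random

random.seed(0)
N_k = [3, 5, 2]                      # assumed toy class counts N_1, N_2, N_3
N = sum(N_k)

def log_lik(theta):
    # sum_k N_k * log(theta_k)
    return sum(n * math.log(t) for n, t in zip(N_k, theta))

theta_mle = [n / N for n in N_k]     # closed-form MLE: theta_k = N_k / N

# Compare against random distributions on the probability simplex.
for _ in range(1000):
    raw = [random.random() + 1e-12 for _ in N_k]
    theta = [r / sum(raw) for r in raw]
    assert log_lik(theta) <= log_lik(theta_mle) + 1e-9
print(theta_mle)                     # [0.3, 0.5, 0.2]
```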

2. Estimating the conditional probability $P(X^{(j)}=a_{jl} \mid Y=c_k)$

Let $P(X^{(j)}=a_{jl} \mid Y=c_k) = \theta_{kjl}$. For a single sample $(x, y)$, the conditional probability of its features can then be written as

$$
P(X=x \mid Y=y) = \prod_{k=1}^{K}\prod_{j=1}^{n}\prod_{l=1}^{S_j} \theta_{kjl}^{I(x^{(j)}=a_{jl},\, y=c_k)}
$$

The likelihood function over the whole dataset is

$$
\begin{aligned}
l(\theta) &= \prod_{i=1}^{N}\prod_{k=1}^{K}\prod_{j=1}^{n}\prod_{l=1}^{S_j} \theta_{kjl}^{I(x_i^{(j)}=a_{jl},\, y_i=c_k)} \\
&= \prod_{k=1}^{K}\prod_{j=1}^{n}\prod_{l=1}^{S_j} \theta_{kjl}^{N_{kjl}}
\end{aligned}
$$

where $N_{kjl}$ is the number of samples in the dataset that belong to class $c_k$ and whose $j$-th feature takes the value $a_{jl}$.
The log-likelihood is therefore

$$
L(\theta) = \sum_{k=1}^{K}\sum_{j=1}^{n}\sum_{l=1}^{S_j} N_{kjl} \log\theta_{kjl}
$$

Since $\sum_{l=1}^{S_j} \theta_{kjl} = 1$ for each fixed pair $(k, j)$, the Lagrangian for that pair can be written as

$$
L(\theta, \lambda) = \sum_{l=1}^{S_j} N_{kjl} \log\theta_{kjl} + \lambda\Big(\sum_{l=1}^{S_j}\theta_{kjl} - 1\Big)
$$

$$
\Rightarrow \frac{\partial L(\theta, \lambda)}{\partial \theta_{kjl}} = \frac{N_{kjl}}{\theta_{kjl}} + \lambda = 0 \;\Rightarrow\; N_{kjl} = -\lambda\,\theta_{kjl}
$$

$$
\Rightarrow \sum_{l=1}^{S_j} N_{kjl} = -\lambda \sum_{l=1}^{S_j}\theta_{kjl} = -\lambda \;\Rightarrow\; -\lambda = N_k
$$

$$
\Rightarrow \theta_{kjl} = \frac{N_{kjl}}{N_k}
$$

which proves the conditional estimate.

Method 2: Bayesian Estimation

Bayesian estimation addresses the problem that maximum likelihood estimation may assign zero probability to an event that simply never appears in the training data.

Prior probability estimate:

$$
P_\lambda(Y=c_k) = \frac{\sum_{i=1}^{N} I(y_i=c_k) + \lambda}{N + K\lambda}
$$

Conditional probability estimate:

$$
P_\lambda(X^{(j)}=a_{jl} \mid Y=c_k) = \frac{\sum_{i=1}^{N} I\big(x_i^{(j)}=a_{jl},\, y_i=c_k\big) + \lambda}{\sum_{i=1}^{N} I(y_i=c_k) + S_j\lambda}
$$

where $\lambda \geq 0$, $k=1,2,\cdots,K$, $j=1,2,\cdots,n$, $l=1,2,\cdots,S_j$, and $x^{(j)} \in \{a_{j1}, a_{j2}, \cdots, a_{jS_j}\}$. Taking $\lambda = 0$ recovers the maximum likelihood estimate, and $\lambda = 1$ is known as Laplace smoothing.
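
A minimal sketch contrasting the two estimates on assumed toy counts (a single feature $j$ with $S_j = 2$ values, restricted to samples of one class $c_k$; the counts and $\lambda = 1$ are made up for illustration). With $\lambda = 1$, a value that never co-occurs with the class still gets a small nonzero probability instead of 0.

```python
# Assumed toy counts for one feature j with S_j = 2 possible values {"a", "b"},
# restricted to samples of a single class c_k.
N_k = 4                       # number of samples with label c_k
counts = {"a": 4, "b": 0}     # occurrences of each feature value within class c_k
S_j = len(counts)
lam = 1.0                     # lambda = 1 corresponds to Laplace smoothing

mle      = {v: c / N_k for v, c in counts.items()}
smoothed = {v: (c + lam) / (N_k + S_j * lam) for v, c in counts.items()}

print(mle)       # {'a': 1.0, 'b': 0.0}          <- zero probability under MLE
print(smoothed)  # {'a': 0.833..., 'b': 0.166...}
```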
