Reference: Li Hang, 《统计学习方法》 (Statistical Learning Methods)
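The naive Bayes method is a classification method based on Bayes' theorem and the assumption that the features are conditionally independent.
Given a training data set, it first learns the joint probability distribution $p(x,y)$ of the input and output under the conditional independence assumption; then, based on this model, for a given input $x$ it uses Bayes' theorem to find the output $y$ with the largest posterior probability $p(y|x)$.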
The training data are used to learn estimates of $p(x|y)$ and $p(y)$, which give the joint probability distribution:
$$p(x,y)=p(y)p(x|y)$$
These probabilities can be estimated by maximum likelihood estimation or by Bayesian estimation.
Basic assumption
The basic assumption of the naive Bayes method is conditional independence:
$$\begin{aligned} P(X=x|Y=c_{k})&=P(X^{(1)}=x^{(1)},X^{(2)}=x^{(2)},\ldots,X^{(n)}=x^{(n)}|Y=c_{k})\\ &=\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k}) \end{aligned}$$
This is a fairly strong assumption. It greatly reduces the number of conditional probabilities in the model, so learning and prediction become much simpler and the method is efficient and easy to implement. The classification performance, however, is not necessarily high.
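To make the reduction concrete, the sketch below counts the conditional probabilities that have to be stored with and without the independence assumption. The function name and the example sizes (10 features with 3 values each, 2 classes) are hypothetical and chosen only for illustration.

```python
from math import prod


def num_conditional_probs(num_classes, feature_sizes, independent):
    """Count the conditional probabilities the model has to store.

    feature_sizes[j] is S_j, the number of values the j-th feature can take.
    """
    if independent:
        # One table P(X^(j) = a_jl | Y = c_k) per feature: K * sum_j S_j entries.
        return num_classes * sum(feature_sizes)
    # One entry per joint feature combination and class: K * prod_j S_j entries.
    return num_classes * prod(feature_sizes)


K = 2
sizes = [3] * 10  # hypothetical: 10 features, 3 possible values each
print(num_conditional_probs(K, sizes, independent=False))  # 118098
print(num_conditional_probs(K, sizes, independent=True))   # 60
```

The count grows exponentially in the number of features without the assumption and only linearly with it.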
$$P(Y|X)=\frac{P(X,Y)}{P(X)}=\frac{P(Y)P(X|Y)}{\sum_{Y}P(Y)P(X|Y)}$$
Assign the input $x$ to the class $y$ with the largest posterior probability. Because the denominator is the same for every class, it can be dropped:
$$y=\arg\max_{c_{k}}P(Y=c_{k})\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k})$$
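As a minimal sketch of the two formulas above, the code below computes the normalized posterior from given probability tables and picks the class with the largest value. The dictionary layout, the function names, and all numbers are assumptions made for this example; it also assumes the evidence $P(X=x)$ is non-zero.

```python
def posterior(priors, cond_probs, x):
    """Return P(Y = c_k | X = x) for every class c_k.

    priors:     {c_k: P(Y = c_k)}
    cond_probs: {c_k: [{value: P(X^(j) = value | Y = c_k)} for each feature j]}
    x:          tuple of feature values (x^(1), ..., x^(n))
    """
    joint = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            score *= cond_probs[c][j][value]   # conditional independence
        joint[c] = score                       # P(Y = c_k) * P(X = x | Y = c_k)
    evidence = sum(joint.values())             # P(X = x), assumed non-zero
    return {c: s / evidence for c, s in joint.items()}


def classify(priors, cond_probs, x):
    """Return the class with the largest posterior probability."""
    post = posterior(priors, cond_probs, x)
    return max(post, key=post.get)


# Hypothetical probability tables for two classes and two features.
priors = {"spam": 0.4, "ham": 0.6}
cond_probs = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"short": 0.7, "long": 0.3}],
    "ham": [{"yes": 0.1, "no": 0.9}, {"short": 0.4, "long": 0.6}],
}
post = posterior(priors, cond_probs, ("yes", "short"))
print(classify(priors, cond_probs, ("yes", "short")))  # spam
```

Dividing by the evidence changes the scores but not their ordering, which is why the classification rule can work with the unnormalized product.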
Maximizing the posterior probability is equivalent to minimizing the expected risk under the 0-1 loss function.
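A short derivation of this equivalence, sketched under the assumption that the 0-1 loss is $L(Y,f(X))=1$ when $Y\neq f(X)$ and $0$ otherwise, and that the expected risk is minimized separately for each $X=x$:

```latex
\begin{aligned}
f(x) &= \arg\min_{y}\sum_{k=1}^{K}L(c_k,y)\,P(Y=c_k|X=x)\\
     &= \arg\min_{y}\sum_{c_k\neq y}P(Y=c_k|X=x)\\
     &= \arg\min_{y}\bigl(1-P(Y=y|X=x)\bigr)\\
     &= \arg\max_{y}P(Y=y|X=x)
\end{aligned}
```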
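The naive Bayes method actually learns the mechanism that generates the data, so it is a generative model.
The conditional independence assumption says that the features used for classification are all independent of one another once the class is fixed. This assumption makes the naive Bayes method simple, but it sometimes sacrifices some classification accuracy.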
Maximum likelihood estimation
The maximum likelihood estimate of the prior probability $P(Y=c_{k})$ is
$$P(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})}{N},\quad k=1,2,\ldots,K$$
Let the set of possible values of the $j$-th feature $x^{(j)}$ be $\{a_{j1},a_{j2},\ldots,a_{jS_j}\}$. The maximum likelihood estimate of the conditional probability $P(X^{(j)}=a_{jl}|Y=c_{k})$ is
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac{\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
$$j=1,2,\ldots,n;\quad l=1,2,\ldots,S_{j};\quad k=1,2,\ldots,K$$
Here $x_{i}^{(j)}$ is the $j$-th feature of the $i$-th sample, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and $I$ is the indicator function.
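The two estimates can be transcribed almost literally into code. The sketch below does so with an explicit indicator function; the function names and the five-sample toy data set are hypothetical, and feature indices are 0-based in the code.

```python
def indicator(condition):
    """I(.): 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


def prior_mle(y, c):
    """P(Y = c) = sum_i I(y_i = c) / N."""
    return sum(indicator(yi == c) for yi in y) / len(y)


def conditional_mle(X, y, j, a, c):
    """P(X^(j) = a | Y = c) = sum_i I(x_i^(j) = a, y_i = c) / sum_i I(y_i = c)."""
    numerator = sum(indicator(xi[j] == a and yi == c) for xi, yi in zip(X, y))
    denominator = sum(indicator(yi == c) for yi in y)
    return numerator / denominator


# Hypothetical toy data: two categorical features, labels "a" and "b".
X = [(1, "S"), (1, "M"), (2, "S"), (2, "M"), (2, "M")]
y = ["a", "a", "b", "b", "b"]

print(prior_mle(y, "a"))                 # 0.4
print(conditional_mle(X, y, 0, 2, "b"))  # 1.0
print(conditional_mle(X, y, 0, 1, "b"))  # 0.0
```

The last value is 0 because the combination never occurs in the toy data.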
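Bayesian estimation
The naive Bayes method and Bayesian estimation are different concepts.
Maximum likelihood estimation may yield an estimated probability of 0, which in turn makes the whole product for that class 0. Bayesian estimation is used to avoid this problem.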
The Bayesian estimate of the conditional probability is
$$P_{\lambda}(X^{(j)}=a_{jl}|Y=c_{k})=\frac{\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})+\lambda}{\sum_{i=1}^{N}I(y_{i}=c_{k})+S_{j}\lambda}$$
Here $\lambda>0$. A common choice is $\lambda=1$, which is called Laplace smoothing. Clearly, for any $l=1,2,\ldots,S_{j}$ and $k=1,2,\ldots,K$,
$$P_{\lambda}(X^{(j)}=a_{jl}|Y=c_{k})>0$$
$$\sum_{l=1}^{S_{j}}P_{\lambda}(X^{(j)}=a_{jl}|Y=c_{k})=1$$
This shows that the Bayesian estimate is indeed a probability distribution. Similarly, the Bayesian estimate of the prior probability is
$$P_{\lambda}(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})+\lambda}{N+K\lambda},\quad k=1,2,\ldots,K$$
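The smoothed estimates are simple functions of the counts. The sketch below takes the counts as inputs; the function names and the example counts are hypothetical, and the default `lam=1.0` corresponds to Laplace smoothing.

```python
def smoothed_conditional(count_joint, count_class, num_values, lam=1.0):
    """Bayesian estimate of P(X^(j) = a_jl | Y = c_k).

    count_joint: sum_i I(x_i^(j) = a_jl, y_i = c_k)
    count_class: sum_i I(y_i = c_k)
    num_values:  S_j, the number of values of the j-th feature
    """
    return (count_joint + lam) / (count_class + num_values * lam)


def smoothed_prior(count_class, num_samples, num_classes, lam=1.0):
    """Bayesian estimate of P(Y = c_k) with N samples and K classes."""
    return (count_class + lam) / (num_samples + num_classes * lam)


# Hypothetical counts: class c_k has 6 samples; the j-th feature takes
# S_j = 3 values, observed 4, 2 and 0 times within that class.
counts = [4, 2, 0]
probs = [smoothed_conditional(c, 6, 3) for c in counts]
print(probs)       # every entry is positive, including the zero-count value
print(sum(probs))  # the entries still form a probability distribution
```

With these counts the unseen value receives probability $1/9$ instead of $0$.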
Naive Bayes algorithm
Input: training data $T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, where $x_{i}=(x_{i}^{(1)},x_{i}^{(2)},\ldots,x_{i}^{(n)})$, $x_{i}^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_{i}^{(j)}\in \{a_{j1},a_{j2},\ldots,a_{jS_{j}}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j=1,2,\ldots,n$, $l=1,2,\ldots,S_{j}$, $y_{i}\in\{c_{1},c_2,\ldots,c_K\}$; an instance $x$.
Output: the class of the instance $x$.
(1) Compute the prior probabilities and the conditional probabilities:
$$P(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})}{N},\quad k=1,2,\ldots,K$$
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac{\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
$$j=1,2,\ldots,n;\quad l=1,2,\ldots,S_{j};\quad k=1,2,\ldots,K$$
(2) For the given instance $x=(x^{(1)},x^{(2)},\ldots,x^{(n)})$, compute
$$P(Y=c_{k})\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k}),\quad k=1,2,\ldots,K$$
(3) Determine the class of the instance $x$:
$$y=\arg\max_{c_{k}}P(Y=c_{k})\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k})$$
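The three steps can be sketched end to end as follows. This is a minimal illustration, not a reference implementation: the function names `fit`, `scores` and `predict` are made up here, the 15-sample data set is toy data that does not come from the text above, and the estimates are the unsmoothed maximum likelihood ones.

```python
from collections import Counter


def fit(X, y):
    """Step (1): estimate P(Y = c_k) and P(X^(j) = a_jl | Y = c_k) by counting."""
    class_counts = Counter(y)
    prior = {c: cnt / len(y) for c, cnt in class_counts.items()}
    joint_counts = Counter(
        (j, value, c) for xi, c in zip(X, y) for j, value in enumerate(xi)
    )
    cond = {key: cnt / class_counts[key[2]] for key, cnt in joint_counts.items()}
    return prior, cond


def scores(prior, cond, x):
    """Step (2): P(Y = c_k) * prod_j P(X^(j) = x^(j) | Y = c_k) for every class."""
    result = {}
    for c, p in prior.items():
        for j, value in enumerate(x):
            # A (value, class) pair that was never observed has MLE 0.
            p *= cond.get((j, value, c), 0.0)
        result[c] = p
    return result


def predict(prior, cond, x):
    """Step (3): return the class with the largest score."""
    s = scores(prior, cond, x)
    return max(s, key=s.get)


# Toy data (an assumption for this example): two features, labels 1 and -1.
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = list("SMMSSSMMLLLMMLL")
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
X = list(zip(X1, X2))

prior, cond = fit(X, y)
s = scores(prior, cond, (2, "S"))
print(predict(prior, cond, (2, "S")))  # -1
```

For this toy data the scores are $1/45$ for class $1$ and $1/15$ for class $-1$, so the instance $(2, S)$ is assigned to class $-1$. For many features, summing log-probabilities instead of multiplying probabilities avoids floating-point underflow.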
Exercise 4.1
Exercise: use maximum likelihood estimation to derive the probability estimation formulas of naive Bayes:
$$P(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})}{N},\quad k=1,2,\ldots,K$$
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac{\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
Solution: treat $P(Y=c_{k})$ and $P(X^{(j)}=a_{jl}|Y=c_{k})$ as parameters and solve for them under the constraint $\sum_{k=1}^{K}P(y=c_k)=1$ (and, for the conditional probabilities, $\sum_{l=1}^{S_j}P(X^{(j)}=a_{jl}|Y=c_{k})=1$ for every $j$ and $k$).
From the assumptions:
$$P(y)=\prod_{k=1}^{K}P(y=c_{k})^{I(y=c_{k})}$$
$$P(x|y=c_k)=\prod_{j=1}^{n}P(x^{(j)}|y=c_{k})=\prod_{j=1}^{n}\prod_{l=1}^{S_j}P(x^{(j)}=a_{jl}|y=c_{k})^{I(x^{(j)}=a_{jl},y=c_k)}$$
Let $\varphi = \{P(Y=c_{k}),P(X^{(j)}=a_{jl}|Y=c_{k})\}$. The log-likelihood function is:
$$\begin{aligned} L(\varphi)&=\log\prod_{i=1}^{N}P(x_i,y_i;\varphi)=\log\prod_{i=1}^{N}P(x_i|y_i;\varphi)P(y_{i};\varphi)\\ &=\log\prod_{i=1}^{N}\left[P(y_{i};\varphi)\prod_{j=1}^{n}P(x_i^{(j)}|y_i;\varphi)\right]\\ &=\sum_{i=1}^{N}\left[\log P(y_{i};\varphi)+\sum_{j=1}^{n}\log P(x_i^{(j)}|y_i;\varphi)\right]\\ &=\sum_{i=1}^{N}\left[\sum_{k=1}^{K}\log P(y=c_k)^{I(y_i=c_k)}+\sum_{j=1}^{n}\sum_{l=1}^{S_j}\sum_{k=1}^{K}\log P(x_i^{(j)}=a_{jl}|y_i=c_k)^{I(x_i^{(j)}=a_{jl},y_i=c_k)}\right]\\ &=\sum_{i=1}^{N}\left[\sum_{k=1}^{K}I(y_i=c_k)\log P(y=c_k)+\sum_{j=1}^{n}\sum_{l=1}^{S_j}\sum_{k=1}^{K}I(x_i^{(j)}=a_{jl},y_i=c_k)\log P(x_i^{(j)}=a_{jl}|y_i=c_k)\right] \end{aligned}$$
Differentiate with respect to the first group of parameters, $P(Y=c_{k})$. Only the first term of $L(\varphi)$ depends on them (the summation index is written $a$ to keep it apart from the fixed $k$):
$$\frac{\partial L(\varphi)}{\partial P(y=c_k)}=\frac{\partial}{\partial P(y=c_k)}\sum_{i=1}^{N}\sum_{a=1}^{K}I(y_i=c_a)\log P(y=c_a)$$
From the constraint:
$$P(y=c_K)=1-\sum_{k=1}^{K-1}P(y=c_k)$$
$$\begin{aligned} \Rightarrow\frac{\partial L(\varphi)}{\partial P(y=c_k)}&=\frac{\partial}{\partial P(y=c_k)}\sum_{i=1}^{N}\left[\sum_{a=1}^{K-1}I(y_i=c_a)\log P(y=c_a)+I(y_i=c_K)\log P(y=c_K)\right]\\ &=\frac{\partial}{\partial P(y=c_k)}\sum_{i=1}^{N}\left[\sum_{a=1}^{K-1}I(y_i=c_a)\log P(y=c_a)+I(y_i=c_K)\log\left(1-\sum_{a=1}^{K-1}P(y=c_a)\right)\right] \end{aligned}$$
First solve for the estimate of $P(y=c_1)$ by setting the derivative to zero:
$$\begin{aligned} 0&=\frac{\partial}{\partial P(y=c_1)}\sum_{i=1}^{N}\left[\sum_{a=1}^{K-1}I(y_i=c_a)\log P(y=c_a)+I(y_i=c_K)\log\left(1-\sum_{a=1}^{K-1}P(y=c_a)\right)\right]\\ &=\sum_{i=1}^{N}\left[\frac{I(y_i=c_1)}{P(y=c_1)}-\frac{I(y_i=c_K)}{1-\sum_{a=1}^{K-1}P(y=c_a)}\right]\\ &=\sum_{i=1}^{N}\left[\frac{I(y_i=c_1)}{P(y=c_1)}-\frac{I(y_i=c_K)}{P(y=c_K)}\right] \end{aligned}$$
Here $P(y=c_K)$ is a value determined by $P(y=c_1),P(y=c_2),\ldots,P(y=c_{K-1})$.
$$\sum_{i=1}^{N}\left[\frac{I(y_i=c_1)}{P(y=c_1)}-\frac{I(y_i=c_K)}{P(y=c_K)}\right]=0$$
$$\Rightarrow P(y=c_K)\sum_{i=1}^{N}I(y_i=c_1)-P(y=c_1)\sum_{i=1}^{N}I(y_i=c_K)=0$$
$$\begin{aligned} P(y=c_1) &= \frac{\sum_{i=1}^{N}I(y_i=c_1)}{\sum_{i=1}^{N}I(y_i=c_K)}P(y=c_K)\\ P(y=c_2) &= \frac{\sum_{i=1}^{N}I(y_i=c_2)}{\sum_{i=1}^{N}I(y_i=c_K)}P(y=c_K)\\ &\ \ \vdots \\ P(y=c_K) &= \frac{\sum_{i=1}^{N}I(y_i=c_K)}{\sum_{i=1}^{N}I(y_i=c_K)}P(y=c_K) \end{aligned}$$
Summing the equations above for $P(y=c_1),P(y=c_2),\ldots,P(y=c_K)$, and using $\sum_{k=1}^{K}\sum_{i=1}^{N}I(y_i=c_k)=N$, gives:
$$P(y=c_1)+P(y=c_2)+\ldots+P(y=c_K)=\frac{N}{\sum_{i=1}^{N}I(y_i=c_K)}P(y=c_K)$$
$$\Rightarrow 1=\frac{N}{\sum_{i=1}^{N}I(y_i=c_K)}P(y=c_K)$$
$$\Rightarrow P(y=c_K)=\frac{\sum_{i=1}^{N}I(y_i=c_K)}{N}$$
Similarly:
$$P(y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\ldots,K$$
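As a numerical sanity check of this result, the sketch below compares the log-likelihood at the relative frequencies with the log-likelihood at many random probability vectors. The class counts are hypothetical and the random search is only an illustration, not a proof.

```python
import math
import random


def log_likelihood(p, counts):
    """sum_k N_k * log p_k: the part of L(phi) that depends on P(y = c_k)."""
    return sum(n * math.log(pk) for n, pk in zip(counts, p))


counts = [5, 3, 2]              # hypothetical class counts N_k
N = sum(counts)
mle = [n / N for n in counts]   # the closed-form estimate N_k / N

rng = random.Random(0)
best = log_likelihood(mle, counts)
never_beaten = True
for _ in range(10_000):
    # Draw a random point on the probability simplex (strictly positive entries).
    raw = [rng.random() + 1e-9 for _ in counts]
    total = sum(raw)
    candidate = [r / total for r in raw]
    if log_likelihood(candidate, counts) > best + 1e-12:
        never_beaten = False

print(never_beaten)  # True
```

No candidate exceeds the value at $N_k/N$, as expected for the maximizer.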
Similarly, differentiating with respect to $P(X^{(j)}=a_{jl}|Y=c_{k})$ gives
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac{\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
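The step for the conditional probabilities is only stated, not carried out. One way to fill it in is sketched below. It uses a Lagrange multiplier rather than the substitution used for the prior, and the shorthand $\theta_{jlk}$, $N_{jlk}$, $N_k$ and the multiplier $\mu$ are introduced only for this sketch.

```latex
\begin{aligned}
&\theta_{jlk}=P(X^{(j)}=a_{jl}|Y=c_k),\quad
 N_{jlk}=\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k),\quad
 N_k=\sum_{i=1}^{N}I(y_i=c_k)\\
&\text{For fixed } j \text{ and } k:\quad
 \max_{\theta}\ \sum_{l=1}^{S_j}N_{jlk}\log\theta_{jlk}
 \quad\text{subject to}\quad \sum_{l=1}^{S_j}\theta_{jlk}=1\\
&\frac{\partial}{\partial\theta_{jlk}}
 \left[\sum_{l'=1}^{S_j}N_{jl'k}\log\theta_{jl'k}
 +\mu\left(1-\sum_{l'=1}^{S_j}\theta_{jl'k}\right)\right]
 =\frac{N_{jlk}}{\theta_{jlk}}-\mu=0
 \ \Rightarrow\ \theta_{jlk}=\frac{N_{jlk}}{\mu}\\
&\sum_{l=1}^{S_j}\theta_{jlk}=1
 \ \Rightarrow\ \mu=\sum_{l=1}^{S_j}N_{jlk}=N_k
 \ \Rightarrow\ \theta_{jlk}=\frac{N_{jlk}}{N_k}
\end{aligned}
```

The identity $\sum_{l}N_{jlk}=N_k$ holds because every sample of class $c_k$ takes exactly one of the $S_j$ values of the $j$-th feature.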