Maximum Likelihood Estimation
Derive the probability estimation formulas of the naive Bayes method by maximum likelihood estimation.
1. In the naive Bayes method, learning means estimating $P(Y=c_k)$ and $P(X^{(j)}=x^{(j)}\mid Y=c_k)$. Maximum likelihood estimation can be applied to estimate these probabilities. The maximum likelihood estimate of the prior probability $P(Y=c_k)$ is
$$P(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{N} \tag{1}$$
Answer:
- Let $P=P(Y=c_k)$.
- The likelihood function is $L=P^{\sum_{i=1}^{N}I(y_i=c_k)}\cdot(1-P)^{\sum_{i=1}^{N}I(y_i\neq c_k)}$
- Taking the logarithm of both sides: $\ln L=\sum_{i=1}^{N}I(y_i=c_k)\ln P+\sum_{i=1}^{N}I(y_i\neq c_k)\ln(1-P)$
- Taking the derivative with respect to $P$ and setting it to zero: $\frac{\partial \ln L}{\partial P}=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{P}-\frac{\sum_{i=1}^{N}I(y_i\neq c_k)}{1-P}=0$
- Since $\sum_{i=1}^{N}I(y_i\neq c_k)=N-\sum_{i=1}^{N}I(y_i=c_k)$, solving for $P$ gives $P=P(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{N}$
- END
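The estimate in (1) is simply a relative frequency of each class label. A minimal sketch (the toy labels below are hypothetical, not from the text):

```python
from collections import Counter

def mle_prior(labels):
    """MLE of the class prior P(Y=c_k): count of class c_k divided by N."""
    n = len(labels)
    counts = Counter(labels)
    return {c: counts[c] / n for c in counts}

# Hypothetical sample of N=6 labels with classes 1 and -1.
labels = [1, 1, 1, -1, -1, 1]
prior = mle_prior(labels)
print(prior[1])   # 4/6: four of six samples have y=1
print(prior[-1])  # 2/6
```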
2. Let the set of possible values of the $j$-th feature $x^{(j)}$ be $\{a_{j1},\dots,a_{jS_j}\}$. The maximum likelihood estimate of the conditional probability $P(X^{(j)}=a_{jl}\mid Y=c_k)$ is
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)}{\sum_{i=1}^{N}I(y_i=c_k)}$$
$$j=1,\dots,n;\quad l=1,\dots,S_j;\quad k=1,\dots,K$$
where $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and $I$ is the indicator function.
Answer:
- $P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{P(X^{(j)}=a_{jl},Y=c_k)}{P(Y=c_k)}$
- By equation (1), the denominator is $P(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{N}$
- Likewise, the maximum likelihood estimate of the numerator is $P(X^{(j)}=a_{jl},Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)}{N}$
- Substituting both into the original expression and simplifying: $P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)}{\sum_{i=1}^{N}I(y_i=c_k)}$
END
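The conditional estimate above is a ratio of two counts: joint occurrences of $(a_{jl}, c_k)$ over occurrences of $c_k$. A sketch for a single feature column, with hypothetical data:

```python
from collections import Counter

def mle_conditional(xs, ys):
    """MLE of P(X^(j)=a | Y=c) for one feature column xs aligned with
    labels ys: joint count of (a, c) divided by the count of class c."""
    class_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    return {(a, c): joint_counts[(a, c)] / class_counts[c]
            for (a, c) in joint_counts}

# Hypothetical feature with values 'S'/'M' and labels 1/-1.
xs = ['S', 'M', 'M', 'S', 'S', 'M']
ys = [1, 1, 1, -1, -1, 1]
est = mle_conditional(xs, ys)
print(est[('M', 1)])   # 3 of the 4 samples with y=1 have x='M' -> 0.75
print(est[('S', -1)])  # both samples with y=-1 have x='S' -> 1.0
```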
Bayesian Estimation
Derive the probability estimation formulas of the naive Bayes method by Bayesian estimation.
1. Maximum likelihood estimation may produce probability estimates equal to zero. This distorts the computed posterior probabilities and biases the classification. The remedy is Bayesian estimation. Specifically, the Bayesian estimate of the conditional probability is
$$P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)+\lambda}{\sum_{i=1}^{N}I(y_i=c_k)+S_j\lambda} \tag{2}$$
where $\lambda \geq 0$. This is equivalent to adding a positive count $\lambda > 0$ to the frequency of each possible value of the random variable. When $\lambda=0$ it reduces to maximum likelihood estimation. A common choice is $\lambda=1$, which is called Laplace smoothing (Laplacian smoothing). Clearly, for any $l=1,\dots,S_j$ and $k=1,\dots,K$,
$$P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k) > 0, \qquad \sum_{l=1}^{S_j}P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k) = 1$$
Answer:
- $P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{P_\lambda(X^{(j)}=a_{jl},Y=c_k)}{P_\lambda(Y=c_k)}$
- For the denominator, apply (3) with smoothing parameter $S_j\lambda$ in place of $\lambda$: $P_\lambda(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)+S_j\lambda}{N+KS_j\lambda}$
- Likewise, the pair $(X^{(j)},Y)$ has $KS_j$ possible values, so its Bayesian estimate with smoothing parameter $\lambda$ is $P_\lambda(X^{(j)}=a_{jl},Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)+\lambda}{N+KS_j\lambda}$
- Then $P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)+\lambda}{N+KS_j\lambda} \bigg/ \frac{\sum_{i=1}^{N}I(y_i=c_k)+S_j\lambda}{N+KS_j\lambda}$
- The common factor $N+KS_j\lambda$ cancels, giving $P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)+\lambda}{\sum_{i=1}^{N}I(y_i=c_k)+S_j\lambda}$
- END
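Formula (2) can be sketched by extending the MLE count ratio with the $\lambda$ and $S_j\lambda$ correction terms; a zero joint count then yields a small positive probability instead of zero (data below are hypothetical):

```python
from collections import Counter

def bayes_conditional(xs, ys, values, lam=1.0):
    """Bayesian estimate (2): (joint count + lambda) over
    (class count + S_j * lambda), where S_j = len(values)."""
    class_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    s_j = len(values)
    return {(a, c): (joint_counts[(a, c)] + lam) / (class_counts[c] + s_j * lam)
            for a in values for c in class_counts}

# Hypothetical data: 'M' never co-occurs with y=-1, so its MLE
# would be 0, but the Laplace-smoothed estimate stays positive.
xs = ['S', 'M', 'M', 'S', 'L', 'M']
ys = [1, 1, 1, -1, -1, 1]
est = bayes_conditional(xs, ys, values=['S', 'M', 'L'], lam=1.0)
print(est[('M', -1)])  # (0 + 1) / (2 + 3*1) = 0.2
# The estimates still sum to 1 over all feature values within a class:
print(sum(est[(a, -1)] for a in ['S', 'M', 'L']))  # 1.0
```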
2. Expression (2) is indeed a probability distribution. Similarly, the Bayesian estimate of the prior probability is
$$P_\lambda(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda} \tag{3}$$
Answer:
- Incorporate a prior: with no other information, assume the prior is uniform, so $P=P(Y=c_k)=\frac{1}{K}$, i.e. $PK-1=0$.
- From formula (1), $PN-\sum_{i=1}^{N}I(y_i=c_k)=0$.
- Combine the two constraints with weight $\lambda$ on the prior term: $\lambda(PK-1)+PN-\sum_{i=1}^{N}I(y_i=c_k)=0$
- Solving for $P$: $P=P_\lambda(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda}$
- END
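As a sketch of formula (3) (with hypothetical labels): a class unseen in the data still receives positive probability, and the smoothed prior remains a valid distribution.

```python
from collections import Counter

def bayes_prior(labels, classes, lam=1.0):
    """Bayesian estimate (3): (count + lambda) / (N + K*lambda),
    where K = len(classes)."""
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    return {c: (counts[c] + lam) / (n + k * lam) for c in classes}

# Hypothetical labels; class 0 never appears but still gets mass.
labels = [1, 1, 1, -1, -1, 1]
prior = bayes_prior(labels, classes=[1, -1, 0], lam=1.0)
print(prior[0])             # (0 + 1) / (6 + 3*1) = 1/9
print(sum(prior.values()))  # the estimates still sum to 1
```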