Naive Bayes Method
Let the input space $\mathcal{X} \subseteq \mathbf{R}^{n}$ be a set of $n$-dimensional vectors, and let the output space be the set of class labels $\mathcal{Y}=\left\{c_{1}, c_{2}, \cdots, c_{K}\right\}$. The input is a feature vector $x \in \mathcal{X}$ and the output is a class label $y \in \mathcal{Y}$. $X$ is a random vector defined on the input space $\mathcal{X}$, and $Y$ is a random variable defined on the output space $\mathcal{Y}$. $P(X, Y)$ is the joint probability distribution of $X$ and $Y$.
$c_{i}$: an element of the class label set $\mathcal{Y}$
Suppose the $j$-th feature $x^{(j)}$ can take $S_{j}$ distinct values.
The set of possible values of the $j$-th feature $x^{(j)}$ is $\left\{a_{j 1}, a_{j 2}, \cdots, a_{j S_{j}}\right\}$.
$a_{j l}$ is the $l$-th possible value of the $j$-th feature.
Maximum likelihood estimation
- Prior probability
- $P\left(Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)}{N}, \quad k=1,2, \cdots, K$, where $N$ is the number of training samples $(x_{i}, y_{i})$ and $I(\cdot)$ is the indicator function.
- Conditional probability
- $P\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)}, \quad j=1,2, \cdots, n ; \quad l=1,2, \cdots, S_{j} ; \quad k=1,2, \cdots, K$
- Classification rule
- $y=\arg \max _{c_{k}} P\left(Y=c_{k}\right) \prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)$
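The three maximum likelihood steps (prior, conditional probability, classification rule) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `fit_mle` and `predict`, and the six-sample toy dataset, are assumptions made up for this example and do not come from the notes. Features are assumed to be discrete.

```python
from collections import Counter

def fit_mle(X, y):
    """Relative-frequency (maximum likelihood) estimates of the prior and conditionals."""
    n_samples = len(y)
    class_counts = Counter(y)  # sum_i I(y_i = c_k)
    prior = {c: cnt / n_samples for c, cnt in class_counts.items()}
    # sum_i I(x_i^(j) = a, y_i = c), keyed by (feature index j, value a, class c)
    joint_counts = Counter((j, a, yi) for xi, yi in zip(X, y) for j, a in enumerate(xi))
    cond = {(j, a, c): cnt / class_counts[c] for (j, a, c), cnt in joint_counts.items()}
    return prior, cond

def predict(x, prior, cond):
    """Return the arg-max class and the unnormalized score of every class."""
    scores = {}
    for c, p in prior.items():
        score = p
        for j, a in enumerate(x):
            # A (value, class) pair never seen in training has probability 0 under MLE.
            score *= cond.get((j, a, c), 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: feature 0 takes values {1, 2, 3}, feature 1 takes {"S", "M", "L"}.
X = [(1, "S"), (1, "M"), (2, "S"), (2, "M"), (2, "L"), (3, "L")]
y = [0, 0, 0, 1, 1, 1]

prior, cond = fit_mle(X, y)
label, scores = predict((2, "S"), prior, cond)
print(label, scores)  # class 0 wins; class 1 scores exactly 0 because "S" never occurs with it
```

Note that class 1 is ruled out entirely by a single unseen feature value, whatever the other factors are. This is the situation that the estimate with $\lambda$ is meant to avoid.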
Bayesian estimation
- Prior probability
- $P_{\lambda}\left(Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+\lambda}{N+K \lambda}$
- Conditional probability
- $P_{\lambda}\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)+\lambda}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+S_{j} \lambda}$
- Classification rule
- $y=\arg \max _{c_{k}} P_{\lambda}\left(Y=c_{k}\right) \prod_{j=1}^{n} P_{\lambda}\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)$
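The smoothed estimates can be sketched the same way. The code below is a minimal illustration, not a reference implementation: the names `fit_smoothed` and `predict`, the default `lam=1.0` (commonly called Laplace smoothing), and the toy dataset are assumptions made for this example. It also assumes that the value set of each feature, and hence $S_{j}$, can be read off the training data.

```python
from collections import Counter

def fit_smoothed(X, y, lam=1.0):
    """Estimates with lam added to every count (lam > 0)."""
    n_samples = len(y)
    n_features = len(X[0])
    classes = sorted(set(y))  # c_1, ..., c_K
    # Assumption: the possible values of feature j are those observed in training.
    values = [{xi[j] for xi in X} for j in range(n_features)]
    class_counts = Counter(y)
    joint_counts = Counter((j, a, yi) for xi, yi in zip(X, y) for j, a in enumerate(xi))
    prior = {c: (class_counts[c] + lam) / (n_samples + len(classes) * lam) for c in classes}
    cond = {
        # Counter returns 0 for unseen (j, a, c), so the numerator is at least lam.
        (j, a, c): (joint_counts[(j, a, c)] + lam) / (class_counts[c] + len(values[j]) * lam)
        for j in range(n_features) for a in values[j] for c in classes
    }
    return prior, cond

def predict(x, prior, cond):
    """Arg-max over classes; raises KeyError for a feature value absent from training."""
    scores = {}
    for c, p in prior.items():
        score = p
        for j, a in enumerate(x):
            score *= cond[(j, a, c)]
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: feature 0 takes values {1, 2, 3}, feature 1 takes {"S", "M", "L"}.
X = [(1, "S"), (1, "M"), (2, "S"), (2, "M"), (2, "L"), (3, "L")]
y = [0, 0, 0, 1, 1, 1]

prior, cond = fit_smoothed(X, y, lam=1.0)
label, scores = predict((2, "S"), prior, cond)
print(label, scores)  # class 0 still wins, but class 1 now has a small positive score
```

With `lam=1.0` the pair ("S", class 1), which never occurs in the toy data, gets probability $(0+1)/(3+3)=1/6$ instead of 0, so no class is eliminated by a single unseen value.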
Supplement on Bayesian estimation:
- Verifying that the estimate is a probability distribution
- $P_{\lambda}\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)>0$ and $\sum_{l=1}^{S_{j}} P_{\lambda}\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=1$
- $\lambda>0$
- $S_{j}$: the number of values that $x^{(j)}$ can take
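The normalization claim can be checked directly from the definition. The short derivation below is a sketch that uses only the symbols already defined in these notes. Positivity is immediate, because the numerator is at least $\lambda>0$ and the denominator is positive. For the sum over $l$:

$$
\sum_{l=1}^{S_{j}} P_{\lambda}\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)
=\frac{\sum_{l=1}^{S_{j}} \sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)+S_{j} \lambda}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+S_{j} \lambda}
=\frac{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+S_{j} \lambda}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+S_{j} \lambda}=1
$$

The second equality holds because every sample with $y_{i}=c_{k}$ has $x_{i}^{(j)}$ equal to exactly one of the $S_{j}$ values $a_{j l}$, so summing the joint counts over $l$ recovers the class count. The same argument with $K$ in place of $S_{j}$ shows that $\sum_{k=1}^{K} P_{\lambda}\left(Y=c_{k}\right)=1$.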