Learning and Classification with Naive Bayes
Model
Input: $x\in \mathbb{R}^p$, a $p$-dimensional feature vector
Output: $y\in\{1,2,\cdots,K\}$, the class label
Training dataset: $T=\{(x_1,y_1),\cdots,(x_N,y_N)\}$, containing $N$ samples
Model assumption: each pair $(x,y)$ is generated independently from the joint distribution $p(x,y)$
Conditional independence assumption: given the class, the features are mutually independent:
$$P(X=x\mid Y=c_k)=\prod_{i=1}^{p} P(X^i=x^i\mid Y=c_k)$$
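To see what the conditional independence assumption buys, the short sketch below counts conditional-probability table entries with and without it. It assumes discrete features. The function name `count_parameters` and the example sizes are illustrative and do not come from the original text.

```python
from math import prod

def count_parameters(num_classes, values_per_feature):
    """Count conditional-probability table entries for discrete features.

    values_per_feature[j] is the number of distinct values feature j can take.
    Returns (entries without the independence assumption, entries with it).
    """
    # Without independence: one entry per (class, full feature combination).
    full_joint = num_classes * prod(values_per_feature)
    # With independence: one entry per (class, feature, feature value).
    naive = num_classes * sum(values_per_feature)
    return full_joint, naive

# Hypothetical problem: 2 classes, 4 features with 3 values each.
full, naive = count_parameters(num_classes=2, values_per_feature=[3, 3, 3, 3])
print(full, naive)
```

The first count grows exponentially with the number of features, while the second grows linearly.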
Goal:
Learn the joint distribution $p(x,y)$.
Then, given $x$, output the class $y$ with the largest posterior probability as the prediction.
Learning:
Prior probability: $p(y)$
Conditional probability:
$$P(X=x\mid Y=c_k)=P(X^1=x^1,X^2=x^2,\cdots,X^p=x^p\mid Y=c_k)$$
Posterior maximization
By Bayes' theorem:
$$P\left(Y=c_{k} \mid X=x\right)=\frac{P\left(X=x \mid Y=c_{k}\right) P\left(Y=c_{k}\right)}{\sum_{k} P\left(X=x \mid Y=c_{k}\right) P\left(Y=c_{k}\right)}$$
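Bayes' theorem for a single fixed input $x$ can be sketched in a few lines. The function name `posterior` and the numbers are made up for illustration; `likelihood[c]` stands for $P(X=x\mid Y=c)$ and is assumed to be already known.

```python
def posterior(prior, likelihood):
    """Return P(Y=c | X=x) for every class c, for one fixed input x.

    prior[c]      = P(Y=c)
    likelihood[c] = P(X=x | Y=c)
    """
    # Numerator of Bayes' theorem for each class.
    joint = {c: likelihood[c] * prior[c] for c in prior}
    # Denominator: the same sum over all classes, P(X=x).
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

# Hypothetical two-class example.
post = posterior(prior={"a": 0.5, "b": 0.5}, likelihood={"a": 0.2, "b": 0.6})
print(post)
```

The denominator is identical for every class, so it rescales the scores without changing which class comes out largest.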
Classification rule: the denominator of the posterior is the same for every $c_k$, so it can be dropped; combined with the conditional independence assumption this gives
$$y=\arg\max_{c_k} P(Y=c_k)\prod_{i=1}^{p} P(X^i=x^i\mid Y=c_k)$$
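A minimal sketch of this decision rule follows, assuming the prior and conditional probability tables are already available and all strictly positive. The layout `cond[c][j][v]` and every number are hypothetical. The sketch sums logarithms instead of multiplying probabilities, a common way to avoid floating-point underflow; the logarithm is monotonic, so the argmax is unchanged.

```python
import math

def predict(x, prior, cond):
    """Return the class maximizing P(Y=c) * prod_j P(X^j = x^j | Y=c).

    x           : tuple of feature values
    prior[c]    : P(Y=c)
    cond[c][j]  : dict mapping a value v of feature j to P(X^j = v | Y=c)
    All probabilities are assumed to be > 0 so that log is defined.
    """
    def log_score(c):
        return math.log(prior[c]) + sum(
            math.log(cond[c][j][v]) for j, v in enumerate(x)
        )
    return max(prior, key=log_score)

# Hypothetical tables for two classes and two discrete features.
prior = {1: 0.6, -1: 0.4}
cond = {
    1:  [{"a": 0.2, "b": 0.8}, {"s": 0.5, "m": 0.5}],
    -1: [{"a": 0.7, "b": 0.3}, {"s": 0.9, "m": 0.1}],
}
print(predict(("a", "s"), prior, cond))
print(predict(("b", "m"), prior, cond))
```

For `("a", "s")` the unnormalized scores are 0.6 × 0.2 × 0.5 = 0.06 for class 1 and 0.4 × 0.7 × 0.9 = 0.252 for class -1, so class -1 is chosen.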
Maximizing the posterior probability is equivalent to minimizing the expected risk under the 0-1 loss function.
Proof:
With the 0-1 loss $L(y, c_k)=I(y\neq c_k)$, the expected risk can be minimized separately for each $x$. Since $\sum_{k=1}^{K} P(Y=c_k\mid X=x)=1$,
$$
\begin{aligned}
f(x) &= \arg\min_{y}\sum_{k=1}^{K} I(y\neq c_k)\,P(Y=c_k\mid X=x)\\
&= \arg\min_{y}\Bigl(1-\sum_{k=1}^{K} I(y= c_k)\,P(Y=c_k\mid X=x)\Bigr)\\
&= \arg\max_{y}\sum_{k=1}^{K} I(y= c_k)\,P(Y=c_k\mid X=x)\\
&= \arg\max_{c_k} P(Y=c_k\mid X=x)
\end{aligned}
$$
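The equivalence can be checked numerically on a single posterior distribution. The posterior values below are made up, and `expected_zero_one_loss` is an illustrative helper name.

```python
def expected_zero_one_loss(y, posterior):
    """Expected 0-1 loss of predicting y: sum of P(c | x) over all c != y."""
    return sum(p for c, p in posterior.items() if c != y)

# Hypothetical posterior P(Y=c | X=x) for one input x.
posterior = {"c1": 0.2, "c2": 0.5, "c3": 0.3}

# Class with the smallest expected 0-1 loss.
risk_min = min(posterior, key=lambda y: expected_zero_one_loss(y, posterior))
# Class with the largest posterior probability.
map_label = max(posterior, key=posterior.get)

print(risk_min, map_label)
```

Each class's expected loss is one minus its posterior probability, so both selections pick the same label.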
Parameter estimation
Maximum likelihood estimation
Maximum likelihood estimate of the prior probability:
$$P(Y=c_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)}{N}$$
Maximum likelihood estimate of the conditional probability, where $a_{jl}$ is the $l$-th possible value of the $j$-th feature:
$$P\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=\frac{\sum_{i} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)}{\sum_{i} I\left(y_{i}=c_{k}\right)}$$
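Both maximum likelihood estimates reduce to counting. The sketch below assumes discrete features and uses a made-up four-sample dataset. The name `fit_mle` and the key layout `(j, value, class)` are illustrative choices, and feature indices start at 0 in the code.

```python
from collections import Counter

def fit_mle(X, y):
    """Estimate P(Y=c) and P(X^(j)=v | Y=c) by relative frequencies.

    X : list of feature tuples, y : list of class labels.
    Returns (prior, cond) where cond[(j, v, c)] = P(X^(j)=v | Y=c).
    Combinations never seen in the data are simply absent from cond,
    which corresponds to an estimate of 0.
    """
    n = len(y)
    class_count = Counter(y)
    prior = {c: class_count[c] / n for c in class_count}

    # Count how often value v of feature j occurs together with class c.
    pair_count = Counter()
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            pair_count[(j, v, yi)] += 1

    cond = {key: cnt / class_count[key[2]] for key, cnt in pair_count.items()}
    return prior, cond

# Hypothetical toy data: two discrete features, labels in {-1, 1}.
X = [(1, "S"), (1, "M"), (2, "S"), (2, "L")]
y = [-1, 1, 1, 1]
prior, cond = fit_mle(X, y)
print(prior)
print(cond.get((0, 2, -1), 0.0))  # value 2 of feature 0 never occurs with class -1
```

The last line shows a zero estimate for a combination that never appears in the training data.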
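Learning and classification algorithm
Bayesian estimation
Maximum likelihood estimation can produce probability estimates equal to 0. A feature value never observed together with a class makes the whole product in the classification rule zero, and a class with no training samples leaves its conditional probability estimate undefined (0/0). Bayesian estimation is used to avoid this.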
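Bayesian estimation is equivalent to adding a constant $\lambda > 0$ to the count of every value the random variable can take. The usual choice is $\lambda = 1$, known as Laplace smoothing.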
Conditional probability:
$$P\left(X^{(j)}=a_{j l} \mid Y=c_{k}\right)=\frac{\sum_{i} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=c_{k}\right)+\lambda}{\sum_{i} I\left(y_{i}=c_{k}\right)+s_j\lambda}$$
where $s_j$ is the number of distinct values that the $j$-th feature $x^{(j)}$ can take.
Prior probability, where $K$ is the number of classes:
$$P(Y=c_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)+\lambda}{N+K\lambda}$$
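The two smoothed estimates can be sketched as below. The sketch assumes that the full set of classes and the possible values of each feature are supplied explicitly, since values missing from the training data cannot be discovered by counting. The name `fit_smoothed`, its parameters, and the toy data are illustrative.

```python
from collections import Counter

def fit_smoothed(X, y, feature_values, classes, lam=1.0):
    """Bayesian (additive) estimates of P(Y=c) and P(X^(j)=v | Y=c).

    feature_values[j] : list of all values feature j can take (s_j = its length)
    classes           : list of all K class labels
    lam               : smoothing constant lambda (1.0 gives Laplace smoothing)
    """
    n = len(y)
    class_count = Counter(y)
    # Prior: (count + lambda) / (N + K * lambda)
    prior = {c: (class_count[c] + lam) / (n + len(classes) * lam) for c in classes}

    pair_count = Counter(
        (j, v, yi) for xi, yi in zip(X, y) for j, v in enumerate(xi)
    )
    cond = {}
    for c in classes:
        for j, values in enumerate(feature_values):
            for v in values:
                # Conditional: (count + lambda) / (class count + s_j * lambda)
                cond[(j, v, c)] = (pair_count[(j, v, c)] + lam) / (
                    class_count[c] + len(values) * lam
                )
    return prior, cond

# Hypothetical toy data: two discrete features, labels in {-1, 1}.
X = [(1, "S"), (1, "M"), (2, "S"), (2, "L")]
y = [-1, 1, 1, 1]
prior, cond = fit_smoothed(
    X, y, feature_values=[[1, 2], ["S", "M", "L"]], classes=[-1, 1], lam=1.0
)
print(prior)
print(cond[(1, "L", -1)])  # nonzero although "L" never occurs with class -1
```

Every estimate is now strictly positive, and for each feature and class the conditional probabilities still sum to 1.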