1. Bayes' formula
$$P(Y=c_k|X=x)=\frac{P(X=x|Y=c_k)P(Y=c_k)}{\sum_k P(X=x|Y=c_k)P(Y=c_k)}$$
Conditional independence assumption:
The features are mutually independent given the class.
This lets us rewrite $P(X=x|Y=c_k)$:
$$\begin{aligned} P(X=x|Y=c_k) &= P(X^{(1)}=x^{(1)},X^{(2)}=x^{(2)},\dots,X^{(n)}=x^{(n)}|Y=c_k) \\ &= \prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k) \end{aligned}$$
Bayes' formula then becomes:
$$P(Y=c_k|X=x)=\frac{P(Y=c_k)\prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum_k P(X=x|Y=c_k)P(Y=c_k)}$$
$c_k$ is the label of the $k$-th class, and $x^{(n)}$ is the value of the $n$-th feature of sample $x$.
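To make the factorized formula concrete, here is a minimal Python sketch that evaluates $P(Y=c_k|X=x)$ for every class. The function name `posterior`, the dictionary layout, and the probability table are all assumptions made for illustration; none of them come from the notes above.

```python
import math

def posterior(priors, cond_probs, x):
    """Return P(Y=c_k | X=x) for every class under the naive Bayes factorization.

    priors:     {label: P(Y=label)}
    cond_probs: {label: [{value: P(X^(j)=value | Y=label)} for each feature j]}
    x:          list of feature values, one per feature
    """
    joint = {}
    for label, prior in priors.items():
        # Numerator: P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k)
        likelihood = math.prod(cond_probs[label][j][v] for j, v in enumerate(x))
        joint[label] = prior * likelihood
    # Denominator: the same sum for every class (assumed to be non-zero here)
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}

# Hypothetical two-class, two-feature table (numbers invented for illustration).
priors = {"a": 0.5, "b": 0.5}
cond_probs = {
    "a": [{0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}],
    "b": [{0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}],
}
post = posterior(priors, cond_probs, [0, 1])
```

The returned values sum to 1 because each numerator is divided by the sum of all numerators.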
So to decide which class a sample $x$ belongs to, compute $P(Y=c_k|X=x)$ for every $k=1,2,\dots,K$ and take the $c_k$ with the largest value as the label, i.e.
$$y=f(x)=\arg\max_{c_k}\frac{P(Y=c_k)\prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum_k P(X=x|Y=c_k)P(Y=c_k)}$$
Since the denominator is the same for every $c_k$, it is enough to compute:
$$y=\arg\max_{c_k} P(Y=c_k)\prod_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k)$$
(This is the classifier under the maximum a posteriori criterion.)
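The arg max rule can be sketched in a few lines of Python. This version sums logarithms instead of multiplying probabilities, which is a common way to avoid underflow with many features and is not something the notes above prescribe. It assumes every probability involved is strictly positive; the name `predict` and the table are hypothetical.

```python
import math

def predict(priors, cond_probs, x):
    """Return the label maximizing P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k)."""
    def log_score(label):
        # The log of the product is the sum of the logs, and log is monotone,
        # so the arg max is unchanged.
        score = math.log(priors[label])
        for j, value in enumerate(x):
            score += math.log(cond_probs[label][j][value])
        return score
    return max(priors, key=log_score)

# Hypothetical two-class, two-feature table (numbers invented for illustration).
priors = {"a": 0.5, "b": 0.5}
cond_probs = {
    "a": [{0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}],
    "b": [{0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}],
}
label_01 = predict(priors, cond_probs, [0, 1])
label_10 = predict(priors, cond_probs, [1, 0])
```

Note that the denominator never appears: it does not depend on $c_k$, so it cannot change which class wins.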
2. Loss function
$$L(Y,f(x))=\begin{cases} 1, & Y \neq f(x) \\ 0, & Y = f(x) \end{cases}$$
Expected risk
$$R_{exp}=E_X \sum_{k=1}^{K} L(c_k,f(X))\,P(c_k|X)$$
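The smaller the expected risk, the smaller the overall loss. Minimizing the expected risk for each $X=x$ also yields the classifier: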
$$\begin{aligned} f(x) &= \arg\min_{y} \sum_{k=1}^{K} L(c_k,y)\,P(c_k|X=x) \\ &= \arg\min_{y} \sum_{k=1}^{K} P(y \neq c_k|X=x) \\ &= \arg\min_{y} \left(1-P(y = c_k|X=x)\right) \\ &= \arg\max_{y} P(y = c_k|X=x) \end{aligned}$$
In the second line the 0-1 loss removes the term with $c_k = y$, so the sum is the total posterior probability of all the other classes, which equals one minus the posterior probability of $y$.
This matches the classifier above, so that classifier minimizes the expected risk.
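A quick numeric check of this equivalence, as a sketch: for a made-up posterior distribution over three classes, the label with the smallest expected 0-1 loss is the label with the largest posterior. The helper `conditional_risk` and the numbers are assumptions for illustration only.

```python
def conditional_risk(posterior, y):
    """Expected 0-1 loss of predicting y: sum_k L(c_k, y) * P(c_k | X=x)."""
    # L(c_k, y) is 1 for every class other than y and 0 for y itself.
    return sum(p for label, p in posterior.items() if label != y)

# Hypothetical posterior P(c_k | X=x) for one fixed x.
posterior = {"c1": 0.2, "c2": 0.5, "c3": 0.3}

risks = {y: conditional_risk(posterior, y) for y in posterior}
best_by_risk = min(risks, key=risks.get)
best_by_posterior = max(posterior, key=posterior.get)
```

Each risk equals one minus the posterior of the predicted label, so the two selections agree.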
3. Parameter estimation
3.1 Maximum likelihood estimation
Put simply, just count the samples: the relative frequency of $c_k$ in the sample is taken as the probability with which $c_k$ occurs in nature.
$$P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N}$$
Meaning: the fraction of all samples whose label is $c_k$.
$$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$
Meaning: among the samples labeled $c_k$, the fraction whose $j$-th feature takes the value $a_{jl}$.
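Both counting formulas can be sketched directly with `collections.Counter`. The function name `fit_mle`, the key layout `(j, value, label)`, and the toy data set are assumptions for illustration.

```python
from collections import Counter

def fit_mle(X, y):
    """Estimate P(Y=c_k) and P(X^(j)=a | Y=c_k) by relative frequencies."""
    n = len(y)
    class_counts = Counter(y)
    priors = {label: count / n for label, count in class_counts.items()}
    # joint_counts[(j, value, label)] = number of samples with
    # x_i^(j) == value and y_i == label
    joint_counts = Counter(
        (j, value, label)
        for row, label in zip(X, y)
        for j, value in enumerate(row)
    )
    cond = {
        key: count / class_counts[key[2]]
        for key, count in joint_counts.items()
    }
    return priors, cond

# Toy data invented for illustration: one row per sample, two categorical features.
X = [[1, "S"], [1, "M"], [2, "M"], [2, "S"]]
y = [-1, -1, 1, 1]
priors, cond = fit_mle(X, y)

# A combination never seen in the data has no entry, i.e. an estimate of 0.
p_unseen = cond.get((0, 2, -1), 0.0)
```

In this toy set the first feature never takes the value 2 in class -1, so its estimated conditional probability is 0.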
3.2 Bayesian estimation
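Maximum likelihood estimation can yield an estimated probability of 0, and a single zero factor makes the whole product in the classifier zero. Bayesian estimation adds a constant $\lambda$ to the counts: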
$$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},y_i=c_k)+\lambda}{\sum_{i=1}^N I(y_i=c_k)+S_j\lambda}$$
$\lambda \ge 0$. With $\lambda=0$ this is exactly maximum likelihood estimation; with $\lambda=1$ it is called Laplace smoothing.
$S_j$ is the number of distinct values that the $j$-th feature can take.
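A minimal sketch of the smoothed estimate follows. It assumes that $S_j$ can be taken as the number of distinct values of feature $j$ observed in the training data; the function name `smoothed_conditional`, the parameter `lam`, and the toy data are hypothetical.

```python
def smoothed_conditional(X, y, j, value, label, lam=1.0):
    """P(X^(j)=value | Y=label) with additive smoothing (lam=1: Laplace smoothing)."""
    # S_j: number of distinct values of feature j (assumed: those seen in X)
    s_j = len({row[j] for row in X})
    # Number of samples with y_i == label
    in_class = sum(1 for lbl in y if lbl == label)
    # Number of samples with x_i^(j) == value and y_i == label
    match = sum(1 for row, lbl in zip(X, y) if lbl == label and row[j] == value)
    return (match + lam) / (in_class + s_j * lam)

# Toy data invented for illustration: one row per sample, two categorical features.
X = [[1, "S"], [1, "M"], [2, "M"], [2, "S"]]
y = [-1, -1, 1, 1]

p_unseen = smoothed_conditional(X, y, j=0, value=2, label=-1)           # Laplace
p_seen = smoothed_conditional(X, y, j=0, value=1, label=-1)             # Laplace
p_unseen_mle = smoothed_conditional(X, y, j=0, value=2, label=-1, lam=0.0)
```

With `lam=1.0` the unseen combination gets $(0+1)/(2+2)=0.25$ instead of 0, and the estimates over all values of the feature still sum to 1.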