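Naive Bayes (NB) makes an assumption about how the attributes of the data relate to one another: the attribute dimensions are independent of each other given the class.
In NB we assume $X$ is discrete and follows a multinomial distribution (the Bernoulli case included). In GDA, $X$ can be modeled by a multivariate Gaussian, but in NB we cannot apply a multinomial distribution to the whole vector directly. A spam classifier illustrates the idea.
In this classifier the input feature is a word vector. Concretely, if the dictionary contains 50000 words, the vector $x$ for one email can be
$$x=\left[\begin{matrix}1\\0\\0\\\vdots\\1\\\vdots\\0\end{matrix}\right]\begin{matrix}\text{a}\\\text{aardvark}\\\text{aardwolf}\\\vdots\\\text{buy}\\\vdots\\\text{zen}\end{matrix}$$
$x$ is a $50000$-dimensional vector: if a dictionary word appears in the email, the position of that word is set to $1$; otherwise it is $0$.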
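The encoding described above can be sketched in a few lines of Python. This is only an illustration: the helper name `encode_email`, the whitespace tokenization, and the five-word vocabulary are assumptions standing in for the 50000-word dictionary.

```python
def encode_email(text, vocabulary):
    """Return a 0/1 vector: entry j is 1 if vocabulary[j] occurs in the email."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


# Hypothetical toy vocabulary standing in for the 50000-word dictionary.
vocabulary = ["a", "aardvark", "aardwolf", "buy", "zen"]
x = encode_email("Buy a watch now", vocabulary)
print(x)  # [1, 0, 0, 1, 0]
```

Only the presence of a word is recorded, not how often it occurs, which is what makes each dimension a Bernoulli variable.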
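Each dimension has two possible values, $0$ or $1$, so $x$ can take $2^{50000}$ different values. If we modeled $p(x|y)$ directly with a multinomial distribution over all of these outcomes, we would need at least $2^{50000}-1$ parameters (one fewer than the number of outcomes, because the probabilities must sum to $1$). Estimating that many parameters is not realistic, so we make a strong assumption to simplify the probability model.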
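To get a feel for the gap, the two parameter counts can be compared directly. This is a rough sketch: the variable names are made up for illustration, and the second count assumes one Bernoulli parameter per word and per class, which is what the independence assumption leads to.

```python
vocab_size = 50000

# Full table over {0,1}^50000 for one class: one probability per outcome,
# minus one because the probabilities must sum to 1.
full_params = 2 ** vocab_size - 1

# Assumption: under feature independence, one Bernoulli parameter per word
# is enough for each class.
naive_params_per_class = vocab_size

# bit_length avoids printing a number with thousands of decimal digits.
print(full_params.bit_length())
print(naive_params_per_class)
```

The full table needs a count that is itself 50000 bits long, while the simplified model needs 50000 numbers per class.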
$$\begin{gathered} \left\{(x_{i},y_{i})\right\}_{i=1}^{N},\quad x_{i}\in \mathbb{R}^{p},\quad y_{i}\in \left\{0,1\right\} \end{gathered}$$
Naive Bayes assumes that the dimensions are independent of each other given $y$, so
$$\begin{aligned} p(x_{1},\cdots ,x_{p}|y)&=p(x_{1}|y)\,p(x_{2}|y,x_{1})\cdots p(x_{p}|y,x_{1},\cdots ,x_{p-1})\\ &\quad\text{(chain rule; now apply the naive Bayes independence assumption)}\\ &=p(x_{1}|y)\,p(x_{2}|y)\cdots p(x_{p}|y)\\ &=\prod\limits_{j=1}^{p}p(x_{j}|y) \end{aligned}$$
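The difference between the exact chain-rule factorization and the naive product can be seen on a tiny example. The sketch below uses a made-up joint table for two binary features within one class; the numbers and helper names are assumptions chosen so that the two features are clearly dependent.

```python
# Hypothetical joint table p(x1, x2 | y) for one fixed class y.
# The features are dependent: they tend to take the same value.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def marginal_x1(a):
    """p(x1 = a | y), obtained by summing x2 out of the joint table."""
    return sum(p for (x1, _), p in joint.items() if x1 == a)


def marginal_x2(b):
    """p(x2 = b | y), obtained by summing x1 out of the joint table."""
    return sum(p for (_, x2), p in joint.items() if x2 == b)


def cond_x2_given_x1(b, a):
    """p(x2 = b | y, x1 = a)."""
    return joint[(a, b)] / marginal_x1(a)


x1, x2 = 1, 1
chain_rule = marginal_x1(x1) * cond_x2_given_x1(x2, x1)  # exact factorization
naive = marginal_x1(x1) * marginal_x2(x2)                # naive Bayes product
print(chain_rule, naive)
```

The chain-rule product recovers the table entry for `(1, 1)`, while the naive product does not. The independence assumption is therefore a modeling simplification that trades accuracy for far fewer parameters.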
We first need to assume
$$\begin{aligned} y &\sim B(1,\phi_{y})\\ &\Rightarrow p(y)=\phi_{y}^{y}(1-\phi_{y})^{1-y}\\ p(x_{j}=1|y=0)&=\phi_{j|y=0}\\ p(x_{j}=1|y=1)&=\phi_{j|y=1}\\ \phi_{j|y}&=\phi_{j|y=1}^{y}\,\phi_{j|y=0}^{1-y}\\ p(x_{j}|y)&=\phi_{j|y}^{x_{j}}(1-\phi_{j|y})^{1-x_{j}} \end{aligned}$$
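Combining the Bernoulli form of $p(x_{j}|y)$ with the product over dimensions gives a direct way to evaluate $p(x|y)$ for one class. The following is a minimal sketch; the function name `bernoulli_likelihood` and the parameter values are illustrative assumptions.

```python
def bernoulli_likelihood(x, phi):
    """p(x | y) = prod_j phi_j^x_j * (1 - phi_j)^(1 - x_j) for one class."""
    prob = 1.0
    for x_j, phi_j in zip(x, phi):
        # phi_j^x_j * (1 - phi_j)^(1 - x_j) picks phi_j when x_j = 1,
        # and 1 - phi_j when x_j = 0.
        prob *= phi_j if x_j == 1 else 1.0 - phi_j
    return prob


# Hypothetical parameters phi_{j|y=1} for a 3-word vocabulary.
phi_spam = [0.8, 0.1, 0.5]
x = [1, 0, 1]
p = bernoulli_likelihood(x, phi_spam)  # 0.8 * (1 - 0.1) * 0.5
print(p)
```

With tens of thousands of dimensions this product underflows quickly, which is one practical reason to work with the logarithm instead.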
The log-likelihood function is
$$\begin{aligned} L(\phi_{y},\phi_{j|y=0},\phi_{j|y=1})&=\log \prod\limits_{i=1}^{N}p(x_{i},y_{i})\\ &=\log \prod\limits_{i=1}^{N}p(x_{i}|y_{i})\,p(y_{i})\\ &=\log \prod\limits_{i=1}^{N}\left(\prod\limits_{j=1}^{p}p(x_{ij}|y_{i})\right) p(y_{i})\\ &=\sum\limits_{i=1}^{N}\left[\log p(y_{i})+\sum\limits_{j=1}^{p}\log p(x_{ij}|y_{i})\right]\\ &=\sum\limits_{i=1}^{N}\left[\underbrace{y_{i}\log \phi_{y}+(1-y_{i})\log (1-\phi_{y})}_{(1)}+\underbrace{\sum\limits_{j=1}^{p}\left[x_{ij}\log \phi_{j|y_{i}}+(1-x_{ij})\log (1-\phi_{j|y_{i}})\right]}_{(2)}\right] \end{aligned}$$
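The last line of the log-likelihood translates almost term by term into code. The sketch below is an illustration under assumptions: the function name, the argument layout (`phi_0` and `phi_1` as lists of per-word parameters), and the toy data are not from the original notes, and every parameter is assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def log_likelihood(X, y, phi_y, phi_0, phi_1):
    """Sum over samples of term (1) plus term (2)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Term (1): log p(y_i)
        total += y_i * math.log(phi_y) + (1 - y_i) * math.log(1 - phi_y)
        # Term (2): sum_j log p(x_ij | y_i), using the parameters of class y_i
        phi = phi_1 if y_i == 1 else phi_0
        for x_ij, phi_j in zip(x_i, phi):
            total += x_ij * math.log(phi_j) + (1 - x_ij) * math.log(1 - phi_j)
    return total


# Hypothetical toy data: 2 samples, 2 features.
X = [[1, 0], [0, 1]]
y = [1, 0]
ll = log_likelihood(X, y, 0.5, [0.5, 0.5], [0.5, 0.5])
print(ll)
```

With every parameter at 0.5, each sample contributes three factors of 0.5 (one for the label, two for the features), which gives an easy sanity check on the implementation.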
For $\phi_{j|y=0}$, fix a feature index $j$. Only the samples with $y_{i}=0$ involve this parameter, so summing term (2) over all samples and keeping only the part that depends on $\phi_{j|y=0}$ gives
$$\begin{aligned} (2)&=\sum\limits_{i=1}^{N}\left[x_{ij}\log \phi_{j|y=0}\,1\left\{y_{i}=0\right\}+(1-x_{ij})\log(1- \phi_{j|y=0})\,1\left\{y_{i}=0\right\}\right]\\ \frac{\partial (2)}{\partial \phi_{j|y=0}}&=\sum\limits_{i=1}^{N}\left[x_{ij} \frac{1}{\phi_{j|y=0}}1\left\{y_{i}=0\right\}-\left(1-x_{ij}\right) \frac{1}{1-\phi_{j|y=0}}1\left\{y_{i}=0\right\}\right]=0\\ 0&=\sum\limits_{i=1}^{N}\left[(x_{ij}-\phi_{j|y=0})\,1\left\{y_{i}=0\right\}\right]\\ 0&=\sum\limits_{i=1}^{N}\left(x_{ij}\cdot 1\left\{y_{i}=0\right\}\right)-\phi_{j|y=0}\sum\limits_{i=1}^{N}1 \left\{y_{i}=0\right\}\\ 0&=\sum\limits_{i=1}^{N}1\left\{x_{ij}=1\land y_{i}=0\right\}-\phi_{j|y=0}\sum\limits_{i=1}^{N}1\left\{y_{i}=0\right\}\\ \widehat{\phi_{j|y=0}}&=\frac{\sum\limits_{i=1}^{N}1\left\{x_{ij}=1 \land y_{i}=0\right\}}{\sum\limits_{i=1}^{N}1\left\{y_{i}=0\right\}} \end{aligned}$$
The third line follows from multiplying the derivative by $\phi_{j|y=0}(1-\phi_{j|y=0})$. Note that the sums run over the samples $i$, not over the dimensions $j$.
The indicator function is defined as
$$1_{A}(x)=\begin{cases}1, & x \in A\\ 0, & x \notin A\end{cases}$$
It is also written $I_{A}(x)$ or $\chi_{A}(x)$.
In GDA the role of the indicator function is played by partitioning the samples by class, namely
$$\begin{gathered}C_{1}=\left\{x_{i}|y_{i}=1,i=1,2,\cdots,N\right\},\quad |C_{1}|=N_{1}\\C_{0}=\left\{x_{i}|y_{i}=0,i=1,2,\cdots,N\right\},\quad |C_{0}|=N_{0}\\\sum\limits_{x_{i}\in C_{1}},\quad \sum\limits_{x_{i}\in C_{0}}\end{gathered}$$
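Summing an indicator over the samples is just counting, which is why the two notations are interchangeable. A small sketch (the helper `indicator` and the label list are made up for illustration):

```python
def indicator(condition):
    """1{condition}: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


# Hypothetical labels for N = 5 samples.
y = [1, 0, 0, 1, 0]

N1 = sum(indicator(y_i == 1) for y_i in y)  # |C_1|
N0 = sum(indicator(y_i == 0) for y_i in y)  # |C_0|
print(N1, N0)  # 2 3
```

A sum restricted to $x_{i}\in C_{0}$ and a sum over all $i$ weighted by $1\{y_{i}=0\}$ therefore give the same result.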
$\widehat{\phi_{j|y=0}}$ can be read as follows: among the samples with $y=0$, the number whose $j$-th dimension of $x$ equals $1$, divided by the total number of samples with $y=0$.
In the same way we obtain $\widehat{\phi_{j|y=1}}$:
$$\widehat{\phi_{j|y=1}}=\frac{\sum\limits_{i=1}^{N}1\left\{x_{ij}=1\land y_{i}=1\right\}}{\sum\limits_{i=1}^{N}1\left\{y_{i}=1\right\}}$$
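Both estimates are simple relative frequencies, so they can be computed by counting. The sketch below is one way to do it; the function name `estimate_phi` and the toy data are assumptions, and the code assumes each class has at least one sample.

```python
def estimate_phi(X, y, label):
    """phi_hat[j] = #{i : x_ij = 1 and y_i = label} / #{i : y_i = label}."""
    rows = [x_i for x_i, y_i in zip(X, y) if y_i == label]
    n = len(rows)  # assumes at least one sample carries this label
    # Since x_ij is 0 or 1, summing column j counts the rows with x_ij = 1.
    return [sum(row[j] for row in rows) / n for j in range(len(X[0]))]


# Hypothetical toy data: 4 emails, 3 dictionary words.
X = [[1, 0, 1],
     [1, 1, 0],
     [0, 0, 1],
     [0, 1, 1]]
y = [1, 1, 0, 0]

phi_1 = estimate_phi(X, y, 1)
phi_0 = estimate_phi(X, y, 0)
print(phi_1)  # [1.0, 0.5, 0.5]
print(phi_0)  # [0.0, 0.5, 1.0]
```

Estimates of exactly 0 or 1, as in this toy data, make the corresponding log terms undefined for unseen combinations. That is a known weakness of the plain maximum-likelihood estimate on small samples.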
For $\phi_{y}$, only term (1) matters. Summed over all samples it reads
$$\begin{aligned} (1)&=\sum\limits_{i=1}^{N}\left[y_{i}\log \phi_{y}+(1-y_{i})\log (1-\phi_{y})\right]\\ \frac{\partial (1)}{\partial \phi_{y}}&=\sum\limits_{i=1}^{N}\left[y_{i} \frac{1}{\phi_{y}}-\left(1-y_{i}\right) \frac{1}{1-\phi_{y}}\right]=0\\ 0&=\sum\limits_{i=1}^{N}\left[y_{i}(1-\phi_{y})-(1-y_{i})\phi_{y}\right]\\ 0&=\sum\limits_{i=1}^{N}(y_{i}-\phi_{y})\\ \hat{\phi}_{y}&=\frac{1}{N}\sum\limits_{i=1}^{N}y_{i}=\frac{\sum\limits_{i=1}^{N}1\left\{y_{i}=1\right\}}{N} \end{aligned}$$
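The closed form for $\hat{\phi}_{y}$ is the fraction of samples labeled 1, and it can be checked numerically against term (1). In this sketch the function name `term_one`, the labels, and the comparison values are illustrative assumptions.

```python
import math


def term_one(y, phi_y):
    """Term (1) summed over all samples, for a candidate phi_y in (0, 1)."""
    return sum(
        y_i * math.log(phi_y) + (1 - y_i) * math.log(1 - phi_y) for y_i in y
    )


# Hypothetical labels: 2 of 5 samples are spam.
y = [1, 0, 0, 1, 0]
phi_y_hat = sum(1 for y_i in y if y_i == 1) / len(y)

best = term_one(y, phi_y_hat)
others = [term_one(y, phi) for phi in (0.2, 0.3, 0.5, 0.6)]
print(phi_y_hat, best, max(others))
```

Every other candidate gives a smaller value of term (1), consistent with the stationary point being a maximum.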
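Here we assumed that each $x_{j}$ can only take the values $0$ and $1$. In practice $x_{j}$ often follows a categorical distribution. The approach is the same, only with more parameters to estimate, so the derivation is not repeated here.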
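For the categorical case, the maximum-likelihood estimate is again a relative frequency, now with one parameter per category value instead of a single one per feature. The following is only a sketch of that idea under assumptions: the function name, the category names, and the data are invented, and the derivation itself is not shown in these notes.

```python
from collections import Counter


def estimate_categorical(values, labels, label, categories):
    """Estimate p(x_j = c | y = label) for each category c by counting."""
    selected = [v for v, l in zip(values, labels) if l == label]
    counts = Counter(selected)
    n = len(selected)  # assumes at least one sample carries this label
    return {c: counts[c] / n for c in categories}


# Hypothetical single feature with three possible values.
feature = ["low", "high", "mid", "low"]
labels = [0, 0, 1, 0]
probs = estimate_categorical(feature, labels, 0, ["low", "mid", "high"])
print(probs)
```

The estimated probabilities for one class sum to 1, so a feature with $K$ categories contributes $K-1$ free parameters per class rather than one.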