[Whiteboard Derivation Series Notes] Linear Classification: Naive Bayes Classifier

CSDN Topic Challenge, Round 2
Topic: Study notes

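Naive Bayes makes an assumption about the relationships among the data attributes: the attribute dimensions are independent of each other given the class.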

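In NB we assume $x$ is discrete and follows a multinomial distribution (which includes the Bernoulli case). In GDA, $p(x|y)$ can be modeled with a multivariate Gaussian, but in NB we cannot directly use a multinomial distribution over the whole vector $x$. We use a spam classifier to illustrate the idea behind NB.
In this classifier the input feature is a word vector. Specifically, if our dictionary contains 50000 words, the vector $x$ for one email might be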
$$x=\begin{bmatrix}1\\0\\0\\\vdots\\1\\\vdots\\0\end{bmatrix}\begin{matrix}\text{a}\\\text{aardvark}\\\text{aardwolf}\\\vdots\\\text{buy}\\\vdots\\\text{zen}\end{matrix}$$
$x$ is a $50000$-dimensional vector. If a dictionary word appears in the email, the entry at that word's position is set to $1$; otherwise it is $0$.
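Here is a minimal sketch of how such a binary vector could be built. It assumes a hypothetical six-word vocabulary in place of the 50000-word dictionary, plus naive lowercase/whitespace tokenization; the text specifies neither. Each entry records only whether a word is present, so a word that appears several times still yields $1$.

```python
def binary_word_vector(email_text: str, vocabulary: list[str]) -> list[int]:
    """Return x with x[j] = 1 if vocabulary[j] occurs in the email, else 0."""
    # Assumption for illustration: lower-case the text and split on whitespace.
    words_in_email = set(email_text.lower().split())
    return [1 if word in words_in_email else 0 for word in vocabulary]

# Hypothetical 6-word vocabulary standing in for the 50000-word dictionary.
vocabulary = ["a", "aardvark", "aardwolf", "buy", "now", "zen"]
x = binary_word_vector("Buy a watch now", vocabulary)
print(x)  # [1, 0, 0, 1, 1, 0]
```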
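If we modeled $p(x|y)$ directly with a multinomial distribution, $p(x|y)$ would have $2^{50000}$ different values. For each class we would then need $2^{50000}-1$ free parameters, because the constraint that the probabilities sum to $1$ fixes the last one. Estimating that many parameters is infeasible, so we make a strong assumption to simplify the probability model.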

Each dimension can take two values, $0$ or $1$, so there are $2^{50000}$ possible combinations.

Author: rushshi
Source: "Gaussian Discriminant Analysis (GDA) and Naive Bayes (NB)", rushshi's blog on CSDN
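The parameter counts in the excerpt above can be checked with a few lines of Python. This is a minimal sketch, and the function names are hypothetical. The decimal expansion of $2^{50000}-1$ has more than 15000 digits, which exceeds the default int-to-string limit in Python 3.11+, so the code reports its size through $\log_{10}$ instead.

```python
import math

def full_joint_param_count(p: int) -> int:
    """Free parameters of an unrestricted categorical p(x|y) over {0,1}^p, for ONE class."""
    return 2 ** p - 1  # 2^p outcomes, minus 1 because the probabilities sum to 1

def naive_bayes_param_count(p: int) -> int:
    """Free parameters of the factorized p(x|y) = prod_j p(x_j|y), for ONE class."""
    return p  # one Bernoulli parameter phi_{j|y} per dimension

p = 50000
full = full_joint_param_count(p)
nb = naive_bayes_param_count(p)
# Avoid str(full): it would exceed Python's default int-to-str digit limit.
print(f"full joint: about 10^{p * math.log10(2):.0f} parameters per class")
print(f"naive Bayes: {nb} parameters per class")
```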

$$\{(x_i,y_i)\}_{i=1}^{N},\quad x_i\in\{0,1\}^{p},\quad y_i\in\{0,1\}$$
Naive Bayes assumes the dimensions are independent of each other given $y$, so
$$\begin{aligned}
p(x_1,\cdots,x_p|y)&=p(x_1|y)\,p(x_2|y,x_1)\cdots p(x_p|y,x_1,\cdots,x_{p-1})\\
&\quad\text{(by the naive Bayes assumption that the dimensions are independent)}\\
&=p(x_1|y)\,p(x_2|y)\cdots p(x_p|y)\\
&=\prod_{j=1}^{p}p(x_j|y)
\end{aligned}$$
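A toy sketch makes the factorization concrete. Under the independence assumption, $p(x|y)$ is simply a product of per-dimension conditionals. The table layout `cond_tables[y][j][v]` $=p(x_j=v|y)$ and its numbers are hypothetical.

```python
import math

def class_conditional_prob(x, y, cond_tables):
    """p(x | y) under the naive Bayes assumption: the product of p(x_j | y) over j.

    cond_tables[y][j][v] is assumed to hold p(x_j = v | y); this layout is
    hypothetical and only illustrates the factorization.
    """
    return math.prod(cond_tables[y][j][x_j] for j, x_j in enumerate(x))

# Hypothetical tables for p = 3 binary features and classes y in {0, 1}.
cond_tables = {
    0: [{0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.5}, {0: 0.8, 1: 0.2}],
    1: [{0: 0.2, 1: 0.8}, {0: 0.6, 1: 0.4}, {0: 0.3, 1: 0.7}],
}
print(class_conditional_prob([1, 0, 1], 1, cond_tables))  # 0.8 * 0.6 * 0.7, about 0.336
```

Each per-dimension table sums to $1$, so the factorized $p(x|y)$ also sums to $1$ over all $2^p$ vectors.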
We first make the following distributional assumptions:
$$\begin{aligned}
y&\sim B(1,\phi_y)\\
&\Rightarrow p(y)=\phi_y^{y}(1-\phi_y)^{1-y}\\
p(x_j=1|y=0)&=\phi_{j|y=0}\\
p(x_j=1|y=1)&=\phi_{j|y=1}\\
\phi_{j|y}&=\phi_{j|y=1}^{y}\,\phi_{j|y=0}^{1-y}\\
p(x_j|y)&=\phi_{j|y}^{x_j}(1-\phi_{j|y})^{1-x_j}
\end{aligned}$$
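The exponent form above treats $y,x_j\in\{0,1\}$ as selectors. Any factor raised to the power $0$ becomes $1$, so exactly one factor survives. A small sketch checks this with hypothetical parameter values:

```python
def bernoulli_pmf(v: int, phi: float) -> float:
    """phi^v * (1 - phi)^(1 - v) for v in {0, 1}: the exponents select one factor."""
    return phi ** v * (1 - phi) ** (1 - v)

def phi_j_given_y(y: int, phi_j_y1: float, phi_j_y0: float) -> float:
    """phi_{j|y} = phi_{j|y=1}^y * phi_{j|y=0}^(1-y): picks the parameter of class y."""
    return phi_j_y1 ** y * phi_j_y0 ** (1 - y)

phi_y = 0.3                    # hypothetical p(y = 1)
phi_j_y0, phi_j_y1 = 0.1, 0.8  # hypothetical p(x_j = 1 | y = 0), p(x_j = 1 | y = 1)

print(bernoulli_pmf(1, phi_y), bernoulli_pmf(0, phi_y))  # p(y=1) = 0.3, p(y=0) about 0.7
phi = phi_j_given_y(1, phi_j_y1, phi_j_y0)               # selects 0.8
print(bernoulli_pmf(0, phi))                             # p(x_j=0 | y=1), about 0.2
```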
The log-likelihood function is
$$\begin{aligned}
L(\phi_y,\phi_{j|y=0},\phi_{j|y=1})&=\log\prod_{i=1}^{N}p(x_i,y_i)\\
&=\log\prod_{i=1}^{N}p(x_i|y_i)\,p(y_i)\\
&=\log\prod_{i=1}^{N}\left(\prod_{j=1}^{p}p(x_{ij}|y_i)\right)p(y_i)\\
&=\sum_{i=1}^{N}\left[\log p(y_i)+\sum_{j=1}^{p}\log p(x_{ij}|y_i)\right]\\
&=\sum_{i=1}^{N}\left[\underbrace{y_i\log\phi_y+(1-y_i)\log(1-\phi_y)}_{(1)}+\underbrace{\sum_{j=1}^{p}\left[x_{ij}\log\phi_{j|y_i}+(1-x_{ij})\log(1-\phi_{j|y_i})\right]}_{(2)}\right]
\end{aligned}$$
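Here is a minimal NumPy sketch of this log-likelihood. It assumes a binary data matrix `X` of shape $(N,p)$, labels `y`, and length-$p$ arrays for $\phi_{j|y=0}$ and $\phi_{j|y=1}$; all names are hypothetical. For term (2), `np.where` picks $\phi_{j|y_i}$ row by row. The usage line compares the result with the log of the direct product $\prod_i p(y_i)\,p(x_i|y_i)$ on toy data.

```python
import numpy as np

def log_likelihood(X, y, phi_y, phi_y0, phi_y1):
    """Bernoulli naive Bayes log-likelihood L(phi_y, phi_{j|y=0}, phi_{j|y=1}).

    X: binary array (N, p); y: labels in {0, 1}, shape (N,);
    phi_y0, phi_y1: length-p arrays of phi_{j|y=0} and phi_{j|y=1}.
    Assumes every parameter lies strictly between 0 and 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Term (1): y_i log phi_y + (1 - y_i) log(1 - phi_y), one value per sample.
    term1 = y * np.log(phi_y) + (1 - y) * np.log(1 - phi_y)
    # phi_{j|y_i}: row i takes the class-1 vector if y_i = 1, else the class-0 vector.
    phi = np.where(y[:, None] == 1, np.asarray(phi_y1), np.asarray(phi_y0))  # (N, p)
    # Term (2): sum over j of x_ij log phi_{j|y_i} + (1 - x_ij) log(1 - phi_{j|y_i}).
    term2 = (X * np.log(phi) + (1 - X) * np.log(1 - phi)).sum(axis=1)
    return float((term1 + term2).sum())

# Hypothetical toy data and parameters.
X = [[1, 0], [0, 1], [1, 1]]
y = [1, 0, 1]
ll = log_likelihood(X, y, 0.5, [0.2, 0.6], [0.7, 0.4])
# Direct product of p(y_i) p(x_i | y_i) for comparison: 0.21 * 0.24 * 0.14.
print(ll, np.log(0.21 * 0.24 * 0.14))
```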
For $\phi_{j|y=0}$ (with $j$ fixed), only term (2) depends on it, and only through the samples with $y_i=0$:
$$\begin{aligned}
\sum_{i=1}^{N}(2)&=\sum_{i=1}^{N}\sum_{j'=1}^{p}\left[x_{ij'}\log\phi_{j'|y_i}+(1-x_{ij'})\log(1-\phi_{j'|y_i})\right]\\
&=\sum_{i=1}^{N}1\{y_i=0\}\left[x_{ij}\log\phi_{j|y=0}+(1-x_{ij})\log(1-\phi_{j|y=0})\right]+\text{const}\\
\frac{\partial L}{\partial\phi_{j|y=0}}&=\sum_{i=1}^{N}1\{y_i=0\}\left[\frac{x_{ij}}{\phi_{j|y=0}}-\frac{1-x_{ij}}{1-\phi_{j|y=0}}\right]=0\\
0&=\sum_{i=1}^{N}1\{y_i=0\}\,(x_{ij}-\phi_{j|y=0})\\
0&=\sum_{i=1}^{N}x_{ij}\,1\{y_i=0\}-\phi_{j|y=0}\sum_{i=1}^{N}1\{y_i=0\}\\
0&=\sum_{i=1}^{N}1\{x_{ij}=1\land y_i=0\}-\phi_{j|y=0}\sum_{i=1}^{N}1\{y_i=0\}\\
\widehat{\phi_{j|y=0}}&=\frac{\sum_{i=1}^{N}1\{x_{ij}=1\land y_i=0\}}{\sum_{i=1}^{N}1\{y_i=0\}}
\end{aligned}$$
Here "const" collects every term that does not contain $\phi_{j|y=0}$. The fourth line follows from multiplying through by $\phi_{j|y=0}(1-\phi_{j|y=0})$.
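The closed-form estimate amounts to counting. The plain-Python sketch below mirrors the indicator sums directly on a hypothetical toy data set, and it assumes at least one sample has $y=0$.

```python
def mle_phi_j_given_y0(X, y, j):
    """phi_hat_{j|y=0} = sum_i 1{x_ij = 1 and y_i = 0} / sum_i 1{y_i = 0}."""
    numerator = sum(1 for x_i, y_i in zip(X, y) if x_i[j] == 1 and y_i == 0)
    denominator = sum(1 for y_i in y if y_i == 0)
    return numerator / denominator  # assumes at least one sample with y = 0

# Hypothetical toy data: 4 samples, 3 binary features.
X = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
y = [0, 0, 1, 0]
print([mle_phi_j_given_y0(X, y, j) for j in range(3)])  # [1/3, 1/3, 1.0]
```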

Indicator function:
$$1_A(x)=\begin{cases}1,&x\in A\\0,&x\notin A\end{cases}$$
It is also written $I_A(x)$ or $\chi_A(x)$.
In GDA the indicator function has an analogous substitute, namely sums over the class sets:
$$\begin{gathered}
C_1=\{x_i|y_i=1,\ i=1,2,\cdots,N\},\quad|C_1|=N_1\\
C_0=\{x_i|y_i=0,\ i=1,2,\cdots,N\},\quad|C_0|=N_0\\
\sum_{x_i\in C_1},\quad\sum_{x_i\in C_0}
\end{gathered}$$
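The correspondence can be shown on a hypothetical toy data set. Summing $x_{ij}\,1\{y_i=0\}$ over all samples gives the same result as summing $x_{ij}$ over the set $C_0$, and $N_0=|C_0|$ equals $\sum_i 1\{y_i=0\}$.

```python
# Hypothetical toy data: 4 samples, 3 binary features; look at dimension j = 2.
X = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
y = [0, 0, 1, 0]
j = 2

# Indicator form: sum over all N samples, weighted by 1{y_i = 0}.
indicator_sum = sum(x_i[j] * (1 if y_i == 0 else 0) for x_i, y_i in zip(X, y))

# Set form (as in GDA): first build C_0 = {x_i | y_i = 0}, then sum over it.
C0 = [x_i for x_i, y_i in zip(X, y) if y_i == 0]
set_sum = sum(x_i[j] for x_i in C0)

N0 = len(C0)  # |C_0|, equal to sum_i 1{y_i = 0}
print(indicator_sum, set_sum, N0)  # 3 3 3
```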

$\widehat{\phi_{j|y=0}}$ can be read as the number of samples with $y=0$ whose $j$-th dimension equals $1$, divided by the number of samples with $y=0$.
By the same argument we obtain $\widehat{\phi_{j|y=1}}$:
$$\widehat{\phi_{j|y=1}}=\frac{\sum_{i=1}^{N}1\{x_{ij}=1\land y_i=1\}}{\sum_{i=1}^{N}1\{y_i=1\}}$$
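Both estimates can be computed for every $j$ at once. Restricting `X` to the rows of one class and taking column means gives exactly the count ratio above. The NumPy sketch below uses hypothetical toy data and assumes each class has at least one sample; an empty class would produce `nan`.

```python
import numpy as np

def mle_feature_params(X, y):
    """phi_hat_{j|y=0} and phi_hat_{j|y=1} for all j.

    Each is the per-column mean of X restricted to one class, i.e.
    sum_i 1{x_ij = 1, y_i = k} / sum_i 1{y_i = k}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    phi_y0 = X[y == 0].mean(axis=0)  # rows in C_0
    phi_y1 = X[y == 1].mean(axis=0)  # rows in C_1
    return phi_y0, phi_y1

# Hypothetical toy data.
X = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
y = [0, 0, 1, 0]
phi_y0, phi_y1 = mle_feature_params(X, y)
print(phi_y0)  # approximately [1/3, 1/3, 1]
print(phi_y1)  # [1, 1, 0]
```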
For $\phi_y$, only term (1) depends on it:
$$\begin{aligned}
\sum_{i=1}^{N}(1)&=\sum_{i=1}^{N}\left[y_i\log\phi_y+(1-y_i)\log(1-\phi_y)\right]\\
\frac{\partial L}{\partial\phi_y}&=\sum_{i=1}^{N}\left[\frac{y_i}{\phi_y}-\frac{1-y_i}{1-\phi_y}\right]=0\\
0&=\sum_{i=1}^{N}\left[y_i(1-\phi_y)-(1-y_i)\phi_y\right]\\
0&=\sum_{i=1}^{N}(y_i-\phi_y)\\
\hat{\phi}_y&=\frac{\sum_{i=1}^{N}1\{y_i=1\}}{N}
\end{aligned}$$
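Combining the three closed-form estimates gives a complete, if minimal, Bernoulli naive Bayes classifier. The sketch below is an illustration, not the author's code. Prediction picks $\arg\max_y\left[\log p(y)+\sum_j\log p(x_j|y)\right]$, which follows from Bayes' rule. The clipping by a hypothetical `eps` is an added assumption that avoids $\log 0$ when an MLE is exactly $0$ or $1$. It is not part of this derivation; Laplace smoothing is the usual remedy.

```python
import numpy as np

def fit(X, y):
    """Closed-form MLE from the derivation: (phi_y, phi_{j|y=0}, phi_{j|y=1})."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    phi_y = float(np.mean(y == 1))   # sum_i 1{y_i = 1} / N
    phi_y0 = X[y == 0].mean(axis=0)  # per-feature fraction of ones in C_0
    phi_y1 = X[y == 1].mean(axis=0)  # per-feature fraction of ones in C_1
    return phi_y, phi_y0, phi_y1

def predict(X, params, eps=1e-9):
    """argmax_y [log p(y) + sum_j log p(x_j | y)] for each row of binary X.

    Assumption (not in the derivation): probabilities are clipped to
    [eps, 1 - eps] so that log(0) never occurs.
    """
    phi_y, phi_y0, phi_y1 = params
    X = np.asarray(X, dtype=float)
    scores = []
    for prior, phi in ((1 - phi_y, phi_y0), (phi_y, phi_y1)):  # class 0, then class 1
        phi = np.clip(phi, eps, 1 - eps)
        log_lik = X @ np.log(phi) + (1 - X) @ np.log(1 - phi)  # sum_j log p(x_j | y)
        scores.append(np.log(np.clip(prior, eps, 1 - eps)) + log_lik)
    return np.argmax(np.stack(scores, axis=1), axis=1)

# Hypothetical toy training data.
X_train = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y_train = [1, 1, 0, 0]
params = fit(X_train, y_train)
print(predict([[1, 0, 0], [0, 1, 1]], params))  # [1 0]
```

Working in log space turns the product $\prod_j p(x_j|y)$ into a sum. This avoids the underflow that multiplying thousands of small probabilities would cause when $p=50000$.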
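Here we assumed $x$ can only take the values $0$ and $1$. In practice $x$ often follows a categorical distribution. The approach is the same with more parameters to estimate, so that case is not derived here.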
