For a fixed point $p$ in the instance space, given a class label $c \in \mathcal{Y}$, how do we determine the conditional probability

$$\Pr_{(x,y)\sim\mathcal{D}}[y = c \mid x = p]$$
Design a classifier $h_{opt}$ as follows:
- If $\Pr_{(x,y)\sim\mathcal{D}}[y=-1 \mid x=p] \ge 0.5$, then $h_{opt}(p) = -1$
- Otherwise, $h_{opt}(p) = 1$
- The generalization error $err_{\mathcal{D}}(h_{opt})$ of $h_{opt}$ is called the Bayes error
- In my view, since the two posteriors sum to 1, $\Pr_{(x,y)\sim\mathcal{D}}[y=-1 \mid x=p] \ge 0.5$ is equivalent to $\Pr_{(x,y)\sim\mathcal{D}}[y=-1 \mid x=p] \ge \Pr_{(x,y)\sim\mathcal{D}}[y=1 \mid x=p]$
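The decision rule above can be sketched in a few lines of Python. This is a minimal sketch that assumes the true conditional probability $\Pr[y=-1 \mid x=p]$ is available as a callable; the name `prob_neg_given` and the toy distribution are hypothetical, for illustration only.

```python
def h_opt(p, prob_neg_given):
    """Bayes-optimal rule: return -1 when Pr[y=-1 | x=p] >= 0.5, else +1."""
    return -1 if prob_neg_given(p) >= 0.5 else 1

# Usage with a toy conditional distribution Pr[y=-1 | x=p] over two points:
toy = {"a": 0.7, "b": 0.2}
print(h_opt("a", toy.get))  # -1, since 0.7 >= 0.5
print(h_opt("b", toy.get))  # 1, since 0.2 < 0.5
```

In practice the true distribution $\mathcal{D}$ is unknown, which is why the rest of this section estimates the required probabilities from a training set.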
Naive Bayes Classification
Bayes' theorem:

$$\Pr[X \mid Y] = \frac{\Pr[Y \mid X] \cdot \Pr[X]}{\Pr[Y]}$$

When $\Pr[y=-1 \mid x] \ge \Pr[y=1 \mid x]$, we predict the label $-1$.
By Bayes' theorem:

$$\Pr[y=1 \mid x] = \frac{\Pr[x \mid y=1] \cdot \Pr[y=1]}{\Pr[x]}, \qquad \Pr[y=-1 \mid x] = \frac{\Pr[x \mid y=-1] \cdot \Pr[y=-1]}{\Pr[x]}$$
Since both posteriors share the denominator $\Pr[x]$, we only need to determine which of $\Pr[x \mid y=1] \cdot \Pr[y=1]$ and $\Pr[x \mid y=-1] \cdot \Pr[y=-1]$ is larger to decide the predicted label, and we estimate these quantities from the training set.
From the training set it is easy to obtain $\Pr[y=1] = 0.3$.
For $\Pr[x \mid y=1]$, decompose $x$ over its attributes and assume:

$$\Pr[x \mid y=1] = \prod_{i=1}^{d} \Pr\left[x[A_i] \mid y=1\right]$$
For attribute values that never appear with a label, e.g. $\Pr[\text{lawyer} \mid y=1]$, we estimate the probability as a very small value, such as 0.000001.
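The estimation procedure above can be sketched as follows. This is a minimal sketch, assuming each training example is a tuple of attribute values paired with a label in $\{-1, +1\}$; the function names and the toy training set are hypothetical, and unseen attribute-label pairs get the small constant 0.000001 as described above.

```python
from collections import Counter

EPS = 1e-6  # small probability for attribute values never seen with a label

def fit(examples):
    """examples: list of (attribute_tuple, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    # attr_counts[(i, value, label)] = #examples whose i-th attribute is `value` with `label`
    attr_counts = Counter()
    for attrs, label in examples:
        for i, value in enumerate(attrs):
            attr_counts[(i, value, label)] += 1
    return label_counts, attr_counts

def score(attrs, label, label_counts, attr_counts, n):
    """Estimate Pr[x | y=label] * Pr[y=label] under the independence assumption."""
    p = label_counts[label] / n  # Pr[y=label]
    for i, value in enumerate(attrs):
        count = attr_counts[(i, value, label)]
        p *= count / label_counts[label] if count else EPS
    return p

def predict(attrs, examples):
    """Predict -1 when its score is at least as large as the score for +1."""
    label_counts, attr_counts = fit(examples)
    n = len(examples)
    s_pos = score(attrs, 1, label_counts, attr_counts, n)
    s_neg = score(attrs, -1, label_counts, attr_counts, n)
    return -1 if s_neg >= s_pos else 1

# Usage with a toy training set (attribute tuples are hypothetical):
train = [(("30+", "programmer"), -1)] * 3 + [(("20s", "lawyer"), 1)]
print(predict(("30+", "programmer"), train))  # -1
```

Note that the scores are unnormalized (the shared denominator $\Pr[x]$ is dropped), which is enough to compare them.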
The Bayes classifier above relies on this assumption, known as conditional independence.
A less strict form of conditional independence can also be used: assume that, once one attribute is fixed, the remaining attributes are independent. For example, for

$$\Pr[30+,\ \text{undergrad},\ \text{programmer} \mid y=-1]$$

assume that age and education are independent once the occupation is fixed:

$$\begin{aligned}
& \Pr[30+,\ \text{undergrad},\ \text{programmer} \mid y=-1] \\
=\ & \Pr[30+,\ \text{undergrad} \mid \text{programmer},\ y=-1] \cdot \Pr[\text{programmer} \mid y=-1] \\
=\ & \Pr[30+ \mid \text{programmer},\ y=-1] \cdot \Pr[\text{undergrad} \mid \text{programmer},\ y=-1] \cdot \Pr[\text{programmer} \mid y=-1] \\
=\ & \frac{2}{4} \cdot \frac{1}{4} \cdot \frac{4}{7} = \frac{1}{14}.
\end{aligned}$$
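The final arithmetic can be checked with exact fractions; the three conditional probabilities below are taken directly from the worked example above.

```python
from fractions import Fraction

# Exact check of the chain-rule computation 2/4 * 1/4 * 4/7 = 1/14
p_age = Fraction(2, 4)  # Pr[30+ | programmer, y=-1]
p_edu = Fraction(1, 4)  # Pr[undergrad | programmer, y=-1]
p_job = Fraction(4, 7)  # Pr[programmer | y=-1]

print(p_age * p_edu * p_job)  # 1/14
```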
[Data Mining] 2. Bayes Classifiers
First published 2022-09-24 16:45:14