Naive Bayes
Content drawn from CS229, my Zhejiang University machine learning lecture notes, Baidu Baike, and supplementary blog posts.
We will use an example to illustrate this algorithm; the example is still used in practice today.
Spam email classifier
$x$ is a 0/1 vector whose dimensions correspond to the words of a dictionary, one word per dimension. If a word shows up in an email, its corresponding entry equals 1.
We assume that the $x_i$ are conditionally independent given $y$ (the naive Bayes assumption), although this is obviously not true for real email text.
We choose the top 10,000 most commonly used words as the dictionary.
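As a sketch of the feature construction (using a hypothetical 3-word toy dictionary in place of the real 10,000-word list):

```python
# Sketch: map an email to a 0/1 feature vector over a fixed dictionary.
# The 3-word dictionary here is a toy stand-in for the top-10,000 word list.
dictionary = ["buy", "now", "meeting"]

def email_to_features(email: str) -> list:
    """Return x, where x[j] = 1 iff dictionary word j appears in the email."""
    words = set(email.lower().split())
    return [1 if w in words else 0 for w in dictionary]

x = email_to_features("Buy now and save")
# "buy" and "now" appear, "meeting" does not, so x == [1, 1, 0]
```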
By the chain rule of probability:

$$P(x_1,\ldots,x_{10000}\mid y) = P(x_1\mid y)\,P(x_2\mid x_1,y)\cdots P(x_{10000}\mid x_{9999},x_{9998},\ldots,x_1,y)$$
Under the conditional independence assumption, this simplifies to:

$$P(x_1,\ldots,x_{10000}\mid y) = \prod_{i=1}^{10000} P(x_i\mid y)$$
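Numerically, the factorized likelihood is just a product of per-word Bernoulli terms. A sketch with made-up parameter values (the `phi` entries are illustrative, not estimated from data):

```python
# Sketch: P(x|y) under the naive Bayes factorization is a product of
# per-word Bernoulli terms, one per dictionary word.
phi = [0.8, 0.6, 0.1]   # hypothetical P(x_j = 1 | y) for each word j
x = [1, 1, 0]           # observed feature vector

p = 1.0
for xj, pj in zip(x, phi):
    # word present contributes P(x_j=1|y); absent contributes 1 - P(x_j=1|y)
    p *= pj if xj == 1 else (1.0 - pj)
# p = 0.8 * 0.6 * 0.9 ≈ 0.432
```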
Parameters:
$$\phi_{j|y=1} = P(x_j=1\mid y=1)$$
$$\phi_{j|y=0} = P(x_j=1\mid y=0)$$
$$\phi_y = P(y=1)$$
Here $y=1$ means the email is spam.
Joint likelihood:
$$L(\phi_y,\phi_{j|y}) = \prod_{i=1}^{m} p\!\left(x^{(i)},y^{(i)};\phi_y,\phi_{j|y}\right)$$
MLE:
$$\phi_y=\frac{\sum_{i=1}^m 1\{y^{(i)}=1\}}{m}$$
$$\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=1\}}{\sum_{i=1}^m 1\{y^{(i)}=1\}}$$
$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=0\}}{\sum_{i=1}^m 1\{y^{(i)}=0\}}$$
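The MLE formulas are just counting. A minimal sketch on a made-up 4-email, 3-word training set (data is invented for illustration):

```python
# Sketch: maximum-likelihood estimates by counting on a tiny toy dataset.
# Rows of X are emails (0/1 word indicators); y[i] = 1 means spam.
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 1, 1]]
y = [1, 1, 0, 0]
m = len(y)
n = len(X[0])

phi_y = sum(y) / m  # fraction of training emails that are spam

spam = [X[i] for i in range(m) if y[i] == 1]
ham  = [X[i] for i in range(m) if y[i] == 0]

# phi_{j|y=1}: among spam emails, the fraction containing word j (and same for ham)
phi_j_y1 = [sum(row[j] for row in spam) / len(spam) for j in range(n)]
phi_j_y0 = [sum(row[j] for row in ham)  / len(ham)  for j in range(n)]
# phi_y == 0.5, phi_j_y1 == [1.0, 0.5, 0.0]
```

Note that word 2 never appears in a spam email here, so its estimate is exactly 0; that is the fragility the next section addresses.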
Laplace smoothing
If some dictionary word $j$ never appears in a training spam email, then

$$\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=1\}=0,$$

so $\phi_{j|y=1}=0$. This is not robust: a single unseen word forces the estimated probability to zero. We use Laplace smoothing to fix the estimates.
$$\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=1\}+1}{\sum_{i=1}^m 1\{y^{(i)}=1\}+2}$$

(Each binary $x_j$ has 2 possible values, so we add one pseudo-count per value: 1 to the numerator and 2 to the denominator.)
$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=0\}+1}{\sum_{i=1}^m 1\{y^{(i)}=0\}+2}$$
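Putting it together: a minimal sketch of smoothed estimation plus a spam prediction via Bayes' rule, on the same invented toy dataset as above. The smoothing adds one pseudo-count per outcome of the binary word indicator (so +1 in the numerator, +2 in the denominator):

```python
# Sketch: Laplace-smoothed Bernoulli naive Bayes on a toy dataset.
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 1, 1]]
y = [1, 1, 0, 0]
m, n = len(y), len(X[0])

phi_y = sum(y) / m

def smoothed(label):
    """Laplace-smoothed phi_{j|y=label}: (+1)/(+2) on each count."""
    rows = [X[i] for i in range(m) if y[i] == label]
    return [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in range(n)]

phi1, phi0 = smoothed(1), smoothed(0)  # no estimate can be exactly 0 or 1 now

def posterior_spam(x):
    """P(y=1 | x) via Bayes' rule with the naive Bayes likelihood."""
    def lik(phi):
        p = 1.0
        for xj, pj in zip(x, phi):
            p *= pj if xj else (1.0 - pj)
        return p
    num = lik(phi1) * phi_y
    return num / (num + lik(phi0) * (1.0 - phi_y))

# posterior_spam([1, 1, 0]) ≈ 0.9: an email with the two "spammy" words
# and no "hammy" word is classified as spam with high probability.
```

In practice one would work with log-probabilities to avoid underflow when multiplying 10,000 small terms; the plain product above is kept only to mirror the equations.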