Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features. Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states the following relationship:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

Using the naive independence assumption that

$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$

for all $i$, this relationship is simplified to

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

and we can use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$; the former is then the relative frequency of class $y$ in the training set.

The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of $P(x_i \mid y)$.
Example: spam filtering
$y$ is the classification label: Spam / Regular.
$x_1, \dots, x_n$ are the different words in the emails.
$P(y)$ is the probability of emails of type $y$ in the set of all emails, a.k.a. the prior probability.
$P(x_i \mid y)$ is the probability of word $x_i$ given the email type; in other words, the relative frequency of word $x_i$ in all emails of type $y$.
Then we can simply compare the values of the final formula for each type $y$. The type with the higher value is taken as the output class label.
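To make this counting scheme concrete, here is a minimal from-scratch sketch; the tiny training set and all helper names are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy training data, invented for illustration.
emails = [
    ("spam",    "win money win prize"),
    ("spam",    "win cheap pills"),
    ("regular", "meeting at noon"),
    ("regular", "see you at the meeting"),
]

# P(y): relative frequency of each class in the training set.
class_counts = Counter(y for y, _ in emails)
prior = {y: n / len(emails) for y, n in class_counts.items()}

# count(x_i, y): how often each word appears in emails of type y.
word_counts = defaultdict(Counter)
for y, text in emails:
    word_counts[y].update(text.split())

def score(text, y):
    """P(y) * prod_i P(x_i | y), using plain relative frequencies."""
    total = sum(word_counts[y].values())
    s = prior[y]
    for word in text.split():
        s *= word_counts[y][word] / total  # 0 for unseen words (see the note below)
    return s

# Compare the score for each type; the higher one wins.
new_email = "win money prize"
print(max(prior, key=lambda y: score(new_email, y)))  # -> spam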
Note: real spam filtering is much more complex than this. You have to consider many other situations, such as dealing with rare words (Laplace smoothing, sketched below), handling common words like "and", "is", "a", "the", how to set the threshold for the final classification, how to estimate the posterior probability in practice, and so on. To learn more, see the wiki link below:
https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering
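As a quick illustration of the Laplace smoothing mentioned above (the smoothing parameter $\alpha$ and vocabulary size $|V|$ are notation chosen for this note, not taken from the links):

$$P(x_i \mid y) = \frac{\mathrm{count}(x_i, y) + \alpha}{\sum_j \mathrm{count}(x_j, y) + \alpha\,|V|}$$

With $\alpha = 1$ this is classic add-one (Laplace) smoothing; it keeps a word never seen in class $y$ during training from forcing the whole product $\prod_i P(x_i \mid y)$ to zero.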
Here is another example, which I think goes into more detail:
http://blog.csdn.net/amds123/article/details/70173402
In the scikit-learn package, MultinomialNB and BernoulliNB are both suitable for discrete data. The difference is that MultinomialNB works with occurrence counts, while BernoulliNB is designed for binary/boolean features.
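A minimal usage sketch of both classifiers on toy data; the example emails and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

emails = [
    "win money now",           # spam
    "cheap pills win prize",   # spam
    "meeting at noon",         # regular
    "see you at the meeting",  # regular
]
labels = ["spam", "spam", "regular", "regular"]

# MultinomialNB uses word occurrence counts.
counts = CountVectorizer().fit(emails)
clf = MultinomialNB()  # alpha=1.0 by default, i.e. Laplace smoothing
clf.fit(counts.transform(emails), labels)
print(clf.predict(counts.transform(["win a cheap prize"])))  # -> ['spam']

# BernoulliNB expects binary features (word present / absent).
binary = CountVectorizer(binary=True).fit(emails)
BernoulliNB().fit(binary.transform(emails), labels)
```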