Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features. Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states the following relationship:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

Using the naive independence assumption that

$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$

for all $i$, this relationship is simplified to

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

and we can use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$; the former is then the relative frequency of class $y$ in the training set.

The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of $P(x_i \mid y)$.
Example: spam filtering
$y$ is the classification label: Spam / Regular.
$x_1, \dots, x_n$ are the different words in the emails.
$P(y)$ is the probability of emails of type $y$ in the set of all emails, a.k.a. the prior probability.
$P(x_i \mid y)$ is the probability of word $x_i$ given the email type; in other words, the relative frequency of word $x_i$ in all emails of type $y$.
Then we can simply compare the values of the final formula for each type $y$. The type with the higher value is taken as the output class label.
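To make this counting scheme concrete, here is a minimal from-scratch sketch; the tiny training set and all helper names are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy training data, invented for illustration.
emails = [
    ("spam",    "win money win prize"),
    ("spam",    "win cheap pills"),
    ("regular", "meeting at noon"),
    ("regular", "see you at the meeting"),
]

# P(y): relative frequency of each class in the training set.
class_counts = Counter(y for y, _ in emails)
prior = {y: n / len(emails) for y, n in class_counts.items()}

# count(x_i, y): how often each word appears in emails of type y.
word_counts = defaultdict(Counter)
for y, text in emails:
    word_counts[y].update(text.split())

def score(text, y):
    """P(y) * prod_i P(x_i | y), using plain relative frequencies."""
    total = sum(word_counts[y].values())
    s = prior[y]
    for word in text.split():
        s *= word_counts[y][word] / total  # 0 for unseen words (see the note below)
    return s

# Compare the score for each type; the higher one wins.
new_email = "win money prize"
print(max(prior, key=lambda y: score(new_email, y)))  # -> spam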
Note: real spam filtering is much more complex than this. You have to consider many other situations, such as dealing with rare words (Laplace smoothing, sketched below), handling common words like "and", "is", "a", "the", how to set the threshold for the final classification, how to estimate the posterior probability in practice, and so on. To learn more, see the wiki link below:
https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering
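As a quick illustration of the Laplace smoothing mentioned above (the smoothing parameter $\alpha$ and vocabulary size $|V|$ are notation chosen for this note, not taken from the links):

$$P(x_i \mid y) = \frac{\mathrm{count}(x_i, y) + \alpha}{\sum_j \mathrm{count}(x_j, y) + \alpha\,|V|}$$

With $\alpha = 1$ this is classic add-one (Laplace) smoothing; it keeps a word never seen in class $y$ during training from forcing the whole product $\prod_i P(x_i \mid y)$ to zero.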
Here is another example, which I think goes into more detail:
http://blog.csdn.net/amds123/article/details/70173402
In the scikit-learn package, MultinomialNB and BernoulliNB are both suitable for discrete data. The difference is that MultinomialNB works with occurrence counts, while BernoulliNB is designed for binary/boolean features.
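A minimal usage sketch of both classifiers on toy data; the example emails and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

emails = [
    "win money now",           # spam
    "cheap pills win prize",   # spam
    "meeting at noon",         # regular
    "see you at the meeting",  # regular
]
labels = ["spam", "spam", "regular", "regular"]

# MultinomialNB uses word occurrence counts.
counts = CountVectorizer().fit(emails)
clf = MultinomialNB()  # alpha=1.0 by default, i.e. Laplace smoothing
clf.fit(counts.transform(emails), labels)
print(clf.predict(counts.transform(["win a cheap prize"])))  # -> ['spam']

# BernoulliNB expects binary features (word present / absent).
binary = CountVectorizer(binary=True).fit(emails)
BernoulliNB().fit(binary.transform(emails), labels)
```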