Naive Bayes and Decision Tree: Naive Bayes Classifiers and the Basic Logic of Decision Trees
Bayes' rule applied to documents and classes
For a document d and a class c:

P(c|d) = P(d|c) P(c) / P(d)
Naive Bayes Classifier
The most likely class is the maximum a posteriori (MAP) class; P(d) can be dropped since it is the same for every class:

c_MAP = argmax_{c in C} P(c|d) = argmax_{c in C} P(d|c) P(c)

We then represent the document d by a set of features x_1, ..., x_n:

P(d|c) = P(x_1, x_2, ..., x_n | c)
Then how do we estimate these two terms (the prior P(c) and the likelihood P(d|c)) separately?
Multinomial Naive Bayes Independence Assumptions
- Bag of Words assumption: assume word position doesn't matter
- Conditional Independence: assume the feature probabilities P(x_i|c_j) are independent given the class c_j
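Under these two assumptions the joint likelihood factorizes into per-word terms, which gives the multinomial Naive Bayes decision rule:

```latex
P(x_1, \dots, x_n \mid c) \approx \prod_{i=1}^{n} P(x_i \mid c)
\qquad\Rightarrow\qquad
c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i} P(x_i \mid c_j)
```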
A possible way to state the difference between Naive Bayes and a unigram language model:
- A unigram model multiplies marginal (prior) word probabilities P(w), one distribution for the whole corpus.
- Naive Bayes multiplies class-conditional word probabilities P(w|c), a separate distribution per class, together with the class prior P(c).
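The contrast can be made concrete with a toy corpus (the documents and class names below are hypothetical, purely for illustration): a unigram model fits one word distribution over all text, while Naive Bayes fits one distribution per class.

```python
from collections import Counter

# Toy corpus: two classes, a few tiny "documents" (hypothetical data).
docs = {
    "sports": ["ball game win", "great game"],
    "politics": ["vote election win", "vote debate"],
}

# Unigram LM: a single distribution P(w) over the whole corpus, no classes.
all_words = [w for texts in docs.values() for t in texts for w in t.split()]
unigram = Counter(all_words)
total = sum(unigram.values())

def p_unigram(word):
    return unigram[word] / total

# Naive Bayes: a separate conditional distribution P(w | c) for each class.
class_counts = {c: Counter(w for t in texts for w in t.split())
                for c, texts in docs.items()}

def p_cond(word, c):
    counts = class_counts[c]
    return counts[word] / sum(counts.values())

# "vote" has one corpus-wide probability under the unigram model,
# but very different class-conditional probabilities under Naive Bayes.
print(p_unigram("vote"))                                  # 2/10 = 0.2
print(p_cond("vote", "sports"), p_cond("vote", "politics"))  # 0.0  0.4
```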
Multinomial Naive Bayes model
- Initial approach: maximum likelihood estimation (MLE), i.e. simply use frequencies in the data:

P̂(c_j) = doccount(C = c_j) / N_doc
P̂(w_i | c_j) = count(w_i, c_j) / Σ_{w in V} count(w, c_j)

Interpretation: the fraction of times word w_i appears among all words in documents of topic c_j.
Create mega-document for topic j by concatenating all docs in this topic
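The mega-document estimate can be sketched as follows (the topic documents are hypothetical toy data):

```python
from collections import Counter

# Hypothetical training docs belonging to one topic c_j.
docs_in_topic = ["chinese beijing chinese", "chinese chinese shanghai"]

# Mega-document: concatenate all docs of the topic, then count word tokens.
mega_doc = " ".join(docs_in_topic).split()
counts = Counter(mega_doc)
n_words = len(mega_doc)

# MLE: P(w_i | c_j) = count(w_i, c_j) / sum_w count(w, c_j)
p_chinese = counts["chinese"] / n_words
print(p_chinese)  # 4 of 6 tokens
```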
Problem: zero counts. If a word never occurs in the training documents of some class, MLE gives P̂(w|c) = 0, and zero probabilities cannot be conditioned away, no matter the other evidence.
- Solution: Laplace (add-1) smoothing for Naive Bayes:

P̂(w_i | c) = (count(w_i, c) + 1) / (Σ_{w in V} count(w, c) + |V|)
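A minimal sketch of add-1 smoothing, assuming the same hypothetical topic counts as above plus one word ("tokyo") that is unseen in this class:

```python
from collections import Counter

# Word counts for one class; vocabulary comes from the whole corpus.
counts = Counter({"chinese": 4, "beijing": 1, "shanghai": 1})
vocab = {"chinese", "beijing", "shanghai", "tokyo"}  # "tokyo" unseen here
n_words = sum(counts.values())

def p_laplace(word):
    # Add-1 smoothing: (count + 1) / (N + |V|)
    return (counts[word] + 1) / (n_words + len(vocab))

print(p_laplace("chinese"))  # (4+1)/(6+4) = 0.5
print(p_laplace("tokyo"))    # (0+1)/(6+4) = 0.1, no longer zero
```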
- Learning
- From training corpus, extract Vocabulary
- Calculate P(c_j) terms
- For each c_j in C: docs_j ← all docs with class c_j; P(c_j) ← |docs_j| / |total # of documents|
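The learning steps above can be sketched as a minimal add-1 Naive Bayes trainer and classifier, working in log space to avoid underflow (the toy documents and function names are illustrative, not from the original notes):

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """labeled_docs: list of (text, class). Returns log-priors, log-likelihoods, vocab."""
    class_docs = defaultdict(list)
    for text, c in labeled_docs:
        class_docs[c].append(text)
    n_docs = len(labeled_docs)
    # From the training corpus, extract the Vocabulary.
    vocab = {w for text, _ in labeled_docs for w in text.split()}
    log_prior, log_lik = {}, {}
    for c, texts in class_docs.items():
        # P(c_j) = |docs_j| / |total # of documents|
        log_prior[c] = math.log(len(texts) / n_docs)
        # Mega-document counts with add-1 smoothing over the shared vocabulary.
        counts = Counter(w for t in texts for w in t.split())
        total = sum(counts.values())
        log_lik[c] = {w: math.log((counts[w] + 1) / (total + len(vocab)))
                      for w in vocab}
    return log_prior, log_lik, vocab

def predict(text, log_prior, log_lik, vocab):
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for w in text.split():
            if w in vocab:  # ignore out-of-vocabulary words
                s += log_lik[c][w]
        scores[c] = s
    return max(scores, key=scores.get)

# Toy training set in the style of textbook exercises (hypothetical data).
train = [("chinese beijing chinese", "c"),
         ("chinese chinese shanghai", "c"),
         ("chinese macao", "c"),
         ("tokyo japan chinese", "j")]
lp, ll, V = train_nb(train)
print(predict("chinese chinese chinese tokyo japan", lp, ll, V))  # "c"
```

Working with sums of log-probabilities instead of products of probabilities is the standard implementation choice: a product of many terms below 1 quickly underflows floating point.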