Naive Bayes
Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem, used in a wide variety of classification tasks.
Bayes' Theorem
Bayes’ Theorem is a simple mathematical formula used for calculating conditional probabilities.
The formula is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
It tells us how often A happens given that B happens, written P(A|B), when we know how often B happens given that A happens, written P(B|A), and how likely A and B are on their own. Naive Bayes calculates these probabilities for every factor and then selects the outcome with the highest probability.
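As a quick illustration, here is the formula applied to made-up numbers (a minimal sketch; the values of P(B|A), P(A), and P(B) below are arbitrary assumptions):

```python
# Bayes' Theorem on illustrative, made-up probabilities.
p_b_given_a = 0.9   # P(B|A): how often B happens given that A happens
p_a = 0.01          # P(A): how likely A is on its own
p_b = 0.05          # P(B): how likely B is on its own

# P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.2f}")  # P(A|B) = 0.18
```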
The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.
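Formally, for an outcome $C$ and features $x_1, x_2, \ldots, x_n$, this independence assumption means the joint likelihood factorizes into per-feature terms:

$$P(x_1, x_2, \ldots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$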
If we have a certain event $E$ and test factors $x_1, x_2, x_3, \ldots$, we first calculate $P(x_1 \mid E), P(x_2 \mid E), \ldots$ (read as the probability of $x_1$ given that event $E$ happened) and then select the test factor $x$ with the maximum probability value, as in the sketch below.
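Here is a minimal from-scratch sketch of that select-the-maximum procedure for categorical features; the tiny weather dataset, feature names, and labels below are invented purely for illustration:

```python
from collections import Counter, defaultdict

# Toy training data: (features, label) pairs, invented for illustration.
data = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "no_play"),
    ({"outlook": "rainy", "windy": "yes"}, "no_play"),
    ({"outlook": "rainy", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "no"},  "play"),
]

# Count class frequencies and per-class feature-value frequencies.
class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)  # (label, feature name) -> value counts
for features, label in data:
    for name, value in features.items():
        feature_counts[(label, name)][value] += 1

def predict(features):
    """Score each class by P(class) * product of P(value | class) and
    return the class with the highest (unnormalized) posterior."""
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / len(data)  # prior P(class)
        for name, value in features.items():
            # Likelihood P(feature=value | class); 0 if never seen.
            score *= feature_counts[(label, name)][value] / count
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict({"outlook": "sunny", "windy": "no"}))  # -> play
```

Note that a feature value never seen for a class makes that class's whole product zero; this is the zero-frequency problem discussed under Disadvantages below.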
It is a powerful algorithm used for:
- Real-time prediction
- Text classification / spam filtering (see the sketch after this list)
- Recommendation systems
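For text classification and spam filtering, a common setup is bag-of-words counts fed into a multinomial Naive Bayes model. A hedged sketch using scikit-learn (the toy messages and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, invented for illustration.
messages = [
    "win a free prize now",
    "limited offer click now",
    "meeting rescheduled to monday",
    "lunch tomorrow with the team",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into word-count features, then fit the classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
model = MultinomialNB()
model.fit(X, labels)

test = vectorizer.transform(["free prize offer"])
print(model.predict(test))  # likely ['spam'] on this toy data
```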
Advantages
- It is not only a simple approach but also a fast and accurate method for prediction.
- Naive Bayes has very low computation cost.
- It can efficiently work on a large dataset.
- It performs better with discrete response variables than with continuous ones.
- It can be used for multi-class prediction problems.
- It also performs well on text analytics problems.
- When the assumption of independence holds, a Naive Bayes classifier performs better than other models such as logistic regression.
Disadvantages
- It assumes independent features. In practice, it is almost impossible that the model will see a set of predictors that are entirely independent.
- If there is no training tuple for a particular class and feature value, this causes a zero posterior probability, and the model is unable to make a prediction. This is known as the Zero Probability/Frequency Problem.
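The usual remedy is Laplace (add-one) smoothing: pretend every feature value was seen one extra time in every class, so no conditional probability is ever exactly zero. A minimal sketch with made-up counts (scikit-learn's MultinomialNB applies the same idea through its alpha parameter):

```python
def smoothed_likelihood(value_count, class_count, n_values, alpha=1):
    """P(feature=value | class) with additive (Laplace) smoothing."""
    return (value_count + alpha) / (class_count + alpha * n_values)

# Unsmoothed: a value never seen in this class (0 out of 2 examples)
# gets probability 0 and zeroes out the whole posterior product.
print(0 / 2)                                   # 0.0
# Smoothed, with 2 possible values for the feature: small but non-zero.
print(smoothed_likelihood(0, 2, n_values=2))   # 0.25
```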