Machine Learning Notes: Naive Bayes
- A family of classifiers quite similar to linear models, but faster to train. The price for this efficiency is that naive Bayes models often provide worse generalization performance.
- The reason that naive Bayes models are so efficient is that they
learn parameters by looking at each feature individually and simply
collect per-class statistics from each feature.
- There are three kinds of naive Bayes classifiers implemented in
scikit-learn: GaussianNB, BernoulliNB, and MultinomialNB.
- GaussianNB can be applied to any continuous data, while BernoulliNB assumes binary data and MultinomialNB assumes count data. (BernoulliNB and MultinomialNB are mostly used in text data classification.)
- MultinomialNB and BernoulliNB have a single parameter, alpha, which controls model complexity. A large alpha works as if many virtual data points with positive values for all the features were added to the data; this smooths the per-class statistics and makes the model less complex. GaussianNB is mostly used on very high-dimensional data.
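How alpha acts as virtual smoothing counts can be sketched with a minimal from-scratch multinomial naive Bayes. This is an illustrative sketch, not scikit-learn's implementation, and the function names here are made up for the example:

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Collect per-class count statistics, smoothed by alpha
    (as if alpha virtual points with positive values for every
    feature had been added to each class)."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # per-class feature counts, plus alpha virtual counts per feature
    counts = np.array([X[y == c].sum(axis=0) + alpha for c in classes])
    log_prob = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_prob

def predict_multinomial_nb(model, X):
    classes, log_prior, log_prob = model
    # log P(c) + sum of count-weighted log feature probabilities
    joint = X @ log_prob.T + log_prior
    return classes[np.argmax(joint, axis=1)]

# toy count data: two word-count "documents" per class
X = np.array([[3, 0, 1],
              [2, 0, 0],
              [0, 4, 1],
              [0, 3, 2]])
y = np.array([0, 0, 1, 1])
model = fit_multinomial_nb(X, y, alpha=1.0)
print(predict_multinomial_nb(model, X))  # recovers the training labels
```

With alpha=1.0 every per-class feature count starts from one virtual observation, so no feature ever gets zero probability; a larger alpha pulls the per-class statistics closer together, giving a smoother, less complex model.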
- Naive Bayes models share many of the strengths and weaknesses of
linear models: they are fast to train and to predict, they work well
with high-dimensional sparse data, and they are relatively robust to
their parameters. They are great baseline models and are often used
on very large datasets.
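As a quick baseline sketch, assuming scikit-learn is installed, GaussianNB can be fit on a small continuous dataset in a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# continuous measurements, so GaussianNB is the appropriate variant
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# training is a single pass collecting per-class mean/variance statistics
clf = GaussianNB().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

No hyperparameter tuning is needed here, which is part of why naive Bayes makes a convenient baseline before trying more complex models.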