Naive Bayes Code

Latest recommended article published on 2021-02-19 10:14:57

ZiHuiJin

Tags: python, machine learning, nlp

Article link: https://blog.csdn.net/ZiHuiJin/article/details/112994637

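Document classification example repository

Computing tf-idf in documents

Word segmentation

Multinomial naive Bayes classifier (MultinomialNB)

Gaussian naive Bayes (GaussianNB)

Bernoulli naive Bayes (BernoulliNB)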

Document classification example repository

https://github.com/cystanford/text_classification

Computing tf-idf in documents
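As background, with its default settings TfidfVectorizer weights each raw term count by a smoothed idf, ln((1 + n) / (1 + df)) + 1, and then L2-normalizes each document row. The following is a minimal sketch that reproduces this by hand with NumPy and compares it with the library output. It assumes the default parameters (smooth_idf=True, norm='l2', sublinear_tf=False); the variable names counts, idf, manual, and reference are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    'this is the bayes document',
    'this is the second second document',
    'and the third one',
    'is this the document',
]

# Raw term counts: one row per document, one column per vocabulary term.
counts = CountVectorizer(stop_words=['is']).fit_transform(docs).toarray().astype(float)

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)               # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1   # smoothed idf (assumes smooth_idf=True)

manual = counts * idf                       # tf * idf
manual /= np.linalg.norm(manual, axis=1, keepdims=True)  # L2-normalize each row

# Library result with the same stop word list and default weighting.
reference = TfidfVectorizer(stop_words=['is']).fit_transform(docs).toarray()
print(np.allclose(manual, reference))
```

If the two matrices agree, the check confirms that the values printed by TfidfVectorizer are normalized weights rather than raw tf * idf products.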

from sklearn.feature_extraction.text import TfidfVectorizer
# stop_words: list of stop words to ignore; token_pattern: regular expression that defines a token
# TfidfVectorizer(stop_words=stop_words, token_pattern=token_pattern)
tfidf_vec = TfidfVectorizer(stop_words=['is'])
documents = [
    'this is the bayes document',
    'this is the second second document',
    'and the third one',
    'is this the document'
]
tfidf_matrix = tfidf_vec.fit_transform(documents)
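# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print('Unique words:', tfidf_vec.get_feature_names_out())
print('ID of each word:', tfidf_vec.vocabulary_)
print('tf-idf value of each word:', tfidf_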