Since

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \implies P(A \cap B) = P(A \mid B) \cdot P(B)$$
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \implies P(A \cap B) = P(B \mid A) \cdot P(A)$$
Both expressions equal $P(A \cap B)$, so

$$P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$$
Therefore, dividing both sides by $P(B)$ gives Bayes' formula:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
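To make the formula concrete, here is a minimal numeric sketch in plain Python. The probabilities are made-up illustration values, not taken from the news dataset, and the helper name `bayes_posterior` is hypothetical. The sketch also assumes that P(B) is obtained through the law of total probability, which the derivation above does not spell out.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical values chosen only for illustration.
p_a = 0.2              # P(A): prior probability of event A
p_b_given_a = 0.5      # P(B|A): probability of B when A holds
p_b_given_not_a = 0.1  # P(B|not A): probability of B when A does not hold

# Assumption: P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(posterior, 4))

# Both factorizations of P(A and B) agree, as in the derivation above.
joint_from_a = p_b_given_a * p_a
joint_from_b = posterior * p_b
print(abs(joint_from_a - joint_from_b) < 1e-12)
```

The last check mirrors the identity P(A | B) · P(B) = P(B | A) · P(A) that the derivation relies on.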
1. Import packages
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
2. Raw data
# Downloading the fetch_20newsgroups news data takes a while on the first call
news = datasets.fetch_20newsgroups(subset="all")
X = news.data
y = news.target
print(X[0:5])
print(y[0:5])
# Sample data
["From: Mamatha Devineni Ratnam <mr47+@andrew.cmu.edu>\nSubject: Pens fans reactions\nOrganization: Post Office, Carnegie Mellon, ...... I was very disappointed not to see the Islanders lose the final\nregular season game. PENS RULE!!!\n\n"]
# Sample class labels
[10 3 17 3 4]
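The class labels connect back to Bayes' formula: in a naive Bayes classifier the prior P(A) for a class is usually estimated from how often that class appears among the training labels. The sketch below, in plain Python, does this for just the five labels printed above. Five samples are far too few for a meaningful estimate, so this only illustrates the computation; the variable names are hypothetical.

```python
from collections import Counter

# The five labels printed above (a tiny slice of the full dataset).
sample_labels = [10, 3, 17, 3, 4]

# Count how often each class occurs, then normalize to relative frequencies.
counts = Counter(sample_labels)
priors = {label: n / len(sample_labels) for label, n in counts.items()}

# Show the estimated prior for each class, in label order.
for label in sorted(priors):
    print(label, priors[label])
```

On the full dataset the same computation would run over all of `y`, which is in effect what `MultinomialNB` does by default when it learns class priors from the training data.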