Previous post in this series: Machine Learning Intro + Practice, Beginner (1): Linear Regression
Note: this article assumes the reader has a background in probability theory.
Data setup: suppose we are given the data

$\left(x_{1}^{(1)}, x_{2}^{(1)}, \ldots, x_{n}^{(1)}, y_{1}\right),\left(x_{1}^{(2)}, x_{2}^{(2)}, \ldots, x_{n}^{(2)}, y_{2}\right), \ldots,\left(x_{1}^{(m)}, x_{2}^{(m)}, \ldots, x_{n}^{(m)}, y_{m}\right)$
That is, we have $m$ samples, each with $n$ features, and the output takes one of $K$ labels, denoted $C_{1}, C_{2}, \ldots, C_{K}$.
Bayes' theorem: $P(B \mid A)=\frac{P(A \mid B) \, P(B)}{P(A)}$
Bayes' formula, expanding the denominator by the law of total probability over a partition $B_{1}, \ldots, B_{n}$:

$$P(B \mid A)=P(B) \cdot \frac{P(A \mid B)}{P(A)}=P(B) \cdot \frac{P(A \mid B)}{\sum_{i=1}^{n} P\left(B_{i}\right) P\left(A \mid B_{i}\right)}$$

In words: posterior = prior × likelihood / evidence.
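As a quick numeric sanity check of the formula, here is a minimal sketch with hypothetical numbers (a test with 99% sensitivity, a 5% false-positive rate, and 1% prevalence; none of these figures come from the article):

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(B|A) = P(A|B) P(B) / P(A), with the evidence P(A)
    expanded by the law of total probability over B and not-B."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# P(disease | positive test) with the hypothetical numbers above
p = bayes_posterior(prior=0.01, likelihood=0.99, likelihood_given_not=0.05)
print(round(p, 4))  # 0.1667
```

Even with a very sensitive test, a low prior keeps the posterior modest, which is exactly the prior-times-likelihood structure of the formula.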
Naive Bayes Model
From Bayes' formula we can immediately draw a conclusion: if the ratio $\frac{P(B \mid A)}{P(B)}>1$, then the occurrence of event A raises the probability that event B occurs. Conversely, if $\frac{P(B \mid A)}{P(B)}<1$, the occurrence of A lowers the probability that B occurs.
The predicted class $C_{\text{result}}$ is the class that maximizes $P\left(Y=C_{k} \mid X=X^{(\text{test})}\right)$:
$$\begin{aligned} C_{\text{result}} &=\underbrace{\operatorname{argmax}}_{C_{k}} P\left(Y=C_{k} \mid X=X^{(\text{test})}\right) \\ &=\underbrace{\operatorname{argmax}}_{C_{k}} P\left(X=X^{(\text{test})} \mid Y=C_{k}\right) P\left(Y=C_{k}\right) / P\left(X=X^{(\text{test})}\right) \end{aligned}$$
Inspecting this expression, the denominator $P\left(X=X^{(\text{test})}\right)$ is the same for every class $C_{k}$, so the prediction rule simplifies to:
$$C_{\text{result}}=\underbrace{\operatorname{argmax}}_{C_{k}} P\left(X=X^{(\text{test})} \mid Y=C_{k}\right) P\left(Y=C_{k}\right)$$
Next, applying the Naive Bayes assumption that the features are conditionally independent given the class, we obtain the usual Naive Bayes inference formula:
$$C_{\text{result}}=\underbrace{\operatorname{argmax}}_{C_{k}} P\left(Y=C_{k}\right) \prod_{j=1}^{n} P\left(X_{j}=X_{j}^{(\text{test})} \mid Y=C_{k}\right)$$
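The decision rule above can be sketched directly for categorical features. This is a minimal from-scratch illustration (the function names, class labels, and probability tables below are my own, not from the article); it sums logs rather than multiplying raw probabilities, the standard trick to avoid floating-point underflow when $n$ is large:

```python
import math

def predict(priors, likelihoods, x_test):
    """Return the class C_k maximizing P(Y=C_k) * prod_j P(X_j = x_j | Y=C_k).

    priors:      dict mapping class -> P(Y=C_k)
    likelihoods: dict mapping class -> list (one dict per feature)
                 of {feature value: P(X_j = value | Y=C_k)}
    x_test:      tuple of feature values
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for j, value in enumerate(x_test):
            # Unseen values get a tiny floor probability instead of zero.
            score += math.log(likelihoods[c][j].get(value, 1e-12))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical two-feature, two-class example.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"offer": 0.8, "hello": 0.2}, {"now": 0.7, "later": 0.3}],
    "ham":  [{"offer": 0.1, "hello": 0.9}, {"now": 0.4, "later": 0.6}],
}
print(predict(priors, likelihoods, ("offer", "now")))  # spam
```

Here the "spam" score is $0.4 \cdot 0.8 \cdot 0.7 = 0.224$ versus $0.6 \cdot 0.1 \cdot 0.4 = 0.024$ for "ham", so "spam" wins the argmax.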
In the Naive Bayes algorithm we therefore need to estimate $P\left(Y=C_{k}\right)$ and $P\left(X_{j}=X_{j}^{(\text{test})} \mid Y=C_{k}\right)$. Both can be obtained by maximum likelihood estimation:
The maximum likelihood estimate of the prior $P\left(Y=C_{k}\right)$ is simply the class frequency among the $N$ training samples:
$$P\left(Y=C_{k}\right)=\frac{\sum_{i=1}^{N} I\left(y_{i}=C_{k}\right)}{N}, \quad k=1,2, \ldots, K$$
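This prior estimate is just counting; a minimal sketch, assuming the labels are given as a plain Python list (the function name is my own):

```python
from collections import Counter

def estimate_priors(labels):
    """MLE of P(Y = C_k): the fraction of training samples with label C_k."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

print(estimate_priors(["a", "a", "b", "c", "a"]))  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```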
where the indicator function is $I\left(y_{i}=C_{k}\right)=\left\{\begin{array}{ll} 1 & y_{i}=C_{k} \\ 0 & y_{i} \neq C_{k} \end{array}\right.$
Suppose the $j$-th feature $X^{(j)}$ takes values in the set $\left(a_{j 1}, a_{j 2}, \ldots, a_{j S}\right)$. The maximum likelihood estimate of the likelihood is:
$$P\left(X^{(j)}=a_{j l} \mid Y=C_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{j l}, y_{i}=C_{k}\right)}{\sum_{i=1}^{N} I\left(y_{i}=C_{k}\right)}$$

$$j=1,2, \ldots, n, \quad l=1,2, \ldots, S, \quad k=1,2, \ldots, K$$
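This conditional estimate is again a ratio of counts. A minimal sketch, assuming samples are lists of categorical feature values (the function name and the toy weather data are my own):

```python
def estimate_likelihood(X, y, j, value, c):
    """MLE of P(X^(j) = value | Y = c): among training samples with
    label c, the fraction whose j-th feature equals `value`."""
    in_class = [x for x, label in zip(X, y) if label == c]
    if not in_class:
        return 0.0  # no samples of this class observed
    matches = sum(1 for x in in_class if x[j] == value)
    return matches / len(in_class)

X = [["sunny", "hot"], ["sunny", "cold"], ["rainy", "hot"], ["rainy", "cold"]]
y = ["yes", "no", "yes", "no"]
print(estimate_likelihood(X, y, 0, "sunny", "yes"))  # 0.5
```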
In words: within label $C_{k}$, this is the fraction of samples taking each particular value of the $j$-th feature $X^{(j)}$ among all samples with label $C_{k}$.
Naive Bayes in Python
The multinomial Naive Bayes classifier in sklearn has the signature:
class sklearn.naive_bayes.MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)
Its methods are:
| Method | Explanation |
| --- | --- |
| fit(X, y[, sample_weight]) | Fit the Naive Bayes classifier according to X, y. |
| get_params([deep]) | Get parameters for this estimator. |
| partial_fit(X, y[, classes, sample_weight]) | Incremental fit on a batch of samples. |
| predict(X) | Perform classification on an array of test vectors X. |
| predict_log_proba(X) | Return log-probability estimates for the test vectors X. |
| predict_proba(X) | Return probability estimates for the test vectors X. |
| score(X, y[, sample_weight]) | Return the mean accuracy on the given test data and labels. |
| set_params(**params) | Set the parameters of this estimator. |
Example from the official documentation:
>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB()
>>> print(clf.predict(X[2:3]))
[3]
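Beyond `predict`, the fitted model also exposes the per-class posterior estimates from the formula derived above via `predict_proba` (each row sums to 1). A short sketch continuing the same random data:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

clf = MultinomialNB().fit(X, y)
proba = clf.predict_proba(X[2:3])  # estimated P(Y=C_k | X) for each of the 6 classes
print(proba.shape)                 # (1, 6)
print(proba.sum())                 # rows sum to 1
```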