Bayes
The Bayes Decision Rule
To minimize the overall risk, it suffices to select, for each sample, the class label that minimizes the conditional risk $R(c|x)$, that is:
$$h^*(x)=\argmin\limits_{c\in\mathcal{Y}}R(c|x)\tag{1}$$
Here $h^*(x)$ is called the Bayes optimal classifier.
The conditional risk $R(c|x)$ is computed as:
$$R(c_i|x)=\sum_{j=1}^{N}\lambda_{ij}P(c_j|x)\tag{2}$$
If the goal is to minimize the classification error rate, the misclassification loss $\lambda_{ij}$ is the 0/1 loss:
$$\lambda_{ij}=\begin{cases}0,\qquad &\text{if}\quad i=j\\1,&\text{otherwise}\end{cases}\tag{3}$$
The conditional risk $R(c|x)$ then expands to:
$$\begin{aligned}R(c_i|x)&=1\cdot P(c_1|x)+\cdots+1\cdot P(c_{i-1}|x)+0\cdot P(c_i|x)\\&\quad+1\cdot P(c_{i+1}|x)+\cdots+1\cdot P(c_N|x)\\&=P(c_1|x)+\cdots+P(c_{i-1}|x)+P(c_{i+1}|x)+\cdots+P(c_N|x)\end{aligned}\tag{4}$$
Since $\sum_{j=1}^{N}P(c_j|x)=1$, it follows that:
$$R(c_i|x)=1-P(c_i|x)\tag{5}$$
Therefore, the Bayes optimal classifier that minimizes the error rate is:
$$h^*(x)=\argmin\limits_{c\in\mathcal{Y}}R(c|x)=\argmin\limits_{c\in\mathcal{Y}}(1-P(c|x))=\argmax\limits_{c\in\mathcal{Y}}P(c|x)\tag{6}$$
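The equivalence in Eq. (6) is easy to check numerically: under 0/1 loss, picking the class that minimizes the conditional risk $1-P(c|x)$ is the same as picking the class with the largest posterior. A minimal sketch (the posterior values below are made up for illustration):

```python
import numpy as np

# Hypothetical posteriors P(c|x) for three classes at one sample x
posteriors = np.array([0.2, 0.5, 0.3])

# Conditional risk under 0/1 loss: R(c_i|x) = 1 - P(c_i|x)  (Eq. 5)
risks = 1.0 - posteriors

h_min_risk = np.argmin(risks)       # argmin_c R(c|x)    (Eq. 1)
h_max_post = np.argmax(posteriors)  # argmax_c P(c|x)    (Eq. 6)

print(h_min_risk, h_max_post)  # both select class 1
```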
Maximum Likelihood Estimation of the Multivariate Normal Parameters
The log-likelihood function is:
$$LL(\theta_c)=\sum_{x\in D_c}\log P(x|\theta_c)\tag{7}$$
For ease of computation, let the base of $\log$ be $e$; the log-likelihood then reads:
$$LL(\theta_c)=\sum_{x\in D_c}\ln P(x|\theta_c)\tag{8}$$
Since $P(x|\theta_c)=P(x|c)\sim\mathcal{N}(\mu_c,\sigma_c^2)$, we have:
$$P(x|\theta_c)=\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x-\mu_c)^T\Sigma_c^{-1}(x-\mu_c)\right)\tag{9}$$
where $d$ is the dimension of $x$, $\Sigma_c=\sigma_c^2$ is the symmetric positive-definite covariance matrix, and $|\Sigma_c|$ denotes its determinant. Substituting this into the log-likelihood gives:
$$LL(\theta_c)=\sum_{x\in D_c}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x-\mu_c)^T\Sigma_c^{-1}(x-\mu_c)\right)\right]\tag{10}$$
Let $|D_c|=N$; the log-likelihood then becomes:
$$\begin{aligned}LL(\theta_c)&=\sum_{i=1}^{N}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\\&=\sum_{i=1}^{N}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d}}\cdot\cfrac{1}{\sqrt{|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\\&=\sum_{i=1}^{N}\left\{\ln\cfrac{1}{\sqrt{(2\pi)^d}}+\ln\cfrac{1}{\sqrt{|\Sigma_c|}}+\ln\left[\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\right\}\\&=\sum_{i=1}^{N}\left\{-\cfrac{d}{2}\ln(2\pi)-\cfrac{1}{2}\ln|\Sigma_c|-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right\}\\&=-\cfrac{Nd}{2}\ln(2\pi)-\cfrac{N}{2}\ln|\Sigma_c|-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\end{aligned}\tag{11}$$
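The closed form in Eq. (11) can be verified numerically against a direct evaluation of the Gaussian log-density, e.g. with SciPy (the data below are random and purely illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)  # any symmetric positive-definite matrix works

# Direct sum of log-densities: sum_i ln N(x_i; mu, Sigma)
ll_direct = multivariate_normal.logpdf(X, mean=mu, cov=Sigma).sum()

# Closed form from Eq. (11)
diff = X - mu
quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
ll_closed = (-N * d / 2 * np.log(2 * np.pi)
             - N / 2 * np.log(np.linalg.det(Sigma))
             - 0.5 * quad)

print(np.isclose(ll_direct, ll_closed))  # True
```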
Since the maximum likelihood estimate $\hat{\theta}_c$ of the parameter $\theta_c$ is:
$$\hat{\theta}_c=\argmax\limits_{\theta_c}LL(\theta_c)\tag{12}$$
it suffices to find the $\hat{\mu}_c$ and $\hat{\Sigma}_c$ that maximize the log-likelihood $LL(\theta_c)$; together they give $\hat{\theta}_c$.
Taking the partial derivative of $LL(\theta_c)$ with respect to $\mu_c$:
$$\begin{aligned}\cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=\cfrac{\partial}{\partial\mu_c}\left[-\cfrac{Nd}{2}\ln(2\pi)-\cfrac{N}{2}\ln|\Sigma_c|-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\&=\cfrac{\partial}{\partial\mu_c}\left[-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i^T-\mu_c^T)\Sigma_c^{-1}(x_i-\mu_c)\right]\\&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i^T-\mu_c^T)(\Sigma_c^{-1}x_i-\Sigma_c^{-1}\mu_c)\right]\\&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[x_i^T\Sigma_c^{-1}x_i-x_i^T\Sigma_c^{-1}\mu_c-\mu_c^T\Sigma_c^{-1}x_i+\mu_c^T\Sigma_c^{-1}\mu_c\right]\end{aligned}\tag{13}$$
Since $x_i^T\Sigma_c^{-1}\mu_c$ evaluates to a scalar, we have:
$$x_i^T\Sigma_c^{-1}\mu_c=(x_i^T\Sigma_c^{-1}\mu_c)^T=\mu_c^T(\Sigma_c^{-1})^Tx_i=\mu_c^T(\Sigma_c^T)^{-1}x_i=\mu_c^T\Sigma_c^{-1}x_i\tag{14}$$
Eq. (13) can therefore be further simplified to:
$$\cfrac{\partial LL(\theta_c)}{\partial\mu_c}=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[x_i^T\Sigma_c^{-1}x_i-2x_i^T\Sigma_c^{-1}\mu_c+\mu_c^T\Sigma_c^{-1}\mu_c\right]\tag{15}$$
By the matrix differentiation identities:
$$\cfrac{\partial a^Tx}{\partial x}=a,\quad\cfrac{\partial x^T\beta x}{\partial x}=(\beta+\beta^T)x\tag{16}$$
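Both identities in Eq. (16) can be sanity-checked with central finite differences (random $a$, $\beta$, $x$; denominator layout, so each gradient has the shape of $x$):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
a = rng.normal(size=d)
B = rng.normal(size=(d, d))  # beta: not necessarily symmetric
x = rng.normal(size=d)
eps = 1e-6

def num_grad(f, x):
    # Central finite-difference gradient of a scalar function f at x
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# d(a^T x)/dx = a
g1 = num_grad(lambda v: a @ v, x)
# d(x^T B x)/dx = (B + B^T) x
g2 = num_grad(lambda v: v @ B @ v, x)

print(np.allclose(g1, a, atol=1e-5), np.allclose(g2, (B + B.T) @ x, atol=1e-4))
```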
we obtain:
$$\begin{aligned}\cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=-\cfrac{1}{2}\sum_{i=1}^{N}\left[0-(2x_i^T\Sigma_c^{-1})^T+(\Sigma_c^{-1}+(\Sigma_c^{-1})^T)\mu_c\right]\\&=-\cfrac{1}{2}\sum_{i=1}^{N}\left[-2(\Sigma_c^{-1})^Tx_i+(\Sigma_c^{-1}+(\Sigma_c^{-1})^T)\mu_c\right]\\&=-\cfrac{1}{2}\sum_{i=1}^{N}\left[-2\Sigma_c^{-1}x_i+2\Sigma_c^{-1}\mu_c\right]\\&=\sum_{i=1}^{N}\Sigma_c^{-1}x_i-N\Sigma_c^{-1}\mu_c\end{aligned}\tag{17}$$
Setting the partial derivative to zero gives:
$$\begin{aligned}\cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=\sum_{i=1}^{N}\Sigma_c^{-1}x_i-N\Sigma_c^{-1}\mu_c=0\\&\Longrightarrow\sum_{i=1}^{N}\Sigma_c^{-1}x_i=N\Sigma_c^{-1}\mu_c\\&\Longrightarrow\Sigma_c^{-1}\sum_{i=1}^{N}x_i=N\Sigma_c^{-1}\mu_c\\&\Longrightarrow N\mu_c=\sum_{i=1}^{N}x_i\\&\Longrightarrow\mu_c=\cfrac{1}{N}\sum_{i=1}^{N}x_i\end{aligned}\tag{18}$$
Similarly, taking the partial derivative of $LL(\theta_c)$ with respect to $\Sigma_c$ yields:
$$\Sigma_c=\cfrac{1}{N}\sum_{i=1}^{N}(x_i-\mu_c)(x_i-\mu_c)^T\tag{19}$$
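Eqs. (18) and (19) say the MLE of $\mu_c$ is the sample mean and the MLE of $\Sigma_c$ is the covariance normalized by $N$ (not $N-1$). A quick numerical check (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 200, 2
X = rng.normal(size=(N, d))

# Eq. (18): mu_hat = (1/N) * sum_i x_i
mu_hat = X.sum(axis=0) / N

# Eq. (19): Sigma_hat = (1/N) * sum_i (x_i - mu)(x_i - mu)^T
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N

# Same results via NumPy (bias=True gives the 1/N normalization)
print(np.allclose(mu_hat, X.mean(axis=0)))
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True)))
```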
The Bayes optimal classifier that minimizes the classification error rate is:
$$h^*(x)=\argmax\limits_{c\in\mathcal{Y}}P(c|x)\tag{20}$$
By Bayes' theorem:
$$P(c|x)=\cfrac{P(x,c)}{P(x)}=\cfrac{P(c)P(x|c)}{P(x)}\tag{21}$$
Therefore:
$$h^*(x)=\argmax\limits_{c\in\mathcal{Y}}\cfrac{P(c)P(x|c)}{P(x)}=\argmax\limits_{c\in\mathcal{Y}}P(c)P(x|c)\tag{22}$$
Under the attribute conditional independence assumption:
$$P(x|c)=P(x_1,x_2,\cdots,x_d|c)=\prod_{i=1}^{d}P(x_i|c)\tag{23}$$
Therefore:
$$h^*(x)=\argmax\limits_{c\in\mathcal{Y}}P(c)\prod_{i=1}^{d}P(x_i|c)\tag{24}$$
This is the expression of the naive Bayes classifier.
Here $P(c)$ is the proportion of each class in the sample space. By the law of large numbers, when the training set contains sufficiently many independent and identically distributed samples, $P(c)$ can be estimated by the frequency of each class:
$$P(c)=\cfrac{|D_c|}{|D|}\tag{25}$$
where $D$ is the training set, $|D|$ is its number of samples, $D_c$ is the subset of $D$ consisting of class-$c$ samples, and $|D_c|$ is the number of samples in $D_c$.
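Putting Eqs. (24) and (25) together, a minimal Gaussian naive Bayes can be written from scratch: priors from class frequencies, per-feature Gaussian likelihoods, and prediction by the argmax of $P(c)\prod_i P(x_i|c)$ (computed in log form for numerical stability). This is a sketch on made-up toy data, not a production implementation:

```python
import numpy as np

def fit_gnb(X, y):
    classes = np.unique(y)
    priors, means, vars_ = {}, {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)      # Eq. (25): |D_c| / |D|
        means[c] = Xc.mean(axis=0)        # per-feature Gaussian parameters
        vars_[c] = Xc.var(axis=0) + 1e-9  # small floor avoids division by zero
    return classes, priors, means, vars_

def predict_gnb(model, X):
    classes, priors, means, vars_ = model
    preds = []
    for x in X:
        # log P(c) + sum_i log P(x_i|c): the log of Eq. (24)
        scores = [np.log(priors[c])
                  - 0.5 * np.sum(np.log(2 * np.pi * vars_[c])
                                 + (x - means[c]) ** 2 / vars_[c])
                  for c in classes]
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)

# Toy data: two well-separated clusters
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = fit_gnb(X, y)
print((predict_gnb(model, X) == y).mean())  # expect accuracy near 1.0
```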
Applying a Bayes Classifier in Python
# Load the breast cancer dataset
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
# Print the dataset's keys
print(cancer.keys())
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
# Print the labeled tumor classes in the dataset
print("Tumor classes:", cancer['target_names'])
print("Tumor features:", cancer['feature_names'])
Tumor classes: ['malignant' 'benign']
Tumor features: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error' 'fractal dimension error'
'worst radius' 'worst texture' 'worst perimeter' 'worst area'
'worst smoothness' 'worst compactness' 'worst concavity'
'worst concave points' 'worst symmetry' 'worst fractal dimension']
As shown, tumors are classified as malignant or benign, and there are many features.
# Assign the feature values and classification targets to X and y
X, y = cancer.data, cancer.target
# Import the data-splitting utility
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=38)
# Check the data shapes
print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
Training set shape: (426, 30)
Test set shape: (143, 30)
# Import Gaussian naive Bayes
from sklearn.naive_bayes import GaussianNB
# Fit the data
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Print the model's score
print("Model score: {:.3f}".format(gnb.score(X_test, y_test)))
Model score: 0.944
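Beyond the overall accuracy, the fitted model can also predict individual samples and report their class posteriors. A short self-contained follow-up sketch (same dataset and split as above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=38)

gnb = GaussianNB().fit(X_train, y_train)

# Predict the first test sample and inspect its class posteriors P(c|x)
pred = gnb.predict(X_test[:1])
proba = gnb.predict_proba(X_test[:1])
print("Predicted class:", cancer.target_names[pred][0])
print("Posterior probabilities:", proba.round(3))
```

`predict_proba` returns one posterior per class, and the predicted class is simply the argmax of that row, which is exactly Eq. (20) in code.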