【Watermelon Book Notes】6. Maximum Likelihood Estimation and Naive Bayes

6.1 Bayes Decision Rule

Bayes decision rule:
To minimize the overall risk, it suffices to choose, for each sample, the class label that minimizes the conditional risk $R(c \mid \boldsymbol{x})$, i.e.
$$h^{*}(\boldsymbol{x})=\underset{c \in \mathcal{Y}}{\arg \min }\, R(c \mid \boldsymbol{x})$$
Here $h^{*}$ is called the Bayes optimal classifier.

Both $R$ and $h^{*}$ here refer to a single input sample: for a single sample $\boldsymbol{x}$, $h^{*}(\boldsymbol{x})$ outputs the class label $c$ that makes $R$ attain its minimum.

The conditional risk $R(c \mid \boldsymbol{x})$ is computed as
$$R\left(c_{i} \mid \boldsymbol{x}\right)=\sum_{j=1}^{N} \lambda_{i j} P\left(c_{j} \mid \boldsymbol{x}\right)$$
As in the book, we assume there are $N$ possible class labels, $\mathcal{Y}=\left\{c_{1}, c_{2}, \ldots, c_{N}\right\}$, and $\lambda_{ij}$ is the loss incurred by misclassifying a sample whose true label is $c_{j}$ as $c_{i}$. If the goal is to minimize the classification error rate, the misclassification loss $\lambda_{ij}$ is the 0/1 loss:
$$\lambda_{ij}=\begin{cases} 0, & \text{if } i=j \\ 1, & \text{otherwise} \end{cases}$$
The conditional risk $R(c \mid \boldsymbol{x})$ then expands to
$$\begin{aligned} R\left(c_{i} \mid \boldsymbol{x}\right) &=1 \times P\left(c_{1} \mid \boldsymbol{x}\right)+\ldots+1 \times P\left(c_{i-1} \mid \boldsymbol{x}\right)+0 \times P\left(c_{i} \mid \boldsymbol{x}\right)+1 \times P\left(c_{i+1} \mid \boldsymbol{x}\right)+\ldots+1 \times P\left(c_{N} \mid \boldsymbol{x}\right) \\ &=P\left(c_{1} \mid \boldsymbol{x}\right)+\ldots+P\left(c_{i-1} \mid \boldsymbol{x}\right)+P\left(c_{i+1} \mid \boldsymbol{x}\right)+\ldots+P\left(c_{N} \mid \boldsymbol{x}\right) \end{aligned}$$
Here every $\lambda$ equals 1 except $\lambda_{ii}=0$. Since $\sum_{j=1}^{N} P\left(c_{j} \mid \boldsymbol{x}\right)=1$, we get
$$R\left(c_{i} \mid \boldsymbol{x}\right)=1-P\left(c_{i} \mid \boldsymbol{x}\right)$$
which is Eq. (7.5) in the book.

Thus, the Bayes optimal classifier that minimizes the error rate is
$$h^{*}(\boldsymbol{x})=\underset{c \in \mathcal{Y}}{\arg \min }\, R(c \mid \boldsymbol{x})=\underset{c \in \mathcal{Y}}{\arg \min }\,(1-P(c \mid \boldsymbol{x}))=\underset{c \in \mathcal{Y}}{\arg \max }\, P(c \mid \boldsymbol{x})$$
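As a quick sanity check, the equivalence between minimizing the conditional risk under 0/1 loss and maximizing the posterior can be verified numerically. A minimal sketch, using made-up posterior values for $N=3$ classes (the numbers are illustrative, not from the text):

```python
import numpy as np

# Hypothetical posteriors P(c_j | x) for N = 3 classes (illustrative values)
posterior = np.array([0.2, 0.5, 0.3])
N = len(posterior)

# 0/1 loss matrix: lambda_ij = 0 if i == j, else 1
loss = np.ones((N, N)) - np.eye(N)

# Conditional risk R(c_i | x) = sum_j lambda_ij * P(c_j | x)
risk = loss @ posterior

# Since the posteriors sum to 1, R(c_i | x) = 1 - P(c_i | x) ...
assert np.allclose(risk, 1 - posterior)
# ... so argmin of the risk coincides with argmax of the posterior
assert np.argmin(risk) == np.argmax(posterior)
```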

6.2 Maximum Likelihood Estimation for Multivariate Normal Parameters

The log-likelihood is
$$LL\left(\boldsymbol{\theta}_{c}\right)=\sum_{\boldsymbol{x} \in D_{c}} \log P\left(\boldsymbol{x} \mid \boldsymbol{\theta}_{c}\right)$$
which is Eq. (7.10) in the book.

To ease the subsequent calculation, take the base of the logarithm to be $e$, so the log-likelihood becomes
$$LL\left(\boldsymbol{\theta}_{c}\right)=\sum_{\boldsymbol{x} \in D_{c}} \ln P\left(\boldsymbol{x} \mid \boldsymbol{\theta}_{c}\right)$$
Since $P\left(\boldsymbol{x} \mid \boldsymbol{\theta}_{c}\right)=P(\boldsymbol{x} \mid c) \sim \mathcal{N}\left(\boldsymbol{\mu}_{c}, \boldsymbol{\Sigma}_{c}\right)$, we have
$$P\left(\boldsymbol{x} \mid \boldsymbol{\theta}_{c}\right)=\frac{1}{\sqrt{(2 \pi)^{d}\left|\boldsymbol{\Sigma}_{c}\right|}} \exp \left(-\frac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}_{c}\right)\right)$$
where $d$ is the dimension of $\boldsymbol{x}$, $\boldsymbol{\Sigma}_{c}$ (the multivariate counterpart of $\sigma_{c}^{2}$) is the symmetric positive-definite covariance matrix, and $\left|\boldsymbol{\Sigma}_{c}\right|$ is its determinant. Substituting into the log-likelihood gives
$$LL\left(\boldsymbol{\theta}_{c}\right)=\sum_{\boldsymbol{x} \in D_{c}} \ln \left[\frac{1}{\sqrt{(2 \pi)^{d}\left|\boldsymbol{\Sigma}_{c}\right|}} \exp \left(-\frac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}_{c}\right)\right)\right]$$
∣ D c ∣ = N \left|D_{c}\right|=N Dc=N,则对数似然函数可化为:
$$\begin{aligned} LL\left(\boldsymbol{\theta}_{c}\right) &=\sum_{i=1}^{N} \ln \left[\frac{1}{\sqrt{(2 \pi)^{d}\left|\boldsymbol{\Sigma}_{c}\right|}} \exp \left(-\frac{1}{2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right)\right] \\ &=\sum_{i=1}^{N} \ln \left[\frac{1}{\sqrt{(2 \pi)^{d}}} \cdot \frac{1}{\sqrt{\left|\boldsymbol{\Sigma}_{c}\right|}} \cdot \exp \left(-\frac{1}{2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right)\right] \\ &=\sum_{i=1}^{N}\left\{\ln \frac{1}{\sqrt{(2 \pi)^{d}}}+\ln \frac{1}{\sqrt{\left|\boldsymbol{\Sigma}_{c}\right|}}+\ln \left[\exp \left(-\frac{1}{2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right)\right]\right\} \\ &=\sum_{i=1}^{N}\left\{-\frac{d}{2} \ln (2 \pi)-\frac{1}{2} \ln \left|\boldsymbol{\Sigma}_{c}\right|-\frac{1}{2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right\} \\ &=-\frac{N d}{2} \ln (2 \pi)-\frac{N}{2} \ln \left|\boldsymbol{\Sigma}_{c}\right|-\frac{1}{2} \sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right) \end{aligned}$$
Since the maximum likelihood estimate $\hat{\boldsymbol{\theta}}_{c}$ of the parameter $\boldsymbol{\theta}_{c}$ is
$$\hat{\boldsymbol{\theta}}_{c}=\underset{\boldsymbol{\theta}_{c}}{\arg \max }\, LL\left(\boldsymbol{\theta}_{c}\right)$$
it remains only to find the $\hat{\boldsymbol{\mu}}_{c}$ and $\hat{\boldsymbol{\Sigma}}_{c}$ that maximize the log-likelihood $LL\left(\boldsymbol{\theta}_{c}\right)$; together they give $\hat{\boldsymbol{\theta}}_{c}$.

L L ( θ C ) L L\left(\boldsymbol{\theta}_{C}\right) LL(θC)关于 μ c \mu_{c} μc,求偏导
$$\begin{aligned} \frac{\partial LL\left(\boldsymbol{\theta}_{c}\right)}{\partial \boldsymbol{\mu}_{c}} &=\frac{\partial}{\partial \boldsymbol{\mu}_{c}}\left[-\frac{N d}{2} \ln (2 \pi)-\frac{N}{2} \ln \left|\boldsymbol{\Sigma}_{c}\right|-\frac{1}{2} \sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right] \\ &=\frac{\partial}{\partial \boldsymbol{\mu}_{c}}\left[-\frac{1}{2} \sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right] \\ &=-\frac{1}{2} \sum_{i=1}^{N} \frac{\partial}{\partial \boldsymbol{\mu}_{c}}\left[\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right] \\ &=-\frac{1}{2} \sum_{i=1}^{N} \frac{\partial}{\partial \boldsymbol{\mu}_{c}}\left[\left(\boldsymbol{x}_{i}^{\mathrm{T}}-\boldsymbol{\mu}_{c}^{\mathrm{T}}\right) \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right] \\ &=-\frac{1}{2} \sum_{i=1}^{N} \frac{\partial}{\partial \boldsymbol{\mu}_{c}}\left[\left(\boldsymbol{x}_{i}^{\mathrm{T}}-\boldsymbol{\mu}_{c}^{\mathrm{T}}\right)\left(\boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}-\boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}\right)\right] \\ &=-\frac{1}{2} \sum_{i=1}^{N} \frac{\partial}{\partial \boldsymbol{\mu}_{c}}\left[\boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}-\boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}-\boldsymbol{\mu}_{c}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}+\boldsymbol{\mu}_{c}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}\right] \end{aligned}$$
Since $\boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}$ evaluates to a scalar, we have
$$\boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}=\left(\boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}\right)^{\mathrm{T}}=\boldsymbol{\mu}_{c}^{\mathrm{T}}\left(\boldsymbol{\Sigma}_{c}^{-1}\right)^{\mathrm{T}} \boldsymbol{x}_{i}=\boldsymbol{\mu}_{c}^{\mathrm{T}}\left(\boldsymbol{\Sigma}_{c}^{\mathrm{T}}\right)^{-1} \boldsymbol{x}_{i}=\boldsymbol{\mu}_{c}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}$$
so the derivative can be further simplified to
$$\frac{\partial LL\left(\boldsymbol{\theta}_{c}\right)}{\partial \boldsymbol{\mu}_{c}}=-\frac{1}{2} \sum_{i=1}^{N} \frac{\partial}{\partial \boldsymbol{\mu}_{c}}\left[\boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}-2 \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}+\boldsymbol{\mu}_{c}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}\right]$$
By the matrix differentiation formulas $\dfrac{\partial \boldsymbol{a}^{\mathrm{T}} \boldsymbol{x}}{\partial \boldsymbol{x}}=\boldsymbol{a}$ and $\dfrac{\partial \boldsymbol{x}^{\mathrm{T}} \boldsymbol{B} \boldsymbol{x}}{\partial \boldsymbol{x}}=\left(\boldsymbol{B}+\boldsymbol{B}^{\mathrm{T}}\right) \boldsymbol{x}$, we obtain
$$\begin{aligned} \frac{\partial LL\left(\boldsymbol{\theta}_{c}\right)}{\partial \boldsymbol{\mu}_{c}} &=-\frac{1}{2} \sum_{i=1}^{N}\left[0-\left(2 \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\right)^{\mathrm{T}}+\left(\boldsymbol{\Sigma}_{c}^{-1}+\left(\boldsymbol{\Sigma}_{c}^{-1}\right)^{\mathrm{T}}\right) \boldsymbol{\mu}_{c}\right] \\ &=-\frac{1}{2} \sum_{i=1}^{N}\left[-2\left(\boldsymbol{\Sigma}_{c}^{-1}\right)^{\mathrm{T}} \boldsymbol{x}_{i}+\left(\boldsymbol{\Sigma}_{c}^{-1}+\left(\boldsymbol{\Sigma}_{c}^{-1}\right)^{\mathrm{T}}\right) \boldsymbol{\mu}_{c}\right] \\ &=-\frac{1}{2} \sum_{i=1}^{N}\left[-2 \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}+2 \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}\right] \\ &=\sum_{i=1}^{N} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}-N \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c} \end{aligned}$$
where the third step uses the symmetry of $\boldsymbol{\Sigma}_{c}$ (and hence of $\boldsymbol{\Sigma}_{c}^{-1}$).
Setting the derivative to zero:
$$\begin{gathered} \frac{\partial LL\left(\boldsymbol{\theta}_{c}\right)}{\partial \boldsymbol{\mu}_{c}}=\sum_{i=1}^{N} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i}-N \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}=0 \\ N \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}=\sum_{i=1}^{N} \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{x}_{i} \\ N \boldsymbol{\Sigma}_{c}^{-1} \boldsymbol{\mu}_{c}=\boldsymbol{\Sigma}_{c}^{-1} \sum_{i=1}^{N} \boldsymbol{x}_{i} \\ N \boldsymbol{\mu}_{c}=\sum_{i=1}^{N} \boldsymbol{x}_{i} \end{gathered}$$

$$\boldsymbol{\mu}_{c}=\frac{1}{N} \sum_{i=1}^{N} \boldsymbol{x}_{i} \Rightarrow \hat{\boldsymbol{\mu}}_{c}=\frac{1}{N} \sum_{i=1}^{N} \boldsymbol{x}_{i}$$

This is Eq. (7.12) in the book.

L L ( θ C ) L L\left(\boldsymbol{\theta}_{C}\right) LL(θC)关于 Σ C \Sigma_{C} ΣC求偏导
∂ L L ( θ c ) ∂ Σ c = ∂ ∂ Σ c [ − N d 2 ln ⁡ ( 2 π ) − N 2 ln ⁡ ∣ Σ c ∣ − 1 2 ∑ i = 1 N ( x i − μ c ) T Σ c − 1 ( x i − μ c ) ] = ∂ ∂ Σ c [ − N 2 ln ⁡ ∣ Σ c ∣ − 1 2 ∑ i = 1 N ( x i − μ c ) T Σ c − 1 ( x i − μ c ) ] = − N 2 ⋅ ∂ ∂ Σ c [ ln ⁡ ∣ Σ c ∣ ] − 1 2 ∑ i = 1 N ∂ ∂ Σ c [ ( x i − μ c ) T Σ c − 1 ( x i − μ c ) ] \begin{aligned} \frac{\partial L L\left(\boldsymbol{\theta}_{c}\right)}{\partial \boldsymbol{\Sigma}_{c}} &=\frac{\partial}{\partial \boldsymbol{\Sigma}_{c}}\left[-\frac{N d}{2} \ln (2 \pi)-\frac{N}{2} \ln \left|\boldsymbol{\Sigma}_{c}\right|-\frac{1}{2} \sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right] \\ &=\frac{\partial}{\partial \boldsymbol{\Sigma}_{c}}\left[-\frac{N}{2} \ln \left|\boldsymbol{\Sigma}_{c}\right|-\frac{1}{2} \sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right] \\ &=-\frac{N}{2} \cdot \frac{\partial}{\partial \boldsymbol{\Sigma}_{c}}\left[\ln \left|\boldsymbol{\Sigma}_{c}\right|\right]-\frac{1}{2} \sum_{i=1}^{N} \frac{\partial}{\partial \boldsymbol{\Sigma}_{c}}\left[\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\right] \end{aligned} ΣcLL(θc)=Σc[2Ndln(2π)2NlnΣc21i=1N(xiμc)TΣc1(xiμc)]=Σc[2NlnΣc21i=1N(xiμc)TΣc1(xiμc)]=2NΣc[lnΣc]21i=1NΣc[(xiμc)TΣc1(xiμc)]
由矩阵微分公式 ∂ ∣ X ∣ ∂ X = ∣ X ∣ ⋅ ( X − 1 ) T \dfrac{\partial|\mathbf{X}|}{\partial \mathbf{X}}=|\mathbf{X}| \cdot\left(\mathbf{X}^{-1}\right)^{T} XX=X(X1)T ∂ a T X − 1 b ∂ X = − X − T a b T X − T \dfrac{\partial \boldsymbol{a}^{T} \mathbf{X}^{-1} \boldsymbol{b}}{\partial \mathbf{X}}=-\mathbf{X}^{-T} \boldsymbol{a} \boldsymbol{b}^{T} \mathbf{X}^{-T} XaTX1b=XTabTXT可得
∂ L L ( θ c ) ∂ Σ c = − N 2 ⋅ 1 ∣ Σ c ∣ ⋅ ∣ Σ c ∣ ⋅ ( Σ c − 1 ) T − 1 2 ∑ i = 1 N [ − Σ c − T ( x i − μ c ) ( x i − μ c ) T Σ c − T ] = − N 2 ⋅ ( Σ c − 1 ) T − 1 2 ∑ i = 1 N [ − Σ c − T ( x i − μ c ) ( x i − μ c ) T Σ c − T ] = − N 2 Σ c − 1 + 1 2 ∑ i = 1 N [ Σ c − 1 ( x i − μ c ) ( x i − μ c ) T Σ c − 1 ] \begin{aligned} \frac{\partial L L\left(\boldsymbol{\theta}_{c}\right)}{\partial \boldsymbol{\Sigma}_{c}} &=-\frac{N}{2} \cdot \frac{1}{\left|\boldsymbol{\Sigma}_{c}\right|} \cdot\left|\boldsymbol{\Sigma}_{c}\right| \cdot\left(\boldsymbol{\Sigma}_{c}^{-1}\right)^{T}-\frac{1}{2} \sum_{i=1}^{N}\left[-\boldsymbol{\Sigma}_{c}^{-T}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{T} \boldsymbol{\Sigma}_{c}^{-T}\right] \\ &=-\frac{N}{2} \cdot\left(\boldsymbol{\Sigma}_{c}^{-1}\right)^{T}-\frac{1}{2} \sum_{i=1}^{N}\left[-\boldsymbol{\Sigma}_{c}^{-T}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{T} \boldsymbol{\Sigma}_{c}^{-T}\right] \\ &=-\frac{N}{2} \boldsymbol{\Sigma}_{c}^{-1}+\frac{1}{2} \sum_{i=1}^{N}\left[\boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{T} \boldsymbol{\Sigma}_{c}^{-1}\right] \end{aligned} ΣcLL(θc)=2NΣc1Σc(Σc1)T21i=1N[ΣcT(xiμc)(xiμc)TΣcT]=2N(Σc1)T21i=1N[ΣcT(xiμc)(xiμc)TΣcT]=2NΣc1+21i=1N[Σc1(xiμc)(xiμc)TΣc1]
Setting the derivative to zero:
$$\frac{\partial LL\left(\boldsymbol{\theta}_{c}\right)}{\partial \boldsymbol{\Sigma}_{c}}=-\frac{N}{2} \boldsymbol{\Sigma}_{c}^{-1}+\frac{1}{2} \sum_{i=1}^{N}\left[\boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\right]=0$$

$$\begin{gathered} -\frac{N}{2} \boldsymbol{\Sigma}_{c}^{-1}=-\frac{1}{2} \sum_{i=1}^{N}\left[\boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\right] \\ N \boldsymbol{\Sigma}_{c}^{-1}=\sum_{i=1}^{N}\left[\boldsymbol{\Sigma}_{c}^{-1}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \boldsymbol{\Sigma}_{c}^{-1}\right] \\ N \boldsymbol{\Sigma}_{c}^{-1}=\boldsymbol{\Sigma}_{c}^{-1}\left[\sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}}\right] \boldsymbol{\Sigma}_{c}^{-1} \\ N \boldsymbol{I}=\boldsymbol{\Sigma}_{c}^{-1}\left[\sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}}\right] \end{gathered}$$

$$\boldsymbol{\Sigma}_{c}=\frac{1}{N} \sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{c}\right)^{\mathrm{T}} \Rightarrow \hat{\boldsymbol{\Sigma}}_{c}=\frac{1}{N} \sum_{i=1}^{N}\left(\boldsymbol{x}_{i}-\hat{\boldsymbol{\mu}}_{c}\right)\left(\boldsymbol{x}_{i}-\hat{\boldsymbol{\mu}}_{c}\right)^{\mathrm{T}}$$

This is Eq. (7.13) in the book.
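These two estimates are exactly the sample mean and the biased (1/N) sample covariance, which makes them easy to verify numerically. A minimal sketch on synthetic data (the mean and covariance below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 500, 2
# Hypothetical class-c samples: N i.i.d. draws from a 2-D Gaussian
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=N)

# Eq. (7.12): the MLE of the mean is the sample average
mu_hat = X.mean(axis=0)

# Eq. (7.13): the MLE of the covariance uses the 1/N normalization
diff = X - mu_hat
sigma_hat = diff.T @ diff / N

# np.cov with bias=True applies the same 1/N normalization
assert np.allclose(sigma_hat, np.cov(X.T, bias=True))
```

Note the 1/N factor: the MLE is the biased covariance estimator, not the usual 1/(N-1) sample covariance.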

6.3 Naive Bayes Classifier

The Bayes optimal classifier that minimizes the classification error rate is
$$h^{*}(\boldsymbol{x})=\underset{c \in \mathcal{Y}}{\arg \max }\, P(c \mid \boldsymbol{x})$$
By Bayes' theorem,
$$P(c \mid \boldsymbol{x})=\frac{P(\boldsymbol{x}, c)}{P(\boldsymbol{x})}=\frac{P(c) P(\boldsymbol{x} \mid c)}{P(\boldsymbol{x})}$$
so
$$h^{*}(\boldsymbol{x})=\underset{c \in \mathcal{Y}}{\arg \max } \frac{P(c) P(\boldsymbol{x} \mid c)}{P(\boldsymbol{x})}=\underset{c \in \mathcal{Y}}{\arg \max }\, P(c) P(\boldsymbol{x} \mid c)$$
where the last step holds because $P(\boldsymbol{x})$ does not depend on $c$.
The attribute conditional independence assumption states that
$$P(\boldsymbol{x} \mid c)=P\left(x_{1}, x_{2}, \ldots, x_{d} \mid c\right)=\prod_{i=1}^{d} P\left(x_{i} \mid c\right)$$
where $d$ is the dimension of $\boldsymbol{x}$.

Therefore
$$h^{*}(\boldsymbol{x})=\underset{c \in \mathcal{Y}}{\arg \max }\, P(c) \prod_{i=1}^{d} P\left(x_{i} \mid c\right)$$
This is the decision rule of the naive Bayes classifier.

Here $P(c)$ is the proportion of each class in the sample space. By the law of large numbers, when the training set contains sufficiently many independent and identically distributed samples, $P(c)$ can be estimated by the frequency of each class:
$$P(c)=\frac{\left|D_{c}\right|}{|D|}$$
where $D$ is the training set, $|D|$ is the number of samples in $D$, $D_{c}$ is the set of class-$c$ samples in $D$, and $\left|D_{c}\right|$ is the number of samples in $D_{c}$.

For $P\left(x_{i} \mid c\right)$, if the $i$-th attribute $x_{i}$ is continuous, we assume its values follow a normal distribution, i.e.
$$P\left(x_{i} \mid c\right) \sim \mathcal{N}\left(\mu_{c, i}, \sigma_{c, i}^{2}\right) \Rightarrow P\left(x_{i} \mid c\right)=\frac{1}{\sqrt{2 \pi} \sigma_{c, i}} \exp \left(-\frac{\left(x_{i}-\mu_{c, i}\right)^{2}}{2 \sigma_{c, i}^{2}}\right)$$
The parameters of this normal distribution are obtained by maximum likelihood estimation: $\mu_{c, i}$ and $\sigma_{c, i}^{2}$ are the mean and variance of the $i$-th attribute over the class-$c$ samples.
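For the continuous case, a minimal sketch that estimates $\mu_{c,i}$ and $\sigma_{c,i}^2$ by MLE from a handful of attribute values and then evaluates the assumed Gaussian density (the sample values are made up for illustration):

```python
import math

# Hypothetical values of one continuous attribute over the class-c samples
values = [0.697, 0.774, 0.634, 0.608, 0.556, 0.403, 0.481, 0.437]

# MLE parameters: sample mean and biased (1/N) sample variance
mu = sum(values) / len(values)
var = sum((v - mu) ** 2 for v in values) / len(values)

def gaussian_pdf(x, mu, var):
    """P(x_i | c) under the normality assumption."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

density = gaussian_pdf(0.5, mu, var)
```

Note that the resulting `density` is a probability density, not a probability; it can exceed 1 when the variance is small, which is harmless since only the relative ordering across classes matters in the argmax.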

For $P\left(x_{i} \mid c\right)$, if the $i$-th attribute $x_{i}$ is discrete, maximum likelihood estimation likewise gives the frequency as the estimate of the probability:
$$P\left(x_{i} \mid c\right)=\frac{\left|D_{c, x_{i}}\right|}{\left|D_{c}\right|}$$
where $D_{c, x_{i}}$ is the set of samples in $D_{c}$ whose $i$-th attribute takes the value $x_{i}$.
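Putting these estimates together, a minimal naive Bayes classifier over discrete attributes can be sketched as follows. The toy weather-style dataset is invented for illustration, and no smoothing is applied (a zero frequency zeroes out the whole product, which is why the book later introduces Laplacian correction):

```python
from collections import Counter, defaultdict

# Toy training set (hypothetical): each sample is (attribute tuple, class label)
data = [
    (("sunny", "hot"), "no"), (("sunny", "cool"), "yes"),
    (("rainy", "cool"), "yes"), (("rainy", "hot"), "no"),
    (("sunny", "hot"), "no"), (("rainy", "cool"), "yes"),
]

# P(c) = |D_c| / |D|
class_counts = Counter(label for _, label in data)
prior = {c: n / len(data) for c, n in class_counts.items()}

# Counts for P(x_i | c) = |D_{c,x_i}| / |D_c|, per attribute position i
cond = defaultdict(Counter)  # key (class, i) -> Counter of attribute values
for x, c in data:
    for i, v in enumerate(x):
        cond[(c, i)][v] += 1

def predict(x):
    # h*(x) = argmax_c P(c) * prod_i P(x_i | c)
    def score(c):
        p = prior[c]
        for i, v in enumerate(x):
            p *= cond[(c, i)][v] / class_counts[c]
        return p
    return max(prior, key=score)

assert predict(("sunny", "hot")) == "no"
assert predict(("rainy", "cool")) == "yes"
```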

Example: a 6-sided die is tossed 10 times, producing the outcomes 2, 3, 2, 5, 4, 6, 1, 3, 4, 2. Based on this result, estimate the probability of the die landing on each face.

Solution: let $P_{i}$ denote the probability that the die shows face $i$. By maximum likelihood estimation the likelihood function is
$$L(\theta)=P_{1} \times P_{2}^{3} \times P_{3}^{2} \times P_{4}^{2} \times P_{5} \times P_{6}$$
and the log-likelihood is
$$\begin{aligned} LL(\theta) &=\ln L(\theta)=\ln \left(P_{1} \times P_{2}^{3} \times P_{3}^{2} \times P_{4}^{2} \times P_{5} \times P_{6}\right) \\ &=\ln P_{1}+3 \ln P_{2}+2 \ln P_{3}+2 \ln P_{4}+\ln P_{5}+\ln P_{6} \end{aligned}$$
Since the $P_{i}$ must satisfy the constraint
$$P_{1}+P_{2}+P_{3}+P_{4}+P_{5}+P_{6}=1$$
maximizing the log-likelihood is a constrained optimization problem:
$$\begin{array}{ll} \max & LL(\theta)=\ln P_{1}+3 \ln P_{2}+2 \ln P_{3}+2 \ln P_{4}+\ln P_{5}+\ln P_{6} \\ \text { s.t. } & P_{1}+P_{2}+P_{3}+P_{4}+P_{5}+P_{6}=1 \end{array}$$
Theorem: consider an optimization problem
$$\begin{array}{ll} \min & f(\boldsymbol{x}) \\ \text { s.t. } & g_{i}(\boldsymbol{x}) \leq 0 \quad(i=1, \ldots, m) \\ & h_{j}(\boldsymbol{x})=0 \quad(j=1, \ldots, n) \end{array}$$
If $f(\boldsymbol{x})$, $g_{i}(\boldsymbol{x})$, and $h_{j}(\boldsymbol{x})$ are continuously differentiable, $f(\boldsymbol{x})$ and $g_{i}(\boldsymbol{x})$ are convex, and $h_{j}(\boldsymbol{x})$ is linear, then any point satisfying the following KKT conditions is an optimal solution of the problem:
$$\left\{\begin{array}{l} \nabla_{x} L\left(\boldsymbol{x}^{*}, \boldsymbol{\mu}^{*}, \boldsymbol{\lambda}^{*}\right)=\nabla f\left(\boldsymbol{x}^{*}\right)+\sum_{i=1}^{m} \mu_{i}^{*} \nabla g_{i}\left(\boldsymbol{x}^{*}\right)+\sum_{j=1}^{n} \lambda_{j}^{*} \nabla h_{j}\left(\boldsymbol{x}^{*}\right)=0 \\ h_{j}\left(\boldsymbol{x}^{*}\right)=0 \\ g_{i}\left(\boldsymbol{x}^{*}\right) \leq 0 \\ \mu_{i}^{*} \geq 0 \\ \mu_{i}^{*} g_{i}\left(\boldsymbol{x}^{*}\right)=0 \end{array}\right.$$
【Reference: 王燕军, 梁治安. 最优化基础理论与方法[M]. 复旦大学出版社, 2011.】

By the Lagrange multiplier method, the Lagrangian is
$$\mathcal{L}(\theta, \lambda)=\ln P_{1}+3 \ln P_{2}+2 \ln P_{3}+2 \ln P_{4}+\ln P_{5}+\ln P_{6}+\lambda\left(P_{1}+P_{2}+P_{3}+P_{4}+P_{5}+P_{6}-1\right)$$
Taking the partial derivative of $\mathcal{L}(\theta, \lambda)$ with respect to each $P_{i}$ and setting it to zero, e.g. for $P_{1}$:
$$\frac{\partial \mathcal{L}(\theta, \lambda)}{\partial P_{1}}=\frac{\partial}{\partial P_{1}}\left(\ln P_{1}+\lambda P_{1}\right)=\frac{1}{P_{1}}+\lambda=0 \Rightarrow \lambda=-\frac{1}{P_{1}}$$
Similarly for the other $P_{i}$:
$$\lambda=-\frac{1}{P_{1}}=-\frac{3}{P_{2}}=-\frac{2}{P_{3}}=-\frac{2}{P_{4}}=-\frac{1}{P_{5}}=-\frac{1}{P_{6}}$$
Combining this with the constraint
$$P_{1}+P_{2}+P_{3}+P_{4}+P_{5}+P_{6}=1$$
gives
$$P_{1}=\frac{1}{10}, P_{2}=\frac{3}{10}, P_{3}=\frac{2}{10}, P_{4}=\frac{2}{10}, P_{5}=\frac{1}{10}, P_{6}=\frac{1}{10}$$
In this case the estimated probability of each face equals its observed frequency.
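The worked example can be confirmed in a couple of lines: under the sum-to-one constraint, the MLE reduces to the empirical frequency of each face. A minimal check using the ten rolls above:

```python
from collections import Counter
from fractions import Fraction

rolls = [2, 3, 2, 5, 4, 6, 1, 3, 4, 2]  # the ten observed outcomes

# MLE under the constraint sum(P_i) = 1: empirical frequencies
counts = Counter(rolls)
p_hat = {i: Fraction(counts[i], len(rolls)) for i in range(1, 7)}

assert p_hat[1] == Fraction(1, 10) and p_hat[2] == Fraction(3, 10)
assert sum(p_hat.values()) == 1
```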
