Statistical Parameter Estimation

In statistical decision making we assumed the posterior probabilities were known, which is rarely realistic. In practice, we often assume instead that the data approximately follow some distribution (most commonly a Gaussian) and estimate that distribution's parameters from the data.

Maximum Likelihood Estimation

We generally assume that what we observed is what was most likely to occur; intuitively, maximum likelihood estimation therefore seeks the parameters under which the assumed distribution fits the observed data best. Formally: choose θ such that
$$\hat{\theta}=\operatorname{argmax}\,[p(X \mid \theta)]$$
The formula above covers the case of a single observation. With multiple (independent) observations, maximum likelihood maximizes the joint probability of all of them, i.e., the product of the individual probabilities:
$$\hat{\theta}=\operatorname{argmax}\left[\log \prod_{k=1}^{N} p\left(x^{(k)} \mid \theta\right)\right]=\operatorname{argmax}\left[\sum_{k=1}^{N} \log p\left(x^{(k)} \mid \theta\right)\right]$$
Taking the logarithm turns the product into a sum, which makes differentiation much more convenient. Maximum likelihood estimation is, at heart, just an optimization problem.
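To make the "optimization problem" view concrete, here is a minimal Python sketch (my own illustration; the true parameters, sample size, and the use of NumPy/SciPy are assumptions for the example) that recovers a Gaussian's parameters by minimizing the negative log-likelihood numerically:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.5, size=1000)  # samples from an assumed Gaussian

def neg_log_likelihood(params):
    # Optimize log(sigma) so sigma stays positive during the search.
    mu, log_sigma = params
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)  # close to the closed-form MLE: data.mean(), data.std()
```

The numerical optimum agrees with the closed-form answers derived below, which is exactly the point: the log turns the product into a sum that any generic optimizer (or a derivative set to zero) can handle.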

Example: The Multivariate Gaussian

For a multivariate Gaussian (a Gaussian over several variables):
$$p(x ; \mu, \Sigma)=\frac{1}{(2 \pi)^{D / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right)$$
where
$$\mu=\left[\begin{array}{c}\mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{D}\end{array}\right]$$

$$\Sigma=\frac{1}{m} X^{T} X \qquad (\text{with the rows of } X \text{ mean-centered})$$

are, respectively, the mean vector and the covariance matrix. Given N observations, maximum likelihood estimates the mean and covariance so that the log-likelihood
$$E(\theta)=\sum_{k=1}^{N} \log p\left(x_{k} \mid \theta\right)=\sum_{k=1}^{N}\left[-\frac{D}{2} \log (2 \pi)-\frac{1}{2} \log |\Sigma|-\frac{1}{2}\left(x_{k}-\mu\right)^{T} \Sigma^{-1}\left(x_{k}-\mu\right)\right]$$
is maximized, that is:
$$\hat{\theta}=\arg \max_{\theta} E(\theta)$$
1. First, estimate the mean by taking the partial derivative with respect to μ:
$$\frac{\partial E(\theta)}{\partial \mu}=\sum_{k=1}^{N} \Sigma^{-1}\left(x_{k}-\mu\right)=0$$
(using $\frac{\partial\left(X^{T} A X\right)}{\partial X}=\left(A+A^{T}\right) X$ and $\left(\Sigma^{-1}\right)^{T}=\Sigma^{-1}$)

$$\mu=\frac{1}{N} \sum_{k=1}^{N} x_{k}$$

2. Next, estimate the covariance by differentiating with respect to Σ (in fact with respect to Σ⁻¹; this does not change the result, but it makes the derivation simpler):
$$\frac{\partial E(\theta)}{\partial \Sigma^{-1}}=\frac{\partial}{\partial \Sigma^{-1}}\left(\sum_{k=1}^{N} \frac{1}{2} \log \left|\Sigma^{-1}\right|-\frac{1}{2}\left(x_{k}-\mu\right)^{T} \Sigma^{-1}\left(x_{k}-\mu\right)\right)$$
This expression is somewhat involved, so we handle its two terms separately.

First term:
For a symmetric matrix $A$, we have $\left|A^{-1}\right|=|A|^{-1}$ and $\frac{\partial \log |A|}{\partial A}=2 A^{-1}-\operatorname{Diag}\left(A^{-1}\right)$.

Hence:
$$\frac{\partial \log \left|\Sigma^{-1}\right|}{\partial \Sigma^{-1}}=2 \Sigma-\operatorname{Diag}(\Sigma)$$

Second term:
For a symmetric matrix $A$, we have $x^{T} A x=\operatorname{Tr}\left(A x x^{T}\right)$ and $\frac{\partial \operatorname{Tr}(A B)}{\partial A}=B+B^{T}-\operatorname{Diag}(B)$, from which it follows that $\frac{\partial x^{T} A x}{\partial A}=2 x x^{T}-\operatorname{Diag}\left(x x^{T}\right)$.

Hence:
$$\frac{\partial\left(x_{k}-\mu\right)^{T} \Sigma^{-1}\left(x_{k}-\mu\right)}{\partial \Sigma^{-1}}=2\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}-\operatorname{Diag}\left(\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}\right)$$
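Both matrix-derivative identities can be sanity-checked numerically. The sketch below is my own check, not part of the original derivation; it compares each identity against symmetric finite differences, where an off-diagonal perturbation moves $A_{ij}$ and $A_{ji}$ together:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)   # random symmetric positive-definite matrix
x = rng.standard_normal(4)

def sym_grad(f, A, eps=1e-6):
    """Finite-difference gradient of f over the free entries of a symmetric
    matrix: off-diagonal perturbations change A_ij and A_ji simultaneously."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(i, A.shape[1]):
            dA = np.zeros_like(A)
            dA[i, j] = dA[j, i] = eps
            G[i, j] = G[j, i] = (f(A + dA) - f(A - dA)) / (2 * eps)
    return G

Ainv = np.linalg.inv(A)
xxT = np.outer(x, x)

# Identity 1: d log|A| / dA = 2 A^{-1} - Diag(A^{-1})
g1 = sym_grad(lambda M: np.log(np.linalg.det(M)), A)
assert np.allclose(g1, 2 * Ainv - np.diag(np.diag(Ainv)), atol=1e-4)

# Identity 2: d (x^T A x) / dA = 2 x x^T - Diag(x x^T)
g2 = sym_grad(lambda M: x @ M @ x, A)
assert np.allclose(g2, 2 * xxT - np.diag(np.diag(xxT)), atol=1e-4)
```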

Combining the two terms:
$$\begin{aligned}\frac{\partial E(\theta)}{\partial \Sigma^{-1}} &=\frac{1}{2} \sum_{k=1}^{N}\left[2 \Sigma-2\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}-\operatorname{Diag}(\Sigma)+\operatorname{Diag}\left(\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}\right)\right] \\ &=\frac{1}{2} \sum_{k=1}^{N}\left[2\left(\Sigma-\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}\right)-\operatorname{Diag}\left(\Sigma-\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}\right)\right]=0\end{aligned}$$
(using the fact that $\operatorname{Diag}(A)+\operatorname{Diag}(B)=\operatorname{Diag}(A+B)$)
Solving this gives:
$$\sum_{k=1}^{N}\left[\Sigma-\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}\right]=0 \;\Rightarrow\; \Sigma=\frac{1}{N} \sum_{k=1}^{N}\left(x_{k}-\mu\right)\left(x_{k}-\mu\right)^{T}$$
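The two closed-form estimates are easy to compute directly. Here is a minimal NumPy sketch (the ground-truth values are invented for illustration) that applies them to a large sample:

```python
import numpy as np

rng = np.random.default_rng(42)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=5000)  # N x D samples

mu_hat = X.mean(axis=0)                     # MLE mean: (1/N) sum_k x_k
centered = X - mu_hat
Sigma_hat = centered.T @ centered / len(X)  # MLE covariance (divides by N)

print(mu_hat)     # close to mu_true
print(Sigma_hat)  # close to Sigma_true; matches np.cov(X.T, bias=True)
```

Note that the MLE covariance divides by N, not N−1, which connects directly to the bias discussion in the next section.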

Bias and Variance of an Estimator

Parts of this section draw on a Zhihu article by @大海啊你全是水.

Bias of an Estimator

The bias is the difference between the expected value of the estimator and the true parameter value:
$$\operatorname{bias}\left(\hat{\theta}_{m}\right)=\mathbb{E}\left(\hat{\theta}_{m}\right)-\theta$$
Take the Gaussian as an example. Its maximum likelihood parameter estimates are:
$$\hat{\mu}=\bar{x}, \quad \hat{\sigma}^{2}=\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\hat{\mu}\right)^{2}$$
For the derivation, see the blog of @我是8位的.

The bias of the mean estimate is:
$$\begin{aligned}\operatorname{bias}\left(\hat{\mu}_{m}\right) &=\mathbb{E}\left(\hat{\mu}_{m}\right)-\mu=\mathbb{E}\left[\frac{1}{m} \sum_{i=1}^{m} x^{(i)}\right]-\mu \\ &=\left(\frac{1}{m} \sum_{i=1}^{m} \mathbb{E}\left[x^{(i)}\right]\right)-\mu=\left(\frac{1}{m} \sum_{i=1}^{m} \mu\right)-\mu \\ &=\mu-\mu=0\end{aligned}$$
So the mean estimator is unbiased. The bias of the variance estimate is:
$$\operatorname{bias}\left(\hat{\sigma}_{m}^{2}\right)=\mathbb{E}\left[\hat{\sigma}_{m}^{2}\right]-\sigma^{2}=\frac{m-1}{m} \sigma^{2}-\sigma^{2}=-\frac{\sigma^{2}}{m}$$
The expectation $\mathbb{E}\left[\hat{\sigma}_{m}^{2}\right]$ is computed as follows:
$$\begin{aligned}\mathbb{E}\left[\hat{\sigma}_{m}^{2}\right] &=\mathbb{E}\left[\frac{1}{m} \sum_{i=1}^{m}\left(x^{(i)}-\hat{\mu}_{m}\right)^{2}\right] \\ &=\mathbb{E}\left\{\frac{1}{m} \sum_{i=1}^{m}\left[\left(x^{(i)}\right)^{2}-2 x^{(i)} \hat{\mu}_{m}+\hat{\mu}_{m}^{2}\right]\right\} \\ &=\mathbb{E}\left\{\frac{1}{m} \sum_{i=1}^{m}\left(x^{(i)}\right)^{2}-\frac{2 \hat{\mu}_{m}}{m} \sum_{i=1}^{m} x^{(i)}+\frac{1}{m} \sum_{i=1}^{m} \hat{\mu}_{m}^{2}\right\} \\ &=\mathbb{E}\left\{\frac{1}{m} \sum_{i=1}^{m}\left(x^{(i)}\right)^{2}-2 \hat{\mu}_{m}^{2}+\hat{\mu}_{m}^{2}\right\} \\ &=\mathbb{E}\left\{\frac{1}{m} \sum_{i=1}^{m}\left(x^{(i)}\right)^{2}-\hat{\mu}_{m}^{2}\right\} \\ &=\frac{1}{m} \sum_{i=1}^{m} \mathbb{E}\left[\left(x^{(i)}\right)^{2}\right]-\mathbb{E}\left(\hat{\mu}_{m}^{2}\right)\end{aligned}$$

By the definition of variance:
$$\operatorname{Var}(x)=\mathbb{E}\left\{[x-\mathbb{E}(x)]^{2}\right\}=\mathbb{E}\left\{x^{2}-2 x \mathbb{E}(x)+[\mathbb{E}(x)]^{2}\right\}=\mathbb{E}\left(x^{2}\right)-[\mathbb{E}(x)]^{2}$$

$$\mathbb{E}\left(x^{2}\right)=\operatorname{Var}(x)+[\mathbb{E}(x)]^{2}$$

Therefore:
$$\begin{aligned}\mathbb{E}\left[\left(x^{(i)}\right)^{2}\right] &=\operatorname{Var}\left[x^{(i)}\right]+\left\{\mathbb{E}\left[x^{(i)}\right]\right\}^{2}=\sigma^{2}+\mu^{2} \\ \mathbb{E}\left(\hat{\mu}_{m}^{2}\right) &=\operatorname{Var}\left(\hat{\mu}_{m}\right)+\left[\mathbb{E}\left(\hat{\mu}_{m}\right)\right]^{2}=\operatorname{Var}\left(\frac{1}{m} \sum_{i=1}^{m} x^{(i)}\right)+\mu^{2} \\ &=\frac{1}{m^{2}} \sum_{i=1}^{m} \operatorname{Var}\left[x^{(i)}\right]+\mu^{2}=\frac{\sigma^{2}}{m}+\mu^{2}\end{aligned}$$
which finally gives the expectation of the variance estimate:
$$\mathbb{E}\left[\hat{\sigma}_{m}^{2}\right]=\sigma^{2}+\mu^{2}-\frac{\sigma^{2}}{m}-\mu^{2}=\frac{m-1}{m} \sigma^{2}$$
So the variance estimator is biased.
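The bias $-\sigma^{2}/m$ is easy to see empirically. Here is a small Monte Carlo sketch (the sample size and true variance below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials, sigma2 = 5, 200_000, 4.0

# Many independent samples of size m from N(0, sigma2).
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, m))
var_mle = samples.var(axis=1, ddof=0)  # MLE variance: (1/m) sum (x - x_bar)^2

print(var_mle.mean())        # ~ (m-1)/m * sigma2 = 3.2: biased low
print((m - 1) / m * sigma2)  # theoretical expectation derived above
```

Dividing by m−1 instead of m (ddof=1, the familiar "sample variance") removes this bias.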

Variance of an Estimator

The variance of an estimator is, as the name suggests, simply:
$$\operatorname{Var}(\hat{\theta})$$
In most cases, bias and variance are like family and career: you can rarely have both at once, as the figure below illustrates.

(Figure: the bias-variance tradeoff)

Bayesian Estimation

Maximum likelihood treats the parameter θ as fixed but unknown, whereas Bayesian estimation treats θ as a random variable. We may have a rough idea of its distribution, i.e., a prior p(θ); the observed samples then allow us to derive the posterior density p(θ|X).
Below, we take a simple Gaussian as an example.

First, the Gaussian density:
$$f(x)=\frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right)$$
Suppose we observe the following samples from a Gaussian whose variance is known, and we want to estimate its mean μ:
$$X=\left\{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\right\}$$
We also roughly know that μ follows a Gaussian (the prior):
$$p_{0}(\theta)=\frac{1}{\sqrt{2 \pi}\, \sigma_{0}} \exp \left(-\frac{1}{2 \sigma_{0}^{2}}\left(\theta-\mu_{0}\right)^{2}\right)$$
Bayes' rule then gives the posterior density p(θ|X) of μ:
$$\begin{aligned}p(\theta \mid X) &=\frac{p(X \mid \theta)\, p_{0}(\theta)}{p(X)}=\frac{p_{0}(\theta)}{p(X)} \prod_{k=1}^{N} p\left(x^{(k)} \mid \theta\right) \\ &=\frac{1}{\sqrt{2 \pi}\, \sigma_{0}} \exp \left(-\frac{1}{2 \sigma_{0}^{2}}\left(\theta-\mu_{0}\right)^{2}\right) \frac{1}{p(X)} \prod_{k=1}^{N}\left[\frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-\frac{1}{2 \sigma^{2}}\left(x^{(k)}-\theta\right)^{2}\right)\right]\end{aligned}$$
We typically choose the θ with the largest posterior probability, i.e.:
$$\hat{\theta}=\arg \max_{\theta}\, p(\theta \mid X)$$
The parameter θ is then obtained by solving this optimization problem. Continuing the example, take the logarithm of p(θ|X) (to avoid differentiating a long product) and set the partial derivative with respect to θ = μ to zero:
$$\frac{\partial}{\partial \mu} \log p(\theta \mid X)=\frac{\partial}{\partial \mu}\left(-\frac{1}{2 \sigma_{0}^{2}}\left(\mu-\mu_{0}\right)^{2}+\sum_{k=1}^{N} \frac{-1}{2 \sigma^{2}}\left(x^{(k)}-\mu\right)^{2}\right)=0$$
Solving for μ gives:

$$\mu_{N}=\underbrace{\frac{\sigma^{2}}{\sigma^{2}+N \sigma_{0}^{2}} \mu_{0}}_{\text{prior}}+\underbrace{\frac{N \sigma_{0}^{2}}{\sigma^{2}+N \sigma_{0}^{2}} \cdot \frac{1}{N} \sum_{k=1}^{N} x^{(k)}}_{\text{maximum likelihood}}$$

The first term is the prior contribution and the second the likelihood contribution. With few observations, the prior dominates; as observations accumulate, the likelihood takes over.
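A short sketch of this prior-versus-likelihood balance (the true mean, known σ, and prior hyperparameters below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma = 5.0, 2.0   # data-generating mean (unknown to us) and known sigma
mu0, sigma0 = 0.0, 1.0      # Gaussian prior on the mean

for N in (1, 10, 1000):
    x = rng.normal(mu_true, sigma, size=N)
    denom = sigma**2 + N * sigma0**2
    mu_N = (sigma**2 / denom) * mu0 + (N * sigma0**2 / denom) * x.mean()
    print(N, round(mu_N, 3))  # small N: near the prior mu0; large N: near x.mean()
```

With a single observation the estimate is pulled strongly toward the prior mean μ₀; by N = 1000 it is essentially the sample mean, matching the formula above.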
