Note: written while groggy; corrections are welcome.
I. The Exponential Family
In statistical applications there are two important parametric families: exponential families and location-scale families.
For exponential family distributions, the response variable $Y$ no longer has to be described by a normal distribution.
-
Definition of the probability density function
Suppose we have a sequence of observations $\{x_i, y_i\}_{i=1}^n$:

$$f(y_i \mid x_i; \beta, \phi) = \exp\left\{\frac{y_i\eta_i - b(\eta_i)}{\phi} + c(y_i, \phi)\right\} \triangleq f(y_i \mid \eta_i, \phi)$$
Here $\eta_i$ is the natural parameter and $\phi$ is the scale parameter.
Remark 1: In the canonical case, $\eta_i = \beta^T x_i$.
Remark 2:
** $\eta_i = (g \circ \mu)(\eta_i) = (g \circ \mu)(\beta^T x_i) = \beta^T x_i$;
** $\mu$ is the activation function (e.g. the Sigmoid function in logistic regression, LR); $g$ is called the link function (e.g. the Logit function in LR, or the Probit function in probit regression). $g$ and $\mu$ are inverses of each other: $g = \mu^{-1}$.
** In fact $\mu(\eta_i) = E(y_i \mid x_i)$ here, which can be checked in the examples that follow.
** The link function transforms the expectation of $Y$ so that the transformed value is linear in $x$: $g(E[y_i \mid x_i]) = \beta^T x_i$.
** The activation function maps the linear prediction into the range of values that the mean of the dependent variable can take.
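As a rough illustration of Remark 2, the sketch below computes a linear predictor $\eta = \beta^T x$, maps it through the Sigmoid activation, and checks that the Logit link recovers $\eta$. The coefficient and feature values are hypothetical and chosen only for the example; the function names are not from any library.

```python
import math

def linear_predictor(beta, x):
    """Return eta = beta^T x for two plain Python sequences."""
    return sum(b * xi for b, xi in zip(beta, x))

def sigmoid(eta):
    """Activation (mean) function mu for the logistic case."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(t):
    """Link function g = mu^{-1} for the logistic case."""
    return math.log(t / (1.0 - t))

beta = [0.5, -1.0, 2.0]  # hypothetical coefficients
x = [1.0, 0.3, 0.8]      # hypothetical features (first entry acts as intercept)

eta = linear_predictor(beta, x)  # natural parameter in the canonical case
mean = sigmoid(eta)              # mu(eta) = E[y | x], a value in (0, 1)
recovered = logit(mean)          # g(mu(eta)) should give eta back
```

Applying the link to the mean returns the linear predictor, which is the statement $g(E[y_i \mid x_i]) = \beta^T x_i$ in numerical form.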
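Remark 3: $b(\eta)$ is the log-partition function (the logarithm of the normalizing constant); its derivative $b'(\eta)$ is the mean of $Y$.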
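Remark 4: The formula above is the probability density function of the random variable $Y$. The samples $y_i$ may come from, for example, a Bernoulli, binomial, Gaussian, or Poisson distribution.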
Remark 5: The density above is a simplified form. The more general form is $p(Y \mid \eta) = h(Y)\exp\{\eta^T \phi(Y) - A(\eta)\}$, where $\phi(Y)$ is the sufficient statistic (not the scale parameter $\phi$ used above) and $A(\eta)$ is the log-partition function, which does not depend on the random variable $Y$.
-
Mean and variance of the exponential family
Because the density integrates to 1, the score has zero mean (assuming differentiation and integration can be interchanged):

$$\begin{aligned} E\left[\frac{\partial}{\partial \eta_i}\log f(y_i \mid \eta_i, \phi)\right] =& \int \frac{1}{f(y_i \mid \eta_i, \phi)} \cdot \frac{\partial f(y_i \mid \eta_i, \phi)}{\partial \eta_i} \cdot f(y_i \mid \eta_i, \phi)\,dy_i \\ =& \frac{\partial}{\partial \eta_i}\int f(y_i \mid \eta_i, \phi)\,dy_i \\ =& 0 \\ =& E\left[\frac{y_i - b'(\eta_i)}{\phi}\right] \end{aligned}$$
This gives the expectation of the random variable $Y$: $E(Y) = b'(\eta_i)$.
The previous post, "Fisher Information and the Cramer-Rao Inequality", showed that:

$$E\left[\frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2}\right] = -E\left\{\left(\frac{\partial \ln f(x; \theta)}{\partial \theta}\right)^2\right\}$$
$$\begin{aligned} E\left[\left(\frac{y_i - b'(\eta_i)}{\phi}\right)^2\right] =& E\left[\left(\frac{\partial}{\partial \eta_i}\log f(y_i \mid \eta_i, \phi)\right)^2\right] \\ =& -E\left[\frac{\partial^2}{\partial \eta_i^2}\log f(y_i \mid \eta_i, \phi)\right] \\ =& E\left[\frac{b''(\eta_i)}{\phi}\right] \end{aligned}$$
Hence:

$$Var(Y) = E\left[(Y - b'(\eta_i))^2\right] = \phi \cdot b''(\eta_i)$$
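The two identities $E(Y) = b'(\eta)$ and $Var(Y) = \phi \cdot b''(\eta)$ can be sketched numerically with finite differences. The helper `mean_and_variance` and the example function `b_example` (here $b(\eta) = \log(1 + e^{\eta})$, the Bernoulli case) are illustrative names assumed for this sketch, and the step size `h` is an arbitrary choice.

```python
import math

def mean_and_variance(b, eta, phi, h=1e-4):
    """Approximate E[Y] = b'(eta) and Var(Y) = phi * b''(eta) by central differences."""
    b_prime = (b(eta + h) - b(eta - h)) / (2.0 * h)
    b_double_prime = (b(eta + h) - 2.0 * b(eta) + b(eta - h)) / (h * h)
    return b_prime, phi * b_double_prime

def b_example(eta):
    """Assumed example: log-partition function of the Bernoulli distribution."""
    return math.log(1.0 + math.exp(eta))

# At eta = 0 the Bernoulli parameter is p = 0.5, so we expect
# a mean close to 0.5 and a variance close to 0.25.
mean, var = mean_and_variance(b_example, eta=0.0, phi=1.0)
```

Only $b$ and $\phi$ are needed to get the first two moments, which is why the examples in the next section reduce to identifying $\eta$, $\phi$, and $b(\eta)$.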
II. Examples of Exponential Family Distributions
(1) Bernoulli Distribution ~ $B(1, p)$
Known: for a Bernoulli random variable $Y$, $E(Y) = p$ and $Var(Y) = p(1-p)$.
$$\begin{aligned} p_i^{y_i}(1-p_i)^{1-y_i} =& \exp\{y_i\log(p_i) + (1-y_i)\log(1-p_i)\} \\ =& \exp\left\{y_i\log\left(\frac{p_i}{1-p_i}\right) - [-\log(1-p_i)]\right\} \end{aligned}$$
We obtain:

$\phi = 1$

$\eta_i = \log\left(\frac{p_i}{1-p_i}\right)$ => $p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}} = \frac{1}{1+e^{-\eta_i}} = \frac{1}{1+e^{-\beta^T x_i}}$

$b(\eta_i) = -\log(1-p_i) = \log(1+e^{\eta_i})$
Next we verify the relationship between $b(\eta)$ and the mean and variance of the distribution:

$b'(\eta_i) = \frac{e^{\eta_i}}{1+e^{\eta_i}} = p_i = E[Y]$

$b''(\eta_i) \cdot \phi = b''(\eta_i) = \frac{e^{\eta_i}}{(1+e^{\eta_i})^2} = p_i(1-p_i) = Var(Y)$
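The Bernoulli verification can also be done by brute force: a minimal sketch that evaluates the exponential family form $\exp\{y\eta - b(\eta)\}$ at $y \in \{0, 1\}$ and computes the moments directly. The value of `eta` is arbitrary and the function names are assumed for this example.

```python
import math

def bernoulli_density(y, eta):
    """f(y | eta) = exp{y * eta - b(eta)} with b(eta) = log(1 + e^eta) and phi = 1."""
    return math.exp(y * eta - math.log(1.0 + math.exp(eta)))

def moments(eta):
    """Exact mean and variance obtained by enumerating y in {0, 1}."""
    probs = {y: bernoulli_density(y, eta) for y in (0, 1)}
    mean = sum(y * w for y, w in probs.items())
    var = sum((y - mean) ** 2 * w for y, w in probs.items())
    return mean, var

eta = 0.7                          # arbitrary natural parameter
p = 1.0 / (1.0 + math.exp(-eta))   # p = b'(eta)
total = bernoulli_density(0, eta) + bernoulli_density(1, eta)  # should be 1
mean, var = moments(eta)           # should match p and p * (1 - p)
```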
Note 1: In the expressions above, the link function $g = \log\left(\frac{t}{1-t}\right)$ is called the Logit function, and the activation function $\mu = \frac{1}{1+e^{-\eta}}$ is called the Sigmoid function.
Note 2: In the Bernoulli distribution the dependent variable takes values in $\{0, 1\}$, and what we need to predict is the probability of each value, which lies in $[0,1]$. The Sigmoid function above maps $(-\infty, +\infty)$ into $[0,1]$.
Note 3: Besides the Sigmoid activation, the cumulative distribution function of the normal distribution, $\Phi(x) \in [0,1]$, can also be used as the activation to map $(-\infty, +\infty)$ into $[0,1]$. In that case the link function is called the Probit function, written $\Phi^{-1}(t)$.
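To compare the two choices in Notes 1 and 3, here is a small sketch that implements both activation/link pairs with the Python standard library and checks that each link inverts its activation. The function names are chosen for this example; `statistics.NormalDist` supplies $\Phi$ and $\Phi^{-1}$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def sigmoid(eta):
    """Logistic activation: maps the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(t):
    """Logit link, the inverse of sigmoid."""
    return math.log(t / (1.0 - t))

def probit_activation(eta):
    """Normal CDF Phi(eta): an alternative activation into (0, 1)."""
    return _STD_NORMAL.cdf(eta)

def probit(t):
    """Probit link Phi^{-1}(t), the inverse of the normal CDF."""
    return _STD_NORMAL.inv_cdf(t)

# Round-trip errors for a few arbitrary linear-predictor values.
etas = (-3.0, 0.0, 2.0)
logit_errors = [abs(logit(sigmoid(e)) - e) for e in etas]
probit_errors = [abs(probit(probit_activation(e)) - e) for e in etas]
```

Both activations give 0.5 at $\eta = 0$; they differ mainly in how quickly the tails approach 0 and 1.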
(2) Binomial Distribution ~ $B(n, p)$
Known: for a binomial random variable $Y$, $E(Y) = np$ and $Var(Y) = np(1-p)$.
$$\begin{aligned} C_n^{y_i} p_i^{y_i}(1-p_i)^{n-y_i} =& \exp\left\{y_i\log\left(\frac{p_i}{1-p_i}\right) + n\log(1-p_i) + \log C_n^{y_i}\right\} \\ =& \exp\left\{y_i\log\left(\frac{p_i}{1-p_i}\right) - n[-\log(1-p_i)] + \log C_n^{y_i}\right\} \end{aligned}$$
For the random variable $y_i \sim B(n, p_i)$ above, rescale it to the proportion $y_i/n$. From here on $y_i$ denotes this proportion, so that $y_i \in \{0, 1/n, 2/n, \cdots, 1\}$ and the original count is $ny_i$:

$$\begin{aligned} C_n^{ny_i} p_i^{ny_i}(1-p_i)^{n-ny_i} =& \exp\left\{ny_i\log\left(\frac{p_i}{1-p_i}\right) + n\log(1-p_i) + \log C_n^{ny_i}\right\} \\ =& \exp\left\{\frac{y_i\log\left(\frac{p_i}{1-p_i}\right) - (-\log(1-p_i))}{\frac{1}{n}} + \log C_n^{ny_i}\right\} \end{aligned}$$
We obtain:

$\phi = \frac{1}{n}$

$\eta_i = \log\left(\frac{p_i}{1-p_i}\right)$ => $p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}} = \frac{1}{1+e^{-\eta_i}}$

$b(\eta_i) = -\log(1-p_i) = \log(1+e^{\eta_i})$ => $b'(\eta_i) = \frac{e^{\eta_i}}{1+e^{\eta_i}} = p_i$ => $E[B(n,p_i)/n] = p_i$ => $E[B(n,p_i)] = np_i$
$b''(\eta_i) \cdot \phi = b''(\eta_i)/n = \frac{e^{\eta_i}}{n(1+e^{\eta_i})^2} = p_i(1-p_i)/n$ => $Var(B(n,p_i)/n) = p_i(1-p_i)/n$ => $Var(B(n,p_i)) = n^2 \cdot p_i(1-p_i)/n = np_i(1-p_i)$
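The role of $\phi = 1/n$ for the rescaled binomial can be checked by exact enumeration. The sketch below is illustrative only: `n` and `p` are arbitrary values and `proportion_moments` is a name assumed for this example.

```python
import math

def proportion_moments(n, p):
    """Exact mean and variance of Y/n for Y ~ B(n, p), by enumerating the pmf."""
    pmf = [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

n, p = 10, 0.3                       # arbitrary example values
eta = math.log(p / (1.0 - p))        # natural parameter (logit of p)
phi = 1.0 / n                        # scale parameter of the proportion
b_prime = math.exp(eta) / (1.0 + math.exp(eta))
b_double_prime = math.exp(eta) / (1.0 + math.exp(eta)) ** 2

mean, var = proportion_moments(n, p)  # compare with b_prime and phi * b_double_prime
count_var = n * n * var               # variance of the count Y itself
```

Multiplying the variance of the proportion by $n^2$ recovers the familiar $np(1-p)$ for the count.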
Note: In the expressions above, the link function is $g = \log\left(\frac{t}{1-t}\right)$ and the activation function is $\mu = \frac{1}{1+e^{-\eta}}$.
(3) Normal Distribution ~ $N(\mu, \sigma^2)$
Known: for a normal random variable $Y$, $E(Y) = \mu$ and $Var(Y) = \sigma^2$.
$$\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\} = \exp\left\{\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right\}$$
We obtain:

$\phi = \sigma^2$

$\eta = \mu$

$b(\eta) = \frac{\mu^2}{2} = \frac{\eta^2}{2}$ => $b'(\eta) = \eta = \mu$ => $E[y] = \mu$

$b''(\eta) \cdot \phi = 1 \cdot \sigma^2 = \sigma^2$ => $Var(y) = \sigma^2$
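As a sanity check on the identification $\phi = \sigma^2$, $\eta = \mu$, $b(\eta) = \eta^2/2$, the sketch below evaluates the exponential family form and compares it with the normal density from the standard library. The parameter values and test points are arbitrary, and `normal_expfam_density` is a name assumed for this example.

```python
import math
from statistics import NormalDist

def normal_expfam_density(y, mu, sigma):
    """Normal density written as exp{(y*eta - b(eta)) / phi + c(y, phi)}."""
    phi = sigma**2                    # scale parameter
    eta = mu                          # natural parameter
    b = eta**2 / 2.0                  # log-partition function
    c = -(y**2) / (2.0 * phi) - 0.5 * math.log(2.0 * math.pi * phi)
    return math.exp((y * eta - b) / phi + c)

mu, sigma = 1.5, 2.0                  # arbitrary example parameters
reference = NormalDist(mu, sigma)     # NormalDist takes the standard deviation

# Largest absolute gap between the two forms over a few test points.
max_gap = max(
    abs(normal_expfam_density(y, mu, sigma) - reference.pdf(y))
    for y in (-3.0, 0.0, 1.5, 4.2)
)
```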
Note: In the expressions above, $\eta_i = \mu(\eta_i) = g(\eta_i)$; both $\mu$ and $g$ are the identity map (identity link).
(4) Poisson Distribution ~ $P(\theta)$
Known: for a Poisson random variable $Y$, $E(Y) = \theta$ and $Var(Y) = \theta$.
$$\frac{\theta^y e^{-\theta}}{y!} = \exp\{y\log(\theta) - \theta - \log(y!)\}$$
We obtain:

$\phi = 1$

$\eta = \log(\theta)$ => $\theta = e^{\eta}$

$b(\eta) = \theta = e^{\eta}$ => $b'(\eta) = e^{\eta} = \theta$ => $E[y] = \theta$

$b''(\eta) \cdot \phi = e^{\eta} = \theta$ => $Var(y) = \theta$
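The Poisson case can be checked in the same spirit. This sketch sums the exponential family form $\exp\{y\eta - e^{\eta} - \log(y!)\}$ over a truncated range of $y$; the cutoff `terms=200` is an assumption that is ample for small $\theta$ but would need to grow for large $\theta$.

```python
import math

def poisson_moments(eta, terms=200):
    """Mean and variance from f(y | eta) = exp{y*eta - e^eta - log(y!)}, truncated."""
    probs = [
        math.exp(y * eta - math.exp(eta) - math.lgamma(y + 1))  # lgamma(y+1) = log(y!)
        for y in range(terms)
    ]
    mean = sum(y * w for y, w in enumerate(probs))
    var = sum((y - mean) ** 2 * w for y, w in enumerate(probs))
    return mean, var

theta = 3.0                 # arbitrary example rate
eta = math.log(theta)       # natural parameter
mean, var = poisson_moments(eta)  # both should be close to theta
```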
Note: In the expressions above, the link function is $g = \log(t)$ and the activation function is $\mu = e^{\eta}$.
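Collecting the examples, the activation $\mu = b'$ and the link $g = \mu^{-1}$ for each distribution can be put in one small table and checked for consistency. This is only a summary sketch; the dictionary `CANONICAL` and the helper `round_trip_error` are names assumed here, and the binomial (as a proportion) shares the Bernoulli entry.

```python
import math

# name -> (activation mu(eta) = b'(eta), link g(t) = mu^{-1}(t))
CANONICAL = {
    "bernoulli": (lambda eta: 1.0 / (1.0 + math.exp(-eta)),
                  lambda t: math.log(t / (1.0 - t))),
    "normal": (lambda eta: eta,
               lambda t: t),
    "poisson": (lambda eta: math.exp(eta),
                lambda t: math.log(t)),
}

def round_trip_error(name, eta):
    """Return |g(mu(eta)) - eta|, which should be zero up to rounding."""
    activation, link = CANONICAL[name]
    return abs(link(activation(eta)) - eta)

# Worst-case error per distribution over a few arbitrary eta values.
errors = {
    name: max(round_trip_error(name, eta) for eta in (-2.0, 0.0, 1.5))
    for name in CANONICAL
}
```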