[Statistical Learning Notes] Exercises, Chapter 1
1.1.1 Maximum likelihood estimation for the Bernoulli distribution
$$P(X=1)=\theta,\quad P(X=0)=1-\theta$$
Suppose the random variable takes the value 1 in k trials and the value 0 in n−k trials. The likelihood function is:
$$L(\theta)=\prod_{i=1}^{n}P(x_i;\theta)=\theta^k(1-\theta)^{n-k}$$
Taking the logarithm:
$$\log L(\theta)=k\log\theta+(n-k)\log(1-\theta)$$
Differentiating:
$$\frac{\partial\log L(\theta)}{\partial\theta}=\frac{k}{\theta}-\frac{n-k}{1-\theta}$$
The derivative vanishes at $\theta=k/n$ (and the log-likelihood is concave there), so the maximum likelihood estimate of $\theta$ is $\hat\theta=k/n$.
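As a quick numerical sanity check of the derivation above (a sketch with made-up values of n and k), we can maximize the log-likelihood over a grid of θ values and confirm the maximizer is k/n:

```python
import math

def bernoulli_log_likelihood(theta, k, n):
    """log L(theta) = k*log(theta) + (n-k)*log(1-theta)."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

# Hypothetical sample: n = 100 trials with k = 37 ones.
n, k = 100, 37

# Scan theta over a fine grid in (0, 1) and keep the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: bernoulli_log_likelihood(t, k, n))

print(theta_hat)  # 0.37, i.e. exactly k/n
```

The grid search lands on 0.37 = k/n, matching the closed-form result.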
1.1.2 Bayesian estimation
By Bayes' theorem:
$$P(\theta|A_1,A_2,\cdots,A_n)=\frac{P(A_1,A_2,\cdots,A_n|\theta)\,P(\theta)}{P(A_1,A_2,\cdots,A_n)}$$
The Bayesian (maximum a posteriori) estimate of $\theta$ is:
$$\begin{aligned}\hat{\theta}&=\arg\max_\theta P(\theta|A_1,A_2,\cdots,A_n)\\&=\arg\max_\theta P(A_1,A_2,\cdots,A_n|\theta)\,P(\theta)\\&=\arg\max_\theta\ \theta^k(1-\theta)^{n-k}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\end{aligned}$$
Taking the derivative and setting it to zero gives:
$$\hat\theta=\frac{k+(\alpha-1)}{n+(\alpha-1)+(\beta-1)}$$
where $\alpha,\beta$ are the parameters of the Beta prior $\mathrm{Beta}(\alpha,\beta)$.
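The MAP formula can be checked numerically the same way (a sketch; n, k, α, β are all made-up values), by maximizing the unnormalized log posterior over a grid and comparing with the closed form:

```python
import math

def log_posterior(theta, k, n, alpha, beta):
    """Unnormalized log posterior of theta under a Beta(alpha, beta) prior:
    (k + alpha - 1)*log(theta) + (n - k + beta - 1)*log(1 - theta)."""
    return (k + alpha - 1) * math.log(theta) + (n - k + beta - 1) * math.log(1 - theta)

n, k = 100, 37          # hypothetical sample, as before
alpha, beta = 2.0, 2.0  # hypothetical Beta prior parameters

grid = [i / 1000 for i in range(1, 1000)]
theta_map = max(grid, key=lambda t: log_posterior(t, k, n, alpha, beta))

# Closed-form MAP estimate derived above.
closed_form = (k + alpha - 1) / (n + (alpha - 1) + (beta - 1))
print(theta_map, closed_form)
```

With a Beta(2, 2) prior the estimate is pulled slightly toward 1/2 relative to the MLE k/n, as expected.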
1.2 Maximum likelihood estimation as a special case of empirical risk minimization
Empirical risk minimization solves the optimization problem:
$$\min_{f\in\mathcal{F}}\frac{1}{N}\sum_{i=1}^{N}L(y_i,f(x_i))$$
When the model is a conditional probability distribution and the loss function is the log loss, this problem becomes:
$$\min_{\theta\in\Theta}-\frac{1}{N}\sum_{i=1}^{N}\log P(y_i|x_i;\theta)$$
which is equivalent to maximum likelihood estimation:
$$\max_{\theta\in\Theta}\frac{1}{N}\sum_{i=1}^{N}\log P(y_i|x_i;\theta)$$
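The equivalence can be illustrated concretely (a sketch with a made-up binary sample): minimizing the average log loss of a constant Bernoulli model $P(y=1)=p$ over a grid recovers exactly the MLE, which is the sample mean:

```python
import math

def empirical_risk(p, ys):
    """Average log loss of a constant Bernoulli model P(y=1) = p."""
    return -sum(math.log(p) if y == 1 else math.log(1 - p) for y in ys) / len(ys)

# Hypothetical binary sample with six ones out of ten.
ys = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

grid = [i / 1000 for i in range(1, 1000)]
p_erm = min(grid, key=lambda p: empirical_risk(p, ys))

print(p_erm)  # 0.6, the MLE (mean of ys)
```

The empirical-risk minimizer and the maximum likelihood estimate coincide, since one objective is just the negation of the other.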