Recall that the maximum entropy model is
$$P_w(y|x)=\frac{1}{Z_w(x)}\exp\Big(\sum_{i=1}^n w_i f_i(x,y)\Big)$$

where
$$Z_w(x)=\sum_y \exp\Big(\sum_{i=1}^n w_i f_i(x,y)\Big)$$

The log-likelihood function is
$$L(w)=\sum_{x,y}\tilde{P}(x,y)\sum_{i=1}^n w_i f_i(x,y)-\sum_x\tilde{P}(x)\log Z_w(x)$$
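The sketch below is a small numerical illustration of these definitions. It evaluates $Z_w(x)$, $P_w(y|x)$ and $L(w)$ on a hypothetical toy problem: two inputs, two labels, two indicator features and a six-example sample, all invented for illustration. It also checks that $L(w)$ as written equals $\sum_{x,y}\tilde{P}(x,y)\log P_w(y|x)$. The helper names (`score`, `Z`, `P`, `log_likelihood`) are assumptions and do not come from the source.

```python
import math
from collections import Counter

# Hypothetical toy setup (not from the source): inputs X, labels Y and
# two indicator feature functions f_i(x, y).
X = [0, 1]
Y = [0, 1]
features = [
    lambda x, y: 1.0 if x == y else 0.0,  # f_1: label matches input
    lambda x, y: 1.0 if y == 1 else 0.0,  # f_2: label is 1
]

# Hypothetical training sample used to build the empirical distributions.
samples = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
N = len(samples)
p_xy = {xy: c / N for xy, c in Counter(samples).items()}             # P~(x, y)
p_x = {x: c / N for x, c in Counter(x for x, _ in samples).items()}  # P~(x)


def score(w, x, y):
    """sum_i w_i * f_i(x, y)"""
    return sum(wi * f(x, y) for wi, f in zip(w, features))


def Z(w, x):
    """Normalizer Z_w(x) = sum_y exp(sum_i w_i f_i(x, y))."""
    return sum(math.exp(score(w, x, y)) for y in Y)


def P(w, y, x):
    """Conditional model P_w(y | x)."""
    return math.exp(score(w, x, y)) / Z(w, x)


def log_likelihood(w):
    """L(w) = sum_{x,y} P~(x,y) sum_i w_i f_i(x,y) - sum_x P~(x) log Z_w(x)."""
    first = sum(p * score(w, x, y) for (x, y), p in p_xy.items())
    second = sum(p * math.log(Z(w, x)) for x, p in p_x.items())
    return first - second


w = [0.5, -0.2]
# The same quantity written as sum_{x,y} P~(x,y) log P_w(y|x)
alt = sum(p * math.log(P(w, y, x)) for (x, y), p in p_xy.items())
print(log_likelihood(w), alt)  # the two values agree up to rounding
```

The two forms agree because $\sum_y\tilde{P}(x,y)=\tilde{P}(x)$ and $\log Z_w(x)$ does not depend on $y$.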
Derivation:
For a given empirical distribution $\tilde{P}(x,y)$, when the model parameters change from $w$ to $w+\delta$, the change in the log-likelihood is
$$
\begin{aligned}
L(w+\delta)-L(w) &= \sum_{x,y}\tilde{P}(x,y)\log P_{w+\delta}(y|x)-\sum_{x,y}\tilde{P}(x,y)\log P_w(y|x)\\
&= \sum_{x,y}\tilde{P}(x,y)\log\bigg(\frac{1}{Z_{w+\delta}(x)}\exp\Big(\sum_{i=1}^n(w_i+\delta_i)f_i(x,y)\Big)\bigg)-\sum_{x,y}\tilde{P}(x,y)\log\bigg(\frac{1}{Z_w(x)}\exp\Big(\sum_{i=1}^n w_i f_i(x,y)\Big)\bigg)\\
&= \sum_{x,y}\tilde{P}(x,y)\Big(\log\frac{1}{Z_{w+\delta}(x)}+\sum_{i=1}^n(w_i+\delta_i)f_i(x,y)\Big)-\sum_{x,y}\tilde{P}(x,y)\Big(\log\frac{1}{Z_w(x)}+\sum_{i=1}^n w_i f_i(x,y)\Big)\\
&= \sum_{x,y}\tilde{P}(x,y)\sum_{i=1}^n\delta_i f_i(x,y)-\sum_{x,y}\tilde{P}(x,y)\log\frac{Z_{w+\delta}(x)}{Z_w(x)}\\
&= \sum_{x,y}\tilde{P}(x,y)\sum_{i=1}^n\delta_i f_i(x,y)-\sum_x\tilde{P}(x)\log\frac{Z_{w+\delta}(x)}{Z_w(x)}
\end{aligned}
$$

The first equality uses $L(w)=\sum_{x,y}\tilde{P}(x,y)\log P_w(y|x)$, which expands to the expression for $L(w)$ given earlier. In the fourth line, the $w_i f_i(x,y)$ terms cancel and $\log\frac{1}{Z_{w+\delta}(x)}-\log\frac{1}{Z_w(x)}=-\log\frac{Z_{w+\delta}(x)}{Z_w(x)}$. The last line holds because this ratio does not depend on $y$, so summing over $y$ gives $\sum_y\tilde{P}(x,y)=\tilde{P}(x)$.
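The final expression can be checked numerically. The sketch below assumes a hypothetical toy problem (invented labels, indicator features and sample, not from the source). It computes $L(w+\delta)-L(w)$ directly and compares the result with the closed form $\sum_{x,y}\tilde{P}(x,y)\sum_i\delta_i f_i(x,y)-\sum_x\tilde{P}(x)\log\frac{Z_{w+\delta}(x)}{Z_w(x)}$. Names such as `delta_L_closed_form` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy problem (not from the source): labels Y,
# indicator features f_i(x, y) and an empirical sample.
Y = [0, 1]
features = [
    lambda x, y: 1.0 if x == y else 0.0,
    lambda x, y: 1.0 if y == 1 else 0.0,
]
samples = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
N = len(samples)
p_xy = {xy: c / N for xy, c in Counter(samples).items()}             # P~(x, y)
p_x = {x: c / N for x, c in Counter(x for x, _ in samples).items()}  # P~(x)


def score(w, x, y):
    """sum_i w_i * f_i(x, y)"""
    return sum(wi * f(x, y) for wi, f in zip(w, features))


def Z(w, x):
    """Normalizer Z_w(x)."""
    return sum(math.exp(score(w, x, y)) for y in Y)


def L(w):
    """Log-likelihood sum_{x,y} P~(x,y) log P_w(y|x)."""
    return sum(p * (score(w, x, y) - math.log(Z(w, x))) for (x, y), p in p_xy.items())


def delta_L_closed_form(w, delta):
    """sum_{x,y} P~(x,y) sum_i delta_i f_i(x,y) - sum_x P~(x) log(Z_{w+delta}(x) / Z_w(x))."""
    w_new = [wi + di for wi, di in zip(w, delta)]
    gain = sum(p * score(delta, x, y) for (x, y), p in p_xy.items())  # score(delta,.) = sum_i delta_i f_i
    penalty = sum(p * math.log(Z(w_new, x) / Z(w, x)) for x, p in p_x.items())
    return gain - penalty


w = [0.5, -0.2]
delta = [0.3, 0.1]
w_new = [wi + di for wi, di in zip(w, delta)]
direct = L(w_new) - L(w)
print(direct, delta_L_closed_form(w, delta))  # equal up to floating-point error
```

The identity is exact, so for any `w` and `delta` the two numbers should differ only by floating-point rounding.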
References:
《统计学习方法》 (*Statistical Learning Methods*), Li Hang, p. 89