$$P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$
where the normalization factor is:
$$Z_w(x) = \sum_{y} \exp\left(\sum_{i=1}^{n} w_i f_i(x, y)\right)$$
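The two formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `maxent_prob`, the toy feature functions, and the weights are assumptions made up for the example, not part of the original derivation.

```python
import math

def maxent_prob(w, feats, x, labels):
    """Return {y: P_w(y|x)} for the log-linear model above.

    w      : list of weights w_i
    feats  : list of feature functions f_i(x, y) -> float
    labels : candidate values of y
    """
    # Unnormalized log-scores: sum_i w_i * f_i(x, y)
    scores = {y: sum(wi * fi(x, y) for wi, fi in zip(w, feats)) for y in labels}
    # Subtracting the max score does not change the ratio, but avoids overflow
    m = max(scores.values())
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())  # equals Z_w(x) / exp(m)
    return {y: e / z for y, e in exps.items()}

# Hypothetical toy features and weights, for illustration only
feats = [
    lambda x, y: 1.0 if (x == "sunny" and y == "out") else 0.0,
    lambda x, y: 1.0 if (x == "rainy" and y == "in") else 0.0,
]
w = [1.0, 2.0]

probs = maxent_prob(w, feats, "sunny", ["out", "in"])
print(probs)
```

For `x = "sunny"` only the first feature fires, so the score of `"out"` is 1 and the score of `"in"` is 0, and the probabilities are $e/(e+1)$ and $1/(e+1)$.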
Maximum likelihood estimation
Given a data set $T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$, let the possible values of $x$ be $X = \{v_1, v_2, \cdots, v_m\}$ and the possible values of $y$ be $Y = \{\gamma_1, \gamma_2, \cdots, \gamma_n\}$. Let $C(X = v_i, Y = \gamma_j)$ denote the number of times the sample $(v_i, \gamma_j)$ appears in the data set.
The model parameters are estimated by maximum likelihood. The likelihood function is:
$$L(y_1, y_2, \cdots, y_N \mid x_1, x_2, \cdots, x_N) = \prod_{i=1}^{N} p(y_i \mid x_i) = \prod_{X,Y} p(Y = \gamma_j \mid X = v_i)^{C(X = v_i, Y = \gamma_j)}$$
Taking the $N$-th root of both sides gives:
$$L(y_1, y_2, \cdots, y_N \mid x_1, x_2, \cdots, x_N)^{\frac{1}{N}} = \prod_{X,Y} p(Y = \gamma_j \mid X = v_i)^{\frac{C(X = v_i, Y = \gamma_j)}{N}} = \prod_{X,Y} p(Y = \gamma_j \mid X = v_i)^{\tilde{p}(X = v_i, Y = \gamma_j)}$$
Here $\tilde{p}(X = v_i, Y = \gamma_j) = C(X = v_i, Y = \gamma_j) / N$ is the empirical probability distribution of the data set.
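The counting argument can be checked numerically. The sketch below uses a made-up four-sample data set and a hypothetical conditional table `p_cond` (both are assumptions for illustration) to confirm that the per-sample product, the product over distinct pairs raised to their counts, and the $N$-th root written with the empirical distribution all agree.

```python
import math
from collections import Counter

# Hypothetical toy data set T of (x, y) pairs
data = [("a", 0), ("a", 0), ("a", 1), ("b", 1)]
N = len(data)

counts = Counter(data)                           # C(X = v_i, Y = gamma_j)
p_emp = {xy: c / N for xy, c in counts.items()}  # empirical distribution

# Hypothetical conditional model p(y | x); each x sums to 1 over y
p_cond = {("a", 0): 0.6, ("a", 1): 0.4, ("b", 0): 0.3, ("b", 1): 0.7}

# Likelihood as a product over the N samples
likelihood = math.prod(p_cond[xy] for xy in data)
# Same likelihood, grouped by distinct (x, y) pairs with counts as exponents
by_counts = math.prod(p_cond[xy] ** c for xy, c in counts.items())
# N-th root, with the empirical probabilities as exponents
nth_root = math.prod(p_cond[xy] ** p for xy, p in p_emp.items())

print(likelihood, by_counts, likelihood ** (1 / N), nth_root)
```

Pairs that never occur in the data have count 0 and contribute a factor of 1, which is why iterating over the observed pairs only is enough.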
The log-likelihood is:
$$L_{\tilde{p}}(P_w) = N \log \prod_{X,Y} p(Y = \gamma_j \mid X = v_i)^{\tilde{p}(X = v_i, Y = \gamma_j)} = N \sum_{X,Y} \tilde{p}(X = v_i, Y = \gamma_j) \log p(Y = \gamma_j \mid X = v_i)$$
Since $N$ is a constant that does not affect the maximizer, it can be dropped:
$$L_{\tilde{p}}(P_w) \propto \sum_{X,Y} \tilde{p}(X = v_i, Y = \gamma_j) \log p(Y = \gamma_j \mid X = v_i)$$
In shorthand:
$$L_{\tilde{p}}(P_w) = \sum_{x,y} \tilde{p}(x, y) \log p(y \mid x)$$
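When the conditional probability is the maximum entropy model, this becomes:
$$\begin{aligned}
L_{\tilde{p}}(P_w) &= \sum_{x,y} \tilde{p}(x, y) \left( \sum_{i=1}^{n} w_i f_i(x, y) - \log Z_w(x) \right) \\
&= \sum_{x,y} \tilde{p}(x, y) \sum_{i=1}^{n} w_i f_i(x, y) - \sum_{x} \tilde{p}(x) \log Z_w(x)
\end{aligned}$$
The last step uses $\sum_{y} \tilde{p}(x, y) = \tilde{p}(x)$, since $\log Z_w(x)$ does not depend on $y$.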
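The split of the log-likelihood into a feature term and a normalizer term can be verified numerically. The following is a minimal sketch under assumed toy data, features, and weights (all hypothetical). The normalizer term is weighted by the marginal $\tilde{p}(x) = \sum_y \tilde{p}(x, y)$.

```python
import math
from collections import Counter

# Hypothetical toy data, labels, features and weights
data = [("a", 0), ("a", 0), ("a", 1), ("b", 1)]
labels = [0, 1]
N = len(data)
p_xy = {xy: c / N for xy, c in Counter(data).items()}            # p~(x, y)
p_x = {x: c / N for x, c in Counter(x for x, _ in data).items()}  # p~(x)

feats = [
    lambda x, y: 1.0 if (x == "a" and y == 0) else 0.0,
    lambda x, y: 1.0 if y == 1 else 0.0,
]
w = [0.5, -0.3]

def score(x, y):
    """sum_i w_i * f_i(x, y)"""
    return sum(wi * fi(x, y) for wi, fi in zip(w, feats))

def log_z(x):
    """log Z_w(x)"""
    return math.log(sum(math.exp(score(x, y)) for y in labels))

def prob(x, y):
    """P_w(y | x)"""
    return math.exp(score(x, y) - log_z(x))

# Shorthand form: sum_{x,y} p~(x,y) log P_w(y|x)
ll_direct = sum(p * math.log(prob(x, y)) for (x, y), p in p_xy.items())

# Split form: feature term minus normalizer term
ll_split = (
    sum(p * score(x, y) for (x, y), p in p_xy.items())
    - sum(p * log_z(x) for x, p in p_x.items())
)

print(ll_direct, ll_split)
```

Both numbers agree up to floating-point rounding, which is what the derivation predicts.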
The dual function of the maximum entropy model:
$$\begin{aligned}
L(P_w, w) &= -H(P_w) + \sum_{i=1}^{n} w_i \left( E_{\tilde{p}}(f_i) - E_{P_w}(f_i) \right) \\
&= \sum_{x,y} \tilde{p}(x) P_w(y \mid x) \log P_w(y \mid x) + \sum_{i=1}^{n} w_i \left( \sum_{x,y} \tilde{p}(x, y) f_i(x, y) - \sum_{x,y} \tilde{p}(x) P_w(y \mid x) f_i(x, y) \right) \\
&= \sum_{x,y} \tilde{p}(x) P_w(y \mid x) \left( \sum_{i=1}^{n} w_i f_i(x, y) - \log Z_w(x) \right) + \sum_{x,y} \tilde{p}(x, y) \sum_{i=1}^{n} w_i f_i(x, y) - \sum_{x,y} \tilde{p}(x) P_w(y \mid x) \sum_{i=1}^{n} w_i f_i(x, y) \\
&= \sum_{x,y} \tilde{p}(x, y) \sum_{i=1}^{n} w_i f_i(x, y) - \sum_{x,y} \tilde{p}(x) P_w(y \mid x) \log Z_w(x) \\
&= \sum_{x,y} \tilde{p}(x, y) \sum_{i=1}^{n} w_i f_i(x, y) - \sum_{x} \tilde{p}(x) \log Z_w(x)
\end{aligned}$$
The last step uses $\sum_{y} P_w(y \mid x) = 1$.
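The final line of the dual function has the same form as the log-likelihood of the maximum entropy model, so the two should agree numerically for any weight vector. The sketch below checks this on assumed toy data; the data set, features, weights, and helper names (`e_emp`, `e_model`) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data, labels, features and weights
data = [("a", 0), ("a", 0), ("a", 1), ("b", 1)]
labels = [0, 1]
N = len(data)
p_xy = {xy: c / N for xy, c in Counter(data).items()}            # p~(x, y)
p_x = {x: c / N for x, c in Counter(x for x, _ in data).items()}  # p~(x)

feats = [
    lambda x, y: 1.0 if (x == "a" and y == 0) else 0.0,
    lambda x, y: 1.0 if y == 1 else 0.0,
]
w = [0.5, -0.3]

def prob(x, y):
    """P_w(y | x) for the log-linear model."""
    def s(t):
        return sum(wi * fi(x, t) for wi, fi in zip(w, feats))
    return math.exp(s(y)) / sum(math.exp(s(t)) for t in labels)

def e_emp(f):
    """E_{p~}(f): expectation under the empirical joint distribution."""
    return sum(p * f(x, y) for (x, y), p in p_xy.items())

def e_model(f):
    """E_{P_w}(f): expectation under p~(x) * P_w(y | x)."""
    return sum(p_x[x] * prob(x, y) * f(x, y) for x in p_x for y in labels)

# -H(P_w) = sum_{x,y} p~(x) P_w(y|x) log P_w(y|x)
neg_entropy = sum(
    p_x[x] * prob(x, y) * math.log(prob(x, y)) for x in p_x for y in labels
)

dual = neg_entropy + sum(wi * (e_emp(f) - e_model(f)) for wi, f in zip(w, feats))
log_lik = sum(p * math.log(prob(x, y)) for (x, y), p in p_xy.items())

print(dual, log_lik)
```

Because the two values coincide for every `w`, maximizing the dual function over `w` is the same problem as maximum likelihood estimation of the model.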