Linear-Chain Conditional Random Fields (CRF)

Linear-chain conditional random fields (CRFs) are widely used in classification problems because they account for global dependencies among features and can capture arbitrary dependencies between y and x. A CRF computes probabilities with dynamic programming, predicts the states of all labels simultaneously, and uses doublet feature functions to handle the interaction between adjacent labels. For cases a linear chain cannot handle, higher-order CRFs can describe long-range interactions.

Advantages of CRF

CRFs are widely used in classification problems, for five main reasons:
1. The chief advantage of a CRF is that it models the conditional distribution $P(y \mid x)$ rather than the joint distribution $P(y, x)$.
2. A CRF can capture arbitrary dependencies among the components of x and y.
3. Unlike many other existing methods, a CRF is a "global" model: it considers the whole sequence (e.g., all residues of a protein) rather than focusing merely on a local window around the tag to be labeled.
4. At inference time, the states of all tags are predicted simultaneously so as to maximize the overall likelihood. The interdependence between the states of adjacent tags is also exploited explicitly through the doublet feature functions used in the model.
5. For cases that a chain CRF cannot handle, a higher-order CRF can add higher-order features that describe long-range interactions.

Chain CRF

Definition of Chain CRF

In a chain CRF, the nodes in the graph form a linear chain. By the factorization property of undirected graphical models,

$$P(y_1,\dots,y_M \mid x) = \frac{1}{Z_x}\prod_i \psi_i(Y_i, X)\,\phi_i(Y_i, Y_{i-1}, X)$$

where $\psi_i(Y_i, X)$ acts over single labels and $\phi_i(Y_i, Y_{i-1}, X)$ acts over edges:

$$\psi_i(Y_i, X) = \exp\Big(\sum_k \theta_k\, s_k(y_i, x, i)\Big)$$

$$\phi_i(Y_i, Y_{i-1}, X) = \exp\Big(\sum_j \lambda_j\, t_j(y_{i-1}, y_i, x, i)\Big)$$
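
As a concrete illustration (my own hypothetical example, not from the post), both kinds of feature functions are usually binary indicators. In a part-of-speech tagging task one might use a state feature such as

$$s_k(y_i, x, i) = \mathbb{1}\{x_i \text{ is capitalized}\}\,\mathbb{1}\{y_i = \text{NOUN}\}$$

and a transition feature such as

$$t_j(y_{i-1}, y_i, x, i) = \mathbb{1}\{y_{i-1} = \text{ADJ}\}\,\mathbb{1}\{y_i = \text{NOUN}\}$$

whose weights $\theta_k$ and $\lambda_j$ are learned from data.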

After merging all the parameters into a single vector,

$$P(y_1,\dots,y_M \mid x) = \frac{\exp\!\big(\Lambda^T \mathcal{F}\big)}{Z_x}$$

where $\mathcal{F}$ is the $k \times n$ matrix with entries $\mathcal{F}_{ji} = f_j(y_i, y_{i-1}, x, i)$ (with $n$ the sequence length), $\Lambda$ is a $k \times 1$ parameter vector, and $\Lambda^T \mathcal{F}$ is understood as the total score $\sum_i \sum_j \lambda_j f_j(y_i, y_{i-1}, x, i)$. The most likely labeling is

$$y^* = \arg\max_{y} \Lambda^T \mathcal{F}$$
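
To make the score $\Lambda^T \mathcal{F}$ concrete, here is a minimal sketch of scoring one candidate labeling, assuming a hypothetical `feats(y_i, y_prev, x, i)` that returns the $k$-dimensional feature vector $f(y_i, y_{i-1}, x, i)$ (the names and interface are my own, not from the post):

```python
import numpy as np

def sequence_score(lam, feats, y, x):
    """Unnormalized log-score Lambda^T F(y, x), summed over all positions."""
    score = 0.0
    y_prev = None            # stand-in for a begin-of-sequence label
    for i, y_i in enumerate(y):
        score += lam @ feats(y_i, y_prev, x, i)   # lam: (k,) array, feats: (k,) array
        y_prev = y_i
    return score             # exp(score) / Z_x gives P(y | x)
```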

The forward and backward vectors

We use dynamic programming to calculate $Z_x$:

$$\alpha(y, i) = \sum_{y'} \alpha(y', i-1)\,\exp\!\big(\Lambda^T f(y, y', x, i)\big)$$

$$\beta(y, i) = \sum_{y'} \beta(y', i+1)\,\exp\!\big(\Lambda^T f(y', y, x, i+1)\big)$$

where $f(\cdot,\cdot,\cdot,i)$ is the feature vector evaluated at the $i$-th sequence position, following the convention $f(y_i, y_{i-1}, x, i)$. $\alpha$ and $\beta$ are called the forward and backward vectors respectively; each can be computed in $O(TN^2)$ time, where $T$ is the sequence length and $N$ is the number of label states.
Then we can write the marginals and partition function as below:
$$P(Y_i = y \mid x) = \alpha(y, i)\,\beta(y, i)\,/\,Z_x$$

$$P(Y_i = y, Y_{i+1} = y' \mid x) = \alpha(y, i)\,\exp\!\big(\Lambda^T f(y', y, x, i+1)\big)\,\beta(y', i+1)\,/\,Z_x$$

$$Z_x = \sum_y \alpha(y, T) = \sum_y \beta(y, 1)$$
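
Below is a minimal sketch of the recursions above, written in log space for numerical stability. The array layout and names (`log_psi`, a T×N×N array with `log_psi[i, y, y_prev]` $= \Lambda^T f(y, y_{\text{prev}}, x, i)$, and a begin-of-sequence placeholder stored in column 0 at position 0) are my own assumptions, not part of the original post:

```python
import numpy as np
from scipy.special import logsumexp

def forward_backward(log_psi):
    """Forward-backward in log space.

    log_psi[i, y, y_prev] = Lambda^T f(y, y_prev, x, i); for i = 0 the
    "previous label" slot is a begin-of-sequence placeholder in column 0.
    """
    T, N, _ = log_psi.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = log_psi[0, :, 0]                       # base case at the first position
    for i in range(1, T):                             # forward recursion
        alpha[i] = logsumexp(alpha[i - 1][None, :] + log_psi[i], axis=1)
    for i in range(T - 2, -1, -1):                    # backward recursion
        beta[i] = logsumexp(beta[i + 1][:, None] + log_psi[i + 1], axis=0)
    log_Z = logsumexp(alpha[-1])                      # log Z_x = log sum_y alpha(y, T)
    node_marginals = np.exp(alpha + beta - log_Z)     # P(Y_i = y | x)
    return alpha, beta, log_Z, node_marginals
```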

Inference of Chain CRF

Inference in a linear-chain CRF uses the Viterbi algorithm:

$$\delta(y, i) = \max_{y'} \delta(y', i-1)\,\exp\!\big(\Lambda^T f(y, y', x, i)\big)$$

The normalized probability of the best labeling is $\max_y \delta(y, n)\,/\,Z_x$, and the best labeling itself is recovered by taking $\arg\max_y \delta(y, n)$ at the last position and backtracking.
The time complexity is $O(TN^2)$.
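
A sketch of Viterbi decoding under the same hypothetical `log_psi` layout as the `forward_backward` sketch above (again my own naming, not the post's):

```python
import numpy as np

def viterbi(log_psi):
    """Most likely labeling, using the same log_psi layout as forward_backward above."""
    T, N, _ = log_psi.shape
    delta = np.zeros((T, N))            # delta[i, y]: best log-score of a path ending in y at i
    back = np.zeros((T, N), dtype=int)  # back[i, y]: best previous label for that path
    delta[0] = log_psi[0, :, 0]         # begin-of-sequence placeholder in column 0, as before
    for i in range(1, T):
        scores = delta[i - 1][None, :] + log_psi[i]   # scores[y, y_prev]
        back[i] = scores.argmax(axis=1)
        delta[i] = scores.max(axis=1)
    y = np.zeros(T, dtype=int)          # backtrack from the best final state
    y[-1] = delta[-1].argmax()
    for i in range(T - 1, 0, -1):
        y[i - 1] = back[i, y[i]]
    return y, delta[-1].max()           # best labeling and its unnormalized log-score
```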
We can also use the pseudo-likelihood to perform approximate inference, choosing each label independently from its marginal:

$$y_t^* = \arg\max_{y_t} P(y_t \mid \Lambda, x)$$
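
Given the node marginals returned by the hypothetical `forward_backward` sketch above, this position-wise decoding is a one-liner:

```python
# Pick the individually most probable label at each position (position-wise decoding).
y_pl = node_marginals.argmax(axis=1)   # y_pl[t] = argmax over y_t of P(y_t | Lambda, x)
```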

Training of Chain CRF

CRFs also suffer from overfitting, so we impose a penalty on large parameter values. The penalized log-likelihood is:

$$L_\Lambda = \sum_k \Big(\Lambda^T F(y^k, x^k) - \log Z_\Lambda(x^k)\Big) - \frac{\lVert\Lambda\rVert^2}{2\sigma^2}$$

where $(x^k, y^k)$ ranges over the training sequences.

and the gradient is given by:

$$\nabla L_\Lambda = \sum_k \Big(F(y^k, x^k) - E_{P(y \mid x^k)}\big[F(y, x^k)\big]\Big) - \frac{\Lambda}{\sigma^2}$$

where the expected feature counts follow from the pairwise marginals above:

$$E_{P(y \mid x^k)}\big[F(y, x^k)\big] = \frac{1}{Z_{x^k}}\sum_i \sum_{y, y'} f(y, y', x^k, i)\,\alpha(y', i-1)\,\exp\!\big(\Lambda^T f(y, y', x^k, i)\big)\,\beta(y, i)$$

Gradient descent (covered in the previous post) can then be used to train all the parameters.
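
Here is a sketch of the gradient for one training pair, reusing the hypothetical `forward_backward` and `log_psi` layout from the earlier sketches and assuming a user-supplied `feat_vec(y, y_prev, x, i)` that returns $f(y, y', x, i)$ as a length-$k$ numpy array (all of these names are illustrative, not from the post):

```python
import numpy as np

def penalized_gradient(lam, x, y_gold, feat_vec, log_psi, sigma2):
    """Gradient of the penalized log-likelihood for a single training pair (x, y_gold)."""
    T, N, _ = log_psi.shape
    alpha, beta, log_Z, _ = forward_backward(log_psi)

    grad = np.zeros_like(lam)
    y_prev = 0                                   # begin-of-sequence placeholder, as above
    for i in range(T):                           # observed feature counts F(y_gold, x)
        grad += feat_vec(y_gold[i], y_prev, x, i)
        y_prev = y_gold[i]

    for y in range(N):                           # expected counts at position 0 (node marginal)
        grad -= np.exp(alpha[0, y] + beta[0, y] - log_Z) * feat_vec(y, 0, x, 0)
    for i in range(1, T):                        # expected counts from the edge marginals
        for y in range(N):
            for yp in range(N):
                p_edge = np.exp(alpha[i - 1, yp] + log_psi[i, y, yp] + beta[i, y] - log_Z)
                grad -= p_edge * feat_vec(y, yp, x, i)

    return grad - lam / sigma2                   # Gaussian prior term (-||Lambda||^2 / 2 sigma^2)

# One ascent step on the penalized log-likelihood:
#   lam = lam + learning_rate * penalized_gradient(lam, x, y_gold, feat_vec, log_psi, sigma2)
```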
