Log-Linear Model & CRF (Conditional Random Fields) Explained

This post introduces the log-linear model and how it specializes to the linear-CRF, covering the definition of Conditional Random Fields (CRFs), their inference problem, and their learning problem. Through a discussion of feature functions, it explains how dynamic programming solves CRF inference and how CRF parameters can be learned.

Links to previous posts

Log-Linear model

Let $x$ be an example, and let $y$ be a possible label for it. A log-linear model assumes that

$$p(y \mid x; w) = \frac{\exp\left[\sum_{j=1}^J w_j F_j(x, y)\right]}{Z(x, w)}$$

where the partition function

$$Z(x, w) = \sum_{y'} \exp\left[\sum_{j=1}^J w_j F_j(x, y')\right]$$

Note that $\sum_{y'}$ sums over all possible labels $y'$. Therefore, given $x$, the label predicted by the model is

$$\hat{y} = \underset{y}{\operatorname{argmax}}\; p(y \mid x; w) = \underset{y}{\operatorname{argmax}} \sum_{j=1}^J w_j F_j(x, y)$$

The second equality holds because $Z(x, w)$ does not depend on $y$ and $\exp$ is monotonically increasing, so maximizing the probability is equivalent to maximizing the linear score.

Each expression $F_j(x, y)$ is called a feature function. You can think of it as the $j$-th feature extracted from $(x, y)$.

Remarks on the log-linear model:

  • A linear combination $\sum_{j=1}^J w_j F_j(x, y)$ can take any positive or negative real value; the exponential makes it positive.

  • The division by $Z(x, w)$ makes the results $p(y \mid x; w)$ lie between 0 and 1 and sum to 1 over all labels, i.e. makes them valid probabilities.
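To make the definition concrete, here is a minimal NumPy sketch of a log-linear classifier over three labels. The feature functions and weights are toy values invented for illustration, not anything from the original derivation:

```python
import numpy as np

# Toy setup (hypothetical): 3 possible labels, J = 2 feature functions.
LABELS = [0, 1, 2]

def features(x, y):
    """Return the feature vector [F_1(x, y), ..., F_J(x, y)] (toy choices)."""
    return np.array([float(x > 0 and y == 1), float(x <= 0 and y == 0)])

def log_linear_probs(x, w):
    """p(y | x; w) for every label y: exp(linear score) / Z(x, w)."""
    scores = np.array([w @ features(x, y) for y in LABELS])  # linear scores
    unnorm = np.exp(scores)           # exponential makes every score positive
    return unnorm / unnorm.sum()      # dividing by Z(x, w) normalizes to 1

def predict(x, w):
    """argmax_y p(y | x; w); Z(x, w) is constant in y, so rank raw scores."""
    scores = np.array([w @ features(x, y) for y in LABELS])
    return LABELS[int(np.argmax(scores))]

w = np.array([1.5, 0.8])
print(log_linear_probs(2.0, w))  # valid probabilities summing to 1
print(predict(2.0, w))           # -> 1
```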

Conditional Random Fields (CRF)

Last time, we talked about Markov Random Fields. In this post, we are going to discuss Conditional Random Fields, an important special case of Markov Random Fields that arises when they are used to model a conditional probability distribution $p(y \mid x)$, where $x$ and $y$ are vector-valued variables.

Formal definition of CRF

Formally, a CRF is a Markov network which specifies a conditional distribution

$$P(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \phi_c(x_c, y_c)$$

with partition function

$$Z(x) = \sum_{y' \in \mathcal{Y}} \prod_{c \in C} \phi_c(x_c, y'_c)$$

We further assume that the factors $\phi_c(x_c, y_c)$, one per maximal clique $c$, are of the form

$$\phi_c(x_c, y_c) = \exp\left[w_c^T f_c(x_c, y_c)\right]$$

Since we require the potential functions $\phi_c$ to be non-negative, it is natural to use the exponential function. Here $f_c(x_c, y_c)$ can be an arbitrary set of features describing the compatibility between $x_c$ and $y_c$. These feature functions can be designed by hand through feature engineering, or learned with deep models such as LSTMs.
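As a sanity check on this definition, the sketch below builds a tiny CRF over three binary variables with two maximal cliques (the edges of a chain $y_1 - y_2 - y_3$), uses $\phi_c = \exp[w_c^T f_c(x_c, y_c)]$ as the potentials, and computes $Z(x)$ by brute-force enumeration. The clique features and weights are hypothetical choices for illustration:

```python
import numpy as np
from itertools import product

def f_c(x, y_a, y_b):
    """Toy clique features f_c(x_c, y_c): agreement plus a label-x interaction."""
    return np.array([float(y_a == y_b), float(y_b == 1 and x > 0)])

def phi_c(x, y_a, y_b, w_c):
    """Potential of one maximal clique: exp[w_c^T f_c(x_c, y_c)] (always > 0)."""
    return np.exp(w_c @ f_c(x, y_a, y_b))

def p_y_given_x(y, x, w_c):
    """P(y | x) = (1/Z(x)) * prod_c phi_c, with Z(x) by brute-force enumeration."""
    def score(yb):
        return phi_c(x, yb[0], yb[1], w_c) * phi_c(x, yb[1], yb[2], w_c)
    Z = sum(score(yp) for yp in product([0, 1], repeat=3))  # partition function
    return score(y) / Z

w_c = np.array([1.0, 0.5])
print(p_y_given_x((1, 1, 1), x=2.0, w_c=w_c))
```

Enumerating all $2^3 = 8$ assignments is harmless at this scale; it is only meant to make the normalization in the definition tangible.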

Log-linear model to linear-CRF

As a reminder, let $x$ be an example, and let $y$ be a possible label for it. Then a log-linear model assumes that

$$p(y \mid x; w) = \frac{\exp\left[\sum_{j=1}^J w_j F_j(x, y)\right]}{Z(x, w)}$$

From now on, we use bar notation for sequences. Specializing to a linear-CRF, we write the above equation as

$$p(\bar{y} \mid \bar{x}; w) = \frac{\exp\left[\sum_{j=1}^J w_j F_j(\bar{x}, \bar{y})\right]}{Z(\bar{x}, w)} = \frac{\exp\left[\sum_{j=1}^J w_j \sum_{i=2}^{T} f_j(y_{i-1}, y_i, \bar{x})\right]}{Z(\bar{x}, w)} \tag{1}$$

where each $y_i$ takes values in $\{1, 2, \dots, m\}$. Here is an example:

Assume we have a sequence $\bar{x} = (x_1, x_2, x_3, x_4)$ and the corresponding hidden sequence $\bar{y} = (y_1, y_2, y_3, y_4)$.

We can divide each feature function $F_j(\bar{x}, \bar{y})$ into functions over each maximal clique. That is,

$$F_j(\bar{x}, \bar{y}) = \sum_{i=2}^{T} f_j(y_{i-1}, y_i, \bar{x}) \tag{1.1}$$

In particular, since the chain $y_1 - y_2 - y_3 - y_4$ in this example has $3$ maximal cliques (its edges), we have

$$F_j(\bar{x}, \bar{y}) = f_j(y_1, y_2, \bar{x}) + f_j(y_2, y_3, \bar{x}) + f_j(y_3, y_4, \bar{x})$$
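The sketch below mirrors equations (1) and (1.1) for this length-4 example: each global feature $F_j(\bar{x}, \bar{y})$ is accumulated clique by clique, and $Z(\bar{x}, w)$ is computed by enumerating all $m^T$ label sequences. The local feature functions are invented placeholders, not the post's features:

```python
import numpy as np
from itertools import product

m, T = 2, 4                       # m label values, sequence length T
x_bar = [0.3, -1.2, 0.7, 2.0]     # toy observation sequence

def f(y_prev, y_cur, x_bar, i):
    """Local features f_j(y_{i-1}, y_i, x_bar) for one clique (toy choices)."""
    return np.array([float(y_prev == y_cur),               # transition feature
                     float(y_cur == 1 and x_bar[i] > 0)])  # emission feature

def F(x_bar, y_bar):
    """Global features F_j(x_bar, y_bar): sum over cliques, eq. (1.1).
    The paper-style i = 2..T becomes range(1, T) with 0-based indexing."""
    return sum(f(y_bar[i - 1], y_bar[i], x_bar, i) for i in range(1, T))

def p(y_bar, x_bar, w):
    """p(y_bar | x_bar; w) as in eq. (1), with Z by brute-force enumeration."""
    score = lambda yb: np.exp(w @ F(x_bar, yb))
    Z = sum(score(yb) for yb in product(range(m), repeat=T))  # m**T sequences
    return score(tuple(y_bar)) / Z

w = np.array([0.7, 1.3])
print(p((1, 1, 0, 1), x_bar, w))
```

With $m = 2$ and $T = 4$ the enumeration touches only $2^4 = 16$ sequences, but the count grows exponentially in $T$; the clique decomposition (1.1) is exactly what lets the dynamic-programming inference mentioned in the abstract compute $Z(\bar{x}, w)$ in $O(T m^2)$ instead.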
