Table of Contents

- Links to previous articles
- Log-Linear model
- Conditional Random Fields (CRF)
- Log-linear model to linear-CRF
- Inference problem for CRF
- Learning problem for CRF
- Links to previous articles
Links to previous articles
Log-Linear model
Let $x$ be an example, and let $y$ be a possible label for it. A log-linear model assumes that

$$p(y \mid x; w) = \frac{\exp\left[\sum_{j=1}^J w_j F_j(x, y)\right]}{Z(x, w)}$$
where the partition function
$$Z(x, w) = \sum_{y'} \exp\left[\sum_{j=1}^J w_j F_j(x, y')\right]$$
Note that $\sum_{y'}$ sums over all possible labels $y'$. Therefore, given $x$, the label predicted by the model is

$$\hat{y} = \underset{y}{\operatorname{argmax}} \; p(y \mid x; w) = \underset{y}{\operatorname{argmax}} \sum_{j=1}^J w_j F_j(x, y)$$

Each expression $F_j(x, y)$ is called a feature function. You can think of it as the $j$-th feature extracted from the pair $(x, y)$.
Remarks on the log-linear model:

- The linear combination $\sum_{j=1}^J w_j F_j(x, y)$ can take any positive or negative real value; the exponential makes it positive.
- The division by $Z(x, w)$ makes the result $p(y \mid x; w)$ lie between 0 and 1, i.e. makes the outputs valid probabilities.
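To make this concrete, here is a minimal NumPy sketch of a log-linear classifier. The label set, feature functions, and weights are invented for illustration (they are not from this post); the point is to show the score $\sum_j w_j F_j(x, y)$, the normalization by $Z(x, w)$, and the fact that prediction needs only the unnormalized scores.

```python
import numpy as np

# A minimal sketch of a log-linear model. The label set, feature
# functions, and weights below are toy assumptions for illustration.

LABELS = ["NUM", "LONG", "OTHER"]

def feature_vector(x, y):
    """Return [F_1(x, y), ..., F_J(x, y)] for a string x and label y."""
    return np.array([
        float(any(ch.isdigit() for ch in x) and y == "NUM"),   # F_1
        float(len(x) > 5 and y == "LONG"),                     # F_2
        float(y == "OTHER"),                                   # F_3
    ])

def p_y_given_x(x, w):
    """p(y | x; w) for every label: exponentiate scores, divide by Z(x, w)."""
    scores = np.array([w @ feature_vector(x, y) for y in LABELS])
    scores -= scores.max()                 # stabilize the exponentials
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()   # the division by Z(x, w)

def predict(x, w):
    """argmax_y sum_j w_j F_j(x, y); Z(x, w) is never needed here."""
    return max(LABELS, key=lambda y: float(w @ feature_vector(x, y)))

w = np.array([2.0, 1.0, 0.5])                       # toy weights
print(dict(zip(LABELS, p_y_given_x("abc123", w))))  # valid probabilities
print(predict("abc123", w))                         # -> "NUM"
```

Since $Z(x, w)$ is identical for every candidate label, `predict` skips it entirely; this is exactly why the argmax above drops the partition function.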
Conditional Random Fields (CRF)
Last time, we talked about Markov Random Fields. In this post, we are going to discuss Conditional Random Fields, an important special case of Markov Random Fields that arises when they are applied to model a conditional probability distribution $p(y \mid x)$, where $x$ and $y$ are vector-valued variables.
![](https://i-blog.csdnimg.cn/blog_migrate/72c166697402d788c592b46826d84d4b.png)
Formal definition of CRF
Formally, a CRF is a Markov network which specifies a conditional distribution
$$P(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \phi_c(x_c, y_c)$$
with partition function
$$Z(x) = \sum_{y \in \mathcal{Y}} \prod_{c \in C} \phi_c(x_c, y_c)$$
We further assume that the factors $\phi_c(x_c, y_c)$, one per maximal clique, are of the form

$$\phi_c(x_c, y_c) = \exp\left[w_c^T f_c(x_c, y_c)\right]$$

Since we require the potential functions $\phi_c$ to be non-negative, it is natural to use the exponential function. Here $f_c(x_c, y_c)$ can be an arbitrary set of features describing the compatibility between $x_c$ and $y_c$. Note that these feature functions can be designed by manual feature engineering or learned, e.g. by deep models such as LSTMs.
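As a sanity check on this definition, the sketch below evaluates $P(y \mid x)$ for a tiny chain-structured CRF by brute force: the maximal cliques are the pairs $(y_{i-1}, y_i)$, each clique potential is $\phi_c = \exp(w_c^T f_c)$, and $Z(x)$ is computed by enumerating every label sequence. The transition and emission features here are toy assumptions, not anything defined in the post.

```python
import itertools
import numpy as np

# A toy chain-structured CRF: cliques are the pairs (y_{i-1}, y_i),
# and each clique potential is phi_c = exp(w_c^T f_c).

def log_potential(y_prev, y_curr, x, i, w):
    """w^T f for one clique, with a toy transition and emission feature."""
    f = np.array([
        float(y_prev == y_curr),   # transition: consecutive labels agree
        float(x[i] == y_curr),     # emission: observation matches label
    ])
    return float(w @ f)

def crf_prob(y, x, w, label_set):
    """P(y | x) = prod_c phi_c(x_c, y_c) / Z(x), with Z(x) by enumeration."""
    def log_score(ys):
        return sum(log_potential(ys[i - 1], ys[i], x, i, w)
                   for i in range(1, len(x)))
    # Z(x): sum of prod_c phi_c over all label sequences (in log space)
    log_Z = np.logaddexp.reduce(
        [log_score(ys) for ys in itertools.product(label_set, repeat=len(x))])
    return float(np.exp(log_score(y) - log_Z))

labels = ("A", "B")
x = ("A", "B", "B")          # observations (same alphabet, for simplicity)
w = np.array([1.0, 2.0])
print(crf_prob(("A", "B", "B"), x, w, labels))  # a valid probability
```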
Log-linear model to linear-CRF
As a reminder, let $x$ be an example, and let $y$ be a possible label for it. Then a log-linear model assumes that

$$p(y \mid x; w) = \frac{\exp\left[\sum_{j=1}^J w_j F_j(x, y)\right]}{Z(x, w)}$$
From now on, we use the bar notation for sequences. Then, for a linear-CRF, we write the above equation as
$$\begin{aligned} p(\bar y \mid \bar x; w) &= \frac{\exp\left[\sum_{j=1}^J w_j F_j(\bar x, \bar y)\right]}{Z(\bar x, w)} \\ &= \frac{\exp\left[\sum_{j=1}^J w_j \sum_{i=2}^{T} f_j(y_{i-1}, y_i, \bar x)\right]}{Z(\bar x, w)} \qquad (1) \end{aligned}$$
where each $y_i$ can take values from $\{1, 2, \dots, m\}$. Here is an example:
Assume we have a sequence $\bar x = (x_1, x_2, x_3, x_4)$ and the corresponding hidden sequence $\bar y = (y_1, y_2, y_3, y_4)$.
![](https://i-blog.csdnimg.cn/blog_migrate/a64ed0056628c2c76243b43cfbdf1d91.png)
We can divide each feature function $F_j(\bar x, \bar y)$ into functions over the maximal cliques. That is,
$$F_j(\bar x, \bar y) = \sum_{i=2}^{T} f_j(y_{i-1}, y_i, \bar x) \tag{1.1}$$
In particular, since the figure above contains $3$ maximal cliques, we get

$$F_j(\bar x, \bar y) = f_j(y_1, y_2, \bar x) + f_j(y_2, y_3, \bar x) + f_j(y_3, y_4, \bar x)$$
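Putting equations (1) and (1.1) together, here is a short sketch for the 4-step example above: each $F_j(\bar x, \bar y)$ is accumulated clique by clique, and $Z(\bar x, w)$ is obtained by enumerating all $m^T$ label sequences. The two feature functions, $T = 4$, and $m = 2$ labels are all toy assumptions for illustration.

```python
import itertools
import numpy as np

# Equations (1) and (1.1) for the 4-step example: each F_j is a sum of
# low-level f_j over the cliques (y_{i-1}, y_i).

T, LABELS = 4, (1, 2)

def f(j, y_prev, y_curr, x_bar):
    """Toy low-level feature functions f_j(y_{i-1}, y_i, xbar)."""
    if j == 0:
        return float(y_prev == y_curr)             # label persistence
    return float(y_curr == 1 and "a" in x_bar)     # label-observation feature

def F(j, x_bar, y_bar):
    """F_j(xbar, ybar) = sum_{i=2}^{T} f_j(y_{i-1}, y_i, xbar) -- eq. (1.1)."""
    return sum(f(j, y_bar[i - 1], y_bar[i], x_bar) for i in range(1, T))

def p(y_bar, x_bar, w):
    """p(ybar | xbar; w) of eq. (1), with Z(xbar, w) by brute force."""
    def score(ys):
        return sum(w[j] * F(j, x_bar, ys) for j in range(len(w)))
    log_Z = np.logaddexp.reduce(
        [score(ys) for ys in itertools.product(LABELS, repeat=T)])
    return float(np.exp(score(y_bar) - log_Z))

w = np.array([0.5, 1.5])
print(p((1, 1, 2, 2), "a b a b", w))  # probability of one label sequence
```

Enumerating all $m^T$ sequences is only viable at toy sizes; the inference section below replaces this brute force with dynamic programming.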