Reading Notes: Reinforcement Learning for Relation Classification from Noisy Data
"Reinforcement Learning for Relation Classification from Noisy Data" (original paper link)
Problem
instance selection
- given a set of ⟨sentence, relation label⟩ pairs X = {(x1, r1), (x2, r2), ..., (xn, rn)}, where xi is a sentence associated with an entity pair (hi, ti) and ri is a noisy relation label produced by distant supervision
- determine which sentence truly describes the relation and should be selected as a training instance
relation classification
- given a sentence xi and the mentioned entity pair (hi,ti)
- predict the semantic relation ri in xi
Overview
Instance selector
The agent follows a policy to decide an action (whether to select the current sentence) at each state (consisting of the current sentence, the set of sentences chosen so far, and the entity pair), and receives a reward from the relation classifier at the terminal state, once all selections have been made.
We split the training sentences X = {x1, ..., xn} into N bags B = {B1, B2, ..., BN} and compute a reward only when data selection in a bag is finished. Each bag corresponds to a distinct entity pair, and each bag Bk is a sequence of sentences x1^k, x2^k, ..., x|Bk|^k sharing the same relation label rk.
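The bag construction above can be sketched as follows; the tuple layout, helper name, and example facts are illustrative stand-ins, not taken from the paper's released code.

```python
from collections import defaultdict

def split_into_bags(instances):
    """instances: list of (sentence, head, tail, relation_label) tuples."""
    bags = defaultdict(list)
    for sentence, head, tail, relation in instances:
        # all sentences sharing an entity pair go into one bag and
        # share one (possibly noisy) distant-supervision label
        bags[(head, tail)].append((sentence, relation))
    return bags

data = [
    ("Obama was born in Honolulu.",     "Obama",  "Honolulu", "born_in"),
    ("Obama visited Honolulu.",         "Obama",  "Honolulu", "born_in"),
    ("Paris is the capital of France.", "France", "Paris",    "capital"),
]
bags = split_into_bags(data)  # two bags: one per entity pair
```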
policy: select or not
state: a continuous real-valued vector F(si), concatenating
- the vector representation of the current sentence
  - taken from the nonlinear layer of the CNN
- the representation of the chosen sentence set
  - the average of the chosen sentences' vector representations
- the embeddings of the two entities in the sentence
  - looked up in a pre-trained knowledge graph embedding table (TransE)
action: select the sentence or not, given by the policy function (logistic regression)
reward:
- For the special case B̂ = ∅, we set the reward to the average likelihood of all sentences in the training data
- where B̂ is the set of selected sentences, which is a subset of B, and r is the relation label of bag B
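A minimal sketch of this terminal reward as the average log-likelihood of the bag's label over the selected sentences. The probabilities stand in for the classifier's outputs p(r | xj), and the fallback pool stands in for the "all sentences" special case; both are assumptions for illustration.

```python
import math

def bag_reward(selected_probs, all_probs):
    """selected_probs: p(r|x) for chosen sentences; all_probs: fallback pool
    used when the selected set B_hat is empty."""
    pool = selected_probs if selected_probs else all_probs
    # average log-likelihood of the bag's relation label
    return sum(math.log(p) for p in pool) / len(pool)
```

Selecting sentences the classifier scores highly pushes the reward toward 0 (its maximum), while low-probability selections drive it negative.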
Optimization:
objective function:
value function: determined by reward
update policy:
According to the policy gradient theorem and the REINFORCE algorithm
Monte Carlo based policy gradient method
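A toy REINFORCE sketch for the selector's logistic-regression policy pi(select | s) = sigmoid(w·s): sample one action per sentence, then scale every step's log-policy gradient by the shared terminal reward. The state vectors and reward value here are stand-ins, not the paper's CNN/TransE features.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_update(w, states, reward, lr=0.1):
    """One REINFORCE update over a bag with a delayed terminal reward."""
    actions, grads = [], []
    for s in states:
        p = sigmoid(w @ s)            # pi(select | state)
        a = int(rng.random() < p)     # sample select (1) or skip (0)
        actions.append(a)
        grads.append((a - p) * s)     # grad of log pi(a|s) for a Bernoulli policy
    for g in grads:                   # every step shares the bag's reward
        w = w + lr * reward * g
    return w, actions

w = np.zeros(3)
w, actions = reinforce_update(w, [np.ones(3), -np.ones(3)], reward=1.0)
```

Because the gradient is estimated from sampled trajectories rather than expectations, the estimate is unbiased but noisy, which is the source of the high variance noted later.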
Relation classifier
CNN + softmax (proposed by Kim, 2014)
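A toy sketch of the classifier head: a softmax over relation scores and the cross-entropy training loss. In the actual model the scores would come from the CNN's sentence representation; here they are plain vectors for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(scores, label_idx):
    """Negative log-probability of the gold relation label."""
    return -np.log(softmax(scores)[label_idx])
```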
Model training
- the pre-training strategy is quite crucial, and is widely recommended by other reinforcement learning studies
- In order to obtain stable updates, we update Θ′ and Φ′ by linear interpolation: Θ′ ← (1 − τ)Θ′ + τΘ and Φ′ ← (1 − τ)Φ′ + τΦ, where τ ≪ 1 is a hyper-parameter
- procedure
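The linear-interpolation (soft) update can be written directly; parameter names are generic placeholders.

```python
import numpy as np

def soft_update(target, source, tau=0.01):
    """Move the target parameters a small step (tau << 1) toward the
    current parameters: target <- (1 - tau) * target + tau * source."""
    return (1.0 - tau) * target + tau * source

theta_target = soft_update(np.zeros(4), np.ones(4), tau=0.1)
# the target moves 10% of the way toward the current parameters
```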
Note
distant supervision: In distant supervision, we make use of an already existing database, such as Freebase or a domain-specific database, to collect examples for the relation we want to extract. We then use these examples to automatically generate our training data. For example, Freebase contains the fact that Barack Obama and Michelle Obama are married. We take this fact, and then label each pair of “Barack Obama” and “Michelle Obama” that appear in the same sentence as a positive example for our marriage relation. This way we can easily generate a large amount of (possibly noisy) training data.
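The labeling scheme described above can be sketched with a toy knowledge base; substring matching is a simplification of real entity linking, and this is where the noise comes from (a sentence can mention both entities without expressing the relation).

```python
# Toy knowledge base: known facts as (head, tail) -> relation
KB = {("Barack Obama", "Michelle Obama"): "marriage"}

def label_sentences(sentences):
    """Label any sentence containing both entities of a KB fact with
    that fact's relation, as distant supervision does."""
    labeled = []
    for sent in sentences:
        for (head, tail), relation in KB.items():
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, relation))  # possibly noisy
    return labeled

corpus = [
    "Barack Obama and Michelle Obama attended the gala.",  # matched
    "Barack Obama praised Michelle Obama's book.",         # matched, but noisy
    "Barack Obama gave a speech.",                         # no pair match
]
```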
Monte Carlo policy gradient
high variance, slow convergence rate
Thoughts
- When training the policy function, the reward is split across multiple bags for updating, and all sentences within a bag share the same relation label r. For other tasks, how should the bags be partitioned?
- For the instance selector's parameter updates, does the order of sentences in the sequence matter? Under the current way of computing v(s) (or the reward), if the (s, a) pairs are identical in number but differ in order, the result should be unaffected.
- Is part of the relation classifier's loss function (cross entropy) missing?
- Training uses a Monte Carlo based policy gradient; what other options are there?