LINGMESS: Linguistically Informed Multi Expert Scorers for Coreference Resolution (lingmess-coref)

1 Introduction

Figure 1: Architecture of the multi-expert model

As the diagram shows, the model categorizes mention pairs into six types and trains a separate scoring function for each category.

2 Six Types of Mention Pairs

  • PRON-PRON-C: compatible pronouns based on their attributes such as gender, number and animacy;
  • PRON-PRON-NC: incompatible pronouns;
  • ENT-PRON: a pronoun paired with a non-pronoun span;
  • MATCH: non-pronoun spans that have the same content words;
  • CONTAINS: non-pronoun spans where the content words of one span contain those of the other (see the example below);
  • OTHER: all other pairs.

CONTAINS example

He reportedly showed DeLay a videotape that made him weep. Tom DeLay then ...

Here the content words of Tom DeLay contain those of DeLay, so the pair falls under CONTAINS.
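The six categories can be assigned by simple deterministic rules. Below is a minimal, hypothetical sketch of such a categorizer; the paper's actual rules use richer linguistic attributes (gender, number, animacy from parser features), so the tiny pronoun table and stopword list here are illustrative assumptions only.

```python
# Hypothetical sketch of the rule-based pair categorizer; attribute tables
# are toy stand-ins, not the paper's actual linguistic resources.
PRONOUNS = {
    "he": ("masc", "sing"), "him": ("masc", "sing"), "his": ("masc", "sing"),
    "she": ("fem", "sing"), "her": ("fem", "sing"),
    "it": ("neut", "sing"), "they": ("any", "plur"), "them": ("any", "plur"),
}

STOPWORDS = {"the", "a", "an", "of"}

def content_words(span):
    """Lowercased non-stopword tokens of a span."""
    return {w.lower() for w in span.split() if w.lower() not in STOPWORDS}

def categorize(c, q):
    """Assign one of the six LINGMESS categories to the pair (c, q)."""
    c_pron, q_pron = c.lower() in PRONOUNS, q.lower() in PRONOUNS
    if c_pron and q_pron:
        gc, nc = PRONOUNS[c.lower()]
        gq, nq = PRONOUNS[q.lower()]
        compatible = (gc == gq or "any" in (gc, gq)) and nc == nq
        return "PRON-PRON-C" if compatible else "PRON-PRON-NC"
    if c_pron or q_pron:
        return "ENT-PRON"
    cw_c, cw_q = content_words(c), content_words(q)
    if cw_c == cw_q:
        return "MATCH"
    if cw_c <= cw_q or cw_q <= cw_c:  # one set of content words contains the other
        return "CONTAINS"
    return "OTHER"

print(categorize("he", "him"))           # PRON-PRON-C
print(categorize("Tom DeLay", "DeLay"))  # CONTAINS
```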

3 Mathematical Expressions

Shared scoring function (computes the pairwise scores)

F_{\mathrm{S}}(c, q)=\left\{\begin{array}{ll}f_{m}(c)+f_{m}(q)+f_{a}(c, q) & c \neq \varepsilon \\0 & c=\varepsilon\end{array}\right.

where 

  • c: a candidate antecedent span;
  • q: the query mention span; its candidate antecedents c appear before it in the text;
  • \varepsilon: the null antecedent;
  • f_{m}: a scoring function for each individual span;
  • f_{a}(c, q): a scoring function for the pairwise interaction between the candidate span c and the mention q; both f_{m} and f_{a} are parameterized (learned) functions;
  • F_{\mathrm{S}}: the shared function used to score all mention pairs. The paper does not give explicit mathematical expressions for f_{m} and f_{a}.

Sum of probabilities

For each mention q, the learning objective maximizes the log of the summed probabilities over the true antecedents \hat{c} of q.

L_{\mathrm{S}}(q)=\log \sum_{\hat{c} \in \mathcal{C}(q) \cap \operatorname{GOLD}(q)} P_{\mathrm{S}}(\hat{c} \mid q)

where 

  • L_{\mathrm{S}}(q): the log of the total probability assigned to the true antecedents of q (a marginal log-likelihood);
  • \mathcal{C}(q): the set of all candidate antecedents of q, including the null antecedent \varepsilon;
  • \operatorname{GOLD}(q): the set of true antecedents of q;
  • P_{\mathrm{S}}(\hat{c} \mid q): the probability that the candidate span \hat{c} is the correct antecedent of the mention q.

P_{\mathrm{S}}(\hat{c} \mid q)=\frac{\exp F_{\mathrm{S}}(\hat{c}, q)}{\sum_{c^{\prime} \in \mathcal{C}(q)} \exp F_{\mathrm{S}}\left(c^{\prime}, q\right)}

where 

  • \hat{c}: a specific candidate span;
  • c^{\prime}: ranges over all candidate spans in \mathcal{C}(q) considered for mention q.
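Putting F_{\mathrm{S}} and the softmax together, here is a small numerical sketch of P_{\mathrm{S}}(\hat{c} \mid q). The scores f_{m} and f_{a} are assumed to be precomputed; the numbers are invented purely for illustration, and `None` is used to encode the null antecedent \varepsilon.

```python
import math

def shared_score(f_m_c, f_m_q, f_a_cq):
    """F_S(c, q) = f_m(c) + f_m(q) + f_a(c, q); None encodes c = epsilon, which scores 0."""
    if f_m_c is None:
        return 0.0
    return f_m_c + f_m_q + f_a_cq

def antecedent_probs(candidates, f_m_q):
    """Softmax of F_S over the candidate set C(q), which includes epsilon."""
    scores = [shared_score(f_m_c, f_m_q, f_a) for f_m_c, f_a in candidates]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

# Two real candidates (f_m(c), f_a(c, q)) plus the null antecedent (None):
candidates = [(1.2, 0.5), (0.3, -0.4), (None, 0.0)]
probs = antecedent_probs(candidates, f_m_q=0.8)
print(probs)  # a valid distribution; the highest-scoring candidate gets the most mass
```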

LINGMESS

pairwise scoring function

\begin{aligned}F(c, q) & =\left\{\begin{array}{ll}f_{m}(c)+f_{m}(q)+f(c, q) & c \neq \varepsilon \\0 & c=\varepsilon\end{array}\right. \\f(c, q) & =f_{a}(c, q)+f_{a}^{T(c, q)}(c, q)\end{aligned}

where

  • T(c, q): a deterministic, rule-based function that determines the category t of the pair (c, q);
  • f(c, q): the overall pairwise score for c as the antecedent of q, the sum of the shared scorer f_{a} and the category-specific expert scorer f_{a}^{T(c, q)}.
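The decomposition f(c, q) = f_{a}(c, q) + f_{a}^{T(c,q)}(c, q) can be sketched as follows. In the real model each f_{a}^{t} is a separate trainable scoring head; here the scorers are stubbed out as toy constant functions, so all numbers are assumptions for illustration only.

```python
# Toy stand-ins for the shared scorer and the six expert scorers.
shared_f_a = lambda c, q: 0.2
expert_f_a = {
    "PRON-PRON-C": lambda c, q: 1.0,
    "PRON-PRON-NC": lambda c, q: -1.0,
    "ENT-PRON": lambda c, q: 0.5,
    "MATCH": lambda c, q: 0.8,
    "CONTAINS": lambda c, q: 0.6,
    "OTHER": lambda c, q: 0.0,
}

def pairwise_score(c, q, category):
    """f(c, q): the shared pairwise score plus the score of the expert that T(c, q) selects."""
    return shared_f_a(c, q) + expert_f_a[category](c, q)

print(pairwise_score("Tom DeLay", "DeLay", "CONTAINS"))
```

Only one expert's score is added per pair, so the routing is hard (rule-based), not a soft mixture.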

Training

L_{\mathrm{COREF}}(q)=\log \sum_{\hat{c} \in \mathcal{C}(q) \cap \operatorname{GOLD}(q)} P(\hat{c} \mid q)

where

  • L_{\mathrm{COREF}}(q): the main coreference objective, the marginal log-likelihood of the true antecedents under the full model's distribution P(\hat{c} \mid q).

Each “expert” is also trained separately with its own objective:

\begin{array}{c}L_{t}(q)=\log \sum_{\hat{c} \in \mathcal{C}_{t}(q) \cap \operatorname{GOLD}(q)} P_{t}(\hat{c} \mid q) \\ \\P_{t}(\hat{c} \mid q)=\frac{\exp F_{t}(\hat{c}, q)}{\sum_{c^{\prime} \in \mathcal{C}_{t}(q)} \exp F_{t}\left(c^{\prime}, q\right)} \\ \\F_{t}(c, q)=\left\{\begin{array}{ll}f_{m}(c)+f_{m}(q)+f_{a}^{t}(c, q) & c \neq \varepsilon \\0 & c=\varepsilon\end{array}\right.\end{array}

where 

  • L_{t}(q): the objective for expert t, a marginal log-likelihood over the correct antecedents of mention q restricted to candidate pairs of type t (candidate set \mathcal{C}_{t}(q));
  • t ∈ {PRON-PRON-C, PRON-PRON-NC, ENT-PRON, MATCH, CONTAINS, OTHER}.

Final objective for each mention span q is

\begin{aligned}L(q) & =L_{\mathrm{COREF}}(q)+L_{\mathrm{EXPERTS}}(q) \\ \\ L_{\mathrm{EXPERTS}}(q) & =\sum_{t} L_{t}(q)+L_{\mathrm{S}}(q)\end{aligned}

where 

L_{\mathrm{EXPERTS}}(q) sums the per-type objectives L_{t}(q) over all six types, plus the shared objective L_{\mathrm{S}}(q).
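The final objective is just a sum of marginal log-likelihood terms. A toy sketch, assuming the per-head distributions have already been computed (all probability values below are invented, and only one expert head is shown for brevity):

```python
import math

def marginal_log_likelihood(probs, gold):
    """log of the total probability mass a head assigns to the true antecedents of q."""
    return math.log(sum(p for c, p in probs.items() if c in gold))

gold = {"c1"}
p_coref  = {"c1": 0.7, "c2": 0.2, "eps": 0.1}   # full LINGMESS distribution P
p_shared = {"c1": 0.6, "c2": 0.3, "eps": 0.1}   # shared head P_S
p_expert = {                                    # one distribution per expert t
    "ENT-PRON": {"c1": 0.5, "c2": 0.4, "eps": 0.1},
}

L_coref = marginal_log_likelihood(p_coref, gold)
L_experts = sum(marginal_log_likelihood(p, gold) for p in p_expert.values())
L_experts += marginal_log_likelihood(p_shared, gold)
L_q = L_coref + L_experts  # training maximizes this (minimizes its negative)
print(L_q)
```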

4 Experiment

5 Debug

Using multiple GPUs

A single A6000 (48 GB) is not enough to run the project, so we use nn.DataParallel to train the model on multiple GPUs. With one GPU and batch_size=20, a single device processes the whole batch; with nn.DataParallel the batch is split across devices, e.g. GPU 0 processes 8 examples, GPU 1 processes 8, and GPU 2 processes 4, which relieves the memory pressure on any single GPU.

To deploy that, use

model = nn.DataParallel(model)

There are some subtleties in how it behaves; stepping through the project in a debugger is the best way to see them.

Note: the output returned from every device must have the same shape (only the batch dimension may differ); otherwise the gather operation (which runs when the outputs are collected from the replicas) fails with an error. Example:

# ok
torch.Size([]) torch.Size([1, 30, 30]) torch.Size([1, 30, 30]) cuda:2
torch.Size([]) torch.Size([2, 30, 30]) torch.Size([2, 30, 30]) cuda:1
torch.Size([]) torch.Size([2, 30, 30]) torch.Size([2, 30, 30]) cuda:0

# gather failed
torch.Size([]) torch.Size([1, 482, 482]) torch.Size([1, 482, 482]) cuda:1
torch.Size([]) torch.Size([1, 451, 451]) torch.Size([1, 451, 451]) cuda:0
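One way to satisfy this constraint is to pad each replica's variable-size dimensions to a fixed maximum length before returning from forward(), so that gather only has to concatenate along the batch dimension. A minimal sketch of the idea, using NumPy arrays in place of torch tensors (the helper name `pad_square` and the choice of max_len=512 are assumptions, not part of the project):

```python
import numpy as np

def pad_square(scores, max_len, pad_value=0.0):
    """Pad a (batch, n, n) score array to (batch, max_len, max_len) with pad_value."""
    batch, n, _ = scores.shape
    padded = np.full((batch, max_len, max_len), pad_value, dtype=scores.dtype)
    padded[:, :n, :n] = scores
    return padded

# Mirror the failing case above: per-device outputs of shape (1, 482, 482)
# and (1, 451, 451) become a uniform (1, 512, 512), so gather can stack them.
out0 = pad_square(np.ones((1, 482, 482)), max_len=512)
out1 = pad_square(np.ones((1, 451, 451)), max_len=512)
gathered = np.concatenate([out0, out1], axis=0)
print(gathered.shape)
```

The padding must later be masked out (or the true lengths returned alongside the scores) so the padded entries do not affect the loss.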
