LINGMESS: Linguistically Informed Multi Expert Scorers for Coreference Resolution (lingmess-coref)

qq_45911550

Last modified: 2023-10-30 12:34:29

Tags: coref, python

First published: 2023-10-29 09:57:08

Article link: https://blog.csdn.net/qq_45911550/article/details/134099921

1 Introduction

Figure 1: Architecture of the multi-expert model

As the diagram shows, the model categorizes mention pairs into 6 types and learns a separate trainable scoring function for each category.

2 Six types of mention pairs

PRON-PRON-C: two pronouns that are compatible in attributes such as gender, number and animacy;
PRON-PRON-NC: two incompatible pronouns;
ENT-PRON: a pronoun and another (non-pronoun) span;
MATCH: two non-pronoun spans with the same content words;
CONTAINS: one span contains the content words of the other (this one is a little hard to picture, see the example below);
OTHER: all other pairs.

CONTAINS example (the span "Tom DeLay" contains the content word of the span "DeLay")

He reportedly showed DeLay a videotape that made him weep. Tom DeLay then ...
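
The six categories can be sketched as a small rule-based function. This is a simplified illustration, not the rules used in the lingmess-coref code: the pronoun table, the stop-word list, and the names `categorize`, `content_words` and `compatible` are all assumptions made for this example.

```python
# Hypothetical, heavily simplified rules; the real rule set is richer.
# pronoun -> (gender, number, animacy); None means "unspecified"
PRONOUN_ATTRS = {
    "he": ("m", "sg", "anim"), "him": ("m", "sg", "anim"), "his": ("m", "sg", "anim"),
    "she": ("f", "sg", "anim"), "her": ("f", "sg", "anim"),
    "it": ("n", "sg", "inanim"), "its": ("n", "sg", "inanim"),
    "they": (None, "pl", None), "them": (None, "pl", None),
}
STOP_WORDS = {"the", "a", "an", "of", "this", "that", "these", "those"}


def content_words(span):
    """Lower-cased words of the span, minus a tiny stop-word list."""
    return {w.lower() for w in span.split() if w.lower() not in STOP_WORDS}


def compatible(p1, p2):
    """Two pronouns are compatible if no specified attribute disagrees."""
    return all(a is None or b is None or a == b
               for a, b in zip(PRONOUN_ATTRS[p1], PRONOUN_ATTRS[p2]))


def categorize(c, q):
    """Toy version of T(c, q): map a (candidate, query) pair to a category."""
    c_l, q_l = c.lower(), q.lower()
    c_pron, q_pron = c_l in PRONOUN_ATTRS, q_l in PRONOUN_ATTRS
    if c_pron and q_pron:
        return "PRON-PRON-C" if compatible(c_l, q_l) else "PRON-PRON-NC"
    if c_pron or q_pron:
        return "ENT-PRON"
    cw, qw = content_words(c), content_words(q)
    if cw and cw == qw:
        return "MATCH"
    if cw and qw and (cw <= qw or qw <= cw):
        return "CONTAINS"
    return "OTHER"


print(categorize("DeLay", "Tom DeLay"))  # CONTAINS
print(categorize("DeLay", "him"))        # ENT-PRON
```

With these toy rules, the pair ("DeLay", "Tom DeLay") from the example sentence falls into CONTAINS because the content words of the first span are a proper subset of those of the second.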

3 Mathematical expression

shared scoring function (computes pairwise scores)

$F_{\mathrm{S}}(c, q)=\left\{\begin{array}{ll}f_{m}(c)+f_{m}(q)+f_{a}(c, q) & c \neq \varepsilon \\0 & c=\varepsilon\end{array}\right.$

where

$c$: candidate antecedent span
$q$: query span; the candidate $c$ appears before $q$ in the text
$\varepsilon$: the null antecedent
$f_{m}$: scores each individual span
$f_{a}$: scores the pairwise interaction between the candidate span $c$ and the query span $q$; $f_{m}$ and $f_{a}$ are parameterized functions
$F_{\mathrm{S}}$: the shared function, used to score all mention pairs. The paper does not provide the mathematical expressions of $f_{m}$ and $f_{a}$.

sum of probabilities

For each possible mention $q$, the learning objective optimizes the sum of probabilities over the true antecedents $\hat{c}$ of $q$.

$L_{\mathrm{S}}(q)=\log \sum_{\hat{c} \in \mathcal{C}(q) \cap \operatorname{GOLD}(q)} P_{\mathrm{S}}(\hat{c} \mid q)$

where

$L_{\mathrm{S}}(q)$ is the log of the summed probabilities of the true antecedents of $q$.
$\mathcal{C}(q)$ is the set of all candidate antecedents of $q$, together with the null antecedent $\varepsilon$.
$\operatorname{GOLD}(q)$ is the set of the true antecedents of $q$.
$P_{\mathrm{S}}(\hat{c} \mid q)$ is the probability that the candidate span $\hat{c}$ is the correct antecedent of the mention $q$:

$P_{\mathrm{S}}(\hat{c} \mid q)=\frac{\exp F_{\mathrm{S}}(\hat{c}, q)}{\sum_{c^{\prime} \in \mathcal{C}(q)} \exp F_{\mathrm{S}}\left(c^{\prime}, q\right)}$

where

$\hat{c}$ is a specific candidate span,
$c^{\prime}$ ranges over all candidates in $\mathcal{C}(q)$, including $\hat{c}$ itself and $\varepsilon$.
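
As a worked example of the two formulas above, the following sketch computes $P_{\mathrm{S}}$ as a softmax over candidate scores and $L_{\mathrm{S}}$ as the log of the probability mass on the gold antecedents. The scores are invented toy numbers, and the names `antecedent_probs`, `log_marginal` and `NULL` are assumptions for illustration only.

```python
import math

NULL = "<eps>"  # stands in for the null antecedent; its score is fixed to 0


def antecedent_probs(scores):
    """Softmax over {candidate: F_S(candidate, q)}; `scores` must include NULL."""
    m = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def log_marginal(scores, gold):
    """L_S(q): log of the summed probability of candidates that are in `gold`.

    `gold` must share at least one element with `scores`
    (NULL when q has no true antecedent), otherwise log(0) is undefined.
    """
    probs = antecedent_probs(scores)
    return math.log(sum(p for c, p in probs.items() if c in gold))


# Toy scores for the query span "Tom DeLay"
scores = {NULL: 0.0, "DeLay": 2.0, "a videotape": -1.0}
probs = antecedent_probs(scores)
objective = log_marginal(scores, gold={"DeLay"})
print(probs)
print(objective)  # a log-probability, so it is at most 0
```

The objective is a log-likelihood to be maximized; a training loop that minimizes a loss would typically use its negative.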

LINGMESS

pairwise scoring function

$\begin{aligned}F(c, q) & =\left\{\begin{array}{ll}f_{m}(c)+f_{m}(q)+f(c, q) & c \neq \varepsilon \\0 & c=\varepsilon\end{array}\right. \\f(c, q) & =f_{a}(c, q)+f_{a}^{T(c, q)}(c, q)\end{aligned}$

where

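$T(c, q)$ is a deterministic, rule-based function that determines the category $t$ of the pair $(c, q)$
$f_{a}^{T(c, q)}(c, q)$ is the score from the "expert" scorer of that category
$f(c, q)$ scores $c$ as the antecedent of $q$ by adding the expert score to the shared score $f_{a}(c, q)$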
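
The pairwise function above can be sketched as follows. The scorers here are constant toy functions rather than trained networks, and `lingmess_score`, `toy_categorize` and the numbers are assumptions chosen so the arithmetic is easy to follow.

```python
def lingmess_score(c, q, f_m, f_a_shared, experts, categorize):
    """F(c, q) = f_m(c) + f_m(q) + f_a(c, q) + f_a^{T(c, q)}(c, q); 0 for the null antecedent."""
    if c is None:  # None plays the role of the null antecedent
        return 0.0
    t = categorize(c, q)                      # T(c, q): pick the category
    f = f_a_shared(c, q) + experts[t](c, q)   # shared score plus expert score
    return f_m(c) + f_m(q) + f


def toy_categorize(c, q):
    """Stand-in for T(c, q) with only two categories."""
    return "CONTAINS" if c in q or q in c else "OTHER"


f_m = lambda span: 0.5            # toy mention score
f_a_shared = lambda c, q: 1.0     # toy shared pairwise score
experts = {
    "CONTAINS": lambda c, q: 2.0,  # toy expert scores, one scorer per category
    "OTHER": lambda c, q: -1.0,
}

print(lingmess_score("DeLay", "Tom DeLay", f_m, f_a_shared, experts, toy_categorize))        # 4.0
print(lingmess_score("a videotape", "Tom DeLay", f_m, f_a_shared, experts, toy_categorize))  # 1.0
print(lingmess_score(None, "Tom DeLay", f_m, f_a_shared, experts, toy_categorize))           # 0.0
```

Only one expert is evaluated per pair, selected by the category, so the expert term changes the score differently for a CONTAINS pair than for an OTHER pair even though the shared term is identical.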

Training

$L_{\mathrm{COREF}}(q)=\log \sum_{\hat{c} \in \mathcal{C}(q) \cap \operatorname{GOLD}(q)} P(\hat{c} \mid q)$

where

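$L_{\mathrm{COREF}}(q)$ is the coreference objective the model optimizes: the log of the summed probabilities $P(\hat{c} \mid q)$ of the true antecedents of $q$, where $P$ is computed from the combined scores $F(c, q)$ in the same way that $P_{\mathrm{S}}$ is computed from $F_{\mathrm{S}}$.

The training objective also trains each “expert” separately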

$\begin{array}{c}L_{t}(q)=\log \sum_{\hat{c} \in \mathcal{C}_{t}(q) \cap \operatorname{GOLD}(q)} P_{t}(\hat{c} \mid q) \\ \\P_{t}(\hat{c} \mid q)=\frac{\exp F_{t}(\hat{c}, q)}{\sum_{c^{\prime} \in \mathcal{C}_{t}(q)} \exp F_{t}\left(c^{\prime}, q\right)} \\ \\F_{t}(c, q)=\left\{\begin{array}{ll}f_{m}(c)+f_{m}(q)+f_{a}^{t}(c, q) & c \neq \varepsilon \\0 & c=\varepsilon\end{array}\right.\end{array}$

where

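$L_{t}(q)$ is the same log-sum objective restricted to a specific type $t$: it measures the model's predictions for the correct antecedents of mention $q$ using only the expert scorer $f_{a}^{t}$
$\mathcal{C}_{t}(q)$ is the set of candidates of $q$ whose pair $(c, q)$ has type $t$
$t$ is one of PRON-PRON-C, PRON-PRON-NC, ENT-PRON, MATCH, CONTAINS, OTHER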

The final objective for each mention span $q$ is

$\begin{aligned}L(q) & =L_{\mathrm{COREF}}(q)+L_{\mathrm{EXPERTS}}(q) \\ \\ L_{\mathrm{EXPERTS}}(q) & =\sum_{t} L_{t}(q)+L_{\mathrm{S}}(q)\end{aligned}$

where

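$L_{\mathrm{EXPERTS}}(q)$ adds the per-type objectives $L_{t}(q)$, each a log of summed probabilities over candidates of type $t$, to the shared objective $L_{\mathrm{S}}(q)$.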
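
To make the bookkeeping concrete, here is a minimal sketch of how the terms combine for one mention. The function name `final_objective` and the log-likelihood values are made up for illustration.

```python
def final_objective(l_coref, l_by_type, l_shared):
    """L(q) = L_COREF(q) + L_EXPERTS(q), with L_EXPERTS(q) = sum_t L_t(q) + L_S(q)."""
    l_experts = sum(l_by_type.values()) + l_shared
    return l_coref + l_experts


# Toy log-likelihoods for one mention q (all values are invented)
l_q = final_objective(
    l_coref=-0.5,
    l_by_type={"ENT-PRON": -0.25, "MATCH": -0.125},
    l_shared=-0.125,
)
print(l_q)  # -1.0
```

All terms are log-likelihoods, so the total is maximized; an optimizer that minimizes would be given the negative of this value.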

4 Experiment

5 Debug

Using multiple GPUs

A single A6000 (48 GB) is not enough to run the project, so we use nn.DataParallel to train the model on multiple GPUs. With an input batch_size=20, a single GPU would process the whole batch. With nn.DataParallel the batch is split along the batch dimension: with three GPUs, for example, GPU:0 and GPU:1 would each process roughly 7 samples and GPU:2 the remaining 6, which relieves the memory pressure on any single GPU.
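
The split can be estimated without any GPU. The sketch below assumes that nn.DataParallel divides the batch the way torch.chunk divides a tensor (equal chunks of size ceil(batch / devices), with a smaller last chunk); `chunk_sizes` is a hypothetical helper written for this post, not a PyTorch function.

```python
import math
from typing import List


def chunk_sizes(batch_size: int, num_devices: int) -> List[int]:
    """Per-device batch sizes under torch.chunk-style splitting (an assumption)."""
    per_device = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_device, remaining)  # the last chunk may be smaller
        sizes.append(take)
        remaining -= take
    return sizes


print(chunk_sizes(20, 3))  # [7, 7, 6]
print(chunk_sizes(5, 3))   # [2, 2, 1]
```

The second call matches the per-device batch sizes (2, 2, 1) visible in the "ok" log further down.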

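To enable this, wrap the model:

from torch import nn
model = nn.DataParallel(model)  # replicates the model and splits each input batch across the visible GPUs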

There are more details to this; stepping through the project in a debugger is the best way to see them.

Note: apart from the batch dimension, the outputs from every device must have the same shape. Otherwise the gather operation, which runs when the outputs are collected from the replicas, fails with an error. Example:

# ok
torch.Size([]) torch.Size([1, 30, 30]) torch.Size([1, 30, 30]) cuda:2
torch.Size([]) torch.Size([2, 30, 30]) torch.Size([2, 30, 30]) cuda:1
torch.Size([]) torch.Size([2, 30, 30]) torch.Size([2, 30, 30]) cuda:0

# gather failed
torch.Size([]) torch.Size([1, 482, 482]) torch.Size([1, 482, 482]) cuda:1
torch.Size([]) torch.Size([1, 451, 451]) torch.Size([1, 451, 451]) cuda:0
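
The rule behind the two logs can be checked on the shapes alone. The helpers below are hypothetical (they are not part of PyTorch) and only mimic the shape requirement of concatenating per-device outputs along the batch dimension.

```python
def can_gather(shapes, dim=0):
    """True if all shapes agree on every dimension except `dim`."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            return False
        if any(a != b for i, (a, b) in enumerate(zip(first, shape)) if i != dim):
            return False
    return True


def gathered_shape(shapes, dim=0):
    """Shape after concatenating along `dim`; raises if the shapes are incompatible."""
    if not can_gather(shapes, dim):
        raise ValueError(f"cannot gather shapes {shapes} along dim {dim}")
    out = list(shapes[0])
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


ok = [(1, 30, 30), (2, 30, 30), (2, 30, 30)]   # shapes from the "ok" log
bad = [(1, 482, 482), (1, 451, 451)]           # shapes from the failing log

print(can_gather(ok), gathered_shape(ok))  # True (5, 30, 30)
print(can_gather(bad))                     # False
```

In the failing case the sequence-length dimensions differ (482 vs 451) because each replica sized its output to its own chunk. A common workaround, not described in the original post, is to pad the outputs to a fixed length inside forward so that every replica returns the same trailing dimensions.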
