I recently read the paper Surface Form Competition: Why the Highest Probability Answer Isn't Always Right, and while going through its code I noticed that the paper computes pointwise mutual information via cross-entropy. After looking around I still didn't fully understand why, so I'd like to work through it here.
The paper proposes scoring candidate answers with domain conditional pointwise mutual information, PMI_DC, to mitigate surface form competition. It is computed as:
PMI_DC(x, y) = P(y | x) / P(y | x_domain)
This is ordinary pointwise mutual information, PMI(x, y) = P(y | x) / P(y), with the unconditional P(y) replaced by the domain-conditional P(y | x_domain). (For background on pointwise mutual information, see the CSDN post 信息熵、KL散度、交叉熵、互信息、点互信息.)
By conditioning on the domain string ("because" in the example below), PMI_DC makes the task scope explicit.
For example:
Premise (x): The bar closed because
Domain premise (x_domain): because
Hypothesis 1 (y1): it was crowded.
Hypothesis 2 (y2): it was 3am.
Take y1 as an example.
For a plain LM, we score with P(y1 | x), i.e. P(it was crowded | The bar closed because).
For PMI_DC, we score with P(y1 | x) / P(y1 | x_domain), where P(y1 | x_domain) is P(it was crowded | because).
The same goes for y2. In the end, what we actually care about is the index i of the hypothesis yi that maximizes this score.
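To make the argmax concrete, here is a toy sketch of the scoring rule; all probabilities below are invented for illustration, not real LM outputs:

```python
# score each hypothesis by P(y_i | x) / P(y_i | x_domain), then take the argmax
# (all probabilities are made up for illustration)
p_cond   = {"it was crowded": 0.020, "it was 3am": 0.008}  # P(y_i | "The bar closed because")
p_domain = {"it was crowded": 0.050, "it was 3am": 0.004}  # P(y_i | "because")

scores = {y: p_cond[y] / p_domain[y] for y in p_cond}
best = max(scores, key=scores.get)
# raw LM probability favors "it was crowded" (0.020 > 0.008),
# but PMI_DC favors "it was 3am" (0.008/0.004 = 2.0 beats 0.020/0.050 = 0.4)
```

With these made-up numbers the two criteria disagree, which is exactly the surface-form-competition effect the paper targets: "it was crowded" is generically likely after "because", so dividing by P(y | x_domain) discounts it.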
The code for computing PMI_DC and PMI is shown below.
My understanding:
By definition, PMI_DC = P(y|x) / P(y|x_domain),
and the cross-entropy of y given x is H(y|x) = -log P(y|x),
so PMI_DC can be computed via cross-entropies: log PMI_DC = log P(y|x) - log P(y|x_domain) = H(y|domain) - H(y|x). (The ratio becomes a difference only after taking the log, but since log is monotonic, ranking hypotheses by H(y|domain) - H(y|x) is equivalent to ranking them by PMI_DC itself.)
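A quick numeric check of this identity, with made-up probabilities:

```python
import math

# hypothetical probabilities, just to check the algebra
p_y_given_x = 0.02       # P(y | x)
p_y_given_domain = 0.05  # P(y | x_domain)

log_pmi_dc = math.log(p_y_given_x / p_y_given_domain)
# H(y|domain) - H(y|x) = (-log P(y|domain)) - (-log P(y|x))
ce_diff = -math.log(p_y_given_domain) - (-math.log(p_y_given_x))
# the two agree up to floating-point error
```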
# here H(y|domain) actually means H(model(domain), y)
# log PMI_DC = H(y|domain) - H(y|x)
dcpmi = [ce_0 - ce_1 for ce_0,ce_1 in zip(domain_cond_ce, cond_ce)]
# log PMI(x,y) = H(y) - H(y|x); H(y): cross-entropy of y on its own (uncond_ce); H(y|x): cross-entropy of y given x (cond_ce)
pmi = [ce_0 - ce_1 for ce_0,ce_1 in zip(uncond_ce, cond_ce)]
# computes H(logits, targets), i.e. -log P(targets | inputs) = -log P(y|x)
# (abridged excerpt: the construction of input_ids, labels, n_seqs and max_len
#  from inputs/targets is omitted here)
def cross_entropy_list(inputs, targets):
    # get logits from the model
    with torch.no_grad():
        input_ids = input_ids.to(device)
        logits = model(input_ids).logits.cpu()[:,:-1].contiguous()
    # get cross-entropies given the logits
    logit_shape = logits.shape
    logits = logits.view(-1, logit_shape[-1])
    # cross-entropy between the predicted logits and the true next tokens
    ce_list = F.cross_entropy(logits, labels[:,1:].contiguous().view(-1), reduction='none')
    # sum the per-token losses to get one total per sequence
    ce_list = ce_list.view(n_seqs, max_len -1).sum(dim=1).squeeze().tolist()
    return ce_list
Through cross_entropy_list() we obtain H(y|x), H(y|domain), and H(y). Note: I had long assumed that x and (x, domain) were two different things, but the code shows that x is already the domain-scoped premise, i.e. x and (x, domain) are the same input. H(y) is computed by feeding the constant sequence [25] as the input.
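The reason summing per-token cross-entropies gives -log P(y|x) is the chain rule of probability: the sequence probability is the product of per-step token probabilities, so its negative log is the sum of per-token negative logs. A minimal pure-Python sketch with an invented three-step distribution:

```python
import math

# invented next-token distributions at each step (not from a real LM)
step_probs = [
    {"it": 0.5, "was": 0.3, "crowded": 0.2},
    {"it": 0.1, "was": 0.7, "crowded": 0.2},
    {"it": 0.2, "was": 0.2, "crowded": 0.6},
]
target = ["it", "was", "crowded"]

# per-token cross-entropy is -log P(token); the sequence CE is their sum
ce = sum(-math.log(p[t]) for p, t in zip(step_probs, target))
# exp(-ce) recovers the product of the per-step probabilities, i.e. P(y|x)
seq_prob = math.exp(-ce)
```

This is exactly what the `.sum(dim=1)` over the per-token `F.cross_entropy` values does in the function above.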
## get conditional CEs: -log P(y|x), e.g. P(the woman got her hair cut | The man perceived that the woman looked different because)
# -log P(y|x) = H(x, y), where H(x, y) actually means H(model(x), y)
# to compute -log P(y|x): run x through the model to get logits = model(inputs), then take the cross-entropy of the predictions against the targets, cross_entropy(logits, targets)
cond_ce = cross_entropy_list([opt['premise'] for opt in options],
[opt['hypothesis'] for opt in options],
model, cache=cache, batch=batch, calculate = calculate)
#{'premise': ' The man perceived that the woman looked different because', 'hypothesis': ' the woman got her hair cut.',
## get domain conditional CEs: -log P(y|domain), e.g. P(the woman got her hair cut | because)
domain_cond_ce = cross_entropy_list([opt['uncond_premise'] for opt in options],
[opt['uncond_hypothesis'] for opt in options],
model, cache=cache, batch=batch, calculate = calculate)
# 'uncond_premise': ' because', 'uncond_hypothesis': ' the woman got her hair cut.'
## get unconditional CEs: -log P(y)
# cross-entropy between the constant input [25] and uncond_hypothesis, i.e. P(the woman got her hair cut | [25])
uncond_ce = cross_entropy_list([[25] for opt in options],
[opt['uncond_hypothesis'] for opt in options],
model, cache=cache, batch=batch, calculate = calculate)
Computing PMI_DC and PMI from the cross-entropies:
# log PMI_DC = H(y|domain) - H(y|x)
dcpmi = [ce_0 - ce_1 for ce_0,ce_1 in zip(domain_cond_ce, cond_ce)]
# log PMI(x,y) = H(y) - H(y|x); H(y): cross-entropy of y on its own (uncond_ce); H(y|x): cross-entropy of y given x (cond_ce)
pmi = [ce_0 - ce_1 for ce_0,ce_1 in zip(uncond_ce, cond_ce)]
Finally, the predictions:
# pick the index with the smallest conditional cross-entropy
lm_pred = cond_ce.index(min(cond_ce))
lm_avg_pred = avg_cond_ce.index(min(avg_cond_ce))
lm_domain_cond_pred = domain_cond_ce.index(min(domain_cond_ce))
# pick the index with the largest (domain) pointwise mutual information
dcpmi_pred = dcpmi.index(max(dcpmi))
pmi_pred = pmi.index(max(pmi))
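Putting it all together with toy numbers (the cross-entropies below are invented for two hypotheses, not model outputs) shows how the raw-LM prediction and the PMI_DC prediction can disagree:

```python
# invented cross-entropies for two hypotheses
cond_ce        = [3.9, 4.8]  # H(y_i | x)      = -log P(y_i | x)
domain_cond_ce = [3.0, 5.5]  # H(y_i | domain) = -log P(y_i | x_domain)

dcpmi = [ce_0 - ce_1 for ce_0, ce_1 in zip(domain_cond_ce, cond_ce)]  # ~[-0.9, 0.7]
lm_pred = cond_ce.index(min(cond_ce))    # hypothesis 0 has the lowest conditional CE
dcpmi_pred = dcpmi.index(max(dcpmi))     # hypothesis 1 has the largest log PMI_DC
```

Hypothesis 0 is more probable under the LM, but hypothesis 1 gains more probability from conditioning on the full premise than on the bare domain string, so PMI_DC prefers it.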