PyTorch weighted_and_neg_topk_cross_entropy: a weighted cross-entropy loss with top-k filtering for negative weights


ONE_SIX_MIX


Copyright: CC 4.0 BY-SA


Article link: https://blog.csdn.net/ONE_SIX_MIX/article/details/129732022



Based on my recent NLG experience, I have continued to improve the loss function.

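It is mainly intended for the NLP augmentation pipeline described in the following article:

A design for GPT model training that includes weighting, data augmentation, and a loss method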
https://blog.csdn.net/ONE_SIX_MIX/article/details/129682576

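Compared with the modified loss in the article above, this version adds a top-k test for negatively weighted targets: a target with a negative weight passes gradient only when its class is among the model's K highest-probability predicted classes; otherwise it is skipped.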
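The check itself is small. A minimal sketch of just the masking step on hand-made tensors is shown here; the tensor values and the names `in_topk`, `silent` and `keep` are illustrative assumptions, not part of the original code. Because softmax preserves the order of the logits, taking the top k of the raw logits selects the same classes as taking the top k of the predicted probabilities.

```python
import torch

# Toy logits, shape [B=1, C=5, L=3]: at every position class c has score 4 - c,
# so the top-2 classes are always {0, 1}
scores = torch.tensor([4.0, 3.0, 2.0, 1.0, 0.0])
logits = scores.view(1, 5, 1).expand(1, 5, 3)

target = torch.tensor([[0, 3, 4]])                 # [B, L] target class per position
target_weight = torch.tensor([[-1.0, -1.0, 0.5]])  # [B, L] per-target weight
topk = 2

# Indices of the top-k classes along the class dim: [B, k, L]
topk_cls = torch.topk(logits, topk, dim=1).indices
# Is each target class among the top-k predictions at its position? -> [B, L]
in_topk = (target.unsqueeze(1) == topk_cls).any(dim=1)
# Silence negatively weighted targets that have already left the top-k
silent = ~in_topk & (target_weight < 0)
keep = ~silent

print(in_topk.tolist())  # [[True, False, False]]
print(keep.tolist())     # [[True, False, True]]
```

Position 0 (negative weight, class still in the top 2) keeps its gradient, position 1 (negative weight, class already out of the top 2) is dropped, and position 2 is kept because its weight is positive.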

import torch
import torch.nn.functional as F
from typing import Optional


@torch.jit.script
def weighted_and_neg_topk_cross_entropy(
    input: torch.Tensor,
    target: torch.Tensor,
    topk: Optional[int]=None,
    target_weight: Optional[torch.Tensor]=None,
    target_mask: Optional[torch.Tensor]=None,
    label_smoothing: float=0.,
    ignore_zero_target_weight: bool=True,
):
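    '''
    Weighted cross-entropy loss with top-k filtering for negative weights. Mainly intended for NLG tasks;
    it works fairly well for sampling-based generation.
    Behavior:
    If the target_weight of a target item is greater than 0, the weighted cross-entropy is computed as usual.
    If the target_weight of a target item is less than 0, first check whether the target class is among the
    top-k highest-probability predictions. If it is, the weighted cross-entropy is computed as usual; if it
    is not, the item is skipped.
    Purpose:
    Stop computing gradients for negatively weighted classes that have already dropped out of the top-k
    predictions.

    Note:
    When negative weights are used, the loss does not necessarily decrease monotonically as the model
    improves; it may rise. The loss can also be negative, and the point of minimum loss is not the
    best-trained model, so if the loss is used for evaluation it must be combined with other metrics.
    For example, with target_weight values that are mostly negative, the loss is negative at first, and late
    in convergence it bounces back to around 0.

    Dimension abbreviations: B is the batch size, C is the number of classes (the vocabulary size for NLG).
    Shapes are written as [B,C,...] and [B,...]; the trailing dimensions "..." (e.g. sequence length) must
    be identical in input and target.
    :param input:                       FloatTensor shape [B,C,...], model output logits
    :param target:                      LongTensor shape [B,...], target class indices
    :param topk:                        int or None, number of top predictions to check; None disables the check; 10 is recommended
    :param target_weight:               FloatTensor shape [B,...] or None, per-target weight
    :param target_mask:                 BoolTensor shape [B,...] or None, target mask; True = included, False = ignored
    :param label_smoothing:             float, label smoothing
    :param ignore_zero_target_weight:   bool, whether to drop targets whose target_weight is 0 so they neither receive gradient nor count toward the mean
    :return:                            0-dim tensor, mean weighted loss over the targets kept by the mask
    '''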
    assert target.shape[0] == input.shape[0] and target.shape[1:] == input.shape[2:], 'Error! Bad input and target shape.'
    assert topk is None or 0 < topk <= input.shape[1], 'Error! Bad param topk.'
    assert target_weight is None or target.shape == target_weight.shape, 'Error! Bad target_weight shape.'
    assert target_mask is None or (target.shape == target_mask.shape and target_mask.dtype == torch.bool), 'Error! Bad target_mask shape or dtype.'

    loss = F.cross_entropy(input, target, label_smoothing=label_smoothing, reduction='none')

    if target_weight is not None:
        loss = loss * target_weight

    if target_mask is None:
        target_mask = torch.full_like(target, 1, dtype=torch.bool)

    if target_weight is not None and topk is not None:
        # Skip negatively weighted targets whose class is not among the top-k predictions
        out_topk_cls = torch.topk(input.detach(), topk, dim=1, sorted=False)[1]
        # True where the target class appears among the top-k predicted classes, shape [B, ...]
        target_in_topk = (target.unsqueeze(1) == out_topk_cls).any(dim=1)
        # Items to silence: negative weight AND target class NOT in the top-k
        neg_cls_silent_mask = torch.logical_and(~target_in_topk, target_weight < 0)
        # Remove those items from the mask
        target_mask = target_mask & ~neg_cls_silent_mask

    if ignore_zero_target_weight and target_weight is not None:
        target_mask = target_mask & ~(target_weight == 0.)

    if bool(target_mask.any()):
        loss = loss[target_mask].mean()
    else:
        # If the mask is all False the loss is 0; multiplying by 0. keeps the graph so backward() still works
        loss = loss.sum().mul(0.)

    return loss


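if __name__ == '__main__':
    # Toy check: one sample, 10 classes; class 1 is the target and starts with the largest logit
    a = torch.rand([1, 10])
    a[0, 1] += 5
    t = torch.ones([1], dtype=torch.long)

    a.requires_grad = True
    optim = torch.optim.Adam([a], lr=1e-2)

    for i in range(1000):
        optim.zero_grad()
        # The negative weight pushes the class-1 logit down; once class 1 drops out of the top 9,
        # the item is masked out and the loss becomes 0
        loss = weighted_and_neg_topk_cross_entropy(a, t, 9, torch.as_tensor([-0.1]), torch.as_tensor([True]), 0.)
        loss.backward()
        optim.step()
        print(loss, a.tolist())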
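The function expects the class dimension at position 1 ([B,C,...]), as F.cross_entropy does, whereas many GPT-style implementations (for example Hugging Face causal LM heads) return logits as [B, L, V]. The sketch below assumes that layout and the usual next-token shift (both are assumptions, not taken from the original article; the sizes and random data are placeholders). It shows how the tensors could be rearranged before calling weighted_and_neg_topk_cross_entropy and checks that the [B, V, L] layout gives the same per-token cross-entropy as the flattened [N, V] layout.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical GPT-style output: logits [B, L, V] and the token ids that produced them
B, L, V = 2, 6, 11
logits = torch.randn(B, L, V)
tokens = torch.randint(0, V, (B, L))

# Assumed next-token objective: position t predicts token t + 1
shift_logits = logits[:, :-1, :]   # [B, L-1, V]
shift_target = tokens[:, 1:]       # [B, L-1]

# Move the vocabulary (class) dim to position 1: [B, V, L-1]
input_bcl = shift_logits.transpose(1, 2)

# Same shape condition that weighted_and_neg_topk_cross_entropy asserts
assert shift_target.shape[0] == input_bcl.shape[0]
assert shift_target.shape[1:] == input_bcl.shape[2:]

# Per-token loss in the [B, C, L] layout ...
per_token = F.cross_entropy(input_bcl, shift_target, reduction='none')  # [B, L-1]
# ... matches the flattened [N, C] layout
flat = F.cross_entropy(shift_logits.reshape(-1, V), shift_target.reshape(-1), reduction='none')

print(per_token.shape)                                  # torch.Size([2, 5])
print(torch.allclose(per_token, flat.view(B, L - 1)))   # True
```

With real data, input_bcl and shift_target, together with a [B, L-1] target_weight and, for padding, a [B, L-1] boolean target_mask, are what would be passed to weighted_and_neg_topk_cross_entropy.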