Output-related
Loss Functions
| Loss | Description |
| --- | --- |
| nn.L1Loss | Creates a criterion that measures the mean absolute error (MAE) between each element in the input x and target y. |
| nn.MSELoss | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input x and target y. |
| nn.CrossEntropyLoss | This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class. |
| nn.CTCLoss | The Connectionist Temporal Classification loss. |
| nn.NLLLoss | The negative log likelihood loss. |
| nn.PoissonNLLLoss | Negative log likelihood loss with Poisson distribution of target. |
| nn.KLDivLoss | The Kullback-Leibler divergence loss measure. |
| nn.BCELoss | Creates a criterion that measures the Binary Cross Entropy between the target and the output. |
| nn.BCEWithLogitsLoss | This loss combines a Sigmoid layer and the BCELoss in one single class. |
| nn.MarginRankingLoss | Creates a criterion that measures the loss given inputs x1, x2, two 1D mini-batch Tensors, and a label 1D mini-batch tensor y (containing 1 or -1). |
| nn.HingeEmbeddingLoss | Measures the loss given an input tensor x and a labels tensor y (containing 1 or -1). |
| nn.MultiLabelMarginLoss | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (which is a 2D Tensor of target class indices). |
| nn.SmoothL1Loss | Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. |
| nn.SoftMarginLoss | Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1). |
| nn.MultiLabelSoftMarginLoss | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input x and target y of size (N, C). |
| nn.CosineEmbeddingLoss | Creates a criterion that measures the loss given input tensors x1, x2 and a Tensor label y with values 1 or -1. |
| nn.MultiMarginLoss | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (which is a 1D tensor of target class indices, 0 ≤ y ≤ x.size(1) − 1). |
| nn.TripletMarginLoss | Creates a criterion that measures the triplet loss given input tensors x1, x2, x3 and a margin with a value greater than 0. |
| nn.TripletMarginWithDistanceLoss | Creates a criterion that measures the triplet loss given input tensors a, p, and n (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function ("distance function") used to compute the relationship between the anchor and positive example ("positive distance") and the anchor and negative example ("negative distance"). |
Parameters

reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the weighted mean of the output is taken; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime specifying either of those two args will override reduction. Default: 'mean'.
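A minimal sketch of the three reduction modes, using nn.MSELoss (the choice of loss is arbitrary; any elementwise loss behaves the same way):

```python
import torch
from torch import nn

input = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([0.0, 2.0, 5.0])

# 'none': per-element losses, same shape as the input
print(nn.MSELoss(reduction='none')(input, target))  # tensor([1., 0., 4.])
# 'mean': average over all elements (the default)
print(nn.MSELoss(reduction='mean')(input, target))  # tensor(1.6667)
# 'sum': total over all elements
print(nn.MSELoss(reduction='sum')(input, target))   # tensor(5.)
```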
Differences between BCELoss/BCEWithLogitsLoss and CrossEntropyLoss
BCEWithLogitsLoss = Sigmoid + BCELoss: when the network's last layer applies nn.Sigmoid, use BCELoss; when it does not, use BCEWithLogitsLoss.
BCELoss/BCEWithLogitsLoss are used for single-label binary classification or multi-label binary classification. The output and target have shape (batch, C), where batch is the number of samples and C is the number of classes. Each of a sample's C values is passed through a sigmoid independently, mapping it into (0, 1), so the C values of a sample are mutually independent and do not necessarily sum to 1. Each value is the probability that the sample carries the corresponding label. For single-label binary classification, the output and target can simply have shape (batch, 1).
Also note that torch.nn.BCELoss is a class whose forward is implemented by calling the function F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction).
[torch.nn.modules.loss — PyTorch 2.1 documentation]
[TORCH.NN.FUNCTIONAL.BINARY_CROSS_ENTROPY]
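Both claims above are easy to verify: nn.BCEWithLogitsLoss on raw logits matches an explicit sigmoid followed by nn.BCELoss, and nn.BCELoss matches the functional F.binary_cross_entropy it delegates to. A minimal sketch:

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.randn(4, 3)              # raw scores, no sigmoid applied
target = torch.empty(4, 3).random_(2)   # independent 0/1 labels per class

a = nn.BCEWithLogitsLoss()(logits, target)                 # sigmoid built in
b = nn.BCELoss()(torch.sigmoid(logits), target)            # explicit sigmoid
c = F.binary_cross_entropy(torch.sigmoid(logits), target)  # functional form
print(torch.allclose(a, b), torch.allclose(b, c))  # True True
```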
CrossEntropyLoss is used for multi-class classification. The output (logits) has shape (batch, C), where batch is the number of samples and C is the number of classes; the target is either a (batch,) tensor of class indices or a (batch, C) tensor of class probabilities. The C classes are mutually exclusive and jointly normalized: a softmax is taken over each sample's C values, so they always sum to 1, and the largest value indicates the predicted class. For binary classification this means the output has shape (batch, 2).
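To make the shape contrast concrete, here is a small sketch (the shapes and values are illustrative):

```python
import torch
from torch import nn

logits = torch.randn(2, 3)  # batch = 2, C = 3

# Multi-label: independent classes, float 0/1 targets of shape (batch, C)
multi_label_target = torch.tensor([[1., 0., 1.], [0., 1., 0.]])
print(nn.BCEWithLogitsLoss()(logits, multi_label_target))

# Multi-class: mutually exclusive classes, integer class indices of shape (batch,)
multi_class_target = torch.tensor([2, 0])
print(nn.CrossEntropyLoss()(logits, multi_class_target))
```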
Two output neurons vs. a single output neuron for binary classification
For binary classification, the BERT representation is typically projected down to one dimension and passed through a sigmoid, or down to two dimensions and passed through a softmax. With a two-dimensional output, applying a sigmoid to each dimension independently and applying a softmax give different results: in the first case the outputs of all neurons may sum to more than 1, while in the second case they always sum to exactly 1 (see the sketch after the links below).
[神经网络进行二分类时,输出层使用两个神经元和只使用一个神经元,模型的性能有何差异]
[二分类问题输出一个节点还是两个节点_IT莫莫的博客-CSDN博客]
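A minimal demo of the normalization difference on a two-dimensional output:

```python
import torch

logits = torch.tensor([[2.0, 1.5]])    # one sample, two output neurons

sig = torch.sigmoid(logits)            # each neuron squashed independently
soft = torch.softmax(logits, dim=-1)   # jointly normalized over the two neurons
print(sig.sum())   # ~1.6984: the sum can exceed 1
print(soft.sum())  # exactly 1
```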
Computing the loss at training time:
1. With a single neuron, the output is passed through a sigmoid and the final 0/1 decision is made by thresholding at 0.5; the cross-entropy loss used is F.binary_cross_entropy (a sketch of this case follows the two-neuron snippet below).
2. With two neurons, the output is passed through a softmax, and the cross-entropy loss used is CrossEntropyLoss.
Computing the loss with two neurons:
```python
elif self.config.problem_type == "single_label_classification":
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
```
Note: the view calls here [PyTorch:tensor-基本操作_view] change neither the data nor the shapes: logits stay (batch_size, num_labels) and labels stay (batch_size,).
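For comparison, a hypothetical single-neuron counterpart (not part of the snippet above); nn.BCEWithLogitsLoss is used here so the sigmoid is built into the loss:

```python
import torch
from torch import nn

logits = torch.randn(8, 1)           # (batch_size, 1): one output neuron
labels = torch.empty(8).random_(2)   # (batch_size,): 0/1 labels

loss_fct = nn.BCEWithLogitsLoss()    # sigmoid + binary cross-entropy in one
loss = loss_fct(logits.view(-1), labels.float())  # flatten to (batch_size,)
```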
At evaluation or prediction time
With two neurons there is no need to apply a softmax: since softmax is monotonic, you can compare the two neurons directly and take the argmax.
For example, computing metrics with two neurons:
```python
# You can define your custom compute_metrics function. It takes an `EvalPrediction`
# object (a namedtuple with a predictions and label_ids field) and has to return
# a dictionary mapping string to float.
def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.squeeze(preds) if is_regression else np.argmax(preds, axis=1)
    return metric.compute(predictions=preds, references=p.label_ids)
```
For example, predicting with two neurons:
```python
predictions = trainer.predict(predict_dataset, metric_key_prefix="predict").predictions
predictions = np.squeeze(predictions) if is_regression else np.argmax(predictions, axis=1)
```
Note: with only a single neuron, the output must go through a sigmoid before it can be compared against 0.5 (equivalently, since sigmoid is monotonic with sigmoid(0) = 0.5, you can threshold the raw logit at 0).
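A minimal sketch of single-neuron prediction (variable names are illustrative):

```python
import torch

logits = torch.randn(8, 1)                 # (batch_size, 1) raw outputs

probs = torch.sigmoid(logits).squeeze(-1)  # probabilities in (0, 1)
preds = (probs > 0.5).long()               # threshold at 0.5

# Equivalent shortcut: sigmoid(x) > 0.5 exactly when x > 0
preds_alt = (logits.squeeze(-1) > 0).long()
assert torch.equal(preds, preds_alt)
```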
Examples
nn.CrossEntropyLoss example
```python
import torch
from torch import nn

# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
# input.size(): torch.Size([3, 5])
target = torch.empty(3, dtype=torch.long).random_(5)
# target.size(): torch.Size([3])
output = loss(input, target)
output.backward()

# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
# input.size(): torch.Size([3, 5])
target = torch.randn(3, 5).softmax(dim=1)
# target.size(): torch.Size([3, 5])
output = loss(input, target)
output.backward()
```
nn.BCELoss example
```python
m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
```
nn.functional.binary_cross_entropy example
```python
import torch
import torch.nn.functional as F

# Typical usage in a training loop (assumes `dataloader` provides the labels and
# `outputs` holds model outputs already passed through a sigmoid):
#   labels = dataloader["label"]
#   predictions = outputs.squeeze().contiguous()
#   loss = F.binary_cross_entropy(predictions, labels, reduction='mean')

# Self-contained example:
input = torch.randn(3, 2, requires_grad=True)
target = torch.rand(3, 2, requires_grad=False)
loss = F.binary_cross_entropy(torch.sigmoid(input), target)
loss.backward()
```
from: -柚子皮-