PyTorch Study Notes - Loss Functions - CSDN Blog

Article link: https://blog.csdn.net/2501_90669630/article/details/148435179

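Contents

1. Built-in loss function
2. Custom loss function by subclassing nn.Module
3. Custom loss function by subclassing autograd.Function
4. Experiment: implementing MSE in three different ways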

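Besides its built-in loss functions, PyTorch also lets you define your own. We use mean squared error as the running example to show how loss functions are used in PyTorch. The mean squared error (MSE) is the average of the squared differences between the predictions $x = (x_1, x_2, \dots, x_n)$ and the ground-truth values $y = (y_1, y_2, \dots, y_n)$:

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2

The gradient of the MSE loss with respect to the input vector $x$ is:

\frac{\partial\, \text{MSE}}{\partial x} = \frac{2}{n}(x - y)

or, element by element:

\frac{\partial\, \text{MSE}}{\partial x_i} = \frac{2}{n}(x_i - y_i)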
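The step from the loss to its gradient is short but worth spelling out: in the sum, only the term with index $j = i$ depends on $x_i$, and the chain rule gives $2(x_i - y_i)$ for that term.

```latex
\frac{\partial\, \text{MSE}}{\partial x_i}
  = \frac{1}{n} \sum_{j=1}^{n} \frac{\partial}{\partial x_i} (x_j - y_j)^2
  = \frac{1}{n} \cdot 2 (x_i - y_i)
  = \frac{2}{n} (x_i - y_i)
```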

1. Built-in Loss Function

PyTorch provides mean squared error in the torch.nn module as nn.MSELoss:

import torch.nn as nn

mse_loss = nn.MSELoss()
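A quick sanity check, assuming small made-up sample tensors: with its default reduction="mean", nn.MSELoss should match the formula above, and torch.nn.functional.mse_loss is the functional form of the same loss. reduction="sum" returns the sum of squared errors without dividing by n.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

mse_loss = nn.MSELoss()  # reduction="mean" by default

# made-up sample values for illustration
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.5, 2.0, 2.0])

loss = mse_loss(x, y)
manual = ((x - y) ** 2).sum() / x.numel()   # (0.25 + 0 + 1) / 3
functional = F.mse_loss(x, y)               # functional form of the same loss
summed = nn.MSELoss(reduction="sum")(x, y)  # sum of squared errors, no division by n

print(loss.item(), manual.item(), functional.item(), summed.item())
```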

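2. Custom Loss Function by Subclassing nn.Module

With this approach you only implement the forward() method. There is no need to write the backward pass by hand, because the autograd engine derives it. After instantiating the custom loss class, call the instance directly to compute the loss value.
A custom MSE loss that subclasses nn.Module can be implemented as follows: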

import torch.nn as nn

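class MSELossV1(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input, target):
        # element-wise squared error
        squared_diff = (input - target) ** 2
        # n is the total number of elements, so this is the mean over all of them
        n = squared_diff.numel()
        return squared_diff.sum() / n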
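To see that autograd really produces the backward pass, here is a small usage sketch with made-up inputs (the class is repeated in condensed form so the snippet runs on its own). The gradient autograd computes should equal the hand-derived $\frac{2}{n}(x - y)$.

```python
import torch
import torch.nn as nn

class MSELossV1(nn.Module):
    def forward(self, input, target):
        squared_diff = (input - target) ** 2
        return squared_diff.sum() / squared_diff.numel()

criterion = MSELossV1()
x = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
y = torch.tensor([0.0, 2.0, 5.0, 4.0])

loss = criterion(x, y)  # calling the instance runs forward()
loss.backward()         # autograd derives and runs the backward pass

# hand-derived gradient from the formula: 2/n * (x - y)
expected_grad = 2 / x.numel() * (x.detach() - y)
print(loss.item(), x.grad, expected_grad)
```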

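3. Custom Loss Function by Subclassing autograd.Function

In PyTorch, torch.autograd.Function is the class for defining custom differentiable operations. It lets you implement your own forward logic (forward) and backward logic (backward). This is useful for non-standard operations, custom activation functions, or special cases where you want to replace an existing PyTorch operation.
Defining a custom operation with torch.autograd.Function requires implementing both forward and backward, which means deriving the gradient formula by hand.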
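ctx is a context object used to pass data between forward and backward. Its most commonly used members are:

ctx.save_for_backward(*tensors): saves tensors for use in the backward pass
ctx.saved_tensors: retrieves the saved tensors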

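forward returns the result of the computation, and backward returns the gradient with respect to each input.
Function.apply(input, ...) is the standard way to call a custom function; forward and backward are static methods and are not called directly. A custom MSE loss that subclasses autograd.Function can be implemented as follows: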

import torch
from torch.autograd import Function

class MSELossV2(Function):
    @staticmethod
    def forward(ctx, input, target):
        squared_diff = (input - target) ** 2
        n = squared_diff.numel()
        # keep input and target for computing the gradient in backward
        ctx.save_for_backward(input, target)
        return squared_diff.sum() / n

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        n = input.numel()
        # dMSE/dx = 2/n * (x - y), scaled by the upstream gradient (chain rule)
        grad_input = 2 / n * (input - target) * grad_output
        # one return value per forward input; target gets no gradient
        return grad_input, None
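Because backward is written by hand, it is worth checking it numerically. A minimal sketch using torch.autograd.gradcheck, which compares the analytical gradient against finite differences; it is best run in double precision, and the random inputs here are only for illustration.

```python
import torch
from torch.autograd import Function, gradcheck

class MSELossV2(Function):
    @staticmethod
    def forward(ctx, input, target):
        ctx.save_for_backward(input, target)
        return ((input - target) ** 2).sum() / input.numel()

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        grad_input = 2 / input.numel() * (input - target) * grad_output
        return grad_input, None

torch.manual_seed(0)
# double precision keeps the finite-difference estimate accurate
x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
y = torch.randn(3, 4, dtype=torch.double)  # no grad needed, so gradcheck skips it

ok = gradcheck(MSELossV2.apply, (x, y))  # raises if the gradients disagree
print(ok)
```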

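In torch.autograd.Function, the values returned by backward must match the inputs of forward in number and order. For example, forward takes input and target, so backward must also return two gradients (here grad_input, None).
Each input corresponds to one returned gradient:

If the input is a tensor that requires gradients (requires_grad=True), return its gradient
If the input is a non-tensor value (such as an integer) or a tensor that does not require gradients, return None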
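The rules above can be applied per call with ctx.needs_input_grad, a tuple of booleans (one per forward input) that tells backward which gradients are actually needed. The sketch below is a hypothetical variant, MSELossV3, that also returns a gradient for target when target requires one, using $\frac{\partial}{\partial y_i}(x_i - y_i)^2 = -2(x_i - y_i)$; the sample values are made up.

```python
import torch
from torch.autograd import Function

class MSELossV3(Function):
    """Hypothetical variant of MSELossV2 that can also return a gradient for target."""

    @staticmethod
    def forward(ctx, input, target):
        ctx.save_for_backward(input, target)
        return ((input - target) ** 2).sum() / input.numel()

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        grad_input = grad_target = None
        diff = 2 / input.numel() * (input - target) * grad_output
        if ctx.needs_input_grad[0]:  # input requires grad
            grad_input = diff
        if ctx.needs_input_grad[1]:  # target requires grad
            grad_target = -diff      # d/dy (x - y)^2 = -2(x - y)
        return grad_input, grad_target

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([2.0, 2.0, 1.0], requires_grad=True)
MSELossV3.apply(x, y).backward()
print(x.grad, y.grad)
```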

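grad_output in backward is a tensor with the same shape as the operation's output. It is the upstream gradient, i.e. the gradient of the final loss with respect to this operation's output. By the chain rule, backward multiplies the local derivative by grad_output to obtain the gradient with respect to the input, so each element of grad_output acts as a weight on the corresponding output element.

The same idea appears in the gradient argument of Tensor.backward() when you start backpropagation yourself:

If the gradient argument is omitted (default None), PyTorch requires the output to be a scalar and implicitly uses a weight of 1, i.e. torch.ones_like(output)
If the output is a vector or higher-dimensional tensor, the gradient must be passed explicitly, otherwise an error is raised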
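To make the "upstream gradient" reading concrete, here is a sketch with a hypothetical Square function (y = x * x) whose backward records the grad_output it receives. Weighting the outputs by [1, 10, 100] before summing means the loss's gradient with respect to y is exactly those weights, and the input gradient is the local derivative 2x multiplied by them.

```python
import torch
from torch.autograd import Function

seen = []  # hypothetical helper: stores the grad_output received by backward

class Square(Function):
    """Hypothetical example function: y = x * x."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        seen.append(grad_output.clone())
        (x,) = ctx.saved_tensors
        # chain rule: local derivative 2x times the upstream gradient
        return 2 * x * grad_output

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
weights = torch.tensor([1.0, 10.0, 100.0])
loss = (Square.apply(x) * weights).sum()  # dloss/dy = weights
loss.backward()

print(seen[0])  # expected values: 1, 10, 100
print(x.grad)   # expected values: 2*1*1, 2*2*10, 2*3*100 = 2, 40, 600
```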

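The role of grad_output is summarized below:

| Scenario | Role of `grad_output` | Example |
| --- | --- | --- |
| Scalar output | Defaults to 1, no need to pass it | `loss.backward()` |
| Vector output | Must be passed, same shape as the output | `y.backward(torch.ones_like(y))` |
| Multiple outputs | One `grad_output` per output | `torch.autograd.backward([y1, y2], grad_tensors=[v1, v2])` |
| Custom backward | Carries the upstream gradient used to compute the input gradient | `backward(ctx, grad_output)` |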
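For the multiple-outputs row, a minimal sketch with made-up values: torch.autograd.backward takes one gradient tensor per output, and the contributions of both outputs accumulate in x.grad.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y1 = x * 3   # dy1/dx = 3
y2 = x ** 2  # dy2/dx = 2x
v1 = torch.tensor([1.0, 1.0])  # weights for y1
v2 = torch.tensor([0.5, 0.5])  # weights for y2

# one grad tensor per output; results are summed into x.grad
torch.autograd.backward([y1, y2], grad_tensors=[v1, v2])
print(x.grad)  # 3*v1 + 2x*v2 = [3 + 1, 3 + 2] = [4., 5.]
```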

Code example:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2
y.backward()   # scalar output: equivalent to y.backward(torch.tensor(1.0))
print(x.grad)  # tensor(4.), since dy/dx = 2x = 4


x2 = torch.tensor([1.0, 2.0], requires_grad=True)
y = x2 * 2
grad_output = torch.tensor([1.0, 0.5])  # weights 1 and 0.5
y.backward(grad_output)  # element-wise: grad_output * dy/dx2 = [1.0, 0.5] * [2., 2.] = [2., 1.]
print(x2.grad)           # tensor([2., 1.])
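The error case for non-scalar outputs can be reproduced directly. A small sketch with made-up values: calling backward() on a vector without a gradient argument raises a RuntimeError, while passing all-ones weights gives the same result as reducing to a scalar with sum() first.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 2

err_msg = ""
try:
    y.backward()  # non-scalar output, no gradient argument
except RuntimeError as e:
    err_msg = str(e)
print("RuntimeError:", err_msg)

y.backward(torch.ones_like(y))  # explicit all-ones weights
grad_ones = x.grad.clone()

x.grad = None           # clear the accumulated gradient
(x * 2).sum().backward()  # scalar output: implicit weight of 1
print(grad_ones, x.grad)  # both [2., 2.]
```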

4. Experiment: Implementing MSE in Three Different Ways

The experiment code is as follows:

import torch
import torch.nn as nn
from torch.autograd import Function


class MSELossV1(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input, target):
        squared_diff = (input - target) ** 2
        n = squared_diff.numel()
        return squared_diff.sum() / n


class MSELossV2(Function):
    @staticmethod
    def forward(ctx, input, target):
        squared_diff = (input - target) ** 2
        n = squared_diff.numel()
        ctx.save_for_backward(input, target)
        return squared_diff.sum() / n

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        n = input.numel()
        grad_input = 2 / n * (input - target) * grad_output
        return grad_input, None


if __name__ == "__main__":
    mse_loss = nn.MSELoss()
    mse_loss_v1 = MSELossV1()

    x = torch.tensor(
        [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]], requires_grad=True
    )
    # independent leaf copies so each loss accumulates its own gradient
    x2 = x.detach().clone().requires_grad_(True)
    x3 = x.detach().clone().requires_grad_(True)

    y = torch.tensor(
        [[0.5, 2.5, 2.0],
         [3.5, 5.5, 5.0],
         [6.5, 8.5, 8.0]]
    )

    loss = mse_loss(x, y)           # built-in nn.MSELoss
    loss2 = mse_loss_v1(x2, y)      # nn.Module subclass
    loss3 = MSELossV2.apply(x3, y)  # autograd.Function subclass

    print(f"loss: {loss}, loss2: {loss2}, loss3: {loss3}")

    loss.backward()
    loss2.backward()
    loss3.backward()
    print(f"x.grad: \n{x.grad}\n x2.grad: \n{x2.grad}\n x3.grad: \n{x3.grad}")
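The expected numbers can be worked out by hand: every row of x - y is [0.5, -0.5, 1.0], so the squared errors sum to 3 × 1.5 = 4.5 and the loss is 4.5 / 9 = 0.5, while the gradient is (2/9)(x - y). A sketch that computes these reference values from the formula alone, so the three printed results from the experiment can be compared against them (for example with torch.allclose):

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
y = torch.tensor([[0.5, 2.5, 2.0],
                  [3.5, 5.5, 5.0],
                  [6.5, 8.5, 8.0]])

diff = x - y                      # every row is [0.5, -0.5, 1.0]
n = diff.numel()                  # 9
ref_loss = (diff ** 2).sum() / n  # 4.5 / 9 = 0.5
ref_grad = 2 / n * diff           # each row is [1/9, -1/9, 2/9]
print(ref_loss.item())
print(ref_grad)
```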