Choosing and Using BCELoss, CrossEntropyLoss, Sigmoid, etc. for Binary Classification in PyTorch

蛐蛐蛐

Article link: https://blog.csdn.net/qysh123/article/details/120170765

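Over the past couple of days, one of our top-conference papers came back with a Minor Revision, and a reviewer asked us to draw the fully connected layer in the network diagram. I was unsure what exactly should be drawn and in how much detail, so here is a brief summary.

Our paper addresses a binary classification problem. We wrote that the hidden state is fed into a fully connected layer + sigmoid for classification, but we did not draw this in the network diagram, so a reviewer asked us to revise the figure. Once the fully connected layer is drawn, though, does the sigmoid need to be drawn as well? Below I summarize the common setups for binary classification in PyTorch.

Broadly speaking, there are three implementations:

1. The fully connected layer has the form nn.Linear(num_ftrs, 1) (in PyTorch, nn.Linear is the fully connected layer), i.e. the output dimension is 1. We then apply sigmoid to map the output into the interval (0, 1), for example: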

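# inside __init__: a single output unit, i.e. one logit per sample
self.outputs = nn.Linear(NETWORK_WIDTH, 1)

def forward(self, x):
    # other layers omitted
    x = self.outputs(x)
    return torch.sigmoid(x)  # map the logit to a probability in (0, 1)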

In this case, we use torch.nn.BCELoss as the loss function:

criterion = nn.BCELoss()

net_out = net(data)                # probabilities, shape (batch_size, 1)
loss = criterion(net_out, target)  # target: float tensor with the same shape as net_out
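To make the first setup concrete, here is a minimal self-contained sketch. The class name SigmoidNet, the value of NETWORK_WIDTH, and the random data are assumptions made purely for illustration; they are not from the paper or the original snippet.

```python
import torch
import torch.nn as nn

NETWORK_WIDTH = 16  # hypothetical hidden size, chosen only for this sketch


class SigmoidNet(nn.Module):
    """Linear(…, 1) followed by an explicit sigmoid (setup 1)."""

    def __init__(self):
        super().__init__()
        self.outputs = nn.Linear(NETWORK_WIDTH, 1)

    def forward(self, x):
        # other layers omitted
        return torch.sigmoid(self.outputs(x))  # probabilities in (0, 1)


torch.manual_seed(0)
net = SigmoidNet()
data = torch.randn(8, NETWORK_WIDTH)          # batch of 8 random feature vectors
target = torch.randint(0, 2, (8, 1)).float()  # float labels, same shape as the output

criterion = nn.BCELoss()
net_out = net(data)                           # shape (8, 1)
loss = criterion(net_out, target)
loss.backward()                               # gradients flow back through the sigmoid
```

Note that BCELoss expects the target to be a float tensor with the same shape as the network output, which is why the labels are created with shape (8, 1) and converted with .float().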

2. nn.Linear is the same as above, but sigmoid is not applied explicitly; instead, torch.nn.BCEWithLogitsLoss is used as the loss function:

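# inside __init__: same single-output layer as before
self.outputs = nn.Linear(NETWORK_WIDTH, 1)

def forward(self, x):
    # other layers omitted
    x = self.outputs(x)
    return x  # raw logits; no sigmoid here

# training step
criterion = nn.BCEWithLogitsLoss()  # applies sigmoid internally

net_out = net(data)                 # logits, shape (batch_size, 1)
loss = criterion(net_out, target)   # target: float tensor with the same shape as net_out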

As pointed out here: https://discuss.pytorch.org/t/confused-about-binary-classification-with-pytorch/83759/5

Just to clarify something, for a binary-classification problem, you are best off using the logits that come out of a final Linear layer, with no threshold or Sigmoid activation, and feed them into BCEWithLogitsLoss. (Using Sigmoid and BCELoss is less numerically stable.)

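The two approaches above differ very little. However, when the activation and the loss are fused (BCEWithLogitsLoss already applies sigmoid internally), the loss is computed in a numerically more stable way, which makes training more stable as well. At inference time, you still need to apply torch.sigmoid to the logits yourself if you want probabilities.

3. nn.Linear has an output dimension of 2. (With models such as BERT or GNNs, this third form is very common; looking through my own code, it is the case I use most often.) Here the loss function is torch.nn.CrossEntropyLoss, which already includes softmax (more precisely, log-softmax followed by the negative log-likelihood loss), so the model should output raw logits. For example: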
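The stability difference can be seen with a single extreme logit. The sketch below is only an illustration with a made-up value (a logit of 200 for a sample whose label is 0); it relies on the documented behavior that BCELoss clamps its log terms to be no smaller than -100.

```python
import torch
import torch.nn as nn

logit = torch.tensor([[200.0]])  # a very confident, and wrong, prediction
target = torch.tensor([[0.0]])   # the true label is 0

# Setup 1: sigmoid(200) rounds to exactly 1.0 in float32, so log(1 - p) would be
# -inf; BCELoss clamps the log term at -100, so the loss saturates.
naive = nn.BCELoss()(torch.sigmoid(logit), target)

# Setup 2: the fused loss works on the logit directly and stays accurate.
stable = nn.BCEWithLogitsLoss()(logit, target)

print(naive.item(), stable.item())
```

The fused loss keeps growing with the size of the error, while the sigmoid + BCELoss path stops distinguishing between "very wrong" and "extremely wrong" once the probability has saturated.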

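# inside __init__: two output units, one logit per class
self.dense = nn.Linear(hidden_dim, 2)

# training step
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
net_out = net(data)                # logits, shape (batch_size, 2)
loss = criterion(net_out, target)  # target: int64 class indices, shape (batch_size,)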
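A self-contained version of the third setup might look like the sketch below. The class name TwoLogitNet, the value of hidden_dim, and the random data are assumptions for illustration only. The main practical difference from the first two setups is the target format: CrossEntropyLoss takes integer class indices of shape (batch_size,), not float labels of shape (batch_size, 1).

```python
import torch
import torch.nn as nn

hidden_dim = 16  # hypothetical hidden size, chosen only for this sketch


class TwoLogitNet(nn.Module):
    """Linear(…, 2) producing one raw logit per class (setup 3)."""

    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(hidden_dim, 2)

    def forward(self, x):
        # other layers omitted
        return self.dense(x)  # raw logits; no softmax here


torch.manual_seed(0)
net = TwoLogitNet()
data = torch.randn(8, hidden_dim)   # batch of 8 random feature vectors
target = torch.randint(0, 2, (8,))  # int64 class indices, shape (8,)

criterion = nn.CrossEntropyLoss()
net_out = net(data)                 # shape (8, 2)
loss = criterion(net_out, target)

# At inference time, softmax gives class probabilities and argmax the prediction.
probs = torch.softmax(net_out, dim=1)
pred = net_out.argmax(dim=1)
```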

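Compared with the first two, the third approach is slightly different; see the explanation here: https://www.zhihu.com/question/295247085/answer/1778398778

When you use the Sigmoid function, the last fully connected layer has 1 neuron, whereas with the Softmax function it has 2. This is easy to understand: Sigmoid only distinguishes "target" from "not target", so there is effectively a single target class and the other is a background class. Softmax, on the other hand, treats the problem as two classes, hence two neurons. This is the main reason for the difference between the two.

To summarize, there are three main combinations of fully connected layer, activation function, and loss function for binary classification in PyTorch: torch.nn.Linear + torch.sigmoid + torch.nn.BCELoss; torch.nn.Linear + torch.nn.BCEWithLogitsLoss; and torch.nn.Linear (output dimension 2) + torch.nn.CrossEntropyLoss. The latter two loss functions have Sigmoid and Softmax built in, respectively.

Of these, the first two are essentially the same, with the second being more numerically stable during training and computation (so for our paper, the sigmoid does not need to be drawn). The last one is theoretically equivalent (see: "For binary classification, should you choose sigmoid or softmax?"), but it has more parameters and the network is slightly more complex; Softmax is fundamentally designed for multi-class problems.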
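The theoretical equivalence can be checked numerically. For two logits z0 and z1, the softmax probability of class 1 is exp(z1) / (exp(z0) + exp(z1)) = sigmoid(z1 - z0), so cross-entropy over two logits matches binary cross-entropy on their difference. The sketch below uses random logits and labels as stand-ins, not outputs from any real model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 2)          # stand-in for a two-output network
target = torch.randint(0, 2, (8,))  # class indices 0 or 1

# Probability of class 1, computed two ways.
p_softmax = torch.softmax(logits, dim=1)[:, 1]
diff = logits[:, 1] - logits[:, 0]  # z1 - z0 acts as a single logit
p_sigmoid = torch.sigmoid(diff)

# The two losses agree as well.
ce = F.cross_entropy(logits, target)
bce = F.binary_cross_entropy_with_logits(diff, target.float())

print(torch.allclose(p_softmax, p_sigmoid, atol=1e-6))
print(torch.allclose(ce, bce, atol=1e-6))
```

In other words, the two-output model only ever uses the difference between its two logits, which is why the extra output unit adds parameters without adding expressive power for a binary task.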

References:

[1] https://discuss.pytorch.org/t/confused-about-binary-classification-with-pytorch/83759/5

[2] https://stackoverflow.com/questions/53628622/loss-function-its-inputs-for-binary-classification-pytorch

[3] https://www.zhihu.com/question/295247085