My torch version: 1.8.1+cu111
My paddle version: 2.4.1
BCELoss also calls nn.functional.binary_cross_entropy in its forward method, so it is enough to look at that function.
# Paddle's BCELoss source code
class BCELoss(Layer):
    def __init__(self, weight=None, reduction='mean', name=None):
        if reduction not in ['sum', 'mean', 'none']:
            raise ValueError(
                "The value of 'reduction' in bce_loss should be 'sum', 'mean' or 'none', but "
                "received %s, which is not allowed." % reduction
            )

        super(BCELoss, self).__init__()
        self.weight = weight
        self.reduction = reduction
        self.name = name

    def forward(self, input, label):  # this also delegates to binary_cross_entropy
        out = paddle.nn.functional.binary_cross_entropy(
            input, label, self.weight, self.reduction, self.name
        )
        return out
1. Loss formula
The input $\mathbf{x} = [x_1, x_2, ..., x_n]$ has $n$ components. $p(x_i)$ is the true probability that $\mathbf{x}$ belongs to class $i$, and $q(x_i)$ is the predicted probability that $\mathbf{x}$ belongs to class $i$. The cross entropy is:
$$H(p, q) = - \sum_{i=1}^{n} p(x_i) \log(q(x_i))$$
Since binary classification has only two classes:
$$\begin{aligned} H(p, q) &= - \sum_{i=1}^{n} p(x_i) \log(q(x_i)) \\ &= -p(x_i) * \log(q(x_i)) - (1-p(x_i)) * \log(1-q(x_i)) \end{aligned}$$
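The step from the sum to the two-term form can be spelled out as follows. This is a sketch that assumes $n = 2$ and that both distributions sum to one, so $p(x_2) = 1 - p(x_1)$ and $q(x_2) = 1 - q(x_1)$; the text leaves this implicit and writes the generic index $i$ where $x_1$ appears here.

$$\begin{aligned} H(p, q) &= - \sum_{i=1}^{2} p(x_i) \log(q(x_i)) \\ &= -p(x_1) \log(q(x_1)) - p(x_2) \log(q(x_2)) \\ &= -p(x_1) \log(q(x_1)) - (1-p(x_1)) \log(1-q(x_1)) \end{aligned}$$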
Treating $p(x_i)$ as the label value $y_{\_}$ and $q(x_i)$ as the predicted value $\hat{y}$ gives:
$$\begin{aligned} H(p, q) &= - \sum_{i=1}^{n} p(x_i) \log(q(x_i)) \\ &= -p(x_i) * \log(q(x_i)) - (1-p(x_i)) * \log(1-q(x_i)) \\ &= -y_{\_} * \log(\hat{y}) - (1 - y_{\_}) * \log(1-\hat{y}) \end{aligned}$$
$y_{\_}$ is an $m \times n$ matrix of 0/1 values, and $\hat{y}$ is an $m \times n$ matrix with values between 0 and 1.
This is exactly the formula given in the documentation:
$$Out = -1 * (label * \log(input) + (1 - label) * \log(1 - input))$$
weight is a rescaling factor for the loss of each element in a batch. When it has one entry per column, as in the experiment in section 2, it acts as a per-class weight.
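To make the formula, the weight and the reduction argument concrete, here is a minimal pure-Python sketch. The helper names bce_elementwise and reduce_loss are hypothetical and are not part of torch or paddle; the weight layout (one factor per column) is an assumption that mirrors the shape used in the experiment, and inputs are assumed to lie strictly between 0 and 1.

```python
import math


def bce_elementwise(inputs, labels, weight=None):
    """Element-wise BCE for an (m, n) batch given as nested lists.

    inputs: probabilities strictly inside (0, 1)
    labels: 0/1 targets with the same shape as inputs
    weight: optional list of n per-column factors (assumed layout)
    """
    out = []
    for row_x, row_y in zip(inputs, labels):
        row = []
        for j, (x, y) in enumerate(zip(row_x, row_y)):
            # Out = -1 * (label * log(input) + (1 - label) * log(1 - input))
            loss = -(y * math.log(x) + (1 - y) * math.log(1 - x))
            if weight is not None:
                loss *= weight[j]  # rescale the loss of column j
            row.append(loss)
        out.append(row)
    return out


def reduce_loss(loss, reduction="mean"):
    """Apply the 'none' / 'sum' / 'mean' reduction to a nested-list loss."""
    if reduction == "none":
        return loss
    flat = [v for row in loss for v in row]
    if reduction == "sum":
        return sum(flat)
    if reduction == "mean":
        # plain average over all elements of the (already weighted) loss
        return sum(flat) / len(flat)
    raise ValueError("reduction should be 'sum', 'mean' or 'none'")


if __name__ == "__main__":
    probs = [[0.5, 0.5]]
    target = [[1.0, 0.0]]
    weighted = bce_elementwise(probs, target, weight=[2.0, 3.0])
    print(weighted)                       # per-element weighted losses
    print(reduce_loss(weighted, "mean"))  # average of the two entries
```

With an input of 0.5 every unweighted element equals log 2, so the weighted entries are simply 2 log 2 and 3 log 2, which makes the effect of the per-column weight easy to check by hand.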
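2. Experiment code
The binary_cross_entropy functions in torch and paddle are aligned. Below is the experiment code, together with a manual computation that does not call the API.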
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 4 22:36:50 2023
@author: Ryan
"""
import numpy as np
import torch
import paddle

# ----------- numpy inputs -----------
np.random.seed(1107)
# assume batch size 4 and 5 classes (multi-label)
# note: despite the name, np_logit holds probabilities in [0, 1), not raw logits
np_logit = np.random.rand(4, 5).astype("float32")
np_target = np.random.randint(2, size=(4, 5)).astype("float32")
# weight with shape (5,), broadcast over the batch dimension
np_weight = np.random.randint(2, 4, size=(5,)).astype("float32")

# ----------- torch -----------
t_logit = torch.tensor(np_logit)
t_target = torch.tensor(np_target)
t_weight = torch.tensor(np_weight)
t_out = torch.nn.functional.binary_cross_entropy(t_logit, t_target,
                                                 weight=t_weight,
                                                 reduction='none')
# manual computation
t_out_hand = t_target * torch.log(t_logit) + (1-t_target) * torch.log(1-t_logit)
t_out_hand *= -1
t_out_hand = t_out_hand * t_weight

# ----------- paddle -----------
p_logit = paddle.to_tensor(np_logit)
p_target = paddle.to_tensor(np_target)
p_weight = paddle.to_tensor(np_weight)
p_out = paddle.nn.functional.binary_cross_entropy(p_logit, p_target,
                                                  weight=p_weight,
                                                  reduction='none')
# manual computation
p_out_hand = p_target * paddle.log(p_logit) + (1-p_target) * paddle.log(1-p_logit)
p_out_hand *= -1
p_out_hand = p_out_hand * p_weight
In the results, the torch and Paddle outputs each match the manual computation:
t_out == t_out_hand
p_out == p_out_hand
The results with weight applied are aligned as well.
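The `==` checks above compare every element exactly. Results from two different floating-point code paths can differ in the last bits, so a tolerance-based comparison is often the more robust check. The helper below is a hypothetical pure-Python sketch (all_close is not a torch or paddle function, and the tolerances are arbitrary illustrative defaults); with real tensors, numpy.allclose applied to the `.numpy()` outputs plays the same role.

```python
import math


def all_close(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two nested lists have the same shape and all
    corresponding entries agree within the given tolerances."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


if __name__ == "__main__":
    api_like = [[0.1 + 0.2]]  # stands in for an API result
    manual = [[0.3]]          # stands in for a hand computation
    print(api_like == manual)           # exact comparison
    print(all_close(api_like, manual))  # tolerance-based comparison
```

In this toy case the exact comparison fails because 0.1 + 0.2 is not representable as exactly 0.3, while the tolerance-based check treats the two values as equal.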
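3. Other
The only difference between binary_cross_entropy and binary_cross_entropy_with_logits is that the former expects the logit after it has been passed through sigmoid, whereas the latter takes the raw logit directly as input. The latter also simplifies the computation to some extent.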
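The relationship between the two functions can be checked numerically with a small pure-Python sketch. The function names below are hypothetical. The logit-based form used here, max(z, 0) - z*y + log(1 + exp(-|z|)), is one common numerically stable rewrite; whether either library uses exactly this expression internally is an assumption, not something taken from their source.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_prob(p, y):
    """BCE on a probability p in (0, 1), as in binary_cross_entropy."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_from_logit(z, y):
    """BCE on a raw logit z, using a stable algebraic rewrite of
    bce_from_prob(sigmoid(z), y)."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.0, 2.0):
        for y in (0.0, 1.0):
            a = bce_from_prob(sigmoid(z), y)
            b = bce_from_logit(z, y)
            print(z, y, math.isclose(a, b, rel_tol=1e-9))
```

One practical reason to prefer the logit form: for a large logit such as 100, sigmoid rounds to 1.0 in double precision, so the probability-based form would have to evaluate log(0) when the label is 0, while the logit form stays finite.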