Loss Functions in PyTorch
| Loss function | Description |
|---|---|
| nn.L1Loss | Creates a criterion that measures the mean absolute error (MAE) between each element in the input $x$ and target $y$. |
| nn.MSELoss | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$. |
| nn.CrossEntropyLoss | This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class. |
| nn.CTCLoss | The Connectionist Temporal Classification loss. |
| nn.NLLLoss | The negative log likelihood loss. |
| nn.PoissonNLLLoss | Negative log likelihood loss with Poisson distribution of target. |
| nn.KLDivLoss | The Kullback-Leibler divergence loss. |
| nn.BCELoss | Creates a criterion that measures the Binary Cross Entropy between the target and the output. |
| nn.BCEWithLogitsLoss | This loss combines a Sigmoid layer and the BCELoss in one single class. |
| nn.MarginRankingLoss | Creates a criterion that measures the loss given inputs $x_1$, $x_2$, two 1D mini-batch Tensors, and a label 1D mini-batch tensor $y$ (containing 1 or -1). |
| nn.HingeEmbeddingLoss | Measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1). |
| nn.MultiLabelMarginLoss | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 2D Tensor of target class indices). |
| nn.SmoothL1Loss | Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. |
| nn.SoftMarginLoss | Creates a criterion that optimizes a two-class classification logistic loss between input tensor $x$ and target tensor $y$ (containing 1 or -1). |
| nn.MultiLabelSoftMarginLoss | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$. |
| nn.CosineEmbeddingLoss | Creates a criterion that measures the loss given input tensors $x_1$, $x_2$ and a Tensor label $y$ with values 1 or -1. |
| nn.MultiMarginLoss | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 1D tensor of target class indices, $0 \leq y \leq \text{x.size}(1)-1$). |
| nn.TripletMarginLoss | Creates a criterion that measures the triplet loss given input tensors $x_1$, $x_2$, $x_3$ and a margin with a value greater than 0. |
Basic usage:
```python
# the constructor takes its own arguments
criterion = LossCriterion()
# calling the criterion also takes arguments
loss = criterion(x, y)
```
Explanation:
The first line: in PyTorch, every loss function is defined as a class, so the first step in using one is to instantiate it.
The second line: in PyTorch, all loss functions inherit from the base class _Loss, which in turn inherits from Module. As introduced earlier, a Module is callable, so a loss-function instance is callable too; at call time you pass the required arguments, such as the prediction $x$ and the ground truth $y$.
Some notes
1. Early versions of PyTorch used the bool argument size_average to decide whether to average the computed total loss (i.e. divide by the number of samples n). It is now deprecated in favor of reduction:
- reduction='mean' (default): average the loss.
- reduction='sum': sum the loss.
- reduction='none': no reduction.
Here is a simple example:
```python
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

# forward
loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
```
Output:
```
Cross Entropy Loss:
  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
```
2. Similarly, early versions of PyTorch used the bool argument reduce to decide whether to average the total loss over the mini-batch. It is also deprecated; by default, the loss is now averaged over the mini-batch.
3. In the descriptions of the loss functions below, size_average and reduce are simply omitted.
4. Below, 1-D means one-dimensional.
5. Parameters below refers to the constructor arguments; shape covers the call-time arguments and the output value.
L1Loss
```python
torch.nn.L1Loss(size_average=None, reduce=None, reduction: str = 'mean')
```
Creates a criterion that measures the mean absolute error between each element of the prediction $x$ and the target $y$:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i |x_i - y_i|$$
Constructor parameters:
- size_average (bool, optional): deprecated (see reduction). If size_average=False, the sum of absolute differences is not divided by n.
- reduce (bool, optional): deprecated (see reduction). Controls whether the mini-batch loss is averaged.
- reduction (string, optional): specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of those two args will override reduction. Default: 'mean'.
shape
- Input: (N, *) where * means any number of additional dimensions
- Target: (N, *), same shape as the input
- Output: scalar. If reduction is 'none', then (N, *), same shape as the input.
Examples:
```python
>>> loss = nn.L1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
```
MSELoss
```python
torch.nn.MSELoss(reduction: str = 'mean')
```
Mean squared error:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i (x_i - y_i)^2$$
Parameters and shape are the same as above.
Examples:
```python
>>> loss = nn.MSELoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
```
CrossEntropyLoss
```python
torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None,
                          ignore_index: int = -100,
                          reduction: str = 'mean')
```
This criterion combines LogSoftMax and NLLLoss in a single class; it is very useful when training a multi-class classifier.
Parameters
- weight: a 1-D tensor with n elements, one weight per class; very useful when your training samples are unbalanced. Default: None.
- ignore_index: a class index to ignore.
- reduction: as described above.
shape
- Input: (N, C), where C is the number of classes
- Target: (N), where N is the mini-batch size, i.e. the number of samples in one mini-batch
Examples:
```python
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
```
The loss can be written as:
$$\text{loss}(x, \text{class}) = -\log\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])} = -x[\text{class}] + \log\Big(\sum_j \exp(x[j])\Big)$$
When the weight argument is specified, the loss becomes:
$$\text{loss}(x, \text{class}) = \text{weight}[\text{class}] \cdot \Big(-x[\text{class}] + \log\big(\sum_j \exp(x[j])\big)\Big)$$
The computed loss is averaged over the mini-batch size.
CTCLoss
```python
torch.nn.CTCLoss(blank: int = 0,
                 reduction: str = 'mean',
                 zero_infinity: bool = False)
```
CTC loss (Connectionist Temporal Classification loss) is mainly used for training on sequence data that has no prior alignment, e.g. speech recognition and OCR. More details will be added when it comes up later; a minimal usage sketch follows.
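In the meantime, here is a minimal sketch of a typical call, following the shape conventions of the official docs; the sizes T, C, N, S are arbitrary illustration values:
```python
import torch
import torch.nn as nn

T, C, N, S = 50, 20, 16, 30   # input length, classes (incl. blank), batch size, max target length
ctc_loss = nn.CTCLoss(blank=0)

# (T, N, C) log-probabilities, e.g. the output of an RNN followed by log_softmax
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)            # labels 1..C-1; 0 is the blank
input_lengths = torch.full((N,), T, dtype=torch.long)              # length of each input sequence
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)  # length of each label sequence

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```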
NLLLoss
```python
torch.nn.NLLLoss(weight: Optional[torch.Tensor] = None,
                 ignore_index: int = -100,
                 reduction: str = 'mean')
```
The negative log likelihood loss for training a classification problem with C classes. It differs from CrossEntropyLoss only by a LogSoftmax layer.
Examples:
```python
>>> m = nn.LogSoftmax(dim=1)
>>> loss = nn.NLLLoss()
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = loss(m(input), target)
>>> output.backward()
>>>
>>> # 2D loss example (used, for example, with image inputs)
>>> N, C = 5, 4
>>> loss = nn.NLLLoss()
>>> # input is of size N x C x height x width
>>> data = torch.randn(N, 16, 10, 10)
>>> conv = nn.Conv2d(16, C, (3, 3))
>>> m = nn.LogSoftmax(dim=1)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
>>> output = loss(m(conv(data)), target)
>>> output.backward()
```
PoissonNLLLoss
The negative log likelihood loss with a Poisson distribution of the target.
```python
torch.nn.PoissonNLLLoss(log_input: bool = True,
                        full: bool = False,
                        eps: float = 1e-08,
                        reduction: str = 'mean')
```
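A minimal usage sketch, following the pattern of the official docs; with the default log_input=True, the input is interpreted as the log of the Poisson rate:
```python
import torch
import torch.nn as nn

loss = nn.PoissonNLLLoss()                          # log_input=True by default
log_input = torch.randn(5, 2, requires_grad=True)   # interpreted as log(lambda)
target = torch.randn(5, 2)
output = loss(log_input, target)
output.backward()
```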
KLDivLoss
Computes the Kullback-Leibler divergence loss.
```python
torch.nn.KLDivLoss(reduction: str = 'mean',
                   log_target: bool = False)
```
KL divergence is commonly used to measure the distance between two distributions, and it is useful for performing direct regression over the space of output distributions.
As with NLLLoss, the given input is expected to contain log-probabilities. Unlike NLLLoss, however, the input is not restricted to a 2-D tensor, because this criterion is applied element-wise.
The target should have the same shape as the input.
The loss can be written as:
$$\text{loss}(x, \text{target}) = \frac{1}{n}\sum_i \text{target}_i \cdot \big(\log(\text{target}_i) - x_i\big)$$
By default, the loss is averaged over elements; with size_average=False it is summed instead.
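A minimal sketch, assuming the input holds log-probabilities and the target holds probabilities; reduction='batchmean' (available in recent PyTorch versions) matches the mathematical definition of KL divergence:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

kl_loss = nn.KLDivLoss(reduction='batchmean')
input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)   # log-probabilities
target = F.softmax(torch.rand(3, 5), dim=1)                           # probabilities
output = kl_loss(input, target)
output.backward()
```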
BCELoss
```python
torch.nn.BCELoss(weight: Optional[torch.Tensor] = None,
                 reduction: str = 'mean')
```
Computes the binary cross entropy between the target and the output:
$$\text{loss}(o, t) = -\frac{1}{n}\sum_i \big(t[i]\cdot\log(o[i]) + (1 - t[i])\cdot\log(1 - o[i])\big)$$
If weight is specified:
$$\text{loss}(o, t) = -\frac{1}{n}\sum_i \text{weight}[i]\cdot\big(t[i]\cdot\log(o[i]) + (1 - t[i])\cdot\log(1 - o[i])\big)$$
This is often used to compute the reconstruction error of an auto-encoder. Note that 0 <= target[i] <= 1.
By default, the loss is averaged over elements; with size_average=False it is summed.
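A minimal sketch; the prediction must first be squashed into (0, 1), e.g. with a Sigmoid:
```python
import torch
import torch.nn as nn

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)        # binary targets in {0, 1}
output = loss(m(input), target)
output.backward()
```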
BCEWithLogitsLoss
```python
torch.nn.BCEWithLogitsLoss(weight: Optional[torch.Tensor] = None,
                           reduction: str = 'mean',
                           pos_weight: Optional[torch.Tensor] = None)
```
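This loss combines a Sigmoid layer and BCELoss in one single class; operating on raw logits is more numerically stable than a separate Sigmoid followed by BCELoss, since it exploits the log-sum-exp trick. A minimal sketch:
```python
import torch
import torch.nn as nn

loss = nn.BCEWithLogitsLoss()
input = torch.randn(3, requires_grad=True)   # raw logits, no sigmoid applied
target = torch.empty(3).random_(2)           # binary targets in {0, 1}
output = loss(input, target)
output.backward()
```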
MarginRankingLoss
```python
torch.nn.MarginRankingLoss(margin: float = 0.0,
                           reduction: str = 'mean')
```
Creates a criterion given inputs $x_1$ and $x_2$ (two 1-D mini-batch Tensors) and a label $y$ (a 1-D mini-batch tensor) whose values are 1 or -1.
If y = 1, the first input is expected to be ranked higher than the second; if y = -1, the opposite.
The loss for each sample in the mini-batch is:
$$\text{loss}(x, y) = \max\big(0, -y\cdot(x_1 - x_2) + \text{margin}\big)$$
If size_average=True, the loss is averaged over the mini-batch; otherwise it is summed. The default is size_average=True.
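A minimal sketch with random scores:
```python
import torch
import torch.nn as nn

loss = nn.MarginRankingLoss(margin=0.0)
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()               # labels in {1, -1}
output = loss(input1, input2, target)
output.backward()
```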
HingeEmbeddingLoss
```python
torch.nn.HingeEmbeddingLoss(margin: float = 1.0,
                            reduction: str = 'mean')
```
Given an input $x$ (a 2-D mini-batch tensor) and corresponding labels $y$ (a 1-D tensor containing 1 or -1), this criterion measures the loss between them. It is usually used for measuring whether two inputs are similar, e.g. using the pairwise L1 distance, and is typically used for learning nonlinear embeddings or for semi-supervised learning:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i \begin{cases} x_i, & \text{if } y_i = 1 \\ \max(0, \text{margin} - x_i), & \text{if } y_i = -1 \end{cases}$$
$x$ and $y$ can have arbitrary shapes, as long as each has n elements in total; the sum runs over all elements and is then divided by n. If you do not want the division by n, set size_average=False.
margin defaults to 1 and can be set via the constructor.
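A minimal sketch, where the input plays the role of a distance (e.g. a pairwise L1 distance):
```python
import torch
import torch.nn as nn

loss = nn.HingeEmbeddingLoss(margin=1.0)
distances = torch.randn(4, requires_grad=True)   # stand-in for pairwise distances
target = torch.tensor([1., -1., 1., -1.])        # labels in {1, -1}
output = loss(distances, target)
output.backward()
```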
MultiLabelMarginLoss
```python
torch.nn.MultiLabelMarginLoss(reduction: str = 'mean')
```
Computes a multi-label classification hinge loss (margin-based loss). It takes two inputs: input x (a 2-D mini-batch Tensor) and output y (a 2-D Tensor of target class indices for the samples in the mini-batch).
$$\text{loss}(x, y) = \frac{1}{x.\text{size}(0)}\sum_{i=0,\,j=0}^{I,\,J} \max\big(0, 1 - (x[y[j]] - x[i])\big)$$
where I = x.size(0) and J = y.size(0), and for all i and j,
$$y[j] \neq 0, \quad i \neq y[j]$$
x and y must have the same size.
This criterion only considers the leading block of non-zero y[j] targets, which allows different samples to have different numbers of target classes.
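A small worked sketch, following the pattern in the official docs; note that in current PyTorch versions a -1 in y marks the end of the valid labels for that sample:
```python
import torch
import torch.nn as nn

loss = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[3, 0, -1, 1]])   # only labels 3 and 0 are valid; -1 terminates the list
# 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
print(loss(x, y))                   # tensor(0.8500)
```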
SmoothL1Loss
A smoothed version of the L1 loss.
```python
torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction: str = 'mean')
```
The loss is given by:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i \begin{cases} 0.5\,(x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise} \end{cases}$$
This loss is less sensitive to outliers than MSELoss, and in some cases it prevents exploding gradients (see Fast R-CNN). It is also known as the Huber loss.
x and y can be any tensors containing n elements each. By default, the computed loss is divided by n; set size_average=False to have it summed instead.
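A minimal sketch:
```python
import torch
import torch.nn as nn

loss = nn.SmoothL1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
```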
SoftMarginLoss
```python
torch.nn.SoftMarginLoss(reduction: str = 'mean')
```
Creates a criterion that optimizes a two-class classification logistic loss between an input $x$ (a 2-D mini-batch Tensor) and a target $y$ (a Tensor containing 1 or -1):
$$\text{loss}(x, y) = \frac{1}{x.\text{nelement}()}\sum_i \log\big(1 + \exp(-y[i]\cdot x[i])\big)$$
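A minimal sketch:
```python
import torch
import torch.nn as nn

loss = nn.SoftMarginLoss()
input = torch.randn(3, requires_grad=True)
target = torch.tensor([1., -1., 1.])   # labels in {1, -1}
output = loss(input, target)
output.backward()
```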
MultiLabelSoftMarginLoss
```python
torch.nn.MultiLabelSoftMarginLoss(weight: Optional[torch.Tensor] = None,
                                  reduction: str = 'mean')
```
Creates a criterion that optimizes a multi-label one-versus-all loss between input x and target y, based on max-entropy. x is a 2-D mini-batch Tensor; y is a binary 2-D Tensor. For each sample in the mini-batch, the loss is:
$$\text{loss}(x, y) = -\frac{1}{x.\text{nElement}()}\sum_{i=0}^{I}\left( y[i]\log\frac{\exp(x[i])}{1 + \exp(x[i])} + (1 - y[i])\log\frac{1}{1 + \exp(x[i])}\right)$$
where I = x.nElement() - 1 and $y[i] \in \{0, 1\}$. y and x must have the same size.
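A minimal sketch with multi-hot binary targets:
```python
import torch
import torch.nn as nn

loss = nn.MultiLabelSoftMarginLoss()
input = torch.randn(3, 5, requires_grad=True)   # (N, C) raw scores
target = torch.empty(3, 5).random_(2)           # (N, C) multi-hot binary labels
output = loss(input, target)
output.backward()
```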
CosineEmbeddingLoss
```python
torch.nn.CosineEmbeddingLoss(margin: float = 0.0,
                             reduction: str = 'mean')
```
Given input Tensors x1 and x2 and a label Tensor y (with elements equal to 1 or -1), this criterion uses the cosine distance to measure whether two inputs are similar; it is typically used for learning nonlinear embeddings or for semi-supervised learning.
margin should be a value between -1 and 1; 0 to 0.5 is suggested. If margin is not given, it defaults to 0.
The loss for each sample is:
$$\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max\big(0, \cos(x_1, x_2) - \text{margin}\big), & \text{if } y = -1 \end{cases}$$
If size_average=True, the loss is averaged over the batch; with size_average=False it is summed. The default is size_average=True.
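A minimal sketch comparing pairs of embeddings; the embedding size 128 is an arbitrary illustration value:
```python
import torch
import torch.nn as nn

loss = nn.CosineEmbeddingLoss(margin=0.5)
input1 = torch.randn(3, 128, requires_grad=True)   # batch of embeddings
input2 = torch.randn(3, 128, requires_grad=True)
target = torch.tensor([1., -1., 1.])               # 1: similar pair, -1: dissimilar pair
output = loss(input1, input2, target)
output.backward()
```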
MultiMarginLoss
```python
torch.nn.MultiMarginLoss(p: int = 1,
                         margin: float = 1.0,
                         weight: Optional[torch.Tensor] = None,
                         reduction: str = 'mean')
```
Computes a multi-class classification hinge loss (margin-based loss). The input is x (a 2-D mini-batch Tensor) and y (a 1-D Tensor containing class indices, 0 <= y <= x.size(1) - 1).
For each mini-batch sample:
$$\text{loss}(x, y) = \frac{1}{x.\text{size}(0)}\sum_{i=0}^{I} \max\big(0, \text{margin} - x[y] + x[i]\big)^p$$
where I = x.size(0) and $i \neq y$.
Optionally, if you do not want all classes to have equal weight, you can pass a weight argument to the constructor; weight is a 1-D Tensor of per-class weights.
With weight, the loss becomes:
$$\text{loss}(x, y) = \frac{1}{x.\text{size}(0)}\sum_i \max\big(0, w[y]\cdot(\text{margin} - x[y] + x[i])\big)^p$$
By default, the loss is averaged over the mini-batch; set size_average=False to disable the averaging.
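A small worked sketch, with values following the pattern in the official docs:
```python
import torch
import torch.nn as nn

loss = nn.MultiMarginLoss(p=1, margin=1.0)
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([3])                       # the correct class is index 3
# 0.25 * (max(0, 1-(0.8-0.1)) + max(0, 1-(0.8-0.2)) + max(0, 1-(0.8-0.4)))
print(loss(x, y))                           # tensor(0.3250)
```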
Custom loss functions
Let's look at the implementation of the simplest one, L1Loss:
```python
class L1Loss(_Loss):
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(L1Loss, self).__init__(size_average, reduce, reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.l1_loss(input, target, reduction=self.reduction)
```
As you can see, L1Loss is defined much like an ordinary submodule; its forward simply calls the corresponding function in F (torch.nn.functional):
```python
def l1_loss(input, target, size_average=None, reduce=None, reduction='mean'):
    # type: (Tensor, Tensor, Optional[bool], Optional[bool], str) -> Tensor
    if not torch.jit.is_scripting():
        tens_ops = (input, target)
        if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
            return handle_torch_function(
                l1_loss, tens_ops, input, target, size_average=size_average,
                reduce=reduce, reduction=reduction)
    if not (target.size() == input.size()):
        warnings.warn("Using a target size ({}) that is different to the input size ({}). "
                      "This will likely lead to incorrect results due to broadcasting. "
                      "Please ensure they have the same size.".format(target.size(), input.size()),
                      stacklevel=2)
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    if target.requires_grad:
        ret = torch.abs(input - target)
        if reduction != 'none':
            ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
    else:
        expanded_input, expanded_target = torch.broadcast_tensors(input, target)
        ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
    return ret
```
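Following the same pattern, a custom loss can be written by subclassing nn.Module and putting the computation in forward(). Below is a minimal sketch; the class name WeightedL1Loss and its weight parameter are hypothetical illustration choices, not a PyTorch API:
```python
import torch
import torch.nn as nn

class WeightedL1Loss(nn.Module):
    """Hypothetical example: L1 loss scaled by a constant weight."""

    def __init__(self, weight: float = 1.0, reduction: str = 'mean'):
        super().__init__()
        self.weight = weight
        self.reduction = reduction

    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        loss = self.weight * torch.abs(input - target)
        if self.reduction == 'mean':
            return loss.mean()
        if self.reduction == 'sum':
            return loss.sum()
        return loss                      # reduction == 'none'

criterion = WeightedL1Loss(weight=2.0)
output = criterion(torch.randn(3, 5, requires_grad=True), torch.randn(3, 5))
output.backward()
```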