0. Previous Posts
[2] Deep Learning with PyTorch: Tensor Operations (Concatenation, Splitting, Indexing, and Transformation)
[7] Deep Learning with PyTorch: DataLoader and Dataset (with an RMB Banknote Binary Classification Example)
[8] Deep Learning with PyTorch: Image Preprocessing with transforms
[9] Deep Learning with PyTorch: Data Augmentation with transforms (Cropping, Flipping, Rotation)
[10] Deep Learning with PyTorch: transforms Image Operations and Custom Methods
[11] Deep Learning with PyTorch: Model Creation and nn.Module
[12] Deep Learning with PyTorch: Model Containers and Building AlexNet
[13] Deep Learning with PyTorch: Convolution Layers (1D/2D/3D Convolution, nn.Conv2d, Transposed Convolution nn.ConvTranspose)
[16] Deep Learning with PyTorch: 18 Loss Functions
Deep Learning with PyTorch: Loss Functions
- 0. Previous Posts
- 1. The Concept of a Loss Function
- 2. 18 Loss Functions
  - 2.1 CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
  - 2.2 nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')
  - 2.3 nn.BCELoss(weight=None, reduction='mean')
  - 2.4 nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)
  - 2.5 & 2.6 nn.L1Loss(reduction='mean') & nn.MSELoss(reduction='mean')
  - 2.7 nn.SmoothL1Loss(reduction='mean')
  - 2.8 nn.PoissonNLLLoss(log_input=True, full=False, eps=1e-08, reduction='mean')
  - 2.9 nn.KLDivLoss(reduction='mean')
  - 2.10 nn.MarginRankingLoss(margin=0.0, reduction='mean')
  - 2.11 nn.MultiLabelMarginLoss(reduction='mean')
  - 2.12 nn.SoftMarginLoss(reduction='mean')
  - 2.13 nn.MultiLabelSoftMarginLoss(reduction='mean')
  - 2.14 nn.MultiMarginLoss(p=1, margin=1.0, weight=None, reduction='mean')
  - 2.15 nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, reduction='mean')
  - 2.16 nn.HingeEmbeddingLoss(margin=1.0, reduction='mean')
  - 2.17 nn.CosineEmbeddingLoss(margin=0.0, reduction='mean')
  - 2.18 nn.CTCLoss(blank=0, reduction='mean')
- 3. Complete Code
1. The Concept of a Loss Function
A loss function measures the gap between a model's output and the ground-truth label for a single sample. The cost function is the average loss over the whole training set, and the objective function is what we actually optimize: the cost function plus a regularization term that constrains model complexity.
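To make the loss / cost distinction concrete, here is a minimal sketch (the tensor values and the choice of nn.MSELoss are made up for illustration):

import torch
import torch.nn as nn

y_hat = torch.tensor([0.9, 2.1, 2.8])  # model outputs (made-up values)
y = torch.tensor([1.0, 2.0, 3.0])      # ground-truth targets

# loss: one value per sample (reduction='none')
loss_each = nn.MSELoss(reduction='none')(y_hat, y)
# cost: the average over the whole batch (reduction='mean') -- what we actually minimize
cost = nn.MSELoss(reduction='mean')(y_hat, y)
print(loss_each)  # tensor([0.0100, 0.0100, 0.0400])
print(cost)       # tensor(0.0200)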
2. 18 Loss Functions
2.1 CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)
Despite its name, CrossEntropyLoss is not the textbook cross-entropy formula applied verbatim: it first applies softmax to normalize the raw outputs into a probability distribution. It is the standard loss for classification tasks. Cross-entropy measures the difference between two probability distributions: the lower its value, the closer the two distributions.
(1) Entropy describes the uncertainty of an event: the greater the uncertainty, the higher the entropy.
(2) Information entropy is the expectation of self-information; self-information measures the uncertainty of a single event.
(3) Relative entropy, also called KL divergence, measures the difference between two distributions. It behaves like a distance between them, but it is not a true distance function because it is not symmetric. P is the true distribution and Q is the model's output distribution; Q is trained to approximate P.
(4) Cross-entropy measures the similarity between two distributions.
(5) P is the true distribution, i.e., the distribution of the samples in the training set. Since the training set is fixed, P is fixed, so H(P) is a constant; minimizing the cross-entropy is therefore equivalent to minimizing the KL divergence.
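For reference, these are the standard definitions behind points (1)-(5), written out explicitly:

$$I(x) = -\log P(x)$$
$$H(P) = \mathbb{E}_{x \sim P}[I(x)] = -\sum_{i} P(x_i) \log P(x_i)$$
$$D_{KL}(P \parallel Q) = \sum_{i} P(x_i) \log \frac{P(x_i)}{Q(x_i)}$$
$$H(P, Q) = -\sum_{i} P(x_i) \log Q(x_i) = H(P) + D_{KL}(P \parallel Q)$$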
Since the target is given as a class index, $P(x_i) = 1$ for the true class and $0$ for every other class, so the cross-entropy collapses to the single term $-\log Q(x_{\text{class}})$. The model outputs $Q(x_i)$ first need to be normalized into probabilities with softmax, which yields the formula actually used:

$$\text{loss}(x, \text{class}) = -x[\text{class}] + \log\Bigl(\sum_{j} \exp(x[j])\Bigr)$$
Code example:
# fake data
# binary classification, 2 output neurons, batch size 3, i.e. three samples: [1,2] [1,3] [1,3]
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
# dtype must be long; the 1D target tensor has one entry per sample
target = torch.tensor([0, 1, 1], dtype=torch.long)
# ----------------------------------- CrossEntropy loss: reduction -----------------------------------
flag = 0
# flag = 1
if flag:
    # define the loss function, one instance per reduction mode
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
    # output: [1.3133, 0.1269, 0.1269] 1.5671 0.5224
# --------------------------------- compute by hand
flag = 0
# flag = 1
if flag:
    idx = 0

    input_1 = inputs.detach().numpy()[idx]  # [1, 2]
    target_1 = target.numpy()[idx]          # [0]

    # first term: the logit of the true class
    x_class = input_1[target_1]

    # second term: log of the sum of exponentials over all classes
    sigma_exp_x = np.sum(list(map(np.exp, input_1)))
    log_sigma_exp_x = np.log(sigma_exp_x)

    # loss: -x[class] + log(sum_j exp(x[j]))
    loss_1 = -x_class + log_sigma_exp_x
    print("loss of the first sample: ", loss_1)
# ----------------------------------- weight -----------------------------------
flag = 0
# flag = 1
if flag:
    # define the loss function
    # the weights vector must have one entry per class
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights = torch.tensor([0.7, 0.3], dtype=torch.float)

    loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)
    # with weights [1, 2] the output is [1.3133, 0.2539, 0.2539] 1.8210 0.3642
    # target = [0, 1, 1], so class-0 losses are multiplied by weight 1 and class-1 losses by weight 2:
    # 1.3133 = 1.3133 * 1, 0.2539 = 0.1269 * 2, 0.2539 = 0.1269 * 2
    # 0.3642 = 1.8210 / (1 + 2 + 2): the denominator is the total number of weight shares (5), not the sample count
# --------------------------------- compute by hand
flag = 0
# flag = 1
if flag:
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights_all = 5: sum of the weight assigned to each sample, [0, 1, 1] ---> [1, 2, 2]
    weights_all = np.sum(list(map(lambda x: weights.numpy()[x], target.numpy())))

    mean = 0
    loss_sep = loss_none.detach().numpy()  # per-sample losses from the reduction='none' block above
    for i in range(target.shape[0]):
        x_class = target.numpy()[i]
        tmp = loss_sep[i] * (weights.numpy()[x_class] / weights_all)
        mean += tmp

    print(mean)
Official example:
# Example of target with class indices; here target is 1D with length equal to the batch size
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
# Example of target with class probabilities; here target has the same shape as input
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
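nn.CrossEntropyLoss is documented as nn.LogSoftmax followed by nn.NLLLoss (covered next), so the two routes below must agree; a quick sketch to verify:

import torch
import torch.nn as nn

x = torch.randn(3, 5)        # raw logits
t = torch.tensor([1, 0, 4])  # class indices

ce = nn.CrossEntropyLoss()(x, t)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(x), t)
print(torch.allclose(ce, nll))  # True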
2.2 nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')
nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
NLLLoss applies neither softmax nor log itself: it simply takes the input value at each sample's target class index, negates it, and multiplies by the class weight. It is therefore meant to be fed log-probabilities, typically the output of nn.LogSoftmax.
Code example:
# fake data
# binary classification, 2 output neurons, batch size 3, i.e. three samples: [1,2] [1,3] [1,3]
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
# dtype must be long; the 1D target tensor has one entry per sample
target = torch.tensor([0, 1, 1], dtype=torch.long)
# ----------------------------------- 2 NLLLoss -----------------------------------
flag = 0
# flag = 1
if flag:
    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print("NLL Loss", loss_none_w, loss_sum, loss_mean)
    # output: [-1, -3, -3] -7 -2.3333
    # target = [0, 1, 1]
    # -1: the first sample [1, 2] is class 0, so only the first neuron's output is used: -1 * 1
    # -3: the second sample [1, 3] is class 1, so only the second neuron's output is used: -1 * 3
    # -3: the third sample [1, 3] is class 1, so only the second neuron's output is used: -1 * 3
    # for 'mean', the denominator is the sum of the per-sample weights
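In other words, with reduction='none' and unit weights, NLLLoss just gathers the negated input at each sample's target index. A one-line check, reusing inputs and target from the snippet above:

# picks input[i, target[i]] for each sample and negates it -- same as nn.NLLLoss(reduction='none') with unit weights
picked = -inputs.gather(1, target.unsqueeze(1)).squeeze(1)
print(picked)  # tensor([-1., -3., -3.])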
Official example:
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()
# 2D loss example (used, for example, with image inputs)
N, C = 5, 4
loss = nn.NLLLoss()
# input is of size N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
output.backward()
2.3 nn.BCELoss(weight=None, reduction='mean')
nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
Note: the input values must lie in [0, 1], i.e., they must already be probabilities.
Each element's loss is $l_n = -w_n\left[y_n \log x_n + (1 - y_n)\log(1 - x_n)\right]$, where $x_n$ is the probability output by the model and $y_n$ is the label (0 or 1 in binary classification).
Code example:
# ----------------------------------- 3 BCE Loss -----------------------------------
flag = 0
# flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)  # 4 samples
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)  # must be float; each neuron is matched one-to-one with a target entry

    target_bce = target

    # sigmoid is mandatory here: it squashes the inputs into [0, 1]
    inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print("BCE Loss", loss_none_w, loss_sum, loss_mean)
    # output: [[0.3133, 2.1269], [0.1269, 2.1269], [3.0486, 0.0181], [4.0181, 0.0067]] 11.7856 1.4732
    # each neuron is paired with its own target, so there are 2 * 4 = 8 loss values
# --------------------------------- compute by hand
flag = 0
# flag = 1
if flag:
    idx = 0

    x_i = inputs.detach().numpy()[idx, idx]
    y_i = target.numpy()[idx, idx]

    # loss
    # l_i = -[ y_i * np.log(x_i) + (1-y_i) * np.log(1-x_i) ]  # direct form; np.log(0) = nan, hence the branch below
    l_i = -y_i * np.log(x_i) if y_i else -(1-y_i) * np.log(1-x_i)

    # print the loss
    print("BCE inputs: ", inputs)
    print("first loss: ", l_i)  # 0.3133
Remember: you must apply sigmoid first so that the inputs lie in [0, 1].
Official example:
m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
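nn.BCEWithLogitsLoss (next section) folds the sigmoid into the loss for numerical stability, so sigmoid + nn.BCELoss and nn.BCEWithLogitsLoss on raw logits should produce the same values; a small sketch with made-up tensors:

import torch
import torch.nn as nn

logits = torch.tensor([[1., 2.], [3., 4.]])
labels = torch.tensor([[1., 0.], [0., 1.]])

a = nn.BCELoss()(torch.sigmoid(logits), labels)
b = nn.BCEWithLogitsLoss()(logits, labels)
print(torch.allclose(a, b))  # True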
2.4 nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)
nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)
(1) pos_weight balances positive and negative samples: the loss of every positive sample is multiplied by pos_weight.
(2) For example, with 100 positive samples and 300 negative samples, setting pos_weight to 3 balances the two classes.
(3) pos_weight is a tensor whose length must equal the number of labels. In a multi-label task with 200 classes, pos_weight holds one weight per class, so its length is 200.
(4) For binary classification, you only need to supply the weight for the positive class. For instance, with 100 positive and 400 negative samples, we can scale the positive-sample loss by 4 to mitigate the class imbalance:
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([4]))
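To see the effect numerically, here is a small sketch with made-up values; with pos_weight = 3, only the terms whose target is 1 are scaled by 3:

import torch
import torch.nn as nn

logits = torch.tensor([[1., 2.]])
labels = torch.tensor([[1., 0.]])

plain = nn.BCEWithLogitsLoss(reduction='none')(logits, labels)
weighted = nn.BCEWithLogitsLoss(reduction='none', pos_weight=torch.tensor([3.]))(logits, labels)
print(plain)     # tensor([[0.3133, 2.1269]])
print(weighted)  # tensor([[0.9399, 2.1269]]) -- only the positive-target term is tripled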
Code example:
# ----------------------------------- 4 BCE with Logits Loss -----------------------------------
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # inputs = torch.sigmoid(inputs)  # do NOT apply sigmoid here -- the loss applies it internally!
    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)
    # output: [[0.3133, 2.1269], [0.1269, 2.1269], [3.0486, 0.0181], [4.0181, 0.0067]] 11.7856 1.4732
# --------------------------------- pos weight
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # inputs = torch.sigmoid(inputs)  # again, no sigmoid here
    weights = torch.tensor(