PS: let's continue.
First, the cross-entropy formula (the binary form used throughout below):
CE(p, y) = -y*log(p) - (1-y)*log(1-p)
Then here is my attempt:
>>> import torch
>>> input=torch.randn(4,2)
>>> input
tensor([[ 0.0543,  0.5641],
        [ 1.2221, -0.5496],
        [-0.7951, -0.1546],
        [-0.4557,  1.4724]])
>>> pt=torch.softmax(input,dim=1)
>>> pt
tensor([[0.3753, 0.6247],
        [0.8547, 0.1453],
        [0.3451, 0.6549],
        [0.1270, 0.8730]])
>>> target=torch.tensor([1,0,1,1])
>>> ones=torch.eye(2)
>>> targetones=ones.index_select(0,target)
>>> targetones
tensor([[0., 1.],
        [1., 0.],
        [0., 1.],
        [0., 1.]])
>>> torch.log(pt)
tensor([[-0.9802, -0.4704],
        [-0.1570, -1.9287],
        [-1.0639, -0.4233],
        [-2.0639, -0.1358]])
>>> -(targetones*torch.log(pt))
tensor([[0.0000, 0.4704],
        [0.1570, 0.0000],
        [0.0000, 0.4233],
        [0.0000, 0.1358]])
>>> torch.log(1-pt)
tensor([[-0.4704, -0.9802],
        [-1.9287, -0.1570],
        [-0.4233, -1.0639],
        [-0.1358, -2.0639]])
>>> -((1-targetones)*torch.log(1-pt))
tensor([[0.4704, 0.0000],
        [0.0000, 0.1570],
        [0.4233, 0.0000],
        [0.1358, 0.0000]])
>>> -(targetones*torch.log(pt))-((1-targetones)*torch.log(1-pt))
tensor([[0.4704, 0.4704],
        [0.1570, 0.1570],
        [0.4233, 0.4233],
        [0.1358, 0.1358]])
In the binary case, -(targetones*torch.log(pt)) and -((1-targetones)*torch.log(1-pt)) seem to carry the same values. Continuing the experiment:
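A self-contained sketch of why the two terms match (a fresh random input is used here, so the numbers differ from the session above): with two classes, the softmax columns are complements of each other, so 1-pt is just pt with its columns swapped.

```python
import torch

torch.manual_seed(0)
input = torch.randn(4, 2)
pt = torch.softmax(input, dim=1)

# Columns of a 2-class softmax sum to 1, so 1-pt is pt column-swapped.
assert torch.allclose(1 - pt, pt.flip(dims=[1]))

target = torch.tensor([1, 0, 1, 1])
targetones = torch.eye(2).index_select(0, target)

a = -(targetones * torch.log(pt))            # -log p of the target class
b = -((1 - targetones) * torch.log(1 - pt))  # same values, other column
assert torch.allclose(a.sum(dim=1), b.sum(dim=1))
```

So per sample both terms reduce to the same -log(p_target); they only place it in opposite columns.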
>>> target=torch.Tensor([1,0,1,1])
>>> pt
tensor([[0.3753, 0.6247],
        [0.8547, 0.1453],
        [0.3451, 0.6549],
        [0.1270, 0.8730]])
>>> p=pt[:,1]
>>> p
tensor([0.6247, 0.1453, 0.6549, 0.8730])
>>> -(target*torch.log(p))-((1-target)*torch.log(1-p))
tensor([0.4704, 0.1570, 0.4233, 0.1358])
>>> target=torch.Tensor([0,1,0,0])
>>> p=pt[:,0]
>>> p
tensor([0.3753, 0.8547, 0.3451, 0.1270])
>>> -(target*torch.log(p))-((1-target)*torch.log(1-p))
tensor([0.4704, 0.1570, 0.4233, 0.1358])
>>> celoss(input,torch.tensor([1,0,1,1]))
tensor(0.2966)
>>> loss(torch.log(torch.softmax(input,dim=1)),torch.tensor([1,0,1,1]))
tensor(0.2966)
>>> (-(target*torch.log(p))-((1-target)*torch.log(1-p))).mean()
tensor(0.2966)
As expected. The step above (handling both target 1 and target 0) is already done inside the NLLLoss function.
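A self-contained re-check of that equivalence (celoss and loss above are assumed to be the nn.CrossEntropyLoss and nn.NLLLoss instances from the previous part; a fresh random input is used here):

```python
import torch

torch.manual_seed(0)
input = torch.randn(4, 2)
target = torch.tensor([1, 0, 1, 1])

celoss = torch.nn.CrossEntropyLoss()  # softmax + log + NLL in one call
nll = torch.nn.NLLLoss()              # expects log-probabilities

pt = torch.softmax(input, dim=1)
p = pt[:, 1]                          # probability of class 1
t = target.float()
# Manual binary cross-entropy, averaged over the batch
manual = (-(t * torch.log(p)) - ((1 - t) * torch.log(1 - p))).mean()

assert torch.allclose(celoss(input, target), manual)
assert torch.allclose(nll(torch.log(pt), target), manual)
```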
So alpha and (1-alpha) have to be applied before calling NLLLoss, likewise expanded into a Tensor of shape (4,2). (Note: torch.Tensor produces a torch.FloatTensor, while torch.tensor([1,0,1,1]) produces a torch.LongTensor.) But first, let's finish trying out the Focal Loss idea:
>>> target=torch.Tensor([1,0,1,1])
>>> p=pt[:,1]
>>> p
tensor([0.6247, 0.1453, 0.6549, 0.8730])
>>> alpha=0.25
>>> gamma=2
>>> -alpha*(1-p)**gamma*(target*torch.log(p))-(1-alpha)*p**gamma*((1-target)*torch.log(1-p))
tensor([0.0166, 0.0025, 0.0126, 0.0005])
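A quick sanity check of this elementwise formula (with a fresh random input): setting gamma=0 and alpha=0.5 removes the modulating factor and weights both classes equally, so it should reduce to exactly half the binary cross-entropy.

```python
import torch

torch.manual_seed(0)
input = torch.randn(4, 2)
pt = torch.softmax(input, dim=1)
p = pt[:, 1]
target = torch.Tensor([1, 0, 1, 1])

def focal(p, target, alpha, gamma):
    # Elementwise binary focal loss, same form as the session above
    return (-alpha * (1 - p) ** gamma * (target * torch.log(p))
            - (1 - alpha) * p ** gamma * ((1 - target) * torch.log(1 - p)))

ce = -(target * torch.log(p)) - ((1 - target) * torch.log(1 - p))
# gamma=0 makes the modulating factors equal 1; alpha=0.5 halves both terms.
assert torch.allclose(focal(p, target, alpha=0.5, gamma=0), 0.5 * ce)
```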
Now let's try it with the NLLLoss function:
>>> alpha=0.25
>>> aa=torch.Tensor([1-alpha,alpha])
>>> aa
tensor([0.7500, 0.2500])
>>> bb=aa.repeat(4,1)
>>> bb
tensor([[0.7500, 0.2500],
        [0.7500, 0.2500],
        [0.7500, 0.2500],
        [0.7500, 0.2500]])
>>> loss=torch.nn.NLLLoss()
>>> target=torch.tensor([1,0,1,1])
>>> loss(torch.log(bb*(pt**gamma)),target)
tensor(1.7049)
But the result isn't right. Need to think it over.
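One piece of Focal Loss that NLLLoss *can* absorb is the alpha weighting, via its `weight` argument; a sketch (fresh random input, with reduction='none' to sidestep NLLLoss's weighted-mean normalization). The (1-p_t)**gamma modulating factor still has to multiply the loss outside the log.

```python
import torch

torch.manual_seed(0)
input = torch.randn(4, 2)
target = torch.tensor([1, 0, 1, 1])
alpha, gamma = 0.25, 2

pt = torch.softmax(input, dim=1)
w = torch.tensor([1 - alpha, alpha])          # same as bb's rows above
nll = torch.nn.NLLLoss(weight=w, reduction='none')

# Per-sample: -w[target] * log(p_target)
weighted = nll(torch.log(pt), target)

p_t = pt.gather(1, target.unsqueeze(1)).squeeze(1)  # target-class probability
manual = -w[target] * torch.log(p_t)
assert torch.allclose(weighted, manual)

# Full focal loss: modulating factor applied outside the log / NLLLoss call
focal = (1 - p_t) ** gamma * weighted
```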
Never mind, this route doesn't work. Taking the log first turns the multiplicative weights into additive terms, log(bb*(pt**gamma)) = log(bb) + gamma*log(pt), so NLLLoss computes -log(alpha_t) - gamma*log(p_t) instead of multiplying -log(p_t) by the weights; on top of that, the modulating factor for the target class should be (1-p_t)**gamma, not p_t**gamma. So the only option is the first approach. Implementing it along those lines:
import torch
import torch.nn as nn

# Binary classification
class FocalLoss(nn.Module):
    def __init__(self, gamma=2, alpha=0.25):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, input, target):
        # input: size M*2, where M is the batch size
        # target: size M (float tensor of 0./1. labels)
        pt = torch.softmax(input, dim=1)
        p = pt[:, 1]
        loss = -self.alpha * (1 - p) ** self.gamma * (target * torch.log(p)) \
               - (1 - self.alpha) * p ** self.gamma * ((1 - target) * torch.log(1 - p))
        return loss.mean()
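A quick check of the class (re-declared here so the snippet runs standalone, with a fresh random input): with gamma=2 and alpha=0.25 it should reproduce the mean of the elementwise formula from earlier.

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    # Same binary focal loss as above, repeated so this snippet is standalone.
    def __init__(self, gamma=2, alpha=0.25):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, input, target):
        pt = torch.softmax(input, dim=1)
        p = pt[:, 1]
        loss = -self.alpha * (1 - p) ** self.gamma * (target * torch.log(p)) \
               - (1 - self.alpha) * p ** self.gamma * ((1 - target) * torch.log(1 - p))
        return loss.mean()

torch.manual_seed(0)
input = torch.randn(4, 2)
target = torch.Tensor([1, 0, 1, 1])   # forward() expects float 0./1. targets

criterion = FocalLoss(gamma=2, alpha=0.25)
out = criterion(input, target)

# Cross-check against the elementwise formula
p = torch.softmax(input, dim=1)[:, 1]
manual = (-0.25 * (1 - p) ** 2 * (target * torch.log(p))
          - 0.75 * p ** 2 * ((1 - target) * torch.log(1 - p))).mean()
assert torch.allclose(out, manual)
```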