Binary Classification Loss Functions and the AUC Metric

Binary classification loss function

The key points are the shapes of the inputs and outputs, and how the loss function is set up.

from sklearn.metrics import roc_auc_score
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, out_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.relu(out)
        out = self.fc3(out)
        out = torch.sigmoid(out)  # squash the logit into (0, 1)
        return out

in_dim = 20
hidden_dim = 64
out_dim = 1

x = torch.randn((10, 20))
# F.binary_cross_entropy expects float targets, not long
y = torch.tensor([0, 1, 0, 1, 1, 0, 0, 0, 1, 0], dtype=torch.float)
model = Net(in_dim, hidden_dim, out_dim)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_func = F.binary_cross_entropy

label = y.numpy()
for epoch in range(10):
    optimizer.zero_grad()
    out = model(x).squeeze()
    loss = loss_func(out, y)
    loss.backward()
    optimizer.step()
    print('epoch:{}, loss:{}'.format(epoch + 1, loss.item()))

# compute AUC on the final predictions; detach from the graph before numpy
with torch.no_grad():
    pred = model(x).squeeze().numpy()
auc = roc_auc_score(label, pred)
print('final auc', auc)

CrossEntropy loss

This loss combines nn.LogSoftmax() and nn.NLLLoss(). For a given sample it is defined via the probability of the target class among all classes: softmax first maps the scores into (0, 1), then the log maps them into (-∞, 0). Computing softmax and log together (log-softmax) reduces the amount of computation while preserving monotonicity.

loss(x, class) = -log( exp(x[class]) / Σ_j exp(x[j]) ) = -x[class] + log( Σ_j exp(x[j]) )

Adding a per-class weight to each sample:

loss(x, class) = weight[class] · ( -x[class] + log( Σ_j exp(x[j]) ) )

Finally, with the default reduction='mean', the per-sample losses are averaged, normalized by the weights:

loss = Σ_i loss(i, class[i]) / Σ_i weight[class[i]]

Example: 3 samples, 5 classes

>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()

>>> input
tensor([[-6.1358e-04, -1.7644e-01, -1.8418e+00, -5.2999e-01,  1.3100e+00],
        [ 5.7319e-01, -4.6630e-01, -1.0063e+00, -2.2955e-01,  4.9074e-01],
        [ 1.8272e+00, -1.6430e-01,  9.5128e-01, -1.3161e+00,  1.6006e+00]],
       requires_grad=True)
>>> input.shape
torch.Size([3, 5])
>>> target
tensor([0, 4, 4])
>>> target.shape
torch.Size([3])
>>> output
tensor(1.3653, grad_fn=<NllLossBackward>)
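As a quick sanity check (a minimal sketch, not from the original post), nn.CrossEntropyLoss should produce exactly the same value as applying nn.LogSoftmax followed by nn.NLLLoss, which is the decomposition described above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)          # 3 samples, 5 classes
target = torch.tensor([0, 4, 4])

# one-step fused loss
ce = nn.CrossEntropyLoss()(logits, target)

# equivalent two-step computation: log-softmax, then negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

assert torch.allclose(ce, nll)
```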

BCELoss: binary cross entropy loss, the loss function for binary classification

A per-sample weight can be applied in front of the cross-entropy term; reduction controls how the per-element losses are aggregated (mean, sum, or none). For a prediction x_n in (0, 1) and target y_n:

ℓ_n = -w_n · [ y_n · log(x_n) + (1 - y_n) · log(1 - x_n) ]

>>> m = nn.Sigmoid()
>>> loss = nn.BCELoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(m(input), target)
>>> output.backward()

>>> output
tensor(0.5941, grad_fn=<BinaryCrossEntropyBackward>)
>>> input
tensor([-0.4230,  1.6890,  0.0147], requires_grad=True)
>>> input.shape
torch.Size([3])
>>> target
tensor([1., 1., 1.])
>>> target.shape
torch.Size([3])
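In practice the Sigmoid + BCELoss pair above is often replaced by nn.BCEWithLogitsLoss, which fuses the sigmoid into the loss for better numerical stability. A small sketch (assumption: this substitution, not from the original post) checking that the two agree:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)

# two-step: apply sigmoid, then BCELoss on probabilities
loss_a = nn.BCELoss()(torch.sigmoid(logits), target)
# fused, numerically more stable version operating on raw logits
loss_b = nn.BCEWithLogitsLoss()(logits, target)

assert torch.allclose(loss_a, loss_b)
```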

CrossEntropyLoss:
https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss

BCELoss:
https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html#torch.nn.BCELoss

Computing AUC

Updated 2021.11.22

Meaning of AUC: for a dataset with M positive samples and N negative samples, AUC is the probability that, for a randomly drawn positive/negative pair, the score predicted for the positive sample is greater than the score predicted for the negative sample:

AUC = P(P_pos > P_neg)

Example:

label  pred
0      0.1
0      0.4
0      0.5
1      0.3
1      0.6

The table above has 5 samples: 2 positive and 3 negative, i.e. M = 2, N = 3.

AUC = Σ I(P_pos, P_neg) / (M · N)

where

I(P_pos, P_neg) = 1 if P_pos > P_neg;  0.5 if P_pos = P_neg;  0 if P_pos < P_neg

There are 4 pairs with P_pos > P_neg: (0.3, 0.1), (0.6, 0.1), (0.6, 0.4), (0.6, 0.5). The total number of positive/negative pairs is 2 · 3 = 6, so AUC = 4/6 ≈ 0.67.

Implementation:

def cal_auc(y, y_hat):
    # split the predicted scores by true label
    pos_sample = [j for i, j in zip(y, y_hat) if i == 1]
    neg_sample = [j for i, j in zip(y, y_hat) if i == 0]
    # count positive/negative pairs where the positive scores higher;
    # ties count as 0.5
    n_count = 0
    for a in pos_sample:
        for b in neg_sample:
            if a == b:
                n_count += 0.5
            elif a > b:
                n_count += 1
    auc = float(n_count) / (float(len(pos_sample)) * float(len(neg_sample)))
    return auc

label = [0, 0, 1, 1, 0]
p = [0.1, 0.4, 0.3, 0.6, 0.5]

auc = cal_auc(label, p)
print('auc = ', auc)
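As a quick check (not part of the original post), the pair-counting result agrees with sklearn's roc_auc_score on the same data, since AUC is equivalent to the normalized Mann-Whitney U statistic:

```python
from sklearn.metrics import roc_auc_score

def cal_auc(y, y_hat):
    # pair-counting AUC: fraction of positive/negative pairs ranked correctly
    pos_sample = [j for i, j in zip(y, y_hat) if i == 1]
    neg_sample = [j for i, j in zip(y, y_hat) if i == 0]
    n_count = 0
    for a in pos_sample:
        for b in neg_sample:
            if a == b:
                n_count += 0.5
            elif a > b:
                n_count += 1
    return float(n_count) / (len(pos_sample) * len(neg_sample))

label = [0, 0, 1, 1, 0]
p = [0.1, 0.4, 0.3, 0.6, 0.5]

assert abs(cal_auc(label, p) - roc_auc_score(label, p)) < 1e-9
assert abs(cal_auc(label, p) - 4 / 6) < 1e-9
```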

Let's look at a binary classification example using sklearn:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# load the dataset: X (569, 30), y (569,)
X, y = load_breast_cancer(return_X_y=True)
# fit a logistic regression classifier
clf = LogisticRegression(solver="liblinear", random_state=0).fit(X, y)

# roc_auc_score takes the true labels y as the first argument;
# clf.predict_proba(X) is the model output, and [:, 1] selects the
# predicted probability of class 1
auc1 = roc_auc_score(y, clf.predict_proba(X)[:, 1])

print(clf.predict_proba(X)[:, 0])  # (569,)
print(clf.predict_proba(X)[:, 1])  # (569,)

# AUC is rank-based, so raw decision scores give the same result as probabilities
auc2 = roc_auc_score(y, clf.decision_function(X))

predict_proba(X) takes the samples X and returns the probability of each class for every sample, with one column per class in class order.

For example, if a binary classifier outputs [0.4, 0.6] for a sample, the predicted probability of class 0 is 0.4 and of class 1 is 0.6; clf.predict_proba(X)[:, 1] selects that class-1 probability, 0.6. The same applies to multi-class problems.
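A minimal sketch illustrating this column layout (using synthetic data rather than the breast cancer set): each row of predict_proba sums to 1, and columns follow the order of clf.classes_:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)   # a simple separable label

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)

# one column per class, ordered as in clf.classes_; rows sum to 1
assert proba.shape == (100, 2)
assert np.allclose(proba.sum(axis=1), 1.0)
assert list(clf.classes_) == [0, 1]
```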

AUC references:
How AUC is computed
How to understand AUC in machine learning and statistics?

The following code uses deep learning for a binary classification task, with AUC on the test set as the evaluation metric:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import roc_auc_score

# define the network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

# load the dataset
data = torch.load('data.pt')
x_train, y_train, x_test, y_test = data
train_dataset = TensorDataset(x_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataset = TensorDataset(x_test, y_test)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# loss function and optimizer
criterion = nn.BCELoss()
net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.01)

# train the model
for epoch in range(10):
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = net(inputs).squeeze(1)           # (batch, 1) -> (batch,)
        loss = criterion(outputs, labels.float())  # BCELoss needs float targets
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # compute AUC on the test set
    y_pred, y_true = [], []
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = net(inputs).squeeze(1)
            y_pred += outputs.tolist()
            y_true += labels.tolist()
    auc = roc_auc_score(y_true, y_pred)
    print('Epoch %d, loss: %.3f, test AUC: %.3f'
          % (epoch + 1, running_loss / len(train_loader), auc))
```

In this code we use the PyTorch framework to build the network, the Adam optimizer to update the model parameters, and binary cross entropy as the loss. At the end of each epoch, AUC is computed on the test set to evaluate the model.