Datawhale Summer Camp - EfficientNet-b0 & ResNet50

This post compares the two networks, along with some small differences between using timm and using transformers.

from torch import nn
from torch.utils.data import Dataset, DataLoader
from torch import optim
from torchvision import transforms
import timm
import torch
from PIL import Image
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tqdm
model = timm.create_model('efficientvit_b0.r224_in1k', pretrained=True, num_classes=2)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

1. EfficientNet-b0

First, load the pretrained EfficientNet model:

data_config = timm.data.resolve_model_data_config(model)
val_trans = timm.data.create_transform(**data_config, is_training=False)
train_trans = timm.data.create_transform(**data_config, is_training=True)

Note that the baseline hand-writes its transforms, but this can be simplified by using the model's own preprocessing config directly, so the mean and std of the data match what the model was trained with.

This is the model's final classification layer; as you can see, we have already configured it for two classes via `num_classes=2`. You can also modify it manually; see the ResNet50 section below.
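timm also lets you rebuild the head after loading, via `model.reset_classifier(num_classes=2)`. As a framework-agnostic sketch in plain PyTorch (the toy backbone below is an assumption for illustration, not the actual EfficientNet layers), replacing the final `Linear` of a model looks like:

```python
import torch
from torch import nn

# Toy stand-in for a pretrained backbone; in a real model the head
# would be an attribute such as model.classifier or model.head.
backbone = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, 16),   # "feature extractor"
    nn.ReLU(),
    nn.Linear(16, 1000),        # original 1000-class ImageNet head
)

# Replace the final classification layer with a 2-class head,
# reusing its in_features so the shapes stay consistent.
old_head = backbone[-1]
backbone[-1] = nn.Linear(old_head.in_features, 2)

x = torch.randn(4, 3, 8, 8)
print(backbone(x).shape)  # torch.Size([4, 2])
```

The key point is to read `in_features` off the old layer rather than hard-coding it, so the swap works regardless of the backbone's feature width.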

Define the dataset and load the data:

from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
class DeepFakeDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label
        self.transform = transform if transform else transforms.Compose([transforms.ToTensor()])
    
    def __len__(self):
        return len(self.img_label)
    
    def __getitem__(self, idx):
        image = Image.open(self.img_path[idx]).convert('RGB')
        label = self.img_label[idx]
        image = self.transform(image)
        return image, label

import pandas as pd
train_label = pd.read_csv('/kaggle/input/deepfake/phase1/trainset_label.txt')
val_label = pd.read_csv('/kaggle/input/deepfake/phase1/valset_label.txt')

train_label['path'] = '/kaggle/input/deepfake/phase1/trainset/' + train_label['img_name']
val_label['path'] = '/kaggle/input/deepfake/phase1/valset/' + val_label['img_name']

train_dataset = DeepFakeDataset(train_label['path'].head(500), train_label['target'].head(500), transform=train_trans)
val_dataset = DeepFakeDataset(val_label['path'].head(500), val_label['target'].head(500), transform=val_trans)

One thing to note: the label DataFrames are large and no longer needed once the datasets are built, so we free that memory:

del train_label, val_label

Define the DataLoaders:

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
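As a minimal sketch of what the loader yields per iteration (synthetic tensors stand in for the real images; sizes are assumptions for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake "images" and binary labels in place of the real dataset
fake_images = torch.randn(10, 3, 224, 224)
fake_labels = torch.randint(0, 2, (10,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=4, shuffle=True)

# Each iteration gives one batch: images (B, C, H, W) and labels (B,)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # torch.Size([4, 3, 224, 224]) torch.Size([4])
```

With 10 samples and `batch_size=4`, the final batch holds only 2 samples; pass `drop_last=True` if a fixed batch size matters.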

Define the training function:

def train(model, train_loader, optimizer, criterion, epochs):
    cnt = 0
    model.train()
    model = model.to(device)
    loss_list = []
    acc_list = []
    for epoch in tqdm.tqdm(range(epochs)):
        sum_loss = 0.0
        sum_acc = 0.0
        for images, labels in train_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            acc = torch.sum((outputs.argmax(1).view(-1) == labels.view(-1)).float())

            cnt += 1
            if cnt % 100 == 0:
                print(f"{cnt} -- loss : {loss.item():.4f} -- acc : {acc.item()}")
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            sum_loss += loss.item()
            sum_acc += acc.item()
        loss_list.append(sum_loss)
        acc_list.append(sum_acc / len(train_loader.dataset))
        
    return loss_list, acc_list

Note: normally you would validate after every epoch so you can save the best-performing weights. That step is omitted here; this post only walks through the training flow.
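Keeping the best weights across epochs can be sketched like this (a dummy model and a made-up accuracy sequence stand in for the real train/validate loop):

```python
import copy

import torch
from torch import nn

model = nn.Linear(4, 2)            # stand-in for the real network
best_acc, best_state = 0.0, None

for epoch, val_acc in enumerate([0.61, 0.74, 0.70]):  # pretend per-epoch accuracies
    # ... train one epoch, then validate to obtain val_acc ...
    if val_acc > best_acc:
        best_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())
        torch.save(best_state, 'best_model.pt')   # optionally persist to disk

model.load_state_dict(best_state)  # restore the best checkpoint at the end
print(best_acc)  # 0.74
```

`copy.deepcopy` matters here: `state_dict()` returns references to the live parameters, which later training steps would otherwise overwrite.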

criterion = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-5)
epochs = 5
loss_list, acc_list = train(model, train_loader, optimizer, criterion, epochs)

Plot the loss curve:

fig, ax1 = plt.subplots()  
  
color = 'tab:red'  
ax1.set_xlabel('Epoch')  
ax1.set_ylabel('Loss', color=color)  
ax1.plot(loss_list, color=color)  
ax1.tick_params(axis='y', labelcolor=color)  

fig.tight_layout()
plt.grid(True) 
plt.show()

The curve looks like this:

The loss is still decreasing, so increasing the number of epochs and the dataset size should give better results.

Validate the model:

def validate(model, val_loader, criterion):
    model.eval()
    sum_loss = 0.0
    sum_acc = 0.0
    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            sum_loss += loss.item()
            
            acc = torch.sum((outputs.argmax(1).view(-1) == labels.view(-1)).float())
            sum_acc += acc.item()   
        print(f"Validate -- Loss : {sum_loss}, Acc : {sum_acc/len(val_loader.dataset)}")

The final accuracy is 0.788.

2. ResNet50

The overall flow is the same as above.

Load the model and the data:

from transformers import AutoImageProcessor, ResNetForImageClassification
import datasets

# the final binary-classification linear layer
cls = nn.Sequential(
    nn.Linear(1000, 1),
    nn.Sigmoid()
)
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")
model.add_module("cls", cls)
model = model.to(device)

Note: this only registers cls as a submodule of model; forward is not overridden, so cls will not be called automatically.


train_label = pd.read_csv('/kaggle/input/deepfake/phase1/trainset_label.txt')
val_label = pd.read_csv('/kaggle/input/deepfake/phase1/valset_label.txt')

train_label['path'] = '/kaggle/input/deepfake/phase1/trainset/' + train_label['img_name']
val_label['path'] = '/kaggle/input/deepfake/phase1/valset/' + val_label['img_name']

train_dataset = datasets.Dataset.from_pandas(train_label.head(500))
val_dataset = datasets.Dataset.from_pandas(val_label.head(500))
train_dataset

The official example uses the Dataset class from the datasets library, which is more convenient: it can load data in one call from pandas DataFrames, CSV files, dicts, and so on, and supports custom preprocessing. The official preprocessing step, analogous to a transform, is called a processor; it turns an image file into a dict containing a pixel_values field, and the model then operates on that pixel_values data. See the official docs for details.

dataset.with_transform(...)
dataset.map(func, ...)

The official model also accepts a PIL image directly through the processor, i.e.:

img = Image.open("img_path")
model(**processor(img, return_tensors="pt"))

Note: using the model differs from a plain PyTorch module. You need to pass in pixel_values, which the processor produces from an image file. Also, outputs does not directly receive the model's result: the model returns more than the predictions, and the actual target data lives on the logits attribute.

logits = model(pixel_values=images).logits

## equivalent plain-PyTorch call
outputs = model(inputs)
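The difference can be illustrated with a minimal stand-in for the Hugging Face output object (FakeOutput below is a hypothetical class for illustration, not the real transformers ModelOutput):

```python
from dataclasses import dataclass

import torch
from torch import nn

@dataclass
class FakeOutput:
    logits: torch.Tensor   # transformers models wrap their tensors like this

class HFStyleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, pixel_values):
        # returns a wrapper object, not the bare tensor
        return FakeOutput(logits=self.fc(pixel_values))

x = torch.randn(4, 8)
out = HFStyleModel()(pixel_values=x)
print(out.logits.shape)  # torch.Size([4, 2]); a plain nn.Module would return the tensor itself
```

So `model(pixel_values=images).logits` in the transformers world plays the role of `model(inputs)` in plain PyTorch.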

Define the DataLoaders:

del train_label, val_label
train_loader = DataLoader(train_dataset,batch_size=64,shuffle=True)
val_loader = DataLoader(val_dataset,batch_size=64,shuffle=False)

Train the model:

epochs = 5
criterion = nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.cls.parameters(), lr=1e-4)
running_loss_list = []
model.train()
for epoch in tqdm.tqdm(range(epochs)):
    print('Epoch {}/{}'.format(epoch+1, epochs))
    print('-' * 10)
    running_loss = 0.0
    for item in train_loader:
        optimizer.zero_grad()
        label = item["target"].unsqueeze(1)
        paths = item["path"]
        images = []
        for path in paths:
            image = processor(Image.open(path), return_tensors="pt").get('pixel_values').to(device)
            images.append(image)

        images = torch.cat(images, dim=0)
        outputs = model(pixel_values=images)
        logits = outputs.get("logits")
        binary_result = model.cls(logits)
        binary_result = binary_result.cpu()
        loss = criterion(binary_result, label.to(dtype=torch.float))
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print("-----    " , running_loss , "    -----")
    running_loss_list.append(running_loss)
    

Note the manual step in the code where the logits are fed through the cls layer to obtain the binary prediction; alternatively, you could define a custom forward.
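One way to fold the extra head into a single forward pass is a wrapper module. This is a sketch: DummyBackbone below is an assumption that mimics a transformers model (an object carrying .logits); in practice it would be the loaded ResNetForImageClassification.

```python
from types import SimpleNamespace

import torch
from torch import nn

class DummyBackbone(nn.Module):
    """Stand-in for the transformers model: returns an object carrying .logits."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 1000)

    def forward(self, pixel_values):
        return SimpleNamespace(logits=self.fc(pixel_values))

class BinaryWrapper(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        self.cls = nn.Sequential(nn.Linear(1000, 1), nn.Sigmoid())

    def forward(self, pixel_values):
        logits = self.backbone(pixel_values).logits
        return self.cls(logits)   # one call now yields the binary probability

model = BinaryWrapper(DummyBackbone())
probs = model(torch.randn(4, 16))
print(probs.shape)  # torch.Size([4, 1])
```

A related design note: if you switch the loss to nn.BCEWithLogitsLoss, drop the Sigmoid from the head and apply it only at inference; that pairing is more numerically stable than Sigmoid followed by BCELoss.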

Plot the loss curve:

x = list(range(len(running_loss_list)))

plt.figure()

plt.plot(x, running_loss_list, marker='o')

plt.title('loss - epoch')
plt.ylabel('loss')
plt.grid(True) 

plt.show()

Validate the model:

model.eval()
eval_accuracy = 0.0
for item in val_loader:
    with torch.no_grad():
        label = item["target"].unsqueeze(1)
        paths = item["path"]
        images = []
        for path in paths:
            image = processor(Image.open(path), return_tensors="pt").get('pixel_values').to(device)
            images.append(image)

        images = torch.cat(images, dim=0)
        outputs = model(pixel_values=images)
        logits = outputs.get("logits")
        binary_result = model.cls(logits)
        binary_result = binary_result.cpu()
        binary_result = (binary_result >= 0.5).to(dtype=torch.int)
        accuracy = torch.sum(binary_result == label).item()
        eval_accuracy += accuracy
print(eval_accuracy/len(val_dataset))

The final accuracy is 0.604.

3. Comparison

Comparing the two, timm models behave much like native PyTorch: they are called the same way. The transformers API targets a much broader range of use cases, especially NLP, so there are a few small differences to watch for, but it remains convenient overall.

Compared with ResNet, EfficientNet not only has fewer parameters but also performs better. Since this competition places an explicit limit on model size, EfficientNet is the better choice. In practice, a single epoch on the full dataset already brings EfficientNet's final score to 0.98.
