李宏毅DeepLearning_hw3_Convolutional-Neural-Network_ImageClassification

云澈丿

Category: Deep Learning. Tags: deep learning, artificial intelligence

Article link: https://blog.csdn.net/weixin_45333934/article/details/131781024

Homework 3 - Convolutional Neural Network

Author: Yunche

Video: https://www.bilibili.com/video/BV1Wv411h7kN?t=0.8&p=35

Objectives:

This is the example code of homework 3 of the machine learning course by Prof. Hung-yi Lee.
In this homework, you are required to build a convolutional neural network for image classification, possibly with some advanced training tips.
There are three levels here:

Easy: Build a simple convolutional neural network as the baseline. (2 pts)

Medium: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)

Hard: Utilize provided unlabeled data to obtain better results. (2 pts)

About the Dataset

The dataset used here is food-11, a collection of food images in 11 classes.

For the requirement in the homework, TAs slightly modified the data.
Please DO NOT access the original fully-labeled training data or testing labels.

Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden.

Dataset link (Kaggle)

Import Packages

First, we need to import packages that will be used later.

In this homework, we highly rely on torchvision, a library of PyTorch.

import torch
import torchvision
import os
import torch.nn as nn
import numpy as np
import pandas as pd
import torch.nn.functional as F
import torchvision.transforms as transform
from PIL import Image
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader, ConcatDataset, Subset
from torchvision.datasets import DatasetFolder, VisionDataset
from tqdm.auto import tqdm
import random


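myseed = 99999  # set a random seed for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
random.seed(myseed)  # random.shuffle is used later for the k-fold split
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)
# Make CUDA kernel launches synchronous so errors surface at the failing call (slower; useful for debugging).
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'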

Dataset, Data Loader, and Transforms

Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Here, since our data are stored in folders by class labels, we can directly apply torchvision.datasets.DatasetFolder for wrapping data without much effort.

Please refer to PyTorch official website for details about different transforms.

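Data augmentation + dropout. train_tfm was modified to add commonly used augmentation methods: RandomResizedCrop (random crop followed by a resize), RandomHorizontalFlip (random horizontal flip), RandomVerticalFlip (random vertical flip), RandomRotation (random rotation), RandomAffine (random affine transform), and RandomGrayscale (random grayscale conversion). In addition, a dropout layer is placed at the very front of the model's fully-connected layers. My advice is to keep dropout in the fully-connected part and never put it in the convolutional layers.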

# It is important to do data augmentation in training.
# However, not every augmentation is useful.
# Please think about what kind of augmentation is helpful for food recognition.
train_tfm = transform.Compose([
    # Randomly crop 70%-100% of the image area, then resize to a fixed shape (height = width = 128)
    # transform.CenterCrop()
    transform.RandomResizedCrop((128, 128), scale=(0.7, 1.0)),
    # transform.AutoAugment(transform.AutoAugmentPolicy.IMAGENET),
    # transform.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    transform.RandomHorizontalFlip(0.5),
    transform.RandomVerticalFlip(0.5),
    transform.RandomRotation(180),
    transform.RandomAffine(30),
    # transform.RandomInvert(p=0.2),
    # transform.RandomPosterize(bits=2),
    # transform.RandomSolarize(threshold=192.0, p=0.2),
    # transform.RandomEqualize(p=0.2),
    transform.RandomGrayscale(p=0.2),
    # You may add some transforms here.
    # ToTensor() should be the last one of the transforms.
    transform.ToTensor(),
])

# test_tfm is used below but is not defined in the original post.
# Assumed here: no augmentation at validation/test time, only resize and tensor conversion.
test_tfm = transform.Compose([
    transform.Resize((128, 128)),
    transform.ToTensor(),
])

class FoodDataSet(Dataset):
    def __init__(self, path, tfm=test_tfm, files=None):
        super(FoodDataSet, self).__init__()
        self.path = path
        # Use the explicit file list when given (e.g. for k-fold splits); otherwise scan the folder.
        if files is not None:
            self.files = files
        else:
            self.files = sorted([os.path.join(path, x) for x in os.listdir(path) if x.endswith(".jpg")])
        print(f"One {path} sample", self.files[0])
        self.transform = tfm

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        fname = self.files[idx]
        # Force 3 channels so grayscale or RGBA files still match the model input.
        im = Image.open(fname).convert("RGB")
        im = self.transform(im)
        # The label is the part of the file name before the first "_".
        try:
            label = int(os.path.basename(fname).split("_")[0])
        except ValueError:
            label = -1  # test has no label
        return im, label

Model

The basic model here is simply a stack of convolutional layers followed by some fully-connected layers.

Since there are three channels for a color image (RGB), the input channels of the network must be three.
In each convolutional layer, typically the channels of inputs grow, while the height and width shrink (or remain unchanged, according to some hyperparameters like stride and padding).

Before being fed into the fully-connected layers, the feature map must be flattened into a single one-dimensional vector (for each image).
These features are then transformed by the fully-connected layers, and finally, we obtain the “logits” for each class.

WARNING – You Must Know

You are free to modify the model architecture here for further improvement.
However, if you want to use some well-known architectures such as ResNet50, please make sure NOT to load the pre-trained weights.
Using such pre-trained models is considered cheating and therefore you will be punished.
Similarly, it is your responsibility to make sure no pre-trained weights are used if you use torch.hub to load any modules.

For example, if you use ResNet-18 as your model:

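model = torchvision.models.resnet18(pretrained=False) → This is fine.

model = torchvision.models.resnet18(pretrained=True) → This is NOT allowed.

(In torchvision 0.13 and later the pretrained argument is deprecated; the equivalent allowed call is torchvision.models.resnet18(weights=None).)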

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        # torch.nn.MaxPool2d(kernel_size, stride, padding)
        # input shape [3, 128, 128]
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),  # [64, 128, 128]
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [64, 64, 64] 
           

            nn.Conv2d(64, 128, 3, 1, 1), # [128, 64, 64]
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [128, 32, 32]
          
            nn.Conv2d(128, 256, 3, 1, 1), # [256, 32, 32]
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [256, 16, 16]

            nn.Conv2d(256, 512, 3, 1, 1), # [512, 16, 16]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 8, 8]
            
            nn.Conv2d(512, 512, 3, 1, 1), # [512, 8, 8]
            nn.BatchNorm2d(512),
            nn.ReLU(),  
            nn.MaxPool2d(2, 2, 0),       # [512, 4, 4]
        )
        self.fc = nn.Sequential(
            nn.Dropout(0.4),
            nn.Linear(512*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 11)
        )

    def forward(self, x):
        out = self.cnn(x)
        out = out.view(out.size()[0], -1)
        return self.fc(out)
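
The shape comments in the model above follow from the standard output-size formula for convolution and pooling layers, out = floor((n - f + 2p) / s) + 1. A minimal sketch in plain Python that traces the spatial size through the five conv + max-pool stages and checks the input size of the first linear layer; the helper name `out_size` is hypothetical and not part of the original code:

```python
def out_size(n, f, s, p):
    """Spatial output size of a conv/pool layer: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1


size = 128  # input images are 128 x 128
sizes = []
for _ in range(5):
    size = out_size(size, f=3, s=1, p=1)  # 3x3 conv, stride 1, padding 1: size unchanged
    size = out_size(size, f=2, s=2, p=0)  # 2x2 max-pool, stride 2: size halved
    sizes.append(size)

# channels * height * width after the last stage, i.e. the input size of nn.Linear
flat_features = 512 * size * size
print(sizes, flat_features)
```

If the input resolution or the number of pooling stages changes, the first argument of the first nn.Linear has to change accordingly.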

Training

You can finish supervised learning by simply running the provided code without any modification.

The function “get_pseudo_labels” is used for semi-supervised learning.
It is expected to get better performance if you use unlabeled data for semi-supervised learning.
However, you have to implement the function on your own and need to adjust several hyperparameters manually.

For more details about semi-supervised learning, please refer to Prof. Lee’s slides.
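
The sample code leaves get_pseudo_labels unimplemented. One common approach, sketched below under assumptions that are not from the original post, is to keep only the unlabeled samples whose highest softmax probability exceeds a confidence threshold and to use the predicted class as the label. The function name `select_pseudo_labels`, the threshold of 0.65, and the plain-list inputs are illustrative; in practice the probabilities would come from applying softmax to the model's logits.

```python
def select_pseudo_labels(probs, threshold=0.65):
    """Return (sample_index, predicted_class) pairs for confident predictions.

    probs: one list of class probabilities per unlabeled sample.
    threshold: assumed confidence cut-off; it needs tuning in practice.
    """
    selected = []
    for idx, p in enumerate(probs):
        confidence = max(p)
        if confidence >= threshold:
            selected.append((idx, p.index(confidence)))
    return selected


# Toy example with 3 classes: the second sample is too uncertain and is skipped.
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.10, 0.80],
]
pseudo = select_pseudo_labels(toy_probs)
print(pseudo)
```

The selected samples can then be wrapped in a dataset and combined with the labeled training set through ConcatDataset, which is already imported above.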

Again, please notice that utilizing external data (or pre-trained model) for training is prohibited.

batch_size = 64
_dataset_dir = "../datasets/food11"
# Experiment name used for the log and checkpoint file names below.
# It is not defined in the original post; "sample" is an assumed value.
_exp_name = "sample"
# Construct datasets.
train_set = FoodDataSet(os.path.join(_dataset_dir,"training"), tfm=train_tfm)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)
valid_set = FoodDataSet(os.path.join(_dataset_dir,"validation"), tfm=test_tfm)
# Validation data does not need to be shuffled.
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=False, num_workers=0, pin_memory=True)

# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# The number of training epochs and patience.
n_epochs = 100
patience = 8 # If no improvement in 'patience' epochs, early stop

# Initialize a model, and put it on the device specified.
model = Classifier().to(device)

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5) 

# Initialize trackers, these are not parameters and should not be changed
stale = 0
best_acc = 0

for epoch in range(n_epochs):

    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    model.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        #imgs = imgs.half()
        #print(imgs.shape,labels.shape)

        # Forward the data. (Make sure data and model are on the same device.)
        logits = model(imgs.to(device))

        # Calculate the cross-entropy loss.
        # We don't need to apply softmax before computing cross-entropy as it is done automatically.
        loss = criterion(logits, labels.to(device))

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)
        
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    # Make sure the model is in eval mode so that some modules like dropout are disabled and work normally.
    model.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        #imgs = imgs.half()

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(imgs.to(device))

        # We can still compute the loss (but not the gradient).
        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")


    # update logs: append the validation result to the log file
    # (the original opened the file but printed to the console, so nothing was written)
    log_line = f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}"
    if valid_acc > best_acc:
        log_line += " -> best"
    with open(f"./{_exp_name}_log.txt", "a") as log_file:
        log_file.write(log_line + "\n")


    # save models
    if valid_acc > best_acc:
        print(f"Best model found at epoch {epoch + 1}, saving model")
        torch.save(model.state_dict(), f"{_exp_name}_best.ckpt")
        # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            print(f"No improvement for more than {patience} consecutive epochs, early stopping")
            break

Testing

For inference, we need to make sure the model is in eval mode, and the order of the dataset should not be shuffled (“shuffle=False” in test_loader).

Last but not least, don’t forget to save the predictions into a single CSV file.
The format of CSV file should follow the rules mentioned in the slides.

WARNING – Keep in Mind

Cheating includes but is not limited to:

using testing labels,
submitting results to previous Kaggle competitions,
sharing predictions with others,
copying codes from any creatures on Earth,
asking other people to do it for you.

Any violations bring you punishments from getting a discount on the final grade to failing the course.

It is your responsibility to check whether your code violates the rules.
When citing codes from the Internet, you should know what these codes exactly do.
You will NOT be tolerated if you break the rule and claim you don’t know what these codes do.

test_set = FoodDataSet(os.path.join(_dataset_dir,"test"), tfm=test_tfm)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=0, pin_memory=True)

Testing and generating the prediction CSV

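model_best = Classifier().to(device)
model_best.load_state_dict(torch.load(f"{_exp_name}_best.ckpt", map_location=device))
model_best.eval()
prediction = []
with torch.no_grad():
    for data, _ in test_loader:
        test_pred = model_best(data.to(device))
        # argmax over the class dimension already gives a 1-D array of labels
        test_label = np.argmax(test_pred.cpu().numpy(), axis=1)
        # no squeeze(): it would turn a final batch of size 1 into a scalar and break the list concatenation
        prediction += test_label.tolist()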

#create test csv

def pad4(i):
    return "0" * (4 - len(str(i))) + str(i)


df = pd.DataFrame()
df["Id"] = [pad4(i) for i in range(1, len(test_set) + 1)]
df["Category"] = prediction
df.to_csv("submission.csv", index=False)

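Advanced: Using a Residual Network

[Figure: residual block design]

The residual network is designed as shown in the figure. The basic block contains two convolutional layers, and the output of the convolutions, F(x), is added to the block's input x. Note that the two may have different dimensions. To handle this, a 1x1 convolution can be applied to x so that it has the same dimensions as F(x). Kaiming He's paper also proposes zero-padding as another way to handle mismatched dimensions.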
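
Written as equations, the two cases of the shortcut look roughly as follows. This is a sketch in the notation of the ResNet paper, where W_s stands for the 1x1 projection; in the code further down, that projection is followed by batch normalization, which the equations omit.

```latex
y = \mathrm{ReLU}\bigl(F(x) + x\bigr)
\quad \text{when } x \text{ and } F(x) \text{ have the same shape,}
\qquad
y = \mathrm{ReLU}\bigl(F(x) + W_s\, x\bigr)
\quad \text{when they differ } (W_s:\ 1 \times 1 \text{ convolution}).
```

For example, a block with stride 2 that maps [32, 64, 64] to [64, 32, 32] needs a 1x1 convolution with 64 output channels and stride 2 on the shortcut.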

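Compared with cross-entropy, focal loss accounts for class imbalance and increases the weight of the loss from misclassified samples. It has two parameters, alpha and gamma. To set alpha, the number of samples in each class was counted and the per-class values were chosen according to those counts. Gamma is fixed at 2.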
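
To make the effect of gamma concrete, the per-sample focal loss can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. A small sketch in plain Python, with illustrative probabilities that are not taken from the experiments, compares it with cross-entropy for an easy and a hard sample:

```python
import math


def cross_entropy(p_t):
    """Cross-entropy for one sample, given the probability of the true class."""
    return -math.log(p_t)


def focal_loss(p_t, alpha_t=1.0, gamma=2.0):
    """Focal loss for one sample: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy, hard = 0.9, 0.1  # well-classified vs. misclassified sample (illustrative values)
for p_t in (easy, hard):
    print(f"p_t={p_t}: CE={cross_entropy(p_t):.4f}, FL={focal_loss(p_t):.4f}")
```

With gamma = 2, the easy sample's loss is scaled by (1 - 0.9)^2 = 0.01, while the hard sample's loss is scaled by (1 - 0.1)^2 = 0.81, so training focuses on the samples the model still gets wrong. alpha_t then rescales the loss per class.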

For cross-validation + ensemble, a 4-fold scheme is used, which yields 4 models. At inference time each image produces 4 outputs; I sum these 4 outputs and then take the argmax to get the final class.
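
The fold split itself is plain list slicing. The sketch below mirrors the slicing used in the training code further down, on a hypothetical list of 10 file names; the helper name `k_fold_splits` is illustrative.

```python
def k_fold_splits(items, k):
    """Yield (train, val) lists for each of k folds using contiguous slices."""
    fold_size = len(items) // k
    for i in range(k):
        val = items[i * fold_size:(i + 1) * fold_size]
        train = items[:i * fold_size] + items[(i + 1) * fold_size:]
        yield train, val


files = [f"img_{n}.jpg" for n in range(10)]  # hypothetical file names
splits = list(k_fold_splits(files, k=4))
for fold, (train, val) in enumerate(splits, start=1):
    print(fold, len(train), len(val))
```

When the number of files is not divisible by k, the leftover files at the end of the list are always in the training part and never used for validation.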

After running the code, the surprising result was how strong the ensemble is. The best of the four models reaches an accuracy of 0.79 and the worst 0.77, yet combined they reach 0.85.
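
The sum-then-argmax rule can be illustrated with a toy example in plain Python. The scores below are made up for illustration and are not outputs of the trained models; the helper name `ensemble_predict` is hypothetical.

```python
def ensemble_predict(per_model_outputs):
    """Sum the class scores of all models and return the argmax class for one image."""
    num_classes = len(per_model_outputs[0])
    summed = [sum(scores[c] for scores in per_model_outputs) for c in range(num_classes)]
    return summed.index(max(summed))


# Hypothetical scores of 4 models for one image with 3 classes.
outputs = [
    [2.0, 1.9, 0.1],  # model 1 narrowly prefers class 0
    [0.5, 2.5, 0.2],  # model 2 clearly prefers class 1
    [1.0, 1.2, 0.3],  # model 3 narrowly prefers class 1
    [2.1, 2.0, 0.0],  # model 4 narrowly prefers class 0
]
individual = [scores.index(max(scores)) for scores in outputs]
print(individual, ensemble_predict(outputs))
```

Here the models split two against two, but the summed scores favor class 1 because the models that prefer class 0 do so only by a small margin. Summing raw outputs, as done in this post, weights confident models more heavily; averaging softmax probabilities is a possible alternative.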

Model

# The baseline Classifier shown earlier is replaced by the residual network below.
# Output size of a conv or pooling layer: out_size = floor((n - f + 2*p) / s) + 1,
# where n is the input size, f the kernel size, p the padding, and s the stride.



class Residual_Block(nn.Module):
    def __init__(self, in_channel, out_channel, stride=1):
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        # torch.nn.MaxPool2d(kernel_size, stride, padding)
        super(Residual_Block, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(out_channel, out_channel, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_channel),
        )
        self.relu = nn.ReLU(inplace=True)

        self.downsample = None

        if stride != 1 or (out_channel != in_channel):
            # The basic block contains two conv layers, and their output F(x) is added
            # to the block input x. The two may differ in shape; when they do, a 1x1
            # convolution transforms x so that it has the same shape as F(x).
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channel),
            )

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.conv2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        return self.relu(out)


class Classifier(nn.Module):
    def __init__(self, block, num_layers, num_classes=11):
        super(Classifier, self).__init__()
        self.preconv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )

        self.layer0 = self.make_residual(block, 32, 64, num_layers[0], stride=2)
        self.layer1 = self.make_residual(block, 64, 128, num_layers[1], stride=2)
        self.layer2 = self.make_residual(block, 128, 256, num_layers[2], stride=2)
        self.layer3 = self.make_residual(block, 256, 512, num_layers[3], stride=2)

        # self.avgpool = nn.AvgPool2d(2)

        self.fc = nn.Sequential(
            nn.Dropout(0.4),
            nn.Linear(512 * 4 * 4, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            nn.Linear(512, num_classes)
        )

    def make_residual(self, block, in_channel, out_channel, num_layer, stride=1):
        layers = []
        layers.append(block(in_channel, out_channel, stride))
        for i in range(1, num_layer):
            layers.append(block(out_channel, out_channel))
        return nn.Sequential(*layers)

    def forward(self, x):
        # [3, 128, 128]
        out = self.preconv(x)  # [32, 64, 64]
        out = self.layer0(out)  # [64, 32, 32]
        out = self.layer1(out)  # [128, 16, 16]
        out = self.layer2(out)  # [256, 8, 8]
        out = self.layer3(out)  # [512, 4, 4]
        # out = self.avgpool(out) # [512, 2, 2]
        out = self.fc(out.view(out.size(0), -1))
        return out


class FocalLoss(nn.Module):
    def __init__(self, class_num, alpha=None, gamma=2, size_average=True):
        super(FocalLoss, self).__init__()
        # alpha holds one weight per class; the default is uniform weighting.
        if alpha is None:
            alpha = torch.ones(class_num)
        # Keep alpha as a flat [class_num] tensor so indexing by label gives shape [N].
        self.alpha = torch.as_tensor(alpha, dtype=torch.float).view(-1)
        self.gamma = gamma
        self.class_num = class_num
        self.size_average = size_average

    def forward(self, inputs, targets):
        targets = targets.view(-1)
        # log p_t: log-probability of the true class for each sample, shape [N].
        log_p = F.log_softmax(inputs, dim=1).gather(1, targets.view(-1, 1)).squeeze(1)
        probs = log_p.exp()

        # Per-sample class weight, shape [N]. The original multiplied an [N] alpha by
        # [N, 1] probabilities, which broadcast to [N, N] and decoupled alpha from its sample.
        alpha = self.alpha.to(inputs.device)[targets]

        batch_loss = -alpha * torch.pow(1 - probs, self.gamma) * log_p

        if self.size_average:
            loss = batch_loss.mean()
        else:
            loss = batch_loss.sum()

        return loss

Training

batch_size = 64
num_layers = [2, 4, 3, 1]  # residual number layers
alpha = torch.Tensor([1, 2.3, 0.66, 1, 1.1, 0.75, 2.3, 3.5, 1.1, 0.66, 1.4])

# The number of training epochs and patience.
n_epochs = 300
patience = 32  # If no improvement in 'patience' epochs, early stop

k_fold = 4


train_dir = "../datasets/food11/training"
val_dir = "../datasets/food11/validation"
train_files = [os.path.join(train_dir, x) for x in os.listdir(train_dir) if x.endswith('.jpg')]
val_files = [os.path.join(val_dir, x) for x in os.listdir(val_dir) if x.endswith('.jpg')]
total_files = train_files + val_files
random.shuffle(total_files)

num = len(total_files) // k_fold  # number of samples per fold

test_fold = k_fold

for i in range(test_fold):
    fold = i + 1
    print(f'\n\nStarting Fold: {fold} ********************************************')
    model = Classifier(Residual_Block, num_layers).to(device)
    criterion = FocalLoss(11, alpha=alpha)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0004, weight_decay=1e-5)
    # Learning-rate schedule: cosine annealing combined with warm restarts
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=16, T_mult=1)
    stale = 0
    best_acc = 0

    val_data = total_files[i * num: (i + 1) * num]
    train_data = total_files[:i * num] + total_files[(i + 1) * num:]

    # The explicit file lists override the directory listing.
    train_set = FoodDataSet(train_dir, tfm=train_tfm, files=train_data)
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)

    valid_set = FoodDataSet(val_dir, tfm=test_tfm, files=val_data)
    # Validation data does not need to be shuffled.
    valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=False, num_workers=0, pin_memory=True)

    for epoch in range(n_epochs):

        # ---------- Training ----------
        # Make sure the model is in train mode before training.
        model.train()

        # These are used to record information in training.
        train_loss = []
        train_accs = []
        lr = optimizer.param_groups[0]["lr"]

        pbar = tqdm(train_loader)
        pbar.set_description(f'T: {epoch + 1:03d}/{n_epochs:03d}')
        for batch in pbar:
            # A batch consists of image data and corresponding labels.
            imgs, labels = batch
            # imgs = imgs.half()
            # print(imgs.shape,labels.shape)

            # Forward the data. (Make sure data and model are on the same device.)
            logits = model(imgs.to(device))

            # Calculate the loss (focal loss in this version).
            # The criterion applies softmax internally, so raw logits are passed in.
            loss = criterion(logits, labels.to(device))

            # Gradients stored in the parameters in the previous step should be cleared out first.
            optimizer.zero_grad()

            # Compute the gradients for parameters.
            loss.backward()

            # Clip the gradient norm to at most max_norm for stable training (guards against exploding gradients).
            grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

            # Update the parameters with computed gradients.
            optimizer.step()

            # Compute the accuracy for current batch.
            acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

            # Record the loss and accuracy.
            train_loss.append(loss.item())
            train_accs.append(acc)
            pbar.set_postfix({'lr': lr, 'b_loss': loss.item(), 'b_acc': acc.item(),
                              'loss': sum(train_loss) / len(train_loss),
                              'acc': sum(train_accs).item() / len(train_accs)})

        scheduler.step()

Validation


        # Make sure the model is in eval mode so that some modules like dropout are disabled and work normally.
        model.eval()

        # These are used to record information in validation.
        valid_loss = []
        valid_accs = []

        # Iterate the validation set by batches.
        pbar = tqdm(valid_loader)
        pbar.set_description(f'V: {epoch + 1:03d}/{n_epochs:03d}')
        for batch in pbar:
            # A batch consists of image data and corresponding labels.
            imgs, labels = batch
            # imgs = imgs.half()

            # We don't need gradient in validation.
            # Using torch.no_grad() accelerates the forward process.
            with torch.no_grad():
                logits = model(imgs.to(device))

            # We can still compute the loss (but not the gradient).
            loss = criterion(logits, labels.to(device))

            # Compute the accuracy for current batch.
            acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

            # Record the loss and accuracy.
            valid_loss.append(loss.item())
            valid_accs.append(acc)
            pbar.set_postfix({'v_loss': sum(valid_loss) / len(valid_loss),
                              'v_acc': sum(valid_accs).item() / len(valid_accs)})

            # break

        # The average loss and accuracy for entire validation set is the average of the recorded values.
        valid_loss = sum(valid_loss) / len(valid_loss)
        valid_acc = sum(valid_accs) / len(valid_accs)

        if valid_acc > best_acc:
            print(f"Best model found at fold {fold} epoch {epoch + 1}, acc={valid_acc:.5f}, saving model")
            torch.save(model.state_dict(), f"Fold_{fold}_best.ckpt")
            # only save best to prevent output memory exceed error
            best_acc = valid_acc
            stale = 0
        else:
            stale += 1
            if stale > patience:
                print(f"No improvement for more than {patience} consecutive epochs, early stopping")
                break
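
The CosineAnnealingWarmRestarts scheduler used in the training loop lowers the learning rate along a cosine curve and resets it every T_0 epochs. As far as I understand the PyTorch documentation, with T_mult = 1 and the default eta_min = 0 the per-epoch value follows the formula below. The sketch reproduces it in plain Python for the settings used here (lr = 0.0004, T_0 = 16); the function name is hypothetical.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=0.0004, t_0=16, eta_min=0.0):
    """Learning rate at a given epoch for cosine annealing that restarts every t_0 epochs."""
    t_cur = epoch % t_0  # epochs since the last restart
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0))


schedule = [cosine_warm_restart_lr(e) for e in range(33)]
for e in (0, 8, 15, 16):
    print(e, f"{schedule[e]:.6f}")
```

The rate starts at the base value, falls to half of it at the middle of a cycle, gets close to zero at the end of the cycle, and jumps back to the base value at epoch 16.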

Getting the Results


models = []
for i in range(test_fold):
    fold = i + 1
    model_best = Classifier(Residual_Block, num_layers).to(device)
    model_best.load_state_dict(torch.load(f"Fold_{fold}_best.ckpt"))
    model_best.eval()
    models.append(model_best)

prediction = []
with torch.no_grad():
    for data, _ in test_loader:
        test_preds = []
        for model_best in models:
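            test_preds.append(model_best(data.to(device)).cpu().numpy())
        # Sum the outputs of all fold models, then pick the class with the largest total.
        test_preds = sum(test_preds)
        test_label = np.argmax(test_preds, axis=1)
        # no squeeze(): it would turn a final batch of size 1 into a scalar
        prediction += test_label.tolist()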

# create test csv
# Same as in the sample code above.
