Hung-yi Lee Machine Learning, Homework 3 (Detailed Walkthrough): HW3 Image Classification

loco_monkey

Last modified: 2022-07-04 21:36:07

Category: Hung-yi Lee Machine Learning homework; Tags: machine learning, deep learning, python

First published: 2022-07-04 18:28:21

Article link: https://blog.csdn.net/loco_monkey/article/details/125565805

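This article describes how to improve model performance on an image classification task through data augmentation, covering training-time augmentation techniques, the use of a Residual Network, and a cross-validation strategy. Starting from a simple model, it adds data augmentation, model design changes, and more training epochs step by step, raising accuracy along the way. With data augmentation and ResNet in place, the cross-validation strategy helps prevent overfitting and improves performance on the validation set.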

Baseline

Simple: 0.50099 (just run the starter code)
Medium: 0.73207, Training Augmentation + Train Longer
(apply data augmentation and train for more epochs)
Strong: 0.81872, Training Augmentation + Model Design + Train Longer (Cross Validation + Ensemble)
Boss: 0.88446, Training Augmentation + Model Design + Test Time Augmentation + Train Longer (Cross Validation + Ensemble)
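
The Strong and Boss hints mention Cross Validation + Ensemble. As a rough illustration of the idea (not the author's code), the sketch below splits a file list into k folds and averages the per-class scores of several models before taking the argmax. The names k_fold_splits and ensemble_predict are hypothetical, and plain Python lists stand in for tensors so the snippet runs without PyTorch.

```python
import random


def k_fold_splits(files, k=5, seed=0):
    """Yield (train_files, valid_files) pairs, one pair per fold."""
    files = list(files)
    random.Random(seed).shuffle(files)          # reproducible shuffle
    folds = [files[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        valid = folds[i]
        train = [f for j, fold in enumerate(folds) if j != i for f in fold]
        yield train, valid


def ensemble_predict(scores_per_model):
    """Average per-class scores over models; return the argmax class per sample.

    scores_per_model[m][s][c] is the score model m gives class c for sample s.
    """
    n_models = len(scores_per_model)
    preds = []
    for s in range(len(scores_per_model[0])):
        n_classes = len(scores_per_model[0][s])
        avg = [sum(m[s][c] for m in scores_per_model) / n_models
               for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds
```

Each fold's train and valid lists could then be handed to a Dataset that accepts an explicit file list, one model trained per fold, and the fold models combined with ensemble_predict at test time.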

Simple

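The starter code hit a bug in the Dataset module the first time I ran it.
Training on the GPU raised an error, but the traceback did not show where things actually went wrong; after switching to CPU training, the error could be pinpointed.

Fixing a bug

Error when using the GPU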

RuntimeError                              Traceback (most recent call last)
<ipython-input-10-b55e170576c6> in <module>()
     49 
     50         # Compute the gradients for parameters.
---> 51         loss.backward()
     52 
     53         # Clip the gradient norms for stable training.

D:\Python_resource\ANACONDA\anaconda\envs\pytorch\lib\site-packages\torch\_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
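
The last line of the message suggests CUDA_LAUNCH_BLOCKING=1 as an alternative to switching to the CPU. A minimal sketch of setting it from a notebook follows; it assumes the cell runs in a freshly restarted session before any CUDA work has happened (safest is before `import torch`), and the helper name enable_blocking_cuda_launches is hypothetical.

```python
import os


def enable_blocking_cuda_launches():
    """Ask CUDA to run kernels synchronously so errors surface at the failing call.

    Assumption: this is called before the first CUDA call in the process,
    ideally before importing torch; setting it later may have no effect.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_blocking_cuda_launches()
```

Synchronous launches slow training down, so this is only worth keeping on while debugging.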

Error when using the CPU
To switch to CPU training, the project has to be restarted first

IndexError                                Traceback (most recent call last)
<ipython-input-9-8535fa60da04> in <module>()
     43         # Calculate the cross-entropy loss.
     44         # We don't need to apply softmax before computing cross-entropy as it is done automatically.
---> 45         loss = criterion(logits, labels.to(device))
     46 
     47         # Gradients stored in the parameters in the previous step should be cleared out first.

IndexError: Target -1 is out of bounds.

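The error shows an out-of-bounds target of -1 while computing the loss on the training set. Inspecting the samples returned by the training Dataset (train_set) showed that every training sample was read in with the label -1, yet in the dataset itself no training image has a label of -1.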

# Every training sample is read in with the label -1
print(train_set[1])
'''
(tensor([[[0.0118, 0.0196, 0.0235,  ..., 0.0078, 0.0000, 0.0039],
         [0.1843, 0.3843, 0.4392,  ..., 0.1647, 0.0118, 0.0039],
         [0.5804, 0.6314, 0.6353,  ..., 0.2863, 0.0157, 0.0000],
         ...,
         [0.3804, 0.3686, 0.3647,  ..., 0.7294, 0.2314, 0.0078],
         [0.4078, 0.4039, 0.3922,  ..., 0.6549, 0.1412, 0.0039],
         [0.2039, 0.3294, 0.3765,  ..., 0.2980, 0.0314, 0.0039]],

        [[0.0118, 0.0235, 0.0314,  ..., 0.0118, 0.0039, 0.0000],
         [0.2118, 0.4157, 0.4784,  ..., 0.1608, 0.0118, 0.0000],
         [0.6275, 0.6824, 0.6902,  ..., 0.2745, 0.0157, 0.0000],
         ...,
         [0.0157, 0.0157, 0.0157,  ..., 0.7216, 0.2235, 0.0039],
         [0.0196, 0.0118, 0.0157,  ..., 0.6471, 0.1333, 0.0039],
         [0.0196, 0.0314, 0.0196,  ..., 0.2863, 0.0235, 0.0039]],

        [[0.0196, 0.0275, 0.0353,  ..., 0.0275, 0.0039, 0.0157],
         [0.1804, 0.3765, 0.4275,  ..., 0.1451, 0.0078, 0.0157],
         [0.5608, 0.6118, 0.6118,  ..., 0.2902, 0.0314, 0.0118],
         ...,
         [0.1176, 0.1137, 0.1137,  ..., 0.5882, 0.1882, 0.0118],
         [0.1216, 0.1216, 0.1216,  ..., 0.5451, 0.1176, 0.0078],
         [0.0549, 0.1020, 0.1176,  ..., 0.2431, 0.0353, 0.0118]]]), -1)
'''
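
Printing one sample at a time works, but a quick count over all labels makes the problem visible at a glance. The helper below is a small sketch for that check; summarize_labels is a hypothetical name, and the class count passed in (11 in the usage comment) is an assumption about this dataset rather than something stated above.

```python
from collections import Counter


def summarize_labels(labels, num_classes):
    """Return (counts, invalid): label frequencies and those outside [0, num_classes)."""
    counts = Counter(int(y) for y in labels)
    invalid = {y: n for y, n in counts.items() if not 0 <= y < num_classes}
    return counts, invalid


# Usage with the Dataset from this post (loads every image, so it can be slow):
# labels = [train_set[i][1] for i in range(len(train_set))]
# counts, invalid = summarize_labels(labels, num_classes=11)  # 11 is assumed
# print(invalid)
```

A non-empty `invalid` dict means the cross-entropy loss will fail on those targets, which is the IndexError seen on the CPU.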

Datasets
Next, I looked at the Dataset code that reads the data and located the part that produces the label

class FoodDataset(Dataset):

    def __init__(self, path, tfm=test_tfm, files=None):
        super().__init__()
        self.path = path
        # Collect every .jpg under `path`, sorted so the order is stable.
        self.files = sorted([os.path.join(path, x) for x in os.listdir(path) if x.endswith(".jpg")])
        # An explicit file list overrides the directory scan.
        if files is not None:
            self.files = files
        print(f"One {path} sample", self.files[0])
        self.transform = tfm
  
    def __len__(self):
        return len(self.files)
  
    def __getitem__(self,idx):
        fname = self.files