噼里啪啦 · Image Classification Notes

ResNet34 Training

Training from Scratch

  • The puzzling part: during ResNet training, why do the mean and std of the last conv layer in layer4 jump abruptly? A few batches keep causing these spikes. Why? (A hook-based diagnostic sketch follows this list.)
    • A data issue?
    • An adaptation issue?
    • A learning-rate issue?
  • Ablation: check whether the fastai library shows the same behavior; if it does too, the conclusion changes.
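
To chase this down, the kind of per-batch diagnostic I mean looks roughly like this (a sketch, assuming torchvision's resnet34; the random tensor stands in for a real batch):

import torch
from torchvision.models import resnet34

# Hook the last conv of layer4 and record per-batch activation statistics,
# so the batches where mean/std jump can be lined up against the data.
model = resnet34(num_classes=5)
stats = {"mean": [], "std": []}

def record_stats(module, inp, outp):
    stats["mean"].append(outp.detach().mean().item())
    stats["std"].append(outp.detach().std().item())

handle = model.layer4[-1].conv2.register_forward_hook(record_stats)

x = torch.randn(8, 3, 224, 224)  # stand-in batch
model(x)
print(stats["mean"], stats["std"])
handle.remove()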

 

Transfer Training
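
A minimal sketch of the transfer setup meant here, assuming torchvision's pretrained resnet34 with a frozen backbone and a new 5-class head (the pretrained run reaches about 94% within 3 epochs, as noted in the AlexNet section below); the optimizer settings are illustrative:

import torch
import torch.nn as nn
from torchvision.models import resnet34

# transfer training: pretrained backbone, new head for the 5 flower classes
model = resnet34(pretrained=True)
for p in model.parameters():
    p.requires_grad = False                    # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 5)  # new head (trainable by default)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # only the head is optimized
criterion = nn.CrossEntropyLoss()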

 

 

 

VGG Training

The fully connected layers have too many parameters, so I shrank them; with bs=16 it trains within memory.
(1) CBR structure: each block is Conv plus a BN layer and a ReLU layer (sketched below).
(2) acc is a bit over 80%; the loss curve is also bumpy over 20 epochs; a few batches always spike.
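
For reference, a minimal sketch of the CBR block meant here, with Conv, then BN, then ReLU in that order (the channel sizes are placeholders):

import torch.nn as nn

def cbr(in_ch, out_ch):
    # Conv -> BN -> ReLU; with BN, a bias in the conv is redundant
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

block = cbr(3, 64)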

VGG with Pretrained Weights

(1) The original VGG model without BN layers, loaded with pretrained weights; only the head part is trained (see the sketch below).
(2) The training curves still seem to jitter.
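
A rough sketch of that setup, assuming torchvision's vgg16 (no BN) with the convolutional backbone frozen and only the classifier head trained; the 5-class output and learning rate are illustrative:

import torch
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16(pretrained=True)             # original VGG, no BN layers
for p in model.features.parameters():
    p.requires_grad = False                # freeze the conv backbone
model.classifier[6] = nn.Linear(4096, 5)   # replace the final Linear with a 5-class head

# only the head (classifier) parameters go to the optimizer
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)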


2023-08-23 19:37:01

AlexNet, Flower Classification, 224×224 Input

daisy: 633 images
dandelion: 898 images
roses: 641 images
sunflowers: 699 images
tulips: 799 images
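
A sketch of loading this 5-class flower set at 224×224 with ImageFolder (the directory path and augmentation choices are placeholders, not the original training script):

import torch
from torchvision import datasets, transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
train_set = datasets.ImageFolder("flower_data/train", transform=train_tfms)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
print(train_set.classes)  # ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']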

  1. Model: the fully connected layers have too many parameters to train easily, so a global max pool sits before the first FC layer to cut the parameter count.
  2. Trained 50 epochs, no pretraining: acc = 70%; fixed learning rate lr = 0.0002, with BN and leaky_relu.
    1. For comparison, ResNet with pretrained weights: 94% in 3 epochs.
  3. Trained 50 epochs; the stats of the last conv layer of features (the layer hooked in the code below) show the network thrashing!

 

import torch.nn as nn
import torch
import torch.nn.functional as F
class GenerativeRelu(nn.Module):
    def __init__(self, leak=None, sub=0.4, maxv=6.) -> None:
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv
    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub is not None: x.sub_(self.sub)
        if self.maxv is not None: x.clamp_max_(self.maxv)
        return x
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False, leaky_relu=False, negative_slope=0.1, bn=False, hook=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]
            nn.BatchNorm2d(48, eps=1e-5, momentum=0.1) if bn else nn.Identity(),  # nn.Sequential cannot hold None, so use Identity when bn is off
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]

            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.BatchNorm2d(128, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]

            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.BatchNorm2d(192, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),

            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.BatchNorm2d(192, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),

            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.BatchNorm2d(128, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )
        self.maxpool = nn.AdaptiveMaxPool2d(1)
        self.classifier = nn.Sequential(  # with BN, dropout is unnecessary
            # nn.Dropout(p=0.5),
            # nn.Linear(128 * 6 * 6, 2048),  # too many parameters here; like VGG, the global max pool above reduces the feature map to 1x1 first
            nn.Linear(128, 2048),
            nn.BatchNorm1d(2048, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            # nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.BatchNorm1d(2048, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights(leaky_relu)
        if hook:
            # candidate layers to monitor (collected but not used yet)
            to_hook_layer = [l for l in self.features if isinstance(l, nn.Conv2d)]
            to_hook_layer.append(self.classifier[0])
            to_hook_layer.append(self.classifier[2])
            # for now only the last conv layer of features (index 14) is hooked for stats
            self.hooks = Hooks([self.features[14]], append_stats)

    def forward(self, x):
        x = self.features(x)
        x = self.maxpool(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)

        return x

    def _initialize_weights(self, leaky_relu):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='leaky_relu')  # switched from relu to leaky_relu because most activations died under relu
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                # nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')  # once nonlinearity is given, nn.init derives the gain (i.e. a) automatically
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='leaky_relu')  # switched from relu to leaky_relu because most activations died under relu
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                # nn.init.normal_(m.weight, 0, 0.01)  # the video author's original init
                nn.init.constant_(m.bias, 0)
# append_stats for this model: the histogram range is widened to (-7, 7) so the negative
# values kept by leaky_relu stay visible; the Hooks class itself is defined in the
# diagnostics section at the end of this post.
def append_stats(hook, mod, inp, outp):
    if not hasattr(hook, 'stats'): hook.stats = ([], [], [])
    means, stds, hists = hook.stats
    means.append(outp.data.detach().cpu().mean())
    stds .append(outp.data.detach().cpu().std())
    hists.append(outp.data.cpu().histc(40, -7, 7))


2023-08-25 10:12:40, bug fix:
CBR had been written as CRB, i.e. the order of BN and ReLU was swapped. Reran the experiments.

LeNet Experiments on CIFAR-10 (the 32×32 input size is quite small)

  1. The recorded means and stds here are abnormal: the network blows up and the variance keeps growing; in the first conv layer, 80% of the activations are 0.
    1. Why? The conv and Linear weights were not initialized properly!
    2. Solution: added kaiming_normal_ initialization; still no improvement!
  2. Tweak 1: switch to 5 CNN layers, ReLU, no BN.
    1. The means of the first 4 layers look much better, but the variances are still large; the last layer's variance in particular goes wild!
  3. Tweak 2: switch to normal_init initialization.
    1. Everything normalizes at once: compared with the above, only the init scheme changed, so normal init suits these conv layers better.
    2. But 80% of the first layer's activations are still dead, and the last layer's activations are all over the place: essentially chaotic, random predictions.
  4. Tweak 3: replace ReLU with Leaky-ReLU; the histogram range has to change as well (negative values are now kept).
    1. The means are smaller here, as expected, since Leaky-ReLU keeps more negative values; the change is most visible in the last layer (fc2).
    2. The later layers also show sharper swings in variance.
    3. Layers closer to the output have the most volatile variance: the layers nearest the output get the largest gradient updates and are the first to move, while the early layers barely get updated!
    4. The fraction of zero activations in the first layer drops to 60%: leaky-relu makes backprop richer and lets gradients propagate further back.
    5. The last layer no longer blows up.
  5. Tweak 4: add BN layers.
    1. The earlier spikes smooth out immediately, especially in the last layer. BN helps convergence and stability.
    2. The means of the first three layers stay near 0, but the fourth layer's mean keeps climbing while its variance stays stable, suggesting the fourth layer is essentially adding a bias on top of the third layer's BN output.
  6. Tweak 5: increase the learning rate to lr=0.1: it explodes.
  7. Tweak 6: lr=0.01, still explodes; a=0.1.
  8. Tweak 7: a=sqrt(2); fix the init parameter: a=0.1 was used earlier, but once ReLU was added the parameter should have changed!
    1. Better than item 5 above: the fourth layer starts to converge instead of pushing its mean further up.
    2. The stds of all four layers are smaller, but 80% of the first layer's activations are still 0.
  9. Tweak 8: lr=0.01, still explodes.
  10. Add 1cycle scheduling and observe (a OneCycleLR sketch follows this list).
    1. With too large an lr it still explodes.
    2. With max_lr=0.005 the first layer trains noticeably better and is no longer stuck at 0, so gradually lowering the learning rate really does help; the whole network converges faster.
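
For item 10, a sketch of how the 1cycle schedule can be wired up with PyTorch's OneCycleLR (the tiny stand-in model and random data are placeholders; max_lr=0.005 is the value that behaved well above):

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = DataLoader(TensorDataset(torch.randn(64, 3, 32, 32),
                                        torch.randint(0, 10, (64,))), batch_size=16)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
scheduler = OneCycleLR(optimizer, max_lr=0.005, epochs=5,
                       steps_per_epoch=len(train_loader))

for epoch in range(5):
    for images, labels in train_loader:
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # 1cycle steps once per batch, not once per epoch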

  • Update 2023-08-14 10:06:23: one issue: the a I was changing was aimed at nn.init.kaiming_uniform_(a=0/1), but the code below uses kaiming_normal_, so the meaning of a may not be the same. So... (see the gain check below.)
    • In deep learning, a small bug like this is inconspicuous and hard to spot; conclusions drawn around this kind of issue need to be made with extra care.
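
For reference (this is PyTorch behavior, not something from the original notes): in both kaiming_uniform_ and kaiming_normal_, a is the negative slope fed into the gain, gain = sqrt(2 / (1 + a^2)), so it plays the same role in both. A quick check:

import math
import torch
from torch import nn

print(nn.init.calculate_gain('leaky_relu', 0))             # 1.414... = sqrt(2), i.e. plain ReLU
print(nn.init.calculate_gain('leaky_relu', 1))             # 1.0, i.e. a purely linear layer
print(nn.init.calculate_gain('leaky_relu', math.sqrt(5)))  # 0.577..., torch's Conv2d default

w = torch.empty(16, 3, 5, 5)
nn.init.kaiming_normal_(w, a=0, mode='fan_out')  # `a` has the same role as in kaiming_uniform_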

1. The LeNet Weight Initialization Issue

  • I'm running torch 1.10.0, where the default Conv2d init uses a=sqrt(5).
    • After changing this default init to a=1, the acc comparison is as follows:
  • With the changed init, acc is about 3 points higher after 5 epochs.
  • Changing to a=0 pushes it up further; loss drops and acc rises, so initialization really matters, and this is only 5 epochs in.
    • More importantly, the initialization affects the whole training process that follows.
# results with the default init, a=sqrt(5)
[1,   500] train_loss: 1.756  test_accuracy: 0.458
[1,  1000] train_loss: 1.434  test_accuracy: 0.515
[2,   500] train_loss: 1.191  test_accuracy: 0.573
[2,  1000] train_loss: 1.173  test_accuracy: 0.600
[3,   500] train_loss: 1.037  test_accuracy: 0.624
[3,  1000] train_loss: 1.017  test_accuracy: 0.626
[4,   500] train_loss: 0.917  test_accuracy: 0.638
[4,  1000] train_loss: 0.916  test_accuracy: 0.645
[5,   500] train_loss: 0.851  test_accuracy: 0.666
[5,  1000] train_loss: 0.839  test_accuracy: 0.655
Finished Training

# after re-initializing the conv2d weights with a=1 (on the assumption that no ReLU follows)
[1,   500] train_loss: 1.693  test_accuracy: 0.479
[1,  1000] train_loss: 1.397  test_accuracy: 0.538
[2,   500] train_loss: 1.171  test_accuracy: 0.583
[2,  1000] train_loss: 1.110  test_accuracy: 0.612
[3,   500] train_loss: 0.988  test_accuracy: 0.649
[3,  1000] train_loss: 0.966  test_accuracy: 0.658
[4,   500] train_loss: 0.862  test_accuracy: 0.657
[4,  1000] train_loss: 0.872  test_accuracy: 0.684
[5,   500] train_loss: 0.769  test_accuracy: 0.680
[5,  1000] train_loss: 0.797  test_accuracy: 0.684
Finished Training

# got that wrong: forward does apply ReLU, so switch to a=0 and look at the result
[1,   500] train_loss: 1.640  test_accuracy: 0.522
[1,  1000] train_loss: 1.338  test_accuracy: 0.554
[2,   500] train_loss: 1.126  test_accuracy: 0.595
[2,  1000] train_loss: 1.074  test_accuracy: 0.638
[3,   500] train_loss: 0.951  test_accuracy: 0.646
[3,  1000] train_loss: 0.935  test_accuracy: 0.652
[4,   500] train_loss: 0.832  test_accuracy: 0.675
[4,  1000] train_loss: 0.844  test_accuracy: 0.679
[5,   500] train_loss: 0.746  test_accuracy: 0.691
[5,  1000] train_loss: 0.771  test_accuracy: 0.690
Finished Training

# switch to BCE loss as multi-label (even though only one label is ever present) to remove softmax's winner-take-all role; results below (a sketch follows these logs)
[1,   500] train_loss: 0.267  test_accuracy: 0.454
[1,  1000] train_loss: 0.221  test_accuracy: 0.524
[2,   500] train_loss: 0.190  test_accuracy: 0.575
[2,  1000] train_loss: 0.183  test_accuracy: 0.609
[3,   500] train_loss: 0.164  test_accuracy: 0.640
[3,  1000] train_loss: 0.163  test_accuracy: 0.650
[4,   500] train_loss: 0.150  test_accuracy: 0.662
[4,  1000] train_loss: 0.149  test_accuracy: 0.666
[5,   500] train_loss: 0.139  test_accuracy: 0.681
[5,  1000] train_loss: 0.140  test_accuracy: 0.676
Finished Training
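
A sketch of what the BCE variant above amounts to (the class count and shapes are illustrative): expand the integer label to a one-hot target and use BCEWithLogitsLoss in place of CrossEntropyLoss, so no softmax forces the classes to compete:

import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10
logits = torch.randn(16, num_classes)             # raw model outputs for a batch of 16
labels = torch.randint(0, num_classes, (16,))     # integer class labels
targets = F.one_hot(labels, num_classes).float()  # one-hot targets for BCE

criterion = nn.BCEWithLogitsLoss()                # applies a per-class sigmoid internally
loss = criterion(logits, targets)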

  • I had missed that forward below applies ReLU, so a has to be changed to 0; tested that too.
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, a=1, mode='fan_out')  # without a following ReLU use a=1; with ReLU use a=0
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, a=1, mode='fan_in')


    def forward(self, x):
        x = F.relu(self.conv1(x))    # input(3, 32, 32) output(16, 28, 28)
        x = self.pool1(x)            # output(16, 14, 14)
        x = F.relu(self.conv2(x))    # output(32, 10, 10)
        x = self.pool2(x)            # output(32, 5, 5)
        x = x.view(-1, 32*5*5)       # output(32*5*5)
        x = F.relu(self.fc1(x))      # output(120)
        x = F.relu(self.fc2(x))      # output(84)
        x = self.fc3(x)              # output(10)
        return x


The original video author's result: 68.6%.

2. The AlexNet Weight Initialization Issue

  • After changing the conv2d and linear init to a=0 (since each is followed by a ReLU), acc goes up.
[epoch 1] train_loss: 1.356  val_accuracy: 0.429
[epoch 2] train_loss: 1.187  val_accuracy: 0.500
[epoch 3] train_loss: 1.095  val_accuracy: 0.544
[epoch 4] train_loss: 1.037  val_accuracy: 0.593
[epoch 5] train_loss: 0.993  val_accuracy: 0.577
[epoch 6] train_loss: 0.923  val_accuracy: 0.618
[epoch 7] train_loss: 0.908  val_accuracy: 0.640
[epoch 8] train_loss: 0.878  val_accuracy: 0.676
[epoch 9] train_loss: 0.847  val_accuracy: 0.646
[epoch 10] train_loss: 0.831  val_accuracy: 0.670
Finished Training

# after changing the init to a=0
[epoch 1] train_loss: 1.350  val_accuracy: 0.486
[epoch 2] train_loss: 1.163  val_accuracy: 0.508
[epoch 3] train_loss: 1.086  val_accuracy: 0.571
[epoch 4] train_loss: 1.012  val_accuracy: 0.640
[epoch 5] train_loss: 0.955  val_accuracy: 0.651
[epoch 6] train_loss: 0.920  val_accuracy: 0.657
[epoch 7] train_loss: 0.907  val_accuracy: 0.684
[epoch 8] train_loss: 0.847  val_accuracy: 0.690
[epoch 9] train_loss: 0.831  val_accuracy: 0.670
[epoch 10] train_loss: 0.805  val_accuracy: 0.695
Finished Training

# after switching to BCE loss, acc also goes up by a few points
[epoch 1] train_loss: 0.449  val_accuracy: 0.522
[epoch 2] train_loss: 0.400  val_accuracy: 0.538
[epoch 3] train_loss: 0.357  val_accuracy: 0.629
[epoch 4] train_loss: 0.341  val_accuracy: 0.618
[epoch 5] train_loss: 0.319  val_accuracy: 0.613
[epoch 6] train_loss: 0.307  val_accuracy: 0.670
[epoch 7] train_loss: 0.321  val_accuracy: 0.648
[epoch 8] train_loss: 0.285  val_accuracy: 0.681
[epoch 9] train_loss: 0.286  val_accuracy: 0.692
[epoch 10] train_loss: 0.266  val_accuracy: 0.703
Finished Training

Adding Model Diagnostics: Mean, Std, Histogram, and the Fraction of Zero Activations

  • This training run is unhealthy: (1) acc fluctuates a lot; (2) the loss looks like it is decreasing, but that is largely by chance; (3) the mean, std, and histogram diagnostics all show the training is malformed. Most of the neurons are dead: why? (A helper for reading the dead-activation fraction off the histograms is sketched after the code below.)

 

import torch.nn as nn
import torch


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False, leaky_relu=False, negative_slope=0.1):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]
            nn.LeakyReLU(negative_slope=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.LeakyReLU(negative_slope=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.LeakyReLU(negative_slope=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.LeakyReLU(negative_slope=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.LeakyReLU(negative_slope=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.LeakyReLU(negative_slope=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.LeakyReLU(negative_slope=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights(leaky_relu)
        
        # modules = nn.Sequential()
        # for i,m in enumerate(self.features):
        #         if isinstance(m, nn.Conv2d):
        #             modules.append(m)
        self.hooks = Hooks(self.features, append_stats)
        # # model diagnostics: record activation means and stds
        # self.act_means = [[] for l in self.features if isinstance(l, nn.Conv2d)]
        # self.act_stds = [[] for l in self.features if isinstance(l, nn.Conv2d)]

        # def append_stats(i, mod, inp, outp):
        #     self.act_means[i].append(outp.data.mean())
        #     self.act_stds[i].append(outp.data.std())
        # for i,m in enumerate(self.features):
        #     if isinstance(m, nn.Conv2d):
        #         from functools import partial
        #         m.register_forward_hook(partial(append_stats, i))

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)

        return x

    def _initialize_weights(self, leaky_relu):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='leaky_relu')  # switched from relu to leaky_relu because most activations died under relu
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                # nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')  # once nonlinearity is given, nn.init derives the gain (i.e. a) automatically
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')  # switched from relu to leaky_relu because most activations died under relu
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
                # nn.init.normal_(m.weight, 0, 0.01)  # the video author's original init
                nn.init.constant_(m.bias, 0)

from functools import partial
def listify(o):
    if o is None: return []
    if isinstance(o, list): return o
    if isinstance(o, str): return [o]
    return [o]
class ListContainer():
    def __init__(self, items): self.items = listify(items)
    def __getitem__(self, idx):
        if isinstance(idx, (int,slice)): return self.items[idx]
        if isinstance(idx[0],bool):
            assert len(idx)==len(self) # bool mask
            return [o for m,o in zip(idx,self.items) if m]
        return [self.items[i] for i in idx]
    def __len__(self): return len(self.items)
    def __iter__(self): return iter(self.items)
    def __setitem__(self, i, o): self.items[i] = o
    def __delitem__(self, i): del(self.items[i])
    def __repr__(self):
        res = f'{self.__class__.__name__} ({len(self)} items)\n{self.items[:10]}'
        if len(self)>10: res = res[:-1]+ '...]'
        return res
    
class Hook():
    def __init__(self, m, f): 
        self.hook = m.register_forward_hook(partial(f, self))  # register f (e.g. append_stats) on m as a forward hook, with this Hook instance bound as the first argument
    def remove(self): self.hook.remove()
    def __del__(self): self.remove()

def append_stats(hook, mod, inp, outp):
    if not hasattr(hook,'stats'): hook.stats = ([],[],[])
    means,stds,hists = hook.stats
    means.append(outp.data.detach().cpu().mean())
    stds .append(outp.data.detach().cpu().std())
    hists.append(outp.data.cpu().histc(40,0,5))

class Hooks(ListContainer):
    def __init__(self, ms, f): 
        super().__init__([Hook(m, f) for m in ms])
    def __enter__(self, *args): return self
    def __exit__ (self, *args): self.remove()
    def __del__(self): self.remove()

    def __delitem__(self, i):
        self[i].remove()
        super().__delitem__(i)
        
    def remove(self):
        for h in self: h.remove()
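
One way to turn the stored histograms into the "fraction of zero activations" statistic from the heading above (an assumed helper, not code from the original run): with histc(40, 0, 5) the lowest bin covers [0, 0.125), so its share of the total mass approximates the activations stuck at or near zero.

import torch

def dead_fraction(hook, n_bins=1):
    # hook.stats[2] is the list of per-batch histograms recorded by append_stats
    hists = torch.stack(hook.stats[2]).float()      # shape (n_batches, 40)
    return hists[:, :n_bins].sum(1) / hists.sum(1)  # per-batch fraction in the lowest bins

# usage, after at least one forward pass through a hooked model:
# for h in model.hooks: print(dead_fraction(h))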
