ResNet34 training
Train from scratch
- The puzzling part: during ResNet training, why do the mean and variance of the last conv layer in layer4 jump abruptly? A few batches keep spiking like this. Why?
- A data issue?
- An adaptation issue?
- A learning-rate issue?
- Ablation: check whether the fastai library shows the same problem; if it does too, the conclusion changes.
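One way to chase the jumping statistics is to log the suspect layer's per-batch output mean and std with a forward hook and inspect them batch by batch. A minimal sketch with a stand-in model (the layer choice and batch count are placeholders, not the actual ResNet34 setup):

```python
import torch
import torch.nn as nn

# stand-in for the suspect layer (e.g. the last conv in resnet34's layer4); any nn.Module works
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1))
stats = []

def log_stats(mod, inp, outp):
    # record the hooked layer's per-batch output mean and std
    stats.append((outp.detach().mean().item(), outp.detach().std().item()))

handle = model[2].register_forward_hook(log_stats)
for _ in range(10):                        # 10 stand-in "batches"
    model(torch.randn(4, 3, 32, 32))
handle.remove()

# plotting stats batch-by-batch shows exactly which batches spike
print(len(stats))  # 10
```

Plotting these per-batch pairs against the batch index makes it easy to see whether the same batches spike every epoch (a data issue) or different ones do (more likely an lr/optimization issue).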
Transfer learning
VGG training
The fully connected layers have far too many parameters, so I shrank them; with bs=16 it trains.
(1) CBR structure: added BN and ReLU layers.
(2) acc reached ~80%; the loss curve was bumpy across 20 epochs; a few batches always spike.
VGG with pretrained weights
(1) Original VGG without BN; loaded pretrained weights and trained only the head.
(2) It still seems to oscillate.
2023-08-23 19:37:01
AlexNet on the flower-classification dataset, 224x224 input size
daisy: 633 images
dandelion: 898 images
roses: 641 images
sunflowers: 699 images
tulips: 799 images
- Model: the fully connected layers had too many parameters to train well, so a global max pool was added before the first FC layer to cut the parameter count
- 50 epochs from scratch: acc = 70%; fixed lr = 0.0002, with BN and leaky_relu
- ResNet with pretrained weights: 94% after only 3 epochs
- After 50 epochs, the last conv layer of features behaves as in the code below: the network is unstable!
import torch
import torch.nn as nn
import torch.nn.functional as F

class GenerativeRelu(nn.Module):
    def __init__(self, leak=None, sub=0.4, maxv=6.) -> None:
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub is not None: x.sub_(self.sub)
        if self.maxv is not None: x.clamp_max_(self.maxv)
        return x

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False, leaky_relu=False, negative_slope=0.1, bn=False, hook=False):
        super(AlexNet, self).__init__()
        # Use nn.Identity() instead of None when bn is off: nn.Sequential rejects None entries,
        # and Identity keeps the layer indices stable for the hook below.
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),   # input[3, 224, 224] output[48, 55, 55]
            nn.BatchNorm2d(48, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),            # output[128, 27, 27]
            nn.BatchNorm2d(128, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),           # output[192, 13, 13]
            nn.BatchNorm2d(192, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),           # output[192, 13, 13]
            nn.BatchNorm2d(192, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),           # output[128, 13, 13]
            nn.BatchNorm2d(128, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output[128, 6, 6]
        )
        self.maxpool = nn.AdaptiveMaxPool2d(1)
        self.classifier = nn.Sequential(  # with BN there is no need for dropout
            # nn.Dropout(p=0.5),
            # nn.Linear(128 * 6 * 6, 2048),  # too many parameters here; do it VGG-style and pool down to a 1x1 feature map first
            nn.Linear(128, 2048),
            nn.BatchNorm1d(2048, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            # nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.BatchNorm1d(2048, eps=1e-5, momentum=0.1) if bn else nn.Identity(),
            GenerativeRelu(leak=negative_slope) if leaky_relu else nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights(leaky_relu)
        if hook:
            to_hook_layer = [l for l in self.features if isinstance(l, nn.Conv2d)]
            to_hook_layer.append(self.classifier[0])
            to_hook_layer.append(self.classifier[2])
            # Hooks and append_stats are defined further below; only the last conv layer is hooked here
            self.hooks = Hooks([self.features[14]], append_stats)

    def forward(self, x):
        x = self.features(x)
        x = self.maxpool(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self, leaky_relu):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='leaky_relu')  # switched from relu to leaky_relu: with relu most neurons died
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                # specifying nonlinearity in nn.init lets it derive the gain (i.e. a) automatically
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='leaky_relu')
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                # nn.init.normal_(m.weight, 0, 0.01)  # the video author's choice
                nn.init.constant_(m.bias, 0)
# def append_stats(hook, mod, inp, outp):
# if not hasattr(hook,'stats'): hook.stats = ([],[],[])
# means,stds,hists = hook.stats
# means.append(outp.data.detach().cpu().mean())
# stds .append(outp.data.detach().cpu().std())
# hists.append(outp.data.cpu().histc(40,-7,7))
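The parameter saving from pooling down to 1x1 before the first FC layer (as in the AlexNet variant above) is easy to quantify; a quick sketch:

```python
import torch.nn as nn

fc_big = nn.Linear(128 * 6 * 6, 2048)   # original AlexNet-style first FC layer
fc_small = nn.Linear(128, 2048)          # after nn.AdaptiveMaxPool2d(1) collapses 6x6 -> 1x1

big = sum(p.numel() for p in fc_big.parameters())
small = sum(p.numel() for p in fc_small.parameters())
print(big, small, big // small)  # 9439232 264192 35
```

So the global max pool cuts the first FC layer's parameters roughly 35-fold, at the cost of discarding spatial detail in the 6x6 feature map.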
2023-08-25 10:12:40 bug fix:
CBR had been written as CRB: the order of BN and ReLU was wrong. Rerunning the experiments.
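For reference, a minimal sketch of the correct Conv-BN-ReLU (CBR) ordering next to the buggy Conv-ReLU-BN (CRB) one; the function names and shapes are illustrative only. The sign of the outputs makes the difference visible immediately:

```python
import torch
import torch.nn as nn

def cbr(c_in, c_out):
    # correct CBR order: Conv -> BatchNorm -> ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

def crb(c_in, c_out):
    # the buggy CRB order: Conv -> ReLU -> BatchNorm
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(c_out),
    )

x = torch.randn(8, 3, 32, 32)
print((cbr(3, 16)(x) >= 0).all())   # tensor(True): ReLU comes last, output is non-negative
print((crb(3, 16)(x) >= 0).all())   # tensor(False): BN re-centers after ReLU, negatives reappear
```

With CRB the BatchNorm re-centers the rectified activations around zero, so the block no longer outputs a rectified signal at all, which is exactly the kind of subtle bug that skews the statistics being studied here.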
LeNet experiments on CIFAR-10; the 32*32 input size is quite small
- The means and stds here are abnormal: the network is blowing up and the variance keeps growing; in the first Conv layer, 80% of activations are 0
- What's the cause? The Conv and Linear weights were not initialized properly!
- Solution: added kaiming_normal_ initialization; still no effect!
- Optimization 1: switch to 5 CNN layers, ReLU, no BN
    - The means of the first 4 layers look much better! But the variances are still large, and the last layer's variance goes completely wild!!!
- Optimization 3: switch to normal_init initialization
    - Instantly normal: compared with the run above, only the init scheme changed, so normal init suits the conv layers better!
    - But 80% of the first layer's activations are still dead, and the last layer is erratic: completely chaotic, random predictions.
- Optimization 4: swap ReLU for Leaky-ReLU (the histogram range has to change accordingly)
    - The means are smaller here, as expected, since leaky-ReLU keeps more negatives; the change is most obvious in the last layer (fc2).
    - Variance also changes more sharply in the later layers.
    - Sharper variance changes near the output mean the later layers update first: the closer to the output, the larger the gradient updates; the earlier layers barely get updated!
    - The fraction of zero activations in the first layer dropped to 60%: leaky-relu lets backprop carry signal further back.
    - The last layer no longer explodes.
- Optimization 5: add BN layers
    - The spikes smooth out immediately, especially in the last layer. BN helps convergence and stability.
    - The means of the first three layers sit near 0, but the fourth layer's mean keeps climbing while its variance stays stable; the fourth layer may just be adding a bias on top of the third layer's BN output.
- Optimization 5: increase the learning rate: blows up at lr=0.1
- Optimization 6: lr=0.01, still blows up; a=0.1
- Optimization 7: a=sqrt(2); init change: earlier a was 0.1, but with ReLU added the parameter should change!!
    - Better than optimization 5 above: the fourth layer starts to converge and its mean stops growing.
    - All four layers' stds are smaller. 80% of the first layer's activations are still 0, though.
- Optimization 8: lr=0.01, still blows up
- After adding 1cycle:
    - Too large an lr still blows up
    - max_lr=0.005: the first layer now trains better and is no longer stuck at 0, so gradually decaying the learning rate really does help!!! The whole network converges faster.
    - Too large an lr still blows up
- Update 2023-08-14 10:06:23: one concern: the a I changed was meant for nn.init.kaiming_uniform_(a=0/1), while the code below uses kaiming_normal_ (in PyTorch, a has the same meaning in both: the leaky_relu negative slope), so…
- In deep learning a small bug like this is inconspicuous and hard to spot; conclusions around this kind of issue need to be drawn with care.
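The 1cycle schedule mentioned above can be set up with PyTorch's built-in scheduler; a minimal sketch where the model, epoch count, and steps_per_epoch are placeholders, and max_lr=0.005 matches the setting that behaved well in these notes:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in model
opt = torch.optim.SGD(model.parameters(), lr=0.001)
# lr ramps up to max_lr over the first ~30% of steps, then anneals down
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=0.005, epochs=5, steps_per_epoch=100)

lrs = []
for epoch in range(5):
    for step in range(100):
        opt.step()       # (in real training, loss.backward() comes first)
        sched.step()     # stepped once per batch, not once per epoch
        lrs.append(opt.param_groups[0]['lr'])
print(f"peak lr: {max(lrs):.4f}")  # peak lr: 0.0050
```

The final lr ends far below the starting lr, which is why the later batches stop blowing up even when the peak is relatively high.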
1. The LeNet weight-initialization problem
- Since I am on torch 1.10.0, its Conv2d default init uses a=sqrt(5)
- After changing torch's default init to a=1, the acc comparison is below:
- With the changed init, acc improved by 3 points over 5 epochs.
- With a=0 it rose further, loss dropped, and acc improved, so initialization really matters, and this is only 5 epochs.
- More importantly, initialization affects the entire subsequent training process.
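The effect of a shows up directly in the gain that Kaiming init uses: gain = sqrt(2 / (1 + a^2)), so a=0 (pure ReLU) gives the largest weight std and a=sqrt(5) (the torch 1.10 Conv2d default) the smallest. A sketch comparing the three settings discussed above (the weight shape mirrors LeNet's first conv layer):

```python
import math
import torch
import torch.nn as nn

w = torch.empty(16, 3, 5, 5)  # a conv weight shaped like LeNet's conv1

# In torch.nn.init, a is the leaky_relu negative slope; gain = sqrt(2 / (1 + a^2))
fan_out = 16 * 5 * 5  # out_channels * kernel area
for a in (0.0, 1.0, math.sqrt(5)):
    nn.init.kaiming_normal_(w, a=a, mode='fan_out')
    expected = math.sqrt(2.0 / (1 + a ** 2)) / math.sqrt(fan_out)
    print(f"a={a:.3f}  expected std={expected:.4f}  actual std={w.std():.4f}")
```

Going from a=sqrt(5) to a=0 roughly doubles the initial weight std, which is consistent with the accuracy differences in the logs below.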
# the run with the default init, a=sqrt(5)
[1, 500] train_loss: 1.756 test_accuracy: 0.458
[1, 1000] train_loss: 1.434 test_accuracy: 0.515
[2, 500] train_loss: 1.191 test_accuracy: 0.573
[2, 1000] train_loss: 1.173 test_accuracy: 0.600
[3, 500] train_loss: 1.037 test_accuracy: 0.624
[3, 1000] train_loss: 1.017 test_accuracy: 0.626
[4, 500] train_loss: 0.917 test_accuracy: 0.638
[4, 1000] train_loss: 0.916 test_accuracy: 0.645
[5, 500] train_loss: 0.851 test_accuracy: 0.666
[5, 1000] train_loss: 0.839 test_accuracy: 0.655
Finished Training
# after re-initializing the conv2d weights with a=1, assuming there was no ReLU; let's see the effect
[1, 500] train_loss: 1.693 test_accuracy: 0.479
[1, 1000] train_loss: 1.397 test_accuracy: 0.538
[2, 500] train_loss: 1.171 test_accuracy: 0.583
[2, 1000] train_loss: 1.110 test_accuracy: 0.612
[3, 500] train_loss: 0.988 test_accuracy: 0.649
[3, 1000] train_loss: 0.966 test_accuracy: 0.658
[4, 500] train_loss: 0.862 test_accuracy: 0.657
[4, 1000] train_loss: 0.872 test_accuracy: 0.684
[5, 500] train_loss: 0.769 test_accuracy: 0.680
[5, 1000] train_loss: 0.797 test_accuracy: 0.684
Finished Training
# my mistake: forward does have ReLU, so change to a=0 and check the result
[1, 500] train_loss: 1.640 test_accuracy: 0.522
[1, 1000] train_loss: 1.338 test_accuracy: 0.554
[2, 500] train_loss: 1.126 test_accuracy: 0.595
[2, 1000] train_loss: 1.074 test_accuracy: 0.638
[3, 500] train_loss: 0.951 test_accuracy: 0.646
[3, 1000] train_loss: 0.935 test_accuracy: 0.652
[4, 500] train_loss: 0.832 test_accuracy: 0.675
[4, 1000] train_loss: 0.844 test_accuracy: 0.679
[5, 500] train_loss: 0.746 test_accuracy: 0.691
[5, 1000] train_loss: 0.771 test_accuracy: 0.690
Finished Training
# switched to BCE loss: multi-label (even though there is only ever one label) to break softmax's winner-take-all; let's see the effect
[1, 500] train_loss: 0.267 test_accuracy: 0.454
[1, 1000] train_loss: 0.221 test_accuracy: 0.524
[2, 500] train_loss: 0.190 test_accuracy: 0.575
[2, 1000] train_loss: 0.183 test_accuracy: 0.609
[3, 500] train_loss: 0.164 test_accuracy: 0.640
[3, 1000] train_loss: 0.163 test_accuracy: 0.650
[4, 500] train_loss: 0.150 test_accuracy: 0.662
[4, 1000] train_loss: 0.149 test_accuracy: 0.666
[5, 500] train_loss: 0.139 test_accuracy: 0.681
[5, 1000] train_loss: 0.140 test_accuracy: 0.676
Finished Training
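Switching from CrossEntropy to a BCE-style loss means multi-label targets; a minimal sketch of the swap, using BCEWithLogitsLoss for numerical stability and one-hot targets built from the class indices (the tensor values are placeholders):

```python
import torch
import torch.nn as nn

num_classes = 10
logits = torch.randn(4, num_classes)   # model output for a batch of 4
labels = torch.tensor([3, 1, 0, 7])    # single-label class-index targets

# CrossEntropyLoss applies softmax internally and takes class indices directly
ce = nn.CrossEntropyLoss()(logits, labels)

# BCEWithLogitsLoss treats each class as an independent sigmoid;
# targets must be one-hot floats with the same shape as the logits
onehot = torch.zeros_like(logits).scatter_(1, labels.unsqueeze(1), 1.0)
bce = nn.BCEWithLogitsLoss()(logits, onehot)
print(ce.item(), bce.item())
```

Note the BCE loss averages over all class slots (here 40), which explains why the train_loss values in the BCE logs above are so much smaller; the two losses are not directly comparable in magnitude.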
- I had missed that the forward below uses ReLU, so the init still needs a=0; testing that.
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, a=1, mode='fan_out')  # a=1 without ReLU; a=0 with ReLU
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, a=1, mode='fan_in')

    def forward(self, x):
        x = F.relu(self.conv1(x))   # input(3, 32, 32)  output(16, 28, 28)
        x = self.pool1(x)           # output(16, 14, 14)
        x = F.relu(self.conv2(x))   # output(32, 10, 10)
        x = self.pool2(x)           # output(32, 5, 5)
        x = x.view(-1, 32*5*5)      # flatten to 32*5*5
        x = F.relu(self.fc1(x))     # output(120)
        x = F.relu(self.fc2(x))     # output(84)
        x = self.fc3(x)             # output(10)
        return x
The video author's result: 68.6%
2. The AlexNet weight-initialization problem
- After changing the conv2d and linear init to a=0 (every such layer is followed by ReLU here), acc went up.
[epoch 1] train_loss: 1.356 val_accuracy: 0.429
[epoch 2] train_loss: 1.187 val_accuracy: 0.500
[epoch 3] train_loss: 1.095 val_accuracy: 0.544
[epoch 4] train_loss: 1.037 val_accuracy: 0.593
[epoch 5] train_loss: 0.993 val_accuracy: 0.577
[epoch 6] train_loss: 0.923 val_accuracy: 0.618
[epoch 7] train_loss: 0.908 val_accuracy: 0.640
[epoch 8] train_loss: 0.878 val_accuracy: 0.676
[epoch 9] train_loss: 0.847 val_accuracy: 0.646
[epoch 10] train_loss: 0.831 val_accuracy: 0.670
Finished Training
# after changing the init to a=0
[epoch 1] train_loss: 1.350 val_accuracy: 0.486
[epoch 2] train_loss: 1.163 val_accuracy: 0.508
[epoch 3] train_loss: 1.086 val_accuracy: 0.571
[epoch 4] train_loss: 1.012 val_accuracy: 0.640
[epoch 5] train_loss: 0.955 val_accuracy: 0.651
[epoch 6] train_loss: 0.920 val_accuracy: 0.657
[epoch 7] train_loss: 0.907 val_accuracy: 0.684
[epoch 8] train_loss: 0.847 val_accuracy: 0.690
[epoch 9] train_loss: 0.831 val_accuracy: 0.670
[epoch 10] train_loss: 0.805 val_accuracy: 0.695
Finished Training
# after switching to BCE loss, acc also improved by a couple of points
[epoch 1] train_loss: 0.449 val_accuracy: 0.522
[epoch 2] train_loss: 0.400 val_accuracy: 0.538
[epoch 3] train_loss: 0.357 val_accuracy: 0.629
[epoch 4] train_loss: 0.341 val_accuracy: 0.618
[epoch 5] train_loss: 0.319 val_accuracy: 0.613
[epoch 6] train_loss: 0.307 val_accuracy: 0.670
[epoch 7] train_loss: 0.321 val_accuracy: 0.648
[epoch 8] train_loss: 0.285 val_accuracy: 0.681
[epoch 9] train_loss: 0.286 val_accuracy: 0.692
[epoch 10] train_loss: 0.266 val_accuracy: 0.703
Finished Training
Adding model diagnostics: mean, std, histogram, and fraction of zero activations
- This network's training is broken: (1) acc fluctuates wildly; (2) the loss looks like it is falling, but that is a fluke; (3) the mean, std, and histogram tools all show the training is pathological. Most neurons are dead; what is the cause?
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False, leaky_relu=False, negative_slope=0.1):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),   # input[3, 224, 224] output[48, 55, 55]
            nn.LeakyReLU(negative_slope=0.01) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),            # output[128, 27, 27]
            nn.LeakyReLU(negative_slope=0.01) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),           # output[192, 13, 13]
            nn.LeakyReLU(negative_slope=0.01) if leaky_relu else nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),           # output[192, 13, 13]
            nn.LeakyReLU(negative_slope=0.01) if leaky_relu else nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),           # output[128, 13, 13]
            nn.LeakyReLU(negative_slope=0.01) if leaky_relu else nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output[128, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.LeakyReLU(negative_slope=0.01) if leaky_relu else nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.LeakyReLU(negative_slope=0.01) if leaky_relu else nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights(leaky_relu)
        # modules = nn.Sequential()
        # for i, m in enumerate(self.features):
        #     if isinstance(m, nn.Conv2d):
        #         modules.append(m)
        # Hooks and append_stats are defined further below
        self.hooks = Hooks(self.features, append_stats)
        # # model diagnostics: track activation means and stds per conv layer
        # self.act_means = [[] for l in self.features if isinstance(l, nn.Conv2d)]
        # self.act_stds = [[] for l in self.features if isinstance(l, nn.Conv2d)]
        # def append_stats(i, mod, inp, outp):
        #     self.act_means[i].append(outp.data.mean())
        #     self.act_stds[i].append(outp.data.std())
        # for i, m in enumerate(self.features):
        #     if isinstance(m, nn.Conv2d):
        #         from functools import partial
        #         m.register_forward_hook(partial(append_stats, i))

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self, leaky_relu):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='leaky_relu')  # switched from relu to leaky_relu: with relu most neurons died
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                # specifying nonlinearity in nn.init lets it derive the gain (i.e. a) automatically
                if leaky_relu:
                    nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
                else:
                    nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
                # nn.init.normal_(m.weight, 0, 0.01)  # the video author's choice
                nn.init.constant_(m.bias, 0)
from functools import partial

def listify(o):
    if o is None: return []
    if isinstance(o, list): return o
    if isinstance(o, str): return [o]
    return [o]

class ListContainer():
    def __init__(self, items): self.items = listify(items)
    def __getitem__(self, idx):
        if isinstance(idx, (int, slice)): return self.items[idx]
        if isinstance(idx[0], bool):
            assert len(idx) == len(self)  # bool mask
            return [o for m, o in zip(idx, self.items) if m]
        return [self.items[i] for i in idx]
    def __len__(self): return len(self.items)
    def __iter__(self): return iter(self.items)
    def __setitem__(self, i, o): self.items[i] = o
    def __delitem__(self, i): del(self.items[i])
    def __repr__(self):
        res = f'{self.__class__.__name__} ({len(self)} items)\n{self.items[:10]}'
        if len(self) > 10: res = res[:-1] + '...]'
        return res

class Hook():
    def __init__(self, m, f):
        # register f as a forward hook on m, with this Hook object itself as the first argument
        self.hook = m.register_forward_hook(partial(f, self))
    def remove(self): self.hook.remove()
    def __del__(self): self.remove()

def append_stats(hook, mod, inp, outp):
    if not hasattr(hook, 'stats'): hook.stats = ([], [], [])
    means, stds, hists = hook.stats
    means.append(outp.data.detach().cpu().mean())
    stds.append(outp.data.detach().cpu().std())
    hists.append(outp.data.cpu().histc(40, 0, 5))

class Hooks(ListContainer):
    def __init__(self, ms, f):
        super().__init__([Hook(m, f) for m in ms])
    def __enter__(self, *args): return self
    def __exit__(self, *args): self.remove()
    def __del__(self): self.remove()
    def __delitem__(self, i):
        self[i].remove()
        super().__delitem__(i)
    def remove(self):
        for h in self: h.remove()
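A usage sketch for the Hooks machinery above: as a context manager it removes the forward hooks automatically on exit. For a self-contained demo, compact stand-ins for the Hook/Hooks/append_stats above are redefined inline, and the model is a placeholder:

```python
import torch
import torch.nn as nn
from functools import partial

# compact stand-ins for the Hook/Hooks/append_stats defined above
class Hook():
    def __init__(self, m, f): self.hook = m.register_forward_hook(partial(f, self))
    def remove(self): self.hook.remove()

class Hooks():
    def __init__(self, ms, f): self.items = [Hook(m, f) for m in ms]
    def __enter__(self): return self
    def __exit__(self, *args): self.remove()
    def __iter__(self): return iter(self.items)
    def remove(self):
        for h in self.items: h.remove()

def append_stats(hook, mod, inp, outp):
    if not hasattr(hook, 'stats'): hook.stats = ([], [])
    hook.stats[0].append(outp.detach().cpu().mean().item())
    hook.stats[1].append(outp.detach().cpu().std().item())

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))
# the context manager guarantees the forward hooks are removed afterwards
with Hooks([model[0], model[2]], append_stats) as hooks:
    for _ in range(5):
        model(torch.randn(8, 10))
for h in hooks:
    print(len(h.stats[0]))  # 5 recorded means per hooked layer
```

After the `with` block exits, further forward passes record nothing, so stale hooks cannot distort later diagnostics.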