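Status: all of the fastai lesson 8 to lesson 11 material has been refactored.
- The MNIST dataset is simple. Every image is 28×28 pixels, the background is clean, and the task is plain classification, so a simple network can handle it.
- Because the dataset is so simple, the effect of some basic operations cannot be seen. Later experiments therefore switch to the Imagenette dataset.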
0. Debugging on the MNIST dataset
0.1 One linear layer (Linear()), 1 epoch, gradient descent
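- Surprisingly, the learned weights look like a 0: soon after the linear layer starts training, the weight row for class 0 takes the shape of the digit 0. There are 784×10 = 7,840 weights (plus 10 biases).
- With 2 linear layers (784×50 and 50×10), the two weight matrices multiply together into a single 784→10 map. Stacking the layers this way gives 784×50 + 50×10 = 39,700 weights.
- One linear layer versus two makes little difference. Without a nonlinearity in between, they are essentially the same model.
- Adam, however, speeds up convergence, and the final weight images look quite different from the ones above!
- The same network after switching training to the 1cycle policy.
- As training goes on, the weights keep changing and drift further and further from the shape of the digits.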
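The claim that one and two linear layers are "essentially the same" can be checked numerically. Without a nonlinearity, two stacked linear maps collapse into a single one whose weight is the product W2 @ W1 and whose bias is W2 @ b1 + b2. The sketch below uses NumPy with random weights, not the trained MNIST weights, and stores weights as (out_features, in_features) the way nn.Linear does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with no activation in between: 784 -> 50 -> 10
W1, b1 = rng.normal(size=(50, 784)), rng.normal(size=50)
W2, b2 = rng.normal(size=(10, 50)), rng.normal(size=10)

def two_layers(x):
    return (x @ W1.T + b1) @ W2.T + b2

# The equivalent single layer
W = W2 @ W1          # shape (10, 784)
b = W2 @ b1 + b2     # shape (10,)

def one_layer(x):
    return x @ W.T + b

x = rng.normal(size=(5, 784))   # a fake batch of 5 flattened 28x28 images
print(W.shape)                                   # (10, 784)
print(np.allclose(two_layers(x), one_layer(x)))  # True
print(W1.size + W2.size)                         # 39700 weights in the two-layer version
```

The two-layer version has more parameters but no more expressive power, which is why a ReLU is placed between the layers later on.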
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # set this before anything initializes CUDA
import warnings
warnings.filterwarnings("ignore")
from fastai.vision import *
from fastai.basics import *   # brings in torch, nn, gzip, pickle, Path, plt, DataBunch, ...
path = Path('/home/gdyanfa1/zhouhairong_py/fastai_dataset')
with gzip.open(path/'mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
x_train, y_train, x_valid, y_valid = map(torch.tensor, (x_train, y_train, x_valid, y_valid))
bs = 128
train_ds = TensorDataset(x_train, y_train)
valid_ds = TensorDataset(x_valid, y_valid)
data = DataBunch.create(train_ds, valid_ds, bs=bs)
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10, bias=True)

    def forward(self, xb):
        return self.lin(xb)
# Create an instance of the model and put it on the GPU
model = Mnist_Logistic().cuda()
lr = 2e-2
loss_func = nn.CrossEntropyLoss()

def update(x, y, lr):
    wd = 1e-5
    y_hat = model(x)  # calling the module runs forward(): nn.Module.__call__ dispatches to it
    # L2 penalty: sum of squares of all parameters
    w2 = 0.
    for p in model.parameters():
        w2 += (p**2).sum()
    # Because w2*wd is part of the loss, backward() automatically adds its gradient
    loss = loss_func(y_hat, y) + w2*wd
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)   # plain SGD step
            p.grad.zero_()
    return loss.item()
losses = [update(x, y, lr) for x, y in data.train_dl]   # one epoch
plt.plot(losses);
losses = [update(x, y, lr) for x, y in data.train_dl]   # a second epoch
# Show the weight row for class 0 as a 28×28 image
t = model.lin.weight.detach().cpu()
plt.imshow(t[0, :].view(28, 28))
# Train the same one-linear-layer model with fastai's Learner (its default optimizer is Adam)
learn = Learner(data, Mnist_Logistic(), loss_func=loss_func, metrics=accuracy)
learn.fit_one_cycle(1, 1e-2)
t = learn.model.lin.weight.detach().cpu()
# Show the weight rows for classes 0-8 in a 3×3 grid
fig, axes = plt.subplots(3, 3)
for i, ax in enumerate(axes.flatten()):
    ax.imshow(t[i, :].view(28, 28))
    ax.set_title(i)
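1. Debugging pets.ipynb
- https://nbviewer.org/github/fastai/course-v3/blob/master/nbs/dl1/lesson1-pets.ipynb
- Debug my own copy on gitee
- Debug against a local copy of the fastai1 library, which makes the internals easier to see
- Training/validation split: 80%/20%, with the randomness fixed by a seed.
- Image transforms: which data augmentations are used?
  - crop_pad: random crop, with the gaps filled by reflection padding
  - flip_lr: horizontal mirror flip
  - warp: perspective transform
  - rotation
  - zoom
  - contrast adjustment
  - brightness adjustment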
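To make a few of these augmentations concrete, here is a NumPy sketch of reflection-padded random cropping, a random horizontal flip and a simple brightness/contrast change on an (H, W, C) image scaled to [0, 1]. It is not fastai's own implementation: the function names and the pad size are hypothetical, and warp, rotation and zoom are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

def reflect_pad_random_crop(img, pad=4, rng=rng):
    """Pad an (H, W, C) image by reflection, then take a random H x W crop."""
    h, w = img.shape[:2]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

def random_flip_lr(img, p=0.5, rng=rng):
    """Mirror the image horizontally (along the width axis) with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_lighting(img, max_lighting=0.2, rng=rng):
    """Random brightness shift and contrast scaling, clipped back to [0, 1]."""
    brightness = rng.uniform(-max_lighting, max_lighting)
    contrast = rng.uniform(1 - max_lighting, 1 + max_lighting)
    mean = img.mean()
    out = (img - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0)

img = rng.random((224, 224, 3))
aug = random_lighting(random_flip_lr(reflect_pad_random_crop(img)))
print(aug.shape)  # (224, 224, 3)
```

Reflection padding avoids the black borders that zero padding would leave after the crop shifts the image.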
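1.1 The resnet34 pretrained model
- Which layers are trainable at the start? The BN layers. Why?
- The last two Linear layers and all the BN layers are trainable. Why? cnn_learner freezes the pretrained body when it creates the learner, and fastai v1's Learner defaults to train_bn=True, so freezing skips the BatchNorm layers and leaves them trainable.
- The last two linear layers form the classifier: 1024×512+512 = 524,800 and 512×37+37 = 18,981 parameters.
- Freezing: learn.freeze() is essentially freeze_to(-1), i.e. only the last layer group, the custom_head, is trained (in a detector this would mean training only the YOLO or recognition head).
- None of the BN layers is frozen: freeze() only freezes the convolutional layers, and it does not freeze the last layer group either.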
Sequential
======================================================================
Layer (type) Output Shape Param # Trainable
======================================================================
Conv2d [64, 112, 112] 9,408 False
______________________________________________________________________
BatchNorm2d [64, 112, 112] 128 True
______________________________________________________________________
ReLU [64, 112, 112] 0 False
______________________________________________________________________
MaxPool2d [64, 56, 56] 0 False
______________________________________________________________________
Conv2d [64, 56, 56] 36,864 False
______________________________________________________________________
BatchNorm2d [64, 56, 56] 128 True
______________________________________________________________________
ReLU [64, 56, 56] 0 False
______________________________________________________________________
Conv2d [64, 56, 56] 36,864 False
______________________________________________________________________
BatchNorm2d [64, 56, 56] 128 True
______________________________________________________________________
Conv2d [64, 56, 56] 36,864 False
______________________________________________________________________
BatchNorm2d [64, 56, 56] 128 True
______________________________________________________________________
ReLU [64, 56, 56] 0 False
______________________________________________________________________
Conv2d [64, 56, 56] 36,864 False
______________________________________________________________________
BatchNorm2d [64, 56, 56] 128 True
______________________________________________________________________
Conv2d [64, 56, 56] 36,864 False
______________________________________________________________________
BatchNorm2d [64, 56, 56] 128 True
______________________________________________________________________
ReLU [64, 56, 56] 0 False
______________________________________________________________________
Conv2d [64, 56, 56] 36,864 False
______________________________________________________________________
BatchNorm2d [64, 56, 56] 128 True
______________________________________________________________________
Conv2d [128, 28, 28] 73,728 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
ReLU [128, 28, 28] 0 False
______________________________________________________________________
Conv2d [128, 28, 28] 147,456 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
Conv2d [128, 28, 28] 8,192 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
Conv2d [128, 28, 28] 147,456 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
ReLU [128, 28, 28] 0 False
______________________________________________________________________
Conv2d [128, 28, 28] 147,456 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
Conv2d [128, 28, 28] 147,456 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
ReLU [128, 28, 28] 0 False
______________________________________________________________________
Conv2d [128, 28, 28] 147,456 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
Conv2d [128, 28, 28] 147,456 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
ReLU [128, 28, 28] 0 False
______________________________________________________________________
Conv2d [128, 28, 28] 147,456 False
______________________________________________________________________
BatchNorm2d [128, 28, 28] 256 True
______________________________________________________________________
Conv2d [256, 14, 14] 294,912 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
ReLU [256, 14, 14] 0 False
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
Conv2d [256, 14, 14] 32,768 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
ReLU [256, 14, 14] 0 False
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
ReLU [256, 14, 14] 0 False
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
ReLU [256, 14, 14] 0 False
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
ReLU [256, 14, 14] 0 False
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
ReLU [256, 14, 14] 0 False
______________________________________________________________________
Conv2d [256, 14, 14] 589,824 False
______________________________________________________________________
BatchNorm2d [256, 14, 14] 512 True
______________________________________________________________________
Conv2d [512, 7, 7] 1,179,648 False
______________________________________________________________________
BatchNorm2d [512, 7, 7] 1,024 True
______________________________________________________________________
ReLU [512, 7, 7] 0 False
______________________________________________________________________
Conv2d [512, 7, 7] 2,359,296 False
______________________________________________________________________
BatchNorm2d [512, 7, 7] 1,024 True
______________________________________________________________________
Conv2d [512, 7, 7] 131,072 False
______________________________________________________________________
BatchNorm2d [512, 7, 7] 1,024 True
______________________________________________________________________
Conv2d [512, 7, 7] 2,359,296 False
______________________________________________________________________
BatchNorm2d [512, 7, 7] 1,024 True
______________________________________________________________________
ReLU [512, 7, 7] 0 False
______________________________________________________________________
Conv2d [512, 7, 7] 2,359,296 False
______________________________________________________________________
BatchNorm2d [512, 7, 7] 1,024 True
______________________________________________________________________
Conv2d [512, 7, 7] 2,359,296 False
______________________________________________________________________
BatchNorm2d [512, 7, 7] 1,024 True
______________________________________________________________________
ReLU [512, 7, 7] 0 False
______________________________________________________________________
Conv2d [512, 7, 7] 2,359,296 False
______________________________________________________________________
BatchNorm2d [512, 7, 7] 1,024 True
______________________________________________________________________
AdaptiveAvgPool2d [512, 1, 1] 0 False
______________________________________________________________________
AdaptiveMaxPool2d [512, 1, 1] 0 False
______________________________________________________________________
Flatten [1024] 0 False
______________________________________________________________________
BatchNorm1d [1024] 2,048 True
______________________________________________________________________
Dropout [1024] 0 False
______________________________________________________________________
Linear [512] 524,800 True
______________________________________________________________________
ReLU [512] 0 False
______________________________________________________________________
BatchNorm1d [512] 1,024 True
______________________________________________________________________
Dropout [512] 0 False
______________________________________________________________________
Linear [37] 18,981 True
______________________________________________________________________
Total params: 21,831,525
Total trainable params: 563,877
Total non-trainable params: 21,267,648
Optimized with 'torch.optim.adam.Adam', betas=(0.9, 0.99)
Using true weight decay as discussed in https://www.fast.ai/2018/07/02/adam-weight-decay/
Loss function : FlattenedLoss
======================================================================
Callbacks functions applied
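The Trainable column and the totals in the summary above can be checked by hand: after freeze(), the trainable parameters are the head plus every BatchNorm layer in the body. A plain-Python sketch of that arithmetic, with layer counts read off the table (the helper names are hypothetical):

```python
def linear_params(n_in, n_out):
    # weight matrix plus one bias per output
    return n_in * n_out + n_out

def bn_params(channels):
    # learnable weight (gamma) and bias (beta) per channel;
    # running mean/var are buffers, not parameters
    return 2 * channels

# Head added by cnn_learner for the 37 pet classes
head = bn_params(1024) + linear_params(1024, 512) + bn_params(512) + linear_params(512, 37)

# BatchNorm2d layers in the resnet34 body: stem, then layer1..layer4
# (2 BN per BasicBlock; the first block of layer2..layer4 adds a downsample BN)
body_bn = (bn_params(64) + 6 * bn_params(64) + 9 * bn_params(128)
           + 13 * bn_params(256) + 7 * bn_params(512))

print(linear_params(1024, 512))  # 524800
print(linear_params(512, 37))    # 18981
print(head + body_bn)            # 563877, matching "Total trainable params"
```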
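1.2 Training procedure
- First freeze the network and train only the final custom_head, i.e. the linear layers that act as the classifier (or, in a detector, the YOLO head).
- Then unfreeze the network and train it together with the earlier convolutional layers: the result got worse.
- After unfreeze(), find a new learning rate.
- Training again, the error comes back down. Presumably this is because the early and the late layers should use different learning rates: the early layers must not use a high learning rate, while the later layers can use a higher one.
- learn.unfreeze() followed by learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4)): the first layer group gets 1e-6, the last gets 1e-4, and the groups in between get geometrically spaced values.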
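How slice(1e-6, 1e-4) turns into one learning rate per layer group can be sketched in a few lines. This is a re-implementation of the idea (geometric spacing between start and stop), not fastai's code. The helper name is hypothetical, and it assumes the 3 layer groups that cnn_learner creates for resnet34.

```python
def even_mults_sketch(start, stop, n):
    """Geometrically spaced values from start to stop (n >= 2), one per layer group."""
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

lrs = even_mults_sketch(1e-6, 1e-4, 3)
print(lrs)  # approximately [1e-06, 1e-05, 0.0001]
```

Geometric spacing means each deeper group gets the same multiplicative increase, which matches the intuition that early pretrained layers need the gentlest updates.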
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # the machine has 4 GPUs; expose only GPU 2 to this process
import sys
sys.path.insert(0, '/home/gdyanfa1/zhouhairong_py/course-v3/nbs/dl1/fastai1')  # use the local fastai1 copy
from fastai.vision import *
from fastai.metrics import error_rate
from fastai import *
import warnings
warnings.filterwarnings("ignore")

bs = 64
path = Path('/home/gdyanfa1/zhouhairong_py/fastai_dataset/oxford-iiit-pet')
path_anno = path/'annotations'
path_img = path/'images'
fnames = get_image_files(path_img)
np.random.seed(2)               # fixes the random 80/20 train/valid split
pat = r'/([^/]+)_\d+\.jpg$'     # the label is the file name before the trailing _<number>.jpg
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(),
                                   size=224, bs=bs).normalize(imagenet_stats)

# Stage 1: body frozen, train the head only
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save('stage-1')

interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses()
len(data.valid_ds) == len(losses) == len(idxs)
interp.most_confused(min_val=2)

# Unfreeze and train everything with a single learning rate (this got worse)
learn.unfreeze()
learn.fit_one_cycle(1)

# Roll back, find a new learning rate, then train with discriminative learning rates
learn.load('stage-1')
learn.lr_find()
learn.recorder.plot()
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
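interp.most_confused(min_val=2) lists the off-diagonal cells of the confusion matrix that occur at least min_val times, as (actual, predicted, count), most frequent first. A plain-Python sketch of that idea with made-up labels (the helper is hypothetical, not fastai's implementation):

```python
from collections import Counter

def most_confused_sketch(actual, predicted, min_val=1):
    """(actual, predicted, count) for misclassified pairs, most frequent first."""
    counts = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    pairs = [(a, p, n) for (a, p), n in counts.items() if n >= min_val]
    return sorted(pairs, key=lambda t: t[2], reverse=True)

actual    = ["beagle", "beagle", "beagle", "pug", "pug", "boxer"]
predicted = ["basset", "basset", "beagle", "pug", "boxer", "boxer"]
result = most_confused_sketch(actual, predicted, min_val=2)
print(result)  # [('beagle', 'basset', 2)]
```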
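1. Using a linear model
Reference: "A summary of parameter initialization methods in PyTorch" (ys1305's blog on CSDN), on reset_parameters
model = nn.Sequential(nn.Linear(m, nh), nn.ReLU(), nn.Linear(nh, c))  # m inputs, nh hidden units, c classes
- PyTorch's default initialization lives in each layer's reset_parameters() method.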
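For nn.Linear, reset_parameters() calls init.kaiming_uniform_(weight, a=math.sqrt(5)), and with that slope the Kaiming bound simplifies to 1/sqrt(fan_in). The bias is drawn from the same range. The arithmetic in plain Python (the helper name is hypothetical):

```python
import math

def kaiming_uniform_bound(fan_in, a):
    # gain for leaky_relu with negative slope a
    gain = math.sqrt(2.0 / (1 + a ** 2))
    std = gain / math.sqrt(fan_in)
    # a uniform U(-b, b) has std b / sqrt(3), so b = sqrt(3) * std
    return math.sqrt(3.0) * std

fan_in = 784
bound = kaiming_uniform_bound(fan_in, a=math.sqrt(5))
print(bound, 1 / math.sqrt(fan_in))  # both about 0.0357
```

So the default init is much smaller than Kaiming init for ReLU (a=0), which is one reason the later experiments try init_linear instead.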
# Classify MNIST here and work on pushing up the accuracy
from exp.nb_09c import *

# 0. Data preparation
# This does not use my own DataBunch/ItemList interfaces (ImageList.get opens image files from disk);
# MNIST goes through the PyTorch-style DataLoader interface instead.
x_train, y_train, x_valid, y_valid = get_data()     # defined in nb_02.py
x_train, x_valid = normalize_to(x_train, x_valid)   # nb_05.py
n, m = x_train.shape
c = y_train.max().item() + 1
bs = 512
# Dataset serves the batches: nb_03.py
train_ds, valid_ds = Dataset(x_train, y_train), Dataset(x_valid, y_valid)
# DataBunch is in nb_08.py; get_dls is in nb_03.py and builds the DataLoaders
data = DataBunch(*get_dls(train_ds, valid_ds, bs), c)
loss_func = F.cross_entropy
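normalize_to is assumed here to normalize both sets with the training set's mean and std, so the validation data sees exactly the same transform. A NumPy sketch of that behavior on random data standing in for MNIST (the function name is hypothetical; note numpy's std uses ddof=0 while torch.Tensor.std is unbiased by default, a negligible difference at this size):

```python
import numpy as np

def normalize_to_sketch(train, valid):
    # statistics come from the training set only, then are applied to both
    m, s = train.mean(), train.std()
    return (train - m) / s, (valid - m) / s

rng = np.random.default_rng(0)
x_train = rng.random((1000, 784))
x_valid = rng.random((200, 784))
x_train_n, x_valid_n = normalize_to_sketch(x_train, x_valid)
print(x_train_n.mean(), x_train_n.std())  # mean about 0, std about 1
```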
""" 1. 线性模型(50,10),使用pytorch的nn.Module基类,不重构了
"""
nh = 50

def init_linear_(m, f):
    # Recursively apply init function f to every nn.Linear: weights via f, biases to zero
    if isinstance(m, nn.Linear):
        f(m.weight, a=0.1)
        if getattr(m, 'bias', None) is not None: m.bias.data.zero_()
    for l in m.children(): init_linear_(l, f)

def init_linear(m, uniform=False):
    f = init.kaiming_uniform_ if uniform else init.kaiming_normal_
    init_linear_(m, f)
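init_linear uses Kaiming (He) initialization: weights are drawn with std = gain/sqrt(fan_in), where gain = sqrt(2/(1+a²)) and a is the leaky-ReLU slope (a=0.1 above). The point is that a Linear + ReLU layer then roughly preserves the second moment of its input. A NumPy sketch with random unit-variance inputs, assuming a 784→50 layer with zero bias as init_linear sets it:

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_normal_std(fan_in, a=0.0):
    # same formula as torch.nn.init.kaiming_normal_ in its default fan_in / leaky_relu mode
    return np.sqrt(2.0 / (1 + a ** 2)) / np.sqrt(fan_in)

fan_in, nh = 784, 50
target_std = kaiming_normal_std(fan_in, a=0.1)
W = rng.normal(0.0, target_std, size=(nh, fan_in))  # (out, in) like nn.Linear

x = rng.normal(size=(2000, fan_in))   # unit-variance inputs
h = np.maximum(x @ W.T, 0.0)          # Linear (zero bias) followed by ReLU
print(W.std(), target_std)            # both about 0.0503
print((h ** 2).mean())                # second moment stays close to 1 (about 0.99)
```

The factor 2 in the gain compensates for ReLU zeroing out roughly half of the pre-activations.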
# ① Model: a hand-written linear model; init_linear is not applied, so PyTorch's default reset_parameters() init is used
model = nn.Sequential(nn.Linear(m, nh), nn.ReLU(), nn.Linear(nh, c))
lr = 0.5
# get_runner is in nb_06.py; since this is not a CNN, it is not get_cnn_runner
# use get_runner rather than get_learner
# device = torch.device('cuda', 0)
# torch.cuda.set_device(device)
cbfs = [partial(AvgStatsCallback, accuracy), CudaCallback, Recorder, ProgressCallback]
phases = combine_scheds([0.3, 0.7], cos_1cycle_anneal(0.2, 0.6, 0.2))
sched = ParamScheduler('lr', phases)  # not in cbfs: only takes effect if passed as learn.fit(..., cbs=[sched])
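The schedule built above warms the learning rate up from 0.2 to 0.6 over the first 30% of training, then anneals it back to 0.2 with a cosine curve. The sketch below re-implements the same idea in plain Python, mirroring the names used by the course notebooks. It is an illustration written for these notes, not the course code, and it clamps pos=1.0 to the last phase.

```python
import math
from bisect import bisect_right
from itertools import accumulate

def sched_cos(start, end):
    # cosine interpolation from start (pos=0) to end (pos=1)
    return lambda pos: start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def combine_scheds(pcts, scheds):
    # pcts: fraction of training spent in each schedule, must sum to 1
    assert math.isclose(sum(pcts), 1.0)
    bounds = [0.0] + list(accumulate(pcts))
    def _inner(pos):
        idx = min(bisect_right(bounds, pos) - 1, len(scheds) - 1)
        actual_pos = (pos - bounds[idx]) / (bounds[idx + 1] - bounds[idx])
        return scheds[idx](actual_pos)
    return _inner

def cos_1cycle_anneal(start, high, end):
    return [sched_cos(start, high), sched_cos(high, end)]

lr_at = combine_scheds([0.3, 0.7], cos_1cycle_anneal(0.2, 0.6, 0.2))
print([round(lr_at(p), 3) for p in (0.0, 0.15, 0.3, 0.65, 1.0)])  # [0.2, 0.4, 0.6, 0.4, 0.2]
```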
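# Learner is in nb_09b.py: linear model, cross-entropy loss, lr, cbfs and opt; the optimizer is built inside Learner.fit
# ② Optimizer (nb_09b.py): plain SGD with weight decay (equivalent to L2 regularization for plain SGD)
learn = Learner(model=model, data=data, loss_func=loss_func, lr=lr, cb_funcs=cbfs)
# Extra callbacks can also be passed at fit time via cbs
# sgd step:          p = p - lr*p.grad
# weight_decay step: p = p * (1 - lr*wd)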
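For plain SGD, the weight-decay step above and L2 regularization give exactly the same update. A one-parameter check with arbitrary numbers:

```python
# One SGD step on a single scalar parameter p with loss gradient g (values are arbitrary)
p, g, lr, wd = 1.5, 0.4, 0.1, 1e-2

# (a) L2 regularization: add wd/2 * p**2 to the loss, whose gradient is wd * p
p_l2 = p - lr * (g + wd * p)

# (b) weight decay: shrink p by (1 - lr*wd), then take the plain SGD step
p_wd = p * (1 - lr * wd) - lr * g

print(abs(p_l2 - p_wd) < 1e-12)  # True
```

Two caveats follow from the algebra. The update() function at the top of these notes adds wd * sum(p**2) without the 1/2, so its effective decay is 2*wd. And the equivalence breaks for Adam, which is why the fastai summary earlier mentions "true weight decay".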
def append_stats(hook, mod, inp, outp):
    if not hasattr(hook, 'stats'): hook.stats = ([], [], [])
    means, stds, hists = hook.stats
    means.append(outp.data.mean().cpu())   # mean of the activations
    stds .append(outp.data.std().cpu())
    hists.append(outp.data.cpu().histc(40, 0, 10))  # histc isn't implemented on the GPU

def get_hist(h):
    return torch.stack(h.stats[2]).t().float().log1p()  # h.stats[2] holds the histograms

with Hooks(model, append_stats) as hooks:
    learn.fit(1)  # PyTorch default init + SGD
    fig, [ax0, ax1] = plt.subplots(1, 2, figsize=(10, 4))
    for h in hooks:
        ms, ss, hi = h.stats
        ax0.plot(ms), ax0.set_title("act_means", loc='center'), ax0.set_xlabel('batches')
        ax1.plot(ss), ax1.set_title("act_stds", loc='center'), ax1.set_xlabel('batches')
    ax0.legend(range(3))
    ax1.legend(range(3))

fig, axes = plt.subplots(2, 2, figsize=(15, 6))
for ax, h in zip(axes.flatten(), hooks[:3]):
    ax.imshow(get_hist(h), origin='lower'), ax.set_title("acts_hist", loc='center'), ax.set_xlabel('activations')
    ax.axis('off')
plt.tight_layout()

def get_min(h):
    # Share of each batch's activations that fall in the first two histogram bins
    h1 = torch.stack(h.stats[2]).t().float()
    return h1[:2].sum(0) / h1.sum(0)

fig, axes = plt.subplots(2, 2, figsize=(15, 6))
for ax, h in zip(axes.flatten(), hooks[:3]):
    ax.plot(get_min(h)), ax.set_title("hist[:2] zero ratio", loc='center'), ax.set_xlabel('batches')
    ax.set_ylim(0, 1)
plt.tight_layout()
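get_min measures, per batch, the share of activations that land in the first two of the 40 histogram bins, i.e. values in [0, 0.5). For ReLU outputs this is dominated by dead (exactly zero) activations. A NumPy sketch of the same computation on synthetic post-ReLU activations (the helper name is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_ratio(acts, bins=40, lo=0.0, hi=10.0):
    # Same binning as outp.histc(40, 0, 10): 40 bins of width 0.25 over [0, 10].
    # Values outside [lo, hi] are dropped by both histc and np.histogram.
    counts, _ = np.histogram(acts, bins=bins, range=(lo, hi))
    return counts[:2].sum() / counts.sum()

pre = rng.normal(0.0, 1.0, size=100_000)   # fake pre-activations
acts = np.maximum(pre, 0.0)                # ReLU: about half become exactly zero
r = zero_ratio(acts)
print(r)  # about 0.69: the zeros (0.5) plus values in (0, 0.5) (about 0.19)
```

A ratio close to 1 in the plots above would mean most units in that layer are doing nothing for that batch.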
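① The linear model has to be written by hand. Learner is in nb_09b.py, and the optimizer object is only built when fit is called.
② If the optimizer is plain SGD, that is the default, so it can be left out.
③ If CUDA fails to start, the machine needs a reboot.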
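2. Imagenette dataset debugging notes
PyTorch error: ValueError: num_samples should be a positive integer value, but got num_samp=0
Cause: the DataLoader received an empty dataset (0 samples). Here the reason was that the path contained underscores, which this setup did not support.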
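Whatever the root cause, the error only says the dataset ended up with zero items, so the sampler has nothing to draw from. Checking that the file listing is non-empty before building the DataLoader makes the failure obvious. A hypothetical helper (not part of fastai or PyTorch) using only the standard library, with the accepted extensions as an assumption:

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def list_images_or_fail(folder):
    """Hypothetical helper: list image files under folder, failing loudly if none are found."""
    folder = Path(folder)
    files = sorted(p for p in folder.rglob("*") if p.suffix.lower() in IMAGE_EXTS)
    if not files:
        raise FileNotFoundError(f"no image files found under {folder}; check the path before building a DataLoader")
    return files

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "n01440764").mkdir()
    (Path(tmp) / "n01440764" / "img_1.JPEG").touch()
    n_found = len(list_images_or_fail(tmp))
    print(n_found)  # 1

with tempfile.TemporaryDirectory() as empty:
    try:
        list_images_or_fail(empty)
        empty_raised = False
    except FileNotFoundError:
        empty_raised = True
print(empty_raised)  # True
```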
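PyTorch debugging essentials
Code repository: Dive-into-DL-PyTorch/2.2_tensor.md at master · ShusenTang/Dive-into-DL-PyTorch · GitHub
This project ports the MXNet code in Mu Li's "Dive into Deep Learning" (动手学深度学习) to PyTorch. It is aimed at anyone interested in deep learning, especially those who want to use PyTorch for it. No background in deep learning or machine learning is required; you only need basic math and programming, such as basic linear algebra, calculus and probability, plus basic Python.
The table of contents is as follows:
1. Using Tensor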