[Untitled]

Latest recommended article published on 2024-04-30 18:48:42

浅浅浅焰

Article tags: python, deep learning, pytorch

Article link: https://blog.csdn.net/qq_43437435/article/details/128835175

Reproducing the code for Teaching Where to Look

1. After downloading the dataset, extract the images from the bin files.
Run the following command. It requires mxnet (pip install mxnet-cpu failed to install for me):

python utility/load_images_from_bin.py --data_type evaluation --data_dir $FACE_DIR

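error1:

AttributeError: module 'numpy' has no attribute 'bool'

Cause: the installed mxnet and numpy versions are incompatible. NumPy 1.24 removed the deprecated np.bool alias, which this mxnet release still references, so pin both packages to versions that work together: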

pip3 install mxnet-mkl==1.6.0 numpy==1.23.1
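
The version rule behind this error can be sketched as a small check to run before the extraction script. This is only an illustration: the helper name numpy_lacks_bool_alias is hypothetical, and it assumes a version string that starts with major.minor. It relies on np.bool having been removed in NumPy 1.24 and being available again (as the NumPy scalar type) from 2.0 on.

```python
def numpy_lacks_bool_alias(version: str) -> bool:
    """Return True if this NumPy version has no `np.bool` attribute.

    Hypothetical helper: `np.bool` was removed in 1.24 and only came back
    in the 2.x series, so the affected range is 1.24 up to (not including) 2.0.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return major == 1 and minor >= 24


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        print("numpy is not installed")
    else:
        if numpy_lacks_bool_alias(np.__version__):
            print(f"numpy {np.__version__}: np.bool is missing, old mxnet will fail")
        else:
            print(f"numpy {np.__version__}: np.bool is available")
```

With the pinned numpy==1.23.1 from the command above, the check reports that the alias is still available.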

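error2: PermissionError: [Errno 13] Permission denied: '/data'
Possible causes:
a. File permissions (these turned out to be fine).
b. The dataset path. The default --data_dir points under /data; changing the default on line 70 to a directory I own fixed the error.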

    # parser.add_argument('--data_dir', type=str, default='/data/sung/dataset/Face')
    parser.add_argument('--data_dir', type=str, default='/home/xxx/KD-LR/teaching-where-to-look-main/face_dataset')
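
An alternative to editing the hard-coded default is to read it from the environment. The sketch below is an assumption on my part, not what the repository does: build_parser is a hypothetical helper, and reusing the FACE_DIR variable from the shell commands, with ~/face_dataset as the fallback, is my own choice.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --data_dir default is not tied to one machine."""
    parser = argparse.ArgumentParser()
    # Assumption: fall back to ~/face_dataset when FACE_DIR is not exported.
    fallback = os.path.join(os.path.expanduser("~"), "face_dataset")
    parser.add_argument(
        "--data_dir",
        type=str,
        default=os.environ.get("FACE_DIR", fallback),
        help="root directory of the face dataset",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Using dataset directory:", args.data_dir)
```

Passing --data_dir explicitly still overrides both the environment variable and the fallback.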

2. Training the teacher network:

python train_teacher.py --save_dir $CHECKPOINT_DIR --down_size $DOWN_SIZE --total_iters $TOTAL_ITERS \
                        --batch_size $BATCH_SIZE --gpus $GPU_ID --data_dir $FACE_DIR

error1: RuntimeError: CUDA error: no kernel image is available for execution on the device. NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
Cause: the installed CUDA/PyTorch build does not match the GPU's compute capability.
My CUDA version is 11.3, so I downgraded PyTorch to a build compiled for it. The following explanation is quoted from another post:
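“
With a recent GPU (for example the NVIDIA GeForce RTX 3090), the architecture may be too new for an older PyTorch build to support. That produces the "capability sm_86 is not compatible" error, and the message "The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75" lists the only architectures the installed build supports.

Solution
The usual fix is to upgrade PyTorch, since newer releases add support for newer GPU architectures. Sometimes, though, the problem persists even after upgrading to 1.10.0. PyTorch 1.7.1 already supports the 3090, so the remaining cause is most likely the CUDA version. The 3090 generally needs CUDA 11+, while a plain pip install of PyTorch may pull in a CUDA 10.2 build. Upgrading PyTorch alone is therefore not enough; you also need to install the PyTorch build that matches your CUDA version.
”
----------------
Copyright notice: this is an original article by CSDN blogger "a563562675", licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/a563562675/article/details/121656894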

pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

Solved!
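
To confirm the mismatch before (or after) reinstalling, the GPU's compute capability can be compared with the architectures the installed build was compiled for. The sketch below is a simplified heuristic, not PyTorch's own check: build_supports_device is a hypothetical helper that treats a build as usable when it contains an sm_ entry with the same major version and a minor version no higher than the device's. It ignores PTX (compute_XX) entries.

```python
def build_supports_device(capability, arch_list):
    """Heuristic: does any compiled sm_XY target match this device?

    capability: (major, minor), e.g. (8, 6) for an RTX 3090.
    arch_list: strings such as ["sm_70", "sm_80"].
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as "compute_80"
        digits = "".join(ch for ch in arch[3:] if ch.isdigit())
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", capability)
        print("compiled architectures:", archs)
        print("supported:", build_supports_device(capability, archs))
    else:
        print("PyTorch with a visible CUDA device is required for the live check")
```

For the error above, the device reports (8, 6) while the build only lists sm_37, sm_50, sm_60 and sm_70, so the heuristic returns False.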

warning1: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

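Changing the call order as the warning describes fixes it. In train_teacher.py, around line 76, the commented-out code marks the original position.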
https://blog.csdn.net/qq_43437435/article/details/128371541?spm=1001.2014.3001.5502

    for epoch in range(1, 10000):

        #scheduler.step()

        # Train model
        HR_Net.train()
        optimizer.zero_grad()

        for data in tqdm(trainloader):
            img, label = data[0].to(device), data[1].to(device)

            # Forward HR network
            HR_logits = HR_Net(img)
            HR_out = HR_Margin(HR_logits, label)

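            # Loss calculation
            loss_ce = criterion_ce(HR_out, label)

            # Backward: clear stale gradients first, then backpropagate and update.
            # Calling zero_grad() between backward() and step() would wipe the
            # gradients before the optimizer can use them.
            optimizer.zero_grad()
            loss_ce.backward()
            optimizer.step()

            # Step the scheduler only after optimizer.step(). Note that here it
            # advances once per iteration, not once per epoch.
            scheduler.step()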
            # Iters
            total_iters += 1
            if total_iters % 100 == 0:
                _, predict = torch.max(HR_out.data, 1)
                total = label.size(0)
                correct = (np.array(predict.cpu()) == np.array(label.data.cpu())).sum()
                print("Iters: {:0>6d}/[{:0>2d}], loss: {:.4f}, train_accuracy: {:.4f}, learning rate: {}".format(total_iters, epoch, loss_ce.item(), 100*correct/total, scheduler.get_lr()[0]))

warning2: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
On line 107, change get_lr() to get_last_lr().
No other problems.
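
Both warnings can be reproduced and avoided in a few lines. The sketch below is a toy setup, not the teacher network: the nn.Linear model, the random data, the MultiStepLR milestones and the helper name lr_per_iteration are all placeholders of mine. It only shows the order zero_grad(), backward(), optimizer.step(), scheduler.step(), with get_last_lr() used to read the learning rate.

```python
import torch
from torch import nn, optim


def lr_per_iteration(num_iters: int = 5):
    """Train a toy model and record the learning rate used at each iteration."""
    torch.manual_seed(0)
    model = nn.Linear(4, 2)  # placeholder for the real network
    optimizer = optim.SGD(model.parameters(), lr=0.1)
    # Placeholder schedule: multiply the LR by 0.1 after 2 and 4 scheduler steps.
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 4], gamma=0.1)
    criterion = nn.CrossEntropyLoss()

    used_lrs = []
    for _ in range(num_iters):
        inputs = torch.randn(8, 4)
        labels = torch.randint(0, 2, (8,))

        optimizer.zero_grad()   # 1. clear gradients from the previous step
        loss = criterion(model(inputs), labels)
        loss.backward()         # 2. compute new gradients
        optimizer.step()        # 3. update the weights
        used_lrs.append(scheduler.get_last_lr()[0])  # read LR without a warning
        scheduler.step()        # 4. advance the schedule last
    return used_lrs


if __name__ == "__main__":
    print(lr_per_iteration())
```

Because scheduler.step() runs after optimizer.step(), the first learning rate of the schedule is actually used instead of being skipped.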

3. About the hook in the student network
Line 91:

    # Add Hook
    target_layer = 'attention_target'
    LR_manager = attention_manager(LR_Net, multi_gpus, target_layer)  # defined in utility/hook.py; collects the attention map output by each hooked layer
    HR_manager = attention_manager(HR_Net, multi_gpus, target_layer)

utility/hook.py

import torch.nn as nn

class attention_manager(object):
    def __init__(self, model, multi_gpu, target_layer='attention_target'):        
        self.multi_gpu = multi_gpu
        self.target_layer = target_layer
        self.attention = []
        self.handler = []

        self.model = model
        
        if multi_gpu:
            self.register_hook(self.model.module)  # assumes the model is wrapped (e.g. nn.DataParallel), which exposes the real network as .module
        else:
            self.register_hook(self.model)
            

    def register_hook(self, model):
        def get_attention_features(_, inputs, outputs):
            self.attention.append(outputs)  # store the attention map output by each hooked layer

        
        for name, layer in model._modules.items():
            # print('model._modules.items', name)  # iterates over every child; each one has a name and a layer. The name is the attribute name used when the CBAMRESnet backbone defines its forward network (https://blog.csdn.net/hxxjxw/article/details/107734140)
            # Recursively register the hook on all of its child modules
            if isinstance(layer, nn.Sequential):
                self.register_hook(layer)  # descend into the container to look for the target layer
            else:
                if name == self.target_layer:
                    handle = layer.register_forward_hook(get_attention_features)  # the hook is called as hook(module, input, output) after the submodule's forward pass; it exposes the input and output tensors but can only modify the output, and is commonly used to export or modify convolutional feature maps
                    self.handler.append(handle)
                    
                else:
                    for name, layer2 in layer._modules.items():
                        if name == self.target_layer:
                            handle = layer2.register_forward_hook(get_attention_features)
                            self.handler.append(handle)
                
    
    def remove_hook(self):
        for handler in self.handler:
            handler.remove()
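
The mechanism used by attention_manager can be tried in isolation. The sketch below is a stand-in, not the repository's code: TinyNet and collect_attention are hypothetical names, and the Sigmoid layer called attention_target merely plays the role of the attention module in the real backbone. It uses named_modules() instead of walking _modules by hand, which finds the target at any nesting depth.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Toy network with a submodule named like the hook target."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.attention_target = nn.Sigmoid()  # stand-in for an attention module
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.attention_target(self.features(x)))


def collect_attention(model, target_name, x):
    """Run one forward pass and return the outputs of every layer named target_name."""
    captured, handles = [], []

    def hook(_module, _inputs, output):
        captured.append(output.detach())

    for name, module in model.named_modules():
        if name.split(".")[-1] == target_name:
            handles.append(module.register_forward_hook(hook))

    model(x)  # hooks fire during this forward pass
    for handle in handles:
        handle.remove()  # same idea as remove_hook() above
    model(x)  # nothing is recorded once the hooks are removed
    return captured


if __name__ == "__main__":
    maps = collect_attention(TinyNet(), "attention_target", torch.randn(3, 4))
    print(len(maps), tuple(maps[0].shape))
```

Removing the handles matters in a training loop: hooks left registered keep appending to the list on every forward pass.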