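Reproducing the code for teaching where to look
1. After downloading the dataset, extract the images from the .bin files.
Run the following (the README notes it requires mxnet, installed with pip install mxnet-cpu, which would not install for me):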
python utility/load_images_from_bin.py --data_type evaluation --data_dir $FACE_DIR
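If mxnet cannot be installed at all, the evaluation .bin files may be unpackable without it. The sketch below is not the repo's script. It assumes the files use the InsightFace evaluation layout, a pickled (bins, issame_list) tuple where each entry of bins is already an encoded image. The function name dump_bin_images, the .jpg extension and the file naming are hypothetical, and the folder layout the repo's evaluation code expects is not reproduced.

```python
import os
import pickle


def dump_bin_images(bin_path, out_dir):
    """Write each encoded image from an InsightFace-style .bin file to out_dir.

    Assumption: the file is a pickled (bins, issame_list) tuple and every
    element of bins holds the raw bytes of an encoded image (e.g. JPEG).
    """
    with open(bin_path, "rb") as f:
        # encoding="bytes" lets Python 3 read pickles written by Python 2
        bins, issame_list = pickle.load(f, encoding="bytes")
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, img_bytes in enumerate(bins):
        # The bytes are already an image file, so no decoding step is needed.
        # The .jpg extension is an assumption about the encoding.
        path = os.path.join(out_dir, f"{i:05d}.jpg")
        with open(path, "wb") as out:
            out.write(bytes(img_bytes))
        paths.append(path)
    return paths, list(issame_list)
```

Writing the bytes directly avoids a decode/re-encode round trip, so the saved files are identical to what the .bin stored.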
Error 1:
AttributeError: module 'numpy' has no attribute 'bool'
Cause: the mxnet and numpy versions do not match (np.bool was deprecated in NumPy 1.20 and removed in 1.24).
Fix:
pip3 install mxnet-mkl==1.6.0 numpy==1.23.1
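An alternative I did not use (hypothetical, not from the repo): instead of pinning numpy, re-create the removed alias before mxnet is imported. This assumes np.bool is the only incompatibility; other aliases removed in NumPy 1.24, such as np.int and np.float, would need the same treatment.

```python
import numpy as np

# Hypothetical workaround, not the fix used above: NumPy 1.24 removed the
# np.bool alias that old mxnet releases still reference. NumPy 1.20-1.23
# still have it (deprecated) and NumPy 2.x re-added it as an alias of
# np.bool_, so the assignment only runs on 1.24-1.26.
if not hasattr(np, "bool"):
    np.bool = bool

# import mxnet as mx  # import mxnet only after the shim has run
```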
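Error 2: PermissionError: [Errno 13] Permission denied: '/data'
Cause: a. file permissions (ruled out);
b. the data path: the default --data_dir points to the original author's /data/sung/... directory. Changing the default at line 70 fixed it: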
#parser.add_argument('--data_dir', type=str, default='/data/sung/dataset/Face')
parser.add_argument('--data_dir', type=str, default='/home/xxx/KD-LR/teaching-where-to-look-main/face_dataset')
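Assuming the script reads args.data_dir everywhere, passing --data_dir on the command line should have the same effect as editing the default. The sketch below (build_parser and nearest_writable_parent are hypothetical helpers) shows the override and a quick way to check whether the target location is writable, which is what the PermissionError is really about.

```python
import argparse
import os


def build_parser():
    # Mirrors the original argument at line 70, with the author's default path.
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", type=str, default="/data/sung/dataset/Face")
    return parser


def nearest_writable_parent(path):
    """Return the closest existing ancestor of path and whether it is writable."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        path = os.path.dirname(path)  # stops at "/" at the latest
    return path, os.access(path, os.W_OK)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.data_dir, nearest_writable_parent(args.data_dir))
```

For example, nearest_writable_parent("/data/sung/dataset/Face") on a machine without /data reports "/" and usually False for a normal user, which matches the error above.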
2. Train the teacher network:
python train_teacher.py --save_dir $CHECKPOINT_DIR --down_size $DOWN_SIZE --total_iters $TOTAL_ITERS \
--batch_size $BATCH_SIZE --gpus $GPU_ID --data_dir $FACE_DIR
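Error 1: RuntimeError: CUDA error: no kernel image is available for execution on the device, NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
Cause: the CUDA version or the PyTorch build does not match the GPU's compute capability (the RTX 3090 is sm_86, which is missing from the supported list).
My CUDA version is 11.3, so I downgraded PyTorch to a build compiled for CUDA 11.3. The explanation below is quoted from a CSDN post: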
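“When you use a fairly new GPU (e.g. an NVIDIA GeForce RTX 3090), its architecture may be too new for older PyTorch builds, which then fail with 'capability sm_86 is not compatible'. The output also lists the architectures the current PyTorch install supports, e.g. 'The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75'.
Solution
The most common fix is to upgrade PyTorch, since newer releases add support for newer GPU architectures. Sometimes, though, even upgrading to 1.10.0 does not help. PyTorch 1.7.1 already supports the 3090, so if the problem persists, the cause is most likely the CUDA version: the 3090 generally needs CUDA 11+, while a plain pip install may give you a PyTorch build for CUDA 10.2. Upgrading PyTorch alone is therefore not enough; you also need the PyTorch build for the matching CUDA version.”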
————————————————
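Copyright notice: this is an original article by CSDN blogger 「a563562675」, licensed under CC 4.0 BY-SA; reposts must include the original link and this notice.
Original link: https://blog.csdn.net/a563562675/article/details/121656894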
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Solved!
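A quick check before and after reinstalling is to compare the GPU's compute capability with the architectures the installed PyTorch was built for. The sketch assumes torch.cuda.get_arch_list() is available (it is in recent PyTorch releases); has_native_kernels is a hypothetical helper and only looks for native sm_XX entries.

```python
def has_native_kernels(capability, arch_list):
    """True if arch_list has native (SASS) code for this compute capability.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list: strings such as "sm_86" (native code) or "compute_80" (PTX).
    PTX entries can sometimes be JIT-compiled for newer GPUs, so False is a
    strong hint, not proof, that the build cannot run on the device.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
        print("device capability", cap, "build archs", archs)
        print("native kernels for this GPU:", has_native_kernels(cap, archs))
```

With the build from the error message (sm_37 to sm_70) and a 3090, the check reports False; a cu113 build should list sm_86.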
Warning 1:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
Changing the call order as the warning says is enough. In train_teacher.py, around line 76 (commented-out code marks the original positions):
https://blog.csdn.net/qq_43437435/article/details/128371541?spm=1001.2014.3001.5502
for epoch in range(1, 10000):
    # scheduler.step()  # original position: runs before any optimizer.step()
    # Train model
    HR_Net.train()
    for data in tqdm(trainloader):
        img, label = data[0].to(device), data[1].to(device)

        # Forward HR network
        HR_logits = HR_Net(img)
        HR_out = HR_Margin(HR_logits, label)

        # Loss calculation
        loss_ce = criterion_ce(HR_out, label)

        # Backward: clear old gradients BEFORE backward(). Calling zero_grad()
        # between backward() and step() would make step() see zero gradients.
        optimizer.zero_grad()
        loss_ce.backward()
        optimizer.step()

        # Iters
        total_iters += 1
        if total_iters % 100 == 0:
            _, predict = torch.max(HR_out.data, 1)
            total = label.size(0)
            correct = (np.array(predict.cpu()) == np.array(label.data.cpu())).sum()
            # get_last_lr() instead of get_lr(), see warning 2 below
            print("Iters: {:0>6d}/[{:0>2d}], loss: {:.4f}, train_accuracy: {:.4f}, learning rate: {}".format(total_iters, epoch, loss_ce.item(), 100*correct/total, scheduler.get_last_lr()[0]))

    # New position: still once per epoch (as originally), but after this epoch's optimizer.step() calls
    scheduler.step()
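The required order can be checked in isolation with a toy model. Everything here (train_toy, the Linear model, random data, the milestone at epoch 2) is made up for illustration; only the order zero_grad, forward, backward, optimizer.step, then scheduler.step once per epoch mirrors the fix.

```python
import warnings

import torch
from torch import nn, optim


def train_toy(num_epochs=3, batches_per_epoch=2):
    """Toy loop in the call order PyTorch >= 1.1 expects.

    Returns the learning rate used in each epoch.
    """
    torch.manual_seed(0)
    model = nn.Linear(4, 2)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    # Hypothetical schedule: decay by 10x once 2 epochs have finished.
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2], gamma=0.1)

    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])
        for _ in range(batches_per_epoch):
            x = torch.randn(8, 4)
            y = torch.randint(0, 2, (8,))
            optimizer.zero_grad()          # 1. clear old gradients
            loss = criterion(model(x), y)  # 2. forward
            loss.backward()                # 3. backward
            optimizer.step()               # 4. update weights
        scheduler.step()                   # 5. advance the schedule once per epoch
    return lrs


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(train_toy())
    print([str(w.message) for w in caught])
```

Whether the repo's milestones count epochs or iterations is not shown in this excerpt. Because the original called scheduler.step() once per epoch, the sketch keeps that frequency; calling it inside the batch loop would instead advance the schedule every iteration.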
Warning 2:
UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
Fix: at line 107, change get_lr() to get_last_lr().
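A minimal sketch of what get_last_lr() returns, using a hypothetical single-parameter optimizer and StepLR schedule (not the repo's): one value per parameter group, equal to the learning rate the optimizer will use next.

```python
import torch
from torch import optim

# Hypothetical optimizer and scheduler, only to show get_last_lr().
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = optim.SGD(params, lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

optimizer.step()   # no gradients yet, so SGD leaves the parameter unchanged
scheduler.step()   # halves the learning rate

last_lr = scheduler.get_last_lr()  # a list: one entry per param group
print(last_lr, optimizer.param_groups[0]["lr"])
```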
No other issues.
3. The hooks in the student network
Line 91 of the student training script:
# Add Hook
target_layer = 'attention_target'
LR_manager = attention_manager(LR_Net, multi_gpus, target_layer)  # defined in utility/hook.py; collects the attention map output by each target layer
HR_manager = attention_manager(HR_Net, multi_gpus, target_layer)
utility/hook.py:
import torch.nn as nn


class attention_manager(object):
    def __init__(self, model, multi_gpu, target_layer='attention_target'):
        self.multi_gpu = multi_gpu
        self.target_layer = target_layer
        self.attention = []
        self.handler = []
        self.model = model

        if multi_gpu:
            # Assumes nn.DataParallel, which keeps the real network in .module
            # (the previous self.module.model raised AttributeError)
            self.register_hook(self.model.module)
        else:
            self.register_hook(self.model)

    def register_hook(self, model):
        def get_attention_features(_, inputs, outputs):
            self.attention.append(outputs)  # store the attention map output by each hooked layer

        for name, layer in model._modules.items():
            # print('model._modules.items', name)
            # Iterates over every child: `name` is the attribute name given when the
            # backbone (CBAMResNet) defines its layers
            # (https://blog.csdn.net/hxxjxw/article/details/107734140)
            if isinstance(layer, nn.Sequential):
                # recurse so target layers inside Sequential containers also get hooks
                self.register_hook(layer)
            else:
                if name == self.target_layer:
                    # module.register_forward_hook(hook) calls hook(module, input, output)
                    # after every forward pass. It can read a submodule's input and output
                    # tensors but can only replace the output (by returning a new value);
                    # commonly used to export or modify feature maps.
                    handle = layer.register_forward_hook(get_attention_features)
                    self.handler.append(handle)
                else:
                    # look one level deeper, e.g. inside a residual block
                    for sub_name, layer2 in layer._modules.items():
                        if sub_name == self.target_layer:
                            handle = layer2.register_forward_hook(get_attention_features)
                            self.handler.append(handle)

    def remove_hook(self):
        for handler in self.handler:
            handler.remove()
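The forward-hook mechanism can be tried on a toy network. The sketch below is a simplified alternative, not the repo's code: Block and register_attention_hooks are hypothetical, and the Sigmoid named attention_target stands in for the real attention module. It uses named_modules(), which already walks the whole module tree, so unlike utility/hook.py it hooks matching layers at any depth; on a deeply nested backbone that could hook more layers than the original does.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Toy block whose submodule carries the repo's target name."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Hypothetical stand-in for the attention module that gets hooked.
        self.attention_target = nn.Sigmoid()

    def forward(self, x):
        y = self.conv(x)
        return y * self.attention_target(y)


def register_attention_hooks(model, target_name="attention_target"):
    """Hook every submodule whose attribute name equals target_name.

    Returns (list that collects outputs, list of hook handles).
    """
    if isinstance(model, nn.DataParallel):
        model = model.module  # hooks go on the wrapped network
    attention, handles = [], []

    def save_output(module, inputs, output):
        attention.append(output)

    for name, module in model.named_modules():
        if name.split(".")[-1] == target_name:
            handles.append(module.register_forward_hook(save_output))
    return attention, handles


model = nn.Sequential(Block(4), nn.Sequential(Block(4), Block(4)))
attention, handles = register_attention_hooks(model)
model(torch.randn(1, 4, 8, 8))
print(len(attention))  # 3: one map per hooked block
for h in handles:
    h.remove()  # later forward passes no longer append to `attention`
```

As in the repo, the collected list keeps growing with every forward pass, so it has to be cleared between batches.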