1. TensorBoard only displays a subset of the logged steps
[Solution] To stay fast, TensorBoard downsamples the logged data by default and only displays a sample of each series. Just add the --samples_per_plugin argument when launching TensorBoard.
Example:
tensorboard --logdir ./train2-weight[61336] --bind_all --samples_per_plugin=images=1000000000000000
To explain: --samples_per_plugin=images=N tells the images plugin to keep up to N samples per tag instead of downsampling them, so just pick N at least as large as the number of images you logged (any sufficiently large value works).
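For context, here is a minimal logging sketch with torch.utils.tensorboard that produces the kind of image series this flag controls (the log_dir, tag, tensor shape, and step count are all illustrative):

import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='./train2-weight')
for step in range(5000):
    img = torch.rand(3, 64, 64)  # placeholder CHW image standing in for a real prediction
    writer.add_image('val/prediction', img, global_step=step)  # one image per step, one tag
writer.close()

Without --samples_per_plugin, TensorBoard keeps only a small reservoir sample of those 5000 images per tag. Per tensorboard --help, a value of 0 (i.e. --samples_per_plugin=images=0) should keep all samples, which avoids hard-coding a huge number.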
2. A drawback of transforms.Resize() in torchvision
Straight to the source code (the torchvision docstring, excerpted):
class Resize(torch.nn.Module):
    """Resize the input image to the given size.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions

    Args:
        size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            smaller edge of the image will be matched to this number.
            i.e, if height > width, then image will be rescaled to
            (size * height / width, size).
            In torchscript mode size as single int is not supported, use a sequence of length 1: ``[size, ]``.
        interpolation (InterpolationMode): Desired interpolation enum defined by
            :class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
            If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` and
            ``InterpolationMode.BICUBIC`` are supported.
            For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
    """
    ...
As the docstring shows, Resize always interpolates when it rescales, bilinearly by default. For most pic2pic low-level tasks this has little effect on the result, but for segmentation, where the labels are strictly 0 or 1, it does matter: bilinear interpolation produces intermediate values along object boundaries, so masks should be resized with InterpolationMode.NEAREST instead.
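A minimal sketch of the effect on a toy binary mask (the sizes and values are illustrative):

import torch
from torchvision import transforms
from torchvision.transforms import InterpolationMode

mask = torch.zeros(1, 4, 4)  # toy binary segmentation mask, [C, H, W]
mask[:, 1:3, 1:3] = 1.0

up_bilinear = transforms.Resize((8, 8), interpolation=InterpolationMode.BILINEAR)
up_nearest = transforms.Resize((8, 8), interpolation=InterpolationMode.NEAREST)

print(up_bilinear(mask).unique())  # intermediate values between 0 and 1 appear at the boundary
print(up_nearest(mask).unique())   # tensor([0., 1.]) -- the mask stays binary

So if a pipeline resizes images and masks with the same default Resize, the masks silently stop being binary.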
3. A model trained with DataParallel on multiple GPUs gets worse metrics when evaluated on a single GPU
Problem description: training used the following multi-GPU code:
## Define the network and the multi-GPU setup
import os
import torch
import torch.nn as nn

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4'  # set before CUDA is initialized
device = torch.device('cuda')
device_ids = [0, 1, 2, 3, 4]
net = Model()  # Model, CE (the loss), dataloader and optimizer are assumed defined elsewhere
net = nn.DataParallel(net, device_ids=device_ids)
net.to(device)
## Training
for data in dataloader:
    out = net(data['input'].to(device))
    loss = CE(out, data['label'].to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
## Save the weights; note that DataParallel prefixes every key with 'module.'
torch.save(net.state_dict(), 'ckpt.pth')
The checkpoint was then loaded for testing with single-GPU logic:
## Load the model
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
device = torch.device('cuda')
net = Model()
net.load_state_dict(torch.load('ckpt.pth', map_location=device))
net.to(device)
## Inference
net.eval()
for data in test_loader:
    out = net(data['input'].to(device))
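Printing the checkpoint keys next to the bare model's keys makes the mismatch visible (a minimal sketch; the layer names in the comments are hypothetical):

state = torch.load('ckpt.pth', map_location='cpu')
print(list(state.keys())[:3])                 # e.g. ['module.conv1.weight', ...]
print(list(Model().state_dict().keys())[:3])  # e.g. ['conv1.weight', ...] -- no 'module.' prefix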
Loading a DataParallel checkpoint into a bare model this way is wrong: DataParallel wraps the model in a .module attribute, so every key in the saved state_dict carries a 'module.' prefix that the bare model's keys lack. Depending on how strict the load is, this either raises a key-mismatch error or silently loads nothing and leaves the network at its random initialization, which is exactly why the metrics drop. The fix is that a model trained with DataParallel must also be wrapped in DataParallel for inference, whether it runs on one GPU or several:
## Define the network and wrap it in DataParallel, even for single-GPU inference
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
device = torch.device('cuda')
device_ids = [0]
net = Model()
net = nn.DataParallel(net, device_ids=device_ids)  # keys now expect the 'module.' prefix
net.load_state_dict(torch.load('ckpt.pth', map_location=device))
net.to(device)
## Inference
net.eval()
for data in test_loader:
    out = net(data['input'].to(device))
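An equivalent alternative, for reference, is to strip the 'module.' prefix when loading (or to save net.module.state_dict() in the first place), so the checkpoint fits a bare model; a minimal sketch:

## Alternative: strip the 'module.' prefix so a bare model can load the weights
state = torch.load('ckpt.pth', map_location=device)
state = {(k[len('module.'):] if k.startswith('module.') else k): v for k, v in state.items()}
net = Model()
net.load_state_dict(state)
net.to(device).eval()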