PyTorch Study Notes - York1996's Blog - CSDN Blog

PyTorch Study Notes

Articles: 74 · Total views: 1,214,371 · Favorites: 1,887

Author: York1996

A lazy person who loves improving efficiency, simplifying workflows, and automating office work.

Possible cause of "Attempting to deserialize object on a CUDA device but torch.cuda.is_available()"

One possible cause: the machine has only two GPUs, but the specified GPU id is 4.

Original · 2023-01-10 17:34:08 · 680 views · 0 comments
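A minimal sketch of one way to avoid this error, assuming the checkpoint can be remapped at load time. The in-memory buffer stands in for a real checkpoint file, and `wanted_gpu` is a hypothetical name for the GPU id from the post.

```python
import io
import torch

# Stand-in for a checkpoint file; a real one would be a path on disk.
buffer = io.BytesIO()
torch.save({"weight": torch.ones(2, 2)}, buffer)
buffer.seek(0)

# Only request a GPU index that actually exists; otherwise fall back to the CPU.
wanted_gpu = 4  # hypothetical: the id from the post, for illustration only
if torch.cuda.is_available() and wanted_gpu < torch.cuda.device_count():
    device = torch.device(f"cuda:{wanted_gpu}")
else:
    device = torch.device("cpu")

# map_location remaps every stored tensor onto the chosen device.
state = torch.load(buffer, map_location=device)
print(state["weight"].device)
```

Checking `torch.cuda.device_count()` before building the device string catches an out-of-range id before `torch.load` is ever called.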
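Possible cause of "RuntimeError: Cannot insert a Tensor that requires grad as a constant."

How I tracked it down: the error message alone did not reveal the cause. On the way to lunch it occurred to me that I could simply test the model's submodules one by one. The input and output of each submodule are usually a single tensor, so this is easy to do. Once the module under test is small enough, the cause becomes easy to spot, because it differs visibly from the modules that work (the ones without a list). Where the problem appeared: code ported from PaddlePaddle to PyTorch, then the PyTorch model converted to ONNX. The PyTorch model ran fine, but the ONNX export failed. Fix: where the model stores layers in a list, replace the Python list (or []) with nn.ModuleList.

Original · 2022-08-20 14:01:05 · 2730 views · 1 comment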
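A small sketch of the difference between a plain Python list and nn.ModuleList. The class names `ListNet` and `ModuleListNet` are made up for illustration, and the sketch only shows parameter registration, not the ONNX export itself.

```python
import torch
import torch.nn as nn

class ListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain list: the layers are NOT registered as submodules.
        self.layers = [nn.Linear(4, 4) for _ in range(2)]

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

class ModuleListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer with the parent module.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

n_list = len(list(ListNet().parameters()))
n_modulelist = len(list(ModuleListNet().parameters()))
print(n_list, n_modulelist)
```

Both models run a forward pass, but only the second exposes its weights through `parameters()` and `state_dict()`, which is what optimizers, checkpointing, and exporters rely on.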
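Possible cause of a saved state_dict whose keys carry a "module" prefix

Possible cause: multi-GPU training. Solution: if you know a better one, for example saving without these extra prefixes in the first place, please leave a comment below. How I approached it: compare with the state_dict saved from single-GPU training, look for the pattern, and finally fix the keys with a dict comprehension...

Original · 2022-06-27 16:46:58 · 517 views · 0 comments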
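The dict-comprehension approach could look like the sketch below. The keys and integer values are placeholders, not taken from the post; a real state_dict maps the same kind of string keys to tensors.

```python
from collections import OrderedDict

# Placeholder for a state_dict saved from a model wrapped for multi-GPU training.
saved = OrderedDict([
    ("module.conv.weight", 1),
    ("module.conv.bias", 2),
])

prefix = "module."
# Strip the prefix only where it is present, and keep every other key as-is.
cleaned = {
    (key[len(prefix):] if key.startswith(prefix) else key): value
    for key, value in saved.items()
}
print(list(cleaned))
```

When the wrapper is nn.DataParallel, saving `model.module.state_dict()` instead of `model.state_dict()` avoids the prefix at save time.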
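Pitfalls I ran into when loading data in PyTorch

Pitfall 0: my number of classes was one short, yet training ran fine for the first 33 epochs and only then crashed with an error like "cuda device assert". After seeing this kind of error often enough, you know that the number of classes must be inconsistent with the network output or with the number of weights in the loss function. Why were the first 33 epochs fine? Because the data is randomly cropped, so the chance of hitting the error was low. It also shows that my classes are imbalanced. Pitfall 1: the code I was given took YUV data, converted it to float32, and then converted that data to RGB for augmentation. But after the conversion to float32 the values were still in the uint8 range, so...

Original · 2022-06-23 09:19:36 · 259 views · 0 comments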
Why a PyTorch model using nn.UpsamplingBilinear2d loses accuracy when ported to TDA4

The initializer of UpsamplingBilinear2d in PyTorch is: def __init__(self, size: Optional[_size_2_t] = None, scale_factor: Optional[_ratio_2_t] = None) -> None: super(UpsamplingBilinear2d, self).__init__(size, scale_factor, mode='bilinear', align_corn.

Original · 2022-05-30 20:11:01 · 1807 views · 0 comments
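A small sketch comparing the two bilinear conventions. In PyTorch, nn.UpsamplingBilinear2d is equivalent to bilinear interpolation with align_corners=True. The excerpt is cut off before it states what the target runtime does, so the accuracy loss coming from a mismatch in this setting is only an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[[[0.0, 1.0], [2.0, 3.0]]]])  # shape (N=1, C=1, H=2, W=2)

up_module = nn.UpsamplingBilinear2d(scale_factor=2)(x)
up_aligned = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=True)
up_unaligned = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

# The module matches align_corners=True and differs from align_corners=False.
same_as_aligned = torch.allclose(up_module, up_aligned)
same_as_unaligned = torch.allclose(up_module, up_unaligned)
print(same_as_aligned, same_as_unaligned)
```

If the deployment side only supports the align_corners=False convention, training with `F.interpolate(..., align_corners=False)` keeps the two sides consistent.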
PyTorch Dataset loading problem: a possible fix for "ValueError: At least one stride in the given numpy array is negative"

ValueError: Caught ValueError in DataLoader worker process 0. ValueError: At least one stride in the given numpy array is negative, and tensors with negative strides are not currently supported. (You can probably work around this by making a copy of you

Original · 2022-01-17 11:05:12 · 3261 views · 0 comments
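A minimal sketch of the copy-based workaround that the error message itself suggests. The horizontal flip is an assumed example of where a negative stride can come from; the excerpt does not say what produced it in the author's dataset.

```python
import numpy as np
import torch

image = np.arange(6, dtype=np.float32).reshape(2, 3)
flipped = image[:, ::-1]  # a view with a negative stride, e.g. a horizontal flip

try:
    torch.from_numpy(flipped)
    failed = False
except ValueError:
    failed = True  # negative strides are rejected

# Copying into a contiguous array gives positive strides again.
tensor = torch.from_numpy(np.ascontiguousarray(flipped))
print(failed, tensor.tolist())
```

`flipped.copy()` works as well; either call belongs in the Dataset's `__getitem__` right before the array is converted to a tensor.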
A few things to note when converting to Caffe after PyTorch quantization

AttributeError: 'NoneType' object has no attribute 'in_quant_part'
ValueError: Quantized operation(YOLOX::YOLOX/YOLOXHead[head]/2326) must be instance of "torch.nn.Module", please replace torch.cat with <class 'pytorch_nndct.nn.modules.functional.Cat

Original · 2021-12-29 11:08:21 · 666 views · 0 comments
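Possible cause of "AttributeError: 'Identity' object has no attribute 'running_mean'"

torch.nn.Identity() is a placeholder that returns its input unchanged. I hit this error because of Xilinx's quantization tool: a bn that used to sit at the same level as conv ended up one level below conv after quantization. For example, for a layer named A, it used to be A.conv and A.bn; now it is A.conv.bn...

Original · 2021-12-25 14:20:44 · 1086 views · 0 comments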
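Problems and lessons learned when converting PyTorch to Caffe

The conversion requires writing a Caffe proto. Fortunately I had written the PyTorch model based on the Caffe model, so the variable names were mostly the same, which made matching them up easier. When designing a network, it is best to draw the model as a flowchart first and only then start writing code. Extracting the model parameters from the pth file does not require the original network: once torch has loaded the pth file, it is already a dict. Assigning these parameters to Caffe requires the correspondence between Caffe and torch weights; then assign them one by one. When done, visualize the results and check whether they look identical. Finally, if there is a test set, test...

Original · 2021-12-13 15:54:55 · 1029 views · 0 comments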
ONNX errors

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: vuy_img for the following indices index: 0 Got: 1 Expected: 8 Please fix either the inputs or the model. Cause of the error...

Original · 2021-11-17 16:49:47 · 8911 views · 4 comments
Possible cause of the PyTorch-to-ONNX error "traced region did not have observable data dependence"

RuntimeError: output 1 (1[ CPULongType{} ]) of traced region did not have observable data dependence with trace inputs; this probably indicates your program cannot be understood by the tracer. The cause may be that forward in PyTorch should return a tensor rather than a list, dict, ...

Original · 2021-11-11 14:23:06 · 4073 views · 0 comments
Possible fix for "Default process group has not been initialized, please make sure to call init_process_group"

Change SyncBN to BN.

Original · 2021-11-05 14:45:13 · 3785 views · 3 comments
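One way to switch an already built model from SyncBN to BN is sketched below. `revert_sync_batchnorm` is a hypothetical helper written for this illustration, not a PyTorch API, and it assumes 4D image inputs so that nn.BatchNorm2d is the right replacement.

```python
import torch
import torch.nn as nn

def revert_sync_batchnorm(module):
    """Recursively replace nn.SyncBatchNorm layers with nn.BatchNorm2d (assumes 4D inputs)."""
    output = module
    if isinstance(module, nn.SyncBatchNorm):
        output = nn.BatchNorm2d(
            module.num_features,
            eps=module.eps,
            momentum=module.momentum,
            affine=module.affine,
            track_running_stats=module.track_running_stats,
        )
        # Both layer types share the same state_dict keys, so stats and weights carry over.
        output.load_state_dict(module.state_dict())
    for name, child in module.named_children():
        output.add_module(name, revert_sync_batchnorm(child))
    return output

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.SyncBatchNorm(8), nn.ReLU())
model = revert_sync_batchnorm(model)
out = model(torch.randn(2, 3, 8, 8))
print(type(model[1]).__name__, tuple(out.shape))
```

If the model definition is under your control, constructing it with nn.BatchNorm2d whenever no process group is initialized is the simpler route.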
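Possible fix for a NaN loss when computing the loss in half precision in PyTorch

When computing the loss, temporarily cast the half tensors to float. Here half means float16 and float means float32. I do not know why feeding half-precision outputs to the loss function fails to produce a correct loss; if you know, please leave a comment below...

Original · 2021-10-31 12:36:14 · 2004 views · 6 comments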
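A minimal sketch of the cast described above, using random stand-in values. The last two lines show one known limit of float16 (its largest finite value is 65504); whether overflow was the cause in the author's case is not known.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits_fp16 = torch.randn(4, 3).half()  # stand-in for a half-precision model output
target = torch.tensor([0, 2, 1, 1])

# Cast only for the loss computation; the model itself can stay in float16.
loss = F.cross_entropy(logits_fp16.float(), target)

# float16 cannot represent values above 65504, so they become inf.
overflowed = torch.tensor(70000.0).half()
print(loss.dtype, overflowed)
```

Intermediate values inside a loss (exponentials, sums over many elements) can exceed the float16 range even when the inputs are small, which is one reason losses are often computed in float32.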
Possible cause of "pointnet2_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol:"

The compiled PointNet module cannot be found. The error is: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/public/home/G19940018/3DGroup/Yaochun/PointRCNN/pointnet2_lib/pointnet2/fps.py", line 7, in <module> import pointnet2_cuda as p...

Original · 2021-02-25 09:19:28 · 2451 views · 1 comment
Possible cause of "cannot assign module before Module.__init__() call"

def __init__(self,cats): super(VGG6, self).__init__() The super call shown above was missing......

Original · 2020-05-08 22:18:11 · 992 views · 0 comments
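A small reproduction of the error and the fix. The class names `Broken` and `Fixed` and the layer sizes are made up for illustration.

```python
import torch.nn as nn

class Broken(nn.Module):
    def __init__(self, cats):
        # super().__init__() is missing, so nn.Module's internal registries do not exist yet.
        self.fc = nn.Linear(8, cats)

class Fixed(nn.Module):
    def __init__(self, cats):
        super().__init__()  # must run before any submodule is assigned
        self.fc = nn.Linear(8, cats)

try:
    Broken(10)
    message = ""
except AttributeError as err:
    message = str(err)

model = Fixed(10)
print(message)
```

The call to the parent initializer has to come before the first `self.<name> = <module>` assignment, not merely somewhere inside `__init__`.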
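Possible cause of a negative loss when using the cross-entropy loss binary_cross_entropy

binary_cross_entropy requires both input and target to be between 0 and 1, so check your inputs. You can use the commented-out line above to see whether the input range meets the requirement. Note: some loss functions enforce their input requirements by raising an error such as "cuda device triggered" when the values are out of range. Here, however, no error was raised; instead the loss was below 0 and kept decreasing. After 200 epochs of training, the loss had reached tens of thousands below zero...

Original · 2020-04-03 23:54:45 · 7777 views · 0 comments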
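A minimal sketch of such a range check. `assert_unit_range` is a hypothetical helper, and the values are made up; the post's own commented-out line is not shown in the excerpt.

```python
import torch
import torch.nn.functional as F

def assert_unit_range(t, name):
    """Raise if any element of t lies outside [0, 1]."""
    lo, hi = t.min().item(), t.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"{name} outside [0, 1]: min={lo}, max={hi}")

raw = torch.tensor([-2.0, 0.5, 3.0])
prob = torch.sigmoid(raw)               # squashes raw scores into (0, 1)
target = torch.tensor([0.0, 1.0, 1.0])  # e.g. masks scaled to 0..1, not 0..255

assert_unit_range(prob, "input")
assert_unit_range(target, "target")
loss = F.binary_cross_entropy(prob, target)
print(loss.item())
```

With both tensors inside [0, 1], every term of the binary cross-entropy is non-negative, so a negative loss points to values outside that range.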
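Possible solution for "ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found"

One possible solution. Everything ran fine at first, but after I installed a newer version of PyTorch it stopped working. Downgrading from 1.4 to 1.2 did not help; downgrading further to 1.10 finally made it work.

Original · 2020-03-25 14:13:26 · 2841 views · 0 comments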
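Why conda still downloads from the official site after a mirror is configured (PyTorch)

The standard syntax for installing software with conda is: $ conda install -c <channel> <software>. The conda command given on the PyTorch website includes the -c option, which means a download channel is already specified, so the mirror you configured is not used. Removing -c pytorch lets conda download from the mirror. The version you get this way is sometimes not the latest, but it works fine. be...

Original · 2020-02-15 15:11:10 · 3099 views · 1 comment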
Solution for "'dict' object has no attribute 'cuda'"

obj=obj.cuda() where obj is a dict, but the dict type does not support calling cuda() directly. The following moves every value of obj to CUDA: obj={key:obj[key].cuda() for key in obj}...

Original · 2019-11-20 15:56:40 · 14382 views · 5 comments
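The same dict comprehension, sketched with a device fallback so that it also runs on a machine without a GPU. The keys and tensor shapes are made up for illustration.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

obj = {"image": torch.zeros(1, 3, 4, 4), "label": torch.tensor([1])}

# Move each value individually; the dict itself has no .cuda() or .to() method.
obj = {key: value.to(device) for key, value in obj.items()}
print({key: value.device.type for key, value in obj.items()})
```

If some values are not tensors (strings, ints), they need to be skipped, for example with an `isinstance(value, torch.Tensor)` check inside the comprehension.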
Understanding what torch.nonzero does and how to use it

Look at the code: import torch; a=torch.randint(-1,2,(10,),dtype=torch.int); print(a); print(a.size()); print(torch.nonzero(a)); print(torch.nonzero(a).size()) The output is: tensor([ 0, -1, 1, 1, -1, 0, 1, -1, -1, -1...

Original · 2019-11-07 15:04:02 · 13198 views · 1 comment
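A deterministic variant of the example above, using a fixed tensor instead of torch.randint so that the result is reproducible.

```python
import torch

a = torch.tensor([0, -1, 1, 1, -1, 0])

idx = torch.nonzero(a)        # one row per non-zero element, one column per dimension
print(idx.tolist())           # [[1], [2], [3], [4]]
print(tuple(idx.size()))      # (4, 1)

# The indices can be used to gather the non-zero values themselves.
values = a[idx.squeeze(1)]
print(values.tolist())        # [-1, 1, 1, -1]
```

For an n-dimensional input the result has shape (number of non-zero elements, n), which is why a 1-D tensor yields a single column.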
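How to use permute in PyTorch

permute(dims) reorders the dimensions of a tensor. Parameters: a sequence of integers, each naming one of the original tensor's dimensions. A 3-D tensor, for example, has dimensions 0, 1, 2. Example: import torch; import numpy as np; a=np.array([[[1,2,3],[4,5,6]]]); unpermuted=torch.tensor(a); print(unpermuted.siz...

Original · 2018-08-20 20:07:08 · 152038 views · 11 comments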
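A short sketch of how permute changes shapes and indexing, using a small tensor of the same shape as the one in the example above.

```python
import torch

x = torch.arange(6).reshape(1, 2, 3)  # shape (1, 2, 3)

# New dim 0 is old dim 2, new dim 1 is old dim 0, new dim 2 is old dim 1.
y = x.permute(2, 0, 1)
print(tuple(y.shape))                 # (3, 1, 2)

# Element y[k, i, j] is the same element as x[i, j, k].
print(y[2, 0, 1].item(), x[0, 1, 2].item())
```

permute returns a view and does not copy data, so a `.contiguous()` call may be needed before operations such as `.view()` on the result.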
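Possible cause of "ascii codec cant decode byte 0x9e in position 1:ordinal in range(128)"

The error is the one quoted in the title. A possible cause is that the file loaded by torch.load() was generated with Python 2.*, while the current code runs on Python 3.*. I am not yet sure why the version difference causes this.

Original · 2018-12-03 17:04:21 · 710 views · 0 comments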
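One likely explanation is that Python 2 pickles store str objects as raw bytes, and the Python 3 unpickler decodes them as ASCII unless told otherwise. The sketch below reproduces this with the standard pickle module alone; the payload is a hand-written stand-in for data written by Python 2, not a real checkpoint.

```python
import pickle

# Hand-written protocol-0 pickle of the Python 2 byte string 'a\x9e'.
payload = b"S'a\\x9e'\n."

try:
    pickle.loads(payload)  # the default encoding for Python 2 strings is ASCII
    error = ""
except UnicodeDecodeError as err:
    error = str(err)

# latin1 maps every byte value to a character, so decoding cannot fail.
text = pickle.loads(payload, encoding="latin1")
print(error)
print(repr(text))
```

torch.load forwards extra keyword arguments to the unpickler, so `torch.load(path, encoding="latin1")` is the usual counterpart for old checkpoints; whether that combines with newer options such as `weights_only` depends on the PyTorch version and should be checked.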
A wrong way to understand STN and MLP in PointNet

At first I thought the code in the paper meant this: self.inputTransform=nn.Sequential( nn.Linear(point_num*3,64), nn.BatchNorm1d(64), nn.ReLU(inplace=True), nn.Linear(64, 128), nn.BatchNorm1d(128), nn.ReLU(...

Original · 2018-11-18 14:33:59 · 2156 views · 0 comments
Possible cause of "ValueError: expected 2D or 3D input (got 4D input)" when using BN

A possible cause is that you used BatchNorm1d where BatchNorm2d was needed. For BatchNorm1d, the input shape should be: Input: :math:`(N, C)` or :math:`(N, C, L)`. For BatchNorm2d, the input shape should be: Input: :math:`(N, C, H, W)`. Besides these, there is also BatchNorm3...

Original · 2018-11-18 15:27:10 · 20145 views · 2 comments
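A small reproduction with a made-up batch of feature maps, showing the failing layer and the matching one.

```python
import torch
import torch.nn as nn

images = torch.randn(2, 16, 8, 8)  # (N, C, H, W), e.g. the output of a conv layer

try:
    nn.BatchNorm1d(16)(images)     # expects (N, C) or (N, C, L)
    message = ""
except ValueError as err:
    message = str(err)

out = nn.BatchNorm2d(16)(images)   # matches the 4D layout
print(message, tuple(out.shape))
```

The `num_features` argument is the channel count C in both cases; only the number of trailing dimensions differs.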
How to silence "UserWarning: Implicit dimension choice for softmax has been deprecated."

UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument. input = module(input) The warning appears because calling softmax() without an explicit dim argument has been deprecated. The program still runs, but this usage is discouraged by PyTor...

Original · 2018-11-17 18:19:29 · 23808 views · 7 comments
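A minimal sketch of the call with an explicit dim, assuming the usual (batch, classes) layout so that the class dimension is dim=1.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0],
                       [0.0, 0.0, 0.0]])  # (batch=2, classes=3)

# dim=1 normalizes across the classes of each sample.
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))
```

The module form takes the same argument: `nn.Softmax(dim=1)`.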
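Possible cause of "one of the variables needed for gradient computation has been modified by an inplace operation"

Taken literally, the message says that some variables are needed to compute gradients but have already been modified by an in-place operation. On in-place operations, see: the meaning of in-place operation in PyTorch. My suggestion is to try removing some of the in-place operations or inplace=True arguments. Another possibility: nn.ReLU(inplace=True),nn.Dropout(p=0.7,inplace=True), under ReLU...

Original · 2018-11-17 17:31:00 · 1572 views · 0 comments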
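A small reproduction, chosen for illustration and not taken from the post: sigmoid keeps its output for the backward pass, so editing that output in place breaks the gradient computation.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)  # sigmoid saves its output to compute the gradient later
y.add_(1)             # in-place edit of that saved output

try:
    y.sum().backward()
    message = ""
except RuntimeError as err:
    message = str(err)

# Out-of-place version: sigmoid's output is left untouched.
x2 = torch.ones(3, requires_grad=True)
y2 = torch.sigmoid(x2) + 1
y2.sum().backward()
print(message)
print(x2.grad)
```

Not every in-place operation triggers the error; it only happens when the modified tensor is one that an earlier operation saved for its backward pass.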
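One possible cause of "multi-target not supported at" in PyTorch

When using the cross-entropy loss, the target should hold one class index per sample, that is, it should have only the batch-size dimension. If the target has shape [batchsize, 1], the error above appears. Try removing the extra dimension with squeeze(); if you do not know how to use squeeze, see my other post: unsqueeze and squeeze in PyTorch - york1996's blog - CSDN Blog ...

Original · 2018-11-17 16:09:16 · 15621 views · 12 comments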
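A minimal sketch of the squeeze fix with made-up logits and labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                   # (batch=4, classes=3)
target = torch.tensor([[0], [2], [1], [1]])  # shape (4, 1): one dimension too many

# squeeze(1) drops only the trailing size-1 dimension, giving shape (4,).
flat_target = target.squeeze(1)
loss = F.cross_entropy(logits, flat_target)
print(tuple(flat_target.shape), loss.item())
```

Passing the dimension explicitly, `squeeze(1)` and not a bare `squeeze()`, keeps the batch dimension intact when the batch size happens to be 1.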
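One possible cause of "torch.cuda.LongTensor but found type torch.cuda.FloatTensor for argument #2 'target'"

When using the cross-entropy loss, the target may need to be an integer type so that it can be turned into an index and then one-hot encoded. If you print the target tensor, you can see that every value is followed by a dot, such as 5., which indicates a floating-point value. In that case run target=target.long() to convert the type...

Original · 2018-11-17 16:02:43 · 5104 views · 0 comments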
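Simple steps for labeling faces in a TV series

https://www.bilibili.com/video/av39236953/ I wrote a program that labels faces in a TV series. The workflow: first, use OpenCV's face detection to crop the regions that may be faces every 10 frames and save them locally. The saved faces have no labels, and some non-faces may have been detected as faces, so first sort the faces by hand, using an annotation tool I wrote myself. After sorting, write a network that learns from the labeled data, and then read...

Original · 2018-12-12 22:53:19 · 1236 views · 0 comments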
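What to do when the PyTorch 1.0 download is very slow (one possible solution)

First, paste a URL such as https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-win_amd64.whl into a browser and check the download speed. If it is still slow, paste it into a phone browser, download it over mobile data, and then transfer it to the computer. If that does not work either, ask someone who has already downloaded it to send it to you...

Original · 2018-12-14 20:06:08 · 9745 views · 11 comments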
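Installing PyTorch 1.0 with pip on Windows 10

Bilibili video link: video link. PyTorch can be installed and used on various Windows distributions. Depending on your system and compute requirements, your experience with PyTorch on Windows may vary in terms of processing time. To take full advantage of PyTorch's CUDA support, it is recommended, but not required, that your Windows system has an NVIDIA GPU...

Original · 2018-12-21 14:02:52 · 2362 views · 0 comments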
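Possible cause of "TypeError: slice indices must be integers or None or have an __index__ method"

Slice indices must be integers. If a value is not an integer, convert the float to an integer with int() before using it.

Original · 2018-12-30 18:37:41 · 1443 views · 0 comments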
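A plain-Python sketch of a typical way to run into this: computing a split point as a fraction of the length. The 80/20 split is an assumed example, not from the post.

```python
samples = list(range(10))
split = len(samples) * 0.8  # a float, even though its value is whole

try:
    samples[:split]
    message = ""
except TypeError as err:
    message = str(err)

train = samples[:int(split)]  # convert before slicing
print(message)
print(train)
```

Integer division (`len(samples) * 8 // 10`) avoids the float entirely and sidesteps rounding surprises from `int()`.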
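Implementing clip by tensor in torch

TensorFlow's clip_by_value can clip not only by number but also by tensor. I could not find an equivalent in torch for now; there is only clamp, whose max and min must be numbers. So I implemented it myself. def clip_by_tensor(t,t_min,t_max): """ clip_by_tensor :param t: tensor :param t...

Original · 2019-04-21 16:35:43 · 14030 views · 1 comment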
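The author's implementation is cut off in the excerpt. The sketch below is one possible way to get the same effect with elementwise torch.min and torch.max, and is not a reconstruction of the original function body.

```python
import torch

def clip_by_tensor(t, t_min, t_max):
    """Clip t elementwise to [t_min, t_max], where both bounds are tensors."""
    # First cap each element at its upper bound, then raise it to its lower bound.
    return torch.max(torch.min(t, t_max), t_min)

t = torch.tensor([-2.0, 0.5, 3.0])
t_min = torch.tensor([-1.0, 0.0, 0.0])
t_max = torch.tensor([1.0, 1.0, 2.0])

clipped = clip_by_tensor(t, t_min, t_max)
print(clipped.tolist())  # [-1.0, 0.5, 2.0]
```

Recent PyTorch releases also document tensor-valued `min` and `max` for `torch.clamp`, so it is worth checking the version in use before adding a helper.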
Simple code for parallel training in PyTorch

if torch.cuda.device_count() > 1:  # run in parallel
    pretrained_model = nn.DataParallel(pretrained_model)
The first line checks the number of available GPUs.

Original · 2019-04-14 09:35:15 · 829 views · 0 comments
Possible cause of "cannot assign module before Module.__init__() call"

The parent class initialization is missing: super(VGG16, self).__init__(). After adding it: class VGG16(nn.Module): def __init__(self): super(VGG16, self).__init__() vgg = torchvision.models.vgg16(pretrained=True)......

Original · 2019-06-15 21:59:43 · 2620 views · 1 comment
Possible fix for "cuda runtime error (59) : device-side assert triggered at"

The actual number of classes does not match the number of classes the network outputs. nn.Linear(32,40) categories

Original · 2019-09-25 20:17:40 · 3793 views · 0 comments
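A minimal sketch of a label check that can run on the CPU before training starts. `labels_in_range` is a hypothetical helper, and the label values are made up; only the nn.Linear(32, 40) head comes from the post.

```python
import torch
import torch.nn as nn

head = nn.Linear(32, 40)             # the network outputs 40 classes: valid labels are 0..39
labels = torch.tensor([0, 17, 40])   # 40 is one past the last valid class index

def labels_in_range(labels, num_classes):
    """Return True only if every label is a valid class index."""
    return bool(((labels >= 0) & (labels < num_classes)).all())

ok = labels_in_range(labels, head.out_features)
print(ok)
```

Running one batch through the model on the CPU is another quick check, because the CPU path usually reports the offending index directly instead of a device-side assert.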
A simple way to initialize the Linear or Conv layers inside nn.Sequential with a custom tensor in PyTorch

Without further ado, here is the code; the comments in it explain everything. import torch.nn as nn; import torch; net= nn.Sequential( nn.Linear(1024, 512), nn.ReLU(inplace=True), nn.Linear(512, 256), nn.ReLU(inplace=True), nn.Linear(256, 6...

Original · 2018-11-19 11:04:52 · 3444 views · 0 comments
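The post's code is cut off in the excerpt, so the sketch below shows one common way to do this and may differ from the author's approach. The layer sizes and the tensor values are made up to keep the example small.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(inplace=True),
    nn.Linear(3, 2),
)

custom_weight = torch.full((3, 4), 0.5)  # must match net[0].weight's shape (out, in)
custom_bias = torch.zeros(3)

# Layers inside nn.Sequential can be indexed; copy_ under no_grad keeps them trainable.
with torch.no_grad():
    net[0].weight.copy_(custom_weight)
    net[0].bias.copy_(custom_bias)

print(net[0].weight)
```

`copy_` raises an error if the shapes cannot be matched, which catches mistakes such as passing a flattened tensor of the wrong length.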
The expanded size of the tensor (256) must match the existing size (81) at non-singleton dimension 1

#RuntimeError: The expanded size of the tensor (256) must match the existing size (81) at non-singleton dimension 1 I ran into this while writing the following code: self.inputFC[4].bias.data=torch.eye(3).view(-1) Another cause: impor...

Original · 2018-11-19 10:38:34 · 33909 views · 2 comments
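PyTorch: generating anime avatars with DCGAN

Anime avatar dataset download: anime avatar dataset (Baidu Cloud link). DCGAN paper download: https://arxiv.org/abs/1511.06434 The images in the dataset look like this: These are the main improvements in DCGAN: All the code follows. First module: import torch; import torch.nn as nn; import numpy as np; import torch.nn....

Original · 2018-09-20 10:15:40 · 7114 views · 14 comments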
Possible cause of "input value should be between 0~1"

RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.234535 at c:\new-builder_3\win-wheel\pytorch\aten\src\thnn\generic/BCECriterion.c:62 ...

Original · 2018-09-14 19:21:56 · 6480 views · 0 comments
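The excerpt is cut off before the author's explanation. One common cause, assumed here, is passing raw network outputs (such as the -0.234535 in the message) to the BCE loss without a sigmoid. The sketch shows two equivalent ways to handle raw scores; the values besides -0.234535 are made up.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([-0.234535, 1.5, -3.0])  # raw scores, not probabilities
target = torch.tensor([0.0, 1.0, 0.0])

# Option 1: squash to (0, 1) first, then use the plain BCE loss.
loss_sigmoid = F.binary_cross_entropy(torch.sigmoid(logits), target)

# Option 2: let the loss apply the sigmoid internally.
loss_logits = F.binary_cross_entropy_with_logits(logits, target)

print(loss_sigmoid.item(), loss_logits.item())
```

The second form is generally the more numerically stable of the two, because it never materializes probabilities that round to exactly 0 or 1.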
