Table of Contents
- ImportError
- ModuleNotFoundError
- TypeError
- RuntimeError
- 1.CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 2.00 GiB total capacity; 1
- 2.RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
- 3.RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_
- 4.RuntimeError: Sizes of tensors must match except in dimension 2. Got 28 and 27 (The offending index is 0)
- 5.RuntimeError: "log_cuda" not implemented for 'Long'
- Compilation issues
- OnnxError
- BrokenPipeError
- AssertionError
- AttributeError
- ValueError
- Other
- Error when computing model parameter count in PyTorch: size mismatch for stage2.0.branch2.3.weight: copying a param with shape torch.Size([58, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([58, 1, 5, 5]).
ImportError
1.ImportError: DLL load failed while importing win32api: The specified procedure could not be found.
Fix: uninstall pywin32, then install version 227:
pip uninstall pywin32
pip install pywin32==227 -i https://pypi.tuna.tsinghua.edu.cn/simple/
ModuleNotFoundError
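1.No module named 'pycocotools'
Upgrade to version 2.0.1: (link)
cocoapi build compatible with Windows 10: (code link)
After downloading, open cmd and change into the PythonAPI directory.
Run the following commands: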
# install pycocotools locally
python setup.py build_ext --inplace
# install pycocotools to the Python site-packages
python setup.py build_ext install
Once the installation finishes, import pycocotools succeeds.
2.ModuleNotFoundError: No module named 'copy_test'
Fix: change copy_test to copy.
TypeError
1.can't convert cuda:0 device type tensor to numpy
Cause:
NumPy cannot read a CUDA tensor directly. The tensor must first be moved to the CPU (and detached from the autograd graph) before it is converted to a NumPy array.
(loss / (iteration + 1)).numpy()
becomes
(loss / (iteration + 1)).cpu().detach().numpy()
2.Object of type 'ndarray' is not JSON serializable
The json module cannot serialize NumPy arrays. Convert them to str, or to plain Python lists with .tolist(), before dumping.
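As an alternative to converting every value by hand, a fallback function can be passed to json.dumps. The sketch below is only an illustration: to_jsonable is a hypothetical helper name, and it relies on the fact that NumPy arrays and NumPy scalars expose .tolist(). To keep the example runnable without NumPy, it uses the standard library's array.array, which offers the same method, as a stand-in.

```python
import json
from array import array  # stdlib stand-in so the sketch runs without NumPy


def to_jsonable(obj):
    """Fallback for json.dumps: turn array-like objects into plain Python data."""
    # np.ndarray and NumPy scalars expose .tolist(), as does array.array.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# json.dumps calls to_jsonable only for values it cannot encode by itself.
result = {"scores": array("d", [0.5, 0.25]), "label": "cat"}
encoded = json.dumps(result, default=to_jsonable)
print(encoded)
```

Unlike converting to str, this keeps the numbers as JSON numbers, so they can be read back without parsing strings.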
3.TypeError: src data type = 17 is not supported
Fix: convert the image with np.array before calling cvtColor.
img_np = np.array(img,np.uint8)
frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2RGB)
RuntimeError
1.CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 2.00 GiB total capacity; 1
Solution link: ---->
2.RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
Fix: try reducing the batch size.
3.RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_
Fix: the error concerns the label. Change loss = loss_function(out, label) to loss = loss_function(out, label.long())
4.RuntimeError: Sizes of tensors must match except in dimension 2. Got 28 and 27 (The offending index is 0)
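Cause: the tensor sizes do not match.
Fix: the problem was traced to the depthwise (DW) convolution, where the kernel size was 5 but the padding was 1, which violates (k-1)/2 = p. Setting the padding to 2 resolved it.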
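The (k-1)/2 = p rule follows from the standard convolution output-size formula. The sketch below is a plain-Python illustration with hypothetical helper names; it does not try to reproduce the exact 28 and 27 from the error message, which depend on the rest of the network.

```python
def conv_output_size(size, kernel, padding, stride=1, dilation=1):
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def same_padding(kernel):
    """Padding that preserves the size for stride 1 and an odd kernel."""
    return (kernel - 1) // 2


# A 5x5 kernel with padding 1 shrinks a 28-wide feature map,
# so it can no longer be concatenated with a 28-wide branch.
print(conv_output_size(28, kernel=5, padding=1))                # 26
print(conv_output_size(28, kernel=5, padding=same_padding(5)))  # 28
```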
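5.RuntimeError: "log_cuda" not implemented for 'Long'
Cause: log is not implemented for Long tensors. Why is the tensor Long? The NumPy array was created without specifying a dtype, so it defaulted to int64, and converting that array to a torch.Tensor produced a Long tensor.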
loss = torch.where(true == 1, (1 - pred) ** gamma * (torch.log(pred)), pred ** gamma * (torch.log(1 - pred)))
becomes
loss = torch.where(true == 1, (1 - pred) ** gamma * (torch.log(torch.from_numpy(pred))), pred ** gamma * (torch.log(torch.from_numpy(1 - pred))))
Compilation issues
1.Compiling detectron2 on Windows
Error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvcc.exe' failed with exit status 1
Fix: download https://github.com/conansherry/detectron2 and use it to replace the three files under /detectron2/layers/csrc/nms_rotated/. The build then succeeded.
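2.Error compiling PSROIAlign on Windows
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\nvcc.exe' failed with exit status 1
Cause: a data type mismatch prevents the file from compiling.
Fix: locate the .cu file from the path reported in the error output (highlighted in red) and modify it as follows: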
// line 275
dim3 grid(std::min(THCCeilDiv((long)output_size, 512L), 4096L));
// line 320
dim3 grid(std::min(THCCeilDiv((long)grad.numel(), 512L), 4096L));
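3.RuntimeError: view size is not compatible with input tensor's size and stride
Cause: view() requires the tensor's elements to be contiguous in memory, but a tensor can be non-contiguous.
Fix: call .contiguous() first so the data is laid out contiguously in memory, then call view().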
OnnxError
Unexpected input data type. Actual: (N11onnxruntime17PrimitiveDataTypeIdEE)
import cv2
import numpy as np

# input_shape and Yolov4_onnx are defined elsewhere in the original script.
def process_image(img):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    h, w = img.shape[:2]
    scale_x = float(input_shape[1]) / w
    scale_y = float(input_shape[0]) / h
    img = cv2.resize(img, None, None, fx=scale_x, fy=scale_y, interpolation=cv2.INTER_CUBIC)
    # Cast to float32 so the dtype matches what the ONNX model expects
    return np.expand_dims(img, axis=0).astype(np.float32)

def predict(image, shape):
    # image1=np.asarray(image)
    # print(image1)
    input_data = np.random.random((1, 416, 416, 3)).astype('float32')
    # print(input_data)
    # print('1',type(input_data))
    # print('2',type(image1.shape))
    Yolov4_onnx(input_data=image)
Fix: in the source code, change np.expand_dims(img, axis=0) to np.expand_dims(img, axis=0).astype(np.float32)
BrokenPipeError
1.Broken pipe
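- Cause: a problem with the DataLoader's parallel data-loading workers.
- Fix: set num_workers to 0, which disables the workers and loads the data in the main process.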
AssertionError
1.Results do not correspond to current coco set
AttributeError
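1.AttributeError: 'tuple' object has no attribute 'cuda'
Cause: target is a tuple, but .cuda() is only available on tensors.
Fix: convert tuple -> np.array -> tensor. The np.array step is needed as an intermediate, and its elements must be int or float (they were originally str), which .astype(int) takes care of.
Code: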
target=np.array(target).astype(int)
target=torch.from_numpy(target)
target = target.cuda()
ValueError
ValueError: matrix contains invalid numeric entries
Fix: in _lsap.py, add the following check to catch this error.
if np.any(np.isneginf(cost_matrix) | np.isnan(cost_matrix)):
    raise ValueError("matrix contains invalid numeric entries")
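The check above flags NaN and negative-infinity entries. Rather than editing the library file, one option is to inspect the cost matrix before passing it on. The sketch below is a plain-Python illustration on nested lists; the helper names and the fill value 1e6 are assumptions, and whether replacing an invalid cost is acceptable depends on how the costs are produced.

```python
import math


def is_invalid(value):
    """True for the entries the check rejects: NaN or negative infinity."""
    return math.isnan(value) or value == -math.inf


def find_invalid_entries(cost_matrix):
    """Return the (row, col) positions that hold NaN or -inf."""
    return [
        (i, j)
        for i, row in enumerate(cost_matrix)
        for j, value in enumerate(row)
        if is_invalid(value)
    ]


def replace_invalid_entries(cost_matrix, fill_value):
    """Return a copy with every NaN or -inf replaced by fill_value."""
    return [
        [fill_value if is_invalid(v) else v for v in row]
        for row in cost_matrix
    ]


costs = [[1.0, float("nan")], [float("-inf"), 4.0]]
bad = find_invalid_entries(costs)
cleaned = replace_invalid_entries(costs, fill_value=1e6)
print(bad)
print(cleaned)
```

Printing the offending positions first usually helps trace where the invalid values come from, for example a division by zero in a distance computation.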
Other
QObject::moveToThread: Current thread (0x562cd93b0cd0) is not the object’s thread (0x562cd9528330). Cannot move to target thread (0x562cd93b0cd0)
Fix: reinstall opencv-python at version 4.1.1.26:
pip uninstall opencv-python
pip install opencv-python==4.1.1.26
size mismatch for stage4.3.branch2.3.weight: copying a param with shape torch.Size([96, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 1, 5, 5]).
Cause: the parameters inside the stage modules differ between the checkpoint and the current model, so the stage modules must be dropped from the pretrained weights.
Fix:
if pretrain:
    url = model_urls['shufflenetv2_{}'.format(self.model_size)]
    if url is not None:
        pretrained_state_dict = model_zoo.load_url(url)
        print('=> loading pretrained model {}'.format(url))
        model_dict = self.state_dict()
        # Rebuild the pretrained weights without the mismatched layers (here, every key containing "stage")
        pretrained_dict = {k: v for k, v in pretrained_state_dict.items() if (k in model_dict and 'stage' not in k)}
        # Update the weights
        model_dict.update(pretrained_dict)
        self.load_state_dict(model_dict, strict=False)
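Filtering by layer name discards every stage weight, including those whose shapes still match. A possible refinement is to keep an entry whenever both its name and its shape agree with the current model. The sketch below illustrates only the filtering logic: Param is a hypothetical stand-in for a tensor (only its .shape attribute is used) so that the example runs without PyTorch, and filter_compatible is an assumed helper name.

```python
from collections import namedtuple

# Hypothetical stand-in for a tensor: only the .shape attribute is used.
Param = namedtuple("Param", ["shape"])


def filter_compatible(pretrained, current):
    """Keep checkpoint entries whose name and shape both match the model."""
    return {
        name: value
        for name, value in pretrained.items()
        if name in current and value.shape == current[name].shape
    }


checkpoint = {
    "conv1.weight": Param((24, 3, 3, 3)),
    "stage4.3.branch2.3.weight": Param((96, 1, 3, 3)),  # 3x3 kernel in checkpoint
    "fc.weight": Param((1000, 1024)),                   # not present in the model
}
model_state = {
    "conv1.weight": Param((24, 3, 3, 3)),
    "stage4.3.branch2.3.weight": Param((96, 1, 5, 5)),  # 5x5 kernel in the model
}

kept = filter_compatible(checkpoint, model_state)
print(sorted(kept))
```

With real state dicts the same comparison applies, since tensors expose .shape as well; the filtered dict would then be merged into model_dict as in the code above.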
CUDA_ERROR_OUT_OF_MEMORY: out of memory
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
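PyCharm fails to open
Error when computing model parameter count in PyTorch: size mismatch for stage2.0.branch2.3.weight: copying a param with shape torch.Size([58, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([58, 1, 5, 5]).
Fix: wrap the model in torch.nn.DataParallel when loading it, as in the code that follows, because the checkpoint was trained on multiple GPUs.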
model = shufflenet_v2_x1_0(num_classes=args.num_classes).to(device)
if device !='cpu':
    model = torch.nn.DataParallel(model)
    cudnn.benchmark = True
    model = model.cuda()
if args.weights != "":
    if os.path.exists(args.weights):
        weights_dict = torch.load(args.weights, map_location=device)
        model_dict = model.state_dict()
        #load_weights_dict = {k: v for k, v in weights_dict.items() if model.state_dict()[k].numel() == v.numel()}
        load_weights_dict = {k: v for k, v in weights_dict.items() if (k in model_dict and 'fc' not in k)}
        model_dict.update(load_weights_dict)
        print(model.load_state_dict(model_dict, strict=False))
    else:
        raise FileNotFoundError("not found weights file: {}".format(args.weights))
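torch.nn.DataParallel stores the wrapped model under .module, so the keys of a checkpoint saved from a wrapped model start with "module.". When the model should be loaded without DataParallel, for instance on a CPU-only machine, one alternative is to strip that prefix from the keys first. The sketch below shows only the key handling on a plain dict; strip_module_prefix is a hypothetical helper name.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Keys as they would appear in a checkpoint saved from a DataParallel model;
# the values are placeholders standing in for tensors.
saved = {
    "module.conv1.0.weight": "w0",
    "module.fc.bias": "b0",
    "epoch": 12,
}
cleaned_keys = strip_module_prefix(saved)
print(sorted(cleaned_keys))
```

Keys without the prefix pass through unchanged, so the helper is harmless on checkpoints saved from an unwrapped model.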