Example being run: UR5-Pick-and-Place-Simulation
Command to debug: rosrun vision lego-vision.py -show
Folks, basically, once every error in this post is fixed, this command runs~~
(TAMP) xjfeng@xjfeng:~/yolov5$ rosrun vision lego-vision.py -show
Loading model best.pt
YOLOv5 🚀 v7.0-305-g4456c953 Python-3.8.19 torch-2.3.0+cu121 CUDA:0 (NVIDIA GeForce GTX 1650, 3896MiB)
Fusing layers...
python3: relocation error: /home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_cnn_infer.so.8: symbol _ZN5cudnn24cublasLtMatmulDescCreateEPP26cublasLtMatmulDescOpaque_t19cublasComputeType_t14cudaDataType_t version libcudnn_ops_infer.so.8 not defined in file libcudnn_ops_infer.so.8 with link time reference
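The relocation error names two cuDNN libraries: libcudnn_cnn_infer.so.8, loaded from the nvidia/cudnn directory inside the TAMP env's site-packages, and libcudnn_ops_infer.so.8. One thing worth ruling out (an assumption, not something this post verified) is a second copy of these libraries on LD_LIBRARY_PATH being picked up instead of the one that ships with torch. The hypothetical helper `find_cudnn_libs()` below is a minimal sketch that lists every `libcudnn*.so*` file in a set of directories, so a library that shows up in more than one place stands out.

```python
import glob
import os
from typing import Dict, List


def find_cudnn_libs(search_dirs: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each libcudnn*.so* file name to the directories containing it."""
    found: Dict[str, List[str]] = {}
    # dict.fromkeys drops repeated directories while keeping their order.
    for d in dict.fromkeys(search_dirs):
        for path in sorted(glob.glob(os.path.join(d, "libcudnn*.so*"))):
            found.setdefault(os.path.basename(path), []).append(d)
    return found


if __name__ == "__main__":
    # Assumption: a stray cuDNN copy would most likely come from LD_LIBRARY_PATH.
    dirs = [p for p in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    for name, where in find_cudnn_libs(dirs).items():
        flag = "  <-- found in more than one directory" if len(where) > 1 else ""
        print(f"{name}: {where}{flag}")
```

You can also append the env's `site-packages/nvidia/cudnn/lib` directory to the list to compare it directly with whatever LD_LIBRARY_PATH contains.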
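One explanation is a cuDNN version mismatch. First, check which CUDA version torch was built with, the system nvcc version, and the CUDA version the driver reports: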
(TAMP) xjfeng@xjfeng:~$ python
Python 3.8.19 (default, Mar 20 2024, 19:58:24)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
12.1
>>> exit()
(TAMP) xjfeng@xjfeng:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
(TAMP) xjfeng@xjfeng:~$ nvidia-smi
Sun Apr 28 17:53:23 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| N/A 48C P8 2W / 50W | 629MiB / 4096MiB | 31% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
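To make the comparison explicit, here is a small sketch that pulls the version numbers out of the nvidia-smi and `nvcc -V` text above and compares torch's CUDA build (12.1) with the driver's reported CUDA version (12.0). The helper names are hypothetical, and the check is deliberately coarse: since CUDA 11, minor-version compatibility can let a 12.0 driver run some 12.1 builds, so a `True` result is a warning sign rather than proof. Also note that pip-installed torch wheels bring their own CUDA runtime libraries, so the system nvcc 10.2 is not what torch uses at runtime.

```python
import re
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """'12.1' -> (12, 1); extra components such as in '10.2.89' are ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def driver_cuda_version(nvidia_smi_text: str) -> Optional[str]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi output."""
    m = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_text)
    return m.group(1) if m else None


def nvcc_cuda_version(nvcc_text: str) -> Optional[str]:
    """Extract 'release X.Y' from `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_text)
    return m.group(1) if m else None


def torch_newer_than_driver(torch_cuda: str, driver_cuda: str) -> bool:
    """Coarse check: True if torch was built for a newer CUDA than the driver reports.
    Ignores CUDA minor-version compatibility, so True is a hint, not a diagnosis."""
    return major_minor(torch_cuda) > major_minor(driver_cuda)


if __name__ == "__main__":
    # Strings copied from the logs in this post.
    smi = "| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |"
    nvcc = "Cuda compilation tools, release 10.2, V10.2.89"
    torch_cuda = "12.1"  # output of print(torch.version.cuda)
    drv = driver_cuda_version(smi)
    print("driver CUDA:", drv, "| nvcc:", nvcc_cuda_version(nvcc), "| torch CUDA:", torch_cuda)
    print("torch newer than driver:", torch_newer_than_driver(torch_cuda, drv))
```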
Downgrade PyTorch to 1.8.0, as follows:
(TAMP) xjfeng@xjfeng:~$ pip install torch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0
This produces a new error:
(TAMP) xjfeng@xjfeng:~/yolov5$ rosrun vision lego-vision.py -show
Loading model best.pt
YOLOv5 🚀 v7.0-305-g4456c953 Python-3.8.19 torch-1.8.0 CUDA:0 (NVIDIA GeForce GTX 1650, 3896MiB)
Fusing layers...
Traceback (most recent call last):
File "/home/xjfeng/yolov5/hubconf.py", line 50, in _create
model = DetectMultiBackend(path, device=device, fuse=autoshape) # detection model
File "/home/xjfeng/yolov5/models/common.py", line 467, in __init__
model = attempt_load(weights if isinstance(weights, list) else w, device=device, inplace=True, fuse=fuse)
File "/home/xjfeng/yolov5/models/experimental.py", line 107, in attempt_load
model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, "fuse") else ckpt.eval()) # model in eval mode
File "/home/xjfeng/yolov5/models/yolo.py", line 192, in fuse
m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv
File "/home/xjfeng/yolov5/utils/torch_utils.py", line 286, in fuse_conv_and_bn
fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/xjfeng/yolov5/hubconf.py", line 65, in _create
model = attempt_load(path, device=device, fuse=False) # arbitrary model
File "/home/xjfeng/yolov5/models/experimental.py", line 99, in attempt_load
ckpt = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/nn/modules/module.py", line 673, in to
return self._apply(convert)
File "/home/xjfeng/yolov5/models/yolo.py", line 206, in _apply
self = super()._apply(fn)
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/nn/modules/module.py", line 409, in _apply
param_applied = fn(param)
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/nn/modules/module.py", line 671, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device function
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/xjfeng/catkin_ws/src/vision/scripts/lego-vision.py", line 475, in <module>
load_models()
File "/home/xjfeng/catkin_ws/src/vision/scripts/lego-vision.py", line 465, in load_models
model = torch.hub.load(path_yolo,'custom',path=weight, source='local')
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/hub.py", line 339, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages/torch/hub.py", line 368, in _load_local
model = entry(*args, **kwargs)
File "/home/xjfeng/yolov5/hubconf.py", line 88, in custom
return _create(path, autoshape=autoshape, verbose=_verbose, device=device)
File "/home/xjfeng/yolov5/hubconf.py", line 83, in _create
raise Exception(s) from e
Exception: CUDA error: invalid device function. Cache may be out of date, try `force_reload=True` or see https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading for help.
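Both errors in this traceback (CUBLAS_STATUS_INTERNAL_ERROR from `cublasCreate` inside `fuse_conv_and_bn`, then `invalid device function` while moving the model to the GPU) happen while loading the YOLOv5 model, before the node subscribes to anything. So it can be quicker to test a torch build outside ROS. The hypothetical `gpu_smoke_test()` below is a minimal sketch: it creates a small tensor on the GPU and calls `torch.mm`, the same call that failed in `fuse_conv_and_bn`, and returns the error text instead of crashing. Passing it is a rough sanity check, not a guarantee that YOLOv5 will load.

```python
def gpu_smoke_test(n: int = 64) -> str:
    """Run a tiny matmul on the GPU outside ROS/YOLOv5 and report the outcome as a string."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    try:
        a = torch.randn(n, n, device="cuda")  # allocates and launches a kernel on the GPU
        torch.mm(a, a)                         # needs a cuBLAS handle, like fuse_conv_and_bn
        torch.cuda.synchronize()               # surface asynchronous CUDA errors here
        return f"ok: torch {torch.__version__} (CUDA {torch.version.cuda}) on {torch.cuda.get_device_name(0)}"
    except RuntimeError as e:
        return f"failed: {e}"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

Run it with the TAMP env's python after each reinstall; a `failed: ...` line reproduces the problem without going through rosrun.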
Reports online say the CUDA 11.1 build of torch 1.8.0 (the +cu111 wheel) does not throw the error above, so install torch==1.8.0+cu111:
(TAMP) xjfeng@xjfeng:~/yolov5$ pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.0+cu111
Downloading https://download.pytorch.org/whl/cu111/torch-1.8.0%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 GB 1.3 MB/s eta 0:00:00
Collecting torchvision==0.9.0+cu111
Downloading https://download.pytorch.org/whl/cu111/torchvision-0.9.0%2Bcu111-cp38-cp38-linux_x86_64.whl (17.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.6/17.6 MB 251.7 kB/s eta 0:00:00
Requirement already satisfied: torchaudio==0.8.0 in /home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages (0.8.0)
Requirement already satisfied: typing-extensions in /home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages (from torch==1.8.0+cu111) (4.11.0)
Requirement already satisfied: numpy in /home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages (from torch==1.8.0+cu111) (1.24.4)
Requirement already satisfied: pillow>=4.1.1 in /home/xjfeng/anaconda3/envs/TAMP/lib/python3.8/site-packages (from torchvision==0.9.0+cu111) (10.3.0)
Installing collected packages: torch, torchvision
Attempting uninstall: torch
Found existing installation: torch 1.8.0
Uninstalling torch-1.8.0:
Successfully uninstalled torch-1.8.0
Attempting uninstall: torchvision
Found existing installation: torchvision 0.9.0
Uninstalling torchvision-0.9.0:
Successfully uninstalled torchvision-0.9.0
Successfully installed torch-1.8.0+cu111 torchvision-0.9.0+cu111
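Before going back to rosrun, it is worth confirming which build the TAMP env actually imports. The YOLOv5 banner already hints at it (torch-1.8.0 earlier, torch-1.8.0+cu111 after this install). Here's a small sketch, where `cuda_tag()` is a hypothetical helper, that reads the `+cuXXX` suffix from `torch.__version__` and also prints `torch.version.cuda`. A missing suffix does not tell you the CUDA version by itself, which is why `torch.version.cuda` is printed too.

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Return the '+cuXXX' local tag of a torch version string ('1.8.0+cu111' -> 'cu111'), else None."""
    m = re.search(r"\+(cu\d+)$", torch_version)
    return m.group(1) if m else None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch:", torch.__version__)
        print("wheel tag:", cuda_tag(torch.__version__))
        print("built with CUDA:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
```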
The install succeeds, but running rosrun vision lego-vision.py -show again raises yet another error:
(TAMP) xjfeng@xjfeng:~/yolov5$ rosrun vision lego-vision.py -show
Loading model best.pt
YOLOv5 🚀 v7.0-305-g4456c953 Python-3.8.19 torch-1.8.0+cu111 CUDA:0 (NVIDIA GeForce GTX 1650, 3896MiB)
Fusing layers...
Model summary: 213 layers, 7039792 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Loading model orientation.pt
YOLOv5 🚀 v7.0-305-g4456c953 Python-3.8.19 torch-1.8.0+cu111 CUDA:0 (NVIDIA GeForce GTX 1650, 3896MiB)
Fusing layers...
Model summary: 213 layers, 7018216 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Starting Node Vision 1.0
Subscribing to camera images
Localization is starting..
Traceback (most recent call last):
File "/home/xjfeng/catkin_ws/src/vision/scripts/lego-vision.py", line 477, in <module>
start_node()
File "/home/xjfeng/catkin_ws/src/vision/scripts/lego-vision.py", line 452, in start_node
syncro = message_filters.TimeSynchronizer([rgb, depth], 1, reset=True)
TypeError: __init__() got an unexpected keyword argument 'reset'
// i.e., the TimeSynchronizer constructor in the installed message_filters does not accept a 'reset' keyword argument
Try removing the extra argument:
// Open "/home/xjfeng/catkin_ws/src/vision/scripts/lego-vision.py" and go to line 452
// The original line is:
syncro = message_filters.TimeSynchronizer([rgb, depth], 1, reset=True)
// Delete reset=True:
syncro = message_filters.TimeSynchronizer([rgb, depth], 1)
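Deleting reset=True is the simplest fix. If the same script has to run on machines with different message_filters versions, one option is to pass `reset` only when the installed `TimeSynchronizer` declares it. A minimal sketch with hypothetical helper names (`accepts_kwarg`, `make_time_sync`); it only inspects the constructor signature and says nothing about what `reset` does in versions that support it.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """True if `func` (a function or class) declares a parameter called `name`."""
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):  # no introspectable signature: assume unsupported
        return False


def make_time_sync(sync_cls, subscribers, queue_size=1):
    """Hypothetical helper: build a synchronizer, passing reset=True only if sync_cls supports it."""
    if accepts_kwarg(sync_cls, "reset"):
        return sync_cls(subscribers, queue_size, reset=True)
    return sync_cls(subscribers, queue_size)


# Possible use at line 452 of lego-vision.py (requires ROS, not run here):
# syncro = make_time_sync(message_filters.TimeSynchronizer, [rgb, depth], 1)
```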
Run rosrun vision lego-vision.py -show in the terminal again, and it works!!
(TAMP) xjfeng@xjfeng:~/yolov5$ rosrun vision lego-vision.py -show
Loading model best.pt
YOLOv5 🚀 v7.0-305-g4456c953 Python-3.8.19 torch-1.8.0+cu111 CUDA:0 (NVIDIA GeForce GTX 1650, 3896MiB)
Fusing layers...
Model summary: 213 layers, 7039792 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Loading model orientation.pt
YOLOv5 🚀 v7.0-305-g4456c953 Python-3.8.19 torch-1.8.0+cu111 CUDA:0 (NVIDIA GeForce GTX 1650, 3896MiB)
Fusing layers...
Model summary: 213 layers, 7018216 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
Starting Node Vision 1.0
Subscribing to camera images
Localization is starting..
(Waiting for images..)