Troubleshooting: torchvision 0.3 + CUDA 10.0 + PyTorch 1.2 + Ubuntu 18.04 raises ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory
Running
from torchvision import transforms
produces the following error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-1519560278f3> in <module>
4 import matplotlib.pyplot as plt
5 import shutil
----> 6 from torchvision import transforms
7 from torchvision import models
8 import torch
~/venv/pytorch/lib/python3.6/site-packages/torchvision/__init__.py in <module>
----> 1 from torchvision import models
2 from torchvision import datasets
3 from torchvision import ops
4 from torchvision import transforms
5 from torchvision import utils
~/venv/pytorch/lib/python3.6/site-packages/torchvision/models/__init__.py in <module>
9 from .shufflenetv2 import *
10 from . import segmentation
---> 11 from . import detection
~/venv/pytorch/lib/python3.6/site-packages/torchvision/models/detection/__init__.py in <module>
----> 1 from .faster_rcnn import *
2 from .mask_rcnn import *
3 from .keypoint_rcnn import *
~/venv/pytorch/lib/python3.6/site-packages/torchvision/models/detection/faster_rcnn.py in <module>
5 import torch.nn.functional as F
6
----> 7 from torchvision.ops import misc as misc_nn_ops
8 from torchvision.ops import MultiScaleRoIAlign
9
~/venv/pytorch/lib/python3.6/site-packages/torchvision/ops/__init__.py in <module>
----> 1 from .boxes import nms, box_iou
2 from .roi_align import roi_align, RoIAlign
3 from .roi_pool import roi_pool, RoIPool
4 from .poolers import MultiScaleRoIAlign
5 from .feature_pyramid_network import FeaturePyramidNetwork
~/venv/pytorch/lib/python3.6/site-packages/torchvision/ops/boxes.py in <module>
1 import torch
----> 2 from torchvision import _C
3
4
5 def nms(boxes, scores, iou_threshold):
ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory
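The last line of the traceback is the useful part: it names the exact shared library, including the CUDA version, that the compiled torchvision extension (`_C`) was linked against. As a minimal sketch, the library name and version can be pulled out of such a message with a regular expression. The helper name `missing_shared_lib` is made up for this illustration and is not part of torchvision.

```python
import re

# Matches names such as "libcudart.so.9.0" and captures the stem and the version.
_SO_PATTERN = re.compile(r"(lib[\w+-]+)\.so\.(\d+(?:\.\d+)*)")


def missing_shared_lib(message):
    """Return (library, version) named in a loader error message, or None."""
    match = _SO_PATTERN.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


msg = ("ImportError: libcudart.so.9.0: cannot open shared object file: "
       "No such file or directory")
print(missing_shared_lib(msg))
```

Here the result points at the CUDA 9.0 runtime, although the machine has CUDA 10.0 installed.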
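Situation:
import torch
works fine.
import tensorflow as tf
also works fine. (I happened to hit the same torchvision error inside my TensorFlow virtual environment; it turns out I had installed PyTorch there as well.)
Both frameworks can run computations on the GPU without problems.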
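Since `import torch` succeeds, PyTorch itself can report which CUDA version its binary was built for. The sketch below collects that information; `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are existing PyTorch attributes, while the function name `torch_cuda_report` is only an illustration. The import is guarded so the snippet also runs where PyTorch is not installed.

```python
def torch_cuda_report():
    """Collect the CUDA details PyTorch reports, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA version the torch binary was built for; None on CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "gpu_available": torch.cuda.is_available(),
    }


print(torch_cuda_report())
```

If `built_for_cuda` shows a CUDA 10 version while the torchvision error asks for `libcudart.so.9.0`, the two packages were built against different CUDA versions.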
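Directly related material I found:
[1] libcudart.so.9.0: cannot open shared object file: No such file or directory (someone else hit the same problem, but nobody had answered yet)
[2] libcudart.so.9.0: cannot open shared object file: No such file or directory (almost the same situation; the fix there was to downgrade torchvision to 0.2.2, which works, but I did not want to downgrade and it did not look like the direct cause)
Cause:
The installed torchvision 0.3 binary is built for CUDA 9 (it looks for libcudart.so.9.0), not CUDA 10. Updating to torchvision 0.4 fixes it.
A plain upgrade also upgrades PyTorch, so install a pinned version instead, as shown below.
Solution: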
pip install torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
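The fix works because the torchvision version is pinned to the one that matches the installed PyTorch. A small sketch of that idea follows. The two pairings are the ones that appear in this post (PyTorch 1.2.0 with torchvision 0.4.0, and torchvision 0.4.1 pulling in torch 1.3.0); the table is not exhaustive, and the names `TORCHVISION_FOR_TORCH` and `pinned_install_command` are hypothetical.

```python
# torch -> torchvision pairings seen in this post only; extend as needed.
TORCHVISION_FOR_TORCH = {
    "1.2.0": "0.4.0",
    "1.3.0": "0.4.1",
}

WHEEL_INDEX = "https://download.pytorch.org/whl/torch_stable.html"


def pinned_install_command(torch_version):
    """Build a pip command that installs the matching torchvision, or None."""
    # Drop a local build suffix such as "+cu92" before looking up the version.
    base_version = torch_version.split("+")[0]
    torchvision_version = TORCHVISION_FOR_TORCH.get(base_version)
    if torchvision_version is None:
        return None
    return f"pip install torchvision=={torchvision_version} -f {WHEEL_INDEX}"


print(pinned_install_command("1.2.0"))
```

For PyTorch 1.2.0 this prints the same command as the one used in the solution.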
How I got there:
- Could the installed PyTorch 1.2 be incompatible with CUDA 10.0?
No. The failure is in torchvision, and I have seen even older PyTorch versions work fine.
- Could it be a bug in torchvision itself?
Try updating torchvision.
Run
pip install -U torchvision
(pytorch) nicken@lll:~$ pip install -U torchvision
Collecting torchvision
  Downloading https://files.pythonhosted.org/packages/fc/23/d418c9102d4054d19d57ccf0aca18b7c1c1f34cc0a136760b493f78ddb06/torchvision-0.4.1-cp36-cp36m-manylinux1_x86_64.whl (10.1MB)
     |████████████████████████████████| 10.1MB 270kB/s
Requirement already satisfied, skipping upgrade: six in ./venv/pytorch/lib/python3.6/site-packages (from torchvision) (1.12.0)
Collecting torch==1.3.0
  Downloading https://files.pythonhosted.org/packages/ae/05/50a05de5337f7a924bb8bd70c6936230642233e424d6a9747ef1cfbde353/torch-1.3.0-cp36-cp36m-manylinux1_x86_64.whl (773.1MB)
     |█████ | 121.6MB 8.2kB/s eta 22:11:25
ERROR: Exception:
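The upgrade wanted to update PyTorch as well, so I interrupted it.
Next I tried uninstalling 0.3 and reinstalling 0.3 explicitly:
pip uninstall torchvision
pip install torchvision==0.3.0
The problem remained.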
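- Could the CUDA environment be misconfigured?
I checked the CUDA setup following reference [1]; the check shows CUDA is fine.
cd /usr/local/cuda/samples/1_Utilities/deviceQuery  # path depends on your machine
sudo make
sudo ./deviceQuery
Check the versions:
cat /usr/local/cuda/version.txt
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
While checking cuDNN, I found that cuDNN had never been set up (oops).
Set up cuDNN (details omitted).
However, the original problem was still there afterwards.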
- Inspired by reference [2]
Run
sudo cp /usr/local/cuda/lib64/libcudart.so.10.0 /usr/local/lib/libcudart.so.10.0 && sudo ldconfig
sudo cp /usr/local/cuda/lib64/libcublas.so.10.0 /usr/local/lib/libcublas.so.10.0 && sudo ldconfig
sudo cp /usr/local/cuda/lib64/libcurand.so.10.0 /usr/local/lib/libcurand.so.10.0 && sudo ldconfig
However, the problem was not solved.
- Inspired by reference [3]
Run
sudo vim ~/.bashrc
Add
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
Reload the file:
source ~/.bashrc
Check that libcublas.so.10.0 exists under /usr/local/cuda-10.0/lib64, then run
sudo ldconfig /usr/local/cuda-10.0/lib64
However, the original problem was still not solved.
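- Inspired by reference [6], install cudatoolkit-10.0
It turns out pip cannot install cudatoolkit; it is only available through conda, so I set this aside for now.
- Consider reinstalling PyTorch
On the official site's previous-versions page, the conda commands for 1.2 all install cudatoolkit,
whereas the pip command for that release is:
pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
In the end I tried installing torchvision on its own:
pip install torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
Problem solved.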
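Several of the attempts above (copying libraries into /usr/local/lib, running ldconfig, setting LD_LIBRARY_PATH) try to make CUDA libraries visible to the dynamic loader. A quick way to check whether such a change had any effect is to ask the loader for a specific library name from Python. This is a minimal sketch using the standard-library `ctypes` module; the helper name `loader_can_find` is hypothetical. Changes to LD_LIBRARY_PATH are only picked up by newly started processes.

```python
import ctypes


def loader_can_find(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)  # raises OSError when the library cannot be loaded
    except OSError:
        return False
    return True


# The name torchvision 0.3 asked for, and the one shipped with CUDA 10.0.
for name in ("libcudart.so.9.0", "libcudart.so.10.0"):
    print(name, loader_can_find(name))
```

On a machine with only CUDA 10.0 installed, the 9.0 name should stay unloadable however the 10.0 paths are configured, which matches why none of the path fixes helped here.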
References:
[2] libcudart.so.8.0: cannot open shared object file: No such file or directory
[3] ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
[4] https://blog.csdn.net/weixin_33910460/article/details/91722002
[5] https://blog.csdn.net/zqun817/article/details/88750321
[6] [Installing pytorch1.0 + cuda10.1] Problem: ImportError: /usr/lib/libcudart.so.10.0: version ‘libcudart.so.10.0’ not…
[7] https://pytorch.org/get-started/previous-versions/
cuDNN:
[2] https://developer.nvidia.com/rdp/cudnn-download