CUDA AND NVIDIA-DRIVER INSTALL


CUDA Toolkit offline installers: https://developer.nvidia.com/cuda-toolkit-archive
CUDA and driver compatibility: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Choose a driver version that matches your local GPU model: https://www.nvidia.in/drivers/beta


Check the kernel version:
uname -a


List the driver packages available for installation (the pattern is quoted so the shell does not expand it):
apt list 'nvidia-driver*'

nvidia-driver-515-server/focal-security,focal-updates,focal-updates,focal-security,focal-security,focal-updates 515.86.01-0ubuntu0.20.04.2 amd64
nvidia-driver-515/focal-security,focal-updates,focal-updates,focal-security,focal-security,focal-updates,focal 515.86.01-0ubuntu0.20.04.1 amd64
nvidia-driver-520-open/focal-security,focal-updates,focal-updates,focal-security,focal-security,focal-updates 525.60.11-0ubuntu0.20.04.2 amd64
nvidia-driver-520/focal-security,focal-updates,focal-updates,focal-security,focal-security,focal-updates 525.60.11-0ubuntu0.20.04.2 amd64
nvidia-driver-525-open/focal-security,focal-updates,focal-updates,focal-security,focal-security,focal-updates 525.60.11-0ubuntu0.20.04.2 amd64
nvidia-driver-525/focal-security,focal-updates,focal-updates,focal-security,focal-security,focal-updates 525.60.11-0ubuntu0.20.04.2 amd64
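In the listing above, the package name and the packaged driver version do not always agree: nvidia-driver-520 carries version 525.60.11, which suggests it is a transitional package for the 525 branch. A minimal Python sketch for spotting this follows. The function names and the shortened sample lines are illustrative, and the parsing assumes the `name/suites version arch` layout shown above.

```python
import re


def parse_apt_line(line):
    """Split one `apt list` line into (package_name, version).

    Assumes the layout `name/suite[,suite...] version arch`;
    returns None for lines that do not fit it.
    """
    fields = line.split()
    if len(fields) < 3 or "/" not in fields[0]:
        return None
    name = fields[0].split("/", 1)[0]
    return name, fields[1]


def driver_branch(version):
    """Return the major branch of a driver version, e.g. '525.60.11' -> 525."""
    return int(version.split(".", 1)[0])


# Shortened copies of two lines from the listing above.
sample = [
    "nvidia-driver-515/focal-security,focal-updates 515.86.01-0ubuntu0.20.04.1 amd64",
    "nvidia-driver-520/focal-security,focal-updates 525.60.11-0ubuntu0.20.04.2 amd64",
]

for line in sample:
    name, version = parse_apt_line(line)
    named = int(re.match(r"nvidia-driver-(\d+)", name).group(1))
    actual = driver_branch(version)
    note = "" if named == actual else "  <- name and version differ"
    print(f"{name}: {version}{note}")
```

This explains why installing nvidia-driver-520 in the next step pulls in the 525 packages.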


root@xufeng8REN9000:/usr/local# sudo apt-get install nvidia-driver-520
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libatomic1:i386 libbsd0:i386 libdrm-amdgpu1:i386 libdrm-intel1:i386 libdrm-nouveau2:i386 libdrm-radeon1:i386 libdrm2:i386 libedit2:i386
libelf1:i386 libexpat1:i386 libffi7:i386 libgl1:i386 libgl1-mesa-dri:i386 libglapi-mesa:i386 libglvnd0:i386 libglx-mesa0:i386 libglx0:i386
libllvm12:i386 libnvidia-cfg1-525 libnvidia-common-525 libnvidia-compute-525 libnvidia-compute-525:i386 libnvidia-decode-525
libnvidia-decode-525:i386 libnvidia-encode-525 libnvidia-encode-525:i386 libnvidia-extra-525 libnvidia-fbc1-525 libnvidia-fbc1-525:i386
libnvidia-gl-525 libnvidia-gl-525:i386 libpciaccess0:i386 libsensors5:i386 libstdc++6:i386 libvdpau1 libvulkan1:i386 libwayland-client0:i386
libx11-6:i386 libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386 libxcb-glx0:i386 libxcb-present0:i386 libxcb-randr0:i386
libxcb-shm0:i386 libxcb-sync1:i386 libxcb-xfixes0:i386 libxcb1:i386 libxdmcp6:i386 libxext6:i386 libxfixes3:i386 libxnvctrl0 libxshmfence1:i386
libxxf86vm1:i386 mesa-vdpau-drivers mesa-vulkan-drivers:i386 nvidia-compute-utils-525 nvidia-dkms-525 nvidia-driver-525 nvidia-kernel-common-525
nvidia-kernel-source-525 nvidia-prime nvidia-settings nvidia-utils-525 screen-resolution-extra vdpau-driver-all xserver-xorg-video-nvidia-525
Suggested packages:
lm-sensors:i386 libvdpau-va-gl1 nvidia-vdpau-driver nvidia-legacy-340xx-vdpau-driver nvidia-legacy-304xx-vdpau-driver
The following NEW packages will be installed:
libatomic1:i386 libbsd0:i386 libdrm-amdgpu1:i386 libdrm-intel1:i386 libdrm-nouveau2:i386 libdrm-radeon1:i386 libdrm2:i386 libedit2:i386
libelf1:i386 libexpat1:i386 libffi7:i386 libgl1:i386 libgl1-mesa-dri:i386 libglapi-mesa:i386 libglvnd0:i386 libglx-mesa0:i386 libglx0:i386
libllvm12:i386 libnvidia-cfg1-525 libnvidia-common-525 libnvidia-compute-525 libnvidia-compute-525:i386 libnvidia-decode-525
libnvidia-decode-525:i386 libnvidia-encode-525 libnvidia-encode-525:i386 libnvidia-extra-525 libnvidia-fbc1-525 libnvidia-fbc1-525:i386
libnvidia-gl-525 libnvidia-gl-525:i386 libpciaccess0:i386 libsensors5:i386 libstdc++6:i386 libvdpau1 libvulkan1:i386 libwayland-client0:i386
libx11-6:i386 libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386 libxcb-glx0:i386 libxcb-present0:i386 libxcb-randr0:i386
libxcb-shm0:i386 libxcb-sync1:i386 libxcb-xfixes0:i386 libxcb1:i386 libxdmcp6:i386 libxext6:i386 libxfixes3:i386 libxnvctrl0 libxshmfence1:i386
libxxf86vm1:i386 mesa-vdpau-drivers mesa-vulkan-drivers:i386 nvidia-compute-utils-525 nvidia-dkms-525 nvidia-driver-520 nvidia-driver-525
nvidia-kernel-common-525 nvidia-kernel-source-525 nvidia-prime nvidia-settings nvidia-utils-525 screen-resolution-extra vdpau-driver-all
xserver-xorg-video-nvidia-525


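Uninstall the packages and purge their configuration files (the pattern is quoted so the shell does not expand it):
sudo apt-get --purge remove 'nvidia*'
Remove all automatically installed packages that are no longer needed:
sudo apt autoremove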


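Install CUDA (the copy that torch will call) from the CUDA Toolkit offline installer, and deselect the driver component when prompted.
Run the installer inside the Docker container:
sudo sh cuda_11.3.0_465.19.01_linux.run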


If you choose to install the driver bundled with the toolkit, the installer shows the following message:

The package that is already installed is named nvidia-465-465.

You can upgrade the driver by running:
apt-get install nvidia-465-465

You can remove nvidia-465-465, and all related packages, by running:
apt-get remove --purge nvidia-465
apt-get autoremove

This package is maintained by NVIDIA (cudatools@nvidia.com).


To check the currently installed CUDA driver:
Driver installation detected by command: apt list --installed | grep -e nvidia-driver-[0-9][0-9][0-9] -e nvidia-[0-9][0-9][0-9]


If you skip the bundled driver and install only the CUDA Toolkit, the following summary appears:

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.3/
Samples: Not Selected

Please make sure that

  • PATH includes /usr/local/cuda-11.3/bin
  • LD_LIBRARY_PATH includes /usr/local/cuda-11.3/lib64, or, add /usr/local/cuda-11.3/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 465.00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
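The warning above states that CUDA 11.3 needs a driver of version at least 465.00. A minimal Python sketch of that comparison follows; the function names are illustrative, and the version strings are assumed to be plain dotted numbers such as those printed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
def version_tuple(version, width=3):
    """Turn '525.60.11' into (525, 60, 11), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def driver_meets_minimum(installed, minimum="465.00"):
    """True if the installed driver version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# 525.60.11 is the driver version from the apt listing earlier in these notes.
print(driver_meets_minimum("525.60.11"))
# A hypothetical older driver, for contrast.
print(driver_meets_minimum("460.32.03"))
```

Comparing numeric tuples rather than raw strings avoids surprises with zero-padded components such as "01".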


Set LD_LIBRARY_PATH in ~/.bashrc.
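The installer summary asks for /usr/local/cuda-11.3/bin in PATH and /usr/local/cuda-11.3/lib64 in LD_LIBRARY_PATH. The sketch below checks the current environment and prints the export lines that could be appended to ~/.bashrc. The helper names are illustrative, and the install location is assumed to be the one reported by the installer.

```python
import os

CUDA_HOME = "/usr/local/cuda-11.3"  # assumed: location reported by the installer


def missing_cuda_paths(env, cuda_home=CUDA_HOME):
    """Return {variable: directory} for each required entry that is absent."""
    wanted = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = {}
    for var, directory in wanted.items():
        entries = [e.rstrip("/") for e in env.get(var, "").split(":") if e]
        if directory not in entries:
            missing[var] = directory
    return missing


def bashrc_line(var, directory):
    """Build an export line that prepends directory, keeping any old value."""
    return 'export ' + var + '="' + directory + '${' + var + ':+:$' + var + '}"'


for var, directory in missing_cuda_paths(os.environ).items():
    print(bashrc_line(var, directory))
```

The `${VAR:+:$VAR}` form appends the old value only when it is set, which avoids a stray trailing colon in LD_LIBRARY_PATH. After editing ~/.bashrc, run `source ~/.bashrc` or open a new shell.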


Generate a Git SSH key:
git config --global user.name "xxx"
git config --global user.email "xx@xx.com"
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub
git@gitlab.xpaas.lenovo.com:lr-ailab/machine-vision/mvlite/mvlite_svs.git


cybereason,duyz2


RuntimeError: CUDA error: no kernel image is available for execution on the device

2022-12-14 18:02:51,279 - /dmount/mvlite1.1/tools/mvlite/utils/logger.py[line:40] - INFO: Traceback (most recent call last):
  File "/dmount/mvlite1.1/svs/apps/svs_inference.py", line 215, in predict
    predict_result = self.predict_reg(post_dict)
  File "/dmount/mvlite1.1/svs/apps/svs_inference.py", line 841, in predict_reg
    inf_res = self.net.predict(val['path'])
  File "/dmount/mvlite1.1/algApi/registration/SuperGlue/SuperGlue/apps/Inference.py", line 170, in predict
    Matrix, score = self.match(source_original)
  File "/dmount/mvlite1.1/algApi/registration/SuperGlue/SuperGlue/apps/Inference.py", line 136, in match
    pred = self.matching({'image0': source_tensor,'image1':golden_sample_tensor})
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "algApi/registration/SuperGlue/SuperGlue/lib/matching.py", line 21, in forward
    pred0 = self.superpoint({'image': data['image0']})
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "algApi/registration/SuperGlue/SuperGlue/lib/superpoint.py", line 117, in forward
    x = self.relu(self.conv1a(data['image']))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device

Cause: the torch version does not match the current CUDA version.
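One common way this mismatch shows up is that the installed torch binary contains no GPU code for the device's compute capability, for example an older torch/CUDA build running on a newer GPU. The sketch below checks that condition. The helper names are illustrative, the rule applied is that a binary built for sm_XY runs on devices with the same major version and a minor version of at least Y, and `torch.cuda.get_arch_list` is assumed to exist only in newer torch releases, so the live check is guarded.

```python
def parse_sm(arch):
    """Parse 'sm_86' into (8, 6); return None for other entries."""
    digits = arch[3:]
    if not arch.startswith("sm_") or not digits.isdigit() or len(digits) < 2:
        return None
    return int(digits[:-1]), int(digits[-1])


def has_kernel_image(capability, arch_list):
    """True if some compiled sm_XY target can run on the given device.

    PTX entries such as 'compute_80' are ignored in this simplified check.
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed and parsed[0] == major and parsed[1] <= minor:
            return True
    return False


try:
    import torch  # optional: only needed for the live check
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    print("device compute capability:", capability)
    if get_arch_list is not None:
        arch_list = get_arch_list()
        print("compiled targets:", arch_list)
        print("kernel image available:", has_kernel_image(capability, arch_list))
```

If the check reports no matching target, installing a torch build compiled against a newer CUDA version, as done at the end of these notes, is the usual fix.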


nvidia-docker run -itd -v /home/xufeng8/dmount/:/dmount/ -v /home/mvlite/mvlite_data:/home/mvlite/mvlite_data -p 3666:5000 --shm-size 8g --name mvlite.re.0.6 mvlite.im.0.6:pytorch1.5-cuda10.1-cudnn7-devel sh -c "cd /dmount/mvlite1.1/ && python svs_train.py"


Check the CUDA version that torch was built for:
python -c "import torch; print(torch.version.cuda)"


Check the currently installed CUDA version:
nvcc -V
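The two checks above can be compared in a small script. The sketch below extracts the release number from `nvcc -V` output and compares it with the value of `torch.version.cuda` on major.minor only. The function names are illustrative, and the sample string only imitates the format nvcc prints; its build number is an assumption.

```python
import re


def parse_nvcc_release(text):
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_major_minor(torch_cuda, toolkit_cuda):
    """Compare two CUDA versions on major.minor only (e.g. '11.3')."""
    if not torch_cuda or not toolkit_cuda:
        return False
    return torch_cuda.split(".")[:2] == toolkit_cuda.split(".")[:2]


# Illustrative sample in the format nvcc prints; the build number is assumed.
sample = "Cuda compilation tools, release 11.3, V11.3.58"
toolkit = parse_nvcc_release(sample)

# "11.3" and "10.1" stand in for values of torch.version.cuda.
print(toolkit, same_major_minor("11.3", toolkit), same_major_minor("10.1", toolkit))
```

A mismatch here is a hint rather than proof of a problem, since prebuilt torch wheels ship their own CUDA runtime libraries.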


After installation, commit the container's changes to the image:
docker commit -a "xufeng8" -m "update cuda11.3 torch-1.10.0+cu113-cp37-cp37m-linux_x86_64.whl" mvlite.re.0.6 mvlite.im.0.6:pytorch1.5-cuda10.1-cudnn7-devel
