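Installing CUDA 8.0
(I accidentally broke my existing GPU driver over the past couple of days and, after a long struggle, had to go through the whole process again.)
CUDA installation prerequisites
gcc 5.3.0 (the version must not be too new)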
sudo apt-get install build-essential
tar -zxvf gcc-5.3.0.tar.gz
cd gcc-5.3.0
./contrib/download_prerequisites  # download the dependencies
cd ..
mkdir /opt/gcc-build-5.3.0  # create the build output directory
cd /opt/gcc-build-5.3.0
(path to the extracted source)/gcc-5.3.0/configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
make
make install
(An hour, or most of an hour, later...)
Check the installed version:
gcc --version
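The version check above can also be scripted. This is only a minimal sketch: it assumes that "not too new" means a gcc major version of at most 5 for CUDA 8.0, and the helper name gcc_version_ok is hypothetical.

```shell
# Hypothetical helper: succeeds when the given gcc version string has a major
# version no higher than the limit (default 5, an assumption for CUDA 8.0).
gcc_version_ok() {
    gvo_version="$1"
    gvo_max_major="${2:-5}"
    gvo_major="${gvo_version%%.*}"   # "5.3.0" -> "5"
    [ "$gvo_major" -le "$gvo_max_major" ]
}

# Check the compiler that is currently first on PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    if gcc_version_ok "$(gcc -dumpversion)"; then
        echo "gcc $(gcc -dumpversion) is within the assumed limit"
    else
        echo "gcc $(gcc -dumpversion) is newer than the assumed limit"
    fi
fi
```

Passing a second argument changes the limit, for example gcc_version_ok "$(gcc -dumpversion)" 7 for a toolkit that accepts newer compilers.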
After downloading the matching version from the CUDA website:
Disable the nouveau driver by adding it to the blacklist
sudo vi /etc/modprobe.d/blacklist-nouveau.conf
Add:
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
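After rebooting, it may be worth confirming that nouveau is really gone before running the installer. A small sketch, assuming the usual lsmod output with the module name in the first column; the helper name nouveau_loaded is hypothetical.

```shell
# Hypothetical helper: reads lsmod-style output on stdin and succeeds
# if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only run the real check where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded; the blacklist has not taken effect"
    else
        echo "nouveau is not loaded"
    fi
fi
```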
Press Ctrl+Alt+F1 to switch to a text console, then stop the graphical session (the X server):
sudo service lightdm stop
sudo sh ./cuda_8.0.61_375.26_linux-run.run
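When the license text appears, press Ctrl+C to skip it
accept
You may want to answer no to the second prompt
For the remaining prompts, set custom install paths if needed, or press Enter to accept the defaults
Reboot
Check whether CUDA installed successfully: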
cd NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
If deviceQuery ends with Result = PASS, the installation succeeded.
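The same check can be automated by looking for the final Result = PASS line that deviceQuery prints. A rough sketch; the helper name device_query_passed is hypothetical, and it assumes you are in the deviceQuery sample directory after building.

```shell
# Hypothetical helper: reads deviceQuery output on stdin and succeeds
# if it contains the "Result = PASS" line.
device_query_passed() {
    grep -q 'Result = PASS'
}

# Run the real binary only if it has been built in the current directory.
if [ -x ./deviceQuery ]; then
    if ./deviceQuery | device_query_passed; then
        echo "CUDA can see the GPU"
    else
        echo "deviceQuery did not report PASS"
    fi
fi
```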
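Configure the CUDA environment:
vi ~/.bashrc
Append the following at the end (these paths are for cuda-10.0; use the directory of the version you actually installed, e.g. /usr/local/cuda-8.0):
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}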
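Because ~/.bashrc can be sourced more than once, the same directory may end up in PATH several times. One possible way to avoid that is sketched below; the helper name prepend_once and the CUDA_HOME variable are my own, and /usr/local/cuda-10.0 is simply the path used in these notes.

```shell
# Hypothetical helper: prints the colon-separated list $2 with directory $1
# prepended, unless $1 is already one of its entries.
prepend_once() {
    po_dir="$1"
    po_list="$2"
    case ":$po_list:" in
        *":$po_dir:"*) printf '%s\n' "$po_list" ;;
        *) printf '%s\n' "$po_dir${po_list:+:$po_list}" ;;
    esac
}

# Adjust to the version you actually installed.
CUDA_HOME=/usr/local/cuda-10.0

PATH="$(prepend_once "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

Sourcing this twice leaves both variables unchanged the second time.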
Q1: Running sudo service lightdm stop in the text console (Ctrl+Alt+F1) results in a black screen.
Running the CUDA .run installer from a remote terminal instead fails with:
Installing the NVIDIA display driver… The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly. If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.
A1:
sudo apt install dkms
sudo ./cuda_8.0.61_375.26_linux.run
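To see whether the kernel build tree the installer is looking for is present at all, one can check the conventional /lib/modules/<release>/build location. This is only a sketch: the helper name kernel_build_dir is hypothetical, and I am assuming the installer looks in that standard place.

```shell
# Hypothetical helper: prints the conventional kernel build directory for a
# kernel release (defaults to the running kernel).
kernel_build_dir() {
    printf '/lib/modules/%s/build\n' "${1:-$(uname -r)}"
}

kbd_dir="$(kernel_build_dir)"
if [ -d "$kbd_dir" ]; then
    echo "kernel build tree found at $kbd_dir"
else
    echo "no kernel build tree at $kbd_dir"
fi
```

If the directory exists, its path is what would be passed to the '--kernel-source-path' flag mentioned in the error message.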
Q2:
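Ubuntu gets stuck in a login loop and cannot reach the desktop.
A2: The GPU driver is broken and needs to be reinstalled.
Purge the previous GPU driver packages:
sudo apt-get --purge remove 'nvidia*'
Press Ctrl+Alt+F1 to switch to a text console, then stop the graphical session (the X server):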
sudo service lightdm stop
If there is no lightdm service and the installation still fails, try the following:
- Disable the graphical interface
sudo systemctl set-default multi-user.target
sudo reboot
- Re-enable the graphical interface
sudo systemctl set-default graphical.target
sudo reboot
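Now reinstall the GPU driver:
(1) sudo apt-get install nvidia-XXX (where XXX is the driver version)
(2) Preferably, install the latest suitable driver from the official NVIDIA website: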
sudo sh ./NVIDIA-Linux…xxxx.run
sudo reboot
Q3:
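After installing CUDA, the GPU driver may misbehave, for example with a reduced screen resolution or other problems.
A3: Reinstall the GPU driver, but do NOT run sudo apt-get --purge remove nvidia*
Press Ctrl+Alt+F1 to switch to a text console, then stop the graphical session (the X server):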
sudo service lightdm stop
Simply reinstall the driver you downloaded earlier:
sudo sh ./NVIDIA-Linux…xxxx.run
sudo reboot
Q4: While installing the GPU driver, this error appears:
Failed to start Load Kernel Modules
A4: A fix suggested online:
apt-get update
dpkg --configure -a
apt-get dist-upgrade
apt-get -f install
reboot
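You can try these several times; no particular order is required.
This did not work for me; dozens of attempts all failed,
so in the end I reinstalled the operating system.
I fought with this for more than two days and did not take notes on the problems I hit along the way. None of it worked anyway, so be it.
Q5:
Not enough temporary space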
Signal caught, cleaning up
A5:
sudo mkdir /home/tmp
sudo chmod 1777 /home/tmp
export TMPDIR=/home/tmp
sh ./cuda_10.0.130_410.48_linux.run --tmpdir=/home/tmp
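Before pointing the installer at a new temporary directory, it can help to see how much space is actually free there. A minimal sketch: the helper name free_kb is hypothetical, and the 4 GiB threshold is a guess of mine, not a figure from NVIDIA.

```shell
# Hypothetical helper: prints the free space, in KiB, of the filesystem
# that holds the given directory (POSIX df output, 4th column).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

fk_dir="${TMPDIR:-/tmp}"
fk_need_kb=4194304   # assumed threshold: 4 GiB, adjust as needed

if [ "$(free_kb "$fk_dir")" -ge "$fk_need_kb" ]; then
    echo "$fk_dir has enough free space for the assumed threshold"
else
    echo "$fk_dir is below the assumed threshold; consider another --tmpdir"
fi
```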
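Q6: I remember writing notes on cuDNN before. Where did they go???
Writing them down again:
Download it from the official website and extract it:
tar -xzvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
Then copy the extracted files into the corresponding directories and make them readable: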
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
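To confirm which cuDNN version ended up installed, the version macros in cudnn.h can be read back. A sketch under the assumption that the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cuDNN 7.x does; the helper name cudnn_version is hypothetical.

```shell
# Hypothetical helper: prints "major.minor.patch" from the version macros in
# the given cudnn.h, or fails if CUDNN_MAJOR is not defined there.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END {
             if (major == "") exit 1
             print major "." minor "." patch
         }' "$1"
}

# Path used by the copy commands in these notes.
cv_header=/usr/local/cuda/include/cudnn.h
if [ -f "$cv_header" ]; then
    echo "cuDNN version: $(cudnn_version "$cv_header")"
fi
```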
Q7: The CUDA and cuDNN environment is not configured:
2019-09-29 03:09:18.429878: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-09-29 03:09:18.430039: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-09-29 03:09:18.430191: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-09-29 03:09:18.430344: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-09-29 03:09:18.430490: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-09-29 03:09:18.430636: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-09-29 03:09:18.430782: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
A7: Configure the CUDA environment properly, and make sure the cuDNN version matches the CUDA version.
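When a log like the one in Q7 shows up, it can be handy to pull out the library names and see whether each one is reachable through LD_LIBRARY_PATH. A rough sketch; the helper names missing_libs and find_lib are hypothetical, and only LD_LIBRARY_PATH is searched, not the system linker cache.

```shell
# Hypothetical helper: reads log text on stdin and prints the library name
# from every "Could not dlopen library '...'" line.
missing_libs() {
    sed -n "s/.*Could not dlopen library '\([^']*\)'.*/\1/p"
}

# Hypothetical helper: searches a colon-separated directory list
# (default: LD_LIBRARY_PATH) for a file and prints its path if found.
find_lib() {
    fl_lib="$1"
    fl_paths="${2:-${LD_LIBRARY_PATH:-}}"
    fl_old_ifs="$IFS"
    IFS=:
    for fl_dir in $fl_paths; do
        if [ -e "$fl_dir/$fl_lib" ]; then
            IFS="$fl_old_ifs"
            printf '%s\n' "$fl_dir/$fl_lib"
            return 0
        fi
    done
    IFS="$fl_old_ifs"
    return 1
}

# Two lines copied from the log above.
fl_log="2019-09-29 03:09:18.429878: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-09-29 03:09:18.430782: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory"

printf '%s\n' "$fl_log" | missing_libs | while read -r fl_name; do
    if fl_path="$(find_lib "$fl_name")"; then
        echo "$fl_name found at $fl_path"
    else
        echo "$fl_name not found in LD_LIBRARY_PATH"
    fi
done
```

Any library reported as not found points at either a missing LD_LIBRARY_PATH entry or a CUDA/cuDNN version that differs from the one the program expects.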