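I: CUDA installation
1: Download the installer
Download page: link
If the link does not open the download page, log in first and then click the link again.
2: Install CUDA
sudo /etc/init.d/lightdm stop # stop the X display manager first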
sudo chmod a+x cuda_10.0.130_410.48_linux.run
sudo sh cuda_10.0.130_410.48_linux.run
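Note: choose not to install the Nvidia driver at this step. One requirement calls for Nvidia driver version >= 410.78, while the driver bundled with this CUDA package is 410.48.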
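The version comparison behind this note can be sketched in Python. This is a minimal sketch, assuming the runfile name follows the cuda_<cuda version>_<driver version>_linux.run pattern seen above. The helper names parse_runfile_name and version_tuple are hypothetical, and 410.78 is simply the minimum quoted in the note.

```python
import re

def version_tuple(version):
    """Turn a dotted version such as '410.48' into (410, 48) for comparison."""
    return tuple(int(part) for part in version.split("."))

def parse_runfile_name(filename):
    """Split a CUDA runfile name into (cuda_version, bundled_driver_version).

    Assumes the cuda_<cuda>_<driver>_linux.run naming used in this article.
    """
    match = re.fullmatch(r"cuda_([\d.]+)_([\d.]+)_linux\.run", filename)
    if match is None:
        raise ValueError("unexpected runfile name: " + filename)
    return match.group(1), match.group(2)

cuda_version, bundled_driver = parse_runfile_name("cuda_10.0.130_410.48_linux.run")
required_driver = "410.78"  # minimum quoted in the note above

# True means the bundled driver is older than required, so skip installing it.
bundled_is_too_old = version_tuple(bundled_driver) < version_tuple(required_driver)
print(cuda_version, bundled_driver, bundled_is_too_old)
```

Comparing integer tuples rather than raw strings matters here: as strings, "410.129" would sort before "410.78", which is the wrong order for driver releases.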
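II: Nvidia driver installation
Suggested installation methods:
Method 1: for strict version requirements, install from the .run package (the version must be >= Nvidia-410.48)
Method 2: when the minor version does not matter (any nvidia-410 release will do)
Method 3: when the version does not matter (any Nvidia driver will do)
1: Install from the .run package
1: Download the .run file. Download page: link
For the operating system, choose Linux 64-bit to get the .run file; choosing Ubuntu gives you a .deb instead.
2: Disable nouveau
Installing the NVIDIA driver requires disabling the driver that ships with the system. Open this file: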
sudo gedit /etc/modprobe.d/blacklist.conf
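Append the following at the end of the file:
blacklist nouveau
options nouveau modeset=0
The terminal may print warnings; ignore them.
Save and exit, then run the following command to apply the change: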
sudo update-initramfs -u
After rebooting, run:
lsmod | grep nouveau
No output means nouveau was disabled successfully.
Reference: https://blog.csdn.net/zhang970187013/article/details/81012845
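The same check can be scripted. Below is a minimal sketch, assuming the usual lsmod layout where the module name is the first column; the function name nouveau_loaded and the sample text are illustrative, not captured from a real machine.

```python
def nouveau_loaded(lsmod_output):
    """Return True if any line of lsmod output lists nouveau as a loaded module."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        # The first column of lsmod output is the module name.
        if fields and fields[0] == "nouveau":
            return True
    return False

# Illustrative samples only.
before = "Module  Size  Used by\nnouveau  1716224  1\nttm  106496  1 nouveau\n"
after = "Module  Size  Used by\nnvidia  16588800  0\n"
print(nouveau_loaded(before), nouveau_loaded(after))
```

Unlike grep, this only looks at the module-name column. On a real machine the text could come from subprocess.run(["lsmod"], capture_output=True, text=True).stdout.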
3: Install the Nvidia driver
sudo chmod a+x NVIDIA-Linux-x86_64-410.129-diagnostic.run
sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --no-opengl-files # install the driver without the OpenGL files
Follow the prompts and accept the defaults.
2: Install from the PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-410 # CUDA 10 pairs with Nvidia-410, CUDA 9 with Nvidia-384
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev
3: Install the driver automatically
sudo ubuntu-drivers autoinstall
III: Verification
nvidia-smi
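A colleague from operations pointed out that "CUDA Version: 10.0" in the top-right corner of the output must be displayed properly, since it shows that CUDA and the Nvidia driver are correctly linked. If no CUDA version appears, it is advisable to uninstall CUDA and the Nvidia driver and reinstall them; otherwise later computation will not use the GPU and will still run on the CPU. (I have not verified this myself, but after I reinstalled, the version number did appear.)
While a program is running, you can use nvidia-smi to check GPU usage.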
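Checking for the CUDA version in the nvidia-smi header can also be automated. This is a minimal sketch, assuming the header contains "Driver Version:" and "CUDA Version:" labels; the function name parse_smi_header is hypothetical and the sample line is illustrative rather than copied from a real run.

```python
import re

def parse_smi_header(smi_output):
    """Extract (driver_version, cuda_version) from nvidia-smi text.

    Either value is None when its label is missing or has no numeric value.
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)

# Illustrative header line only.
sample = "| NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |"
driver_version, cuda_version = parse_smi_header(sample)
print(driver_version, cuda_version)
```

A cuda_version of None corresponds to the case described above, where the version number is not displayed.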
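To uninstall:
sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --uninstall # uninstall the Nvidia driver
sudo /usr/local/cuda/bin/uninstall_cuda_10.0.pl # uninstall CUDA
I previously uninstalled with the command below, which left apt-get reporting a missing nvidia-410 dependency. Use it with great caution; I ended up reinstalling the system to fix it.
sudo apt-get remove --purge nvidia* # use with caution
If installing CUDA directly fails, try installing the Nvidia driver first and then CUDA.
IV: cuDNN installation
1: Download
Choose cuDNN Library for Linux
2: Copy lib64 and include into the CUDA directory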
sudo cp -r include/ /usr/local/cuda-10.0/
sudo cp -r lib64/ /usr/local/cuda-10.0/
sudo chmod a+r /usr/local/cuda-10.0/include/*
sudo chmod a+r /usr/local/cuda-10.0/lib64/*
3: Configure environment variables
sudo vim /etc/profile
# append at the end of the file
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
source /etc/profile # reload the environment variables
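After reloading the profile, it can help to confirm that both directories really ended up in the variables. The following is a minimal sketch, assuming the colon-separated format used on Linux; the function name missing_entries is hypothetical.

```python
import os

def missing_entries(path_value, expected_dirs):
    """Return the expected directories that are absent from a colon-separated path."""
    entries = [entry.rstrip("/") for entry in path_value.split(":") if entry]
    return [d for d in expected_dirs if d.rstrip("/") not in entries]

# An empty list means the directory is present in the current environment.
print(missing_entries(os.environ.get("PATH", ""), ["/usr/local/cuda/bin"]))
print(missing_entries(os.environ.get("LD_LIBRARY_PATH", ""), ["/usr/local/cuda/lib64"]))
```

Run it from a new shell, or from one where /etc/profile has been sourced, so that the values reflect the edited file.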
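V: Install the GPU version of PaddlePaddle
1: Follow the previous article to install Python 3 and its dependencies
2: Install Paddle
The official installation guide is very detailed and worth following: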
https://www.paddlepaddle.org.cn/install/quick
Make sure Python and pip are 64-bit and the processor architecture is x86_64; PaddlePaddle does not currently support arm64.
The command below should print "64bit" and "x86_64":
python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
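The same check can be written as a small script that gives a yes/no answer. This is a minimal sketch; the function name is_supported is hypothetical, and the accepted values are the two strings named above.

```python
import platform

def is_supported(bits, machine):
    """Return True for the 64-bit x86_64 combination the article requires."""
    return bits == "64bit" and machine == "x86_64"

bits = platform.architecture()[0]   # e.g. "64bit"
machine = platform.machine()        # e.g. "x86_64" on the Ubuntu setup described here
print(bits, machine, is_supported(bits, machine))
```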
Install the GPU version of Baidu PaddlePaddle with pip:
python3 -m pip install paddlepaddle-gpu==1.8.1.post107 -i https://mirror.baidu.com/pypi/simple
If the installation reports a problem, such as an outdated pip version:
WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
upgrade pip:
sudo pip3 install --upgrade pip
3: Verify the installation
Start the Python interpreter with python3, enter import paddle.fluid, then enter paddle.fluid.install_check.run_check().
If you see "Your Paddle Fluid is installed successfully!", the installation succeeded.
Python 3.7.7 (default, Mar 30 2020, 13:50:15)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Paddle Program ...
Your Paddle works well on SINGLE GPU or CPU.
I0330 14:55:49.900933 16450 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
W0330 14:55:49.902719 16450 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 1.
I0330 14:55:49.902873 16450 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0330 14:55:49.903852 16450 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0330 14:55:49.904664 16450 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
Your Paddle works well on MUTIPLE GPU or CPU.
Your Paddle is installed successfully! Let's start deep Learning with Paddle now
If verification fails, the error looks like this:
Python 3.7.7 (default, Jun 10 2020, 16:46:20)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Fluid Program ...
W0610 17:38:40.406365 2011 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W0610 17:38:40.406523 2011 dynamic_loader.cc:120] Can not find library: libcudnn.so. The process maybe hang. Please try to add the lib path to LD_LIBRARY_PATH.
W0610 17:38:40.406548 2011 dynamic_loader.cc:179] Failed to find dynamic library: libcudnn.so ( libcudnn.so: cannot open shared object file: No such file or directory )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled.
/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 124, in run_check
test_simple_exe()
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 120, in test_simple_exe
exe0.run(startup_prog)
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
six.reraise(*sys.exc_info())
File "/home/panchan/.local/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
return_merged=return_merged)
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
use_program_cache=use_program_cache)
File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::dynload::EnforceCUDNNLoaded(char const*)
3 paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
4 std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
5 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
6 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
7 std::__future_base::_Deferred_state<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
8 paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
9 paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
10 paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
11 paddle::framework::Executor::RunPartialPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, long, long, bool, bool, bool)
12 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
13 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)
----------------------
Error Message Summary:
----------------------
Error: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion at (/paddle/paddle/fluid/platform/dynload/cudnn.cc:63)
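Check whether the environment variables from step 3 of the cuDNN installation section were written incorrectly, or run source /etc/profile again.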
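To see whether libcudnn.so is reachable through LD_LIBRARY_PATH at all, the directories can be scanned directly. This is a minimal sketch with a hypothetical helper, find_library. It only inspects the directories listed in LD_LIBRARY_PATH, so it says nothing about other locations the dynamic loader may search.

```python
import os

def find_library(prefix, search_path):
    """List files whose names start with prefix in a colon-separated search path."""
    hits = []
    for directory in search_path.split(":"):
        # Skip empty entries and directories that do not exist.
        if not directory or not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name.startswith(prefix):
                hits.append(os.path.join(directory, name))
    return hits

hits = find_library("libcudnn.so", os.environ.get("LD_LIBRARY_PATH", ""))
print(hits if hits else "libcudnn.so not found on LD_LIBRARY_PATH")
```

If nothing is found, the likely causes in this walkthrough are a mistyped export line or a shell that has not yet sourced /etc/profile.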
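VI: Resources
If you happen to need the resources I used (python3, Nvidia-410, cuda-10, cudnn),
they are available on Baidu Netdisk:
Link: https://pan.baidu.com/s/1VuhwZ4bfcLO86M2G42I-Zw
Extraction code: bc2f
VII: Environment
This GPU setup was done on Ubuntu 16.04 with an Nvidia P4.
You can also refer to the article by I-am-Unique.
If you spot any mistakes, please point them out in the comments. Thank you!
If this helped you, a like is appreciated.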