Setting up the Baidu PaddlePaddle environment on Ubuntu: installing the NVIDIA GPU driver, and installing and uninstalling CUDA and cuDNN

 

I: Installing CUDA

1: Download the installer

Download page: link

If the link does not show this page, log in first and then click the link again.

2: Install CUDA

sudo /etc/init.d/lightdm stop        # stop the X display server first (run this from a text console, e.g. Ctrl+Alt+F1)
sudo chmod a+x cuda_10.0.130_410.48_linux.run
sudo sh cuda_10.0.130_410.48_linux.run

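Note: when the installer asks, choose not to install the bundled NVIDIA driver. Something in my setup required NVIDIA driver version >= 410.78, while the driver bundled with this CUDA package is 410.48.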
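A side note on the version requirement above: comparing driver versions as plain strings is misleading, because "410.129" sorts before "410.78" even though it is newer. The minimal Python sketch below compares them numerically instead; the function names are my own, and the version numbers are the ones mentioned in this post.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted driver version such as '410.48' into (410, 48)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed driver version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print("410.129" >= "410.78")                      # string comparison: False
    print(driver_meets_minimum("410.129", "410.78"))  # numeric comparison: True
    print(driver_meets_minimum("410.48", "410.78"))   # bundled driver vs. requirement: False
```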


II: Installing the NVIDIA driver

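Recommended installation method, depending on your needs:

Option 1: if you need a specific version, install with the .run package (the driver must be NVIDIA-410.48 or later)

Option 2: if the minor version does not matter (any nvidia-410 will do), install from the PPA

Option 3: if the version does not matter at all (any NVIDIA driver will do), install automatically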

1: Install with the .run package

    1: Download the .run file. Download page: link

For the operating system, choose Linux 64-bit to get the .run file; choosing Ubuntu downloads a .deb instead.

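    2: Disable nouveau


Installing the NVIDIA driver requires disabling nouveau, the open-source driver that ships with the system. Open the file: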

sudo gedit /etc/modprobe.d/blacklist.conf
Append the following lines at the end of the file:

blacklist nouveau
options nouveau modeset=0
The terminal may print a warning; you can ignore it.

Save and exit, then run the following command to apply the change:

sudo update-initramfs -u
After rebooting, run:
lsmod | grep nouveau
If there is no output, nouveau has been disabled successfully.

Reference: https://blog.csdn.net/zhang970187013/article/details/81012845
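The two nouveau checks above can also be scripted. This is a minimal sketch, assuming the blacklist lines were added to /etc/modprobe.d/blacklist.conf as described; `blacklist_complete` and `nouveau_loaded` are hypothetical helper names, and the lsmod check looks only at the first (module name) column rather than grepping the whole output.

```python
import os
import shutil
import subprocess

# The two lines this post adds to /etc/modprobe.d/blacklist.conf
REQUIRED_LINES = {"blacklist nouveau", "options nouveau modeset=0"}


def blacklist_complete(conf_text: str) -> bool:
    """True if every required line appears in the file (surrounding whitespace ignored)."""
    present = {line.strip() for line in conf_text.splitlines()}
    return REQUIRED_LINES <= present


def nouveau_loaded(lsmod_output: str) -> bool:
    """True if 'nouveau' appears in the module-name column of lsmod output."""
    for line in lsmod_output.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if fields and fields[0] == "nouveau":
            return True
    return False


if __name__ == "__main__":
    conf = "/etc/modprobe.d/blacklist.conf"
    if os.path.exists(conf):
        with open(conf) as f:
            print("blacklist lines present:", blacklist_complete(f.read()))
    if shutil.which("lsmod"):
        out = subprocess.run(["lsmod"], capture_output=True, text=True).stdout
        print("nouveau loaded:", nouveau_loaded(out))
```

Note that the misspelled `option nouveau modeset=0` would be reported as incomplete, since modprobe expects the keyword `options`.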

    3: Install the NVIDIA driver

sudo chmod a+x NVIDIA-Linux-x86_64-410.129-diagnostic.run
sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --no-opengl-files  # install the driver only, without the OpenGL files

Follow the prompts and accept the defaults.
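To confirm which driver the kernel actually loaded after installation, one option is to read /proc/driver/nvidia/version, which exists while the NVIDIA kernel module is loaded. The sketch below assumes its first line has the usual "NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version> ..." form; `loaded_driver_version` is an illustrative name.

```python
import os
import re

# Present only while the NVIDIA kernel module is loaded.
VERSION_FILE = "/proc/driver/nvidia/version"


def loaded_driver_version(text: str):
    """Return the version after 'Kernel Module' in the NVRM line, or None if absent."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    if os.path.exists(VERSION_FILE):
        with open(VERSION_FILE) as f:
            print("loaded NVIDIA driver:", loaded_driver_version(f.read()))
    else:
        print("NVIDIA kernel module is not loaded")
```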


2: Install via the PPA

sudo add-apt-repository ppa:graphics-drivers/ppa  
sudo apt-get update  
sudo apt-get install nvidia-410 # CUDA 10 pairs with nvidia-410, CUDA 9 with nvidia-384
sudo apt-get install mesa-common-dev  
sudo apt-get install freeglut3-dev
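The CUDA-to-driver pairing in the comment above can be kept as a small lookup so the right package name is picked. This is only an illustrative sketch: the table holds just the two pairs given in this post (CUDA 10 with nvidia-410, CUDA 9 with nvidia-384), and `apt_install_command` is a hypothetical helper that builds the command rather than running it.

```python
# Only the two pairings stated in this post; extend it yourself for other releases.
DRIVER_PACKAGE_FOR_CUDA = {
    10: "nvidia-410",
    9: "nvidia-384",
}


def apt_install_command(cuda_major: int) -> list:
    """Build (but do not run) the apt-get command for the matching PPA driver package."""
    package = DRIVER_PACKAGE_FOR_CUDA.get(cuda_major)
    if package is None:
        raise ValueError(f"no driver package recorded for CUDA {cuda_major}")
    return ["sudo", "apt-get", "install", package]


if __name__ == "__main__":
    print(" ".join(apt_install_command(10)))  # sudo apt-get install nvidia-410
```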

3: Install the driver automatically

sudo ubuntu-drivers autoinstall

III: Verification

nvidia-smi

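A colleague from ops pointed out that the "CUDA Version: 10.0" field in the top-right corner of the output must be shown, as this indicates that CUDA and the NVIDIA driver are correctly linked. If no CUDA version is shown, they recommend uninstalling both CUDA and the NVIDIA driver and reinstalling; otherwise later computations will not run on the GPU and will fall back to the CPU. (I have not verified this myself, but after I reinstalled, the version number did appear.) Note that this field is reported by the driver and shows the highest CUDA version that driver supports; on its own it does not confirm that the CUDA toolkit is installed, which you can check with /usr/local/cuda/bin/nvcc --version.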
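The check above can be scripted as a minimal Python sketch. It reads "Driver Version" and "CUDA Version" from the nvidia-smi banner and, as an extra check that is my addition rather than part of the original advice, the toolkit release from nvcc --version, assuming CUDA lives under /usr/local/cuda. The helper names are hypothetical, and the regular expressions assume the usual banner and nvcc output formats.

```python
import re
import shutil
import subprocess


def parse_smi_header(text: str) -> dict:
    """Read 'Driver Version' and 'CUDA Version' from the nvidia-smi banner (None if absent)."""
    info = {}
    for key, label in (("driver", "Driver Version"), ("cuda", "CUDA Version")):
        match = re.search(label + r":\s*([\d.]+)", text)
        info[key] = match.group(1) if match else None
    return info


def parse_nvcc_release(text: str):
    """Read the toolkit release from `nvcc --version`, e.g. 'release 10.0' -> '10.0'."""
    match = re.search(r"release\s+([\d.]+)", text)
    return match.group(1) if match else None


def run(cmd: list) -> str:
    """Run a command if it can be found and return its stdout ('' otherwise)."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    smi = parse_smi_header(run(["nvidia-smi"]))
    toolkit = parse_nvcc_release(run(["/usr/local/cuda/bin/nvcc", "--version"]))
    print("driver:", smi["driver"], "| CUDA supported by driver:", smi["cuda"],
          "| nvcc toolkit:", toolkit)
```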

While a program is running, you can use nvidia-smi to monitor GPU usage.
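For monitoring from a script rather than by eye, nvidia-smi can print selected fields as CSV with --query-gpu and --format=csv,noheader,nounits. The sketch below parses that output; the chosen fields and the `parse_gpu_query` name are my own choices, and memory values are in MiB. Some GPUs report a field as "[N/A]", which this sketch does not handle.

```python
import shutil
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_query(csv_text: str) -> list:
    """Turn 'index, util, used, total' CSV lines into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = (field.strip() for field in line.split(","))
        rows.append({
            "index": int(index),
            "util_percent": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            for gpu in parse_gpu_query(result.stdout):
                print(gpu)
```

nvidia-smi also accepts -l <seconds> to repeat the query at a fixed interval.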

To uninstall:

sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --uninstall   # uninstall the NVIDIA driver
sudo /usr/local/cuda/bin/uninstall_cuda_10.0.pl    # uninstall CUDA

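I previously uninstalled with the command below, which left apt-get failing with a missing nvidia-410 dependency; I had to reinstall the system to fix it. Use it with great caution:

sudo apt-get remove --purge 'nvidia*'        # use with caution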

If installing CUDA directly fails, try installing the NVIDIA driver first and then CUDA.

IV: Installing cuDNN

1: Download page
  Choose "cuDNN Library for Linux"

2: Copy lib64 and include into the CUDA directory
 

 sudo cp -r include/ /usr/local/cuda-10.0/
 sudo cp -r lib64/ /usr/local/cuda-10.0/
 sudo chmod a+r /usr/local/cuda-10.0/include/*
 sudo chmod a+r /usr/local/cuda-10.0/lib64/*
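To double-check that the headers landed where CUDA expects them and to see which cuDNN version was copied, the version macros can be read from the header. This sketch assumes the CUDA_HOME path used in the commands above; cuDNN 7 defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h, while newer releases moved them to cudnn_version.h, so both files are tried. `cudnn_version_from_header` is a hypothetical name.

```python
import os
import re

CUDA_HOME = "/usr/local/cuda-10.0"  # directory used in the cp commands above


def cudnn_version_from_header(text: str):
    """Return 'MAJOR.MINOR.PATCH' from cuDNN's #define lines, or None if any is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    for header in ("cudnn_version.h", "cudnn.h"):
        path = os.path.join(CUDA_HOME, "include", header)
        if os.path.exists(path):
            with open(path) as f:
                version = cudnn_version_from_header(f.read())
            if version:
                print(f"cuDNN {version} (from {path})")
                break
    else:
        print("no cuDNN version found under", CUDA_HOME)
```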


3: Configure environment variables
 

sudo vim /etc/profile


# Append at the end of the file

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

 

source /etc/profile         # reload the environment variables in the current shell
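A quick way to confirm that the current shell picked up both entries is a minimal sketch like the one below. It only inspects the environment of the process that runs it, so run it from the same shell where you sourced /etc/profile; `missing_entries` is a hypothetical helper name.

```python
import os

# Directories that /etc/profile is expected to add, per the exports above.
REQUIRED = {
    "PATH": "/usr/local/cuda/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda/lib64",
}


def missing_entries(env) -> list:
    """List each required directory that is not an entry of its variable in `env`."""
    missing = []
    for var, directory in REQUIRED.items():
        entries = [p.rstrip("/") for p in env.get(var, "").split(os.pathsep) if p]
        if directory not in entries:
            missing.append(f"{var} lacks {directory}")
    return missing


if __name__ == "__main__":
    problems = missing_entries(os.environ)
    print("\n".join(problems) if problems else "PATH and LD_LIBRARY_PATH include the CUDA directories")
```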

V: Installing the GPU version of PaddlePaddle

1: Install Python 3 and its dependencies as described in my previous post

2: Install Paddle

The official installation guide is very detailed and is recommended:

https://www.paddlepaddle.org.cn/install/quick

Make sure Python and pip are 64-bit and the processor architecture is x86_64; PaddlePaddle does not currently support arm64.
The command below should print "64bit" and "x86_64" on two separate lines:

python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"

Install the GPU version of PaddlePaddle with pip:
 

python3 -m pip install paddlepaddle-gpu==1.8.1.post107 -i https://mirror.baidu.com/pypi/simple

If the installation prints a warning that pip is outdated, such as:

WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


upgrade pip:

sudo pip3 install --upgrade pip

3: Verify the installation


Start the Python interpreter with python3, enter import paddle.fluid, then enter paddle.fluid.install_check.run_check().
If the output ends with a message such as "Your Paddle is installed successfully!", the installation succeeded.

Python 3.7.7 (default, Mar 30 2020, 13:50:15) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Paddle Program ... 
Your Paddle works well on SINGLE GPU or CPU.
I0330 14:55:49.900933 16450 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
W0330 14:55:49.902719 16450 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 1.
I0330 14:55:49.902873 16450 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0330 14:55:49.903852 16450 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0330 14:55:49.904664 16450 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
Your Paddle works well on MUTIPLE GPU or CPU.
Your Paddle is installed successfully! Let's start deep Learning with Paddle now
 

If verification fails, you may see an error like the following:

Python 3.7.7 (default, Jun 10 2020, 16:46:20) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Fluid Program ... 
W0610 17:38:40.406365  2011 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W0610 17:38:40.406523  2011 dynamic_loader.cc:120] Can not find library: libcudnn.so. The process maybe hang. Please try to add the lib path to LD_LIBRARY_PATH.
W0610 17:38:40.406548  2011 dynamic_loader.cc:179] Failed to find dynamic library: libcudnn.so ( libcudnn.so: cannot open shared object file: No such file or directory ) 
 Please specify its path correctly using following ways: 
 Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS. 
 For instance, issue command: export LD_LIBRARY_PATH=... 
 Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled.
/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 124, in run_check
    test_simple_exe()
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 120, in test_simple_exe
    exe0.run(startup_prog)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/panchan/.local/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
    use_program_cache=use_program_cache)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::dynload::EnforceCUDNNLoaded(char const*)
3   paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
4   std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
5   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
6   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
7   std::__future_base::_Deferred_state<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
8   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
9   paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
10  paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
11  paddle::framework::Executor::RunPartialPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, long, long, bool, bool, bool)
12  paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
13  paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

----------------------
Error Message Summary:
----------------------
Error: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion at (/paddle/paddle/fluid/platform/dynload/cudnn.cc:63)

Check whether the environment variables from Section IV, step 3 were written correctly, or run source /etc/profile again.
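Paddle's loader reports that it cannot find libcudnn.so, so one way to narrow this down is to list which directories on LD_LIBRARY_PATH actually contain it. The sketch below is a minimal check under that assumption; `find_library_in_dirs` is a hypothetical helper, and it only looks at LD_LIBRARY_PATH, not at the system linker cache.

```python
import glob
import os


def find_library_in_dirs(prefix: str, dirs) -> list:
    """Return files named prefix* (e.g. libcudnn.so, libcudnn.so.7) in the given directories."""
    hits = []
    for directory in dirs:
        if directory:  # skip empty entries such as those produced by a leading ':'
            hits.extend(sorted(glob.glob(os.path.join(directory, prefix + "*"))))
    return hits


if __name__ == "__main__":
    ld_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    hits = find_library_in_dirs("libcudnn.so", ld_dirs)
    if hits:
        print("\n".join(hits))
    else:
        print("libcudnn.so not found on LD_LIBRARY_PATH")
```

If only versioned files such as libcudnn.so.7 show up, the unversioned libcudnn.so that the error message names may be missing.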

VI: Resources

If you need the same resources I used (Python 3, NVIDIA-410, CUDA 10, cuDNN), you can get them from Baidu Netdisk:

Link: https://pan.baidu.com/s/1VuhwZ4bfcLO86M2G42I-Zw 

Extraction code: bc2f

VII: Environment

The GPU setup was done on Ubuntu 16.04 with an NVIDIA P4.

 

You can also refer to the article by I-am-Unique.


If you spot any mistakes, please let me know in the comments. Thanks!

If this helped you, a like is appreciated.
