Setting up Baidu PaddlePaddle on Ubuntu: installing and uninstalling the NVIDIA GPU driver, CUDA, and cuDNN

乔烨

Article link: https://blog.csdn.net/yuhuqiao/article/details/106671978

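I: Installing CUDA

1: Download the installer

Download address: link

If the link does not open the download page, log in first and then click the link again.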

2: Install CUDA

sudo /etc/init.d/lightdm stop        # stop the X display manager first

sudo chmod a+x cuda_10.0.130_410.48_linux.run
sudo sh cuda_10.0.130_410.48_linux.run

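Note: when the installer asks, choose not to install the bundled NVIDIA driver. One component requires NVIDIA driver version >= 410.78, whereas the driver bundled with this CUDA package is 410.48.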

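II: Installing the NVIDIA driver

Choose an installation method according to your needs:

Method 1: strict version requirements. Install from the .run package (the driver must be NVIDIA 410.48 or newer).

Method 2: no requirement on the minor version (any nvidia-410 build will do).

Method 3: no version requirement at all (any NVIDIA driver will do).

1: Installing from the .run package

1: Download the .run file. Download address: link

For the operating system, choose Linux 64-bit to get a .run file; choosing Ubuntu gives a .deb instead.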

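2: Disable nouveau

Installing the NVIDIA driver requires disabling the nouveau driver that ships with the system. Open the file:

sudo gedit /etc/modprobe.d/blacklist.conf
Append the following to the end of the file:

blacklist nouveau
options nouveau modeset=0
The terminal may print warnings; ignore them.

Save and exit, then run the following command to apply the change:

sudo update-initramfs -u
After rebooting, run:
lsmod | grep nouveau
No output means nouveau has been disabled successfully.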
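
The same check can be scripted. The following is a minimal sketch in Python, assuming a Linux system where /proc/modules lists the loaded kernel modules (the same data that lsmod prints). The helper name module_loaded is hypothetical and only serves this illustration.

```python
from pathlib import Path


def module_loaded(name, modules_text):
    """Return True if `name` appears as a module name in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    # Fall back to empty text on systems without /proc/modules.
    text = proc_modules.read_text() if proc_modules.exists() else ""
    if module_loaded("nouveau", text):
        print("nouveau is still loaded; re-check blacklist.conf and reboot")
    else:
        print("nouveau is not loaded")
```

Matching on the first field only avoids false positives from modules that merely list nouveau as a dependency.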

Reference: https://blog.csdn.net/zhang970187013/article/details/81012845

3: Install the NVIDIA driver

sudo chmod a+x NVIDIA-Linux-x86_64-410.129-diagnostic.run
sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --no-opengl-files  # install the driver only (skip the OpenGL files)

Follow the prompts and accept the defaults.

2: Installing from the PPA

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-410 # CUDA 10 pairs with nvidia-410, CUDA 9 with nvidia-384
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev

3: Automatic driver installation

sudo ubuntu-drivers autoinstall

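III: Verification

nvidia-smi

A tip from our operations colleague: the "CUDA Version: 10.0" field in the top-right corner of the output must be displayed properly, which indicates that CUDA and the NVIDIA driver are correctly linked. If no CUDA version is shown, uninstall CUDA and the NVIDIA driver and reinstall them; otherwise later computation reportedly falls back to the CPU instead of using the GPU. I have not verified that claim myself, but after reinstalling, the version number did appear.

While a program is running, use nvidia-smi to check GPU usage.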
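
The CUDA Version check described above can be automated. This is a rough sketch in Python, assuming the nvidia-smi header contains a field of the form "CUDA Version: X.Y"; the function name cuda_version_from_smi is hypothetical.

```python
import re
import shutil
import subprocess


def cuda_version_from_smi(smi_output):
    """Extract the 'CUDA Version' value from nvidia-smi output, or None if absent."""
    match = re.search(r"CUDA Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the driver installed?")
    else:
        # Requires Python 3.7+ for capture_output.
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        version = cuda_version_from_smi(result.stdout)
        print("CUDA Version:", version or "not reported")
```

A result of "not reported" corresponds to the situation the operations colleague warned about, where the header shows no CUDA version number.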

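To uninstall:

sudo ./NVIDIA-Linux-x86_64-410.129-diagnostic.run --uninstall   # uninstall the NVIDIA driver
sudo /usr/local/cuda/bin/uninstall_cuda_10.0.pl    # uninstall CUDA

I previously uninstalled with the command below, after which apt-get reported a missing nvidia-410 dependency. Use it with great caution; I ended up reinstalling the system to recover.

sudo apt-get remove --purge nvidia*        # use with caution

If installing CUDA directly fails, try installing the NVIDIA driver first and then CUDA.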

IV: Installing cuDNN

1: Download address
Choose "cuDNN Library for Linux".

2: Copy lib64 and include into the CUDA directory

 sudo cp -r include/ /usr/local/cuda-10.0/
 sudo cp -r lib64/ /usr/local/cuda-10.0/
 sudo chmod a+r /usr/local/cuda-10.0/include/*
 sudo chmod a+r /usr/local/cuda-10.0/lib64/*

3: Configure environment variables

sudo vim /etc/profile

# append to the end of the file

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

source /etc/profile         # reload the environment variables
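
To confirm that the new LD_LIBRARY_PATH actually points at the copied cuDNN files, a small check like the following may help. It is a sketch in Python under the assumption that the cuDNN libraries have file names starting with libcudnn.so; the helper dirs_containing is hypothetical.

```python
import os
from pathlib import Path


def dirs_containing(prefix, search_path):
    """Return directories in an os.pathsep-separated path holding files that start with `prefix`."""
    hits = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and entries that are not existing directories.
        if entry and directory.is_dir() and any(directory.glob(prefix + "*")):
            hits.append(str(directory))
    return hits


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = dirs_containing("libcudnn.so", ld_path)
    print("libcudnn found in:", found if found else "no LD_LIBRARY_PATH directory")
```

Run it in a new shell, or after source /etc/profile, so that the process sees the updated variable.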

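V: Installing the GPU version of PaddlePaddle

1: Install Python 3 and its dependencies as described in the previous article

2: Install Paddle

The official installation guide is very detailed and worth following:

https://www.paddlepaddle.org.cn/install/quick

Confirm that Python and pip are 64-bit and that the processor architecture is x86_64; at the time of writing, PaddlePaddle does not support arm64.
The command below should print "64bit" and "x86_64":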

python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
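
For use inside a setup script, the one-liner above can be turned into an explicit pass/fail check. This is a minimal sketch; the function name is_supported is hypothetical, and the two required values are taken from the text above.

```python
import platform


def is_supported(bits, machine):
    """Return True for the 64-bit x86_64 combination the article requires."""
    return bits == "64bit" and machine == "x86_64"


if __name__ == "__main__":
    bits = platform.architecture()[0]
    machine = platform.machine()
    print(bits, machine)
    if is_supported(bits, machine):
        print("OK: 64-bit Python on x86_64")
    else:
        print("Not supported: expected 64bit and x86_64")
```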

Install the GPU version of PaddlePaddle with pip:

python3 -m pip install paddlepaddle-gpu==1.8.1.post107 -i https://mirror.baidu.com/pypi/simple

If the installation reports a problem, for example that pip is too old:

WARNING: You are using pip version 19.2.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

upgrading pip resolves it:

sudo pip3 install --upgrade pip

3: Verify the installation

Run python3 to enter the Python interpreter, type import paddle.fluid, then type paddle.fluid.install_check.run_check().
If "Your Paddle Fluid is installed successfully!" appears, the installation succeeded.

Python 3.7.7 (default, Mar 30 2020, 13:50:15) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Paddle Program ... 
Your Paddle works well on SINGLE GPU or CPU.
I0330 14:55:49.900933 16450 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
W0330 14:55:49.902719 16450 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 1.
I0330 14:55:49.902873 16450 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0330 14:55:49.903852 16450 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0330 14:55:49.904664 16450 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
Your Paddle works well on MUTIPLE GPU or CPU.
Your Paddle is installed successfully! Let's start deep Learning with Paddle now

If verification fails, the error looks like this:

Python 3.7.7 (default, Jun 10 2020, 16:46:20) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>> paddle.fluid.install_check.run_check()
Running Verify Fluid Program ... 
W0610 17:38:40.406365  2011 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W0610 17:38:40.406523  2011 dynamic_loader.cc:120] Can not find library: libcudnn.so. The process maybe hang. Please try to add the lib path to LD_LIBRARY_PATH.
W0610 17:38:40.406548  2011 dynamic_loader.cc:179] Failed to find dynamic library: libcudnn.so ( libcudnn.so: cannot open shared object file: No such file or directory ) 
 Please specify its path correctly using following ways: 
 Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS. 
 For instance, issue command: export LD_LIBRARY_PATH=... 
 Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled.
/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 124, in run_check
    test_simple_exe()
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/install_check.py", line 120, in test_simple_exe
    exe0.run(startup_prog)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/panchan/.local/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
    use_program_cache=use_program_cache)
  File "/home/panchan/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::dynload::EnforceCUDNNLoaded(char const*)
3   paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
4   std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
5   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
6   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
7   std::__future_base::_Deferred_state<std::_Bind_simple<void paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
8   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
9   paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
10  paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
11  paddle::framework::Executor::RunPartialPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, long, long, bool, bool, bool)
12  paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
13  paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

----------------------
Error Message Summary:
----------------------
Error: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion at (/paddle/paddle/fluid/platform/dynload/cudnn.cc:63)

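Check whether the environment variables from step 3 of the cuDNN section (IV-3) were written incorrectly, or run source /etc/profile again.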
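
Before re-running the Paddle check, it can be quicker to test whether the dynamic loader can open libcudnn.so at all. The sketch below uses Python's standard ctypes module; the helper name can_load is hypothetical, and the library name libcudnn.so is taken from the error message above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name`."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library cannot be found or opened.
        return False


if __name__ == "__main__":
    if can_load("libcudnn.so"):
        print("libcudnn.so loads; cuDNN is visible to the dynamic loader")
    else:
        print("libcudnn.so cannot be loaded; check LD_LIBRARY_PATH and the copied lib64 files")
```

LD_LIBRARY_PATH is read when the process starts, so launch python3 from a shell where source /etc/profile has already been run.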