1. TensorFlow is installed correctly, but importing it reports errors...
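import os
# Hide TensorFlow's C++ INFO and WARNING log messages; set this before importing
# tensorflow so that messages printed during the import are filtered too
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
h = tf.constant('hello')
print(h)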
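TF_CPP_MIN_LOG_LEVEL only filters messages from TensorFlow's C++ runtime, and its levels are cumulative. The helper below is a minimal sketch (the name `quiet_tensorflow` is hypothetical) that validates the level before setting it; call it before `import tensorflow`.

```python
import os

# Values understood by TensorFlow's C++ logger for TF_CPP_MIN_LOG_LEVEL.
TF_LOG_LEVELS = {
    "0": "show everything (default)",
    "1": "hide INFO",
    "2": "hide INFO and WARNING",
    "3": "hide INFO, WARNING and ERROR",
}


def quiet_tensorflow(level: str = "2") -> None:
    """Set TF_CPP_MIN_LOG_LEVEL (hypothetical helper); call before `import tensorflow`."""
    if level not in TF_LOG_LEVELS:
        raise ValueError("level must be one of: " + ", ".join(sorted(TF_LOG_LEVELS)))
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level


if __name__ == "__main__":
    quiet_tensorflow("2")
    print("TF_CPP_MIN_LOG_LEVEL =", os.environ["TF_CPP_MIN_LOG_LEVEL"])
```

This only hides log lines; a real import failure (a Python traceback) is not affected by the variable.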
2. Check the GPU on your own computer (Windows)
cd C:\Program Files\NVIDIA Corporation\NVSMI
nvidia-smi
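nvidia-smi can also print selected fields as CSV with `--query-gpu=...` and `--format=csv,noheader`, which is easier to read from a script. A sketch, assuming Python 3.5+ and that nvidia-smi is on PATH (on Windows you may need to pass the full path under C:\Program Files\NVIDIA Corporation\NVSMI as `exe`); the chosen fields and the helper names are only examples.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Fields requested from nvidia-smi; this selection is only an example.
FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Parse `--format=csv,noheader` output: one GPU per line, comma-separated."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_gpus(exe: Optional[str] = None) -> List[Dict[str, str]]:
    """Run nvidia-smi and parse its CSV output; return [] if it cannot be found.

    If nvidia-smi exits with a non-zero status (as it may when it cannot reach
    the driver), subprocess.CalledProcessError is raised.
    """
    exe = exe or shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```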
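3. Installing TensorFlow GPU on Linux
Problem: following the official guide (https://tensorflow.google.cn/install/gpu#ubuntu_1804_cuda_101), I copied the commands under "Ubuntu 18.04 (CUDA 10.1)" into install.sh and ran it (sh +x install.sh). According to the guide, the NVIDIA driver should be installed once the commands finish, but running nvidia-smi did not show the expected GPU status table; instead it failed with: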
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
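As a complete beginner with computer hardware and software (I know next to nothing about CPUs, GPUs, graphics cards and so on), how do I fix this?
Troubleshooting: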
1. Check the versions of the installed components
- CUDA version
cat /usr/local/cuda/version.txt
10.1.243
- cuDNN version (failed)
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
no such file
- Graphics driver version (failed)
cat /proc/driver/nvidia/version, or run nvidia-smi
- Ubuntu kernel version: uname -a (or uname -r)
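The checks above can also be gathered by a short Python script, which is handy when comparing machines. A minimal sketch: the paths are the ones used above, `collect_versions` and `read_first_line` are hypothetical helper names, and a value of None corresponds to the "failed" cases.

```python
import platform
from pathlib import Path
from typing import Dict, Optional


def read_first_line(path: str) -> Optional[str]:
    """Return the first line of a text file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None


def collect_versions(cuda_root: str = "/usr/local/cuda") -> Dict[str, Optional[str]]:
    root = Path(cuda_root)
    cudnn_header = root / "include" / "cudnn.h"
    return {
        # same file as `cat /usr/local/cuda/version.txt` (CUDA 10.x ships it)
        "cuda": read_first_line(str(root / "version.txt")),
        # only checks that cudnn.h exists; its version macros are parsed separately
        "cudnn_header": str(cudnn_header) if cudnn_header.is_file() else None,
        # same file as `cat /proc/driver/nvidia/version`; None if no driver is loaded
        "driver": read_first_line("/proc/driver/nvidia/version"),
        # same as `uname -r`
        "kernel": platform.release(),
    }
```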
2. Some people suggest the cause may be a Linux kernel version that is too new or too old:
2.1 For a kernel that is too new, see these two blog posts:
https://blog.csdn.net/sinat_23619409/article/details/85220561
https://blog.csdn.net/qq_41870658/article/details/93330041
My kernel (output of uname -a):
Linux ubuntu 5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Another solution, see: https://blog.csdn.net/Felaim/article/details/100516282 (still not fixed, I'm close to tears)
2.2 How to disable Secure Boot
Reference blog: https://blog.csdn.net/smcaa/article/details/86482872
2.3 Check which driver version the system recommends installing
ubuntu-drivers devices
What normal output looks like:
== /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001D10sv000017AAsd0000225Ebc03sc02i00
vendor : NVIDIA Corporation
model : GP108M [GeForce MX150]
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-396 - third-party free
driver : nvidia-driver-390 - third-party free
driver : nvidia-driver-415 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
Mine (probably not normal):
@ubuntu:/usr/src$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:0f.0 ==
modalias : pci:v000015ADd00000405sv000015ADsd00000405bc03sc00i00
vendor : VMware
model : SVGA II Adapter
driver : open-vm-tools-desktop - distro free
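I looked up what open-vm-tools-desktop is: the vmware-tools bundled with VMware no longer works, and the official recommendation is to install open-vm-tools-desktop instead to handle interaction with the host machine. In other words, this Ubuntu system runs inside a VMware virtual machine whose only display device is the virtual SVGA II adapter; no NVIDIA GPU is visible to it, which would explain why nvidia-smi cannot communicate with an NVIDIA driver.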
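Comparing the two outputs by eye works, but the same comparison can be scripted. The parser below is a sketch based only on the output format shown above (device blocks starting with `== path ==`, followed by `key : value` lines); the function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def parse_ubuntu_drivers(text: str) -> List[Dict[str, Any]]:
    """Parse `ubuntu-drivers devices` output into one dict per device block."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==") and line.endswith("=="):
            # "== /sys/devices/... ==" starts a new device block
            current = {"path": line.strip("= "), "drivers": []}
            devices.append(current)
        elif current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            if key.strip() == "driver":
                current["drivers"].append(value.strip())
            else:
                current[key.strip()] = value.strip()
    return devices


def has_nvidia_gpu(devices: List[Dict[str, Any]]) -> bool:
    return any(d.get("vendor") == "NVIDIA Corporation" for d in devices)


def recommended_driver(devices: List[Dict[str, Any]]) -> Optional[str]:
    """Return the package marked 'recommended', e.g. 'nvidia-driver-415'."""
    for device in devices:
        for driver in device["drivers"]:
            if driver.endswith("recommended"):
                return driver.split()[0]
    return None
```

On the virtual machine above, `has_nvidia_gpu` is False and there is no recommended NVIDIA driver, which matches the manual reading.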
That's all I could try today (2020-03-04)...
4 Installing TensorFlow GPU on a server
4.1 Check the machine configuration
- CUDA version
cat /usr/local/cuda-10.1/version.txt
CUDA Version 10.1.243
- cuDNN version
cat /usr/local/cuda-10.1/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
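The CUDNN_VERSION macro combines the three numbers as MAJOR * 1000 + MINOR * 100 + PATCHLEVEL, so the header above corresponds to 7 * 1000 + 6 * 100 + 2 = 7602, i.e. cuDNN 7.6.2. A sketch that extracts the numbers from cudnn.h; note that cuDNN 8 and later keep these macros in cudnn_version.h instead, and the default path is simply the one on this server.

```python
import re
from typing import Dict

VERSION_DEFINE = re.compile(r"#define\s+(CUDNN_MAJOR|CUDNN_MINOR|CUDNN_PATCHLEVEL)\s+(\d+)")


def parse_cudnn_version(header_text: str) -> Dict[str, int]:
    """Extract the version macros from cudnn.h text; KeyError if one is missing."""
    found = {name: int(number) for name, number in VERSION_DEFINE.findall(header_text)}
    major = found["CUDNN_MAJOR"]
    minor = found["CUDNN_MINOR"]
    patch = found["CUDNN_PATCHLEVEL"]
    return {
        "major": major,
        "minor": minor,
        "patch": patch,
        # same formula as the CUDNN_VERSION macro in the header
        "version": major * 1000 + minor * 100 + patch,
    }


def read_cudnn_version(path: str = "/usr/local/cuda-10.1/include/cudnn.h") -> Dict[str, int]:
    with open(path, errors="replace") as f:
        return parse_cudnn_version(f.read())
```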
- Graphics driver version
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)
- Kernel version
uname -r
4.4.0-21-generic
4.2 Virtual environment
I followed the installation method in
https://baijiahao.baidu.com/s?id=1642214623524909281&wfr=spider&for=pc
4.2.1 Create a virtual environment directly under my own account
List existing virtual environments
conda info --envs
# conda environments:
#
root * /home/heling/anaconda3
Create a virtual environment
conda create -n python_0413
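Then another problem came up:
I don't have sudo privileges on the server, so I have to switch to the prod account (su prod) to install software.
If I create and activate a virtual environment and then run su prod before installing, does the software get installed into that virtual environment?
Answer: should I switch to prod first and then create the virtual environment?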
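Whether an install lands in the virtual environment depends on which interpreter is active in the shell that runs it, and variables set by `conda activate` may not carry over when switching user with `su`. A quick check to run from the same shell as the install (a sketch; `describe_python_env` is a hypothetical name):

```python
import os
import sys


def describe_python_env():
    """Report which interpreter, and therefore which site-packages, is in use."""
    return {
        "executable": sys.executable,  # the python binary that is running
        "prefix": sys.prefix,  # packages are installed under this prefix
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # set by `conda activate`
        "user": os.environ.get("USER"),  # account the shell is running as
    }


if __name__ == "__main__":
    for key, value in describe_python_env().items():
        print(key + ":", value)
```

If `prefix` points into the environment's directory, `python -m pip install ...` from that shell installs into it; using `python -m pip` rather than a bare `pip` avoids picking up a pip that belongs to another interpreter.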
Also: sudo apt-get install tensorflow-gpu fails (TensorFlow is distributed as a pip package, not through apt)
4.2.2 Create the virtual environment under prod
![screenshot](https://i-blog.csdnimg.cn/blog_migrate/e3f3c7b0f943bb1a2e1f93919153b71d.png)
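4.2.3 Switch to another package mirror and update (did not solve it)
Check the current sources, located in /etc/apt/sources.list:
cat /etc/apt/sources.list (all entries are ifengidc addresses)
Still not solved...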
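To confirm which mirrors the active entries in /etc/apt/sources.list point at, the host names can be listed with a few lines of Python. A rough sketch: it only handles the one-line `deb`/`deb-src` format (with an optional `[...]` options block), and `mirror_hosts` is a hypothetical name.

```python
from typing import List
from urllib.parse import urlparse


def mirror_hosts(sources_text: str) -> List[str]:
    """List the distinct hosts in the active deb/deb-src lines of sources.list."""
    hosts = []
    for raw in sources_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if not fields or fields[0] not in ("deb", "deb-src"):
            continue
        rest = fields[1:]
        # skip an optional "[arch=amd64 ...]" options block before the URI
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest = rest[1:]
            rest = rest[1:]
        if rest:
            host = urlparse(rest[0]).hostname
            if host and host not in hosts:
                hosts.append(host)
    return hosts
```

Usage: `mirror_hosts(open("/etc/apt/sources.list").read())`.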
4.3 Install TensorFlow (continuing with the method in section "2. Install TensorFlow" of the article below)
https://baijiahao.baidu.com/s?id=1642214623524909281&wfr=spider&for=pc
4.3.1 Download from the official site
Download the TensorFlow 2.0 Alpha package directly from PyPI at: https://pypi.org/project/tensorflow/2.0.0a0/#files
I downloaded tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl
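A wheel's file name encodes which interpreter and platform it targets ({name}-{version}-{python tag}-{abi tag}-{platform tag}.whl, per PEP 427): cp35 means CPython 3.5, and manylinux2010_x86_64 means 64-bit Linux under the manylinux2010 policy. A small sketch that splits the name and checks the Python tag against an interpreter version; `parse_wheel_filename` and `matches_interpreter` are hypothetical helpers, not pip's own parser.

```python
import sys
from collections import namedtuple

# Fields of a wheel file name, per PEP 427 (the optional build tag is dropped).
WheelInfo = namedtuple("WheelInfo", "name version python_tag abi_tag platform_tag")


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split e.g. 'tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl' into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # drop the optional build tag that sits between version and python tag
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError("unexpected wheel file name: " + filename)
    return WheelInfo(*parts)


def matches_interpreter(python_tag: str, version_info=sys.version_info) -> bool:
    """True if a CPython tag such as 'cp35' matches the given (major, minor) version."""
    return python_tag == "cp{}{}".format(version_info[0], version_info[1])


if __name__ == "__main__":
    info = parse_wheel_filename("tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl")
    print(info)
    print("matches this Python:", matches_interpreter(info.python_tag))
```

A cp35 wheel therefore needs Python 3.5 in the active environment, in addition to a pip new enough to accept the manylinux2010 platform tag.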
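4.3.2 Install
Problem: pip is too old
pip install tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl
failed with an error (the existing pip 8.1.2, shown in the log below, is older than 19.0, the first pip release that can install manylinux2010 wheels)
Fix: upgrade pip
Method: https://www.cnblogs.com/ElegantSmile/p/10766391.html
- Download get-pip.py:
wget https://bootstrap.pypa.io/get-pip.py
- Run python get-pip.py to upgrade pip: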
(python_0413_p) prod@knowledge_graph_162v112_syq:~$ python get-pip.py
Looking in indexes: http://pip.ifengidc.com/simple, http://pip.ifengidc.com:8080/simple
Collecting pip
Downloading http://pip.ifengidc.com/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl (1.4 MB)
|████████████████████████████████| 1.4 MB 84.9 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 8.1.2
Uninstalling pip-8.1.2:
Successfully uninstalled pip-8.1.2
Successfully installed pip-20.0.2
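pip accepts manylinux2010 wheels only from release 19.0 onward, which is why pip 8.1.2 could not install the TensorFlow wheel while 20.0.2 gets past that check. A sketch of the version comparison (helper names are hypothetical; for real-world version strings the `packaging` library's version parsing is the more robust choice):

```python
from typing import Tuple

# pip gained support for manylinux2010 wheels (PEP 571) in release 19.0.
MIN_PIP_FOR_MANYLINUX2010 = (19, 0)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '20.0.2' into (20, 0, 2), stopping at the first non-numeric part."""
    numbers = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        numbers.append(int(piece))
    return tuple(numbers)


def supports_manylinux2010(pip_version: str) -> bool:
    return version_tuple(pip_version) >= MIN_PIP_FOR_MANYLINUX2010


if __name__ == "__main__":
    for v in ("8.1.2", "20.0.2"):
        print(v, supports_manylinux2010(v))
```

`python -m pip --version` prints the version to check in the active environment.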
pip install tensorflow
pip install tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl
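Error again:
The dependency packages needed updating. I couldn't fix it myself, so I asked the company's ops team, who did the following:
- After Python was upgraded, yum had to be pointed at python2, otherwise yum no longer worked (probably because yum is itself a Python 2 script, and after the upgrade /usr/bin/python no longer pointed to Python 2)
vim /usr/bin/yum, and change #! /usr/bin/python to #! /usr/bin/python2
- pip install --ignore-installed setuptools
Ran pip install tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl again: success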
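The yum fix works because the first `#!` line of a script decides which interpreter runs it. A sketch for checking that line, e.g. on /usr/bin/yum before and after the edit; `parse_shebang` and `read_shebang` are hypothetical helper names.

```python
from typing import Optional


def parse_shebang(first_line: str) -> Optional[str]:
    """Return the interpreter command from a '#!' line, or None if there is none."""
    line = first_line.strip()
    if not line.startswith("#!"):
        return None
    # "#! /usr/bin/python2" and "#!/usr/bin/python2" both give "/usr/bin/python2"
    command = " ".join(line[2:].split())
    return command or None


def read_shebang(path: str) -> Optional[str]:
    """Read only the first line of a script and parse its '#!' line."""
    with open(path, "rb") as f:
        first = f.readline().decode("utf-8", errors="replace")
    return parse_shebang(first)
```

After the edit above, `read_shebang("/usr/bin/yum")` should return `/usr/bin/python2`.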
Importing TensorFlow prints error messages (the lines are tagged W, i.e. warnings):
>>> import tensorflow as tf
2020-04-15 10:23:48.482399: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-04-15 10:23:48.482498: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-04-15 10:23:48.482508: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
>>> tf.__version__
'2.1.0'
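libnvinfer and libnvinfer_plugin are TensorRT libraries, and the last warning says they are only needed to use the GPU with TensorRT; `tf.__version__` returning '2.1.0' shows the import itself succeeded. Whether TensorFlow sees the GPU can be checked separately, for example with `tf.config.experimental.list_physical_devices('GPU')`. The sketch below only pulls the missing library names out of such a log so they can be looked up; `missing_libraries` is a hypothetical helper.

```python
import re
from typing import List

# Matches TensorFlow's dso_loader warning, tolerating line wraps between words.
MISSING_LIB = re.compile(r"Could\s+not\s+load\s+dynamic\s+library\s+'([^']+)'")


def missing_libraries(log_text: str) -> List[str]:
    """Return the shared libraries reported as missing, in order, without duplicates."""
    names = []
    for name in MISSING_LIB.findall(log_text):
        if name not in names:
            names.append(name)
    return names
```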