The pip-installed TensorFlow does not use the CPU's instruction sets; building TensorFlow (machine learning) from source runs out of memory

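Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original and this notice.
Article link: https://blog.csdn.net/jie_linux/article/details/80911496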

Boss Lin sent me the following message:
2018-06-29 14:40:11.224113: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 

The pip-installed binary was not compiled to use these CPU instructions.
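To see exactly which instruction sets the message is about, and whether the CPU really reports them, the log line can be parsed and compared with the kernel's flag list. This is a minimal sketch, assuming a Linux machine whose /proc/cpuinfo has a "flags" line. The helper names are illustrative and not part of TensorFlow.

```python
import re

def unused_instructions(log_line):
    """Return the instruction-set names listed in a cpu_feature_guard message."""
    match = re.search(r"not compiled to use:\s*(.*)$", log_line)
    return match.group(1).split() if match else []

def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags found in the text of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

line = ("2018-06-29 14:40:11.224113: I tensorflow/core/platform/"
        "cpu_feature_guard.cc:140] Your CPU supports instructions that this "
        "TensorFlow binary was not compiled to use: AVX2 FMA")
names = unused_instructions(line)
print(names)  # ['AVX2', 'FMA']

# On the build machine (Linux only), the flags could be checked like this:
# with open("/proc/cpuinfo") as f:
#     print(all(n.lower() in cpu_flags(f.read()) for n in names))
```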

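These notes are not a detailed step-by-step guide and are rather messy. They mainly cover the out-of-memory failure, CPU instruction sets, and GPU support.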

root@ubuntu:~# uname -a

Linux ubuntu 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~# cat /etc/is
iscsi/     issue      issue.net  
root@ubuntu:~# cat /etc/issue
Ubuntu 18.04 LTS \n \l

The official site says to use gcc 4.8, so I downgraded to 4.8. CUDA, cuDNN, and NCCL are all installed.

Source archive:

-rw-r--r--  1 root root   22649439 Jun 29 18:32 tensorflow-1.8.0.tar.gz

root@ubuntu:~/tensorflow-1.8.0# ./configure

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.15.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3


Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: y
Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: y
Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: y
Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: y
GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: y
VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:


Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.

Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1,6.1]


Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
Configuration finished
root@ubuntu:~/tensorflow-1.8.0#

This build completes:

  bazel build --jvmopt="-server -Xms20480m"  -c opt --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma //tensorflow/tools/pip_package:build_pip_package
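Each --copt flag in this command corresponds to one instruction set. A small sketch of that mapping, assuming GCC's option spellings. The helper name copt_flags is made up for illustration.

```python
# GCC options for the instruction sets named in the build command.
GCC_FLAG = {
    "SSE3": "-msse3",
    "SSE4.1": "-msse4.1",
    "SSE4.2": "-msse4.2",
    "AVX": "-mavx",
    "AVX2": "-mavx2",
    "FMA": "-mfma",
}

def copt_flags(instruction_sets):
    """Turn instruction-set names into bazel --copt arguments."""
    return ["--copt=" + GCC_FLAG[name.upper()] for name in instruction_sets]

print(" ".join(copt_flags(["AVX2", "FMA"])))
# --copt=-mavx2 --copt=-mfma
```

The ./configure step also offers -march=native for builds run with --config=opt, which would let the compiler choose these options for the local CPU instead of listing them by hand.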

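This one does not complete. Adding --config=cuda to enable the GPU makes the build fail as shown below: memory is exhausted, all 30 GB of it (RAM plus swap). Is something configured wrong?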

  bazel build --jvmopt="-server -Xms20480m"  -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package


[13,053 / 14,131] 32 actions running
    Compiling tensorflow/core/kernels/strided_slice_op_gpu.cu.cc; 320s local
    Compiling tensorflow/core/kernels/strided_slice_op_gpu.cu.cc; 319s local
    Compiling tensorflow/core/kernels/batch_matmul_op_real.cc; 277s local
    Compiling tensorflow/core/kernels/batch_matmul_op_complex.cc; 277s local
    Compiling tensorflow/core/kernels/batch_matmul_op_real.cc; 277s local
    Compiling tensorflow/core/kernels/argmax_op.cc; 277s local
    Compiling tensorflow/core/kernels/argmax_op.cc; 277s local
    Compiling tensorflow/core/kernels/conv_ops.cc; 232s local ...

Server terminated abruptly (error code: 14, error message: '', log file: '/root/.cache/bazel/_bazel_root/a1181ec4a71ba55a9d58ce400c896b81/server/jvm.out')

root@ubuntu:~# free -g

              total        used        free      shared  buff/cache   available
Mem:             15          15           0           0           0           0
Swap:            14          14           0

root@ubuntu:~#
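Some rough arithmetic on the numbers above suggests why memory ran out. -Xms20480m sets the Bazel server JVM's initial heap to 20 GB on a machine with 15 GB of RAM, and the status line shows 32 compile actions running at once. The sketch below only restates that arithmetic. The 4 GB heap and 2 GB-per-action figures are assumed placeholders, not measurements, and capping parallelism with Bazel's --jobs option is a possible alternative that was not tried here.

```python
def max_parallel_jobs(ram_gb, jvm_heap_gb, per_action_gb):
    """Estimate how many compile actions fit in RAM next to the Bazel JVM."""
    left = ram_gb - jvm_heap_gb
    return max(1, int(left // per_action_gb))

ram_gb = 15             # "Mem: total" from free -g
xms_gb = 20480 / 1024   # -Xms20480m
actions = 32            # "32 actions running"

print(xms_gb)           # 20.0, already more than the physical RAM
print(ram_gb / actions) # 0.46875 GB of RAM per action, ignoring the JVM

# Assumed values: a 4 GB JVM heap and 2 GB per compiler process.
print(max_parallel_jobs(ram_gb, 4, 2))  # 5
```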

root@ubuntu:~/tensorflow-1.8.0#
swapoff /tmp/swap    # disable the swap file added earlier
dd if=/dev/zero of=/tmp/swap bs=1MB count=27648    # allocate a 27 GB file
mkswap /tmp/swap    # format it as swap
root@ubuntu:~# swapon /tmp/swap    # enable the swap file
swapon: /tmp/swap: insecure permissions 0644, 0600 suggested.
root@ubuntu:~# free -g    # verify; there was already 3 GB of swap before
              total        used        free      shared  buff/cache   available
Mem:             15           0           0           0          14          14

Swap:            29           0          29
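A note on the size of the swap file: GNU dd reads bs=1MB as 1,000,000 bytes and bs=1M as 1,048,576 bytes, so the dd command above writes slightly less than 27 GiB. A quick check of the arithmetic:

```python
def dd_bytes(block_size, count):
    """Total bytes written by dd for a given block size and count."""
    return block_size * count

MB = 1000 * 1000     # dd suffix "MB"
MiB = 1024 * 1024    # dd suffix "M"
GiB = 1024 ** 3

written = dd_bytes(MB, 27648)
print(written)                     # 27648000000
print(round(written / GiB, 2))     # 25.75
print(dd_bytes(MiB, 27648) / GiB)  # 27.0
```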

In the end the build succeeded:

INFO: Elapsed time: 576.296s, Critical Path: 467.47s
INFO: 766 processes: 766 local.

INFO: Build completed successfully, 898 total actions

Generate the tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl package:

bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg_GPU
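The wheel's file name encodes which interpreter and platform it targets, which is why it is installed with pip3 for Python 3.6. A minimal sketch that splits the name into its tags, assuming the standard five-part wheel naming with no build tag. parse_wheel_name is an illustrative helper, not a pip API.

```python
def parse_wheel_name(filename):
    """Split 'name-version-python-abi-platform.whl' into its five tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    name, version, python_tag, abi_tag, platform_tag = filename[:-4].split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

info = parse_wheel_name("tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl")
print(info["python"], info["platform"])  # cp36 linux_x86_64
```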

Uninstall the previous version:

root@ubuntu:~/tensorflow-1.8.0/tensorflow_pkg_GPU# pip3 uninstall tensorflow

Cannot uninstall requirement tensorflow, not installed

Install:

root@ubuntu:~/tensorflow-1.8.0/tensorflow_pkg_GPU# pip3 install tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
Processing ./tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
Requirement already satisfied: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Collecting protobuf>=3.4.0 (from tensorflow==1.8.0)
  Downloading https://files.pythonhosted.org/packages/fc/f0/db040681187496d10ac50ad167a8fd5f953d115b16a7085e19193a6abfd2/protobuf-3.6.0-cp36-cp36m-manylinux1_x86_64.whl (7.1MB)
    100% |████████████████████████████████| 7.1MB 30kB/s
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: tensorboard<1.9.0,>=1.8.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: absl-py>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow==1.8.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from protobuf>=3.4.0->tensorflow==1.8.0)
Requirement already satisfied: bleach==1.5.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow==1.8.0)
Requirement already satisfied: werkzeug>=0.11.10 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow==1.8.0)
Requirement already satisfied: html5lib==0.9999999 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow==1.8.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow==1.8.0)
Installing collected packages: protobuf, tensorflow
Successfully installed protobuf-3.6.0 tensorflow-1.8.0
    

Test file:

root@ubuntu:~# cat tf.py
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Test run:

root@ubuntu:~# python3 tf.py
2018-07-04 17:40:04.645445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-04 17:40:04.815237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-04 17:40:04.819954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2018-07-04 17:40:05.475453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-04 17:40:05.475524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 1
2018-07-04 17:40:05.475533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N Y
2018-07-04 17:40:05.475537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1:   Y N
2018-07-04 17:40:05.476107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10413 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-07-04 17:40:05.680812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10413 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)

b'Hello, TensorFlow!'
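To confirm from a captured log that both cards were picked up with the compute capability chosen in ./configure (6.1), the "Created TensorFlow device" lines can be parsed. This is a sketch that assumes the TensorFlow 1.8 log format shown above. The regular expression was written for this log only and may not match other versions.

```python
import re

DEVICE_RE = re.compile(
    r"Created TensorFlow device \(.*?device:GPU:(\d+) with (\d+) MB memory\)"
    r".*?name: ([^,]+),.*?compute capability: (\d+\.\d+)\)"
)

def created_gpus(log_text):
    """Return (index, memory_mb, name, compute_capability) for each GPU."""
    return [(int(i), int(mb), name, cc)
            for i, mb, name, cc in DEVICE_RE.findall(log_text)]

log = (
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 "
    "with 10413 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, "
    "pci bus id: 0000:02:00.0, compute capability: 6.1)\n"
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 "
    "with 10413 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, "
    "pci bus id: 0000:03:00.0, compute capability: 6.1)\n"
)
for gpu in created_gpus(log):
    print(gpu)
# (0, 10413, 'GeForce GTX 1080 Ti', '6.1')
# (1, 10413, 'GeForce GTX 1080 Ti', '6.1')
```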



Explanation in Chinese:

https://www.tensorflow.org/install/install_sources

A translation of a foreign author's guide, which I did not follow:

https://www.52cv.net/?p=511




