Building the TensorFlow 2.0/2.5 C++ dynamic library (GPU version) on Ubuntu

Everything I had written about TensorFlow so far was CPU-only. A machine with an Nvidia GTX 1060 recently became free at work, so after reinstalling Ubuntu on it I set out to build the GPU version of libtensorflow_cc.so and use it. In detail there are six steps, A through F:

A---install bazel  (see https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu)
     1 apt-get install pkg-config zip g++ zlib1g-dev unzip python3
     2 manually download the bazel installer from https://github.com/bazelbuild/bazel/releases
     3  chmod +x bazel-<version>-installer-linux-x86_64.sh
        ./bazel-<version>-installer-linux-x86_64.sh --user
     4 set the environment variable in /etc/bash.bashrc, like this:
         export PATH="$PATH:$HOME/bin"
         source /etc/bash.bashrc
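     After sourcing the file, a quick sanity check (this assumes the --user install above put the launcher under $HOME/bin):
         bazel --version   # should print the version you just installed
         which bazel       # should resolve to $HOME/bin/bazel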

B---GPU support
    0 I did everything as the root user and suggest you do the same (use root, not a guest/regular account).
    1 lspci | grep -i nvidia
    2 gcc --version
    3 Find the CUDA Toolkit release that suits your system at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html. I downloaded https://developer.nvidia.com/cuda-10.1-download-archive-base?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
    4 uname -r
    5 apt-get install linux-headers-$(uname -r)
      Error I hit here: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
      My fix: https://blog.csdn.net/u011596455/article/details/60322568
    6 Disable the nouveau driver: https://blog.csdn.net/wd1603926823/article/details/77473746
    7 Shut down the X server, which is simple: sudo service lightdm stop, then switch to tty1 with Ctrl+Alt+F1
    8 sh cuda_10.1.105_418.39_linux.run
    9 service lightdm start
    10 nvidia-smi should now show your GPU information
    11 Add the CUDA tools to the environment variables in /etc/bash.bashrc (use the toolkit version you actually installed; mine is cuda-10.1):
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64
        export PATH=$PATH:/usr/local/cuda-10.1/bin
        export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-10.1

        source /etc/bash.bashrc
        nvcc --version
    12 If all of the above succeeded, the Nvidia GPU driver and the CUDA Toolkit (including CUPTI) are installed. Check the driver version with: cat /proc/driver/nvidia/version
    13 There is a compatibility reference table at https://www.cnblogs.com/eugene0/p/11587987.html
    14 According to https://tensorflow.google.cn/install/gpu, cuDNN still needs to be installed.
    15 dpkg -i libcudnn7_7.6.4.38-1+cuda10.1_amd64.deb
    16 dpkg -i libcudnn7-dev_7.6.4.38-1+cuda10.1_amd64.deb
    17 dpkg -i libcudnn7-doc_7.6.4.38-1+cuda10.1_amd64.deb (according to https://blog.csdn.net/dudu815110/article/details/88592558). At this point cuDNN is installed.
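       A quick way to double-check that cuDNN landed where the toolchain expects it (a sketch; the header path assumes the .deb packages above, which install into /usr/include):
         dpkg -l | grep libcudnn                      # the three packages should be listed with status ii
         grep -A 2 CUDNN_MAJOR /usr/include/cudnn.h   # should report version 7.6.x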

C---tensorflow 
  1 Download the TensorFlow version you want (there are many releases at https://github.com/tensorflow/tensorflow/releases)
  2 ./configure
    Make sure to answer yes to this question; the defaults are fine for everything else:
    Do you wish to build TensorFlow with CUDA support? [y/N]: y
    CUDA support will be enabled for TensorFlow.
  3 apt install git
  4 bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so
    Error I hit here: ERROR: Analysis of target '//tensorflow:libtensorflow_cc.so' failed; build aborted: no such package '@local_config_git//': Traceback (most recent call last):
    File "/home/jumper/workspace/tensorflow-2.0.0/third_party/git/git_configure.bzl", line 61
        _fail(result.stderr)
    File "/home/jumper/workspace/tensorflow-2.0.0/third_party/git/git_configure.bzl", line 14, in _fail
        fail(("%sGit Configuration Error:%s %...)))
Git Configuration Error: Traceback (most recent call last):File "/root/.cache/bazel/_bazel_root/ec80c569286571968027dca7bea4db07/external/org_tensorflow/tensorflow/tools/git/gen_git_source.py", line 29, in <module>
    from builtins import bytes  # pylint: disable=redefined-builtin
    ImportError: No module named builtins 
   Fix attempt 1: https://blog.csdn.net/sinat_28442665/article/details/85325232 (did not work for me)
   Fix 2: apt install python-pip
          pip install future
          bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so

The build takes a long time. A few of my attempts during peak network hours failed with timeouts; retrying when the network was quiet succeeded.
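If the failures are download timeouts, one optional thing that can help (a sketch) is to pre-fetch the external dependencies first and only then run the real build:

    bazel fetch //tensorflow:libtensorflow_cc.so
    bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so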


  5 From your tensorflow directory, run:  ./tensorflow/contrib/makefile/build_all_linux.sh 
    Errors during this step: tensorflow/contrib/makefile/download_dependencies.sh: line 75: curl: command not found
                             gzip: stdin: unexpected end of file
                             tar: Child returned status 1
                             tar: Error is not recoverable: exiting now
    Fix: apt-get install curl
    Then run it again from your tensorflow directory: ./tensorflow/contrib/makefile/build_all_linux.sh
    Another error: ./autogen.sh: 37: ./autogen.sh: autoreconf: not found
     Fix:          apt-get install autoconf 
                   apt-get install automake
                   apt-get install libtool

Note: later, when I compiled TensorFlow 2.5.0 on another Ubuntu machine, I found that 2.5.0 no longer ships this build_all_linux.sh file,

so this step cannot be run there. Instead, collect the header files the way described in the post "linux 下 tensorflow C++ 提取include文件、第一个hello world - 耿明岩 - 博客园". Also note the .cache folder lives under /root; press Ctrl+H to show hidden folders (whatever is missing, merge the matching files or folders found under .cache into the folder that lacks them). Following that post you will not hit the various missing-header errors I describe below for the 2.0 build, and the test program below runs directly:

#include <tensorflow/c/c_api.h>
#include <tensorflow/core/platform/env.h>
#include <tensorflow/core/public/session.h>

#include <iostream>
#include <algorithm>
#include <map>

using namespace std;
//using namespace tensorflow;

int main()
{
	cout<<"Tensorflow C++ library version : "<<TF_Version()<<endl;
	tensorflow::SessionOptions options;
	options.config.set_allow_soft_placement(true);
	std::unique_ptr<tensorflow::Session> session(NewSession(options));
	std::vector<tensorflow::DeviceAttributes> resp;
	TF_CHECK_OK(session->ListDevices(&resp));
	bool has_gpu = false;
	for (const auto& dev : resp) {
		if (dev.device_type() == "GPU") {
			has_gpu = true;
		}
	}
	if(has_gpu)
		std::cout << "Support GPU !" << std::endl;
	else
		std::cout << "Not Support GPU !!!" << std::endl;

	return 0;
}
2022-07-26 19:02:07.430903: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-07-26 19:02:07.461427: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Tensorflow C++ library version : 2.5.0
2022-07-26 19:02:07.471924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-07-26 19:02:07.522096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-26 19:02:07.522452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 SUPER computeCapability: 7.5
coreClock: 1.65GHz coreCount: 34 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-07-26 19:02:07.522469: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-07-26 19:02:07.538057: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-07-26 19:02:07.538092: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-07-26 19:02:07.547534: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-07-26 19:02:07.551228: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-07-26 19:02:07.568515: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2022-07-26 19:02:07.572074: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-07-26 19:02:07.573388: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-07-26 19:02:07.573528: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-26 19:02:07.573924: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-26 19:02:07.574265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-07-26 19:02:07.574290: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-07-26 19:02:08.040291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-07-26 19:02:08.040318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2022-07-26 19:02:08.040323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2022-07-26 19:02:08.040430: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-26 19:02:08.040825: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-26 19:02:08.041266: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-26 19:02:08.041600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6490 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
Support GPU !


  6 If you need the Eigen library, go into tensorflow/contrib/makefile/downloads/eigen and run:
    mkdir build  
    cd build  
    cmake ..  
    make  
    sudo make install
   After installation, an eigen3 folder appears under /usr/local/include.
  7 Collect the libraries and header files (a rough copy sketch follows after b):

     a: the five .so libraries are under bazel-bin/tensorflow: libtensorflow_cc.so, libtensorflow_cc.so.2, libtensorflow_cc.so.2.0.0, libtensorflow_framework.so.2, libtensorflow_framework.so.2.0.0 (or similar).
     b: collect these five header folders:
         <you_path>/tensorflow (only the tensorflow and third_party folders inside it are needed)
         <you_path>/tensorflow/bazel-genfiles 
         <you_path>/tensorflow/tensorflow/contrib/makefile/downloads/nsync/public 
         <you_path>/tensorflow/tensorflow/contrib/makefile/gen/protobuf/include
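     As a rough sketch (my destination paths; adjust <you_path> accordingly), collecting everything into one include/lib pair can look like this:
         mkdir -p /home/jumper/workspace/tensorflow-gpu/include /home/jumper/workspace/tensorflow-gpu/lib
         cd <you_path>/tensorflow
         cp -r tensorflow third_party /home/jumper/workspace/tensorflow-gpu/include/
         cp -r bazel-genfiles/* /home/jumper/workspace/tensorflow-gpu/include/
         cp -r tensorflow/contrib/makefile/downloads/nsync/public/* /home/jumper/workspace/tensorflow-gpu/include/
         cp -r tensorflow/contrib/makefile/gen/protobuf/include/* /home/jumper/workspace/tensorflow-gpu/include/
         cp bazel-bin/tensorflow/libtensorflow_cc.so* bazel-bin/tensorflow/libtensorflow_framework.so* /home/jumper/workspace/tensorflow-gpu/lib/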
  8 Add the library path to the system by appending it to /etc/ld.so.conf (my library path is /home/jumper/workspace/tensorflow-gpu/lib)
  9   ldconfig
  10 ldconfig -v should now list the library path you just added.
     If you see '/home/jumper/workspace/tensorflow-gpu/lib...libtensorflow_cc.so:No such file or directory', step 7-a was not done correctly; try again until the right libraries are in place. This part is trial and error over which .so files to put there.
  11 Copy all the libraries you just placed (mine are under /home/jumper/workspace/tensorflow-gpu/lib) into /usr/lib as well
  12 ldconfig
  13 gcc -ltensorflow_cc --verbose 
     If no error such as 'cannot find -ltensorflow_cc' appears, the setup succeeded.

D--install eclipse (plenty of guides online)
E--run a demo to test tensorflow-gpu (the demo follows)

#include "tensorflow/core/framework/graph.pb.h"
#include <tensorflow/core/public/session_options.h>
#include <tensorflow/core/protobuf/meta_graph.pb.h>
#include <fstream>
#include <utility>
#include <vector>
#include <Eigen/Core>
#include <Eigen/Dense>

#include "tensorflow/cc/ops/const_op.h"
#include "tensorflow/cc/ops/image_ops.h"
//#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/io/path.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/util/command_line_flags.h"

//using namespace tensorflow;

#define MODELGRAPHRECT_PATH  "/home/SSD/EcologyAnalysis/math/font/cnnmodel/model.meta"
#define MODELRECT_PATH "/home/SSD/EcologyAnalysis/math/font/cnnmodel/model"

//#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;
using namespace std;

int getPredictLabel(tensorflow::Tensor &probabilities,int &output_class_id,double &output_prob)
{
	 int ndim2 = probabilities.shape().dims();             // Get the dimension of the tensor
	  auto tmap = probabilities.tensor<float, 2>();        // Tensor Shape: [batch_size, target_class_num]
	  int output_dim = probabilities.shape().dim_size(1);  // Get the target_class_num from 1st dimension
	  std::vector<double> tout;

	  // Argmax: Get Final Prediction Label and Probability
	  for (int j = 0; j < output_dim; j++)
	  {
			//std::cout << "Class " << j << " prob:" << tmap(0, j) << "," << std::endl;
			if (tmap(0, j) >= output_prob) {
					output_class_id = j;
					output_prob = tmap(0, j);
			}
	  }

	return 0;
}

int mainnet()
{
	tensorflow::Session* session_rect;
	//CNN initialization--Wang Dan 20190710 for rect algaes
	tensorflow::Status statusrect = NewSession(tensorflow::SessionOptions(), &session_rect);
	if (!statusrect.ok())
	{
		std::cout << "ERROR: NewSession() for rect algaes failed..." << std::endl;
		return -1;
	}
	tensorflow::MetaGraphDef graphdefrect;
	tensorflow::Status status_loadrect = ReadBinaryProto(tensorflow::Env::Default(), MODELGRAPHRECT_PATH, &graphdefrect); // read the graph definition from the .meta file
	if (!status_loadrect.ok()) {
			std::cout << "ERROR: Loading model for rect algaes failed..." << std::endl;
			std::cout << status_loadrect.ToString() << "\n";
			return -1;
	}
	tensorflow::Status status_createrect = session_rect->Create(graphdefrect.graph_def()); // import the graph into the session
	if (!status_createrect.ok()) {
			std::cout << "ERROR: Creating graph for rect algaes in session failed..." << status_createrect.ToString() << std::endl;
			return -1;
	}
	// load the pretrained model weights from the checkpoint
	tensorflow::Tensor checkpointPathTensorRect(tensorflow::DT_STRING, tensorflow::TensorShape());
	checkpointPathTensorRect.scalar<std::string>()() = MODELRECT_PATH;
	statusrect = session_rect->Run(
			  {{ graphdefrect.saver_def().filename_tensor_name(), checkpointPathTensorRect },},
			  {},{graphdefrect.saver_def().restore_op_name()},nullptr);
	if (!statusrect.ok())
	{
		  throw runtime_error("Error loading checkpoint for rect algaes ...");
	}
	int rectzao_rows=96;//48;
	int rectzao_cols=224;//80;


	char srcfile[200];
	char tmpfile[200];

	for(int index=1;index<1001;index++)
	{
		sprintf(srcfile, "/media/root/Windows3/projects/Ecology/images/resultimgs/temp1/%d.jpg", index);
		Mat src=imread(srcfile,0);
		if(!src.data)
		{
			continue;
		}


		//CNN start...20190710 wd
		tensorflow::Tensor resized_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({1,rectzao_rows,rectzao_cols,1}));
		float *imgdata = resized_tensor.flat<float>().data();
		cv::Mat cnninputImg(rectzao_rows, rectzao_cols, CV_32FC1, imgdata);
		cv::Mat srccnn(rectzao_rows, rectzao_cols, CV_8UC1);
		cv::resize(src,srccnn,cv::Size(rectzao_cols,rectzao_rows));
		srccnn.convertTo(cnninputImg, CV_32FC1);
		//对图像做预处理
		cnninputImg=cnninputImg/255;
		//CNN input
		vector<std::pair<string, tensorflow::Tensor> > inputs;
		std::string Input1Name = "input";
		inputs.push_back(std::make_pair(Input1Name, resized_tensor));
		tensorflow::Tensor is_training_val(tensorflow::DT_BOOL,tensorflow::TensorShape());
		is_training_val.scalar<bool>()()=false;
		std::string Input2Name = "is_training";
		inputs.push_back(std::make_pair(Input2Name, is_training_val));
		//CNN predict
		vector<tensorflow::Tensor> outputs;
		string output="output";

		cv::TickMeter timer;
		timer.start();
		tensorflow::Status status_run = session_rect->Run(inputs, {output}, {}, &outputs);
		timer.stop();
		//cout<<"time is "<<timer.getTimeMilli()<<" ms!"<<endl;

		if (!status_run.ok()) {
		   std::cout << "ERROR: RUN failed in PreAlgaeRecognitionProcess()..."  << std::endl;
		   std::cout << status_run.ToString() << "\n";
		}
		int label=-1;
		double prob=0.0;
		getPredictLabel(outputs[0],label,prob);
		//CNN end...
		cout<<"image "<<index<<" label is "<<label<<"  ; time is "<<timer.getTimeMilli()<<" ms!"<<endl;
		timer.reset();
	}

	return 0;
}

You can also write a demo of your own. Add the headers and libraries you collected to this project; a rough compile command is sketched below.
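Outside of an IDE, compiling the demo looks roughly like this (a sketch using my paths; testgpu.cpp is the file holding the code above, the OpenCV flags come from pkg-config, and -ltensorflow_framework relies on the symlink described in step 9 below):

    g++ -std=c++11 testgpu.cpp -o testTensorflowGpu \
        -I/home/jumper/workspace/tensorflow-gpu/include \
        -I/usr/local/include/eigen3 \
        -L/home/jumper/workspace/tensorflow-gpu/lib \
        -ltensorflow_cc -ltensorflow_framework -lpthread \
        `pkg-config --cflags --libs opencv`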

The following errors appeared:

tensorflow/core/framework/device_attributes.pb.h: No such file or directory
	google/protobuf/port_def.inc: No such file or directory
	tensorflow/core/framework/graph.pb.h: No such file or directory
	tensorflow/core/framework/node_def.pb.h: No such file or directory
	tensorflow/core/framework/attr_value.pb.h: No such file or directory
	tensorflow/core/framework/tensor.pb.h: No such file or directory
	tensorflow/core/framework/resource_handle.pb.h: No such file or directory
	tensorflow/core/framework/tensor_shape.pb.h: No such file or directory
	tensorflow/core/framework/types.pb.h: No such file or directory
	tensorflow/core/framework/function.pb.h: No such file or directory
	tensorflow/core/framework/op_def.pb.h: No such file or directory
	tensorflow/core/framework/versions.pb.h: No such file or directory
    unsupported/Eigen/CXX11/Tensor: No such file or directory

These errors mean the headers you just collected are incomplete. The fix is to copy the corresponding headers from the full tensorflow source/build tree into the matching locations in your collected include directory. For example, for that last error I copied the unsupported folder from /usr/local/include/eigen3 into my /home/jumper/workspace/tensorflow-gpu/include.

3  /home/jumper/workspace/tensorflow-gpu/include/unsupported/Eigen/CXX11/Tensor:14:31: fatal error: ../../../Eigen/Core: No such file or directory
      solution: copy the Eigen folder from your original build tree (mine is /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/downloads/eigen) into your include path (mine is /home/jumper/workspace/tensorflow-gpu/include)
   4  /home/jumper/workspace/tensorflow-gpu/include/tensorflow/core/framework/allocator.h:24:38: fatal error: absl/strings/string_view.h: No such file or directory
      solution: copy the absl folder from your original build tree (mine is /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/downloads/absl) into your include path (mine is /home/jumper/workspace/tensorflow-gpu/include)
   5 tensorflow/core/lib/core/error_codes.pb.h: No such file or directory
     solution: copy error_codes.pb.h from your original build tree (mine is /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/gen/host_obj/tensorflow/core/lib/core) into your include path (mine is /home/jumper/workspace/tensorflow-gpu/include/tensorflow/core/lib/core)
   6 tensorflow/core/protobuf/config.pb.h: No such file or directory
     solution: copy /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/gen/host_obj/tensorflow/core/protobuf into /home/jumper/workspace/tensorflow-gpu/include/tensorflow/core/protobuf
   7 tensorflow/core/framework/cost_graph.pb.h: No such file or directory
     solution: ...
   ...and so on, in the same way as above.

And so on; as the solutions above show, I fixed each of these by copying the missing files the same way.

8   Add -std=c++11 in the project settings; TensorFlow 2.0 seems to require it in order to compile.
     Also add the libtensorflow_cc.so library to the project.
9   Error at link time: Building target: testTensorflowGpu
    Invoking: GCC C++ Linker
    g++  -std=c++11 -L/home/jumper/workspace/tensorflow-gpu/lib -o "testTensorflowGpu"  ./src/testgpu.o   -lpthread -ltensorflow_cc
    /usr/bin/ld: ./src/testgpu.o: undefined reference to symbol '_ZN10tensorflow15ReadBinaryProtoEPNS_3EnvERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPN6google8protobuf11MessageLiteE'
    //home/jumper/workspace/tensorflow-gpu/lib/libtensorflow_framework.so.2: error adding symbols: DSO missing from command line
    makefile:45: recipe for target 'testTensorflowGpu' failed
    collect2: error: ld returned 1 exit status
    make: *** [testTensorflowGpu] Error 1

    Fix:
               ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
               ldconfig
               Add the newly symlinked libtensorflow_framework.so to the project as well; the error goes away and the build succeeds.
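    Once the link succeeds, ldd is a quick way to confirm which libraries the binary actually resolved (a sketch):
               ldd testTensorflowGpu | grep -E 'tensorflow|cuda'
    The tensorflow_cc and tensorflow_framework lines should point at your lib directory (or /usr/lib), and the CUDA libraries at your /usr/local/cuda-10.1/lib64.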

F--run the demo after a successful build
2019-10-28 14:08:16.082950: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-10-28 14:08:16.083187: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1e5a0d0 executing computations on platform Host. Devices:
2019-10-28 14:08:16.083200: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2019-10-28 14:08:16.084910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-28 14:08:16.138433: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.138899: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f0ad30 executing computations on platform CUDA. Devices:
2019-10-28 14:08:16.138913: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2019-10-28 14:08:16.139006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.139385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2019-10-28 14:08:16.139592: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-10-28 14:08:16.140711: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-10-28 14:08:16.141492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-10-28 14:08:16.141673: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-10-28 14:08:16.142814: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-10-28 14:08:16.143635: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-10-28 14:08:16.146235: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-28 14:08:16.146303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.146719: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.147077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-28 14:08:16.147099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-10-28 14:08:16.147663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-28 14:08:16.147673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2019-10-28 14:08:16.147677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2019-10-28 14:08:16.147848: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.148226: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.148627: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5454 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-28 14:08:19.537588: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
image 1 label is 0  ; time is 2120.38 ms!
image 2 label is 10  ; time is 3.45043 ms!
image 3 label is 25  ; time is 3.21582 ms!
image 4 label is 10  ; time is 3.46827 ms!
image 6 label is 10  ; time is 3.21907 ms!
image 7 label is 0  ; time is 3.10698 ms!

You can see that CUDA also has an initialization phase; the first inference is usually much slower than the rest.

With this GPU build, GPU utilization reached about 74% while running. With the CPU build, GPU utilization never went above 10% no matter how it fluctuated.
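The utilization numbers come from nvidia-smi; to watch them live in a second terminal while the demo runs, something like this works:

    watch -n 1 nvidia-smi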

I compared the time taken by the same program, model, and images on the same machine using the tensorflow-cpu and tensorflow-gpu dynamic libraries.

Prediction/classification time with tensorflow-cpu: 4.13 min

Prediction/classification time with tensorflow-gpu: 2.83 min

So there is a noticeable speedup.

I think how much the GPU build speeds things up also depends on the model. For example, with the model at https://download.csdn.net/download/wd1603926823/86263340, the tensorflow-2.5.0 I freshly built on another machine gave a huge speedup. Below are the TF-CPU timings for that model:

2022-07-27 16:47:56.461203: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2022-07-27 16:47:56.481605: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407925000 Hz
2022-07-27 16:47:56.481986: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x293fbb0 executing computations on platform Host. Devices:
2022-07-27 16:47:56.482014: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2022-07-27 16:47:56.603768: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 0. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:56.607708: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 1. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
no low image data!
2022-07-27 16:47:57.119395: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 2. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:57.141153: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 3. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:57.608639: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 2. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:57.630366: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 3. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:58.202302: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 2. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:58.221740: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 3. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:58.663863: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 2. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:58.682867: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 3. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:59.136521: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 2. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:47:59.156547: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 3. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
~~~~~~~~~~~~~~~~~~~~~~~~image 1 time: 3403.97 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 2 time: 1254.22 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 3 time: 1248.49 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 4 time: 1258.78 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 5 time: 1266.68 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 6 time: 1290.22 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 7 time: 1287.14 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 8 time: 1291.84 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 9 time: 1289.48 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 10 time: 1295.18 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 11 time: 1296.89 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 12 time: 1304.23 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 13 time: 1300.82 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 14 time: 1302.24 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 15 time: 1308.99 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 16 time: 1293.27 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 17 time: 1308.45 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 18 time: 1303.76 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 19 time: 1305.77 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 20 time: 1307.48 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 21 time: 1309.44 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 22 time: 1310.99 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 23 time: 1333.56 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 24 time: 1341.56 ms!

The same code with the TF-GPU build:

 You can see the per-image time drops from about 1300 ms to around 80 ms; the speedup is enormous!

Continuing with some other test images, TF-CPU:

2022-07-27 16:50:19.476208: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2022-07-27 16:50:19.486736: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407925000 Hz
2022-07-27 16:50:19.486975: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2260bb0 executing computations on platform Host. Devices:
2022-07-27 16:50:19.486993: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2022-07-27 16:50:19.526937: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 0. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:50:19.528274: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 1. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
no low image data!
2022-07-27 16:50:20.051304: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 2. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2022-07-27 16:50:20.072180: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 3. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
~~~~~~~~~~~~~~~~~~~~~~~~image 1 time: 636.977 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 2 time: 78.1083 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 3 time: 72.9115 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 4 time: 60.9467 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 5 time: 58.1019 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 6 time: 57.8415 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 7 time: 59.1749 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 8 time: 55.75 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 9 time: 59.174 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 10 time: 57.9812 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 11 time: 68.4325 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 12 time: 54.3037 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 13 time: 57.7275 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 14 time: 59.0513 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 15 time: 58.4224 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 16 time: 57.3028 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 17 time: 63.6716 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 18 time: 57.3687 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 19 time: 60.3293 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 20 time: 59.1631 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 21 time: 56.7095 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 22 time: 59.3233 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 23 time: 57.4973 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 24 time: 60.5902 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 25 time: 59.0194 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 26 time: 58.646 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 27 time: 63.9043 ms!
~~~~~~~~~~~~~~~~~~~~~~~~image 28 time: 53.9915 ms!

Then, still with the same code, after switching to TF-GPU:

 You can see the per-image time goes from about 59 ms with TF-CPU to about 4 ms with TF-GPU, a huge difference. So whenever you can use TF-GPU, prefer it over TF-CPU.

Some reference links:

Correspondence between TensorFlow versions and the required CUDA/cuDNN versions:
https://blog.csdn.net/qq_27825451/article/details/89082978

https://blog.csdn.net/dragonchow123/article/details/80682787
https://tensorflow.google.cn/install/source
Using the TensorFlow C++ API with GPU:
https://blog.csdn.net/luoyexuge/article/details/81877069

https://www.cnblogs.com/lvchaoshun/p/6614048.html
https://www.jianshu.com/p/31b00ec5bc74
https://blog.csdn.net/wanzhen4330/article/details/81699769
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions
https://blog.csdn.net/u014475479/article/details/81702392
https://blog.csdn.net/caroline_wendy/article/details/80868120
 

    
