Embedding Python 3.5 in C++, and resolving the TensorFlow C++ warnings on Ubuntu: SSE4.1 SSE4.2 AVX AVX2 FMA XLA

元气少女缘结神

Last modified 2022-04-25 18:53:39

Category: AI Basises. Tags: computer vision

First published 2019-08-01 18:47:36

Article link: https://blog.csdn.net/wd1603926823/article/details/98086550

Readers of my earlier posts will know that every prediction run with the TensorFlow C++ library prints the following messages:

2019-07-16 10:33:52.057179: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-07-16 10:33:52.082548: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407965000 Hz
2019-07-16 10:33:52.082883: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44d56d0 executing computations on platform Host. Devices:
2019-07-16 10:33:52.082903: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-07-16 10:33:52.557067: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 0. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2019-07-16 10:33:52.694202: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 1. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2019-07-16 10:33:53.157970: I tensorflow/core/common_runtime/optimization_registry.cc:35] Running all optimization passes in grouping 2. If you see this a lot, you might be extending the graph too many times (which means you modify the graph many times before execution). Try reducing graph modifications or using SavedModel to avoid any graph modification
2019-07-16 10:33:53.228415: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1337] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.

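Many people say these are only speed-up hints and can be ignored, but there are claims that fixing them brings a large performance gain, so I still wanted to resolve them.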
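The first log line lists instruction sets that the CPU supports but the binary was not compiled to use. On Linux, the CPU side of that statement can be cross-checked against /proc/cpuinfo. A minimal sketch, assuming the kernel's flag spellings (sse4_1, sse4_2, avx, avx2, fma); the helper names are mine and are not part of TensorFlow:

```python
# Kernel flag names (as they appear in /proc/cpuinfo) for the instruction
# sets named in the TensorFlow log message.
WANTED = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
    "AVX2": "avx2",
    "FMA": "fma",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_instruction_sets(cpuinfo_text):
    """Map each instruction set from the log message to True or False."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in WANTED.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            report = supported_instruction_sets(f.read())
    except OSError:
        print("could not read /proc/cpuinfo (not a Linux system?)")
    else:
        for name, ok in report.items():
            print(name, "supported" if ok else "not supported")
```

This only tells you what the CPU offers; whether the TensorFlow binary uses these instructions still depends on how it was compiled.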

1. Resolving the AVX AVX2 SSE4.1 SSE4.2 FMA message

I went through a number of resources looking for a way to resolve this message:

https://stackoverflow.com/questions/57049454/tensorflows-warningextending-the-graph-too-many-times-which-means-you-modify (found via a proxy)

https://stackoverflow.com/questions/47068709/your-cpu-supports-instructions-that-this-tensorflow-binary-was-not-compiled-to-u

https://github.com/lakshayg/tensorflow-build

https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions

https://www.tensorflow.org/install/source

https://blog.csdn.net/edisonleeee/article/details/89503365

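Some people suggest simply running pip install --ignore-installed --upgrade "Download URL", or the following sequence:

pip install --upgrade tensorflow

pip uninstall tensorflow

pip list

pip install tensorflow-1.9.0-cp36-cp36m-win_amd64.whl

None of this helped. (That last wheel is also a Windows build for Python 3.6, so it cannot be installed on my Ubuntu and Python 3.5 setup.) So I tried building from source: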

root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/copy/ThirdParty/tensorflow-master# ./configure
WARNING: Running Bazel server needs to be killed, because the startup options are different.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.24.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: bazel shutdown


Invalid python path: bazel shutdown cannot be found.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.5/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: 
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: 
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=gdr         	# Build with GDR support.
	--config=verbs       	# Build with libverbs support.
	--config=ngraph      	# Build with Intel nGraph support.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws       	# Disable AWS S3 filesystem support.
	--config=nogcp       	# Disable GCP support.
	--config=nohdfs      	# Disable HDFS support.
	--config=noignite    	# Disable Apache Ignite support.
	--config=nokafka     	# Disable Apache Kafka support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/copy/ThirdParty/tensorflow-master# bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 -k
Starting local Bazel server and connecting to it...
WARNING: Usage: bazel build <options> <targets>.
Invoke `bazel help build` for full description of usage and options.
Your request is correct, but requested an empty set of targets. Nothing will be built.
INFO: Analysed 0 targets (0 packages loaded, 0 targets configured).
INFO: Found 0 targets...
INFO: Elapsed time: 1.805s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/copy/ThirdParty/tensorflow-master#
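The transcript above ends with "requested an empty set of targets": the build command named no target, so nothing was compiled. Note also that the optimization-flags prompt of ./configure shows plain compiler flags as its default (-march=native -Wno-sign-compare), which suggests the --copt= prefix belongs on the bazel command line rather than in that answer. A small sketch that assembles the command from plain flags; the helper name is hypothetical, and the target is the one used later in this post:

```python
import shlex

# Plain compiler flags, as used in this post.
COPTS = ("-mavx", "-mavx2", "-mfma", "-mfpmath=both", "-msse4.1", "-msse4.2")


def bazel_build_command(target, copts=COPTS):
    """Build the argument list for `bazel build`, prefixing every flag with --copt=."""
    if not target:
        raise ValueError("bazel needs at least one target to build")
    return (["bazel", "build", "-c", "opt"]
            + ["--copt=" + flag for flag in copts]
            + [target])


if __name__ == "__main__":
    command = bazel_build_command("//tensorflow:libtensorflow_cc.so")
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the flags as a plain list makes it harder to forget the target or to double up the --copt= prefix.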

Later I read that upgrading to 2.0 would fix it (pip install tensorflow==2.0.0-beta1), but that did not work either:

root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/copy/ThirdParty# pip install tensorflow-2.0.0b1-cp35-cp35m-manylinux1_x86_64.whl
Processing ./tensorflow-2.0.0b1-cp35-cp35m-manylinux1_x86_64.whl
Requirement already satisfied: keras-applications>=1.0.6 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (1.0.8)
Requirement already satisfied: astor>=0.6.0 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (0.8.0)
Requirement already satisfied: numpy<2.0,>=1.14.5 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (1.16.4)
Requirement already satisfied: tb-nightly<1.14.0a20190604,>=1.14.0a20190603 in /usr/local/lib/python3.5/dist-packages (from tensorflow==2.0.0b1) (1.14.0a20190603)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (1.1.0)
Requirement already satisfied: gast>=0.2.0 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (0.2.2)
Requirement already satisfied: google-pasta>=0.1.6 in /usr/local/lib/python3.5/dist-packages (from tensorflow==2.0.0b1) (0.1.7)
Requirement already satisfied: tf-estimator-nightly<1.14.0.dev2019060502,>=1.14.0.dev2019060501 in /usr/local/lib/python3.5/dist-packages (from tensorflow==2.0.0b1) (1.14.0.dev2019060501)
Requirement already satisfied: wheel>=0.26 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (0.33.4)
Collecting protobuf>=3.6.1 (from tensorflow==2.0.0b1)
  Using cached https://files.pythonhosted.org/packages/55/34/7158a5ec978f12307eb361a8c4fdd867a8e2a0ab63fac99e5f555ee796d2/protobuf-3.9.0-cp35-cp35m-manylinux1_x86_64.whl
Requirement already satisfied: grpcio>=1.8.6 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (1.22.0)
Requirement already satisfied: six>=1.10.0 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (1.12.0)
Requirement already satisfied: absl-py>=0.7.0 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (0.7.1)
Requirement already satisfied: termcolor>=1.1.0 in /root/.local/lib/python3.5/site-packages (from tensorflow==2.0.0b1) (1.1.0)
Requirement already satisfied: wrapt>=1.11.1 in /usr/local/lib/python3.5/dist-packages (from tensorflow==2.0.0b1) (1.11.2)
Requirement already satisfied: h5py in /root/.local/lib/python3.5/site-packages (from keras-applications>=1.0.6->tensorflow==2.0.0b1) (2.9.0)
Requirement already satisfied: werkzeug>=0.11.15 in /root/.local/lib/python3.5/site-packages (from tb-nightly<1.14.0a20190604,>=1.14.0a20190603->tensorflow==2.0.0b1) (0.15.4)
Requirement already satisfied: setuptools>=41.0.0 in /root/.local/lib/python3.5/site-packages (from tb-nightly<1.14.0a20190604,>=1.14.0a20190603->tensorflow==2.0.0b1) (41.0.1)
Requirement already satisfied: markdown>=2.6.8 in /root/.local/lib/python3.5/site-packages (from tb-nightly<1.14.0a20190604,>=1.14.0a20190603->tensorflow==2.0.0b1) (3.1.1)
Installing collected packages: protobuf, tensorflow
Successfully installed protobuf-3.9.0 tensorflow-2.0.0b1
root@rootwd-Default-string:/media/root/Ubuntu311/projects/Ecology_projects/copy/ThirdParty#

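I also asked about this on the TensorFlow issue tracker, on Stack Overflow, and on other sites.

In the end I completely removed every TensorFlow installation (not only through the command line; I also deleted all related folders), then downloaded tensorflow-2.0.0-alpha0.tar and ran: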



./configure
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 //tensorflow:libtensorflow_cc.so

But that did not work.

https://github.com/tensorflow/tensorflow/releases/tag/v2.0.0-alpha0

http://mirror.tensorflow.org/www.sqlite.org/2019/sqlite-amalgamation-3280000.zip 
https://www.sqlite.org/2019/sqlite-amalgamation-3280000.zip


Executing genrule //tensorflow/cc:nn_ops_genrule failed (Exit 127)
bazel-out/host/bin/tensorflow/cc/ops/nn_ops_gen_cc: symbol lookup error: bazel-out/host/bin/tensorflow/cc/ops/nn_ops_gen_cc: undefined symbol: _ZN10tensorflow15shape_inference21FusedBatchNormV3ShapeEPNS0_16InferenceContextE
Target //tensorflow:libtensorflow_cc.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2612.253s, Critical Path: 100.09s
INFO: 6033 processes: 6033 local.
FAILED: Build did NOT complete successfully

So I deleted and uninstalled that version as well, and downloaded tensorflow-2.0.0-beta1.tar:

./configure
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 //tensorflow:libtensorflow_cc.so

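This time most of the messages were gone! The lesson: do not be reluctant to delete and uninstall a previously working version (nothing ventured, nothing gained), and be sure to download the version that matches your own setup: CPU build, Ubuntu version, gcc version, Python version, and so on.

I also noticed that this TensorFlow build no longer needs linker flags such as -Wl,--no-as-needed.

Only the following messages remain: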

2019-07-25 15:47:02.775473: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3406455000 Hz
2019-07-25 15:47:02.775793: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2f59c70 executing computations on platform Host. Devices:
2019-07-25 15:47:02.775812: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
Session successfully created.
2019-07-25 15:47:02.858641: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1483] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.

2. Resolving the XLA warning

I read the following links, which say XLA can be enabled from code:


https://www.tensorflow.org/xla/developing_new_backend

https://stackoverflow.com/questions/47977533/how-to-debug-tensorflow-compiler-xla-testsarray-elementwise-ops-test-cpu-para

https://stackoverflow.com/questions/56633372/how-can-i-activate-tensorflows-xla-for-the-c-api

#include "c_api_experimental.h"
TF_SessionOptions* options = TF_NewSessionOptions();
TF_EnableXLACompilation(options,true);

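However, after adding these three lines my program no longer even built; the build reported missing libraries again.

Then I followed someone else's suggestion: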

$ TF_XLA_FLAGS=--tf_xla_cpu_global_jit path/to/your/program

export TF_XLA_FLAGS=--tf_xla_cpu_global_jit=/mytensorflowpath/tensorflow/compiler/xla:$TF_XLA_FLAGS=--tf_xla_cpu_global_jit
2019-07-23 10:17:57.259354: E tensorflow/core/util/command_line_flags.cc:106] Couldn't interpret value =/mytensorflowpath/tensorflow/compiler/xla:=--tf_xla_cpu_global_jit for flag tf_xla_cpu_global_jit.

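But as the error above shows, the environment variable cannot be set this way: the flag does not take a path as its value.

I double-checked that I had enabled XLA when building tensorflow-2.0.0-beta1.tar (Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y), so I really could not understand why the XLA warning was still reported.

So I kept trying other solutions: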


https://blog.csdn.net/w285868925/article/details/88317112
http://quabr.com/49549364/layer-conv2d-53-was-called-with-an-input-that-isnt-a-symbolic-tensor


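// Snippet A: set the thread counts through a hand-serialized ConfigProto
uint8_t intra_op_parallelism_threads = maxCores;
uint8_t inter_op_parallelism_threads = maxCores;
uint8_t thread_config[] = {0x10, intra_op_parallelism_threads, 0x28, inter_op_parallelism_threads};
TF_SetConfig(sess_opts, thread_config, sizeof(thread_config), status);
// Snippet B (alternative): a different hand-serialized ConfigProto
uint8_t jit_config[] = {0x52, 0x4, 0x1a, 0x2, 0x28, 0x1};
TF_SetConfig(sess_opts, jit_config, sizeof(jit_config), status);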

This did not work either.
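The byte arrays passed to TF_SetConfig above are hand-written protobuf wire-format encodings of a ConfigProto. Decoding them shows what each snippet actually requests. A minimal decoder sketch for the two wire types involved; the field names in the note after the code are my reading of TensorFlow's config.proto and should be treated as an assumption:

```python
def decode_varint(data, pos):
    """Decode one base-128 varint starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7


def decode_fields(data):
    """Split a serialized message into (field_number, wire_type, payload) tuples.

    Only wire type 0 (varint) and wire type 2 (length-delimited) are handled,
    which is all that the byte arrays in this post use.
    """
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
            fields.append((field_number, wire_type, value))
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            fields.append((field_number, wire_type, bytes(data[pos:pos + length])))
            pos += length
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
    return fields


if __name__ == "__main__":
    print(decode_fields(bytes([0x10, 4, 0x28, 4])))
    print(decode_fields(bytes([0x52, 0x04, 0x1A, 0x02, 0x28, 0x01])))
```

The first array decodes to varint fields 2 and 5, which I take to be intra_op_parallelism_threads and inter_op_parallelism_threads. The second decodes to field 10 containing field 3 containing field 5 with value 1, which I take to be graph_options.optimizer_options.global_jit_level. One caveat: a thread count above 127 no longer fits in a single varint byte, so the one-byte encoding in the first array would then be malformed.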

Anyway, I kept searching for a long time:

https://mp.weixin.qq.com/s/RO3FrPxhK2GEoDCGE9DXrw

https://stackoverflow.com/questions/52943489/what-is-xla-gpu-and-xla-cpu-for-tensorflow

https://stackoverflow.com/questions/43673380/tensorflow-cross-compile-xla-to-android

https://stackoverflow.com/questions/52890108/how-to-open-tensorflow-xla

https://www.cnblogs.com/iyulang/p/6586866.html

https://stackoverflow.com/questions/57049454/tensorflows-warningextending-the-graph-too-many-times-which-means-you-modify

https://stackoverflow.com/questions/57197854/fma-avx-sse-flags-did-not-bring-me-good-performance

https://fast-depth-coding.readthedocs.io/en/latest/tf-speed.html //speed up solution

https://github.com/tensorflow/tensorflow/issues/8243

In the end, what worked was simply setting the environment variable:

export TF_XLA_FLAGS=--tf_xla_cpu_global_jit

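Just this one line: once the environment variable is in effect, the problem is solved. Do not append any path; write it exactly like this.

As you can see, the XLA warning no longer appears.

With that, all the warnings are resolved.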
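If you would rather not export the variable in every shell, the program can be started from a small launcher that sets it only for the child process. A sketch under the assumption that TF_XLA_FLAGS holds space-separated flags; the function names are hypothetical:

```python
import os
import subprocess

XLA_CPU_JIT = "--tf_xla_cpu_global_jit"


def with_xla_cpu_jit(env):
    """Return a copy of env whose TF_XLA_FLAGS contains the CPU JIT flag exactly once."""
    new_env = dict(env)
    flags = new_env.get("TF_XLA_FLAGS", "").split()
    if XLA_CPU_JIT not in flags:
        flags.append(XLA_CPU_JIT)
    new_env["TF_XLA_FLAGS"] = " ".join(flags)
    return new_env


def run_with_xla(program_args):
    """Start the program with the flag in its environment; return its exit code."""
    completed = subprocess.run(program_args, env=with_xla_cpu_jit(os.environ))
    return completed.returncode
```

For example, run_with_xla(["path/to/your/program"]) has the same effect as the one-line shell form shown earlier, without touching the parent shell's environment.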

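But then I looked at the prediction speed:

This was measured with a handwritten-character test; it is nowhere near the rumored 300% speed-up.

When I then tested the model I actually use, it was almost equally slow with and without these optimizations.

Some people say it depends on the model?!

Others suggested MKL-DNN, which is supposed to bring a large speed-up. I tried both a self-compiled MKL-DNN and TensorFlow's own MKL build, and both were slow: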

// initialize the number of worker threads
tensorflow::SessionOptions options;
tensorflow::ConfigProto & config = options.config;
if (coresToUse > 0)
{
	config.set_inter_op_parallelism_threads(coresToUse);
	config.set_intra_op_parallelism_threads(coresToUse);
	config.set_use_per_session_threads(false);  
}
// now create a session to make the change
std::unique_ptr<tensorflow::Session> 
	session(tensorflow::NewSession(options));
session->Close();

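As for this prediction speed, with libtensorflow_cc.so and libtensorflow_framework.so there is nothing more I can do for now. It may well be as others say: speed depends mostly on the trained model. A complex model predicts slowly, and a simple model predicts quickly.

If anyone has seen a large prediction speed-up after enabling AVX, AVX2, SSE4.1, SSE4.2, FMA and XLA, please let me know.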
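When comparing builds, the way time is measured matters. The C++ examples later in this post time prediction with clock(), which reports processor time rather than elapsed time; with a multi-threaded session the two can differ considerably. A hedged sketch of a small benchmark helper that records both, with warm-up runs excluded; the helper name and defaults are my own choices:

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10):
    """Call fn `warmup` times untimed, then `repeats` times timed.

    Returns (median_wall_seconds, median_cpu_seconds).
    """
    for _ in range(warmup):
        fn()
    wall, cpu = [], []
    for _ in range(repeats):
        wall_start = time.perf_counter()   # elapsed (wall-clock) time
        cpu_start = time.process_time()    # CPU time of this process
        fn()
        wall.append(time.perf_counter() - wall_start)
        cpu.append(time.process_time() - cpu_start)
    return statistics.median(wall), statistics.median(cpu)


if __name__ == "__main__":
    wall_s, cpu_s = benchmark(lambda: sum(range(100000)))
    print("median wall: %.6f s, median cpu: %.6f s" % (wall_s, cpu_s))
```

Using the median over several runs, after a few warm-up calls, makes the comparison less sensitive to one-time costs such as the first graph execution.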

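3. Embedding Python 3.5 in C++

On another Ubuntu machine I wanted to embed Python 3.5 in C++ and call it from there. Following online tutorials, the hello test program worked.

But my own image-prediction example failed with: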

requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
SystemError: <built-in method locked of _thread.lock object at 0x7fe771c79148> returned a result with an error set

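I went through a lot of material. Some sources say that a Python file used in this kind of embedding must not contain "from ... import ...", so calls such as io.imread cannot be used without triggering the error. I switched to import cv2 and called cv2.imread directly, and that failed too. It seemed that when Python is embedded in C++, the Python code must not call functions that operate on the system, or these errors appear.

Later I found an example online in which the image is read on the C++ side and then passed into Python: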

int main()
{
	Py_Initialize();    
	import_array();   // load the NumPy C API
	if ( !Py_IsInitialized() )   // check that initialization succeeded
	{        
		return -1;    
	}     
	PyRun_SimpleString("print 'hello'");    
	PyObject *pName,*pModule,*pDict,*pFunc,*pArgs;    
	
	PyRun_SimpleString("import sys");    
	PyRun_SimpleString("sys.path.append('/home/vetec-p/Pan/project/run-maskrcnn')");    
	PyRun_SimpleString("sys.path.append('/home/vetec-p/Pan/project/run-maskrcnn/build')");    
	PyRun_SimpleString("sys.path.append('/home/vetec-p/Pan/Detectron-master')");    
	// load the Python script (here: infer_one_pic.py)
	pModule = PyImport_ImportModule("infer_one_pic");    
	if ( !pModule )    
	{        
		printf("can't find testvideo.py");        
		//getchar();        
		return -1;    
	}    
	pDict = PyModule_GetDict(pModule);    
	if ( !pDict )    
	{        
		return -1;    
	}    
	pFunc = PyDict_GetItemString(pDict, "run");    
	if ( !pFunc || !PyCallable_Check(pFunc) )    
	{        
		printf("can't find function [run]");        
		getchar();        
		return -1;    
	}    
	for(int i=1;i<200;i++)    
	{        
		clock_t start,finish;        
		double totaltime;        
		start=clock();        
		Mat img=imread("/media/vetec-p/Data/Rubbish/maskrcnn_dataset/0803_mask/train_all/pic/"+to_string(i)+".png");        
		if(img.empty())            
		return -1;        
		    
		clock_t s1;        
		s1=clock();        
		PyObject *PyList  = PyList_New(data_size); // a PyList as long as the array (unused below; data_size is not defined in this excerpt)
		PyObject *ArgList = PyTuple_New(1);          
		auto sz = img.size();        
		int x = sz.width;        
		int y = sz.height;        
		int z = img.channels();        
		uchar *CArrays = new uchar[x*y*z];        
		int iChannels = img.channels();        
		int iRows = img.rows;        
		int iCols = img.cols * iChannels;        
		if (img.isContinuous())        
		{            
			iCols *= iRows;            
			iRows = 1;        
		}         
		uchar* p;        
		int id = -1;        
		for (int i = 0; i < iRows; i++)        
		{                  
			p = img.ptr<uchar>(i);                
			for (int j = 0; j < iCols; j++)            
			{                
				CArrays[++id] = p[j]; // copy into contiguous memory
			}
		}
		npy_intp Dims[3] = { y, x, z}; // note the dimension order: rows, cols, channels
		PyObject *PyArray = PyArray_SimpleNewFromData(3, Dims, NPY_UBYTE, CArrays);
		PyTuple_SetItem(ArgList, 0, PyArray);
		clock_t e1=clock();
		cout<<"\nArray setup took "<<(double)(e1-s1)/CLOCKS_PER_SEC<<" s"<<endl;
		//PyTuple_SetItem(ArgList, 0, PyList); // put the PyList object into the PyTuple
		PyObject *pReturn = PyObject_CallObject(pFunc, ArgList);
		clock_t e2=clock();
		cout<<"\nDetection took "<<(double)(e2-e1)/CLOCKS_PER_SEC<<" s"<<endl;
	}
	Py_DECREF(pModule);     // release the module before shutting down Python
	Py_Finalize();      
	return 0;
}

https://sportsmanlee.blogspot.com/2017/09/c-pass-opencv-mat-image-to-python.html

I adapted my code from this example and got an error:

/usr/include/python3.5m/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^
In file included from /usr/include/c++/5/cstddef:45:0,
                 from /home/jumper/workspace/opencv3.4.1/include/opencv2/core/hal/interface.h:15,
                 from /home/jumper/workspace/opencv3.4.1/include/opencv2/core/cvdef.h:91,
                 from /home/jumper/workspace/opencv3.4.1/include/opencv2/core.hpp:52,
                 from /home/jumper/workspace/opencv3.4.1/include/opencv2/opencv.hpp:52,
                 from ../src/insertpython.cpp:11:
../src/insertpython.cpp: In function ‘void predictimg()’:
/usr/include/python3.5m/numpy/__multiarray_api.h:1527:35: error: return-statement with a value, in function returning 'void' [-fpermissive]
 #define NUMPY_IMPORT_ARRAY_RETVAL NULL
                                   ^
/usr/include/python3.5m/numpy/__multiarray_api.h:1532:151: note: in expansion of macro ‘NUMPY_IMPORT_ARRAY_RETVAL’
 #define import_array() {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString

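The fix is to add one line of code, which makes the return statement inside the import_array() macro return no value so that it is valid in a void function: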

#define NUMPY_IMPORT_ARRAY_RETVAL

At last I got Python 3.5 embedded in C++ and predicting an image. My final example is as follows.

mymodel2.py:

import tensorflow as tf
import numpy as np

def test_one_image(imagearray):
    print("Entering model")
    with tf.Graph().as_default():
        output_graph_def = tf.GraphDef()

        # Load the frozen graph from disk
        with open(r"/home/jumper/workspace/algaeprojects/insertpython/good_frozen.pb", "rb") as f:
            output_graph_def.ParseFromString(f.read())
            _ = tf.import_graph_def(output_graph_def, name="")

        with tf.Session() as sess:
            init = tf.global_variables_initializer()
            sess.run(init)

            input_x = sess.graph.get_tensor_by_name("input:0")
            # out_softmax = sess.graph.get_tensor_by_name("softmax:0")
            out_softmax = sess.graph.get_tensor_by_name("output:0")
            is_training_x = sess.graph.get_tensor_by_name("is_training:0")
            print("Model loaded")
            print("Reading image...")

            l = imagearray.shape
            k = l[0]
            print(k)

            # Scale pixel values from [0, 255] to [0, 1]
            img = imagearray * (1. / 255)
            print("get image...")
            feed_dict = {input_x: np.reshape(img, [-1, 96, 224, 1]), is_training_x: False}
            print("Predicting...")
            img_out_softmax = sess.run(out_softmax, feed_dict)
            print(img_out_softmax)
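Note that test_one_image parses the .pb file and creates a new session on every call, so each prediction pays the full loading cost. If the function is called repeatedly, caching the loaded model is one possible way to reduce that. A generic sketch of the pattern, where load_model is a hypothetical stand-in for the graph-loading step and is not TensorFlow code:

```python
import functools


def load_model(path):
    """Hypothetical stand-in for the expensive step (parsing the .pb file and
    creating a session). Here it only returns a dummy predict function."""
    load_model.calls = getattr(load_model, "calls", 0) + 1
    return lambda image: len(image)


@functools.lru_cache(maxsize=None)
def get_model(path):
    """Load the model for `path` on first use and reuse it afterwards."""
    return load_model(path)


def predict(path, image):
    """Predict with the cached model; only the first call pays the load cost."""
    return get_model(path)(image)
```

In the real module, the cached object would be the session together with the input and output tensors, created once and kept alive between calls from C++.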

C++ file:

#include <Python.h>
#include <cstdlib>   // malloc
#include <ctime>     // clock
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
#include <numpy/arrayobject.h>
using namespace std;

#define NUMPY_IMPORT_ARRAY_RETVAL

void predictimg()
{
	Py_Initialize();
	PyEval_InitThreads();
	PyObject*pFunc = NULL;
	PyObject*pArg = NULL;
	PyObject* module = NULL;
	PyRun_SimpleString("import sys");
	PyRun_SimpleString("sys.path.append('/home/jumper/workspace/algaeprojects/insertpython/')");
	module = PyImport_ImportModule("mymodel2"); // mymodel2: name of the Python file (without .py)
	PyObject *pDict=PyModule_GetDict(module);
	pFunc = PyDict_GetItemString(pDict, "test_one_image");
	//PyEval_CallObject(pFunc, NULL);

	clock_t start,finish;
	double totaltime;
	start=clock();

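	// Load as single-channel grayscale: the copy loop below reads one byte per pixel
	cv::Mat img = cv::imread("/home/jumper/workspace/algaeprojects/insertpython/cnn-imgs/AABW22496.jpg", cv::IMREAD_GRAYSCALE);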
	int m, n;
	n = img.cols;
	m = img.rows;
	unsigned char *data = (unsigned char*)malloc(sizeof(unsigned char) * m * n);
	int p = 0;
	for (int i = 0; i < m;i++)
	{
		for (int j = 0; j < n; j++)
		{
			data[p]= img.at<unsigned char>(i, j);
			p++;
		}
	}

	clock_t s1;
	s1=clock();

	npy_intp Dims[3]= { m, n,1 }; // dimensions: rows, cols, channels
	import_array();
	PyObject *PyArray = PyArray_SimpleNewFromData(3, Dims, NPY_UBYTE, data);
	PyObject*ArgArray = PyTuple_New(1);
	PyTuple_SetItem(ArgArray,0, PyArray);


	//PyObject *pFuncFive = PyDict_GetItemString(pDict,"test_one_image");
	clock_t e1=clock();
	cout<<"\nArray setup took "<<(double)(e1-s1)/CLOCKS_PER_SEC<<" s"<<endl;

	PyObject*pReturn = PyObject_CallObject(pFunc, ArgArray);
	clock_t e2=clock();
	cout<<"\nDetection took "<<(double)(e2-e1)/CLOCKS_PER_SEC<<" s"<<endl;

	Py_DECREF(module);     // release the module before shutting down Python
	Py_Finalize();
}


void test()
{
	Py_Initialize();
	PyRun_SimpleString("print('hello c++ python')");
	Py_Finalize();
	return;
}

void test1()
{
	Py_Initialize();
	PyRun_SimpleString("import sys");
	PyRun_SimpleString("sys.path.append('/home/jumper/workspace/algaeprojects/insertpython/')");
	PyObject* module = PyImport_ImportModule("demo");
	PyObject* pFunc = PyObject_GetAttrString(module, "print_arg");
	PyObject* pArg = Py_BuildValue("(s)", "hello c++ python!!!");
	PyEval_CallObject(pFunc, pArg);

	Py_Finalize();
	return;
}



int main()
{
	const char * path = "/home/jumper/workspace/algaeprojects/insertpython/cnn-imgs/AABW22496.jpg";
	test();       // simple test
	test1();      // another simple test
	predictimg(); // image prediction test
	
	return 0;
}

My C++ project is configured as follows:
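For readers setting up a similar project, the include and link flags for embedding can be queried from the interpreter itself. This is a sketch under my own assumptions and not the author's actual configuration; the function name is hypothetical, and the NumPy and OpenCV paths still have to be added separately:

```python
import sys
import sysconfig


def embedding_flags():
    """Return (compile_flags, link_flags) for embedding the running interpreter."""
    include_dir = sysconfig.get_path("include")
    lib_dir = sysconfig.get_config_var("LIBDIR")
    # LDVERSION includes ABI suffixes such as the "m" in python3.5m when present.
    version = (sysconfig.get_config_var("LDVERSION")
               or "{}.{}".format(*sys.version_info[:2]))
    compile_flags = ["-I" + include_dir]
    link_flags = ["-lpython" + version]
    if lib_dir:
        link_flags.insert(0, "-L" + lib_dir)
    return compile_flags, link_flags


if __name__ == "__main__":
    compile_flags, link_flags = embedding_flags()
    print("compile:", " ".join(compile_flags))
    print("link:   ", " ".join(link_flags))
```

Run it with the same interpreter that the C++ program should embed, so that the reported paths match that Python version.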

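The red line in the screenshot is the optimization warning. I set this up on a colleague's computer, whose CPU is lower-end than mine; it reported only AVX and FMA as available optimizations, so there is less to optimize on his machine. I do not intend to resolve those two on his computer, though: prediction there currently takes 1.9 seconds, far worse than on my machine, so I expect it would remain slower than mine even after optimizing. Sigh... whether I use the TensorFlow C++ shared library or embed Python in C++, prediction is currently too slow for the project to use.

I will see whether I can find a solution later. Simplifying the model does make it very fast, but accuracy drops after simplification.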