Building and installing TensorFlow 1.14.0 from source on linux-aarch64

一个啥也不会的废物

Last modified 2022-09-09 10:29:45

Tags: python tensorflow linux

First published 2022-09-09 10:28:24

Article link: https://blog.csdn.net/qq_39058607/article/details/126762237

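linux-aarch64: building and installing TensorFlow from source

1. Pin down the versions
2. Manage the versions
3. Build and install Bazel 0.24.1
- 3.1 Install the Python packages with pip
- 3.2 Build Bazel
4. Build TensorFlow 1.14.0
5. Build the .whl file and install TensorFlow
6. Pitfalls when installing TensorFlow 1.13.1
- 6.1 Build-time pitfalls
- 6.2 Install-time pitfalls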

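1. Pin down the versions

Start by being clear about your system. This post installs TensorFlow 1.14.0 on Ubuntu 20.04 (aarch64 architecture). The prebuilt .whl files I found online did not work, so I built from source instead. Based on the TensorFlow website, the version combinations are as follows.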

Tensorflow	GCC	Bazel	Python	Numpy
`1.14.0`	`5.3.1`	`0.24.1`	`3.7.6`	`1.16.5`
`1.13.1`	`5.3.1`	`0.19.2`	`3.5.3`	`1.16.5`

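Note: the TensorFlow website lists GCC 4.8, but building with it fails because of differences between C89 and C99, so GCC 5.3.1 is used here.

Note: numpy should preferably be no newer than 1.19.0.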
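Before starting a multi-hour build, it can help to confirm that the numpy seen by the chosen interpreter respects the ceiling mentioned above. The following is a minimal sketch, not part of the original procedure; the helper names `parse_version` and `numpy_version_ok` are made up for illustration, and the ceiling of 1.19.0 is simply the value from the note.

```python
def parse_version(text):
    """Turn '1.16.5' into (1, 16, 5); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def numpy_version_ok(installed, ceiling="1.19.0"):
    """True if the installed numpy version is not newer than the ceiling."""
    return parse_version(installed) <= parse_version(ceiling)


if __name__ == "__main__":
    try:
        import numpy
        print(numpy.__version__, numpy_version_ok(numpy.__version__))
    except ImportError:
        print("numpy is not installed for this interpreter")
```

Run it with the same interpreter that will build TensorFlow (python3.7 in this post), since each Python version has its own site-packages.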

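2. Manage the versions

Getting the version combination right usually takes more than one attempt, so this post uses Ubuntu's update-alternatives command to manage the Python and GCC versions. Reference: https://blog.csdn.net/a1809032425/article/details/122729307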

There are 4 choices for the alternative python (providing /usr/bin/python).

  Selection    Path                      Priority   Status
------------------------------------------------------------
  0            /usr/local/bin/python3.5   3         auto mode
  1            /usr/bin/python2.7         1         manual mode
  2            /usr/bin/python3.8         1         manual mode
  3            /usr/local/bin/python3.5   3         manual mode
* 4            /usr/local/bin/python3.7   1         manual mode

Press <enter> to keep the current choice[*], or type selection number:

There are 5 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path                                Priority   Status
------------------------------------------------------------
  0            /usr/bin/aarch64-linux-gnu-gcc-4.8   2         auto mode
  1            /usr/bin/aarch64-linux-gnu-gcc-4.8   2         manual mode
* 2            /usr/bin/aarch64-linux-gnu-gcc-5     1         manual mode
  3            /usr/bin/aarch64-linux-gnu-gcc-9     1         manual mode
  4            /usr/bin/gcc-4.8                     1         manual mode
  5            /usr/bin/gcc-9                       1         manual mode

Press <enter> to keep the current choice[*], or type selection number:
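The listings above are what `update-alternatives --config` prints once the candidates have been registered. As a rough sketch of the registration step, the script below only prints the `update-alternatives --install <link> <name> <path> <priority>` commands for the Python entries shown above; it does not run them. The function name `install_commands` is hypothetical, and the paths and priorities are copied from the listing, so adjust them to your machine. The GCC entries can be registered the same way.

```python
import shlex


def install_commands(name, link, candidates):
    """Build one `update-alternatives --install` command per (path, priority) pair."""
    return [
        ["update-alternatives", "--install", link, name, path, str(priority)]
        for path, priority in candidates
    ]


# Paths and priorities taken from the python listing above.
python_candidates = [
    ("/usr/bin/python2.7", 1),
    ("/usr/bin/python3.8", 1),
    ("/usr/local/bin/python3.5", 3),
    ("/usr/local/bin/python3.7", 1),
]

if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before running them.
    for cmd in install_commands("python", "/usr/bin/python", python_candidates):
        print("sudo " + " ".join(shlex.quote(part) for part in cmd))
    # Interactive selection, which produces the menu shown above.
    print("sudo update-alternatives --config python")
```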

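3. Build and install Bazel 0.24.1

3.1 Install the Python packages with pip

If several Python versions are installed, keep them well organized and install packages with the pip that belongs to the intended version. Unlike the TensorFlow website, this post does not install with --user; the aim is for each Python version to keep its packages in its own directory so that they do not conflict.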

sudo pip3.7 install numpy==1.16.5 wheel -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
sudo pip3.7 install keras_preprocessing --no-deps -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

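3.2 Build Bazel

Installing Bazel with the installer script did not work either, so Bazel is also built from source. See "Build Bazel from scratch (bootstrapping)" in the official Bazel documentation.

Note: when downloading the Bazel source from GitHub, be sure to take the file named bazel-<version>-dist.zip; other archives may fail to build.

While building Bazel, the error error: ambiguating new declaration of ‘long int gettid()’ appeared. It comes from compiling the bundled grpc: glibc 2.30 and later declare their own gettid(), which conflicts with grpc's static helper of the same name. The fix is to edit the following two files, renaming gettid to sys_gettid.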

vim bazel/third_party/grpc/src/core/lib/gpr/log_linux.cc
/* Before (around lines 43 and 73) */
static long gettid(void) { return syscall(__NR_gettid); }
  if (tid == 0) tid = gettid();
/* After */
static long sys_gettid(void) { return syscall(__NR_gettid); }
  if (tid == 0) tid = sys_gettid();

vim bazel/third_party/grpc/src/core/lib/gpr/log_posix.cc
/* Before (around lines 33 and 86) */
static intptr_t gettid(void) { return (intptr_t)pthread_self(); }
  gpr_asprintf(&prefix, "%s%s.%09d %7tu %s:%d]",
               gpr_log_severity_string(args->severity), time_buffer,
               (int)(now.tv_nsec), gettid(), display_file, args->line);
/* After */
static intptr_t sys_gettid(void) { return (intptr_t)pthread_self(); }
  gpr_asprintf(&prefix, "%s%s.%09d %7tu %s:%d]",
               gpr_log_severity_string(args->severity), time_buffer,
               (int)(now.tv_nsec), sys_gettid(), display_file, args->line);

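After these changes, run env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh again and the build succeeds. The resulting binary is written to output/bazel; copy it to a directory on your PATH (for example /usr/local/bin) to finish the installation.

Note: building Bazel requires a Java environment; JDK 1.8 is the recommended choice.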
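The gettid rename described in this section can also be applied by script instead of by hand in vim. The sketch below is one possible way to do it, assuming it is run from the directory that contains the unpacked bazel source tree; `rename_gettid` and `patch_file` are illustrative names, not part of Bazel or grpc. The word-boundary pattern leaves `__NR_gettid` and an already patched `sys_gettid` untouched, because the underscore in front of them counts as a word character.

```python
import re
from pathlib import Path

# Matches the bare identifier only, not __NR_gettid or sys_gettid.
GETTID = re.compile(r"\bgettid\b")


def rename_gettid(source):
    """Return the source text with every bare `gettid` renamed to `sys_gettid`."""
    return GETTID.sub("sys_gettid", source)


def patch_file(path):
    """Rewrite the file in place; return True if anything changed."""
    path = Path(path)
    original = path.read_text()
    patched = rename_gettid(original)
    if patched != original:
        path.write_text(patched)
    return patched != original


if __name__ == "__main__":
    # Directory taken from the listings above; adjust if your tree differs.
    base = Path("bazel/third_party/grpc/src/core/lib/gpr")
    for name in ("log_linux.cc", "log_posix.cc"):
        target = base / name
        if target.exists():
            print(target, "patched" if patch_file(target) else "unchanged")
        else:
            print(target, "not found")
```

Because the substitution is idempotent, running the script twice is harmless.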

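4. Build TensorFlow 1.14.0

4.1 Download the source

Time to build TensorFlow at last. First download the 1.14.0 source from GitHub: https://github.com/tensorflow/tensorflow/releases?q=1.14.0&expanded=true

4.2 Set the build options

After extracting the archive, enter the tensorflow1.14.0 directory and run the ./configure script to set the build options.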

Extracting Bazel installation...
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.24.1- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python3.7/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.7/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: 
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: 
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=gdr         	# Build with GDR support.
	--config=verbs       	# Build with libverbs support.
	--config=ngraph      	# Build with Intel nGraph support.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws       	# Disable AWS S3 filesystem support.
	--config=nogcp       	# Disable GCP support.
	--config=nohdfs      	# Disable HDFS support.
	--config=noignite    	# Disable Apache Ignite support.
	--config=nokafka     	# Disable Apache Kafka support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished

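Note: answering no to everything is recommended; enable CUDA only if you have it.

4.3 Build TensorFlow

Start the build with the command below. --local_resources sets the resources the build may use, in this order: memory (MB), number of CPU cores, and available I/O (1.0 corresponds to an average workstation).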

bazel build --conlyopt="-std=gnu99" --conlyopt="-w" --local_resources 14436,4.0,1.0 //tensorflow/tools/pip_package:build_pip_package
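The three numbers passed to --local_resources depend on the machine. As a rough aid, the sketch below derives a value in the same format from the local RAM and core count on Linux. The helper names and the 0.9 safety margin are assumptions made for this example; the value 14436,4.0,1.0 in the command above was simply the author's choice for their board.

```python
import os


def local_resources_flag(mem_mb, cpus, io=1.0):
    """Format memory (MB), CPU cores and I/O capacity the way --local_resources expects."""
    return "{:d},{:.1f},{:.1f}".format(int(mem_mb), float(cpus), float(io))


def detect_resources(mem_fraction=0.9):
    """Estimate usable RAM in MB and the core count; mem_fraction is an assumed margin."""
    total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    mem_mb = int(total_bytes / (1024 * 1024) * mem_fraction)
    return mem_mb, os.cpu_count() or 1


if __name__ == "__main__":
    mem_mb, cpus = detect_resources()
    print("--local_resources " + local_resources_flag(mem_mb, cpus))
```

On boards with little RAM, lowering the memory figure or the core count reduces the chance of the compiler being killed for running out of memory, at the cost of a longer build.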

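4.3.1 Dependency downloads fail

Building TensorFlow pulls in many dependency archives. This is not an offline build, so they are normally downloaded over the network, but some of them fail to download and have to be fetched manually. Then, in the four files ./WORKSPACE, ./third_party/icu/workspace.bzl, ./tensorflow/workspace.bzl and ./third_party/flatbuffers/workspace.bzl, replace the location of each affected archive with the path of the downloaded file.

Note: most download failures are network problems, so instead of downloading manually you can also just retry a few times.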
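For a manually downloaded archive, two pieces of information are useful when editing those files: a URL form of the local path and the SHA-256 of the file, which should match the sha256 value already recorded next to the original URL. The sketch below computes both. It is an illustration under assumptions: `sha256_of` and `local_archive_entry` are made-up helper names, and whether a file:// URL is accepted as-is depends on the repository rule used in each of the four files, so verify that against your checkout.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_archive_entry(path):
    """Return the file:// URL and checksum for a locally downloaded archive."""
    path = Path(path).resolve()
    return {"url": path.as_uri(), "sha256": sha256_of(path)}


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py /path/to/downloaded-archive.tar.gz ...
    for name in sys.argv[1:]:
        entry = local_archive_entry(name)
        print(entry["url"])
        print(entry["sha256"])
```

If the printed checksum differs from the one in the build file, the download is incomplete or is a different release, and Bazel would reject it anyway.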

4.3.2 C++ compilation of rule ‘@grpc//:gpr_base’ failed (Exit 1):

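This error is similar to the Bazel build failure and also comes from compiling grpc. As before, change gettid to sys_gettid in /home/wsn/.cache/bazel/_bazel_wsn/a7132f72a8b641c1ebbdd6a7fd1fd5fb/external/grpc/src/core/lib/gpr/log_linux.cc and /home/wsn/.cache/bazel/_bazel_wsn/a7132f72a8b641c1ebbdd6a7fd1fd5fb/external/grpc/src/core/lib/gpr/log_posix.cc, and the build goes through. The user name and the hash in the path will differ on your machine.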
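Since the user name and hash directory vary, it may be easier to search for the two files than to type the path. The following sketch globs the default Bazel cache location under the home directory; the function name `find_grpc_log_sources` is hypothetical, and the layout it assumes (~/.cache/bazel/_bazel_<user>/<hash>/external/grpc/...) is the one visible in the paths above.

```python
import glob
import os


def find_grpc_log_sources(cache_root=None):
    """List log_linux.cc and log_posix.cc under every Bazel output base in cache_root."""
    if cache_root is None:
        cache_root = os.path.expanduser("~/.cache/bazel")
    pattern = os.path.join(
        cache_root, "_bazel_*", "*", "external", "grpc",
        "src", "core", "lib", "gpr", "log_*.cc",
    )
    wanted = {"log_linux.cc", "log_posix.cc"}
    return sorted(p for p in glob.glob(pattern) if os.path.basename(p) in wanted)


if __name__ == "__main__":
    for path in find_grpc_log_sources():
        print(path)
```

The files only exist after Bazel has fetched grpc, so run the search after the first failed build attempt.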

4.3.3 depthwiseconv_uint8_3x3_filter.h:3957:58: error:

Compiling this file fails with cannot convert ‘uint8x16_t {aka __vector(16) unsigned char}’ to ‘const int8x16_t {aka const __vector(16) signed char}’. Adding the following lines to ./tensorflow/lite/build_def.bzl makes the build succeed. Reference: https://github.com/tensorflow/tensorflow/pull/29515

            "/DTF_COMPILE_LIBRARY",
            "/wd4018",  # -Wno-sign-compare
        ],
+       str(Label("//tensorflow:linux_aarch64")): [
+           "-flax-vector-conversions",
+           "-fomit-frame-pointer",
+       ],
        "//conditions:default": [
            "-Wno-sign-compare",
        ],

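After that, all that is left is to wait for the TensorFlow build to finish, which takes a long time...

5. Build the .whl file and install TensorFlow

Run the command below to build the .whl file; the output goes to /tmp/tensorflow_pkg.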

./bazel-bin/tensorflow/tools/pip_package/build_pip_package  /tmp/tensorflow_pkg

Change into the output directory and run the following to install TensorFlow 1.14.0:

sudo pip3.7 install tensorflow-1.14.0-cp37-cp37m-linux_aarch64.whl -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

Testing import tensorflow then fails with TypeError: Descriptors cannot not be created directly. Following the hint in the message below, downgrade protobuf.

If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

sudo pip3.7 install "protobuf<3.21" -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

Test again; the installation now works.

>>> import tensorflow
>>> print(tensorflow.__version__)
1.14.0
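Printing the version only proves that the package imports. A slightly stronger check is to run a tiny graph through a session, which exercises the compiled runtime. The sketch below is one way to do that; the function name `tf_smoke_test` is made up, and it uses `tf.compat.v1.Session` on the assumption that the installed build exposes the compat.v1 module, as the 1.14 release does.

```python
def tf_smoke_test():
    """Evaluate 2 + 3 in a TF 1.x-style session; return None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Build the ops in a private graph so the check leaves no global state behind.
    with tf.Graph().as_default():
        total = tf.constant(2) + tf.constant(3)
        with tf.compat.v1.Session() as sess:
            return int(sess.run(total))


if __name__ == "__main__":
    print(tf_smoke_test())
```

If this raises an error rather than printing a number, the wheel installed but the native library is not usable, which usually points back at the build flags or the Python/numpy combination.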

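6. Pitfalls when installing TensorFlow 1.13.1

6.1 Build-time pitfalls

Besides the problems in 4.3.1 and 4.3.2 above, the aws-sdk bundled with TensorFlow 1.13.1 has poor linux-aarch64 support, which leads to the error ImportError: ....undefined symbol: _ZN3Aws11Environment6GetEnvB5cxx11EPKc. The fix is to add the following to ./tensorflow/BUILD: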

		   visibility = ["//visibility:public"],
		)
+		config_setting(
+		    name = "linux_aarch64",
+		    values = {"cpu": "aarch64"},
+		    visibility = ["//visibility:public"],
+		)
		config_setting(
		    name = "linux_x86_64",
		    values = {"cpu": "k8"},

Then edit ./third_party/aws/BUILD.bazel and add the following:

		cc_library(
		    name = "aws",
		    srcs = select({
+		        "@org_tensorflow//tensorflow:linux_aarch64": glob([
+		            "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
+		        ]),
		        "@org_tensorflow//tensorflow:linux_x86_64": glob([
		            "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
		        ]),

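Reference: https://github.com/tensorflow/tensorflow/pull/22856

6.2 Install-time pitfalls

When installing TensorFlow, the dependency h5py failed to install. I tried to find a .whl file for it, but after a long search the only aarch64 wheels I could find were for Python 3.7. So I gave up on TensorFlow 1.13.1 and rebuilt Python 3.7, Bazel 0.24.1 and TensorFlow 1.14.0 to complete the installation. There are probably other ways to solve this; I reinstalled everything out of frustration without thinking it through...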