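Installing the matching Bazel version
The tensorflow checkout downloaded today is at commit 5a6c3d2d139547b19fe8ed5ae856ed8c6ebbd7f7, and the matching Bazel version is 4.2.2. The TFLITE-SOC docker image does not ship that Bazel version, so it has to be installed manually. The steps below assume you are inside the docker container with root privileges; otherwise some of the commands need a sudo prefix.
First, install the dependencies: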
apt-get install build-essential openjdk-11-jdk python zip unzip
If this fails because openjdk-11-jdk cannot be found, add the OpenJDK PPA and retry:
add-apt-repository ppa:openjdk-r/ppa
apt-get update
apt-get install openjdk-11-jdk
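If you then hit ModuleNotFoundError: No module named 'apt_pkg', create the missing symlink as follows (the file name below is the Python 3.5 one; use whichever apt_pkg.cpython-*.so file is actually present in that directory):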
cd /usr/lib/python3/dist-packages
ln -s apt_pkg.cpython-35m-x86_64-linux-gnu.so apt_pkg.so
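The hard-coded 35m file name only matches a Python 3.5 system. The sketch below looks up whichever apt_pkg extension module is present and prints the ln command instead of running it. The function name find_apt_pkg_so and the DIST_DIR variable are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: locate the apt_pkg extension module instead of hard-coding its name.
# find_apt_pkg_so and DIST_DIR are hypothetical names used for illustration.

# Print the first apt_pkg.cpython-*.so found in directory $1 (empty if none).
find_apt_pkg_so() {
  ls "$1"/apt_pkg.cpython-*.so 2>/dev/null | head -n 1
}

DIST_DIR="${DIST_DIR:-/usr/lib/python3/dist-packages}"
so_file="$(find_apt_pkg_so "$DIST_DIR")"

# Dry run: only print the command that would create the symlink.
if [ -n "$so_file" ] && [ ! -e "$DIST_DIR/apt_pkg.so" ]; then
  echo "would run: ln -s $(basename "$so_file") $DIST_DIR/apt_pkg.so"
fi
```

Once the printed file name looks right, run the ln -s command by hand.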
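Then download the matching installer from Huawei Cloud (here bazel-4.2.2-installer-linux-x86_64.sh).
Installing Bazel
Following the Zhihu article "Bazel 4.0.0在Linux下的安装" (installing Bazel 4.0.0 on Linux), run the commands below.
Warning: these commands overwrite any existing Bazel installation, so use them with care!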
chmod +x bazel-<version>-installer-linux-x86_64.sh
sudo ./bazel-<version>-installer-linux-x86_64.sh
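After the installer finishes, it is worth confirming that the bazel on PATH is the expected version. A minimal sketch, assuming the required version is the 4.2.2 quoted in this post; REQUIRED_BAZEL and parse_bazel_version are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed Bazel version with the one this checkout needs.
# REQUIRED_BAZEL defaults to 4.2.2, the version quoted in this post (assumption
# for any other commit).
REQUIRED_BAZEL="${REQUIRED_BAZEL:-4.2.2}"

# Extract "X.Y.Z" from a line such as "bazel 4.2.2".
parse_bazel_version() {
  printf '%s\n' "$1" | awk '{print $2}'
}

if command -v bazel >/dev/null 2>&1; then
  installed="$(parse_bazel_version "$(bazel --version)")"
  if [ "$installed" = "$REQUIRED_BAZEL" ]; then
    echo "OK: bazel $installed"
  else
    echo "Mismatch: found bazel $installed, expected $REQUIRED_BAZEL"
  fi
else
  echo "bazel is not on PATH"
fi
```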
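Modifying the tensorflow-lite source code
The figure below shows how the tensorflow version used by TFLITE-SOC (which adds SystemC support and several examples) differs from the upstream version.
We use that diff as the blueprint for patching the latest tensorflow.
Adding the examples
Copy ./tensorflow/tensorflow/lite/examples/systemc and ./tensorflow/tensorflow/lite/kernels/modeling from TFLITE-SOC into the corresponding locations in the new checkout.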
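The copy step can be scripted as below. This is a sketch: copy_soc_dirs is a hypothetical helper, and the two checkout paths in the usage comment are assumptions about where the TFLITE-SOC tree and the fresh tensorflow tree live.

```shell
#!/usr/bin/env bash
# Sketch: copy the two SystemC directories from a TFLITE-SOC checkout into a
# fresh tensorflow checkout. copy_soc_dirs is a hypothetical helper name.
#   $1: root of the TFLITE-SOC tensorflow tree (contains tensorflow/lite/...)
#   $2: root of the new tensorflow tree
copy_soc_dirs() {
  src_root="$1"
  dst_root="$2"
  for rel in tensorflow/lite/examples/systemc tensorflow/lite/kernels/modeling; do
    parent="$dst_root/$(dirname "$rel")"
    mkdir -p "$parent" || return 1
    cp -r "$src_root/$rel" "$parent/" || return 1
  done
}

# Usage (both paths are assumptions, adjust to your layout):
#   copy_soc_dirs /path/to/tflite-soc/tensorflow ./tensorflow
```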
Adding the SystemC library
Open ./tensorflow/tensorflow/workspace2.bzl and add the following definition:
tf_http_archive(
    name = "systemc",
    build_file = "//third_party:systemc.BUILD",
    sha256 = "5781b9a351e5afedabc37d145e5f7edec08f3fd5de00ffeb8fa1f3086b1f7b3f",
    urls = tf_mirror_urls("https://www.accellera.org/images/downloads/standards/systemc/systemc-2.3.3.tar.gz"),
)
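Bazel rejects the archive if its checksum differs from the sha256 attribute, so when the download fails it can help to check the file by hand. A minimal sketch using sha256sum; check_sha256 is an illustrative helper, and the curl line in the usage comment is only an example of fetching the URL quoted above.

```shell
#!/usr/bin/env bash
# Sketch: verify a downloaded archive against an expected SHA-256 digest.
# check_sha256 is a hypothetical helper name.
#   $1: path to the file, $2: expected hex digest
# Returns 0 when the digests match, non-zero otherwise.
check_sha256() {
  actual="$(sha256sum "$1" | awk '{print $1}')"
  [ "$actual" = "$2" ]
}

# Usage with the values from the tf_http_archive entry:
#   curl -LO https://www.accellera.org/images/downloads/standards/systemc/systemc-2.3.3.tar.gz
#   check_sha256 systemc-2.3.3.tar.gz \
#     5781b9a351e5afedabc37d145e5f7edec08f3fd5de00ffeb8fa1f3086b1f7b3f \
#     && echo "checksum OK" || echo "checksum mismatch"
```

Note that the entry also refers to //third_party:systemc.BUILD, so that file has to exist in the new checkout as well.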
Modifying the kernels/BUILD file
Open ./tensorflow/tensorflow/lite/kernels/BUILD and replace lines 405-442 with the following:
cc_library(
    name = "cpu_backend_gemm_lib",
    hdrs = [
        "cpu_backend_gemm.h",
        "cpu_backend_gemm_params.h",
    ],
    copts = tflite_copts(),
    deps = [
        #"//tensorflow/lite/kernels/internal:common",
        "//tensorflow/lite/kernels/internal:compatibility",
        #"//tensorflow/lite/kernels/internal:types",
    ],
)

cc_library(
    name = "cpu_backend_gemm",
    srcs = [
        "cpu_backend_gemm_custom_gemv.h",
        "cpu_backend_gemm_eigen.cc",
        "cpu_backend_gemm_eigen.h",
        "cpu_backend_gemm_gemmlowp.h",
        "cpu_backend_gemm_ruy.h",
        "cpu_backend_gemm_x86.h",
    ],
    # hdrs = [
    #     "cpu_backend_gemm.h",
    #     "cpu_backend_gemm_params.h",
    # ],
    copts = tflite_copts(),
    deps = [
        ":tflite_with_ruy",
        "//tensorflow/lite/kernels/internal:common",
        "//tensorflow/lite/kernels/internal:compatibility",
        "//tensorflow/lite/kernels/internal:types",
        ":cpu_backend_context",
        ":cpu_backend_threadpool",
        # Isolated so it can be used by systemc
        "//tensorflow/lite/kernels:cpu_backend_gemm_lib",
        # Depend on ruy regardless of `tflite_with_ruy`. See the comment in
        # cpu_backend_gemm.h about why ruy is the generic path.
        "@ruy//ruy",
        "@ruy//ruy:matrix",
        "@ruy//ruy:path",
        "@ruy//ruy/profiler:instrumentation",
        # We only need to depend on gemmlowp and Eigen when tflite_with_ruy
        # is false, but putting these dependencies in a select() seems to
        # defeat copybara's rewriting rules.
        "@gemmlowp",
        "//third_party/eigen3",
        "//tensorflow/lite/kernels/modeling:util",
    ],
)
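A brief explanation
The SystemC examples modify parts of cpu_backend_gemm.h and cpu_backend_gemm_gemmlowp.h to print matrix information, so the cpu_backend_gemm_lib library is split out here and compiled separately; this does not fundamentally change the structure of the project. Note that srcs and hdrs list the source files and header files needed to build the cpu_backend_gemm library, and they must cover every file that cpu_backend_gemm.h includes. Compared with TFLITE-SOC, the latest version adds cpu_backend_gemm_x86.h, so that header has to be added to srcs.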
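To see which sibling headers have to appear in srcs for a given tensorflow version, the includes of cpu_backend_gemm.h can be listed directly. A sketch, assuming the includes use the full tensorflow/lite/kernels/ path and start at column 0; list_gemm_includes and HDR are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: list the cpu_backend_gemm*.h headers that a file includes, so they
# can be compared with the srcs/hdrs lists in kernels/BUILD.
# list_gemm_includes is a hypothetical helper name.
list_gemm_includes() {
  grep -E '^#include "tensorflow/lite/kernels/cpu_backend_gemm[^"]*"' "$1" \
    | sed -E 's|.*/([^/"]+)".*|\1|' \
    | sort -u
}

# Assumed location of the header in the new checkout.
HDR="${HDR:-./tensorflow/tensorflow/lite/kernels/cpu_backend_gemm.h}"
if [ -f "$HDR" ]; then
  list_gemm_includes "$HDR"
fi
```

Any header printed here that is missing from both srcs and hdrs is a likely cause of a Bazel "undeclared inclusion" error.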
Modifying the kernel code
Modifying cpu_backend_gemm.h
- Include the header
#ifdef TOGGLE_TFLITE_SOC
#include "tensorflow/lite/kernels/modeling/util.sc.h"
#endif
- Add the matrix printing code (the code between #ifdef TOGGLE_TFLITE_SOC and #endif)
/* Public entry point */
template <typename LhsScalar, typename RhsScalar, typename AccumScalar,
          typename DstScalar, QuantizationFlavor quantization_flavor>
void Gemm(const MatrixParams<LhsScalar>& lhs_params, const LhsScalar* lhs_data,
          const MatrixParams<RhsScalar>& rhs_params, const RhsScalar* rhs_data,
          const MatrixParams<DstScalar>& dst_params, DstScalar* dst_data,
          const GemmParams<AccumScalar, DstScalar, quantization_flavor>& params,
          CpuBackendContext* context) {
  /*
   * Intermediate code omitted here
   */
#ifdef TOGGLE_TFLITE_SOC
  // GemmImpl::Run(...) function executes a GEMM with different libraries
  // depending on the context.
  //
  // For quantized inputs it will use gemmlowp library
  // For floating-point inputs it will use eigen library
  tflite_soc::PrintMatricesInfo<LhsScalar, RhsScalar, DstScalar>(
      lhs_params, rhs_params, dst_params);
#endif
  // Generic case: dispatch to any backend as a general GEMM.
  GemmImpl<LhsScalar, RhsScalar, AccumScalar, DstScalar,
           quantization_flavor>::Run(lhs_params, lhs_data, rhs_params, rhs_data,
                                     dst_params, dst_data, params, context);
}
Modifying cpu_backend_gemm_gemmlowp.h
template <typename LhsScalar, typename RhsScalar, typename AccumScalar,
          typename DstScalar>
struct GemmImplUsingGemmlowp<
    LhsScalar, RhsScalar, AccumScalar, DstScalar,
    QuantizationFlavor::kIntegerWithUniformMultiplier> {
  static_assert(std::is_same<LhsScalar, RhsScalar>::value, "");
  static_assert(std::is_same<AccumScalar, std::int32_t>::value, "");
  using SrcScalar = LhsScalar;
  /*
   * Intermediate code omitted here
   */
    using BitDepthParams = typename GemmlowpBitDepthParams<SrcScalar>::Type;
#ifdef TOGGLE_TFLITE_SOC
    // printf("%d,%d,%d\n", lhs_params.rows, lhs_params.cols, rhs_params.cols);
    tflite_soc::PrintMatrices<LhsScalar, RhsScalar, DstScalar>(
        lhs_params, lhs_data, rhs_params, rhs_data, dst_params, dst_data);
#endif
    if (params.bias) {
      ColVectorMap bias_vector(params.bias, lhs_params.rows);
      gemmlowp::OutputStageBiasAddition<ColVectorMap> bias_addition_stage;
    /*
     * Intermediate code omitted here
     */
#ifdef TOGGLE_TFLITE_SOC
    printf("\nout after OP execution (%d,%d)\n", dst_params.rows,
           dst_params.cols);
    tflite_soc::PrintMatrix<DstScalar>(dst_params, dst_data);
#endif
  }
};
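A quick way to confirm that the guarded blocks landed in both headers is to count the #ifdef TOGGLE_TFLITE_SOC lines. This is only a sanity check under stated assumptions: count_soc_guards and KERNELS_DIR are illustrative names, and the excerpts in this post show two guarded blocks per header, which may not be the full set in TFLITE-SOC.

```shell
#!/usr/bin/env bash
# Sketch: count the TOGGLE_TFLITE_SOC guards in a patched header.
# count_soc_guards is a hypothetical helper name.
count_soc_guards() {
  grep -c '^#ifdef TOGGLE_TFLITE_SOC' "$1"
}

# Assumed location of the kernels directory in the new checkout.
KERNELS_DIR="${KERNELS_DIR:-./tensorflow/tensorflow/lite/kernels}"
for hdr in cpu_backend_gemm.h cpu_backend_gemm_gemmlowp.h; do
  if [ -f "$KERNELS_DIR/$hdr" ]; then
    echo "$hdr: $(count_soc_guards "$KERNELS_DIR/$hdr") guarded block(s)"
  fi
done
```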
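That completes all the required changes. Other versions of tensorflow can be given SystemC support by following the same steps (bazel may report errors during the build, but these are almost always dependency problems that can be fixed by editing the corresponding BUILD file as the error message indicates).
All of the SystemC examples from tflite-soc can now be run: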
# Hello World example
bazel build --jobs 1 //tensorflow/lite/examples/systemc:hello_systemc
bazel run //tensorflow/lite/examples/systemc:hello_systemc
# 2D Systolic array accelerator in isolation
bazel build --jobs 1 //tensorflow/lite/kernels/modeling:systolic_run
bazel run //tensorflow/lite/kernels/modeling:systolic_run
# 2D Systolic array accelerator with TFLite benchmarking tools
bazel build //tensorflow/lite/tools/benchmark:benchmark_model --cxxopt=-DTOGGLE_TFLITE_SOC=1
bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
--use_gpu=false --num_threads=1 --enable_op_profiling=true \
--graph=../tensorflow-models/mobilenet-v1/mobilenet_v1_1.0_224_quant.tflite \
--num_runs=1