Deploying a TensorFlow model on mobile is considerably more complex than deploying it on the server side: you have to weigh metrics such as model accuracy, size, inference speed, computational cost, and memory footprint. The sections below introduce tools for evaluating these metrics.
1. Computational cost
TensorFlow can count a graph's floating-point operations (FLOPs) and its number of trainable parameters (TF 1.x profiler API):
import tensorflow as tf

def stats_graph(graph):
    # Count floating-point operations (FLOPs) in the graph.
    flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
    # Count trainable parameters.
    params = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.trainable_variables_parameter())
    print('FLOPs: {}G; Trainable params: {}'.format(flops.total_float_ops / 1e9, params.total_parameters))
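As a sanity check on what `total_parameters` counts, the same total can be computed by hand from the shapes of the trainable variables. A pure-Python sketch (the layer shapes below are hypothetical, chosen only for illustration):

```python
from functools import reduce
from operator import mul

# Hypothetical variable shapes: a 3x3 conv (3 -> 16 channels) with bias,
# followed by a 100-unit dense layer on a flattened 8*8*16 input.
shapes = [
    (3, 3, 3, 16),      # conv kernel
    (16,),              # conv bias
    (8 * 8 * 16, 100),  # dense weights
    (100,),             # dense bias
]

# total_parameters is simply the sum of element counts over all trainable variables.
total = sum(reduce(mul, s, 1) for s in shapes)
print(total)
```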
Convolution layer:

    FLOPs = (2 * Ci * K^2 - 1) * H * W * Co

where Ci = number of input channels, K = kernel size, H*W = output feature map size, Co = number of output channels.

The factor 2 is because one MAC (multiply-accumulate) counts as two operations: a multiplication and an addition.
The -1 applies when bias is not counted; with bias there is no -1 (the extra +1 for the bias addition cancels it).
The formula is for a single input feature map and does not include batch size.

Read the formula in two steps. The bracketed part computes one pixel of the output feature map; multiplying by H*W*Co extends that to the whole output feature map. The bracketed part itself splits in two: the first term (Ci*K^2) is the number of multiplications and the second (Ci*K^2 - 1) is the number of additions. Summing n numbers takes n-1 additions, hence the -1 when bias is not counted; the one extra addition for the bias exactly cancels it.
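The formula above can be turned into a small helper. A pure-Python sketch (the example layer shape at the bottom is hypothetical):

```python
def conv_flops(ci, k, h, w, co, bias=False):
    """FLOPs of one conv layer, single input feature map (batch size 1).

    ci: input channels, k: kernel size, (h, w): output feature map size,
    co: output channels. Without bias, each output pixel costs ci*k*k
    multiplications plus ci*k*k - 1 additions; a bias adds one addition.
    """
    per_pixel = 2 * ci * k * k - (0 if bias else 1)
    return per_pixel * h * w * co

# Hypothetical example: 3x3 conv, 64 -> 128 channels, 56x56 output, no bias.
print(conv_flops(64, 3, 56, 56, 128))
```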
References:
https://www.zhihu.com/question/65305385
https://www.zhihu.com/question/65305385/answer/451060549
2. Per-op timing and memory analysis
The tfprof tool can analyze memory usage and running time.

Build tfprof:
# Build the tool.
bazel build --config opt tensorflow/core/profiler:profiler
# Help information, including detail 'option' instructions.
bazel-bin/tensorflow/core/profiler/profiler help
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/profiler/g3doc/command_line.md
When using tf.profiler you may run into the following error:
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH:
You need to locate the libcupti.so library and add its directory to the LD_LIBRARY_PATH environment variable. For example, if it is found at /usr/local/cuda/extras/CUPTI/lib64/libcupti.so:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
To collect memory and timing statistics, first run the following code.
# Generate the RunMetadata that contains the memory and timing information.
#
# When run on accelerator (e.g. GPU), an operation might perform some
# cpu computation, enqueue the accelerator computation. The accelerator
# computation is then run asynchronously. The profiler considers 3
# times: 1) accelerator computation. 2) cpu computation (might wait on
# accelerator). 3) the sum of 1 and 2.
#
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
with tf.Session() as sess:
    _ = sess.run(train_op,
                 options=run_options,
                 run_metadata=run_metadata)
Then call tf.profiler.profile to inspect the model's running time and memory consumption.
# Print to stdout an analysis of the memory usage and the timing information
# broken down by python codes.
builder = tf.profiler.ProfileOptionBuilder
opts = builder(builder.time_and_memory())
opts.with_node_names(show_name_regexes=['.*my_code.py.*'])
#opts.with_step(0)
#opts.with_timeline_output('timeline.json')
opts = opts.build()
tf.profiler.profile(
    tf.get_default_graph(),
    run_meta=run_metadata,
    cmd='code',
    options=opts)  # Output is readable when run from the command line; in IPython the layout is a mess
# Print to stdout an analysis of the memory usage and the timing information
# broken down by operation types.
tf.profiler.profile(
    tf.get_default_graph(),
    run_meta=run_metadata,
    cmd='op',
    options=tf.profiler.ProfileOptionBuilder.time_and_memory())  # Likewise readable from the command line, garbled in IPython
To visualize the results from the Python API: call `with_step(0).with_timeline_output(filename)` to generate a timeline JSON file, then open a Chrome browser, navigate to `chrome://tracing`, and load the generated JSON file.
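The timeline file uses Chrome's trace-event format. To see what chrome://tracing expects, here is a minimal hand-written file (the op names and durations are hypothetical; `ts` and `dur` are in microseconds):

```python
import json

# Two "complete" events (ph == "X") on one process/thread,
# mimicking two ops executing back to back.
events = [
    {"name": "Conv2D", "ph": "X", "ts": 0,   "dur": 500, "pid": 0, "tid": 0},
    {"name": "MatMul", "ph": "X", "ts": 500, "dur": 300, "pid": 0, "tid": 0},
]

with open("timeline.json", "w") as f:
    json.dump({"traceEvents": events}, f)
```

Loading this file in chrome://tracing shows the two ops as adjacent bars on a single track.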
Reference: https://blog.csdn.net/u014061630/article/details/82799009