Deploying a TensorFlow model on mobile is considerably more complex than deploying it on the server side: you have to weigh metrics such as model accuracy, size, inference speed, computational cost, and memory footprint. The sections below introduce tools for evaluating these metrics.
1. Computational cost
TensorFlow can count a graph's floating-point operations (FLOPs) and its number of trainable parameters (TF 1.x profiler API):
import tensorflow as tf

def stats_graph(graph):
    # Count floating-point operations (FLOPs) in the graph.
    flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
    # Count trainable parameters.
    params = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.trainable_variables_parameter())
    print('FLOPs: {}G; Trainable params: {}'.format(flops.total_float_ops / 1e9, params.total_parameters))
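As a sanity check on what `total_parameters` counts, the same total can be computed by hand from the shapes of the trainable variables. A pure-Python sketch (the layer shapes below are hypothetical, chosen only for illustration):

```python
from functools import reduce
from operator import mul

# Hypothetical variable shapes: a 3x3 conv (3 -> 16 channels) with bias,
# followed by a 100-unit dense layer on a flattened 8*8*16 input.
shapes = [
    (3, 3, 3, 16),      # conv kernel
    (16,),              # conv bias
    (8 * 8 * 16, 100),  # dense weights
    (100,),             # dense bias
]

# total_parameters is simply the sum of element counts over all trainable variables.
total = sum(reduce(mul, s, 1) for s in shapes)
print(total)
```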
Convolution layer:

    FLOPs = (2 * Ci * K^2 - 1) * H * W * Co

where Ci = number of input channels, K = kernel size, H*W = output feature map size, Co = number of output channels.

The factor 2 is because one MAC (multiply-accumulate) counts as two operations: a multiplication and an addition.
The -1 applies when bias is not counted; with bias there is no -1 (the extra +1 for the bias addition cancels it).
The formula is for a single input feature map and does not include batch size.

Read the formula in two steps. The bracketed part computes one pixel of the output feature map; multiplying by H*W*Co extends that to the whole output feature map. The bracketed part itself splits in two: the first term (Ci*K^2) is the number of multiplications and the second (Ci*K^2 - 1) is the number of additions. Summing n numbers takes n-1 additions, hence the -1 when bias is not counted; the one extra addition for the bias exactly cancels it.
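The formula above can be turned into a small helper. A pure-Python sketch (the example layer shape at the bottom is hypothetical):

```python
def conv_flops(ci, k, h, w, co, bias=False):
    """FLOPs of one conv layer, single input feature map (batch size 1).

    ci: input channels, k: kernel size, (h, w): output feature map size,
    co: output channels. Without bias, each output pixel costs ci*k*k
    multiplications plus ci*k*k - 1 additions; a bias adds one addition.
    """
    per_pixel = 2 * ci * k * k - (0 if bias else 1)
    return per_pixel * h * w * co

# Hypothetical example: 3x3 conv, 64 -> 128 channels, 56x56 output, no bias.
print(conv_flops(64, 3, 56, 56, 128))
```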
References:
https://www.zhihu.com/question/65305385
https://www.zhihu.com/question/65305385/answer/451060549
2. Per-op timing and memory analysis
The tfprof tool can analyze memory usage and running time.

Build tfprof:
# Build the tool.
bazel build --config opt tensorflow/core/profiler:profiler
# Help information, including detail 'option' instructions.
bazel-bin/tensorflow/core/profiler/profiler help
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/profiler/g3doc/command_line.md
When using tf.profiler you may run into the following error:
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH:
You need to locate the libcupti.so library and add its directory to the LD_LIBRARY_PATH environment variable. For example, if it is found at /usr/local/cuda/extras/CUPTI/lib64/libcupti.so:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
To collect memory and timing statistics, first run the following code.
# Generate the RunMetadata that contains the memory and timing information.
#
# When run on accelerator (e.g. GPU), an operation might perform some
# cpu computation, enqueue the accelerator computation. The accelerator
# computation is then run asynchronously. The profiler considers 3
# times: 1) accelerator computation. 2) cpu computation (might wait on
# accelerator). 3) the sum of 1 and 2.
#
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
with tf.Session() as sess:
    _ = sess.run(train_op,
                 options=run_options,
                 run_metadata=run_metadata)
Then call tf.profiler.profile to inspect the model's running time and memory consumption.
# Print to stdout an analysis of the memory usage and the timing information
# broken down by python codes.
builder = tf.profiler.ProfileOptionBuilder
opts = builder(builder.time_and_memory())
opts.with_node_names(show_name_regexes=['.*my_code.py.*'])
#opts.with_step(0)
#opts.with_timeline_output('timeline.json')
opts = opts.build()
tf.profiler.profile(
    tf.get_default_graph(),
    run_meta=run_metadata,
    cmd='code',
    options=opts)  # Output is readable when run from the command line; in IPython the layout is a mess
# Print to stdout an analysis of the memory usage and the timing information
# broken down by operation types.
tf.profiler.profile(
    tf.get_default_graph(),
    run_meta=run_metadata,
    cmd='op',
    options=tf.profiler.ProfileOptionBuilder.time_and_memory())  # Likewise readable from the command line, garbled in IPython
To visualize the results from the Python API: call `with_step(0).with_timeline_output(filename)` to generate a timeline JSON file, then open a Chrome browser, navigate to `chrome://tracing`, and load the generated JSON file.
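The timeline file uses Chrome's trace-event format. To see what chrome://tracing expects, here is a minimal hand-written file (the op names and durations are hypothetical; `ts` and `dur` are in microseconds):

```python
import json

# Two "complete" events (ph == "X") on one process/thread,
# mimicking two ops executing back to back.
events = [
    {"name": "Conv2D", "ph": "X", "ts": 0,   "dur": 500, "pid": 0, "tid": 0},
    {"name": "MatMul", "ph": "X", "ts": 500, "dur": 300, "pid": 0, "tid": 0},
]

with open("timeline.json", "w") as f:
    json.dump({"traceEvents": events}, f)
```

Loading this file in chrome://tracing shows the two ops as adjacent bars on a single track.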
Reference: https://blog.csdn.net/u014061630/article/details/82799009