Profiling TensorFlow Training Time in a Few Lines of Code

Article link: https://blog.csdn.net/u011675334/article/details/124821878

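Sanmei (that's me) was recently challenged by my manager, who said my model takes too long to train: "Look at company XXX. They train on tens of millions of samples in a few hours, and yours has been running for almost 40 hours. It has to be optimized. Start by finding out where the training time is actually going."

After some trial and error, I want to write down the approach I found simplest and most effective.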

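1. Tool

timeline (the tensorflow.python.client.timeline module, which exports per-op timing as a JSON trace that can be loaded in Chrome's tracing viewer at chrome://tracing)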

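2. Results

Image source: "TensorFlow performance tuning in practice" (tensorflow性能调优实践), Jianshu (简书)

Image source: "Locating and tuning performance problems when training a WDL model with TensorFlow" (使用TensorFlow训练WDL模型性能问题定位与调优), Meituan technical team (美团技术团队)

(P.S. Thanks to the experts for sharing so generously.)

3. Show Code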

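import os

import tensorflow as tf
from tensorflow.python.client import timeline

# Key snippet (TensorFlow 1.x session API); it usually lives inside train().
# train_op, train_loss, train_auc_op and train_auc_value are assumed to be
# defined earlier, when the graph is built.
with tf.Session() as sess:
    # run_options turns on full tracing; run_metadata receives the per-op stats.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()

    # Make sure the output directory exists before any trace file is written.
    os.makedirs('./timeline', exist_ok=True)

    # Train for 100,000 steps. Note that every step is traced here, which
    # itself adds some overhead to each sess.run call.
    for i in range(100000):
        sess.run(train_op, options=run_options, run_metadata=run_metadata)
        if i % 1000 == 0:
            # Store the fetched value under a separate name so the
            # train_loss tensor is not overwritten by a number.
            train_loss_value = sess.run(train_loss)
            sess.run(train_auc_op)
            train_auc = sess.run(train_auc_value)

            print("Step:", i, "train_loss:", train_loss_value, "train_auc:", train_auc)

            # Dump the timing of the most recent traced step as a Chrome trace.
            fetched_timeline = timeline.Timeline(run_metadata.step_stats)
            chrome_trace = fetched_timeline.generate_chrome_trace_format()
            with open('./timeline/timeline_train_{}.json'.format(i), 'w') as f:
                f.write(chrome_trace)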
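
Besides opening the JSON files in chrome://tracing, it can help to rank ops by total time directly in Python. The following is only a minimal sketch, not part of the original workflow. It assumes the file follows the Chrome trace event layout (a top-level "traceEvents" list in which complete events have "ph" equal to "X" and a duration "dur" in microseconds). The helper names summarize_trace and summarize_trace_file, and the sample data, are made up for illustration.

```python
import json
from collections import defaultdict


def summarize_trace(trace, top_n=10):
    """Return up to top_n (event name, total duration in microseconds) pairs,
    sorted from the largest total to the smallest."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        # "X" marks a complete event that carries its own duration ("dur").
        # Metadata events (for example "M") have no duration and are skipped.
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def summarize_trace_file(path, top_n=10):
    """Load one trace JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_trace(json.load(f), top_n)


# Tiny hand-written trace for illustration only; this is not real profiling data.
sample_trace = {
    "traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 0, "args": {"name": "cpu"}},
        {"ph": "X", "name": "MatMul", "pid": 0, "tid": 0, "ts": 0, "dur": 120},
        {"ph": "X", "name": "MatMul", "pid": 0, "tid": 0, "ts": 200, "dur": 80},
        {"ph": "X", "name": "IteratorGetNext", "pid": 0, "tid": 1, "ts": 0, "dur": 500},
    ]
}

for name, total_us in summarize_trace(sample_trace):
    print("{:<20} {:>10} us".format(name, total_us))
```

For a real run, something like summarize_trace_file('./timeline/timeline_train_0.json') should give a quick ranking. Keep in mind that these totals sum durations across threads and devices, so they can exceed the wall-clock time of the step; they are a hint about where to look, not an exact breakdown.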