TensorFlow Debugger Tutorial (1): Debugging with the Built-in tfdbg

LoveMIss-Y

Article link: https://blog.csdn.net/qq_27825451/article/details/94741752

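Preface: Before TensorFlow 2.0, TensorFlow relied on static computation graphs, so debugging was far less convenient than for an ordinary Python program. Neither an editor's built-in debugger nor Python's own pdb can step into a running graph and show tensor values, so TensorFlow provides its own dedicated debugging tool.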

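1. Introduction to the TensorFlow Debugger

tfdbg is TensorFlow's dedicated debugger. It lets you inspect the internal structure and state of a running TensorFlow graph during training and inference, which is hard to do with a general-purpose debugger such as Python's pdb because of TensorFlow's computation-graph model.

Note: the TensorFlow debugger uses a curses-based text interface.

(1) On Mac OS X, the ncurses library is required and can be installed with brew install ncurses.

(2) On Windows, curses is not as well supported, so tfdbg can be used with a readline-based interface instead (install pyreadline with pip).

So on Windows, install the pyreadline package first: pip install pyreadline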
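The platform note above can be turned into a small helper. This is only a sketch: choose_ui_type is a hypothetical name, and it assumes the TF 1.x LocalCLIDebugWrapperSession constructor accepts a ui_type argument with the values "curses" and "readline".

```python
import sys


def choose_ui_type(platform=None):
    """Return the tfdbg UI type to request for the given platform string."""
    platform = sys.platform if platform is None else platform
    # Windows lacks good curses support, so fall back to readline there.
    return "readline" if platform.startswith("win") else "curses"


# Typical use in a TF 1.x script (needs tensorflow, so it is left as a comment):
#     from tensorflow.python import debug as tf_dbg
#     sess = tf_dbg.LocalCLIDebugWrapperSession(sess, ui_type=choose_ui_type())
```

With this helper the same script can be run unchanged on Linux, macOS, and Windows, as long as pyreadline is installed on Windows.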

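In addition, whether you use the low-level TensorFlow API or a high-level interface such as tf.estimator, tf.keras, or TF-Slim (tf.contrib.slim), debugging goes through tfdbg. This article covers the low-level API first.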

2. Debugging with tfdbg in two steps

Step 1: wrap the Session object. The code usually looks like this:

import tensorflow as tf
from tensorflow.python import debug as tf_dbg  # import the tfdbg module

# ... graph definition goes here ...

with tf.Session() as sess:
    sess = tf_dbg.LocalCLIDebugWrapperSession(sess)  # wrap the session object

    # ... sess.run(...) calls go here ...

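Step 2: start the program. Any of the three forms below works. Because the session above is wrapped unconditionally, the script enters tfdbg with or without the --debug flag; the flag only matters if the script itself parses it, as the official tfdbg example scripts do. Note that python -m takes a module name without the .py suffix.

python -m my_python_file --debug
python my_python_file.py --debug
python my_python_file.py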
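The --debug flag in the commands above only has an effect if the script reads it. A minimal sketch of how a script might do that, assuming argparse and two hypothetical helper names, parse_args and maybe_wrap (neither appears in the original code):

```python
import argparse


def parse_args(argv=None):
    """Parse command-line flags; --debug is the flag name assumed here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true",
                        help="wrap the session with the tfdbg CLI")
    return parser.parse_args(argv)


def maybe_wrap(sess, debug):
    """Return sess wrapped for tfdbg when debug is True, else unchanged."""
    if debug:
        # Imported lazily so that normal runs do not load the debug module.
        from tensorflow.python import debug as tf_dbg
        sess = tf_dbg.LocalCLIDebugWrapperSession(sess)
    return sess


# Typical use inside a TF 1.x script:
#     args = parse_args()
#     with tf.Session() as sess:
#         sess = maybe_wrap(sess, args.debug)
#         sess.run(...)
```

Written this way, python my_python_file.py runs normally, and only python my_python_file.py --debug drops into the tfdbg prompt.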

Once debugging starts, an interface similar to the following appears:

run-start: run #1: 1 fetch (b:0); 0 feeds

TTTTTT FFFF DDD  BBBB   GGG
  TT   F    D  D B   B G
  TT   FFF  D  D BBBB  G  GG
  TT   F    D  D B   B G   G
  TT   F    DDD  BBBB   GGG

TensorFlow version: 1.12.0

======================================
Session.run() call #1:

Fetch(es):
  b:0

Feed dict:
  (Empty)
======================================

Select one of the following commands to proceed ---->
  run:
    Execute the run() call with debug tensor-watching
  run -n:
    Execute the run() call without debug tensor-watching
  run -t <T>:
    Execute run() calls (T - 1) times without debugging, then execute run() once more with debugging and drop back to the CLI
  run -f <filter_name>:
    Keep executing run() calls until a dumped tensor passes a given, registered filter (conditional breakpoint mode)
    Registered filter(s):
        * has_inf_or_nan
  invoke_stepper:
    Use the node-stepper interface, which allows you to interactively step through nodes involved in the graph run() call and inspect/modify their values

For more details, see help..


tfdbg>

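If you have used Python's pdb, this will feel familiar: working in tfdbg is much like working in pdb, in that you type commands at the prompt to carry out specific operations.

3. A simple example

When writing TensorFlow code, we often want to see the value of a particular variable. We could print values one by one inside the session, but that is clearly inconvenient; the debugger does this job better. The main command used here is run. The code is as follows: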

import tensorflow as tf
from tensorflow.python import debug as tf_dbg  # import the tfdbg module

a = tf.Variable(1, name="a")
b = tf.assign_add(a, 1, name="b")

with tf.Session() as sess:
    sess = tf_dbg.LocalCLIDebugWrapperSession(sess)  # wrap the session
    # Runs on the default (unwrapped) session, so it is not counted as a tfdbg run.
    tf.global_variables_initializer().run()
    for i in range(10):  # 10 epochs, simulating training one epoch at a time
        b_ = sess.run(b)
        print(b_)

Start debugging and enter the run command:

tfdbg> run  # the command entered
run-end: run #1: 1 fetch (b:0); 0 feeds
3 dumped tensor(s):

# Columns: time, size, op type, tensor name (this line was added by the author)
t (ms)   Size (B) Op type    Tensor name
[0.000]  178      Const      b/value:0  # the second argument of tf.assign_add(a, 1)
[0.003]  166      VariableV2 a:0        # the variable a
[4.227]  166      AssignAdd  b:0        # the operation b

Now I want to look at the values of a and b defined in the code:

tfdbg> pt b:0   # view b
Tensor "b:0:DebugIdentity":
  dtype: int32  # dtype of b
  shape: ()     # shape of b

array(2)        # value of b; only one loop iteration has run, so b = 1 + 1 = 2

tfdbg> pt a:0   # view a
Tensor "a:0:DebugIdentity":
  dtype: int32
  shape: ()

array(1)

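Two key commands are involved here.

(1) The run command

The run command makes tfdbg execute until one Session.run() call finishes; the next run command then executes the next Session.run() call. In the loop above, each run command advances exactly one iteration.

Note that any other sess.run call placed after the loop is not executed until the loop's runs have finished, for example: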

import tensorflow as tf
from tensorflow.python import debug as tf_dbg  # import the tfdbg module

a = tf.Variable(1, name="a")
b = tf.assign_add(a, 1, name="b")
c = tf.Variable(100, name="c")
d = tf.assign_sub(c, 10, name="d")  # named "d" so it appears as d:0 in tfdbg

with tf.Session() as sess:
    sess = tf_dbg.LocalCLIDebugWrapperSession(sess)  # wrap the session
    tf.global_variables_initializer().run()
    for i in range(10):  # 10 epochs, simulating training one epoch at a time
        b_ = sess.run(b)
        print(b_)
    print("====================================")
    d_ = sess.run(d)
    print(d_)

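The sess.run(d) after the loop is reached only after run has been entered 10 times; it is the 11th run that executes it. Its result looks like this (the run counter in the captured output below reads #3, presumably because the output was recorded with a shorter loop):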

====================================
run-end: run #3: 1 fetch (d:0); 0 feeds
3 dumped tensor(s):

t (ms)   Size (B) Op type    Tensor name
[0.000]  178      Const      d/value:0
[0.031]  166      VariableV2 c:0
[5.563]  166      AssignSub  d:0

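Note that when a run command finishes, it lists all tensors involved in that Session.run() call; tensors not touched by that call are not listed.

The same tensor list can also be obtained by entering the lt command after run.

Summary: each run command executes exactly one sess.run call and shows only the variables and operations involved in that call.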
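To check what each run should report, the variable updates of the second example can be mirrored in plain Python. This is only a sketch with no TensorFlow involved: simulate_runs is a hypothetical helper, and it assumes c is a separate variable starting at 100 and that each loop iteration corresponds to one run command.

```python
def simulate_runs(num_loop_runs=10):
    """Mirror the second example in plain Python.

    Returns the values each sess.run(b) would return, followed by the
    value returned by the final sess.run(d).
    """
    a = 1      # stands in for tf.Variable(1, name="a")
    c = 100    # stands in for tf.Variable(100, name="c")
    b_values = []
    for _ in range(num_loop_runs):
        a += 1               # tf.assign_add(a, 1): one tfdbg run per iteration
        b_values.append(a)
    c -= 10                  # tf.assign_sub(c, 10): the run after the loop
    return b_values, c


b_values, d_value = simulate_runs()
print(b_values)  # values returned at tfdbg runs 1 to 10
print(d_value)   # value returned at tfdbg run 11
```

Comparing these values with what pt b:0 and pt d:0 show after each run is a quick way to confirm which Session.run() call tfdbg is currently stopped at.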

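(2) Commands for viewing variables

pt tensor_name

print_tensor tensor_name

The two are equivalent: pt is simply short for print_tensor, much like the p command in Python's pdb, which is short for print. The name is the tensor name shown in the run output, for example b:0.

tfdbg has many more advanced features, but these two commands are the ones used most often; further commands will be covered in later articles.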