RuntimeError: Coordinator stopped with threads still running: Thread-2 Thread-3 Thread-1 Thread-4


spirits_of_snail


Category: Deep Learning. Tags: tensorflow


Reference

https://github.com/tensorflow/tensorflow/issues/2130

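Cause: when you create a tf.FIFOQueue and start several threads that operate on the same queue, the threads are normally shut down through three methods of the tf.train.Coordinator class: should_stop, request_stop, and join.
When one thread exits and requests a stop, should_stop returns True and the current thread stops.
With a first-in, first-out tf.FIFOQueue, however, should_stop never becomes True on its own. Only after request_stop has been called to stop the other threads does a thread see the new value, and it only checks should_stop at its next step. Meanwhile the thread's enqueue op is still in progress, blocked waiting for data to be enqueued, so the thread never reaches that check. When the coordinator stops, it therefore raises "Coordinator stopped with threads still running: Thread-2 Thread-3 Thread-1 Thread-4".
Solution: run sess.run(queue.close()) to close the queue before calling coord.request_stop(). In some cases the queue must also be closed with cancel_pending_enqueues=True, as in sess.run(queue.close(cancel_pending_enqueues=True)).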
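The blocking behaviour described above can be illustrated without TensorFlow. The following is only an analogy built on Python's standard library, not the TensorFlow implementation: a `threading.Event` plays the role of the coordinator's stop flag, a `queue.Queue` plays the role of the FIFO queue, and the `CLOSED` sentinel, `worker`, and `stop_worker` are hypothetical names introduced for this sketch. It shows that a thread blocked inside a queue call cannot see the stop request until something unblocks it.

```python
import queue
import threading

CLOSED = object()  # hypothetical sentinel; stands in for closing the queue


def worker(q, stop_event, entered):
    """Consume items until asked to stop or until the queue is 'closed'."""
    while not stop_event.is_set():   # plays the role of coord.should_stop()
        entered.set()                # signal: about to block in get()
        item = q.get()               # blocks while the queue is empty
        if item is CLOSED:
            break


def stop_worker(close_first, grace_secs=0.2):
    """Return True if the worker is still running after the grace period."""
    q = queue.Queue()
    stop_event = threading.Event()
    entered = threading.Event()
    t = threading.Thread(target=worker, args=(q, stop_event, entered),
                         daemon=True)
    t.start()
    entered.wait()                   # worker has already passed its stop check
    if close_first:
        q.put(CLOSED)                # unblock get(), like running queue.close()
    stop_event.set()                 # like coord.request_stop()
    t.join(timeout=grace_secs)       # like coord.join with a grace period
    still_running = t.is_alive()
    if still_running:
        q.put(CLOSED)                # clean up so the demo thread can exit
        t.join()
    return still_running


if __name__ == "__main__":
    print("stop only:", stop_worker(close_first=False))
    print("close, then stop:", stop_worker(close_first=True))
```

With only the stop flag set, the worker stays blocked in `q.get()` and is still alive when the grace period ends, which is the situation the coordinator reports as "threads still running". Unblocking the queue call first lets the worker leave its loop, so the join succeeds.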

“queue.close()”, but that actually returns an op which needs to be run to do anything. You need to do sess.run(q.close()). Since your first queue is not closed, your “batch” queue is waiting forever for something to be added to the first queue.

Furthermore, this wait is happening in C++ mutex, so stop_grace_period_secs is useless – the queue runner thread checks for “stop_requested” between session run calls, but because dequeue op never returns, it’s stuck inside session.run forever.

import tensorflow as tf  # TensorFlow 1.x graph-mode API

def create_session():
  """Resets local session, returns new InteractiveSession"""

  config = tf.ConfigProto(log_device_placement=True)
  config.gpu_options.per_process_gpu_memory_fraction = 0.3  # don't hog all vRAM
  config.operation_timeout_in_ms = 5000  # terminate on long hangs
  sess = tf.InteractiveSession("", config=config)
  return sess

tf.reset_default_graph()
q = tf.FIFOQueue(4, tf.string)
enqueue_val = tf.placeholder(dtype=tf.string)
enqueue_op = q.enqueue(enqueue_val)
size_op = q.size()
dequeue_op = q.dequeue()
sess = create_session()

def enqueueit(val):
  sess.run([enqueue_op], feed_dict={enqueue_val: val})
  print("queue1 size: ", sess.run(size_op))

enqueueit("1")
enqueueit("2")
enqueueit("3")

dequeue_op.set_shape([])
queue2 = tf.train.batch([dequeue_op], batch_size=1, num_threads=1, capacity=1)
# The original snippet used `coord` without defining it; create it here and
# pass it to the queue runners so coord.join() below can track their threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

def dequeueit():
  print("queue1 size: ", sess.run(size_op))
  print("queue2 size before: ", sess.run("batch/fifo_queue_Size:0"))
  print("result: ", sess.run(queue2))
  print("queue2 size after: ", sess.run("batch/fifo_queue_Size:0"))

dequeueit()
dequeueit()
dequeueit()

# Solution: close the first queue before stopping the coordinator, so the
# queue runner thread is no longer blocked on it.
sess.run(q.close(cancel_pending_enqueues=True))
# Ask the threads to stop and wait until they do it
coord.request_stop()
coord.join(threads, stop_grace_period_secs=5)
#sess.close()
