Reference:
https://github.com/tensorflow/tensorflow/issues/2130
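- Cause of the error: when you create a `tf.FIFOQueue` and start several threads that operate on the same queue, the threads are usually shut down through three methods of the `tf.train.Coordinator` class: `should_stop`, `request_stop` and `join`.
- When one of the threads exits and requests a stop, `should_stop` returns True, and each thread that checks it stops.
- With a `tf.FIFOQueue` (a first-in, first-out queue), `should_stop` does not start returning True on its own and shut the current thread down. Only after `request_stop` has been called to stop the other threads does a thread check the return value of `should_stop`, and only at its next step. At that point the thread's enqueue operation may still be in progress, blocked while it waits to put data into the queue, so the thread never reaches the check and the error "Coordinator stopped with threads still running: Thread-2 Thread-3 Thread-1 Thread-4" is raised.
- Fix: before calling `coord.request_stop()`, close the queue with `sess.run(queue.close())`. Sometimes you also need to pass `cancel_pending_enqueues=True`, as in `sess.run(queue.close(cancel_pending_enqueues=True))`.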
“queue.close()”, but that actually returns an op which needs to be run to do anything. You need to do sess.run(q.close()). Since your first queue is not closed, your “batch” queue is waiting forever for something to be added to the first queue.
Furthermore, this wait is happening in C++ mutex, so stop_grace_period_secs is useless – the queue runner thread checks for “stop_requested” between session run calls, but because dequeue op never returns, it’s stuck inside session.run forever.
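The point that the stop flag is only checked between blocking calls can be reproduced without TensorFlow. What follows is a minimal plain-Python analogy, not TensorFlow code: `MiniCoordinator`, `producer` and `run_demo` are hypothetical names, `queue.Queue.put` on a full queue stands in for the blocked queue op, and this `join` applies the grace period per thread as a simplification.

```python
import queue
import threading


class MiniCoordinator:
    """Hypothetical stand-in for tf.train.Coordinator (not the real class)."""

    def __init__(self):
        self._stop_event = threading.Event()

    def should_stop(self):
        return self._stop_event.is_set()

    def request_stop(self):
        self._stop_event.set()

    def join(self, threads, stop_grace_period_secs=0.3):
        # Wait a bounded time for each thread, then report those still alive.
        for t in threads:
            t.join(timeout=stop_grace_period_secs)
        stragglers = [t.name for t in threads if t.is_alive()]
        if stragglers:
            raise RuntimeError(
                "Coordinator stopped with threads still running: "
                + " ".join(stragglers))


def producer(coord, q, passed_check):
    # The stop flag is only looked at between blocking calls.
    while not coord.should_stop():
        passed_check.set()  # tell the main thread the check has been passed
        q.put("item")       # blocks: the queue is full and nobody dequeues


def run_demo():
    coord = MiniCoordinator()
    q = queue.Queue(maxsize=1)
    q.put("first")  # fill the queue so the producer's put() must block
    passed_check = threading.Event()
    t = threading.Thread(target=producer, args=(coord, q, passed_check),
                         name="Thread-1", daemon=True)
    t.start()
    passed_check.wait()
    coord.request_stop()  # sets the flag, but the producer never re-checks it
    try:
        coord.join([t])
    except RuntimeError as err:
        return str(err)
    return "all threads stopped"


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the producer is already inside `put()` when the stop is requested, so no grace period helps, however long: the thread would only notice the flag after the blocking call returns.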
import tensorflow as tf  # TensorFlow 1.x API

def create_session():
    """Resets local session, returns new InteractiveSession"""
    config = tf.ConfigProto(log_device_placement=True)
    config.gpu_options.per_process_gpu_memory_fraction = 0.3  # don't hog all vRAM
    config.operation_timeout_in_ms = 5000  # terminate on long hangs
    sess = tf.InteractiveSession("", config=config)
    return sess

tf.reset_default_graph()
q = tf.FIFOQueue(4, tf.string)
enqueue_val = tf.placeholder(dtype=tf.string)
enqueue_op = q.enqueue(enqueue_val)
size_op = q.size()
dequeue_op = q.dequeue()
sess = create_session()
def enqueueit(val):
    sess.run([enqueue_op], feed_dict={enqueue_val: val})
    print("queue1 size: %s" % sess.run(size_op))

enqueueit("1")
enqueueit("2")
enqueueit("3")
dequeue_op.set_shape([])
queue2 = tf.train.batch([dequeue_op], batch_size=1, num_threads=1, capacity=1)
coord = tf.train.Coordinator()  # needed by request_stop()/join() below
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
def dequeueit():
    print("queue1 size: %s" % sess.run(size_op))
    print("queue2 size before: %s" % sess.run("batch/fifo_queue_Size:0"))
    print("result: %s" % sess.run(queue2))
    print("queue2 size after: %s" % sess.run("batch/fifo_queue_Size:0"))

dequeueit()
dequeueit()
dequeueit()
# Solution: close the queue first so that ops blocked on it can return,
# then ask the threads to stop and wait until they do.
sess.run(q.close(cancel_pending_enqueues=True))
coord.request_stop()
coord.join(threads, stop_grace_period_secs=5)
#sess.close()
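
Why closing the queue before requesting the stop helps can be sketched in plain Python. This is an analogy under stated assumptions, not TensorFlow's implementation: `ClosableQueue`, `QueueClosed`, `producer` and `shutdown_demo` are hypothetical names, and `close()` here plays the role of `q.close(cancel_pending_enqueues=True)` by waking every blocked enqueue.

```python
import threading
from collections import deque


class QueueClosed(Exception):
    """Raised by the hypothetical ClosableQueue once it has been closed."""


class ClosableQueue:
    """Hypothetical bounded FIFO queue whose close() wakes blocked producers."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._closed = False
        self._cond = threading.Condition()

    def enqueue(self, item):
        with self._cond:
            # Sleep until there is room or the queue gets closed.
            while len(self._items) >= self._capacity and not self._closed:
                self._cond.wait()
            if self._closed:
                raise QueueClosed()
            self._items.append(item)

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()  # cancel every pending enqueue


def producer(q, stop_requested, passed_check, log):
    try:
        while not stop_requested.is_set():
            passed_check.set()   # the stop check has been passed
            q.enqueue("item")    # blocks while the queue is full
    except QueueClosed:
        log.append("enqueue cancelled")


def shutdown_demo():
    q = ClosableQueue(capacity=1)
    q.enqueue("first")  # fill the queue so the producer must block
    stop_requested = threading.Event()
    passed_check = threading.Event()
    log = []
    t = threading.Thread(target=producer,
                         args=(q, stop_requested, passed_check, log),
                         daemon=True)
    t.start()
    passed_check.wait()
    # Same order as the fix: close the queue first, then request the stop.
    q.close()
    stop_requested.set()
    t.join(timeout=5)
    return t.is_alive(), log


if __name__ == "__main__":
    print(shutdown_demo())
```

If the `q.close()` call is left out of this sketch, the producer stays inside `enqueue` and the join simply times out with the thread still alive, which mirrors the situation described in the notes.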