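When running training steps in an IPython Notebook, the kernel may suddenly crash without any warning. When that happens, check the console output.
It will show the error: failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED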
The original explanation is here: https://www.tensorflow.org/programmers_guide/using_gpu
The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process. Note that we do not release memory, since that can lead to even worse memory fragmentation. To turn this option on, set the option in the ConfigProto by:
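import tensorflow as tf  # TensorFlow 1.x API
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow GPU memory on demand instead of reserving it all up front
session = tf.Session(config=config)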
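The fix is to turn on on-demand GPU memory allocation manually. By default, TensorFlow maps nearly all of the GPU memory visible to the process, which can leave cuBLAS unable to allocate its handle. To apply the fix, paste the ConfigProto lines above before your training code. The original guide also describes other ways to control how TensorFlow allocates GPU resources.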
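One of those other options in the same guide is capping the fraction of GPU memory each process may take (per_process_gpu_memory_fraction) instead of growing it on demand. The sketch below wraps both options in a small helper. The name make_session_config is hypothetical, and it uses tf.compat.v1 on the assumption that you may be on TensorFlow 1.14+ or 2.x, where ConfigProto and Session live under that module.

```python
import tensorflow as tf


def make_session_config(memory_fraction=None):
    """Hypothetical helper: build a tf.compat.v1.ConfigProto for GPU memory.

    memory_fraction=None -> enable allow_growth (allocate on demand).
    memory_fraction=x    -> let this process use at most fraction x of each GPU.
    """
    config = tf.compat.v1.ConfigProto()
    if memory_fraction is None:
        # Start small and extend the GPU memory region as sessions need more.
        config.gpu_options.allow_growth = True
    else:
        if not 0.0 < memory_fraction <= 1.0:
            raise ValueError("memory_fraction must be in (0, 1]")
        # Hard cap on the share of each visible GPU's memory this process may use.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config
```

Usage would look like `session = tf.compat.v1.Session(config=make_session_config(0.4))`, where 0.4 is only an illustrative value. A fixed fraction is more predictable when several notebooks share one GPU, while allow_growth is simpler when a single process owns the card.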