TensorFlow SSE warning
TensorFlow wasn't compiled to use SSE (etc.) instructions, but these are available
Solution: os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
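A minimal sketch of the fix; the key point is that the environment variable must be set before tensorflow is imported, otherwise the log level has no effect:

```python
import os

# "2" suppresses INFO and WARNING messages (including the SSE build notice);
# "3" would also suppress ERROR. Must run before `import tensorflow`.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# import tensorflow as tf  # import afterwards so the setting takes effect
```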
Parameter error
beta = tf.Variable(beta0, name='beta')  # don't write it this way: the L2 computation immediately fails and becomes nan
Change it to:
beta = tf.constant(beta0)  # note the lowercase c: tf.Constant does not exist
0.12 Saver.restore broken? Unsuccessful TensorSliceReader constructor: Failed to find any matching files
Restoring a trained model with saver.restore(sess, checkpoint_path) fails.
Solutions:
Use a model name without the characters []; that is, the filename of the model to be restored must not contain [].
When restoring, use the full relative path ./model_epoch10 rather than the bare name model_epoch10.
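For example, an explicit relative path can be built with os.path.join (the checkpoint name model_epoch10 here is just the one from the example above):

```python
import os

ckpt_name = "model_epoch10"  # checkpoint file prefix, without "./"
# A bare name makes the TensorSliceReader lookup fail; an explicit
# relative path starting with "./" does not.
ckpt_path = os.path.join(".", ckpt_name)
# saver.restore(sess, ckpt_path)  # restore using the full relative path
```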
FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string("checkpoint_path","","...")
Assigning FLAGS.checkpoint_path = "./.../..." later has no effect; in other words, FLAGS.*** cannot be reassigned, which causes the error.
tensorflow.python.framework.errors_impl.NotFoundError: FindFirstFile failed for:...
checkpoint_path = tf.train.latest_checkpoint(checkpoint_path)
Surprisingly, the directory that held the training output must be the same at restore time (the training directory cannot be renamed), otherwise this error occurs; the `checkpoint` file may record the paths used at training time.
RuntimeError: Coordinator stopped with threads still running: Thread-4
tf.contrib.slim.learning.train(..., saver=saver)  # error when saving intermediate model checkpoints
sv.saver.save(sess, sv.save_path, global_step=sv.global_step)
Solution: use tf.reset_default_graph() to reset the graph instead of using with tf.Graph().as_default()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2580,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
The error shows that GPU 0's memory is exhausted, possibly because another user is occupying it on a shared machine. Check GPU usage with nvidia-smi, then in the code select another idle GPU such as GPU 1: os.environ["CUDA_VISIBLE_DEVICES"] = "1".
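A sketch of selecting an idle GPU; the device index "1" is an assumption based on what nvidia-smi reports on your machine:

```python
import os

# Must be set before `import tensorflow`, otherwise TensorFlow has already
# claimed GPU 0. "1" assumes nvidia-smi showed GPU 1 as idle.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# import tensorflow as tf  # TensorFlow now sees GPU 1 as /device:GPU:0
```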
from: -柚子皮-