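One problem with Estimator is that evaluation loads the model from a checkpoint file, so there is no guarantee that nothing went wrong between saving and loading. To check this, we usually take a small amount of data and evaluate on the training set, then confirm that the predictions match. The main trick is to use tf.cond so that the last few steps skip training and print the data instead.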
def train_func():
    # Build the training op
    train_op = create_optimizer(
        total_loss, lr, optimizer_params, 1., variables_to_train, use_fp16=FLAGS.use_fp16)
    # Run the collected update ops (e.g. batch norm statistics) together with the training op
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    update_ops.append(train_op)
    update_op = tf.group(*update_ops)
    # Returning the loss through this dependency forces the update to run
    with tf.control_dependencies([update_op]):
        train_tensor = tf.identity(total_loss)
    return train_tensor
train_tensor = tf.cond(
    global_step_tensor > 5,
    # After step 5: skip training and print the loss statistics instead
    lambda: tf.Print(
        total_loss,
        [tf.shape(per_example_loss), per_example_loss,
         tf.reduce_mean(per_example_loss),
         tf.reduce_mean(tf.to_float(per_example_loss < FLAGS.margin))],
        summarize=32),
    # Up to and including step 5: run a normal training step
    train_func)
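Two known pitfalls with this approach:
tf.cond and tf.case execute all branches: ops created outside the branch functions run no matter which branch is taken, so the training op has to be built inside train_func.
tensorflow: Initializer for variable… is from inside a control-flow construct, a loop or conditional: this error is raised when a variable is created inside a branch, which can happen if the optimizer creates its own variables there.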
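Once the per-example losses have been printed during the non-training steps, they still have to be compared against what evaluation produces after reloading the checkpoint. The following is a minimal sketch of that comparison in plain Python. It assumes both runs see the same examples in the same order; the function name `compare_losses`, the tolerances, and the sample values are all hypothetical and not part of the original code.

```python
import math


def compare_losses(train_losses, eval_losses, rel_tol=1e-5, abs_tol=1e-6):
    """Return the indices at which two per-example loss lists disagree."""
    if len(train_losses) != len(eval_losses):
        raise ValueError("loss lists differ in length: %d vs %d"
                         % (len(train_losses), len(eval_losses)))
    mismatches = []
    for i, (a, b) in enumerate(zip(train_losses, eval_losses)):
        # Floating-point results are compared with a tolerance, not exactly
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(i)
    return mismatches


# Hypothetical values: the first list copied from the tf.Print output,
# the second produced by evaluation after reloading the checkpoint.
printed = [0.3125, 0.0, 1.25, 0.75]
reloaded = [0.3125, 0.0, 1.25, 0.5]
print(compare_losses(printed, reloaded))  # prints [3]
```

An empty list means the two runs agree within tolerance; any index in the result points at an example worth inspecting by hand.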
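Alternatively, you can use a hook to save the model parameters once training or evaluation has finished, and then compare the two sets of parameters.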
class LogSessionEvalHook(tf.train.SessionRunHook):
    def __init__(self):
        pass

    def after_create_session(self, session, coord):
        pass

    def before_run(self, run_context):
        pass

    def after_run(self, run_context, run_values):
        pass

    def end(self, sess):
        pass
        # sess.graph._unsafe_unfinalize()
        # saver = tf.train.Saver()
        # save_path = saver.save(sess, "/data00/huangqingkang/repos/bert/exps/bertrel/tmp/eval")
        # print("******* eval checkpoint: ", save_path, "********")
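With one snapshot saved after training and one after evaluation, the comparison itself can be sketched as below. This is only an illustration: it assumes the parameters have already been read into plain dictionaries mapping variable names to flat lists of floats (for instance with a checkpoint reader such as tf.train.load_checkpoint, which is not shown here). The function name `diff_params`, the tolerance, and the sample variable names are hypothetical.

```python
def diff_params(params_a, params_b, tol=1e-6):
    """Compare two {variable name: flat list of floats} snapshots."""
    report = {
        "only_in_a": sorted(set(params_a) - set(params_b)),
        "only_in_b": sorted(set(params_b) - set(params_a)),
        "changed": {},
    }
    for name in sorted(set(params_a) & set(params_b)):
        a, b = params_a[name], params_b[name]
        if len(a) != len(b):
            # Different number of elements: flag as an unbounded difference
            report["changed"][name] = float("inf")
            continue
        max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if max_diff > tol:
            report["changed"][name] = max_diff
    return report


# Hypothetical snapshots taken after training and after evaluation
after_train = {"dense/kernel": [0.5, -1.0], "dense/bias": [0.25]}
after_eval = {"dense/kernel": [0.5, -1.0], "dense/bias": [0.75]}
report = diff_params(after_train, after_eval)
print(report["changed"])  # prints {'dense/bias': 0.5}
```

If `changed`, `only_in_a`, and `only_in_b` are all empty, the parameters survived the save and load round trip unchanged within the chosen tolerance.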