TensorFlow Estimator API: checkpoint save behavior during training and checkpoint skip behavior during validation

andylei777

Article link: https://blog.csdn.net/andylei777/article/details/79067181


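This article walks through how checkpoints are saved when training with the TensorFlow Estimator API, and why evaluation is skipped during validation when no new checkpoint exists. The entry point is `experiment.train_and_evaluate()`. In the training part, `experiment.train()` calls `estimator._train_model()`, which uses a CheckpointSaverHook to save the model periodically. In the validation part, timer logic such as `SecondOrStepTimer.should_trigger_for_step` decides when a checkpoint is saved, and therefore whether a validation run finds a new checkpoint or skips.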


INFO:tensorflow:Create CheckpointSaverHook.
2018-01-15 16:24:33.513942: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-15 16:24:34.390763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:89:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-01-15 16:24:34.390813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:89:00.0, compute capability: 6.1)
2018-01-15 16:25:58.010092: I tensorflow/core/kernels/shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 499 of 1000
2018-01-15 16:26:07.689469: I tensorflow/core/kernels/shuffle_dataset_op.cc:121] Shuffle buffer filled.
INFO:tensorflow:Saving checkpoints for 1 into /train/mymodels/model.ckpt.
INFO:tensorflow:loss = 22.2663, step = 1
......
DEBUG:tensorflow:Skipping evaluation due to same checkpoint /train/mymodels/model.ckpt-1 for step 100 as for step 50.
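
The last log line can be read as follows: the validation monitor looks up the latest checkpoint path and skips evaluation when that path is the one it already evaluated. Below is a minimal sketch of that comparison in plain Python. It is a simplified stand-in, not the TensorFlow source: the class `EvalSkipSketch` and the method `maybe_evaluate` are hypothetical names, and the checkpoint path is passed in directly instead of being read from the model directory.

```python
from typing import Optional


class EvalSkipSketch:
    """Hypothetical stand-in for the 'same checkpoint' check of a validation monitor."""

    def __init__(self) -> None:
        # Path and step of the checkpoint that was evaluated most recently.
        self._latest_path: Optional[str] = None
        self._latest_path_step: Optional[int] = None

    def maybe_evaluate(self, step: int, latest_checkpoint: Optional[str]) -> str:
        """Return a message describing whether evaluation runs at this step."""
        if latest_checkpoint is None:
            return "skip: no checkpoint saved yet at step %d" % step
        if latest_checkpoint == self._latest_path:
            # Nothing new was saved since the last evaluation, so skip.
            return ("skip: same checkpoint %s for step %d as for step %d"
                    % (latest_checkpoint, step, self._latest_path_step))
        # A new checkpoint exists: remember it and evaluate.
        self._latest_path = latest_checkpoint
        self._latest_path_step = step
        return "evaluate %s at step %d" % (latest_checkpoint, step)


monitor = EvalSkipSketch()
# Only model.ckpt-1 exists at both step 50 and step 100, as in the log above.
first = monitor.maybe_evaluate(50, "/train/mymodels/model.ckpt-1")
second = monitor.maybe_evaluate(100, "/train/mymodels/model.ckpt-1")
print(first)
print(second)
```

Under these assumptions the step-50 call evaluates and the step-100 call skips, which matches the shape of the DEBUG message in the log.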

The execution flow is as follows:

experiment.train_and_evaluate()

# The validation part is implemented as a hook:
if self._min_eval_frequency:
   self._train_monitors += [
       monitors.ValidationMonitor(
           input_fn=self._eval_input_fn,
           eval_steps=self._eval_steps,
           metrics=self._eval_metrics,
           every_n_steps=self._min_eval_frequency,
           name=eval_dir_suffix,
           hooks=self._eval_hooks)
   ]

# The training part eventually calls estimator._train_model(); the first training step saves a checkpoint!
self.train(delay_secs=0)

Training part

experiment.train(delay_secs=0) -> experiment._estimator.train() -> estimator._train_model()

# Code of estimator._train_model()
# ...
      # 1. Add loss monitoring (via hooks ...
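
To illustrate why a checkpoint is written at step 1 and then not again for a long time, here is a minimal sketch of a timer that triggers either every N seconds or every N steps, in the spirit of `SecondOrStepTimer.should_trigger_for_step`. It is a simplified re-implementation for illustration, not the TensorFlow source: `StepOrSecondTimerSketch`, `FakeClock`, and `simulate` are hypothetical names, the clock is injected so the run is deterministic, and the 600-second interval is assumed to be the save interval in effect (it should be checked against the actual RunConfig).

```python
import time
from typing import Callable, List, Optional


class StepOrSecondTimerSketch:
    """Hypothetical timer: triggers every N seconds or every N steps."""

    def __init__(self, every_secs: Optional[float] = None,
                 every_steps: Optional[int] = None,
                 clock: Callable[[], float] = time.time) -> None:
        if (every_secs is None) == (every_steps is None):
            raise ValueError("Provide exactly one of every_secs or every_steps.")
        self._every_secs = every_secs
        self._every_steps = every_steps
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_step: Optional[int] = None

    def should_trigger_for_step(self, step: int) -> bool:
        if self._last_step is None:
            return True  # Never triggered before, so the first step always saves.
        if step == self._last_step:
            return False
        if self._every_secs is not None:
            return self._clock() >= self._last_time + self._every_secs
        return step >= self._last_step + self._every_steps

    def update_last_triggered_step(self, step: int) -> None:
        self._last_time = self._clock()
        self._last_step = step


class FakeClock:
    """Deterministic clock so the simulation does not depend on wall time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def simulate(timer: StepOrSecondTimerSketch, clock: FakeClock,
             num_steps: int, secs_per_step: float) -> List[int]:
    """Return the steps at which a checkpoint would be saved."""
    saved = []
    for step in range(1, num_steps + 1):
        clock.now += secs_per_step
        if timer.should_trigger_for_step(step):
            timer.update_last_triggered_step(step)
            saved.append(step)
    return saved


clock_a = FakeClock()
saved_by_secs = simulate(StepOrSecondTimerSketch(every_secs=600, clock=clock_a),
                         clock_a, num_steps=100, secs_per_step=1.0)
clock_b = FakeClock()
saved_by_steps = simulate(StepOrSecondTimerSketch(every_steps=50, clock=clock_b),
                          clock_b, num_steps=100, secs_per_step=1.0)
print(saved_by_secs, saved_by_steps)
```

With a time-based interval much longer than the first 100 steps take, only step 1 produces a checkpoint, so validation at step 50 and at step 100 both see `model.ckpt-1`. A step-based interval saves again once enough steps have passed.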
