Chinese-ELECTRA: 'adam_m not found in checkpoint'

Today I tried to load the pretrained Chinese-ELECTRA weights to continue pretraining, and ran into the following error:

2020-08-11 22:40:26.262591: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
	 [[node save/RestoreV2 (defined at run_pretraining.py:363) ]]

Original stack trace for 'save/RestoreV2':
  File "run_pretraining.py", line 404, in <module>
    main()
  File "run_pretraining.py", line 400, in main
    args.model_name, args.data_dir, **hparams))
  File "run_pretraining.py", line 363, in train_or_eval
    max_steps=config.num_train_steps)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
.....

Roughly, this means the Adam optimizer's slot variables are missing from the checkpoint: adam_m and adam_v are the first- and second-moment accumulators that TensorFlow saves next to each weight during training, and the released pretrained weights were stripped of them, while the training graph expects to restore them. It nearly made me cry with frustration.
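
To see what is actually inside the released checkpoint, you can list its variables; a minimal sketch, assuming the weights were unpacked into models/electra_base (the path is an assumption, adjust it to your layout):

import tensorflow as tf  # TF 1.x, as used by the ELECTRA codebase

# Resolve the checkpoint inside the download folder; if there is no
# `checkpoint` index file there, pass the .ckpt prefix to
# list_variables directly instead.
ckpt = tf.train.latest_checkpoint("models/electra_base")
for name, shape in tf.train.list_variables(ckpt):
  print(name, shape)

The output contains model variables such as discriminator_predictions/dense/bias, but no */adam_m or */adam_v entries: the released weights hold only the model parameters, while the training graph asks RestoreV2 for the optimizer state too.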

Solution

I downloaded the electra-base pretrained model, put it in the electra_base folder under the models directory, and then modified the code, following the suggestion in [1]. The main change is in model_fn_builder in run_pretraining.py (configure_pretraining.py also needs a small addition, shown after the code):

def model_fn_builder(config: configure_pretraining.PretrainingConfig):
  """Build the model for training."""

  def model_fn(features, labels, mode, params):
    """Build the model for training."""
    model = PretrainingModel(config, features,
                             mode == tf.estimator.ModeKeys.TRAIN)
    utils.log("Model is built!")
    # Resolve the latest checkpoint inside the directory that holds the
    # released pretrained weights (e.g. models/electra_base).
    init_checkpoint = config.init_checkpoint
    if init_checkpoint is not None:  # was `if config is not None`, which is always true here
      init_checkpoint = tf.train.latest_checkpoint(config.init_checkpoint)
      utils.log("Using checkpoint", init_checkpoint)
    tvars = tf.trainable_variables()
    scaffold_fn = None
    if init_checkpoint:
      # Restore only the variables that exist in both the graph and the
      # checkpoint; the Adam slots (adam_m/adam_v) never enter the
      # assignment map, so RestoreV2 never asks the checkpoint for them.
      assignment_map, _ = modeling.get_assignment_map_from_checkpoint(
          tvars, init_checkpoint)
      if config.use_tpu:
        def tpu_scaffold():
          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
          return tf.train.Scaffold()
        scaffold_fn = tpu_scaffold
      else:
        tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
    if mode == tf.estimator.ModeKeys.TRAIN:
      train_op = optimization.create_optimizer(
          model.total_loss, config.learning_rate, config.num_train_steps,
          weight_decay_rate=config.weight_decay_rate,
          use_tpu=config.use_tpu,
          warmup_steps=config.num_warmup_steps,
          lr_decay_power=config.lr_decay_power
      )
      output_spec = tf.estimator.tpu.TPUEstimatorSpec(
          mode=mode,
          loss=model.total_loss,
          train_op=train_op,
          training_hooks=[training_utils.ETAHook(
              {} if config.use_tpu else dict(loss=model.total_loss),
              config.num_train_steps, config.iterations_per_loop,
              config.use_tpu)],
          scaffold_fn=scaffold_fn  # without this, the TPU path never runs the init
      )
    elif mode == tf.estimator.ModeKeys.EVAL:
      output_spec = tf.estimator.tpu.TPUEstimatorSpec(
          mode=mode,
          loss=model.total_loss,
          eval_metrics=model.eval_metrics,
          evaluation_hooks=[training_utils.ETAHook(
              {} if config.use_tpu else dict(loss=model.total_loss),
              config.num_eval_steps, config.iterations_per_loop,
              config.use_tpu, is_training=False)],
          scaffold_fn=scaffold_fn)
    else:
      raise ValueError("Only TRAIN and EVAL modes are supported")
    return output_spec

  return model_fn
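
Two supporting pieces make this work. First, config.init_checkpoint must exist, so PretrainingConfig in configure_pretraining.py needs an init_checkpoint attribute; a minimal sketch, heavily abbreviated (the real class defines many more hyperparameters):

class PretrainingConfig:
  def __init__(self, model_name, data_dir, **kwargs):
    self.model_name = model_name
    self.data_dir = data_dir
    self.use_tpu = False
    # New option: directory of the released weights, e.g. "models/electra_base".
    self.init_checkpoint = None
    self.update(kwargs)  # lets the --hparams JSON override any default

  def update(self, kwargs):
    for k, v in kwargs.items():
      if not hasattr(self, k):
        raise ValueError("Unknown hparam " + k)
      setattr(self, k, v)

Second, modeling.get_assignment_map_from_checkpoint restores only variables whose names appear in both the graph and the checkpoint. Roughly (an illustrative sketch; the repo's modeling.py differs in details):

import collections
import re
import tensorflow as tf

def build_assignment_map(tvars, init_checkpoint):
  """Map checkpoint names to graph variables only when the name
  exists on both sides."""
  graph_names = {re.sub(r":\d+$", "", v.name) for v in tvars}  # drop ":0"
  assignment_map = collections.OrderedDict()
  for ckpt_name, _ in tf.train.list_variables(init_checkpoint):
    if ckpt_name in graph_names:
      assignment_map[ckpt_name] = ckpt_name  # restore this variable
  # Variables that exist only in the graph, like the Adam slots, never
  # enter the map and are simply initialized from scratch.
  return assignment_map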

With these changes in place, the key addition being the checkpoint-initialization code above, run:

python run_pretraining.py --data-dir data_speech_transformer \
                          --model-name electra_base_train \
                          --hparams config.json    
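
For reference, a hedged example of what config.json might contain. init_checkpoint is the option added above and should point at the folder with the downloaded weights; the other keys are ordinary PretrainingConfig hyperparameters, and the values are illustrative (num_train_steps matches the 10000 steps visible in the log below):

{
  "model_size": "base",
  "num_train_steps": 10000,
  "learning_rate": 2e-4,
  "init_checkpoint": "models/electra_base"
}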

Note: do not set --model-name to the same name as the electra_base folder above. The Estimator treats models/<model-name> as its model_dir and would try to restore the full training graph, Adam slots included, from whatever checkpoint it finds there, bringing the original error right back and leaving you debugging through tears. With a fresh name it runs; part of the log:

333/10000 = 3.3%, SPS: 1.0, ELAP: 5:48, ETA: 2:48:34 - loss: 9.9810
334/10000 = 3.3%, SPS: 1.0, ELAP: 5:49, ETA: 2:48:32 - loss: 9.9372
335/10000 = 3.4%, SPS: 1.0, ELAP: 5:50, ETA: 2:48:30 - loss: 9.9835
336/10000 = 3.4%, SPS: 1.0, ELAP: 5:51, ETA: 2:48:28 - loss: 10.0930
337/10000 = 3.4%, SPS: 1.0, ELAP: 5:52, ETA: 2:48:26 - loss: 10.3831
338/10000 = 3.4%, SPS: 1.0, ELAP: 5:53, ETA: 2:48:23 - loss: 9.8002
339/10000 = 3.4%, SPS: 1.0, ELAP: 5:54, ETA: 2:48:21 - loss: 10.8526
340/10000 = 3.4%, SPS: 1.0, ELAP: 5:55, ETA: 2:48:19 - loss: 9.4941
341/10000 = 3.4%, SPS: 1.0, ELAP: 5:56, ETA: 2:48:17 - loss: 9.4434
342/10000 = 3.4%, SPS: 1.0, ELAP: 5:57, ETA: 2:48:14 - loss: 10.4484
343/10000 = 3.4%, SPS: 1.0, ELAP: 5:58, ETA: 2:48:12 - loss: 10.0355
344/10000 = 3.4%, SPS: 1.0, ELAP: 5:59, ETA: 2:48:10 - loss: 10.0131

References

[1] 'adam_m not found in checkpoint' when further pretraining. https://github.com/google-research/electra/issues/45

[2] Estimator should be able to partially load checkpoints. https://github.com/tensorflow/tensorflow/issues/10155

[3] Chinese-ELECTRA. https://github.com/ymcui/Chinese-ELECTRA
