Chinese-ELECTRA: 'adam_m not found in checkpoint'

Today I tried to load the pretrained Chinese-ELECTRA weights to continue pretraining, and ran into the following error:

2020-08-11 22:40:26.262591: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
	 [[node save/RestoreV2 (defined at run_pretraining.py:363) ]]

Original stack trace for 'save/RestoreV2':
  File "run_pretraining.py", line 404, in <module>
    main()
  File "run_pretraining.py", line 400, in main
    args.model_name, args.data_dir, **hparams))
  File "run_pretraining.py", line 363, in train_or_eval
    max_steps=config.num_train_steps)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
.....

Roughly, this means the Adam optimizer's slot variables are missing from the checkpoint: adam_m and adam_v are the first- and second-moment accumulators that TensorFlow saves next to each weight during training, and the released pretrained weights were stripped of them, while the training graph expects to restore them. It nearly made me cry with frustration.
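
To see what is actually inside the released checkpoint, you can list its variables; a minimal sketch, assuming the weights were unpacked into models/electra_base (the path is an assumption, adjust it to your layout):

import tensorflow as tf  # TF 1.x, as used by the ELECTRA codebase

# Resolve the checkpoint inside the download folder; if there is no
# `checkpoint` index file there, pass the .ckpt prefix to
# list_variables directly instead.
ckpt = tf.train.latest_checkpoint("models/electra_base")
for name, shape in tf.train.list_variables(ckpt):
  print(name, shape)

The output contains model variables such as discriminator_predictions/dense/bias, but no */adam_m or */adam_v entries: the released weights hold only the model parameters, while the training graph asks RestoreV2 for the optimizer state too.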

Solution

I downloaded the electra-base pretrained model, put it in the electra_base folder under the models directory, and then modified the code, following the suggestion in [1]. The main change is in model_fn_builder in run_pretraining.py (configure_pretraining.py also needs a small addition, shown after the code):

def model_fn_builder(config: configure_pretraining.PretrainingConfig):
  """Build the model for training."""

  def model_fn(features, labels, mode, params):
    """Build the model for training."""
    model = PretrainingModel(config, features,
                             mode == tf.estimator.ModeKeys.TRAIN)
    utils.log("Model is built!")
    # Resolve the latest checkpoint inside the directory that holds the
    # released pretrained weights (e.g. models/electra_base).
    init_checkpoint = config.init_checkpoint
    if init_checkpoint is not None:  # was `if config is not None`, which is always true here
      init_checkpoint = tf.train.latest_checkpoint(config.init_checkpoint)
      utils.log("Using checkpoint", init_checkpoint)
    tvars = tf.trainable_variables()
    scaffold_fn = None
    if init_checkpoint:
      # Restore only the variables that exist in both the graph and the
      # checkpoint; the Adam slots (adam_m/adam_v) never enter the
      # assignment map, so RestoreV2 never asks the checkpoint for them.
      assignment_map, _ = modeling.get_assignment_map_from_checkpoint(
          tvars, init_checkpoint)
      if config.use_tpu:
        def tpu_scaffold():
          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
          return tf.train.Scaffold()
        scaffold_fn = tpu_scaffold
      else:
        tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
    if mode == tf.estimator.ModeKeys.TRAIN:
      train_op = optimization.create_optimizer(
          model.total_loss, config.learning_rate, config.num_train_steps,
          weight_decay_rate=config.weight_decay_rate,
          use_tpu=config.use_tpu,
          warmup_steps=config.num_warmup_steps,
          lr_decay_power=config.lr_decay_power
      )
      output_spec = tf.estimator.tpu.TPUEstimatorSpec(
          mode=mode,
          loss=model.total_loss,
          train_op=train_op,
          training_hooks=[training_utils.ETAHook(
              {} if config.use_tpu else dict(loss=model.total_loss),
              config.num_train_steps, config.iterations_per_loop,
              config.use_tpu)],
          scaffold_fn=scaffold_fn  # without this, the TPU path never runs the init
      )
    elif mode == tf.estimator.ModeKeys.EVAL:
      output_spec = tf.estimator.tpu.TPUEstimatorSpec(
          mode=mode,
          loss=model.total_loss,
          eval_metrics=model.eval_metrics,
          evaluation_hooks=[training_utils.ETAHook(
              {} if config.use_tpu else dict(loss=model.total_loss),
              config.num_eval_steps, config.iterations_per_loop,
              config.use_tpu, is_training=False)],
          scaffold_fn=scaffold_fn)
    else:
      raise ValueError("Only TRAIN and EVAL modes are supported")
    return output_spec

  return model_fn
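
Two supporting pieces make this work. First, config.init_checkpoint must exist, so PretrainingConfig in configure_pretraining.py needs an init_checkpoint attribute; a minimal sketch, heavily abbreviated (the real class defines many more hyperparameters):

class PretrainingConfig:
  def __init__(self, model_name, data_dir, **kwargs):
    self.model_name = model_name
    self.data_dir = data_dir
    self.use_tpu = False
    # New option: directory of the released weights, e.g. "models/electra_base".
    self.init_checkpoint = None
    self.update(kwargs)  # lets the --hparams JSON override any default

  def update(self, kwargs):
    for k, v in kwargs.items():
      if not hasattr(self, k):
        raise ValueError("Unknown hparam " + k)
      setattr(self, k, v)

Second, modeling.get_assignment_map_from_checkpoint restores only variables whose names appear in both the graph and the checkpoint. Roughly (an illustrative sketch; the repo's modeling.py differs in details):

import collections
import re
import tensorflow as tf

def build_assignment_map(tvars, init_checkpoint):
  """Map checkpoint names to graph variables only when the name
  exists on both sides."""
  graph_names = {re.sub(r":\d+$", "", v.name) for v in tvars}  # drop ":0"
  assignment_map = collections.OrderedDict()
  for ckpt_name, _ in tf.train.list_variables(init_checkpoint):
    if ckpt_name in graph_names:
      assignment_map[ckpt_name] = ckpt_name  # restore this variable
  # Variables that exist only in the graph, like the Adam slots, never
  # enter the map and are simply initialized from scratch.
  return assignment_map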

With these changes in place, the key addition being the checkpoint-initialization code above, run:

python run_pretraining.py --data-dir data_speech_transformer \
                          --model-name electra_base_train \
                          --hparams config.json    
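
For reference, a hedged example of what config.json might contain. init_checkpoint is the option added above and should point at the folder with the downloaded weights; the other keys are ordinary PretrainingConfig hyperparameters, and the values are illustrative (num_train_steps matches the 10000 steps visible in the log below):

{
  "model_size": "base",
  "num_train_steps": 10000,
  "learning_rate": 2e-4,
  "init_checkpoint": "models/electra_base"
}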

Note: do not set --model-name to the same name as the electra_base folder above. The Estimator treats models/<model-name> as its model_dir and would try to restore the full training graph, Adam slots included, from whatever checkpoint it finds there, bringing the original error right back and leaving you debugging through tears. With a fresh name it runs; part of the log:

333/10000 = 3.3%, SPS: 1.0, ELAP: 5:48, ETA: 2:48:34 - loss: 9.9810
334/10000 = 3.3%, SPS: 1.0, ELAP: 5:49, ETA: 2:48:32 - loss: 9.9372
335/10000 = 3.4%, SPS: 1.0, ELAP: 5:50, ETA: 2:48:30 - loss: 9.9835
336/10000 = 3.4%, SPS: 1.0, ELAP: 5:51, ETA: 2:48:28 - loss: 10.0930
337/10000 = 3.4%, SPS: 1.0, ELAP: 5:52, ETA: 2:48:26 - loss: 10.3831
338/10000 = 3.4%, SPS: 1.0, ELAP: 5:53, ETA: 2:48:23 - loss: 9.8002
339/10000 = 3.4%, SPS: 1.0, ELAP: 5:54, ETA: 2:48:21 - loss: 10.8526
340/10000 = 3.4%, SPS: 1.0, ELAP: 5:55, ETA: 2:48:19 - loss: 9.4941
341/10000 = 3.4%, SPS: 1.0, ELAP: 5:56, ETA: 2:48:17 - loss: 9.4434
342/10000 = 3.4%, SPS: 1.0, ELAP: 5:57, ETA: 2:48:14 - loss: 10.4484
343/10000 = 3.4%, SPS: 1.0, ELAP: 5:58, ETA: 2:48:12 - loss: 10.0355
344/10000 = 3.4%, SPS: 1.0, ELAP: 5:59, ETA: 2:48:10 - loss: 10.0131

References

[1] 'adam_m not found in checkpoint' when further pretraining. https://github.com/google-research/electra/issues/45

[2] Estimator should be able to partially load checkpoints. https://github.com/tensorflow/tensorflow/issues/10155

[3] Chinese-ELECTRA. https://github.com/ymcui/Chinese-ELECTRA
