INFO:tensorflow:global step 900: loss = 2.1905 (1.032 sec/step)
INFO:tensorflow:global step 910: loss = 2.7086 (1.192 sec/step)
INFO:tensorflow:global step 920: loss = 1.5002 (1.041 sec/step)
INFO:tensorflow:global step 930: loss = 1.7680 (1.033 sec/step)
INFO:tensorflow:Saving checkpoint to path /project/train/src_repo/training/model.ckpt
INFO:tensorflow:Recording summary at step 938.
INFO:tensorflow:global step 940: loss = 1.5987 (1.348 sec/step)
INFO:tensorflow:global step 950: loss = 2.0576 (1.185 sec/step)
INFO:tensorflow:global step 960: loss = 1.9906 (1.237 sec/step)
INFO:tensorflow:global step 970: loss = 2.2797 (1.186 sec/step)
INFO:tensorflow:global step 980: loss = 1.9031 (1.080 sec/step)
INFO:tensorflow:Saving checkpoint to path /project/train/src_repo/training/model.ckpt
INFO:tensorflow:Recording summary at step 986.
INFO:tensorflow:global step 990: loss = 1.1643 (1.237 sec/step)
INFO:tensorflow:global step 1000: loss = 2.0253 (1.130 sec/step)
This part of the log shows that the model is being saved normally, and that multiple checkpoints were written during training.
finetune_inceptionv3.sh: line 47: 643 Killed python train.py --train_dir=${TRAIN_DIR}/ --dataset_split_name=train --dataset_dir=${DATASET_DIR} --model_name=inception_resnet_v2 --checkpoint_path=${TRAIN_DIR}/pre-train/model.ckpt-300 --max_number_of_steps=1000 --batch_size=16 --learning_rate=0.0001 --learning_rate_decay_type=fixed --save_interval_secs=60 --save_summaries_secs=60 --log_every_n_steps=10 --optimizer=rmsprop --weight_decay=0.00004
Exporting and saving models to /project/train/models...
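This part of the log shows that although the script being run is for fine-tuning Inception v3 (finetune_inceptionv3.sh), the model name passed to train.py is a v2 variant (--model_name=inception_resnet_v2). Combine that with the earlier log line: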
Assign requires shapes of both tensors to match. lhs shape= [3,3,128,768] rhs shape= [5,5,128,768]
I suggest checking whether the model being trained matches the pretrained model whose checkpoint is being loaded.
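One way to make that check concrete is to compare variable shapes on both sides before restoring. The following is only a minimal sketch in plain Python: the helper `find_shape_mismatches` and the variable name `example/Conv2d/weights` are hypothetical, the two shapes are copied from the error message above, and treating `lhs` as the model variable and `rhs` as the checkpoint value is an assumption.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for variables that
    exist on both sides but have different shapes."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(ckpt_shape))
    return mismatches


# Hypothetical variable name; the shapes come from the error message.
# Assumption: lhs is the model variable, rhs is the checkpoint value.
model_shapes = {"example/Conv2d/weights": [3, 3, 128, 768]}
checkpoint_shapes = {"example/Conv2d/weights": [5, 5, 128, 768]}

# With TensorFlow installed, the checkpoint side could instead be built with:
#   checkpoint_shapes = dict(tf.train.list_variables(checkpoint_path))
mismatches = find_shape_mismatches(model_shapes, checkpoint_shapes)
for name, (model_shape, ckpt_shape) in mismatches.items():
    print(f"{name}: model {model_shape} vs checkpoint {ckpt_shape}")
```

Any entry reported here would point to a layer where the network built from `--model_name` and the checkpoint given by `--checkpoint_path` disagree, which is the kind of mismatch the Assign error describes.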