Fixing errors when training in the slim folder

一颗温暖的心_lucky

Category: Deep learning frameworks. Tags: tensorflow, deep learning, artificial intelligence

Article link: https://blog.csdn.net/weixin_41946146/article/details/90294209

TensorFlow errors

Method 1:

Running the following command from the slim folder in a console produces a runtime error:

python train_image_classifier.py \
--train_dir=satellite/train_dir \
--dataset_name=satellite \
--dataset_split_name=train \
--dataset_dir=satellite/data \
--model_name=inception_v3 \
--checkpoint_path=satellite/pretrained/inception_v3.ckpt \
--checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
--trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
--max_number_of_steps=30000 --batch_size=32 --learning_rate=0.001 \
--learning_rate_decay_type=fixed --save_interval_secs=300 \
--save_summaries_secs=2 \
--log_every_n_steps=10 --optimizer=rmsprop --weight_decay=0.00004

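When train_image_classifier.py in slim reads the TFRecord data to train the classifier, it fails with an InvalidArgumentError: "Cannot assign a device for operation ...: Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available." The full messages are listed after the training command further down.

Roughly, this means the program explicitly requests a GPU device, but the machine has only a CPU and no GPU.

Suggested fix:

In train_image_classifier.py, change

tf.app.flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')

to:

tf.app.flags.DEFINE_boolean('clone_on_cpu', True, 'Use CPUs to deploy clones.')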
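Since clone_on_cpu is defined as an ordinary command-line flag, it should also be possible to leave the script untouched and pass --clone_on_cpu=True when launching it. The sketch below assembles the training command from this post as an argument list; build_train_command is a hypothetical helper written for illustration and is not part of slim.

```python
import sys


def build_train_command(clone_on_cpu=False, batch_size=32, max_steps=30000):
    """Return the training command from this post as an argument list."""
    scopes = "InceptionV3/Logits,InceptionV3/AuxLogits"
    args = [
        sys.executable, "train_image_classifier.py",
        "--train_dir=satellite/train_dir",
        "--dataset_name=satellite",
        "--dataset_split_name=train",
        "--dataset_dir=satellite/data",
        "--model_name=inception_v3",
        "--checkpoint_path=satellite/pretrained/inception_v3.ckpt",
        "--checkpoint_exclude_scopes=" + scopes,
        "--trainable_scopes=" + scopes,
        "--max_number_of_steps=%d" % max_steps,
        "--batch_size=%d" % batch_size,
        "--learning_rate=0.001",
        "--learning_rate_decay_type=fixed",
        "--save_interval_secs=300",
        "--save_summaries_secs=2",
        "--log_every_n_steps=10",
        "--optimizer=rmsprop",
        "--weight_decay=0.00004",
    ]
    if clone_on_cpu:
        # Overrides the flag default without editing the script.
        args.append("--clone_on_cpu=True")
    return args


if __name__ == "__main__":
    cmd = build_train_command(clone_on_cpu=True)
    print(" ".join(cmd))
    # To actually start training, run this from the slim folder:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command in one place like this also avoids the copy and paste problems that come with long backslash-continued shell lines.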

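This change only gets training running on the CPU, which is still very slow. Training has to run on the GPU to be fast, and after some searching I finally found Method 2.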

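Method 2:

Slightly modify the last few lines of train_image_classifier.py, or simply replace them with the code from the posts listed in the references. After that change, training runs very fast.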
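The modified lines themselves are not reproduced in this post. Judging from the GitHub issue and the GPU how-to listed in the references, the change is most likely to enable soft device placement in the session configuration, so that ops without a GPU kernel fall back to the CPU while everything else stays on the GPU. That reading is an assumption. The following minimal sketch shows the mechanism on a toy graph, assuming a TensorFlow install that provides tf.compat.v1 (on older 1.x releases, plain `import tensorflow as tf` would be used instead).

```python
# Assumes TensorFlow with the 1.x-style API available as tf.compat.v1.
import tensorflow.compat.v1 as tf

# Soft placement lets TensorFlow fall back to another device (for example the
# CPU) when an op has no kernel for the device it was pinned to.
session_config = tf.ConfigProto(allow_soft_placement=True)

with tf.Graph().as_default():
    # Pin the ops to GPU:0, similar to how the training script pins its clones.
    with tf.device('/device:GPU:0'):
        a = tf.constant([[1.0, 2.0]])
        b = tf.constant([[3.0], [4.0]])
        product = tf.matmul(a, b)

    with tf.Session(config=session_config) as sess:
        result = sess.run(product)

print(result)

# In train_image_classifier.py the same config would be handed to the final
# training call through its session_config argument:
#   slim.learning.train(..., session_config=session_config)
```

With soft placement enabled, this toy graph also runs on a machine without a GPU, which is a quick way to check that the setting takes effect.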

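However, the GPU temperature is fairly high at this point.

This is the model validation result: 77%, after a little over 10,000 training steps.

Training command: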

python train_image_classifier.py \
--train_dir=satellite/train_dir \
--dataset_name=satellite \
--dataset_split_name=train \
--dataset_dir=satellite/data \
--model_name=inception_v3 \
--checkpoint_path=satellite/pretrained/inception_v3.ckpt \
--checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
--trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits --max_number_of_steps=30000 \
--batch_size=32 --learning_rate=0.001 --learning_rate_decay_type=fixed \
--save_interval_secs=300 --save_summaries_secs=2 --log_every_n_steps=10 \
--optimizer=rmsprop --weight_decay=0.00004

The errors reported by the training command are:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/aux_loss/xentropy_grad/LogSoftmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_DOUBLE]

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights/RMSProp_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
device='GPU'; dtype in [DT_HALF]
device='GPU'; dtype in [DT_FLOAT]
device='GPU'; dtype in [DT_DOUBLE]
device='GPU'; dtype in [DT_INT64]

Command to validate the trained model:

python eval_image_classifier.py \
--checkpoint_path=satellite/train_dir \
--eval_dir=satellite/eval_dir \
--dataset_name=satellite \
--dataset_split_name=validation --dataset_dir=satellite/data \
--model_name=inception_v3

tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.

When the network runs on the GPU, this error generally appears when the GPU memory is fully occupied.
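If the memory is being held by another process (for example a training run that is still going), one possible workaround is to hide the GPU from the evaluation process so that it runs on the CPU. Stopping the other process or lowering --batch_size are other common options. The sketch below is only an illustration under that assumption; cpu_only_env is a hypothetical helper, and CUDA_VISIBLE_DEVICES is the standard CUDA environment variable that controls which GPUs a process can see.

```python
import os
import sys


def cpu_only_env(base_env=None):
    """Copy an environment and hide all CUDA devices from the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty value: no GPU is visible
    return env


# The validation command from this post, as an argument list.
eval_cmd = [
    sys.executable, "eval_image_classifier.py",
    "--checkpoint_path=satellite/train_dir",
    "--eval_dir=satellite/eval_dir",
    "--dataset_name=satellite",
    "--dataset_split_name=validation",
    "--dataset_dir=satellite/data",
    "--model_name=inception_v3",
]

if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES=%r" % cpu_only_env()["CUDA_VISIBLE_DEVICES"])
    print(" ".join(eval_cmd))
    # To actually run the evaluation on the CPU from the slim folder:
    #   import subprocess
    #   subprocess.run(eval_cmd, env=cpu_only_env(), check=True)
```

Evaluation on the CPU is slower, but it does not compete with the training process for GPU memory.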

Exporting the model and recognizing a single image

Step 1: save the Inception V3 network structure

python export_inference_graph.py \
--alsologtostderr --model_name=inception_v3 \
--output_file=satellite/inception_v3_inf_graph.pb \
--dataset_name satellite

Step 2: save the model parameters from the checkpoint into the graph

python freeze_graph.py \
--input_graph slim/satellite/inception_v3_inf_graph.pb \
--input_checkpoint slim/satellite/train_dir/model.ckpt-18789 \
--input_binary true \
--output_node_names InceptionV3/Predictions/Reshape_1 \
--output_graph slim/satellite/frozen_graph.pb
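The step number in model.ckpt-18789 depends on when training was stopped, so it will differ from run to run. As a small convenience, the latest checkpoint prefix can be looked up instead of typed by hand. The sketch below assumes the default checkpoint layout in which each checkpoint has a model.ckpt-<step>.index file; latest_checkpoint_prefix is a hypothetical helper, and tf.train.latest_checkpoint(train_dir) is the TensorFlow call that serves the same purpose.

```python
import os
import re
import tempfile


def latest_checkpoint_prefix(train_dir):
    """Return the 'model.ckpt-<step>' prefix with the highest step, or None."""
    pattern = re.compile(r"^(model\.ckpt-(\d+))\.index$")
    best_step, best_prefix = -1, None
    for name in os.listdir(train_dir):
        match = pattern.match(name)
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = os.path.join(train_dir, match.group(1))
    return best_prefix


if __name__ == "__main__":
    # Demonstration on a temporary folder with empty placeholder files.
    with tempfile.TemporaryDirectory() as demo_dir:
        for step in (300, 9500, 18789):
            open(os.path.join(demo_dir, "model.ckpt-%d.index" % step), "w").close()
        print(latest_checkpoint_prefix(demo_dir))
```

The returned prefix is what would be passed to --input_checkpoint.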

Recognizing an image with the exported model

python classify_image_inception_v3.py \
--model_path slim/satellite/frozen_graph.pb \
--label_path data_prepare/pic/label.txt \
--image_file test_image.jpg

References:

tensorflow错误 - 简书 (Jianshu): https://www.jianshu.com/p/2636869f5e14

http://wiki.jikexueyuan.com/project/tensorflow-zh/how_tos/using_gpu.html

tensorflw-gpu 运行。py程序出现gpu不匹配的问题 - 在下小白 - 博客园 (cnblogs)

'InceptionV3/Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' · Issue #3118 · tensorflow/models · GitHub
