Bug log for "21 Projects to Master TensorFlow"
Environment
Windows 10 + Python 3.6 + TensorFlow 1.4.
Bug history
Chapter 3
Running data_convert.py raises this error:
Traceback (most recent call last):
File "E:/03personal/DeepLearning/03IMG/data_prepare/data_convert.py", line 35, in <module>
main(args)
File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 409, in main
command_args.validation_shards, command_args.labels_file, command_args)
File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 361, in _process_dataset
filenames, texts, labels = _find_image_files(directory, labels_file, command_args)
File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 341, in _find_image_files
random.shuffle(shuffled_index)
File "C:\ProgramData\Anaconda3\lib\random.py", line 275, in shuffle
x[i], x[j] = x[j], x[i]
TypeError: 'range' object does not support item assignment
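Fix: in Python 3, range() no longer returns a list, so random.shuffle cannot reorder it in place. In tfrecord.py, change
shuffled_index = range(len(filenames))
to
shuffled_index = list(range(len(filenames)))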
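The failure can be reproduced outside tfrecord.py. Here is a minimal sketch, where the `filenames` list is a made-up stand-in for the real image paths and the seed is only there to make the run repeatable:

```python
import random

# Hypothetical stand-in for the list of image paths built in tfrecord.py.
filenames = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]

# In Python 3, range() returns an immutable sequence, so shuffling it in place fails.
try:
    random.shuffle(range(len(filenames)))
    shuffle_error = None
except TypeError as exc:
    shuffle_error = exc

# Wrapping it in list() gives random.shuffle something it can reorder.
shuffled_index = list(range(len(filenames)))
random.seed(12345)
random.shuffle(shuffled_index)
shuffled_filenames = [filenames[i] for i in shuffled_index]
```

The shuffled index list is then used to reorder the file names, which is the same pattern the traceback points at.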
Next error:
Traceback (most recent call last):
File "E:/03personal/DeepLearning/03IMG/data_prepare/data_convert.py", line 35, in <module>
main(args)
File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 409, in main
command_args.validation_shards, command_args.labels_file, command_args)
File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 362, in _process_dataset
_process_image_files(name, filenames, texts, labels, num_shards, command_args)
File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 259, in _process_image_files
for i in xrange(len(spacing) - 1):
NameError: name 'xrange' is not defined
Fix: xrange does not exist in Python 3. Change
for i in xrange(len(spacing) - 1):
to
for i in range(len(spacing) - 1):
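If the same file has to run under both Python 2 and Python 3, a small shim is an alternative to editing every call site. This is only a sketch; the `spacing` values are invented for illustration and are not taken from tfrecord.py:

```python
# Python 2 had both range (a list) and xrange (lazy); Python 3 keeps only a lazy range.
try:
    xrange  # exists only on Python 2
except NameError:
    xrange = range

# Hypothetical shard boundaries, similar in spirit to `spacing` in tfrecord.py.
spacing = [0, 25, 50, 75, 100]
ranges = [[spacing[i], spacing[i + 1]] for i in xrange(len(spacing) - 1)]
```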
Running data_convert.py again then produces the following errors:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illega
TypeError: 'RGB' has type str, but expected one of: bytes (raised when building tf.train.Feature)
TypeError: 'water' has type str, but expected one of: bytes
The following places in tfrecord.py need to be changed:
Line 160: with open(filename, 'rb') as f:
Lines 94 and 96: colorspace = b'RGB' and image_format = b'JPEG'
Line 104: 'image/class/text': _bytes_feature(str.encode(text)),
Line 106: 'image/filename': _bytes_feature(os.path.basename(str.encode(filename)))
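All three errors come from Python 3 separating str from bytes. The sketch below shows the same behaviour without TensorFlow; the file name and the four-byte JPEG header are illustrative assumptions, not data from the book's dataset:

```python
import os
import tempfile

# First bytes of a JPEG file; 0xff is not a valid lead byte in GBK.
jpeg_header = b"\xff\xd8\xff\xe0"

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "water_001.jpg")  # hypothetical file name
    with open(filename, "wb") as f:
        f.write(jpeg_header)

    # Binary mode returns the raw bytes without any decoding step.
    with open(filename, "rb") as f:
        image_data = f.read()

    # os.path.basename accepts a bytes path and returns bytes.
    encoded_name = os.path.basename(str.encode(filename))

# Text mode on a Chinese-locale Windows machine decodes with GBK, which fails here.
try:
    jpeg_header.decode("gbk")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Bytes features expect bytes, so str values are encoded first (UTF-8 by default).
text = "water"
encoded_text = str.encode(text)
colorspace = b"RGB"
```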
Running train_image_classifier.py raises this error:
Cannot assign a device for operation 'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights/RMSProp1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available
Fix: pass a session config with soft placement enabled to slim.learning.train, so that ops without a GPU kernel fall back to the CPU.
# Modified code
###########################
# Kicks off the training. #
###########################
config = tf.ConfigProto(allow_soft_placement=True)  # changed here
slim.learning.train(
    train_tensor,
    logdir=FLAGS.train_dir,
    master=FLAGS.master,
    is_chief=(FLAGS.task == 0),
    init_fn=_get_init_fn(),
    summary_op=summary_op,
    number_of_steps=FLAGS.max_number_of_steps,
    log_every_n_steps=FLAGS.log_every_n_steps,
    save_summaries_secs=FLAGS.save_summaries_secs,
    save_interval_secs=FLAGS.save_interval_secs,
    sync_optimizer=optimizer if FLAGS.sync_replicas else None,
    session_config=config)  # changed here
Chapter 4
Problems installing protoc
Installation tutorial: https://blog.csdn.net/mr_jor/article/details/79071963
After installing protoc, run this command from cmd in the models/research directory:
protoc object_detection/protos/*.proto --python_out=.
E:\03personal\DeepLearning\05ObjectDec\models\research>protoc object_detection/protos/*.proto --python_out=.
object_detection/protos/*.proto: No such file or directory
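protoc versions above 3.5 have a bug that causes this error; use 3.4 instead, after which the same command completes without output. Download: https://github.com/google/protobuf/releases/tag/v3.4.0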
E:\03personal\DeepLearning\05ObjectDec\models\research>protoc object_detection/protos/*.proto --python_out=.
E:\03personal\DeepLearning\05ObjectDec\models\research>
Problems with model_builder_test.py
(base) E:\03personal\DeepLearning\05ObjectDec\models\research>python object_detection/builders/model_builder_test.py
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "object_detection/builders/model_builder_test.py", line 23, in <module>
from object_detection.builders import model_builder
ModuleNotFoundError: No module named 'object_detection'
(base) E:\03personal\DeepLearning\05ObjectDec\models\research>SET PYTHONPATH=%cd%;%cd%\slim
Running object_detection/builders/model_builder_test.py then gives:
object_detection/builders/model_builder_test.py:None (object_detection/builders/model_builder_test.py)
model_builder_test.py:23: in <module>
from object_detection.builders import model_builder
model_builder.py:22: in <module>
from object_detection.builders import box_predictor_builder
box_predictor_builder.py:20: in <module>
from object_detection.predictors import convolutional_box_predictor
..\predictors\convolutional_box_predictor.py:19: in <module>
from object_detection.core import box_predictor
..\core\box_predictor.py:137: in <module>
class KerasBoxPredictor(tf.keras.Model):
E AttributeError: module 'tensorflow.python.keras' has no attribute 'Model'
TensorFlow needs to be upgraded to version 1.12. The current 1.14 release causes other problems.
pip install -U tensorflow==1.12
Running model_builder_test.py again fails with No module named 'nets':
object_detection/builders/model_builder_test.py:None (object_detection/builders/model_builder_test.py)
ImportError while importing test module 'E:\03personal\DeepLearning\05ObjectDec\models\research\object_detection\builders\model_builder_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
model_builder_test.py:25: in <module>
from object_detection.builders import model_builder
model_builder.py:35: in <module>
from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res
..\models\faster_rcnn_inception_resnet_v2_feature_extractor.py:28: in <module>
from nets import inception_resnet_v2
E ModuleNotFoundError: No module named 'nets'
Add the following at the very top of model_builder_test.py:
import sys
sys.path.append("E:/03personal/DeepLearning/05ObjectDec/models")
sys.path.append("E:/03personal/DeepLearning/05ObjectDec/models/research/slim")
sys.path.append("E:/03personal/DeepLearning/05ObjectDec/models/research")
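The hard-coded drive paths only work on one machine. Here is a sketch of a more portable variant, assuming the script is launched from inside models/research. `add_research_paths` is a hypothetical helper, not part of the Object Detection API, and it adds only the research and slim directories, mirroring the SET PYTHONPATH command used earlier:

```python
import os
import sys


def add_research_paths(research_dir):
    """Put models/research and models/research/slim on sys.path (hypothetical helper)."""
    research_dir = os.path.abspath(research_dir)
    added = []
    for path in (research_dir, os.path.join(research_dir, "slim")):
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


# Assumption: the current working directory is models/research.
added_paths = add_research_paths(os.getcwd())
```

Calling the helper before the first `from object_detection...` import has the same effect as the three sys.path.append lines, without tying the file to a particular drive.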
Running the tutorial
The backend was *originally* set to 'Qt5Agg' by the following code:
File "E:/03personal/DeepLearning/05ObjectDec/models/research/object_detection/object_detection_tutorial.py", line 12, in <module>
from matplotlib import pyplot as plt
File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 71, in <module>
from matplotlib.backends import pylab_setup
File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\backends\__init__.py", line 16, in <module>
line for line in traceback.format_stack()
import matplotlib; matplotlib.use('Agg') # pylint: disable=multiple-statements
Fix: select the backend before pyplot is imported:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
Chapter 6
Environment used for Chapter 6:
Windows 7 + PyCharm + TensorFlow 1.6 + Python 3.6.4
If importing facenet fails:
Traceback (most recent call last):
File "src/align/align_dataset_mtcnn.py", line 34, in <module>
import facenet
ImportError: No module named 'facenet'
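add the following at the top of the file:
import sys
sys.path.append("I:/github/DL_21tensorflow/06FaceDect/src")
sys.path.append("I:/github/DL_21tensorflow/06FaceDect")
The script has to be run from the Anaconda Prompt. With the files in the right place, run the following from I:\github\DL_21tensorflow\06FaceDect: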
python src/align/align_dataset_mtcnn.py datasets/lfw/raw datasets/lfw/lfw_mtcnnpy_160 --image_size 160 --margin 32 --random_order
If an error like "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize....." appears, the GPU may be running out of memory, and you can cap how much GPU memory the process uses:
Traceback (most recent call last):
File "src/align/align_dataset_mtcnn.py", line 155, in <module>
main(parse_arguments(sys.argv[1:]))
File "src/align/align_dataset_mtcnn.py", line 104, in main
bounding_boxes, _ = align.detect_face.detect_face(img, minsize, pnet, rnet, onet, threshold, factor)
File "E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py", line 336, in detect_face
out = pnet(img_y)
File "E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py", line 299, in <lambda>
pnet_fun = lambda img: sess.run(('pnet/conv4-2/BiasAdd:0', 'pnet/prob1:0'), feed_dict={'pnet/input:0': img})
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node pnet/conv1/Conv2D (defined at E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py:154) ]]
[[node pnet/prob1 (defined at E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py:215) ]]
Append the argument --gpu_memory_fraction 0.6 to the command. The value is a number no greater than 1; adjust it to suit your own machine:
python src/align/align_dataset_mtcnn.py datasets/lfw/raw datasets/lfw/lfw_mtcnn_160 --image_size 160 --margin 32 --random_order --gpu_memory_fraction 0.6
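Validation on the dataset is also run from the project root directory: python validate_on_lfw.py datasets/lfw/lfw_mtcnnpy_160 models
The first argument is the dataset folder and the second is the model folder.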
Chapter 7
Running eval.py raises this error:
Traceback (most recent call last):
File "I:/github/DL_21tensorflow/07StyleWand/eval.py", line 76, in <module>
tf.app.run()
File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
_sys.exit(main(argv))
File "I:/github/DL_21tensorflow/07StyleWand/eval.py", line 46, in main
generated = model.net(image, training=False)
File "I:\github\DL_21tensorflow\07StyleWand\model.py", line 102, in net
conv1 = relu(instance_norm(conv2d(image, 3, 32, 9, 1)))
File "I:\github\DL_21tensorflow\07StyleWand\model.py", line 9, in conv2d
x_padded = tf.pad(x, [[0, 0], [kernel / 2, kernel / 2], [kernel / 2, kernel / 2], [0, 0]], mode=mode)
File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1896, in pad
tensor, paddings, mode="REFLECT", name=name)
File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3341, in _mirror_pad
"MirrorPad", input=input, paddings=paddings, mode=mode, name=name)
File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 609, in _apply_op_helper
param_name=input_name)
File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 60, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'paddings' has DataType float32 not in list of allowed values: int32, int64
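In Python 3, kernel / 2 is a float, but paddings must be integers. In model.py, change
x_padded = tf.pad(x, [[0, 0], [kernel / 2, kernel / 2], [kernel / 2, kernel / 2], [0, 0]], mode=mode)
to the following (np is NumPy, so model.py needs import numpy as np if it does not already have it):
x_padded = tf.pad(x,
                  [[0, 0], [np.int(kernel / 2), np.int(kernel / 2)], [np.int(kernel / 2), np.int(kernel / 2)],
                   [0, 0]], mode=mode)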
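The root cause is that `/` always produces a float in Python 3. The sketch below shows the difference without TensorFlow and uses floor division as an alternative to np.int. That swap is my suggestion rather than the fix recorded above: np.int was only an alias for the built-in int and was removed in NumPy 1.24, so `kernel // 2` is the safer spelling on newer installs.

```python
kernel = 9  # kernel size of the first conv2d call in the traceback

# True division always yields a float in Python 3, which tf.pad rejects for paddings.
float_pad = kernel / 2

# Floor division keeps the value an int and gives the same result as int(kernel / 2)
# for positive kernel sizes.
int_pad = kernel // 2

# The paddings argument in the shape tf.pad expects for an NHWC tensor.
paddings = [[0, 0], [int_pad, int_pad], [int_pad, int_pad], [0, 0]]
```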
Chapter 8
Running python main.py --input_height 96 --input_width 96
–output_height 48 --output_width 48
–dataset anime --crop -–train
–epoch 300 --input_fname_pattern "*.jpg" fails with:
_sys.exit(main(argv))
File "E:/03personal/DeepLearning/08GAN/main.py", line 86, in main
raise Exception("[!] Train a model first, then run test mode")
Exception: [!] Train a model first, then run test mode
The command needs to be corrected:
python main.py --input_height 96 --output_height 48 --dataset anime --crop True --train True --epoch 10
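One possible explanation, offered as an assumption rather than a confirmed diagnosis: several flags in the failing command start with an en dash (–) instead of `--`, which typically happens when a command is copied from formatted text. A flag written that way would not be recognised, so train would keep its default value and main.py would take the test branch. The sketch below uses a hypothetical helper, `normalize_flag`, to spot and repair such arguments:

```python
# Dash look-alikes that word processors and web pages often substitute for "-" or "--".
DASH_LOOKALIKES = "\u2013\u2014\u2212"  # en dash, em dash, minus sign


def normalize_flag(arg):
    """Rewrite a flag whose leading dashes contain a look-alike (hypothetical helper)."""
    stripped = arg.lstrip("-" + DASH_LOOKALIKES)
    prefix = arg[: len(arg) - len(stripped)]
    if stripped and any(ch in DASH_LOOKALIKES for ch in prefix):
        return "--" + stripped
    return arg


# Part of the failing command, split the way a shell would pass it to main.py.
argv = ["--input_height", "96", "\u2013output_height", "48", "--crop", "-\u2013train"]
fixed_argv = [normalize_flag(arg) for arg in argv]
suspicious = [arg for arg, fixed in zip(argv, fixed_argv) if arg != fixed]
```

Printing `suspicious` before launching a long training run is a cheap way to catch this kind of copy-and-paste damage.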
Chapter 12
Running sample.py raises this error:
AttributeError: 'str' object has no attribute 'decode'
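In Python 3 the flag value is already a str, which has no decode method. Comment out the first line of code in the main function: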
# FLAGS.start_string = FLAGS.start_string.decode('utf-8')
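If the script should keep working when the value does arrive as bytes, a small guard is an alternative to deleting the line. This is a sketch; `to_text` is a hypothetical helper and is not part of sample.py:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A str passes through unchanged; bytes are decoded to the same text.
start_from_str = to_text("\u4f60\u597d")
start_from_bytes = to_text("\u4f60\u597d".encode("utf-8"))
```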