Bug log for "21个项目玩儿转Tensorflow" (21 Projects to Master TensorFlow)

sinat_18131557

Category: Python. Tags: Python, Tensorflow

Article link: https://blog.csdn.net/sinat_18131557/article/details/88725361

Environment

Windows 10 + Python 3.6 + TensorFlow 1.4.

Bug log

Chapter 3

Running data_convert.py fails with:

Traceback (most recent call last):
  File "E:/03personal/DeepLearning/03IMG/data_prepare/data_convert.py", line 35, in <module>
    main(args)
  File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 409, in main
    command_args.validation_shards, command_args.labels_file, command_args)
  File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 361, in _process_dataset
    filenames, texts, labels = _find_image_files(directory, labels_file, command_args)
  File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 341, in _find_image_files
    random.shuffle(shuffled_index)
  File "C:\ProgramData\Anaconda3\lib\random.py", line 275, in shuffle
    x[i], x[j] = x[j], x[i]
TypeError: 'range' object does not support item assignment

Fix:
Change shuffled_index = range(len(filenames)) to shuffled_index = list(range(len(filenames)))
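
The reason, as a minimal sketch: in Python 3, range() returns an immutable sequence rather than a list, and random.shuffle swaps elements in place, so it cannot work on a range. The filenames list below is made up for illustration and is not taken from tfrecord.py.

```python
import random

# Hypothetical stand-in for the filenames collected by tfrecord.py.
filenames = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]

# Python 3: range() is an immutable sequence, so in-place shuffling fails.
try:
    random.shuffle(range(len(filenames)))
    shuffle_failed = False
except TypeError:
    shuffle_failed = True

# Materialising the range as a list gives shuffle something it can mutate.
shuffled_index = list(range(len(filenames)))
random.shuffle(shuffled_index)

# Reorder the filenames with the shuffled indices.
shuffled_filenames = [filenames[i] for i in shuffled_index]
print(shuffle_failed, sorted(shuffled_index))
```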
The next error:

Traceback (most recent call last):
  File "E:/03personal/DeepLearning/03IMG/data_prepare/data_convert.py", line 35, in <module>
    main(args)
  File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 409, in main
    command_args.validation_shards, command_args.labels_file, command_args)
  File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 362, in _process_dataset
    _process_image_files(name, filenames, texts, labels, num_shards, command_args)
  File "E:\03personal\DeepLearning\03IMG\data_prepare\src\tfrecord.py", line 259, in _process_image_files
    for i in xrange(len(spacing) - 1):
NameError: name 'xrange' is not defined

Fix (xrange no longer exists in Python 3):
Change for i in xrange(len(spacing) - 1): to for i in range(len(spacing) - 1):

Running data_convert.py again produces the following errors:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illega
TypeError (tf.train.Feature): 'RGB' has type str, but expected one of: bytes
TypeError: 'water' has type str, but expected one of: bytes

The following places need to be changed:

tfrecord.py line 160: change to  with open(filename, 'rb') as f:
tfrecord.py lines 94 and 96: change to  colorspace = b'RGB'  and  image_format = b'JPEG'
tfrecord.py line 104: change to  'image/class/text': _bytes_feature(str.encode(text)),
tfrecord.py line 106: change to  'image/filename': _bytes_feature(os.path.basename(str.encode(filename)))
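
All three errors come from Python 3 keeping text (str) and binary data (bytes) apart. The following is a minimal sketch that does not need TensorFlow; the helper name to_bytes and the temporary file are illustrative assumptions, not part of tfrecord.py.

```python
import os
import tempfile


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# A JPEG file starts with the bytes FF D8, which is not valid GBK text.
jpeg_header = b"\xff\xd8\xff\xe0"
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(jpeg_header)
    path = tmp.name

try:
    # Text mode tries to decode the bytes; GBK is forced here to mimic
    # the default encoding reported in the error message above.
    try:
        with open(path, "r", encoding="gbk") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode returns the raw bytes untouched.
    with open(path, "rb") as f:
        image_data = f.read()
finally:
    os.remove(path)

# Values destined for a bytes field are converted once, up front.
label_text = to_bytes("water")
colorspace = to_bytes(b"RGB")
print(text_mode_failed, image_data == jpeg_header, label_text, colorspace)
```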

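Running train_image_classifier.py fails with:

Cannot assign a device for operation 'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights/RMSProp1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available

Fix: pass a session config with allow_soft_placement=True, so that ops without a GPU kernel fall back to the CPU.

# Modified code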

    ###########################
    # Kicks off the training. #
    ###########################
    config = tf.ConfigProto(allow_soft_placement=True)  # changed here
    slim.learning.train(
        train_tensor,
        logdir=FLAGS.train_dir,
        master=FLAGS.master,
        is_chief=(FLAGS.task == 0),
        init_fn=_get_init_fn(),
        summary_op=summary_op,
        number_of_steps=FLAGS.max_number_of_steps,
        log_every_n_steps=FLAGS.log_every_n_steps,
        save_summaries_secs=FLAGS.save_summaries_secs,
        save_interval_secs=FLAGS.save_interval_secs,
        sync_optimizer=optimizer if FLAGS.sync_replicas else None,
        session_config=config)

Chapter 4

Problems installing protoc

Installation guide: https://blog.csdn.net/mr_jor/article/details/79071963
When installing protoc, run this command in cmd from the models/research directory:
protoc object_detection/protos/*.proto --python_out=.

E:\03personal\DeepLearning\05ObjectDec\models\research>protoc object_detection/protos/*.proto --python_out=.
object_detection/protos/*.proto: No such file or directory

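protoc versions newer than 3.5 have a bug here, so use 3.4 instead. Download: https://github.com/google/protobuf/releases/tag/v3.4.0
With 3.4 the same command completes without output: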

E:\03personal\DeepLearning\05ObjectDec\models\research>protoc object_detection/protos/*.proto --python_out=.

E:\03personal\DeepLearning\05ObjectDec\models\research>
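
If changing the protoc version is not an option, one workaround that may help is to expand the wildcard yourself and pass each .proto file explicitly, since cmd.exe hands the * to the program unexpanded. The sketch below only builds and prints the command lines; the helper name build_protoc_commands is hypothetical, and I have not verified it against every protoc release.

```python
import glob
import os


def build_protoc_commands(research_dir, protoc="protoc"):
    """Build one protoc command per .proto file under object_detection/protos."""
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    commands = []
    for proto_path in sorted(glob.glob(pattern)):
        # Use paths relative to research_dir, matching the original command.
        relative = os.path.relpath(proto_path, research_dir).replace(os.sep, "/")
        commands.append([protoc, relative, "--python_out=."])
    return commands


if __name__ == "__main__":
    # Run from models/research; prints the commands instead of executing them.
    for command in build_protoc_commands(os.getcwd()):
        print(" ".join(command))
```

To actually compile, each command list could be passed to subprocess.run(command, check=True, cwd=research_dir).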

Problems with model_builder_test.py

(base) E:\03personal\DeepLearning\05ObjectDec\models\research>python object_detection/builders/model_builder_test.py
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "object_detection/builders/model_builder_test.py", line 23, in <module>
    from object_detection.builders import model_builder
ModuleNotFoundError: No module named 'object_detection'

(base) E:\03personal\DeepLearning\05ObjectDec\models\research>SET PYTHONPATH=%cd%;%cd%\slim

Running object_detection/builders/model_builder_test.py then fails with:

object_detection/builders/model_builder_test.py:None (object_detection/builders/model_builder_test.py)
model_builder_test.py:23: in <module>
    from object_detection.builders import model_builder
model_builder.py:22: in <module>
    from object_detection.builders import box_predictor_builder
box_predictor_builder.py:20: in <module>
    from object_detection.predictors import convolutional_box_predictor
..\predictors\convolutional_box_predictor.py:19: in <module>
    from object_detection.core import box_predictor
..\core\box_predictor.py:137: in <module>
    class KerasBoxPredictor(tf.keras.Model):
E   AttributeError: module 'tensorflow.python.keras' has no attribute 'Model'

TensorFlow needs to be upgraded to version 1.12; the current 1.14 release causes other problems.

pip install -U tensorflow==1.12

Running model_builder_test.py again fails with No module named 'nets':

object_detection/builders/model_builder_test.py:None (object_detection/builders/model_builder_test.py)
ImportError while importing test module 'E:\03personal\DeepLearning\05ObjectDec\models\research\object_detection\builders\model_builder_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
model_builder_test.py:25: in <module>
    from object_detection.builders import model_builder
model_builder.py:35: in <module>
    from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res
..\models\faster_rcnn_inception_resnet_v2_feature_extractor.py:28: in <module>
    from nets import inception_resnet_v2
E   ModuleNotFoundError: No module named 'nets'

Add the following at the very top of model_builder_test.py:

import sys
sys.path.append("E:/03personal/DeepLearning/05ObjectDec/models")
sys.path.append("E:/03personal/DeepLearning/05ObjectDec/models/research/slim")
sys.path.append("E:/03personal/DeepLearning/05ObjectDec/models/research")
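
Hard-coded absolute paths stop working as soon as the checkout moves. Below is a minimal sketch of the same idea with both entries derived from one root folder; the function name add_research_paths is hypothetical, and research_dir is assumed to be the models/research folder.

```python
import os
import sys


def add_research_paths(research_dir):
    """Put models/research and models/research/slim on sys.path (once each)."""
    research_dir = os.path.abspath(research_dir)
    added = []
    for path in (research_dir, os.path.join(research_dir, "slim")):
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


# Example: in a script located in models/research/object_detection/builders,
# the research folder is two levels up from the script's directory:
# here = os.path.dirname(os.path.abspath(__file__))
# add_research_paths(os.path.join(here, "..", ".."))
```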

Running the tutorial

The backend was *originally* set to 'Qt5Agg' by the following code:
  File "E:/03personal/DeepLearning/05ObjectDec/models/research/object_detection/object_detection_tutorial.py", line 12, in <module>
    from matplotlib import pyplot as plt
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 71, in <module>
    from matplotlib.backends import pylab_setup
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\backends\__init__.py", line 16, in <module>
    line for line in traceback.format_stack()
  import matplotlib; matplotlib.use('Agg')  # pylint: disable=multiple-statements

Fix: select the backend before importing pyplot:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

Chapter 6

The environment used for Chapter 6 was:
Windows 7 + PyCharm + TensorFlow 1.6 + Python 3.6.4
If importing facenet fails:

Traceback (most recent call last):
  File "src/align/align_dataset_mtcnn.py", line 34, in <module>
    import facenet

ImportError: No module named 'facenet'

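Add the following at the top of the file:
import sys
sys.path.append("I:/github/DL_21tensorflow/06FaceDect/src")
sys.path.append("I:/github/DL_21tensorflow/06FaceDect")
The script has to be run from the Anaconda Prompt. With the files in the right place, run the following from I:\github\DL_21tensorflow\06FaceDect: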

python  src/align/align_dataset_mtcnn.py   datasets/lfw/raw  datasets/lfw/lfw_mtcnnpy_160 --image_size 160 --margin 32 --random_order

If you get an error like "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize....", the GPU may be running out of memory, and you can limit how much GPU memory the process uses:

Traceback (most recent call last):
  File "src/align/align_dataset_mtcnn.py", line 155, in <module>
    main(parse_arguments(sys.argv[1:]))
  File "src/align/align_dataset_mtcnn.py", line 104, in main
    bounding_boxes, _ = align.detect_face.detect_face(img, minsize, pnet, rnet, onet, threshold, factor)
  File "E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py", line 336, in detect_face
    out = pnet(img_y)
  File "E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py", line 299, in <lambda>
    pnet_fun = lambda img: sess.run(('pnet/conv4-2/BiasAdd:0', 'pnet/prob1:0'), feed_dict={'pnet/input:0': img})
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node pnet/conv1/Conv2D (defined at E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py:154) ]]
         [[node pnet/prob1 (defined at E:/03personal/DeepLearning/06FaceDec/src\align\detect_face.py:215) ]]

Append the argument --gpu_memory_fraction 0.6 to the command. The value is a number no greater than 1; adjust it to suit your machine:

python src/align/align_dataset_mtcnn.py datasets/lfw/raw datasets/lfw/lfw_mtcnnpy_160 --image_size 160 --margin 32 --random_order --gpu_memory_fraction 0.6

Validation on the dataset is also run from the project root: python validate_on_lfw.py datasets/lfw/lfw_mtcnnpy_160 models
The first argument is the dataset folder and the second is the model folder.

compare

Chapter 7

Running eval.py fails with:

Traceback (most recent call last):
  File "I:/github/DL_21tensorflow/07StyleWand/eval.py", line 76, in <module>
    tf.app.run()
  File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "I:/github/DL_21tensorflow/07StyleWand/eval.py", line 46, in main
    generated = model.net(image, training=False)
  File "I:\github\DL_21tensorflow\07StyleWand\model.py", line 102, in net
    conv1 = relu(instance_norm(conv2d(image, 3, 32, 9, 1)))
  File "I:\github\DL_21tensorflow\07StyleWand\model.py", line 9, in conv2d
    x_padded = tf.pad(x, [[0, 0], [kernel / 2, kernel / 2], [kernel / 2, kernel / 2], [0, 0]], mode=mode)
  File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1896, in pad
    tensor, paddings, mode="REFLECT", name=name)
  File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3341, in _mirror_pad
    "MirrorPad", input=input, paddings=paddings, mode=mode, name=name)
  File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 609, in _apply_op_helper
    param_name=input_name)
  File "C:\Program Files (x86)\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 60, in _SatisfiesTypeConstraint
    ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'paddings' has DataType float32 not in list of allowed values: int32, int64

In model.py, change

x_padded = tf.pad(x, [[0, 0], [kernel / 2, kernel / 2], [kernel / 2, kernel / 2], [0, 0]], mode=mode)

to

        x_padded = tf.pad(x,
                          [[0, 0], [np.int(kernel / 2), np.int(kernel / 2)], [np.int(kernel / 2), np.int(kernel / 2)],
                           [0, 0]], mode=mode)
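
The root cause is that / is true division in Python 3, so kernel / 2 is a float (4.5 for a kernel size of 9), while the paddings must be integers. For positive kernel sizes, floor division // gives the same value as np.int(kernel / 2) without needing NumPy. Note also that np.int was removed in NumPy 1.24, so on newer installs the built-in int or // is the safer choice. A small sketch, with a hypothetical helper name:

```python
def reflect_paddings(kernel):
    """Paddings for a [batch, height, width, channels] tensor, as integers."""
    half = kernel // 2  # floor division keeps the result an int in Python 3
    return [[0, 0], [half, half], [half, half], [0, 0]]


print(9 / 2)   # 4.5 (a float, which tf.pad rejects as paddings)
print(9 // 2)  # 4   (an int)
print(reflect_paddings(9))
```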

Chapter 8

Running python main.py --input_height 96 --input_width 96
–output_height 48 --output_width 48
–dataset anime --crop -–train
–epoch 300 --input_fname_pattern "*.jpg" fails with:

    _sys.exit(main(argv))
  File "E:/03personal/DeepLearning/08GAN/main.py", line 86, in main
    raise Exception("[!] Train a model first, then run test mode")
Exception: [!] Train a model first, then run test mode

The command needs to be corrected to:

python main.py --input_height 96 --output_height 48 --dataset anime --crop True --train True --epoch 10

Chapter 12

When running sample.py:

AttributeError: 'str' object has no attribute 'decode'

Comment out the first line of the main function.

# FLAGS.start_string = FLAGS.start_string.decode('utf-8')
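
Commenting the line out works because in Python 3 a string from the command line is already text (str), and only bytes has a decode method. If the script might still receive bytes from somewhere, a guarded conversion is one option. A minimal sketch; ensure_text is a hypothetical helper, not part of sample.py:

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it only when it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


start_from_str = ensure_text("hello")     # str passes through unchanged
start_from_bytes = ensure_text(b"hello")  # bytes are decoded to str
print(start_from_str == start_from_bytes)
```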
