Notes on Problems Encountered While Using HRNet

Background

Related article: HRNET使用过程中的问题记录 (another write-up of problems encountered with HRNet)
My environment is Docker, using the pytorch1.4-cuda10.1-cudnn7-devel image.
The project being run is HigherHRNet-Human-Pose-Estimation.
Training command:

python tools/dist_train.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml 


Problems encountered

First error

File "tools/dist_train.py", line 114, in main_worker
args=(ngpus_per_node, args, final_output_dir, tb_log_dir)

The corresponding code is:

    if cfg.MULTIPROCESSING_DISTRIBUTED:
        # Since we have ngpus_per_node processes per node, the total world_size
        # needs to be adjusted accordingly
        args.world_size = ngpus_per_node * args.world_size
        # Use torch.multiprocessing.spawn to launch distributed processes: the
        # main_worker process function
        mp.spawn(
            main_worker,
            nprocs=ngpus_per_node,
            args=(ngpus_per_node, args, final_output_dir, tb_log_dir)
        )
    else:
        # Simply call main_worker function
        main_worker(
            ','.join([str(i) for i in cfg.GPUS]),
            ngpus_per_node,
            args,
            final_output_dir,
            tb_log_dir
        )

MULTIPROCESSING_DISTRIBUTED defaults to True, so distributed training runs by default. If your machine has only one GPU, or distributed training has not been set up, this call fails. To fix it, open lib/config/default.py and change MULTIPROCESSING_DISTRIBUTED to False; a sketch of the change is given below.
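A minimal sketch of the edit, assuming lib/config/default.py defines its defaults with yacs as the other HRNet-family repositories do; only the MULTIPROCESSING_DISTRIBUTED line actually needs to change, and the surrounding lines here are illustrative:

# lib/config/default.py (sketch; surrounding lines are illustrative)
from yacs.config import CfgNode as CN

_C = CN()
# ... other default options ...
_C.MULTIPROCESSING_DISTRIBUTED = False  # was True; run single-process training instead

Since the experiment YAML is merged on top of these defaults, adding MULTIPROCESSING_DISTRIBUTED: False to the YAML file should have the same effect without editing default.py.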

Second error

Error message:

Traceback (most recent call last):
  File "tools/dist_train.py", line 319, in <module>
    main()
  File "tools/dist_train.py", line 123, in main
    tb_log_dir
  File "tools/dist_train.py", line 196, in main_worker
    writer_dict['writer'].add_graph(model, (dump_input, ))
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 707, in add_graph
    self._get_file_writer().add_graph(graph(model, input_to_model, verbose))
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 285, in graph
    trace = torch.jit.trace(model, args)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 882, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1028, in trace_module
    module = make_module(mod, _module_class, _compilation_unit)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 727, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1836, in __init__
    tmp_module._modules[name] = make_module(submodule, TracedModule, _compilation_unit=None)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 727, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1836, in __init__
    tmp_module._modules[name] = make_module(submodule, TracedModule, _compilation_unit=None)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 727, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1836, in __init__
    tmp_module._modules[name] = make_module(submodule, TracedModule, _compilation_unit=None)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 727, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1836, in __init__
    tmp_module._modules[name] = make_module(submodule, TracedModule, _compilation_unit=None)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 727, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1836, in __init__
    tmp_module._modules[name] = make_module(submodule, TracedModule, _compilation_unit=None)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 727, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1809, in __init__
    assert(isinstance(orig, torch.nn.Module))
AssertionError

This is strikingly similar to the error described in the related article linked above.
In both cases the problem is reported at this line: writer_dict['writer'].add_graph(model, (dump_input, ))
I used a script to check category_id, and every category_id under annotations is equal to 1.
I also tried replacing from tensorboardX import SummaryWriter in dist_train.py with from torch.utils.tensorboard import SummaryWriter, but the error stays the same.
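One workaround (a sketch I have not verified on this repo): since the crash happens inside add_graph when torch.jit.trace walks the model, the call can be guarded so that training continues without the graph being logged. The names writer_dict, model and dump_input come from the traceback above; the rest is illustrative.

# in main_worker of tools/dist_train.py (sketch): skip graph logging when tracing fails
try:
    writer_dict['writer'].add_graph(model, (dump_input, ))
except Exception as e:  # torch.jit.trace cannot trace every model
    print('add_graph failed, skipping graph logging:', e)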
Here is the script I used to check category_id:

import json
import os

j_path = 'data/coco/annotations/person_keypoints_train2017.json'  # path of the COCO json file to inspect
with open(j_path, 'r', encoding='UTF-8') as f:
    info = json.loads(f.read(), strict=False)  # load the json file into a dict

num_id1 = 0            # number of annotations with category_id == 1
num_id2 = 0            # number of annotations with any other category_id
del_image_ids = set()  # image ids whose annotations get removed
del_img_path = []      # image file paths to delete

# keep only annotations with category_id == 1 and remember the image ids of the others
kept_annotations = []
for ann in info['annotations']:
    if ann['category_id'] == 1:
        num_id1 += 1
        kept_annotations.append(ann)
    else:
        num_id2 += 1
        del_image_ids.add(ann['image_id'])
info['annotations'] = kept_annotations  # filtered in memory only; not written back to the json

# collect the file paths of the images whose annotations were removed
for img in info['images']:
    if img['id'] in del_image_ids:
        images_dir = 'data/coco/images/' + img['file_name']
        del_img_path.append(images_dir)
        print(images_dir)

for path in del_img_path:  # delete the corresponding image files
    os.remove(path)
print(num_id1, num_id2)  # counts of category_id == 1 and of everything else

Moreover, when running evaluation on the test set with the following command:

python tools/valid.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_coco/pose_higher_hrnet_w32_512.pth

The TensorBoard event files that should be created under the log folder are not generated; only an empty directory appears.
My guesses:

  1. tensorboardX is not installed properly
  2. the code that writes the TensorBoard logs has a problem
  3. the TensorFlow (or related package) version is too high

Regarding guess 1, I tested tensorboardX with another program and it works fine; a sketch of that kind of sanity check follows.
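A minimal sanity-check sketch (the log_dir and scalar tag are arbitrary): write one scalar with tensorboardX and confirm that an event file appears.

import os
from tensorboardX import SummaryWriter

log_dir = 'tb_check'  # arbitrary test directory
writer = SummaryWriter(log_dir)
writer.add_scalar('test/value', 1.0, 0)  # write a single scalar at step 0
writer.close()

# an events.out.tfevents.* file should now exist under log_dir
print(os.listdir(log_dir))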
Does anyone know what the problem is?
