Training FAMI-Pose

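This post walks through the errors I hit when trying to run the official FAMI-Pose code, including missing modules, import errors, and wrong file paths, and how I fixed each one by editing the code. Training does run in the end, but there is a discrepancy in how the loss is computed, which may explain why the result differs from the one reported in the paper.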


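I previously wrote a walkthrough of the FAMI-Pose paper, and recently I tried running the official code (link: FAMI-Pose). It has a lot of problems, to the point that I wonder whether the authors uploaded the wrong version. This post covers how to get FAMI-Pose training to run.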

Running

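First, set up the environment by following the official requirement.txt. Dataset configuration is covered in my earlier post on DCPose training. I mainly ran PoseTrack2017. The command is similar to the one for DCPose: go into the tools folder and run python run.py --cfg …/configs/Alignment/posetrack17/Alignment_V15.yaml --train --val.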

Errors and fixes

The first run fails with the following error:

  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/__init__.py", line 9, in <module>
    from .PoseTrack_Alignment import PoseTrack_Alignment
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 15, in <module>
    from datasets.process import get_affine_transform, fliplr_joints, exec_affine_transform, generate_heatmaps, \
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/__init__.py", line 16, in <module>
    from .structure import *
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/__init__.py", line 9, in <module>
    from .keypoints_ord import coco2posetrack_ord, coco2posetrack_ord_infer,coco2jhmdb_ord_infer
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/keypoints_ord.py", line 10, in <module>
    from datasets.zoo.coco import COCO_joint, COCO_joint_paris
ModuleNotFoundError: No module named 'datasets.zoo.coco'

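Looking inside datasets/zoo, there is no coco package and no jhmdb either; only posetrack is present. Perhaps the authors forgot to upload them?
(Screenshot: the datasets/zoo folder, containing only posetrack.)
However, the functions in keypoints_ord.py still use COCO_joint, so I followed the corresponding file in DCPose and changed the code accordingly: import the two names from posetrack and comment out the rest: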

# from datasets.zoo.coco import COCO_joint, COCO_joint_paris
from datasets.zoo.posetrack import PoseTrack_Official_Keypoint_Ordering, PoseTrack_COCO_Keypoint_Ordering
# from datasets.zoo.posetrack.pose_topology import POSETRACK_joint
# from datasets.zoo.jhmdb.pose_topology import JHMDB_Keypoint_Ordering

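Change src_kps and dst_kps in the coco2posetrack_ord and coco2posetrack_ord_infer functions as shown below, and comment out the coco2jhmdb_ord_infer function, since it is not needed for PoseTrack. Then copy zoo/posetrack/pose_skeleton.py from DCPose into the corresponding location in FAMI-Pose.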

src_kps = PoseTrack_COCO_Keypoint_Ordering
dst_kps = PoseTrack_Official_Keypoint_Ordering
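
To make the role of src_kps and dst_kps concrete, here is a minimal sketch of name-based keypoint reordering. It is not the code from DCPose or FAMI-Pose: the helper reorder_keypoints, the fill value, and the four-joint toy orderings are assumptions made for illustration, and the real ordering lists are longer.

```python
def reorder_keypoints(keypoints, src_names, dst_names, fill=(0.0, 0.0, 0.0)):
    """Rearrange per-joint entries from the src_names order into the dst_names order."""
    index_of = {name: i for i, name in enumerate(src_names)}
    reordered = []
    for name in dst_names:
        if name in index_of:
            reordered.append(keypoints[index_of[name]])
        else:
            # Joint missing from the source layout: use a placeholder entry
            reordered.append(list(fill))
    return reordered


# Toy orderings for illustration only, not the real PoseTrack lists
src_kps = ["nose", "left_eye", "left_ear", "left_shoulder"]
dst_kps = ["nose", "head_top", "left_ear", "left_shoulder"]

# Each entry is [x, y, score] in the source order
coco_style = [[10.0, 20.0, 0.9], [12.0, 18.0, 0.8], [14.0, 19.0, 0.7], [9.0, 40.0, 0.95]]
posetrack_style = reorder_keypoints(coco_style, src_kps, dst_kps)
```

In this toy case the joint head_top has no counterpart in the source ordering, so it receives the placeholder entry, while the other joints keep their coordinates and scores.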

Running again produces the following import error:

  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/__init__.py", line 9, in <module>
    from .PoseTrack_Alignment import PoseTrack_Alignment
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 15, in <module>
    from datasets.process import get_affine_transform, fliplr_joints, exec_affine_transform, generate_heatmaps, \
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/__init__.py", line 16, in <module>
    from .structure import *
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/__init__.py", line 9, in <module>
    from .keypoints_ord import coco2posetrack_ord, coco2posetrack_ord_infer,coco2jhmdb_ord_infer
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/keypoints_ord.py", line 11, in <module>
    from datasets.zoo.posetrack import PoseTrack_Official_Keypoint_Ordering, PoseTrack_COCO_Keypoint_Ordering
ImportError: cannot import name 'PoseTrack_Official_Keypoint_Ordering'

This requires the following change in zoo/posetrack/__init__.py:

#from .PoseTrack_Alignment import PoseTrack_Alignment
from .pose_skeleton import *
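
The failure mode here is a package __init__.py that imports a submodule which, in turn, imports a name back from the still-initializing package. Below is a toy reproduction under stated assumptions: the package names demo_bad and demo_good, the file dataset.py, and the ORDERING list are all made up, and the real import chain in the repository is longer (it passes through datasets.process).

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, init_source):
    """Write a tiny package whose dataset module imports a name from the package itself."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    (pkg / "pose_skeleton.py").write_text("ORDERING = ['nose', 'head_top']\n")
    (pkg / "dataset.py").write_text(
        f"from {name} import ORDERING\n\n\nclass Dataset:\n    names = ORDERING\n"
    )


def try_import(name):
    """Return 'ok' or the ImportError message raised while importing name."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return f"ImportError: {exc}"


root = tempfile.mkdtemp()
sys.path.insert(0, root)

# Bad: __init__ pulls in dataset.py before ORDERING exists in the package namespace
make_package(root, "demo_bad", "from .dataset import Dataset\nfrom .pose_skeleton import *\n")
# Good: __init__ only exports the ordering; dataset.py is imported later by its user
make_package(root, "demo_good", "from .pose_skeleton import *\n")
importlib.invalidate_caches()

bad_result = try_import("demo_bad")
good_result = try_import("demo_good")
dataset_names = importlib.import_module("demo_good.dataset").Dataset.names
```

In the toy case, demo_bad fails with a "cannot import name" ImportError, while demo_good imports cleanly and its dataset module can still be imported explicitly afterwards.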

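This change is needed because importing PoseTrack_Alignment in this file creates a circular import. Running run.py again then gives the following error: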

  File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/__init__.py", line 12, in <module>
    from .functions import *
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/__init__.py", line 7, in <module>
    from .alignment_mi_function_term6_1 import AlignmentMIFunction_Term6_V1
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/alignment_mi_function_term6_1.py", line 28, in <module>
    from posetimation.loss.mse_loss import JointMSELoss
  File "/home/dsp/ljh/lab/FAMI-Pose/posetimation/loss/__init__.py", line 9, in <module>
    from .base import build_loss
  File "/home/dsp/ljh/lab/FAMI-Pose/posetimation/loss/base.py", line 11, in <module>
    from .integral_loss import IntegralMSELoss, IntegralL1Loss
ModuleNotFoundError: No module named 'posetimation.loss.integral_loss'

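This means the loss module is missing. Looking at the folder, there is no integral_loss at all, only mse_loss; again, perhaps the authors forgot to upload it. Since this loss is not actually used here, commenting out the related parts is enough.
(Screenshot: the posetimation/loss folder, without integral_loss.)
After commenting out the related parts, base.py looks like this: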

import logging

# from .integral_loss import IntegralMSELoss, IntegralL1Loss
from .mse_loss import JointMSELoss

logger = logging.getLogger(__name__)


def build_loss(cfg, **kwargs):
    if "NAME" in cfg.LOSS:
        logger.warning("NAME 将会在之后被删除,请使用NAMES")
        if cfg.LOSS.NAME == "MSELOSS":
            return JointMSELoss(cfg.LOSS.USE_TARGET_WEIGHT)
        # elif cfg.LOSS.NAME == "IntegralMSELoss":
        #     return IntegralMSELoss(True)
        # elif cfg.LOSS.NAME == "IntegralL1Loss":
        #     return IntegralL1Loss(True)
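
As an alternative to commenting lines out, a missing optional module can also be tolerated at import time. The following is only a sketch of that general pattern, not something the repository does: the helper optional_import is hypothetical, the standard-library math module stands in for a module that exists, and posetimation_missing.integral_loss is a deliberately nonexistent module name.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def optional_import(module_name, attr_names):
    """Return {name: object}, with None for every name if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        logger.warning("Optional module %s is not available", module_name)
        return {name: None for name in attr_names}
    return {name: getattr(module, name, None) for name in attr_names}


available_losses = {}
# A module that exists (stand-in for mse_loss)
available_losses.update(optional_import("math", ["sqrt"]))
# A module that does not exist (stand-in for the missing integral_loss)
available_losses.update(
    optional_import("posetimation_missing.integral_loss", ["IntegralMSELoss", "IntegralL1Loss"])
)

usable = sorted(name for name, obj in available_losses.items() if obj is not None)
```

With this pattern a builder function could check for None before constructing a loss, so the same file would work whether or not the optional module is present.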

Running run.py again gives the following error:

Traceback (most recent call last):
  File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 46, in <module>
    main()
  File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 42, in main
    runner.launch()
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
    trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 33, in __init__
    self.dataloader = build_train_loader(cfg)
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/build.py", line 38, in build_train_loader
    dataset = DATASET_REGISTRY.get(dataset_name)(cfg=cfg, phase='train')
  File "/home/dsp/ljh/lab/FAMI-Pose/utils/utils_registry.py", line 71, in get
    name, self._name
KeyError: "No object named 'PoseTrack_Alignment' found in 'DATASET' registry!"

This means PoseTrack_Alignment was never registered. We need to edit datasets/__init__.py so that PoseTrack_Alignment gets imported. Adding the following line is enough:

from .zoo.posetrack.PoseTrack_Alignment import PoseTrack_Alignment
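
Why an import fixes a KeyError becomes clearer with a minimal registry sketch. This is an assumed, simplified version of the decorator-based registry pattern; the actual utils_registry.py may differ, and the class body below is a placeholder.

```python
class Registry:
    """Maps class names to classes; entries appear only when the decorator runs."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def register(self, obj):
        self._obj_map[obj.__name__] = obj
        return obj

    def get(self, name):
        if name not in self._obj_map:
            raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
        return self._obj_map[name]


DATASET_REGISTRY = Registry("DATASET")


def lookup(name):
    """Return the registered class, or the KeyError if it is not registered yet."""
    try:
        return DATASET_REGISTRY.get(name)
    except KeyError as exc:
        return exc


# Before the defining module is imported, the decorator has not run
before = lookup("PoseTrack_Alignment")


# Importing the module executes this class statement, which registers the class
@DATASET_REGISTRY.register
class PoseTrack_Alignment:
    def __init__(self, cfg, phase):
        self.cfg = cfg
        self.phase = phase


after = lookup("PoseTrack_Alignment")
```

Registration is a side effect of executing the class definition, so a dataset module that nothing imports stays invisible to the registry.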

Run again, and there is yet another error:

  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/__init__.py", line 16, in <module>
    from .runner import DefaultRunner
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 11, in <module>
    from .trainer import DefaultTrainer
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 15, in <module>
    from datasets import build_train_loader
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/__init__.py", line 12, in <module>
    from .zoo.posetrack.PoseTrack_Alignment import PoseTrack_Alignment
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 27, in <module>
    from thirdparty.clustering import k_means
ModuleNotFoundError: No module named 'thirdparty.clustering'

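As it turns out, the thirdparty folder contains only nms and no clustering.
(Screenshot: the thirdparty folder, containing only nms.)
So the only option is to comment out the k_means import line in PoseTrack_Alignment.py. That import is not actually used anyway.
Running again gives the following error: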

  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
    trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 33, in __init__
    self.dataloader = build_train_loader(cfg)
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/build.py", line 38, in build_train_loader
    dataset = DATASET_REGISTRY.get(dataset_name)(cfg=cfg, phase='train')
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 78, in __init__
    osp.join(self.json_dir, 'posetrack_train.json' if self.is_train else 'posetrack_val.json'))
  File "/home/dsp/.local/lib/python3.6/site-packages/pycocotools/coco.py", line 81, in __init__
    with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/dsp/ljh/lab/FAMI-Pose/DcPose_supp_files/posetrack17_json_files/posetrack_train.json'

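This one is simple: the JSON file path is wrong. Some paths in Base_PoseTrack17.yaml need to be changed, as covered in my previous post on DCPose training; they are basically the paths to the JSON annotations, the images, and the pretrained model. Point them at your own PoseTrack data.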
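
Since a wrong path only surfaces after the dataset starts loading, it can help to check the configured paths up front. A minimal sketch, assuming the paths have already been read into a plain dict; the keys JSON_DIR and IMG_DIR are hypothetical and not the actual names used in Base_PoseTrack17.yaml.

```python
import os
import tempfile


def find_missing_paths(path_settings):
    """Return the subset of settings whose path does not exist on disk."""
    return {key: path for key, path in path_settings.items() if not os.path.exists(path)}


# Demonstration with a temporary directory instead of real dataset paths
with tempfile.TemporaryDirectory() as tmp:
    json_dir = os.path.join(tmp, "posetrack17_json_files")
    os.makedirs(json_dir)
    settings = {
        "JSON_DIR": json_dir,                    # exists
        "IMG_DIR": os.path.join(tmp, "images"),  # deliberately missing
    }
    missing = find_missing_paths(settings)
    missing_keys = sorted(missing)
```

Running such a check before launching training reports every wrong path at once, rather than one FileNotFoundError per run.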
Then run run.py again, which gives the following error:

  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
    trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 33, in __init__
    self.dataloader = build_train_loader(cfg)
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/build.py", line 38, in build_train_loader
    dataset = DATASET_REGISTRY.get(dataset_name)(cfg=cfg, phase='train')
  File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 96, in __init__
    '17val.json'))
  File "/home/dsp/ljh/lab/FAMI-Pose/utils/utils_json.py", line 14, in write_json_to_file
    with open(output_path, "w") as write_file:
FileNotFoundError: [Errno 2] No such file or directory: '/media/Z/frunyang/FAMI-Pose/thirdparty/clustering/pose_analysis/17val.json'

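PoseTrack_Alignment.py contains code that writes to a hard-coded absolute path:
(Screenshot: the code in PoseTrack_Alignment.py that writes to the hard-coded path.)
The purpose of this code is unclear. It seems related to clustering, but it does not appear to be used, so setting self.clustering to False is enough. This is part of why I suspect the authors uploaded the wrong code.
Running again gives the following error: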

  File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 42, in main
    runner.launch()
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
    trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 49, in __init__
    self.core_function = build_core_function(cfg, criterion=self.loss_criterion, **kwargs)
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/base.py", line 65, in build_core_function
    core_function = CORE_FUNCTION_REGISTRY.get(cfg.CORE_FUNCTION)(cfg, **kwargs)
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/alignment_mi_function_term6_1.py", line 61, in __init__
    self.IntegralL1Loss_criterion = IntegralL1Loss()
NameError: name 'IntegralL1Loss' is not defined

This happens because the Integral loss was removed earlier. Comment out the IntegralL1Loss and StructureCosineSimilarity lines in alignment_mi_function_term6_1.py. They are only defined there and never actually used.

# self.IntegralL1Loss_criterion = IntegralL1Loss()
# self.StructureCosineSimilarityLoss_criterion = StructureCosineSimilarity()

Running again gives the following error (tedious by now, but this is the last one):

  File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 42, in main
    runner.launch()
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 57, in launch
    trainer.exec()
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 27, in exec
    self.train()
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 71, in train
    tb_writer_dict=self.tb_writer_dict)
  File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/alignment_mi_function_term6_1.py", line 104, in train
    pred_heatmaps, local_warped_sup_hm_list, kf_bb_heatmaps, mi_loss_list = model(input_x.cuda(), sup_x.cuda())
ValueError: not enough values to unpack (expected 4, got 3)

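This means the call returned only 3 values while 4 were expected. Checking the relevant code shows that the forward function returns 3 values during training:
(Screenshot: the return statement of the forward function.)
So remove local_warped_sup_hm_list from the unpacking and add the line local_warped_sup_hm_list=[] right below it.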
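
The fix can be sketched with a stand-in for the model. The function fake_forward below is hypothetical and only mimics a training-time forward pass that returns three values; the order of those values is assumed from the variable names at the call site.

```python
def fake_forward(input_x, sup_x):
    """Stand-in for the model's training-time forward pass: three return values."""
    pred_heatmaps = [input_x]
    kf_bb_heatmaps = [sup_x]
    mi_loss_list = [0.0]
    return pred_heatmaps, kf_bb_heatmaps, mi_loss_list


# Original call site: four targets, three values
try:
    a, b, c, d = fake_forward(1, 2)
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# Adjusted call site: unpack three values, then define the fourth name separately
pred_heatmaps, kf_bb_heatmaps, mi_loss_list = fake_forward(1, 2)
local_warped_sup_hm_list = []
```

Keeping local_warped_sup_hm_list defined as an empty list means any later code that still references the name does not raise a NameError.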
Run again; output like the following means training is running successfully:
(Screenshot: training log output.)

Mutual information loss issue

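During training, the loss_MI values are all negative. This is the mutual information loss; when the official code computes the KL divergence, it does not take the logarithm of the input, as shown below.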

def feat_feat_mi_estimation(self, F1, F2):
    """
    F1: [B,48,96,72]
    F2: [B,48,96,72]
    F1 -> F2
    """
    batch_size = F1.shape[0]
    temperature = 0.05
    F1 = F1.reshape(batch_size, 48, -1).reshape(batch_size * 48, -1)
    F2 = F2.reshape(batch_size, 48, -1).reshape(batch_size * 48, -1)
    mi = kl_div(input=self.softmax(F1.detach() / temperature), target=self.softmax(F2 / temperature))

    return mi

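The input argument of kl_div is expected to hold log-probabilities, i.e. the output of log_softmax, but here only softmax is applied, which is why the loss comes out negative.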
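
The sign issue can be checked by hand. The sketch below assumes kl_div refers to PyTorch's torch.nn.functional.kl_div, whose pointwise term is target * (log(target) - input), and re-implements that term in plain Python; the feature values and the helper names are made up for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a flat list."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pointwise_kl_sum(inp, target):
    """Sum of target * (log(target) - input), the term kl_div is assumed to compute."""
    return sum(t * (math.log(t) - i) for i, t in zip(inp, target) if t > 0)


temperature = 0.05
f1 = [0.2, -0.1, 0.4, 0.0]   # made-up feature row standing in for F1
f2 = [0.1, 0.3, -0.2, 0.5]   # made-up feature row standing in for F2

p = softmax(f1, temperature)
q = softmax(f2, temperature)

# As in the quoted code: probabilities passed where log-probabilities are expected
without_log = pointwise_kl_sum(p, q)
# With log-probabilities as input, this is the KL divergence KL(q || p)
with_log = pointwise_kl_sum([math.log(v) for v in p], q)
```

With probabilities as input, the sum equals the negative entropy of q minus the inner product of q and p, which cannot be positive. With log-probabilities it is a true KL divergence and cannot be negative.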
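Oddly, though, after the several mutual information terms are combined, the final loss is still positive, whereas switching to log_softmax made it negative. So I trained with the code as-is and treat the result as a reference. The final result:
(Screenshot: evaluation results.)
The result is 83.3, noticeably below the 84.8 reported in the paper. I suspect the loss may be the problem, but it is hard to say.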

Summary

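The idea in this paper is quite good, but the open-source code has a lot of problems, and I am not sure whether the authors uploaded the wrong version. After my fixes there is still a sizable gap to the official result, and I do not know the cause. I hope the original authors or someone more knowledgeable can explain.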
