I have recently been studying the occluded person re-identification paper "Pose-Guided Feature Disentangling for Occluded Person Re-identification Based on Transformer" and its source code. The backbone of the PFD network in this paper is TransReID, the backbone network from "TransReID: Transformer-based Object Re-Identification", so both are discussed together below.
This post covers the part of the source code that calls HRNet to detect human keypoints and return heatmaps (the part marked with red arrows in the network diagram). The code lives in the class SimpleHRNet in model\pose_net.py.
The entry point is the predict function; _predict_single(self, image) runs HRNet directly on a single image, and reading this function is enough to understand the calling process.
def _predict_single(self, image):
    # Single-person branch; only this branch is used in this paper
    if not self.multiperson:
        # Record the original image resolution
        old_res = image.shape
        # If a target resolution is set, resize the image to it
        if self.resolution is not None:
            image = cv2.resize(
                image,
                (self.resolution[1], self.resolution[0]),  # (width, height)
                interpolation=self.interpolation
            )
        # Convert BGR -> RGB, apply the preprocessing transform, and add a batch dimension
        images = self.transform(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)).unsqueeze(dim=0)
        # A single bounding box covering the whole (original-resolution) image
        boxes = np.asarray([[0, 0, old_res[1], old_res[0]]], dtype=np.float32)
        # All-zero array that will hold the joint heatmaps
        heatmaps = np.zeros((1, self.nof_joints, self.resolution[0] // 4, self.resolution[1] // 4),
                            dtype=np.float32)
    else:
        # Multi-person branch; not used in this paper
        detections = self.detector.predict_single(image)
        # Number of detected people
        nof_people = len(detections) if detections is not None else 0
        # Pre-allocate the bounding-box, image, and heatmap arrays
        boxes = np.empty((nof_people, 4), dtype=np.int32)
        images = torch.empty((nof_people, 3, self.resolution[0], self.resolution[1]))
        heatmaps = np.zeros((nof_people, self.nof_joints, self.resolution[0] // 4, self.resolution[1] // 4),
                            dtype=np.float32)
        # If any people were detected, process each one
        if detections is not None:
            for i, (x1, y1, x2, y2, conf, cls_conf, cls_pred) in enumerate(detections):
                x1 = int(round(x1.item()))
                x2 = int(round(x2.item()))
                y1 = int(round(y1.item()))
                y2 = int(round(y2.item()))
                # Adjust the box to match the aspect ratio of the HRNet input
                correction_factor = self.resolution[0] / self.resolution[1] * (x2 - x1) / (y2 - y1)
                if correction_factor > 1:
                    # Grow the box along y
                    center = y1 + (y2 - y1) // 2
                    length = int(round((y2 - y1) * correction_factor))
                    y1 = max(0, center - length // 2)
                    y2 = min(image.shape[0], center + length // 2)
                elif correction_factor < 1:
                    # Grow the box along x
                    center = x1 + (x2 - x1) // 2
                    length = int(round((x2 - x1) * 1 / correction_factor))
                    x1 = max(0, center - length // 2)
                    x2 = min(image.shape[1], center + length // 2)
                boxes[i] = [x1, y1, x2, y2]
                images[i] = self.transform(image[y1:y2, x1:x2, ::-1])

    # If there is at least one image, run the model
    if images.shape[0] > 0:
        images = images.to(self.device)
        with torch.no_grad():
            # Predict in one shot, or in batches of max_batch_size
            if len(images) <= self.max_batch_size:
                out = self.model(images)
            else:
                out = torch.empty(
                    (images.shape[0], self.nof_joints, self.resolution[0] // 4, self.resolution[1] // 4),
                    device=self.device
                )
                for i in range(0, len(images), self.max_batch_size):
                    out[i:i + self.max_batch_size] = self.model(images[i:i + self.max_batch_size])

        # Move the model output to a numpy array
        out = out.detach().cpu().numpy()
        pts = np.empty((out.shape[0], out.shape[1], 3), dtype=np.float32)
        # For each person and each joint, find the heatmap maximum and
        # map it back to original-image coordinates plus a confidence
        for i, human in enumerate(out):
            heatmaps[i] = human
            for j, joint in enumerate(human):
                pt = np.unravel_index(np.argmax(joint), (self.resolution[0] // 4, self.resolution[1] // 4))
                pts[i, j, 0] = pt[0] * 1. / (self.resolution[0] // 4) * (boxes[i][3] - boxes[i][1]) + boxes[i][1]
                pts[i, j, 1] = pt[1] * 1. / (self.resolution[1] // 4) * (boxes[i][2] - boxes[i][0]) + boxes[i][0]
                pts[i, j, 2] = joint[pt]
    else:
        # No people detected: return an empty keypoint array
        pts = np.empty((0, 0, 3), dtype=np.float32)

    # Assemble the return value: heatmaps and bounding boxes are optional
    res = list()
    if self.return_heatmaps:
        res.append(heatmaps)
    if self.return_bounding_boxes:
        res.append(boxes)
    res.append(pts)
    if len(res) > 1:
        return res
    else:
        return res[0]
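The aspect-ratio correction in the multi-person branch is the least obvious step above, so here is a self-contained sketch of just that logic. The function name adjust_box_to_aspect is hypothetical, and the target resolution 256×128 (matching the [bs, 17, 64, 32] heatmap shape mentioned later) is an assumption:

```python
import numpy as np

def adjust_box_to_aspect(x1, y1, x2, y2, img_h, img_w, out_h=256, out_w=128):
    """Expand a detection box so its aspect ratio matches the HRNet
    input (out_h / out_w), mirroring the logic in _predict_single."""
    correction_factor = out_h / out_w * (x2 - x1) / (y2 - y1)
    if correction_factor > 1:
        # Box is too wide relative to its height -> grow it along y
        center = y1 + (y2 - y1) // 2
        length = int(round((y2 - y1) * correction_factor))
        y1 = max(0, center - length // 2)
        y2 = min(img_h, center + length // 2)
    elif correction_factor < 1:
        # Box is too tall relative to its width -> grow it along x
        center = x1 + (x2 - x1) // 2
        length = int(round((x2 - x1) / correction_factor))
        x1 = max(0, center - length // 2)
        x2 = min(img_w, center + length // 2)
    return x1, y1, x2, y2

# A square 100x100 box is too wide for a 2:1 target, so y is expanded:
print(adjust_box_to_aspect(100, 100, 200, 200, 500, 500))  # (100, 50, 200, 250)
# A 100x300 box is too tall, so x is expanded instead:
print(adjust_box_to_aspect(100, 100, 200, 400, 500, 500))  # (75, 100, 225, 400)
```

Note that `max(0, ...)` and `min(img_h, ...)` clip the grown box at the image border, so boxes near the edge may end up slightly off the target aspect ratio.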
In this work, all we actually need is the heatmap. The corresponding call site is the forward method in model\make_pfd.py:
def forward(self, x, label=None, cam_label=None, view_label=None):  # ht optional
    bs, c, h, w = x.shape  # [batch, 3, 256, 128]
    # HRNet:
    heatmaps, joints = self.pose.predict(x)
    heatmaps = torch.from_numpy(heatmaps).cuda()  # [bs, 17, 64, 32]
So we should focus on the following part of the predict function. Here out holds the raw heatmaps with shape (n_people, nof_joints, height // 4, width // 4), while pts stores (joint_y, joint_x, confidence) for every person and joint.
if images.shape[0] > 0:
    images = images.to(self.device)
    with torch.no_grad():
        if len(images) <= self.max_batch_size:
            out = self.model(images)
        else:
            out = torch.empty(
                (images.shape[0], self.nof_joints, self.resolution[0] // 4, self.resolution[1] // 4),
                device=self.device
            )
            for i in range(0, len(images), self.max_batch_size):
                out[i:i + self.max_batch_size] = self.model(images[i:i + self.max_batch_size])

    out = out.detach().cpu().numpy()
    pts = np.empty((out.shape[0], out.shape[1], 3), dtype=np.float32)
    # For each human, for each joint: y, x, confidence
    for i, human in enumerate(out):
        heatmaps[i] = human
        for j, joint in enumerate(human):
            pt = np.unravel_index(np.argmax(joint), (self.resolution[0] // 4, self.resolution[1] // 4))
            # 0: pt_y / (height // 4) * (bb_y2 - bb_y1) + bb_y1
            # 1: pt_x / (width // 4) * (bb_x2 - bb_x1) + bb_x1
            # 2: confidence
            pts[i, j, 0] = pt[0] * 1. / (self.resolution[0] // 4) * (boxes[i][3] - boxes[i][1]) + boxes[i][1]
            pts[i, j, 1] = pt[1] * 1. / (self.resolution[1] // 4) * (boxes[i][2] - boxes[i][0]) + boxes[i][0]
            pts[i, j, 2] = joint[pt]
heatmaps has shape [bs, 17, 64, 32]. I printed one joint's heatmap, shown below. It confirms that what we get really is a heatmap (of course it should be, but I wanted to see for myself): each cell covers a 4×4 pixel block of the input image and holds the confidence that the corresponding body part lies there. This makes it much clearer what we actually obtain from HRNet.
print(heatmaps[0][1])
tensor([[ 2.3842e-05,  2.3842e-05,  2.3842e-05,  ...,  2.3842e-05,  2.3842e-05,  2.3842e-05],
        [ 2.3842e-05,  2.3842e-05,  2.3842e-05,  ...,  2.3842e-05,  2.3842e-05,  2.3842e-05],
        [ 2.3842e-05,  2.3842e-05,  2.3842e-05,  ...,  2.3842e-05,  2.3842e-05,  2.3842e-05],
        ...,
        [ 2.3842e-05,  2.3842e-05,  2.2471e-05,  ...,  5.1856e-06,  2.3842e-05,  2.3842e-05],
        [ 2.2411e-05,  2.2233e-05,  2.9743e-05,  ...,  9.9003e-05, -9.8050e-05, -9.8050e-05],
        [ 2.0862e-05,  2.1398e-05,  1.8537e-05,  ..., -3.6538e-05, -1.1623e-04, -9.8050e-05]],
       device='cuda:0')
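The same argmax lookup that predict uses to turn one of these heatmaps into a keypoint can be reproduced on a synthetic heatmap. All names below are illustrative; the 64×32 grid matches the [bs, 17, 64, 32] shape above, and the box is the whole 256×128 input:

```python
import numpy as np

# One 64x32 heatmap (input 256x128, stride 4) with a synthetic peak.
h4, w4 = 64, 32
joint = np.zeros((h4, w4), dtype=np.float32)
joint[40, 10] = 0.9  # pretend this joint fires at heatmap row 40, col 10

# Same lookup as in predict(): flatten-argmax, then unravel back to (y, x).
pt = np.unravel_index(np.argmax(joint), (h4, w4))

# Map the heatmap cell back into the bounding box; here the box is the
# full image, i.e. boxes[i] = [0, 0, 128, 256] as (x1, y1, x2, y2).
x1, y1, x2, y2 = 0, 0, 128, 256
y = float(pt[0] / h4 * (y2 - y1) + y1)  # y == 160.0
x = float(pt[1] / w4 * (x2 - x1) + x1)  # x == 40.0
conf = float(joint[pt])
```

Each heatmap cell thus maps back onto a 4×4 block of input pixels, which is exactly the "every 4×4 block carries one confidence" reading of the printout above.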
Each of the 17 joint heatmaps is then filtered with a threshold (0.3), which corresponds to the part of the paper below. Note that the code does not match the paper here: it sets values below the threshold to 1 and values above it to 0, the opposite of the paper's convention. Because the subsequent processing follows the same inverted convention throughout, the final result is still correct.
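To see why the inversion is harmless, compare the two conventions on some hypothetical per-joint scores (the 0.3 threshold is from the text; the variable names are illustrative):

```python
import numpy as np

# Hypothetical per-joint scores (e.g. heatmap maxima); four values are
# enough to show the point, the real code has 17 joints.
scores = np.array([0.05, 0.29, 0.31, 0.80], dtype=np.float32)

label_paper = (scores > 0.3).astype(np.float32)  # paper: 1 = above threshold
label_code = (scores < 0.3).astype(np.float32)   # code:  1 = below threshold

# The two labelings are exact complements (no score equals 0.3 here),
# so any downstream step that consistently sticks to one convention
# selects exactly the same set of joints.
assert np.array_equal(label_paper + label_code, np.ones_like(scores))
```

In other words, the bug would only surface if later code mixed the two conventions, which it does not.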