Visualizing Pose Estimation for Multiple Human Keypoint Layouts Based on MediaPipe's Model Capabilities

Hylan_J

Article link: https://blog.csdn.net/qq_53457019/article/details/142672312

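Background
Key Challenges
Environment Setup
Step 1: Build the mapping between the two keypoint sets
Step 2: Connect the mapped keypoints
Step 3: Visualize with the official example
Results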

Background

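Human pose estimation is an interesting task in computer vision (CV), and many excellent solutions have been developed for it. Classic 2D human pose estimation solutions include: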

The GitHub open-source project Pytorch-Human-Pose-Estimation provides a unified framework for training these solutions on the MPII and COCO datasets.

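Note that the NMS (non-maximum suppression) code in this framework is adapted from the py-faster-rcnn project. Its Cython source (a .pyx file) must be compiled into an extension module, and the project's build setup produces a .so file: Linux loads it directly, but Windows does not, because Python on Windows only loads extension modules with the .pyd suffix¹ (for an overview of Python file types such as .pyx, see the article "Python Files Explained: .py, .ipynb, .pyi, .pyc, .pyd!"²).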
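
Which compiled-extension suffixes the running interpreter will load can be checked from the standard library alone. The sketch below uses `importlib.machinery.EXTENSION_SUFFIXES`, which typically lists `.so` variants on Linux and `.pyd` variants on Windows; `cpu_nms.so` is used only as an example file name for a Linux build of the NMS extension.

```python
import importlib.machinery
import sys


def loadable_extension_suffixes():
    """Return the file suffixes this interpreter accepts for compiled extension modules."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


def can_load_extension(filename):
    """Check whether a compiled module file name ends with a suffix this interpreter would load."""
    return any(filename.endswith(suffix) for suffix in loadable_extension_suffixes())


if __name__ == "__main__":
    print(sys.platform, loadable_extension_suffixes())
    # Example file name for a Linux build of the NMS extension
    print("cpu_nms.so loadable:", can_load_extension("cpu_nms.so"))
    # A .pyx file is Cython source, never a loadable extension
    print("cpu_nms.pyx loadable:", can_load_extension("cpu_nms.pyx"))
```

On Linux the `.so` check should print `True`; on Windows it prints `False`, which is the situation described above.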

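In addition, training these solutions requires substantial computing resources, and running and visualizing them is slow, which makes real-time pose estimation on video difficult. When no algorithmic innovation is needed, the human pose estimation model provided by MediaPipe is a very attractive alternative.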

Key Challenges

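MediaPipe and its model capabilities are not described in detail here; interested readers can browse its official homepage and the documentation of each solution. This project focuses on using the models MediaPipe provides to visualize pose estimation for several different human keypoint layouts.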

For pose estimation, MediaPipe uses 33 built-in human keypoints, shown in the figure below.
[Figure: MediaPipe's built-in human keypoints]
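Instead of plain lists or dictionaries, MediaPipe stores keypoints in its own landmark data type (a NormalizedLandmarkList protobuf message whose landmark field holds entries with x, y, z and visibility), and it wraps visualization in the draw_landmarks method. The main challenge of this project is therefore to understand this data type and drawing method, and to reuse both to visualize pose estimation for other keypoint layouts.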
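
To see why the mapped keypoints must stay in MediaPipe's normalized landmark format, it helps to know roughly what `draw_landmarks` does with them: `x` and `y` are normalized to [0, 1] by image width and height, points with low visibility are skipped, the rest are scaled to pixels, and a connection is drawn only if both endpoints survive. The plain-Python sketch below illustrates that idea; `Landmark` is a hypothetical stand-in for the protobuf message, and the 0.5 threshold and clamping rule are assumptions modeled on MediaPipe's `drawing_utils` source that may differ between versions.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

VISIBILITY_THRESHOLD = 0.5  # assumed value; check drawing_utils in your MediaPipe version


@dataclass
class Landmark:
    """Hypothetical stand-in for MediaPipe's NormalizedLandmark message."""
    x: float
    y: float
    z: float = 0.0
    visibility: float = 1.0


def normalized_to_pixel(x: float, y: float, width: int, height: int) -> Optional[Tuple[int, int]]:
    """Scale normalized coordinates to pixels; return None if the point lies outside the image."""
    def in_unit_range(v: float) -> bool:
        return (v > 0 or math.isclose(v, 0)) and (v < 1 or math.isclose(v, 1))

    if not (in_unit_range(x) and in_unit_range(y)):
        return None
    return min(math.floor(x * width), width - 1), min(math.floor(y * height), height - 1)


def drawable_points(landmarks, width: int, height: int) -> Dict[int, Tuple[int, int]]:
    """Map landmark index -> pixel position for every landmark that would be drawn."""
    points = {}
    for idx, lm in enumerate(landmarks):
        if lm.visibility < VISIBILITY_THRESHOLD:
            continue  # low-confidence points are skipped
        pixel = normalized_to_pixel(lm.x, lm.y, width, height)
        if pixel is not None:
            points[idx] = pixel
    return points


def drawable_connections(connections, points):
    """Keep only the connections whose two endpoints are both drawn."""
    return sorted((a, b) for a, b in connections if a in points and b in points)


if __name__ == "__main__":
    demo = [Landmark(0.5, 0.25), Landmark(1.0, 1.0),
            Landmark(0.2, 0.2, visibility=0.1), Landmark(1.3, 0.5)]
    pts = drawable_points(demo, width=640, height=480)
    print(pts)  # {0: (320, 120), 1: (639, 479)}
    print(drawable_connections({(0, 1), (1, 2), (0, 3)}, pts))  # [(0, 1)]
```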

Using the COCO dataset's keypoints as an example, this article shows how to use MediaPipe to visualize pose estimation in the COCO keypoint layout.

Environment Setup

This project runs in Python with mediapipe==0.10.11 and opencv-python / opencv-contrib-python==4.10.0.84.

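opencv-contrib-python appears to be one of mediapipe's dependencies and is installed automatically along with it; opencv-python was installed only to keep its version consistent with opencv-contrib-python and is not actually required.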
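
To confirm that an environment matches the versions above, the installed package metadata can be queried from Python. This minimal sketch uses the standard library's `importlib.metadata` (Python 3.8+); the expected versions are simply the ones listed above, and a mismatch does not necessarily mean the code will fail.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions used in this article
EXPECTED = {
    "mediapipe": "0.10.11",
    "opencv-python": "4.10.0.84",
    "opencv-contrib-python": "4.10.0.84",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_versions().items():
        print(f"{package:24s} {installed or 'not installed':14s} {'OK' if ok else 'differs'}")
```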

Step 1: Build the mapping between the two keypoint sets

The COCO dataset defines 17 human keypoints.

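Matching each of them by name to MediaPipe's built-in keypoints gives a mapping between the indices of the same body point in the two layouts. The code is as follows: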

# Name -> index of MediaPipe's 33 pose keypoints
mediapipe_keypoints_mapping = {
    "nose": 0, "left_eye_inner": 1, "left_eye": 2, "left_eye_outer": 3, "right_eye_inner": 4, "right_eye": 5,
    "right_eye_outer": 6, "left_ear": 7, "right_ear": 8, "mouth_left": 9, "mouth_right": 10, "left_shoulder": 11,
    "right_shoulder": 12, "left_elbow": 13, "right_elbow": 14, "left_wrist": 15, "right_wrist": 16,
    "left_pinky": 17, "right_pinky": 18, "left_index": 19, "right_index": 20, "left_thumb": 21, "right_thumb": 22,
    "left_hip": 23, "right_hip": 24, "left_knee": 25, "right_knee": 26, "left_ankle": 27, "right_ankle": 28,
    "left_heel": 29, "right_heel": 30, "left_foot_index": 31, "right_foot_index": 32
}
# Index -> name of COCO's 17 keypoints
coco_keypoints_mapping = {
    0: "nose", 1: "left_eye", 2: "right_eye", 3: "left_ear", 4: "right_ear", 5: "left_shoulder",
    6: "right_shoulder", 7: "left_elbow", 8: "right_elbow", 9: "left_wrist", 10: "right_wrist",
    11: "left_hip", 12: "right_hip", 13: "left_knee", 14: "right_knee", 15: "left_ankle", 16: "right_ankle"
}
# For every COCO keypoint, store the list of MediaPipe indices it is built from
# (a list, so that several MediaPipe points can later be averaged into one target point)
coco_to_mediapipe_mapping = {}
for coco_idx, coco_keypoint in coco_keypoints_mapping.items():
    if coco_keypoint in mediapipe_keypoints_mapping:
        coco_to_mediapipe_mapping[coco_idx] = [mediapipe_keypoints_mapping[coco_keypoint]]
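
Because every one of COCO's 17 keypoint names also appears among MediaPipe's 33, each list in the mapping ends up holding exactly one index. The same table can be built with a dictionary comprehension over two name lists written in index order; this is an equivalent sketch rather than the original code, and the printed result follows directly from the two tables above.

```python
# MediaPipe Pose landmark names, listed in index order 0..32
MEDIAPIPE_NAMES = [
    "nose", "left_eye_inner", "left_eye", "left_eye_outer", "right_eye_inner", "right_eye",
    "right_eye_outer", "left_ear", "right_ear", "mouth_left", "mouth_right", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_pinky", "right_pinky", "left_index", "right_index", "left_thumb", "right_thumb",
    "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle",
    "left_heel", "right_heel", "left_foot_index", "right_foot_index",
]
# COCO keypoint names, listed in index order 0..16
COCO_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle",
]

mediapipe_index = {name: i for i, name in enumerate(MEDIAPIPE_NAMES)}
# COCO index -> list of MediaPipe indices
coco_to_mediapipe_mapping = {
    coco_idx: [mediapipe_index[name]]
    for coco_idx, name in enumerate(COCO_NAMES)
    if name in mediapipe_index
}
# MediaPipe points with no COCO counterpart (eye corners, mouth, hands, feet)
coco_name_set = set(COCO_NAMES)
unused = [name for name in MEDIAPIPE_NAMES if name not in coco_name_set]

print(coco_to_mediapipe_mapping)
# {0: [0], 1: [2], 2: [5], 3: [7], 4: [8], 5: [11], 6: [12], 7: [13], 8: [14], 9: [15], 10: [16], 11: [23], 12: [24], 13: [25], 14: [26], 15: [27], 16: [28]}
print(len(unused))  # 16
```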

Step 2: Connect the mapped keypoints

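With the mapping in place, the connections between the mapped keypoints must be defined so that MediaPipe can draw the lines between them. For example, among the COCO keypoints, keypoint 0 (nose) must be connected to keypoint 1 (left_eye) and to keypoint 2 (right_eye). The connections are defined as follows: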

coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11), 
                              (6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})
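
Each pair in `coco_connections` refers to positions in the mapped landmark list, so every index must lie in 0..16; `draw_landmarks` is expected to reject out-of-range connections. The check below is a small sketch (the helper names are hypothetical) that catches such typos before drawing and lists the edges by name for review. These 19 edges appear to correspond to the person skeleton shipped with the COCO keypoint annotations, converted from 1-based to 0-based indices.

```python
COCO_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle",
]

coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11),
                              (6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})


def validate_connections(connections, num_keypoints):
    """Raise ValueError if an edge is out of range or links a keypoint to itself."""
    for a, b in connections:
        if not (0 <= a < num_keypoints and 0 <= b < num_keypoints):
            raise ValueError(f"connection {(a, b)} is out of range for {num_keypoints} keypoints")
        if a == b:
            raise ValueError(f"connection {(a, b)} links a keypoint to itself")


def named_edges(connections, names):
    """Return the edges as sorted (name, name) pairs, lower index first."""
    return sorted((names[min(a, b)], names[max(a, b)]) for a, b in connections)


validate_connections(coco_connections, len(COCO_NAMES))
for start, end in named_edges(coco_connections, COCO_NAMES):
    print(f"{start} -- {end}")
```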

Step 3: Visualize with the official example

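With the steps above done, combining them with MediaPipe's official example is enough to visualize pose estimation in the COCO keypoint layout using MediaPipe's model.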

The complete code is as follows:

import os

import cv2
import mediapipe
from mediapipe.framework.formats import landmark_pb2


def detect_landmarks(detector, image, keypoints_map, connections_map, display, drawer, draw_style):
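    """Detect pose landmarks, convert them to another keypoint layout and optionally draw them."""
    # Run the model; MediaPipe expects RGB input, whereas OpenCV reads images as BGR
    results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    # The 33 detected MediaPipe landmarks (None when no person is found)
    detected_landmarks = results.pose_landmarks

    # Only map and draw when a person was detected
    if detected_landmarks:
        # Same protobuf type that draw_landmarks expects
        mapped_landmarks = landmark_pb2.NormalizedLandmarkList()

        # The i-th landmark added here becomes index i of the target layout, so keypoints_map
        # must be ordered the way the connections expect (0, 1, 2, ...)
        for source_indices in keypoints_map.values():
            mapped_landmark = mapped_landmarks.landmark.add()
            # Each target keypoint is the average of one or more MediaPipe landmarks
            sources = [detected_landmarks.landmark[i] for i in source_indices]
            mapped_landmark.x = sum(lm.x for lm in sources) / len(sources)
            mapped_landmark.y = sum(lm.y for lm in sources) / len(sources)
            mapped_landmark.z = sum(lm.z for lm in sources) / len(sources)
            mapped_landmark.visibility = sum(lm.visibility for lm in sources) / len(sources)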

        if display:
            drawer.draw_landmarks(
                image=image,
                landmark_list=mapped_landmarks,
                connections=connections_map,
                landmark_drawing_spec=drawer.DrawingSpec(color=draw_style["landmark"]["color"],
                                                         thickness=draw_style["landmark"]["thickness"],
                                                         circle_radius=draw_style["landmark"]["circle_radius"]),
                connection_drawing_spec=drawer.DrawingSpec(color=draw_style["connection"]["color"],
                                                           thickness=draw_style["connection"]["thickness"])
            )

        return True, image, mapped_landmarks
    else:
        return False, image, []


if __name__ == "__main__":
    path = 'videos/2.mp4'
    static = True
    if os.path.split(path)[0] == "videos":
        static = False

    pose_detector = mediapipe.solutions.pose.Pose(static_image_mode=static,
                                                  model_complexity=1,
                                                  smooth_landmarks=True,
                                                  enable_segmentation=True,
                                                  smooth_segmentation=True,
                                                  min_detection_confidence=0.5,
                                                  min_tracking_confidence=0.5)
    drawer = mediapipe.solutions.drawing_utils
    draw_style = {"landmark": {"color": (255, 0, 0), "thickness": 2, "circle_radius": 5},
                  "connection": {"color": (0, 255, 0), "thickness": 2}}

    # Name -> index of MediaPipe's 33 pose keypoints
    mediapipe_keypoints_mapping = {
        "nose": 0, "left_eye_inner": 1, "left_eye": 2, "left_eye_outer": 3, "right_eye_inner": 4, "right_eye": 5,
        "right_eye_outer": 6, "left_ear": 7, "right_ear": 8, "mouth_left": 9, "mouth_right": 10, "left_shoulder": 11,
        "right_shoulder": 12, "left_elbow": 13, "right_elbow": 14, "left_wrist": 15, "right_wrist": 16,
        "left_pinky": 17, "right_pinky": 18, "left_index": 19, "right_index": 20, "left_thumb": 21, "right_thumb": 22,
        "left_hip": 23, "right_hip": 24, "left_knee": 25, "right_knee": 26, "left_ankle": 27, "right_ankle": 28,
        "left_heel": 29, "right_heel": 30, "left_foot_index": 31, "right_foot_index": 32
    }

    # Index -> name of COCO's 17 keypoints
    coco_keypoints_mapping = {
        0: "nose", 1: "left_eye", 2: "right_eye", 3: "left_ear", 4: "right_ear", 5: "left_shoulder",
        6: "right_shoulder", 7: "left_elbow", 8: "right_elbow", 9: "left_wrist", 10: "right_wrist",
        11: "left_hip", 12: "right_hip", 13: "left_knee", 14: "right_knee", 15: "left_ankle", 16: "right_ankle"
    }

    # For every COCO keypoint, the list of MediaPipe indices it is built from
    coco_to_mediapipe_mapping = {}
    for coco_idx, coco_keypoint in coco_keypoints_mapping.items():
        if coco_keypoint in mediapipe_keypoints_mapping:
            coco_to_mediapipe_mapping[coco_idx] = [mediapipe_keypoints_mapping[coco_keypoint]]

    # Print the resulting mapping
    print("COCO index -> MediaPipe index mapping:")
    print(coco_to_mediapipe_mapping)

    coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11),
                                  (6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})
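    if static:
        origin_image = cv2.imread(path)
        if origin_image is None:
            raise FileNotFoundError(f"Cannot read image: {path}")
        img = origin_image.copy()
        result, output, landmarks = detect_landmarks(pose_detector, img, coco_to_mediapipe_mapping, coco_connections,
                                                     True, drawer, draw_style)
        cv2.namedWindow("Detected", cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Detected", 480, 640)
        cv2.imshow("Detected", output)
        # e.g. imgs/1.jpg -> outputs/1.jpg; create the output folder if it is missing
        save_path = path.replace("imgs", "outputs")
        os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
        cv2.imwrite(save_path, output)
        cv2.waitKey(0)

    else:
        cap = cv2.VideoCapture(path)
        if not cap.isOpened():
            raise FileNotFoundError(f"Cannot open video: {path}")

        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        # Reuse the source frame rate so the output plays at the original speed (fall back to 30)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30
        # Save the annotated video, e.g. videos/2.mp4 -> videos/outputs/2.mp4
        save_path = path.replace("videos", "videos/outputs")
        os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
        # 'mp4v' matches the .mp4 container; MJPG is normally paired with .avi
        out = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

        # Create the display window once instead of on every frame
        cv2.namedWindow("Detected", cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Detected", 480, 640)

        while True:
            ret, image = cap.read()
            if not ret:
                print("Video Over")
                break

            img = image.copy()
            result, output, landmarks = detect_landmarks(pose_detector, img, coco_to_mediapipe_mapping,
                                                         coco_connections, True, drawer, draw_style)
            cv2.imshow("Detected", output)
            out.write(output)

            # Press q to stop early
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        # Release the writer, otherwise the output file may not be finalized
        out.release()
    pose_detector.close()
    cv2.destroyAllWindows()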
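
Because each value in the keypoint mapping is a list and `detect_landmarks` averages x, y, z and visibility over that list, the same function could in principle target a layout containing keypoints MediaPipe does not provide directly. The sketch below shows the averaging on plain Python objects, so MediaPipe is not needed to run it; `Point`, `map_keypoints` and the `neck` keypoint (defined here as the midpoint of MediaPipe's shoulders, indices 11 and 12) are hypothetical illustrations, not part of the original project.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Point:
    """Hypothetical stand-in for a MediaPipe landmark (normalized coordinates)."""
    x: float
    y: float
    z: float = 0.0
    visibility: float = 1.0


def map_keypoints(source: Sequence[Point], keypoints_map: Dict[int, List[int]]) -> List[Point]:
    """Average the listed source points for each target keypoint, in keypoints_map order."""
    mapped = []
    for source_indices in keypoints_map.values():
        group = [source[i] for i in source_indices]
        n = len(group)
        mapped.append(Point(
            x=sum(p.x for p in group) / n,
            y=sum(p.y for p in group) / n,
            z=sum(p.z for p in group) / n,
            visibility=sum(p.visibility for p in group) / n,
        ))
    return mapped


# 33 dummy landmarks; only the shoulders get distinctive values
landmarks = [Point(0.5, 0.5) for _ in range(33)]
landmarks[11] = Point(0.6, 0.4, visibility=0.9)  # left_shoulder
landmarks[12] = Point(0.4, 0.4, visibility=0.7)  # right_shoulder

# Hypothetical target layout: 0 = nose, 1 = neck (midpoint of both shoulders)
neck_map = {0: [0], 1: [11, 12]}
nose, neck = map_keypoints(landmarks, neck_map)
print(neck)
```

As in `detect_landmarks`, the position of each entry in the mapping determines the target index, so the keys should be listed in the order 0, 1, 2, ... that the connections assume.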