Visualizing Human Pose Estimation for Multiple Keypoint Formats Based on MediaPipe's Model Capabilities

Background

Human pose estimation is a fascinating task in computer vision (CV), and a large number of excellent solutions have emerged over the years, including several now-classic 2D human pose estimation approaches.

The GitHub open-source project Pytorch-Human-Pose-Estimation provides a unified framework for training these classic solutions on the MPII and COCO datasets.

Note that, when using this framework, the NMS computation is adapted from the py-faster-rcnn project and is written as Cython source (.pyx). On Linux the compiled extension can be used directly, but on Windows the .pyx source will not be loaded until it has been compiled into a .pyd extension[1] (see [2] for an introduction to Python file types such as .pyx).

Moreover, training these solutions demands substantial computing resources, and when they are applied, visualization is slow and struggles to keep up with real-time keypoint pose estimation on video. When no algorithmic innovation is required, the human pose estimation model capabilities that MediaPipe provides are therefore very attractive.

Project Challenges

MediaPipe and its model capabilities need no further introduction here; interested readers can browse its official homepage and the detailed descriptions of each solution. This project focuses on applying the models it provides to visualize pose estimation for multiple human keypoint formats.

When MediaPipe performs pose estimation, it uses the built-in set of 33 human keypoints shown below.
[Figure: MediaPipe's built-in human keypoints]
MediaPipe defines its own landmark data type for keypoints rather than using a plain list or dictionary, and it wraps visualization in the draw_landmarks method. The main challenge of this project is therefore to analyze that data type and visualization method, and then use them to visualize pose estimation for other keypoint formats.
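
As a quick illustration of the landmark data type, here is a minimal sketch (the image path person.jpg is a placeholder) that runs the Pose solution on a single image and inspects the first landmark:

import cv2
import mediapipe

# Run the Pose solution once on a single image (person.jpg is a placeholder path)
with mediapipe.solutions.pose.Pose(static_image_mode=True) as pose:
    image = cv2.imread("person.jpg")
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# results.pose_landmarks is a NormalizedLandmarkList protobuf, not a plain list;
# each of its 33 entries exposes x, y, z and visibility fields.
if results.pose_landmarks:
    nose = results.pose_landmarks.landmark[0]
    print(type(results.pose_landmarks))
    print(nose.x, nose.y, nose.z, nose.visibility)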

Taking the COCO keypoint format as an example, this project shows how to use MediaPipe to visualize pose estimation for the COCO human keypoints.

Environment Setup

This project runs in Python, with mediapipe==0.10.11 and opencv-python / opencv-contrib-python==4.10.0.84.

Note that opencv-contrib-python is one of mediapipe's dependencies and is installed automatically along with it; opencv-python was installed only to keep its version consistent with opencv-contrib-python and is not strictly required.
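
Assuming pip is used, the pinned versions above can be installed with:

pip install mediapipe==0.10.11 opencv-python==4.10.0.84 opencv-contrib-python==4.10.0.84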

Step 1: Build a Mapping Between the Two Sets of Keypoints

The COCO dataset defines the 17 human keypoints shown below.
[Figure: COCO dataset human keypoints]
Matching them one by one against MediaPipe's built-in keypoints yields a mapping between the indices of the shared keypoints, implemented as follows:

# Index of each MediaPipe keypoint (name -> index)
mediapipe_keypoints_mapping = {
    "nose": 0, "left_eye_inner": 1, "left_eye": 2, "left_eye_outer": 3, "right_eye_inner": 4, "right_eye": 5,
    "right_eye_outer": 6, "left_ear": 7, "right_ear": 8, "mouth_left": 9, "mouth_right": 10, "left_shoulder": 11,
    "right_shoulder": 12, "left_elbow": 13, "right_elbow": 14, "left_wrist": 15, "right_wrist": 16,
    "left_pinky": 17, "right_pinky": 18, "left_index": 19, "right_index": 20, "left_thumb": 21, "right_thumb": 22,
    "left_hip": 23, "right_hip": 24, "left_knee": 25, "right_knee": 26, "left_ankle": 27, "right_ankle": 28,
    "left_heel": 29, "right_heel": 30, "left_foot_index": 31, "right_foot_index": 32
}
# Index of each COCO keypoint (index -> name)
coco_keypoints_mapping = {
    0: "nose", 1: "left_eye", 2: "right_eye", 3: "left_ear", 4: "right_ear", 5: "left_shoulder",
    6: "right_shoulder", 7: "left_elbow", 8: "right_elbow", 9: "left_wrist", 10: "right_wrist",
    11: "left_hip", 12: "right_hip", 13: "left_knee", 14: "right_knee", 15: "left_ankle", 16: "right_ankle"
}
# Build the COCO-index -> MediaPipe-index mapping by matching keypoint names
coco_to_mediapipe_mapping = {}
for coco_idx, coco_keypoint in coco_keypoints_mapping.items():
    if coco_keypoint in mediapipe_keypoints_mapping:
        # Store a list so that one target keypoint could average several landmarks
        coco_to_mediapipe_mapping[coco_idx] = [mediapipe_keypoints_mapping[coco_keypoint]]
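
Since every COCO keypoint name also appears in the MediaPipe set, the loop produces a complete mapping:

# COCO index -> MediaPipe index mapping produced by the loop above:
# {0: [0], 1: [2], 2: [5], 3: [7], 4: [8], 5: [11], 6: [12], 7: [13], 8: [14],
#  9: [15], 10: [16], 11: [23], 12: [24], 13: [25], 14: [26], 15: [27], 16: [28]}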

Step 2: Connect the Mapped Keypoints

With the mapping in place, the connections between the mapped keypoints must be defined so that MediaPipe can draw them. For example, in the COCO keypoint set, keypoint 0 (nose) needs to be connected to keypoints 1 (left_eye) and 2 (right_eye). The connections are defined as follows, using a frozenset to match the type of MediaPipe's own POSE_CONNECTIONS:

coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11), 
                              (6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})
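
As a small optional check (not part of the original pipeline), one can verify that every connection endpoint is a valid COCO index:

# Every endpoint referenced by a connection must be a key of the COCO mapping
for a, b in coco_connections:
    assert a in coco_keypoints_mapping and b in coco_keypoints_mapping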

Step 3: Visualize Using the Official Example

Once the steps above are complete, combining them with the official example code is all that is needed to visualize COCO human keypoint pose estimation with the model capabilities MediaPipe provides.

The complete code is shown below:

import os

import cv2
import mediapipe
from mediapipe.framework.formats import landmark_pb2


def detect_landmarks(detector, image, keypoints_map, connections_map, display, drawer, draw_style):
    # MediaPipe expects RGB input, while OpenCV images are BGR
    results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    # Fetch the detected landmarks
    detected_landmarks = results.pose_landmarks

    # If any landmarks were detected
    if detected_landmarks:
        mapped_landmarks = landmark_pb2.NormalizedLandmarkList()

        for key, value in keypoints_map.items():
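            # Average coordinates and visibility over all MediaPipe landmarks
            # mapped to this target keypoint; each COCO mapping list holds a
            # single index, so the average reduces to that landmark itself.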
            mapped_landmark = mapped_landmarks.landmark.add()
            temp_x = []
            temp_y = []
            temp_z = []
            temp_visibility = []
            for i in value:
                landmark = detected_landmarks.landmark[i]
                temp_x.append(landmark.x)
                temp_y.append(landmark.y)
                temp_z.append(landmark.z)
                temp_visibility.append(landmark.visibility)
            mapped_landmark.x = sum(temp_x) / len(temp_x)
            mapped_landmark.y = sum(temp_y) / len(temp_y)
            mapped_landmark.z = sum(temp_z) / len(temp_z)
            mapped_landmark.visibility = sum(temp_visibility) / len(temp_visibility)

        if display:
            drawer.draw_landmarks(
                image=image,
                landmark_list=mapped_landmarks,
                connections=connections_map,
                landmark_drawing_spec=drawer.DrawingSpec(color=draw_style["landmark"]["color"],
                                                         thickness=draw_style["landmark"]["thickness"],
                                                         circle_radius=draw_style["landmark"]["circle_radius"]),
                connection_drawing_spec=drawer.DrawingSpec(color=draw_style["connection"]["color"],
                                                           thickness=draw_style["connection"]["thickness"])
            )

        return True, image, mapped_landmarks
    else:
        return False, image, []


if __name__ == "__main__":
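    # Paths under videos/ are treated as video streams; any other path
    # (e.g. under imgs/) is treated as a single static image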
    path = 'videos/2.mp4'
    static = True
    if os.path.split(path)[0] == "videos":
        static = False

    pose_detector = mediapipe.solutions.pose.Pose(static_image_mode=static,
                                                  model_complexity=1,
                                                  smooth_landmarks=True,
                                                  enable_segmentation=True,
                                                  smooth_segmentation=True,
                                                  min_detection_confidence=0.5,
                                                  min_tracking_confidence=0.5)
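    # model_complexity ranges from 0 to 2; higher values trade speed for
    # accuracy. enable_segmentation is not used below, so it could be set
    # to False for a small speed gain.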
    drawer = mediapipe.solutions.drawing_utils
    draw_style = {"landmark": {"color": (255, 0, 0), "thickness": 2, "circle_radius": 5},
                  "connection": {"color": (0, 255, 0), "thickness": 2}}

    # Index of each MediaPipe keypoint (name -> index)
    mediapipe_keypoints_mapping = {
        "nose": 0, "left_eye_inner": 1, "left_eye": 2, "left_eye_outer": 3, "right_eye_inner": 4, "right_eye": 5,
        "right_eye_outer": 6, "left_ear": 7, "right_ear": 8, "mouth_left": 9, "mouth_right": 10, "left_shoulder": 11,
        "right_shoulder": 12, "left_elbow": 13, "right_elbow": 14, "left_wrist": 15, "right_wrist": 16,
        "left_pinky": 17, "right_pinky": 18, "left_index": 19, "right_index": 20, "left_thumb": 21, "right_thumb": 22,
        "left_hip": 23, "right_hip": 24, "left_knee": 25, "right_knee": 26, "left_ankle": 27, "right_ankle": 28,
        "left_heel": 29, "right_heel": 30, "left_foot_index": 31, "right_foot_index": 32
    }

    # Index of each COCO keypoint (index -> name)
    coco_keypoints_mapping = {
        0: "nose", 1: "left_eye", 2: "right_eye", 3: "left_ear", 4: "right_ear", 5: "left_shoulder",
        6: "right_shoulder", 7: "left_elbow", 8: "right_elbow", 9: "left_wrist", 10: "right_wrist",
        11: "left_hip", 12: "right_hip", 13: "left_knee", 14: "right_knee", 15: "left_ankle", 16: "right_ankle"
    }

    # Build the COCO-index -> MediaPipe-index mapping by matching keypoint names
    coco_to_mediapipe_mapping = {}
    for coco_idx, coco_keypoint in coco_keypoints_mapping.items():
        if coco_keypoint in mediapipe_keypoints_mapping:
            coco_to_mediapipe_mapping[coco_idx] = [mediapipe_keypoints_mapping[coco_keypoint]]

    # Print the resulting mapping
    print("COCO index -> MediaPipe index mapping:")
    print(coco_to_mediapipe_mapping)

    coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11),
                                  (6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})
    if static:
        origin_image = cv2.imread(path)
        img = origin_image.copy()
        result, output, landmarks = detect_landmarks(pose_detector, img, coco_to_mediapipe_mapping, coco_connections,
                                                     True, drawer,
                                                     draw_style)
        cv2.namedWindow("Detected", cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Detected", 480, 640)
        cv2.imshow("Detected", output)
        cv2.imwrite(path.replace("imgs", "outputs"), output)
        cv2.waitKey(0)

    else:
        cap = cv2.VideoCapture(path)

        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = cap.get(cv2.CAP_PROP_FPS) or 10
        # Save the annotated video; the 'mp4v' fourcc matches the .mp4 container
        out = cv2.VideoWriter(path.replace("videos", "videos/outputs"),
                              cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

        while True:
            ret, image = cap.read()
            if not ret:
                print("Video Over")
                break

            img = image.copy()
            result, output, landmarks = detect_landmarks(pose_detector, img, coco_to_mediapipe_mapping,
                                                         coco_connections, True,
                                                         drawer, draw_style)
            cv2.namedWindow("Detected", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("Detected", 480, 640)
            cv2.imshow("Detected", output)
            out.write(output)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        out.release()
        cap.release()
    cv2.destroyAllWindows()
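
To try the script, place a test file under imgs/ (single-image mode) or videos/ (video mode): the annotated image is written to outputs/, the annotated video to videos/outputs/ (the output directories must already exist, since OpenCV does not create them), and pressing q stops video playback early.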

Results

[Figure: COCO keypoint pose estimation visualization results]


  [1] 成功解决:将后缀.pyx格式文件(linux环境)编译成pyd文件(windows环境下)实现python编程加载或导入 (Solved: compiling a .pyx file from a Linux environment into a .pyd file for Windows so it can be loaded or imported in Python)

  [2] 详解Python文件: .py、.ipynb、.pyi、.pyc、.pyd ! (A detailed guide to Python file types: .py, .ipynb, .pyi, .pyc, .pyd)
