Visualizing Pose Estimation for Multiple Human Keypoint Formats with MediaPipe's Models
Background
Human pose estimation is an interesting task in computer vision (CV), and many excellent solutions have emerged over the years. Classic 2D human pose estimation solutions include:
- DeepPose: Human Pose Estimation via Deep Neural Networks
- Stacked Hourglass Networks for Human Pose Estimation
- Chained Predictions Using Convolutional Neural Networks
- Multi-Context Attention for Human Pose Estimation
- Learning Feature Pyramids for Human Pose Estimation
The open-source GitHub project Pytorch-Human-Pose-Estimation provides a unified training framework for these solutions on the MPII and COCO datasets.
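Note that the NMS code in this framework is adapted from the py-faster-rcnn project and is written in Cython. A .pyx file is Cython source that must be compiled into a platform-specific extension module before Python can import it. That build works directly on Linux, but on Windows the module is not loaded and used [1] (for an introduction to .pyx and other Python file types, see the article 《详解Python文件: .py、.ipynb、.pyi、.pyc、.pyd !》 [2]).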
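As a quick, hedged check of the point about compiled modules, the standard library can report which file suffixes the current interpreter accepts for extension modules. This is only an illustrative sketch; the helper name `loadable_extension_suffixes` is made up here and is not part of the framework.

```python
import importlib.machinery
import sys


def loadable_extension_suffixes():
    """Return the suffixes this interpreter accepts for compiled extension modules."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


if __name__ == "__main__":
    suffixes = loadable_extension_suffixes()
    print("platform:", sys.platform)
    print("extension suffixes:", suffixes)
    # Cython source is never importable as-is; it has to be compiled first
    print(".pyx importable directly:", ".pyx" in suffixes)
```

On Linux the list typically ends in `.so`, while on Windows it ends in `.pyd`, which is why a module built on one platform cannot simply be copied to the other.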
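In addition, training these solutions requires substantial computing resources, and when they are applied, visualization is slow, which makes real-time keypoint pose estimation on video difficult. If algorithmic innovation is not the goal, the human keypoint pose estimation models provided by MediaPipe are very attractive.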
Project Challenges
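MediaPipe and its models are not covered in detail here; interested readers can browse its official homepage and the detailed descriptions of each solution. This project focuses on applying those models to visualize pose estimation for multiple human keypoint formats.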
For pose estimation, MediaPipe has 33 built-in human keypoints, as shown in the figure below.
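MediaPipe defines its own landmark data type for keypoints rather than using a plain list or dictionary, and it wraps visualization in the draw_landmarks method. The main challenge of this project is therefore to understand that data type and visualization method, and to use them to visualize pose estimation for multiple keypoint formats.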
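To make the data type more concrete: each pose landmark carries normalized x and y coordinates (relative to image width and height), a z value, and a visibility score. The sketch below uses a plain dataclass, `FakeLandmark`, as a stand-in for MediaPipe's protobuf message so that it runs without MediaPipe installed. Both `FakeLandmark` and the helper `to_pixel` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeLandmark:
    """Stand-in with the same four fields as a MediaPipe pose landmark."""
    x: float           # normalized by image width
    y: float           # normalized by image height
    z: float           # depth value
    visibility: float  # likelihood that the landmark is visible


def to_pixel(landmark, image_width, image_height):
    """Convert normalized coordinates to integer pixel coordinates, clamped to the image."""
    px = min(int(landmark.x * image_width), image_width - 1)
    py = min(int(landmark.y * image_height), image_height - 1)
    return max(px, 0), max(py, 0)


nose = FakeLandmark(x=0.5, y=0.25, z=-0.1, visibility=0.99)
print(to_pixel(nose, image_width=640, image_height=480))  # (320, 120)
```

With the real library, the same field names (`x`, `y`, `z`, `visibility`) are read from each entry of `results.pose_landmarks.landmark`, as the complete code in Step 3 does.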
This project uses the human keypoints of the COCO dataset as an example to show how to visualize COCO-format keypoint pose estimation with MediaPipe.
Environment Setup
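This project runs in a Python environment with mediapipe=0.10.11 and opencv-python & opencv-contrib-python=4.10.0.84. opencv-contrib-python appears to be one of mediapipe's dependencies and is installed automatically along with mediapipe; opencv-python was installed only to match the opencv-contrib-python version and is not actually required.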
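A minimal sketch for checking which versions are actually installed, using only the standard library. The helper name `installed_version` is hypothetical; the package names are the ones listed above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("mediapipe", "opencv-python", "opencv-contrib-python"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the reported versions differ from the ones above, the code in this article may still work, but that has not been verified here.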
Step 1: Build the mapping between the two keypoint sets
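The COCO dataset has 17 human keypoints, as shown in the figure below.
Matching them against MediaPipe's built-in keypoints gives a mapping between the indices of the shared keypoints. The code is as follows: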
# Index of each MediaPipe keypoint, keyed by name
mediapipe_keypoints_mapping = {
    "nose": 0, "left_eye_inner": 1, "left_eye": 2, "left_eye_outer": 3, "right_eye_inner": 4, "right_eye": 5,
    "right_eye_outer": 6, "left_ear": 7, "right_ear": 8, "mouth_left": 9, "mouth_right": 10, "left_shoulder": 11,
    "right_shoulder": 12, "left_elbow": 13, "right_elbow": 14, "left_wrist": 15, "right_wrist": 16,
    "left_pinky": 17, "right_pinky": 18, "left_index": 19, "right_index": 20, "left_thumb": 21, "right_thumb": 22,
    "left_hip": 23, "right_hip": 24, "left_knee": 25, "right_knee": 26, "left_ankle": 27, "right_ankle": 28,
    "left_heel": 29, "right_heel": 30, "left_foot_index": 31, "right_foot_index": 32
}
# Name of each COCO keypoint, keyed by COCO index
coco_keypoints_mapping = {
    0: "nose", 1: "left_eye", 2: "right_eye", 3: "left_ear", 4: "right_ear", 5: "left_shoulder",
    6: "right_shoulder", 7: "left_elbow", 8: "right_elbow", 9: "left_wrist", 10: "right_wrist",
    11: "left_hip", 12: "right_hip", 13: "left_knee", 14: "right_knee", 15: "left_ankle", 16: "right_ankle"
}
# Build the COCO index -> MediaPipe indices mapping from the shared keypoint names
coco_to_mediapipe_mapping = {}
for coco_idx, coco_keypoint in coco_keypoints_mapping.items():
    if coco_keypoint in mediapipe_keypoints_mapping:
        # Each value is a list so that one target keypoint can draw on several MediaPipe keypoints
        mediapipe_idx = mediapipe_keypoints_mapping[coco_keypoint]
        coco_to_mediapipe_mapping[coco_idx] = [mediapipe_idx]
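The following sketch rebuilds the same mapping in a more compact form and prints the result, which makes it easy to confirm that all 17 COCO keypoints found a MediaPipe counterpart and to see which MediaPipe keypoints go unused. The variable names (`mediapipe_names`, `coco_names`, `coco_to_mp`, `unused`) are chosen for this illustration and do not appear in the original code.

```python
# MediaPipe keypoint names in index order (position in the list == MediaPipe index)
mediapipe_names = [
    "nose", "left_eye_inner", "left_eye", "left_eye_outer", "right_eye_inner", "right_eye",
    "right_eye_outer", "left_ear", "right_ear", "mouth_left", "mouth_right", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_pinky", "right_pinky", "left_index", "right_index", "left_thumb", "right_thumb",
    "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle",
    "left_heel", "right_heel", "left_foot_index", "right_foot_index",
]
# COCO keypoint names in index order
coco_names = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle",
]

name_to_mp = {name: idx for idx, name in enumerate(mediapipe_names)}
coco_to_mp = {i: [name_to_mp[n]] for i, n in enumerate(coco_names) if n in name_to_mp}

# MediaPipe indices that no COCO keypoint refers to
used = {idx for indices in coco_to_mp.values() for idx in indices}
unused = sorted(set(range(len(mediapipe_names))) - used)

print(coco_to_mp)
print("unused MediaPipe indices:", unused)
```

Running it shows, for example, that COCO index 11 (left_hip) maps to MediaPipe index 23, and that 16 of MediaPipe's 33 keypoints are not needed for the COCO layout.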
Step 2: Connect the mapped keypoints
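With the mapping between the two keypoint sets in place, the connections between the mapped keypoints must be defined so that MediaPipe can draw them. For example, among the COCO keypoints, keypoint 0 (nose) is connected to keypoint 1 (left_eye) and keypoint 2 (right_eye). The connections are defined as follows: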
coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11),
(6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})
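Because the connections are bare index pairs, a typo is easy to miss. The sketch below translates each pair back to keypoint names and rejects any index outside the 17 COCO keypoints. The helper `describe_connections` is a hypothetical name introduced for this check.

```python
coco_names = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle",
]
coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11),
                              (6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})


def describe_connections(connections, names):
    """Return (start_name, end_name) pairs, raising if an index is out of range."""
    described = []
    for start, end in sorted(connections):
        if not (0 <= start < len(names) and 0 <= end < len(names)):
            raise ValueError(f"connection ({start}, {end}) references an unknown keypoint")
        described.append((names[start], names[end]))
    return described


for start_name, end_name in describe_connections(coco_connections, coco_names):
    print(f"{start_name} -- {end_name}")
```

The indices here refer to positions in the mapped (COCO) landmark list, not to MediaPipe's original 33 indices.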
Step 3: Visualize using the official example
After the steps above, MediaPipe's models can be combined with the official example to visualize COCO-format keypoint pose estimation.
The complete code is as follows:
import os
import cv2
import mediapipe
from mediapipe.framework.formats import landmark_pb2
def detect_landmarks(detector, image, keypoints_map, connections_map, display, drawer, draw_style):
    # Run the model; MediaPipe expects RGB input, while OpenCV loads images as BGR
    results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    # Get the detected landmarks
    detected_landmarks = results.pose_landmarks
    # print(detected_landmarks)
    # Continue only if landmarks were detected
    if detected_landmarks:
        mapped_landmarks = landmark_pb2.NormalizedLandmarkList()
        # Landmarks are appended in the iteration order of keypoints_map,
        # so its keys must run 0, 1, 2, ... to match the connection indices
        for key, value in keypoints_map.items():
            mapped_landmark = mapped_landmarks.landmark.add()
            # Average all MediaPipe landmarks mapped to this target keypoint
            sources = [detected_landmarks.landmark[i] for i in value]
            mapped_landmark.x = sum(lm.x for lm in sources) / len(sources)
            mapped_landmark.y = sum(lm.y for lm in sources) / len(sources)
            mapped_landmark.z = sum(lm.z for lm in sources) / len(sources)
            mapped_landmark.visibility = sum(lm.visibility for lm in sources) / len(sources)
        # Debug: inspect MediaPipe's built-in connection set and its type
        # print(mediapipe.solutions.pose.POSE_CONNECTIONS)
        # print(type(mediapipe.solutions.pose.POSE_CONNECTIONS))
        if display:
            drawer.draw_landmarks(
                image=image,
                landmark_list=mapped_landmarks,
                connections=connections_map,
                landmark_drawing_spec=drawer.DrawingSpec(color=draw_style["landmark"]["color"],
                                                         thickness=draw_style["landmark"]["thickness"],
                                                         circle_radius=draw_style["landmark"]["circle_radius"]),
                connection_drawing_spec=drawer.DrawingSpec(color=draw_style["connection"]["color"],
                                                           thickness=draw_style["connection"]["thickness"])
            )
        return True, image, mapped_landmarks
    else:
        return False, image, []
if __name__ == "__main__":
    path = 'videos/2.mp4'
    # Inputs under "videos" are processed as video; anything else as a single static image
    static = True
    if os.path.split(path)[0] == "videos":
        static = False
    pose_detector = mediapipe.solutions.pose.Pose(static_image_mode=static,
                                                  model_complexity=1,
                                                  smooth_landmarks=True,
                                                  enable_segmentation=True,
                                                  smooth_segmentation=True,
                                                  min_detection_confidence=0.5,
                                                  min_tracking_confidence=0.5)
    drawer = mediapipe.solutions.drawing_utils
    # Colors are in BGR order because drawing happens on the OpenCV image
    draw_style = {"landmark": {"color": (255, 0, 0), "thickness": 2, "circle_radius": 5},
                  "connection": {"color": (0, 255, 0), "thickness": 2}}
    # Index of each MediaPipe keypoint, keyed by name
    mediapipe_keypoints_mapping = {
        "nose": 0, "left_eye_inner": 1, "left_eye": 2, "left_eye_outer": 3, "right_eye_inner": 4, "right_eye": 5,
        "right_eye_outer": 6, "left_ear": 7, "right_ear": 8, "mouth_left": 9, "mouth_right": 10, "left_shoulder": 11,
        "right_shoulder": 12, "left_elbow": 13, "right_elbow": 14, "left_wrist": 15, "right_wrist": 16,
        "left_pinky": 17, "right_pinky": 18, "left_index": 19, "right_index": 20, "left_thumb": 21, "right_thumb": 22,
        "left_hip": 23, "right_hip": 24, "left_knee": 25, "right_knee": 26, "left_ankle": 27, "right_ankle": 28,
        "left_heel": 29, "right_heel": 30, "left_foot_index": 31, "right_foot_index": 32
    }
    # Name of each COCO keypoint, keyed by COCO index
    coco_keypoints_mapping = {
        0: "nose", 1: "left_eye", 2: "right_eye", 3: "left_ear", 4: "right_ear", 5: "left_shoulder",
        6: "right_shoulder", 7: "left_elbow", 8: "right_elbow", 9: "left_wrist", 10: "right_wrist",
        11: "left_hip", 12: "right_hip", 13: "left_knee", 14: "right_knee", 15: "left_ankle", 16: "right_ankle"
    }
    # Build the COCO index -> MediaPipe indices mapping from the shared keypoint names
    coco_to_mediapipe_mapping = {}
    for coco_idx, coco_keypoint in coco_keypoints_mapping.items():
        if coco_keypoint in mediapipe_keypoints_mapping:
            mediapipe_idx = mediapipe_keypoints_mapping[coco_keypoint]
            coco_to_mediapipe_mapping[coco_idx] = [mediapipe_idx]
    # Print the result
    print("COCO index -> MediaPipe index mapping:")
    print(coco_to_mediapipe_mapping)
    coco_connections = frozenset({(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (6, 12), (12, 14), (14, 16), (12, 11),
                                  (6, 5), (2, 1), (0, 1), (1, 3), (3, 5), (5, 7), (7, 9), (5, 11), (11, 13), (13, 15)})
    if static:
        origin_image = cv2.imread(path)
        img = origin_image.copy()
        result, output, landmarks = detect_landmarks(pose_detector, img, coco_to_mediapipe_mapping, coco_connections,
                                                     True, drawer,
                                                     draw_style)
        cv2.namedWindow("Detected", cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Detected", 480, 640)
        cv2.imshow("Detected", output)
        cv2.imwrite(path.replace("imgs", "outputs"), output)
        cv2.waitKey(0)
    else:
        cap = cv2.VideoCapture(path)
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        # Save the annotated video
        out = cv2.VideoWriter(path.replace("videos", "videos/outputs"),
                              cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 10, (width, height))
        while True:
            ret, image = cap.read()
            if not ret:
                print("Video Over")
                break
            img = image.copy()
            result, output, landmarks = detect_landmarks(pose_detector, img, coco_to_mediapipe_mapping,
                                                         coco_connections, True,
                                                         drawer, draw_style)
            cv2.namedWindow("Detected", cv2.WINDOW_NORMAL)
            cv2.resizeWindow("Detected", 480, 640)
            cv2.imshow("Detected", output)
            out.write(output)
            # Press q to stop early
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cap.release()
        # Release the writer so the output file is finalized
        out.release()
    cv2.destroyAllWindows()
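The averaging step in detect_landmarks is what lets the same function serve other keypoint formats: a target keypoint with no direct MediaPipe counterpart can be mapped to several MediaPipe indices and is then placed at their mean. The sketch below illustrates this for a hypothetical "neck" keypoint, approximated as the midpoint of the two shoulders. It uses a plain dataclass (`FakeLandmark`) in place of MediaPipe's landmark type so that it runs without the library; `FakeLandmark`, `average_landmarks`, and the "neck" keypoint are all assumptions made for illustration and are not part of the COCO layout or the original code.

```python
from dataclasses import dataclass


@dataclass
class FakeLandmark:
    """Stand-in for a MediaPipe landmark; not the real protobuf type."""
    x: float
    y: float
    z: float
    visibility: float


def average_landmarks(landmarks, indices):
    """Average the x, y, z and visibility of the landmarks at the given indices."""
    sources = [landmarks[i] for i in indices]
    count = len(sources)
    return FakeLandmark(
        x=sum(lm.x for lm in sources) / count,
        y=sum(lm.y for lm in sources) / count,
        z=sum(lm.z for lm in sources) / count,
        visibility=sum(lm.visibility for lm in sources) / count,
    )


# Only the two shoulder landmarks are needed here, keyed by their MediaPipe indices
detected = {
    11: FakeLandmark(x=0.25, y=0.5, z=0.0, visibility=1.0),  # left_shoulder
    12: FakeLandmark(x=0.75, y=0.5, z=0.0, visibility=0.5),  # right_shoulder
}
neck = average_landmarks(detected, [11, 12])
print(neck)
```

In terms of the mapping dictionary from Step 1, such a keypoint would simply have a two-element list, for example `[11, 12]`, as its value.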