Visualizing Face and Hand Keypoint Detection with MediaPipe and a Pretrained YOLO Model

Project homepage - ZiTai_YOLOV11: a real-time detection application built on MediaPipe and a pretrained YOLOv11 model. It connects to a camera, captures frames, and performs face detection, gesture recognition, and skeletal keypoint detection on the fly, rendering the results directly on screen for fast, intuitive visual analysis. - GitCode

I. Technical Principles

  1. How MediaPipe works: MediaPipe is a cross-platform machine learning framework developed by Google that performs well at face and hand keypoint detection. It processes input images or video streams through a pipeline of machine learning models. For faces, MediaPipe first uses a pretrained face detection model to quickly and accurately locate faces in the image; a face landmark model, trained with deep learning on facial features, then predicts the coordinates of individual keypoints around the eyes, nose, mouth, and cheeks, and together these landmarks describe the shape and pose of the face. For hands, MediaPipe likewise relies on pretrained models to analyze hand image features, locate each hand joint, and determine the hand's pose and keypoint positions, which is what makes gesture interpretation possible.
  2. How YOLO works: YOLO (You Only Look Once) is a popular family of real-time object detection algorithms. Its core idea is to divide the input image into a grid of cells, with each cell responsible for predicting object classes, positions, and sizes. For face and hand detection, a YOLO model is trained on large datasets containing faces and hands so that it learns their feature patterns; given a new image, it can quickly decide whether faces or hands are present and predict their bounding boxes. To obtain keypoints as well, the detected face and hand regions can be handed to a separate deep-learning keypoint detector, which predicts keypoint coordinates inside each region (see the sketch below).
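The following is a minimal sketch of this two-stage idea, not the project's actual pipeline: a YOLO model proposes boxes and MediaPipe estimates hand landmarks inside each cropped region. The weights file name yolo11n.pt, the input file example.jpg, and the cropping logic are illustrative assumptions only.

import cv2
import mediapipe as mp
from ultralytics import YOLO

# Stage 1: a YOLO model proposes candidate regions (yolo11n.pt is a stock Ultralytics checkpoint).
detector = YOLO("yolo11n.pt")
# Stage 2: MediaPipe Hands estimates 21 landmarks inside each cropped region.
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

image = cv2.imread("example.jpg")  # any BGR image containing a hand
for box in detector.predict(source=image, verbose=False)[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    crop = image[y1:y2, x1:x2]
    # MediaPipe expects RGB input, while OpenCV loads images as BGR
    result = hands.process(cv2.cvtColor(crop, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        print("Hand landmarks found inside YOLO box:", (x1, y1, x2, y2))
hands.close()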

II. Implementation Steps

1. Environment Setup

        1. Use Anaconda to create a virtual environment (Python 3.8 or later is recommended; the command below uses 3.8)

conda create -n ZiTai python=3.8

        2. Activate the virtual environment you just created

conda activate ZiTai

        3. Install PyTorch (the command below installs the CUDA 11.8 build; pick the command matching your platform from the PyTorch website if needed)

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

        4. Install the remaining required libraries (opencv-python, mediapipe, ultralytics)

pip install opencv-python mediapipe ultralytics

        5. Select the virtual environment as the project interpreter in PyCharm

 

        6. The environment setup is now complete.
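As an optional sanity check (the version numbers printed will simply be whatever pip installed above), you can confirm that the key libraries import and that PyTorch can see the GPU:

import cv2
import mediapipe as mp
import torch
from ultralytics import YOLO

print("OpenCV:", cv2.__version__)
print("MediaPipe:", mp.__version__)
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Ultralytics YOLO class imported:", YOLO.__name__)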

2. Code Implementation

        1. Import the required libraries

import cv2
import mediapipe as mp
from ultralytics import YOLO
  • cv2: the OpenCV library, used for computer vision tasks such as reading camera frames, drawing shapes, and displaying images.
  • mediapipe: Google's cross-platform framework for building multimodal machine learning applications, used here for hand and face keypoint detection.
  • YOLO: from the ultralytics library, used to load and run a YOLO (You Only Look Once) object detection model.

        2. Define constants

YOLO_MODEL_PATH = "GesTure.pt"
GESTURE_BASE_DISTANCE = 0.3
HAND_MAX_NUM = 2
HAND_MIN_DETECTION_CONFIDENCE = 0.5
HAND_MIN_TRACKING_CONFIDENCE = 0.5
FACE_MAX_NUM = 1
FACE_MIN_DETECTION_CONFIDENCE = 0.5
FACE_MIN_TRACKING_CONFIDENCE = 0.5
TEXT_FONT = cv2.FONT_HERSHEY_SIMPLEX
TEXT_SCALE = 1
TEXT_COLOR = (0, 0, 255)
TEXT_THICKNESS = 2
TEXT_POSITION = (50, 50)
EXIT_KEY = ord('q')
HAND_DRAWING_SPEC_1 = mp.solutions.drawing_utils.DrawingSpec(color=(255, 0, 0), thickness=2, circle_radius=2)
HAND_DRAWING_SPEC_2 = mp.solutions.drawing_utils.DrawingSpec(color=(0, 255, 0), thickness=2)
FACE_DRAWING_SPEC_1 = mp.solutions.drawing_utils.DrawingSpec(color=(0, 255, 0), thickness=1, circle_radius=1)
FACE_DRAWING_SPEC_2 = mp.solutions.drawing_utils.DrawingSpec(color=(0, 255, 255), thickness=1)
  • These constants define the model path, detection parameters, text display style, exit key, and the drawing specs for hand and face landmarks. GESTURE_BASE_DISTANCE is a threshold applied to distances between normalized (0-1) hand landmark coordinates.

        3. Load the YOLO model

def load_yolo_model(model_path):
    try:
        return YOLO(model_path)
    except Exception as e:
        print(f"Failed to load YOLO model: {e}")
        raise
  • This function tries to load the YOLO model from the given path; if loading fails, it prints the error message and re-raises the exception.
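A minimal usage example (illustrative only: GesTure.pt is this project's custom gesture weights, and falling back to a stock Ultralytics checkpoint such as yolo11n.pt is purely an assumption for when the custom file is missing):

try:
    model = load_yolo_model("GesTure.pt")   # the project's gesture weights
except Exception:
    model = load_yolo_model("yolo11n.pt")   # hypothetical fallback to a stock Ultralytics model
print(model.names)                          # class names the loaded model can detect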

        4. Camera capture class

class CameraCapture:
    def __init__(self):
        self.cap = cv2.VideoCapture(0)
        if not self.cap.isOpened():
            print("Failed to open camera")
            raise Exception("Camera could not be opened.")

    def __enter__(self):
        return self.cap

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.cap.isOpened():
            self.cap.release()
  • __init__: opens the default camera (index 0); if it cannot be opened, prints an error and raises an exception.
  • __enter__: context manager entry; returns the cv2.VideoCapture object.
  • __exit__: context manager exit; releases the camera resource.

        5. Compute the distance between two points

def distance(m, n):
    return ((n.x - m.x) ** 2 + (n.y - m.y) ** 2) ** 0.5
  • This function takes two points m and n (each with x and y attributes) and returns the Euclidean distance between them.
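A tiny worked example (the SimpleNamespace stand-ins below are purely illustrative; in the real program the points are MediaPipe landmarks, whose x and y values are normalized to the 0-1 range):

from types import SimpleNamespace

m = SimpleNamespace(x=0.10, y=0.20)
n = SimpleNamespace(x=0.40, y=0.60)
print(distance(m, n))  # sqrt(0.3**2 + 0.4**2) = 0.5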

        6. Gesture detection function

def detect_gesture(handLms):
    distance_0_8 = distance(handLms.landmark[0], handLms.landmark[8])
    distance_0_12 = distance(handLms.landmark[0], handLms.landmark[12])
    distance_0_16 = distance(handLms.landmark[0], handLms.landmark[16])
    distance_0_20 = distance(handLms.landmark[0], handLms.landmark[20])

    gesture = "One"
    if distance_0_8 >= GESTURE_BASE_DISTANCE and distance_0_12 >= GESTURE_BASE_DISTANCE and \
            distance_0_16 < GESTURE_BASE_DISTANCE and distance_0_20 < GESTURE_BASE_DISTANCE:
        gesture = "Scissor"
    elif distance_0_8 >= GESTURE_BASE_DISTANCE and distance_0_12 >= GESTURE_BASE_DISTANCE and \
            distance_0_16 >= GESTURE_BASE_DISTANCE and distance_0_20 >= GESTURE_BASE_DISTANCE:
        gesture = "Paper"
    elif distance_0_8 < GESTURE_BASE_DISTANCE and distance_0_12 < GESTURE_BASE_DISTANCE and \
            distance_0_16 < GESTURE_BASE_DISTANCE and distance_0_20 < GESTURE_BASE_DISTANCE:
        gesture = "Rock"
    return gesture
  • This function classifies the gesture from the distances between the wrist (landmark 0) and the four fingertips (landmarks 8, 12, 16, 20), comparing each distance against GESTURE_BASE_DISTANCE, and returns the gesture name: "Scissor", "Paper", "Rock", or the default "One".
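A quick way to sanity-check the thresholds; the mock hand below is an assumption built with SimpleNamespace rather than real MediaPipe output, and only the landmark indices read by detect_gesture carry meaningful values:

from types import SimpleNamespace

def mock_hand(fingertip_dist):
    # Fake hand: all four fingertips sit fingertip_dist away from the wrist at landmark 0
    lms = [SimpleNamespace(x=0.0, y=0.0) for _ in range(21)]
    for idx in (8, 12, 16, 20):
        lms[idx] = SimpleNamespace(x=fingertip_dist, y=0.0)
    return SimpleNamespace(landmark=lms)

print(detect_gesture(mock_hand(0.4)))  # all fingertips far from the wrist -> "Paper"
print(detect_gesture(mock_hand(0.1)))  # all fingertips close to the wrist -> "Rock"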

        7. Face mesh detection function

def face_mesh_detection(image, face_mesh, mp_drawing, mp_face_mesh):
    # MediaPipe Face Mesh expects RGB input, while OpenCV frames are BGR
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        for face_landmarks in results.multi_face_landmarks:
            mp_drawing.draw_landmarks(
                image, face_landmarks, mp_face_mesh.FACEMESH_CONTOURS,
                FACE_DRAWING_SPEC_1,
                FACE_DRAWING_SPEC_2
            )
    return image
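  • This function runs MediaPipe Face Mesh on the frame and, if any face landmarks are found, draws the FACEMESH_CONTOURS connections onto the image using the two face drawing specs, then returns the annotated image.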

        8. Main function

def main():
    try:
        model = load_yolo_model(YOLO_MODEL_PATH)
        myDraw = mp.solutions.drawing_utils
        mpHands = mp.solutions.hands
        hands = mpHands.Hands(
            static_image_mode=False,
            max_num_hands=HAND_MAX_NUM,
            min_detection_confidence=HAND_MIN_DETECTION_CONFIDENCE,
            min_tracking_confidence=HAND_MIN_TRACKING_CONFIDENCE
        )
        mp_face_mesh = mp.solutions.face_mesh
        face_mesh = mp_face_mesh.FaceMesh(
            static_image_mode=False,
            max_num_faces=FACE_MAX_NUM,
            min_detection_confidence=FACE_MIN_DETECTION_CONFIDENCE,
            min_tracking_confidence=FACE_MIN_TRACKING_CONFIDENCE
        )
        mp_drawing = mp.solutions.drawing_utils

        with CameraCapture() as cap:
            while True:
                success, frame = cap.read()
                if not success:
                    print("Failed to read frame from camera.")
                    break

                # device=0 runs inference on the first CUDA GPU (a CPU fallback is sketched after this section)
                results = model.predict(source=frame, device=0)
                annotated_frame = results[0].plot(line_width=2)

                for result in results:
                    boxes = result.boxes
                    for box in boxes:
                        cls = int(box.cls[0])
                        conf = float(box.conf[0])
                        x1, y1, x2, y2 = map(int, box.xyxy[0])
                        label = f"{result.names[cls]} {conf:.2f}"
                        cv2.putText(annotated_frame, label, (x1, y1 - 10), TEXT_FONT, 0.5, TEXT_COLOR, 1)
                # MediaPipe Hands expects RGB input, while OpenCV frames are BGR
                results_hands = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if results_hands.multi_hand_landmarks:
                    for handLms in results_hands.multi_hand_landmarks:
                        gesture = detect_gesture(handLms)
                        cv2.putText(annotated_frame, gesture, TEXT_POSITION, TEXT_FONT, TEXT_SCALE, TEXT_COLOR, TEXT_THICKNESS)
                        myDraw.draw_landmarks(
                            annotated_frame, handLms, mpHands.HAND_CONNECTIONS,
                            HAND_DRAWING_SPEC_1,
                            HAND_DRAWING_SPEC_2
                        )
                annotated_frame = face_mesh_detection(annotated_frame, face_mesh, mp_drawing, mp_face_mesh)
                cv2.imshow('Combined Detection', annotated_frame)

                # Exit the loop when the configured key is pressed
                if cv2.waitKey(1) & 0xFF == EXIT_KEY:
                    break

    except Exception as e:
        import traceback
        print(f"An error occurred: {e}")
        traceback.print_exc()
    finally:
        cv2.destroyAllWindows()
        if 'hands' in locals():
            hands.close()
        if 'face_mesh' in locals():
            face_mesh.close()
  • Load the YOLO model and initialize the hand and face detection objects.
  • Open the camera with the CameraCapture context manager.
  • In a loop, read frames from the camera and run object detection, gesture detection, and face mesh detection.
  • Draw the detection results on the frame and display the processed frame.
  • Exit the loop when the configured key (q) is pressed.
  • Catch and report any exception, then release resources and close all windows in the finally block.
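Note that model.predict(source=frame, device=0) assumes a CUDA-capable GPU. A minimal sketch of a CPU fallback, offered as an optional adjustment rather than part of the original code (model and frame are the variables already defined in main):

import torch

# Pick the first CUDA GPU when available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else "cpu"
results = model.predict(source=frame, device=device)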

        9. Program entry point

if __name__ == "__main__":
    main()
  • Ensures that main() is called when the file is run directly as a script.

3. Complete code

import cv2
import mediapipe as mp
from ultralytics import YOLO

YOLO_MODEL_PATH = "GesTure.pt"
GESTURE_BASE_DISTANCE = 0.3
HAND_MAX_NUM = 2
HAND_MIN_DETECTION_CONFIDENCE = 0.5
HAND_MIN_TRACKING_CONFIDENCE = 0.5
FACE_MAX_NUM = 1
FACE_MIN_DETECTION_CONFIDENCE = 0.5
FACE_MIN_TRACKING_CONFIDENCE = 0.5
TEXT_FONT = cv2.FONT_HERSHEY_SIMPLEX
TEXT_SCALE = 1
TEXT_COLOR = (0, 0, 255)
TEXT_THICKNESS = 2
TEXT_POSITION = (50, 50)
EXIT_KEY = ord('q')
HAND_DRAWING_SPEC_1 = mp.solutions.drawing_utils.DrawingSpec(color=(255, 0, 0), thickness=2, circle_radius=2)
HAND_DRAWING_SPEC_2 = mp.solutions.drawing_utils.DrawingSpec(color=(0, 255, 0), thickness=2)
FACE_DRAWING_SPEC_1 = mp.solutions.drawing_utils.DrawingSpec(color=(0, 255, 0), thickness=1, circle_radius=1)
FACE_DRAWING_SPEC_2 = mp.solutions.drawing_utils.DrawingSpec(color=(0, 255, 255), thickness=1)

def load_yolo_model(model_path):
    try:
        return YOLO(model_path)
    except Exception as e:
        print(f"Failed to load YOLO model: {e}")
        raise

class CameraCapture:
    def __init__(self):
        self.cap = cv2.VideoCapture(0)
        if not self.cap.isOpened():
            print("Failed to open camera")
            raise Exception("Camera could not be opened.")

    def __enter__(self):
        return self.cap

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.cap.isOpened():
            self.cap.release()

def distance(m, n):
    return ((n.x - m.x) ** 2 + (n.y - m.y) ** 2) ** 0.5

def detect_gesture(handLms):
    distance_0_8 = distance(handLms.landmark[0], handLms.landmark[8])
    distance_0_12 = distance(handLms.landmark[0], handLms.landmark[12])
    distance_0_16 = distance(handLms.landmark[0], handLms.landmark[16])
    distance_0_20 = distance(handLms.landmark[0], handLms.landmark[20])

    gesture = "One"
    if distance_0_8 >= GESTURE_BASE_DISTANCE and distance_0_12 >= GESTURE_BASE_DISTANCE and \
            distance_0_16 < GESTURE_BASE_DISTANCE and distance_0_20 < GESTURE_BASE_DISTANCE:
        gesture = "Scissor"
    elif distance_0_8 >= GESTURE_BASE_DISTANCE and distance_0_12 >= GESTURE_BASE_DISTANCE and \
            distance_0_16 >= GESTURE_BASE_DISTANCE and distance_0_20 >= GESTURE_BASE_DISTANCE:
        gesture = "Paper"
    elif distance_0_8 < GESTURE_BASE_DISTANCE and distance_0_12 < GESTURE_BASE_DISTANCE and \
            distance_0_16 < GESTURE_BASE_DISTANCE and distance_0_20 < GESTURE_BASE_DISTANCE:
        gesture = "Rock"
    return gesture

def face_mesh_detection(image, face_mesh, mp_drawing, mp_face_mesh):
    # MediaPipe Face Mesh expects RGB input, while OpenCV frames are BGR
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        for face_landmarks in results.multi_face_landmarks:
            mp_drawing.draw_landmarks(
                image, face_landmarks, mp_face_mesh.FACEMESH_CONTOURS,
                FACE_DRAWING_SPEC_1,
                FACE_DRAWING_SPEC_2
            )
    return image

def main():
    try:
        model = load_yolo_model(YOLO_MODEL_PATH)
        myDraw = mp.solutions.drawing_utils
        mpHands = mp.solutions.hands
        hands = mpHands.Hands(
            static_image_mode=False,
            max_num_hands=HAND_MAX_NUM,
            min_detection_confidence=HAND_MIN_DETECTION_CONFIDENCE,
            min_tracking_confidence=HAND_MIN_TRACKING_CONFIDENCE
        )
        mp_face_mesh = mp.solutions.face_mesh
        face_mesh = mp_face_mesh.FaceMesh(
            static_image_mode=False,
            max_num_faces=FACE_MAX_NUM,
            min_detection_confidence=FACE_MIN_DETECTION_CONFIDENCE,
            min_tracking_confidence=FACE_MIN_TRACKING_CONFIDENCE
        )
        mp_drawing = mp.solutions.drawing_utils

        with CameraCapture() as cap:
            while True:
                success, frame = cap.read()
                if not success:
                    print("Failed to read frame from camera.")
                    break

                # device=0 runs inference on the first CUDA GPU; use device="cpu" if no GPU is available
                results = model.predict(source=frame, device=0)
                annotated_frame = results[0].plot(line_width=2)

                for result in results:
                    boxes = result.boxes
                    for box in boxes:
                        cls = int(box.cls[0])
                        conf = float(box.conf[0])
                        x1, y1, x2, y2 = map(int, box.xyxy[0])
                        label = f"{result.names[cls]} {conf:.2f}"
                        cv2.putText(annotated_frame, label, (x1, y1 - 10), TEXT_FONT, 0.5, TEXT_COLOR, 1)
                # MediaPipe Hands expects RGB input, while OpenCV frames are BGR
                results_hands = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if results_hands.multi_hand_landmarks:
                    for handLms in results_hands.multi_hand_landmarks:
                        gesture = detect_gesture(handLms)
                        cv2.putText(annotated_frame, gesture, TEXT_POSITION, TEXT_FONT, TEXT_SCALE, TEXT_COLOR, TEXT_THICKNESS)
                        myDraw.draw_landmarks(
                            annotated_frame, handLms, mpHands.HAND_CONNECTIONS,
                            HAND_DRAWING_SPEC_1,
                            HAND_DRAWING_SPEC_2
                        )
                annotated_frame = face_mesh_detection(annotated_frame, face_mesh, mp_drawing, mp_face_mesh)
                cv2.imshow('Combined Detection', annotated_frame)

                # Exit the loop when the configured key is pressed
                if cv2.waitKey(1) & 0xFF == EXIT_KEY:
                    break

    except Exception as e:
        import traceback
        print(f"An error occurred: {e}")
        traceback.print_exc()
    finally:
        cv2.destroyAllWindows()
        if 'hands' in locals():
            hands.close()
        if 'face_mesh' in locals():
            face_mesh.close()

if __name__ == "__main__":
    main()

III. Summary

        The author has uploaded the source code and the pretrained model file to the GitCode repository linked at the top of this article. If you have any questions, please leave them in the comments; thank you for reading.

The main components of OpenCV (Open Source Computer Vision Library) are:

1. Basic library: a set of image and video processing functions such as pixel operations, color-space conversion, geometric transforms, and filters.
2. Core modules: the core computer vision algorithms, such as feature detection (SIFT, SURF), template matching, and object tracking.
3. Camera interface: lets users capture video from various sources such as cameras, video files, or network streams.
4. Machine learning: not the core of OpenCV, but it does include machine learning tools, for example the DNN deep learning module for running pretrained convolutional neural network models.

MediaPipe is a higher-level framework with a more complex structure:

1. Data pipelines: MediaPipe is built around a dataflow concept that lets developers chain modules into a processing graph, where each module can handle audio, video, sensor data, and other data types.
2. Pretrained models: it ships with many pretrained computer vision and machine learning models, such as body keypoint detection, face recognition, and hand tracking, ready for direct use.
3. Modular components: independent modules such as pose estimation, text recognition, and speech recognition, which developers can pick and chain as needed.
4. Cross-platform support: MediaPipe runs on Android, iOS, Windows, Linux, and other platforms.

Both help developers process multimedia data, but their emphasis differs: OpenCV is more fundamental and suits low-level image processing, while MediaPipe leans toward building complete applications with a higher level of abstraction and ease of use.