[OpenCV] Controlling Computer Volume with Hand Gesture Recognition

FanHua3000

Last modified 2024-09-08 18:59:41

Category: Computer Vision. Tags: opencv, artificial intelligence, computer vision, python

First published 2024-09-02 02:41:45

Article link: https://blog.csdn.net/Eden_Hazard7/article/details/141792399

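I found some sample code and course material on face recognition and gesture recognition on the Advance Computer Vision with Python - Computer Vision Zone website. This post records my learning process and results. I am new to Python, so the write-up is fairly basic.

The code is mostly pipeline code; the core functionality is already implemented by the imported libraries. The program consists of two files.

[Note] This post was written on 2024-08-30. I updated the function calls from the original code that the libraries no longer support; as of that date, the code can be copied and run as-is.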

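I. HandTrackingModule.py

1. Import libraries

cv2: the OpenCV library, used for computer vision tasks.

mediapipe: provides pretrained models, including hand and face detection and tracking.

import cv2 # OpenCV, used for computer vision tasks
import mediapipe as mp # pretrained models, including hand and face detection
import math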

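2. Define the HandDetector class, which wraps the hand detection functionality, and its initializer

mode: boolean; whether to use static image mode;

maxHands: maximum number of hands to detect;

detectionCon, trackCon: confidence thresholds for detection and tracking;

mpDraw: MediaPipe drawing utilities, used to draw the hand landmarks and their connections;

tipIds: list of fingertip landmark ids: thumb 4, index finger 8, middle finger 12, ring finger 16, little finger 20.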

class HandDetector:
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, trackCon=0.5):
        self.mode = mode # boolean; whether to use static image mode
        self.maxHands = maxHands # maximum number of hands to detect
        self.detectionCon = float(detectionCon) # detection confidence threshold
        self.trackCon = float(trackCon) # tracking confidence threshold
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(
            static_image_mode=self.mode,
            max_num_hands=self.maxHands,
            min_detection_confidence=self.detectionCon,
            min_tracking_confidence=self.trackCon
        )
        self.mpDraw = mp.solutions.drawing_utils # draws the hand landmarks and connections
        self.tipIds = [4, 8, 12, 16, 20] # fingertip landmark ids
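
The fingertip ids follow from MediaPipe's hand landmark numbering: landmark 0 is the wrist, and each finger then takes four consecutive ids from base to tip. A small sketch of that numbering is shown here; the names `FINGERS` and `landmark_ids` are illustrative and not part of the module.

```python
FINGERS = ["thumb", "index", "middle", "ring", "little"]


def landmark_ids(finger):
    """Return the four landmark ids of a finger, ordered from base to tip."""
    start = 1 + 4 * FINGERS.index(finger)  # id 0 is the wrist
    return list(range(start, start + 4))


tip_ids = [landmark_ids(f)[-1] for f in FINGERS]
print(tip_ids)                # [4, 8, 12, 16, 20]
print(landmark_ids("index"))  # [5, 6, 7, 8]
```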

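3. Detect hands

imgRGB: the image converted from BGR to RGB;

        In the BGR color space, the color information of an image is stored in blue, green, red channel order. In other words, BGR is RGB with the order reversed. In each pixel, the first channel is blue, the second is green, and the third is red.

        OpenCV uses BGR as its default channel order because it is compatible with how some image file formats (such as BMP) and image data are stored. MediaPipe expects RGB input, which is why the conversion is needed.

        Reversing the order of the BGR channels gives the RGB format.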

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # convert the image from BGR to RGB
        self.results = self.hands.process(imgRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img
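
To make the channel reversal concrete, here is a minimal sketch in plain Python. The helper name `bgr_to_rgb` and the tiny nested-list image are illustrative assumptions; in the module itself the conversion is done by `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a nested-list image.

    `image` is a list of rows, each row a list of (b, g, r) tuples.
    Illustration only: real camera frames are NumPy arrays.
    """
    return [[(r, g, b) for (b, g, r) in row] for row in image]


# A 1x2 "image": one pure-blue pixel and one pure-red pixel, in BGR order.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)  # [[(0, 0, 255), (255, 0, 0)]]
```

For a NumPy frame, the slice `img[:, :, ::-1]` performs the same reversal.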

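4. Get the positions of the hand landmarks

bbox: the bounding box of the hand.

self.lmList: stores the id and coordinates of each landmark.

(1) If a hand is detected, use handNo to select the landmark data of that hand;

(2) Iterate over the landmarks in myHand.landmark and convert each position to pixel coordinates in the image;

(3) Store the id and coordinates of each landmark in self.lmList;

(4) If draw is True, draw a circle on the image at each landmark;

(5) If xList and yList contain data, compute the minimum and maximum coordinates of the hand's bounding box;

(6) If draw is True, draw the hand's bounding box on the image;

(7) Return self.lmList and bbox, which are the landmark list and the hand's bounding box.

This is hard to picture from text alone; the original post shows an illustrative image here.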

    def findPosition(self, img, handNo=0, draw=True):
        xList = []
        yList = []
        bbox = [] # bounding box of the hand
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark): # convert each normalized landmark to pixel coordinates
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                xList.append(cx)
                yList.append(cy)
                self.lmList.append([id, cx, cy]) # store the id and pixel coordinates of each landmark
                if draw: # draw a circle at each landmark
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)
            if xList and yList: # compute the minimum and maximum coordinates of the bounding box
                xmin, xmax = min(xList), max(xList)
                ymin, ymax = min(yList), max(yList)
                bbox = xmin, ymin, xmax, ymax

                if draw: # draw the bounding box with a 20-pixel margin
                    cv2.rectangle(img, (bbox[0] - 20, bbox[1] - 20),
                                  (bbox[2] + 20, bbox[3] + 20), (0, 255, 0), 2)
        return self.lmList, bbox # the landmark list and the hand's bounding box
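
The coordinate conversion and bounding-box steps can be tried without a camera. The sketch below uses made-up normalized landmarks on an assumed 640x480 frame; the function names `landmarks_to_pixels` and `bounding_box` are hypothetical and only mirror what findPosition does internally.

```python
def landmarks_to_pixels(normalized, width, height):
    """Convert (x, y) landmarks in the 0..1 range to [id, px, py] entries."""
    return [[i, int(x * width), int(y * height)]
            for i, (x, y) in enumerate(normalized)]


def bounding_box(lm_list):
    """Return (xmin, ymin, xmax, ymax) over [id, px, py] entries, or () if empty."""
    if not lm_list:
        return ()
    xs = [p[1] for p in lm_list]
    ys = [p[2] for p in lm_list]
    return min(xs), min(ys), max(xs), max(ys)


# Three made-up landmarks on a 640x480 frame.
points = [(0.25, 0.5), (0.5, 0.25), (0.75, 0.75)]
lm_list = landmarks_to_pixels(points, width=640, height=480)
print(lm_list)                # [[0, 160, 240], [1, 320, 120], [2, 480, 360]]
print(bounding_box(lm_list))  # (160, 120, 480, 360)
```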

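5. Determine which fingers are raised

fingers: stores whether each finger is raised;

Two kinds of checks are used: one for the thumb and one for the other four fingers. For the thumb, the x value of the tip is compared with the x value of the joint just below it (landmark 3). For the other four fingers, the y value of the tip is compared with the y value of the joint two landmarks below it. Taking the palm of a left hand as an example: a thumb tip x greater than the joint's x means the thumb is extended, and an index fingertip y greater than the joint's y means the finger is not raised (image y increases downward).

Returns a list with the state of each finger (1 means raised, 0 means closed).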

    def fingersUp(self):
        fingers = []
        # Thumb
        if len(self.lmList) > self.tipIds[0] and len(self.lmList) > self.tipIds[0] - 1:
            if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
                fingers.append(1)
            else:
                fingers.append(0)
        else:
            fingers.append(0)
        # 4 Fingers
        for id in range(1, 5):
            if len(self.lmList) > self.tipIds[id] and len(self.lmList) > self.tipIds[id] - 2:
                if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            else:
                fingers.append(0)
        return fingers
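
The same comparison rules can be exercised on a synthetic landmark list. This is a simplified sketch, not the module's method: `fingers_up` is a hypothetical standalone function, and it returns all zeros when fewer than 21 landmarks are present instead of checking each finger separately.

```python
TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, little finger


def fingers_up(lm_list):
    """Return five 0/1 flags from a list of [id, x, y] landmark entries."""
    if len(lm_list) < 21:
        return [0] * 5
    # Thumb: compare x of the tip (4) with x of the joint below it (3).
    flags = [1 if lm_list[4][1] > lm_list[3][1] else 0]
    # Other fingers: the tip counts as raised when its y is smaller than
    # that of the joint two landmarks below it (image y grows downward).
    for tip in TIP_IDS[1:]:
        flags.append(1 if lm_list[tip][2] < lm_list[tip - 2][2] else 0)
    return flags


# Made-up hand: thumb, index and middle raised; ring and little finger folded.
hand = [[i, 100, 200] for i in range(21)]
hand[3][1], hand[4][1] = 120, 150
hand[6][2], hand[8][2] = 120, 50
hand[10][2], hand[12][2] = 120, 60
hand[14][2], hand[16][2] = 150, 180
hand[18][2], hand[20][2] = 160, 190
print(fingers_up(hand))  # [1, 1, 1, 0, 0]
```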

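6. Compute the distance between two landmarks

math.hypot: computes the distance between two points, i.e. sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2);

//: the floor division operator, which keeps the midpoint coordinates as integers

(1) If self.lmList contains both landmarks, get their coordinates and compute the midpoint (cx, cy);

(2) Draw a circle at each of the two landmarks;

(3) Draw a line connecting the two landmarks;

(4) Draw a circle at the midpoint to mark the center between the two points;

(5) Compute the straight-line distance between the two points (this distance is what later gets mapped to the volume level on the volume bar);

(6) If self.lmList does not contain the landmark for p1 or p2, return 0 as the distance, the original image img, and an empty list [].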

    def findDistance(self, p1, p2, img, draw=True):
        if len(self.lmList) > max(p1, p2):
            x1, y1 = self.lmList[p1][1], self.lmList[p1][2]
            x2, y2 = self.lmList[p2][1], self.lmList[p2][2]
            cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
            if draw:
                cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
                cv2.circle(img, (x2, y2), 15, (255, 0, 255), cv2.FILLED)
                cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
            length = math.hypot(x2 - x1, y2 - y1)
            return length, img, [x1, y1, x2, y2, cx, cy]
        else:
            return 0, img, []
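
As a quick check of the arithmetic, here is a minimal sketch with two made-up pixel positions. The helper name `distance_and_midpoint` is illustrative; it leaves out the drawing and only reproduces the math used in findDistance.

```python
import math


def distance_and_midpoint(p1, p2):
    """Return the Euclidean distance and integer midpoint of two pixel points."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    midpoint = ((x1 + x2) // 2, (y1 + y2) // 2)  # floor division keeps ints
    return length, midpoint


length, mid = distance_and_midpoint((100, 100), (400, 500))
print(length, mid)  # 500.0 (250, 300)
```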

That completes the first file. Running it:

It works. This file implements the basic functionality, and the next file simply calls it.

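II. VolumeHandControl.py

1. Import libraries

time: the time library, used to compute the frame rate;

cast, POINTER: type conversion;

CLSCTX_ALL: the context flag passed when activating the volume control interface;

AudioUtilities, IAudioEndpointVolume: the audio control utilities and interface.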

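import cv2 # image processing and display
import time # used to compute the frame rate
import numpy as np # numerical computing and interpolation
import HandTrackingModule as htm # the custom hand tracking module
import math # distance computation
from ctypes import cast, POINTER # type conversion
from comtypes import CLSCTX_ALL # context flag for activating the audio interface
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume # audio control utilities and interface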

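2. Initialize the camera and the volume control

wCam, hCam: the requested camera width and height;

cap = cv2.VideoCapture(0): initializes the camera capture object; index 0 is the laptop's built-in camera;

cap.set(): sets the camera resolution;

AudioUtilities.GetSpeakers(): gets the audio output device (the speakers);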

################################
wCam, hCam = 640, 480
################################
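cap = cv2.VideoCapture(0)
cap.set(3, wCam) # property 3 is cv2.CAP_PROP_FRAME_WIDTH
cap.set(4, hCam) # property 4 is cv2.CAP_PROP_FRAME_HEIGHT
pTime = 0
detector = htm.HandDetector(detectionCon=0.7) # create a HandDetector with a detection confidence of 0.7
devices = AudioUtilities.GetSpeakers()
interface = devices.Activate(
    IAudioEndpointVolume._iid_, CLSCTX_ALL, None) # activate the audio interface
volume = cast(interface, POINTER(IAudioEndpointVolume)) # cast to an IAudioEndpointVolume pointer so the volume can be controlled
volRange = volume.GetVolumeRange() # get the volume range
minVol = volRange[0]
maxVol = volRange[1]
vol = 0 # current volume level
volBar = 400 # y-coordinate of the top of the filled volume bar (400 = empty, 150 = full)
volPer = 0 # volume percentage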

3. Main loop

Start an infinite loop that keeps capturing frames from the camera.

while True:
    success, img = cap.read()
    if not success:  # stop if the camera did not return a frame
        break
    img = detector.findHands(img) # detect hands and draw their landmarks on the frame
    lmList, _ = detector.findPosition(img, draw=False) # get the landmark positions without drawing them
    if len(lmList) > 8:  # make sure the list is long enough to hold landmarks 4 and 8 (at least 9 entries)
        x1, y1 = lmList[4][1], lmList[4][2] # thumb tip (x1, y1)
        x2, y2 = lmList[8][1], lmList[8][2] # index fingertip (x2, y2)
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2 # midpoint (cx, cy) between the two fingertips
        cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED) # draw the points and connecting line in magenta (255, 0, 255)
        cv2.circle(img, (x2, y2), 15, (255, 0, 255), cv2.FILLED)
        cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
        cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
        length = math.hypot(x2 - x1, y2 - y1) # distance between the two fingertips
        # Hand range 50 - 300
        # Volume Range -65 - 0
        vol = np.interp(length, [50, 300], [minVol, maxVol]) # linearly map the distance from [50, 300] to [minVol, maxVol]
        volBar = np.interp(length, [50, 300], [400, 150]) # map the same distance to the bar position and the percentage
        volPer = np.interp(length, [50, 300], [0, 100])
        print(int(length), vol)
        volume.SetMasterVolumeLevel(vol, None) # set the system volume to the computed level
        if length < 50:
            cv2.circle(img, (cx, cy), 15, (0, 255, 0), cv2.FILLED) # turn the midpoint green when the fingers are pinched
    cv2.rectangle(img, (50, 150), (85, 400), (255, 0, 0), 3) # draw the outline of the volume bar
    cv2.rectangle(img, (50, int(volBar)), (85, 400), (255, 0, 0), cv2.FILLED) # fill the bar up to the current level
    cv2.putText(img, f'{int(volPer)} %', (40, 450), cv2.FONT_HERSHEY_COMPLEX,
                1, (255, 0, 0), 3) # show the volume percentage
    cTime = time.time()
    fps = 1 / (cTime - pTime) # frame rate from the time elapsed since the previous frame
    pTime = cTime
    cv2.putText(img, f'FPS: {int(fps)}', (40, 50), cv2.FONT_HERSHEY_COMPLEX,
                1, (255, 0, 0), 3)
    cv2.imshow("Img", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to exit the loop
        break
cap.release()
cv2.destroyAllWindows()
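
The three np.interp calls in the loop are plain linear mappings that clamp at the ends of the input range. The sketch below reimplements that behavior in pure Python so the numbers can be checked by hand; `interp_clamped` is a hypothetical helper, and the -65 to 0 range is only the example range from the code comment, since the real values come from GetVolumeRange().

```python
def interp_clamped(x, x_range, y_range):
    """Linearly map x from x_range to y_range, clamping outside the range."""
    x0, x1 = x_range
    y0, y1 = y_range
    if x <= x0:
        return float(y0)
    if x >= x1:
        return float(y1)
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# Finger distances in pixels: below range, both ends, the middle, above range.
for length in (30, 50, 175, 300, 350):
    vol = interp_clamped(length, (50, 300), (-65.0, 0.0))  # assumed device range
    bar = interp_clamped(length, (50, 300), (400, 150))    # bar top, y pixels
    per = interp_clamped(length, (50, 300), (0, 100))      # displayed percent
    print(length, vol, int(bar), int(per))
```

Distances under 50 or over 300 pixels therefore pin the output at the minimum or maximum, which is why pinching the fingers fully always reads 0 %.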

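III. Results

I won't post more screenshots. Because the frame rate is not high, there is a slight delay (not noticeable in actual use, but quite obvious in screen captures; suggestions on how to raise the frame rate are welcome). Even with the fingers spread wide, the volume only reaches about 70% of the maximum, on the reasoning that maximum volume is rarely needed and would be very loud.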
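
One thing to keep in mind: the percentage drawn on screen is linear in finger distance, while SetMasterVolumeLevel takes a level in decibels, so the on-screen number may not match the system volume slider. A possible alternative, sketched here as an assumption and not as what the code above does, is to compute a fraction between 0.0 and 1.0 and pass it to SetMasterVolumeLevelScalar on the same IAudioEndpointVolume interface. The helper `length_to_scalar` and its `max_fraction` cap are hypothetical.

```python
def length_to_scalar(length, min_len=50, max_len=300, max_fraction=1.0):
    """Map a finger distance in pixels to a volume fraction in 0.0..max_fraction."""
    fraction = (length - min_len) / (max_len - min_len)
    fraction = max(0.0, min(1.0, fraction))  # clamp to the valid range
    return fraction * max_fraction


print(length_to_scalar(175))                    # 0.5
print(length_to_scalar(400, max_fraction=0.7))  # 0.7
# In the main loop this could replace the dB call (Windows with pycaw only):
# volume.SetMasterVolumeLevelScalar(length_to_scalar(length), None)
```

Setting `max_fraction=0.7` would make the roughly 70% ceiling an explicit choice instead of a side effect of how far the fingers can spread.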

IV. Source code

1.HandTrackingModule.py

import cv2
import mediapipe as mp
import math

class HandDetector:
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = float(detectionCon)  # Ensure it is a float
        self.trackCon = float(trackCon)  # Ensure it is a float

        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(
            static_image_mode=self.mode,
            max_num_hands=self.maxHands,
            min_detection_confidence=self.detectionCon,
            min_tracking_confidence=self.trackCon
        )
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        xList = []
        yList = []
        bbox = []
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                xList.append(cx)
                yList.append(cy)
                self.lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)
            if xList and yList:
                xmin, xmax = min(xList), max(xList)
                ymin, ymax = min(yList), max(yList)
                bbox = xmin, ymin, xmax, ymax

                if draw:
                    cv2.rectangle(img, (bbox[0] - 20, bbox[1] - 20),
                                  (bbox[2] + 20, bbox[3] + 20), (0, 255, 0), 2)
        return self.lmList, bbox

    def fingersUp(self):
        fingers = []
        # Thumb
        if len(self.lmList) > self.tipIds[0] and len(self.lmList) > self.tipIds[0] - 1:
            if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
                fingers.append(1)
            else:
                fingers.append(0)
        else:
            fingers.append(0)
        # 4 Fingers
        for id in range(1, 5):
            if len(self.lmList) > self.tipIds[id] and len(self.lmList) > self.tipIds[id] - 2:
                if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            else:
                fingers.append(0)
        return fingers

    def findDistance(self, p1, p2, img, draw=True):
        if len(self.lmList) > max(p1, p2):
            x1, y1 = self.lmList[p1][1], self.lmList[p1][2]
            x2, y2 = self.lmList[p2][1], self.lmList[p2][2]
            cx, cy = (x1 + x2) // 2, (y1 + y2) // 2

            if draw:
                cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
                cv2.circle(img, (x2, y2), 15, (255, 0, 255), cv2.FILLED)
                cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)

            length = math.hypot(x2 - x1, y2 - y1)
            return length, img, [x1, y1, x2, y2, cx, cy]
        else:
            return 0, img, []

2.VolumeHandControl.py

import cv2
import time
import numpy as np
import HandTrackingModule as htm
import math
from ctypes import cast, POINTER
from comtypes import CLSCTX_ALL
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume

################################
wCam, hCam = 640, 480
################################
cap = cv2.VideoCapture(0)
cap.set(3, wCam)
cap.set(4, hCam)
pTime = 0
detector = htm.HandDetector(detectionCon=0.7)
devices = AudioUtilities.GetSpeakers()
interface = devices.Activate(
    IAudioEndpointVolume._iid_, CLSCTX_ALL, None)
volume = cast(interface, POINTER(IAudioEndpointVolume))
volRange = volume.GetVolumeRange()
minVol = volRange[0]
maxVol = volRange[1]
vol = 0
volBar = 400
volPer = 0

while True:
    success, img = cap.read()
    if not success:  # stop if the camera did not return a frame
        break
    img = detector.findHands(img)
    lmList, _ = detector.findPosition(img, draw=False)

    if len(lmList) > 8:  # Check if there are enough landmarks
        x1, y1 = lmList[4][1], lmList[4][2]
        x2, y2 = lmList[8][1], lmList[8][2]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
        cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
        cv2.circle(img, (x2, y2), 15, (255, 0, 255), cv2.FILLED)
        cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
        cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
        length = math.hypot(x2 - x1, y2 - y1)
        # Hand range 50 - 300
        # Volume Range -65 - 0
        vol = np.interp(length, [50, 300], [minVol, maxVol])
        volBar = np.interp(length, [50, 300], [400, 150])
        volPer = np.interp(length, [50, 300], [0, 100])
        print(int(length), vol)
        volume.SetMasterVolumeLevel(vol, None)
        if length < 50:
            cv2.circle(img, (cx, cy), 15, (0, 255, 0), cv2.FILLED)

    cv2.rectangle(img, (50, 150), (85, 400), (255, 0, 0), 3)
    cv2.rectangle(img, (50, int(volBar)), (85, 400), (255, 0, 0), cv2.FILLED)
    cv2.putText(img, f'{int(volPer)} %', (40, 450), cv2.FONT_HERSHEY_COMPLEX,
                1, (255, 0, 0), 3)
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, f'FPS: {int(fps)}', (40, 50), cv2.FONT_HERSHEY_COMPLEX,
                1, (255, 0, 0), 3)
    cv2.imshow("Img", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # Press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()