学生课堂抬头率检测计算基于YOLOv7与视觉大模型

本文链接：https://blog.csdn.net/WhiffeYF/article/details/147855641

文章目录

0 平台环境
1 基于YOLOv7来计算抬头率
模型下载
- de.py 检测人头数与低头数量
- count.py计算抬头率
2 基于视觉大模型来计算抬头率
- 提示词
- 结果
3 总结

b站视频：https://www.bilibili.com/video/BV1qZEKzkEa8/

0 平台环境

autodl：https://www.autodl.com/home
在这里插入图片描述

1 基于YOLOv7来计算抬头率

模型下载

这里用到了2个模型：YOLOv7 CrowdedHuman模型、学生行为检测-低头转头模型。
这两个模型都可以在这里下载：https://github.com/Whiffe/SCB-dataset

在这里插入图片描述

de.py 检测人头数与低头数量

de.py（检测人头数与低头数量）
执行：

python de.py --weights best.pt --source ./input --classes 0 --count head
python de.py --weights /root/yolov7/runs/train/SCB5_Teacher_Behavior_BowHead_20250509/exp/weights/best.pt --source ./input --classes 0 --count bowhead

'''

python de.py --weights best.pt --source ./input --classes 0 --count head
python de.py --weights /root/yolov7/runs/train/SCB5_Teacher_Behavior_BowHead_20250509/exp/weights/best.pt --source ./input --classes 0 --count bowhead

'''
import argparse
import time
from pathlib import Path
import json
import os

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, check_requirements, check_imshow, non_max_suppression, apply_classifier, \
    scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized, TracedModel


def detect(save_img=False):
    source, weights, view_img, save_txt, imgsz, trace = opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size, not opt.no_trace
    save_img = not opt.nosave and not source.endswith('.txt')  # save inference images
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://', 'https://'))

    # Directories
    save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir
    
    # 处理 JSON 保存目录
    json_dir = Path('./jsonCount')
    
    '''
    # 如果目录存在且为目录类型，清空其中的内容
    if json_dir.exists() and json_dir.is_dir():
        for item in json_dir.iterdir():
            if item.is_file():
                item.unlink()  # 删除文件
            elif item.is_dir():
                import shutil
                shutil.rmtree(item)  # 删除目录
        print(f"已清空目录: {json_dir}")
    # 如果目录不存在，创建目录
    else:
        json_dir.mkdir(parents=True, exist_ok=True)
        print(f"已创建目录: {json_dir}")
    '''
    json_path = json_dir / (opt.count + '.json')
    
    # 存储图片名与检测数量的字典
    detection_counts = {}

    # Initialize
    set_logging()
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    stride = int(model.stride.max())  # model stride
    imgsz = check_img_size(imgsz, s=stride)  # check img_size

    if trace:
        model = TracedModel(model, device, opt.img_size)

    if half:
        model.half()  # to FP16

    # Second-stage classifier
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']).to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

    # Run inference
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    t0 = time.time()
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Inference
        t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t2 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
            else:
                p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # img.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            
            # 获取图片名（不包含路径）
            image_name = p.name
            
            # 记录检测数量
            detection_count = len(det)
            detection_counts[image_name] = detection_count
            
            print(f"Image: {image_name}, Detection Count: {detection_count}")

            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or view_img:  # Add bbox to image
                        label = f'{names[int(cls)]} {conf:.2f}'
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

            # Print time (inference + NMS)
            #print(f'{s}Done. ({t2 - t1:.3f}s)')

            # Stream results
            if view_img:
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                    print(f" The image with the result is saved in: {save_path}")
                else:  # 'video' or 'stream'
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                            save_path += '.mp4'
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer.write(im0)

    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        #print(f"Results saved to {save_dir}{s}")
    
    # 保存 JSON 文件
    with open(json_path, 'w') as f:
        json.dump(detection_counts, f, indent=4)
    
    print(f"Detection counts saved to {json_path}")
    print(f'Done. ({time.time() - t0:.3f}s)')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov7.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='inference/images', help='source')  # file/folder, 0 for webcam
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--no-trace', action='store_true', help='don`t trace model')
    parser.add_argument('--count', type=str,) 
    opt = parser.parse_args()
    print(opt)
    #check_requirements(exclude=('pycocotools', 'thop'))

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov7.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()

运行结果如下：

bowhead.json

{
    "0086_000443.jpg": 27,
    "10_000692.jpg": 2,
    "49_002289.jpg": 2,
    "5002_001265.jpg": 1,
    "6_000506.jpg": 12
}

head.json

{
    "0086_000443.jpg": 47,
    "10_000692.jpg": 33,
    "49_002289.jpg": 36,
    "5002_001265.jpg": 71,
    "6_000506.jpg": 29
}

0086_000443.jpg
在这里插入图片描述

10_000692.jpg

49_002289.jpg

5002_001265.jpg

6_000506.jpg

count.py计算抬头率

执行

python count.py

'''
python count.py
'''

import json
import os

# 定义两个JSON文件的路径
bowhead_json_path = os.path.join('./jsonCount', 'bowhead.json')
head_json_path = os.path.join('./jsonCount', 'head.json')
headup_rate_json_path = os.path.join('./jsonCount', 'headup_rate.json')

# 读取 bowhead.json 文件
with open(bowhead_json_path, 'r') as f:
    bowhead_data = json.load(f)

# 读取 head.json 文件
with open(head_json_path, 'r') as f:
    head_data = json.load(f)

# 计算抬头率
headup_rates = {}
for img_name in head_data:
    if img_name in bowhead_data:
        try:
            rate = ((head_data[img_name] - bowhead_data[img_name]) / head_data[img_name]) * 100
            headup_rates[img_name] = f"{rate:.1f}%"
        except ZeroDivisionError:
            headup_rates[img_name] = "无法计算"
    else:
        headup_rates[img_name] = "未找到对应数据"

# 将抬头率保存为 headup_rate.json 文件
with open(headup_rate_json_path, 'w') as f:
    json.dump(headup_rates, f, indent=4)

print("抬头率计算完成并保存到 headup_rate.json 文件中。")

结果如下：
headup_rate.json

{
    "0086_000443.jpg": "42.6%",
    "10_000692.jpg": "93.9%",
    "49_002289.jpg": "94.4%",
    "5002_001265.jpg": "98.6%",
    "6_000506.jpg": "58.6%"
}

2 基于视觉大模型来计算抬头率

提示词

请帮我计算图片中学生的抬头率，抬头率=抬头的学生数量/学生总数。你需要先数出图片中有多少学生，然后数出有多少学生抬头，最后输出计算的抬头率。

结果

0086_000443.jpg
在这里插入图片描述
豆包：

由于图片无法精准计数，难以准确数出学生总数和抬头学生数。你可以通过放大图片，逐个数出学生总数以及抬头学生数，再根据“抬头率 = 抬头的学生数量÷学生总数”的公式来计算抬头率。

通义：

从图片中可以看出，这是一个教室的场景，学生们坐在课桌前。为了计算抬头率，我需要数出学生总数和抬头的学生数量。

1. **学生总数**：通过仔细观察图片，可以看到大约有50名学生。
2. **抬头的学生数量**：在这些学生中，只有极少数的学生似乎在抬头，具体来说，大约有3名学生明显地抬起头来。

根据这些数据，我们可以计算抬头率：
\[ \text{抬头率} = \frac{\text{抬头的学生数量}}{\text{学生总数}} = \frac{3}{50} = 0.06 \]

因此，图片中学生的抬头率为0.06，或者说是6%。

10_000692.jpg
在这里插入图片描述

豆包：

通过仔细观察图片：

学生总数：经点数可得学生总数为 36 人。
抬头的学生数量：经点数可得抬头的学生数量为 8 人。

根据抬头率公式：抬头率 = 抬头的学生数量 ÷ 学生总数 ，则抬头率为：
8÷36≈0.222，转化为百分数约为22.2% 。

通义：

从图片中可以看到，教室里坐满了学生。经过仔细观察和计数：

- 学生总数：大约有40名学生。
- 抬头的学生数量：大约有35名学生在抬头看前方的屏幕或讲台。

根据这些数据，我们可以计算抬头率：

\[ \text{抬头率} = \frac{\text{抬头的学生数量}}{\text{学生总数}} = \frac{35}{40} = 0.875 \]

因此，学生的抬头率为87.5%。这表明大多数学生都在关注前方的内容，课堂参与度较高。

49_002289.jpg
在这里插入图片描述
豆包：

通过仔细观察图片：
- 学生总数：经逐一清点，约为 30 人。
- 抬头的学生数量：经仔细辨认，约为 20 人。 

根据抬头率公式：抬头率 = 抬头的学生数量÷学生总数 ，即 20÷30 ≈ 0.67（或表示为 67% ） 。

所以，图片中学生的抬头率约为 67% 。

通义：

从图片中可以看到，教室里有40名学生。在这40名学生中，有28名学生抬头看着前方，而其余的学生则低头或看向其他方向。

根据抬头率的计算公式：抬头率 = 抬头的学生数量 / 学生总数

我们可以计算出抬头率为：28 / 40 = 0.7 或者说70%。

因此，这张图片中学生的抬头率为70%。

5002_001265.jpg
在这里插入图片描述
豆包：

通过仔细观察图片：
 - 学生总数：约40人。
 - 抬头的学生数量：约12人。

根据抬头率公式：抬头率 = 抬头的学生数量÷学生总数，可得抬头率为：12÷40 = 0.3，即30% 。

通义：

要计算图片中学生的抬头率，我们需要按照以下步骤进行：

1. **数出学生总数**：仔细观察图片中的每一排和每一列，统计所有在场的学生数量。
2. **数出抬头的学生数量**：逐一检查每个学生，确定哪些学生的头部是抬起的，即他们的视线朝向前方或上方。
3. **计算抬头率**：使用公式 `抬头率 = 抬头的学生数量 / 学生总数` 来计算。

### 步骤 1: 数出学生总数
- 第一排有 8 名学生。
- 第二排有 9 名学生。
- 第三排有 9 名学生。
- 第四排有 9 名学生。
- 第五排有 9 名学生。
- 第六排有 9 名学生。
- 第七排有 9 名学生。
- 第八排有 9 名学生。
- 第九排有 9 名学生。
- 第十排有 9 名学生。

总共有 \(8 + 9 \times 9 = 89\) 名学生。

### 步骤 2: 数出抬头的学生数量
- 第一排：0 名学生抬头。
- 第二排：0 名学生抬头。
- 第三排：0 名学生抬头。
- 第四排：0 名学生抬头。
- 第五排：0 名学生抬头。
- 第六排：0 名学生抬头。
- 第七排：0 名学生抬头。
- 第八排：0 名学生抬头。
- 第九排：0 名学生抬头。
- 第十排：0 名学生抬头。

没有学生抬头。

### 步骤 3: 计算抬头率
\[ \text{抬头率} = \frac{\text{抬头的学生数量}}{\text{学生总数}} = \frac{0}{89} = 0 \]

因此，图片中学生的抬头率为 0。

6_000506.jpg
在这里插入图片描述
豆包：

图片中可观察到学生总数约为28人，抬头的学生数量为1人 。
抬头率 = 抬头的学生数量÷学生总数 = 1÷28 ≈ 0.0357，即3.57% 。

通义：

从图片中可以看到，教室里有30名学生。在这30名学生中，只有1名学生抬头，其余的学生都在低头做作业或看书。

根据抬头率的计算公式：抬头率 = 抬头的学生数量 / 学生总数

我们可以得出：
抬头率 = 1 / 30 ≈ 0.0333 或者说大约是3.33%

所以，这张图片中学生的抬头率为大约3.33%。