Lending You a Pair of Sharp Eyes: "A Classroom Observation System Driven by Both Knowledge and Data"

青年有志

Last modified 2023-05-16 10:48:00

Category: Deep Learning. Tags: python, deep learning

First published 2022-06-20 19:22:05

Article link: https://blog.csdn.net/qq_46450354/article/details/125367390

Preface

Gitee source repository: https://gitee.com/futurelqh/deep-learning

I. Datasets

1. AFLW

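Each face is annotated with 21 landmarks. The database is rich in information: its images cover a wide range of poses, expressions, lighting conditions and ethnicities. AFLW contains roughly 25,000 manually annotated faces, of which 59% are female and 41% male; most images are in color and only a few are grayscale. It is well suited to research on face recognition, face detection and face alignment.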

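2. Pointing'04

The Pointing'04 dataset:

It holds data from 15 people (2 folders per person), 30 folders in total.

Vertical direction (tilt): -90, -60, -30, -15, +0, +15, +30, +60, +90. The 7 angles from -60 to +60 are each combined with every horizontal angle, while -90 and +90 each have a single image (-90+0 and +90+0).

Horizontal direction (pan): -90, -75, -60, -45, -30, -15, +0, +15, +30, +45, +60, +75, +90, 13 angles in total.

Images per folder: 7 * 13 + 2 = 93.

Images in the whole dataset: 93 * 30 = 2790.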
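The counts above can be checked by enumerating the pose grid. This is only a sketch of the arithmetic; the variable names are illustrative and are not taken from the repository.

```python
# Tilt angles that are combined with every pan angle
tilts = [-60, -30, -15, 0, 15, 30, 60]
# Pan angles from -90 to +90 in steps of 15 degrees
pans = list(range(-90, 91, 15))

# The two extreme tilts only exist together with pan = 0
poses = [(t, p) for t in tilts for p in pans] + [(-90, 0), (90, 0)]

images_per_folder = len(poses)
total_images = images_per_folder * 15 * 2  # 15 people, 2 folders each

print(len(pans), images_per_folder, total_images)
```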

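Personne01-1 wears glasses, Personne01-2 does not;

Personne02-1 normal brightness, head and shoulders visible; Personne02-2 brighter, head only;

Personne03-1, Personne03-2: little difference, in my opinion.

Personne04-1 no glasses, Personne04-2 wears glasses;

Personne05-1 wears glasses, Personne05-2 no glasses;

Personne06-1, Personne06-2: little difference, in my opinion.

Personne07-1, Personne07-2: little difference, in my opinion.

Personne08-1 normal brightness, Personne08-2 darker; little difference, in my opinion.

Personne09-1, Personne09-2: different tops.

Personne010-1 no glasses, Personne010-2 wears glasses; different tops.

Personne011-1 dark, Personne011-2 darker; little difference, in my opinion.

Personne012-1 wears glasses, Personne012-2 no glasses;

Personne013-1 normal brightness, Personne013-2 darker; little difference, in my opinion.

Personne014-1 no glasses, Personne014-2 wears glasses;

Personne015-1 normal brightness, Personne015-2 darker; little difference, in my opinion.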

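II. Dataset Utilities

1. dataset_utils.py

The data are first sorted and split by tilt, and each resulting group is then sorted and split by pan, so that tilt and pan values are spread evenly across the classes.
database_dir: directory containing the csv file with the label attributes
num_splits_tilt: number of splits along tilt
num_splits_pan: number of splits along pan
The result is written to the csv file labels_class.csv.

def class_assign(database_dir, num_splits_tilt, num_splits_pan):
    '''
    Split the dataset first by tilt and then by pan, so that it is divided into classes containing the same number of images.

    Arguments:
        database_dir: dataset directory
        num_splits_tilt: number of splits along tilt
        num_splits_pan: number of splits along pan
    '''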
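A minimal sketch of this kind of class assignment is given below, assuming the samples are available as (name, tilt, pan) tuples. The helpers split_evenly and assign_classes are hypothetical names introduced for illustration; the repository's class_assign reads and writes csv files instead.

```python
def split_evenly(items, n):
    """Split a list into n consecutive chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def assign_classes(samples, num_splits_tilt, num_splits_pan):
    """Map each sample name to a class id; samples are (name, tilt, pan) tuples."""
    classes = {}
    by_tilt = sorted(samples, key=lambda s: s[1])
    for t, tilt_chunk in enumerate(split_evenly(by_tilt, num_splits_tilt)):
        # Inside each tilt group, sort and split again by pan
        by_pan = sorted(tilt_chunk, key=lambda s: s[2])
        for p, pan_chunk in enumerate(split_evenly(by_pan, num_splits_pan)):
            for name, _, _ in pan_chunk:
                classes[name] = t * num_splits_pan + p
    return classes


# 16 synthetic samples with scattered tilt and pan values
samples = [("img%02d" % i, (i * 7) % 16 - 8, (i * 5) % 16 - 8) for i in range(16)]
classes = assign_classes(samples, 2, 2)
print(sorted(set(classes.values())))
```

Splitting on sorted values, rather than on fixed angle intervals, is what keeps the classes the same size even when the poses are unevenly distributed.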

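Splits the data produced by class_assign(), reading the corresponding labels_class.csv file.
The images are divided into training, validation and test sets in a stratified way with train_test_split (from sklearn.model_selection import train_test_split).
test_ratio: fraction of the data used for the test set
validation_ratio: fraction of the data used for the validation set

def split_dataset(database_dir, test_ratio, validation_ratio):
    '''
    Split the dataset into training, test and validation sets, stratified by the class of each image, and record the images of each subset in separate .csv files in the dataset directory.

    Arguments:
        database_dir: dataset directory
        test_ratio: fraction used for the test set
        validation_ratio: fraction used for the validation set
    '''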
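To illustrate what "stratified" means here, the sketch below splits each class separately so that every subset keeps the same class proportions. It uses only the standard library instead of scikit-learn, stratified_split is a hypothetical name, and it assumes that validation_ratio applies to what remains after the test set has been removed.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio, validation_ratio, seed=0):
    """labels maps image name -> class id; returns (train, validation, test) name lists."""
    by_class = defaultdict(list)
    for name, cls in labels.items():
        by_class[cls].append(name)

    rng = random.Random(seed)
    train, validation, test = [], [], []
    for cls in sorted(by_class):
        names = sorted(by_class[cls])
        rng.shuffle(names)
        n_test = round(len(names) * test_ratio)
        # Assumption: the validation share is taken from the non-test remainder
        n_val = round((len(names) - n_test) * validation_ratio)
        test += names[:n_test]
        validation += names[n_test:n_test + n_val]
        train += names[n_test + n_val:]
    return train, validation, test


labels = {"img%03d" % i: i % 4 for i in range(100)}
train, validation, test = stratified_split(labels, 0.2, 0.2)
print(len(train), len(validation), len(test))
```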

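Computes the mean and standard deviation of the images, pan and tilt in the dataset.

def find_norm_parameters(database_dir):
    '''
    Find the normalization parameters for the images, tilt and pan, and write them to a .csv file.

    Arguments:
        database_dir: dataset directory
    '''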
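As a rough illustration, normalization parameters of this kind are a mean and a standard deviation computed over rescaled values. The sketch below assumes a population standard deviation and a rescale factor of 1/90 for angles (the factor the training script later passes as out_rescale); norm_parameters is a hypothetical helper.

```python
from statistics import fmean, pstdev


def norm_parameters(values, rescale=1.0):
    """Return (mean, std) of the values after multiplying them by rescale."""
    scaled = [v * rescale for v in values]
    return fmean(scaled), pstdev(scaled)


tilts = [-30.0, -15.0, 0.0, 15.0, 30.0]
t_mean, t_std = norm_parameters(tilts, rescale=1 / 90)
print(round(t_mean, 6), round(t_std, 6))
```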

Stores the dataset as NumPy arrays.

def store_dataset_arrays(database_dir):
    '''
    Store the dataset as NumPy files.

    Arguments:
        database_dir
    '''

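2. clean_utils.py

Computes the area of overlap between a detected bounding box and an annotated bounding box.
bbox_1: detected box
bbox_2: annotated box
Returns:
area: area of the overlapping region

def overlapped_area(bbox_1, bbox_2):
    '''
    Given two boxes, return the area of overlap between them.

    Arguments:
        bbox_1
        bbox_2
    Returns:
        area: area of the overlapping region
    '''

Computes the area of the union of two bounding boxes.
Returns:
area: area covered by the two boxes together

def union_area(bbox_1, bbox_2):
    '''
    Return the area covered by the union of two boxes.

    Arguments:
        bbox_1
        bbox_2
    Returns:
        area: area of the union
    '''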
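The two areas can be sketched as follows, assuming boxes are (xmin, ymin, xmax, ymax) tuples in continuous coordinates. The box layout and the box_area helper are assumptions made for this example; the repository may treat pixel coordinates as inclusive.

```python
def overlapped_area(bbox_1, bbox_2):
    """Area of the intersection of two (xmin, ymin, xmax, ymax) boxes."""
    x_overlap = min(bbox_1[2], bbox_2[2]) - max(bbox_1[0], bbox_2[0])
    y_overlap = min(bbox_1[3], bbox_2[3]) - max(bbox_1[1], bbox_2[1])
    # A negative extent on either axis means the boxes do not intersect
    return max(0, x_overlap) * max(0, y_overlap)


def box_area(bbox):
    """Area of a single box (hypothetical helper)."""
    return (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])


def union_area(bbox_1, bbox_2):
    """Area covered by either box: the sum of both areas minus the shared part."""
    return box_area(bbox_1) + box_area(bbox_2) - overlapped_area(bbox_1, bbox_2)


a = (0, 0, 4, 4)
b = (2, 2, 6, 6)
print(overlapped_area(a, b), union_area(a, b))
```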

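Computes the IoU between every ground-truth box and every detected box; matrix[i, j] holds the IoU between ground-truth box i and detected box j.
Returns:
matrix: IoU values, indexed by ground-truth box i and detected box j

def jaccard_index(real_bboxes, detected_bboxes):
    '''
    Return the Jaccard index values between two lists of bounding boxes, the real ones and the detected ones.

    Arguments:
        real_bboxes: ground-truth bounding boxes
        detected_bboxes: detected bounding boxes
    Returns:
        matrix: matrix containing the Jaccard index of every pair of rectangles; each row corresponds to a ground-truth bounding box and each column to a detected bounding box.
    '''

Finds, for each detected bounding box, the ground-truth bounding box that matches it best.
Returns:
indexes[detected_index] = i: detected box detected_index is matched to ground-truth box i

def bbox_match(detected_bboxes, real_bboxes):
    '''
    Given two lists of bounding boxes, use the Jaccard index to find matches between the detected boxes and the real ones.

    Arguments:
        detected_bboxes: detected bounding boxes
        real_bboxes: ground-truth bounding boxes
    Returns:
        indexes: for each detected bounding box (in the same order as the list of detected boxes), the index of its matching ground truth bounding box (in the list of
        ground truth bounding boxes).
    '''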
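The sketch below shows one way such a matching could work: build the IoU matrix, then give each detection the ground-truth box with the highest IoU. The (xmin, ymin, xmax, ymax) layout, the min_iou threshold and the -1 marker for unmatched detections are assumptions of this example, not details taken from the repository.

```python
def iou(a, b):
    """Jaccard index (intersection over union) of two boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0, w) * max(0, h)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def jaccard_matrix(real_bboxes, detected_bboxes):
    """matrix[i][j] is the IoU of ground-truth box i and detected box j."""
    return [[iou(r, d) for d in detected_bboxes] for r in real_bboxes]


def match_detections(detected_bboxes, real_bboxes, min_iou=0.5):
    """For each detection, the index of its best ground-truth box, or -1 if none qualifies."""
    matrix = jaccard_matrix(real_bboxes, detected_bboxes)
    indexes = []
    for j in range(len(detected_bboxes)):
        best_i, best = -1, min_iou
        for i in range(len(real_bboxes)):
            if matrix[i][j] >= best:
                best_i, best = i, matrix[i][j]
        indexes.append(best_i)
    return indexes


real = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [(21, 21, 31, 31), (1, 0, 11, 10), (50, 50, 60, 60)]
print(match_detections(detected, real))
```

A detection with index -1 would count as a false detection, which is the kind of bookkeeping the t_ratio and f_ratio statistics later in the article rely on.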

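source_csv: csv file containing the path of each image together with its tilt and pan values
Returns:
img_array: the images as a NumPy array
tilt, pan: the tilt and pan of each image

def array_from_csv(source_csv, img_dir):
    '''
    Load every image of the dataset (previously processed and listed in a .csv file), together with its labels (tilt, pan), into 3 NumPy arrays.

    Arguments:
        source_csv: .csv file containing the image paths, tilt and pan of the dataset
        img_dir: directory containing the images
    Returns:
        img_array
        tilt
        pan
    '''

Loads every image of a dataset that was previously processed and listed in a .csv file, together with its labels (tilt and pan), into 3 NumPy arrays. The images must be stored as a .npy file.
Returns:
img_array: the images as a NumPy array
tilt, pan: the tilt and pan of each image

def array_from_npy(img_npy, source_csv):
    '''
    Load every image of the dataset (previously processed and listed in a .csv file), together with its labels (tilt and pan values), into 3 NumPy arrays. The images must be stored as a .npy file.

    Arguments:
        img_npy: .npy file with the images
        source_csv: .csv file
    Returns:
        img_array
        tilt
        pan
    '''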

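3. head_detector_utils.py

img_ori: original image
model: head detection model
confidence_threshold: confidence threshold
Returns:
bboxes: all head bounding boxes that remain after filtering

def get_head_bboxes(img_ori, model, confidence_threshold):
    '''
    Detect heads and filter the detections by confidence.

    Arguments:
        img_ori: original image
        model: Head detector model.
        confidence_threshold: confidence threshold
    Returns:
        bboxes: list of rectangles in the original image that contain a head
    '''

Crops the bounding boxes returned by get_head_bboxes, turning every rectangle into a square by taking its shorter side, and returns the resulting crops.

def get_cropped_pics(img_ori, bboxes, crop_size, offset_perc, cropping = '', interpolation = cv2.INTER_LINEAR):
    '''
    Crop the image according to the chosen rule.
    Arguments:
        img_ori: original image
        bboxes: bounding boxes previously obtained from get_head_bboxes
        crop_size: size of the crops. With cropping = 'small' or cropping = 'large' the region is made square according to the corresponding rule; otherwise the shorter side is used
        offset_perc: fraction (between 0 and 1) of the original crop's side length that is additionally included around its border in the final output; the final side length of the output picture equals crop_size * (1 + 2 * offset_perc).
        cropping: cropping type. The default is the original size;
        'small': use the shorter side
        'large': use the longer side
    Returns:
        pics
    '''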
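One possible reading of the 'small' and 'large' rules is sketched below: the square is centred on the detected box, its side is the shorter or the longer side of the box, and offset_perc widens it on every side. The helper square_bbox and the centring choice are assumptions; the repository's get_cropped_pics may place and resize the crop differently.

```python
def square_bbox(bbox, cropping='small', offset_perc=0.0):
    """Return a square (xmin, ymin, xmax, ymax) region centred on the given box."""
    xmin, ymin, xmax, ymax = bbox
    width, height = xmax - xmin, ymax - ymin
    side = min(width, height) if cropping == 'small' else max(width, height)
    # Enlarge the side by offset_perc on each of its two ends
    side = side * (1 + 2 * offset_perc)
    center_x, center_y = (xmin + xmax) / 2, (ymin + ymax) / 2
    half = side / 2
    return (center_x - half, center_y - half, center_x + half, center_y + half)


box = (10, 20, 50, 100)
print(square_bbox(box, 'small'))
print(square_bbox(box, 'large'))
```

With 'large' or a non-zero offset the square can extend past the image border, so a real implementation would also need to clamp or pad the region before resizing it to crop_size.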

4. loadmat_stackoverflow.py

Loads a .mat file.

def loadmat(filename):

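III. Processing the AFLW Dataset

clean_aflw.py

The downloaded .mat file contains three fields: image path, head pose and bounding box. For each image, the head detection model finds the bounding boxes and the boxes are cropped; each crop is then also flipped horizontally with its pan value negated, which enlarges the dataset, and the images are saved.
destination_dir + 'labels.csv': stores the tilt and pan of each image
Returns:
t_ratio: ratio of correct detections
f_ratio: ratio of false detections
count: total number of images

def clean_aflw(aflw_dir, aflw_mat, destination_dir, detector, confidence_threshold, out_size, grayscale = False, interpolation = cv2.INTER_LINEAR, start_count = 0):
    '''
    Process AFLW, which includes cropping every image and reading its facial annotations.

    Arguments:
        aflw_dir: path to the AFLW images
        aflw_mat: .mat file containing the head annotations
        destination_dir: target path for storing every cropped image and the .csv file
        detector: Keras model
        confidence_threshold: threshold used to filter head detections
        out_size: size of the output images
        start_count: initial value of the image counter
    Returns:
        count: number of images after processing
        t_ratio: ratio of detected heads to annotated heads
        f_ratio: ratio of false detections to the total number of images
    '''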
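The flip augmentation mentioned above mirrors the picture left to right, which leaves tilt unchanged and reverses the sign of pan. A dependency-free sketch, with the image as a nested list and flip_sample as a hypothetical name (with OpenCV arrays, cv2.flip(img, 1) performs the same mirroring):

```python
def flip_sample(img, tilt, pan):
    """Mirror an image horizontally and adjust its pose labels accordingly."""
    flipped = [row[::-1] for row in img]
    # Looking left becomes looking right, so only pan changes sign
    return flipped, tilt, -pan


img = [[1, 2, 3],
       [4, 5, 6]]
flipped, tilt, pan = flip_sample(img, 15, -30)
print(flipped, tilt, pan)
```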

def main():
    # Crop the images; the crops are saved and their labels are written to labels.csv
    clean_aflw(aflw_dir, aflw_mat, destination_dir, detector, confidence_threshold, out_size, grayscale_output, downscaling_interpolation)

    # Read labels.csv and assign classes by pan and tilt, which makes the model more robust
    class_assign(destination_dir, num_splits_tilt, num_splits_pan)

    # Split the dataset on that basis
    split_dataset(destination_dir, test_ratio, validation_ratio)

    # Compute the mean and standard deviation of the labels and images for standardization
    find_norm_parameters(destination_dir)

    # Store the results
    store_dataset_arrays(destination_dir)

Result after processing: [figure]

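IV. Processing the Pointing'04 Dataset

clean_pointing04.py

Takes an image path and extracts the tilt and pan values from it with a regular expression.
Returns tilt, pan

def pose_from_filename(img):
    '''
    Get the tilt and pan values of an image of the Pointing'04 dataset.

    Arguments:
        img: image
    Returns:
        tilt, pan
    '''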
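A sketch of the idea follows, assuming file names end in two signed integers, tilt first and pan second, followed by .jpg (for example personne01146+0-15.jpg). Both the example file names and the parse_pose helper are assumptions made for illustration; the repository's pose_from_filename may use a different pattern.

```python
import re

# Two signed integers right before the extension: tilt, then pan
POSE_PATTERN = re.compile(r'([+-]\d+)([+-]\d+)\.jpg$', re.IGNORECASE)


def parse_pose(filename):
    """Return (tilt, pan) in degrees parsed from a Pointing'04-style file name."""
    match = POSE_PATTERN.search(filename)
    if match is None:
        raise ValueError("no pose found in file name: %r" % filename)
    return int(match.group(1)), int(match.group(2))


print(parse_pose('personne01146+0-15.jpg'))
print(parse_pose('Person15/personne15200-90+0.jpg'))
```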

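The crops are extracted and saved in the same way as for AFLW, and images can be duplicated (see duplicate_until).

def clean_pointing04(pointing04_dir, destination_dir, detector, confidence_threshold, out_size, grayscale = False, interpolation = cv2.INTER_LINEAR, start_count = 0, duplicate_until = 0):
    '''
    Process the Pointing'04 dataset and obtain the cropped version of every image.

    Arguments:
        pointing04_dir: path to the Pointing'04 dataset
        destination_dir: destination path
        detector: Keras model
        confidence_threshold: threshold used to filter head detections
        out_size: size of the output images
        start_count: initial value of the image counter
        duplicate_until: target number of images per class; used to enlarge the dataset by duplicating images in each class. Every pose in the dataset corresponds to a different class.
    Returns:
        count: total number of images after processing (counting from start_count).
        t_ratio: ratio of detected heads to annotated heads
        f_ratio: ratio of false detections to the total number of images
    '''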

Result after processing: [figure]

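V. Merging and Processing the Datasets

clean_all.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Processes the AFLW and Pointing'04 datasets with the code in clean_aflw.py and clean_pointing04.py, and stores them in a form that is useful for training a head pose estimation model.
"""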

import os
import shutil
import cv2

from keras_ssd512 import ssd_512

from clean_aflw import clean_aflw
from clean_pointing04 import clean_pointing04
from dataset_utils import class_assign, split_dataset, find_norm_parameters, store_dataset_arrays

# Dataset and label paths

aflw_dir = 'D:/Code_file/AI/headpose_final-master/original/aflw/photo/'
aflw_mat = 'original/aflw/data/aflwinfo_all.mat'
pointing04_dir = 'original/HeadPoseImageDatabase/'

# Paths for the processed data

destination_dir = 'clean/aflw_pointing04/'
detector_log_path = 'models/detector_log_corrected.csv'

# Path to the head detector weights

head_detector_path = 'models/head-detector.h5'

# Head detector parameters

confidence_threshold = 0.65
in_size = 512
out_size = 64

# Output parameters

grayscale_output = True
downscaling_interpolation = cv2.INTER_LINEAR

# Stratification parameters

num_splits_tilt = 8
num_splits_pan = 8

# train/test and train/validation split ratios

test_ratio = 0.2
validation_ratio = 0.2

# Head detector

detector = ssd_512(image_size=(in_size, in_size, 3), n_classes=1, min_scale=0.1, max_scale=1, mode='inference')
detector.load_weights(head_detector_path)

# Create the destination directory, replacing it if it already exists

try:
    os.mkdir(destination_dir)
    print("Directory", destination_dir, "created.")

except FileExistsError:
    print("Directory", destination_dir, "already exists.")
    shutil.rmtree(destination_dir)
    os.mkdir(destination_dir)

# Process the datasets

count_aflw, t_ratio_aflw, f_ratio_aflw = clean_aflw(aflw_dir, aflw_mat, destination_dir, detector, confidence_threshold,
                                                    out_size, grayscale_output, downscaling_interpolation)
count_p04, t_ratio_p04, f_ratio_p04 = clean_pointing04(pointing04_dir, destination_dir, detector, confidence_threshold,
                                                       out_size, grayscale_output, downscaling_interpolation, count_aflw,
                                                       duplicate_until=-1)

"""
# ratios.

if os.path.isfile(detector_log_path):
    file = open(detector_log_path, 'a')
    file.write("%.2f,%d,%f,%f,%d,%f,%f\n" % (confidence_threshold, count_aflw, t_ratio_aflw, f_ratio_aflw, count_p04 - count_aflw, t_ratio_p04, f_ratio_p04))
else:
    file = open(detector_log_path, 'w')
    file.write('threshold,count_aflw,t_ratio_aflw,f_ratio_aflw,count_p04,t_ratio_p04,f_ratio_p04\n')
    file.write("%.2f,%d,%f,%f,%d,%f,%f\n" % (confidence_threshold, count_aflw, t_ratio_aflw, f_ratio_aflw, count_p04 - count_aflw, t_ratio_p04, f_ratio_p04))

file.close()
"""

# Assign classes

class_assign(destination_dir, num_splits_tilt, num_splits_pan)

# Split the dataset

split_dataset(destination_dir, test_ratio, validation_ratio)

# Get the normalization parameters

find_norm_parameters(destination_dir)

# Store the dataset as NumPy arrays

store_dataset_arrays(destination_dir)

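VI. Data Augmentation

data_generator_array.py

Images are augmented with the ImageDataGenerator class (from tensorflow.keras.preprocessing.image import ImageDataGenerator).

class HeadPoseDataGenerator(Sequence):
    '''
    This class implements a basic Keras data generator that overrides the methods of its parent class Sequence.
    Its purpose is to deliver, in each batch, a set of images from the subset used to initialize the generator, with every class contributing the same number of members.
    '''

    def __init__(self, pose_dataframe, img_array, batch_size,
                 normalize=False, input_norm=None, tilt_norm=None, pan_norm=None,
                 augment=False, shift_range=None, zoom_range=None, brightness_range=None,
                 img_rescale=1, out_rescale=1):
        '''
        Initialize the data generator with the data of a given subset of the original dataset and with the values used for data augmentation.

        Arguments:
            pose_dataframe: DataFrame listing every image in the given subset together with its pose values
            img_array: NumPy array containing the images of the given subset
            batch_size
            normalize: whether the data should be normalized
            input_norm: tuple with the mean and standard deviation used to normalize the images of the dataset
            tilt_norm: tuple with the mean and standard deviation used to normalize the tilt values of the dataset
            pan_norm: tuple with the mean and standard deviation used to normalize the pan values of the dataset
            augment: whether to apply data augmentation
            shift_range: value between 0 and 1 giving the fraction of each image's side length by which the image may be shifted (along both axes)
            zoom_range: tuple with the minimum and maximum zoom applied to each image
            brightness_range: tuple with the minimum and maximum values of the brightness transform applied to each image
            img_rescale: every pixel of every image in the subset is multiplied by this value
            out_rescale: the tilt and pan of every image in the subset are multiplied by this value
        '''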
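The label handling implied by out_rescale, tilt_norm and pan_norm can be sketched as a pair of inverse functions. The decoding step mirrors the expression the training script uses on its predictions, (pred * std + mean) * 90; the encoding step is inferred from it, and encode_pose and decode_pose are hypothetical names.

```python
def encode_pose(angle_deg, norm, out_rescale=1 / 90):
    """Rescale an angle in degrees and standardize it with (mean, std)."""
    mean, std = norm
    return (angle_deg * out_rescale - mean) / std


def decode_pose(value, norm, out_rescale=1 / 90):
    """Invert encode_pose: undo the standardization, then the rescaling."""
    mean, std = norm
    return (value * std + mean) / out_rescale


tilt_norm = (-0.041212, 0.323931)  # mean and std used in the training script
encoded = encode_pose(30.0, tilt_norm)
print(round(encoded, 4), round(decode_pose(encoded, tilt_norm), 4))
```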

VII. SSD-Based Head Detector

1. convert_ssd_512.py

Converts the MatConvNet SSD512 model into a Keras SSD512 model, i.e. maps the weights from one to the other.

2. keras_ssd512.py

SSD512 network

3. keras_layer_AnchorBoxes.py

4. keras_layer_DecodeDetections.py

5. keras_layer_DecodeDetectionsFast.py

6. keras_layer_L2Normalization.py

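VIII. Building, Training and Evaluating the Head Pose Model

1. architectures.py

Builds the network model with TensorFlow.

def mpatacchiola_generic(in_size, num_conv_blocks, num_filters_start, num_dense_layers, dense_layer_size, dropout_rate=0, batch_size=None):
    '''
    Build different CNN models.
    Arguments:
        in_size: size of the images fed to the model
        num_conv_blocks: number of convolutional blocks
        num_filters_start: number of filters in the first layer
        num_dense_layers: number of fully connected layers
        dense_layer_size: number of neurons in each fully connected layer
        dropout_rate: dropout rate applied after every hidden layer
        batch_size: batch size
    Returns:
        model
    '''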

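2. train_architectures.py

The number of convolutional blocks ranges from 1 to 6, and the number of filters in the first layer is set to 32, 64, 128 or 256.
The number of hidden (dense) layers ranges from 1 to 3, with 64, 128, 256 or 512 neurons each.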
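If every combination of these values were tried, the search space could be enumerated as below. Whether train_architectures.py really trains the full Cartesian product is an assumption; the sketch only shows how large that grid would be.

```python
from itertools import product

conv_blocks = range(1, 7)            # 1 to 6 convolutional blocks
filters_start = [32, 64, 128, 256]   # filters in the first layer
dense_layers = range(1, 4)           # 1 to 3 dense layers
dense_sizes = [64, 128, 256, 512]    # neurons per dense layer

grid = list(product(conv_blocks, filters_start, dense_layers, dense_sizes))
print(len(grid))

# The configuration hard-coded in train.py is one point of this grid
print((6, 32, 1, 512) in grid)
```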

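3. train.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Trains the pose estimator model on the local system; the parameters have to be set by hand.
"""

# Set the random seeds (tf.ConfigProto and tf.Session below are TensorFlow 1.x APIs)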

import numpy as np
import tensorflow as tf
import random as rn
import os

seed = 0

os.environ['PYTHONHASHSEED'] = '0'

np.random.seed(seed)
rn.seed(seed)

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
session_conf.gpu_options.allow_growth = True

from tensorflow.keras import backend as K

tf.set_random_seed(seed)

sess = tf.Session(graph = tf.get_default_graph(), config=session_conf)
K.set_session(sess)



import time
import pandas as pd
from tensorflow.keras.callbacks import EarlyStopping, CSVLogger, ReduceLROnPlateau

from architectures import mpatacchiola_generic
from data_generator_array import HeadPoseDataGenerator

# Training control parameters

batch_size = 128
epochs = 500
verbose = True
patience = 10



mean = 0.408808
std = 0.237583

t_mean = -0.041212
t_std = 0.323931

p_mean = -0.000276
p_std = 0.540958

shift_range = 0.0
brightness_range = [0.5, 1.5]
zoom_range = [1.0, 1.0]

# File paths

clean_dir = 'clean/'
db_name = 'aflw_pointing04'

model_dir = 'models/'
model_csv = model_dir + 'models.csv'

# Callbacks.

stop = EarlyStopping(monitor='val_mean_absolute_error', patience=patience, verbose=verbose, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau()

# Model parameters

in_size = 64
num_conv_blocks = 6
num_filters_start = 32
num_dense_layers = 1
dense_layer_size = 512
dropout_rate = 0

# Dataset paths

img_dir = clean_dir + db_name + '/'

train_csv = img_dir + 'train.csv'
validation_csv = img_dir + 'validation.csv'
test_csv = img_dir + 'test.csv'

# Load the dataframes.

train_df = pd.read_csv(train_csv)
validation_df = pd.read_csv(validation_csv)
test_df = pd.read_csv(test_csv)

# Load the image arrays.

train_array = np.load(img_dir + 'train_img.npy')
validation_array = np.load(img_dir + 'validation_img.npy')
test_array = np.load(img_dir + 'test_img.npy')

# Configure the data generators (augmentation is enabled for the training set only)

train_generator = HeadPoseDataGenerator(train_df, train_array, batch_size, normalize=True, input_norm=[mean, std],
                                        tilt_norm=[t_mean, t_std], pan_norm=[p_mean, p_std], augment=True,
                                        shift_range=shift_range, zoom_range=zoom_range,
                                        brightness_range=brightness_range, img_rescale=1./255, out_rescale=1./90)

validation_generator = HeadPoseDataGenerator(validation_df, validation_array, batch_size, normalize=True, input_norm=[mean, std],
                                             tilt_norm=[t_mean, t_std], pan_norm=[p_mean, p_std], img_rescale=1./255,
                                             out_rescale=1./90)

STEP_SIZE_TRAIN = train_generator.__len__()
STEP_SIZE_VALID = validation_generator.__len__()

# Set the model name

model_name = 'headpose' + str(int(time.time()))
model_path = model_dir + model_name + '.h5'  # same file name that is recorded in models.csv
loss_csv = model_dir + model_name + '_loss.csv'

# Log the training loss to a .csv file

csv_logger = CSVLogger(loss_csv)

# Get the FLOPs.

run_meta = tf.RunMetadata()

with tf.Session(graph=tf.Graph()) as sess_2:
    K.set_session(sess_2)

    model = mpatacchiola_generic(in_size, num_conv_blocks, num_filters_start, num_dense_layers, dense_layer_size, dropout_rate, batch_size=1)

    # Count the floating-point operations
    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(sess_2.graph, run_meta=run_meta, cmd='op', options=opts).total_float_ops

# session

K.set_session(sess)

# Build and compile the head pose estimation model

model = mpatacchiola_generic(in_size, num_conv_blocks, num_filters_start, num_dense_layers, dense_layer_size, dropout_rate)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])

print(model.summary())

# Train the model on the augmented dataset

history = model.fit_generator(generator=train_generator, steps_per_epoch=STEP_SIZE_TRAIN, validation_data=validation_generator,
                              validation_steps=STEP_SIZE_VALID, epochs=epochs, callbacks=[reduce_lr, stop, csv_logger], verbose=verbose)

# Get the predictions and compute the errors and the score (tilt, pan and global error).

pred = model.predict((test_array / 255.0 - mean) / std)

mean_tilt_error = np.mean(np.abs(test_df['tilt'] - ((pred[:,0] * t_std + t_mean) * 90.0)))
mean_pan_error = np.mean(np.abs(test_df['pan'] - ((pred[:,1] * p_std + p_mean) * 90.0)))

score = (mean_pan_error + mean_tilt_error) / 2

# Save the model

model.save(model_path)


# Record the model architecture parameters, the augmentation parameters and the final score

t_epochs = len(history.history['loss'])

if os.path.exists(model_csv):
    with open(model_csv, "a") as file:
        file.write(model_name + '.h5,%d,%d,%d,%d,%d,%.2f,%.1f,%.2f,%.2f,%.2f,%.2f,%.2f,%.2f,%.2f,%d,%d,%d\n' %
                   (in_size, num_conv_blocks, num_filters_start, num_dense_layers, dense_layer_size, dropout_rate,
                    shift_range, brightness_range[0], brightness_range[1], zoom_range[0], zoom_range[1],
                    mean_tilt_error, mean_pan_error, score, t_epochs, model.count_params(), flops))
else:
    with open(model_csv, "w") as file:
        file.write('model,in_size,num_conv_blocks,num_filters_start,num_dense_layers,dense_layer_size,dropout_rate,'
                   'shift_range,brightness_min,brightness_max,zoom_min,zoom_max,tilt_error,pan_error,score,stop_epochs,num_weights,flops\n')

        file.write(model_name + '.h5,%d,%d,%d,%d,%d,%.2f,%.1f,%.2f,%.2f,%.2f,%.2f,%.2f,%.2f,%.2f,%d,%d,%d\n' %
                   (in_size, num_conv_blocks, num_filters_start, num_dense_layers, dense_layer_size, dropout_rate,
                    shift_range, brightness_range[0], brightness_range[1], zoom_range[0], zoom_range[1],
                    mean_tilt_error, mean_pan_error, score, t_epochs, model.count_params(), flops))

4. pose_estimator_utils.py

Gets the pose values.

def get_pose(cropped_img, model, img_norm = [0, 1], tilt_norm = [0, 1], pan_norm = [0, 1], rescale = 1):
    '''
    Estimate the tilt and pan of a head.

    Arguments:
        cropped_img: cropped image
        model: Estimator model
        img_norm: mean and standard deviation of the images
        tilt_norm: mean and standard deviation of tilt
        pan_norm: mean and standard deviation of pan
        rescale: factor used to rescale the output
    Returns:
        tilt
        pan
    '''

5. test_on_dataset.py

Evaluates the results on the test set.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import time

from clean_utils import array_from_npy
from architectures import mpatacchiola_generic


dataset_dir = 'clean/aflw_pointing04/'
csv_file = 'test.csv'
npy_file = 'test_img.npy'

dataset_csv = dataset_dir + csv_file
dataset_npy = dataset_dir + npy_file

models_path = 'models/'

estimator_file = 'pose-estimator.h5'
estimator_path = models_path + estimator_file

# Head detector parameters

in_size_detector = 512
confidence_threshold = 0.65

# Pose estimator parameters

in_size_estimator = 64
num_conv_blocks = 6
num_filters_start = 32
num_dense_layers = 1
dense_layer_size = 512

# Normalization parameters

mean = 0.408808
std = 0.237583

t_mean = -0.041212
t_std = 0.323931

p_mean = -0.000276
p_std = 0.540958

# Load the images, tilt and pan

img, tilt, pan = array_from_npy(dataset_npy, dataset_csv)

# Add a channel dimension if the images are stored without one

if len(img.shape) == 3:
    img = np.expand_dims(img, -1)

# Load the model

pose_estimator = mpatacchiola_generic(in_size_estimator, num_conv_blocks, num_filters_start, num_dense_layers, dense_layer_size)
pose_estimator.load_weights(estimator_path)

# Get the predictions and compute the errors

start_time = time.time()
pred = pose_estimator.predict((img / 255.0 - mean) / std)
end_time = time.time()

mean_time = (end_time - start_time) / len(img)

mean_tilt_error = np.mean(np.abs(tilt - ((pred[:, 0] * t_std + t_mean) * 90.0)))
mean_pan_error = np.mean(np.abs(pan - ((pred[:, 1] * p_std + p_mean) * 90.0)))

score = (mean_pan_error + mean_tilt_error) / 2

# Print the results

print("Tilt: %.2fº Pan: %.2fº Global: %.2fº Mean time: %fs" % (mean_tilt_error, mean_pan_error, score, mean_time))
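The score reported by the script is the average of two mean absolute errors, one for tilt and one for pan, both in degrees. A small worked example with made-up angles (the numbers are illustrative, not results from the model):

```python
def mean_abs_error(true_values, predicted_values):
    """Mean absolute difference between two equally long sequences."""
    errors = [abs(t - p) for t, p in zip(true_values, predicted_values)]
    return sum(errors) / len(errors)


tilt_true, tilt_pred = [0, 15, -30], [3, 12, -24]
pan_true, pan_pred = [-45, 0, 90], [-40, 2, 85]

mean_tilt_error = mean_abs_error(tilt_true, tilt_pred)
mean_pan_error = mean_abs_error(pan_true, pan_pred)
score = (mean_pan_error + mean_tilt_error) / 2

print(mean_tilt_error, mean_pan_error, score)
```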

IX. Pose Estimation Demo

test.py

Uses OpenCV to run detection on a camera stream or an input video, and computes the FPS.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Demonstrates the model in action.
"""

import cv2
from math import sin, radians
from datetime import datetime

from keras_ssd512 import ssd_512

from architectures import mpatacchiola_generic
from head_detector_utils import get_head_bboxes, get_cropped_pics
from pose_estimator_utils import get_pose

# Paths to the model weights

detector_file = 'head-detector.h5'
estimator_file = 'pose-estimator.h5'

models_path = 'models/'

detector_path = models_path + detector_file
estimator_path = models_path + estimator_file

# Head detector parameters

in_size_detector = 512
confidence_threshold = 0.65

# Pose estimator parameters

in_size_estimator = 64
num_conv_blocks = 6
num_filters_start = 32
num_dense_layers = 1
dense_layer_size = 512

# Normalization parameters

mean = 0.408808
std = 0.237583

t_mean = -0.041212
t_std = 0.323931

p_mean = -0.000276
p_std = 0.540958

# Load the models

head_detector = ssd_512(image_size=(in_size_detector, in_size_detector, 3), n_classes=1, min_scale=0.1, max_scale=1, mode='inference')
head_detector.load_weights(detector_path)

pose_estimator = mpatacchiola_generic(in_size_estimator, num_conv_blocks, num_filters_start, num_dense_layers, dense_layer_size)
pose_estimator.load_weights(estimator_path)

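# Get the video source

video_source = input("Input stream: ")

# A numeric input is treated as a camera index; anything else is kept as a file path

try:
    video_source = int(video_source)
except ValueError:
    pass

# Initialize the capture

cam = cv2.VideoCapture(video_source)
ori_width = int(cam.get(cv2.CAP_PROP_FRAME_WIDTH))
ori_height = int(cam.get(cv2.CAP_PROP_FRAME_HEIGHT))

cam.set(cv2.CAP_PROP_SETTINGS, 1)

# Set the output file name; enter -1 to disable output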

output_name = input("Output file name (without extension): ")
output_path = output_name + '.mp4'

if output_name != "-1":
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), cam.get(cv2.CAP_PROP_FPS), (ori_width, ori_height))

# Choose whether to flip the image

flip = None

while flip != 'Y' and flip != 'N':
    flip = input("Flip? (Y/N): ")

# Choose whether to display the output

show = None

while show != 'Y' and show != 'N':
    show = input("Show output? (Y/N): ")

if show == 'Y':
    print("Exit on \'Esc\' or \'Ctrl + C\'.")
else:
    print("Exit on \'Ctrl + C\'.")

# Initialize the counters and running means

out = True

frame_count = 0
heads_mean = 0
detection_mean = 0
estimation_mean = 0
fps_mean = 0

#

try:
    while out == True:

        # Record the time at which processing of this frame starts
        frame_start = datetime.now()


        out, img = cam.read()


        if out == False:
            break

        # Flip the image
        if flip == 'Y':
            img = cv2.flip(img, 1)

        # Get the bounding boxes

        detection_start = datetime.now()
        bboxes = get_head_bboxes(img, head_detector, confidence_threshold)

        detection_end = datetime.now()

        # Compute the time taken by head detection

        detection_time = (detection_end - detection_start).total_seconds()

        # Get the cropped head pictures
        gray_pic = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        heads = get_cropped_pics(gray_pic, bboxes, in_size_estimator, 0, cropping='small')


        head_count = 0

        # Reset the estimation time
        estimation_time = 0

        # Loop over all cropped head pictures
        for i in range(len(heads)):

            cv2.imshow("head", heads[i])

            if heads[i].shape == (in_size_estimator, in_size_estimator):


                head_count = head_count + 1

                # Get the pose values

                estimation_start = datetime.now()

                tilt, pan = get_pose(heads[i], pose_estimator, img_norm = [mean, std], tilt_norm = [t_mean, t_std],
                                     pan_norm = [p_mean, p_std], rescale=90.0)

                estimation_end = datetime.now()

                # Accumulate the pose estimation time
                estimation_time = estimation_time + (estimation_end - estimation_start).total_seconds()

                # Get the bounding box
                xmin, ymin, xmax, ymax = bboxes[i]

                # Draw the bounding box and the pose values

                rect = cv2.rectangle(img, (xmax, ymin), (xmin, ymax), (0, 255, 0), 2, lineType=cv2.LINE_AA)
                cv2.putText(rect, 'TILT: ' + str(round(tilt, 2)) + ' PAN: ' + str(round(pan, 2)), (xmin, ymin - 10), cv2.FONT_HERSHEY_DUPLEX, 0.5, (0, 255, 0), 1)

                # Draw the pose arrow from the center of the box

                centerx = int((xmin + xmax) / 2)
                centery = int((ymin + ymax) / 2)
                center = (centerx, centery)

                max_arrow_len = (xmax - xmin + 1) / 2

                offset_x = -1 * int(sin(radians(pan)) * max_arrow_len)
                offset_y = -1 * int(sin(radians(tilt)) * max_arrow_len)

                end = (centerx + offset_x, centery + offset_y)
                cv2.arrowedLine(img, center, end, (0, 0, 255), 2, line_type=cv2.LINE_AA)

        # Show the image

        if show == 'Y':
            cv2.imshow('Detections', img)
            # cv2.imwrite('./result', img)

        if output_name != "-1":
            writer.write(img)

        # Record the time at which processing of this frame ends
        frame_end = datetime.now()

        # Compute the elapsed time
        total_time = frame_end - frame_start

        # Compute the FPS
        fps = 1 / total_time.total_seconds()

        
        print("Frame %d. Heads: %d, Detection: %fs, Estimation: %fs, FPS: %.2f" % (frame_count + 1, head_count, detection_time, estimation_time, fps))

        
        if frame_count != 0:

            # Update the head count mean.
            heads_mean = heads_mean * ((frame_count - 1) / frame_count) + head_count / frame_count

            # Update the detection time mean.
            detection_mean = detection_mean * ((frame_count - 1) / frame_count) + detection_time / frame_count

            # Update the estimation time mean.
            estimation_mean = estimation_mean * ((frame_count - 1) / frame_count) + estimation_time / frame_count

            # Update the FPS mean.
            fps_mean = fps_mean * ((frame_count - 1) / frame_count) + fps / frame_count

        # Update the processed frame counter.
        frame_count = frame_count + 1

    
        if show == 'Y' and cv2.waitKey(1) == 27:
            break

except KeyboardInterrupt:
    pass


print("Average. Heads: %.2f, Detection: %fs, Estimation: %fs, FPS: %.2f" % (heads_mean, detection_mean, estimation_mean, fps_mean))


cam.release()

# Finalize the output video, if one was requested
if output_name != "-1":
    writer.release()


cv2.destroyAllWindows()
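The averages printed at the end are maintained incrementally instead of storing every frame's value. A minimal sketch of that update rule follows; update_mean is a hypothetical helper and n is the number of values seen so far, including the new one. In the loop above the update only runs once frame_count is non-zero, so the first frame is left out of the averages.

```python
def update_mean(mean, value, n):
    """Fold the n-th value into a running mean of the previous n - 1 values."""
    return mean * ((n - 1) / n) + value / n


values = [2.0, 4.0, 9.0]
running = 0.0
for n, value in enumerate(values, start=1):
    running = update_mean(running, value, n)

print(running, sum(values) / len(values))
```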

-------------------------------- System Development ----------------------------------

X. Database Design

databaseServer.py

Connects to the database so that the results can be written to it.

import pymysql

# sys.path.append(r'D:\python\Aproject\contest\main_win')
# import main_win.Login_Ui_From2



db = pymysql.connect(host='127.0.0.1',
                    port=3306,
                    user='root',
                    password='root',
                    database='mysql',
                    charset='utf8')
cursor = db.cursor()


class sqlData():

    # Register
    def insert_sql(self, username, phone, password):
        # username = self.lineEdit_User.text()
        # phone = self.lineEdit_3.text()
        # password = self.lineEdit_2.text()
        # Values are passed as bound parameters (%s) so user input never becomes part of the SQL text
        sql = 'select username from class_project where username=%s'
        cursor.execute(sql, (username,))
        if cursor.rowcount:
            print("Username already exists")
            return
        sql1 = 'insert into class_project (username,phone,password) values (%s,%s,%s)'
        cursor.execute(sql1, (username, phone, password))
        db.commit()
        print("Registration successful")
        return


    # Log in
    def select_sql(self, username, password):
        # username = self.lineEdit_User.text()
        # password = self.lineEdit_2.text()
        sql = 'select password from class_project where username=%s'
        cursor.execute(sql, (username,))
        if not cursor.rowcount:
            print("User does not exist")
            # self.open_new_window()
            return
        if cursor.fetchone()[0] != password:
            print("Wrong password")
            return
        print("Login successful")
        return

    def the_rise_rate(self, time, num, username):
        sql = 'insert into rise(time,num,username) values (%s,%s,%s)'
        cursor.execute(sql, (time, num, username))
        db.commit()
        # print("Success")
        return

    def get_data(self, name):
        sql = 'select time,num from rise where username = %s'
        cursor.execute(sql, (name,))

        # # Return the tuples directly
        # return cursor.fetchall()

        # Or return two lists
        list1 = []
        list2 = []
        for i in range(0, cursor.rowcount):
            const = cursor.fetchone()
            time = const[0]
            num = const[1]
            list1.append(time)
            list2.append(num)

        return list1,list2
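Queries that take their values from a login or registration form are safest when those values are passed as bound parameters rather than spliced into the SQL string. The sketch below shows the pattern with the standard-library sqlite3 module standing in for MySQL, an assumption made only to keep the example self-contained; sqlite3 uses ? as its placeholder where pymysql uses %s. The register and find_password helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE class_project (username TEXT, phone TEXT, password TEXT)")


def register(username, phone, password):
    """Insert a new user; return False if the name is already taken."""
    cur.execute("SELECT 1 FROM class_project WHERE username = ?", (username,))
    if cur.fetchone() is not None:
        return False
    cur.execute(
        "INSERT INTO class_project (username, phone, password) VALUES (?, ?, ?)",
        (username, phone, password),
    )
    conn.commit()
    return True


def find_password(username):
    """Return the stored password for a user, or None if the user is unknown."""
    cur.execute("SELECT password FROM class_project WHERE username = ?", (username,))
    row = cur.fetchone()
    return None if row is None else row[0]


print(register("alice", "123456", "secret"))
print(register("alice", "654321", "other"))
# A classic injection string is treated as an ordinary (unknown) user name
print(find_password('" OR "1"="1'))
```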