Gaze Estimation Dataset Preprocessing (MPIIFaceGaze, EyeDiap, Gaze360, ETH-Gaze)

晚风何处来

Last modified: 2024-02-12 00:24:06

Category: Gaze Estimation. Tags: machine learning, computer vision, python, pytorch

First published: 2024-02-11 02:24:26

Article link: https://blog.csdn.net/Justineone/article/details/135004075

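This post walks through the preprocessing of four gaze-estimation datasets. All four are face-image datasets, not eye-image datasets.

It focuses on the preprocessing code and the steps it performs. For an introduction to the datasets themselves, see the companion post on my homepage:

Gaze Estimation face datasets (MPIIFaceGaze, EyeDiap, Gaze360, ETH-Gaze), CSDN blog: https://blog.csdn.net/Justineone/article/details/134931879?spm=1001.2014.3001.5502

All preprocessing code in this post comes from GazeHub@Phi-ai Lab at Beihang University.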

MPIIFaceGaze

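Original MPIIFaceGaze dataset download page: It’s Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation

The dataset is processed as follows:

Setup:
- Import the required libraries and define the dataset root directory, the sample-list directory, and the output directory.
Main function, ImageProcessing_MPII():
- Iterates over the sample list of each subject.
- Calls ImageProcessing_Person() to process that subject's data.
Per-subject function, ImageProcessing_Person():
- Normalization. The camera intrinsic matrix is read from the subject's calibration file; it relates 3D points in the camera coordinate system to pixel coordinates. The eye-corner landmarks, head pose, face center, and gaze target are read from the annotation file. From the camera matrix and these annotations, dpc.norm() computes the normalization parameters. The image is then translated, rotated, and scaled so that the face appears at a fixed position and size in a 224×224 image, which simplifies later processing.
- Cropping. The eye-corner coordinates from the annotation are mapped into the normalized image, and the left-eye and right-eye patches are cropped around them. Each patch is then histogram-equalized.
- Flipping. Each entry in the sample list carries a left/right eye flag. If the flag is "left", nothing changes. If it is "right", the face image and both eye patches are flipped horizontally, and the two eye patches are swapped so that they still match the sides of the mirrored face. The labels are updated to match: the gaze and head-pose labels are mirrored and the x component of the gaze origin is negated. This keeps all samples consistent.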
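The flipping step can be sketched with NumPy alone. This is a minimal illustration rather than the GazeHub implementation: flip_sample is a hypothetical helper, and mirroring a 3D vector by negating its x component is an assumption about what dpc.GazeFlip() does. The head-pose flip (dpc.HeadFlip()) is not reproduced here.

```python
import numpy as np


def flip_sample(im_face, im_left, im_right, gaze, origin):
    """Mirror one sample horizontally (hypothetical helper, for illustration).

    Images are H x W x C arrays; gaze and origin are 3D vectors.
    """
    # Reversing the column order mirrors an image left-to-right.
    im_face = im_face[:, ::-1].copy()
    im_left = im_left[:, ::-1].copy()
    im_right = im_right[:, ::-1].copy()

    # After mirroring, the former right-eye patch sits on the left side.
    im_left, im_right = im_right, im_left

    # Assumption: mirroring about the vertical axis negates the x component.
    mirror = np.array([-1.0, 1.0, 1.0])
    return im_face, im_left, im_right, gaze * mirror, origin * mirror


# Tiny synthetic sample so the effect is easy to inspect.
face = np.arange(6, dtype=np.uint8).reshape(2, 3, 1)
left = np.zeros((1, 2, 1), dtype=np.uint8)
right = np.ones((1, 2, 1), dtype=np.uint8)
gaze = np.array([0.1, -0.2, -0.97])
origin = np.array([30.0, 5.0, 600.0])

f, l, r, g, o = flip_sample(face, left, right, gaze, origin)
print(f[0, :, 0], g, o)
```

Applying flip_sample twice returns the original sample, which is a quick sanity check for any flip routine.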
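Annotation parser, AnnoDecode():
Converts one annotation line into a dictionary of eye and head information. The indices below refer to the numeric array that remains after the leading image path and the trailing eye flag have been stripped from the line:
- out["left_left_corner"] = annotation[2:4]: coordinates of the left corner of the left eye (indices 2 to 3).
- out["left_right_corner"] = annotation[4:6]: coordinates of the right corner of the left eye (indices 4 to 5).
- out["right_left_corner"] = annotation[6:8]: coordinates of the left corner of the right eye (indices 6 to 7).
- out["right_right_corner"] = annotation[8:10]: coordinates of the right corner of the right eye (indices 8 to 9).
- out["headrotvectors"] = annotation[14:17]: head rotation vector (indices 14 to 16).
- out["headtransvectors"] = annotation[17:20]: head translation vector (indices 17 to 19).
- out["facecenter"] = annotation[20:23]: 3D coordinates of the face center (indices 20 to 22).
- out["target"] = annotation[23:26]: 3D coordinates of the gaze target (indices 23 to 25).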
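To make the index layout concrete, the sketch below decodes one annotation line end to end. It mirrors the slicing in AnnoDecode(), but decode_annotation_line is a hypothetical helper and the sample line is synthetic: its numeric fields are simply 0 to 25 so that every slice can be checked by eye.

```python
import numpy as np


def decode_annotation_line(line):
    """Split one annotation line into (image path, eye flag, field dict)."""
    tokens = line.strip().split(" ")
    im_path, which_eye = tokens[0], tokens[-1]
    # Everything between the path and the eye flag is numeric.
    values = np.array(tokens[1:-1], dtype="float32")
    fields = {
        "left_left_corner": values[2:4],
        "left_right_corner": values[4:6],
        "right_left_corner": values[6:8],
        "right_right_corner": values[8:10],
        "headrotvectors": values[14:17],
        "headtransvectors": values[17:20],
        "facecenter": values[20:23],
        "target": values[23:26],
    }
    return im_path, which_eye, fields


# Synthetic line: path, 26 numbers (0..25), eye flag.
line = "day01/0001.jpg " + " ".join(str(v) for v in range(26)) + " left"
im_path, which_eye, fields = decode_annotation_line(line)

print(im_path, which_eye)
print(fields["headrotvectors"], fields["facecenter"], fields["target"])
```

Because each value equals its own index in this synthetic line, fields["facecenter"] holds 20, 21, 22 and fields["target"] holds 23, 24, 25.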

The complete preprocessing code with comments is as follows:

import numpy as np
import scipy.io as sio
import cv2
import os
import sys

sys.path.append("../core/")  # add the path of the core module
import data_processing_core as dpc  # core data-processing module

root = "/home/cyh/GazeDataset20200519/Original/MPIIFaceGaze"  # dataset root directory
sample_root = "/home/cyh/GazeDataset20200519/Original/MPIIGaze/Origin/Evaluation Subset/sample list for eye image"  # sample-list root directory
out_root = "/home/cyh/GazeDataset20200519/FaceBased/MPIIFaceGaze"  # output root directory
scale = True  # scaling flag passed to norm.GetGaze()


def ImageProcessing_MPII():
    persons = os.listdir(sample_root)  # one sample-list file per subject
    persons.sort()  # sort for a deterministic order
    for person in persons:
        sample_list = os.path.join(sample_root, person)  # sample list of this subject

        person = person.split(".")[0]  # subject name (file name without extension)
        im_root = os.path.join(root, person)  # image directory of this subject
        anno_path = os.path.join(root, person, f"{person}.txt")  # annotation file of this subject

        im_outpath = os.path.join(out_root, "Image", person)  # image output directory
        label_outpath = os.path.join(out_root, "Label", f"{person}.label")  # label output file

        os.makedirs(im_outpath, exist_ok=True)
        os.makedirs(os.path.join(out_root, "Label"), exist_ok=True)

        print(f"Start processing {person}")
        ImageProcessing_Person(im_root, anno_path, sample_list, im_outpath, label_outpath, person)


def ImageProcessing_Person(im_root, anno_path, sample_list, im_outpath, label_outpath, person):
    # Read the camera matrix
    camera = sio.loadmat(os.path.join(im_root, "Calibration", "Camera.mat"))
    camera = camera["cameraMatrix"]

    # Read the annotations: image path -> numeric fields (the trailing eye flag is dropped)
    with open(anno_path) as infile:
        anno_info = infile.readlines()
    anno_dict = {line.split(" ")[0]: line.strip().split(" ")[1:-1] for line in anno_info}

    # Create the label file and the output folders
    outfile = open(label_outpath, 'w')
    outfile.write("Face Left Right Origin WhichEye 3DGaze 3DHead 2DGaze 2DHead Rmat Smat GazeOrigin\n")
    for sub in ("face", "left", "right"):
        os.makedirs(os.path.join(im_outpath, sub), exist_ok=True)

    # Process the images
    with open(sample_list) as infile:
        im_list = infile.readlines()
        total = len(im_list)

    for count, info in enumerate(im_list):

        progressbar = "".join(["\033[41m%s\033[0m" % '   '] * int(count / total * 20))
        progressbar = "\r" + progressbar + f" {count}|{total}"
        print(progressbar, end="", flush=True)

        # Parse the sample entry: "<day>/<image name> <left|right>"
        im_info, which_eye = info.strip().split(" ")
        day, im_name = im_info.split("/")
        im_number = int(im_name.split(".")[0])

        # Read the image and its annotation
        im_path = os.path.join(im_root, day, im_name)
        im = cv2.imread(im_path)
        annotation = anno_dict[im_info]
        annotation = AnnoDecode(annotation)

        # Normalize the image
        norm = dpc.norm(center=annotation["facecenter"],
                        gazetarget=annotation["target"],
                        headrotvec=annotation["headrotvectors"],
                        imsize=(224, 224),
                        camparams=camera)

        im_face = norm.GetImage(im)

        # Crop the left-eye image
        llc = norm.GetNewPos(annotation["left_left_corner"])
        lrc = norm.GetNewPos(annotation["left_right_corner"])
        im_left = norm.CropEye(llc, lrc)
        im_left = dpc.EqualizeHist(im_left)

        # Crop the right-eye image
        rlc = norm.GetNewPos(annotation["right_left_corner"])
        rrc = norm.GetNewPos(annotation["right_right_corner"])
        im_right = norm.CropEye(rlc, rrc)
        im_right = dpc.EqualizeHist(im_right)

        # Get the labels in the normalized space
        gaze = norm.GetGaze(scale=scale)
        head = norm.GetHeadRot(vector=True)
        origin = norm.GetCoordinate(annotation["facecenter"])
        rvec, svec = norm.GetParams()

        # Flip the sample horizontally when the flag is "right"
        if which_eye == "right":
            im_face = cv2.flip(im_face, 1)
            im_left = cv2.flip(im_left, 1)
            im_right = cv2.flip(im_right, 1)

            # Swap the eye patches so they match the mirrored face
            im_left, im_right = im_right, im_left

            gaze = dpc.GazeFlip(gaze)
            head = dpc.HeadFlip(head)
            origin[0] = -origin[0]

        gaze_2d = dpc.GazeTo2d(gaze)
        head_2d = dpc.HeadTo2d(head)

        # Save the images and the label line
        cv2.imwrite(os.path.join(im_outpath, "face", str(count + 1) + ".jpg"), im_face)
        cv2.imwrite(os.path.join(im_outpath, "left", str(count + 1) + ".jpg"), im_left)
        cv2.imwrite(os.path.join(im_outpath, "right", str(count + 1) + ".jpg"), im_right)

        save_name_face = os.path.join(person, "face", str(count + 1) + ".jpg")
        save_name_left = os.path.join(person, "left", str(count + 1) + ".jpg")
        save_name_right = os.path.join(person, "right", str(count + 1) + ".jpg")

        save_origin = im_info
        save_flag = which_eye
        save_gaze = ",".join(gaze.astype("str"))
        save_head = ",".join(head.astype("str"))
        save_gaze2d = ",".join(gaze_2d.astype("str"))
        save_head2d = ",".join(head_2d.astype("str"))
        save_rvec = ",".join(rvec.astype("str"))
        save_svec = ",".join(svec.astype("str"))
        origin = ",".join(origin.astype("str"))

        save_str = " ".join(
            [save_name_face, save_name_left, save_name_right, save_origin, save_flag, save_gaze, save_head, save_gaze2d,
             save_head2d, save_rvec, save_svec, origin])

        outfile.write(save_str + "\n")
    print("")
    outfile.close()


def AnnoDecode(anno_info):
    annotation = np.array(anno_info).astype("float32")
    out = {}
    out["left_left_corner"] = annotation[2:4]
    out["left_right_corner"] = annotation[4:6]
    out["right_left_corner"] = annotation[6:8]
    out["right_right_corner"] = annotation[8:10]
    out["headrotvectors"] = annotation[14:17]
    out["headtransvectors"] = annotation[17:20]
    out["facecenter"] = annotation[20:23]
    out["target"] = annotation[23:26]
    return out


if __name__ == "__main__":
    ImageProcessing_MPII()

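EyeDiap (Face)

Original EyeDiap dataset download link: EYEDIAP - EN

The preprocessing steps are as follows:

Data collection and preparation: iterate over the dataset folders (sessions whose names contain "FT" are skipped) and, for each remaining session, locate the video and read the head-pose, eye-tracking, and screen-target annotation files. The camera calibration file is parsed into the intrinsic matrix, the rotation matrix, and the translation vector.
Video processing and frame extraction: open the video with OpenCV and read it frame by frame, keeping one frame out of every 15. For each kept frame, the head rotation, the head translation, and the 3D eye positions are transformed into the camera coordinate system, and the 3D position of the on-screen gaze target is read. The face center is taken as the midpoint between the mean of the two eye positions and the head position. After normalization, the eye patches are cropped around the 2D eye positions, and the face image and both eye patches are saved.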
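The coordinate transform and the frame sampling described above can be sketched as follows. The helper names are hypothetical, the calibration values are toy numbers, and reading the factor 1000 in the script as a metre-to-millimetre conversion is an assumption.

```python
import numpy as np


def to_camera_mm(point_m, cam_rot, cam_trans_m):
    """Map a 3D point (assumed to be in metres) to camera coordinates in mm."""
    point_mm = np.asarray(point_m, dtype="float64") * 1000.0
    cam_trans_mm = np.asarray(cam_trans_m, dtype="float64") * 1000.0
    # Rotate into the camera frame, then add the camera translation.
    return cam_rot @ point_mm + cam_trans_mm


def kept_frame_indices(length, step=15):
    """1-based indices of the frames that pass the (index - 1) % step == 0 test."""
    return [index for index in range(1, length + 1) if (index - 1) % step == 0]


# Toy calibration: a 90-degree rotation about the z axis and a 10 cm shift along z.
cam_rot = np.array([[0.0, -1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
cam_trans_m = np.array([0.0, 0.0, 0.1])

eye_cam = to_camera_mm([0.05, 0.0, 0.5], cam_rot, cam_trans_m)
frames = kept_frame_indices(40)

print(eye_cam)  # [  0.  50. 600.]
print(frames)   # [1, 16, 31]
```

With a step of 15, a 40-frame video contributes frames 1, 16, and 31, which is why the processed dataset is much smaller than the raw video.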
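Normalization:
- Normalization is done by dpc.norm(). In this script it is applied to the face image, not to the eye images; the eye patches are cropped afterwards from the normalized face image.
- Its arguments are the 3D face center, the 3D gaze target, the head rotation, the output image size (224×224), and the camera intrinsic matrix.
- Given these inputs, normalization does the following:
  - Expresses the image, the gaze direction, and the head rotation in one common reference coordinate system, so that the inputs are consistent across samples.
  - Rotates the view, using the face center and the head rotation, so that the face is centered and appears upright.
  - Scales the image, using the camera parameters, so that the face has a fixed size in the output image.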
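Since dpc.norm() itself is not listed in this post, the sketch below shows the data normalization scheme commonly used in appearance-based gaze estimation: a rotation that points a virtual camera at the face center, followed by a scaling to a fixed distance. Whether dpc.norm() matches it in every detail is an assumption. normalization_matrices is a hypothetical helper, and the normalized focal length (960) and distance (600 mm) are illustrative values.

```python
import numpy as np


def normalization_matrices(center, head_rot, camera, imsize=(224, 224),
                           focal_norm=960.0, distance_norm=600.0):
    """Return (rot, scale, warp) for one sample (illustrative helper)."""
    center = np.asarray(center, dtype="float64")
    head_rot = np.asarray(head_rot, dtype="float64")
    distance = np.linalg.norm(center)

    # The virtual camera looks straight at the face center ...
    z_axis = center / distance
    # ... and is rolled so that the head's x axis appears horizontal.
    y_axis = np.cross(z_axis, head_rot[:, 0])
    y_axis /= np.linalg.norm(y_axis)
    x_axis = np.cross(y_axis, z_axis)
    x_axis /= np.linalg.norm(x_axis)
    rot = np.stack([x_axis, y_axis, z_axis])

    # Scaling moves the face to a fixed distance from the virtual camera.
    scale = np.diag([1.0, 1.0, distance_norm / distance])

    cam_norm = np.array([[focal_norm, 0.0, imsize[0] / 2],
                         [0.0, focal_norm, imsize[1] / 2],
                         [0.0, 0.0, 1.0]])
    # Perspective warp from original pixels to normalized pixels.
    warp = cam_norm @ scale @ rot @ np.linalg.inv(camera)
    return rot, scale, warp


camera = np.array([[1000.0, 0.0, 320.0],
                   [0.0, 1000.0, 240.0],
                   [0.0, 0.0, 1.0]])
center = np.array([30.0, -20.0, 550.0])  # face center in mm (made-up values)
target = np.array([0.0, 0.0, 0.0])       # gaze target in mm (made-up values)

rot, scale, warp = normalization_matrices(center, np.eye(3), camera)

# Gaze direction expressed in the normalized camera, as a unit vector.
gaze_norm = rot @ (target - center)
gaze_norm /= np.linalg.norm(gaze_norm)

# Projecting the face center and applying the warp lands on the image center.
pixel = warp @ (camera @ center)
pixel = pixel[:2] / pixel[2]
print(pixel)
```

In a full pipeline the warp matrix would be passed to a perspective-warp routine such as cv2.warpPerspective to produce the 224×224 face image.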
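Cropping and enhancement:
- The code listed here crops the eye patches and applies histogram equalization to them. It does not add any random perturbation to the eye-center or gaze-target coordinates.
- norm.CropEyeWithCenter() crops an eye patch around a given eye center, after that center has been mapped into the normalized image with norm.GetNewPos().
- dpc.EqualizeHist() then equalizes the histogram of each eye patch, which reduces the effect of lighting differences.
Label organization and storage: the processed images are saved to the corresponding directories, and one line per sample is appended to the subject's label file. The line contains the paths of the face and eye images, the source session and frame index, the 3D and 2D gaze and head labels, the normalization parameters, and the gaze origin.

The complete preprocessing code with comments is as follows: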

import numpy as np
import scipy.io as sio
import cv2
import os
import sys

sys.path.append("../core/")
import data_processing_core as dpc  # custom data-processing module

root = "/home/cyh/GazeDataset20200519/Original/EyeDiap/Data"  # original EyeDiap data directory
out_root = "/home/cyh/GazeDataset20200519/FaceBased/EyeDiap_temp"  # output directory for the processed dataset
scale = False  # scaling flag passed to norm.GetGaze()


def ImageProcessing_Diap():
    folders = os.listdir(root)  # all session folders in the dataset
    folders.sort(key=lambda x: int(x.split("_")[0]))  # sort by the leading subject number

    count_dict = {}  # number of images saved so far, per subject
    for i in range(20):
        count_dict[str(i)] = 0

    for folder in folders:
        if "FT" not in folder:
            video_path = os.path.join(root, folder, "rgb_vga.mov")  # video file
            head_path = os.path.join(root, folder, "head_pose.txt")  # head-pose annotations
            anno_path = os.path.join(root, folder, "eye_tracking.txt")  # eye-tracking annotations
            camparams_path = os.path.join(root, folder, "rgb_vga_calibration.txt")  # camera calibration
            target_path = os.path.join(root, folder, "screen_coordinates.txt")  # screen-target coordinates

            number = int(folder.split("_")[0])  # subject number
            count = count_dict[str(number)]  # images already saved for this subject
            person = "p" + str(number)  # subject identifier

            im_outpath = os.path.join(out_root, "Image", person)  # image output directory
            label_outpath = os.path.join(out_root, "Label", f"{person}.label")  # label output file

            os.makedirs(im_outpath, exist_ok=True)
            os.makedirs(os.path.join(out_root, "Label"), exist_ok=True)
            if not os.path.exists(label_outpath):
                with open(label_outpath, 'w') as outfile:
                    outfile.write("Face Left Right metapath 3DGaze 3DHead 2DGaze 2DHead Rvec Svec GazeOrigin\n")

            print(f"Start processing p{number}: {folder}")
            count_dict[str(number)] = ImageProcessing_PerVideos(video_path, head_path, anno_path, camparams_path,
                                                                target_path, im_outpath, label_outpath, folder, count,
                                                                person)


def ImageProcessing_PerVideos(video_path, head_path, anno_path, camparams_path, target_path, im_outpath, label_outpath,
                              folder, count, person):
    # Read the annotation files
    with open(head_path) as infile:
        head_info = infile.readlines()
    with open(anno_path) as infile:
        anno_info = infile.readlines()
    with open(target_path) as infile:
        target_info = infile.readlines()
    length = len(target_info) - 1  # number of annotated frames (line 0 is skipped)

    # Read the camera parameters
    cam_info = CamParamsDecode(camparams_path)
    camera = cam_info["intrinsics"]  # camera intrinsic matrix
    cam_rot = cam_info["R"]  # camera rotation matrix
    cam_trans = cam_info["T"] * 1000  # camera translation vector, scaled by 1000 (presumably m to mm)

    # Open the video
    cap = cv2.VideoCapture(video_path)

    # Open the label file in append mode and create the output folders
    outfile = open(label_outpath, 'a')
    for sub in ("left", "right", "face"):
        os.makedirs(os.path.join(im_outpath, sub), exist_ok=True)

    num = 1
    # Process the video frame by frame
    for index in range(1, length + 1):
        ret, frame = cap.read()  # read the next video frame
        if not ret:
            break  # the video has fewer frames than the annotation files

        # Keep one frame out of every 15
        if (index - 1) % 15 != 0:
            continue

        # Show a progress bar
        progressbar = "".join(["\033[41m%s\033[0m" % '   '] * int(index / length * 20))
        progressbar = "\r" + progressbar + f" {index}|{length}"
        print(progressbar, end="", flush=True)

        # Read the head rotation matrix and convert it to the camera coordinate system
        head = head_info[index]
        head = list(map(float, head.strip().split(";")))
        if len(head) != 13:
            print("[head pose error]")
            continue

        head_rot = head[1:10]
        head_rot = np.array(head_rot).reshape([3, 3])
        head_rot = np.dot(cam_rot, head_rot)

        # Convert the head translation vector to the camera coordinate system
        head_trans = np.array(head[10:13]) * 1000
        head_trans = np.dot(cam_rot, head_trans)
        head_trans = head_trans + cam_trans

        # Compute the 3D eye positions in the camera coordinate system
        anno = anno_info[index]
        anno = list(map(float, anno.strip().split(";")))
        if len(anno) != 19:
            print("[annotation error]")
            continue
        anno = np.array(anno)

        left3d = anno[13:16] * 1000
        left3d = np.dot(cam_rot, left3d) + cam_trans
        right3d = anno[16:19] * 1000
        right3d = np.dot(cam_rot, right3d) + cam_trans

        # Face center: midpoint between the mean eye position and the head position
        face3d = (left3d + right3d) / 2
        face3d = (face3d + head_trans) / 2

        left2d = anno[1:3]
        right2d = anno[3:5]

        # Read the 3D position of the on-screen gaze target
        target = target_info[index]
        target = list(map(float, target.strip().split(";")))
        if len(target) != 6:
            print("[target error]")
            continue
        target3d = np.array(target)[3:6] * 1000

        # Normalize the face image
        norm = dpc.norm(center=face3d,
                        gazetarget=target3d,
                        headrotvec=head_rot,
                        imsize=(224, 224),
                        camparams=camera)

        # Get the key information
        im_face = norm.GetImage(frame)
        gaze = norm.GetGaze(scale=scale)
        head = norm.GetHeadRot(vector=False)
        head = cv2.Rodrigues(head)[0].T[0]

        origin = norm.GetCoordinate(face3d)
        rvec, svec = norm.GetParams()

        gaze2d = dpc.GazeTo2d(gaze)
        head2d = dpc.HeadTo2d(head)

        # Crop the eye images
        left2d = norm.GetNewPos(left2d)
        right2d = norm.GetNewPos(right2d)

        im_left = norm.CropEyeWithCenter(left2d)
        im_left = dpc.EqualizeHist(im_left)
        im_right = norm.CropEyeWithCenter(right2d)
        im_right = dpc.EqualizeHist(im_right)

        # Save the images and the label line
        cv2.imwrite(os.path.join(im_outpath, "face", str(count + num) + ".jpg"), im_face)
        cv2.imwrite(os.path.join(im_outpath, "left", str(count + num) + ".jpg"), im_left)
        cv2.imwrite(os.path.join(im_outpath, "right", str(count + num) + ".jpg"), im_right)

        save_name_face = os.path.join(person, "face", str(count + num) + ".jpg")
        save_name_left = os.path.join(person, "left", str(count + num) + ".jpg")
        save_name_right = os.path.join(person, "right", str(count + num) + ".jpg")
        save_metapath = folder + f"_{index}"
        save_gaze = ",".join(gaze.astype("str"))
        save_head = ",".join(head.astype("str"))
        save_gaze2d = ",".join(gaze2d.astype("str"))
        save_head2d = ",".join(head2d.astype("str"))
        save_origin = ",".join(origin.astype("str"))
        save_rvec = ",".join(rvec.astype("str"))
        save_svec = ",".join(svec.astype("str"))

        save_str = " ".join(
            [save_name_face, save_name_left, save_name_right, save_metapath, save_gaze, save_head, save_gaze2d,
             save_head2d, save_rvec, save_svec, save_origin])
        outfile.write(save_str + "\n")
        num += 1

    count += (num - 1)
    cap.release()
    outfile.close()
    print("")
    return count


def CamParamsDecode(path):
    cal = {}
    with open(path, 'r') as fh:
        # Parse the camera calibration file section by section
        fh.readline()  # skip the [resolution] header
        cal['size'] = [int(val) for val in fh.readline().strip().split(';')]  # image resolution
        cal['size'] = cal['size'][0], cal['size'][1]
        fh.readline()  # skip the [intrinsics] header
        vals = []
        for i in range(3):
            vals.append([float(val) for val in fh.readline().strip().split(';')])
        cal['intrinsics'] = np.array(vals).reshape(3, 3)  # camera intrinsic matrix
        fh.readline()  # skip the [R] header
        vals = []
        for i in range(3):
            vals.append([float(val) for val in fh.readline().strip().split(';')])
        cal['R'] = np.array(vals).reshape(3, 3)  # camera rotation matrix
        fh.readline()  # skip the [T] header
        vals = []
        for i in range(3):
            vals.append([float(val) for val in fh.readline().strip().split(';')])
        cal['T'] = np.array(vals).reshape(3)  # camera translation vector
    return cal


if __name__ == "__main__":
    ImageProcessing_Diap()
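
CamParamsDecode() depends on the sections appearing in a fixed order. As a possible alternative, the sketch below keys each block by its section header instead. The file layout (a bracketed header followed by rows of ';'-separated numbers) is inferred from CamParamsDecode(), parse_calibration is a hypothetical helper, and the sample values are made up.

```python
import numpy as np


def parse_calibration(text):
    """Parse '[section]' blocks of ';'-separated numbers into arrays."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new section starts; collect its rows under this name.
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(
                [float(v) for v in line.split(";") if v.strip()])
    return {name: np.array(rows) for name, rows in sections.items()}


# Made-up calibration text in the layout that CamParamsDecode() expects.
sample = """[resolution]
640;480
[intrinsics]
525.0;0.0;320.0
0.0;525.0;240.0
0.0;0.0;1.0
[R]
1.0;0.0;0.0
0.0;1.0;0.0
0.0;0.0;1.0
[T]
0.1
0.2
0.3
"""

cal = parse_calibration(sample)
camera = cal["intrinsics"]       # 3 x 3 intrinsic matrix
cam_trans = cal["T"].reshape(3)  # translation as a flat vector
print(camera.shape, cam_trans)
```

Looking sections up by name means a reordered or extended calibration file still parses, at the cost of slightly more code.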

Gaze360

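Original Gaze360 dataset download link: Research | MIT CSAIL

The preprocessing steps are as follows:

Load the recording names, gaze directions, and the head, face, and eye bounding boxes from the metadata file (metadata.mat).
Create the folder structure for the output images and labels, and open one label file per split for writing.
For each image, skip it if no face was detected (the face bounding box is all -1). Otherwise crop the face, left-eye, and right-eye regions using the bounding boxes, resize them, and save them to the corresponding folders.
For each image, convert the 3D gaze direction into 2D angles (yaw, pitch) and write the image paths, the original image path, and the 3D and 2D gaze to the label file of its split.
Close all label files to finish preprocessing.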
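The cropping step relies on a small piece of bounding-box arithmetic, sketched below. It follows the arithmetic of CropFaceImg() in the script, under the assumption that every box is stored as (x, y, width, height) in the same normalized coordinates. square_crop_box is a hypothetical helper and the boxes are made-up values.

```python
import numpy as np


def square_crop_box(head_bbox, inner_bbox, img_width, img_height):
    """Return (x0, y0, x1, y1) of a square crop inside the head image."""
    head_bbox = np.asarray(head_bbox, dtype="float64")
    inner_bbox = np.asarray(inner_bbox, dtype="float64")

    # Express the inner box relative to the head box.
    rel = np.array([(inner_bbox[0] - head_bbox[0]) / head_bbox[2],
                    (inner_bbox[1] - head_bbox[1]) / head_bbox[3],
                    inner_bbox[2] / head_bbox[2],
                    inner_bbox[3] / head_bbox[3]])

    # Convert the relative box to pixels of the head image.
    size = np.array([img_width, img_height, img_width, img_height])
    x, y, w, h = [int(v) for v in (rel * size).astype(int)]

    # Use the longer side so the crop is square, and keep it inside the
    # image on the top and left edges.
    half = int(max(w, h) / 2)
    cx = max(x + w // 2, half)
    cy = max(y + h // 2, half)
    return cx - half, cy - half, cx + half, cy + half


head_bbox = [0.5, 0.25, 0.25, 0.5]        # made-up head box
face_bbox = [0.5625, 0.375, 0.125, 0.25]  # made-up face box inside it

box = square_crop_box(head_bbox, face_bbox, img_width=200, img_height=400)
print(box)  # (0, 100, 200, 300)
```

The resulting square would then be sliced out of the head image and resized to 224×224, as the script does with cv2.resize.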

The complete preprocessing code with comments is as follows:

import numpy as np
import scipy.io as sio
import cv2
import os
import sys

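sys.path.append("../core/")  # add the path of the custom core module
import data_processing_core as dpc  # core data-processing module

root = "F:/facegaze/LSUN/"  # root directory of the original dataset
out_root = "F:/facegaze/LSUN/Gaze360pro"  # output directory for the preprocessed data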


def ImageProcessing_Gaze360():
    msg = sio.loadmat(os.path.join(root, "metadata.mat"))  # load the metadata file

    recordings = msg["recordings"]  # recording names
    gazes = msg["gaze_dir"]  # 3D gaze directions
    head_bbox = msg["person_head_bbox"]  # head bounding boxes
    face_bbox = msg["person_face_bbox"]  # face bounding boxes
    lefteye_bbox = msg["person_eye_left_bbox"]  # left-eye bounding boxes
    righteye_bbox = msg["person_eye_right_bbox"]  # right-eye bounding boxes
    splits = msg["splits"]  # split names

    split_index = msg["split"]  # split index of each sample
    recording_index = msg["recording"]  # recording index of each sample
    person_index = msg["person_identity"]  # person identity of each sample
    frame_index = msg["frame"]  # frame index of each sample

    total_num = recording_index.shape[1]  # total number of samples
    outfiles = []  # one open label file per split

    # Create the folder structure for the images and labels
    os.makedirs(os.path.join(out_root, "Label"), exist_ok=True)

    for i in range(4):
        for sub in ("Left", "Right", "Face"):
            os.makedirs(os.path.join(out_root, "Image", splits[0, i][0], sub), exist_ok=True)

        outfiles.append(open(os.path.join(out_root, "Label", f"{splits[0, i][0]}.label"), 'w'))
        outfiles[i].write("Face Left Right Origin 3DGaze 2DGaze\n")  # header line

    # Process each image
    for i in range(total_num):
        im_path = os.path.join(root, "imgs",
                               recordings[0, recording_index[0, i]][0],
                               "head", '%06d' % person_index[0, i],
                               '%06d.jpg' % frame_index[0, i]
                               )  # path of the head image

        progressbar = "".join(["\033[41m%s\033[0m" % '   '] * int(i / total_num * 20))
        progressbar = "\r" + progressbar + f" {i}|{total_num}"
        print(progressbar, end="", flush=True)

        if (face_bbox[i] == np.array([-1, -1, -1, -1])).all():  # skip samples without a detected face
            continue

        category = splits[0, split_index[0, i]][0]  # split name of this sample
        gaze = gazes[i]  # 3D gaze direction

        img = cv2.imread(im_path)  # read the image
        face = CropFaceImg(img, head_bbox[i], face_bbox[i])  # crop the face region
        lefteye = CropEyeImg(img, head_bbox[i], lefteye_bbox[i])  # crop the left-eye region
        righteye = CropEyeImg(img, head_bbox[i], righteye_bbox[i])  # crop the right-eye region

        # Save the cropped images to the output folders
        cv2.imwrite(os.path.join(out_root, "Image", category, "Face", f"{i + 1}.jpg"), face)
        cv2.imwrite(os.path.join(out_root, "Image", category, "Left", f"{i + 1}.jpg"), lefteye)
        cv2.imwrite(os.path.join(out_root, "Image", category, "Right", f"{i + 1}.jpg"), righteye)

        gaze2d = GazeTo2d(gaze)  # convert the 3D gaze direction to (yaw, pitch)

        # Build the label line
        save_name_face = os.path.join(category, "Face", f"{i + 1}.jpg")
        save_name_left = os.path.join(category, "Left", f"{i + 1}.jpg")
        save_name_right = os.path.join(category, "Right", f"{i + 1}.jpg")

        save_origin = os.path.join(recordings[0, recording_index[0, i]][0],
                                   "head", "%06d" % person_index[0, i], "%06d.jpg" % frame_index[0, i])

        save_gaze = ",".join(gaze.astype("str"))
        save_gaze2d = ",".join(gaze2d.astype("str"))

        save_str = " ".join([save_name_face, save_name_left, save_name_right, save_origin, save_gaze, save_gaze2d])
        outfiles[split_index[0, i]].write(save_str + "\n")  # write the label line to the file of its split

    for i in outfiles:
        i.close()


def GazeTo2d(gaze):
    yaw = np.arctan2(gaze[0], -gaze[2])  # horizontal angle
    pitch = np.arcsin(gaze[1])  # vertical angle
    return np.array([yaw, pitch])  # 2D gaze as (yaw, pitch) in radians


def CropFaceImg(img, head_bbox, cropped_bbox):
    bbox = np.array([(cropped_bbox[0] - head_bbox[0]) / head_bbox[2],  # box position relative to the head box
                     (cropped_bbox[1] - head_bbox[1]) / head_bbox[3],
                     cropped_bbox[2] / head_bbox[2], cropped_bbox[3] / head_bbox[3]])

    size = np.array([img.shape[1], img.shape[0]])  # image size as (width, height)

    bbox_pixel = np.concatenate([bbox[:2] * size, bbox[2:] * size]).astype("int")  # box in pixel coordinates

    # Find the center of the box and crop a square whose side is the longer box side
    center = np.array([bbox_pixel[0] + bbox_pixel[2] // 2, bbox_pixel[1] + bbox_pixel[3] // 2])
    length = int(max(bbox_pixel[2], bbox_pixel[3]) / 2)

    # Keep the square inside the image on the top and left edges
    center[0] = max(center[0], length)
    center[1] = max(center[1], length)

    result = img[(center[1] - length): (center[1] + length),
             (center[0] - length): (center[0] + length)]

    result = cv2.resize(result, (224, 224))  # resize to the target size
    return result


def CropEyeImg(img, head_bbox, cropped_bbox):
    bbox = np.array([(cropped_bbox[0] - head_bbox[0]) / head_bbox[2],  # box position relative to the head box
                     (cropped_bbox[1] - head_bbox[1]) / head_bbox[3],
                     cropped_bbox[2] / head_bbox[2], cropped_bbox[3] / head_bbox[3]])

    size = np.array([img.shape[1], img.shape[0]])  # image size as (width, height)

    bbox_pixel = np.concatenate([bbox[:2] * size, bbox[2:] * size]).astype("int")  # box in pixel coordinates

    # Grow the box around its center to a 60:36 aspect ratio
    center = np.array([bbox_pixel[0] + bbox_pixel[2] // 2, bbox_pixel[1] + bbox_pixel[3] // 2])
    height = bbox_pixel[3] / 36
    width = bbox_pixel[2] / 60
    ratio = max(height, width)

    size = np.array([ratio * 30, ratio * 18]).astype("int")  # half width and half height of the crop

    # Keep the crop inside the image on the top and left edges
    center[0] = max(center[0], size[0])
    center[1] = max(center[1], size[1])

    result = img[(center[1] - size[1]): (center[1] + size[1]),
             (center[0] - size[0]): (center[0] + size[0])]

    result = cv2.resize(result, (60, 36))  # resize to the target eye-patch size
    return result


if __name__ == "__main__":
    ImageProcessing_Gaze360()
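
To check the angle convention used by GazeTo2d(), it helps to convert the angles back to a 3D vector. The sketch below repeats the forward formula from the script and adds an inverse, gaze_to_3d, which is not part of the original code. The round trip is only exact for unit-length gaze vectors.

```python
import numpy as np


def gaze_to_2d(gaze):
    """Same formula as GazeTo2d() in the script: 3D unit vector -> (yaw, pitch)."""
    yaw = np.arctan2(gaze[0], -gaze[2])
    pitch = np.arcsin(gaze[1])
    return np.array([yaw, pitch])


def gaze_to_3d(angles):
    """Inverse of gaze_to_2d (added here for illustration only)."""
    yaw, pitch = angles
    return np.array([np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)])


gaze = np.array([0.3, -0.2, -0.9])
gaze = gaze / np.linalg.norm(gaze)  # the formulas assume a unit vector

angles = gaze_to_2d(gaze)
restored = gaze_to_3d(angles)
print(angles, restored)
```

Under this convention a gaze of (0, 0, -1) maps to zero yaw and zero pitch.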

ETH-Gaze

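Original ETH-Gaze (ETH-XGaze) dataset download page: https://ait.ethz.ch/projects/2020/ETH-XGaze

The preprocessing steps are as follows:

Set paths and parameters: define the input data path (path), the image output path (imo_path), and the label output path (annoo_path). The test flag selects test mode, which determines whether gaze labels are read and written.
Write the label file header:
- Outside test mode, create the label file and write the column names: face image, gaze, head pose, original image path, camera index, frame index, and normalization matrix.
- In test mode, create the label file and write the same column names without the gaze column.
Define process_person, which handles the data of one subject:
- Open the subject's HDF5 file and select the data fields according to the mode.
- Iterate over the samples in the file, and save each face patch as an image.
- Write the head pose, camera index, frame index, normalization matrix and, outside test mode, the gaze label to the label file.
Iterate over the data file of each subject:
- List the data files and sort them by name.
- For each file:
  - Extract the subject ID from the file name and build the file path.
  - Call process_person to process that subject's data.

The complete preprocessing code with comments: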

import os
import h5py
import numpy as np
import cv2

# Input data path
path = "F:/facegaze/LSUN/ETH-Gaze/test"
# Image output path
imo_path = "F:/facegaze/LSUN/ETH-Gaze/Image/test"
# Label output path
annoo_path = "F:/facegaze/LSUN/ETH-Gaze/Label/test.label"
# Test mode. Set to True when `path` points to the test split, which has no
# "face_gaze" field; set to False for splits that include gaze labels.
test = True

# Create the image output directory if it does not exist
if not os.path.exists(imo_path):
    os.makedirs(imo_path)

# Create the parent directory of the label file if it does not exist
if not os.path.exists(os.path.dirname(annoo_path)):
    os.makedirs(os.path.dirname(annoo_path))

# Write the header line that matches the mode
if not test:
    with open(annoo_path, 'w') as outfile:
        outfile.write("face gaze head origin cam_index frame_index normmat\n")
else:
    with open(annoo_path, 'w') as outfile:
        outfile.write("face head origin cam_index frame_index normmat\n")

# Process the data of one subject
def process_person(h5files_path, imo_metapath, sub_id, annoo_metapath, begin_num, test):
    datas = h5py.File(h5files_path, 'r')
    # Select the data fields according to the mode
    if not test:
        keys = ["cam_index", "face_gaze", "face_head_pose",
                "face_mat_norm", "face_patch", "frame_index"]
    else:
        keys = ["cam_index", "face_head_pose",
                "face_mat_norm", "face_patch", "frame_index"]
    length = datas[keys[0]].shape[0]
    print(f"==> Length: {length}")

    # Create the image output directory of this subject
    imo_path = os.path.join(imo_metapath, sub_id)
    if not os.path.exists(imo_path):
        os.makedirs(imo_path)

    # Open the label file in append mode and process the samples one by one
    with open(annoo_metapath, 'a') as outfile:
        for i in range(length):
            img = datas["face_patch"][i, :]
            # Save the face image
            cv2.imwrite(os.path.join(imo_path, f"{begin_num}.jpg"), img)

            im_path = os.path.join(sub_id, f"{begin_num}.jpg")
            head = ",".join(list(datas["face_head_pose"][i, :].astype("str")))
            norm_mat = ",".join(list(datas["face_mat_norm"][i, :].astype("str").flatten()))
            cam_index = ",".join(list(datas["cam_index"][i, :].astype("str")))
            frame_index = ",".join(list(datas["frame_index"][i, :].astype("str")))
            # Outside test mode, the label line also contains the gaze label
            if not test:
                gaze = ",".join(list(datas["face_gaze"][i, :].astype("str")))
                outfile.write(
                    f"{im_path} {gaze} {head} {os.path.join(sub_id, str(i) + '.jpg')} {cam_index} {frame_index} {norm_mat}\n")
            else:
                # In test mode there is no gaze label, so the gaze column is omitted
                outfile.write(
                    f"{im_path} {head} {os.path.join(sub_id, str(i) + '.jpg')} {cam_index} {frame_index} {norm_mat}\n")
            begin_num += 1
    datas.close()
    return begin_num

# Iterate over the data file of each subject
filenames = os.listdir(path)
filenames.sort()
num = 1

for count, filename in enumerate(filenames):
    print(f"Processing.. {filename}, [{count}/{len(filenames)}]")
    sub_id = filename.split(".")[0]
    file_path = os.path.join(path, filename)
    # Process the data of the current subject
    num = process_person(file_path,
                         imo_path,
                         sub_id,
                         annoo_path,
                         num,
                         test)
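
A label line written by this script can be read back as sketched below. parse_eth_label_line is a hypothetical helper and the sample line is made up. The column order follows the header strings in the script, and reshaping normmat to 3×3 assumes that face_mat_norm holds a 3×3 matrix per sample.

```python
import numpy as np


def parse_eth_label_line(line, test=False):
    """Turn one label line into a dict; numeric columns become arrays."""
    names = ["face", "head", "origin", "cam_index", "frame_index", "normmat"]
    if not test:
        # Outside test mode the gaze column follows the face column.
        names.insert(1, "gaze")

    values = line.strip().split(" ")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(values)}")

    record = dict(zip(names, values))
    for name in names:
        if name not in ("face", "origin"):
            record[name] = np.array(record[name].split(","), dtype="float64")

    # Assumption: the normalization matrix was flattened from 3 x 3.
    record["normmat"] = record["normmat"].reshape(3, 3)
    return record


# Made-up label line in the non-test layout.
sample = ("subject0000/1.jpg 0.1,-0.2 0.05,0.02 subject0000/0.jpg "
          "3 12 1,0,0,0,1,0,0,0,1")

record = parse_eth_label_line(sample, test=False)
print(record["face"], record["gaze"], record["normmat"].shape)
```

Passing test=True drops the gaze column, matching the shorter header that the script writes in test mode.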

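Summary

If you need the original datasets or the processed datasets as a cloud-drive download, leave a comment below. Follows and likes are appreciated 😀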
