Using the BIWI Head Pose Estimation Dataset, with Point Cloud and Image Fusion

Preface

I ran into some difficulties the first time I used the BIWI dataset, so after some exploration I decided it was worth sharing what I learned, in the hope of saving others the same detours.

The official instructions live in readme.txt; I quote them below:

The database contains 24 sequences acquired with a Kinect sensor. 20 people (some were recorded twice - 6 women and 14 men) were recorded while turning their heads, sitting in front of the sensor, at roughly one meter of distance.

For each sequence, the corresponding .obj file represents a head template deformed to match the neutral face of that specific person.

In each folder, two .cal files contain calibration information for the depth and the color camera, e.g., the intrinsic camera matrix of the depth camera and the global rotation and translation to the rgb camera.

For each frame, a _rgb.png and a _depth.bin files are provided, containing color and depth data. The depth (in mm) is already segmented (the background is removed using a threshold on the distance) and the binary files compressed (an example c code is provided to show how to read the depth data into memory).

The _pose.txt files contain the ground truth information, i.e., the location of the center of the head in 3D and the head rotation, encoded as a 3x3 rotation matrix.

Dataset structure

The dataset contains 24 folders recorded from 20 volunteers; several volunteers appear more than once in different sessions. Each folder provides the following:

Kinect RGB camera intrinsics and extrinsics ------> rgb.cal

Kinect depth camera intrinsics and extrinsics ------> depth.cal

The data is labeled as consecutive frames; every frame comes with an RGB image, a depth map, and a ground-truth pose. For example:

frame_00003_rgb.png

frame_00003_depth.bin

frame_00003_pose.txt

Below I walk through how to use each of these files.

Camera parameters

Here is the depth camera parameter file depth.cal from one of the folders:

# depth.cal
575.816 0 320 
0 575.816 240 
0 0 1 

0 0 0 0 

1 0 0 
0 1 0 
0 0 1 

0 0 0 

640 480

Lines 1-3 are the depth camera's intrinsic matrix, and the bottom line is the depth image size, i.e. a 640×480 image. The remaining blocks (an identity rotation and a zero translation, since the depth camera is the reference frame) carry no extra information here, so let's look at the RGB camera file rgb.cal:

#  rgb.cal
517.679 0 320 
0 517.679 240.5 
0 0 1 

0 0 0 0 

0.999947 0.00432361 0.00929419 
-0.00446314 0.999877 0.0150443 
-0.009228 -0.015085 0.999844 

-24.0198 5.8896 -13.2308 

640 480

Lines 5-7 form the rotation matrix R (3×3) from the depth camera to the RGB camera, and line 8 is the translation vector C.
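As a quick sanity check, the rotation block can be verified to be (approximately) orthonormal with NumPy, using the values printed above:

```python
import numpy as np

# Rotation block copied from the rgb.cal listing above
R = np.array([
    [0.999947,   0.00432361, 0.00929419],
    [-0.00446314, 0.999877,  0.0150443],
    [-0.009228,  -0.015085,  0.999844],
])

# A valid rotation matrix satisfies R @ R.T ≈ I and det(R) ≈ 1
print(np.allclose(R @ R.T, np.eye(3), atol=1e-3))  # True
print(round(float(np.linalg.det(R)), 3))           # 1.0
```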

Reading the depth maps

The _depth.bin files cannot be read directly because they use a run-length compressed layout. This section shows reading code written in Python; a C version can be found in the iosample.cpp shipped with the dataset.

import struct
import numpy as np

depth_image = []
p = 0
# f is the _depth.bin file opened in binary mode: f = open(path, 'rb')
width = struct.unpack('i', f.read(4))[0]
height = struct.unpack('i', f.read(4))[0]
while p < width * height:
    empty_num = struct.unpack('i', f.read(4))[0]
    for i in range(empty_num):
        depth_image.append(0)
    full_num = struct.unpack('i', f.read(4))[0]
    for i in range(full_num):
        depth = struct.unpack('h', f.read(2))[0]
        depth_image.append(depth)
    p += empty_num + full_num
depth_image = np.reshape(depth_image, (height, width))
depth_image = depth_image / depth_image.max() * 255   # normalize for display
depth_image = depth_image.astype('uint8')

At this point, the resulting image can be displayed with OpenCV.
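To make the run-length layout concrete, here is a small round-trip sketch: the decoder mirrors the format read by the code above, while the encoder is my own addition, for illustration only.

```python
import struct
import numpy as np

def encode_depth(img):
    """Pack a depth image in the BIWI-style run-length layout: width, height,
    then alternating counts of zero and non-zero pixels, with each non-zero
    run followed by its int16 depth values."""
    flat = img.flatten()
    out = struct.pack('ii', img.shape[1], img.shape[0])
    p = 0
    while p < flat.size:
        empty = 0
        while p + empty < flat.size and flat[p + empty] == 0:
            empty += 1
        out += struct.pack('i', empty)
        p += empty
        full = 0
        while p + full < flat.size and flat[p + full] != 0:
            full += 1
        out += struct.pack('i', full) + struct.pack(f'{full}h', *flat[p:p + full])
        p += full
    return out

def decode_depth(buf):
    """Decode the same layout back into a (height, width) array."""
    off = 0
    width, height = struct.unpack_from('ii', buf, off); off += 8
    pixels = []
    while len(pixels) < width * height:
        empty, = struct.unpack_from('i', buf, off); off += 4
        pixels.extend([0] * empty)
        full, = struct.unpack_from('i', buf, off); off += 4
        pixels.extend(struct.unpack_from(f'{full}h', buf, off)); off += 2 * full
    return np.array(pixels, dtype=np.int16).reshape(height, width)

img = np.array([[0, 0, 850, 860], [870, 0, 0, 900]], dtype=np.int16)
print(np.array_equal(decode_depth(encode_depth(img)), img))  # True
```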

Ground-truth pose

The ground-truth pose is stored in _pose.txt, which looks like this:

# _pose.txt
0.995837 0.0107805 0.090512 
-0.0196103 0.995068 0.0972386 
-0.0890173 -0.0986087 0.991137 

69.6054 -6.72574 882.439 

Lines 1-3 hold the 3D rotation matrix of the head, while line 5 holds the 3D position of the head center (in world coordinates).
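For head pose work the rotation matrix is often converted to Euler angles. Below is a sketch using one common ZYX (yaw-pitch-roll) decomposition, applied to the matrix above; the dataset does not prescribe an angle convention, so treat this as illustrative:

```python
import numpy as np

# Rotation matrix from the _pose.txt listing above
R = np.array([
    [0.995837,   0.0107805,  0.090512],
    [-0.0196103, 0.995068,   0.0972386],
    [-0.0890173, -0.0986087, 0.991137],
])

# Assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll); the convention is my choice,
# not something the dataset documents.
yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
pitch = np.degrees(np.arcsin(-R[2, 0]))
roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
print(yaw, pitch, roll)
```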

Obtaining the point cloud

Principle

The depth_image variable holds depth values indexed by pixel coordinates; using the depth camera's intrinsics, each pixel can be converted into a 3D point in world coordinates.

Python implementation

point_set = np.zeros((1, 3))           # accumulates the point cloud
img3d = np.zeros((480, 640, 3))
# depth_intrinsic is the 3x3 matrix from the first three lines of depth.cal
for y in range(0, 480):
    for x in range(0, 640):
        d = depth_image[y][x]
        if d > 0:
            img3d[y][x][0] = d * (float(x) - depth_intrinsic[0][2]) / depth_intrinsic[0][0]
            img3d[y][x][1] = d * (float(y) - depth_intrinsic[1][2]) / depth_intrinsic[1][1]
            img3d[y][x][2] = d
            point_set = np.concatenate((point_set, np.asarray(img3d[y][x]).reshape((1, 3))))
        else:
            img3d[y][x] = [0, 0, 0]
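The double Python loop above is slow. An equivalent vectorized sketch using np.meshgrid (same math, my own rewrite; fx, fy, cx, cy stand in for the values loaded from depth.cal) looks like this:

```python
import numpy as np

def backproject(depth_image, fx, fy, cx, cy):
    """Vectorized version of the loop above: every non-zero depth pixel
    becomes a 3D point (X, Y, Z) in the depth camera's coordinate frame."""
    h, w = depth_image.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    mask = depth_image > 0
    d = depth_image[mask].astype(np.float64)
    X = d * (xs[mask] - cx) / fx
    Y = d * (ys[mask] - cy) / fy
    return np.stack([X, Y, d], axis=1)   # (N, 3) point set

# Toy check with the depth intrinsics quoted earlier (fx = fy = 575.816)
demo = np.zeros((480, 640), dtype=np.int16)
demo[240, 320] = 1000    # a pixel at the principal point, depth 1000 mm
print(backproject(demo, 575.816, 575.816, 320, 240))  # one point at (0, 0, 1000)
```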

Point cloud and image fusion

The idea behind point cloud and image fusion is simple: project each point of the cloud obtained in the previous step into the RGB image, and sample the color at the resulting pixel.

Since the point cloud already lives in the depth camera's coordinate frame, and the RGB extrinsics encode the rotation and translation from that frame to the RGB camera, each 3D point has to be mapped into RGB pixel coordinates, not the other way around.

The code that projects the 3D points into RGB pixel coordinates is as follows:

def get_color(points_set):
    # Intrinsic matrix: the first three lines of rgb.cal
    rgb_intrinsic = np.genfromtxt('.../rgb.cal', skip_footer=6)
    # Rotation block of rgb.cal, embedded in a 4x4 homogeneous matrix
    R_1 = np.genfromtxt('.../rgb.cal', skip_footer=2, skip_header=5)
    R = np.zeros((4, 4))
    R[0:3, 0:3] = R_1
    R[3][3] = 1
    # Translation vector of rgb.cal
    c = np.genfromtxt('.../rgb.cal', skip_header=9, skip_footer=1)
    T = np.eye(4)
    T[0:3, 3] = -c
    # 3x4 projection matrix: K [I|0] R T
    PI = rgb_intrinsic.dot(np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])).dot(R).dot(T)
    # Append a column of ones (homogeneous coordinates), transpose to 4xN
    points_set = np.hstack((points_set, np.ones((points_set.shape[0], 1)))).T
    return PI.dot(points_set)
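get_color builds the 3×4 projection PI = K [I|0] R T. A toy sanity check with made-up intrinsics and identity extrinsics (none of these values come from the dataset) shows a point on the optical axis landing at the principal point after the perspective divide:

```python
import numpy as np

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # made-up intrinsics
R = np.eye(4)    # identity rotation (toy extrinsics)
T = np.eye(4)    # zero translation
PI = K.dot(np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])).dot(R).dot(T)

p = np.array([0.0, 0.0, 1000.0, 1.0])   # point 1 m in front of the camera
u, v, w = PI.dot(p)
print(u / w, v / w)                      # 320.0 240.0
```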

The code that assigns a color to each point of the cloud:

rgb_point_set = get_color(point_set)
rgb_point_set[0, :] /= rgb_point_set[2, :]      # perspective divide
rgb_point_set[1, :] /= rgb_point_set[2, :]
color_set = []
for i in range(rgb_point_set.shape[1]):
    loc = rgb_point_set[:, i]
    x = int(loc[0])
    y = int(loc[1])
    color_set.append(img[y - 460][x - 64])      # 460 and 64 are empirically determined offsets
color_set = np.array(color_set) / 255

A few adjustments to the matrices are still needed: converting BGR to RGB, and swapping the X and Y axes of the coordinate system.

point_set_1 = np.ones(point_set.shape)
point_set_1[:, 0] = -point_set[:, 1]
point_set_1[:, 1] = point_set[:, 0]
point_set_1[:, 2] = point_set[:, 2]

color_set_1 = np.ones(point_set.shape)
color_set_1[:, 0] = color_set[:, 2]
color_set_1[:, 1] = color_set[:, 1]
color_set_1[:, 2] = color_set[:, 0]

Finally, display the point cloud with Open3D:

import open3d as o3d

point_cloud = o3d.geometry.PointCloud()      # o3d.PointCloud() in Open3D < 0.8
point_cloud.points = o3d.utility.Vector3dVector(point_set_1)
point_cloud.colors = o3d.utility.Vector3dVector(color_set_1)
o3d.visualization.draw_geometries([point_cloud])

The rendered result is shown below:

(screenshot of the colored point cloud omitted)

Complete code

import cv2
import numpy as np
import struct
from math import ceil, floor
import open3d as o3d

def get_depth_image(file):
    p = 0
    depth_image = []
    points_set = np.zeros((1, 3))
    depth_intrinsic = np.genfromtxt('.../depth.cal', skip_footer=6)
    with open(file, 'rb') as f:
        width = struct.unpack('i', f.read(4))[0]
        height = struct.unpack('i', f.read(4))[0]
        while p < width*height:
            empty_num = struct.unpack('i', f.read(4))[0]
            for i in range(empty_num):
                depth_image.append(0)
            full_num = struct.unpack('i', f.read(4))[0]
            for i in range(full_num):
                depth = struct.unpack('h', f.read(2))[0]
                depth_image.append(depth)
            p += empty_num + full_num
        depth_image = np.reshape(depth_image, (height, width))
        depth_image = depth_image / depth_image.max() * 255
        depth_image = depth_image.astype('uint8')
        img3d = np.zeros((480, 640, 3))
        for y in range(0, 480):
            for x in range(0, 640):
                d = depth_image[y][x]
                if d > 0:
                    img3d[y][x][0] = d * (float(x) - depth_intrinsic[0][2]) / depth_intrinsic[0][0]
                    img3d[y][x][1] = d * (float(y) - depth_intrinsic[1][2]) / depth_intrinsic[1][1]
                    img3d[y][x][2] = d
                    points_set = np.concatenate((points_set, np.asarray(img3d[y][x]).reshape((1, 3))))
                else:
                    img3d[y][x] = [0, 0, 0]
        return np.delete(points_set, 0, axis=0)


def get_color(points_set):
    rgb_intrinsic = np.genfromtxt('.../rgb.cal', skip_footer=6)
    R_1 = np.genfromtxt('.../rgb.cal', skip_footer=2, skip_header=5)
    R = np.zeros((4, 4))
    R[0:3, 0:3] = R_1
    R[3][3] = 1
    c = np.genfromtxt('.../rgb.cal', skip_header=9, skip_footer=1)
    T = np.eye(4)
    T[0:3, 3] = -c
    PI = rgb_intrinsic.dot(np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])).dot(R).dot(T)
    points_set = np.hstack((points_set, np.ones((points_set.shape[0], 1)))).T
    return PI.dot(points_set)

img = cv2.imread('.../frame_00054_rgb.png')
point_set = get_depth_image('.../frame_00054_depth.bin')

cv2.imshow('img', img)
cv2.waitKey(1)    # give the HighGUI window a chance to draw

rgb_point_set = get_color(point_set)
rgb_point_set[0, :] /= rgb_point_set[2, :]
rgb_point_set[1, :] /= rgb_point_set[2, :]
color_set = []
for i in range(rgb_point_set.shape[1]):
    loc = rgb_point_set[:, i]
    x = int(loc[0])
    y = int(loc[1])
    color_set.append(img[y - 460][x - 64])   # 460 and 64: empirically determined offsets
color_set = np.array(color_set) / 255

print(color_set.shape)
print(point_set.shape)

point_cloud = o3d.geometry.PointCloud()      # o3d.PointCloud() in Open3D < 0.8
point_set_1 = np.ones(point_set.shape)
point_set_1[:, 0] = -point_set[:, 1]
point_set_1[:, 1] = point_set[:, 0]
point_set_1[:, 2] = point_set[:, 2]

color_set_1 = np.ones(point_set.shape)
color_set_1[:, 0] = color_set[:, 2]
color_set_1[:, 1] = color_set[:, 1]
color_set_1[:, 2] = color_set[:, 0]
point_cloud.points = o3d.utility.Vector3dVector(point_set_1)
point_cloud.colors = o3d.utility.Vector3dVector(color_set_1)
o3d.visualization.draw_geometries([point_cloud])