Introduction
We want to build a general-purpose space non-cooperative target 6-DOF tracking dataset with the simulation platform CoppeliaSim. It consists of raw data, annotation data, and a set of utility tools. The dataset is not limited to 6-DOF tracking; it can also be used for object detection, 2D monocular tracking, 2D stereo tracking, 3D tracking, object pose estimation, and related tasks.
- Raw data
Stereo images (1024x512), depth images (1024x512), and the projection matrix of each stereo camera;
- Annotation data
1) 2D Annotations
Object category, left bbox2d (u, v, w, h), right bbox2d (u, v, w, h), and truncation level;
2) 3D Annotations
Object position (X, Y, Z), object size (L, W, H), and object orientation ($\alpha$, $\beta$, $\gamma$);
- Utility tools
Dataset loader, object detection evaluation, stereo tracking evaluation, 3D tracking evaluation, 6DOF tracking evaluation, object pose estimation evaluation, and more (a minimal loader sketch follows this list);
6DOF Tracking: given certain initial conditions, track the position, size, and pose of an arbitrary target.
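As a concrete example of the utility tools, here is a minimal sketch of a per-frame dataset loader. The directory layout and file names it assumes (left/, right/, depth/, a per-sequence projection_matrices.npz, per-frame annotation .txt files) are hypothetical placeholders for illustration, not the final dataset format:
import os
import cv2
import numpy as np

def load_frame(sequence_dir, frame_id):
    # sketch only: the layout below is a hypothetical placeholder
    left_img = cv2.imread(os.path.join(sequence_dir, 'left', '{:06d}.png'.format(frame_id)))
    right_img = cv2.imread(os.path.join(sequence_dir, 'right', '{:06d}.png'.format(frame_id)))
    depth_img = np.load(os.path.join(sequence_dir, 'depth', '{:06d}.npy'.format(frame_id)))
    # per-sequence stereo projection matrices (3x4 each), left camera frame as reference
    proj = np.load(os.path.join(sequence_dir, 'projection_matrices.npz'))
    # hypothetical annotation record: category, left/right bbox2d, truncation,
    # position (X, Y, Z), size (L, W, H), orientation (alpha, beta, gamma)
    with open(os.path.join(sequence_dir, 'annotations', '{:06d}.txt'.format(frame_id))) as f:
        annotation = f.readline().strip().split()
    return left_img, right_img, depth_img, proj['left'], proj['right'], annotation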
Solved Problems
1. How to obtain the intrinsic matrix of a vision camera in CoppeliaSim?
CoppeliaSim implements its virtual cameras with a perspective projection from 3D to 2D. We therefore use this principle, together with the parameters that CoppeliaSim exposes, to back out the intrinsic matrix of the vision camera.
The projection geometry of the camera in the horizontal direction is shown in the figure above. The vision camera parameters exposed by CoppeliaSim do not tell us the focal length (i.e., where the image plane lies), so we first assume the image plane is at distance $f_x$ and that the image formed on it has size $(w, h)$. This assumption does not affect the final intrinsic matrix; it cancels out during the derivation. Let the 3D position of a target point in the camera coordinate system be $(X, Y, Z)$, and let its projection on the image plane be $(x_i, y_i)$. We can also read out the horizontal perspective angle $\alpha_x$ of the vision camera.
We first have the following relation (from similar triangles):
$$\frac{x_i}{X} = \frac{f_x}{Z}$$
that is,
$$x_i = \frac{f_x}{Z} X \tag{1}$$
In addition,
$$\frac{w/2}{f_x} = \tan\left(\frac{\alpha_x}{2}\right)$$
that is,
$$w = 2 \tan\left(\frac{\alpha_x}{2}\right) f_x \tag{2}$$
The image coordinate system and the pixel coordinate system are related by
$$\begin{aligned} u &= \left(x_i+\frac{w}{2}\right) \cdot \frac{W}{w} \\ &= x_i \cdot \frac{W}{w} + \frac{W}{2} \end{aligned}$$
where $W$ is the horizontal resolution of the vision camera. Substituting Eq. (1) and Eq. (2) into the expression above gives
$$u = \frac{W}{2 \tan(\frac{\alpha_x}{2})Z} X + \frac{W}{2} \tag{3}$$
Similarly, for the vertical direction of the vision camera,
$$v = \frac{H}{2 \tan(\frac{\alpha_y}{2})Z} Y + \frac{H}{2} \tag{4}$$
where $H$ is the vertical resolution of the vision camera and $\alpha_y$ is its vertical perspective angle. CoppeliaSim only lets us set a single (approximate) perspective angle $\alpha$; the actual $\alpha_x$ and $\alpha_y$ are computed with the following code:
ratio = cam_resolution_x / cam_resolution_y
if ratio > 1:
    angle_x = cam_pers_angle
    angle_y = 2 * math.atan(math.tan(cam_pers_angle / 2) / ratio)
else:
    angle_x = 2 * math.atan(math.tan(cam_pers_angle / 2) * ratio)
    angle_y = cam_pers_angle
Writing Eq. (3) and Eq. (4) in homogeneous matrix form:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}= \begin{bmatrix} \frac{W}{2 \tan(\frac{\alpha_x}{2})Z} & 0 & \frac{W}{2Z} \\ 0 & \frac{H}{2 \tan(\frac{\alpha_y}{2})Z} & \frac{H}{2Z}\\ 0 & 0 & 1/Z \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$
Multiplying both sides by $Z$ eliminates the $Z$ terms from the transformation matrix:
$$Z\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}= \begin{bmatrix} \frac{W}{2 \tan(\alpha_x/2)} & 0 & \frac{W}{2} \\ 0 & \frac{H}{2 \tan(\alpha_y/2)} & \frac{H}{2}\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \tag{5}$$
At this point we have derived the intrinsic matrix of the vision camera in CoppeliaSim. The extrinsic part (and thus the full camera projection matrix), as well as the transform between the two stereo cameras, are then easy to obtain, so we do not go into detail here (a short sketch follows the intrinsic-matrix code below). In addition, each video sequence folder contains the projection matrices of both stereo cameras (each saved as a 3x4 matrix in an .npz file), with the left camera coordinate system taken as the reference frame.
The code for obtaining the intrinsic matrix is attached below:
# get camera intrinsic matrix
def get_camera_intrinsic_matrix(observer_id, camera_handle):
    # get left camera matrix
    return_code, cam_pers_angle = sim.simxGetObjectFloatParameter(observer_id, camera_handle,
                                                                   sim.sim_visionfloatparam_perspective_angle,
                                                                   sim.simx_opmode_oneshot_wait)
    # print('=> left camera perspective angle: {}'.format(math.degrees(cam_pers_angle)))
    return_code, cam_resolution_x = sim.simxGetObjectIntParameter(observer_id, camera_handle,
                                                                  sim.sim_visionintparam_resolution_x,
                                                                  sim.simx_opmode_oneshot_wait)
    return_code, cam_resolution_y = sim.simxGetObjectIntParameter(observer_id, camera_handle,
                                                                  sim.sim_visionintparam_resolution_y,
                                                                  sim.simx_opmode_oneshot_wait)
    # print('=> the resolution of camera: {}'.format((cam_resolution_x, cam_resolution_y)))
    ratio = cam_resolution_x / cam_resolution_y
    if ratio > 1:
        angle_x = cam_pers_angle
        angle_y = 2 * math.atan(math.tan(cam_pers_angle / 2) / ratio)
    else:
        angle_x = 2 * math.atan(math.tan(cam_pers_angle / 2) * ratio)
        angle_y = cam_pers_angle
    # print('angleX: {}, angleY: {}'.format(math.degrees(angle_x), math.degrees(angle_y)))
    camera_matrix = np.array([[cam_resolution_x / (2 * math.tan(angle_x / 2)), 0, cam_resolution_x / 2],
                              [0, cam_resolution_y / (2 * math.tan(angle_y / 2)), cam_resolution_y / 2],
                              [0, 0, 1]])
    # print('=> camera matrix: {}'.format(camera_matrix))
    return camera_matrix
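As noted above, once the intrinsic matrix is known the stereo projection matrices follow easily. A minimal sketch, assuming a rectified stereo pair whose right camera is offset from the left (reference) camera by a known baseline along the +X axis; the function and variable names are illustrative only:
import numpy as np

# sketch: build the 3x4 projection matrices with the left camera frame as reference,
# assuming both cameras share the intrinsic matrix K and are rectified
def get_stereo_projection_matrices(camera_matrix, baseline):
    left_projection = np.matmul(camera_matrix, np.hstack((np.eye(3), np.zeros((3, 1)))))  # K [I | 0]
    right_translation = np.hstack((np.eye(3), np.array([[-baseline], [0.0], [0.0]])))
    right_projection = np.matmul(camera_matrix, right_translation)                        # K [I | -t]
    # the per-sequence .npz mentioned above could then be written with np.savez(...)
    return left_projection, right_projection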
2. How to obtain the depth map corresponding to a vision camera?
At first I pulled the kinect_depth component out of the Kinect model and used it as a separate depth camera aligned with the left vision camera. Later, following other people's blogs, I used sim.simxGetVisionSensorDepthBuffer() to obtain a depth map that is already aligned with the vision camera.
The official API documentation seems to say that simxGetVisionSensorDepthBuffer() should be called after simxGetVisionSensorImage(). I have not tested this myself, but I still recommend following that order.
The code for acquiring the visual image and the depth map is attached below:
def image_acquisition(self, target_id, observer_id):
    # get left and depth image
    err_code, left_resolution, left_raw_img = sim.simxGetVisionSensorImage(observer_id, self.left_camera_handle, 0,
                                                                            sim.simx_opmode_oneshot_wait)
    if err_code != sim.simx_return_ok:
        print('=> Failed to acquire left image')
        left_raw_img = None
    err_code, left_depth_resolution, left_depth_img = sim.simxGetVisionSensorDepthBuffer(observer_id, self.left_camera_handle,
                                                                                          sim.simx_opmode_oneshot_wait)
    if err_code != sim.simx_return_ok:
        print('=> Failed to acquire left depth image')
        left_depth_img = None
    # pre-processing raw image
    left_img = np.array(left_raw_img, dtype=np.uint8).reshape(left_resolution[1], left_resolution[0], -1)
    left_img = cv2.cvtColor(left_img, cv2.COLOR_RGB2BGR)
    cv2.flip(left_img, 1, left_img)
    # pre-processing raw depth image
    left_depth_img = np.array(left_depth_img, dtype=np.float32).reshape(left_depth_resolution[1], left_depth_resolution[0])
    left_depth_img = left_depth_img * (self.left_cam_far_distance - self.left_cam_near_distance) + self.left_cam_near_distance
    cv2.flip(left_depth_img, 1, left_depth_img)
    cv2.imshow('left image', left_img)
    cv2.imshow('left depth image', cv2.applyColorMap(left_depth_img.astype(np.uint8), cv2.COLORMAP_MAGMA))
    cv2.waitKey(1)
    return left_img, left_depth_img
[Note] We wrote an Observer class; the code above is one of its methods. It relies on several class member variables, such as the object handles and the near/far clipping distances of the camera, which you can set up with the help of the relevant documentation.
3. How to build a stereo + depth camera module and obtain the stereo images and depth map simultaneously?
Problem 2 shows how to obtain the image and the depth map of a single vision camera. In the same way, we simply combine two vision cameras to obtain the stereo + depth camera functionality. We group left_camera and right_camera together and name the combined module observer (a minimal setup sketch follows).
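A minimal sketch of what such an Observer might look like, assuming the legacy remote API bindings (sim.py); the scene object names 'left_camera' and 'right_camera' and the member-variable names are illustrative and must match your own scene:
import sim  # legacy CoppeliaSim remote API bindings


class Observer:
    # sketch only: stereo + depth "observer" built from two vision sensors
    def __init__(self, client_id, left_name='left_camera', right_name='right_camera'):
        self.client_id = client_id
        # handles of the two vision sensors (names are scene-specific)
        _, self.left_camera_handle = sim.simxGetObjectHandle(client_id, left_name,
                                                             sim.simx_opmode_oneshot_wait)
        _, self.right_camera_handle = sim.simxGetObjectHandle(client_id, right_name,
                                                              sim.simx_opmode_oneshot_wait)
        # near/far clipping planes of the left camera, used to convert the
        # normalized depth buffer to metric depth in image_acquisition()
        _, self.left_cam_near_distance = sim.simxGetObjectFloatParameter(
            client_id, self.left_camera_handle,
            sim.sim_visionfloatparam_near_clipping, sim.simx_opmode_oneshot_wait)
        _, self.left_cam_far_distance = sim.simxGetObjectFloatParameter(
            client_id, self.left_camera_handle,
            sim.sim_visionfloatparam_far_clipping, sim.simx_opmode_oneshot_wait)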
4. How to obtain the 6-DOF information of the observed target, including its position, size, and pose?
The official CoppeliaSim Python API provides access to the target position and orientation through sim.simxGetObjectPosition() and sim.simxGetObjectOrientation():
# get target position and orientation relative to the left camera
err_code, left_target_pos = sim.simxGetObjectPosition(clientID=target_id, objectHandle=self.target_handle,
                                                      relativeToObjectHandle=self.left_camera_handle,
                                                      operationMode=sim.simx_opmode_oneshot_wait)
if err_code != sim.simx_return_ok:
    print('=> Failed to acquire target position')
    left_target_pos = None
err_code, left_target_orientation = sim.simxGetObjectOrientation(clientID=target_id,
                                                                 objectHandle=self.target_handle,
                                                                 relativeToObjectHandle=self.left_camera_handle,
                                                                 operationMode=sim.simx_opmode_oneshot_wait)
if err_code != sim.simx_return_ok:
    print('=> Failed to acquire target orientation')
    left_target_orientation = None
For the target size, however, CoppeliaSim offers no API function that returns it directly, so I wrote a function that retrieves the target's 3D dimensions. Note that we define the extents along the X, Y, and Z axes of the object coordinate system as length, width, and height respectively, which matters for the visualization of the annotations later on.
# get object size
def get_object_size(object_id, object_handle):
    return_code, x_min = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_min_x,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    return_code, x_max = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_max_x,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    # print('x_max: {}, x_min: {}'.format(x_max, x_min))
    length = x_max - x_min
    return_code, y_min = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_min_y,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    return_code, y_max = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_max_y,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    # print('y_max: {}, y_min: {}'.format(y_max, y_min))
    width = y_max - y_min
    return_code, z_min = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_min_z,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    return_code, z_max = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_max_z,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    # print('z_max: {}, z_min: {}'.format(z_max, z_min))
    height = z_max - z_min
    print('=> the shape of target: {}'.format([length, width, height]))
    return [length, width, height]
[Note] All of the code above obtains the target's 6-DOF information relative to the left camera.
5. How to project the 6-DOF information onto the vision camera plane for visualization?
Since we already have the projection matrices of the vision cameras from Problem 1, visualizing the target's 6-DOF information only requires computing the coordinates of the eight vertices of the 3D bounding box in the world coordinate system, projecting them onto the image plane, and connecting them with lines.
Target 6-DOF information: the target's 3D position, size, and rotation angles.
1) Vertex coordinate computation
Before computing the vertex coordinates we need to make the coordinate systems explicit. Three coordinate systems are involved: the object coordinate system, the world coordinate system (reference coordinate system), and the camera coordinate system.
Object coordinate system: origin at the target center, with the X, Y, and Z axes along the length, width, and height respectively.
World coordinate system: the X axis points from the left camera to the right camera, the Z axis is the optical axis of the left camera, and the Y axis follows from the right-hand rule (curl the fingers from the Z axis toward the X axis; the thumb points along Y).
Camera coordinate system: coincides with the world coordinate system, so we do not discuss it further.
The relationship between the coordinate systems is shown in the figure above.
The vertices of the target in the object coordinate system are defined as:
$V_1=(-l/2, w/2, h/2)$
$V_2=(-l/2, -w/2, h/2)$
$V_3=(l/2, w/2, h/2)$
$V_4=(l/2, -w/2, h/2)$
$V_5=(-l/2, w/2, -h/2)$
$V_6=(-l/2, -w/2, -h/2)$
$V_7=(l/2, w/2, -h/2)$
$V_8=(l/2, -w/2, -h/2)$
In addition, the rotation angle about the X axis of the world coordinate system is $\alpha$, about the Y axis is $\beta$, and about the Z axis is $\gamma$. The sim.simxGetObjectOrientation() interface returns [$\alpha$, $\beta$, $\gamma$]. In practice we need this as a rotation matrix, so here is the code that converts the Euler angle vector into a rotation matrix:
def euler_vector_2_rotation_matrix(euler_vector):
    # euler vector format: [alpha, beta, gamma]
    alpha, beta, gamma = euler_vector
    r_x = np.array([[1, 0, 0],
                    [0, np.cos(alpha), -np.sin(alpha)],
                    [0, np.sin(alpha), np.cos(alpha)]])
    r_y = np.array([[np.cos(beta), 0, np.sin(beta)],
                    [0, 1, 0],
                    [-np.sin(beta), 0, np.cos(beta)]])
    r_z = np.array([[np.cos(gamma), -np.sin(gamma), 0],
                    [np.sin(gamma), np.cos(gamma), 0],
                    [0, 0, 1]])
    rotation_matrix = np.matmul(r_x, np.matmul(r_y, r_z))
    return rotation_matrix
Finally, here is the code that computes the vertices of the 3D bounding box in the reference coordinate system:
# get the vertices of 3D bounding box of object
def bbox3d_vertices_calc(self, object_pos, object_size, euler_angle):
    l, w, h = object_size
    v1_obj = np.array([[-l / 2], [w / 2], [h / 2]])
    v2_obj = np.array([[-l / 2], [-w / 2], [h / 2]])
    v3_obj = np.array([[l / 2], [w / 2], [h / 2]])
    v4_obj = np.array([[l / 2], [-w / 2], [h / 2]])
    v5_obj = np.array([[-l / 2], [w / 2], [-h / 2]])
    v6_obj = np.array([[-l / 2], [-w / 2], [-h / 2]])
    v7_obj = np.array([[l / 2], [w / 2], [-h / 2]])
    v8_obj = np.array([[l / 2], [-w / 2], [-h / 2]])
    # bbox_3d_obj = np.array([[v1_obj, v2_obj, v3_obj, v4_obj, v5_obj, v6_obj, v7_obj, v8_obj]]).reshape(8, 3).astype(np.float32)
    # print("bbox_3d_obj: \n{}".format(bbox_3d_obj))
    object2world_matrix = self.euler_vector_2_rotation_matrix(euler_angle)
    object_pos = np.array(object_pos).reshape(3, 1)
    v1_3d = np.matmul(object2world_matrix, v1_obj) + object_pos
    v2_3d = np.matmul(object2world_matrix, v2_obj) + object_pos
    v3_3d = np.matmul(object2world_matrix, v3_obj) + object_pos
    v4_3d = np.matmul(object2world_matrix, v4_obj) + object_pos
    v5_3d = np.matmul(object2world_matrix, v5_obj) + object_pos
    v6_3d = np.matmul(object2world_matrix, v6_obj) + object_pos
    v7_3d = np.matmul(object2world_matrix, v7_obj) + object_pos
    v8_3d = np.matmul(object2world_matrix, v8_obj) + object_pos
    bbox_3d_vertices = np.array([[v1_3d, v2_3d, v3_3d, v4_3d, v5_3d, v6_3d, v7_3d, v8_3d]]).reshape(8, 3).astype(np.float32)
    # print("The vertices of bbox:\n{}".format(bbox_3d_vertices))
    return bbox_3d_vertices
2) Vertex coordinate transformation
Once we have the coordinates of the eight vertices of the 3D bounding box, a simple matrix multiplication gives their 2D coordinates in the pixel coordinate system. Note, however, that the camera projection matrix is $M \in \mathbb{R}^{3 \times 4}$, so the 3D vertex coordinates computed above have to be extended to homogeneous form by appending a 1.
The code is as follows:
# map the vertices of bbox3d to image plane
def bbox3d_to_bbox2d(self, bbox_3d, left_camera_matrix, right_camera_matrix):
    one = np.ones(8).reshape(8, 1)
    bbox_3d = np.concatenate((bbox_3d, one), axis=1)
    left_v1_2d = np.matmul(left_camera_matrix, bbox_3d[0])
    left_v2_2d = np.matmul(left_camera_matrix, bbox_3d[1])
    left_v3_2d = np.matmul(left_camera_matrix, bbox_3d[2])
    left_v4_2d = np.matmul(left_camera_matrix, bbox_3d[3])
    left_v5_2d = np.matmul(left_camera_matrix, bbox_3d[4])
    left_v6_2d = np.matmul(left_camera_matrix, bbox_3d[5])
    left_v7_2d = np.matmul(left_camera_matrix, bbox_3d[6])
    left_v8_2d = np.matmul(left_camera_matrix, bbox_3d[7])
    right_v1_2d = np.matmul(right_camera_matrix, bbox_3d[0])
    right_v2_2d = np.matmul(right_camera_matrix, bbox_3d[1])
    right_v3_2d = np.matmul(right_camera_matrix, bbox_3d[2])
    right_v4_2d = np.matmul(right_camera_matrix, bbox_3d[3])
    right_v5_2d = np.matmul(right_camera_matrix, bbox_3d[4])
    right_v6_2d = np.matmul(right_camera_matrix, bbox_3d[5])
    right_v7_2d = np.matmul(right_camera_matrix, bbox_3d[6])
    right_v8_2d = np.matmul(right_camera_matrix, bbox_3d[7])
    l_v1_2d = left_v1_2d[:2] / left_v1_2d[2]
    l_v2_2d = left_v2_2d[:2] / left_v2_2d[2]
    l_v3_2d = left_v3_2d[:2] / left_v3_2d[2]
    l_v4_2d = left_v4_2d[:2] / left_v4_2d[2]
    l_v5_2d = left_v5_2d[:2] / left_v5_2d[2]
    l_v6_2d = left_v6_2d[:2] / left_v6_2d[2]
    l_v7_2d = left_v7_2d[:2] / left_v7_2d[2]
    l_v8_2d = left_v8_2d[:2] / left_v8_2d[2]
    r_v1_2d = right_v1_2d[:2] / right_v1_2d[2]
    r_v2_2d = right_v2_2d[:2] / right_v2_2d[2]
    r_v3_2d = right_v3_2d[:2] / right_v3_2d[2]
    r_v4_2d = right_v4_2d[:2] / right_v4_2d[2]
    r_v5_2d = right_v5_2d[:2] / right_v5_2d[2]
    r_v6_2d = right_v6_2d[:2] / right_v6_2d[2]
    r_v7_2d = right_v7_2d[:2] / right_v7_2d[2]
    r_v8_2d = right_v8_2d[:2] / right_v8_2d[2]
    left_bbox_2d = np.array([[l_v1_2d], [l_v2_2d], [l_v3_2d], [l_v4_2d],
                             [l_v5_2d], [l_v6_2d], [l_v7_2d], [l_v8_2d]]).reshape((8, 2)).astype(np.int64)
    right_bbox_2d = np.array([[r_v1_2d], [r_v2_2d], [r_v3_2d], [r_v4_2d],
                              [r_v5_2d], [r_v6_2d], [r_v7_2d], [r_v8_2d]]).reshape((8, 2)).astype(np.int64)
    # print("The projected vertices of bbox in left image:\n{}".format(left_bbox_2d))
    # print("The projected vertices of bbox in right image:\n{}".format(right_bbox_2d))
    return left_bbox_2d, right_bbox_2d
3) Bounding box drawing
Finally, we connect the projected vertices of the 3D bounding box in the 2D image plane according to the geometry given earlier.
The code is as follows:
# drawing the bbox3d in image plane
def bbox3d_display(self, image, bbox3d_vertices, color, thickness):
    cv2.putText(image, 'V1', tuple(bbox3d_vertices[0]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(image, 'V2', tuple(bbox3d_vertices[1]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(image, 'V3', tuple(bbox3d_vertices[2]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(image, 'V4', tuple(bbox3d_vertices[3]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.line(image, tuple(bbox3d_vertices[0]), tuple(bbox3d_vertices[1]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[0]), tuple(bbox3d_vertices[2]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[0]), tuple(bbox3d_vertices[4]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[1]), tuple(bbox3d_vertices[3]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[1]), tuple(bbox3d_vertices[5]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[2]), tuple(bbox3d_vertices[3]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[2]), tuple(bbox3d_vertices[6]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[3]), tuple(bbox3d_vertices[7]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[4]), tuple(bbox3d_vertices[5]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[4]), tuple(bbox3d_vertices[6]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[5]), tuple(bbox3d_vertices[7]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[6]), tuple(bbox3d_vertices[7]), color=color, thickness=thickness)
    return image
The final visualization of the 6-DOF information on the stereo images is shown in the figure below:
6. Why does the visualization of the 6-DOF moving target keep going wrong, especially when an Euler angle crosses 90° or -90°?
I found that when a rotation angle of the target crosses 90° or -90°, the visualized bounding box jumps abruptly; in a few other cases the visualized box simply does not match the actual target.
At first I suspected multithreading (the data-acquisition thread and the control thread run asynchronously), or the read latency of the stereo and depth images (the image data and the target information being out of sync in time).
After debugging, however, it turned out to be neither of these; the culprit was cv2.Rodrigues(). I had assumed it converts an Euler angle vector directly into a rotation matrix, but it actually converts an axis-angle rotation vector, so feeding it Euler angles gives boxes that look roughly right in most poses and clearly wrong in a few. I then wrote the Euler angle conversion function euler_vector_2_rotation_matrix() shown in Problem 5, and the problem was solved. A quick check illustrating the discrepancy follows.
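A minimal check of the two conversions (assuming euler_vector_2_rotation_matrix() from Problem 5 is defined in the same script); the specific angle values are arbitrary:
import cv2
import numpy as np

euler = np.array([0.3, -1.2, 0.8])                  # [alpha, beta, gamma] in radians
rot_rodrigues, _ = cv2.Rodrigues(euler)             # interprets the vector as axis-angle
rot_euler = euler_vector_2_rotation_matrix(euler)   # Rx(alpha) * Ry(beta) * Rz(gamma)
print(np.allclose(rot_rodrigues, rot_euler))        # False: the two conventions disagree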
7. How to obtain the target's 2D bounding box (left bbox2d)? Can bbox3d be converted into left and right bbox2d?
Since we are building a general space non-cooperative target 6-DOF tracking dataset on CoppeliaSim, a 2D bounding box (bbox2d) for the target is indispensable. From the problems above we know that obtaining the target's 6-DOF information (bbox3d) through the platform's Python API is fairly easy. But can bbox2d be derived from bbox3d?
The first idea is the minimum enclosing rectangle, i.e., taking the extreme values of the eight projected bbox3d vertices in image coordinates, as shown in the figure below:
Clearly, the resulting left and right bbox2d enclose the target poorly and contain a large non-target region. We therefore used prior knowledge, fused in the depth information, and applied a segmentation step to obtain a very accurate bbox2d; a rough sketch of the idea follows.
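A rough sketch of that idea (not the exact implementation used for the dataset): inside the loose box spanned by the projected bbox3d vertices, keep only the pixels whose metric depth lies within the target's depth extent, then take the bounding rectangle of that mask. The helper name, the depth-range heuristic, and the margin value are assumptions for illustration:
import cv2
import numpy as np

def tight_bbox2d_from_depth(depth_img, loose_bbox, target_z, target_size, margin=0.05):
    # loose_bbox: (u_min, v_min, u_max, v_max) spanned by the projected bbox3d vertices
    u_min, v_min, u_max, v_max = loose_bbox
    roi = depth_img[v_min:v_max, u_min:u_max]
    # keep pixels whose metric depth lies within the target's depth extent (plus a margin)
    half_extent = 0.5 * float(np.linalg.norm(target_size))
    mask = ((roi > target_z - half_extent - margin) &
            (roi < target_z + half_extent + margin)).astype(np.uint8)
    if mask.sum() == 0:
        return loose_bbox  # fall back to the loose box if nothing survives
    x, y, w, h = cv2.boundingRect(mask)
    return (u_min + x, v_min + y, u_min + x + w, v_min + y + h)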
Unsolved Problems
1. How can the background be made to change dynamically?
2. When target motion control and image / 6-DOF information acquisition are run in separate threads, the acquisition thread blocks (images can no longer be grabbed or displayed). Why?
3. How should 6-DOF tracking be evaluated? Can the average overlap measurement still be used?
4. Most existing algorithms cannot do 6DOF tracking. Can the 6DOF annotations be converted into 3DOF, 4DOF, or 5DOF formats for lower-dimensional evaluation?
Using the minimum enclosing 3D bounding box, we implemented the conversion from 6DOF to a 3DOF format (X, Y, Z, L, W, H); a conversion sketch is given below. Since this form drops the orientation, its evaluation can fully reuse the average overlap metric, and we therefore refer to it as 3D object tracking.
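A minimal sketch of this conversion and of an axis-aligned 3D overlap, assuming bbox3d_vertices_calc() from Problem 5 provides the (8, 3) oriented-box vertices; the function names are illustrative:
import numpy as np

def oriented_to_axis_aligned(bbox_3d_vertices):
    # bbox_3d_vertices: (8, 3) vertices of the oriented box in the reference frame
    v_min = bbox_3d_vertices.min(axis=0)
    v_max = bbox_3d_vertices.max(axis=0)
    center = (v_min + v_max) / 2.0
    size = v_max - v_min                      # extents of the enclosing box
    return np.concatenate([center, size])     # [X, Y, Z, L, W, H]

def axis_aligned_overlap_3d(box_a, box_b):
    # boxes as [X, Y, Z, L, W, H]; returns the volume IoU (overlap per frame)
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    inter = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter_vol = float(np.prod(inter))
    union_vol = float(np.prod(box_a[3:]) + np.prod(box_b[3:])) - inter_vol
    return inter_vol / union_vol if union_vol > 0 else 0.0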