Introduction
We want to build a general-purpose space non-cooperative target 6-DOF tracking dataset with the simulation platform CoppeliaSim. It consists of raw data, annotation data, and a set of utility tools. The dataset is not limited to 6-DOF tracking; it can also be used for object detection, 2D monocular tracking, 2D stereo tracking, 3D tracking, object pose estimation, and related tasks.
- Raw data
Stereo images (1024x512), depth images (1024x512), and the projection matrix of each stereo camera;
- Annotation data
1) 2D Annotations
Object category, left bbox2d (u, v, w, h), right bbox2d (u, v, w, h), and truncation level;
2) 3D Annotations
Object position (X, Y, Z), object size (L, W, H), and object orientation ($\alpha$, $\beta$, $\gamma$);
- Utility tools
Dataset loader, object detection evaluation, stereo tracking evaluation, 3D tracking evaluation, 6DOF tracking evaluation, object pose estimation evaluation, and more (a minimal loader sketch follows this list);
6DOF Tracking: given certain initial conditions, track the position, size, and pose of an arbitrary target.
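As a concrete example of the utility tools, here is a minimal sketch of a per-frame dataset loader. The directory layout and file names it assumes (left/, right/, depth/, a per-sequence projection_matrices.npz, per-frame annotation .txt files) are hypothetical placeholders for illustration, not the final dataset format:
import os
import cv2
import numpy as np

def load_frame(sequence_dir, frame_id):
    # sketch only: the layout below is a hypothetical placeholder
    left_img = cv2.imread(os.path.join(sequence_dir, 'left', '{:06d}.png'.format(frame_id)))
    right_img = cv2.imread(os.path.join(sequence_dir, 'right', '{:06d}.png'.format(frame_id)))
    depth_img = np.load(os.path.join(sequence_dir, 'depth', '{:06d}.npy'.format(frame_id)))
    # per-sequence stereo projection matrices (3x4 each), left camera frame as reference
    proj = np.load(os.path.join(sequence_dir, 'projection_matrices.npz'))
    # hypothetical annotation record: category, left/right bbox2d, truncation,
    # position (X, Y, Z), size (L, W, H), orientation (alpha, beta, gamma)
    with open(os.path.join(sequence_dir, 'annotations', '{:06d}.txt'.format(frame_id))) as f:
        annotation = f.readline().strip().split()
    return left_img, right_img, depth_img, proj['left'], proj['right'], annotation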
Solved Problems
1. How to obtain the intrinsic matrix of a vision camera in CoppeliaSim?
CoppeliaSim implements its virtual cameras with a perspective projection from 3D to 2D. We therefore use this principle, together with the parameters that CoppeliaSim exposes, to back out the intrinsic matrix of the vision camera.
The projection geometry of the camera in the horizontal direction is shown in the figure above. The vision camera parameters exposed by CoppeliaSim do not tell us the focal length (i.e., where the image plane lies), so we first assume the image plane is at distance $f_x$ and that the image formed on it has size $(w, h)$. This assumption does not affect the final intrinsic matrix; it cancels out during the derivation. Let the 3D position of a target point in the camera coordinate system be $(X, Y, Z)$, and let its projection on the image plane be $(x_i, y_i)$. We can also read out the horizontal perspective angle $\alpha_x$ of the vision camera.
We first have the following relation (from similar triangles):
$$\frac{x_i}{X} = \frac{f_x}{Z}$$
that is,
$$x_i = \frac{f_x}{Z} X \tag{1}$$
In addition,
$$\frac{w/2}{f_x} = \tan\left(\frac{\alpha_x}{2}\right)$$
that is,
$$w = 2 \tan\left(\frac{\alpha_x}{2}\right) f_x \tag{2}$$
The image coordinate system and the pixel coordinate system are related by
$$\begin{aligned} u &= \left(x_i+\frac{w}{2}\right) \cdot \frac{W}{w} \\ &= x_i \cdot \frac{W}{w} + \frac{W}{2} \end{aligned}$$
where $W$ is the horizontal resolution of the vision camera. Substituting Eq. (1) and Eq. (2) into the expression above gives
$$u = \frac{W}{2 \tan(\frac{\alpha_x}{2})Z} X + \frac{W}{2} \tag{3}$$
Similarly, for the vertical direction of the vision camera,
$$v = \frac{H}{2 \tan(\frac{\alpha_y}{2})Z} Y + \frac{H}{2} \tag{4}$$
where $H$ is the vertical resolution of the vision camera and $\alpha_y$ is its vertical perspective angle. CoppeliaSim only lets us set a single (approximate) perspective angle $\alpha$; the actual $\alpha_x$ and $\alpha_y$ are computed with the following code:
ratio = cam_resolution_x / cam_resolution_y
if ratio > 1:
    angle_x = cam_pers_angle
    angle_y = 2 * math.atan(math.tan(cam_pers_angle / 2) / ratio)
else:
    angle_x = 2 * math.atan(math.tan(cam_pers_angle / 2) * ratio)
    angle_y = cam_pers_angle
Writing Eq. (3) and Eq. (4) in homogeneous matrix form:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}= \begin{bmatrix} \frac{W}{2 \tan(\frac{\alpha_x}{2})Z} & 0 & \frac{W}{2Z} \\ 0 & \frac{H}{2 \tan(\frac{\alpha_y}{2})Z} & \frac{H}{2Z}\\ 0 & 0 & 1/Z \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$
Multiplying both sides by $Z$ eliminates the $Z$ terms from the transformation matrix:
$$Z\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}= \begin{bmatrix} \frac{W}{2 \tan(\alpha_x/2)} & 0 & \frac{W}{2} \\ 0 & \frac{H}{2 \tan(\alpha_y/2)} & \frac{H}{2}\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \tag{5}$$
At this point we have derived the intrinsic matrix of the vision camera in CoppeliaSim. The extrinsic part (and thus the full camera projection matrix), as well as the transform between the two stereo cameras, are then easy to obtain, so we do not go into detail here (a short sketch follows the intrinsic-matrix code below). In addition, each video sequence folder contains the projection matrices of both stereo cameras (each saved as a 3x4 matrix in an .npz file), with the left camera coordinate system taken as the reference frame.
The code for obtaining the intrinsic matrix is attached below:
# get camera intrinsic matrix
def get_camera_intrinsic_matrix(observer_id, camera_handle):
    # get left camera matrix
    return_code, cam_pers_angle = sim.simxGetObjectFloatParameter(observer_id, camera_handle,
                                                                   sim.sim_visionfloatparam_perspective_angle,
                                                                   sim.simx_opmode_oneshot_wait)
    # print('=> left camera perspective angle: {}'.format(math.degrees(cam_pers_angle)))
    return_code, cam_resolution_x = sim.simxGetObjectIntParameter(observer_id, camera_handle,
                                                                  sim.sim_visionintparam_resolution_x,
                                                                  sim.simx_opmode_oneshot_wait)
    return_code, cam_resolution_y = sim.simxGetObjectIntParameter(observer_id, camera_handle,
                                                                  sim.sim_visionintparam_resolution_y,
                                                                  sim.simx_opmode_oneshot_wait)
    # print('=> the resolution of camera: {}'.format((cam_resolution_x, cam_resolution_y)))
    ratio = cam_resolution_x / cam_resolution_y
    if ratio > 1:
        angle_x = cam_pers_angle
        angle_y = 2 * math.atan(math.tan(cam_pers_angle / 2) / ratio)
    else:
        angle_x = 2 * math.atan(math.tan(cam_pers_angle / 2) * ratio)
        angle_y = cam_pers_angle
    # print('angleX: {}, angleY: {}'.format(math.degrees(angle_x), math.degrees(angle_y)))
    camera_matrix = np.array([[cam_resolution_x / (2 * math.tan(angle_x / 2)), 0, cam_resolution_x / 2],
                              [0, cam_resolution_y / (2 * math.tan(angle_y / 2)), cam_resolution_y / 2],
                              [0, 0, 1]])
    # print('=> camera matrix: {}'.format(camera_matrix))
    return camera_matrix
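As noted above, once the intrinsic matrix is known the stereo projection matrices follow easily. A minimal sketch, assuming a rectified stereo pair whose right camera is offset from the left (reference) camera by a known baseline along the +X axis; the function and variable names are illustrative only:
import numpy as np

# sketch: build the 3x4 projection matrices with the left camera frame as reference,
# assuming both cameras share the intrinsic matrix K and are rectified
def get_stereo_projection_matrices(camera_matrix, baseline):
    left_projection = np.matmul(camera_matrix, np.hstack((np.eye(3), np.zeros((3, 1)))))  # K [I | 0]
    right_translation = np.hstack((np.eye(3), np.array([[-baseline], [0.0], [0.0]])))
    right_projection = np.matmul(camera_matrix, right_translation)                        # K [I | -t]
    # the per-sequence .npz mentioned above could then be written with np.savez(...)
    return left_projection, right_projection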
2. How to obtain the depth map corresponding to a vision camera?
At first I pulled the kinect_depth component out of the Kinect model and used it as a separate depth camera aligned with the left vision camera. Later, following other people's blogs, I used sim.simxGetVisionSensorDepthBuffer() to obtain a depth map that is already aligned with the vision camera.
The official API documentation seems to say that simxGetVisionSensorDepthBuffer() should be called after simxGetVisionSensorImage(). I have not tested this myself, but I still recommend following that order.
The code for acquiring the visual image and the depth map is attached below:
def image_acquisition(self, target_id, observer_id):
    # get left and depth image
    err_code, left_resolution, left_raw_img = sim.simxGetVisionSensorImage(observer_id, self.left_camera_handle, 0,
                                                                            sim.simx_opmode_oneshot_wait)
    if err_code != sim.simx_return_ok:
        print('=> Failed to acquire left image')
        left_raw_img = None
    err_code, left_depth_resolution, left_depth_img = sim.simxGetVisionSensorDepthBuffer(observer_id, self.left_camera_handle,
                                                                                          sim.simx_opmode_oneshot_wait)
    if err_code != sim.simx_return_ok:
        print('=> Failed to acquire left depth image')
        left_depth_img = None
    # pre-processing raw image
    left_img = np.array(left_raw_img, dtype=np.uint8).reshape(left_resolution[1], left_resolution[0], -1)
    left_img = cv2.cvtColor(left_img, cv2.COLOR_RGB2BGR)
    cv2.flip(left_img, 1, left_img)
    # pre-processing raw depth image
    left_depth_img = np.array(left_depth_img, dtype=np.float32).reshape(left_depth_resolution[1], left_depth_resolution[0])
    left_depth_img = left_depth_img * (self.left_cam_far_distance - self.left_cam_near_distance) + self.left_cam_near_distance
    cv2.flip(left_depth_img, 1, left_depth_img)
    cv2.imshow('left image', left_img)
    cv2.imshow('left depth image', cv2.applyColorMap(left_depth_img.astype(np.uint8), cv2.COLORMAP_MAGMA))
    cv2.waitKey(1)
    return left_img, left_depth_img
[Note] We wrote an Observer class; the code above is one of its methods. It relies on several class member variables, such as the object handles and the near/far clipping distances of the camera, which you can set up with the help of the relevant documentation.
3. How to build a stereo + depth camera module and obtain the stereo images and depth map simultaneously?
Problem 2 shows how to obtain the image and the depth map of a single vision camera. In the same way, we simply combine two vision cameras to obtain the stereo + depth camera functionality. We group left_camera and right_camera together and name the combined module observer (a minimal setup sketch follows).
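A minimal sketch of what such an Observer might look like, assuming the legacy remote API bindings (sim.py); the scene object names 'left_camera' and 'right_camera' and the member-variable names are illustrative and must match your own scene:
import sim  # legacy CoppeliaSim remote API bindings


class Observer:
    # sketch only: stereo + depth "observer" built from two vision sensors
    def __init__(self, client_id, left_name='left_camera', right_name='right_camera'):
        self.client_id = client_id
        # handles of the two vision sensors (names are scene-specific)
        _, self.left_camera_handle = sim.simxGetObjectHandle(client_id, left_name,
                                                             sim.simx_opmode_oneshot_wait)
        _, self.right_camera_handle = sim.simxGetObjectHandle(client_id, right_name,
                                                              sim.simx_opmode_oneshot_wait)
        # near/far clipping planes of the left camera, used to convert the
        # normalized depth buffer to metric depth in image_acquisition()
        _, self.left_cam_near_distance = sim.simxGetObjectFloatParameter(
            client_id, self.left_camera_handle,
            sim.sim_visionfloatparam_near_clipping, sim.simx_opmode_oneshot_wait)
        _, self.left_cam_far_distance = sim.simxGetObjectFloatParameter(
            client_id, self.left_camera_handle,
            sim.sim_visionfloatparam_far_clipping, sim.simx_opmode_oneshot_wait)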
4. How to obtain the 6-DOF information of the observed target, including its position, size, and pose?
The official CoppeliaSim Python API provides access to the target position and orientation through sim.simxGetObjectPosition() and sim.simxGetObjectOrientation():
# get target position and orientation relative to the left camera
err_code, left_target_pos = sim.simxGetObjectPosition(clientID=target_id, objectHandle=self.target_handle,
                                                      relativeToObjectHandle=self.left_camera_handle,
                                                      operationMode=sim.simx_opmode_oneshot_wait)
if err_code != sim.simx_return_ok:
    print('=> Failed to acquire target position')
    left_target_pos = None
err_code, left_target_orientation = sim.simxGetObjectOrientation(clientID=target_id,
                                                                 objectHandle=self.target_handle,
                                                                 relativeToObjectHandle=self.left_camera_handle,
                                                                 operationMode=sim.simx_opmode_oneshot_wait)
if err_code != sim.simx_return_ok:
    print('=> Failed to acquire target orientation')
    left_target_orientation = None
For the target size, however, CoppeliaSim offers no API function that returns it directly, so I wrote a function that retrieves the target's 3D dimensions. Note that we define the extents along the X, Y, and Z axes of the object coordinate system as length, width, and height respectively, which matters for the visualization of the annotations later on.
# get object size
def get_object_size(object_id, object_handle):
    return_code, x_min = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_min_x,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    return_code, x_max = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_max_x,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    # print('x_max: {}, x_min: {}'.format(x_max, x_min))
    length = x_max - x_min
    return_code, y_min = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_min_y,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    return_code, y_max = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_max_y,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    # print('y_max: {}, y_min: {}'.format(y_max, y_min))
    width = y_max - y_min
    return_code, z_min = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_min_z,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    return_code, z_max = sim.simxGetObjectFloatParameter(object_id, object_handle,
                                                         sim.sim_objfloatparam_objbbox_max_z,
                                                         operationMode=sim.simx_opmode_oneshot_wait)
    # print('z_max: {}, z_min: {}'.format(z_max, z_min))
    height = z_max - z_min
    print('=> the shape of target: {}'.format([length, width, height]))
    return [length, width, height]
[Note] All of the code above obtains the target's 6-DOF information relative to the left camera.
5. How to project the 6-DOF information onto the vision camera plane for visualization?
Since we already have the projection matrices of the vision cameras from Problem 1, visualizing the target's 6-DOF information only requires computing the coordinates of the eight vertices of the 3D bounding box in the world coordinate system, projecting them onto the image plane, and connecting them with lines.
Target 6-DOF information: the target's 3D position, size, and rotation angles.
1) Vertex coordinate computation
Before computing the vertex coordinates we need to make the coordinate systems explicit. Three coordinate systems are involved: the object coordinate system, the world coordinate system (reference coordinate system), and the camera coordinate system.
Object coordinate system: origin at the target center, with the X, Y, and Z axes along the length, width, and height respectively.
World coordinate system: the X axis points from the left camera to the right camera, the Z axis is the optical axis of the left camera, and the Y axis follows from the right-hand rule (curl the fingers from the Z axis toward the X axis; the thumb points along Y).
Camera coordinate system: coincides with the world coordinate system, so we do not discuss it further.
The relationship between the coordinate systems is shown in the figure above.
The vertices of the target in the object coordinate system are defined as:
$V_1=(-l/2, w/2, h/2)$
$V_2=(-l/2, -w/2, h/2)$
$V_3=(l/2, w/2, h/2)$
$V_4=(l/2, -w/2, h/2)$
$V_5=(-l/2, w/2, -h/2)$
$V_6=(-l/2, -w/2, -h/2)$
$V_7=(l/2, w/2, -h/2)$
$V_8=(l/2, -w/2, -h/2)$
In addition, the rotation angle about the X axis of the world coordinate system is $\alpha$, about the Y axis is $\beta$, and about the Z axis is $\gamma$. The sim.simxGetObjectOrientation() interface returns [$\alpha$, $\beta$, $\gamma$]. In practice we need this as a rotation matrix, so here is the code that converts the Euler angle vector into a rotation matrix:
def euler_vector_2_rotation_matrix(euler_vector):
    # euler vector format: [alpha, beta, gamma]
    alpha, beta, gamma = euler_vector
    r_x = np.array([[1, 0, 0],
                    [0, np.cos(alpha), -np.sin(alpha)],
                    [0, np.sin(alpha), np.cos(alpha)]])
    r_y = np.array([[np.cos(beta), 0, np.sin(beta)],
                    [0, 1, 0],
                    [-np.sin(beta), 0, np.cos(beta)]])
    r_z = np.array([[np.cos(gamma), -np.sin(gamma), 0],
                    [np.sin(gamma), np.cos(gamma), 0],
                    [0, 0, 1]])
    rotation_matrix = np.matmul(r_x, np.matmul(r_y, r_z))
    return rotation_matrix
Finally, here is the code that computes the vertices of the 3D bounding box in the reference coordinate system:
# get the vertices of 3D bounding box of object
def bbox3d_vertices_calc(self, object_pos, object_size, euler_angle):
    l, w, h = object_size
    v1_obj = np.array([[-l / 2], [w / 2], [h / 2]])
    v2_obj = np.array([[-l / 2], [-w / 2], [h / 2]])
    v3_obj = np.array([[l / 2], [w / 2], [h / 2]])
    v4_obj = np.array([[l / 2], [-w / 2], [h / 2]])
    v5_obj = np.array([[-l / 2], [w / 2], [-h / 2]])
    v6_obj = np.array([[-l / 2], [-w / 2], [-h / 2]])
    v7_obj = np.array([[l / 2], [w / 2], [-h / 2]])
    v8_obj = np.array([[l / 2], [-w / 2], [-h / 2]])
    # bbox_3d_obj = np.array([[v1_obj, v2_obj, v3_obj, v4_obj, v5_obj, v6_obj, v7_obj, v8_obj]]).reshape(8, 3).astype(np.float32)
    # print("bbox_3d_obj: \n{}".format(bbox_3d_obj))
    object2world_matrix = self.euler_vector_2_rotation_matrix(euler_angle)
    object_pos = np.array(object_pos).reshape(3, 1)
    v1_3d = np.matmul(object2world_matrix, v1_obj) + object_pos
    v2_3d = np.matmul(object2world_matrix, v2_obj) + object_pos
    v3_3d = np.matmul(object2world_matrix, v3_obj) + object_pos
    v4_3d = np.matmul(object2world_matrix, v4_obj) + object_pos
    v5_3d = np.matmul(object2world_matrix, v5_obj) + object_pos
    v6_3d = np.matmul(object2world_matrix, v6_obj) + object_pos
    v7_3d = np.matmul(object2world_matrix, v7_obj) + object_pos
    v8_3d = np.matmul(object2world_matrix, v8_obj) + object_pos
    bbox_3d_vertices = np.array([[v1_3d, v2_3d, v3_3d, v4_3d, v5_3d, v6_3d, v7_3d, v8_3d]]).reshape(8, 3).astype(np.float32)
    # print("The vertices of bbox:\n{}".format(bbox_3d_vertices))
    return bbox_3d_vertices
2) Vertex coordinate transformation
Once we have the coordinates of the eight vertices of the 3D bounding box, a simple matrix multiplication gives their 2D coordinates in the pixel coordinate system. Note, however, that the camera projection matrix is $M \in \mathbb{R}^{3 \times 4}$, so the 3D vertex coordinates computed above have to be extended to homogeneous form by appending a 1.
The code is as follows:
# map the vertices of bbox3d to image plane
def bbox3d_to_bbox2d(self, bbox_3d, left_camera_matrix, right_camera_matrix):
    one = np.ones(8).reshape(8, 1)
    bbox_3d = np.concatenate((bbox_3d, one), axis=1)
    left_v1_2d = np.matmul(left_camera_matrix, bbox_3d[0])
    left_v2_2d = np.matmul(left_camera_matrix, bbox_3d[1])
    left_v3_2d = np.matmul(left_camera_matrix, bbox_3d[2])
    left_v4_2d = np.matmul(left_camera_matrix, bbox_3d[3])
    left_v5_2d = np.matmul(left_camera_matrix, bbox_3d[4])
    left_v6_2d = np.matmul(left_camera_matrix, bbox_3d[5])
    left_v7_2d = np.matmul(left_camera_matrix, bbox_3d[6])
    left_v8_2d = np.matmul(left_camera_matrix, bbox_3d[7])
    right_v1_2d = np.matmul(right_camera_matrix, bbox_3d[0])
    right_v2_2d = np.matmul(right_camera_matrix, bbox_3d[1])
    right_v3_2d = np.matmul(right_camera_matrix, bbox_3d[2])
    right_v4_2d = np.matmul(right_camera_matrix, bbox_3d[3])
    right_v5_2d = np.matmul(right_camera_matrix, bbox_3d[4])
    right_v6_2d = np.matmul(right_camera_matrix, bbox_3d[5])
    right_v7_2d = np.matmul(right_camera_matrix, bbox_3d[6])
    right_v8_2d = np.matmul(right_camera_matrix, bbox_3d[7])
    l_v1_2d = left_v1_2d[:2] / left_v1_2d[2]
    l_v2_2d = left_v2_2d[:2] / left_v2_2d[2]
    l_v3_2d = left_v3_2d[:2] / left_v3_2d[2]
    l_v4_2d = left_v4_2d[:2] / left_v4_2d[2]
    l_v5_2d = left_v5_2d[:2] / left_v5_2d[2]
    l_v6_2d = left_v6_2d[:2] / left_v6_2d[2]
    l_v7_2d = left_v7_2d[:2] / left_v7_2d[2]
    l_v8_2d = left_v8_2d[:2] / left_v8_2d[2]
    r_v1_2d = right_v1_2d[:2] / right_v1_2d[2]
    r_v2_2d = right_v2_2d[:2] / right_v2_2d[2]
    r_v3_2d = right_v3_2d[:2] / right_v3_2d[2]
    r_v4_2d = right_v4_2d[:2] / right_v4_2d[2]
    r_v5_2d = right_v5_2d[:2] / right_v5_2d[2]
    r_v6_2d = right_v6_2d[:2] / right_v6_2d[2]
    r_v7_2d = right_v7_2d[:2] / right_v7_2d[2]
    r_v8_2d = right_v8_2d[:2] / right_v8_2d[2]
    left_bbox_2d = np.array([[l_v1_2d], [l_v2_2d], [l_v3_2d], [l_v4_2d],
                             [l_v5_2d], [l_v6_2d], [l_v7_2d], [l_v8_2d]]).reshape((8, 2)).astype(np.int64)
    right_bbox_2d = np.array([[r_v1_2d], [r_v2_2d], [r_v3_2d], [r_v4_2d],
                              [r_v5_2d], [r_v6_2d], [r_v7_2d], [r_v8_2d]]).reshape((8, 2)).astype(np.int64)
    # print("The projected vertices of bbox in left image:\n{}".format(left_bbox_2d))
    # print("The projected vertices of bbox in right image:\n{}".format(right_bbox_2d))
    return left_bbox_2d, right_bbox_2d
3) Bounding box drawing
Finally, we connect the projected vertices of the 3D bounding box in the 2D image plane according to the geometry given earlier.
The code is as follows:
# drawing the bbox3d in image plane
def bbox3d_display(self, image, bbox3d_vertices, color, thickness):
    cv2.putText(image, 'V1', tuple(bbox3d_vertices[0]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(image, 'V2', tuple(bbox3d_vertices[1]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(image, 'V3', tuple(bbox3d_vertices[2]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.putText(image, 'V4', tuple(bbox3d_vertices[3]), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.line(image, tuple(bbox3d_vertices[0]), tuple(bbox3d_vertices[1]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[0]), tuple(bbox3d_vertices[2]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[0]), tuple(bbox3d_vertices[4]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[1]), tuple(bbox3d_vertices[3]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[1]), tuple(bbox3d_vertices[5]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[2]), tuple(bbox3d_vertices[3]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[2]), tuple(bbox3d_vertices[6]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[3]), tuple(bbox3d_vertices[7]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[4]), tuple(bbox3d_vertices[5]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[4]), tuple(bbox3d_vertices[6]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[5]), tuple(bbox3d_vertices[7]), color=color, thickness=thickness)
    cv2.line(image, tuple(bbox3d_vertices[6]), tuple(bbox3d_vertices[7]), color=color, thickness=thickness)
    return image
The final visualization of the 6-DOF information on the stereo images is shown in the figure below:
6. Why does the visualization of the 6-DOF moving target keep going wrong, especially when an Euler angle crosses 90° or -90°?
I found that when a rotation angle of the target crosses 90° or -90°, the visualized bounding box jumps abruptly; in a few other cases the visualized box simply does not match the actual target.
At first I suspected multithreading (the data-acquisition thread and the control thread run asynchronously), or the read latency of the stereo and depth images (the image data and the target information being out of sync in time).
After debugging, however, it turned out to be neither of these; the culprit was cv2.Rodrigues(). I had assumed it converts an Euler angle vector directly into a rotation matrix, but it actually converts an axis-angle rotation vector, so feeding it Euler angles gives boxes that look roughly right in most poses and clearly wrong in a few. I then wrote the Euler angle conversion function euler_vector_2_rotation_matrix() shown in Problem 5, and the problem was solved. A quick check illustrating the discrepancy follows.
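A minimal check of the two conversions (assuming euler_vector_2_rotation_matrix() from Problem 5 is defined in the same script); the specific angle values are arbitrary:
import cv2
import numpy as np

euler = np.array([0.3, -1.2, 0.8])                  # [alpha, beta, gamma] in radians
rot_rodrigues, _ = cv2.Rodrigues(euler)             # interprets the vector as axis-angle
rot_euler = euler_vector_2_rotation_matrix(euler)   # Rx(alpha) * Ry(beta) * Rz(gamma)
print(np.allclose(rot_rodrigues, rot_euler))        # False: the two conventions disagree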
7. How to obtain the target's 2D bounding box (left bbox2d)? Can bbox3d be converted into left and right bbox2d?
Since we are building a general space non-cooperative target 6-DOF tracking dataset on CoppeliaSim, a 2D bounding box (bbox2d) for the target is indispensable. From the problems above we know that obtaining the target's 6-DOF information (bbox3d) through the platform's Python API is fairly easy. But can bbox2d be derived from bbox3d?
The first idea is the minimum enclosing rectangle, i.e., taking the extreme values of the eight projected bbox3d vertices in image coordinates, as shown in the figure below:
Clearly, the resulting left and right bbox2d enclose the target poorly and contain a large non-target region. We therefore used prior knowledge, fused in the depth information, and applied a segmentation step to obtain a very accurate bbox2d; a rough sketch of the idea follows.
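A rough sketch of that idea (not the exact implementation used for the dataset): inside the loose box spanned by the projected bbox3d vertices, keep only the pixels whose metric depth lies within the target's depth extent, then take the bounding rectangle of that mask. The helper name, the depth-range heuristic, and the margin value are assumptions for illustration:
import cv2
import numpy as np

def tight_bbox2d_from_depth(depth_img, loose_bbox, target_z, target_size, margin=0.05):
    # loose_bbox: (u_min, v_min, u_max, v_max) spanned by the projected bbox3d vertices
    u_min, v_min, u_max, v_max = loose_bbox
    roi = depth_img[v_min:v_max, u_min:u_max]
    # keep pixels whose metric depth lies within the target's depth extent (plus a margin)
    half_extent = 0.5 * float(np.linalg.norm(target_size))
    mask = ((roi > target_z - half_extent - margin) &
            (roi < target_z + half_extent + margin)).astype(np.uint8)
    if mask.sum() == 0:
        return loose_bbox  # fall back to the loose box if nothing survives
    x, y, w, h = cv2.boundingRect(mask)
    return (u_min + x, v_min + y, u_min + x + w, v_min + y + h)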
Unsolved Problems
1. How can the background be made to change dynamically?
2. When target motion control and image / 6-DOF information acquisition are run in separate threads, the acquisition thread blocks (images can no longer be grabbed or displayed). Why?
3. How should 6-DOF tracking be evaluated? Can the average overlap measurement still be used?
4. Most existing algorithms cannot do 6DOF tracking. Can the 6DOF annotations be converted into 3DOF, 4DOF, or 5DOF formats for lower-dimensional evaluation?
Using the minimum enclosing 3D bounding box, we implemented the conversion from 6DOF to a 3DOF format (X, Y, Z, L, W, H); a conversion sketch is given below. Since this form drops the orientation, its evaluation can fully reuse the average overlap metric, and we therefore refer to it as 3D object tracking.
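A minimal sketch of this conversion and of an axis-aligned 3D overlap, assuming bbox3d_vertices_calc() from Problem 5 provides the (8, 3) oriented-box vertices; the function names are illustrative:
import numpy as np

def oriented_to_axis_aligned(bbox_3d_vertices):
    # bbox_3d_vertices: (8, 3) vertices of the oriented box in the reference frame
    v_min = bbox_3d_vertices.min(axis=0)
    v_max = bbox_3d_vertices.max(axis=0)
    center = (v_min + v_max) / 2.0
    size = v_max - v_min                      # extents of the enclosing box
    return np.concatenate([center, size])     # [X, Y, Z, L, W, H]

def axis_aligned_overlap_3d(box_a, box_b):
    # boxes as [X, Y, Z, L, W, H]; returns the volume IoU (overlap per frame)
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    inter = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter_vol = float(np.prod(inter))
    union_vol = float(np.prod(box_a[3:]) + np.prod(box_b[3:])) - inter_vol
    return inter_vol / union_vol if union_vol > 0 else 0.0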