Reference code: GitHub - ruhyadi/YOLO3D: YOLO 3D Object Detection for Autonomous Driving Vehicle
Reference for the dataset format: "KITTI数据集下载及解析" (KITTI dataset download and parsing), CSDN blog
Starting at line 194 of the `script.Dataset.Dataset.format_label` function, the annotation data is deserialized:

```python
label = {
    'Class': Class,
    'Box_2D': Box_2D,
    'Dimensions': Dimension,
    'Alpha': Alpha,
    'Orientation': Orientation,
    'Confidence': Confidence
}
return label
```
A reference label txt file is as follows:
Comparing the code against the data format shows:
- `Class` is line[0]. The truncation value line[1] and the occlusion value line[2] are unused.
- The 2D bounding box `Box_2D` is line[4-7].
- The 3D object dimensions `Dimensions` are line[8-10].
- `Alpha` is line[3].
- The 3D object location (line[11-13]) and the global orientation ry (line[14]) are unused. `Confidence` is fixed to 1.
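Putting the mapping above together, a minimal parser for the fields that `format_label` keeps might look like the sketch below. The sample line and the function name are hypothetical illustrations, not copied from the repo; the actual `format_label` additionally bins `Alpha` into per-bin `Orientation`/`Confidence` targets and normalizes `Dimensions` against class averages.

```python
def parse_kitti_label(line):
    """Parse one KITTI label line into the subset of fields YOLO3D uses.

    Hypothetical helper following the field mapping above; location
    (line[11-13]) and rotation_y (line[14]) are deliberately dropped.
    """
    f = line.strip().split(' ')
    return {
        'Class': f[0],                                       # line[0]
        'Alpha': float(f[3]),                                # line[3]
        'Box_2D': [(round(float(f[4])), round(float(f[5]))),
                   (round(float(f[6])), round(float(f[7])))],  # line[4-7]
        'Dimensions': [float(f[8]), float(f[9]), float(f[10])],  # line[8-10]: h, w, l
        'Confidence': 1,                                     # fixed to 1
    }

# Illustrative KITTI-style line (made up for this example)
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
label = parse_kitti_label(sample)
```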
In the training code, the `train.train()` function uses only `'Dimensions'`, `'Alpha'`, `'Orientation'`, and `'Confidence'` for loss regression:
```python
with tqdm(data_gen, unit='batch') as tepoch:
    for local_batch, local_labels in tepoch:
        # progress bar
        tepoch.set_description(f'Epoch {epoch}')

        # ground-truth
        truth_orient = local_labels['Orientation'].float().cuda()
        truth_conf = local_labels['Confidence'].float().cuda()
        truth_dim = local_labels['Dimensions'].float().cuda()

        # convert to cuda
        local_batch = local_batch.float().cuda()

        # forward
        [orient, conf, dim] = model(local_batch)

        # loss
        orient_loss = orient_loss_func(orient, truth_orient, truth_conf)
        dim_loss = dim_loss_func(dim, truth_dim)

        truth_conf = torch.max(truth_conf, dim=1)[1]
        conf_loss = conf_loss_func(conf, truth_conf)
```
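In MultiBin-style implementations like this one, `orient_loss_func` is typically the negative cosine of the angle difference, evaluated only on the bin the ground truth falls in (the argmax of the confidence target). A dependency-free sketch of that idea follows; it is an assumption about what the repo's loss computes, written in pure Python rather than torch:

```python
import math

def orientation_loss(orient, orient_gt, conf_gt):
    """MultiBin-style orientation loss sketch: -cos(gt_angle - pred_angle),
    evaluated on the ground-truth bin, averaged over the batch.

    orient / orient_gt: per sample, a list of (cos, sin) pairs, one per bin.
    conf_gt: per sample, one-hot (or soft) bin membership; argmax picks the bin.
    """
    total = 0.0
    for o, og, cg in zip(orient, orient_gt, conf_gt):
        b = max(range(len(cg)), key=lambda i: cg[i])   # ground-truth bin
        gt_angle = math.atan2(og[b][1], og[b][0])      # (cos, sin) -> angle
        pred_angle = math.atan2(o[b][1], o[b][0])
        total += -math.cos(gt_angle - pred_angle)
    return total / len(orient)
```

A perfect prediction gives -1 per sample; an opposite angle gives +1, so minimizing the loss pulls the predicted angle toward the ground truth. `conf_loss_func` is then an ordinary classification loss over bins, which is why `truth_conf` is reduced to bin indices with `torch.max(..., dim=1)[1]` first.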
Conclusion:
YOLO3D's training regresses only the orientation, the bin confidence, and the 3D box dimensions. The regression network has no detection heads for the class or 2D box used in 2D detection. This is also visible in the regression network's forward function (`script.Model.ResNet18.forward()`):
```python
def forward(self, x):
    x = self.model(x)
    x = x.view(-1, 512 * 7 * 7)

    orientation = self.orientation(x)
    orientation = orientation.view(-1, self.bins, 2)
    orientation = F.normalize(orientation, dim=2)

    confidence = self.confidence(x)
    dimension = self.dimension(x)

    return orientation, confidence, dimension
```
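The `view(-1, self.bins, 2)` plus `F.normalize(..., dim=2)` step in the orientation head reshapes each sample's raw output into one (cos, sin) pair per bin and scales each pair to unit length, so it represents a valid angle. A dependency-free sketch of that normalization for a single sample (hypothetical helper, for illustration only):

```python
import math

def normalize_orientation(raw, bins=2):
    """Mirror view(-1, bins, 2) + F.normalize(dim=2) for one sample:
    split the flat output into (cos, sin) pairs and L2-normalize each pair."""
    pairs = [raw[i * 2:(i + 1) * 2] for i in range(bins)]
    out = []
    for cos_v, sin_v in pairs:
        n = math.hypot(cos_v, sin_v) or 1.0   # avoid division by zero
        out.append((cos_v / n, sin_v / n))
    return out
```

Normalizing to unit vectors lets the loss recover an angle per bin with `atan2(sin, cos)` without the network having to keep its raw outputs on the unit circle.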