[Autonomous Driving Environment Perception Project] Point Cloud Obstacle Detection with Paddle3D

yuan〇

Last modified: 2022-12-23 20:43:15


First published: 2022-12-20 22:30:54

Article link: https://blog.csdn.net/sinat_52032317/article/details/128383733


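This article describes how to use Paddle3D for point cloud obstacle detection in autonomous driving, covering environment setup, data preparation, model training, evaluation, export, and deployment. It trains a CenterPoint model on a reduced 300-frame KITTI dataset, adjusts the configuration for a single GPU, shows the training process and prediction results, and provides example code for model deployment and prediction.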


1. Autonomous Driving in Practice: Point Cloud Obstacle Detection with Paddle3D


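Project page: Autonomous Driving in Practice: Point Cloud Obstacle Detection with Paddle3D
Course page: Autonomous Driving Perception Systems Revealed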

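1.1 Environment

Hardware
CPU: 2 cores
AI accelerator: V100
Total GPU memory: 16 GB
Total RAM: 16 GB
Total disk: 100 GB
Environment configuration
Python: 3.7.4
Framework
Framework version:
PaddlePaddle 2.4.0 (the project's default framework version is 2.3.2, but some libraries have since been updated and the code no longer runs correctly under the original version; noted 2022-12-20)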

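1.2 Preparing the point cloud data

Paddle3D supports building your own dataset in the KITTI dataset format; see the documentation on custom dataset formats for how to prepare one.

To demonstrate the whole workflow quickly, this project uses a small 300-frame KITTI subset: 250 point cloud frames randomly sampled from the KITTI training set and 50 randomly sampled from the validation set. The full KITTI dataset can be downloaded from the official website.

Extract the small dataset: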

!tar xvzf data/data165771/kitti300frame.tar.gz
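
To get a feel for the extracted data, it can help to load a single frame. The sketch below assumes the standard KITTI velodyne layout, in which each .bin file is a flat array of float32 values with 4 values per point (x, y, z, intensity); this is the same 4 that is later passed as --num_point_dim. The function name load_kitti_points is a hypothetical helper, not part of Paddle3D.

```python
import numpy as np


def load_kitti_points(bin_path, num_point_dim=4):
    """Load a KITTI-style velodyne .bin file as an (N, num_point_dim) array.

    Assumes the file is a flat sequence of float32 values, which is the
    standard KITTI layout (x, y, z, intensity per point).
    """
    raw = np.fromfile(bin_path, dtype=np.float32)
    if raw.size % num_point_dim != 0:
        raise ValueError(
            "file size is not a multiple of num_point_dim={}".format(
                num_point_dim))
    return raw.reshape(-1, num_point_dim)
```

Calling load_kitti_points on one of the extracted velodyne files and printing the array's shape gives the number of points in that frame.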


1.3 Installing Paddle3D

Clone the Paddle3D source code and install it from the develop branch.

Download the Paddle3D source code:

!git clone https://github.com/PaddlePaddle/Paddle3D

Upgrade pip:

!pip install --upgrade pip

Change into the Paddle3D directory:

cd /home/aistudio/Paddle3D

Install the Paddle3D dependencies:

!python -m pip install -r requirements.txt

Install Paddle3D from source:

!python setup.py install


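1.4 Model training

CenterPoint takes a point cloud as input and regresses each object's size, orientation, and velocity using keypoint detection. It achieves higher accuracy in scenes where object sizes vary widely, and its simple design also makes it efficient at runtime.

Paddle3D has heavily optimized CenterPoint's performance, so this project uses CenterPoint for point cloud obstacle detection.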
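
The keypoint-based idea can be illustrated with a toy example: each object is represented by a Gaussian blob at its center on a bird's-eye-view grid, and detections are read back as local maxima of that heatmap. The NumPy sketch below only illustrates this general center-heatmap technique under simplified assumptions (fixed sigma, 3x3 neighbourhood); it is not Paddle3D's implementation, and the names draw_gaussian and find_peaks are hypothetical.

```python
import numpy as np


def draw_gaussian(heatmap, center_xy, sigma):
    """Render a Gaussian blob centered at (x, y) into heatmap, in place."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center_xy
    blob = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    # Keep the element-wise maximum so overlapping objects do not add up.
    np.maximum(heatmap, blob, out=heatmap)
    return heatmap


def find_peaks(heatmap, threshold=0.5):
    """Return (x, y) cells that are 3x3 local maxima above threshold."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the 9 shifted views of the 3x3 neighbourhood and take the max.
    neighbourhood = np.stack([
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ])
    local_max = neighbourhood.max(axis=0)
    mask = (heatmap == local_max) & (heatmap >= threshold)
    ys, xs = np.nonzero(mask)
    return list(zip(xs.tolist(), ys.tolist()))
```

In a real detector the heatmap is predicted by the network rather than drawn, and additional regression heads supply size, orientation, and velocity at each peak.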

(1) Create a symlink to the dataset

!mkdir datasets
!ln -s /home/aistudio/kitti300frame ./datasets
!mv ./datasets/kitti300frame ./datasets/KITTI
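
After linking, a quick sanity check can confirm that the dataset is where the later commands expect it. The sketch below counts the files in each sub-folder; the folder names are taken from the paths used further down in this post (training/velodyne, training/calib, training/image_2, training/label_2), and count_training_files is a hypothetical helper.

```python
from pathlib import Path

# Sub-folder names taken from the paths used later in this post.
EXPECTED_SUBDIRS = ("velodyne", "calib", "image_2", "label_2")


def count_training_files(dataset_root):
    """Count the files in each expected sub-folder of <root>/training.

    A missing folder is reported as 0 rather than raising an error.
    """
    training = Path(dataset_root) / "training"
    counts = {}
    for name in EXPECTED_SUBDIRS:
        folder = training / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    print(count_training_files("./datasets/KITTI"))
```

If the four counts differ, or any of them is 0, the symlink or the extraction step is worth rechecking before generating the ground-truth database.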

(2) Generate the ground-truth database needed for data augmentation during training

!python tools/create_det_gt_database.py --dataset_name kitti --dataset_root ./datasets/KITTI --save_dir ./datasets/KITTI

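(3) Modify the configuration file
The KITTI baseline that Paddle3D provides for CenterPoint was trained on 8 V100 cards with 32 GB each. Only a single 16 GB V100 is available here, so the learning rate and batch size must be adjusted for single-card training. Note: two files need to be modified.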

!cp configs/centerpoint/centerpoint_pillars_016voxel_kitti.yml configs/centerpoint/centerpoint_pillars_016voxel_minikitti.yml
# Reduce batch_size from 4 to 2
# Reduce base_learning_rate from 0.001 to 0.0000625 (16 times smaller)
# Reduce epochs to 20
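
The factor of 16 is consistent with scaling the learning rate linearly with the total batch size. The sketch below assumes that batch_size in the config is a per-card value, so the baseline uses 8 x 4 = 32 samples per step and the single-card setup uses 1 x 2 = 2. The function name scale_learning_rate is hypothetical.

```python
def scale_learning_rate(base_lr, base_cards, base_batch_per_card, cards,
                        batch_per_card):
    """Scale the learning rate in proportion to the total batch size."""
    base_total = base_cards * base_batch_per_card
    new_total = cards * batch_per_card
    return base_lr * new_total / base_total


# Baseline: 8 cards x batch 4, lr 0.001. Here: 1 card x batch 2.
new_lr = scale_learning_rate(0.001, 8, 4, 1, 2)
print(new_lr)  # 6.25e-05
```

The same function can be reused if a different batch size fits in memory, for example when training on a card with more or less than 16 GB.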

(4) Start training

Fine-tune from the pretrained model by passing --model https://bj.bcebos.com/paddle3d/models/centerpoint//centerpoint_pillars_016voxel_kitti/model.pdparams.
Training takes about 3 hours (epoch = 160):

!python tools/train.py --config configs/centerpoint/centerpoint_pillars_016voxel_minikitti.yml --save_dir ./output_kitti --num_workers 3 --save_interval 5 --model https://bj.bcebos.com/paddle3d/models/centerpoint//centerpoint_pillars_016voxel_kitti/model.pdparams


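1.5 Model evaluation

Once training has finished, evaluate the model's accuracy. Note that the command below passes the pretrained weights to --model; to evaluate a checkpoint from your own training run, point --model at it instead (for example ./output_kitti/epoch_20/model.pdparams, the path used in the export step):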

!python tools/evaluate.py --config configs/centerpoint/centerpoint_pillars_016voxel_minikitti.yml --model https://bj.bcebos.com/paddle3d/models/centerpoint//centerpoint_pillars_016voxel_kitti/model.pdparams --batch_size 1 --num_workers 3

The results show that the model detects cars well, but performs worse on pedestrians and cyclists.

1.6 Model export

Checkpoint from epoch 160:

!python tools/export.py --config configs/centerpoint/centerpoint_pillars_016voxel_minikitti.yml --model ./output_kitti/epoch_160/model.pdparams --save_dir ./output_kitti_inference

Checkpoint from epoch 20:

!python tools/export.py --config configs/centerpoint/centerpoint_pillars_016voxel_minikitti.yml --model ./output_kitti/epoch_20/model.pdparams --save_dir ./output_kitti_inference


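1.7 Model deployment

CenterPoint can be deployed in either C++ or Python. For C++ deployment, see the Paddle3D CenterPoint C++ deployment documentation. This project deploys in Python on top of the Paddle Inference engine.

Change into the directory that contains the Python deployment code: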

cd deploy/centerpoint/python

Specify the paths to the model files and to the point cloud file to predict on, then run the prediction:

!python infer.py --model_file /home/aistudio/Paddle3D/output_kitti_inference/centerpoint.pdmodel --params_file /home/aistudio/Paddle3D/output_kitti_inference/centerpoint.pdiparams --lidar_file /home/aistudio/Paddle3D/datasets/KITTI/training/velodyne/000104.bin --num_point_dim 4

The prediction output, saved to the file pred.txt:

Score: 0.8801702857017517 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 15.949016571044922 3.401707649230957 -0.838792085647583 1.6645100116729736 4.323172092437744 1.5860841274261475 1.9184210300445557
Score: 0.8430783748626709 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 4.356827735900879 7.624661922454834 -0.7098685503005981 1.625045657157898 3.905561685562134 1.6247363090515137 1.9725804328918457
Score: 0.8185914158821106 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 37.213130950927734 -4.919034481048584 -1.0293656587600708 1.6997096538543701 3.982091188430786 1.504879355430603 1.9469859600067139
Score: 0.7745243906974792 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 23.25118637084961 -7.758514404296875 -1.1013673543930054 1.665223240852356 4.344735145568848 1.5877608060836792 -1.17690110206604
Score: 0.7038797736167908 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 28.963340759277344 -1.4896349906921387 -0.902091383934021 1.7479376792907715 4.617205619812012 1.563733458518982 1.8184967041015625
Score: 0.17345771193504333 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 43.669837951660156 -14.53625774383545 -0.8703321218490601 1.6278570890426636 3.9948198795318604 1.514023780822754 0.8482679128646851
Score: 0.13678330183029175 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): -0.004944105166941881 -0.15849240124225616 -1.0278022289276123 1.6501120328903198 3.8041915893554688 1.5048600435256958 -1.235048770904541
Score: 0.1301172822713852 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 57.76295471191406 3.359720468521118 -0.6944454908370972 1.5720959901809692 3.5915873050689697 1.4879614114761353 0.9269833564758301
Score: 0.11003818362951279 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 31.166778564453125 4.625081539154053 -1.1666207313537598 1.5735297203063965 3.7136919498443604 1.4804550409317017 1.9523953199386597
Score: 0.1025351956486702 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 25.748493194580078 17.13875389099121 -1.334594964981079 1.497322916984558 3.33937406539917 1.375497579574585 1.2643632888793945
Score: 0.10173971205949783 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 40.83097457885742 -14.601551055908203 -0.8294260501861572 1.5711668729782104 4.121697902679443 1.4682270288467407 1.7342313528060913
Score: 0.10121338814496994 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 58.73474884033203 -13.881253242492676 -0.9767248630523682 1.7061117887496948 4.040306091308594 1.5399994850158691 2.1176371574401855
Score: 0.14155955612659454 Label: 1 Box(x_c, y_c, z_c, w, l, h, -rot): 25.353525161743164 -15.547738075256348 -0.19825123250484467 0.6376267075538635 1.7945102453231812 1.7785474061965942 2.360872507095337
Score: 0.12169421464204788 Label: 1 Box(x_c, y_c, z_c, w, l, h, -rot): 29.0213565826416 18.047924041748047 -0.8930382132530212 0.39424917101860046 1.6417008638381958 1.6697943210601807 -1.8017770051956177

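Visualizing the predictions shows the results more intuitively. The predictions are saved to pred.txt so that the visualization script can load them. Use --draw_threshold to filter out low-scoring boxes: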

!python /home/aistudio/show_lidar_pred_on_image.py --calib_file /home/aistudio/Paddle3D/datasets/KITTI/training/calib/000104.txt --image_file /home/aistudio/Paddle3D/datasets/KITTI/training/image_2/000104.png --label_file /home/aistudio/Paddle3D/datasets/KITTI/training/label_2/000104.txt --pred_file /home/aistudio/pred.txt --save_dir ./ --draw_threshold 0.16
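
The effect of the threshold can be checked without rendering anything. The sketch below parses lines in the pred.txt format shown above and keeps only boxes whose score is at or above the threshold, matching the script's rule of skipping boxes with a score below it. The helper names parse_pred_line and filter_predictions are hypothetical.

```python
import re

# Matches one line of the prediction output format shown in this post.
PRED_LINE = re.compile(
    r"Score:\s*(?P<score>\S+)\s+Label:\s*(?P<label>\d+)\s+"
    r"Box\(x_c, y_c, z_c, w, l, h, -rot\):\s*(?P<box>.+)")


def parse_pred_line(line):
    """Parse one prediction line into a dict, or return None if malformed."""
    match = PRED_LINE.search(line)
    if match is None:
        return None
    box = [float(v) for v in match.group("box").split()]
    if len(box) != 7:
        return None
    return {
        "score": float(match.group("score")),
        "label": int(match.group("label")),
        "box": box,
    }


def filter_predictions(lines, draw_threshold):
    """Keep parsed predictions whose score is >= draw_threshold."""
    parsed = (parse_pred_line(line) for line in lines)
    return [
        p for p in parsed if p is not None and p["score"] >= draw_threshold
    ]


sample = [
    "Score: 0.8801702857017517 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 15.949016571044922 3.401707649230957 -0.838792085647583 1.6645100116729736 4.323172092437744 1.5860841274261475 1.9184210300445557",
    "Score: 0.17345771193504333 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): 43.669837951660156 -14.53625774383545 -0.8703321218490601 1.6278570890426636 3.9948198795318604 1.514023780822754 0.8482679128646851",
    "Score: 0.13678330183029175 Label: 0 Box(x_c, y_c, z_c, w, l, h, -rot): -0.004944105166941881 -0.15849240124225616 -1.0278022289276123 1.6501120328903198 3.8041915893554688 1.5048600435256958 -1.235048770904541",
]
print(len(filter_predictions(sample, 0.16)))  # 2
```

Applied to all 14 boxes listed above, a threshold of 0.16 keeps 6 of them, all with label 0.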

Result

The image is saved to:
/home/aistudio/Paddle3D/deploy/centerpoint/python/000104.png

Projection of the 3D point cloud
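
Projecting LiDAR points onto the camera image follows the standard KITTI calibration chain: Tr_velo_to_cam moves points into the camera frame, R0_rect rectifies them, and P2 projects them onto the left color image. The sketch below is a minimal NumPy version of that chain and is not taken from Paddle3D; it expects matrices with the shapes produced by the Calib class in the appendix, and project_lidar_to_image is a hypothetical name.

```python
import numpy as np


def project_lidar_to_image(points_xyz, P2, R0_rect, Tr_velo_to_cam):
    """Project (N, 3) LiDAR points to pixel coordinates.

    P2 is (3, 4), R0_rect is (3, 3), Tr_velo_to_cam is (3, 4).
    Returns (uv, in_front): uv is (M, 2) for the points in front of the
    camera, and in_front is the (N,) boolean mask that selected them.
    """
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])  # (N, 4) homogeneous
    cam = Tr_velo_to_cam @ pts_h.T                     # (3, N) camera frame
    rect = R0_rect @ cam                               # (3, N) rectified
    rect_h = np.vstack([rect, np.ones((1, n))])        # (4, N) homogeneous
    img = P2 @ rect_h                                  # (3, N)
    depth = img[2]
    in_front = depth > 0  # drop points behind the image plane
    uv = (img[:2, in_front] / depth[in_front]).T
    return uv, in_front
```

Points outside the image bounds still need to be discarded by comparing uv against the image width and height before drawing.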

Appendix

show_lidar_pred_on_image.py

import argparse
import os
import os.path as osp

import cv2
import numpy as np

from paddle3d.datasets.kitti.kitti_utils import box_lidar_to_camera
from paddle3d.geometries import BBoxes3D, CoordMode
from paddle3d.sample import Sample

# Class names must match the names checked in visualize() below.
classmap = {0: 'Car', 1: 'Cyclist', 2: 'Pedestrian'}


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--calib_file', dest='calib_file', help='calibration file', type=str)
    parser.add_argument(
        '--image_file', dest='image_file', help='image file', type=str)
    parser.add_argument(
        '--label_file', dest='label_file', help='label file', type=str)
    parser.add_argument(
        '--pred_file',
        dest='pred_file',
        help='prediction results file',
        type=str)
    parser.add_argument(
        '--save_dir',
        dest='save_dir',
        help='the path to save visualized result',
        type=str)
    parser.add_argument(
        '--draw_threshold',
        dest='draw_threshold',
        help=
        'predictions whose confidence is below the threshold will not be shown',
        type=float)
    return parser.parse_args()


class Calib:
    def __init__(self, dict_calib):
        super(Calib, self).__init__()
        self.P0 = dict_calib['P0'].reshape(3, 4)
        self.P1 = dict_calib['P1'].reshape(3, 4)
        self.P2 = dict_calib['P2'].reshape(3, 4)
        self.P3 = dict_calib['P3'].reshape(3, 4)
        self.R0_rect = dict_calib['R0_rect'].reshape(3, 3)
        self.Tr_velo_to_cam = dict_calib['Tr_velo_to_cam'].reshape(3, 4)
        self.Tr_imu_to_velo = dict_calib['Tr_imu_to_velo'].reshape(3, 4)


class Object3d:
    def __init__(self, content):
        super(Object3d, self).__init__()
        lines = content.split()
        lines = list(filter(lambda x: len(x), lines))
        self.name, self.truncated, self.occluded, self.alpha = lines[0], float(
            lines[1]), float(lines[2]), float(lines[3])
        self.bbox = [lines[4], lines[5], lines[6], lines[7]]
        self.bbox = np.array([float(x) for x in self.bbox])
        self.dimensions = [lines[8], lines[9], lines[10]]
        self.dimensions = np.array([float(x) for x in self.dimensions])
        self.location = [lines[11], lines[12], lines[13]]
        self.location = np.array([float(x) for x in self.location])
        self.rotation_y = float(lines[14])
        if len(lines) == 16:
            self.score = float(lines[15])


def rot_y(rotation_y):
    cos = np.cos(rotation_y)
    sin = np.sin(rotation_y)
    R = np.array([[cos, 0, sin], [0, 1, 0], [-sin, 0, cos]])
    return R


def parse_gt_info(calib_path, label_path):

    with open(calib_path) as f:
        lines = f.readlines()
    lines = list(filter(lambda x: len(x) and x != '\n', lines))
    dict_calib = {}
    for line in lines:
        key, value = line.split(":")
        dict_calib[key] = np.array([float(x) for x in value.split()])
    calib = Calib(dict_calib)

    with open(label_path, 'r') as f:
        lines = f.readlines()
        lines = list(filter(lambda x: len(x) and x != '\n', lines))
    obj = [Object3d(x) for x in lines]
    return calib, obj


def predictions_to_kitti_format(pred):
    num_boxes = pred.bboxes_3d.shape[0]
    names = np.array([classmap[label] for label in pred.labels])
    calibs = pred.calibs
    if pred.bboxes_3d.coordmode != CoordMode.KittiCamera:
        bboxes_3d = box_lidar_to_camera(pred.bboxes_3d, calibs)
    else:
        bboxes_3d = pred.bboxes_3d

    if bboxes_3d.origin != [.5, 1., .5]:
        bboxes_3d[:, :3] += bboxes_3d[:, 3:6] * (
            np.array([.5, 1., .5]) - np.array(bboxes_3d.origin))
        bboxes_3d.origin = [.5, 1., .5]

    loc = bboxes_3d[:, :3]
    dim = bboxes_3d[:, 3:6]

    contents = []
    for i in range(num_boxes):
        # In kitti records, dimensions order is hwl format
        content = "{} 0 0 0 0 0 0 0 {} {} {} {} {} {} {} {}".format(
            names[i], dim[i, 2], dim[i, 1], dim[i, 0], loc[i, 0], loc[i, 1],
            loc[i, 2], bboxes_3d[i, 6], pred.confidences[i])
        contents.append(content)

    obj = [Object3d(x) for x in contents]
    return obj


def parse_pred_info(pred_path, calib):
    with open(pred_path, 'r') as f:
        lines = f.readlines()
        lines = list(filter(lambda x: len(x) and x != '\n', lines))

    scores = []
    labels = []
    boxes_3d = []
    for res in lines:
        score = float(res.split("Score: ")[-1].split(" ")[0])
        label = int(res.split("Label: ")[-1].split(" ")[0])
        box_3d = res.split("Box(x_c, y_c, z_c, w, l, h, -rot): ")[-1].split(" ")
        box_3d = [float(b) for b in box_3d]
        scores.append(score)
        labels.append(label)
        boxes_3d.append(box_3d)
    scores = np.stack(scores)
    labels = np.stack(labels)
    boxes_3d = np.stack(boxes_3d)
    data = Sample(pred_path, 'lidar')
    data.bboxes_3d = BBoxes3D(boxes_3d)
    data.bboxes_3d.coordmode = 'Lidar'
    data.bboxes_3d.origin = [0.5, 0.5, 0.5]
    data.bboxes_3d.rot_axis = 2
    data.labels = labels
    data.confidences = scores
    data.calibs = calib

    return data


def visualize(image_path, calib, obj, title, draw_threshold=None):
    img = cv2.imread(image_path)
    for i in range(len(obj)):
        if obj[i].name in ['Car', 'Pedestrian', 'Cyclist']:
            if draw_threshold is not None and hasattr(obj[i], 'score'):
                if obj[i].score < draw_threshold:
                    continue
            R = rot_y(obj[i].rotation_y)
            h, w, l = obj[i].dimensions[0], obj[i].dimensions[1], obj[
                i].dimensions[2]
            x = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
            y = [0, 0, 0, 0, -h, -h, -h, -h]
            z = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
            corner_3d = np.vstack([x, y, z])
            corner_3d = np.dot(R, corner_3d)

            corner_3d[0, :] += obj[i].location[0]
            corner_3d[1, :] += obj[i].location[1]
            corner_3d[2, :] += obj[i].location[2]

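            # Homogeneous coordinates: append a row of ones (not zeros) so
            # that the translation column of P2 is applied.
            corner_3d = np.vstack((corner_3d, np.ones((1,
                                                       corner_3d.shape[-1]))))
            corner_2d = np.dot(calib.P2, corner_3d)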
            corner_2d[0, :] /= corner_2d[2, :]
            corner_2d[1, :] /= corner_2d[2, :]

            if obj[i].name == 'Car':
                color = [20, 20, 255]
            elif obj[i].name == 'Pedestrian':
                color = [0, 255, 255]
            else:
                color = [255, 0, 255]

            thickness = 1
            for corner_i in range(0, 4):
                ii, ij = corner_i, (corner_i + 1) % 4
                corner_2d = corner_2d.astype('int32')
                cv2.line(img, (corner_2d[0, ii], corner_2d[1, ii]),
                         (corner_2d[0, ij], corner_2d[1, ij]), color, thickness)
                ii, ij = corner_i + 4, (corner_i + 1) % 4 + 4
                cv2.line(img, (corner_2d[0, ii], corner_2d[1, ii]),
                         (corner_2d[0, ij], corner_2d[1, ij]), color, thickness)
                ii, ij = corner_i, corner_i + 4
                cv2.line(img, (corner_2d[0, ii], corner_2d[1, ii]),
                         (corner_2d[0, ij], corner_2d[1, ij]), color, thickness)
            box_text = obj[i].name
            if hasattr(obj[i], 'score'):
                box_text += ': {:.2}'.format(obj[i].score)
            cv2.putText(img, box_text,
                        (min(corner_2d[0, :]), min(corner_2d[1, :]) - 2),
                        cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.5, color, 1)
    cv2.putText(img, title, (int(img.shape[1] / 2), 20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.75, (255, 100, 0), 2)

    return img


def main(args):
    calib, gt_obj = parse_gt_info(args.calib_file, args.label_file)
    gt_image = visualize(args.image_file, calib, gt_obj, title='GroundTruth')
    pred = parse_pred_info(args.pred_file, [
        calib.P0, calib.P1, calib.P2, calib.P3, calib.R0_rect,
        calib.Tr_velo_to_cam, calib.Tr_imu_to_velo
    ])
    preds = predictions_to_kitti_format(pred)
    pred_image = visualize(
        args.image_file,
        calib,
        preds,
        title='Prediction',
        draw_threshold=args.draw_threshold)
    show_image = np.vstack([gt_image, pred_image])
    cv2.imwrite(
        osp.join(args.save_dir,
                 osp.split(args.image_file)[-1]), show_image)


if __name__ == '__main__':
    args = parse_args()
    main(args)