SparseDrive, an End-to-End Autonomous Driving Model: Deployment Walkthrough

Author: 奔跑的花短裤

Last modified: 2024-10-29 16:49:03

First published: 2024-10-18 10:03:27

Article link: https://blog.csdn.net/li1886477130/article/details/143033918

SparseDrive
Paper: https://arxiv.org/pdf/2405.19620
Repository: https://github.com/swc-17/SparseDrive

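For an introduction to the paper and the model, see other blog posts. This post covers only the deployment process, fixes for problems that may come up along the way, and notes on reading and using the code.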

Model deployment

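The project ships with a Quick Start document that can be used as a reference.
The download steps can be run in parallel.
The steps are also described here: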

1. Verify or install conda

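A conda virtual environment is required. Check the installed conda version with:

conda -V

If conda is not installed, follow the conda installation documentation.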

2. Set up a new virtual environment

Run create the first time only; afterwards, activate is all that is needed.

conda create -n sparsedrive python=3.8 -y
conda activate sparsedrive
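
Before installing anything, it can help to confirm that the shell really is inside the new environment. The following is a minimal sketch, assuming conda exports the `CONDA_DEFAULT_ENV` variable on activation; the helper name `env_report` is made up for this example.

```python
import os
import shutil
import sys


def env_report(expected_env="sparsedrive", expected_python=(3, 8)):
    """Collect basic facts about the current conda/Python environment."""
    # CONDA_DEFAULT_ENV is assumed to be set by `conda activate`.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "conda_on_path": shutil.which("conda") is not None,
        "active_env": active_env,
        "env_ok": active_env == expected_env,
        "python": "%d.%d" % sys.version_info[:2],
        "python_ok": sys.version_info[:2] == tuple(expected_python),
    }


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```

If `env_ok` or `python_ok` comes back False, re-running `conda activate sparsedrive` is the first thing to try.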

3. Install dependency packages

This is probably the most involved step of the whole process, and it takes a long time.
First cd into the sparsedrive directory.

sparsedrive_path="path/to/sparsedrive"
cd ${sparsedrive_path}
pip3 install --upgrade pip
pip3 install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirement.txt

Output after the build and install finish: [screenshot]

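The download fails easily, so the wheels can also be downloaded manually from https://download.pytorch.org/whl/cu116
Pick the files that match the required versions. I downloaded torch-1.13.0+cu116-cp38-cp38-linux_x86_64.whl, among others.
After downloading, install offline.
Open a terminal and activate the environment that PyTorch should be installed into: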

conda activate xxx  # xxx is the name of the environment to install into

Inside that environment, install torch, torchvision and torchaudio with the following commands.

pip install torch-1.13.0+cu116-cp38-cp38-linux_x86_64.whl
……
# Install every downloaded file the same way; use each file's absolute path
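
Since a manually downloaded wheel only installs if its tags match the interpreter, a quick check of the filename can save a failed attempt. The sketch below relies on the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag); the helper names `parse_wheel_name` and `matches_interpreter` are made up for illustration.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel name: {filename}")
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    # PyTorch encodes the CUDA build as a local version suffix, e.g. "+cu116".
    base_version, _, local = version.partition("+")
    return {
        "name": name,
        "version": base_version,
        "local": local,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_interpreter(info, version_info=sys.version_info):
    """True if the wheel's CPython tag matches the given interpreter version."""
    return info["python_tag"] == "cp%d%d" % (version_info[0], version_info[1])


if __name__ == "__main__":
    info = parse_wheel_name("torch-1.13.0+cu116-cp38-cp38-linux_x86_64.whl")
    print(info)
    print("matches this interpreter:", matches_interpreter(info))
```

For the Python 3.8 environment created earlier, the wheel's Python tag should be cp38 and its local version should be cu116.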

Verify the installation
Run the following in a terminal to check that PyTorch was installed correctly:

python
>>> import torch
>>> torch.cuda.is_available()
True

If True is displayed, torch is installed and can use CUDA.
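
The same check can be scripted so that it also reports which CUDA version the wheel was built for. This is a minimal sketch; `torch_report` is a made-up helper name, and it returns None when torch is not importable instead of failing.

```python
def torch_report():
    """Return basic PyTorch/CUDA facts, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    return {
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
    }


if __name__ == "__main__":
    report = torch_report()
    if report is None:
        print("torch is not installed in this environment")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
```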

pip3 install -r requirement.txt

This command may fail with an error:

error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-az4fmfe4/flash-attn_4cf3f6b2d7834a539c7624728fa4b02f/setup.py", line 117, in <module>
          raise RuntimeError(
      RuntimeError: FlashAttention is only supported on CUDA 11.6 and above.  Note: make sure nvcc has a supported version by running nvcc -V.
      
      
      torch.__version__  = 1.13.0+cu116

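The cause is that the system CUDA toolkit (the version reported by nvcc -V) is too old. Upgrading it to 11.6 or above resolves the error.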
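
To check the toolkit version before retrying, the `nvcc -V` output can be parsed and compared against the 11.6 minimum quoted in the error message. A minimal sketch, assuming the output contains the usual "release X.Y" text; the helper names are made up for this example.

```python
import re
import shutil
import subprocess

MIN_CUDA = (11, 6)  # minimum quoted in the FlashAttention error message


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def nvcc_release():
    """Run `nvcc -V` if nvcc is on PATH and return its release tuple."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = nvcc_release()
    if release is None:
        print("nvcc not found or version not recognised")
    elif release < MIN_CUDA:
        print(f"CUDA {release[0]}.{release[1]} is older than 11.6; upgrade needed")
    else:
        print(f"CUDA {release[0]}.{release[1]} satisfies the 11.6 minimum")
```

Note that the torch wheel can report cu116 while the system nvcc is older, because the wheel bundles its own CUDA runtime. The two versions are checked separately.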

4. Compile the deformable_aggregation CUDA op

This step depends on torch. Once torch is installed, the op can be compiled directly.

cd projects/mmdet3d_plugin/ops
python3 setup.py develop
cd ../../../

Output after a successful compile: [screenshot]

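5. Prepare the data

Download the dataset from the nuScenes download page.
On that page, download the dataset and the CAN bus expansion (in practice, the Map expansion is needed as well).
Choose the mini or the full (all) dataset as needed (click the highlighted US link next to each item to download).
After downloading, move the files into the corresponding folder.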

cd ${sparsedrive_path}
mkdir data
ln -s path/to/nuscenes ./data/nuscenes

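The dataset can live outside the project and be linked in with an absolute path, or it can be placed directly in the project's data/nuscenes directory.

The data directory looks like this: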
data
├── can_bus.zip
├── infos
│ └── mini
├── kmeans
│ ├── kmeans_det_900.npy
│ ├── kmeans_map_100.npy
│ └── kmeans_motion_6.npy
├── nuscenes
│ ├── can_bus
│ ├── LICENSE
│ ├── maps
│ ├── samples
│ ├── sweeps
│ └── v1.0-mini
├── nuScenes-map-expansion-v1.3.zip
└── v1.0-mini.tgz
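
A missing sub-directory tends to surface only later as a confusing error, so it may be worth checking the layout first. The sketch below checks only the sub-directories shown under data/nuscenes in the listing above (for the mini split); `missing_entries` is a made-up helper name.

```python
from pathlib import Path

# Sub-directories shown under data/nuscenes in the listing above (mini split).
EXPECTED = ("can_bus", "maps", "samples", "sweeps", "v1.0-mini")


def missing_entries(root, expected=EXPECTED):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    # is_dir() follows symlinks, so a linked dataset directory also passes.
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_entries("data/nuscenes")
    if missing:
        print("missing under data/nuscenes:", ", ".join(missing))
    else:
        print("data/nuscenes layout looks complete")
```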

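Pack the dataset's meta-information and labels, and generate the required pkl files into data/infos.
map_annos are also generated in data_converter, with a default roi_size of (30, 60).
It is best to run with the defaults at first (for a different range, modify roi_size in tools/data_converter/nuscenes_converter.py).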

sh scripts/create_data.sh

Edit the commands in create_data.sh to match the dataset you downloaded.
Output after the dataset is created successfully: [screenshot]

6. Generate anchors by K-means

This step matters for the sparse perception module.

Generated anchors are saved to data/kmeans and can be visualized in vis/kmeans.

sh scripts/kmeans.sh

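Running the script directly fails with a file-not-found error. One fix is to change the pkl paths loaded by the scripts in tools/kmeans/ to absolute paths: [screenshot]
Once this is done, the anchor data is generated in data/kmeans.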
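
Instead of hard-coding a machine-specific absolute path, the path can be derived from the script's own location. This is a minimal sketch, not the repository's code: `project_path` is a made-up helper, the pkl filename is a placeholder, and `levels_up=2` assumes the script sits in tools/kmeans/ directly under the project root.

```python
from pathlib import Path


def project_path(script_file, relative, levels_up=2):
    """Build an absolute path from a script location and a repo-relative path.

    levels_up=2 assumes the script sits in <repo>/tools/kmeans/.
    """
    repo_root = Path(script_file).resolve().parents[levels_up]
    return repo_root / relative


if __name__ == "__main__":
    # Hypothetical script location and pkl name, for illustration only.
    example = project_path("tools/kmeans/kmeans_plan.py",
                           "data/infos/mini/example.pkl")
    print(example)
```

Inside one of the scripts, the call would be `project_path(__file__, ...)` with the real pkl path, which keeps working from any current directory.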
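Run as-is, the script produces only three npy files; kmeans_plan_6.npy is not generated.
After modifying kmeans_plan.py as follows, based on tutorials found online, kmeans_plan_6.npy is generated: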

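# navi_trajs and K are defined earlier in kmeans_plan.py
clusters = []
# All-zero placeholder for the first command group, which is skipped below
clusters.append(np.zeros((K, 6, 2)))
for trajs in navi_trajs[1:]:  # original loop: for trajs in navi_trajs:
    # Flatten each 6-step (x, y) trajectory into a 12-dimensional vector
    trajs = np.concatenate(trajs, axis=0).reshape(-1, 12)
    cluster = KMeans(n_clusters=K).fit(trajs).cluster_centers_
    cluster = cluster.reshape(-1, 6, 2)
    clusters.append(cluster)
    for j in range(K):
        plt.scatter(cluster[j, :, 0], cluster[j, :, 1])
plt.savefig(f'vis/kmeans/plan_{K}', bbox_inches='tight')
plt.close()

# Stack the per-command clusters and save them as the planning anchors
clusters = np.stack(clusters, axis=0)
np.save(f'data/kmeans/kmeans_plan_{K}.npy', clusters)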
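
The change above skips the first entry of navi_trajs and substitutes zeros. One plausible reason, stated here as an assumption and not something the source confirms, is that on a small split one command group may hold no trajectories, and np.concatenate raises an error on an empty list. A small sketch for checking which groups are empty before clustering; `empty_groups` is a made-up helper name.

```python
def empty_groups(groups):
    """Return the indices of groups that contain no trajectories."""
    return [index for index, trajs in enumerate(groups) if len(trajs) == 0]


if __name__ == "__main__":
    # Toy stand-in for navi_trajs: the first group is empty.
    toy_groups = [[], ["traj_a"], ["traj_b", "traj_c"]]
    print("empty command groups:", empty_groups(toy_groups))
```

Calling this on navi_trajs just before the loop would show whether index 0 is really the only group that needs a placeholder.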

7. Download pre-trained weights

Download the required backbone pre-trained weights.

mkdir ckpt
wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O ckpt/resnet50-19c8e357.pth
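
Since downloads in this walkthrough fail easily, a truncated checkpoint is worth ruling out. The sketch below assumes the file follows the torch hub naming convention, in which the hex suffix in the filename is a prefix of the file's SHA-256 digest; `sha256_prefix_ok` is a made-up helper name.

```python
import hashlib
import re
from pathlib import Path


def sha256_prefix_ok(path):
    """Compare a file's SHA-256 digest with the hex suffix in its name.

    Assumes a name like 'resnet50-19c8e357.pth', where the part after the
    last '-' is a prefix of the file's SHA-256 hex digest.
    """
    path = Path(path)
    match = re.search(r"-([a-f0-9]+)\.[^.]+$", path.name)
    if match is None:
        raise ValueError(f"no hash suffix in file name: {path.name}")
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))


if __name__ == "__main__":
    ckpt = Path("ckpt/resnet50-19c8e357.pth")
    if ckpt.is_file():
        print("checksum ok:", sha256_prefix_ok(ckpt))
    else:
        print(f"{ckpt} not found")
```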

Change the weight path in the code to an absolute path.

Deployment is now complete, and training and testing can begin.

8. Commence training and testing

Comment out the entries that do not apply to your dataset type.
Set num_gpus to the number of GPUs you have (1 in my case) and lower batch_size as well.

# train
sh scripts/train.sh

# test
sh scripts/test.sh

After the training script starts, it begins loading the model:

{'version': 'v1.0-mini'}
{'version': 'v1.0-mini'}
{'version': 'v1.0-mini'}
{'version': 'v1.0-mini'}
{'version': 'v1.0-mini'}
{'version': 'v1.0-mini'}
{'version': 'v1.0-mini'}
{'version': 'v1.0-mini'}
Use GroupInBatchSampler !!!
Use GroupInBatchSampler !!!
Use GroupInBatchSampler !!!
Use GroupInBatchSampler !!!
Use GroupInBatchSampler !!!
Use GroupInBatchSampler !!!
Use GroupInBatchSampler !!!
Use GroupInBatchSampler !!!

After a while, an error appears: GPU memory is insufficient.

Last error:
Cuda failure 'out of memory'
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
RuntimeError: CUDA error: out of memory
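
Before launching training again, it can be useful to see how much GPU memory is actually free. A minimal sketch that queries nvidia-smi for total and free memory per GPU in MiB; the helper names are made up, and the function returns an empty list when nvidia-smi is unavailable.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=memory.total,memory.free",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text):
    """Parse 'total, free' MiB lines (one per GPU) into (total, free) tuples."""
    rows = []
    for line in text.strip().splitlines():
        total, free = (int(field) for field in line.split(","))
        rows.append((total, free))
    return rows


def gpu_memory_mib():
    """Return (total, free) MiB per GPU, or [] if nvidia-smi is unusable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    try:
        return parse_memory_csv(result.stdout)
    except ValueError:  # unexpected output such as "[N/A]"
        return []


if __name__ == "__main__":
    for index, (total, free) in enumerate(gpu_memory_mib()):
        print(f"GPU {index}: {free} MiB free of {total} MiB")
```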

My machine's hardware is simply not powerful enough for this.
Next, I will repeat the installation and training on a cloud server.