1. Download the code: https://github.com/hustvl/GKT
git clone https://github.com/hustvl/GKT
2. Create the environment:
conda create -n GKT python=3.8 -y
conda activate GKT
3. Install the dependencies:
cd segmentation
pip install -r requirements.txt  # the author's instructions have a mistake here
python setup.py develop
The last command failed when I ran it. It turned out to be a permissions problem: the anaconda directory has to be made accessible to all users.
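A quick way to tell whether this kind of permissions problem applies is to check that the active environment's install directory is writable before running setup.py. The sketch below assumes the failure came from a non-writable site-packages directory, which these notes do not confirm in detail; the helper name `is_writable_dir` is made up for illustration.

```python
import os
import sysconfig


def is_writable_dir(path):
    """Return True if path is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    # "purelib" is the directory where pure-Python packages are installed
    # for the interpreter that runs this script.
    site_packages = sysconfig.get_paths()["purelib"]
    print(site_packages, "writable:", is_writable_dir(site_packages))
```

Run it inside the GKT environment; if it reports that the directory is not writable, fix the ownership or permissions of the anaconda directory first.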
4. Organize the data
For a quick trial you can download nuScenes v1.0-mini; alternatively, download all the keyframes of v1.0-trainval.
See:
https://github.com/hustvl/GKT/blob/main/segmentation/docs/dataset_setup.md
Download the author's labels
Link: https://www.cs.utexas.edu/~bzhou/cvt/cvt_labels_nuscenes.tar.gz
Extract the labels and the nuScenes data
tar -xvf /path/to/downloads/cvt_labels_nuscenes.tar.gz -C /media/datasets
mkdir -p /media/datasets/nuscenes/
# Untar all the keyframes and metadata
for f in /path/to/downloads/v1.0-*.tgz; do tar -xzf "$f" -C /media/datasets/nuscenes; done
# Map expansion must go into the maps folder
unzip /path/to/downloads/nuScenes-map-expansion-v1.3.zip -d /media/datasets/nuscenes/maps
Once everything is extracted, the directory layout is:
/media/datasets/
├─ nuscenes/
│  ├─ v1.0-trainval/
│  ├─ v1.0-mini/
│  ├─ samples/
│  ├─ sweeps/
│  └─ maps/
│     ├─ basemap/
│     └─ expansion/
└─ cvt_labels_nuscenes/
   ├─ scene-0001/
   ├─ scene-0001.json
   ├─ ...
   ├─ scene-1000/
   └─ scene-1000.json
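Before generating labels it can help to confirm that the extracted folders match this layout. The following is a minimal sketch, not part of the GKT repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the list of sub-paths is simply copied from the tree above.

```python
from pathlib import Path

# Sub-paths taken from the layout above; drop any entry that your
# particular download does not include.
EXPECTED_DIRS = [
    "nuscenes/samples",
    "nuscenes/sweeps",
    "nuscenes/maps/basemap",
    "nuscenes/maps/expansion",
    "cvt_labels_nuscenes",
]


def missing_dirs(root, version="v1.0-trainval"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    expected = EXPECTED_DIRS + [f"nuscenes/{version}"]
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("/media/datasets")
    print("missing:", missing if missing else "none")
```

For the mini split, call `missing_dirs("/media/datasets", version="v1.0-mini")` instead.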
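5. Generate the labels
Install the required library:
pip install nuscenes-devkit==1.1.7
Generate the labels
For the full dataset (note: the original author made a mistake in one function, generate_data.py):
# With visualization
python3 scripts/generate_data.py \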
data=nuscenes \
data.version=v1.0-trainval \
data.dataset_dir=/media/datasets/nuscenes \
data.labels_dir=/media/datasets/cvt_labels_nuscenes \
visualization=nuscenes_viz
# Without visualization
python3 scripts/generate_data.py \
data=nuscenes \
data.version=v1.0-trainval \
data.dataset_dir=/media/datasets/nuscenes \
data.labels_dir=/media/datasets/cvt_labels_nuscenes
If you are using the mini dataset, just replace v1.0-trainval with v1.0-mini in the commands above.
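Since the two variants differ only in the `data.version` override and the optional `visualization=nuscenes_viz` argument, the command line can be assembled programmatically. This is a sketch under that assumption: the function name `generate_data_cmd` is hypothetical, and the overrides are exactly the ones shown in the commands above.

```python
def generate_data_cmd(version, dataset_dir, labels_dir, visualize=False):
    """Build the argument list for scripts/generate_data.py."""
    cmd = [
        "python3",
        "scripts/generate_data.py",
        "data=nuscenes",
        f"data.version={version}",
        f"data.dataset_dir={dataset_dir}",
        f"data.labels_dir={labels_dir}",
    ]
    if visualize:
        # Same extra override as in the "with visualization" command.
        cmd.append("visualization=nuscenes_viz")
    return cmd


if __name__ == "__main__":
    cmd = generate_data_cmd(
        "v1.0-mini",
        "/media/datasets/nuscenes",
        "/media/datasets/cvt_labels_nuscenes",
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the segmentation directory.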
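6. Optionally download the pretrained weights. They are hosted on Google Drive, so a VPN or proxy is needed if Google Drive is not reachable from your network.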
https://drive.google.com/file/d/1WyVwxykkh3jlSW8HiT3NKtJjISBsUaiq/view?usp=sharing
mkdir pretrained_models
cd pretrained_models
# Put the pretrained weights here
7. Training, evaluation, and speed test
# Training
python scripts/train.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=<path/to/nuScenes> data.labels_dir=<path/to/labels>
# Evaluation
python scripts/eval.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=<path/to/nuScenes> data.labels_dir=<path/to/labels> experiment.ckptt=<path/to/checkpoint>
# Speed test
python scripts/speed.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=<path/to/nuScenes> data.labels_dir=<path/to/labels>
8. Training:
Here is the command I ran:
python scripts/train.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=/home/wxq/GKT/media/datasets/nuscenes data.labels_dir=/home/wxq/GKT/media/datasets/cvt_labels_nuscenes
Output:
Global seed set to 2022
Loaded pretrained weights for efficientnet-b4
[2023-02-02 21:16:45,214][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmp_l4_325q
[2023-02-02 21:16:45,214][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmp_l4_325q/_remote_module_non_sriptable.py
[2023-02-02 21:16:45,352][__main__][INFO] - Searching /home/wxq/GKT/segmentation/logs.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
Global seed set to 2022
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2023-02-02 21:16:45,412][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2023-02-02 21:16:45,412][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
---------------------------------------------------
0 | backbone | CrossViewTransformer | 1.2 M
1 | loss_func | MultipleLoss | 0
2 | metrics | MetricCollection | 0
---------------------------------------------------
1.2 M Trainable params
0 Non-trainable params
1.2 M Total params
4.701 Total estimated model params size (MB)
/home/wxq/GKT/segmentation/cross_view_transformer/tabular_logger.py:36: UserWarning: Experiment logs directory /home/wxq/GKT/segmentation/logs/lightning_logs/version_11 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
rank_zero_warn(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:486: PossibleUserWarning: Your `val_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test/predict dataloaders.
rank_zero_warn(
[2023-02-02 21:16:49,767][cross_view_transformer.tabular_logger][INFO] - lr-AdamW:0.000400, step:0
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2719: UserWarning: Using trainer.logger when Trainer is configured to use multiple loggers. This behavior will change in v1.8 when LoggerCollection is removed, and trainer.logger will return the first logger in trainer.loggers
rank_zero_warn(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:44: LightningDeprecationWarning: pytorch_lightning.utilities.warnings.rank_zero_warn has been deprecated in v1.6 and will be removed in v1.8. Use the equivalent function from the pytorch_lightning.utilities.rank_zero module instead.
new_rank_zero_deprecation(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:49: UserWarning: Invalid logger <pytorch_lightning.loggers.base.LoggerCollection object at 0x7f18907cccd0>
return new_rank_zero_warn(*args, **kwargs)
[2023-02-02 21:16:54,787][root][INFO] - Reducer buckets have been rebuilt in this iteration.
[2023-02-02 21:17:21,545][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.039249, train/loss/visible_step:0.032033, train/loss/center_step:0.072163, epoch:0, step:49
[2023-02-02 21:17:48,963][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.042063, train/loss/visible_step:0.038138, train/loss/center_step:0.039251, epoch:0, step:99
[2023-02-02 21:18:16,449][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.026050, train/loss/visible_step:0.023647, train/loss/center_step:0.024033, epoch:0, step:149
[2023-02-02 21:18:43,918][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.029235, train/loss/visible_step:0.027685, train/loss/center_step:0.015508, epoch:0, step:199
[2023-02-02 21:19:11,457][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.016289, train/loss/visible_step:0.015199, train/loss/center_step:0.010893, epoch:0, step:249
[2023-02-02 21:19:38,981][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.014259, train/loss/visible_step:0.013455, train/loss/center_step:0.008041, epoch:0, step:299
[2023-02-02 21:20:06,562][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.024729, train/loss/visible_step:0.024085, train/loss/center_step:0.006436, epoch:0, step:349
[2023-02-02 21:20:34,088][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.012128, train/loss/visible_step:0.011636, train/loss/center_step:0.004923, epoch:0, step:399
[2023-02-02 21:21:01,596][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.013290, train/loss/visible_step:0.012880, train/loss/center_step:0.004103, epoch:0, step:449
[2023-02-02 21:21:29,067][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.017175, train/loss/visible_step:0.016829, train/loss/center_step:0.003464, epoch:0, step:499
Training ran for many epochs, and then, for reasons I don't understand, it failed partway through with this:
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2719: UserWarning: Using trainer.logger when Trainer is configured to use multiple loggers. This behavior will change in v1.8 when LoggerCollection is removed, and trainer.logger will return the first logger in trainer.loggers
rank_zero_warn(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:44: LightningDeprecationWarning: pytorch_lightning.utilities.warnings.rank_zero_warn has been deprecated in v1.6 and will be removed in v1.8. Use the equivalent function from the pytorch_lightning.utilities.rank_zero module instead.
new_rank_zero_deprecation(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:49: UserWarning: Invalid logger <pytorch_lightning.loggers.base.LoggerCollection object at 0x7f1993a24f40>
return new_rank_zero_warn(*args, **kwargs)
[2023-02-03 02:27:02,722][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.007324, train/loss/visible_step:0.007275, train/loss/center_step:0.000486, epoch:4, step:29649
[2023-02-03 02:27:30,199][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.005525, train/loss/visible_step:0.005477, train/loss/center_step:0.000483, epoch:4, step:29699
[2023-02-03 02:27:57,726][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.008282, train/loss/visible_step:0.008228, train/loss/center_step:0.000543, epoch:4, step:29749
[2023-02-03 02:28:25,204][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.004319, train/loss/visible_step:0.004281, train/loss/center_step:0.000381, epoch:4, step:29799
[2023-02-03 02:28:52,707][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.003136, train/loss/visible_step:0.003106, train/loss/center_step:0.000293, epoch:4, step:29849
[2023-02-03 02:29:20,174][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.005958, train/loss/visible_step:0.005909, train/loss/center_step:0.000488, epoch:4, step:29899
[2023-02-03 02:29:47,677][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.007348, train/loss/visible_step:0.007289, train/loss/center_step:0.000595, epoch:4, step:29949
[2023-02-03 02:30:15,226][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.007833, train/loss/visible_step:0.007781, train/loss/center_step:0.000523, epoch:4, step:29999
[2023-02-03 02:30:15,883][cross_view_transformer.tabular_logger][INFO] - train/loss_epoch:0.008613, train/loss/visible_epoch:0.008551, train/loss/center_epoch:0.000621, epoch:5, step:30000
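Segmentation fault (core dumped)
The error is a segmentation fault. As far as I can tell it is not a bug in the code itself but a problem with my machine.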
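To see how the loss evolved before the crash, the `tabular_logger` lines in the output above can be parsed into dictionaries. The sketch below is not part of GKT; it only assumes the `key:value, key:value` format visible in the pasted log, and the names `PAIR_RE` and `parse_metrics` are hypothetical.

```python
import re

# Matches "key:value" pairs such as "train/loss_step:0.039249" or "step:49".
PAIR_RE = re.compile(r"([\w/@.\-]+):(-?\d+(?:\.\d+)?)")


def parse_metrics(line):
    """Extract the metric pairs that follow '[INFO] - ' in a tabular_logger line."""
    marker = "[INFO] - "
    if "tabular_logger" not in line or marker not in line:
        return {}
    # Only look at the text after the marker so the timestamp is ignored.
    payload = line.split(marker, 1)[1]
    return {key: float(value) for key, value in PAIR_RE.findall(payload)}


if __name__ == "__main__":
    sample = (
        "[2023-02-02 21:17:21,545][cross_view_transformer.tabular_logger][INFO] - "
        "train/loss_step:0.039249, train/loss/visible_step:0.032033, "
        "train/loss/center_step:0.072163, epoch:0, step:49"
    )
    metrics = parse_metrics(sample)
    print(int(metrics["step"]), metrics["train/loss_step"])
```

Applying `parse_metrics` to every line of a saved log gives a list of records that can be filtered by `step` or `epoch`; lines from other loggers yield an empty dictionary.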
9. Evaluation
This uses a saved checkpoint (the ones from training are stored in the output directory). Here is the command I ran:
python scripts/eval.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=/home/wxq/GKT/media/datasets/nuscenes data.labels_dir=/home/wxq/GKT/media/datasets/cvt_labels_nuscenes experiment.ckptt=/home/wxq/GKT/segmentation/pretrained_models/model-v1.ckpt
Result:
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validate metric DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
train/metrics/iou@0.40 0.0
train/metrics/iou@0.50 0.0
train/metrics/iou_with_occlusions@0.40 0.0
train/metrics/iou_with_occlusions@0.50 0.0
val/loss 0.010933063924312592
val/loss/center 0.0006371336639858782
val/loss/visible 0.01086935494095087
val/metrics/iou@0.40 0.3631359338760376
val/metrics/iou@0.50 0.32436245679855347
val/metrics/iou_with_occlusions@0.40 0.3276636302471161
val/metrics/iou_with_occlusions@0.50 0.2666575610637665
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
double free or corruption (!prev)
Aborted (core dumped)
This run also ended with an "Aborted" error, haha.
10. Speed test
python scripts/speed.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=/home/wxq/GKT/media/datasets/nuscenes data.labels_dir=/home/wxq/GKT/media/datasets/cvt_labels_nuscenes
Result:
Benchmark mixed precision by adding +mixed_precision=True
Benchmark cpu performance +device=cpu
Global seed set to 2022
Loaded pretrained weights for efficientnet-b4
inference latency: 31.349 ms, speed: 31.898 fps
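The two figures on the last line are reciprocals of each other: dividing 1000 ms by the per-frame latency gives frames per second. The helper below is a small illustration of that arithmetic (the function name is hypothetical); the reported numbers agree with it up to rounding of the printed latency.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    print(f"{fps_from_latency_ms(31.349):.1f} fps")
```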
Done.