Setting Up Swin-Transformer-Object-Detection on Ubuntu

blanche0707

Tags: transformer, ubuntu, deep learning, object detection

Article link: https://blog.csdn.net/blanche0707/article/details/127239999

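Environment setup

The instance created on cvmart already comes with PyTorch and CUDA installed, so there is no need to set those up yourself. First clone the Swin-Transformer-Object-Detection source code from GitHub onto the instance, then install its dependencies.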

Swin-Transformer

git clone https://github.com/SwinTransformer/Swin-Transformer-Object-Detection.git
cd Swin-Transformer-Object-Detection
pip install -r requirements.txt
python setup.py develop

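The installation raised an error for me, so I suggest commenting out mmpycocotools in requirements/runtime.txt; this does not seem to affect training afterwards. Note that I trained on a VOC-format dataset, so I cannot promise it is safe if you use the COCO dataset. Also, once installation finishes you may find that pip no longer works properly. I traced this to a conflict between Python 3.7 and the typing package, which the command below uninstalls. Running it with sudo let pip complete the uninstall.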

sudo pip uninstall typing
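The mmpycocotools edit described above can be done by hand in an editor. If you prefer to script it, the following is a minimal sketch; `comment_out_requirement` is a hypothetical helper written for this post and is not part of the repository. It assumes one requirement per line, as in a plain pip requirements file.

```python
import re
from pathlib import Path


def comment_out_requirement(req_file, package):
    """Prefix every line that requires `package` with '# '.

    Returns the number of lines that were commented out.
    """
    path = Path(req_file)
    lines = path.read_text().splitlines()
    changed = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # already a comment
        # The package name ends at the first version, extras, or marker character.
        name = re.split(r"[<>=!~\[; ]", stripped, maxsplit=1)[0]
        if name.lower() == package.lower():
            lines[i] = "# " + line
            changed += 1
    path.write_text("\n".join(lines) + "\n")
    return changed


# Example call, run from the repository root:
# comment_out_requirement("requirements/runtime.txt", "mmpycocotools")
```

Run it before `pip install -r requirements.txt` so that pip never tries to build the package.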

mmcv

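Next, install mmcv-full. Installing it directly with pip may fail, so the mim route recommended by the official project is the safer choice. You need to pin the version here, otherwise it will be incompatible.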

pip install -U openmim
mim install mmcv-full==1.4.0
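After these installs it can help to confirm which versions actually ended up in the environment. The snippet below is a small sketch that only uses the standard library; `module_versions` is a hypothetical helper, and the module names in the example call (`torch`, `mmcv`, `mmdet`) are the import names I would expect these packages to expose. With the pin above, `mmcv` should report 1.4.0.

```python
import importlib


def module_versions(names):
    """Map each module name to its __version__, 'unknown', or None if import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Covers a missing package as well as a broken install.
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in module_versions(["torch", "mmcv", "mmdet"]).items():
        print(f"{name}: {version}")
```

A value of None means the import itself failed, which is usually the first thing to fix before going further.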

apex

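The official code uses apex for acceleration, so we first download apex and then install it. My earlier installs were never recognized; I only succeeded after following this article: "Swin-Transformer Object Detection 环境配置及训练" (environment setup and training).
If you do not want to use apex, you can comment it out, or simply ignore some of the resulting error messages.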

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./
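For readers who want to skip apex, one workaround that is often suggested is to override the apex-related settings in the Swin config. The fragment below is only a sketch and was not part of my own setup. It assumes the stock configs under configs/swin set the runner type to `EpochBasedRunnerAmp` and enable `use_fp16` in `optimizer_config`, and that `max_epochs=12` matches the 1x schedule; please check these names and values against your own checkout before relying on them.

```python
# Sketch of overrides placed at the end of a config under configs/swin/.
# Assumption: the stock config requests the apex-based runner and fp16,
# which is what pulls in apex at training time.
runner = dict(type='EpochBasedRunner', max_epochs=12)  # plain mmcv runner
optimizer_config = dict(
    type='DistOptimizerHook',
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=False,  # do not initialize apex mixed precision
)
```

Training without mixed precision will likely use more GPU memory, so the batch size may need to be reduced.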

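Downloading the pretrained model

Several kinds of errors can occur when loading a pretrained model. Most of those reported in the GitHub issues and in other articles are related to the dataset the model was trained on. How to convert model weights is still an open question.

If you use the COCO dataset, you can try the pretrained models provided at https://github.com/SwinTransformer/Swin-Transformer-Object-Detection
If you use a VOC-format dataset, download from https://github.com/microsoft/Swin-Transformer

I used swin_tiny_patch4_window7_224_22kto1k_finetune.pth as the pretrained model and placed it in the root directory of the Swin repository. You can download any model you like; just remember to download both the .pth file and its matching config file, and put them in the corresponding folders. The configs/swin folder already contains some config files.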
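Since many loading errors come down to which kind of checkpoint you downloaded, it can be useful to look at how a .pth file is organized before using it. The sketch below works on the dictionary returned by `torch.load`; `summarize_checkpoint` is a hypothetical helper. It assumes that weights sit either at the top level or under a `state_dict` or `model` key, which is my understanding of how the two repositories above save their files.

```python
from collections import Counter


def summarize_checkpoint(ckpt):
    """Return (wrapper_key, Counter of the first component of each weight name)."""
    wrapper, state = None, ckpt
    for key in ("state_dict", "model"):
        if isinstance(ckpt.get(key), dict):
            wrapper, state = key, ckpt[key]
            break
    prefixes = Counter(name.split(".", 1)[0] for name in state)
    return wrapper, prefixes


# Typical use (requires PyTorch):
# import torch
# ckpt = torch.load("swin_tiny_patch4_window7_224_22kto1k_finetune.pth",
#                   map_location="cpu")
# print(summarize_checkpoint(ckpt))
```

Weight names that start with `backbone` suggest a full detector checkpoint, while names that start with `layers` or `patch_embed` suggest backbone-only weights.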

Testing and verification

I made a few small changes to demo/image_demo.py and then ran an inference test with the downloaded pretrained model.

from argparse import ArgumentParser

from mmdet.apis import inference_detector, init_detector, show_result_pyplot


def main():
    parser = ArgumentParser()
    parser.add_argument('img', help='Image file')
    parser.add_argument('config', help='Config file')
    parser.add_argument('checkpoint', help='Checkpoint file')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference') # cpu is available
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    args = parser.parse_args()

    # build the model from a config file and a checkpoint file
    model = init_detector(args.config, args.checkpoint, device=args.device)
    # test a single image
    result = inference_detector(model, args.img)
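    # show the results in a window
    show_result_pyplot(model, args.img, result, score_thr=args.score_thr)
    # also save the visualization to disk, using the same score threshold
    model.show_result(
        args.img, result, score_thr=args.score_thr,
        out_file='demo/demo_result.jpg')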


if __name__ == '__main__':
    main()

python demo/image_demo.py demo/demo.jpg configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py swin_tiny_patch4_window7_224_22kto1k_finetune.pth

Below is the inference result. At this point your Swin-Transformer-Object-Detection setup is ready to use, and the pretrained model loads correctly.

Figure: example image with detected bounding boxes and masks

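Model training

Because this project has no pixel-level mask annotations, I followed the article "Swin-Transformer目标检测" (Swin-Transformer object detection) to turn the original model into a pure object-detection model without masks, and changed the class names and the number of classes in the corresponding files. That article is very detailed and was a great help.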
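To make the mask removal more concrete, here is a rough sketch of the kind of changes involved, written as a function over plain dictionaries. `strip_mask_branch` is a hypothetical helper, not code from the repository or from the article I followed. The key names (`roi_head`, `mask_roi_extractor`, `mask_head`, `bbox_head`, `LoadAnnotations`, `Collect`, `gt_masks`) are what I would expect in a Mask R-CNN style config; check them against your own config files.

```python
import copy


def strip_mask_branch(model_cfg, train_pipeline, num_classes):
    """Return copies of the model and pipeline configs without the mask branch."""
    model = copy.deepcopy(model_cfg)
    roi_head = model["roi_head"]
    # Drop the mask components so only box prediction remains.
    roi_head.pop("mask_roi_extractor", None)
    roi_head.pop("mask_head", None)
    # bbox_head is a dict for Mask R-CNN and a list for cascade models.
    heads = roi_head["bbox_head"]
    for head in heads if isinstance(heads, list) else [heads]:
        head["num_classes"] = num_classes

    pipeline = copy.deepcopy(train_pipeline)
    for step in pipeline:
        if step.get("type") == "LoadAnnotations":
            step["with_mask"] = False  # no mask annotations to load
        if step.get("type") == "Collect":
            step["keys"] = [k for k in step["keys"] if k != "gt_masks"]
    return model, pipeline
```

In practice the same edits are usually made directly in the config files. The evaluation metric probably needs to be limited to bounding boxes as well, and the class names still have to be changed separately.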

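Data preparation

On the cvmart leaderboard platform the data is spread across several folders. For VOC-format data, my approach was to merge everything into a single folder; if you know a better way, please share it for everyone's benefit. For COCO format you will also need to add some JSON annotation files. The official code appears to include a VOC-to-COCO conversion script; if it does not, please search for one yourself. Further details touch on leaderboard scoring, so I leave them out here.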
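The folder merge mentioned above can be sketched in a few lines of Python. `merge_folders` is a hypothetical helper; it assumes each source folder holds .jpg images with matching .xml annotations, and that file names are unique across folders. On a name clash it stops rather than overwriting anything.

```python
import shutil
from pathlib import Path


def merge_folders(src_root, dst_dir, suffixes=(".jpg", ".xml")):
    """Copy all matching files under src_root into the single folder dst_dir.

    Returns the number of files copied.
    """
    src_root = Path(src_root).resolve()
    dst_dir = Path(dst_dir).resolve()
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(src_root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        if dst_dir in path.parents:
            continue  # skip files that already live in the destination
        target = dst_dir / path.name
        if target.exists():
            raise FileExistsError(f"name clash for {path.name}")
        shutil.copy2(path, target)
        copied += 1
    return copied
```

Copying instead of moving keeps the original layout intact, which makes it easy to start over if the merged folder turns out to be wrong.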

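Start training

Some articles say that loading the pretrained model directly in the config file avoids certain errors. The specific error message is: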
KeyError: "CascadeRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"
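I tried making this change in the base config file, but it did not work, and getting the paths right is another headache. If anyone gets it working, please share. There is a corresponding issue on the official GitHub repository for anyone interested: issue #4.
Here is my training command. In the end I still loaded the pretrained model via --cfg-options.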

python /project/train/src_repo/Swintransformer/tools/train.py /project/train/src_repo/Swintransformer/configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py --gpu-ids 0 --work-dir /project/train/models --cfg-options model.pretrained=swin_tiny_patch4_window7_224_22kto1k_finetune.pth
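Because path mistakes are easy to make in a command this long, one option is to assemble it in Python and resolve the checkpoint to an absolute path first. This is only a sketch: `build_train_command` is a hypothetical helper, and it uses only the flags that appear in the command above.

```python
from pathlib import Path


def build_train_command(repo, config, work_dir, pretrained, gpu_id=0):
    """Build the argument list for tools/train.py with an absolute checkpoint path."""
    repo = Path(repo)
    pretrained = Path(pretrained).resolve()  # avoid relative-path surprises
    return [
        "python", str(repo / "tools" / "train.py"),
        str(repo / config),
        "--gpu-ids", str(gpu_id),
        "--work-dir", str(work_dir),
        "--cfg-options", f"model.pretrained={pretrained}",
    ]


# Example launch:
# import subprocess
# cmd = build_train_command(
#     "/project/train/src_repo/Swintransformer",
#     "configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py",
#     "/project/train/models",
#     "swin_tiny_patch4_window7_224_22kto1k_finetune.pth",
# )
# subprocess.run(cmd, check=True)
```

Checking that the checkpoint file exists before launching (for example with `Path(...).is_file()`) gives a clearer error than a failure deep inside model construction.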

I am still working out how to resume training from a checkpoint. To be continued.