Converting Pretrained Model Weights for the Detectron2 Library

While debugging some code recently I needed the detectron2 library. Before training, the backbone has to load pretrained weights, but the original pretrained checkpoints cannot be used directly in detectron2: a conversion tool must first remap the keys in the checkpoint's state dict. The full workflow is documented at MaskFormer/MODEL_ZOO.md at main · facebookresearch/MaskFormer (github.com).

The common ResNet-family models are not covered here; instead, Swin Transformer is used as the example to show how the conversion tool is used.

The file convert-pretrained-swin-model-to-d2.py converts a Swin Transformer checkpoint; its contents are as follows:

#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved

import pickle as pkl
import sys

import torch

"""
Usage:
  # download pretrained swin model:
  wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
  # run the conversion
  ./convert-pretrained-swin-model-to-d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224.pkl
  # Then, use swin_tiny_patch4_window7_224.pkl with the following changes in config:
MODEL:
  WEIGHTS: "/path/to/swin_tiny_patch4_window7_224.pkl"
INPUT:
  FORMAT: "RGB"
"""

if __name__ == "__main__":
    input_path = sys.argv[1]  # path to the original .pth checkpoint

    # The Swin release checkpoints store the weights under the "model" key.
    obj = torch.load(input_path, map_location="cpu")["model"]

    # Wrap the state dict with the metadata detectron2's checkpointer expects;
    # "matching_heuristics" lets it remap third-party key names onto the model.
    res = {"model": obj, "__author__": "third_party", "matching_heuristics": True}

    with open(sys.argv[2], "wb") as f:
        pkl.dump(res, f)
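To make the wrapper format concrete, here is a minimal torch-free sketch of what the script produces. The two state-dict keys are made up for illustration (real checkpoints hold torch tensors; plain lists are used so the sketch runs without torch installed); the wrapper keys are exactly the ones the script writes.

```python
import pickle

# Hypothetical miniature state dict standing in for the checkpoint's "model" entry.
state_dict = {
    "patch_embed.proj.weight": [0.1, 0.2],
    "layers.0.blocks.0.norm1.weight": [1.0, 1.0],
}

# Wrap the weights in the metadata detectron2 looks for: "matching_heuristics"
# tells the checkpointer to fuzzy-match third-party key names onto the model.
res = {"model": state_dict, "__author__": "third_party", "matching_heuristics": True}

with open("demo_swin.pkl", "wb") as f:
    pickle.dump(res, f)

# Reload and confirm the wrapper survived the round trip.
with open("demo_swin.pkl", "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded))                 # ['__author__', 'matching_heuristics', 'model']
print(loaded["matching_heuristics"])  # True
```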

The tool is used as follows (on Ubuntu, for example):

pip install timm

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224.pkl

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_small_patch4_window7_224.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_small_patch4_window7_224.pth swin_small_patch4_window7_224.pkl

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_base_patch4_window12_384_22k.pth swin_base_patch4_window12_384_22k.pkl

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_large_patch4_window12_384_22k.pth swin_large_patch4_window12_384_22k.pkl
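Since the four checkpoints above all follow the same download-then-convert pattern, the command lines can also be generated programmatically. The sketch below only builds the commands (the release URL and script path match the ones above); actually running them, e.g. via subprocess, is left to the reader.

```python
# Build the (download, convert) command pairs for each Swin checkpoint name,
# mirroring the manual wget / python invocations shown above.
BASE_URL = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0"
SCRIPT = "tools/convert-pretrained-swin-model-to-d2.py"

CHECKPOINTS = [
    "swin_tiny_patch4_window7_224",
    "swin_small_patch4_window7_224",
    "swin_base_patch4_window12_384_22k",
    "swin_large_patch4_window12_384_22k",
]

def build_commands(names):
    """Return a list of (download_cmd, convert_cmd) pairs, one per checkpoint."""
    commands = []
    for name in names:
        download = ["wget", f"{BASE_URL}/{name}.pth"]
        convert = ["python", SCRIPT, f"{name}.pth", f"{name}.pkl"]
        commands.append((download, convert))
    return commands

for download, convert in build_commands(CHECKPOINTS):
    print(" ".join(download))
    print(" ".join(convert))
```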

After conversion you get a new file in pickle format alongside the original checkpoint.

Appendix: download links for the Swin Transformer model series:
microsoft/Swin-Transformer: This is an official implementation for “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”. (github.com)

ImageNet-1K and ImageNet-22K Pretrained Swin-V1 Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
|------|----------|------------|-------|-------|---------|-------|-----|-----------|----------|
| Swin-T | ImageNet-1K | 224x224 | 81.2 | 95.5 | 28M | 4.5G | 755 | - | github/baidu/config/log |
| Swin-S | ImageNet-1K | 224x224 | 83.2 | 96.2 | 50M | 8.7G | 437 | - | github/baidu/config/log |
| Swin-B | ImageNet-1K | 224x224 | 83.5 | 96.5 | 88M | 15.4G | 278 | - | github/baidu/config/log |
| Swin-B | ImageNet-1K | 384x384 | 84.5 | 97.0 | 88M | 47.1G | 85 | - | github/baidu/config |
| Swin-T | ImageNet-22K | 224x224 | 80.9 | 96.0 | 28M | 4.5G | 755 | github/baidu/config | github/baidu/config |
| Swin-S | ImageNet-22K | 224x224 | 83.2 | 97.0 | 50M | 8.7G | 437 | github/baidu/config | github/baidu/config |
| Swin-B | ImageNet-22K | 224x224 | 85.2 | 97.5 | 88M | 15.4G | 278 | github/baidu/config | github/baidu/config |
| Swin-B | ImageNet-22K | 384x384 | 86.4 | 98.0 | 88M | 47.1G | 85 | github/baidu | github/baidu/config |
| Swin-L | ImageNet-22K | 224x224 | 86.3 | 97.9 | 197M | 34.5G | 141 | github/baidu/config | github/baidu/config |
| Swin-L | ImageNet-22K | 384x384 | 87.3 | 98.2 | 197M | 103.9G | 42 | github/baidu | github/baidu/config |

ImageNet-1K and ImageNet-22K Pretrained Swin-V2 Models

| name | pretrain | resolution | window | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
|------|----------|------------|--------|-------|-------|---------|-------|-----|-----------|----------|
| SwinV2-T | ImageNet-1K | 256x256 | 8x8 | 81.8 | 95.9 | 28M | 5.9G | 572 | - | github/baidu/config |
| SwinV2-S | ImageNet-1K | 256x256 | 8x8 | 83.7 | 96.6 | 50M | 11.5G | 327 | - | github/baidu/config |
| SwinV2-B | ImageNet-1K | 256x256 | 8x8 | 84.2 | 96.9 | 88M | 20.3G | 217 | - | github/baidu/config |
| SwinV2-T | ImageNet-1K | 256x256 | 16x16 | 82.8 | 96.2 | 28M | 6.6G | 437 | - | github/baidu/config |
| SwinV2-S | ImageNet-1K | 256x256 | 16x16 | 84.1 | 96.8 | 50M | 12.6G | 257 | - | github/baidu/config |
| SwinV2-B | ImageNet-1K | 256x256 | 16x16 | 84.6 | 97.0 | 88M | 21.8G | 174 | - | github/baidu/config |
| SwinV2-B* | ImageNet-22K | 256x256 | 16x16 | 86.2 | 97.9 | 88M | 21.8G | 174 | github/baidu/config | github/baidu/config |
| SwinV2-B* | ImageNet-22K | 384x384 | 24x24 | 87.1 | 98.2 | 88M | 54.7G | 57 | github/baidu/config | github/baidu/config |
| SwinV2-L* | ImageNet-22K | 256x256 | 16x16 | 86.9 | 98.0 | 197M | 47.5G | 95 | github/baidu/config | github/baidu/config |
| SwinV2-L* | ImageNet-22K | 384x384 | 24x24 | 87.6 | 98.3 | 197M | 115.4G | 33 | github/baidu | github/baidu/config |

Note:

  • SwinV2-B* (SwinV2-L*) at input resolutions of 256x256 and 384x384 are both fine-tuned from the same model pretrained at a smaller input resolution of 192x192.
  • SwinV2-B* (384x384) achieves 78.08 acc@1 on ImageNet-1K-V2, while SwinV2-L* (384x384) achieves 78.31.