
Model card for convnext_xxlarge.clip_laion2b_soup_ft_in12k

A ConvNeXt image classification model. CLIP image tower weights pretrained in OpenCLIP on LAION and fine-tuned on ImageNet-12k by Ross Wightman.

Please see the related OpenCLIP model cards for more details on the pretraining.

Model Details

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import torch
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnext_xxlarge.clip_laion2b_soup_ft_in12k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
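
To make the result readable, here is a minimal follow-up sketch reusing the tensors above; note that the indices refer to the ImageNet-12k label space of this checkpoint, not ImageNet-1k.

# print the top-5 class indices and their softmax probabilities (in %)
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f"class index {idx.item()}: {prob.item():.2f}%")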

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xxlarge.clip_laion2b_soup_ft_in12k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 384, 64, 64])
    #  torch.Size([1, 768, 32, 32])
    #  torch.Size([1, 1536, 16, 16])
    #  torch.Size([1, 3072, 8, 8])

    print(o.shape)
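
If only a subset of the feature maps is needed, features_only also accepts an out_indices argument. A minimal sketch (the printed values are indicative, based on the shapes listed above):

import timm

# only return the last two feature maps (stages 2 and 3)
model = timm.create_model(
    'convnext_xxlarge.clip_laion2b_soup_ft_in12k',
    pretrained=True,
    features_only=True,
    out_indices=(2, 3),
)
model = model.eval()

# feature metadata: channel counts and reduction factors of the returned maps
print(model.feature_info.channels())   # e.g. [1536, 3072]
print(model.feature_info.reduction())  # e.g. [16, 32]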

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xxlarge.clip_laion2b_soup_ft_in12k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 3072, 8, 8) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
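
As a rough sketch of one way these pooled embeddings can be used, cosine similarity between two images can be computed from the num_classes=0 outputs. This example simply reuses the same beignets image twice, so the similarity comes out near 1.0; substitute any second image in practice.

import torch
import torch.nn.functional as F

# second image for the comparison; this sketch just reuses the same picture
img2 = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

with torch.no_grad():
    emb1 = model(transforms(img).unsqueeze(0))   # (1, num_features)
    emb2 = model(transforms(img2).unsqueeze(0))  # (1, num_features)

# cosine similarity between the two pooled embeddings
print(F.cosine_similarity(emb1, emb2).item())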

Model Comparison

Explore the dataset and runtime metrics of this model in timm model results.

All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 w/ AMP.

| model | top1 | top5 | img_size | param_count | gmacs | macts | samples_per_sec | batch_size |
|---|---|---|---|---|---|---|---|---|
| convnextv2_huge.fcmae_ft_in22k_in1k_512 | 88.848 | 98.742 | 512 | 660.29 | 600.81 | 413.07 | 28.58 | 48 |
| convnextv2_huge.fcmae_ft_in22k_in1k_384 | 88.668 | 98.738 | 384 | 660.29 | 337.96 | 232.35 | 50.56 | 64 |
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 | 122.45 | 256 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 | 196.84 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k_384 | 88.196 | 98.532 | 384 | 197.96 | 101.1 | 126.74 | 128.94 | 128 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 | 283.42 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k_384 | 87.75 | 98.556 | 384 | 350.2 | 179.2 | 168.99 | 124.85 | 192 |
| convnextv2_base.fcmae_ft_in22k_in1k_384 | 87.646 | 98.422 | 384 | 88.72 | 45.21 | 84.49 | 209.51 | 256 |
| convnext_large.fb_in22k_ft_in1k_384 | 87.476 | 98.382 | 384 | 197.77 | 101.1 | 126.74 | 194.66 | 256 |
| convnext_large_mlp.clip_laion2b_augreg_ft_in1k | 87.344 | 98.218 | 256 | 200.13 | 44.94 | 56.33 | 438.08 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k | 87.26 | 98.248 | 224 | 197.96 | 34.4 | 43.13 | 376.84 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 | 365.47 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k | 87.002 | 98.208 | 224 | 350.2 | 60.98 | 57.5 | 368.01 | 256 |
| convnext_base.fb_in22k_ft_in1k_384 | 86.796 | 98.264 | 384 | 88.59 | 45.21 | 84.49 | 366.54 | 256 |
| convnextv2_base.fcmae_ft_in22k_in1k | 86.74 | 98.022 | 224 | 88.72 | 15.38 | 28.75 | 624.23 | 256 |
| convnext_large.fb_in22k_ft_in1k | 86.636 | 98.028 | 224 | 197.77 | 34.4 | 43.13 | 581.43 | 256 |
| convnext_base.clip_laiona_augreg_ft_in1k_384 | 86.504 | 97.97 | 384 | 88.59 | 45.21 | 84.49 | 368.14 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 | 816.14 | 256 |
| convnextv2_huge.fcmae_ft_in1k | 86.256 | 97.75 | 224 | 660.29 | 115.0 | 79.07 | 154.72 | 256 |
| convnext_small.in12k_ft_in1k_384 | 86.182 | 97.92 | 384 | 50.22 | 25.58 | 63.37 | 516.19 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in1k | 86.154 | 97.68 | 256 | 88.59 | 20.09 | 37.55 | 819.86 | 256 |
| convnext_base.fb_in22k_ft_in1k | 85.822 | 97.866 | 224 | 88.59 | 15.38 | 28.75 | 1037.66 | 256 |
| convnext_small.fb_in22k_ft_in1k_384 | 85.778 | 97.886 | 384 | 50.22 | 25.58 | 63.37 | 518.95 | 256 |
| convnextv2_large.fcmae_ft_in1k | 85.742 | 97.584 | 224 | 197.96 | 34.4 | 43.13 | 375.23 | 256 |
| convnext_small.in12k_ft_in1k | 85.174 | 97.506 | 224 | 50.22 | 8.71 | 21.56 | 1474.31 | 256 |
| convnext_tiny.in12k_ft_in1k_384 | 85.118 | 97.608 | 384 | 28.59 | 13.14 | 39.48 | 856.76 | 256 |
| convnextv2_tiny.fcmae_ft_in22k_in1k_384 | 85.112 | 97.63 | 384 | 28.64 | 13.14 | 39.48 | 491.32 | 256 |
| convnextv2_base.fcmae_ft_in1k | 84.874 | 97.09 | 224 | 88.72 | 15.38 | 28.75 | 625.33 | 256 |
| convnext_small.fb_in22k_ft_in1k | 84.562 | 97.394 | 224 | 50.22 | 8.71 | 21.56 | 1478.29 | 256 |
| convnext_large.fb_in1k | 84.282 | 96.892 | 224 | 197.77 | 34.4 | 43.13 | 584.28 | 256 |
| convnext_tiny.in12k_ft_in1k | 84.186 | 97.124 | 224 | 28.59 | 4.47 | 13.44 | 2433.7 | 256 |
| convnext_tiny.fb_in22k_ft_in1k_384 | 84.084 | 97.14 | 384 | 28.59 | 13.14 | 39.48 | 862.95 | 256 |
| convnextv2_tiny.fcmae_ft_in22k_in1k | 83.894 | 96.964 | 224 | 28.64 | 4.47 | 13.44 | 1452.72 | 256 |
| convnext_base.fb_in1k | 83.82 | 96.746 | 224 | 88.59 | 15.38 | 28.75 | 1054.0 | 256 |
| convnextv2_nano.fcmae_ft_in22k_in1k_384 | 83.37 | 96.742 | 384 | 15.62 | 7.22 | 24.61 | 801.72 | 256 |
| convnext_small.fb_in1k | 83.142 | 96.434 | 224 | 50.22 | 8.71 | 21.56 | 1464.0 | 256 |
| convnextv2_tiny.fcmae_ft_in1k | 82.92 | 96.284 | 224 | 28.64 | 4.47 | 13.44 | 1425.62 | 256 |
| convnext_tiny.fb_in22k_ft_in1k | 82.898 | 96.616 | 224 | 28.59 | 4.47 | 13.44 | 2480.88 | 256 |
| convnext_nano.in12k_ft_in1k | 82.282 | 96.344 | 224 | 15.59 | 2.46 | 8.37 | 3926.52 | 256 |
| convnext_tiny_hnf.a2h_in1k | 82.216 | 95.852 | 224 | 28.59 | 4.47 | 13.44 | 2529.75 | 256 |
| convnext_tiny.fb_in1k | 82.066 | 95.854 | 224 | 28.59 | 4.47 | 13.44 | 2346.26 | 256 |
| convnextv2_nano.fcmae_ft_in22k_in1k | 82.03 | 96.166 | 224 | 15.62 | 2.46 | 8.37 | 2300.18 | 256 |
| convnextv2_nano.fcmae_ft_in1k | 81.83 | 95.738 | 224 | 15.62 | 2.46 | 8.37 | 2321.48 | 256 |
| convnext_nano_ols.d1h_in1k | 80.866 | 95.246 | 224 | 15.65 | 2.65 | 9.38 | 3523.85 | 256 |
| convnext_nano.d1h_in1k | 80.768 | 95.334 | 224 | 15.59 | 2.46 | 8.37 | 3915.58 | 256 |
| convnextv2_pico.fcmae_ft_in1k | 80.304 | 95.072 | 224 | 9.07 | 1.37 | 6.1 | 3274.57 | 256 |
| convnext_pico.d1_in1k | 79.526 | 94.558 | 224 | 9.05 | 1.37 | 6.1 | 5686.88 | 256 |
| convnext_pico_ols.d1_in1k | 79.522 | 94.692 | 224 | 9.06 | 1.43 | 6.5 | 5422.46 | 256 |
| convnextv2_femto.fcmae_ft_in1k | 78.488 | 93.98 | 224 | 5.23 | 0.79 | 4.57 | 4264.2 | 256 |
| convnext_femto_ols.d1_in1k | 77.86 | 93.83 | 224 | 5.23 | 0.82 | 4.87 | 6910.6 | 256 |
| convnext_femto.d1_in1k | 77.454 | 93.68 | 224 | 5.22 | 0.79 | 4.57 | 7189.92 | 256 |
| convnextv2_atto.fcmae_ft_in1k | 76.664 | 93.044 | 224 | 3.71 | 0.55 | 3.81 | 4728.91 | 256 |
| convnext_atto_ols.a2_in1k | 75.88 | 92.846 | 224 | 3.7 | 0.58 | 4.11 | 7963.16 | 256 |
| convnext_atto.d2_in1k | 75.664 | 92.9 | 224 | 3.7 | 0.55 | 3.81 | 8439.22 | 256 |
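
To build a comparison like this for your own shortlist of checkpoints, a minimal sketch (the 'convnext*' pattern is just an example filter) that enumerates the pretrained ConvNeXt-family models known to your installed timm version:

import timm

# list every ConvNeXt / ConvNeXt-V2 model name with pretrained weights available
for name in timm.list_models('convnext*', pretrained=True):
    print(name)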

Citation

@software{ilharco_gabriel_2021_5143773,
  author       = {Ilharco, Gabriel and
                  Wortsman, Mitchell and
                  Wightman, Ross and
                  Gordon, Cade and
                  Carlini, Nicholas and
                  Taori, Rohan and
                  Dave, Achal and
                  Shankar, Vaishaal and
                  Namkoong, Hongseok and
                  Miller, John and
                  Hajishirzi, Hannaneh and
                  Farhadi, Ali and
                  Schmidt, Ludwig},
  title        = {OpenCLIP},
  month        = jul,
  year         = 2021,
  note         = {If you use this software, please cite it as below.},
  publisher    = {Zenodo},
  version      = {0.1},
  doi          = {10.5281/zenodo.5143773},
  url          = {https://doi.org/10.5281/zenodo.5143773}
}

@inproceedings{schuhmann2022laionb,
  title={{LAION}-5B: An open large-scale dataset for training next generation image-text models},
  author={Christoph Schuhmann and
          Romain Beaumont and
          Richard Vencu and
          Cade W Gordon and
          Ross Wightman and
          Mehdi Cherti and
          Theo Coombes and
          Aarush Katta and
          Clayton Mullis and
          Mitchell Wortsman and
          Patrick Schramowski and
          Srivatsa R Kundurthy and
          Katherine Crowson and
          Ludwig Schmidt and
          Robert Kaczmarczyk and
          Jenia Jitsev},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=M3Y74vmsMcY}
}


@inproceedings{Radford2021LearningTV,
  title={Learning Transferable Visual Models From Natural Language Supervision},
  author={Alec Radford and Jong Wook Kim and Chris Hallacy and A. Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  booktitle={ICML}, 
  year={2021}
}

@article{liu2022convnet,
  author  = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  title   = {A ConvNet for the 2020s},
  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2022},
}