[Computer Vision] Implementing ConvNeXt with the timm package

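This article introduces the various versions of Facebook Research's ConvNeXt model, including models trained on ImageNet-1k and ImageNet-22k as well as variants fine-tuned at different resolutions. It also describes the ConvNeXtConfig parameters in detail.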


1. ConvNeXt

We provide implementations and pretrained weights for the ConvNeXt models.

Paper: A ConvNet for the 2020s.

https://arxiv.org/abs/2201.03545

The original PyTorch code and weights come from:

https://github.com/facebookresearch/ConvNeXt

This code has been ported from the timm implementation. The following models are available.

1.1 Models

1.1.1 Models trained on ImageNet-1k

convnext_tiny
convnext_small
convnext_base
convnext_large

1.1.2 Models trained on ImageNet-22k and fine-tuned on ImageNet-1k

convnext_tiny_in22ft1k
convnext_small_in22ft1k
convnext_base_in22ft1k
convnext_large_in22ft1k
convnext_xlarge_in22ft1k

1.1.3 Models trained on ImageNet-22k and fine-tuned on ImageNet-1k at resolution 384

convnext_tiny_384_in22ft1k
convnext_small_384_in22ft1k
convnext_base_384_in22ft1k
convnext_large_384_in22ft1k
convnext_xlarge_384_in22ft1k

1.1.4 Models trained on ImageNet-22k

convnext_tiny_in22k
convnext_small_in22k
convnext_base_in22k
convnext_large_in22k
convnext_xlarge_in22k
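The names above follow a regular pattern: the model size, an optional `384` for the higher fine-tuning resolution, and a suffix for the training data (`in22ft1k`: trained on ImageNet-22k and fine-tuned on ImageNet-1k; `in22k`: ImageNet-22k only; no suffix: ImageNet-1k). The helper `parse_convnext_name` below is hypothetical (not part of timm or this library) and decodes a name under the assumption that this pattern holds, taking 224 as the default resolution from the `input_size` default of `ConvNeXtConfig`. It only decodes names; it does not check whether a given combination actually has pretrained weights.

```python
import re
from typing import NamedTuple, Optional


class ConvNeXtVariant(NamedTuple):
    size: str                # tiny / small / base / large / xlarge
    resolution: int          # input resolution in pixels
    pretrain: str            # dataset used for (pre)training
    finetune: Optional[str]  # dataset used for fine-tuning, if any


# Pattern assumed from the model list: convnext_<size>[_384][_in22ft1k|_in22k]
_NAME_RE = re.compile(
    r"^convnext_(?P<size>tiny|small|base|large|xlarge)"
    r"(?:_(?P<res>384))?"
    r"(?:_(?P<tag>in22ft1k|in22k))?$"
)


def parse_convnext_name(name: str) -> ConvNeXtVariant:
    """Decode a ConvNeXt model name (hypothetical helper, not a library API)."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised ConvNeXt model name: {name!r}")
    size = m.group("size")
    resolution = int(m.group("res") or 224)  # 224 assumed as the default input size
    tag = m.group("tag")
    if tag == "in22ft1k":
        return ConvNeXtVariant(size, resolution, "ImageNet-22k", "ImageNet-1k")
    if tag == "in22k":
        return ConvNeXtVariant(size, resolution, "ImageNet-22k", None)
    return ConvNeXtVariant(size, resolution, "ImageNet-1k", None)


if __name__ == "__main__":
    for n in ["convnext_tiny", "convnext_base_384_in22ft1k", "convnext_xlarge_in22k"]:
        print(n, "->", parse_convnext_name(n))
```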

1.2 ConvNeXtConfig

class ConvNeXtConfig(name='', url='', nb_classes=1000, in_channels=3, input_size=(224, 224), patch_size=4, embed_dim=(96, 192, 384, 768), nb_blocks=(3, 3, 9, 3), mlp_ratio=4.0, conv_mlp_block=False, drop_rate=0.0, drop_path_rate=0.1, norm_layer='layer_norm_eps_1e-6', act_layer='gelu', init_scale=1e-06, crop_pct=0.875, interpolation='bicubic', mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), first_conv='stem/0', classifier='head/fc')
Parameters:
name (str) – Name of the model.

url (str) – URL for pretrained weights.

nb_classes (int) – Number of classes for classification head.

in_channels (int) – Number of input image channels.

input_size (Tuple[int, int]) – Input image size (height, width)

patch_size (int) – Patchifying the image is implemented via a convolutional layer with kernel size and stride equal to patch_size.

embed_dim (Tuple) – Feature dimensions at each stage.

nb_blocks (Tuple) – Number of blocks at each stage.

mlp_ratio (float) – Ratio of the MLP hidden dimension to the embedding dimension.

conv_mlp_block (bool) – There are two equivalent implementations of the ConvNeXt block, using either (1) 1x1 convolutions or (2) fully connected layers. In PyTorch option (2) also requires permuting channels, which is not needed in TensorFlow. We offer both implementations here, because some timm models use (1) while others use (2).

drop_rate (float) – Dropout rate.

drop_path_rate (float) – Drop-path rate for stochastic depth.

norm_layer (str) – Normalization layer. See norm_layer_factory() for possible values.

act_layer (str) – Activation function. See act_layer_factory() for possible values.

init_scale (float) – Initial value for the layer-scale weights.

crop_pct (float) – Crop percentage for ImageNet evaluation.

interpolation (str) – Interpolation method for ImageNet evaluation.

mean (Tuple[float, float, float]) – Mean used by the preprocessing function. If x is an image with pixel values in (0, 1), the preprocessing function is (x - mean) / std.

std (Tuple[float, float, float]) – Standard deviation used by the preprocessing function (see mean).

first_conv (str) – Name of the first convolutional layer. Used by create_model() to adapt the number of input channels when loading pretrained weights.

classifier (str) – Name of classifier layer. Used by create_model() to adapt the classifier when loading pretrained weights.
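The defaults in the signature above can be turned into concrete numbers. The pure-Python sketch below (not the library's code) derives the feature-map shape after each stage, a per-block stochastic-depth rate, and the evaluation preprocessing. It assumes the standard ConvNeXt layout from the paper (a stem convolution whose kernel size and stride equal `patch_size`, then a stride-2 downsampling layer before stages 2 to 4), the linear drop-path schedule used in the original ConvNeXt repository, and a timm-style evaluation transform that resizes to `floor(input_size / crop_pct)` before center-cropping to `input_size`. `ConfigSketch`, `stage_shapes`, `drop_path_rates`, `eval_resize_size` and `normalize_pixel` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConfigSketch:
    """Hypothetical subset of ConvNeXtConfig, using the documented defaults."""
    input_size: Tuple[int, int] = (224, 224)
    patch_size: int = 4
    embed_dim: Tuple[int, ...] = (96, 192, 384, 768)
    nb_blocks: Tuple[int, ...] = (3, 3, 9, 3)
    drop_path_rate: float = 0.1
    crop_pct: float = 0.875
    mean: Tuple[float, float, float] = (0.485, 0.456, 0.406)
    std: Tuple[float, float, float] = (0.229, 0.224, 0.225)


def stage_shapes(cfg: ConfigSketch) -> List[Tuple[int, int, int]]:
    """(height, width, channels) of the output of each stage."""
    h = cfg.input_size[0] // cfg.patch_size  # stem: kernel size = stride = patch_size
    w = cfg.input_size[1] // cfg.patch_size
    shapes = []
    for i, dim in enumerate(cfg.embed_dim):
        if i > 0:  # assumed stride-2 downsampling before stages 2..4
            h, w = h // 2, w // 2
        shapes.append((h, w, dim))
    return shapes


def drop_path_rates(cfg: ConfigSketch) -> List[float]:
    """Linearly increasing stochastic-depth rate, one value per block."""
    n = sum(cfg.nb_blocks)
    if n == 1:
        return [0.0]
    return [cfg.drop_path_rate * i / (n - 1) for i in range(n)]


def eval_resize_size(cfg: ConfigSketch) -> int:
    """Resize target applied before center-cropping to input_size."""
    return math.floor(cfg.input_size[0] / cfg.crop_pct)


def normalize_pixel(rgb: Tuple[float, float, float], cfg: ConfigSketch) -> Tuple[float, ...]:
    """(x - mean) / std for one RGB pixel with values in (0, 1)."""
    return tuple((x - m) / s for x, m, s in zip(rgb, cfg.mean, cfg.std))


if __name__ == "__main__":
    cfg = ConfigSketch()
    print(stage_shapes(cfg))       # [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]
    print(eval_resize_size(cfg))   # 256
    print(len(drop_path_rates(cfg)), "drop-path rates")
```

With the defaults, a 224×224 input yields a 7×7×768 map after the last stage, and the 18 blocks (3 + 3 + 9 + 3) receive drop-path rates rising from 0 to `drop_path_rate`.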

1.3 ConvNeXt

class ConvNeXt(*args, **kwargs)
Parameters:
cfg (ConvNeXtConfig) – Configuration class for the model.

**kwargs – Arguments are passed to tf.keras.Model.
call(x, training=False, return_features=False)
Parameters:
x – Input to the model.

training (bool) – Whether the model runs in training or inference mode.

return_features (bool) – If True, return not only the model output but also a dictionary with intermediate features.
Returns:
If return_features=True, we return a tuple (y, features), where y is the model output and features is a dictionary with intermediate features.

If return_features=False, we return only y.
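Because the return type of `call` depends on `return_features`, calling code has to handle two shapes. The sketch below illustrates the documented contract with a hypothetical pure-Python stand-in (`ToyModel`: no TensorFlow, placeholder arithmetic instead of ConvNeXt) and a hypothetical helper `unpack` that always yields `(y, features)`. The feature names in the toy are invented; the real keys are given by the model's `feature_names` property.

```python
from typing import Any, Dict, List, Tuple


class ToyModel:
    """Hypothetical stand-in: mimics only the documented call() contract."""

    # Invented feature names; the real ones come from model.feature_names.
    feature_names = ["stem", "stage_0", "stage_1", "stage_2", "stage_3", "logits"]

    def call(self, x: List[float], training: bool = False, return_features: bool = False):
        # `training` is accepted for signature parity; this toy has no dropout.
        features: Dict[str, List[float]] = {}
        h = x
        for name in self.feature_names:
            h = [v + 1.0 for v in h]  # placeholder computation, not ConvNeXt
            features[name] = h
        y = h
        return (y, features) if return_features else y


def unpack(output: Any, return_features: bool) -> Tuple[Any, Dict[str, Any]]:
    """Normalize call() output to (y, features); features is {} if not requested."""
    if return_features:
        y, features = output
        return y, features
    return output, {}


if __name__ == "__main__":
    model = ToyModel()
    y, feats = unpack(model.call([0.0, 1.0], return_features=True), return_features=True)
    print(y)            # [6.0, 7.0]
    print(list(feats))  # same order as model.feature_names
```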
property dummy_inputs: Tensor

Returns a tensor of the correct shape for inference.

property feature_names: List[str]

Names of the intermediate features returned when calling call with return_features=True.

forward_features(x, training=False, return_features=False)

Forward pass through the model, excluding the classifier layer. This is useful when the model serves as a backbone for downstream tasks such as object detection.

Parameters:
x – Input to the model.

training (bool) – Whether the model runs in training or inference mode.

return_features (bool) – If True, return not only the model output but also a dictionary with intermediate features.
Returns:
If return_features=True, we return a tuple (y, features), where y is the model output and features is a dictionary with intermediate features.

If return_features=False, we return only y.
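### How to install and configure timm in PyCharm

#### Configuring timm in an Anaconda virtual environment

For the `timm` library to work in a PyCharm project, first confirm that it is installed in the Python interpreter environment the project uses. If that environment is managed with Anaconda, install it from the command line:

```bash
conda activate your_env_name  # replace with the actual environment name
pip install timm              # install timm with pip
```

Or install it from the conda-forge channel:

```bash
conda install -c conda-forge timm
```

This step ensures that `timm` has been added to the specified Python environment[^3].

#### Setting the PyCharm project interpreter

Next, tell PyCharm to use the Python interpreter that contains `timm`:

Open File -> Settings (on macOS, PyCharm -> Preferences) and navigate to Project: *your_project_name* -> Python Interpreter. This shows the interpreter currently used to run the project's scripts. Click the plus (+) next to the gear icon on the right to add a new interpreter, and point it to the python.exe of the Anaconda virtual environment that contains `timm` (for example D:\Anaconda\envs\tf_gpu\python.exe). Once it is selected, PyCharm automatically detects and lists every package installed in that environment, including the newly added `timm`[^1].

#### Testing whether timm is available

Finally, to verify the setup, write a short test snippet in the PyCharm editor that imports `timm` and calls a basic function:

```python
import timm

print(timm.__version__)

# Print the first five model names that have pretrained weights
model_names = timm.list_models(pretrained=True)
for name in model_names[:5]:
    print(name)
```

This prints the `timm` version and part of the list of pretrained model names, confirming that `timm` works in the current development environment.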
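If `import timm` fails inside PyCharm even though the installation succeeded, PyCharm is usually running a different interpreter from the one `timm` was installed into. The stdlib-only sketch below (`check_package` is a hypothetical helper name) prints the running interpreter path, which should match the one selected under Python Interpreter, and reports whether a package is importable and which version is installed, without raising if it is missing. It assumes the import name and the distribution name are the same, which holds for `timm`.

```python
import importlib.metadata
import importlib.util
import sys
from typing import Optional


def check_package(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it cannot be imported."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a stdlib module)
        return "unknown"


if __name__ == "__main__":
    # Should point into the conda environment selected in PyCharm
    print("interpreter:", sys.executable)
    version = check_package("timm")
    if version is None:
        print("timm is not installed for this interpreter")
    else:
        print("timm version:", version)
```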

Author: 旅途中的宽~