[Computer Vision] Implementing ConvNeXt with the timm package

Author: 旅途中的宽~

Category: Computer Vision. Tags: computer vision, artificial intelligence, object detection, timm, ConvNeXt

Article link: https://blog.csdn.net/wzk4869/article/details/134678147

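This article introduces the available versions of Facebook Research's ConvNeXt model, including models trained on ImageNet-1k and ImageNet-22k, at different resolutions and with fine-tuning. It also describes the ConvNeXtConfig parameters in detail.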

1. ConvNeXt

We provide an implementation and pretrained weights for the ConvNeXt models.

Paper: A ConvNet for the 2020s.

https://arxiv.org/abs/2201.03545

The original PyTorch code and weights are from:

https://github.com/facebookresearch/ConvNeXt

This code was ported from the timm implementation. The following models are available.

1.1 Models

1.1.1 Models trained on ImageNet-1k

convnext_tiny

convnext_small

convnext_base

convnext_large

1.1.2 Models trained on ImageNet-22k and fine-tuned on ImageNet-1k

convnext_tiny_in22ft1k

convnext_small_in22ft1k

convnext_base_in22ft1k

convnext_large_in22ft1k

convnext_xlarge_in22ft1k

1.1.3 Models trained on ImageNet-22k and fine-tuned on ImageNet-1k at 384 resolution

convnext_tiny_384_in22ft1k

convnext_small_384_in22ft1k

convnext_base_384_in22ft1k

convnext_large_384_in22ft1k

convnext_xlarge_384_in22ft1k

1.1.4 Models trained on ImageNet-22k

convnext_tiny_in22k

convnext_small_in22k

convnext_base_in22k

convnext_large_in22k

convnext_xlarge_in22k
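The model names above follow a regular pattern: a size, an optional 384 resolution marker, and an optional weights tag. As a minimal sketch, the helper below splits a name into those parts. `parse_convnext_name` and `ConvNeXtVariant` are hypothetical names introduced here for illustration and are not part of any library; the 224 default resolution is an assumption taken from the `input_size` default listed in section 1.2.

```python
import re
from typing import NamedTuple


class ConvNeXtVariant(NamedTuple):
    size: str        # tiny, small, base, large or xlarge
    resolution: int  # input resolution implied by the name
    weights: str     # "in1k", "in22k" or "in22k_ft_in1k"


# Pattern inferred from the model lists above (an assumption, not an official spec).
_NAME_RE = re.compile(
    r"^convnext_(?P<size>tiny|small|base|large|xlarge)"
    r"(?:_(?P<res>384))?"
    r"(?:_(?P<tag>in22ft1k|in22k))?$"
)

# No tag means plain ImageNet-1k training.
_WEIGHTS = {None: "in1k", "in22k": "in22k", "in22ft1k": "in22k_ft_in1k"}


def parse_convnext_name(name: str) -> ConvNeXtVariant:
    """Split a model name from the lists above into size, resolution and weights."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognized ConvNeXt model name: {name!r}")
    resolution = int(match.group("res") or 224)  # assumed default of 224
    return ConvNeXtVariant(match.group("size"), resolution, _WEIGHTS[match.group("tag")])


print(parse_convnext_name("convnext_base_384_in22ft1k"))
```

The pattern only checks the shape of a name. It does not confirm that weights exist for a given combination; for example, the lists above contain no xlarge model trained on ImageNet-1k alone.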

1.2 ConvNeXtConfig

class ConvNeXtConfig(name='', url='', nb_classes=1000, in_channels=3, input_size=(224, 224), patch_size=4, embed_dim=(96, 192, 384, 768), nb_blocks=(3, 3, 9, 3), mlp_ratio=4.0, conv_mlp_block=False, drop_rate=0.0, drop_path_rate=0.1, norm_layer='layer_norm_eps_1e-6', act_layer='gelu', init_scale=1e-06, crop_pct=0.875, interpolation='bicubic', mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), first_conv='stem/0', classifier='head/fc')

Parameters:
name (str) – Name of the model.

url (str) – URL for pretrained weights.

nb_classes (int) – Number of classes for classification head.

in_channels (int) – Number of input image channels.

input_size (Tuple[int, int]) – Input image size (height, width)

patch_size (int) – Patchifying the image is implemented via a convolutional layer with kernel size and stride equal to patch_size.

embed_dim (Tuple) – Feature dimensions at each stage.

nb_blocks (Tuple) – Number of blocks at each stage.

mlp_ratio (float) – Ratio of mlp hidden dim to embedding dim

conv_mlp_block (bool) – There are two equivalent implementations of the ConvNeXt block, using either (1) 1x1 convolutions or (2) fully connected layers. In PyTorch option (2) also requires permuting channels, which is not needed in TensorFlow. We offer both implementations here, because some timm models use (1) while others use (2).

drop_rate (float) – Dropout rate.

drop_path_rate (float) – Dropout rate for stochastic depth.

norm_layer (str) – Normalization layer. See norm_layer_factory() for possible values.

act_layer (str) – Activation function. See act_layer_factory() for possible values.

init_scale (float) – Initial value for layer scale weights.

crop_pct (float) – Crop percentage for ImageNet evaluation.

interpolation (str) – Interpolation method for ImageNet evaluation.

mean (Tuple[float, float, float]) – Defines preprocessing function. If x is an image with pixel values in (0, 1), the preprocessing function is (x - mean) / std.

std (Tuple[float, float, float]) – Defines preprocessing function.

first_conv (str) – Name of first convolutional layer. Used by create_model() to adapt the number of input channels when loading pretrained weights.

classifier (str) – Name of classifier layer. Used by create_model() to adapt the classifier when loading pretrained weights.
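To make the evaluation-related parameters concrete, the sketch below works through `crop_pct`, `mean` and `std` with the default values listed above. `EvalPreprocessing`, `resize_size` and `normalize_pixel` are hypothetical names introduced here for illustration, not part of the library. The resize rule (resize to `input_size / crop_pct`, then center-crop to `input_size`) is an assumption based on the convention used by timm's evaluation transforms; the normalization follows the `(x - mean) / std` formula given for `mean`.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EvalPreprocessing:
    """Hypothetical stand-in holding only the preprocessing fields of the config."""
    input_size: Tuple[int, int] = (224, 224)
    crop_pct: float = 0.875
    mean: Tuple[float, float, float] = (0.485, 0.456, 0.406)
    std: Tuple[float, float, float] = (0.229, 0.224, 0.225)


def resize_size(cfg: EvalPreprocessing) -> Tuple[int, ...]:
    # Assumed convention: resize so that the central crop of `input_size`
    # covers a fraction `crop_pct` of the resized image.
    return tuple(int(math.floor(side / cfg.crop_pct)) for side in cfg.input_size)


def normalize_pixel(cfg: EvalPreprocessing, rgb: Tuple[float, float, float]):
    # (x - mean) / std, applied per channel to pixel values in the range 0 to 1.
    return tuple((x - m) / s for x, m, s in zip(rgb, cfg.mean, cfg.std))


cfg = EvalPreprocessing()
print(resize_size(cfg))                     # size to resize to before center-cropping
print(normalize_pixel(cfg, (1.0, 1.0, 1.0)))
```

With the defaults, an image would be resized to 256 x 256 and then center-cropped to 224 x 224 under this convention, and a pixel equal to `mean` maps to zero in every channel.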

1.3 ConvNeXt

class ConvNeXt(*args, **kwargs)

Parameters:
cfg (ConvNeXtConfig) – Configuration class for the model.

**kwargs – Arguments are passed to tf.keras.Model.

call(x, training=False, return_features=False)

Parameters:
x – Input to model

training (bool) – Training or inference phase?

return_features (bool) – If True, we return not only the model output, but a dictionary with intermediate features.

Returns:
If return_features=True, we return a tuple (y, features), where y is the model output and features is a dictionary with intermediate features.

If return_features=False, we return only y.

property dummy_inputs: Tensor

Returns a tensor of the correct shape for inference.

property feature_names: List[str]

Names of features, returned when calling call with return_features=True.
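The return contract of `call` and its relation to `feature_names` can be illustrated with a toy stand-in. `ToyModel` below is hypothetical: it is not the ConvNeXt class and does no real computation, and the feature keys `"stem"` and `"logits"` are made up for the example. It only mimics the documented behaviour that `return_features=True` yields a tuple `(y, features)` while `return_features=False` yields `y` alone.

```python
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in that mimics only the call() return contract."""

    @property
    def feature_names(self) -> List[str]:
        # Keys of the dictionary returned when return_features=True.
        return ["stem", "logits"]

    def call(self, x, training=False, return_features=False):
        features: Dict[str, object] = {}
        stem = [2.0 * v for v in x]  # placeholder for an intermediate layer
        features["stem"] = stem
        y = sum(stem)                # placeholder for the model output
        features["logits"] = y
        return (y, features) if return_features else y


model = ToyModel()
y_only = model.call([1.0, 2.0, 3.0])
y, features = model.call([1.0, 2.0, 3.0], return_features=True)
print(y_only, y, list(features))
```

Because the return type depends on the flag, calling code has to unpack the result only when it passed `return_features=True`.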

forward_features(x, training=False, return_features=False)

Forward pass through model, excluding the classifier layer. This function is useful if the model is used as input for downstream tasks such as object detection.

Parameters:
x – Input to model

training (bool) – Training or inference phase?

return_features (bool) – If True, we return not only the model output, but a dictionary with intermediate features.

Returns:
If return_features=True, we return a tuple (y, features), where y is the model output and features is a dictionary with intermediate features.

If return_features=False, we return only y.
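For downstream tasks such as object detection, it helps to know the stride, channel count and spatial size of each stage's output. The sketch below derives them from the `patch_size` and `embed_dim` defaults in section 1.2, assuming the layout described in the ConvNeXt paper: a stem that downsamples by `patch_size`, followed by a stride-2 downsampling layer before each later stage. `stage_feature_shapes` and `StageFeature` are hypothetical names introduced for illustration; the actual feature names exposed by the model are not reproduced here.

```python
from typing import List, NamedTuple, Tuple


class StageFeature(NamedTuple):
    stride: int    # total downsampling factor relative to the input image
    channels: int  # feature dimension of the stage
    height: int
    width: int


def stage_feature_shapes(
    input_size: Tuple[int, int] = (224, 224),
    patch_size: int = 4,
    embed_dim: Tuple[int, ...] = (96, 192, 384, 768),
) -> List[StageFeature]:
    """Expected output shape of each stage under the assumed layout."""
    shapes = []
    stride = patch_size  # the stem patchifies the image with this stride
    for channels in embed_dim:
        shapes.append(
            StageFeature(stride, channels, input_size[0] // stride, input_size[1] // stride)
        )
        stride *= 2  # assumed stride-2 downsampling before the next stage
    return shapes


for feature in stage_feature_shapes():
    print(feature)
```

With the default configuration this gives strides 4, 8, 16 and 32, the usual pyramid levels a detection head consumes.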