Using Pyramid Features from timm Models

This article shows how the timm library can be used to browse the available pretrained models and to create any of them as a feature extractor. By setting the features_only argument, you get feature maps at multiple scales, which is useful for tasks such as object detection and segmentation. timm also provides the .feature_info attribute for querying feature information such as channel counts and reduction factors, and the out_indices and output_stride arguments for selecting specific feature levels or limiting the output stride.


1 Listing timm Models

import timm

print(timm.list_models())

See: Models API and Pretrained weights | timmdocs (fast.ai)
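
list_models() also accepts a wildcard filter and a pretrained flag, which helps narrow down the very long registry. For example, to list only ResNet-family models that ship with pretrained weights:

import timm

print(timm.list_models('*resnet*', pretrained=True))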

2 Using timm Features

Multi-scale Feature Maps (Feature Pyramid)

Object detection, segmentation, keypoint estimation, and a variety of dense pixel tasks require access to feature maps from the backbone network at multiple scales. This is often done by modifying the original classification network. Since each network varies quite a bit in structure, it's not uncommon to see only a few backbones supported in any given object detection or segmentation library.

timm allows a consistent interface for creating any of the included models as feature backbones that output feature maps for selected levels.

A feature backbone can be created by adding the argument features_only=True to any create_model call. By default 5 strides will be output from most models (not all have that many), with the first starting at 2 (some start at 1 or 4).

Create a feature map extraction model

>>> import torch
>>> import timm
>>> m = timm.create_model('resnest26d', features_only=True, pretrained=True)
>>> o = m(torch.randn(2, 3, 224, 224))
>>> for x in o:
...     print(x.shape)

Output:

torch.Size([2, 64, 112, 112])
torch.Size([2, 256, 56, 56])
torch.Size([2, 512, 28, 28])
torch.Size([2, 1024, 14, 14])
torch.Size([2, 2048, 7, 7])

Query the feature information

After a feature backbone has been created, it can be queried to provide channel or resolution reduction information to the downstream heads without requiring static config or hardcoded constants. The .feature_info attribute is a class encapsulating the information about the feature extraction points.

>>> import torch
>>> import timm
>>> m = timm.create_model('regnety_032', features_only=True, pretrained=True)
>>> print(f'Feature channels: {m.feature_info.channels()}')
>>> o = m(torch.randn(2, 3, 224, 224))
>>> for x in o:
...     print(x.shape)

Output:

Feature channels: [32, 72, 216, 576, 1512]
torch.Size([2, 32, 112, 112])
torch.Size([2, 72, 56, 56])
torch.Size([2, 216, 28, 28])
torch.Size([2, 576, 14, 14])
torch.Size([2, 1512, 7, 7])
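
Beyond channels(), the same feature_info object can report the cumulative downsampling factor of each level and the module each map is tapped from; continuing with the model above (reduction() and module_name() are part of timm's FeatureInfo API):

>>> print(f'Feature reduction: {m.feature_info.reduction()}')
>>> print(f'Feature modules: {m.feature_info.module_name()}')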

Select specific feature levels or limit the stride

There are two additional creation arguments impacting the output features.

  • out_indices selects which indices to output
  • output_stride limits the feature output stride of the network (this also works in classification mode)

out_indices is supported by all models, but not all models have the same index to feature stride mapping. Look at the code or check feature_info to compare. The out indices generally correspond to the C(i+1)th feature level (a 2^(i+1) reduction). For most models, index 0 is the stride 2 features, and index 4 is stride 32.

output_stride is achieved by converting layers to use dilated convolutions. Doing so is not always straightforward, some networks only support output_stride=32.

>>> import torch
>>> import timm
>>> m = timm.create_model('ecaresnet101d', features_only=True, output_stride=8, out_indices=(2, 4), pretrained=True)
>>> print(f'Feature channels: {m.feature_info.channels()}')
>>> print(f'Feature reduction: {m.feature_info.reduction()}')
>>> o = m(torch.randn(2, 3, 320, 320))
>>> for x in o:
...     print(x.shape)

Output:

Feature channels: [512, 2048]
Feature reduction: [8, 8]
torch.Size([2, 512, 40, 40])
torch.Size([2, 2048, 40, 40])
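
Because feature_info removes the need for hardcoded constants, a downstream head can be sized entirely from its queries. Below is a minimal illustrative sketch (not from the timm docs) that builds one 1x1 lateral convolution per pyramid level for an FPN-style head, with fpn_dim chosen arbitrarily:

```python
import torch
import torch.nn as nn
import timm

m = timm.create_model('resnest26d', features_only=True, pretrained=False)

fpn_dim = 256  # arbitrary common channel width for the head
laterals = nn.ModuleList(
    nn.Conv2d(c, fpn_dim, kernel_size=1) for c in m.feature_info.channels()
)

feats = m(torch.randn(2, 3, 224, 224))
projected = [conv(f) for conv, f in zip(laterals, feats)]  # every level now has fpn_dim channels
for p in projected:
    print(p.shape)
```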
### Extracting Image Feature Vectors with EfficientNetV2 and YOLOv8

#### Setup

To extract image feature vectors with EfficientNetV2 and YOLOv8, first install the required libraries and load the models.

```bash
pip install torch torchvision ultralytics timm
```

#### Loading the pretrained models

Next, load pretrained weights for EfficientNetV2 and YOLOv8. Here the EfficientNetV2-S backbone is created through timm ('tf_efficientnetv2_s' is one of the pretrained names in its registry):

```python
import torch
from torchvision import transforms
from PIL import Image
import timm
from ultralytics import YOLO

# Initialize an EfficientNetV2-S model via timm
model_eff = timm.create_model('tf_efficientnetv2_s', pretrained=True)
model_eff.eval()  # set to evaluation mode

# Initialize a YOLOv8 model
yolo_model = YOLO('yolov8n.pt')
```

#### Image preprocessing

Different models may require different preprocessing. The pipeline below is a generic one that covers most image inputs:

```python
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # resize to an input size EfficientNetV2 accepts
    transforms.ToTensor(),          # PIL image -> tensor scaled into [0, 1]
])

image_path = 'path_to_your_image.jpg'
img = Image.open(image_path).convert('RGB')  # open and convert to RGB
input_tensor = transform(img)[None, ...]     # add a batch dimension
```

#### Feature extraction

With both models prepared, features can now be extracted.

##### Feature extraction with EfficientNetV2

EfficientNetV2 can be used directly to obtain high-level semantic feature representations:

```python
with torch.no_grad():
    features_eff = model_eff.forward_features(input_tensor)  # output of the last conv stage
print(features_eff.shape)  # shape should look like torch.Size([1, C, H', W'])
```

Here the forward_features method returns the activation map just before the classification head, which carries rich spatial information as well as highly class-discriminative features.

##### Extracting a multi-scale feature pyramid with YOLOv8

YOLOv8 not only localizes objects but can also be used to pull out feature maps at several levels. Detections come back as Results objects; intermediate maps are not exposed as named backbone/neck attributes, so one common approach is to register forward hooks on the underlying nn.Module (the stage indices below are illustrative and model-dependent):

```python
# Results objects hold the detections; xyxy boxes for the first image:
results = yolo_model.predict(source=image_path, save=False)
boxes = results[0].boxes.xyxy.cpu().numpy()

# Capture intermediate feature maps with forward hooks. Inspect
# yolo_model.model.model (a Sequential of stages) to pick the indices
# that correspond to the backbone/neck levels you need.
feature_maps = {}
hooks = [yolo_model.model.model[i].register_forward_hook(
             lambda mod, inp, out, i=i: feature_maps.setdefault(i, out))
         for i in (4, 6, 9)]  # example stage indices, model-dependent
yolo_model.predict(source=image_path, save=False)  # run once to trigger the hooks
for i, fm in feature_maps.items():
    print(f'stage {i} output shape:', tuple(fm.shape))
for h in hooks:
    h.remove()
```

Note that in practice you can select particular stages (backbone, neck) or even a single finer-grained layer, depending on how spatially sensitive a representation you need.
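
If a single flat feature vector is needed (for retrieval or similarity search, say), the spatial map returned by forward_features can be collapsed with global average pooling. A minimal sketch continuing from the EfficientNetV2 code above:

```python
import torch
import torch.nn.functional as F

with torch.no_grad():
    fmap = model_eff.forward_features(input_tensor)   # [1, C, H', W']
    vec = F.adaptive_avg_pool2d(fmap, 1).flatten(1)   # [1, C] feature vector
print(vec.shape)
```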