[Paper Reading] Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classifica

Contents

SRN

Advantages of SRN

SRN Network Architecture

SRN: Attention Mechanism f_att(·)

SRN: Structure of f_sr(·)

Multi-step Training

Experimental Results


SRN

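The Spatial Regularization Network (SRN) learns attention maps for all labels and uses learnable convolutions to mine the underlying relations between labels. It then combines the regularized classification results with those of the ResNet-101 network to improve image classification performance.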

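Advantages of SRN

  • Exploits both the semantic and the spatial relations among an image's multiple labels, which noticeably improves accuracy
  • Once the model has been trained on images whose labels are spatially related, the attention mechanism adaptively focuses on the relevant image regions
  • Needs only image-level annotations and is trained end to end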

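SRN Network Architecture

  • Main Net: ResNet-101, which learns an independent classifier for each label. "Res-2048" denotes a ResNet building block with 2048 output channels;
  • SRN takes the visual features of ResNet-101 as input and uses an attention mechanism to learn to regularize the spatial relations between labels;
  • The classification results of the main net and the SRN are combined to give the final classification confidences.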
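The fusion step in the last bullet can be sketched as a weighted sum of the two branches' per-label scores. This is only an illustration: the function name `combine_confidences`, the weight `alpha`, its default of 0.5, and the example logits are all assumptions, not values taken from this post.

```python
import math

def sigmoid(x: float) -> float:
    """Map a logit to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def combine_confidences(y_cls, y_sr, alpha=0.5):
    """Blend per-label logits of the main net (y_cls) and the SRN branch (y_sr).

    alpha is an assumed mixing weight; alpha=1 would ignore the SRN branch.
    """
    if len(y_cls) != len(y_sr):
        raise ValueError("both branches must score the same labels")
    return [alpha * c + (1.0 - alpha) * s for c, s in zip(y_cls, y_sr)]

# Hypothetical logits for three labels.
y_cls = [2.0, -1.0, 0.5]   # main net
y_sr = [1.0, -3.0, 1.5]    # SRN branch
y_hat = combine_confidences(y_cls, y_sr)
probs = [sigmoid(v) for v in y_hat]  # final per-label confidences
```

Because multi-label classification scores each label independently, the confidences come from a per-label sigmoid rather than a softmax across labels.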

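SRN: Attention Mechanism f_att(·)

When a label is present in an image, more attention should be paid to the regions related to it, so the label attention maps encode rich spatial information for each label. If label l is annotated for the image, the attention values in the regions related to l should be higher.

Attention maps can be used to produce more robust spatial regularization. However, each label's attention map always sums to 1, so it may highlight wrong locations, for example when the label is absent from the image, and yield misleading spatial regularization. The paper therefore proposes weighted attention maps U, which encode both the local and the global confidence scores of each label.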
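To make the "always sums to 1" problem concrete, here is a small sketch for a single label. It assumes the attention map A is a spatial softmax over raw scores and that the weighted map takes the element-wise form U = sigmoid(S) * A, where S is a per-location confidence map. That form, the 2×2 maps, and the function names are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_softmax(scores):
    """Normalise one label's H x W score map so that its entries sum to 1."""
    peak = max(v for row in scores for v in row)  # subtract max for stability
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def weighted_attention(attention, confidence):
    """Assumed form U = sigmoid(S) * A, applied element-wise."""
    return [[sigmoid(s) * a for s, a in zip(s_row, a_row)]
            for s_row, a_row in zip(confidence, attention)]

# Hypothetical 2 x 2 maps for a label that is absent from the image.
raw_scores = [[0.1, 0.3], [0.2, 0.0]]
confidence = [[-4.0, -4.0], [-4.0, -4.0]]  # low confidence at every location

A = spatial_softmax(raw_scores)
U = weighted_attention(A, confidence)
total_A = sum(map(sum, A))  # forced to 1 even though the label is absent
total_U = sum(map(sum, U))  # scaled down by the low confidence
```

In this sketch A still has to place all of its mass somewhere, whereas U shrinks towards zero when the confidence map is low everywhere.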

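SRN: Structure of f_sr(·)

  • conv2 and conv3: multi-channel filters with 512 outputs, which capture the semantic relations among the labels;
  • conv4: single-channel filters with 2048 outputs. Every 4 kernels form a group that convolves the same single feature channel, and the different kernels in a group capture different spatial relations among semantically related labels.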
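The grouping in conv4 can be illustrated with plain index arithmetic. The sketch below assumes conv4 reads the 512 output channels of conv3 and that consecutive kernels share a channel (which is how a grouped convolution with 512 groups would lay them out). The kernel size `KERNEL` is a placeholder assumption used only to compare weight counts.

```python
IN_CHANNELS = 512     # outputs of conv3, as stated above
OUT_CHANNELS = 2048   # outputs of conv4, as stated above
KERNELS_PER_CHANNEL = OUT_CHANNELS // IN_CHANNELS  # 4 kernels per group
KERNEL = 14           # assumed spatial kernel size, for illustration only

def input_channel_for_kernel(kernel_index):
    """Index of the single feature channel that a conv4 kernel convolves."""
    return kernel_index // KERNELS_PER_CHANNEL

def conv_weight_count(out_channels, in_channels_per_kernel, kernel_size):
    """Number of weights in a convolution layer (biases ignored)."""
    return out_channels * in_channels_per_kernel * kernel_size * kernel_size

# Group kernel indices by the feature channel they read.
groups = {}
for k in range(OUT_CHANNELS):
    groups.setdefault(input_channel_for_kernel(k), []).append(k)

single_channel = conv_weight_count(OUT_CHANNELS, 1, KERNEL)            # conv4 as described
multi_channel = conv_weight_count(OUT_CHANNELS, IN_CHANNELS, KERNEL)   # hypothetical dense variant
```

Restricting each kernel to one channel cuts the weight count by a factor of 512 relative to a dense layer of the same size, which is one plausible motivation for the single-channel design.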

Multi-step Training

  1. Train only the main network, f_cnn and f_cls, starting from a ResNet pretrained on ImageNet;
  2. Fix f_cnn and f_cls, and train f_att;
  3. Fix f_cnn, f_cls and f_att, and train f_sr;
  4. Jointly train the whole network.
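The four steps above amount to a schedule of which modules receive gradient updates at each stage. A framework-free sketch of that schedule follows. The module names mirror the list, while `STAGES`, `frozen_modules` and `trainable_flags` are hypothetical helpers. Modules that a step does not mention (for example f_sr in step 2) are simply treated as not trained, which is an assumption.

```python
MODULES = ("f_cnn", "f_cls", "f_att", "f_sr")

# Modules that are updated in each of the four steps.
STAGES = [
    {"f_cnn", "f_cls"},   # step 1: main network only
    {"f_att"},            # step 2: attention branch
    {"f_sr"},             # step 3: spatial regularization branch
    set(MODULES),         # step 4: joint training
]

def frozen_modules(stage_index):
    """Modules that receive no updates in the given stage (0-based)."""
    return set(MODULES) - STAGES[stage_index]

def trainable_flags(stage_index):
    """Per-module flag, e.g. to drive a framework's 'requires gradient' switch."""
    return {name: name in STAGES[stage_index] for name in MODULES}

for i in range(len(STAGES)):
    print(f"step {i + 1}: train {sorted(STAGES[i])}, freeze {sorted(frozen_modules(i))}")
```

In a deep learning framework, the flags from `trainable_flags` would typically be applied to each module's parameters before building the optimizer for that stage.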

Image augmentation strategy:

  • Resize to 256×256
  • Crop at the four corners and the center, with width and height randomly chosen from {256, 224, 192, 168, 128}
  • Resize to 224×224
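The crop step can be sketched as a function that returns the crop rectangle inside the 256×256 image. The sketch assumes width and height are sampled independently and that the position is picked uniformly among the five options. The names are hypothetical, and the two resize steps are left to an image library.

```python
import random

RESIZED = 256                            # side length after the first resize
CROP_SIZES = (256, 224, 192, 168, 128)   # candidate widths and heights
POSITIONS = ("top_left", "top_right", "bottom_left", "bottom_right", "center")

def crop_box(position, width, height, size=RESIZED):
    """Return (left, top, right, bottom) of a crop inside a size x size image."""
    if position == "center":
        left, top = (size - width) // 2, (size - height) // 2
    else:
        left = 0 if position.endswith("left") else size - width
        top = 0 if position.startswith("top") else size - height
    return left, top, left + width, top + height

def sample_crop(rng):
    """Draw a random position and an independently sampled width and height."""
    position = rng.choice(POSITIONS)
    width = rng.choice(CROP_SIZES)
    height = rng.choice(CROP_SIZES)
    return crop_box(position, width, height)

rng = random.Random(0)  # seeded so the sketch is reproducible
boxes = [sample_crop(rng) for _ in range(5)]
```

Each box would then be cropped out of the resized image and scaled to 224×224, so crops with unequal width and height also change the aspect ratio.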

Experimental Results

The neurons show strong spatial and semantic relations among these four labels ("male", "long sleeves", "formal", "long pants").
