Paper reading——Deep residual pooling network for texture recognition

该研究提出了一种深度残差池化网络,用于纹理识别。通过保留特征图的空间信息并结合残差编码,实现了一个端到端的学习框架。这种方法克服了传统方法中的限制,提高了学习效率,并在多个纹理识别数据集上表现出优越性能。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Title

Deep residual pooling network for texture recognition

Year/ Authors/ Journal

2021

/Mao, Shangbo and Rajan, Deepu and Chia, Liang Tien

/ Pattern Recognition

citation

@article{
   mao2021deep,
  title={
   Deep residual pooling network for texture recognition},
  author={
   Mao, Shangbo and Rajan, Deepu and Chia, Liang Tien},
  journal={
   Pattern Recognition},
  volume={
   112},
  pages={
   107817},
  year={
   2021},
  publisher={
   Elsevier}
}

Summary

  • The balance between the orderless features and the spatial information for effective texture recognition.

  • Experiments show that retaining the spatial information before aggregation is helpful in feature learning for texture recognition.

Interesting Point(s)

  1. It would be interesting to explore if the best feature maps could be automatically identified as suitable candidates for combining.

  2. In our method, the multi-size training can influence only the features learned in the convolutional transfer module, which will not lead to a major influence in the final performance. We plan to address this in the future.

  3. The properties of the Deep-TEN for integrating Encoding Layer with and end-to-end CNN architecture.

    • spatial information + orderless features.

    • end to end training.

    • same dimensions.

Research Objective(s)

在这里插入图片描述

Fig. 1. Overall framework of deep residual pooling network. When the backbone network is Resnet-50 and the input image size is 224 ×224 ×3 , the dimension of the feature map extracted from $ f_{cnn} $ is 7 ×7 ×2048 , as the orange cube in the figure shows. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Current deep learning-based texture recognition methods extract spatial orderless features from pretrained deep learning models that are trained on large-scale image datasets. These methods either pro- duce high dimensional features or have multiple steps like dictionary learning, feature encoding and dimension reduction. In this paper, we propose a novel end-to-end learning framework that not only overcomes these limitations, but also demonstrates faster learning.

Contributions

The contribution of our work is three fold:

  • We propose a learnable residual pooling layer comprising of a residual encoding module and an aggregation module. We take advantage of the feature learning ability of the convolutional layer and integrate the idea of residual encoding to pro- pose a learnable pooling layer. Besides, the proposed layer produces the residual codes retaining spatial information and aggregates them to a feature with a lower dimension compared with the state-of-the-art methods. Experiments show that retaining the spatial information before aggregation is helpful in feature learning for texture recognition.
  • We propose a novel end-to-end learning framework that integrates the residual pooling layer into a pretrained CNN model for efficient feature transfer for texture recognition. We show the overview of the proposed residual pooling framework in Fig. 1 .
  • We compare our feature dimensions as well as the performance of the proposed pooling layer with other residual encoding schemes to illustrate state-of-the-art performance on bench- mark texture datasets and also on a visual inspection dataset from industry. We also test our method on a scene recognition dataset.

Background / Problem Statement

  • Following its success, several pretrained CNN models complemented by specific modules to improve accuracy have been proposed [5,13,25] that achieve better performance on benchmark datasets such as Flickr Material Dataset (FMD) [23] and Describable Texture Dataset (DTD) [3] . However, since the methods proposed in [4,25] contain multiple steps such as feature extraction, orderless encoding and dimension reduction, the advantages offered by end-to-end learning are not fully utilized. Moreover, the features extracted by all of these methods [4,5,25] have high dimensions resulting in operation on large matrices.
  • There is a need to balance orderless feature and ordered spatial information for effective texture recognition [33] . From feature visualization experiments, we see that pretrained CNN features are able to differentiate textures only to a certain ex- tent. Hence, we propose to use the pretrained CNN features as the compact dictionary. Since the pretrained CNN features mainly focus on the extraction of spatial sensitive information, we implement the hard assignment based on the spatial locations during the calculation of the residuals. Then, in order to get an orderless feature, we propose an aggregation module to remove the spatial sensitive information.
  • The challenge is to make the loss function differentiable with respect to the inputs and layer parameters.

All in one word: for better transferring the deep-learning method into texture recognition. Since model in this field always with pretrained in large dataset (such as ImageNet).

Method(s)

在这里插入图片描述

Unlike Deep TEN [34] , which removed the spatial sensitive information at the beginning itself, we retain the spatial sensitive information until the aggregation module to achieve the balance of orderless features and ordered spatial information.

Our proposed residual encoding module is motivated by Deep TEN [34] , but there are two main differences. The first one is that in the convolutional transfer module, we retain the spatial information in the pretrained convolutional features as the dictionary and apply hard assignment based on the spatial location. Deep TEN removes the spatial information in their residual encoding layer. The second one is that the aggregation module yields the final orderless feature whose dimension equals the number of channels in the current pretrained convolutional layer without a separate dimension reduction step. This avoids the potential risk of over- fitting and extra computation.

Due to the using of global average pool, this method results in a lower D D

deep residual learning for image recognition是一种用于图像识别的深度残差学习方法。该方法通过引入残差块(residual block)来构建深度神经网络,以解决深度网络训练过程中的梯度消失和梯度爆炸等问题。 在传统的深度学习网络中,网络层数增加时,随之带来的问题是梯度消失和梯度爆炸。这意味着在网络中进行反向传播时,梯度会变得非常小或非常大,导致网络训练变得困难。deep residual learning则使用了残差连接(residual connection)来解决这一问题。 在残差块中,输入特征图被直接连接到输出特征图上,从而允许网络直接学习输入与输出之间的残差。这样一来,即使网络层数增加,也可以保持梯度相对稳定,加速网络训练的过程。另外,通过残差连接,网络也可以更好地捕获图像中的细节和不同尺度的特征。 使用deep residual learning方法进行图像识别时,我们可以通过在网络中堆叠多个残差块来增加网络的深度。这样,网络可以更好地提取图像中的特征,并在训练过程中学习到更复杂的表示。通过大规模图像数据训练,deep residual learning可以在很多图像识别任务中达到甚至超过人类表现的准确性。 总之,deep residual learning for image recognition是一种利用残差连接解决梯度消失和梯度爆炸问题的深度学习方法,通过增加网络深度并利用残差学习,在图像识别任务中获得了突破性的表现。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值