SpineNet: An Unconventional Backbone Architecture from Google Brain


The problem of classification has been solved quite efficiently by encoder-decoder architectures whose encoder uses progressively decreasing resolution scales. However, this kind of architecture fails to efficiently generate the strong multi-scale features required for object detection (simultaneous recognition and localization).

How is SpineNet different from previous backbones?

“High resolution may be needed to detect the presence of a feature, while its exact position need not to be determined with equally high precision.” [1]


The Drawback of Scale-Decreased Backbones

  • Normally, a backbone model refers to the scale-decreased network in an encoder-decoder architecture, i.e. the encoder.
  • Since the task of the encoder is to compute feature representations from the input, a scale-decreased backbone is unable to retain spatial information.
  • As the layers get deeper, features become more abstract and less localized, making it difficult for the decoder to retrieve the exact features required for localization.

Proposed Innovation

In order to overcome the difficulty of obtaining and retaining multi-scale features for localization, a scale-permuted model with cross-scale connections is introduced, with the following improvements:

  1. The scales of the feature maps are free to increase or decrease anywhere in the architecture by permuting blocks, as opposed to the earlier strictly decreasing pattern. This helps preserve spatial information.

  2. Connections between feature maps are allowed to go across feature scales so that features from multiple scales can be fused.

Figure 1: An example of a scale-decreased network (left) vs. scale permuted network (right). The width denotes the resolution and height denotes the feature dimension (number of channels). [Source: [1]]

The Methodology and Architecture

Neural Architecture Search (NAS)

  • The proposed SpineNet architecture is selected using Neural Architecture Search (NAS) [1].

  • NAS uses a reinforcement learning controller. The controller proposes candidate architectures, which are sent to an environment in which they are fully trained.
  • The resulting accuracy acts as the reward, and the decision of which architecture to choose depends on it (a minimal sketch of this search loop follows Figure 2).
Figure 2: Neural Architecture Search Method in the context of [1].
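To make this loop concrete, here is a minimal sketch of the accuracy-as-reward search described above. It is not the authors' implementation: `propose_architecture` and `train_and_evaluate` are hypothetical helpers, the controller stand-in samples candidates at random rather than learning a policy, and a dummy score stands in for the validation accuracy obtained on the proxy training task.

```python
import random

def propose_architecture():
    """Hypothetical stand-in for the RL controller. In [1] this is a learned
    policy updated with the reward; here it simply samples a candidate
    block ordering at random."""
    return {"permutation": random.sample(range(10), 10)}

def train_and_evaluate(architecture):
    """Assumed proxy task: build the candidate backbone, train it, and return
    validation accuracy. Replaced by a dummy score in this sketch."""
    return random.random()

best_architecture, best_reward = None, float("-inf")
for step in range(100):                       # search budget
    candidate = propose_architecture()        # controller proposes an architecture
    reward = train_and_evaluate(candidate)    # output accuracy acts as the reward
    if reward > best_reward:                  # the reward drives architecture selection
        best_architecture, best_reward = candidate, reward
```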

The SpineNet architecture consists of a fixed stem network (a scale-decreased network) followed by a learned scale-permuted network. The NAS search space used to build the scale-permuted network comprises scale permutations, cross-scale connections, and block adjustments.

  1. Scale permutations: A block can only connect to parent blocks with lower orderings, so the ordering of blocks matters. Here, the intermediate and output blocks are permuted.

  2. Cross-scale connections: For each block in the search space, two input connections are defined.


  3. Block adjustments: Each block can adjust its scale level and type. The scale levels of intermediate blocks can be adjusted within the range {−1, 0, 1, 2}, and the block type can be either a bottleneck block or a residual block (an illustrative encoding of this search space is sketched below).
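The three components above can be written down as a small data structure. The sketch below is an illustrative encoding under assumed names (`BlockSpec`, `parents`, `block_type`); it is not taken from [1]'s code, but it reflects the stated constraints: two inputs per block, parents must come earlier in the ordering, and each block is either a bottleneck or a residual block.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK_TYPES = ("bottleneck", "residual")   # allowed block types (block adjustment)
# Intermediate blocks may additionally shift their scale level by {-1, 0, 1, 2}
# (block adjustment); here each block simply records the level it ends up at.

@dataclass
class BlockSpec:
    level: int                 # feature level; resolution halves each time the level increases by 1
    block_type: str            # "bottleneck" or "residual"
    parents: Tuple[int, int]   # indices of the two input blocks (cross-scale connections)

def validate(blocks: List[BlockSpec]) -> None:
    """Check the constraints listed above on a candidate scale-permuted network."""
    for i, block in enumerate(blocks):
        assert block.block_type in BLOCK_TYPES
        # a block may only take inputs from blocks with a lower ordering
        assert all(p < i for p in block.parents), f"block {i} has a non-causal parent"

# A toy candidate: levels go up and down freely (scale permutation) and the two
# parents of a block may sit at different levels (cross-scale connections).
# Index -1 stands in for an output of the fixed stem network.
candidate = [
    BlockSpec(level=2, block_type="bottleneck", parents=(-1, -1)),
    BlockSpec(level=4, block_type="residual",   parents=(-1, 0)),
    BlockSpec(level=3, block_type="bottleneck", parents=(0, 1)),
    BlockSpec(level=5, block_type="residual",   parents=(1, 2)),
]
validate(candidate)
```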

Resampling in Cross-Scale Connections

  • When performing cross-scale connections, a challenge arises when cross-scale features with different resolutions and feature dimensions in the parent and target blocks have to be fused.
  • To do so, spatial and feature resampling is performed to match the resolution and feature dimension of the target block.
  • In resampling, nearest-neighbor interpolation is used for upsampling, whereas a stride-2 3×3 convolution performs downsampling on the feature map to match the target resolution (see the sketch after Figure 3).
Figure 3: Resampling operations [Source: [1]]
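Below is a simplified PyTorch sketch of the spatial part of this resampling step, assuming the convention that a feature at level L has spatial stride 2^L. It is not the paper's exact module: the 1×1 convolutions and the scaling factor α used for feature-dimension matching are omitted, and repeated stride-2 convolutions stand in for whatever combination of convolution and pooling [1] uses for downsampling factors larger than 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialResample(nn.Module):
    """Match a parent feature map (at level `in_level`) to a target level.

    Simplified sketch: upsampling uses nearest-neighbor interpolation; each
    2x step of downsampling uses a stride-2 3x3 convolution.
    """

    def __init__(self, channels: int, in_level: int, target_level: int):
        super().__init__()
        self.scale = 2 ** (in_level - target_level)  # >1 means upsample, <1 means downsample
        self.downsamples = nn.ModuleList()
        for _ in range(max(0, target_level - in_level)):
            self.downsamples.append(
                nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.scale > 1:                 # parent is coarser than the target -> upsample
            x = F.interpolate(x, scale_factor=self.scale, mode="nearest")
        for conv in self.downsamples:      # parent is finer than the target -> downsample
            x = conv(x)
        return x

# Example: fuse a level-3 parent into a level-5 target block (two 2x downsamples).
feat = torch.randn(1, 64, 32, 32)
resample = SpatialResample(channels=64, in_level=3, target_level=5)
print(resample(feat).shape)  # torch.Size([1, 64, 8, 8])
```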

For a more detailed treatment, refer to Section 3.2 of [1].

Evolution of the SpineNet Architecture from ResNet

  • The scale-permuted model is formed by permuting the blocks of the ResNet architecture.
  • To compare the fully scale-decreased network with the scale-permuted network, a number of intermediate models are generated that gradually shift the architecture toward the scale-permuted form.
Figure 4: Building a scale-permuted network by permuting ResNet. [Source: [1]]
  • In the figure above, part (a) denotes ResNet-50 followed by a Feature Pyramid Network (FPN) output layer.
  • In part (b), 7 blocks keep the ResNet ordering and 10 blocks are used for the scale-permuted network.
  • In part (c), all blocks are part of the scale-permuted network, and in part (d) SpineNet-49 is introduced, achieving the highest AP score of 40.8% while requiring 10% fewer FLOPs (85.4B vs. 95.2B).

Proposed SpineNet Architectures

Based on the SpineNet-49 architecture derived in Figure 4 (d), four more architectures are constructed in the SpineNet family.

  • SpineNet-49S has the same architecture as SpineNet-49, with the feature dimensions scaled down by a factor of 0.65.

  • SpineNet-96 repeats all the blocks two times, so the model size is double that of SpineNet-49.

  • SpineNet-143 repeats each block three times, and the scaling factor α in the resampling operation is kept at 1.0.

  • SpineNet-190 repeats each block four times with scaling factor 1.3 to further scale up the feature dimension. (These family settings are summarized in the sketch after Figure 5.)

Figure 5: Increase model depth by block repeat. From left to right: blocks in SpineNet-49, SpineNet-96, and SpineNet-143. [Source: [1]]
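For reference, the family differences listed above can be collected into a small configuration table. This is an illustrative sketch with assumed field names (`block_repeats`, `filter_scale`, `resample_alpha`), not the released model configs; only the values stated in the bullets above are filled in, and unstated ones are left as `None` rather than guessed.

```python
# Illustrative configuration sketch for the SpineNet family (assumed field names).
SPINENET_FAMILY = {
    "SpineNet-49S": dict(block_repeats=1, filter_scale=0.65, resample_alpha=None),
    "SpineNet-49":  dict(block_repeats=1, filter_scale=1.0,  resample_alpha=None),
    "SpineNet-96":  dict(block_repeats=2, filter_scale=1.0,  resample_alpha=None),
    "SpineNet-143": dict(block_repeats=3, filter_scale=1.0,  resample_alpha=1.0),
    "SpineNet-190": dict(block_repeats=4, filter_scale=1.3,  resample_alpha=None),
}

def scaled_channels(base_channels: int, model: str) -> int:
    """Feature dimension after applying the per-model filter scaling."""
    return int(round(base_channels * SPINENET_FAMILY[model]["filter_scale"]))

print(scaled_channels(256, "SpineNet-49S"))  # 166: 256 channels scaled down by 0.65
```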

Comparative Results

The experiments cover object detection as well as image classification in order to demonstrate the versatility of the proposed architecture.

Object Detection

For the object detection task, SpineNet replaces the ResNet-FPN backbone in the RetinaNet detector. The model is trained on the COCO train2017 split and evaluated on the COCO test-dev set.

  • The following results (Figure 6) demonstrate that SpineNet models outperform other popular detectors by large margins. The largest model, SpineNet-190, achieves the highest AP of 52.1%. In general, SpineNet architectures require fewer FLOPs and fewer parameters, making the models computationally less expensive.

Figure 6: One-stage object detection results on COCO test-dev. Different backbones with RetinaNet are employed on single model. By default, training is done using multi-scale training and ReLU activation for all models in this table. Models marked by dagger (†) are trained by applying stochastic depth and swish activation for a longer training schedule. [Source: [1]]
  • The results on COCO val2017 (Figure 7) show that SpineNet-49 requires roughly 10% fewer FLOPs and improves AP to 40.8, compared to 37.8 for R50-FPN.

Figure 7: Results comparisons between R50-FPN and scale-permuted models on COCO val2017. [Source: [1]]
  • RetinaNet models adopting SpineNet backbones achieve higher AP scores with considerably fewer FLOPs than those using ResNet-FPN and NAS-FPN backbones (Figure 8).
Figure 8: The comparison of RetinaNet models adopting SpineNet, ResNet-FPN, and NAS-FPN backbones. [Source: [1]]

Image Classification

For image classification, SpineNet is trained on two datasets: ImageNet ILSVRC-2012 and iNaturalist-2017.

  • On ImageNet, the Top-1 and Top-5 accuracy are on par with ResNet, while the number of FLOPs is considerably reduced.
  • On iNaturalist, SpineNet outperforms ResNet by a large margin of about 5%, along with a reduction in FLOPs.
Figure 9: Image classification results on ImageNet and iNaturalist. [Source: [1]]

The above results demonstrate that SpineNet not only works better for object detection but is also versatile enough for other visual learning tasks such as image classification.

Importance of Scale Permutation and Cross-Scale Connections

According to [1], two popular encoder-decoder architecture shapes, Fish and Hourglass, are chosen for comparison with the proposed R0-SP53 model. The cross-scale connections in all models are learned using NAS.

Scale Permutation

  • The insight is that jointly learning scale permutations and cross-scale connections (R0-SP53) proves to be more beneficial than learning only the connections on fixed architectures with fixed block orderings (Hourglass and Fish).
  • The AP score is higher (40.7%) for the proposed R0-SP53 model.

Figure 10: Importance of learned scale permutation [Source: [1]]

Cross-Scale Connections

  • The importance of cross-scale connections is studied through graph damage.
  • The cross-scale connections are damaged in three ways: (1) removing short-range connections, (2) removing long-range connections, and (3) removing both types of connections.
  • The results show that the AP score is severely affected in cases (2) and (3). The reason is that long-range connections can effectively handle frequent resolution changes, so damaging them hurts the overall accuracy more.
Figure 11: Importance of learned cross-scale connections [Source: [1]]

For implementation and experimentation details, refer to Section 5 of [1].

Final Insights

  • In [1], a new meta-architecture, the scale-permuted model, is proposed to effectively solve the task of simultaneous object recognition and localization, which previously could not be solved effectively with a scale-decreased backbone.

  • Neural Architecture Search (NAS) is used to obtain the SpineNet-49 architecture. Furthermore, by scaling the feature dimensions and repeating blocks, four more architectures in the family are produced which are more robust.
  • SpineNet is evaluated on the object detection task using the COCO test-dev set, achieving 52.1% AP, which is higher than existing state-of-the-art detectors.
  • SpineNet also achieves comparable and improved Top-1 accuracy on the image classification task using the ImageNet and iNaturalist datasets, respectively.
  • In summary, the new architecture achieves higher accuracy with less computation and approximately the same number of parameters.

Translated from: https://medium.com/visionwizard/spinenet-an-unconventional-backbone-architecture-from-google-brain-d78d669fdd69
