Paper Explanation and Translation: Semantic Segmentation

HIERARCHICAL MULTI-SCALE ATTENTION FOR SEMANTIC SEGMENTATION
ABSTRACT
Multi-scale inference is commonly used to improve the results of semantic segmentation.
Multiple image scales are passed through a network, and the results are combined with averaging or max pooling.
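To make this baseline concrete, here is a minimal PyTorch-style sketch of multi-scale inference with averaging or max-pooling fusion. The `model`, the scale set, and the bilinear resizing are illustrative assumptions, not the paper's exact pipeline:

```python
import torch
import torch.nn.functional as F

def multi_scale_inference(model, image, scales=(0.5, 1.0, 2.0), combine="avg"):
    """Run `model` at several image scales and fuse per-pixel class logits.

    image: float tensor of shape (N, 3, H, W).
    Returns fused logits of shape (N, num_classes, H, W).
    """
    n, _, h, w = image.shape
    outputs = []
    for s in scales:
        # Resize the input, predict, then resize logits back to full resolution.
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                               align_corners=False)
        logits = model(scaled)
        logits = F.interpolate(logits, size=(h, w), mode="bilinear",
                               align_corners=False)
        outputs.append(logits)
    stacked = torch.stack(outputs, dim=0)   # (S, N, C, H, W)
    if combine == "avg":
        return stacked.mean(dim=0)          # average over scales
    return stacked.max(dim=0).values        # max pooling over scales
```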
In this work, we present an attention-based approach to combining multi-scale predictions.
We show that predictions at certain scales are better at resolving particular failure modes, and that the network learns to favor those scales in such cases in order to generate better predictions.
Our attention mechanism is hierarchical, which enables it to be roughly 4x more memory efficient to train than other recent approaches.
In addition to enabling faster training, this allows us to train with larger crop sizes, which leads to greater model accuracy.
We demonstrate the results of our method on two datasets: Cityscapes and Mapillary Vistas.
For Cityscapes, which has a large number of weakly labelled images, we also leverage auto-labelling to improve generalization.
Using our approach, we achieve new state-of-the-art results on both Mapillary Vistas (61.1 IOU val) and Cityscapes (85.1 IOU test).
Keywords: Semantic Segmentation · Attention · Auto-labelling
1 Introduction
1) Paragraph 1
The task of semantic segmentation is to label all pixels within an image as belonging to one of N classes.
There is a trade-off in this task, in that certain types of predictions are best handled at lower inference resolution, while others are better handled at higher inference resolution.
Fine detail, such as the edges of objects or thin structures, is often better predicted with scaled-up image sizes.
At the same time, the prediction of large structures, which requires more global context, is often done better at scaled-down image sizes, because the network's receptive field can observe more of the necessary context.
We refer to this latter issue as class confusion. Examples of both of these cases are presented in Figure 1.
2) Paragraph 2
Using multi-scale inference is a common practice to address this trade-off.
Predictions are done at a range of scales, and the results are combined with averaging or max pooling.
Using averaging to combine multiple scales generally improves results, but it suffers from the problem of combining the best predictions with poorer ones.
For example, if for a given pixel the best prediction comes from the 2x scale and a much worse prediction comes from the 0.5x scale, then averaging will combine these predictions, resulting in sub-par output. Max pooling, on the other hand, selects only one of the N scales to use for a given pixel, while the optimal answer may be a weighted combination across the different scales of predictions.
To address this problem, we adopt an attention mechanism to predict how to combine multi-scale predictions together at a pixel level, similar to the method proposed by Chen et al. [1].
We propose a hierarchical attention mechanism by which the network learns to predict a relative weighting between adjacent scales.
In our method, because of its hierarchical nature, we only need to augment the training pipeline with one extra scale, whereas other methods such as [1] require each additional inference scale to be explicitly added during the training phase.
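The following is a minimal sketch of the pairwise fusion idea, assuming a hypothetical `attn_head` module that predicts one per-pixel attention weight for the lower of two adjacent scales. Because the same head fuses any adjacent pair, it can be chained at inference across scale sets never seen in training. Names and shapes are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn.functional as F

def fuse_adjacent(logits_lo, logits_hi, attn_head, feats_lo):
    """Fuse predictions from two adjacent scales with a learned per-pixel weight.

    logits_lo: (N, C, h, w)  logits from the lower (smaller) scale.
    logits_hi: (N, C, H, W)  logits from the higher scale, H > h.
    attn_head: small conv head predicting one attention channel per pixel
               from the lower-scale features `feats_lo` (hypothetical module).
    """
    alpha = torch.sigmoid(attn_head(feats_lo))   # (N, 1, h, w), values in [0, 1]
    # Upsample lower-scale logits and attention map to the higher resolution.
    up = lambda t: F.interpolate(t, size=logits_hi.shape[-2:],
                                 mode="bilinear", align_corners=False)
    return up(alpha) * up(logits_lo) + (1 - up(alpha)) * logits_hi

# At inference, the same head chains across any number of adjacent scale pairs,
# e.g. fuse(0.25x, 0.5x) -> fuse(result, 1.0x) -> fuse(result, 2.0x).
```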
For example, when the target inference scales for multi-scale evaluation are {0.5, 1.0, 2.0}, other attention methods require the network to first be trained with all of those scales, resulting in 4.25x (0.5² + 2.0²) extra training cost.
Our method only requires adding an extra 0.5x scale during training, which adds only 0.25x (0.5²) cost.
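As a quick check of these numbers, the cost figures follow from assuming a forward pass at scale s costs s² times the 1.0x pass, since compute grows with pixel area:

```latex
% Extra training cost relative to one 1.0x forward pass, taking cost(s) = s^2:
\underbrace{0.5^2 + 2.0^2}_{\text{explicit extra scales } \{0.5,\ 2.0\}} = 4.25
\qquad \text{vs.} \qquad
\underbrace{0.5^2}_{\text{ours: one extra } 0.5\text{x scale}} = 0.25
```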
Furthermore, our proposed hierarchical mechanism also provides the flexibility of choosing extra scales at inference time, as compared to previously proposed methods that are limited to using only the training scales during inference.
To achieve state-of-the-art results on Cityscapes, we also adopt an auto-labelling strategy on its coarsely labelled images in order to increase the variance in the dataset, thereby improving generalization. Our strategy is motivated by multiple recent works, including [2, 3, 4]. As opposed to the typical soft-labelling strategy, we adopt hard labelling in order to manage label storage size, which helps to improve training throughput by lowering the disk IO cost.
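To illustrate the storage argument, here is a minimal sketch of generating a hard label from a teacher network's soft predictions; the function name and file format are hypothetical, not the paper's pipeline:

```python
import numpy as np

def save_hard_label(teacher_probs, path):
    """teacher_probs: (C, H, W) softmax output of a teacher network.

    Soft labels store C float32 values per pixel (C * 4 bytes); a hard
    label keeps only the argmax class id, 1 byte per pixel. For 19 classes
    at 2048x1024, that is roughly 159 MB soft vs about 2 MB hard per image,
    which is what lowers the disk IO cost during training.
    """
    hard = teacher_probs.argmax(axis=0).astype(np.uint8)  # (H, W) class ids
    np.save(path, hard)
```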
