[LETR] Line Segment Detection Using Transformers without Edges (CVPR 2021 Oral)

LETR is an end-to-end, Transformer-based line segment detection method that does away with traditional heuristic modules such as edge detection. Using a multi-scale encoder/decoder strategy and a direct endpoint distance loss, LETR achieves state-of-the-art performance on the Wireframe and YorkUrban benchmarks. The pipeline covers image feature extraction, line segment detection, and line segment prediction, relying on self-attention and cross-attention to model and detect line segments.

1. Motivation

  • Despite its practical and scientific importance, line segment detection remains an unsolved problem in computer vision.

  • Deep learning techniques still consist of heuristics-guided modules such as edge/junction/region detection, line grouping, and post-processing, limiting the scope of their performance enhancement and further development.

  • Can we directly model all the vectorized line segments with a neural network? We are motivated by the following observations for the Transformer frameworks.

  • LETR is built on top of a seminal work, DEtection TRansformer (DETR).


2. Contribution

LETR achieves end-to-end line segment detection, removing many heuristic processing steps, and reaches state-of-the-art results on two benchmarks.

  • We cast the line segment detection problem in a joint end-to-end fashion without explicit edge/junction/region detection and heuristics-guided perceptual grouping processes, which is in distinction to the existing literature in this domain. We achieve state-of-the-art results on the Wireframe [15] and YorkUrban benchmarks [5].

  • We perform line segment detection using Transformers, based specifically on DETR [4], to realize tokenized entity modeling, perceptual grouping, and joint detection via an integrated encoder-decoder, a self-attention mechanism, and joint query inference within Transformers.

    Two new design points: a multi-scale encoder/decoder strategy and a direct endpoint distance loss designed specifically for detecting line segments.

  • We introduce two new algorithmic aspects to DETR [4]: first, a multi-scale encoder/decoder strategy as shown in Figure 2; second, a direct endpoint distance loss term in training, allowing geometric structures like line segments to be directly learned and detected — something not feasible in the standard DETR bounding box representations.
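
As a concrete picture of the second point: once bipartite matching has assigned each ground-truth segment to a predicted line entity, the direct endpoint distance loss amounts to an L1 penalty between the matched endpoint coordinates. Below is a minimal PyTorch sketch of that idea; the function name, tensor shapes, and the assumption that matching and endpoint ordering are resolved upstream are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def endpoint_distance_loss(pred, target):
    """L1 distance between matched predicted and ground-truth endpoints.

    pred, target: (M, 4) tensors, one row per matched line segment,
    holding normalized endpoint coordinates (x1, y1, x2, y2).
    Bipartite matching and endpoint ordering are assumed to be resolved
    upstream; this is a sketch of the idea, not the authors' code.
    """
    return F.l1_loss(pred, target, reduction="mean")

# Toy usage with random matched pairs.
pred = torch.rand(8, 4)
target = torch.rand(8, 4)
loss = endpoint_distance_loss(pred, target)
```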

3. Method

3.1 Overview

  • However, the vanilla DETR still focuses on the bounding box representation with a GIoU loss.

  • We further convert the box predictor in DETR into a vectorized line segment predictor by adapting the losses and enhancing the use of multi-scale features in our designed model.
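
One way to read the multi-scale design is as a coarse-to-fine pass: a coarse encoder/decoder operates on deeper, lower-resolution backbone features, and the resulting line entities are then refined by a fine encoder/decoder attending to shallower, higher-resolution features. The sketch below shows only that data flow; the single-layer transformers, module names, query count, and the omission of positional encodings are simplifications rather than the released implementation.

```python
import torch
import torch.nn as nn

class CoarseToFineLETR(nn.Module):
    """Illustrative coarse-to-fine stage coupling (not the official implementation)."""

    def __init__(self, d_model=256, num_queries=1000):
        super().__init__()
        # One layer per stage for brevity; the real model stacks several.
        self.coarse_encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.coarse_decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.fine_encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fine_decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.line_entities = nn.Embedding(num_queries, d_model)  # learned queries

    def forward(self, feat_coarse, feat_fine):
        # feat_coarse: (B, H5*W5, C) deep, low-resolution backbone features.
        # feat_fine:   (B, H4*W4, C) shallower, high-resolution features.
        # Positional encodings are omitted for brevity.
        mem_coarse = self.coarse_encoder(feat_coarse)
        queries = self.line_entities.weight.unsqueeze(0).expand(feat_coarse.size(0), -1, -1)
        entities = self.coarse_decoder(queries, mem_coarse)  # coarse line entities
        mem_fine = self.fine_encoder(feat_fine)
        entities = self.fine_decoder(entities, mem_fine)     # refined on fine features
        return entities

# Toy usage: batch of 2, 7x7 coarse grid and 14x14 fine grid, 256-d features.
out = CoarseToFineLETR()(torch.randn(2, 49, 256), torch.randn(2, 196, 256))
```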

The overall LETR pipeline consists of the following four stages:

  • Image Feature Extraction

  • Image Feature Encoding

  • Line Segment Detection

    • line entities $l \in \mathbb{R}^{N \times C}$: a set of N learned entities (queries), each a C-dimensional embedding, refined by the Transformer decoder
  • Line Segment Prediction

    • The entities feed two prediction heads: a line coordinate branch that regresses the two endpoint positions, and a prediction confidence branch that scores whether each entity corresponds to an actual line segment (a minimal sketch of both heads is given below).
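
A minimal sketch of the two heads, assuming a DETR-style design: a small MLP regresses the four endpoint coordinates (squashed into [0, 1] with a sigmoid) and a linear layer produces the confidence logits. Layer sizes, the class count, and all names below are illustrative.

```python
import torch
import torch.nn as nn

class LinePredictionHeads(nn.Module):
    """Illustrative line coordinate + confidence heads on top of line entities."""

    def __init__(self, d_model=256, num_classes=2):
        super().__init__()
        # Coordinate branch: small MLP -> (x1, y1, x2, y2), normalized to [0, 1].
        self.coord_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4),
        )
        # Confidence branch: logits for "line" vs. "no line".
        self.conf_head = nn.Linear(d_model, num_classes)

    def forward(self, entities):
        # entities: (B, N, C) decoder outputs, one embedding per line entity.
        coords = self.coord_head(entities).sigmoid()  # (B, N, 4) endpoint coordinates
        logits = self.conf_head(entities)             # (B, N, num_classes) confidence
        return coords, logits

# Toy usage: 1000 line entities of dimension 256 for a batch of 2 images.
coords, logits = LinePredictionHeads()(torch.randn(2, 1000, 256))
```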

Self-Attention and Cross-Attention

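Inside the decoder, self-attention lets the N line entities interact with one another, which helps suppress duplicate detections, while cross-attention lets each entity gather evidence from the encoded image features. The sketch below builds one such layer from PyTorch's MultiheadAttention; the feed-forward sublayer, dropout, and positional encodings are omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn

class LineDecoderLayer(nn.Module):
    """One decoder layer: entity self-attention, then entity-to-image cross-attention."""

    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, entities, image_feats):
        # entities: (B, N, C) line entity embeddings; image_feats: (B, HW, C) encoder output.
        # Self-attention: line entities attend to each other.
        out, _ = self.self_attn(entities, entities, entities)
        entities = self.norm1(entities + out)
        # Cross-attention: entities query the encoded image features.
        out, _ = self.cross_attn(entities, image_feats, image_feats)
        entities = self.norm2(entities + out)
        return entities

# Toy usage: 1000 entities attending over a 14x14 feature map.
refined = LineDecoderLayer()(torch.randn(2, 1000, 256), torch.randn(2, 196, 256))
```

Stacking several such layers, each with its feed-forward sublayer, and passing the refined entities to the prediction heads sketched earlier completes the detection path.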