1. Motivation
-
Despite its practical and scientific importance, line segment detection remains an unsolved problem in computer vision.
-
Deep learning techniques still rely on heuristics-guided modules such as edge/junction/region detection, line grouping, and post-processing, limiting the scope of their performance enhancement and further development.
-
Can we directly model all the vectorized line segments with a neural network? We are motivated by the following observations for the Transformer frameworks.
-
LETR is built on top of a seminal work, DEtection TRansformer (DETR).
2. Contribution
LETR achieves end-to-end (E2E) line segment detection, removing many heuristic processing steps, and reaches SOTA on two benchmarks.
-
We cast the line segment detection problem in a joint end-to-end fashion without explicit edge/junction/region detection and heuristics-guided perceptual grouping processes, which is in distinction to the existing literature in this domain. We achieve state-of-the-art results on the Wireframe [15] and YorkUrban benchmarks [5].
-
We perform line segment detection using Transformers, based specifically on DETR [4], to realize tokenized entity modeling, perceptual grouping, and joint detection via an integrated encoder-decoder, a self-attention mechanism, and joint query inference within Transformers.
Two innovations: a multi-scale encoder-decoder, and a direct endpoint distance loss tailored to line segment detection.
-
We introduce two new algorithmic aspects to DETR [4]: first, a multi-scale encoder/decoder strategy as shown in Figure 2; second, a direct endpoint distance loss term in training, allowing geometric structures like line segments to be directly learned and detected — something not feasible in the standard DETR bounding box representations.
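The endpoint distance term can be sketched as an L1 distance between matched predicted and ground-truth endpoint pairs. A minimal sketch, assuming endpoints are normalized to [0, 1] and pairs are already matched (e.g. by Hungarian matching, as in DETR); the function name and reduction are assumptions, not LETR's exact implementation:

```python
import torch
import torch.nn.functional as F

def endpoint_distance_loss(pred, target):
    """L1 distance between predicted and ground-truth line endpoints.

    pred, target: (N, 4) tensors holding (x1, y1, x2, y2) per line,
    with coordinates normalized to [0, 1]. Assumes the N prediction/
    target pairs are already matched (e.g. via Hungarian matching).
    """
    return F.l1_loss(pred, target, reduction="mean")

# toy check: identical endpoints give zero loss
pred = torch.tensor([[0.1, 0.2, 0.8, 0.9]])
assert endpoint_distance_loss(pred, pred.clone()).item() == 0.0
```

Unlike a GIoU loss on boxes, this term is well defined for degenerate "boxes" (near-horizontal or near-vertical lines), which is why a direct geometric loss is needed for line segments.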
3. Method
3.1 Overview
-
However, the vanilla DETR still focuses on the bounding box representation with a GIoU loss.
-
We further convert the box predictor in DETR into a vectorized line segment predictor by adapting the losses and enhancing the use of multi-scale features in our designed model.
The full LETR pipeline consists of the following four stages:
-
Image Feature Extraction
-
Image Feature Encoding
-
Line Segment Detection
- l ∈ R^{N×C}
-
Line Segment Prediction
- Two prediction heads: a line coordinate branch that predicts endpoint positions, and a prediction confidence branch.
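The two heads over the decoder output l ∈ R^{N×C} can be sketched as below. A minimal sketch: the three-layer MLP for coordinates follows DETR's box-head convention, and the layer sizes are assumptions, not necessarily LETR's exact configuration.

```python
import torch
import torch.nn as nn

class LinePredictionHeads(nn.Module):
    """Two heads over decoder outputs l of shape (N, C):
    a coordinate branch and a confidence branch."""

    def __init__(self, hidden_dim=256):
        super().__init__()
        # coordinate branch: regress (x1, y1, x2, y2), squashed to [0, 1]
        self.coord = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )
        # confidence branch: line vs. no-line logits
        self.conf = nn.Linear(hidden_dim, 2)

    def forward(self, l):  # l: (N, C) decoder line entities
        return torch.sigmoid(self.coord(l)), self.conf(l)

heads = LinePredictionHeads()
coords, logits = heads(torch.randn(100, 256))
# coords: (100, 4) endpoint positions in [0, 1]; logits: (100, 2)
```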
Self-Attention and Cross-Attention
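The two attention roles in the decoder can be sketched with plain scaled dot-product attention. A minimal single-head sketch without the learned projections, layer norms, or multi-head splitting of the real Transformer; shapes and variable names are illustrative assumptions:

```python
import torch

def attention(q, k, v):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# Self-attention: line entity queries attend to each other,
# modeling relations among candidate segments (perceptual grouping).
queries = torch.randn(100, 256)   # N line queries, C channels
self_out = attention(queries, queries, queries)

# Cross-attention: queries attend to the encoded image features,
# grounding each line entity in image evidence.
memory = torch.randn(1024, 256)   # flattened H*W encoder tokens
cross_out = attention(self_out, memory, memory)
```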
LETR is a Transformer-based end-to-end line segment detection method that discards traditional heuristic modules such as edge detection. With a multi-scale encoder/decoder strategy and a direct endpoint distance loss, LETR achieves state-of-the-art performance on the Wireframe and YorkUrban benchmarks. The method comprises image feature extraction, feature encoding, line segment detection, and prediction, using self-attention and cross-attention mechanisms for line segment modeling and detection.