Text Detection -- Differentiable Binarization


Fiona-Dong


Original paper: https://arxiv.org/abs/1911.08947


1. Abstract

Recently, segmentation-based methods have become quite popular in scene text detection, as segmentation results can more accurately describe scene text of various shapes, such as curved text.

However, binarization is an essential post-processing step for segmentation-based detection: it converts the probability maps produced by a segmentation method into bounding boxes/regions of text.

In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network.

Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection.

Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, where it consistently achieves state-of-the-art results in terms of both detection accuracy and speed.

In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency.

Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset.

2. Introduction

As a key component of scene text reading, scene text detection aims to localize the bounding box or region of each text instance. It is still a challenging task, since scene text often comes in various scales and shapes, including horizontal, multi-oriented and curved text.

Segmentation-based scene text detection has attracted a lot of attention recently, as it can describe text of various shapes, benefiting from its prediction results at the pixel level.

However, most segmentation-based methods require complex post-processing for grouping the pixel-level prediction results into detected text instances, resulting in a considerable time cost in the inference procedure.

Take two recent state-of-the-art methods for scene text detection as examples:

PSENet (Wang et al. 2019a) proposed the post-processing of progressive scale expansion for improving the detection accuracies;

Pixel embedding in (Tian et al. 2019) is used for clustering the pixels based on the segmentation results, which has to calculate the feature distances among pixels.

Most existing detection methods use a similar post-processing pipeline, as shown in Fig. 2:

Firstly, they set a fixed threshold for converting the probability map produced by a segmentation network into a binary image;

Then, some heuristic techniques like pixel clustering are used for grouping pixels into text instances.

Alternatively, our pipeline aims to insert the binarization operation into a segmentation network for joint optimization.

In this manner, the threshold value at every place of an image can be adaptively predicted, which can fully distinguish the pixels from the foreground and background.
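The difference between a fixed threshold and a per-pixel threshold can be illustrated with a toy example. This is only a sketch: the array values, the fixed threshold of 0.3, and the function names are illustrative assumptions, not taken from the paper. In the detector the threshold map is predicted by the network; here it is written by hand.

```python
import numpy as np

def binarize_fixed(prob_map, t=0.3):
    """Binarize with one global threshold (t is an illustrative value)."""
    return (prob_map >= t).astype(np.uint8)

def binarize_adaptive(prob_map, thresh_map):
    """Binarize with a separate threshold for every pixel."""
    return (prob_map >= thresh_map).astype(np.uint8)

# Toy 1x6 probability row: two text instances separated by a gap pixel
# (index 2) whose probability is still fairly high.
prob = np.array([[0.9, 0.8, 0.5, 0.8, 0.9, 0.1]])

# Hand-written threshold map that is higher at the gap between instances.
thresh = np.array([[0.3, 0.3, 0.7, 0.3, 0.3, 0.3]])

print(binarize_fixed(prob))             # the gap pixel is kept, merging the two instances
print(binarize_adaptive(prob, thresh))  # the gap pixel is rejected, keeping them apart
```

With the fixed threshold the two instances merge into one foreground run, while the per-pixel threshold separates them.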

However, since the standard binarization function is not differentiable, we instead present an approximate function for binarization called Differentiable Binarization (DB), which is fully differentiable when trained along with a segmentation network.
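As a rough illustration, the referenced paper defines the approximate function as a steep sigmoid of the difference between the probability map and the threshold map, B̂ = 1 / (1 + exp(-k (P - T))), with an amplifying factor k (the paper sets k = 50). The NumPy sketch below assumes that form; the function names are hypothetical, and the gradient is computed by hand rather than by an autograd framework.

```python
import numpy as np

def standard_binarization(p, t):
    """Hard step function: zero gradient almost everywhere, so not trainable."""
    return (np.asarray(p) >= np.asarray(t)).astype(np.float64)

def differentiable_binarization(p, t, k=50.0):
    """Steep sigmoid of (p - t); k is the amplifying factor (50 in the paper)."""
    p = np.asarray(p, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-k * (p - t)))

def db_grad_wrt_p(p, t, k=50.0):
    """Analytic derivative of the DB function with respect to p."""
    b = differentiable_binarization(p, t, k)
    return k * b * (1.0 - b)

# Far from the threshold, DB behaves like the step function;
# near the threshold it has a large, usable gradient.
print(differentiable_binarization(0.9, 0.3))  # very close to 1
print(differentiable_binarization(0.1, 0.3))  # very close to 0
print(db_grad_wrt_p(0.3, 0.3))                # k / 4 at p == t
```

Because the derivative with respect to t is the negative of the derivative with respect to p, gradients flow to both the probability and the threshold predictions, which is what allows the thresholds to be learned.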

The major contribution in this paper is the proposed DB module that is differentiable, which makes the process of binarization end-to-end trainable in a CNN.

By combining a simple network for semantic segmentation and the proposed DB module, we propose a robust and fast scene text detector.

From the performance evaluation of the DB module, we find that our detector has several prominent advantages over previous state-of-the-art segmentation-based approaches:

Our method achieves consistently better performance on five benchmark datasets of scene text, including horizontal, multi-oriented and curved text.
Our method performs much faster than the previous leading methods, as DB can provide a highly robust binarization map, significantly simplifying the post-processing.
DB works quite well when using a light-weight backbone, which significantly enhances the detection performance with the backbone of ResNet-18.
As DB can be removed in the inference stage without sacrificing the performance, there is no extra memory/time cost for testing.

3. Methodology

The architecture of our proposed method is shown in Fig. 3:

Firstly, the input image is fed into a feature-pyramid backbone.

Secondly, the pyramid features are up-sampled to the same scale and cascaded to produce feature F.
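The up-sampling and fusion step might look roughly like the sketch below. It assumes that "cascaded" means channel-wise concatenation, uses nearest-neighbour up-sampling for simplicity, and requires every level to be an integer fraction of the largest level; the function names and the shapes are illustrative, not taken from the paper.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour up-sampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_pyramid(features):
    """Up-sample all levels to the size of the first (largest) level
    and concatenate them along the channel axis to form F."""
    target_h = features[0].shape[1]
    upsampled = []
    for feat in features:
        factor = target_h // feat.shape[1]  # assumes an integer scale factor
        upsampled.append(upsample_nearest(feat, factor))
    return np.concatenate(upsampled, axis=0)

# Four hypothetical pyramid levels with 4 channels each.
pyramid = [np.random.rand(4, 8 // s, 8 // s) for s in (1, 2, 4, 8)]
fused = fuse_pyramid(pyramid)
print(fused.shape)  # (16, 8, 8)
```

A real network would typically use learned or bilinear up-sampling inside the deep-learning framework; the sketch only shows how the shapes line up.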

Then, feature F is used to predict both the probability map (P) and the threshold map (T).

After that, the approximate binary map (B̂) is calculated from P and T.

In the training period, the supervision is applied on the probability map, the threshold map, and the approximate binary map, where the probability map and the approximate binary map share the same supervision.
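One way to combine the three supervision signals is a weighted sum of a loss on the probability map, a loss on the approximate binary map (against the same label), and a loss on the threshold map. The sketch below assumes binary cross-entropy for the first two and an L1 loss for the third, with weights alpha = 1 and beta = 10 as in the referenced paper; it omits details such as hard negative mining and restricting the threshold loss to the text border region, and all function and argument names are hypothetical.

```python
import numpy as np

def bce(pred, target, eps=1e-6):
    """Mean binary cross-entropy between a predicted map and a 0/1 label."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def l1(pred, target):
    """Mean absolute error between two maps."""
    return float(np.mean(np.abs(pred - target)))

def db_training_loss(prob_map, thresh_map, binary_map,
                     text_label, thresh_label, alpha=1.0, beta=10.0):
    """Weighted sum of the three supervised terms.

    prob_map and binary_map are both compared with text_label,
    i.e. they share the same supervision.
    """
    loss_prob = bce(prob_map, text_label)
    loss_binary = bce(binary_map, text_label)
    loss_thresh = l1(thresh_map, thresh_label)
    return loss_prob + alpha * loss_binary + beta * loss_thresh
```

For example, calling `db_training_loss(P, T, B_hat, text_label, thresh_label)` with maps of the same shape returns a single scalar that can be minimized during training.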

In the inference period, the bounding boxes can be obtained easily from the approximate binary map or the probability map by a box formulation module.
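A heavily simplified box formation step is sketched below: binarize the map with a constant threshold, group foreground pixels into 4-connected regions, and report one axis-aligned box per region. The threshold value, the function name, and the use of axis-aligned boxes are assumptions for illustration; the sketch leaves out any polygon fitting or region expansion that a full implementation would apply.

```python
from collections import deque
import numpy as np

def boxes_from_map(score_map, threshold=0.3):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected region
    of pixels whose score is at least `threshold` (illustrative value)."""
    binary = score_map >= threshold
    h, w = binary.shape
    visited = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y, x] or visited[y, x]:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(y, x)])
            visited[y, x] = True
            y_min = y_max = y
            x_min = x_max = x
            while queue:
                cy, cx = queue.popleft()
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes

# Toy map with two separate high-score regions.
score = np.zeros((5, 8))
score[1:3, 1:4] = 0.9
score[3:5, 5:8] = 0.8
print(boxes_from_map(score))  # [(1, 1, 3, 2), (5, 3, 7, 4)]
```

The same function can be applied to either the probability map or the approximate binary map, since both are score maps in the range 0 to 1.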

References

Real-time Scene Text Detection with Differentiable Binarization
