PPTs

1. 2019.4.9    Shape Robust Text Detection with Progressive Scale Expansion Network (PSENet)

     Affiliations: DeepInsight@PCALab, Nanjing University of Science and Technology; National Key Lab for Novel Software Technology, Nanjing University

Current difficulties in text detection (methods based on quadrangular bounding box regression cannot locate text of arbitrary shape well; methods based on semantic segmentation cannot separate text instances that lie very close together): The challenges of shape robust text detection lie in two aspects: 1) most existing quadrangular bounding box based detectors have difficulty locating texts with arbitrary shapes, which are hard to enclose perfectly in a rectangle; 2) most pixel-wise segmentation-based detectors may not separate text instances that are very close to each other.

Figure: detection results of different methods. (a) the original image; (b) a bounding box regression-based method; (c) a semantic segmentation-based method; (d) the result of the proposed PSENet.

 

The paper's solution: propose a novel Progressive Scale Expansion Network (PSENet), a segmentation-based detector with multiple predictions for each text instance (in effect an instance segmentation network). These predictions correspond to different “kernels” (not kernels in the convolution sense) produced by shrinking the original text instance into various scales.

Because there are large geometrical margins among these minimal kernels, the method is effective at distinguishing adjacent text instances, and it is robust to arbitrary shapes (because it is segmentation-based).

Results: state-of-the-art results on ICDAR 2015 and ICDAR 2017 MLT; PSENet outperforms the previous best record by an absolute 6.37% on the curved text dataset SCUT-CTW1500.

Introduction: Methods based on bounding box regression locate text targets as rectangles or quadrangles with certain orientations. Drawback: they cannot detect text instances with arbitrary shapes.

Semantic segmentation-based methods can explicitly handle the curve text detection problem. Drawback: they may still fail to separate two text instances that are relatively close, because their shared adjacent boundaries will probably merge them into one single text instance.

Two advantages of the proposed PSENet:

1. First, as a segmentation-based method, PSENet is able to locate texts with arbitrary shapes.

2. Second, it puts forward a progressive scale expansion algorithm, with which closely adjacent text instances can be identified successfully (the PSE algorithm handles the adjacent-text problem that ordinary semantic segmentation cannot).

Method in detail:

1) Assign each text instance multiple predicted segmentation areas (see the figure below) and denote these segmentation areas as “kernels” (a kernel is a segmentation region). For one text instance, there are several corresponding kernels.

Properties of the kernels:

1) Each kernel has a similar shape to the original entire text instance.

2) They are all located at the same central point but differ in scale.

How the final detections are obtained: To obtain the final detections, we adopt the progressive scale expansion algorithm. It is based on Breadth-First-Search (BFS).

The progressive scale expansion algorithm is composed of 3 steps:

1) starting from the kernels with minimal scales (instances can be distinguished in this step);

2) expanding their areas by gradually involving more pixels in larger kernels;

3) finishing when the largest kernels have been explored.

The motivations of the progressive scale expansion:

1) The minimal kernels are quite easy to separate, as their boundaries are far away from each other. This overcomes the drawback of earlier semantic segmentation-based methods.

2) The largest kernels are indispensable for achieving the final precise detections (which is why expansion has to reach the largest kernel).

3) The kernels grow gradually from small to large scales, and the resulting smooth supervision makes the network much easier to train.

4) The PSE algorithm ensures accurate locations of text instances, as their boundaries are expanded in a careful and gradual manner.

Contributions:

1) Propose a progressive scale expansion algorithm which is able to accurately separate text instances standing close to each other.

2) Propose a novel Progressive Scale Expansion Network (PSENet) which can precisely detect text instances with arbitrary shapes.

3) PSENet significantly surpasses the state-of-the-art methods on the curve text detection dataset SCUT-CTW1500. Furthermore, it also achieves competitive results on the regular quadrangular text benchmarks ICDAR 2015 and ICDAR 2017 MLT.

History of text detection: horizontal text detection -> oriented text detection -> irregular quadrangle text detection (corner detection can be used here: detecting the corner points yields an irregular polygon) -> curved text detection. That is, from horizontal rectangle to rotated rectangle, further to irregular quadrangle, and finally to curved text detection.

 

 

Overall pipeline:

The left part is implemented from FPN [16]. The right part denotes the feature fusion and the progressive scale expansion algorithm. Features are fused by concatenation. Low-level feature maps are first merged with high-level feature maps at four places (by addition). Three of the resulting maps are then upsampled, and all four are concatenated to give F (to obtain receptive fields at multiple scales; intuitively, such fusion is very likely to facilitate the generation of kernels with various scales). F is then projected into N branches to produce N segmentation maps S1, S2, ..., Sn (scales from small to large; Sn denotes the original segmentation mask). The scale of each segmentation map is decided by hyper-parameters. The kernels are expanded step by step from S1 to their complete shapes in Sn to obtain the final detection results R.
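The fusion step can be sketched on toy data to make the shapes concrete. This is a minimal illustration, not the paper's implementation: the names `upsample_nearest` and `fuse` are made up for this note, feature maps are plain nested lists shaped [channels][height][width], the ×2, ×4 and ×8 factors assume the usual FPN layout where each level halves the resolution, and nearest-neighbour upsampling is an assumption.

```python
def upsample_nearest(feature, factor):
    """Repeat every pixel factor x factor times (nearest-neighbour upsampling).

    feature is a nested list shaped [channels][height][width].
    """
    out = []
    for channel in feature:
        rows = []
        for row in channel:
            # Widen the row, then repeat the widened row `factor` times.
            wide = [value for value in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def fuse(p2, p3, p4, p5):
    """Concatenate four pyramid levels along the channel axis.

    Assumes p3, p4, p5 are 2x, 4x and 8x smaller than p2, so after
    upsampling all four share p2's spatial size.
    """
    return (p2
            + upsample_nearest(p3, 2)
            + upsample_nearest(p4, 4)
            + upsample_nearest(p5, 8))
```

With one channel per level and an 8×8 map for `p2`, `fuse` returns four 8×8 channels; a real network would follow this with convolutions that project F into the N segmentation branches.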

 

Progressive Scale Expansion Algorithm

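Its main idea is borrowed from the Breadth-First-Search (BFS) algorithm. Take, for example, three segmentation maps S = {S1, S2, S3}.

The progressive scale expansion procedure is:

1) From the smallest kernel map S1, find 4 distinct connected components (these are the central parts of all the text instances), shown in four different colors in (b).

2) Expand S1 by merging the pixels of S2 and then S3 in turn.

3) Extract the connected components in (d) as the final detection results.

The scale expansion is illustrated in Fig. 3 (g). Expansion is based on the Breadth-First-Search algorithm. During expansion there may be conflicting pixels, such as the pixel marked 2 in (g). The rule for conflicts is that a conflicting pixel is merged by only one kernel, on a first-come-first-served basis.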
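The expansion procedure can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names are made up for this note, kernel maps are 0/1 nested lists ordered from smallest (S1) to largest (Sn), and 4-connectivity is assumed. The first-come-first-served rule falls out of the queue order: a pixel that is already labeled is never relabeled.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumption)


def label_components(mask):
    """Label connected components of a 0/1 grid with BFS; returns (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def progressive_scale_expansion(kernels):
    """Grow the components of kernels[0] through each larger kernel in turn."""
    labels, count = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Seed the queue with every pixel labeled so far.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and not labels[ny][nx]):
                    # First come, first served: only unlabeled pixels are claimed.
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels, count
```

On a one-row example with two seeds in S1 and a single merged blob in S2, the blob is split between the two instances instead of being reported as one region, which is exactly the failure of plain semantic segmentation that the algorithm is meant to avoid.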

 

Label Generation

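Because the network produces segmentation results (e.g. S1, S2, ..., Sn) with different kernel scales, training needs ground truths for each kernel scale. These labels can be produced simply by shrinking the original text instance. To obtain the shrunk masks in Fig. 4 (c) in turn, the Vatti clipping algorithm [28] is used to shrink the original polygon pn inward along all edges by di pixels, giving the shrunk polygon pi (see Fig. 4 (a)). Each shrunk polygon pi is then converted into a 0/1 binary mask that serves as the segmentation ground truth. These ground-truth maps are denoted G1, G2, ..., Gn respectively. Mathematically, if the scale ratio is ri, the margin di between pn and pi can be calculated as:

         di = Area(pn) × (1 − ri²) / Perimeter(pn)

where ri = 1 − (1 − m) × (n − i) / (n − 1). Here Area(·) and Perimeter(·) compute the area and perimeter of a polygon, m is the minimal scale ratio, and n is the number of segmentation maps.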
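A small worked example helps check the shrink margin. The sketch below assumes the formulas from the PSENet paper, ri = 1 − (1 − m)(n − i)/(n − 1) and di = Area(pn)(1 − ri²)/Perimeter(pn), and only computes the offsets; the actual polygon shrinking (Vatti clipping) is not implemented here. The helper names are made up for this note, and n must be at least 2.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    points is a list of (x, y) vertices in order.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]))


def scale_ratio(i, n, m):
    """r_i = 1 - (1 - m) * (n - i) / (n - 1); requires n >= 2."""
    return 1.0 - (1.0 - m) * (n - i) / (n - 1)


def shrink_offset(points, i, n, m):
    """d_i = Area(p_n) * (1 - r_i^2) / Perimeter(p_n), in pixels."""
    r = scale_ratio(i, n, m)
    return polygon_area(points) * (1.0 - r * r) / polygon_perimeter(points)
```

For a 100×20 rectangle with the illustrative values n = 6 and m = 0.5, the smallest kernel (i = 1) has r1 = 0.5 and is shrunk by 2000 × 0.75 / 240 = 6.25 pixels, while the largest (i = 6) has r6 = 1 and is not shrunk at all.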

 

Loss Function

L = λLc + (1 − λ)Ls;

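where Lc and Ls are the losses for the complete text instances and the shrunk ones respectively, and λ balances the importance between Lc and Ls.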
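The weighted sum can be sketched as follows. The notes do not say which loss is used for each map, so this sketch assumes a dice-style loss per map and averages it over the shrunk maps to get Ls; the value `lam=0.7` is only an illustrative default, and OHEM and any pixel masking are left out. Maps are flattened lists of floats, with the last entry of `preds` and `targets` being the complete text map (Sn and Gn).

```python
def dice_loss(pred, target, eps=1e-6):
    """1 - dice coefficient between two flattened maps (assumed per-map loss)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def psenet_loss(preds, targets, lam=0.7):
    """L = lam * Lc + (1 - lam) * Ls.

    preds / targets: lists [S1, ..., Sn] / [G1, ..., Gn] of flattened maps,
    smallest kernel first; needs at least two maps.
    """
    # Lc: loss on the complete text instances (the last, largest map).
    l_c = dice_loss(preds[-1], targets[-1])
    # Ls: mean loss over the shrunk kernels (all maps except the last).
    shrunk = [dice_loss(p, t) for p, t in zip(preds[:-1], targets[:-1])]
    l_s = sum(shrunk) / len(shrunk)
    return lam * l_c + (1.0 - lam) * l_s
```

With a perfect prediction on the shrunk maps and a completely wrong one on the full map, the total is close to `lam`, which shows how λ shifts the weight between the two terms.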

We adopt Online Hard Example Mining (OHEM) to better distinguish objects that resemble text.

One open question: after horizontal flipping and cropping, how are the labels kept in correspondence with the image?
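One common way to handle this, offered as an assumption and not something these notes confirm, is to rasterize the label masks first and then apply the same random flip and the same crop window to the image and to every mask, so pixels stay aligned by construction. A minimal sketch on nested lists, with made-up helper names:

```python
import random


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def crop(grid, top, left, height, width):
    """Cut a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in grid[top:top + height]]


def augment_together(image, masks, crop_size, rng=random):
    """Apply one random flip and one random crop to the image and all masks.

    The random decisions are drawn once and reused for every grid, which is
    what keeps the labels in correspondence with the image.
    """
    grids = [image] + list(masks)
    if rng.random() < 0.5:
        grids = [hflip(g) for g in grids]
    h, w = len(image), len(image[0])
    crop_h, crop_w = crop_size
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    grids = [crop(g, top, left, crop_h, crop_w) for g in grids]
    return grids[0], grids[1:]
```

Passing the image itself as one of the masks is a quick sanity check: the two outputs must be identical whatever the random draw.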

 

Experimental results

 

 

 

 

 

 

 
