【Paper Reading】【TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes】

Latest recommended article published on 2020-04-23 17:45:32

surfman777

Category: Scene Text Detection

Article link: https://blog.csdn.net/Ocelot777/article/details/104730167

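TextSnake represents a text instance as a sequence of overlapping disks. Each disk is centered on the center line of the text region and carries a radius and an orientation. The radius r is half the local width of the text, and the orientation θ is the tangential direction of the center line at the disk center c.
Disks do not correspond one-to-one to characters.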
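
To make the disk representation concrete, here is a minimal sketch that rebuilds a text region as the union of its disks. The `Disk` class and the `rasterize` helper are hypothetical names introduced for illustration only; they are not taken from the paper or from any released code.

```python
import math
from dataclasses import dataclass


@dataclass
class Disk:
    cx: float     # center x, lies on the text center line
    cy: float     # center y
    r: float      # radius = half of the local text width
    theta: float  # orientation of the center line at the center (radians)


def rasterize(disks, height, width):
    """Return a height x width 0/1 mask covering the union of all disks."""
    mask = [[0] * width for _ in range(height)]
    for d in disks:
        # Only visit the bounding box of each disk, clipped to the image.
        y0 = max(0, math.floor(d.cy - d.r))
        y1 = min(height - 1, math.ceil(d.cy + d.r))
        x0 = max(0, math.floor(d.cx - d.r))
        x1 = min(width - 1, math.ceil(d.cx + d.r))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if (x - d.cx) ** 2 + (y - d.cy) ** 2 <= d.r ** 2:
                    mask[y][x] = 1
    return mask


# Three overlapping disks along a horizontal center line (theta = 0).
disks = [Disk(cx=4 + 3 * i, cy=5, r=3, theta=0.0) for i in range(3)]
mask = rasterize(disks, height=11, width=15)
```

The orientation `theta` is stored with each disk but is not needed to compute the union. Because neighbouring disks overlap, the covered area follows the center line smoothly even when the text is curved.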
Network outputs
- score map of the text center line (TCL)
- score map of text regions (TR)
- radius r, sinθ, cosθ
- the TR map is used as a mask for the TCL map, since the TCL lies inside the TR
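
The sketch below shows how the outputs of a single pixel might be decoded and how the TR map can mask the TCL map. The channel order, the function names, and the 0.5 thresholds are assumptions made for illustration; the notes do not specify them.

```python
import math


def softmax2(neg, pos):
    """Numerically stable two-way softmax; returns the positive-class probability."""
    m = max(neg, pos)
    e_neg, e_pos = math.exp(neg - m), math.exp(pos - m)
    return e_pos / (e_neg + e_pos)


def decode_pixel(logits):
    """Decode one pixel of a 7-channel output.

    Assumed (hypothetical) channel order:
    [tr_neg, tr_pos, tcl_neg, tcl_pos, r, cos, sin].
    """
    tr_neg, tr_pos, tcl_neg, tcl_pos, r, cos_t, sin_t = logits
    tr_prob = softmax2(tr_neg, tr_pos)
    tcl_prob = softmax2(tcl_neg, tcl_pos)
    # Rescale (cos, sin) so that cos^2 + sin^2 = 1; guard against a zero vector.
    norm = math.hypot(cos_t, sin_t) or 1.0
    return tr_prob, tcl_prob, r, cos_t / norm, sin_t / norm


def masked_tcl(tr_prob, tcl_prob, tr_thresh=0.5, tcl_thresh=0.5):
    """A pixel counts as TCL only if it is also classified as TR.

    Both thresholds are illustrative defaults, not values from the notes.
    """
    return tr_prob > tr_thresh and tcl_prob > tcl_thresh


tr, tcl, r, c, s = decode_pixel([0.0, 2.0, 1.0, 1.0, 4.0, 3.0, 4.0])
```

In a real pipeline the same operations would be applied to whole maps at once; the per-pixel form is used here only to keep the sketch dependency-free.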
Network architecture
- backbone: VGG16/19 or ResNet, without the FC layers
- feature merging network (U-Net style)
- outputs:
  - 7 channels: 4 for the logits of TR/TCL, 3 for r, cosθ, sinθ.
  - softmax is applied to the TR/TCL logits; cosθ and sinθ are normalized so that their squares sum to 1
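  - striding algorithm
    a) Centralizing: pick a random pixel in the TCL as the starting point and draw its tangent line and normal line. The midpoint of the segment where the normal line intersects the TCL area is the centralized point.
    b) Striding: search in the two opposite directions along the center line until the ends are reached.
    c) Sliding: iterate along the central axis and draw a circle at each point.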
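  - filter out false positive text instances:
    a) the number of TCL pixels must be at least 0.2 times the average radius
    b) at least half of the pixels in the reconstructed text area must be classified as TR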
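  - label generation
  - training objectives
  - L_tr and L_tcl are cross-entropy losses. The TR loss uses online hard negative mining (OHNM) with a negative-to-positive ratio of 3:1. The remaining losses are Smoothed L1 losses, e.g. L_r = SmoothedL1((R - r) / r), where R is the predicted radius and r is the ground-truth radius.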
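
Two of the rules above are easy to make concrete: the false-positive filter and the radius loss with hard negative selection. The following is a rough sketch under stated assumptions: all function names are hypothetical, the Smoothed L1 breakpoint at 1 is the common convention rather than something stated in the notes, and the losses are averaged over pixels for simplicity.

```python
def keep_instance(num_tcl_pixels, radii, num_recon_pixels, num_recon_in_tr):
    """Apply the two false-positive rules to one candidate text instance."""
    avg_radius = sum(radii) / len(radii)
    # Rule a): the TCL must contain at least 0.2 * average radius pixels.
    enough_tcl = num_tcl_pixels >= 0.2 * avg_radius
    # Rule b): at least half of the reconstructed area must be classified as TR.
    mostly_tr = num_recon_in_tr >= 0.5 * num_recon_pixels
    return enough_tcl and mostly_tr


def smooth_l1(x):
    """Smoothed L1 with the breakpoint at |x| = 1 (assumed convention)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def radius_loss(pred_r, gt_r):
    """Mean of SmoothedL1((R - r) / r) over the given pixels."""
    terms = [smooth_l1((R - r) / r) for R, r in zip(pred_r, gt_r)]
    return sum(terms) / len(terms)


def ohnm_select(neg_losses, num_pos, ratio=3):
    """Keep the hardest negatives, at most ratio * num_pos of them."""
    k = min(len(neg_losses), ratio * num_pos)
    return sorted(neg_losses, reverse=True)[:k]


loss = radius_loss([6.0, 4.0], [4.0, 4.0])
hard = ohnm_select([0.1, 0.9, 0.5, 0.3, 0.7], num_pos=1)
```

Dividing the radius error by r makes the loss relative, so a two-pixel error on a thin text line is penalized more than the same error on a thick one.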
