Arbitrary-Oriented Scene Text Detection via Rotation Proposals


Monte0539


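Paper: https://arxiv.org/abs/1703.01086

GitHub: https://github.com/mjq11302010044/RRPN

The paper applies the Faster R-CNN framework to scene text detection.

Key contribution: generating inclined proposals that carry the orientation angle of the text.

1. RRPN (Rotation Region Proposal Network): generates anchors with angle information, which in turn produce proposals of arbitrary orientation.

2. RRoI (Rotation Region-of-Interest) pooling layer: projects arbitrary-oriented proposals onto the feature map and then applies max pooling.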

RRPN:

Data preprocessing:

Ground truth of a text region: (x, y, h, w, θ)

Here (x, y) is the geometric center of the bounding box; h is its short side and w its long side; θ is the rotation angle of the long side, in the range [-π/4, 3π/4).
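As an illustration of this representation, the sketch below brings an arbitrary rectangle into the (x, y, h, w, θ) form. The function names `normalize_angle` and `to_rrpn_box` are hypothetical and not taken from the RRPN repository; the input is assumed to be a center, two side lengths, and the angle (in radians) of the first side.

```python
import math


def normalize_angle(theta: float) -> float:
    """Shift theta by multiples of pi until it lies in [-pi/4, 3*pi/4)."""
    low = -math.pi / 4
    return (theta - low) % math.pi + low


def to_rrpn_box(x, y, side_a, side_b, angle_a):
    """Return (x, y, h, w, theta) with w the long side and theta its angle.

    Assumption: angle_a is the direction of side_a; side_b is perpendicular.
    """
    if side_a >= side_b:
        w, h, theta = side_a, side_b, angle_a
    else:
        # The long side is side_b, which is rotated by pi/2 relative to side_a.
        w, h, theta = side_b, side_a, angle_a + math.pi / 2
    return (x, y, h, w, normalize_angle(theta))
```

Because a rectangle is unchanged by a rotation of π, adding or subtracting π never changes the box, which is why a half-open interval of length π is enough.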

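Anchors:

1. Angle: -π/6, 0, π/6, π/3, π/2, and 2π/3

The rotation range of a bounding box is defined as [-π/4, 3π/4), and the anchor angles are -π/6, 0, π/6, π/3, π/2, and 2π/3.

The rotation angle of the target assigned to an anchor must differ from the anchor's own angle by no more than π/12; this interval is called the fit domain.

The target angle range for each anchor is therefore:

Anchor angle	Target angle range
-π/6	[-π/4, -π/12)
0	[-π/12, π/12)
π/6	[π/12, π/4)
π/3	[π/4, 5π/12)
π/2	[5π/12, 7π/12)
2π/3	[7π/12, 3π/4)

As the table shows, the angle difference between an anchor and its target never exceeds π/12.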
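The table can be turned into a small lookup. This is only a sketch: `fit_anchor_angle` is a hypothetical helper, and it assumes the target angle has already been brought into [-π/4, 3π/4).

```python
import math

# The six anchor angles listed above, in increasing order.
ANCHOR_ANGLES = [
    -math.pi / 6, 0.0, math.pi / 6, math.pi / 3, math.pi / 2, 2 * math.pi / 3,
]


def fit_anchor_angle(target_theta: float) -> float:
    """Return the anchor angle whose fit domain contains target_theta."""
    low = -math.pi / 4
    if not (low <= target_theta < 3 * math.pi / 4):
        raise ValueError("target_theta must lie in [-pi/4, 3*pi/4)")
    # Each fit domain is pi/6 wide, starting at -pi/4.
    index = int((target_theta - low) // (math.pi / 6))
    # Guard against floating-point rounding at the upper edge.
    index = min(index, len(ANCHOR_ANGLES) - 1)
    return ANCHOR_ANGLES[index]
```

Each fit domain is centered on its anchor angle with half-width π/12, so the six domains tile the whole range without gaps or overlap.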

2. Aspect ratio: 1:2, 1:5, 1:8

3. Scale: 8, 16, 32

Each point on the feature map therefore generates 6*3*3 = 54 anchors.
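The count of 54 can be checked by enumerating the combinations. The sketch below only lists the (scale, aspect ratio, angle) configurations; how a configuration is turned into pixel sizes is left out, and `anchor_configs` is a hypothetical name.

```python
import itertools
import math

ANGLES = [-math.pi / 6, 0.0, math.pi / 6, math.pi / 3, math.pi / 2, 2 * math.pi / 3]
ASPECT_RATIOS = [2, 5, 8]  # long side divided by short side, i.e. 1:2, 1:5, 1:8
SCALES = [8, 16, 32]


def anchor_configs():
    """All (scale, aspect_ratio, angle) combinations for one feature-map point."""
    return list(itertools.product(SCALES, ASPECT_RATIOS, ANGLES))
```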

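Apart from the added angle information, anchors are generated in the same way as in Faster R-CNN.

A Scale Jittering strategy is used during filtering.

Steps:

1. Add a border padding of 0.25 times the original side length to the image.

2. From the anchor parameters (x, y, w, h, θ), compute the coordinates of the anchor's four vertices.

3. Check whether the four vertices lie inside the new boundary.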
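The three steps above can be sketched as follows. Both function names are hypothetical, the padding is assumed to be applied on every side, and the rotation direction (counter-clockwise for positive θ in axes with y pointing up) is an assumption that does not affect the idea of the check.

```python
import math


def anchor_corners(x, y, w, h, theta):
    """Four vertices of a rectangle centered at (x, y) with long side w at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each half-extent offset by theta, then shift to the center.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]


def inside_padded_image(corners, img_w, img_h, pad_ratio=0.25):
    """True if every corner lies inside the image enlarged by the border padding."""
    pad_x, pad_y = pad_ratio * img_w, pad_ratio * img_h
    return all(
        -pad_x <= px <= img_w + pad_x and -pad_y <= py <= img_h + pad_y
        for px, py in corners
    )
```

For a 100×100 image the padded boundary spans -25 to 125 on both axes, so an anchor that sticks out of the image by up to 25 pixels is still kept.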

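IoU:

Idea: first compute the intersection points of the edges of the two rectangles, collect them together with any vertices of one rectangle that lie inside the other to form a polygon, and finally split the polygon into triangles to compute its area.

Pseudocode: [figure omitted]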
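A minimal sketch of rotated IoU follows. It is not the paper's pseudocode: instead of collecting and sorting intersection points by hand, it uses Sutherland-Hodgman polygon clipping to obtain the same intersection polygon, and the shoelace formula in place of an explicit triangle fan. Boxes are assumed to be (x, y, w, h, θ) tuples with θ in radians, and all function names are hypothetical.

```python
import math


def rect_corners(x, y, w, h, theta):
    """Counter-clockwise corners of a rotated rectangle (axes with y pointing up)."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]


def polygon_area(points):
    """Shoelace formula; equals the summed signed areas of a triangle fan."""
    area = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` by the convex counter-clockwise polygon `clip`."""
    output = subject
    for i in range(len(clip)):
        ax, ay = clip[i]
        bx, by = clip[(i + 1) % len(clip)]
        inputs, output = output, []
        if not inputs:
            break

        def side(p):
            # Positive when p is to the left of the directed edge a -> b (inside).
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        for j in range(len(inputs)):
            cur, nxt = inputs[j], inputs[(j + 1) % len(inputs)]
            sc, sn = side(cur), side(nxt)
            if sc >= 0:
                output.append(cur)
            if (sc > 0 and sn < 0) or (sc < 0 and sn > 0):
                # The segment crosses the clip edge: add the crossing point.
                t = sc / (sc - sn)
                output.append((cur[0] + t * (nxt[0] - cur[0]),
                               cur[1] + t * (nxt[1] - cur[1])))
    return output


def rotated_iou(box_a, box_b):
    """IoU of two rotated rectangles given as (x, y, w, h, theta)."""
    poly = clip_polygon(rect_corners(*box_a), rect_corners(*box_b))
    inter = polygon_area(poly) if len(poly) >= 3 else 0.0
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

For example, a 4×2 box and the same box rotated by π/2 about the shared center overlap in a 2×2 square, giving an IoU of 4 / (8 + 8 - 4) = 1/3.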

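RRoI:

RRoI pooling algorithm:

1) Divide each proposal into 7×7 sub-regions.

2) Apply an affine transformation to the four corner points of each sub-region to obtain the corresponding parallelogram.

3) Max-pool over the parallelogram on the feature map.

Pseudocode: [figure omitted]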

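Affine transformation:

Reference: https://www.cnblogs.com/ghj1976/p/5199086.html. Its explanation is excellent, but some of the matrices it lists contain errors, which are corrected in this post.

1) Concept

An affine transformation allows a figure to be sheared arbitrarily and stretched or shrunk independently along two directions. It preserves incidence (concurrent lines stay concurrent and collinear points stay collinear), keeps parallel lines parallel, keeps midpoints as midpoints, and preserves the ratios between segments lying on the same line.

However, an affine transformation does not preserve segment lengths, nor does it preserve angles.

An affine transformation can be written as the following formula: [formula figure omitted]

Here (tx, ty) is the translation, and the parameters ai capture rotation, scaling, and similar changes. Once tx, ty, and ai (i = 1 to 4) are computed, the coordinate mapping between the two figures is known.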
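As a reconstruction consistent with the parameters named above, a 2D affine transformation is commonly written as follows. The placement of a_1 to a_4 inside the matrix and the use of column vectors in homogeneous coordinates are assumptions.

```latex
\begin{aligned}
x' &= a_1 x + a_2 y + t_x \\
y' &= a_3 x + a_4 y + t_y
\end{aligned}
\qquad\Longleftrightarrow\qquad
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} a_1 & a_2 & t_x \\ a_3 & a_4 & t_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
```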

2) Transformations used in RRPN:

a) Translation

Every point is moved to (x+tx, y+ty). The transformation matrix is: [figure omitted]
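Assuming column vectors in homogeneous coordinates, the translation matrix can be reconstructed as:

```latex
T(t_x, t_y) =
\begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
T(t_x, t_y)\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + t_x \\ y + t_y \\ 1 \end{pmatrix}
```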

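Translation does not deform the figure.

Effect: [figure omitted]

b) Rotation

The figure is rotated clockwise about the origin by θ radians. The transformation matrix is: [figure omitted]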
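A reconstruction of the clockwise rotation matrix is given below, assuming column vectors and a y-axis that points up; in image coordinates, where y points down, the same matrix appears as a counter-clockwise rotation. Its sign pattern matches how Alpha = cos(angle) and Beta = sin(angle) enter the matrix M in the Caffe code later in this post.

```latex
R(\theta) =
\begin{pmatrix}
\cos\theta & \sin\theta & 0 \\
-\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{pmatrix}
```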

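Effect: [figure omitted]

c) Scale

The x coordinate of every point is scaled by a factor of sx and the y coordinate by a factor of sy. The transformation matrix is: [figure omitted]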
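Under the same column-vector assumption, the scaling matrix can be reconstructed as:

```latex
S(s_x, s_y) =
\begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
S(s_x, s_y)\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
=
\begin{pmatrix} s_x x \\ s_y y \\ 1 \end{pmatrix}
```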

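Effect: [figure omitted]

d) Composition

The figure is rotated clockwise by θ radians about the pivot (x, y). The transformation matrix is: [figure omitted]

This is equivalent to two translations combined with one rotation about the origin: first translate the pivot to the origin, then rotate, then translate back.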
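The composition can be checked numerically. This sketch assumes column vectors, 3×3 homogeneous matrices stored as nested lists, and the clockwise convention with the y-axis pointing up; all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotation_cw(theta):
    """Clockwise rotation about the origin (y-axis pointing up)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_about(x, y, theta):
    """Translate the pivot to the origin, rotate, then translate back."""
    return matmul(translation(x, y), matmul(rotation_cw(theta), translation(-x, -y)))


def apply(m, p):
    """Apply a homogeneous matrix to the 2D point p."""
    px, py = p
    return (m[0][0] * px + m[0][1] * py + m[0][2],
            m[1][0] * px + m[1][1] * py + m[1][2])
```

Multiplying the three matrices out gives the translation column (x - x·cosθ - y·sinθ, y + x·sinθ - y·cosθ), so the pivot itself is mapped to itself, as expected.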

Combined effect of translation and rotation: [figure omitted]

RRoI Pooling implementation

With the affine transformation groundwork above, the key code of RRoI Pooling should be relatively easy to follow.

/home/crediks/Downloads/RRPN/caffe-fast-rcnn/src/caffe/layers/roi_pooling_layer.cpp:

template <typename Dtype>
void RotateROIPoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_rois = bottom[1]->cpu_data();
  const Dtype* image_info = bottom[2]->cpu_data();
  // Number of ROIs
  int num_rois = bottom[1]->num();
  int batch_size = bottom[0]->num();
  int top_count = top[0]->count();
  Dtype* top_data = top[0]->mutable_cpu_data();
  caffe_set(top_count, Dtype(-FLT_MAX), top_data);
  int* argmax_data = max_idx_.mutable_cpu_data();
  caffe_set(top_count, -1, argmax_data);
  // Image size scaled to feature-map resolution
  int imageWidth = int(image_info[1]*spatial_scale_+0.5);
  int imageHeight = int(image_info[0]*spatial_scale_+0.5);
  // For each ROI R = [batch_index Cx Cy height width angle]: max pool over R
  for (int n = 0; n < num_rois; ++n) {
    // Points
    int roi_batch_ind = bottom_rois[0];
    CHECK_GE(roi_batch_ind, 0);
    CHECK_LT(roi_batch_ind, batch_size);
    Dtype cx = bottom_rois[1];
    Dtype cy = bottom_rois[2];
    Dtype h = bottom_rois[3];
    Dtype w = bottom_rois[4];
    // The angle is stored in degrees; convert it to radians
    Dtype angle = bottom_rois[5]/180.0*3.1415926535;
    // TransformPrepare
    Dtype dx = -pooled_width_/2.0;
    Dtype dy = -pooled_height_/2.0;
    // Size of each sub-region on the feature map
    Dtype Sx = w*spatial_scale_/pooled_width_;
    Dtype Sy = h*spatial_scale_/pooled_height_;
    Dtype Alpha = cos(angle);
    Dtype Beta = sin(angle);
    Dtype Dx = cx*spatial_scale_;
    Dtype Dy = cy*spatial_scale_;
    // Affine matrix: shift by (dx, dy), scale by (Sx, Sy), rotate, move to (Dx, Dy)
    Dtype M[2][3];
    M[0][0] = Alpha*Sx;
    M[0][1] = Beta*Sy;
    M[0][2] = Alpha*Sx*dx+Beta*Sy*dy+Dx;
    M[1][0] = -Beta*Sx;
    M[1][1] = Alpha*Sy;
    M[1][2] = -Beta*Sx*dx+Alpha*Sy*dy+Dy;
    const Dtype* batch_data = bottom_data + bottom[0]->offset(roi_batch_ind);
    for (int c = 0; c < channels_; ++c) {
      for (int ph = 0; ph < pooled_height_; ++ph) {
        for (int pw = 0; pw < pooled_width_; ++pw) {
          const int pool_index = ph * pooled_width_ + pw;
          // Four corners of the sub-region as (x, y) pairs:
          // A = (P[0], P[1]), B = (P[2], P[3]), C = (P[4], P[5]), D = (P[6], P[7])
          Dtype P[8];
          P[0] = M[0][0]*pw+M[0][1]*ph+M[0][2];
          P[1] = M[1][0]*pw+M[1][1]*ph+M[1][2];
          P[2] = M[0][0]*pw+M[0][1]*(ph+1)+M[0][2];
          P[3] = M[1][0]*pw+M[1][1]*(ph+1)+M[1][2];
          P[4] = M[0][0]*(pw+1)+M[0][1]*ph+M[0][2];
          P[5] = M[1][0]*(pw+1)+M[1][1]*ph+M[1][2];
          P[6] = M[0][0]*(pw+1)+M[0][1]*(ph+1)+M[0][2];
          P[7] = M[1][0]*(pw+1)+M[1][1]*(ph+1)+M[1][2];
          // Axis-aligned bounding box of the sub-region, clamped to the feature map
          int leftMost = int(max(round(min(min(P[0],P[2]),min(P[4],P[6]))),0.0));
          int rightMost = int(min(round(max(max(P[0],P[2]),max(P[4],P[6]))),imageWidth-1.0));
          int topMost = int(max(round(min(min(P[1],P[3]),min(P[5],P[7]))),0.0));
          int bottomMost = int(min(round(max(max(P[1],P[3]),max(P[5],P[7]))),imageHeight-1.0));
          // Edge vectors AB and AC and their squared lengths
          Dtype AB[2];
          AB[0] = P[2] - P[0];
          AB[1] = P[3] - P[1];
          Dtype ABAB = AB[0]*AB[0] + AB[1]*AB[1];
          Dtype AC[2];
          AC[0] = P[4] - P[0];
          AC[1] = P[5] - P[1];
          Dtype ACAC = AC[0]*AC[0] + AC[1]*AC[1];
          top_data[pool_index] = 0;
          argmax_data[pool_index] = -1;
          for (int h = topMost; h < bottomMost+1; ++h) {
            for (int w = leftMost; w < rightMost+1; ++w) {
              Dtype AP[2];
              AP[0] = w - P[0];
              AP[1] = h - P[1];
              Dtype ABAP = AB[0]*AP[0] + AB[1]*AP[1];
              Dtype ACAP = AC[0]*AP[0] + AC[1]*AP[1];
              // Keep (w, h) only if its projections onto AB and AC fall inside the sub-region
              if (ABAB > ABAP && ABAP >= 0 && ACAC > ACAP && ACAP >= 0) {
                const int index = h * width_ + w;
                if (batch_data[index] > top_data[pool_index]) {
                  top_data[pool_index] = batch_data[index];
                  argmax_data[pool_index] = index;
                }
              }
            }
          }
        }
      }
      // Increment all data pointers by one channel
      batch_data += bottom[0]->offset(0, 1);
      top_data += top[0]->offset(0, 1);
      argmax_data += max_idx_.offset(0, 1);
    }
    // Increment ROI data pointer
    bottom_rois += bottom[1]->offset(1);
  }
}

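1. The matrix M defines the affine transformation, combining scaling, rotation, and translation.

2. Scaling, rotation, and translation together locate the parallelogram on the feature map that corresponds to each sub-region, namely the coordinates P[0] to P[7] (four corners stored as x, y pairs; illustrated by a figure in the original post).

3. Because θ is arbitrary, any of the x coordinates P[0], P[2], P[4], P[6] may be the leftmost one, and the same holds for the rightmost, topmost, and bottommost coordinates. Comparing the coordinates yields the leftmost, rightmost, topmost, and bottommost extents.

4. Filter out positions that fall outside the sub-region (if(ABAB>ABAP&&ABAP>=0&&ACAC>ACAP&&ACAP>=0)...).

5. Finally, apply max pooling.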
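Steps 3 to 5 can be mirrored in a few lines of Python for a single channel and a single sub-region. This is an illustrative sketch, not a port of the Caffe layer: the function names are hypothetical, the feature map is a plain nested list indexed as feature[row][col], and the corners a, b, c play the roles of (P[0], P[1]), (P[2], P[3]), and (P[4], P[5]).

```python
import math


def in_sub_region(a, b, c, p):
    """True if p lies in the rectangle spanned by corner a and its neighbors b and c."""
    ab = (b[0] - a[0], b[1] - a[1])
    ac = (c[0] - a[0], c[1] - a[1])
    ap = (p[0] - a[0], p[1] - a[1])
    abab = ab[0] * ab[0] + ab[1] * ab[1]
    acac = ac[0] * ac[0] + ac[1] * ac[1]
    abap = ab[0] * ap[0] + ab[1] * ap[1]
    acap = ac[0] * ap[0] + ac[1] * ap[1]
    # Same test as in the C++ code: both projections must fall inside the edges.
    return 0 <= abap < abab and 0 <= acap < acac


def max_pool_sub_region(feature, a, b, c):
    """Max over feature-map cells inside the sub-region; returns (value, (row, col))."""
    d = (b[0] + c[0] - a[0], b[1] + c[1] - a[1])  # fourth corner
    xs = [a[0], b[0], c[0], d[0]]
    ys = [a[1], b[1], c[1], d[1]]
    height, width = len(feature), len(feature[0])
    # Axis-aligned bounding box, rounded and clamped to the feature map.
    left = max(int(math.floor(min(xs) + 0.5)), 0)
    right = min(int(math.floor(max(xs) + 0.5)), width - 1)
    top = max(int(math.floor(min(ys) + 0.5)), 0)
    bottom = min(int(math.floor(max(ys) + 0.5)), height - 1)
    best, best_pos = 0.0, None  # the C++ code also starts each bin at 0
    for row in range(top, bottom + 1):
        for col in range(left, right + 1):
            if in_sub_region(a, b, c, (col, row)) and feature[row][col] > best:
                best, best_pos = feature[row][col], (row, col)
    return best, best_pos
```

The projection test is valid here because AB and AC are perpendicular: the columns of M are a rotation applied to axis-aligned scalings, so each sub-region is a rotated rectangle rather than a general parallelogram.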

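Experiment notes:

ICDAR2015 dataset: inclined bounding boxes are generated from the four vertices given in the dataset;

ICDAR2013 dataset: horizontal bounding boxes are generated.

Training tip: the learning rate is 0.0001 for the first 200,000 iterations and 0.0005 for the last 100,000 iterations.

Filtering: because the number of generated anchors is 6 times the original, anchors that extend beyond the image boundary are filtered out.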

Running the code:

Bug fixes:

Fix for the Segmentation fault error:

./lib/rotation/data_extractor.py:

1. Move import cv2 earlier, so that the imports are ordered as follows:

import cv2

import numpy as np

import os

from xml.dom.minidom import parse

import xml.dom.minidom

from PIL import Image

import pickle

./tools/train_net.py:

2. Move from rotation.data_extractor import get_rroidb earlier, so that the imports are ordered as follows:

from rotation.data_extractor import get_rroidb

import _init_paths

#from fast_rcnn.train import get_training_roidb, train_net #D

from rotation.rt_train import get_training_roidb, train_net

from fast_rcnn.config import cfg, cfg_from_file, cfg_from_list, get_output_dir

from datasets.factory import get_imdb

from rotation.data_extractor import get_rroidb

import datasets.imdb

import caffe

import argparse

import pprint

import numpy as np

import sys
