FOTS (1): Backbone Network

ResNet

Paper: https://arxiv.org/pdf/1512.03385.pdf

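ResNet (deep residual network) was proposed in 2015. It effectively addresses the degradation problem of deep networks: past a certain depth, the accuracy of a plain network saturates and then drops (as shown in the figure below). Because the training error rises as well, this is not overfitting. The paper notes that vanishing/exploding gradients had already been largely handled by normalized initialization and intermediate normalization layers. The remaining obstacle is that deeper plain networks are harder to optimize.
[Figure: degradation of deeper plain networks]
The paper proposes the following building block:
[Figure: residual building block]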

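A shortcut carrying the input X (the path on the right) is added to the original structure, so the block's output becomes $H(X) = F(X) + X$. When the network is very deep and some layers are unnecessary, those layers can learn $F(X) = 0$. Then $H(X) = X$, the extra layers act as an identity, and the deep network behaves like a shallower one.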
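Why use a residual connection to build an identity mapping? Suppose the newly added layers did nothing, that is, they were an identity mapping. Then a deeper model should perform at least as well as the shallower one. For a stack of nonlinear layers, however, "doing nothing" is surprisingly hard to learn, because every nonlinear layer tends to lose some information. So fitting the identity $H(X) = X$ directly with plain layers is difficult. With the shortcut, the layers only need to push the residual toward $F(X) = 0$, which is much easier.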
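This raises an issue: if the input and output of a block have different numbers of channels, they cannot be added directly. In that case X is passed through a convolution so that its shape matches F(X). ResNet uses a 1×1 projection for this, with stride 2 when the block also halves the resolution.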
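Below is a minimal NumPy sketch of the residual idea. It makes one simplifying assumption: convolutions are replaced by plain matrix products on a single feature vector, so the projection shortcut here is only the fully connected analogue of a 1×1 convolution. The function `residual_block` and all weights are hypothetical. The sketch checks the two claims above:
- when the residual branch outputs zero, the block is exactly the identity;
- a projection lets the shortcut match a different channel count.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2, w_proj=None):
    """H(x) = F(x) + shortcut(x), with the residual branch F(x) = w2 @ relu(w1 @ x).

    x: (C_in,) feature vector; w_proj: optional (C_out, C_in) projection,
    needed only when C_out != C_in (analogue of the 1x1 projection shortcut).
    """
    f = w2 @ relu(w1 @ x)
    shortcut = x if w_proj is None else w_proj @ x
    return f + shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Residual branch driven to zero: the block is exactly the identity
w1 = rng.standard_normal((16, 8))
w2_zero = np.zeros((8, 16))
print(np.allclose(residual_block(x, w1, w2_zero), x))  # True

# 8 -> 32 channels: the shortcut needs a projection to match F(x)
w1b = rng.standard_normal((16, 8))
w2b = rng.standard_normal((32, 16))
w_proj = rng.standard_normal((32, 8))
print(residual_block(x, w1b, w2b, w_proj).shape)  # (32,)
```

To learn the identity, the block only has to shrink `w2` toward zero. It never has to reproduce `x` through the ReLU.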
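Two forms of the ResNet block:
[Figure: basic block (left) and bottleneck block (right)]
The ResNet50 used by FOTS is built from the right-hand (bottleneck) form. A 1×1 convolution first reduces the number of channels, which greatly reduces the parameters of the following 3×3 convolution. A second 1×1 convolution then restores the channel count.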
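A quick weight count makes the saving concrete. The sketch assumes 256-channel input features reduced to 64 channels, which are the widths of ResNet-50's first stage. It ignores biases, since ResNet follows each convolution with batch normalization. `conv_params` is a hypothetical helper, not a library function.

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out

# Two 3x3 convolutions at full width: 256 -> 256 -> 256
basic_256 = 2 * conv_params(3, 256, 256)

# Bottleneck: 1x1 reduce 256 -> 64, 3x3 64 -> 64, 1x1 restore 64 -> 256
bottleneck_256 = (conv_params(1, 256, 64)
                  + conv_params(3, 64, 64)
                  + conv_params(1, 64, 256))

print(basic_256, bottleneck_256)             # 1179648 69632
print(round(basic_256 / bottleneck_256, 1))  # 16.9
```

On the same 256-channel features, the bottleneck needs roughly 17× fewer weights than two full-width 3×3 convolutions.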
ResNet architecture:
[Figure: ResNet architectures for different depths]
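Code:

'''
Take the four stages of ResNet-50 as the encoder and keep their outputs for feature fusion
'''
import pretrainedmodels

# in the model's __init__:
bbNet = pretrainedmodels.__dict__['resnet50'](pretrained='imagenet')
self.backbone = bbNet

# method of the same model class
def __foward_backbone(self, input):
    conv2 = None
    conv3 = None
    conv4 = None
    output = None
    # named_children() yields conv1, bn1, relu, maxpool, then layer1 ... layer4,
    # so the loop applies the backbone layer by layer
    for name, layer in self.backbone.named_children():
        input = layer(input)
        if name == 'layer1':
            conv2 = input    # stride 4, 256 channels
        elif name == 'layer2':
            conv3 = input    # stride 8, 512 channels
        elif name == 'layer3':
            conv4 = input    # stride 16, 1024 channels
        elif name == 'layer4':
            output = input   # stride 32, 2048 channels
            break            # skip the pooling and classifier layers

    return output, conv4, conv3, conv2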
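The stage outputs returned above can be checked with a little arithmetic. This sketch assumes ResNet-50's standard configuration:
- [3, 4, 6, 3] bottleneck blocks;
- output channels 256/512/1024/2048;
- an input whose height and width are multiples of 32.

`stage_shapes` is a hypothetical helper. The depth count covers conv1, the three convolutions in each bottleneck and the final fully connected layer. It does not count the projection shortcuts.

```python
# ResNet-50 stage configuration (layer1 .. layer4)
blocks = [3, 4, 6, 3]                  # bottleneck blocks per stage
out_channels = [256, 512, 1024, 2048]
strides = [4, 8, 16, 32]               # total downsampling relative to the input

# conv1 + 3 convolutions per bottleneck block + the final fully connected layer
depth = 1 + 3 * sum(blocks) + 1
print(depth)  # 50

def stage_shapes(height, width):
    """(C, H, W) of conv2, conv3, conv4 and output for sides that are multiples of 32."""
    return [(c, height // s, width // s) for c, s in zip(out_channels, strides)]

print(stage_shapes(512, 512))
# [(256, 128, 128), (512, 64, 64), (1024, 32, 32), (2048, 16, 16)]
```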

Unet

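Network architecture:
[Figure: U-Net architecture]
The left side is the encoder, built from convolutions and downsampling (pooling). The feature map before each downsampling is copied, cropped and sent to the right side. The right side is the decoder, built from convolutions and upsampling (up-convolution). After each upsampling, the result is concatenated with the feature map sent over from the left.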
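One decoder step can be sketched with NumPy on (C, H, W) arrays. The sketch is illustrative only:
- Nearest-neighbour repetition stands in for U-Net's learned up-convolution, so channels are not halved here.
- The helper names and tensor sizes are hypothetical.

Cropping is needed in the original U-Net because its unpadded convolutions leave the encoder maps slightly larger than the upsampled decoder maps.

```python
import numpy as np

def center_crop(x, h, w):
    """Crop a (C, H, W) array to (C, h, w) around its centre ("copy and crop")."""
    _, H, W = x.shape
    top, left = (H - h) // 2, (W - w) // 2
    return x[:, top:top + h, left:left + w]

def upsample2x_nearest(x):
    """Double H and W by repeating pixels (stand-in for the learned up-convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_step(deep, skip):
    up = upsample2x_nearest(deep)
    cropped = center_crop(skip, up.shape[1], up.shape[2])
    return np.concatenate([up, cropped], axis=0)  # concat along the channel axis

deep = np.ones((512, 28, 28))    # decoder feature from the level below (hypothetical size)
skip = np.zeros((256, 64, 64))   # encoder feature sent across (hypothetical size)
out = decoder_step(deep, skip)
print(out.shape)  # (768, 56, 56)
```

In the real network, convolutions follow each concatenation to mix the two sources.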

'''
self.__foward_backbone: ResNet-50 backbone, returns output, conv4, conv3, conv2
self.mergeLayers0,1,2,3: concat + conv
self.__unpool: upsampling
'''
f = self.__foward_backbone(input)
g = [None] * 4
h = [None] * 4
# deepest level (f[0], stride 32)
h[0] = self.mergeLayers0(f[0])
g[0] = self.__unpool(h[0])

# i = 2
h[1] = self.mergeLayers1(g[0], f[1])
g[1] = self.__unpool(h[1])

# i = 3
h[2] = self.mergeLayers2(g[1], f[2])
g[2] = self.__unpool(h[2])

# i = 4
h[3] = self.mergeLayers3(g[2], f[3])
#g[3] = self.__unpool(h[3])

# final stage
final = self.mergeLayers4(h[3])
final = self.bn5(final)
final = F.relu(final)
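The detector later multiplies score-map coordinates by 4. To see why, the spatial sizes in the merge code can be traced by hand. This sketch rests on three assumptions, none of which is shown in the snippet:
- the input side is a multiple of 32;
- each mergeLayers keeps the spatial size;
- __unpool upsamples by exactly 2×.

`trace_merge_sizes` is hypothetical.

```python
def trace_merge_sizes(input_size):
    """Follow the spatial size (one side) through the merge code.

    Assumes input_size is a multiple of 32, every mergeLayers keeps the
    spatial size, and __unpool upsamples by exactly 2x.
    """
    # f[0]..f[3] = output, conv4, conv3, conv2 -> strides 32, 16, 8, 4
    f = [input_size // s for s in (32, 16, 8, 4)]
    h = [None] * 4
    g = [None] * 4
    h[0] = f[0]
    g[0] = 2 * h[0]
    for i in range(1, 4):
        # mergeLayers_i concatenates g[i-1] with f[i], so their sizes must agree
        assert g[i - 1] == f[i], (i, g[i - 1], f[i])
        h[i] = f[i]
        if i < 3:  # g[3] is commented out in the original code
            g[i] = 2 * h[i]
    return f, h

f, h = trace_merge_sizes(512)
print(f)     # [16, 32, 64, 128]
print(h[3])  # 128, i.e. 1/4 of the 512-pixel input
```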

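Background

Upsampling: downsampling shrinks the image, and upsampling does the opposite, enlarging it. Common upsampling methods include transposed convolution (deconvolution), bilinear interpolation and unpooling. Transposed convolution is introduced first.
Transposed convolution (deconvolution)
Transposed convolution is itself a convolution. It is configured with a kernel size $k$, a stride $s$ and a padding $p$; the difference is that the input feature map is preprocessed first. For an input of height $H$ and width $W$, the process has two steps:
1. Insert zeros into the original feature map. Along both height and width, insert $s - 1$ zero-valued points between every two adjacent pixels. The new map has height $H' = H + (s - 1)(H - 1)$ and width $W' = W + (s - 1)(W - 1)$.
2. Pad the zero-inserted map with $k - 1 - p$ zeros on each side, then convolve it with stride 1 using the kernel rotated by 180°. The output height is $H_{out} = (H - 1)s - 2p + k$, and the width follows analogously.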
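The two steps can be checked numerically. The sketch below computes a transposed convolution twice, for a single channel with a square kernel and no bias (function names hypothetical):
1. From the scatter definition, where every input pixel adds a scaled copy of the kernel to the output. This should correspond to what nn.ConvTranspose2d computes for one channel.
2. By zero insertion plus an ordinary stride-1 convolution.

It assumes padding ≤ kernel_size - 1, so that the step-2 padding is non-negative.

```python
import numpy as np

def conv_transpose2d_direct(x, w, stride, padding):
    """Scatter definition: each input pixel adds x[i, j] * w to the output."""
    h, wd = x.shape
    k = w.shape[0]
    full = np.zeros(((h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            full[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    # `padding` removes that many rows/columns from every border
    return full[padding:full.shape[0] - padding, padding:full.shape[1] - padding]

def conv_transpose2d_two_step(x, w, stride, padding):
    """Step 1: insert zeros. Step 2: pad and run a stride-1 convolution."""
    h, wd = x.shape
    k = w.shape[0]
    z = np.zeros((h + (stride - 1) * (h - 1), wd + (stride - 1) * (wd - 1)))
    z[::stride, ::stride] = x                # (stride - 1) zeros between pixels
    z = np.pad(z, k - 1 - padding)           # zero padding on every side
    w_rot = w[::-1, ::-1]                    # kernel rotated by 180 degrees
    out_h, out_w = z.shape[0] - k + 1, z.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for a in range(out_h):
        for b in range(out_w):
            out[a, b] = np.sum(z[a:a + k, b:b + k] * w_rot)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.array([[1., 0., -1.], [2., 1., 0.], [0., 3., 1.]])
a = conv_transpose2d_direct(x, w, stride=2, padding=1)
b = conv_transpose2d_two_step(x, w, stride=2, padding=1)
print(a.shape)            # (7, 7) = (4 - 1) * 2 - 2 * 1 + 3
print(np.allclose(a, b))  # True
```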

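# Transposed convolution
nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)
'''
in_channels (int) - number of channels of the input signal
out_channels (int) - number of channels produced by the convolution
kernel_size (int or tuple) - size of the convolution kernel
stride (int or tuple, optional) - stride of the convolution, roughly the factor by which the input is enlarged
padding (int or tuple, optional) - dilation*(kernel_size-1)-padding zeros are implicitly added to both sides of each input dimension, so the output height and width each shrink by 2*padding
output_padding (int or tuple, optional) - extra size added to one side of each output dimension, so the output height and width each grow by output_padding
groups (int, optional) - number of blocked connections from input channels to output channels
bias (bool, optional) - if bias=True, a learnable bias is added
dilation (int or tuple, optional) - spacing between kernel elements

'''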
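The PyTorch documentation gives the full output-size rule for nn.ConvTranspose2d:

$H_{out} = (H_{in} - 1) \times stride - 2 \times padding + dilation \times (kernel\_size - 1) + output\_padding + 1$

With dilation = 1 and output_padding = 0 it reduces to $(H - 1)s - 2p + k$. The small helper below (hypothetical name) shows two common settings that give exact 2× upsampling. It also shows why output_padding exists: with stride > 1, several input sizes map to the same output size, and output_padding picks one.

```python
def conv_transpose_out(size, kernel_size, stride=1, padding=0, output_padding=0, dilation=1):
    """Output size of nn.ConvTranspose2d along one spatial dimension (documented formula)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

print(conv_transpose_out(16, kernel_size=4, stride=2, padding=1))                    # 32
print(conv_transpose_out(16, kernel_size=3, stride=2, padding=1))                    # 31
print(conv_transpose_out(16, kernel_size=3, stride=2, padding=1, output_padding=1))  # 32
```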
# Up/down-sampling function
torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)
'''
Parameters:
    - input (Tensor): input tensor
    - size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]): output spatial size.
    - scale_factor (float or Tuple[float]): multiplier for the spatial size.
    - mode (string): upsampling algorithm: nearest, linear, bilinear, trilinear, area. Default: nearest.
    - align_corners (bool, optional): if align_corners=True, the corner pixels of input and output are aligned, so the values at the corner pixels are preserved. Only has an effect when mode is linear, bilinear or trilinear. Default: False.
'''
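What align_corners changes is the mapping from output coordinates to input coordinates. The NumPy sketch below implements linear and bilinear resizing with both mappings. It is written to follow PyTorch's convention when `size` is given: with align_corners=False, pixel centres are aligned and negative source coordinates are clamped to 0. It is a re-implementation for illustration, so compare it against F.interpolate before relying on it. The function names are hypothetical.

```python
import numpy as np

def source_coords(out_size, in_size, align_corners):
    """Map each output index to a (fractional) input coordinate."""
    dst = np.arange(out_size, dtype=np.float64)
    if align_corners:
        # first and last pixels of input and output coincide exactly
        if out_size == 1:
            return np.zeros(1)
        return dst * (in_size - 1) / (out_size - 1)
    # pixel centres are aligned; coordinates left of the first centre are clamped to 0
    src = (dst + 0.5) * in_size / out_size - 0.5
    return np.maximum(src, 0.0)

def linear_resize_1d(x, out_size, align_corners=False):
    """Linear interpolation along the last axis of x."""
    in_size = x.shape[-1]
    src = source_coords(out_size, in_size, align_corners)
    i0 = np.floor(src).astype(int)
    i1 = np.minimum(i0 + 1, in_size - 1)
    lam = src - i0
    return x[..., i0] * (1 - lam) + x[..., i1] * lam

def bilinear_resize(img, out_h, out_w, align_corners=False):
    """Bilinear = linear along W, then linear along H (img has shape (H, W))."""
    tmp = linear_resize_1d(img, out_w, align_corners)        # (H, out_w)
    return linear_resize_1d(tmp.T, out_h, align_corners).T   # (out_h, out_w)

x = np.array([0., 1., 2., 3.])
print(linear_resize_1d(x, 8, align_corners=False))  # values 0, 0.25, 0.75, ..., 2.75, 3
print(linear_resize_1d(x, 8, align_corners=True))   # values 0, 3/7, 6/7, ..., 3
```

With align_corners=True the output samples are spread evenly from the first to the last input pixel. With align_corners=False the outermost output pixels repeat the input's edge values.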

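Generating bboxes

The bbox in FOTS uses the RBOX geometry from EAST, a rotated rectangle described by five parameters:
- the four distances $d_i$ from a pixel to the top, right, bottom and left edges of the rectangle;
- the rotation angle θ.

The function that turns the five RBOX parameters into the coordinates of the four bbox vertices is: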

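import numpy as np

def restore_rectangle_rbox(origin, geometry):
    '''
    :param origin: N*2 pixel positions (x, y)
    :param geometry: N*5 [d1, d2, d3, d4, theta] # distances to top, right, bottom, left, and rotation angle
    :return: N*4*2 vertex coordinates; boxes with angle >= 0 come first, then angle < 0 (not the input order)
    '''
    d = geometry[:, :4]
    angle = geometry[:, 4]
    # for angle >= 0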
    origin_0 = origin[angle >= 0]
    d_0 = d[angle >= 0]
    angle_0 = angle[angle >= 0]
    if origin_0.shape[0] > 0:
        # (0,-H),(W, -H),(W, 0),(0, 0),(Left, -bottom)
        p = np.array([np.zeros(d_0.shape[0]), -d_0[:, 0] - d_0[:, 2],
                      d_0[:, 1] + d_0[:, 3], -d_0[:, 0] - d_0[:, 2],
                      d_0[:, 1] + d_0[:, 3], np.zeros(d_0.shape[0]),
                      np.zeros(d_0.shape[0]), np.zeros(d_0.shape[0]),
                      d_0[:, 3], -d_0[:, 2]])
        p = p.transpose((1, 0)).reshape((-1, 5, 2))  # N*5*2

        rotate_matrix_x = np.array([np.cos(angle_0), np.sin(angle_0)]).transpose((1, 0))
        rotate_matrix_x = np.repeat(rotate_matrix_x, 5, axis = 1).reshape(-1, 2, 5).transpose((0, 2, 1))  # N*5*2

        rotate_matrix_y = np.array([-np.sin(angle_0), np.cos(angle_0)]).transpose((1, 0))
        rotate_matrix_y = np.repeat(rotate_matrix_y, 5, axis = 1).reshape(-1, 2, 5).transpose((0, 2, 1))

        p_rotate_x = np.sum(rotate_matrix_x * p, axis = 2)[:, :, np.newaxis]  # N*5*1
        p_rotate_y = np.sum(rotate_matrix_y * p, axis = 2)[:, :, np.newaxis]  # N*5*1

        p_rotate = np.concatenate([p_rotate_x, p_rotate_y], axis = 2)  # N*5*2

        p3_in_origin = origin_0 - p_rotate[:, 4, :]
        new_p0 = p_rotate[:, 0, :] + p3_in_origin  # N*2
        new_p1 = p_rotate[:, 1, :] + p3_in_origin
        new_p2 = p_rotate[:, 2, :] + p3_in_origin
        new_p3 = p_rotate[:, 3, :] + p3_in_origin

        new_p_0 = np.concatenate([new_p0[:, np.newaxis, :], new_p1[:, np.newaxis, :],
                                  new_p2[:, np.newaxis, :], new_p3[:, np.newaxis, :]], axis = 1)  # N*4*2
    else:
        new_p_0 = np.zeros((0, 4, 2))
    # for angle < 0
    origin_1 = origin[angle < 0]
    d_1 = d[angle < 0]
    angle_1 = angle[angle < 0]
    if origin_1.shape[0] > 0:
        p = np.array([-d_1[:, 1] - d_1[:, 3], -d_1[:, 0] - d_1[:, 2],
                      np.zeros(d_1.shape[0]), -d_1[:, 0] - d_1[:, 2],
                      np.zeros(d_1.shape[0]), np.zeros(d_1.shape[0]),
                      -d_1[:, 1] - d_1[:, 3], np.zeros(d_1.shape[0]),
                      -d_1[:, 1], -d_1[:, 2]])
        p = p.transpose((1, 0)).reshape((-1, 5, 2))  # N*5*2

        rotate_matrix_x = np.array([np.cos(-angle_1), -np.sin(-angle_1)]).transpose((1, 0))
        rotate_matrix_x = np.repeat(rotate_matrix_x, 5, axis = 1).reshape(-1, 2, 5).transpose((0, 2, 1))  # N*5*2

        rotate_matrix_y = np.array([np.sin(-angle_1), np.cos(-angle_1)]).transpose((1, 0))
        rotate_matrix_y = np.repeat(rotate_matrix_y, 5, axis = 1).reshape(-1, 2, 5).transpose((0, 2, 1))

        p_rotate_x = np.sum(rotate_matrix_x * p, axis = 2)[:, :, np.newaxis]  # N*5*1
        p_rotate_y = np.sum(rotate_matrix_y * p, axis = 2)[:, :, np.newaxis]  # N*5*1

        p_rotate = np.concatenate([p_rotate_x, p_rotate_y], axis = 2)  # N*5*2

        p3_in_origin = origin_1 - p_rotate[:, 4, :]
        new_p0 = p_rotate[:, 0, :] + p3_in_origin  # N*2
        new_p1 = p_rotate[:, 1, :] + p3_in_origin
        new_p2 = p_rotate[:, 2, :] + p3_in_origin
        new_p3 = p_rotate[:, 3, :] + p3_in_origin

        new_p_1 = np.concatenate([new_p0[:, np.newaxis, :], new_p1[:, np.newaxis, :],
                                  new_p2[:, np.newaxis, :], new_p3[:, np.newaxis, :]], axis = 1)  # N*4*2
    else:
        new_p_1 = np.zeros((0, 4, 2))
    return np.concatenate([new_p_0, new_p_1])
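Working through the algebra, both branches of restore_rectangle_rbox reduce to the same expression. Relative to the pixel, the corners are the axis-aligned points (−left, −top), (right, −top), (right, bottom), (−left, bottom). These are rotated by the matrix [[cos θ, sin θ], [−sin θ, cos θ]] and shifted to the pixel position.

The compact version below is a sketch based on that derivation, not the original code (`restore_rbox_compact` is a hypothetical name). It is worth checking numerically against the original on real geo maps. One difference: it keeps the boxes in input order, whereas the original returns all angle ≥ 0 boxes before the angle < 0 ones.

```python
import numpy as np

def restore_rbox_compact(origin, geometry):
    """
    origin:   (N, 2) pixel positions as (x, y)
    geometry: (N, 5) [d_top, d_right, d_bottom, d_left, angle]
    returns:  (N, 4, 2) corners ordered top-left, top-right, bottom-right, bottom-left
              (before rotation), in the same order as the input rows
    """
    top, right, bottom, left, angle = geometry.T
    # axis-aligned corners relative to the pixel (image y axis points down)
    rel = np.stack([np.stack([-left, -top], axis=1),
                    np.stack([right, -top], axis=1),
                    np.stack([right, bottom], axis=1),
                    np.stack([-left, bottom], axis=1)], axis=1)  # N*4*2
    c, s = np.cos(angle)[:, None], np.sin(angle)[:, None]
    # same rotation restore_rectangle_rbox applies: x' = c*x + s*y, y' = -s*x + c*y
    x = c * rel[:, :, 0] + s * rel[:, :, 1]
    y = -s * rel[:, :, 0] + c * rel[:, :, 1]
    return origin[:, None, :] + np.stack([x, y], axis=2)

origin = np.array([[10., 20.]])
print(restore_rbox_compact(origin, np.array([[2., 3., 4., 5., 0.]])))
# corners (5, 18), (13, 18), (13, 24), (5, 24)
```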

Building on this function, the bbox vertex coordinates are recovered from the score map and geo map:

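import time

import cv2
import lanms
import numpy as np
# Toolbox: the project's own helper module that defines restore_rectangle_rbox

def detect(score_map, geo_map, score_map_thresh = 0.5, box_thresh = 0.1, nms_thres = 0.2, timer = None):
    '''
    restore text boxes from score map and geo map
    :param score_map: 1 channel
    :param geo_map: 5 channels
    :param timer: dict that collects timing information (created if None)
    :param score_map_thresh: threshold for score map
    :param box_thresh: threshold for boxes
    :param nms_thres: threshold for nms
    :return: (boxes, timer); boxes is N*9: 8 vertex coordinates + score
    '''
    if timer is None:
        timer = {}
    if len(score_map.shape) == 4:
        score_map = score_map[0, :, :, 0]
        geo_map = geo_map[0, :, :, ]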
    # filter the score map
    xy_text = np.argwhere(score_map > score_map_thresh)
    # sort the text boxes via the y axis
    xy_text = xy_text[np.argsort(xy_text[:, 0])]
    # restore
    start = time.time()
    text_box_restored = Toolbox.restore_rectangle_rbox(xy_text[:, ::-1] * 4, geo_map[xy_text[:, 0], xy_text[:, 1], :])  # N*4*2
    # print('{} text boxes before nms'.format(text_box_restored.shape[0]))
    boxes = np.zeros((text_box_restored.shape[0], 9), dtype = np.float32)
    boxes[:, :8] = text_box_restored.reshape((-1, 8))
    boxes[:, 8] = score_map[xy_text[:, 0], xy_text[:, 1]]
    timer['restore'] = time.time() - start
    # nms part
    start = time.time()
    # boxes = nms_locality.nms_locality(boxes.astype(np.float64), nms_thres)
    boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
    timer['nms'] = time.time() - start
    if boxes.shape[0] == 0:
        return None, timer

    # here we filter some low score boxes by the average score map, this is different from the original paper
    for i, box in enumerate(boxes):
        mask = np.zeros_like(score_map, dtype = np.uint8)
        cv2.fillPoly(mask, box[:8].reshape((-1, 4, 2)).astype(np.int32) // 4, 1)
        boxes[i, 8] = cv2.mean(score_map, mask)[0]
    boxes = boxes[boxes[:, 8] > box_thresh]
    return boxes, timer
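The first steps of detect() are mostly bookkeeping between coordinate systems. This sketch isolates them, using a hypothetical function `score_map_to_points` and a toy 4×5 score map:
- np.argwhere returns (row, col) = (y, x) pairs;
- `[:, ::-1]` flips them to (x, y);
- the factor 4 assumes the score map is at 1/4 of the input resolution, which is what the `* 4` in detect() implies.

```python
import numpy as np

def score_map_to_points(score_map, score_map_thresh=0.5, scale=4):
    """Threshold, sort by row, and convert to (x, y) input-image coordinates."""
    xy_text = np.argwhere(score_map > score_map_thresh)  # (y, x) on the reduced-size map
    xy_text = xy_text[np.argsort(xy_text[:, 0])]          # sort by y
    origin = xy_text[:, ::-1] * scale                      # (x, y) in input-image pixels
    scores = score_map[xy_text[:, 0], xy_text[:, 1]]
    return origin, scores

score = np.zeros((4, 5), dtype=np.float32)
score[1, 3] = 0.9
score[2, 0] = 0.7
origin, scores = score_map_to_points(score)
print(origin)  # [[12  4]
               #  [ 0  8]]
```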