Assorted small knowledge points encountered while studying (continuously updated)

Latest recommended article published on 2023-05-30 14:37:42

wyl2077

Category: Machine Learning. Tags: python, deep learning

Article link: https://blog.csdn.net/dbdxwyl/article/details/109548301

Shoelace formula

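The Shoelace formula, also known as Gauss's area formula, computes the area of a polygon from the coordinates of its vertices listed in order (clockwise or counter-clockwise). In the form used by the code below, the signed area is:

$$A = \frac{1}{2}\sum_{i=1}^{n}(x_{i+1} - x_i)(y_{i+1} + y_i)$$

When the index exceeds $n$ it wraps around: $x_{n+1} = x_1$ and $y_{n+1} = y_1$. In image coordinates (y axis pointing down), the result is negative when the vertices are listed clockwise and positive when they are listed counter-clockwise. The code below uses this sign to check the order of the four input points of each text box, reorders any counter-clockwise box so that all boxes end up clockwise, and filters out boxes whose area is too small.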
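
As a quick check of the sign convention, here is a minimal sketch that applies the same formula to a polygon with any number of vertices. The function name `signed_area` and the sample rectangle are illustrative and not part of the original code; coordinates are assumed to be image coordinates with the y axis pointing down.

```python
def signed_area(points):
    """Signed area via the trapezoid form of the Shoelace formula.

    points: list of (x, y) vertices in order; the last vertex wraps
    around to the first (x_{n+1} = x_1, y_{n+1} = y_1).
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around at the end
        total += (x1 - x0) * (y1 + y0)
    return total / 2.0


# A 4 x 2 rectangle in image coordinates (y grows downward):
# top-left, top-right, bottom-right, bottom-left is clockwise on screen.
clockwise = [(0, 0), (4, 0), (4, 2), (0, 2)]
# Keep the first vertex and reverse the rest, as poly[(0, 3, 2, 1), :] does.
counter_clockwise = [clockwise[i] for i in (0, 3, 2, 1)]

print(signed_area(clockwise))          # -8.0
print(signed_area(counter_clockwise))  # 8.0
```

Reversing the vertex order flips the sign and leaves the magnitude unchanged, which is why a single sign test is enough to detect the direction.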

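# Shoelace formula
import numpy as np


def polygon_area(poly):
    '''
    Compute the signed area of a quadrilateral.
    :param poly: array-like of shape (4, 2) holding (x, y) vertices in order
    :return: signed area; in image coordinates (y down) it is negative for
             clockwise vertices and positive for counter-clockwise ones
    '''
    edge = [
        (poly[1][0] - poly[0][0]) * (poly[1][1] + poly[0][1]),
        (poly[2][0] - poly[1][0]) * (poly[2][1] + poly[1][1]),
        (poly[3][0] - poly[2][0]) * (poly[3][1] + poly[2][1]),
        (poly[0][0] - poly[3][0]) * (poly[0][1] + poly[3][1])
    ]
    return np.sum(edge) / 2.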


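def check_and_validate_polys(polys, tags, size):
    '''
    Check that all text polygons have the same vertex direction,
    and filter out invalid (near-zero-area) polygons.
    :param polys: array of shape (N, 4, 2) with (x, y) vertices; clipped in place
    :param tags: array of length N, one tag per polygon
    :param size: (h, w) of the image
    :return: (validated_polys, validated_tags)
    '''
    (h, w) = size
    if polys.shape[0] == 0:
        # Return a pair here as well, so callers can always unpack two values
        return polys, tags
    # Clip coordinates to the image bounds
    polys[:, :, 0] = np.clip(polys[:, :, 0], 0, w - 1)
    polys[:, :, 1] = np.clip(polys[:, :, 1], 0, h - 1)

    validated_polys = []
    validated_tags = []
    for poly, tag in zip(polys, tags):
        p_area = polygon_area(poly)
        if abs(p_area) < 1:
            # Degenerate or tiny box: drop it
            print('invalid poly')
            continue
        if p_area > 0:
            # Positive area: reverse the order, keeping the first vertex fixed
            print('poly in wrong direction')
            poly = poly[(0, 3, 2, 1), :]
        validated_polys.append(poly)
        validated_tags.append(tag)
    return np.array(validated_polys), np.array(validated_tags)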

Image mean subtraction

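1. What it is: there are two kinds of mean, the image mean and the pixel mean. The image mean averages the pixel values at each spatial position over all training images, giving a mean image. The pixel mean averages each of the R, G and B channels over all training images, giving one value per channel.
2. Why: it is similar to data normalization. Images are also "normalized" in this way to help prevent exploding gradients.
3. How: subtract the mean from the original pixel values of each channel: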

def __mean_image_subtraction(self, images, means=(123.68, 116.78, 103.94)):
    '''
    Image normalization: subtract the mean of every channel, in place.
    :param images: batch_size * channels * w * h
    :param means: one mean value per channel
    '''
    num_channels = images.data.shape[1]
    if len(means) != num_channels:
        raise ValueError('len(means) must match the number of channels')
    for i in range(num_channels):
        images.data[:, i, :, :] -= means[i]

    return images
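
The two kinds of mean described above can be sketched with NumPy as follows. This is an illustrative example, not the original training code: the random toy dataset, the N x H x W x C layout, and the variable names are all assumptions.

```python
import numpy as np

# Toy "training set": 5 RGB images of size 4 x 6, laid out as N x H x W x C.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(5, 4, 6, 3)).astype(np.float32)

# Image mean: average over the dataset axis only, giving one value per
# spatial position and channel (a "mean image").
image_mean = train.mean(axis=0)           # shape (4, 6, 3)

# Pixel mean: average over the dataset and spatial axes, giving one
# value per channel.
pixel_mean = train.mean(axis=(0, 1, 2))   # shape (3,)

# In both cases the subtraction relies on broadcasting.
centered_by_image = train - image_mean
centered_by_pixel = train - pixel_mean

print(image_mean.shape, pixel_mean.shape)
```

The hard-coded `means` in the function above correspond to the pixel-mean variant: one constant per channel, computed once over a training set.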

self.modules() and self.children()

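Both functions are useful when initializing parameters.
Consider a network like this:
(Figure: diagram of the example network; image not available.)
self.children() iterates over the network's immediate sub-modules only, i.e. the "net's children" level.
self.modules() traverses the network depth-first and yields every module in it, including net itself, net's children, and the children of net's children.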

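# Initialization (runs inside the model's __init__, after the layers are defined)
# Assumes: import torch.nn as nn; from torch.nn import init
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        # Standard normal samples folded into (-2, 2) by fmod_, then scaled by 0.01
        m.weight.data.normal_().fmod_(2).mul_(0.01).add_(0)
        # init.xavier_uniform_(m.weight.data)
    elif isinstance(m, nn.BatchNorm2d):
        # Start batch norm with scale 1 and shift 0
        m.weight.data.fill_(1)
        m.bias.data.zero_()
    elif isinstance(m, nn.Linear):
        init.xavier_uniform_(m.weight.data)
        m.bias.data.zero_()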

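Reference: https://blog.csdn.net/dss_dssssd/article/details/83958518
PS: named_children() and named_modules() follow the same pattern as in the figure above: the former yields (name, module) pairs for net's children only, while the latter yields them for all modules.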
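
The difference between the four iterators can be seen on a small nested model. This is a minimal sketch assuming PyTorch is installed; the toy `net` below is made up for illustration and is not the network from the figure.

```python
import torch.nn as nn

# net has two children: a Linear layer and an inner Sequential,
# which in turn has two children of its own.
net = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sequential(nn.Linear(8, 8), nn.ReLU()),
)

children = list(net.children())   # immediate sub-modules only
modules = list(net.modules())     # net itself plus all descendants, depth-first

print(len(children))  # 2
print(len(modules))   # 5

# The named_* variants yield (name, module) pairs; the root's name is ''.
child_names = [name for name, _ in net.named_children()]
module_names = [name for name, _ in net.named_modules()]
print(child_names)   # ['0', '1']
print(module_names)  # ['', '0', '1', '1.0', '1.1']
```

Because modules() reaches every nested layer, it is the natural choice for an initialization loop, whereas children() would stop at the inner Sequential without descending into it.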

Glossary of some OCR terms

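RPN (Region Proposal Network): a network that generates region proposals.
Seq2Seq: sequence-to-sequence conversion, usually implemented with an Encoder-Decoder framework.
CTC (Connectionist Temporal Classification): an algorithm used when the alignment between the input and output sequences is not known.
FPN (Feature Pyramid Networks): a multi-scale object detection approach.
OHEM (Online Hard Example Mining): automatically selects hard negatives for training.
RoI (Region of Interest): a region of the image considered likely to contain a target.
dropout: prevents overfitting. During forward propagation, each neuron's activation is switched off with a certain probability p. This makes the model generalize better, because it cannot rely too heavily on particular local features.
concat and add: concat increases the number of channels; add sums the feature maps element-wise, so the number of channels stays the same.
ICDAR (International Conference on Document Analysis and Recognition).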
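
Two of the entries above, concat vs. add and dropout, are easy to illustrate on arrays. The sketch below uses NumPy with made-up feature-map shapes. The `dropout` helper is hypothetical, and its rescaling of the surviving activations by 1 / (1 - p) is the common "inverted dropout" convention, an assumption that the list above does not state.

```python
import numpy as np

# Two feature maps in N x C x H x W layout.
a = np.ones((1, 16, 8, 8), dtype=np.float32)
b = np.ones((1, 16, 8, 8), dtype=np.float32)

concat = np.concatenate([a, b], axis=1)  # channels stack: 16 + 16 = 32
added = a + b                            # element-wise sum: still 16 channels
print(concat.shape)  # (1, 32, 8, 8)
print(added.shape)   # (1, 16, 8, 8)


def dropout(x, p, rng):
    """Zero each activation with probability p (training-time behaviour)."""
    keep = rng.random(x.shape) >= p      # True where the unit stays active
    return x * keep / (1.0 - p)          # assumed rescaling (inverted dropout)


rng = np.random.default_rng(0)
dropped = dropout(a, p=0.5, rng=rng)
print(np.unique(dropped))  # only 0.0 and 2.0 can appear
```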