A Detailed Walkthrough: Training ABCNet with AdelaiDet, the Detectron2-based Toolbox (Part 3)


ABCNet Model Overview

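Scene text detection and recognition are attracting growing attention.
Existing methods fall roughly into two categories: character-based methods and segmentation-based methods. These methods either require costly character-level annotations or must maintain a complex pipeline, which makes them unsuitable for real-time applications. The ABCNet authors address this problem by proposing the Adaptive Bezier-Curve Network (ABCNet).
Its contributions are threefold:
1) It is the first method to adaptively fit arbitrarily shaped text with a parameterized Bezier curve.
2) It introduces a new BezierAlign layer that extracts accurate convolutional features for arbitrarily shaped text instances, significantly improving accuracy over previous methods.
3) Compared with standard bounding-box detection, Bezier-curve detection adds negligible computational overhead, which gives the method advantages in both efficiency and accuracy.
Experiments on arbitrarily shaped text benchmarks (Total-Text and CTW1500) show that ABCNet achieves state-of-the-art accuracy while significantly improving speed. In particular, on Total-Text, its real-time version is more than 10 times faster than recent state-of-the-art methods, with competitive recognition accuracy.
Official paper: link to the paper.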
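To make contributions 1) and 2) more concrete, here is a minimal pure-Python sketch; it is not the AdelaiDet implementation. It assumes the cubic Bezier curves used in the ABCNet paper (four control points per long side of a text instance) and evaluates a curve through its Bernstein basis. `bezier_align_grid` then computes BezierAlign-style sampling locations by interpolating linearly between matching points on the top and bottom curves. The real layer additionally samples features bilinearly at these locations on the GPU; the function names and the endpoint-inclusive spacing here are my own choices.

```python
from math import comb
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bernstein(i: int, n: int, t: float) -> float:
    """Bernstein basis polynomial B_{i,n}(t) = C(n, i) * t**i * (1 - t)**(n - i)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_point(ctrl: Sequence[Point], t: float) -> Point:
    """Evaluate the Bezier curve defined by the control points at t in [0, 1]."""
    n = len(ctrl) - 1  # 4 control points -> cubic curve
    x = sum(bernstein(i, n, t) * px for i, (px, _) in enumerate(ctrl))
    y = sum(bernstein(i, n, t) * py for i, (_, py) in enumerate(ctrl))
    return x, y


def bezier_align_grid(top: Sequence[Point], bottom: Sequence[Point],
                      out_h: int, out_w: int) -> List[List[Point]]:
    """Sampling locations for an out_h x out_w rectified feature map.

    Both curves are assumed to run left to right. Column j uses the same
    parameter t on the top and bottom curves; row i moves linearly from the
    top point (i = 0) to the bottom point (i = out_h - 1).
    """
    grid = []
    for i in range(out_h):
        alpha = i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            t = j / (out_w - 1) if out_w > 1 else 0.0
            tx, ty = bezier_point(top, t)
            bx, by = bezier_point(bottom, t)
            row.append((tx + alpha * (bx - tx), ty + alpha * (by - ty)))
        grid.append(row)
    return grid
```

For example, `bezier_point([(0, 0), (1, 1), (2, 1), (3, 0)], 0.5)` returns `(1.5, 0.75)`, the apex of an arch-shaped text boundary.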


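Training

Choice of training platform: 极链AI云

Advantages: unlike many platforms, it bills by the hour; it offers a wide range of GPU types in ample numbers; and it has short-term summer discounts as well as long-term student discounts, which makes it very affordable.

Official site of 极链AI云: https://cloud.videojj.com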


Required setup

See my previous article on environment setup.


Training the model

Dataset preparation (using CTW1500 as an example)

Prepare the CTW1500 dataset by creating symbolic links to it:

mkdir -p /root/AdelaiDet/datasets/CTW1500
ln -s /datasets/CTW1500/annotations /root/AdelaiDet/datasets/CTW1500/annotations
ln -s /datasets/CTW1500/ctwtest_text_image /root/AdelaiDet/datasets/CTW1500/ctwtest_text_image
ln -s /datasets/CTW1500/ctwtrain_text_image /root/AdelaiDet/datasets/CTW1500/ctwtrain_text_image
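If you prefer to script this step, the following minimal Python sketch creates the same three links. The default paths mirror the commands above; the function name `link_ctw1500` is hypothetical, and the sketch skips any target that already exists rather than overwriting it.

```python
import os
from pathlib import Path

# The three sub-directories linked by the ln -s commands above.
SUBDIRS = ("annotations", "ctwtest_text_image", "ctwtrain_text_image")


def link_ctw1500(src_root="/datasets/CTW1500",
                 dst_root="/root/AdelaiDet/datasets/CTW1500"):
    """Symlink each CTW1500 sub-directory into AdelaiDet's datasets folder."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)  # ln -s fails if the parent is missing
    created = []
    for name in SUBDIRS:
        src, dst = src_root / name, dst_root / name
        if not src.is_dir():
            raise FileNotFoundError(f"missing source directory: {src}")
        if dst.is_symlink() or dst.exists():
            continue  # already linked, or a real directory is there: leave it alone
        os.symlink(src, dst, target_is_directory=True)
        created.append(dst)
    return created
```

Calling `link_ctw1500()` with no arguments uses the same paths as the shell commands and returns the list of links it created.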

Tip: check that the paths in the yaml config files match your setup.


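Training command

From a terminal in the /root/AdelaiDet directory, run the command below.
Tip: set the path of the pretrained checkpoint to load according to the model you choose. To train from scratch, remove the weights path from the yaml file and also drop the MODEL.WEIGHTS argument from the command, because options given on the command line override the yaml file.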

OMP_NUM_THREADS=1 python tools/train_net.py \
    --config-file configs/BAText/CTW1500/attn_R_50.yaml \
    --num-gpus 1 \
    MODEL.WEIGHTS ctw1500_attn_R_50.pth
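The command can also be assembled programmatically, which makes it easier to switch between fine-tuning, resuming, and training from scratch. This is a hedged sketch: `build_train_cmd` and `run_training` are hypothetical helpers, and the from-scratch case relies on my understanding that detectron2's checkpointer initializes the model without loading anything when MODEL.WEIGHTS is an empty string. Config overrides go last because train_net.py collects every trailing token as `opts`.

```python
import os
import shlex
import subprocess


def build_train_cmd(config_file, weights=None, num_gpus=1, resume=False, extra_opts=()):
    """Return the train_net.py argument list.

    weights=None keeps whatever MODEL.WEIGHTS the yaml sets; weights="" asks
    for no checkpoint at all (training from scratch).
    """
    cmd = ["python", "tools/train_net.py",
           "--config-file", config_file,
           "--num-gpus", str(num_gpus)]
    if resume:
        cmd.append("--resume")  # named flags must precede the trailing opts
    opts = list(extra_opts)
    if weights is not None:
        opts += ["MODEL.WEIGHTS", weights]
    return cmd + opts


def run_training(cmd, cwd="/root/AdelaiDet"):
    """Run the command with OMP_NUM_THREADS=1, as in the shell example."""
    env = dict(os.environ, OMP_NUM_THREADS="1")
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, cwd=cwd, env=env, check=True)
```

`run_training(build_train_cmd("configs/BAText/CTW1500/attn_R_50.yaml", "ctw1500_attn_R_50.pth"))` reproduces the shell command above.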

Full usage (adjust as needed):

usage: train_net.py [-h] [--config-file FILE] [--resume] [--eval-only] [--num-gpus NUM_GPUS] [--num-machines NUM_MACHINES] [--machine-rank MACHINE_RANK] [--dist-url DIST_URL] ...

positional arguments:
  opts                  Modify config options at the end of the command. For Yacs configs, use space-separated "PATH.KEY VALUE" pairs. For python-based LazyConfig, use
                        "path.key=value".

optional arguments:
  -h, --help            show this help message and exit
  --config-file FILE    path to config file
  --resume              Whether to attempt to resume from the checkpoint directory. See documentation of `DefaultTrainer.resume_or_load()` for what it means.
  --eval-only           perform evaluation only
  --num-gpus NUM_GPUS   number of gpus *per machine*
  --num-machines NUM_MACHINES
                        total number of machines
  --machine-rank MACHINE_RANK
                        the rank of this machine (unique per machine)
  --dist-url DIST_URL   initialization URL for pytorch distributed backend. See https://pytorch.org/docs/stable/distributed.html for details.

Examples:

Run on single machine:
    $ tools/train_net.py --num-gpus 8 --config-file cfg.yaml

Change some config options:
    $ tools/train_net.py --config-file cfg.yaml MODEL.WEIGHTS /path/to/weight.pth SOLVER.BASE_LR 0.001
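The trailing `opts` are read as KEY VALUE pairs, so an odd number of tokens is an error. The sketch below mimics that behavior in plain Python to show how the example above is interpreted; it is an illustration rather than the yacs code, and `parse_opts` and `decode_value` are hypothetical names. Like yacs, it tries `ast.literal_eval` on each value and keeps the raw string when that fails, so `0.001` becomes a float while a path stays a string; unlike yacs, it does not check values against the types of an existing config.

```python
import ast


def decode_value(text):
    """Parse a value as a Python literal if possible, else keep the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # paths and bare names stay strings


def parse_opts(opts):
    """Turn ['A.B', 'v1', 'C.D', 'v2'] into {'A': {'B': v1}, 'C': {'D': v2}}."""
    if len(opts) % 2 != 0:
        raise ValueError(f"opts must be KEY VALUE pairs, got {len(opts)} tokens")
    nested = {}
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = decode_value(raw)
    return nested
```

Applied to the example above, `parse_opts(["MODEL.WEIGHTS", "/path/to/weight.pth", "SOLVER.BASE_LR", "0.001"])` yields a WEIGHTS string under MODEL and a float BASE_LR under SOLVER, while `parse_opts(["MODEL.WEIGHTS"])` raises ValueError.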

Pretrained model weights

The pretrained weights are located under /modelsets/AdelaiDet/ABCNet.
Choose the pretrained weights for the model you need.
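To see which checkpoints are available, a small sketch like the one below lists the `.pth` files under that directory. The directory layout and file naming are assumptions; `list_checkpoints` is a hypothetical helper, and its keyword filter is a plain substring match on the file name.

```python
from pathlib import Path


def list_checkpoints(root="/modelsets/AdelaiDet/ABCNet", keyword=""):
    """Return the .pth files under root (searched recursively) whose names contain keyword."""
    root = Path(root)
    if not root.is_dir():
        return []  # e.g. when running outside the platform image
    return sorted(p for p in root.rglob("*.pth") if keyword in p.name)
```

For instance, `list_checkpoints(keyword="ctw1500")` narrows the list to checkpoints whose file names mention CTW1500; pass the chosen path as MODEL.WEIGHTS.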


Datasets

CTW1500 dataset (downloading it may require a proxy)

Dataset path: /datasets/CTW1500/
CTW1500 (SCUT-CTW1500) is a curved-text benchmark of 1,500 images, split into 1,000 training and 500 test images, in which each text line is annotated with a 14-point polygon. This makes it a standard testbed for detecting arbitrarily shaped text.
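ABCNet is trained on Bezier control points rather than on raw polygon labels. The paper describes generating them by fitting each long side of a text polygon with a cubic Bezier curve by least squares, where each vertex's parameter t is its cumulative length along the polyline divided by the polyline's total length. The sketch below implements that idea in pure Python for a single side; it is not AdelaiDet's conversion script, the helper names are hypothetical, and it needs at least four points per side.

```python
from math import comb, dist


def arc_length_params(points):
    """t_i = polyline length up to point i divided by the total length."""
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + dist(a, b))
    return [c / cum[-1] for c in cum]


def solve(mat, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(mat)
    aug = [list(row) + [r] for row, r in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_cubic_bezier(points):
    """Least-squares cubic Bezier control points for one polygon side (>= 4 points)."""
    ts = arc_length_params(points)
    # Row k of the design matrix B holds the four Bernstein values at t_k.
    basis = [[comb(3, i) * t**i * (1 - t) ** (3 - i) for i in range(4)] for t in ts]
    # Normal equations (B^T B) c = B^T p, solved once for x and once for y.
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(4)] for i in range(4)]
    xs, ys = (
        solve(btb, [sum(row[i] * p[axis] for row, p in zip(basis, points)) for i in range(4)])
        for axis in (0, 1)
    )
    return list(zip(xs, ys))
```

Assuming the usual layout of seven vertices per long side, you would split a 14-point CTW1500 polygon into its top and bottom halves and call `fit_cubic_bezier` on each, giving eight control points per text line.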


TotalText dataset (downloading it may require a proxy)

Dataset path: /datasets/TotalText/
To facilitate new research on text detection, its authors introduced the Total-Text dataset (IJDAR; ICDAR 2017 paper), which is more comprehensive than existing text datasets. Total-Text consists of 1,555 images covering three text orientations (horizontal, multi-oriented, and curved), which makes it one of a kind.


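Source

Click here to go to the GitHub repository.

Summary

Following the steps above is enough to complete training. A full training run takes a long time, so you can tune the parameters to speed it up.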
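One common way to adjust parameters when you train on more GPUs (and therefore with a larger total batch; in detectron2, SOLVER.IMS_PER_BATCH counts images across all GPUs) is the linear scaling rule: scale the learning rate with the batch size, and scale the iteration counts inversely so the number of epochs stays the same. The sketch below applies it to a few SOLVER keys. The function names are hypothetical, the numbers in the usage line are placeholders rather than AdelaiDet's defaults, and the rule is a heuristic that may still need retuning.

```python
def scale_schedule(base_lr, ims_per_batch, max_iter, steps, new_ims_per_batch):
    """Linear scaling rule: LR grows with batch size, iteration counts shrink with it."""
    factor = new_ims_per_batch / ims_per_batch
    return {
        "SOLVER.IMS_PER_BATCH": new_ims_per_batch,
        "SOLVER.BASE_LR": base_lr * factor,
        "SOLVER.MAX_ITER": round(max_iter / factor),  # keeps the number of epochs fixed
        "SOLVER.STEPS": tuple(round(s / factor) for s in steps),
    }


def to_opts(overrides):
    """Flatten the overrides into train_net.py's trailing KEY VALUE tokens."""
    return [token for key, value in overrides.items() for token in (key, str(value))]
```

For example, `to_opts(scale_schedule(0.01, 8, 100000, (60000, 80000), 16))` doubles the learning rate to 0.02 and halves MAX_ITER to 50000 and STEPS to (30000, 40000); append the resulting tokens to the end of the training command.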

Author: 赵云战江湖
