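Introduction to the ABCNet model
Scene text detection and recognition have been receiving growing attention.
Existing methods fall roughly into two categories: character-based and segmentation-based. They either depend on costly character-level annotation or require maintaining a complex pipeline, which makes them unsuitable for real-time applications. The ABCNet authors address this by proposing the Adaptive Bezier-Curve Network (ABCNet).
Its contributions are threefold:
1) For the first time, arbitrarily shaped text is fitted adaptively with a parameterized Bezier curve.
2) A new BezierAlign layer is designed to extract accurate convolutional features for arbitrarily shaped text instances, which significantly improves precision over previous methods.
3) Compared with standard bounding-box detection, Bezier curve detection adds negligible computational overhead, which gives the method an advantage in both efficiency and accuracy.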
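To make contributions 1) and 2) more concrete, here is a minimal Python sketch. It evaluates a Bezier curve from its control points with Bernstein polynomials, then builds a BezierAlign-style sampling grid by interpolating linearly between a top and a bottom boundary curve. The function names, the cubic example curves, and the grid size are assumptions made for illustration. This is not the AdelaiDet implementation, and the bilinear feature lookup that a real layer performs at each grid position is left out.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1].

    control_points is a list of (x, y) tuples; n + 1 points give a degree-n curve.
    """
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        weight = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += weight * px
        y += weight * py
    return (x, y)


def sampling_grid(top_curve, bottom_curve, out_h, out_w):
    """Return out_h rows of out_w (x, y) positions lying between two curves."""
    grid = []
    for row in range(out_h):
        # v = 0 lies on the top curve, v = 1 on the bottom curve.
        v = row / (out_h - 1) if out_h > 1 else 0.0
        points = []
        for col in range(out_w):
            # Both curves are evaluated at the same parameter t for this column.
            t = col / (out_w - 1) if out_w > 1 else 0.0
            tx, ty = bezier_point(top_curve, t)
            bx, by = bezier_point(bottom_curve, t)
            points.append((tx + v * (bx - tx), ty + v * (by - ty)))
        grid.append(points)
    return grid


# Hypothetical control points for the two long sides of a curved text instance.
top = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
bottom = [(0.0, 1.0), (1.0, 3.0), (3.0, 3.0), (4.0, 1.0)]
grid = sampling_grid(top, bottom, out_h=3, out_w=5)
print(grid[0][2], grid[2][2])
```

Because every grid column follows the curve parameter t rather than an image axis, the sampled rows bend with the text, which is the intuition behind extracting features for curved instances without a rectangular crop.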
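Experiments on the arbitrarily shaped benchmark datasets Total-Text and CTW1500 show that ABCNet achieves state-of-the-art accuracy while significantly improving speed. On Total-Text, the real-time version is more than 10 times faster than recent state-of-the-art methods, with competitive recognition accuracy.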
Official paper: ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network (link to be added).
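Training method
Choice of training platform: 极链AI云
Advantages: unlike many platforms, it bills by the hour; it offers many GPU server types, in large numbers; and both short-term summer discounts and long-term student discounts are available, which makes it very affordable.
Official site of 极链AI云: https://cloud.videojj.com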
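Required configuration
How to train the model
Dataset preparation (using CTW1500 as an example)
Prepare the CTW1500 dataset:
Create symbolic links to the dataset (the parent directory /root/AdelaiDet/datasets/CTW1500 must already exist):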
ln -s /datasets/CTW1500/annotations /root/AdelaiDet/datasets/CTW1500/annotations
ln -s /datasets/CTW1500/ctwtest_text_image /root/AdelaiDet/datasets/CTW1500/ctwtest_text_image
ln -s /datasets/CTW1500/ctwtrain_text_image /root/AdelaiDet/datasets/CTW1500/ctwtrain_text_image
Tip: check that the dataset paths in the yaml config file match these locations.
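Before starting a long training run, it can help to confirm that the three links resolve. The following is a small sketch that assumes the directory names from the ln -s commands above; the helper name missing_dataset_entries is hypothetical and not part of AdelaiDet.

```python
from pathlib import Path

# Entry names taken from the ln -s commands above.
REQUIRED = ("annotations", "ctwtest_text_image", "ctwtrain_text_image")


def missing_dataset_entries(dataset_root, required=REQUIRED):
    """Return the names under dataset_root that do not resolve to an existing path."""
    root = Path(dataset_root)
    # Path.exists() follows symlinks, so a dangling link is reported as missing.
    return [name for name in required if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_dataset_entries("/root/AdelaiDet/datasets/CTW1500")
    print("missing entries:", missing or "none")
```

An empty result only shows that the links resolve; the paths referenced inside the yaml file still need to be checked separately.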
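Training command
Tip: set MODEL.WEIGHTS to the path of the pretrained checkpoint you want to load; to train from scratch, delete the weights path from the yaml file.
Run the following in a terminal under /root/AdelaiDet: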
OMP_NUM_THREADS=1 python tools/train_net.py \
--config-file configs/BAText/CTW1500/attn_R_50.yaml \
--num-gpus 1 \
MODEL.WEIGHTS ctw1500_attn_R_50.pth
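If the run is launched from a script rather than typed by hand, the same command can be assembled in Python. This is only a sketch: build_train_command and run_training are hypothetical helper names, and the default repo_dir simply mirrors the /root/AdelaiDet path used in this article. When weights is None, no MODEL.WEIGHTS override is appended, so whatever the yaml file specifies still applies.

```python
import os
import subprocess


def build_train_command(config_file, num_gpus=1, weights=None, extra_opts=()):
    """Assemble the train_net.py argument list used in this article."""
    cmd = [
        "python", "tools/train_net.py",
        "--config-file", config_file,
        "--num-gpus", str(num_gpus),
    ]
    if weights is not None:
        # Trailing "PATH.KEY VALUE" pairs override values from the yaml file.
        cmd += ["MODEL.WEIGHTS", weights]
    cmd += list(extra_opts)
    return cmd


def run_training(cmd, repo_dir="/root/AdelaiDet"):
    """Run the command from the repository root with OMP_NUM_THREADS=1."""
    env = dict(os.environ, OMP_NUM_THREADS="1")
    return subprocess.run(cmd, cwd=repo_dir, env=env, check=True)


cmd = build_train_command(
    "configs/BAText/CTW1500/attn_R_50.yaml",
    num_gpus=1,
    weights="ctw1500_attn_R_50.pth",
)
print(" ".join(cmd))
```

Calling run_training(cmd) would start the actual training, so it is left out of the example.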
Full usage, to be adjusted as needed:
usage: train_net.py [-h] [--config-file FILE] [--resume] [--eval-only]
                    [--num-gpus NUM_GPUS] [--num-machines NUM_MACHINES]
                    [--machine-rank MACHINE_RANK] [--dist-url DIST_URL]
                    ...

positional arguments:
  opts                  Modify config options at the end of the command.
                        For Yacs configs, use space-separated
                        "PATH.KEY VALUE" pairs. For python-based LazyConfig,
                        use "path.key=value".

optional arguments:
  -h, --help            show this help message and exit
  --config-file FILE    path to config file
  --resume              Whether to attempt to resume from the checkpoint
                        directory. See documentation of
                        `DefaultTrainer.resume_or_load()` for what it means.
  --eval-only           perform evaluation only
  --num-gpus NUM_GPUS   number of gpus *per machine*
  --num-machines NUM_MACHINES
                        total number of machines
  --machine-rank MACHINE_RANK
                        the rank of this machine (unique per machine)
  --dist-url DIST_URL   initialization URL for pytorch distributed backend.
                        See https://pytorch.org/docs/stable/distributed.html
                        for details.

Examples:

Run on single machine:
    $ tools/train_net.py --num-gpus 8 --config-file cfg.yaml

Change some config options:
    $ tools/train_net.py --config-file cfg.yaml MODEL.WEIGHTS /path/to/weight.pth SOLVER.BASE_LR 0.001
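The trailing opts follow a flat "PATH.KEY VALUE" convention. The sketch below only illustrates that pairing in plain Python; parse_opts is a hypothetical helper, not the yacs implementation, and it keeps every value as a string rather than converting it to the type the config expects.

```python
def parse_opts(opts):
    """Pair up a flat ["PATH.KEY", "VALUE", ...] list into a dict of overrides."""
    if len(opts) % 2 != 0:
        raise ValueError("opts must contain an even number of items: PATH.KEY VALUE ...")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(opts[0::2], opts[1::2]))


overrides = parse_opts(
    ["MODEL.WEIGHTS", "/path/to/weight.pth", "SOLVER.BASE_LR", "0.001"]
)
for key, value in overrides.items():
    print(key, "=", value)
```

A dangling key with no value is a common typing mistake on long command lines, which is why the sketch rejects an odd number of items.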
Path to the pretrained model weights
The weights are located under /modelsets/AdelaiDet/ABCNet.
Choose the pretrained weights for the model you need.
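To see which checkpoints are available before picking one, a short listing helps. The sketch assumes the weight files use the .pth extension, as in the training command in this article; list_checkpoints is a hypothetical helper name.

```python
from pathlib import Path


def list_checkpoints(weights_dir, suffix=".pth"):
    """Return the sorted paths of all files ending in suffix below weights_dir."""
    root = Path(weights_dir)
    if not root.is_dir():
        # A missing directory yields an empty list instead of an exception.
        return []
    return sorted(str(p) for p in root.rglob("*" + suffix) if p.is_file())


if __name__ == "__main__":
    for path in list_checkpoints("/modelsets/AdelaiDet/ABCNet"):
        print(path)
```

Any printed path can then be passed as the MODEL.WEIGHTS value on the command line.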
Datasets
CTW1500 dataset (downloading may require a proxy)
Dataset path: /datasets/CTW1500/
Note that the abstract quoted here comes from the "Chinese Text in the Wild" (CTW) paper, a street-view Chinese text dataset, which is a different dataset from the curved-text SCUT-CTW1500 benchmark used for ABCNet:
In this paper, we introduce a very large Chinese text dataset in the wild. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, the detection and recognition of text in natural images is still a challenging problem, especially for some more complicated character sets such as Chinese text. Lack of training data has always been a problem, especially for deep learning methods which require massive training data. In this paper, we provide details of a newly created dataset of Chinese text with about 1 million Chinese characters from 3850 unique ones annotated by experts in over 30000 street view images. This is a challenging dataset with good diversity containing planar text, raised text, text under poor illumination, distant text, partially occluded text, etc. Besides the dataset, we give baseline results using state-of-the-art methods for three tasks: character recognition (top-1 accuracy of 80.5%), character detection (AP of 70.9%), and text line detection (AED of 22.1). The dataset, source code, and trained models are publicly available.
TotalText dataset (downloading may require a proxy)
Dataset path: /datasets/TotalText/
In order to facilitate new text detection research, we introduce the Total-Text dataset (IJDAR) (ICDAR-17 paper), which is more comprehensive than existing text datasets. Total-Text consists of 1555 images with more than 3 different text orientations: horizontal, multi-oriented, and curved, one of a kind.
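Source
Summary
Following the steps above is enough to complete training. The whole process takes a long time; config parameters such as SOLVER.MAX_ITER can be adjusted to shorten it.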