Introduction
Vedastr is an open-source scene text recognition toolbox based on PyTorch. It is designed to be flexible
in order to support rapid implementation and evaluation of scene text recognition tasks.
Features
Modular design
We decompose the scene text recognition framework into different components, and one can
easily construct a customized scene text recognition framework by combining different modules
(see the configuration sketch after this list).
Flexibility
Vedastr is flexible enough that the components within a module can be changed easily.
Module expansibility
It is easy to integrate a new module into the vedastr project.
Support of multiple frameworks
The toolbox supports several popular scene text recognition frameworks, e.g., CRNN,
TPS-ResNet-BiLSTM-Attention, Transformer, etc.
Good performance
We re-implement the best model in deep-text-recognition-benchmark
and achieve better average accuracy. What's more, we implement a simple baseline (ResNet-FC)
whose performance is acceptable.
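As an illustration of the modular design, a recognition model can be assembled from interchangeable transformation, backbone, sequence, and prediction modules. The sketch below is hypothetical: the module names and config layout are illustrative assumptions, not vedastr's actual schema (see the files under configs/ for that).

```python
# Hypothetical composition of a recognition model from swappable modules.
# Module names and config layout are illustrative only; see configs/ for
# vedastr's actual schema.
model = dict(
    transformation=dict(type='TPS'),    # rectify curved or perspective text
    backbone=dict(type='ResNet'),       # extract visual features
    sequence=dict(type='BiLSTM'),       # model character ordering
    prediction=dict(type='Attention'),  # decode features to characters
)
```

Swapping any one of these entries, e.g., the prediction head, yields a different framework without touching the other modules.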
License
This project is released under the Apache 2.0 license.
Benchmark and model zoo
Note:
We test our models on IIIT5K_3000, SVT, IC03_867, IC13_1015, IC15_2077, SVTP, and CUTE80.
The training data we use is MJSynth (MJ) and SynthText (ST). You can find the
datasets below.
| MODEL | CASE SENSITIVE | IIIT5k_3000 | SVT | IC03_867 | IC13_1015 | IC15_2077 | SVTP | CUTE80 | AVERAGE |
|---|---|---|---|---|---|---|---|---|---|
| TPS-ResNet-BiLSTM-Attention | False | 87.33 | 87.79 | 95.04 | 92.61 | 74.45 | 81.09 | 74.91 | 84.95 |
| ResNet-FC | False | 85.03 | 86.4 | 94 | 91.03 | 70.29 | 77.67 | 71.43 | 82.38 |
| Small-SATRN | False | 88.87 | 88.87 | 96.19 | 93.99 | 79.08 | 84.81 | 84.67 | 87.55 |
AVERAGE: average accuracy over all test datasets.
TPS: spatial transformer network.
Small-SATRN: On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention;
the training phase is case sensitive while the testing phase is case insensitive.
CASE SENSITIVE: if true, the output is case sensitive and contains common characters.
If false, the output is not case sensitive and contains only digits and letters.
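As a rough illustration of these two settings, the character sets might differ along the following lines (a minimal sketch; the exact charset is defined in each config file):

```python
import string

# Illustrative assumption: the exact character sets are defined per config.
# Case-insensitive setting: digits and lowercase letters only.
insensitive_charset = string.digits + string.ascii_lowercase  # 36 classes

# Case-sensitive setting: digits, both letter cases, and common punctuation.
sensitive_charset = string.digits + string.ascii_letters + string.punctuation  # 94 classes
```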
Installation
Requirements
Linux
Python 3.6+
PyTorch 1.2.0 or higher
CUDA 9.0 or higher
We have tested the following versions of OS and software:
OS: Ubuntu 16.04.6 LTS
CUDA: 9.0
Python 3.6.9
Install vedastr
a. Create a conda virtual environment and activate it.
conda create -n vedastr python=3.6 -y
conda activate vedastr
b. Install PyTorch and torchvision following the official instructions,
e.g.,
conda install pytorch torchvision -c pytorch
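After installing, a quick sanity check (a minimal sketch, not part of vedastr) confirms that the PyTorch and CUDA requirements above are met:

```python
import torch

# Verify the PyTorch version and CUDA visibility.
print(torch.__version__)          # expect 1.2.0 or higher
print(torch.cuda.is_available())  # expect True on a machine with CUDA 9.0+
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```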
c. Clone the vedastr repository.
git clone https://github.com/Media-Smart/vedastr.git
cd vedastr
vedastr_root=${PWD}
d. Install dependencies.
pip install -r requirements.txt
Prepare data
a. Download the LMDB data from deep-text-recognition-benchmark,
which contains training, validation, and evaluation data.
b. Make a data directory as follows:
cd ${vedastr_root}
mkdir ${vedastr_root}/data
c. Put the downloaded LMDB data into this data directory; the structure of the data directory will then look as follows:
data
└── data_lmdb_release
├── evaluation
├── training
│ ├── MJ
│ │ ├── MJ_test
│ │ ├── MJ_train
│ │ └── MJ_valid
│ └── ST
└── validation
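The LMDBs from deep-text-recognition-benchmark store samples under image-%09d and label-%09d keys together with a num-samples counter. Below is a minimal read sketch, assuming that key layout and the lmdb and Pillow packages:

```python
import io

import lmdb
from PIL import Image

# Inspect one sample from a downloaded LMDB (assumes the
# deep-text-recognition-benchmark key layout).
env = lmdb.open('data/data_lmdb_release/training/MJ/MJ_train',
                readonly=True, lock=False)
with env.begin() as txn:
    num_samples = int(txn.get(b'num-samples'))
    label = txn.get(b'label-000000001').decode()
    image = Image.open(io.BytesIO(txn.get(b'image-000000001')))
print(num_samples, label, image.size)
```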
Train
a. Config
Modify the configuration as needed in a config file such as configs/tps_resnet_bilstm_attn.py; a sketch of commonly adjusted values follows.
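Typical values to adjust include the input size, batch size, and data root. The key names below are illustrative assumptions, not necessarily vedastr's exact ones; check the config file itself for the real schema:

```python
# Illustrative excerpt of values commonly adjusted in a config file.
# Key names are assumptions; see configs/tps_resnet_bilstm_attn.py
# for the actual schema.
size = (32, 100)  # input image (height, width)
batch_size = 192
max_epochs = 6
data_root = 'data/data_lmdb_release/'
```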
b. Run
python tools/trainval.py configs/tps_resnet_bilstm_attn.py
Snapshots and logs will be generated at vedastr/workdir.
Test
a. Config
Modify the configuration as needed in a config file such as configs/tps_resnet_bilstm_attn.py.
b. Run
python tools/test.py configs/tps_resnet_bilstm_attn.py path_to_tps_resnet_bilstm_attn_weights
Demo
a. Run
python tools/demo.py config-path weight-path img-path
Contact
This repository is currently maintained by Jun Sun (@ChaseMonsterAway), Hongxiang Cai (@hxcai), and Yichao Xiong (@mileistone).
Credits