Steps for automatically generating a synthetic OCR dataset with TextRecognitionDataGenerator

yuanjiaqi_k

Last modified 2022-12-30 13:23:04

First published 2022-09-13 09:22:27

Article link: https://blog.csdn.net/yuanjiaqi_k/article/details/126636864

The code comes from https://github.com/Belval/TextRecognitionDataGenerator

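Setting up the environment

Download the source archive and unpack it.
Install the package: pip install trdg
Enter the folder and install the requirements: pip install -r requirements.txt
Installation is now complete.
Enter the trdg folder: cd trdg
Run a quick test to check for errors: python run.py -c 10
The run fails with: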

AttributeError: module 'PIL.Image' has no attribute 'Resampling'
'FreeTypeFont' object has no attribute 'getlength'

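Both errors come from an outdated Pillow version.
Upgrade Pillow to 9.2.0 (the command below installs the newest release from the Tsinghua mirror; write pillow==9.2.0 instead of pillow to pin that exact version):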

pip install --upgrade pillow -i https://pypi.tuna.tsinghua.edu.cn/simple

Run the command again; the out folder now contains the generated data.
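
Before rerunning, the installed Pillow can be checked for the two attributes named in the error messages. This is only a minimal sketch: the function name missing_pillow_features is made up for illustration, and it tests nothing beyond the two attributes quoted above.

```python
import importlib


def missing_pillow_features():
    """Return the Pillow features from the two error messages that are absent."""
    try:
        image = importlib.import_module("PIL.Image")
        image_font = importlib.import_module("PIL.ImageFont")
    except ImportError:
        # Pillow is not importable at all.
        return ["Pillow (not installed)"]

    missing = []
    if not hasattr(image, "Resampling"):
        missing.append("PIL.Image.Resampling")
    if not hasattr(image_font.FreeTypeFont, "getlength"):
        missing.append("FreeTypeFont.getlength")
    return missing


if __name__ == "__main__":
    problems = missing_pillow_features()
    if problems:
        print("Upgrade Pillow; missing:", ", ".join(problems))
    else:
        print("Pillow provides both attributes.")
```

An empty list means neither of the two errors should come from Pillow.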

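How to generate data

Usage

In the trdg folder, replace the backgrounds by putting the background images you want into the images folder.
Put the dictionaries you want into dicts.
out is the default output folder.
fonts is the font folder. Only .ttf files are recognized, and each .ttf must be placed in the corresponding subfolder.
For the run command, refer to the parameter table below.
Changing names: the iname variable in the last few lines of run.py sets the file names written to the generated txt file. To start numbering from 0, write i+0.
After the images are generated, use rename.py to rename them. The value after the variable k must match the one used in run.py.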

    if args.name_format == 2:
        # Create file with filename-to-label connections
        with open(
            os.path.join(args.output_dir, "labels.txt"), "w", encoding="utf8"
        ) as f:
            for i in range(string_count):
                # Edit this line: the prefix sets the naming scheme. To continue
                # numbering after existing labels, replace i + 0 with i + <offset>.
                iname = "20221114_gan_y_" + str(i + 0)
                file_name = iname + "." + args.extension
                label = strings[i]
                label = label.replace(" ", "")
                f.write("{}\t{}\n".format(file_name, label))
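
The article does not show rename.py, so the following is only a guess at what such a script might do. It assumes the images were generated with -na 2 and are therefore named [ID].[EXT], as the help text describes. The function name, the prefix argument and the offset argument (standing in for the value after k) are all illustrative.

```python
import os
import tempfile


def rename_generated_images(output_dir, prefix, offset=0, extension="jpg"):
    """Rename '<i>.<ext>' to '<prefix><i + offset>.<ext>' and return the new names."""
    suffix = "." + extension
    numbered = []
    for name in os.listdir(output_dir):
        stem, ext = os.path.splitext(name)
        # Only touch files whose name is a bare number, e.g. 0.jpg;
        # labels.txt and already renamed files are skipped.
        if ext == suffix and stem.isdigit():
            numbered.append((int(stem), name))

    renamed = []
    for index, name in sorted(numbered):
        new_name = "{}{}{}".format(prefix, index + offset, suffix)
        target = os.path.join(output_dir, new_name)
        if os.path.exists(target):
            # Refuse to overwrite an existing image.
            raise FileExistsError(target)
        os.rename(os.path.join(output_dir, name), target)
        renamed.append(new_name)
    return renamed


if __name__ == "__main__":
    # Demonstration on empty placeholder files in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("0.jpg", "1.jpg", "labels.txt"):
            open(os.path.join(folder, name), "w").close()
        print(rename_generated_images(folder, "20221114_gan_y_", offset=0))
```

The prefix and offset passed here have to be the same as those in the iname line, otherwise the image names and the entries in labels.txt will not line up.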

Running

python run.py, adding the parameters below as needed

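Parameter reference

Quoted from: https://blog.csdn.net/u012995500/article/details/109405270?spm=1001.2014.3001.5501

Parameter		Description	Example
`--output_dir`		Output path for the generated images
`-ft/--font`		Font file (.ttf format) used to render the text	`-ft ./fonts/font/simhei.ttf`
`-fd/--font_dir`		Folder of fonts used to render the text; each image uses a font picked at random from the folder
`-dt/--dict`		Dictionary file (path) from which the words are drawn	`-dt ./dicts/0.txt` (a temporary file)
`-i/--input_file`		Source file (path) for the text in the images; if omitted, the project's default file is used
`-l/--language`		Language: en for English, cn for Chinese (the code listed in the help output); default en
`-w/--length`		Number of words in each generated image	12 words: `-w 12`
`-c/--count`		Number of images to generate	`-c 500`
`-f/--format`		Pixel height of the image (horizontal layout) or pixel width of the image (vertical layout)	`-f 100`
`-b/--background`		Image background. 0: Gaussian noise, 1: plain white, 2: quasicrystal, 3: image
`-sw/--space_width`		Width of the spaces between words, as a multiple of the normal space width (2.0 means twice the normal width); default 1	`-sw 0`
`-na/--name_format`		Naming format of the generated images. The file name usually contains the label; because file names cannot contain certain special characters, a separate text file recording the labels is generated for such images. 0: [TEXT]_[ID].[EXT], 1: [ID]_[TEXT].[EXT], 2: [ID].[EXT] plus a labels.txt file	`-na 2`
`-rs/--random_sequences` (when set, the options to the right are available)	`-let/--include_letters`	Build random words from characters, with letters included in the character set
	`-num/--include_numbers`	Build random words from characters, with digits included in the character set
	`-sym/--include_symbols`	Build random words from characters, with symbols included in the character set
`-r/--random`		Generate images with a varying number of words, using the value of -w as the upper limit
`-t/--thread_count`		Number of threads used by the program. In one reported test with 8 threads, 10,000 images took only 6 s, so a higher thread count gives a clear speed-up
`-e/--extension`		File format of the saved images, default "jpg"
`-k/--skew_angle`		Skew angle of the text in the image
`-rk/--random_skew`		When a skew angle is set with -k, for example a, the skew angle of each image is chosen at random between -a and a
`-bl/--blur`		Gaussian blur value of the image; default 0, meaning no Gaussian blur
`-rbl/--random_blur`		When a Gaussian blur value is set with -bl, for example b, the blur value of each image is chosen at random between 0 and b
`-id/--image_dir`		When the background parameter -b is 3 (image), read the background images from the given folder
`-hw/--handwritten`		Generate handwritten-style images with a trained RNN model
`-om/--output_mask`		For each generated image, also output a mask of the same size (an all-black image); used as a trick during training
`-d/--distorsion`		Distort the text in the image; default 0 (none). 1: sine wave, 2: cosine wave, 3: random
`-do/--distorsion_orientation`		When -d is set, the direction of the distortion. 0: vertical, 1: horizontal, 2: both
`-wd/--width`		Pixel width of the image. If unset, the width is the text width + 10; if the set width is too small, part of the text is cut off
`-al/--alignment`		When the width parameter -wd is set, how the text is aligned and cropped. 0: from the left, 1: from the center outward, 2: from the right
`-or/--orientation`		Layout of the text in the image. 0: horizontal, 1: vertical; default horizontal
`-tc/--text_color`		Text color, given as a single hexadecimal color or a color range, for example #282828 or #000000,#282828
`-cs/--character_spacing`		Pixel spacing between characters; default 0
`-m/--margins`		Blank margins above, below, left and right of the text, in pixels; default (5, 5, 5, 5)
`-fi/--fit`		Whether to crop the image tightly around the text so that all four margins are 0; default False
`-ca/--case`		Letter case of the generated text: upper/lower
`-ws/--word_split`		Whether the text is split by word or by character. True: by word, False: by character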
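
To make the ranges in the table concrete, here is a small sketch of how -rk, -rbl and a -tc color range could be sampled. It illustrates only the ranges described above using the standard library. It is not taken from the TextRecognitionDataGenerator source, and all function names are hypothetical.

```python
import random


def sample_skew(skew_angle, rng):
    """-k a with -rk: an angle drawn from the interval [-a, a]."""
    return rng.uniform(-skew_angle, skew_angle)


def sample_blur(blur, rng):
    """-bl b with -rbl: a blur value drawn from the interval [0, b]."""
    return rng.uniform(0, blur)


def hex_to_rgb(color):
    """Convert '#RRGGBB' into an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def sample_text_color(text_color, rng):
    """-tc with one hex color or a 'first,last' range: pick each channel inside the range."""
    colors = [hex_to_rgb(part) for part in text_color.split(",")]
    first, last = colors[0], colors[-1]
    return tuple(rng.randint(min(a, b), max(a, b)) for a, b in zip(first, last))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same samples
    print(sample_skew(5, rng))
    print(sample_blur(3, rng))
    print(sample_text_color("#000000,#282828", rng))
```

A single color such as #282828 is treated as a range whose two ends coincide, so it always yields the same RGB value.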

python run.py --help
usage: run.py [-h] [--output_dir [OUTPUT_DIR]] [-i [INPUT_FILE]]
              [-l [LANGUAGE]] -c [COUNT] [-rs] [-let] [-num] [-sym]
              [-w [LENGTH]] [-r] [-f [FORMAT]] [-t [THREAD_COUNT]]
              [-e [EXTENSION]] [-k [SKEW_ANGLE]] [-rk] [-wk] [-bl [BLUR]]
              [-rbl] [-b [BACKGROUND]] [-hw] [-na NAME_FORMAT]
              [-om OUTPUT_MASK] [-obb OUTPUT_BBOXES] [-d [DISTORSION]]
              [-do [DISTORSION_ORIENTATION]] [-wd [WIDTH]] [-al [ALIGNMENT]]
              [-or [ORIENTATION]] [-tc [TEXT_COLOR]] [-sw [SPACE_WIDTH]]
              [-cs [CHARACTER_SPACING]] [-m [MARGINS]] [-fi] [-ft [FONT]]
              [-fd [FONT_DIR]] [-id [IMAGE_DIR]] [-ca [CASE]] [-dt [DICT]]
              [-ws] [-stw [STROKE_WIDTH]] [-stf [STROKE_FILL]]
              [-im [IMAGE_MODE]]

Generate synthetic text data for text recognition.

optional arguments:
  -h, --help            show this help message and exit
  --output_dir [OUTPUT_DIR]
                        The output directory
  -i [INPUT_FILE], --input_file [INPUT_FILE]
                        When set, this argument uses a specified text file as
                        source for the text
  -l [LANGUAGE], --language [LANGUAGE]
                        The language to use, should be fr (French), en
                        (English), es (Spanish), de (German), ar (Arabic), cn
                        (Chinese), ja (Japanese) or hi (Hindi)
  -c [COUNT], --count [COUNT]
                        The number of images to be created.
  -rs, --random_sequences
                        Use random sequences as the source text for the
                        generation. Set '-let','-num','-sym' to use
                        letters/numbers/symbols. If none specified, using all
                        three.
  -let, --include_letters
                        Define if random sequences should contain letters.
                        Only works with -rs
  -num, --include_numbers
                        Define if random sequences should contain numbers.
                        Only works with -rs
  -sym, --include_symbols
                        Define if random sequences should contain symbols.
                        Only works with -rs
  -w [LENGTH], --length [LENGTH]
                        Define how many words should be included in each
                        generated sample. If the text source is Wikipedia,
                        this is the MINIMUM length
  -r, --random          Define if the produced string will have variable word
                        count (with --length being the maximum)
  -f [FORMAT], --format [FORMAT]
                        Define the height of the produced images if
                        horizontal, else the width
  -t [THREAD_COUNT], --thread_count [THREAD_COUNT]
                        Define the number of thread to use for image
                        generation
  -e [EXTENSION], --extension [EXTENSION]
                        Define the extension to save the image with
  -k [SKEW_ANGLE], --skew_angle [SKEW_ANGLE]
                        Define skewing angle of the generated text. In
                        positive degrees
  -rk, --random_skew    When set, the skew angle will be randomized between
                        the value set with -k and it's opposite
  -wk, --use_wikipedia  Use Wikipedia as the source text for the generation,
                        using this paremeter ignores -r, -n, -s
  -bl [BLUR], --blur [BLUR]
                        Apply gaussian blur to the resulting sample. Should be
                        an integer defining the blur radius
  -rbl, --random_blur   When set, the blur radius will be randomized between 0
                        and -bl.
  -b [BACKGROUND], --background [BACKGROUND]
                        Define what kind of background to use. 0: Gaussian
                        Noise, 1: Plain white, 2: Quasicrystal, 3: Image
  -hw, --handwritten    Define if the data will be "handwritten" by an RNN
  -na NAME_FORMAT, --name_format NAME_FORMAT
                        Define how the produced files will be named. 0:
                        [TEXT]_[ID].[EXT], 1: [ID]_[TEXT].[EXT] 2: [ID].[EXT]
                        + one file labels.txt containing id-to-label mappings
  -om OUTPUT_MASK, --output_mask OUTPUT_MASK
                        Define if the generator will return masks for the text
  -obb OUTPUT_BBOXES, --output_bboxes OUTPUT_BBOXES
                        Define if the generator will return bounding boxes for
                        the text, 1: Bounding box file, 2: Tesseract format
  -d [DISTORSION], --distorsion [DISTORSION]
                        Define a distorsion applied to the resulting image. 0:
                        None (Default), 1: Sine wave, 2: Cosine wave, 3:
                        Random
  -do [DISTORSION_ORIENTATION], --distorsion_orientation [DISTORSION_ORIENTATION]
                        Define the distorsion's orientation. Only used if -d
                        is specified. 0: Vertical (Up and down), 1: Horizontal
                        (Left and Right), 2: Both
  -wd [WIDTH], --width [WIDTH]
                        Define the width of the resulting image. If not set it
                        will be the width of the text + 10. If the width of
                        the generated text is bigger that number will be used
  -al [ALIGNMENT], --alignment [ALIGNMENT]
                        Define the alignment of the text in the image. Only
                        used if the width parameter is set. 0: left, 1:
                        center, 2: right
  -or [ORIENTATION], --orientation [ORIENTATION]
                        Define the orientation of the text. 0: Horizontal, 1:
                        Vertical
  -tc [TEXT_COLOR], --text_color [TEXT_COLOR]
                        Define the text's color, should be either a single hex
                        color or a range in the ?,? format.
  -sw [SPACE_WIDTH], --space_width [SPACE_WIDTH]
                        Define the width of the spaces between words. 2.0
                        means twice the normal space width
  -cs [CHARACTER_SPACING], --character_spacing [CHARACTER_SPACING]
                        Define the width of the spaces between characters. 2
                        means two pixels
  -m [MARGINS], --margins [MARGINS]
                        Define the margins around the text when rendered. In
                        pixels
  -fi, --fit            Apply a tight crop around the rendered text
  -ft [FONT], --font [FONT]
                        Define font to be used
  -fd [FONT_DIR], --font_dir [FONT_DIR]
                        Define a font directory to be used
  -id [IMAGE_DIR], --image_dir [IMAGE_DIR]
                        Define an image directory to use when background is
                        set to image
  -ca [CASE], --case [CASE]
                        Generate upper or lowercase only. arguments: upper or
                        lower. Example: --case upper
  -dt [DICT], --dict [DICT]
                        Define the dictionary to be used
  -ws, --word_split     Split on words instead of on characters (preserves
                        ligatures, no character spacing)
  -stw [STROKE_WIDTH], --stroke_width [STROKE_WIDTH]
                        Define the width of the strokes
  -stf [STROKE_FILL], --stroke_fill [STROKE_FILL]
                        Define the color of the contour of the strokes, if
                        stroke_width is bigger than 0
  -im [IMAGE_MODE], --image_mode [IMAGE_MODE]
                        Define the image mode to be used. RGB is default, L
                        means 8-bit grayscale images, 1 means 1-bit binary
                        images stored with one pixel per byte, etc.

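Usage examples

My requirement: use the fonts in D:\MyDatasets\ocr\alpha\font to generate text in formats such as 20220708 A000 or 2158D219A1.
--random_sequences: use random sequences as the text source
-num/--include_numbers: build random words from characters, with digits included in the character set
-fd/--font_dir: folder of fonts used to render the text; each image uses a font picked at random from the folder
-wd 340 -f 50: image size, 380*46
-b 3: background; 3 means image, and the folder is D:\pythonProject\TextRecognitionDataGenerator-master\trdg\out
-na 2: save the data in name/label format
Random mix of letters and digits: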

python run.py -fd D:/MyDatasets/ocr/alpha/font/ --random_sequences -let --include_numbers -c 10 -w 2 -r  -wd 340 -f 50 -b 3 -na 2
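
With -na 2, the run.py excerpt shown earlier writes one file name and one label per line to labels.txt, separated by a tab. A minimal sketch for reading that file back into a dictionary might look like this; the function name read_labels is illustrative, and the tab separator is assumed from that excerpt.

```python
import os
import tempfile


def read_labels(labels_path):
    """Return {file_name: label} from a tab-separated labels file."""
    labels = {}
    with open(labels_path, encoding="utf8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            # Split on the first tab only, so the label itself may contain tabs.
            file_name, _, label = line.partition("\t")
            labels[file_name] = label
    return labels


if __name__ == "__main__":
    # Demonstration on a small hand-written file in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "labels.txt")
        with open(path, "w", encoding="utf8") as f:
            f.write("20221114_gan_y_0.jpg\t2158D219A1\n")
        print(read_labels(path))
```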

Digits only, with blur and skew:

python run.py -fd D:/MyDatasets/ocr/alpha/font/ --random_sequences --include_numbers -c 50 -w 2 -r  -wd 340 -f 50 -b 3 -k 5 -rk -bl 3 -rbl

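Chinese:
I collected a set of visually similar characters and built a txt dictionary from them.

-l ch: Chinese (the help output above lists the Chinese code as cn)
-dt: dictionary file
-sw: spacing between words
-tc: text color; the value is quoted so that shells which treat # as a comment marker still pass it through

python run.py -fd ./fonts/cn -l ch -dt D:/pythonProject/TextRecognitionDataGenerator-master/trdg/dicts/chinese.txt -c 100 -w 12 -r  -wd 340 -f 50 -b 3 --output_dir ./cout -sw 0 -k 5 -rk -bl 1 -rbl -tc "#000000,#FFFFFF"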

The command I commonly use for Chinese:

python run.py -fd D:/MyDatasets/ocr/alpha/njs/ -dt ./dicts/temp.txt -c 500 -w 10  -f 100 -b 1 --output_dir ./ntt -sw 0 -fi -na 2
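
When many variants of these commands are needed, they can also be assembled from Python. The sketch below rebuilds the last command from a dictionary of options, using only flags that appear in the help output. The helper build_trdg_command is hypothetical, and the script is assumed to be run.py in the current folder.

```python
import subprocess
import sys


def build_trdg_command(options, script="run.py"):
    """Turn {flag: value} into an argument list.

    True adds a bare switch, False or None skips the flag, and any other
    value is appended after the flag as a string.
    """
    command = [sys.executable, script]
    for flag, value in options.items():
        if value is True:
            command.append(flag)
        elif value is False or value is None:
            continue
        else:
            command.extend([flag, str(value)])
    return command


options = {
    "-fd": "D:/MyDatasets/ocr/alpha/njs/",
    "-dt": "./dicts/temp.txt",
    "-c": 500,
    "-w": 10,
    "-f": 100,
    "-b": 1,
    "--output_dir": "./ntt",
    "-sw": 0,
    "-fi": True,
    "-na": 2,
}
command = build_trdg_command(options)
print(command)

# Uncomment to actually run it from inside the trdg folder:
# subprocess.run(command, check=True)
```

Passing the arguments as a list means no shell is involved, so values such as #000000,#FFFFFF for -tc need no quoting.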
