Generating Text Recognition Training Samples with text_renderer

云梦泽没有东边和西边

Tags: artificial intelligence, OCR

Article link: https://blog.csdn.net/m0_51030297/article/details/140434391

Example file
Parameter changes
Running
Supplementary notes

text_renderer GitHub link

Example file

./text_renderer-master/example_data/example.py
Sample of the generated output (image not preserved).

Parameter changes

1. Changing the input file path parameters

CURRENT_DIR = Path(os.path.abspath(os.path.dirname(__file__)))
OUT_DIR = CURRENT_DIR / "output"
DATA_DIR = CURRENT_DIR
BG_DIR = DATA_DIR / "bg_gray"
CHAR_DIR = DATA_DIR / "char"
FONT_DIR = DATA_DIR / "font"
FONT_LIST_DIR = DATA_DIR / "font_list"
TEXT_DIR = DATA_DIR / "text"

font_cfg = dict(
    font_dir=FONT_DIR,
    font_list_file=FONT_LIST_DIR / "font_list.txt",
    font_size=(30, 31),
)
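
Before launching a run, it can help to confirm that every directory and file the config points to actually exists. The following is a minimal sketch and not part of text_renderer: `check_data_layout` is a hypothetical helper, and its list of required entries simply mirrors the paths defined above.

```python
from pathlib import Path
from typing import List


def check_data_layout(data_dir: Path) -> List[str]:
    """Return the expected entries that are missing under data_dir.

    Hypothetical helper: the entries mirror the constants in the config
    excerpt (bg_gray, char, font, font_list/font_list.txt, text).
    """
    expected = [
        data_dir / "bg_gray",
        data_dir / "char",
        data_dir / "font",
        data_dir / "font_list" / "font_list.txt",
        data_dir / "text",
    ]
    # Report paths relative to data_dir so the message stays short
    return [p.relative_to(data_dir).as_posix() for p in expected if not p.exists()]
```

Calling `check_data_layout(DATA_DIR)` near the top of the config would return an empty list when everything is in place, and the names of the missing entries otherwise.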

2. Changing the data generation parameters

def base_cfg(
    name: str, corpus, corpus_effects=None, layout_effects=None, layout=None, gray=True
):
    return GeneratorCfg(
        # Number of images to generate
        num_image=21,
        save_dir=OUT_DIR / name,
        # RenderCfg: the height of the generated images is set here
        render_cfg=RenderCfg(
            text_color_cfg=SimpleTextColorCfg(),
            height=48,
            bg_dir=BG_DIR,
            # perspective_transform : Apply Perspective Transform
            perspective_transform=perspective_transform,
            # gray : Save image as gray image
            gray=gray,
            # layout_effects : Effects apply on merged text mask image output by Layout.
            layout_effects=layout_effects,
            # layout : Layout is applied if corpus is a List
            layout=layout,
            corpus=corpus,
            corpus_effects=corpus_effects,
        ),
    )

3. Building the generator function

def enum_data():
    return base_cfg(
        inspect.currentframe().f_code.co_name,
        # Even with gray enabled, only about 50% of the images come out grayscale;
        # that probability is set in the norm function in render.py
        gray=True,
        corpus=EnumCorpus(
            EnumCorpusCfg(
                text_paths=[TEXT_DIR / "corpus_generate.txt"],
                filter_by_chars=False,
                chars_file=CHAR_DIR / "keys_v1.txt",
                **font_cfg
            ),
        ),
        corpus_effects=Effects(
            [
                # Padding(p=0.8, w_ratio=[0.2, 0.21], h_ratio=[0.7, 0.71], center=True),
                Padding(p=0.4, w_ratio=[0.1, 0.19], h_ratio=[0.7, 0.71], center=True),
                Curve(p=0.3, period=180, amplitude=(4, 5)),
                ImgAugEffect(p=0.5, aug=iaa.Emboss(alpha=(0.9, 1.0), strength=(1.5, 1.6))),
                Line(p=0.5, color_cfg=SimpleTextColorCfg()),  # alternative: FixedTextColorCfg
                # OneOf([DropoutRand(), DropoutVertical(), DropoutHorizontal()]),
                DropoutRand(p=0.6),
                DropoutVertical(p=0.2),
                DropoutHorizontal(p=0.2),
                # Padding(),
            ]
        ),
        layout=ExtraTextLineLayout(),  # alternative: SameLineLayout()
        layout_effects=Effects(Line(p=0.5)),
    )
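
To get a feel for how the `p` values above combine, here is a small worked calculation. It assumes that each effect in the list is triggered independently with its own probability `p`; that reading has not been checked against the library source. `prob_at_least_one` is a hypothetical helper.

```python
from typing import Iterable


def prob_at_least_one(probs: Iterable[float]) -> float:
    """Probability that at least one of several independent events occurs."""
    none_fire = 1.0
    for p in probs:
        none_fire *= 1.0 - p
    return 1.0 - none_fire


# p values of DropoutRand, DropoutVertical and DropoutHorizontal from the config
dropout_probs = [0.6, 0.2, 0.2]
print(f"{prob_at_least_one(dropout_probs):.3f}")  # 1 - 0.4 * 0.8 * 0.8 = 0.744
```

Under that assumption, roughly three out of four samples receive some form of dropout, which may be worth lowering if the rendered text becomes hard to read.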

Running

Run the command:

python3 main.py --config example_data/example.py --dataset img --num_processes 1 --log_period 10
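
Once the run finishes, the results can be inspected with a few lines of Python. The sketch below assumes that the `img` dataset writes a `labels.json` file next to an `images/` folder, with a top-level `"labels"` mapping from image name to text and `.jpg` image files. This layout is an assumption about the output format and should be checked against your own output directory; `load_labels` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List, Tuple


def load_labels(dataset_dir: Path) -> List[Tuple[Path, str]]:
    """Pair each image path with its text label.

    Assumed layout (verify against your output folder):
        dataset_dir/labels.json  -> {"labels": {"000000000": "text", ...}}
        dataset_dir/images/      -> 000000000.jpg, ...
    """
    with open(dataset_dir / "labels.json", encoding="utf-8") as f:
        meta = json.load(f)
    return [
        (dataset_dir / "images" / f"{name}.jpg", text)
        for name, text in sorted(meta["labels"].items())
    ]
```

With the config above, the dataset directory would be `example_data/output/enum_data`, since `save_dir` is `OUT_DIR / name` and `name` is the name of the config function.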

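Supplementary notes

1. Randomly generating samples for both horizontal and vertical recognition: in `./text_renderer-master/text_renderer/render.py`

Add one line inside the gen_single_corpus function (this needs `import random` at the top of render.py if it is not already imported):

    def gen_single_corpus(self) -> Tuple[PILImage, str, PILImage, PILImage]:
        # True = horizontal text, False = vertical text, 50% probability each
        self.corpus.cfg.horizontal = random.choice([False, True])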

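2. Randomly generating grayscale images: in `./text_renderer-master/text_renderer/render.py`

Inside the norm function:

    def norm(self, image: np.ndarray) -> np.ndarray:
        if self.cfg.gray:
            # Added check: even when gray is True in the config,
            # only about half of the images are converted to grayscale
            if random.choice([True, False]):
                print("gray")  # debug output, safe to remove
                image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)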
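
The fixed 50% split can be generalised to any ratio. The following is a minimal, library-independent sketch of that idea: `apply_with_probability` and `gray_prob` are hypothetical names, and a plain string transform stands in for the `cv2.cvtColor` call so the snippet runs without OpenCV.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def apply_with_probability(
    transform: Callable[[T], T], value: T, p: float, rng: random.Random
) -> T:
    """Apply transform to value with probability p, otherwise return value unchanged."""
    if rng.random() < p:
        return transform(value)
    return value


rng = random.Random(0)  # seeded only to make the demo repeatable
gray_prob = 0.3  # hypothetical setting: convert about 30% of the images

# str.upper stands in for the grayscale conversion
results = [apply_with_probability(str.upper, "img", gray_prob, rng) for _ in range(10_000)]
print(results.count("IMG") / len(results))  # close to 0.3
```

In `norm`, the same pattern would amount to replacing `random.choice([True, False])` with `random.random() < gray_prob`.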

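3. Reading a prepared corpus line by line: in `./text_renderer-master/text_renderer/corpus/enum_corpus.py`

In the EnumCorpus(Corpus) class, comment out the original get_text function and add the function below.

Note: after this change, the number of processes in the run command (--num_processes) must be 1! The index is a module-level global, so each worker process would keep its own copy and the same lines would be emitted more than once.

# Module-level variable that tracks the current line index
current_index = 0
class EnumCorpus(Corpus):
    # Modified version: returns the loaded texts one line at a time, in order
    def get_text(self):
        global current_index
        if current_index >= len(self.texts):  # past the last line: wrap around to 0
            current_index = 0
        text = self.texts[current_index]
        current_index += 1  # advance to the next line
        return text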
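
A variant of the same idea keeps the position on the object rather than in a module-level global. This is a standalone sketch, not the library's `EnumCorpus`: `SequentialTexts` is a hypothetical class that only demonstrates the wrap-around behaviour. The single-process restriction still applies, because every worker process would hold its own copy of the counter.

```python
from typing import List


class SequentialTexts:
    """Hand out texts one by one in order, restarting after the last one."""

    def __init__(self, texts: List[str]) -> None:
        if not texts:
            raise ValueError("texts must not be empty")
        self.texts = texts
        self._index = 0  # position of the next text to return

    def get_text(self) -> str:
        text = self.texts[self._index]
        # Modulo wraps the index back to 0 after the final line
        self._index = (self._index + 1) % len(self.texts)
        return text


corpus = SequentialTexts(["line one", "line two", "line three"])
print([corpus.get_text() for _ in range(4)])
# ['line one', 'line two', 'line three', 'line one']
```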

4. Generating images with light text on a dark background: modify `class SimpleTextColorCfg(TextColorCfg)` in `text_renderer/config/__init__.py`

Note: only change this class if your config actually uses it!

@dataclass
class SimpleTextColorCfg(TextColorCfg):
    """
    Randomly use mean value of background image
    """

    alpha: Tuple[int, int] = (110, 255)

    def get_color(self, bg_img: PILImage) -> Tuple[int, int, int, int]:
        np_img = np.array(bg_img)
        mean = np.mean(np_img)

        alpha = np.random.randint(*self.alpha)
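        if mean < 200:
            # Background is not bright: pick a light text color
            r = np.random.randint(200, 255)
            g = np.random.randint(200, 255)
            b = np.random.randint(200, 255)
        else:
            # Bright background: pick a dark text color (original behaviour)
            r = np.random.randint(0, int(mean * 0.7))
            g = np.random.randint(0, int(mean * 0.7))
            b = np.random.randint(0, int(mean * 0.7))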
        text_color = (r, g, b, alpha)

        return text_color
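
The branching logic can be tried out in isolation. The sketch below restates it as a pure function of the background mean, using the standard library's `random.Random.randrange` (half-open, like `np.random.randint`) in place of NumPy. `pick_text_color` is a hypothetical name; the threshold of 200 and the 0.7 factor are taken from the class above.

```python
import random
from typing import Tuple


def pick_text_color(
    bg_mean: float, rng: random.Random, alpha_range: Tuple[int, int] = (110, 255)
) -> Tuple[int, int, int, int]:
    """Return an RGBA text colour that contrasts with the background mean."""
    alpha = rng.randrange(*alpha_range)
    if bg_mean < 200:
        # Dark or mid-tone background: light text
        low, high = 200, 255
    else:
        # Bright background: dark text, capped at 70% of the mean
        low, high = 0, int(bg_mean * 0.7)
    r, g, b = (rng.randrange(low, high) for _ in range(3))
    return (r, g, b, alpha)


rng = random.Random(0)
print(pick_text_color(40.0, rng))   # light text for a dark background
print(pick_text_color(240.0, rng))  # dark text for a bright background
```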