Batch-Generating a Printed-Font Character Library

Latest recommended article published on 2020-07-18 14:39:42

shoot-I


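Category: Chinese character recognition. Tags: image recognition, printed fonts, dataset, sample creation, Chinese character recognition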

Article link: https://blog.csdn.net/weixin_41827761/article/details/85093099


This article is the author's original work. Do not repost it without permission.

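Main purpose

This article is groundwork for printed-font character recognition. Training the neural network later requires a large number of font samples, and printed-character samples were not available, so I wrote a program that generates the required samples automatically.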

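Implementation overview

The implementation has three parts: preprocessing the text character list, rendering each character as an image, and saving the images to the corresponding folder. The character list is the "常用汉字3500个" (3,500 commonly used Chinese characters) text found through a Baidu search, and the images are generated with the PIL module.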
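The preprocessing step can also be sketched without relying on fixed character positions. The following is only an illustration, not the code used in this article: the helper name `extract_chars` is hypothetical, and it assumes that "Chinese character" means the CJK Unified Ideographs block U+4E00 to U+9FFF.

```python
import re

# Assumption: a Chinese character is any code point in U+4E00..U+9FFF.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def extract_chars(lines):
    """Collect every CJK ideograph from the given lines, in order of appearance."""
    chars = []
    for line in lines:
        # findall skips colons, spaces, digits, and newlines in one pass
        chars.extend(CJK_CHAR.findall(line))
    return chars


if __name__ == "__main__":
    sample = ["01: 一 乙\n", "02：二 十\n"]
    print(extract_chars(sample))
```

A filter like this drops both ASCII and full-width colons, but it would also keep any heading text that is itself written in Chinese characters, so a prefix of that kind still has to be removed separately.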

Program code:

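# File 1 (the main program imports it as Word_Font)
# -*- coding: utf-8 -*-
# Collect the font file names. Font files can be found through a Baidu search or in the system font directory.
import os

def Word_Font():
    # Raw string so the backslashes in the Windows path are not treated as escapes
    Word_Font_Path = r'D:\pycharm11\文字生成\Word_Font'
    # Bare file names (not full paths) of everything in the font folder
    Word_Font_List = list(os.listdir(Word_Font_Path))
    return Word_Font_List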

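# File 2: preprocessing of the character list
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"D:\pycharm11\文字生成\常用汉字3500个.txt", "r") as f:
    lines = f.readlines()  # read the whole file and return it as a list of lines
Library = []
for line in lines:
    line = line.split('\n')[0]     # drop the trailing newline
    line = line.replace(':', '')   # remove ASCII colons
    line = line.replace(' ', '')   # remove spaces
    line = line.lstrip('：')        # remove leading full-width colons
    line = line[13:]               # skip the first 13 characters of each line
    Library.extend(line)           # append the remaining characters one by one
size_L = len(Library)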

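# Main program
from PIL import Image, ImageDraw, ImageFont, ImageOps
import os
import re
import Word_Font
# Read the font file names
Word_Font_List = Word_Font.Word_Font()

# Font choice and initial image parameters
class LetterImage():
    # The default font is the second entry, so the font folder must hold at least two files
    def __init__(self, imgSize=(0, 0), imgMode='RGB', bg_color=(0, 0, 0), fg_color=(255, 255, 255),
                 fontsize=10, Word_Font=Word_Font_List[1]):
        self.imgSize = imgSize
        self.imgMode = imgMode
        self.fontsize = fontsize
        self.bg_color = bg_color
        self.fg_color = fg_color
        # truetype() gets a bare file name; if it is not found as given,
        # Pillow may also search the system font directories
        self.font = ImageFont.truetype(Word_Font, fontsize)

    # Render one character and set the image size
    def GenLetterImage(self, letters):
        self.letters = letters
        # font.getsize() was removed in Pillow 10. getbbox() returns (left, top, right, bottom);
        # right and bottom correspond to the width and height that getsize() used to return.
        _, _, right, bottom = self.font.getbbox(letters)
        self.letterWidth, self.letterHeight = int(right), int(bottom)
        if self.imgSize == (0, 0):
            # Sized from the first character rendered and then reused; 15 px bottom margin
            self.imgSize = (self.letterWidth, self.letterHeight + 15)
        self.imgWidth, self.imgHeight = self.imgSize
        self.img = Image.new(self.imgMode, self.imgSize, self.bg_color)
        self.drawBrush = ImageDraw.Draw(self.img)
        # Drawing origin of the text inside the image
        textY0 = int(self.imgHeight - self.letterHeight - 2)
        textX0 = int(self.imgWidth - self.letterWidth - 2)
        self.drawBrush.text((textX0, textY0), self.letters, fill=self.fg_color, font=self.font)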

if __name__ == '__main__':
    # Raw string so the backslashes in the Windows path are not treated as escapes
    with open(r"D:\pycharm11\文字生成\常用汉字3500个.txt", "r") as f:
        lines = f.readlines()  # read the whole file and return it as a list of lines
    Library = []
    for line in lines:
        line = line.split('\n')[0]     # drop the trailing newline
        line = line.replace(':', '')   # remove ASCII colons
        line = line.replace(' ', '')   # remove spaces
        line = line.lstrip('：')        # remove leading full-width colons
        # line = line[13:]
        Library.extend(line)
    letterList = []
    # ---------- Append one LetterImage per font so it can be used below ----------
    for j in range(len(Word_Font_List)):
        letterList.append(LetterImage(bg_color=(0, 120, 0), fontsize=100, Word_Font=Word_Font_List[j]))
        print(Word_Font_List[j])
        num_letter = len(Library)  # number of characters
        # ---------- Create the output folder ----------
        File_name = re.sub(r'\.', '_', Word_Font_List[j])
        # Build the folder path from the working directory, which is assumed to be the folder
        # of this .py file and to end with the 4-character name 文字生成
        paths = os.getcwd()[:-4] + '文字生成\\' + File_name
        if not os.path.exists(paths):
            os.makedirs(paths)
        paths = paths + "\\"
        # ---------- For the current font, render and save every character in the list ----------
        for i in range(num_letter):  # every character, including the last one
            letterList[j].GenLetterImage(Library[i])
            grayImg = ImageOps.grayscale(letterList[j].img)
            grayImg.save(paths + str(i) + ".png")
Flowchart: (figure omitted)

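Summary:
The character list is not cleaned thoroughly: colons and duplicate characters are still not handled properly, and this needs to be improved later. The program also lacks checks, for example what to do when the output folder already exists and whether the files in it should be regenerated.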
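The two open points could be handled roughly as follows. This is a minimal sketch, not part of the original program: `unique_in_order` and `pending_outputs` are hypothetical helper names, and the `<index>.png` naming simply mirrors the file names used in the main program.

```python
import os


def unique_in_order(chars):
    """Drop repeated characters, keeping each one at its first position."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(chars))


def pending_outputs(out_dir, chars, overwrite=False):
    """Return (index, char, path) for every image that still has to be written."""
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists
    todo = []
    for i, ch in enumerate(chars):
        path = os.path.join(out_dir, f"{i}.png")
        # Skip files that are already on disk unless a refresh is requested
        if overwrite or not os.path.exists(path):
            todo.append((i, ch, path))
    return todo


if __name__ == "__main__":
    print(unique_in_order(list("一二一三二")))
```

The main loop would then iterate over `pending_outputs(...)` instead of over every index, and pass `overwrite=True` when all images should be regenerated.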
