Detecting Text in Natural Images: YOLOv3 + CRNN

Latest recommended article published 2024-08-21 09:46:49

山水之间2018

Category: OCR  Tags: OCR recognition

Article link: https://blog.csdn.net/Gavinmiaoc/article/details/83176507

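This project implements detection and recognition of Chinese text in natural scenes, based on YOLOv3 and CRNN.

Project repository: https://github.com/chineseocr/chineseocr

Environment setup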

python=3.6 pytorch==0.4.1

git clone https://github.com/chineseocr/chineseocr.git
cd chineseocr
sh setup.sh  # CPU only: sh setup-cpu.sh

Download and build darknet (the build can be skipped if you call the model directly through the OpenCV dnn module)

git clone https://github.com/pjreddie/darknet.git 
mv darknet chineseocr/
## To build with GPU and cuDNN support, edit the Makefile:
#GPU=1
#CUDNN=1
#OPENCV=0
#OPENMP=0
make

Edit line 48 of darknet/python/darknet.py:
root = '/root/'  ## directory that contains chineseocr
lib = CDLL(root+"chineseocr/darknet/libdarknet.so", RTLD_GLOBAL)

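Note: I used OpenCV 3.4.3, so I skipped building darknet entirely.

I also did not run sh setup.sh (or the CPU variant) directly; instead I ran the commands it contains one at a time, as needed.

For the environment, I upgraded OpenCV: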

pip3 install --upgrade --user opencv-python  -i https://pypi.tuna.tsinghua.edu.cn/simple/

upgraded PyTorch:

pip3 install --upgrade --user torch torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple/

and additionally installed:

pip3 install   Cython  lmdb mahotas  -i https://pypi.tuna.tsinghua.edu.cn/simple/

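Download the model files

Model file locations:

Baidu Pan
Google Drive (not updated for now)

Copy all files in the folder into the models directory.

You can also convert the YOLOv3 model to a Keras version; see https://github.com/qqwweee/keras-yolo3.git for details.

Alternatively, call the darknet model directly through the dnn module of OpenCV >= 3.4 (see opencv_dnn_detect.py).

Once the environment above is ready, remember to run: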

pushd detector/utils && sh make.sh && popd

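This builds the .so files that later steps call, so it cannot be skipped.

Otherwise you will see the following:

After running it, the result looks like this:

Next:

Starting the web service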

cd chineseocr  ## enter the chineseocr directory
ipython app.py 8080  ## 8080 is the port number; any port can be used

Finally, open this address in a browser:

http://192.168.1.202:8080/ocr

Change the IP address to match your own machine.
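
If you are unsure which address to use, a small sketch like the one below can look up the machine's LAN address and build the URL. The helper names local_ip and ocr_url are hypothetical, and the /ocr path and port 8080 are simply taken from the example above.

```python
import socket


def local_ip():
    """Best-effort guess of this machine's LAN address (falls back to loopback)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            # connect() on a UDP socket sends no packets; it only selects a
            # route, which lets us read back the outgoing interface address.
            s.connect(("10.255.255.255", 1))
            return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"


def ocr_url(host, port=8080):
    """Build the page URL in the same form as the example above."""
    return "http://{}:{}/ocr".format(host, port)


if __name__ == "__main__":
    print(ocr_url(local_ip()))
```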

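Recognition results

The results are quite strong. Note that I did no optimization at all here; later I plan to optimize the speed and retrain on a new category of data.

At present the model is unreliable on instrument-panel digits and also performs poorly on text from film and TV screens, because it has never been trained on this kind of data.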

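References

Extensions

Appendix 1.

1. Steps for retraining CRNN:

You need to prepare: 1. your own dataset; 2. the dictionary (character set) your dataset is based on.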

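See this CRNN implementation: https://github.com/Sierkinhane/crnn_chinese_characters_rec

To retrain CRNN, train.py can follow this one:

https://github.com/meijieru/crnn.pytorch/blob/master/train.py

A model trained with that project can be used directly in this project.

Note, however:

crnn.pytorch is an English recognition model, so to use it here, set alphabet='0123456789abcdefghijklmnopqrstuvwxyz' in keys.py.

To train with this project's character set instead, replace the default --alphabet of crnn.pytorch with the alphabet from this project's keys.py.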
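
To make the role of the alphabet concrete, here is a minimal sketch of how a CTC-style label converter typically maps characters to class indices, with index 0 reserved for the blank symbol (which is why the model code later in this post builds the network with nclass + 1 outputs). The function names are hypothetical, and this is not the converter shipped with either project.

```python
def build_char_index(alphabet):
    """Map each character to a class index, keeping 0 for the CTC blank."""
    return {ch: i + 1 for i, ch in enumerate(alphabet)}


def encode(text, char_index):
    """Turn a label string into a list of class indices."""
    return [char_index[ch] for ch in text]


alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'
char_index = build_char_index(alphabet)
n_outputs = len(alphabet) + 1  # one extra output for the blank

print(n_outputs)
print(encode('a1', char_index))
```

A character missing from the alphabet raises KeyError in encode, which is a quick way to find labels that do not match the dictionary.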

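2. How do I add training data on top of your trained model? #43 (from https://github.com/chineseocr/chineseocr/issues/43)

If your model's character set differs from this project's, adapt the model with the code below and then train.

For the training itself, see the crnn.pytorch project: https://github.com/meijieru/crnn.pytorch.git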

import torch
import torch.nn as nn
import torch.nn.parallel

from crnn.models import utils
from collections import OrderedDict
from config import ocrModel,LSTMFLAG,GPU

def data_parallel(model, input, ngpu):
    if isinstance(input.data, torch.cuda.FloatTensor) and ngpu > 1:
        output = nn.parallel.data_parallel(model, input, range(ngpu))
    else:
        output = model(input)
    return output


class BidirectionalLSTM(nn.Module):

    def __init__(self, nIn, nHidden, nOut, ngpu):
        super(BidirectionalLSTM, self).__init__()
        self.ngpu = ngpu

        self.rnn = nn.LSTM(nIn, nHidden, bidirectional=True)
        self.embedding = nn.Linear(nHidden * 2, nOut)

    def forward(self, input):
        recurrent, _ = utils.data_parallel(
            self.rnn, input, self.ngpu)  # [T, b, h * 2]

        T, b, h = recurrent.size()
        t_rec = recurrent.view(T * b, h)
        output = utils.data_parallel(
            self.embedding, t_rec, self.ngpu)  # [T * b, nOut]
        output = output.view(T, b, -1)

        return output


class CRNN(nn.Module):

    def __init__(self, imgH, nc, nclass, nh, ngpu, n_rnn=2, leakyRelu=False):
        super(CRNN, self).__init__()
        self.ngpu = ngpu
        assert imgH % 16 == 0, 'imgH has to be a multiple of 16'

        ks = [3, 3, 3, 3, 3, 3, 2]
        ps = [1, 1, 1, 1, 1, 1, 0]
        ss = [1, 1, 1, 1, 1, 1, 1]
        nm = [64, 128, 256, 256, 512, 512, 512]

        cnn = nn.Sequential()

        def convRelu(i, batchNormalization=False):
            nIn = nc if i == 0 else nm[i - 1]
            nOut = nm[i]
            cnn.add_module('conv{0}'.format(i),
                           nn.Conv2d(nIn, nOut, ks[i], ss[i], ps[i]))
            if batchNormalization:
                cnn.add_module('batchnorm{0}'.format(i), nn.BatchNorm2d(nOut))
            if leakyRelu:
                cnn.add_module('relu{0}'.format(i),
                               nn.LeakyReLU(0.2, inplace=True))
            else:
                cnn.add_module('relu{0}'.format(i), nn.ReLU(True))

        convRelu(0)
        cnn.add_module('pooling{0}'.format(0), nn.MaxPool2d(2, 2))  # 64x16x64
        convRelu(1)
        cnn.add_module('pooling{0}'.format(1), nn.MaxPool2d(2, 2))  # 128x8x32
        convRelu(2, True)
        convRelu(3)
        cnn.add_module('pooling{0}'.format(2), nn.MaxPool2d((2, 2),
                                                            (2, 1),
                                                            (0, 1)))  # 256x4x16
        convRelu(4, True)
        convRelu(5)
        cnn.add_module('pooling{0}'.format(3), nn.MaxPool2d((2, 2),
                                                            (2, 1),
                                                            (0, 1)))  # 512x2x16
        convRelu(6, True)  # 512x1x16

        self.cnn = cnn
        self.rnn = nn.Sequential(
            BidirectionalLSTM(512, nh, nh, ngpu),
            BidirectionalLSTM(nh, nh, nclass, ngpu)
        )

    def forward(self, input):
        # conv features
        conv = data_parallel(self.cnn, input, self.ngpu)
        b, c, h, w = conv.size()
        assert h == 1, "the height of conv must be 1"
        conv = conv.squeeze(2)
        conv = conv.permute(2, 0, 1)  # [w, b, c]

        # rnn features
        output = utils.data_parallel(self.rnn, conv, self.ngpu)

        return output


def pre_model(nclass, ocrModelPath):
    # nclass: total number of characters in the pretrained model
    # ocrModelPath: path to the pretrained model file

    if torch.cuda.is_available() and GPU:
        model = CRNN(32, 1, nclass + 1, 256, 1).cuda()
    else:
        model = CRNN(32, 1, nclass + 1, 256, 1).cpu()

    state_dict = torch.load(ocrModelPath, map_location=lambda storage, loc: storage)
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        name = k.replace('module.', '')  # remove `module.`
        new_state_dict[name] = v

    model.load_state_dict(new_state_dict)
    model.eval()

    return model


def new_model(nclass, preModel):
    # Define your own model with the new character count

    if torch.cuda.is_available() and GPU:
        model = CRNN(32, 1, nclass + 1, 256, 1).cuda()
    else:
        model = CRNN(32, 1, nclass + 1, 256, 1).cpu()

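    modelDict = model.state_dict()  # weights of the new model
    preModelDict = preModel.state_dict()  # weights of the pretrained model
    # Drop the last BidirectionalLSTM ('rnn.1'); its output size depends on nclass
    preModelDict = {k: v for k, v in preModelDict.items() if 'rnn.1' not in k}
    modelDict.update(preModelDict)  # overwrite with the pretrained weights
    model.load_state_dict(modelDict)  # load the merged weights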
    return model


nclass = 5530
ocrModelPath = 'ocr.pth'
model = pre_model(nclass, ocrModelPath)
## Define your own model
nclass = 10  ## size of your character set
newmodel = new_model(nclass, model)
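
The key step in new_model is the filter on 'rnn.1': every pretrained tensor is reused except those of the last BidirectionalLSTM, whose output size depends on the character set. The sketch below replays that filter on plain dictionaries so it can be run without PyTorch; the helper name and the string values are made up for illustration.

```python
def transferable(pre_state, skip='rnn.1'):
    """Keep the pretrained entries whose key does not contain `skip`."""
    return {k: v for k, v in pre_state.items() if skip not in k}


# Stand-ins for state_dict() contents; real values would be tensors.
pre_state = {
    'cnn.conv0.weight': 'pretrained',
    'rnn.0.embedding.weight': 'pretrained',
    'rnn.1.embedding.weight': 'pretrained (5531 outputs)',
}
new_state = {k: 'random init' for k in pre_state}

new_state.update(transferable(pre_state))
for name, value in sorted(new_state.items()):
    print(name, '->', value)
```

Only the 'rnn.1' entry keeps its fresh initialization, so that layer is the one that has to be learned from the new data.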

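3. Text orientation detection

See: https://github.com/jiangxiluning/chinese-ocr

Text orientation detection

This is done by image classification: starting from a VGG16 model, a classifier for text orientations of 0, 90, 180 and 270 degrees is trained by transfer learning. See angle/predict.py for the code. It was trained on 100,000 images and reaches 95.10% accuracy. The model can be downloaded from Baidu Cloud.

Text detection

CPU and GPU environments are supported with one-step deployment. For training the text detector, see https://github.com/eragonruan/text-detection-ctpn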

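4. How to train YOLO text detection and CRNN OCR text recognition #59

1) Training YOLO on text is similar to training it on any other object class. The only difference is a box-clustering step afterwards, whose idea is borrowed from the CTPN code. This project labels two classes, "text" and "None". The latter is never used in practice; it exists only to add a second class, because YOLOv3 fails to converge when trained on a single class.

2) CRNN + CTC training supports variable-length recognition, and you can train with fixed-length or variable-length labels. If you train with the crnn.pytorch network, the maximum number of output characters is tied to the image width by nchars = [imgW/4] - 2. For example, if you train on 10-character labels, CTC fills the remaining positions with blank symbols; see the CTC principle for details. If the training images are all generated algorithmically, the model generalizes poorly, so you may need to add some real-scene training data.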
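
Taking the relation above at face value, the sketch below computes the label budget for a given image width and shows how greedy CTC decoding merges repeated indices and drops blanks, which is why short labels end up surrounded by blank symbols. The function names, the blank index 0 and the sample sequence are assumptions made for illustration.

```python
def max_chars(img_w):
    """Label budget from the relation quoted above: nchars = [imgW/4] - 2."""
    return img_w // 4 - 2


def ctc_greedy_collapse(indices, blank=0):
    """Merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for idx in indices:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out


print(max_chars(280))
print(ctc_greedy_collapse([0, 3, 3, 0, 0, 3, 7, 7, 0]))
```

Note that a blank between two equal indices is what allows a doubled character to survive the merge.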

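How the data was annotated

The same way as for CTPN training, only with CTPN replaced by darknet.

The YOLO text-detection training code is simple: training follows the standard darknet procedure, or you can use https://github.com/qqwweee/keras-yolo3.git
Many things still need optimizing, such as the anchors; once that is done, the results will be released together. The box-clustering code is in the detector directory if you want to look at it.

If you are preparing your own business data, you only need OCR training data (text lines and the matching line images).

Note: for characters that are spaced far apart, the current YOLOv3 cannot yet merge them into one line. Adjust the alph parameter of the model function; the default is 0.1 and you can set it larger.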

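5. How can detection of LCD and seven-segment display characters be improved? #44

Characters shown on an LCD are dot-matrix style (as in dot-matrix printing), not regular print fonts. You could find fonts of this kind and train on them. Alternatively, apply some image processing to the detected regions first, which may improve the result.

To generate images from fonts, see https://github.com/JarveeLee/SynthText_Chinese_version.git

You can simulate any font and size and generate any background, which also improves generalization. Going further, you could use adversarial networks to build more complex training sets.

Another project worth a look is a synthetic data generator for text recognition: https://github.com/Belval/TextRecognitionDataGenerator

6. How is the training dataset for Chinese OCR generated? #4

One part is synthesized algorithmically. The other part is obtained by calling commercial APIs (Baidu, Microsoft and so on) and cross-checking them: for the same image and the same location (nms greater than 0.8), a sample is kept if two APIs return the same recognition result and discarded otherwise. Keeping the ratio at about 8:2 is enough.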
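
The cross-check described above might look like the sketch below. It assumes each API returns a list of (box, text) pairs with boxes given as (x1, y1, x2, y2), and it reads the 0.8 threshold as intersection-over-union between the two boxes; both the data layout and that reading are assumptions, not something the original answer specifies.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agreed_samples(results_a, results_b, threshold=0.8):
    """Keep lines that both APIs located at the same place with the same text."""
    kept = []
    for box_a, text_a in results_a:
        for box_b, text_b in results_b:
            if iou(box_a, box_b) > threshold and text_a == text_b:
                kept.append((box_a, text_a))
                break
    return kept


# Made-up results from two hypothetical APIs.
api_a = [((10, 10, 110, 40), '25.3'), ((10, 60, 110, 90), '18.0')]
api_b = [((11, 10, 110, 40), '25.3'), ((10, 60, 110, 90), '18.8')]
print(agreed_samples(api_a, api_b))
```

Here the first line is kept because the boxes overlap almost fully and the texts match, while the second is dropped because the two texts disagree.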

7. When results are poor, find out whether the problem lies in text detection or in recognition

Text detection test

import opencv_dnn_detect
#import darknet_detect
from PIL import Image
import numpy as np
import cv2
##
img = cv2.imread('/home/gavin/Desktop/id_card.jpg')
boxes, scores = opencv_dnn_detect.text_detect(np.array(img))

for bbox in boxes:
    # cv2.rectangle needs integer pixel coordinates
    x1, y1, x2, y2 = [int(v) for v in bbox[:4]]
    cv2.rectangle(img, (x1, y1), (x2, y2), color=(0, 255, 0), thickness=1)

print(len(boxes))
cv2.imshow('tested_1',img)
cv2.waitKey(0)
cv2.imwrite('/home/gavin/Desktop/tested_id_card.jpg',img)

OCR test

from crnn.crnn import crnnOcr as crnnOcr 
from PIL import Image
partImg = Image.open('line.jpg')  ## image of a single text line
partImg = partImg.convert('L')  ## convert to grayscale
simPred = crnnOcr(partImg)  ## recognized text
print(simPred)

Appendix 2.

1. Retraining CRNN

Building the dataset: see here

Run:

python3 run.py -w 2 -r -f 64 -wd 280 -bl 2 -rbl -b 3 -t 2 -rs -num -sym -na 2 -k 5 -rk -c 200000 -i texts/lcdisplay.txt

The -na 2 option generates data in the following format:

2: [ID].[EXT] + one file labels.txt containing id-to-label mappings

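Here I chose only num and part of sym. The randomly synthesized images look like this (you can of course choose Chinese characters or other character combinations, even handwriting fonts):

You can also control the format of the generated text yourself: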

import random
import re
import string
import os

pool = ''
pool += "0123456789"

with open("texts/lcdisplay.txt", 'w', encoding="utf8") as f:
    for i in range(200000):
        current_string = ""
        #for _ in range(0, random.randint(1, 10)):
        seq_len = random.randint(1, 5)
        current_string += ''.join([random.choice(pool) for _ in range(seq_len)])
        f.write("{}.{}\n".format(current_string,random.choice(pool)))

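In this case run.py must be given the -i option.

The image dataset is now ready. The next task is to build the LMDB data and train CRNN.

You need to install lmdb and warp_ctc; both are straightforward.

For the project source code, see this

Note: getLmdb.py must be run with Python 2.x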

# -*- coding: utf-8 -*-
import os
import lmdb  # install lmdb by "pip install lmdb"
import cv2
import numpy as np
import glob

def checkImageIsValid(imageBin):
    if imageBin is None:
        return False
    # np.frombuffer replaces the deprecated np.fromstring for binary input
    imageBuf = np.frombuffer(imageBin, dtype=np.uint8)
    img = cv2.imdecode(imageBuf, cv2.IMREAD_GRAYSCALE)
    if img is None:
        return False
    imgH, imgW = img.shape[0], img.shape[1]
    if imgH * imgW == 0:
        return False
    return True


def writeCache(env, cache):
    with env.begin(write=True) as txn:
        for k, v in cache.items():
            #txn.put(str(k).encode(), str(v).encode()) #python3
            txn.put(k, v)



def createDataset(outputPath, imagePathList, labelList, lexiconList=None, checkValid=True):
    """
    Create LMDB dataset for CRNN training.
    ARGS:
        outputPath    : LMDB output path
        imagePathList : list of image path
        labelList     : list of corresponding groundtruth texts
        lexiconList   : (optional) list of lexicon lists
        checkValid    : if true, check the validity of every image
    """
    assert (len(imagePathList) == len(labelList))
    nSamples = len(imagePathList)
    print('...................')
    # map_size=1099511627776 sets the maximum database size to 1 TB
    env = lmdb.open(outputPath, map_size=1099511627776)

    cache = {}
    cnt = 1
    for i in range(nSamples):
        imagePath = imagePathList[i]
        label = labelList[i]
        if not os.path.exists(imagePath):
            print('%s does not exist' % imagePath)
            continue
        with open(imagePath, 'rb') as f:
            imageBin = f.read()
        if checkValid:
            if not checkImageIsValid(imageBin):
                print('%s is not a valid image' % imagePath)
                continue

        imageKey = 'image-%09d' % cnt
        labelKey = 'label-%09d' % cnt
        cache[imageKey] = imageBin
        cache[labelKey] = label

        if lexiconList:
            lexiconKey = 'lexicon-%09d' % cnt
            cache[lexiconKey] = ' '.join(lexiconList[i])
        if cnt % 1000 == 0:
            writeCache(env, cache)
            cache = {}
            print('Written %d / %d' % (cnt, nSamples))
        cnt += 1
    nSamples = cnt - 1
    cache['num-samples'] = str(nSamples)
    writeCache(env, cache)
    print('Created dataset with %d samples' % nSamples)


def read_text(path):
    with open(path) as f:
        text = f.read()
    text = text.strip()

    return text


if __name__ == '__main__':

    outputPath = './data/lmdb/train'
    imgdata = open("./data/trainlabels.txt")
    imagePathList = []
    imgLabelLists = []
    for line in list(imgdata):
        label = line.split()[1]
        image = line.split()[0]
        imgLabelLists.append(label)
        imagePathList.append('/home/gavin/Dataset/train_images/' + image)

    print(len(imagePathList))
    print(len(imgLabelLists))
    createDataset(outputPath, imagePathList, imgLabelLists, lexiconList=None, checkValid=True)
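
Under Python 3, lmdb expects bytes for both keys and values, and the commented-out str(v).encode() line in writeCache would corrupt the image data, because str() of a bytes object yields its repr. A hedged sketch of a conversion helper that could be applied there follows; the names to_bytes and encode_cache are hypothetical.

```python
def to_bytes(value):
    """Return bytes unchanged and encode anything else as UTF-8 text."""
    if isinstance(value, bytes):
        return value
    return str(value).encode('utf-8')


def encode_cache(cache):
    """Convert every key and value of the cache dict to bytes."""
    return {to_bytes(k): to_bytes(v) for k, v in cache.items()}


# Made-up cache entries in the same key layout as createDataset uses.
cache = {
    'image-000000001': b'\xff\xd8\xff',
    'label-000000001': '25.3',
    'num-samples': 1,
}
print(encode_cache(cache))
```

Inside writeCache, calling txn.put(to_bytes(k), to_bytes(v)) would then take the place of txn.put(k, v).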

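The train and val sets must be generated separately; just modify the code above. When it finishes you get the .mdb files:

After that you can start training. Before training, check every parameter first, and remember to change alphabet to your own, for example mine: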
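
One way to prepare the two runs is to split the label file first and then point the script above at each part in turn. The sketch below does this with the standard library only; the 90/10 ratio, the fixed seed and the function name are assumptions, and the line format (image name, whitespace, label) follows what the script's main block parses.

```python
import random


def split_labels(lines, val_ratio=0.1, seed=0):
    """Shuffle label lines and split them into (train, val) lists."""
    lines = [ln for ln in lines if ln.strip()]
    rng = random.Random(seed)
    rng.shuffle(lines)
    n_val = int(len(lines) * val_ratio)
    return lines[n_val:], lines[:n_val]


# Made-up label lines in the "image label" form.
sample = ["{}.jpg {}".format(i, i * 7 % 100) for i in range(20)]
train, val = split_labels(sample)
print(len(train), len(val))
```

Each part can then be written to its own label file and passed through createDataset with a different outputPath.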

alphabet = '0123456789%.°C'

Finally, example commands:

python3 crnn_main.py --adadelta --ngpu 1 --crnn ./expr/model.pth

python3 demo.py  --model_path ./expr/model.pth