[multi-digit] Porting street-view number recognition code to Chinese license plate recognition

Background paper:

Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

(https://arxiv.org/abs/1312.6082)

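Reference for the English license plate version: http://matthewearl.github.io/2016/05/06/cnn-anpr/

The process described in that blog post is roughly:

  1. Use gen.py to generate 1000 training images. Each image is a synthetic license plate composited onto a random background, with Gaussian noise, rotation, and similar perturbations applied. A 0/1 label records whether the plate is fully contained in the image (judged by its position, size, and so on);
  2. Train on the 1000 generated images. Depending on the TensorFlow version, one change may be needed here, namely adding the logits and labels argument names (newer TensorFlow releases require the cross-entropy loss functions to be called with keyword arguments), as shown in the figure:
  3. Sliding-window idea: slide a window over a larger image and crop candidate regions, so that plates appearing at different sizes can be handled.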
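The sliding-window step and the 0/1 "fully contained" label can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the blog's actual code: the function names sliding_windows and plate_fully_inside, the (left, top, right, bottom) box layout, and the 128x64 window with stride 16 are all hypothetical choices. In the real pipeline a trained network decides whether a window contains a plate; here a known plate box stands in for it so the example runs without a model.

```python
from typing import Iterator, Sequence, Tuple

# A box is (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def sliding_windows(
    image_size: Tuple[int, int],
    window_size: Tuple[int, int],
    stride: int,
    scales: Sequence[float] = (1.0,),
) -> Iterator[Box]:
    """Yield window boxes covering an image of (width, height) at each scale."""
    img_w, img_h = image_size
    for scale in scales:
        win_w = int(round(window_size[0] * scale))
        win_h = int(round(window_size[1] * scale))
        # Skip scales whose window does not fit inside the image.
        if win_w > img_w or win_h > img_h:
            continue
        for top in range(0, img_h - win_h + 1, stride):
            for left in range(0, img_w - win_w + 1, stride):
                yield (left, top, left + win_w, top + win_h)


def plate_fully_inside(window: Box, plate: Box) -> int:
    """Return 1 if the plate box lies entirely within the window, else 0."""
    wl, wt, wr, wb = window
    pl, pt, pr, pb = plate
    return int(wl <= pl and wt <= pt and pr <= wr and pb <= wb)


if __name__ == "__main__":
    plate = (150, 90, 250, 120)  # hypothetical plate position in a 320x240 image
    hits = [
        w
        for w in sliding_windows((320, 240), (128, 64), stride=16, scales=(1.0, 1.5))
        if plate_fully_inside(w, plate)
    ]
    print(len(hits), "windows fully contain the plate")
```

Varying the scales argument is what lets a fixed-size classifier cope with plates of different sizes; an equivalent design resizes the image and keeps the window fixed.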

 

After training comes evaluation. Running this code on two images taken from the web gave seemingly decent results:

 

 

 

Since the results look decent, the next step is to try porting it to Chinese license plates.

Address of the modified code:

 

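The main problem encountered while modifying the code was character encoding. The default encoding is ASCII on Ubuntu and gbk on Windows, so to output Chinese text the code needs to:

  1. Convert strings to utf-8.
  2. Convert string encodings when reading and writing Chinese text as well. Because Python 3 str objects no longer have a decode method (decode now lives on bytes, while str keeps encode), file reading and writing was changed from imread and imwrite to imdecode and imencode.

The subsequent training process is no different from before. See the project code for the specific changes.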
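The two encoding steps above might look roughly like the following. This is a hedged sketch rather than the project's code: the helper names (plate_to_bytes, bytes_to_plate, imread_unicode, imwrite_unicode) and the sample plate string are made up for illustration. The image helpers use a common workaround for paths containing Chinese characters: the file bytes are read or written with NumPy, and OpenCV only decodes or encodes the in-memory buffer. They assume numpy and opencv-python are installed, and import them lazily so the string helpers work without either.

```python
def plate_to_bytes(plate: str, encoding: str = "utf-8") -> bytes:
    """Encode a plate string; in Python 3, str has encode but no decode."""
    return plate.encode(encoding)


def bytes_to_plate(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes (for example gbk data written on Windows) into str."""
    return raw.decode(encoding)


def imread_unicode(path, flags=None):
    """Read an image whose path may contain non-ASCII characters."""
    import cv2  # assumption: opencv-python is installed
    import numpy as np

    if flags is None:
        flags = cv2.IMREAD_COLOR
    data = np.fromfile(path, dtype=np.uint8)  # raw file bytes as a 1-D array
    return cv2.imdecode(data, flags)  # decode the buffer into an image array


def imwrite_unicode(path, image, ext=".png") -> bool:
    """Write an image to a path that may contain non-ASCII characters."""
    import cv2  # assumption: opencv-python is installed

    ok, buf = cv2.imencode(ext, image)  # encode the image into a byte buffer
    if ok:
        buf.tofile(path)  # NumPy writes the bytes; OpenCV never sees the path
    return bool(ok)


if __name__ == "__main__":
    plate = "京A12345"  # hypothetical plate text
    print(len(plate_to_bytes(plate)), "bytes as utf-8")
    print(len(plate_to_bytes(plate, "gbk")), "bytes as gbk")
```

Passing the encoding explicitly, instead of relying on the platform default, is what keeps the output identical on Ubuntu and Windows.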

Reposted from: https://www.cnblogs.com/Annbless/p/9065641.html

Abstract of the background paper:

Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief (Dean et al., 2012) implementation of deep neural networks in order to train large, distributed neural networks on high quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving 97.84% accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over 90% accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to transcribing synthetic distorted text from a popular CAPTCHA service, reCAPTCHA. reCAPTCHA is one of the most secure reverse Turing tests that uses distorted text as one of the cues to distinguish humans from bots. With the proposed approach we report a 99.8% accuracy on transcribing the hardest category of reCAPTCHA puzzles.
Our evaluations on both tasks, the street number recognition as well as reCAPTCHA puzzle transcription, indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.
