Deep Learning for OCR (1): Multi-digit Number Recognition

Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks.
Ian J. Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, Vinay Shet.
ICLR 2014.

Model Overview

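The paper uses a CNN to recognize house numbers and caps the sequence length at N (N = 5 in the experiments). The final layer consists of 6 softmax classifiers. The first one predicts the sequence length L, which takes one of 7 values: {0, 1, 2, 3, 4, 5, more than 5}. The remaining 5 softmax classifiers predict the digit at each position, and each takes one of 10 values: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. During training, no error is backpropagated for positions where no digit exists. For L and for the digits that do exist, the loss is the usual negative log-likelihood. The network architecture used in the paper is shown below: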

[Figure: network architecture (image not available)]
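The output layer and training loss described above can be sketched in plain Python. This is a minimal illustration rather than the paper's implementation: the function names (`softmax`, `encode_target`, `sequence_nll`) are hypothetical, and the handling of numbers longer than 5 digits (only the first 5 digits contribute) is an assumption made to keep the example complete.

```python
import math

MAX_LEN = 5                        # N in the text
NUM_LENGTH_CLASSES = MAX_LEN + 2   # lengths 0..5 plus "more than 5"
NUM_DIGIT_CLASSES = 10             # digits 0..9


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode_target(number_string):
    """Turn e.g. "257" into (length_class, [2, 5, 7])."""
    digits = [int(ch) for ch in number_string]
    # Class index MAX_LEN + 1 stands for "more than 5".
    length_class = min(len(digits), MAX_LEN + 1)
    # Assumption: digits beyond position N are dropped.
    return length_class, digits[:MAX_LEN]


def sequence_nll(length_logits, digit_logits, number_string):
    """Negative log-likelihood over the length head and the existing digits.

    length_logits: list of 7 floats; digit_logits: 5 lists of 10 floats.
    Heads for positions without a digit are never read, so they
    contribute no loss and therefore receive no gradient.
    """
    length_class, digits = encode_target(number_string)
    loss = -math.log(softmax(length_logits)[length_class])
    for position, digit in enumerate(digits):
        loss -= math.log(softmax(digit_logits[position])[digit])
    return loss


# Usage: an untrained network with all-zero logits predicts uniformly.
length_logits = [0.0] * NUM_LENGTH_CLASSES
digit_logits = [[0.0] * NUM_DIGIT_CLASSES for _ in range(MAX_LEN)]
loss = sequence_nll(length_logits, digit_logits, "257")
```

With uniform predictions, the loss for a three-digit number is log 7 + 3 log 10: one term for the length head and one for each of the three digits present. The heads for positions 4 and 5 do not enter the sum.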

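In the paper, the first convolutional layer uses maxout activations and the other layers use ReLU. Here the network is modified so that every layer uses ReLU.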
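To make the difference between the two activations concrete, here is a small sketch for a single unit. A ReLU unit clips one linear response at zero, while a maxout unit takes the maximum over several linear responses. The weights, the input, and the choice of k = 3 pieces are hypothetical values picked for illustration.

```python
def dot(weights, inputs):
    """Inner product of two equal-length lists."""
    return sum(w * x for w, x in zip(weights, inputs))


def relu(pre_activation):
    """ReLU: max(0, z) applied to one linear response."""
    return max(0.0, pre_activation)


def maxout(pre_activations):
    """Maxout: the maximum over k linear responses of the same unit."""
    return max(pre_activations)


inputs = [1.0, -2.0]

# One weight vector for the ReLU unit (hypothetical values).
relu_weights = [0.5, 1.0]
relu_out = relu(dot(relu_weights, inputs))

# k = 3 weight vectors for the maxout unit (hypothetical values).
maxout_weights = [[0.5, 1.0], [-1.0, 0.0], [0.0, -0.5]]
maxout_out = maxout([dot(w, inputs) for w in maxout_weights])
```

ReLU can be seen as a maxout unit with two pieces where one piece is fixed at zero, so switching the first layer to ReLU removes the extra learned pieces and reduces the parameter count of that layer.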

Dataset:
SVHN dataset

Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief (Dean et al., 2012) implementation of deep neural networks in order to train large, distributed neural networks on high quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving 97.84% accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over 90% accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to transcribing synthetic distorted text from a popular CAPTCHA service, reCAPTCHA. reCAPTCHA is one of the most secure reverse Turing tests that uses distorted text as one of the cues to distinguish humans from bots. With the proposed approach we report a 99.8% accuracy on transcribing the hardest category of reCAPTCHA puzzles.
Our evaluations on both tasks, the street number recognition as well as reCAPTCHA puzzle transcription, indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.
