Multi-digit Recognition Using ConvNet on Mobile

This is a final report from Stanford's Mobile Computer Vision course.

Introduction

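Using tools such as DeepBeliefSDK and OpenCV, the authors built an Android app around a ConvNet model that recognizes the digits 0-9 in an image. Because mobile devices have limited compute and memory, the authors designed a simple convolutional network (two convolutional layers and two max-pooling layers) and used batching to speed up recognition. The basic approach is to extract a patch for each digit and feed each patch to the CNN for recognition. The network is trained on MNIST.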
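The report only says the network has two convolutional and two max-pooling layers. The sketch below assumes LeNet-style hyperparameters that are not taken from the report (5×5 "valid" convolutions, 2×2 max pooling with stride 2, and hypothetical channel counts of 20 and 50) and traces how a 28 × 28 MNIST input shrinks through such a network, and how long the flattened vector reaching the fully connected (FC) layer would be:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width/height of a convolution (floor division, as in most frameworks)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2, stride: int = 2) -> int:
    """Output width/height of a max-pooling layer without padding."""
    return (size - window) // stride + 1


def feature_shape(input_size: int = 28,
                  conv_kernels: tuple = (5, 5),
                  channels: tuple = (20, 50)) -> tuple:
    """Trace conv -> pool -> conv -> pool. Kernel sizes and channels are assumptions."""
    size = input_size
    for kernel in conv_kernels:
        size = conv_out(size, kernel)  # "valid" convolution, stride 1
        size = pool_out(size)          # 2x2 max pooling, stride 2
    return size, size, channels[-1]


h, w, c = feature_shape()
print(f"conv features: {h}x{w}x{c} -> FC input length {h * w * c}")
# 28 -> 24 -> 12 -> 8 -> 4, so this prints: conv features: 4x4x50 -> FC input length 800
```

Under these assumed numbers, every digit patch becomes an 800-dimensional vector before the FC layer; that per-digit vector is what the batching step stacks into a matrix.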

Pipeline

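  1. PreProcess: Convert the image to grayscale and run Canny edge detection to locate the digits, producing bounding boxes. Next, convert the image to a binary image: first set everything outside the bounding boxes to black, then, inside the bounding boxes, compute a threshold with the equation shown in the figure below and binarize the box interiors (only pixels inside the bounding boxes are used). Note that the app only works on fairly clean backgrounds (for example, a few digits written on a sheet of white paper).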

  2. Segment: Extract each digit's patch and resize it to 28 × 28 to match the CNN's input size.

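  3. Batching CNN: At test time on a CPU, the CNN normally takes one image at a time, so the FC layer performs a vector-matrix multiplication. If several images are first passed through the convolutional layers to obtain their feature vectors, and those vectors are then stacked into a matrix, the FC layer can perform a single matrix-matrix multiplication instead. Because one matrix-matrix product reuses the FC weights across all images and makes better use of the CPU caches than repeated vector-matrix products, run time improves. The speedup is shown in the figure below: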
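Step 1 (PreProcess) can be sketched in plain Python. The report's threshold equation appears only in its figure, so this sketch swaps in Otsu's method as a stand-in; it is not the authors' formula. It also assumes the bounding boxes have already been found (the Canny step is not reproduced) and that digits are dark ink on white paper, inverted to white-on-black to match MNIST. `otsu_threshold` and `binarize_in_boxes` are hypothetical helper names.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows, pixel values 0..255
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def otsu_threshold(pixels: List[int]) -> int:
    """Otsu's method (stand-in for the report's formula): the t that maximizes
    between-class variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]          # number of pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0        # number of pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance times total**2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize_in_boxes(img: Image, boxes: List[Box]) -> Image:
    """Black outside the boxes; per-box threshold computed from in-box pixels only."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]  # everything outside the boxes stays black
    for x0, y0, x1, y1 in boxes:
        region = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        t = otsu_threshold(region)
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Dark ink -> white (255), paper -> black (0), i.e. MNIST polarity.
                out[y][x] = 255 if img[y][x] <= t else 0
    return out
```

With OpenCV, the same per-box idea could be expressed as `cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)` on each region of interest, again only as a substitute for the report's own threshold equation.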
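Step 2 (Segment) amounts to cropping each bounding box and rescaling the crop to the CNN's 28 × 28 input. This sketch uses nearest-neighbour sampling and stretches the crop to a square; both choices are assumptions, and `crop`, `resize_nearest`, and `segment` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def crop(img: Image, box: Box) -> Image:
    """Cut one digit patch out of the binarized image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def resize_nearest(patch: Image, size: int = 28) -> Image:
    """Nearest-neighbour resize to size x size (stretches non-square patches)."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def segment(img: Image, boxes: List[Box], size: int = 28) -> List[Image]:
    """One CNN-ready size x size patch per bounding box."""
    return [resize_nearest(crop(img, box), size) for box in boxes]


# Tiny example: a 10 x 10 image whose pixel value equals its column index.
img = [[x for x in range(10)] for _ in range(10)]
patches = segment(img, [(0, 0, 10, 10), (2, 3, 5, 9)])
print(len(patches), len(patches[0]), len(patches[0][0]))  # 2 28 28
```

MNIST digits were size-normalized to fit a 20 × 20 box and centred in the 28 × 28 frame, so padding each crop to a square with a black border before resizing would likely match the training data more closely than plain stretching.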
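The batching idea in step 3 can be checked in plain Python: stacking the N per-image feature vectors into an N × D matrix and multiplying once by the D × K FC weights gives exactly the same scores as N separate vector-matrix products. Pure Python will not show the speedup itself, which comes from optimized matrix-matrix (GEMM) routines in BLAS-backed libraries; the function names and toy numbers here are illustrative only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(v: Vector, W: Matrix) -> Vector:
    """One image: a length-D feature vector times D x K weights -> K class scores."""
    return [sum(v[d] * W[d][k] for d in range(len(v))) for k in range(len(W[0]))]


def fc_unbatched(features: Matrix, W: Matrix) -> Matrix:
    """Classic test-time path: one vector-matrix product per digit patch."""
    return [vec_mat(v, W) for v in features]


def fc_batched(features: Matrix, W: Matrix) -> Matrix:
    """Batched path: rows of `features` form an N x D matrix; one (N x D)(D x K) product."""
    cols = list(zip(*W))  # columns of W, shared by every row of the batch
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in features]


# Two hypothetical 3-dimensional feature vectors and a 3 x 2 weight matrix.
features = [[1, 2, 3], [4, 5, 6]]
W = [[1, 0], [0, 1], [1, 1]]
assert fc_batched(features, W) == fc_unbatched(features, W)
print(fc_batched(features, W))  # [[4, 5], [10, 11]]
```

In NumPy, the batched FC layer is simply `scores = X @ W` with `X` of shape `(N, D)`, which hands the whole batch to a single GEMM call.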

Overall pipeline diagram:

Source Code and Paper

Paper: access password ab86

Source code

Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain, viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief (Dean et al., 2012) implementation of deep neural networks in order to train large, distributed neural networks on high quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving 97.84% accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over 90% accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to transcribing synthetic distorted text from a popular CAPTCHA service, reCAPTCHA. reCAPTCHA is one of the most secure reverse Turing tests that uses distorted text as one of the cues to distinguish humans from bots. With the proposed approach we report a 99.8% accuracy on transcribing the hardest category of reCAPTCHA puzzles.
Our evaluations on both tasks, the street number recognition as well as reCAPTCHA puzzle transcription, indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.
