# Abstract

Multilayer Neural Networks trained with the backpropagation algorithm constitute the best example of a successful Gradient-Based Learning technique. Given an appropriate network architecture, Gradient-Based Learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional Neural Networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques.

Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called Graph Transformer Networks (GTN), allows such multi-module systems to be trained globally using Gradient-Based methods so as to minimize an overall performance measure.
Two systems for on-line handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of Graph Transformer Networks.

A Graph Transformer Network for reading bank checks is also described. It uses Convolutional Neural Network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.

# I. Introduction

Over the last several years, machine learning techniques, particularly when applied to neural networks, have played an increasingly important role in the design of pattern recognition systems.
In fact, it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition.

The main message of this paper is that better pattern recognition systems can be built by relying more on automatic learning, and less on hand-designed heuristics.
This is made possible by recent progress in machine learning and computer technology.
Using character recognition as a case study, we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel images.
Using document understanding as a case study, we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a unified and well-principled design paradigm, called Graph Transformer Networks, that allows training all the modules to optimize a global performance criterion.

Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech, glyphs, or other types of patterns, make it almost impossible to build an accurate recognition system entirely by hand.
Consequently, most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms.
The usual method of recognizing individual patterns consists in dividing the system into two main modules, shown in Figure 1. The first module, called the feature extractor, transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that (a) can be easily matched or compared, and (b) are relatively invariant with respect to transformations and distortions of the input patterns that do not change their nature.
The feature extractor contains most of the prior knowledge and is rather specific to the task. It is also the focus of most of the design effort, because it is often entirely hand-crafted.
The classifier, on the other hand, is often general-purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features.
This turns out to be a daunting task which, unfortunately, must be redone for each new problem. A large amount of the pattern recognition literature is devoted to describing and comparing the relative merits of different feature sets for particular tasks.


Historically, the need for appropriate feature extractors was due to the fact that the learning techniques used by the classifiers were limited to low-dimensional spaces with easily separable classes [1].
A combination of three factors has changed this vision over the last decade.
First, the availability of low-cost machines with fast arithmetic units allows designers to rely more on brute-force "numerical" methods than on algorithmic refinements. Second, the availability of large databases for problems with a large market and wide interest, such as handwriting recognition, has enabled designers to rely more on real data and less on hand-crafted feature extraction to build recognition systems. The third and very important factor is the availability of powerful machine learning techniques that can handle high-dimensional inputs and can generate intricate decision functions when fed with these large data sets. It can be argued that the recent progress in the accuracy of speech and handwriting recognition systems can be attributed in large part to an increased reliance on learning techniques and large training data sets. As evidence to this fact, a large proportion of modern commercial OCR systems use some form of multi-layer Neural Network trained with back-propagation.

In this study, we consider the tasks of handwritten character recognition (Sections I and II) and compare the performance of several learning techniques on a benchmark data set for handwritten digit recognition (Section III). While more automatic learning is beneficial, no learning technique can succeed without a minimal amount of prior knowledge about the task.

In the case of multi-layer neural networks, a good way to incorporate knowledge is to tailor the architecture to the task. Convolutional Neural Networks [2], introduced in Section II, are an example of specialized neural network architectures which incorporate knowledge about the invariances of 2D shapes by using local connection patterns, and by imposing constraints on the weights. A comparison of several methods for isolated handwritten digit recognition is presented in Section III. To go from the recognition of individual characters to the recognition of words and sentences in documents, the idea of combining multiple modules trained to reduce the overall error is introduced in Section IV. Recognizing variable-length objects such as handwritten words using multi-module systems is best done if the modules manipulate directed graphs.

This leads to the concept of trainable Graph Transformer Networks (GTNs), also introduced in Section IV. Section V describes the now classical method of heuristic over-segmentation for recognizing words or other character strings. Discriminative and non-discriminative gradient-based techniques for training a recognizer at the word level, without requiring manual segmentation and labeling, are presented in Section VI. Section VII presents the promising Space-Displacement Neural Network approach that eliminates the need for segmentation heuristics by scanning a recognizer at all possible locations on the input. In Section VIII, it is shown that trainable Graph Transformer Networks can be formulated as multiple generalized transductions based on a general graph composition algorithm. The connections between GTNs and Hidden Markov Models, commonly used in speech recognition, are also treated. Section IX describes a globally trained GTN system for recognizing handwriting entered in a pen computer. This problem is known as "on-line" handwriting recognition, since the machine must produce immediate feedback as the user writes. The core of the system is a Convolutional Neural Network. The results clearly demonstrate the advantages of training a recognizer at the word level, rather than training it on pre-segmented, hand-labeled, isolated characters. Section X describes a complete GTN-based system for reading handwritten and machine-printed bank checks. The core of the system is the Convolutional Neural Network called LeNet-5, described in Section II. This system is in commercial use in the NCR Corporation line of check-recognition systems for the banking industry. It is reading millions of checks per month in several banks across the United States.


## A. Learning from Data

A gradient-based learning machine computes a function

$$Y^p = F(Z^p, W)$$

where $Z^p$ is the $p$-th input pattern, and $W$ represents the collection of adjustable parameters in the system. In a pattern recognition setting, the output $Y^p$ may be interpreted as the recognized class label of pattern $Z^p$, or as scores or probabilities associated with each class.

A loss function

$$E^p = \mathcal{D}(D^p, F(W, Z^p))$$

measures the discrepancy between $D^p$, the "correct" or desired output for pattern $Z^p$, and the output produced by the system. The gap between the expected error rate on the test set and the error rate on the training set decreases with the number of training samples approximately as

$$E_{test} - E_{train} = k\,(h/P)^{\alpha}$$

where $P$ is the number of training samples, $h$ is a measure of the effective capacity or complexity of the machine, $\alpha$ is a number between 0.5 and 1, and $k$ is a constant.
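To make the notation above concrete, here is a minimal sketch of gradient-based learning on a toy linear model, where $F(Z, W) = ZW$ and the loss is the mean squared error. The variable names follow the text ($Z$ input patterns, $W$ adjustable parameters, $D$ desired outputs); the data and learning rate are arbitrary choices for illustration, not anything from the paper.

```python
import numpy as np

# Toy setting: F(Z, W) = Z @ W, with squared-error loss.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 3))           # P = 100 patterns, 3 features
W_true = np.array([1.0, -2.0, 0.5])
D = Z @ W_true                          # desired outputs D^p

W = np.zeros(3)                         # adjustable parameters, initialized at 0
lr = 0.1
for _ in range(200):
    Y = Z @ W                           # Y^p = F(Z^p, W)
    grad = 2 * Z.T @ (Y - D) / len(Z)   # dE/dW for mean squared error
    W -= lr * grad                      # gradient descent step

print(np.round(W, 3))                   # recovers W_true
```

The same loop structure underlies back-propagation training of the networks discussed below; only the function $F$ and the gradient computation become more elaborate.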

# II. Convolutional Neural Networks for Isolated Character Recognition

## B. LeNet-5

### Layer C3

C3 has 16 feature maps; each unit is connected to 5×5 neighborhoods in S2 through a connection table: six maps take input from 3 contiguous S2 maps, six from 4 contiguous maps, three from 4 non-contiguous maps, and one from all 6. The number of trainable parameters is therefore

6 × (3 × 5 × 5 + 1) + 6 × (4 × 5 × 5 + 1) + 3 × (4 × 5 × 5 + 1) + 1 × (6 × 5 × 5 + 1) = 1516
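The arithmetic can be checked directly; the group sizes below are taken from the sum above (number of C3 maps per group, and how many S2 maps each group reads through a 5×5 kernel, plus one bias):

```python
# (number of C3 maps in the group, number of S2 input maps per map)
groups = [(6, 3), (6, 4), (3, 4), (1, 6)]
params = sum(n * (k * 5 * 5 + 1) for n, k in groups)
print(params)  # 1516
```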

### Layer S4

S4 is the second subsampling layer. It is computed from C3 the same way S2 is computed from C1: each unit sums a non-overlapping 2×2 neighborhood, multiplies by a trainable coefficient, and adds a trainable bias. This gives 16 × 2 = 32 trainable parameters, and 16 × (2 × 2 + 1) × 5 × 5 = 2000 connections.
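The S4 counts follow the same pattern (2 parameters per map; each 5×5 output unit has 2 × 2 input connections plus a bias connection):

```python
maps, out_side = 16, 5
params = maps * 2                                    # coefficient + bias per map
connections = maps * (2 * 2 + 1) * out_side * out_side
print(params, connections)  # 32 2000
```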

### layer C5

C5是第三个卷积层，有120个feature map，还是5 * 5 卷积核，因为S4也是5 * 5，所以每个feature map 是1 * 1，与S4全连接，之所以C5不被称为全连接层，是因为，如果输入更大，那这里C5的大小可能不是1 * 1。可训练的参数为120 * (16 * 5 * 5 + 1) = 120 * 401 = 48120个连接
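Again the count can be verified (each of the 120 maps reads all 16 S4 maps through a 5×5 kernel, plus one bias):

```python
params = 120 * (16 * 5 * 5 + 1)   # one 5x5 kernel per S4 map, plus bias
print(params)  # 48120
```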

### Layer F6

The squashing function is the scaled hyperbolic tangent of the paper's equation (6), f(a) = A tanh(S·a), where A is the amplitude of the function and S determines its slope at the origin. The function is odd, with horizontal asymptotes at ±A; here A is chosen to be 1.7159.
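A small sketch of this squashing function, taking S = 2/3 as in the paper (with these constants, f(±1) = ±1):

```python
import math

A, S = 1.7159, 2.0 / 3.0

def squash(a):
    # f(a) = A * tanh(S * a): odd, with horizontal asymptotes at +/-A
    return A * math.tanh(S * a)

print(round(squash(1.0), 4))         # ~1.0: the function passes near (1, 1)
print(round(squash(100.0), 4))       # ~1.7159: saturates at the asymptote A
print(squash(-2.0) == -squash(2.0))  # True: odd function
```

Keeping f(±1) = ±1 means the unit operates near the "knee" of the sigmoid for typical target values, rather than deep in the saturated region.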

## C. Loss Function

$y_{D^p}$ is the output of the $D^p$-th RBF unit, i.e., the one corresponding to the correct class of input $Z^p$. This loss function is adequate for most cases, but it lacks three important properties:

1) If the RBF parameters are allowed to adapt, E(W) has a trivial but totally unacceptable solution: all the RBF parameter vectors become equal, and the state of F6 becomes constant and equal to that common vector, so that every RBF output is 0. This collapsing phenomenon does not occur if the RBF weights are not allowed to adapt. (My reading: when the weights are adaptive, a degenerate solution exists that drives all the RBF outputs to 0.)

2) There is no competition between the classes. Such competition can be obtained with a more discriminative training criterion, maximum a posteriori (MAP) estimation, similar to the Maximum Mutual Information (MMI) criterion sometimes used to train HMMs. Assuming the input image either belongs to one of the classes or to none of them (i.e., rubbish), this corresponds to maximizing the posterior probability of the correct class (or minimizing the negative log-probability of the correct class). In terms of penalties, it not only pushes down the penalty of the correct class, as MSE does, but also pulls up the penalties of the incorrect ones.
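A minimal sketch of this discriminative criterion, assuming the form used in the LeNet-5 paper: each class $i$ has an RBF penalty $y_i$ (smaller = better match), the constant $j$ penalizes the rubbish alternative, and the per-pattern loss is $y_{correct} + \log(e^{-j} + \sum_i e^{-y_i})$. The numeric penalty values below are made up for illustration.

```python
import math

def map_loss(penalties, correct, j=1.0):
    # Discriminative loss: push down the correct class's penalty,
    # pull up the competing ones via the log-sum-exp term.
    log_sum = math.log(math.exp(-j) + sum(math.exp(-y) for y in penalties))
    return penalties[correct] + log_sum

good = map_loss([0.1, 5.0, 5.0], correct=0)  # wrong classes penalized heavily
bad = map_loss([0.1, 0.2, 5.0], correct=0)   # a competing class matches too
print(good < bad)  # True: competition from class 1 raises the loss
```

Unlike plain MSE, the loss here changes when a *wrong* class's penalty drops, which is exactly the competition between classes that the text describes.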
