Simple Digit Recognition OCR in OpenCV-Python

This post is translated from: Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both the KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit, and I would like to train with them.

There is a sample letter_recog.py that comes with the OpenCV samples, but I still couldn't figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file, which I didn't understand at first.

Later, after searching a little, I found a letter_recognition.data file in the cpp samples. I used it and wrote some code for cv2.KNearest modelled on letter_recog.py (just for testing):

import numpy as np
import cv2

fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()

It gave me an array of size 20000, and I don't understand what it is.
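For anyone reproducing this snippet on a newer OpenCV build, where cv2.KNearest no longer exists, a minimal equivalent sketch (assuming OpenCV 3+, where the KNearest API lives under the cv2.ml module) might look like this:

import numpy as np
import cv2

fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={0: lambda ch: ord(ch) - ord('A')})
samples, responses = a[:, 1:], a[:, 0].reshape(-1, 1)

# Same experiment with the OpenCV 3+ API: KNearest_create / ROW_SAMPLE / findNearest
model = cv2.ml.KNearest_create()
model.train(samples, cv2.ml.ROW_SAMPLE, responses)
retval, results, neigh_resp, dists = model.findNearest(samples, k=10)
print(results.ravel())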

Questions:

1) What is the letter_recognition.data file? How do I build such a file from my own data set?

2) What does results.ravel() denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (with either KNearest or SVM)?


#1

Reference: https://stackoom.com/question/dUo4/OpenCV-Python中的简单数字识别OCR


#2

Those who are interested in C++ can refer to the code below. Thanks to Abid Rahman for the nice explanation.


The procedure is the same as above, except that the contour finding uses only first-hierarchy-level contours, so the algorithm uses only the outer contour of each digit.

Code for creating sample and label data

//Process image to extract contour
Mat thr,gray,con;
Mat src=imread("digit.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); //Threshold to find contour
thr.copyTo(con);

// Create sample and label data
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
Mat sample;
Mat response_array;  
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE ); //Find contour

for( int i = 0; i< contours.size(); i=hierarchy[i][0] ) // iterate through first hierarchy level contours
{
    Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
    Mat ROI = thr(r); //Crop the image
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
    tmp1.convertTo(tmp2,CV_32FC1); //convert to float
    sample.push_back(tmp2.reshape(1,1)); // Store  sample data
    imshow("src",src);
    int c=waitKey(0); // Read corresponding label for contour from keyboard
    c-=0x30;     // Convert ASCII to integer value
    response_array.push_back(c); // Store label to a mat
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);    
}

// Store the data to file
Mat response,tmp;
tmp=response_array.reshape(1,1); //make continuous
tmp.convertTo(response,CV_32FC1); // Convert  to float

FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
Data << "data" << sample;
Data.release();

FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
Label << "label" << response;
Label.release();
cout<<"Training and Label data created successfully....!! "<<endl;

imshow("src",src);
waitKey();

Code for training and testing

Mat thr,gray,con;
Mat src=imread("dig.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); // Threshold to create input
thr.copyTo(con);


// Read stored sample and label for training
Mat sample;
Mat response,tmp;
FileStorage Data("TrainingData.yml",FileStorage::READ); // Read traing data to a Mat
Data["data"] >> sample;
Data.release();

FileStorage Label("LabelData.yml",FileStorage::READ); // Read label data to a Mat
Label["label"] >> response;
Label.release();


KNearest knn;
knn.train(sample,response); // Train with sample and responses
cout<<"Training compleated.....!!"<<endl;

vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;

//Create input sample by contour finding and cropping
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE );
Mat dst(src.rows,src.cols,CV_8UC3,Scalar::all(0));

for( int i = 0; i< contours.size(); i=hierarchy[i][0] ) // iterate through each contour for first hierarchy level .
{
    Rect r= boundingRect(contours[i]);
    Mat ROI = thr(r);
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
    tmp1.convertTo(tmp2,CV_32FC1);
    float p=knn.find_nearest(tmp2.reshape(1,1), 1);
    char name[4];
    sprintf(name,"%d",(int)p);
    putText( dst,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
}

imshow("src",src);
imshow("dst",dst);
imwrite("dest.jpg",dst);
waitKey();

Result

In the result, the dot in the first line is detected as an 8, since we haven't trained on dots. Also, I am treating every contour in the first hierarchy level as sample input; the user can avoid this by filtering on contour area.

[result image]


#3

If you are interested in the state of the art in machine learning, you should look into deep learning. You should have a CUDA-capable GPU, or alternatively use a GPU on Amazon Web Services.

Google's Udacity has a nice tutorial on this using TensorFlow. The tutorial will teach you how to train your own classifier on handwritten digits. I got an accuracy of over 97% on the test set using convolutional networks.


#4

Well, I decided to work on my own question to solve the above problem. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)

1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

This SOF question helped me find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (Although I didn't understand some of the features in the end.)
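As a quick way to see that layout, a small sketch (assuming the file sits in the working directory under the name used in the code above) that reads the first line and splits it into the label letter and its 16 feature values:

# Peek at the first row: a capital letter, then 16 comma-separated feature values.
with open('letter-recognition.data') as f:
    first_line = f.readline().strip()

parts = first_line.split(',')
label, features = parts[0], parts[1:]
print(label, len(features))   # the letter and the number of features (should be 16)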

2) I knew that, without understanding all those features, it would be difficult to use that method. I tried some other papers, but all were a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, even if only with the lowest accuracy.)

I took the image below as my training data:

[training data image]

(I know the amount of training data is small, but since all the letters are of the same font and size, I decided to try with this.)

To prepare the data for training, I wrote a small piece of code in OpenCV. It does the following things:

  1. It loads the image.
  2. It selects the digits (by contour finding, then applying constraints on the area and height of the letters to avoid false detections).
  3. It draws the bounding rectangle around one letter and waits for a key press manually. This time we ourselves press the digit key corresponding to the letter in the box.
  4. Once the corresponding digit key is pressed, it resizes the box to 10x10 and saves the 100 pixel values in one array (here, samples) and the corresponding manually entered digit in another array (here, responses).
  5. Then it saves both arrays in separate txt files.

At the end of the manual classification of digits, all the digits in the training data (train.png) have been labeled manually by ourselves, and the image will look like the one below:

[labeled training data image]

Below is the code I used for the above purpose (of course, not so clean):

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print "training complete"

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)
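As an optional sanity check (just a sketch, assuming the two files were written as above), you can reload them and confirm that there is one 100-value pixel row per manually entered label:

import numpy as np

samples = np.loadtxt('generalsamples.data', np.float32)     # one flattened 10x10 ROI per row
responses = np.loadtxt('generalresponses.data', np.float32) # one digit label per sample row
print(samples.shape, responses.shape)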

Now we enter the training and testing part.

For the testing part, I used the image below, which has the same type of letters I used for training.

[test image]

For training, we do the following:

  1. Load the txt files we saved earlier.
  2. Create an instance of the classifier we are using (here, KNearest).
  3. Then use the KNearest.train function to train on the data.

For testing purposes, we do as follows:

  1. We load the image used for testing.
  2. Process the image as before and extract each digit using contour methods.
  3. Draw the bounding box for it, then resize it to 10x10, and store its pixel values in an array as done earlier.
  4. Then we use the KNearest.find_nearest() function to find the item nearest to the one we gave. (If lucky, it recognizes the correct digit.)

I included the last two steps (training and testing) in the single piece of code below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)

And it worked; below is the result I got:

[result image]


Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

But anyway, this is a good start for beginners (I hope so).
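One final note: cv2.KNearest and find_nearest above belong to the old OpenCV 2.x API. On OpenCV 3+ the equivalent calls live under cv2.ml (and on OpenCV 4, cv2.findContours returns just contours and hierarchy). A minimal sketch of the same train/predict step with that newer API, assuming the two .data files were created as above:

import numpy as np
import cv2

samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32).reshape(-1, 1)

model = cv2.ml.KNearest_create()
model.train(samples, cv2.ml.ROW_SAMPLE, responses)

# Any 1x100 float32 row (such as roismall from the testing loop above) can then be classified:
row = samples[:1]   # smoke test with the first training sample
retval, results, neigh_resp, dists = model.findNearest(row, k=1)
print(int(results[0][0]))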
