OpenCV Computer Vision in Practice - Document Scanning and OCR Recognition [Hands-on Project]

#################################################################
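Knowledge from books always feels shallow; to truly understand something, you have to do it yourself.
Bilibili video
New course materials: https://pan.baidu.com/s/1frWHqCVGR2VTn5QBtW4lPA extraction code: xh02
Old course materials: https://pan.baidu.com/s/1Wi31FxSPBqWiuJX9quX-jA extraction code: bbfg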
################################################################

Processing pipeline:

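Edge detection -> contour extraction -> perspective transform (i.e. flattening the page, which can involve translation, rotation, flipping, etc.) -> OCR recognition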

I. Edge Detection

import argparse

import cv2
import numpy as np

# Command-line argument: path of the image to scan
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image to be scanned")
args = vars(ap.parse_args())

if __name__ == "__main__":
	# Read the input image
	image = cv2.imread(args["image"])
	if image is None:
		raise SystemExit("Could not read image: " + args["image"])
	# After resizing, coordinates shrink by the same factor, so keep the ratio
	ratio = image.shape[0] / 500.0
	orig = image.copy()
	# resize: helper defined elsewhere in the project (not shown in this post)
	image = resize(orig, height=500)	# keep the aspect ratio: height is set to 500 and width scales to match

	# Preprocessing
	gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
	gray = cv2.GaussianBlur(gray, (5, 5), 0)
	edged = cv2.Canny(gray, 75, 200)	# edge detection

	# Show the preprocessing result
	print("STEP 1: Edge detection")
	cv2.imshow("Image", image)
	cv2.imshow("Edged", edged)
	cv2.waitKey(0)
	cv2.destroyAllWindows()

Notes:

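  • The ratio = image.shape[0] / 500.0 line: the scale factor ratio could also be computed after the resize. It is needed later, in the perspective transform, to map the contour coordinates back onto the original full-size image.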

II. Contour Extraction

Inside main (the if __name__ == "__main__": block):

	# Contour detection
	# [-2] selects the contour list in both OpenCV 3.x (3 return values) and 4.x (2 return values)
	cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
	# Many contours may be detected; keep the 5 with the largest area
	cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]

	# Loop over the contours
	screenCnt = None
	for c in cnts:	# c is one contour, i.e. the input point set
		# Contour approximation
		peri = cv2.arcLength(c, True)
		# epsilon (here 0.02 * peri) is the maximum distance between the original contour
		# and its approximation; it controls the approximation accuracy
		# True means the contour is closed
		approx = cv2.approxPolyDP(c, 0.02 * peri, True)
		print(approx, approx.shape)
		# Stop at the first approximation with 4 points; screenCnt holds their coordinates
		if len(approx) == 4:	# 4 vertices suggest a quadrilateral, most likely the page
			screenCnt = approx	# contours are sorted by area, so this is probably the document's outer border
			break

	if screenCnt is None:
		raise SystemExit("No 4-point contour found; the document outline could not be detected")

	# Show the result
	print("STEP 2: Get contour")
	cv2.drawContours(image, [screenCnt], -1, (0, 255, 0), 2)
	cv2.imshow("Outline", image)
	cv2.waitKey(0)
	cv2.destroyAllWindows()


III. Perspective Transform

Inside main (the if __name__ == "__main__": block):

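	# Perspective transform
	# The 4 corners are 4 (x, y) pairs, hence reshape(4, 2)
	# The coordinates were found on the resized image; multiply by ratio to map them back to the original image
	print(screenCnt.shape)
	warped = four_point_transform(orig, screenCnt.reshape(4, 2) * ratio)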

 
 
Note:

  • screenCnt.reshape(4, 2) returns a new array; screenCnt itself keeps the shape (4, 1, 2) that cv2.approxPolyDP returned.
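A quick NumPy check of the reshape and ratio step, using a made-up stand-in for the cv2.approxPolyDP output (shape (4, 1, 2), int32) and a hypothetical original image height of 3264 px:

```python
import numpy as np

# Stand-in for the cv2.approxPolyDP output: 4 points, shape (4, 1, 2), int32 (made-up values)
screenCnt = np.array([[[120, 40]], [[30, 60]], [[50, 450]], [[360, 430]]], dtype=np.int32)
ratio = 3264 / 500.0  # hypothetical: original image 3264 px tall, resized to height 500

# New (4, 2) float array in original-image pixel coordinates
pts = screenCnt.reshape(4, 2) * ratio

print(screenCnt.shape)  # (4, 1, 2): unchanged by reshape
print(pts.shape)        # (4, 2)
```

Multiplying by the float ratio also turns the int32 coordinates into floats, which order_points then stores as float32.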

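In the same .py file, before the main block, define the perspective-transform function four_point_transform together with its helper order_points: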

def order_points(pts):
	# Initialize a 4x2 array for the 4 corner coordinates
	rect = np.zeros((4, 2), dtype="float32")

	# Order the corners as 0, 1, 2, 3 = top-left, top-right, bottom-right, bottom-left
	# Top-left and bottom-right first
	print("pts :\n", pts)
	s = pts.sum(axis=1)	# x + y for each point (sum along axis 1)
	print("s :\n", s)
	rect[0] = pts[np.argmin(s)]	# top-left has the smallest x + y (pts[1] for the author's sample image)
	rect[2] = pts[np.argmax(s)]	# bottom-right has the largest x + y (pts[3] for the sample image)
	print("rect after first pass :\n", rect)

	# Then top-right and bottom-left
	diff = np.diff(pts, axis=1)	# y - x for each point (discrete difference along axis 1)
	print("diff :\n", diff)
	rect[1] = pts[np.argmin(diff)]	# top-right has the smallest y - x (pts[0] for the sample image)
	rect[3] = pts[np.argmax(diff)]	# bottom-left has the largest y - x (pts[2] for the sample image)
	print("rect after second pass :\n", rect)
	return rect

def four_point_transform(image, pts):
	# Get the input corners in a consistent order
	rect = order_points(pts)
	(A, B, C, D) = rect	# i.e. (tl, tr, br, bl): top-left, top-right, bottom-right, bottom-left

	# Compute the output width and height
	w1 = np.sqrt(((C[0] - D[0]) ** 2) + ((C[1] - D[1]) ** 2))	# bottom edge
	w2 = np.sqrt(((B[0] - A[0]) ** 2) + ((B[1] - A[1]) ** 2))	# top edge
	w = max(int(w1), int(w2))

	h1 = np.sqrt(((B[0] - C[0]) ** 2) + ((B[1] - C[1]) ** 2))	# right edge
	h2 = np.sqrt(((A[0] - D[0]) ** 2) + ((A[1] - D[1]) ** 2))	# left edge
	h = max(int(h1), int(h2))

	# Target positions of the corners after the transform
	dst = np.array([
		[0, 0],
		[w - 1, 0],	# -1 because pixel indices run from 0 to w - 1
		[w - 1, h - 1],
		[0, h - 1]], dtype="float32")

	# Compute the 3x3 perspective transform matrix (covers translation, rotation, flipping, etc.)
	M = cv2.getPerspectiveTransform(rect, dst)	# (source points, destination points)
	print(M, M.shape)
	warped = cv2.warpPerspective(image, M, (w, h))

	# Return the warped result
	return warped


Notes:

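  • order_points: reorders the four corners into top-left, top-right, bottom-right, bottom-left. The top-left corner has the smallest x + y and the bottom-right corner the largest; the top-right corner has the smallest y - x and the bottom-left corner the largest.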

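  • The width/height computation in four_point_transform and the cv2.getPerspectiveTransform call: the output width w is the longer of the top and bottom edges, the height h is the longer of the left and right edges, and cv2.getPerspectiveTransform computes the 3x3 matrix M that maps the four ordered source corners (rect) onto the four destination corners (dst). cv2.warpPerspective then uses M to produce the flattened w x h image.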
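In equations, with A, B, C, D = rect[0..3] (top-left, top-right, bottom-right, bottom-left), (x_i, y_i) = rect[i] and (u_i, v_i) = dst[i], the code computes w and h and then the matrix below. This is the standard homography formulation that the OpenCV documentation gives for getPerspectiveTransform, written out here for reference; the m_jk names are introduced for illustration.

```latex
w = \max\left(\lfloor \lVert C - D \rVert \rfloor,\ \lfloor \lVert B - A \rVert \rfloor\right), \qquad
h = \max\left(\lfloor \lVert B - C \rVert \rfloor,\ \lfloor \lVert A - D \rVert \rfloor\right)

\begin{bmatrix} t_i u_i \\ t_i v_i \\ t_i \end{bmatrix}
= M \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix},
\qquad
M = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & 1 \end{bmatrix}

u_i = \frac{m_{11} x_i + m_{12} y_i + m_{13}}{m_{31} x_i + m_{32} y_i + 1}, \qquad
v_i = \frac{m_{21} x_i + m_{22} y_i + m_{23}}{m_{31} x_i + m_{32} y_i + 1}, \qquad i = 0, 1, 2, 3
```

Fixing the bottom-right entry of M to 1 leaves 8 unknowns. Multiplying each fraction out by its denominator gives 2 linear equations per point pair, so the 4 corner pairs determine M exactly; cv2.warpPerspective then uses M to resample the image into the w x h output.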

IV. OCR Recognition

Installing tesseract on Windows

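# Download: https://digi.bib.uni-mannheim.de/tesseract/
# Add the install directory to the PATH environment variable, e.g. E:\Program Files (x86)\Tesseract-OCR
# Test with: tesseract -v
# Recognize an image: tesseract XXX.png output   (the text is written to output.txt)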

 
 

Add the Tesseract install path, e.g. D:\Program Files (x86)\Tesseract-OCR, to Path under both the user variables and the system variables.
With that set, the tesseract -v test succeeds.
However, running tesseract opencv.png cv may produce the following error:

Error opening data file \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

 
 

Fix:
Add a new system variable TESSDATA_PREFIX and set its value to the path D:\Program Files (x86)\Tesseract-OCR\tessdata.
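When this error appears, it can help to check which folder Tesseract will actually search. Below is a small diagnostic sketch; the function name check_tessdata is hypothetical, and it only inspects the filesystem without calling Tesseract. It reports whether eng.traineddata sits directly in the TESSDATA_PREFIX directory or in a tessdata subfolder of it, because depending on the Tesseract version the variable is expected to point either at the tessdata folder itself (the fix above) or at its parent (as the error message says).

```python
import os


def check_tessdata(lang="eng", prefix=None):
    """Report where LANG.traineddata can be found relative to TESSDATA_PREFIX.

    Returns a dict with the prefix used and whether the file exists
    directly in it and/or in a 'tessdata' subfolder of it.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return {"prefix": None, "in_prefix": False, "in_subfolder": False}
    name = f"{lang}.traineddata"
    return {
        "prefix": prefix,
        "in_prefix": os.path.isfile(os.path.join(prefix, name)),
        "in_subfolder": os.path.isfile(os.path.join(prefix, "tessdata", name)),
    }


if __name__ == "__main__":
    print(check_tessdata())
```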

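Testing again now works.
Usage: tesseract [input image] [output name] (the result is written to a .txt file automatically, so do not add .txt to the output name)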
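The same command can also be run from Python with the standard subprocess module. A sketch, assuming tesseract is on PATH; build_tesseract_cmd and run_tesseract are hypothetical helper names. The -l option selects a language pack (eng by default), and the text is read back from OUTPUT_BASE.txt, matching the behaviour described above.

```python
import subprocess
from pathlib import Path


def build_tesseract_cmd(image_path, output_base, lang=None, exe="tesseract"):
    """Build the command line: tesseract IMAGE OUTPUT_BASE [-l LANG]."""
    cmd = [exe, str(image_path), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    return cmd


def run_tesseract(image_path, output_base, lang=None, exe="tesseract"):
    """Run the tesseract CLI and return the text it wrote to OUTPUT_BASE.txt."""
    subprocess.run(build_tesseract_cmd(image_path, output_base, lang, exe), check=True)
    # Tesseract appends ".txt" itself, so output_base must not end in .txt
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    print(build_tesseract_cmd("opencv.png", "cv"))
```

For example, run_tesseract("opencv.png", "cv") runs the same command as the earlier test and returns the contents of cv.txt; check=True turns a non-zero exit status (such as the language-loading error) into a CalledProcessError.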

Using tesseract from Python

Installation

# pip install pytesseract

 
 

Test

python test.py

 
 

Running test.py produced the following two errors:

  1. pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
    Fix:
    Change the path that tesseract_cmd points to in pytesseract.py:
    tesseract_cmd = r'D:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

  2. pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \Program Files (x86)\Tesseract-OCR\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.')
    Fix:
    As before, add a system variable TESSDATA_PREFIX whose value is the path D:\Program Files (x86)\Tesseract-OCR\tessdata.
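Instead of editing pytesseract.py inside site-packages (a pip upgrade overwrites it), the same path can be set from your own script through the module-level pytesseract.pytesseract.tesseract_cmd attribute. A sketch; resolve_tesseract and configure_pytesseract are hypothetical helpers that prefer a tesseract found on PATH and otherwise fall back to the install path used above.

```python
import os
import shutil

DEFAULT_WIN_PATH = r"D:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def resolve_tesseract(candidates=(DEFAULT_WIN_PATH,)):
    """Return a usable tesseract executable path, or None if none is found."""
    found = shutil.which("tesseract")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved executable and return its path."""
    import pytesseract  # imported lazily; only needed when actually configuring

    exe = resolve_tesseract()
    if exe is None:
        raise RuntimeError("tesseract executable not found; install it or add it to PATH")
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling configure_pytesseract() once at the top of test.py has the same effect as the edit above, but keeps the setting in your own code.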

It only worked after a restart.

Reference for this section: https://blog.csdn.net/qq756684177/article/details/81518891

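  1. What is OCR?
    OCR (Optical Character Recognition) is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text using character recognition methods. In other words, for printed characters, OCR optically converts the text of a paper document into a black-and-white bitmap image, and recognition software then converts the text in that image into a text format that word-processing software can edit further.
    How to correct errors, or use auxiliary information to improve recognition accuracy, is the most important problem in OCR, and it is where the term ICR (Intelligent Character Recognition) came from. The main metrics for judging an OCR system are rejection rate, misrecognition rate, recognition speed, user-friendliness of the interface, and the product's stability, ease of use, and feasibility.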

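  2. What is Tesseract?
    The Tesseract OCR engine was first developed at HP Labs starting in 1985, and by 1995 it had become one of the three most accurate OCR engines in the industry. However, HP soon decided to leave the OCR business, and Tesseract was shelved. Years later, HP realized that rather than leave Tesseract gathering dust, it would be better to contribute it to open source and give it new life.
    In 2005, Tesseract was taken over by an information technology research institute in Nevada, USA, which enlisted Google to improve it, fix bugs, and optimize it. Tesseract was released as an open-source project on Google Code; its then-latest version, 3.0, supported Chinese OCR and provided a command-line tool.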
