Tesseract-OCR download (installer):
tesseract-ocr-setup-3.05.00dev-205-ge205c59.exe
Digit captcha recognition:
OpenCV + Tesseract-OCR
Preprocessing with OpenCV
Captcha recognition with Tesseract-OCR
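Steps:
Preprocessing: remove interference lines and dots
Choose among different structuring elements
Convert between Image and numpy array
Recognize and output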
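The conversion step between a PIL Image and a numpy array can be sketched on its own. This is a minimal sketch on a made-up array (the 4x6 image is hypothetical, standing in for the preprocessed captcha); it only shows that the round trip keeps the pixel data and that PIL reports size as (width, height) while numpy reports shape as (rows, cols).

```python
import numpy as np
from PIL import Image

# Hypothetical 4x6 binary image standing in for the preprocessed captcha.
arr = np.zeros((4, 6), dtype=np.uint8)
arr[1:3, 2:5] = 255

pil_img = Image.fromarray(arr)   # 2-D uint8 array -> single-channel PIL image
back = np.asarray(pil_img)       # PIL image -> numpy array

print(pil_img.mode, pil_img.size)  # PIL size is (width, height)
print(back.shape, back.dtype)      # numpy shape is (rows, cols)
```

For color images, note that OpenCV stores channels as BGR while PIL expects RGB, so a cv.cvtColor call with cv.COLOR_BGR2RGB is usually needed before Image.fromarray. The grayscale case used here needs no such step.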
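API level:
Learn to use the OpenCV morphology and binarization APIs for preprocessing
Use Tesseract-OCR for text recognition
Discussion of recognition-rate issues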
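On the recognition-rate point, one common approach for digit-only captchas is to tell Tesseract to expect a single line of digits and to filter the raw output afterwards. The sketch below is an assumption, not part of the original code: the helper names build_digit_config, digits_only and recognize_digits are hypothetical, and whether the whitelist is honored depends on the Tesseract version and engine (older 3.x builds may also expect -psm instead of --psm).

```python
import re


def build_digit_config(psm=7):
    """Build a Tesseract config string: page segmentation mode 7
    (treat the image as a single text line) plus a digit whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def digits_only(text):
    """Drop everything except 0-9 from the raw OCR output."""
    return re.sub(r"[^0-9]", "", text)


def recognize_digits(pil_image):
    """Run Tesseract on a PIL image and keep only the digits."""
    # Imported here so the two helpers above work without Tesseract installed.
    import pytesseract as tess
    raw = tess.image_to_string(pil_image, config=build_digit_config())
    return digits_only(raw)


print(build_digit_config())
print(digits_only(" 5 8-2\n"))
```

Filtering after recognition is a safety net: even if the engine ignores the whitelist, stray spaces, newlines and punctuation are removed before the result is compared with the expected code.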
Required packages (can be installed directly from PyCharm):
PIL, through its maintained fork Pillow (pip install Pillow)
http://pythonware.com/products/pil/
pytesseract (pip install pytesseract)
https://pypi.python.org/pypi/pytesseract
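pytesseract is only a wrapper, so the Tesseract-OCR executable itself must also be reachable. A minimal sketch for checking this follows; the Windows path is a guessed default install location, not something stated above, and find_tesseract is a hypothetical helper name.

```python
import os
import shutil

# Assumed default install location on Windows; adjust to the real one.
ASSUMED_WINDOWS_PATH = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=ASSUMED_WINDOWS_PATH):
    """Return the path of the tesseract executable, or None if not found."""
    on_path = shutil.which("tesseract")   # search the PATH first
    if on_path:
        return on_path
    return fallback if os.path.isfile(fallback) else None


path = find_tesseract()
print(path or "tesseract executable not found")
```

If the executable is found but is not on PATH, the returned path can be assigned to pytesseract.pytesseract.tesseract_cmd before calling image_to_string.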
Code implementation:
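import cv2 as cv
import numpy as np
from PIL import Image
import pytesseract as tess


def recognize_text(src):
    # Convert to grayscale
    gray = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
    # Binarize with Otsu's threshold, inverted: dark text on a light
    # background becomes white text on a black background
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    # Structuring element 1 px wide, 2 px tall (ksize is (width, height))
    kernel = cv.getStructuringElement(cv.MORPH_RECT, (1, 2))
    # Opening: removes 1-px-high horizontal lines and isolated dots
    bin1 = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
    # Structuring element 2 px wide, 1 px tall
    kernel = cv.getStructuringElement(cv.MORPH_RECT, (2, 1))
    # Opening: removes 1-px-wide vertical lines
    open_out = cv.morphologyEx(bin1, cv.MORPH_OPEN, kernel)
    cv.imshow("binary-image", open_out)
    # Invert in place: black background becomes white background
    cv.bitwise_not(open_out, open_out)
    # fromarray turns the 2-D array into a PIL image
    text_image = Image.fromarray(open_out)
    # Convert the image to a string
    text = tess.image_to_string(text_image)
    print("Recognition result: %s" % text)


print("--------- Python OpenCV Tutorial ---------")
src = cv.imread("E:/ji_qi_xue_xi/opencv_kejian/opencv_python_image/582.jpg")
if src is None:
    # imread returns None instead of raising when the file cannot be read
    raise SystemExit("could not read the input image")
cv.namedWindow("input image", cv.WINDOW_AUTOSIZE)
cv.imshow("input image", src)
recognize_text(src)
cv.waitKey(0)
cv.destroyAllWindows()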