Abstract:
HyperLPR is a high-performance open-source Chinese license plate recognition framework implemented with Keras/TensorFlow, supporting Android, Linux, Windows, iOS, and other platforms.
Both a C++ implementation and a Python implementation are currently available.
This article covers only the Python-based deep-learning approach to Chinese license plate recognition.
1. Source code download:
https://github.com/zeusees/HyperLPR
2. Installation and usage:
https://www.jianshu.com/p/7ab673abeaae
3. Concepts:
OpenCV: an open-source computer vision library. Since OpenCV 3.1, the dnn (deep neural network) module had been part of opencv_contrib (OpenCV's extra-modules repository); in version 3.3 it was promoted from opencv_contrib into the main OpenCV repository. The current dnn module supports deep learning frameworks such as Caffe, TensorFlow, Torch, and PyTorch. OpenCV 3.3 also introduced an interface for reading TensorFlow models, although the set of supported models is still limited. In addition, the API for running pre-trained deep learning models is available in both C++ and Python.
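As a minimal sketch of that dnn API (the file names below are hypothetical placeholders for a frozen TensorFlow graph you would supply yourself, and the preprocessing parameters depend on how the model was trained):
import cv2

# load a frozen TensorFlow graph (paths are placeholders)
net = cv2.dnn.readNetFromTensorflow("frozen_graph.pb", "graph_config.pbtxt")

image = cv2.imread("./1.jpg")
# pack the image into a 4-D blob the network can consume
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224),
                             mean=(0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
out = net.forward()  # run one forward pass
print(out.shape)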
DNN: deep neural networks are the foundation of deep learning; TensorFlow is one framework for building them.
ANN: artificial neural network.
DNN: an ANN with two or more hidden layers is called a DNN.
Multilayer perceptron: an ANN with exactly one hidden layer.
Perceptron: an ANN with no hidden layer.
TensorFlow is Google's second-generation machine learning system, developed on the basis of DistBelief. Its name describes how it works: Tensor means an N-dimensional array, and Flow means computation on a dataflow graph, so TensorFlow is the process of tensors flowing from one end of the graph to the other. It is a system for feeding complex data structures into artificial neural networks for analysis and processing.
Understanding tensors: a tensor is a quantity with a magnitude and multiple directions, where the number of "directions" corresponds to the tensor's order (rank).
A scalar can be viewed as a rank-0 tensor and a vector as a rank-1 tensor, so a matrix is a rank-2 tensor.
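A quick NumPy illustration of those ranks (ndim is the array's order):
import numpy as np

scalar = np.array(3.0)              # rank-0 tensor
vector = np.array([1.0, 2.0, 3.0])  # rank-1 tensor
matrix = np.eye(3)                  # rank-2 tensor
print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2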
4. Source code analysis
4.1 Entry file demo.py (excerpt):
from hyperlpr import pipline as pp
import cv2

# read a local image
image = cv2.imread("./1.jpg")
image, res = pp.SimpleRecognizePlate(image)
# print the recognition result
print(res)
# display the annotated image
cv2.imshow("image", image)
cv2.waitKey(0)
OpenCV's imread function loads the image and returns the Mat type, which in Python is a NumPy ndarray (BGR channel order, dtype uint8).
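A quick check of what imread hands back:
import cv2

image = cv2.imread("./1.jpg")
print(type(image))   # <class 'numpy.ndarray'>
print(image.shape)   # (height, width, 3) -- BGR channels
print(image.dtype)   # uint8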
4.2 The SimpleRecognizePlate function
def SimpleRecognizePlate(image):
    images = detect.detectPlateRough(image, image.shape[0], top_bottom_padding_rate=0.1)
    res_set = []
    for j, plate in enumerate(images):
        plate, rect, origin_plate = plate
        # plate = cv2.cvtColor(plate, cv2.COLOR_RGB2GRAY)
        plate = cv2.resize(plate, (136, 36 * 2))
        t1 = time.time()
        # classify the plate type by its color
        ptype = td.SimplePredict(plate)
        if ptype > 0 and ptype < 5:
            # bitwise_not inverts the pixel values (bitwise NOT)
            plate = cv2.bitwise_not(plate)
        # precise localization, skew correction, etc.
        image_rgb = fm.findContoursAndDrawBoundingBox(plate)
        '''
        Input:
            the cropped plate-region image (Mat); rect is also the cropped plate image (Mat)
        Processing:
            1. resize the plate image
            2. convert the grayscale channel values from [0, 255] to float [0, 1]
            3. feed the 66*16 float input into the model via self.modelFineMapping.predict
        '''
        image_rgb = fv.finemappingVertical(image_rgb)
        cache.verticalMappingToFolder(image_rgb)
        print("e2e:", e2e.recognizeOne(image_rgb))
        image_gray = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2GRAY)
        # image_gray = horizontalSegmentation(image_gray)
        # cv2.imshow("image_gray", image_gray)
        # cv2.waitKey()
        cv2.imwrite("./" + str(j) + ".jpg", image_gray)
        # sliding-window based character segmentation
        val = segmentation.slidingWindowsEval(image_gray)
        # print("segmentation and recognition", time.time() - t2, "s")
        if len(val) == 3:
            blocks, res, confidence = val
            if confidence / 7 > 0.7:
                image = drawRectBox(image, rect, res)
                res_set.append(res)
                for i, block in enumerate(blocks):
                    block_ = cv2.resize(block, (25, 25))
                    block_ = cv2.cvtColor(block_, cv2.COLOR_GRAY2BGR)
                    image[j * 25:(j * 25) + 25, i * 25:(i * 25) + 25] = block_
                    if image[j * 25:(j * 25) + 25, i * 25:(i * 25) + 25].shape == block_.shape:
                        pass
            if confidence > 0:
                print("plate:", res, "confidence:", confidence / 7)
            else:
                pass
                # print("uncertain plate:", res, "confidence:", confidence)
    return image, res_set
The input is a Mat-type image.
The output is the recognized plate string(s) together with a confidence score.
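The drawRectBox helper called above (when confidence/7 > 0.7) is not listed in this excerpt. A minimal sketch of what such a helper could look like follows; note this is an assumption rather than the repository's exact code, and it renders the plate text via PIL's ImageFont because cv2.putText cannot draw Chinese characters (the font path is a hypothetical placeholder):
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def drawRectBox(image, rect, text):
    # rect = [x, y, w, h]: draw the plate bounding box
    x, y, w, h = [int(v) for v in rect]
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)
    # overlay the plate text with PIL, which can render CJK glyphs
    pil_img = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(pil_img)
    font = ImageFont.truetype("./Font/platech.ttf", 20, encoding="utf-8")  # placeholder path
    draw.text((x, y - 22), text, (255, 255, 255), font=font)
    return cv2.cvtColor(np.array(pil_img), cv2.COLOR_RGB2BGR)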
4.3 The detectPlateRough function
detectPlateRough returns the bounding box (bbox) of every license plate found in the image.
The return value is a list of plate-region coordinate boxes.
# return the bbox of every plate detected in the image
def detectPlateRough(image_gray, resize_h=720, en_scale=1.08, top_bottom_padding_rate=0.05):
    print(image_gray.shape)
    # top_bottom_padding_rate: fraction of the image height to crop off the top and bottom
    if top_bottom_padding_rate > 0.2:
        print("error:top_bottom_padding_rate > 0.2:", top_bottom_padding_rate)
        exit(1)
    # resize_h: target image height; here the size is kept unchanged
    height = image_gray.shape[0]
    padding = int(height * top_bottom_padding_rate)
    scale = image_gray.shape[1] / float(image_gray.shape[0])
    image = cv2.resize(image_gray, (int(scale * resize_h), resize_h))
    # crop away the top_bottom_padding_rate fraction at the top and bottom
    image_color_cropped = image[padding:resize_h - padding, 0:image_gray.shape[1]]
    # convert the cropped image to grayscale
    image_gray = cv2.cvtColor(image_color_cropped, cv2.COLOR_RGB2GRAY)
    # run the cv2.CascadeClassifier() detector loaded earlier on the grayscale image,
    # with the minimum and maximum detectable box sizes; it returns each plate's offset
    # in the image, i.e. the top-left corner (x, y) plus the box width (w) and height (h)
    watches = watch_cascade.detectMultiScale(image_gray, en_scale, 2, minSize=(36, 9), maxSize=(36 * 40, 9 * 40))
    # enlarge each detected bbox (a tilted plate may not be fully covered by the raw
    # detection): widen by 0.14*w on each side, shift upward by 0.6*h, and grow the
    # total height by 1.1*h
    cropped_images = []
    for (x, y, w, h) in watches:
        cropped_origin = cropped_from_image(image_color_cropped, (int(x), int(y), int(w), int(h)))
        x -= w * 0.14
        w += w * 0.28
        y -= h * 0.6
        h += h * 1.1
        # crop again with the enlarged box
        cropped = cropped_from_image(image_color_cropped, (int(x), int(y), int(w), int(h)))
        cropped_images.append([cropped, [x, y + padding, w, h], cropped_origin])
    return cropped_images
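cropped_from_image is not shown in this excerpt; here is a plausible sketch, assuming it simply clamps the rectangle to the image bounds before slicing:
def cropped_from_image(image, rect):
    # rect = (x, y, w, h); clamp to the image bounds before slicing
    x, y, w, h = [int(v) for v in rect]
    x = max(x, 0)
    y = max(y, 0)
    w = min(w, image.shape[1] - x)
    h = min(h, image.shape[0] - y)
    return image[y:y + h, x:x + w]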
4.4 The SimplePredict function
The image is fed into the model for prediction, and the plate type is judged from its color: white text on a dark background returns 0, while dark text on a lighter background returns a type greater than zero.
model = Getmodel_tensorflow(5)
model.load_weights("./model/plate_type.h5")
model.save("./model/plate_type.h5")

def SimplePredict(image):
    image = cv2.resize(image, (34, 9))            # resize the plate image to 34*9
    image = image.astype(np.float) / 255          # convert the [0, 255] channel values to float [0, 1]
    res = np.array(model.predict(np.array([image]))[0])  # feed the 34*9 float input into the model
    return res.argmax()
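model.predict returns one probability per plate-type class (Getmodel_tensorflow(5) implies 5 classes), and argmax picks the most likely one; for instance, with a made-up output vector:
import numpy as np

# a hypothetical softmax output over the 5 plate-type classes
res = np.array([0.03, 0.88, 0.04, 0.03, 0.02])
ptype = res.argmax()  # -> 1, i.e. a type > 0, so the caller inverts the plate
print(ptype)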
4.5 The finemappingVertical function
A Keras network model that regresses the plate's left and right boundaries.
The weights are loaded via modelFineMapping.loadweights(), and the network output is obtained via modelFineMapping.predict.
Input: a 16*66*3 tensor
Output: a tensor of length 2
def finemappingVertical(image):
    resized = cv2.resize(image, (66, 16))
    resized = resized.astype(np.float) / 255
    res = model.predict(np.array([resized]))[0]
    print("keras_predict", res)
    res = res * image.shape[1]
    res = res.astype(np.int)
    H, T = res
    H -= 3
    # 3 79.86
    # 4 79.3
    # 5 79.5
    # 6 78.3
    # T
    # T+1 80.9
    # T+2 81.75
    # T+3 81.75
    if H < 0:
        H = 0
    T += 2
    if T >= image.shape[1] - 1:
        T = image.shape[1] - 1
    image = image[0:35, H:T + 2]
    image = cv2.resize(image, (int(136), int(36)))
    return image
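A worked example using the keras_predict values shown in the sample output further below ([0.07086223 0.84606016]): the two regressed boundary fractions are scaled by the plate width and then padded:
import numpy as np

width = 136                                # width of the resized plate image
res = np.array([0.07086223, 0.84606016])   # network output: left/right boundary fractions
res = (res * width).astype(int)            # -> [9, 115] in pixel coordinates
H, T = res
H = max(H - 3, 0)                          # pad 3 px on the left  -> 6
T = min(T + 2, width - 1)                  # pad 2 px on the right -> 117
print(H, T)                                # the crop becomes image[0:35, H:T+2]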
4.6 The recognizeOne function
Inside the per-plate for loop, the image produced by finemappingVertical is passed to recognizeOne for OCR.
def recognizeOne(src):
    # x_tempx = cv2.imread(src)
    x_tempx = src
    # x_tempx = cv2.bitwise_not(x_tempx)
    x_temp = cv2.resize(x_tempx, (160, 40))
    x_temp = x_temp.transpose(1, 0, 2)
    t0 = time.time()
    y_pred = pred_model.predict(np.array([x_temp]))
    y_pred = y_pred[:, 2:, :]
    # plt.imshow(y_pred.reshape(16, 66))
    # plt.show()
    # cv2.imshow("x_temp", x_tempx)
    # cv2.waitKey(0)
    return fastdecode(y_pred)
The OCR network (a Keras model):
Input layer: a 160*40*3 tensor
Output layer: a tensor of length 7, with len(chars)+1 classes
chars = ["京", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "皖", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂",
"琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A",
"B", "C", "D", "E", "F", "G", "H", "J", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "U", "V", "W", "X",
"Y", "Z","港","学","使","警","澳","挂","军","北","南","广","沈","兰","成","济","海","民","航","空"
]
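fastdecode, called at the end of recognizeOne, is not shown in this excerpt. Below is a minimal sketch of a greedy CTC-style decode, assuming the extra (len(chars)+1-th) class is the CTC blank (the Keras convention) and that confidence is the mean probability of the kept steps; the repository's implementation may differ in detail:
import numpy as np

def fastdecode(y_pred):
    # y_pred: (1, timesteps, len(chars) + 1) per-step class probabilities
    probs = y_pred[0]
    best = probs.argmax(axis=1)     # greedy: best class per timestep
    blank = probs.shape[1] - 1      # assumed: the last class is the CTC blank
    name, confs = "", []
    prev = -1
    for t, c in enumerate(best):
        # CTC collapse: skip blanks and repeated classes
        if c != blank and c != prev:
            name += chars[c]
            confs.append(probs[t, c])
        prev = c
    confidence = float(np.mean(confs)) if confs else 0.0
    return name, confidence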
4.7 The slidingWindowsEval function
Sliding-window based character segmentation and recognition.
def slidingWindowsEval(image):
    windows_size = 16
    stride = 1
    height = image.shape[0]
    data_sets = []
    # slide a 16-px-wide window across the plate and collect normalized patches
    for i in range(0, image.shape[1] - windows_size + 1, stride):
        data = image[0:height, i:i + windows_size]
        data = cv2.resize(data, (23, 23))
        # cv2.imshow("image", data)
        data = cv2.equalizeHist(data)
        data = data.astype(np.float) / 255
        data = np.expand_dims(data, 3)
        data_sets.append(data)
    res = model2.predict(np.array(data_sets))
    pin = res
    # f and l are the module's aliases for scipy's ndimage-filter and signal modules
    p = 1 - (res.T)[1]
    p = f.gaussian_filter1d(np.array(p, dtype=np.float), 3)
    lmin = l.argrelmax(np.array(p), order=3)[0]
    interval = []
    for i in range(len(lmin) - 1):
        interval.append(lmin[i + 1] - lmin[i])
    if len(interval) > 3:
        mid = get_median(interval)
    else:
        return []
    pin = np.array(pin)
    res = searchOptimalCuttingPoint(image, pin, 0, mid, 3)
    cutting_pts = res[1]
    last = cutting_pts[-1] + mid
    if last < image.shape[1]:
        cutting_pts.append(last)
    else:
        cutting_pts.append(image.shape[1] - 1)
    name = ""
    confidence = 0.00
    seg_block = []
    for x in range(1, len(cutting_pts)):
        if x != len(cutting_pts) - 1 and x != 1:
            section = image[0:36, cutting_pts[x - 1] - 2:cutting_pts[x] + 2]
        elif x == 1:
            c_head = cutting_pts[x - 1] - 2
            if c_head < 0:
                c_head = 0
            c_tail = cutting_pts[x] + 2
            section = image[0:36, c_head:c_tail]
        elif x == len(cutting_pts) - 1:
            end = cutting_pts[x]
            diff = image.shape[1] - end
            c_head = cutting_pts[x - 1]
            c_tail = cutting_pts[x]
            if diff < 7:
                section = image[0:36, c_head - 5:c_tail + 5]
            else:
                diff -= 1
                section = image[0:36, c_head - diff:c_tail + diff]
        elif x == 2:
            section = image[0:36, cutting_pts[x - 1] - 3:cutting_pts[x - 1] + mid]
        else:
            section = image[0:36, cutting_pts[x - 1]:cutting_pts[x]]
        seg_block.append(section)
    refined = refineCrop(seg_block, mid - 1)
    for i, one in enumerate(refined):
        res_pre = cRP.SimplePredict(one, i)
        # cv2.imshow(str(i), one)
        # cv2.waitKey(0)
        confidence += res_pre[0]
        name += res_pre[1]
    return refined, name, confidence
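The smoothing-plus-peak-picking step that slidingWindowsEval performs on the window-response curve (gaussian_filter1d followed by argrelmax) can be demonstrated in isolation; the response curve here is synthetic, made up purely for illustration:
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import argrelmax

# a synthetic "gap probability" curve with peaks at the character boundaries
x = np.arange(120)
p = sum(np.exp(-0.5 * ((x - c) / 3.0) ** 2) for c in (10, 27, 44, 61, 78, 95, 112))
p += np.random.default_rng(0).normal(0, 0.05, x.size)  # a little noise

p = gaussian_filter1d(p, 3)        # smooth the raw response
peaks = argrelmax(p, order=3)[0]   # local maxima = candidate cut points
print(peaks)                       # roughly [10, 27, 44, 61, 78, 95, 112]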
Here it is worth mentioning a classic paper (End-to-end Text Recognition with Convolutional Neural Networks); Andrew Ng's Machine Learning course on Coursera also covers this method, and OpenCV's text module contains a corresponding implementation.
Its main idea is to slide a classifier trained on positive and negative character samples across the image to produce a probability response map, then apply NMS (non-maximum suppression) to the raw response. Once the character bounding boxes and their count are determined, a Viterbi-like algorithm finds the optimal segmentation path.
For details see: https://github.com/zeusees/HyperLPR-Training
The program then prints output such as:
(1160, 720, 3)
correction angle h 0 v 90
keras_predict [0.07086223 0.84606016]
f30a876f
e2e: ('皖A13H25', 0.9790264708655221)
828
optimal cut-point search 0.019015073776245117
plate: 皖A13H25 confidence: 0.8766724424702781
As the output above shows, recognizeOne and slidingWindowsEval report different confidence values; when the confidence is low, the two methods may even disagree on the recognized plate string.