[Project 3: License Plate Detection + Recognition] Part 1: Converting the CCPD License Plate Dataset to YOLOv5 and LPRNet Formats

满船清梦压星河HK

Last modified 2022-05-31 13:30:04


Column: Project/Competition Summaries · Tags: license plate detection, OCR, YOLOv5, LPRNet

First published 2022-05-30 20:38:08

Article link: https://blog.csdn.net/qq_38253797/article/details/125042833


Preface

I will be job hunting soon, so I want to write up the few small projects I have done.

I have already written up my first project, a xxx pest and disease detection project. GitHub source: HuKai97/FFSSD-ResNet. CSDN walkthrough:

My second project is a beehive detection project. GitHub source: https://github.com/HuKai97/YOLOv5-ShuffleNetv2. CSDN walkthrough:

[Project 2: Beehive Detection] Part 1: A tour of classic CNNs: InceptionV1-V4, ResNetV1-V2, MobileNetV1-V3, ShuffleNetV1-V2, ResNeXt, Xception.
[Project 2: Beehive Detection] Part 2: Model improvement: YOLOv5s-ShuffleNetV2.

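If you are not familiar with YOLOv5, you may want to start with my CSDN series on the YOLOv5 source code: [YOLOV5-5.x Source Code Walkthrough] Project File Overview. I have also open-sourced the annotated YOLOv5 code on GitHub at HuKai97/yolov5-5.x-annotations; stars are welcome!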

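I have been studying OCR for a while and wondered whether I could build a license plate recognition project. Plate detection is easy enough, since YOLOv5 can be used directly. My plan is a lightweight license plate recognition pipeline: YOLOv5s as the detection network and LPRNet as the recognition network.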

This post explains how to convert the public CCPD license plate dataset into YOLOv5 format and LPRNet format.

All posts in the license plate recognition series:

All code has been uploaded to GitHub: https://github.com/HuKai97/YOLOv5-LPRNet-Licence-Recognition

1. Introduction to the CCPD Dataset

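CCPD2019 was collected and hand-annotated by staff in parking lots in Hefei, between 7:30 a.m. and 10:00 p.m. The shooting conditions vary widely and include rain, snow, tilted plates, and blur. The dataset contains 283,037 images (nearly 300,000), each 720×1160×3, divided into 8 subsets. The subsets, their sizes, and descriptions are listed below: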

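Subset	Images	Description
ccpd_base	199998	Normal plates
ccpd_challenge	10006	Challenging plates
ccpd_db	20001	Plate region too dark or too bright
ccpd_fn	19999	Plate far from or close to the camera
ccpd_np	3036	New cars without a plate
ccpd_rotate	9998	Horizontal tilt 20°–50°, vertical tilt −10° to 10°
ccpd_tilt	10000	Horizontal tilt 15°–45°, vertical tilt 15°–45°
ccpd_weather	9999	Plates in rain, snow, or fog
Total	283037	All CCPD2019 images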
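To check a local copy against the table, a small helper can count the images in each subset folder. This is a sketch assuming the archive is extracted so that every subset (ccpd_base, ccpd_db, ...) is a direct sub-folder of one root directory; `count_subset_images` is a hypothetical name, not part of the project:

```python
from pathlib import Path


def count_subset_images(root):
    """Count .jpg files in each immediate sub-folder of `root`.

    Assumes a CCPD2019-style layout: root/ccpd_base, root/ccpd_db, ...
    Returns a dict {subset_name: image_count}, ordered by folder name.
    """
    counts = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            # only regular files ending in .jpg/.JPG are counted
            counts[sub.name] = sum(
                1 for f in sub.iterdir() if f.is_file() and f.suffix.lower() == ".jpg"
            )
    return counts
```

Calling it on the extracted root and comparing each count with the table shows quickly whether a subset was only partially downloaded.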

Image naming example: "025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg"

The fields, separated by "-", mean:

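025: ratio of the plate area to the whole image area;
95_113: horizontal and vertical tilt of the plate: horizontal 95°, vertical 113°;
154&383_386&473: top-left and bottom-right corners of the bounding box: top-left (154, 383), bottom-right (386, 473);
386&473_177&454_154&383_363&402: the four corner points of the plate, in the order bottom-right, bottom-left, top-left, top-right;
0_0_22_27_27_33_16: plate character indices. The first 0 is the province, an index into the provinces list, i.e. '皖'. The second 0 is the city-level code, an index into the alphabets list, i.e. 'A'. The remaining five are letters and digits, indices into the ads list: 22 is Y, 27 is 3, 33 is 9, and 16 is S. The final plate number is 皖AY339S;
37: brightness of the plate region;
15: blurriness of the plate region.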
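Since every label lives in the filename, the whole name can be split into its seven fields in one pass. A minimal sketch (the function name `parse_ccpd_name` and the dictionary keys are my own; the field meanings follow the CCPD README and the list above):

```python
def parse_ccpd_name(filename):
    """Split a CCPD filename into its seven '-'-separated fields."""
    stem = filename.rsplit(".", 1)[0]  # drop the ".jpg" extension
    area, tilt, bbox, corners, plate, brightness, blur = stem.split("-")

    def to_point(s):  # "154&383" -> (154, 383)
        x, y = s.split("&")
        return int(x), int(y)

    h_tilt, v_tilt = (int(v) for v in tilt.split("_"))
    return {
        "area": area,  # kept as the raw string, e.g. "025"
        "tilt": (h_tilt, v_tilt),
        "bbox": tuple(to_point(p) for p in bbox.split("_")),        # (top-left, bottom-right)
        "corners": tuple(to_point(p) for p in corners.split("_")),  # BR, BL, TL, TR
        "plate_indices": [int(i) for i in plate.split("_")],
        "brightness": int(brightness),
        "blurriness": int(blur),
    }
```

The detection converter only needs `bbox`, and the recognition converter needs `bbox` plus `plate_indices`.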

Provinces (provinces): ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新"]

City codes (alphabets): ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

Plate characters (ads): ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
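Decoding the index field with these three lists is a straightforward lookup. The sketch below is illustrative only: the constant and function names are mine, and joining *all* indices after the second one (instead of exactly five) assumes CCPD2020 green-plate names carry eight indices, so the same function covers 7- and 8-character plates:

```python
PROVINCES = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏",
             "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼",
             "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新"]
ALPHABETS = ["A", "B", "C", "D", "E", "F", "G", "H", "J", "K", "L", "M",
             "N", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]
ADS = ALPHABETS + [str(d) for d in range(10)]  # 24 letters followed by '0'..'9'


def decode_plate(indices):
    """Map CCPD plate indices to a plate string.

    indices[0] -> PROVINCES, indices[1] -> ALPHABETS, the rest -> ADS.
    decode_plate([0, 0, 22, 27, 27, 33, 16]) returns "皖AY339S".
    """
    return (PROVINCES[indices[0]] + ALPHABETS[indices[1]]
            + "".join(ADS[i] for i in indices[2:]))
```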

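The newer CCPD2020 adds more than 10,000 images of new-energy vehicles (green plates). Both versions are available from the official site below if you want to download them.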

2. Downloading the CCPD Dataset

The full dataset can be downloaded from https://github.com/detectRecog/CCPD.

3. Splitting into Training, Validation, and Test Sets

I split the data 7:1:2. To change the ratio, edit scrips/split_dataset in the project; it is straightforward:

"""
@Author: HuKai
@Date: 2022/5/29  10:44
@github: https://github.com/HuKai97
"""
import os
import random
from shutil import copy2

srcDir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\base"     # source image folder
# Note: section 4 reads images from ...\ccpd_2019\images\train (val, test),
# so either point the three folders below there or move them afterwards.
trainDir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\train"   # receives 70% of the images
validDir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\val"     # receives 10% of the images
detectDir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\test"   # receives 20% of the images
for d in (trainDir, validDir, detectDir):
    # copy2 into a missing folder would create a *file* with that name instead
    os.makedirs(d, exist_ok=True)

trainfiles = os.listdir(srcDir)
num_train = len(trainfiles)
print("num_train: " + str(num_train))
index_list = list(range(num_train))
random.shuffle(index_list)  # shuffle the order
num = 0
for i in index_list:
    fileName = os.path.join(srcDir, trainfiles[i])  # folder + file name = full image path
    print(fileName)
    if num < num_train * 0.7:    # first 70% -> train (7:1:2)
        copy2(fileName, trainDir)
    elif num < num_train * 0.8:  # next 10% -> val
        copy2(fileName, validDir)
    else:                        # last 20% -> test
        copy2(fileName, detectDir)
    num += 1
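The script above reshuffles differently on every run, so re-running it produces a different split. A possible variant, sketched here with a hypothetical `split_indices` helper and a fixed `seed` (neither is in the project), computes the three index lists up front from a local RNG so the same split can be reproduced:

```python
import random


def split_indices(n, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test lists.

    The three returned lists always partition range(n) exactly.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # local RNG: reproducible, no global side effects
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Each returned list can then be fed to the same `copy2` loop, one target folder per list.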

4. Building the Plate Detection Dataset

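Both the detection and the recognition labels of this dataset are encoded in the image filenames, so we only need to parse each filename and write the label into a txt file.
The code is in scrips/ccpd2yolov5 in the project: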

"""
@Author: HuKai
@Date: 2022/5/29  10:47
@github: https://github.com/HuKai97
"""
import os

import cv2


def txt_translate(path, txt_path):
    os.makedirs(txt_path, exist_ok=True)  # make sure the label folder exists
    for filename in os.listdir(path):
        # skip anything that is not an image (e.g. stray .txt files)
        if os.path.splitext(filename)[1].lower() != ".jpg":
            continue
        print(filename)

        list1 = filename.split("-", 3)  # first split on '-'; field [2] is the bounding box
        subname = list1[2]
        lt, rb = subname.split("_", 1)  # second split on '_': top-left and bottom-right corners
        lx, ly = lt.split("&", 1)
        rx, ry = rb.split("&", 1)
        width = int(rx) - int(lx)
        height = int(ry) - int(ly)  # bounding-box width and height
        cx = float(lx) + width / 2
        cy = float(ly) + height / 2  # bounding-box center

        img_path = os.path.join(path, filename)
        img = cv2.imread(img_path)
        if img is None:  # delete unreadable images (some get corrupted while downloading)
            os.remove(img_path)
            continue
        # normalize by image width (shape[1]) and height (shape[0]), as YOLOv5 expects
        width = width / img.shape[1]
        height = height / img.shape[0]
        cx = cx / img.shape[1]
        cy = cy / img.shape[0]

        txtfile = os.path.join(txt_path, os.path.splitext(filename)[0] + ".txt")
        # Original note: green (new-energy) plates are class 0, blue plates class 1.
        # This line always writes class 0; change the id when converting blue-plate
        # images if you want to keep the two classes apart.
        with open(txtfile, "w") as f:
            f.write(f"0 {cx} {cy} {width} {height}")


if __name__ == '__main__':
    # detection image folders
    trainDir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\images\train"
    validDir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\images\val"
    testDir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\images\test"
    # detection label (txt) folders
    train_txt_path = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\labels\train"
    val_txt_path = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\labels\val"
    test_txt_path = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\labels\test"
    txt_translate(trainDir, train_txt_path)
    txt_translate(validDir, val_txt_path)
    txt_translate(testDir, test_txt_path)
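The core of the conversion is the normalization into YOLOv5's "class cx cy w h" line, with every value divided by the image width or height. As a worked example, the sketch below (hypothetical function name; the 720×1160 defaults assume every image has the size stated in section 1, whereas the script above reads the real size from each image, which is safer) converts the box from the sample filename:

```python
def ccpd_box_to_yolo(lx, ly, rx, ry, img_w=720, img_h=1160, cls=0):
    """Convert a CCPD corner box (top-left, bottom-right, in pixels) into a
    YOLOv5 label line "class cx cy w h", all values normalized to [0, 1]."""
    w = (rx - lx) / img_w
    h = (ry - ly) / img_h
    cx = (lx + rx) / 2 / img_w
    cy = (ly + ry) / 2 / img_h
    return f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Sample box (154, 383)-(386, 473):
# cx = 270/720, cy = 428/1160, w = 232/720, h = 90/1160
line = ccpd_box_to_yolo(154, 383, 386, 473)  # "0 0.375000 0.368966 0.322222 0.077586"
```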

5. Building the Plate Recognition Dataset

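Again, the plate location and the plate characters are read directly from the filename. The plate is cropped out of the image and saved with its plate characters as the filename.
The code is in scrips/ccpd2lpr in the project: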

"""
@Author: HuKai
@Date: 2022/5/29  21:24
@github: https://github.com/HuKai97
"""
import os

import cv2
import numpy as np
from PIL import Image

# References: https://blog.csdn.net/qq_36516958/article/details/114274778
# https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data#2-create-labels
# CCPD contains repeated plates, presumably the same car at different angles or blur levels
path = r'K:\MyProject\datasets\ccpd\new\ccpd_2019\images\test'  # change to your own image folder


provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
             'X', 'Y', 'Z', 'O']
ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
       'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']
save_dir = r"K:\MyProject\datasets\ccpd\new\ccpd_2019\rec_images\test"  # change to your own output folder
os.makedirs(save_dir, exist_ok=True)
num = 0
for filename in os.listdir(path):
    num += 1
    result = ""
    _, _, box, points, plate, brightness, blurriness = os.path.splitext(filename)[0].split('-')
    list_plate = plate.split('_')  # plate character indices
    result += provinces[int(list_plate[0])]
    result += alphabets[int(list_plate[1])]
    # remaining characters: 5 for blue plates, 6 for green (new-energy) plates
    result += "".join(ads[int(i)] for i in list_plate[2:])
    # Sanity check for new-energy plates; delete it if you are not converting green plates
    # if result[2] != 'D' and result[2] != 'F' \
    #         and result[-1] != 'D' and result[-1] != 'F':
    #     print(filename)
    #     print("Error label, Please check!")
    #     assert 0, "Error label ^~^!!!"
    print(result)
    img_path = os.path.join(path, filename)
    img = cv2.imread(img_path)
    if img is None:  # unreadable image: skip it instead of crashing
        print("cannot read {}, skipped".format(img_path))
        continue

    box = box.split('_')  # plate bounding box: top-left, bottom-right
    box = [list(map(int, i.split('&'))) for i in box]
    xmin, ymin = box[0]
    xmax, ymax = box[1]

    # cv2 loads BGR; PIL only crops and resizes here, so the channel order is preserved
    img = Image.fromarray(img)
    img = img.crop((xmin, ymin, xmax, ymax))  # crop out the plate
    img = img.resize((94, 24), Image.LANCZOS)
    img = np.asarray(img)  # back to an array of shape 24x94x3

    # cv2.imwrite fails on paths with Chinese characters (e.g. 皖), at least on Windows,
    # so encode the JPEG in memory and write the bytes with numpy's tofile instead
    cv2.imencode('.jpg', img)[1].tofile(os.path.join(save_dir, "{}.jpg".format(result)))
print("Processed {} files (duplicate plates overwrite each other)".format(num))
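The crops written above carry their label only in their filename. As a hedged sketch of the reverse step, assuming the recognizer's data loader reads the plate string back from the filename and maps each character to an integer id through a character table, as CTC-trained recognizers such as LPRNet typically need (both helper names and the `chars` table are hypothetical, not the project's API):

```python
import os


def plate_from_rec_filename(path):
    """Recover the plate string from a crop saved as '<plate>.jpg'.

    Raises ValueError unless the plate has 7 (blue) or 8 (green) characters.
    """
    plate = os.path.splitext(os.path.basename(path))[0]
    if len(plate) not in (7, 8):
        raise ValueError(f"unexpected plate length {len(plate)}: {plate!r}")
    return plate


def encode_plate(plate, chars):
    """Turn a plate string into integer class ids using the character table
    `chars` (e.g. provinces + letters + digits); unknown characters raise KeyError."""
    table = {c: i for i, c in enumerate(chars)}
    return [table[c] for c in plate]
```

Checking the length here catches crops whose names were truncated or mangled during copying before they reach training.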

6. My Plate Detection + Recognition Dataset

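I did not use the whole CCPD dataset because it is too large. From the CCPD2019 base subset I picked nearly 30,000 images (27,858), and added the 11,774 new-energy (green) plate images from CCPD2020. The details are: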

Detection set (det)	Green (new-energy)	Blue	Total
train	8242	19501	27743
val	1178	2786	3964
test	2354	5571	7925
Total	11774	27858	39632

Recognition set (rec)	Green (new-energy)	Blue	Total
train	2854	18639	21493
val	1353	2274	3627
test	1378	5485	6863
Total	5585	26398	31983

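Why do the two datasets differ in size? The recognition images are named after their plate characters, and some plates appear in several detection images. Those crops map to the same filename, so only one copy of each is kept.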
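If you would rather keep every crop, one option (not what the script in section 5 does) is to add a numeric suffix whenever a name is already taken. A minimal sketch with a hypothetical helper `unique_save_path`; note that whatever reads the labels must then strip the `_k` suffix before treating the filename as the plate string:

```python
import os


def unique_save_path(out_dir, plate, ext=".jpg"):
    """Return a path in out_dir that does not exist yet.

    Tries '<plate>.jpg' first, then '<plate>_1.jpg', '<plate>_2.jpg', ...
    """
    candidate = os.path.join(out_dir, plate + ext)
    k = 1
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, f"{plate}_{k}{ext}")
        k += 1
    return candidate
```

Keeping the duplicates gives the recognizer more views of the same plate, at the cost of identical plate strings possibly landing in both the training and the test split.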

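My dataset has limitations. Apart from the CCPD2020 green plates, only the CCPD2019 base subset was used, so results on harder cases still need work: bad weather (rain, snow), overly bright or dark scenes, plates very near or far from the camera, and provinces other than Anhui (about 80% of the plates are 皖). If, like me, you just want a demo, these roughly 40,000 images should be enough. If you plan to deploy or showcase the system in practice, I suggest extending the data with the other, harder CCPD2019 subsets, or photographing plates from other provinces yourself and converting them with the method above.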

I won't upload the data because it is far too large. Download CCPD as described above and build your own copy; all the code is above, and it goes quickly!