Project 2: License Plate Detection

1. Basic Idea

YOLOv5 + LPRNet. YOLOv5 first detects the license plate, and the detected plate crop is then fed into LPRNet to obtain the recognition result.
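The two-stage flow can be sketched in a few lines. This is only an illustration: `detect` and `recognize` are hypothetical stand-ins for the YOLOv5 detector and LPRNet (neither name comes from those projects), and the image is a plain nested list so the sketch runs without dependencies.

```python
def crop(image, box):
    """Cut the region (x1, y1, x2, y2) out of an image stored as rows of pixels."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def read_plates(image, detect, recognize):
    """Stage 1: find plate boxes. Stage 2: recognize the text in each crop."""
    results = []
    for box in detect(image):
        plate = crop(image, box)
        results.append((box, recognize(plate)))
    return results


# Hypothetical stand-ins so the sketch runs end to end.
def fake_detect(image):
    return [(1, 1, 4, 3)]  # one box: x1, y1, x2, y2


def fake_recognize(plate):
    # Report the crop size (width x height) instead of real plate text.
    return "%dx%d crop" % (len(plate[0]), len(plate))


image = [[0] * 6 for _ in range(5)]  # 5 rows, 6 columns
plates = read_plates(image, fake_detect, fake_recognize)
print(plates)
```

In a real pipeline the crop would also be resized to the recognizer's input size before the second stage.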

2. Background

2.1 YOLOv5 (see the fish fry detection project)

2.1.1 Model (omitted)

2.1.2 Input and Output (omitted)

2.1.3 Loss Function (omitted)

2.2 LPRNet

2.2.1 Model

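Images are resized to a common size and fed into the model. The backbone produces the features f2, f6, f13, and f22. The four features are processed by the neck and concatenated, and the head then outputs a tensor of shape [bs, 68, 18], where 18 is the number of character positions the model outputs and each position has 68 classes. The code is as follows: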

import torch.nn as nn
import torch
CHARS = ['京', '沪', '津', '渝', '冀', '晋', '蒙', '辽', '吉', '黑',
         '苏', '浙', '皖', '闽', '赣', '鲁', '豫', '鄂', '湘', '粤',
         '桂', '琼', '川', '贵', '云', '藏', '陕', '甘', '青', '宁',
         '新',
         '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
         'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K',
         'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
         'W', 'X', 'Y', 'Z', 'I', 'O', '-'
         ]
class small_basic_block(nn.Module):
    def __init__(self, ch_in, ch_out):
        super(small_basic_block, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch_in, ch_out // 4, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(ch_out // 4, ch_out // 4, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
            nn.Conv2d(ch_out // 4, ch_out // 4, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(),
            nn.Conv2d(ch_out // 4, ch_out, kernel_size=1),
        )
    def forward(self, x):
        return self.block(x)

class LPRNet(nn.Module):
    def __init__(self, lpr_max_len, phase, class_num, dropout_rate):
        super(LPRNet, self).__init__()
        self.phase = phase
        self.lpr_max_len = lpr_max_len
        self.class_num = class_num
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1),    # 0  -> [bs,3,24,94] -> [bs,64,22,92]
            nn.BatchNorm2d(num_features=64),                                       # 1  -> [bs,64,22,92]
            nn.ReLU(),                                                             # 2  -> [bs,64,22,92]
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 1, 1)),                 # 3  -> [bs,64,20,90]
            small_basic_block(ch_in=64, ch_out=128),                               # 4  -> [bs,128,20,90]
            nn.BatchNorm2d(num_features=128),                                      # 5  -> [bs,128,20,90]
            nn.ReLU(),                                                             # 6  -> [bs,128,20,90]
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(2, 1, 2)),                 # 7  -> [bs,64,18,44]
            small_basic_block(ch_in=64, ch_out=256),                               # 8  -> [bs,256,18,44]
            nn.BatchNorm2d(num_features=256),                                      # 9  -> [bs,256,18,44]
            nn.ReLU(),                                                             # 10 -> [bs,256,18,44]
            small_basic_block(ch_in=256, ch_out=256),                              # 11 -> [bs,256,18,44]
            nn.BatchNorm2d(num_features=256),                                      # 12 -> [bs,256,18,44]
            nn.ReLU(),                                                             # 13 -> [bs,256,18,44]
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(4, 1, 2)),                 # 14 -> [bs,64,16,21]
            nn.Dropout(dropout_rate),  # 0.5 dropout rate                          # 15 -> [bs,64,16,21]
            nn.Conv2d(in_channels=64, out_channels=256, kernel_size=(1, 4), stride=1),   # 16 -> [bs,256,16,18]
            nn.BatchNorm2d(num_features=256),                                            # 17 -> [bs,256,16,18]
            nn.ReLU(),                                                                   # 18 -> [bs,256,16,18]
            nn.Dropout(dropout_rate),  # 0.5 dropout rate                                # 19 -> [bs,256,16,18]
            nn.Conv2d(in_channels=256, out_channels=class_num, kernel_size=(13, 1), stride=1),  # class_num=68  20  -> [bs,68,4,18]
            nn.BatchNorm2d(num_features=class_num),                                             # 21 -> [bs,68,4,18]
            nn.ReLU(),                                                                          # 22 -> [bs,68,4,18]
        )
        self.container = nn.Sequential(
            nn.Conv2d(in_channels=448+self.class_num, out_channels=self.class_num, kernel_size=(1, 1), stride=(1, 1)),
        )

    def forward(self, x):
        keep_features = list()
        for i, layer in enumerate(self.backbone.children()):
            x = layer(x)
            if i in [2, 6, 13, 22]:
                keep_features.append(x)

        global_context = list()
        # keep_features: [bs,64,22,92]  [bs,128,20,90] [bs,256,18,44] [bs,68,4,18]
        for i, f in enumerate(keep_features):
            if i in [0, 1]:
                # [bs,64,22,92] -> [bs,64,4,18]
                # [bs,128,20,90] -> [bs,128,4,18]
                f = nn.AvgPool2d(kernel_size=5, stride=5)(f)
            if i in [2]:
                # [bs,256,18,44] -> [bs,256,4,18]
                f = nn.AvgPool2d(kernel_size=(4, 10), stride=(4, 2))(f)

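            # Unclear what this step is for; wouldn't the average pooling above already capture the context?
            # (It appears to rescale each feature by its mean squared activation so the concatenated features have comparable magnitude.)
            f_pow = torch.pow(f, 2)     # [bs,64,4,18]  square every element
            f_mean = torch.mean(f_pow)  # scalar: mean over all elements
            f = torch.div(f, f_mean)    # [bs,64,4,18]  divide every element by that mean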
            global_context.append(f)

        x = torch.cat(global_context, 1)  # [bs,516,4,18]
        x = self.container(x)             # -> [bs, 68, 4, 18]   head
        logits = torch.mean(x, dim=2)     # -> [bs, 68, 18]  # 68 character classes, 18 sequence positions

        return logits

if __name__=="__main__":
    lpr_max_len=18; phase=False; class_num=68; dropout_rate=0.5
    i = torch.rand([6,3,24,94])
    Net = LPRNet(lpr_max_len, phase, class_num, dropout_rate)
    o = Net(i) # torch.Size([6, 68, 18])
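The shape comments in the listing can be checked by hand with the usual unpadded conv/pool size formula. The sketch below does that arithmetic in plain Python (the helper names are made up for illustration). One detail worth noting: `nn.MaxPool3d` pools over the last three axes, so on these 4-D feature maps the channel axis is treated as depth, and the strides of 2 and 4 along that axis are what reduce 128 and 256 channels to 64.

```python
def out_size(size, kernel, stride=1):
    """Output length along one axis for an unpadded conv or pooling layer."""
    return (size - kernel) // stride + 1


def backbone_shapes(h=24, w=94, class_num=68):
    """Return (channels, height, width) of the four kept features."""
    h, w = out_size(h, 3), out_size(w, 3)          # layer 0: 3x3 conv
    f2 = (64, h, w)
    h, w = out_size(h, 3), out_size(w, 3)          # layer 3: pool, stride (1, 1, 1)
    f6 = (128, h, w)
    c = out_size(128, 1, 2)                        # layer 7 also strides over channels
    assert c == 64                                 # matches ch_in=64 of layer 8
    h, w = out_size(h, 3, 1), out_size(w, 3, 2)    # layer 7: pool, stride (2, 1, 2)
    f13 = (256, h, w)
    c = out_size(256, 1, 4)                        # layer 14: 256 -> 64 channels
    assert c == 64                                 # matches in_channels=64 of layer 16
    h, w = out_size(h, 3, 1), out_size(w, 3, 2)    # layer 14: pool, stride (4, 1, 2)
    h, w = out_size(h, 1), out_size(w, 4)          # layer 16: (1, 4) conv
    h, w = out_size(h, 13), out_size(w, 1)         # layer 20: (13, 1) conv
    f22 = (class_num, h, w)
    return [f2, f6, f13, f22]


def neck_shapes(features):
    """Apply the average-pooling sizes used in forward() to each kept feature."""
    pools = [((5, 5), (5, 5)), ((5, 5), (5, 5)), ((4, 10), (4, 2)), None]
    pooled = []
    for (c, h, w), pool in zip(features, pools):
        if pool is not None:
            (kh, kw), (sh, sw) = pool
            h, w = out_size(h, kh, sh), out_size(w, kw, sw)
        pooled.append((c, h, w))
    return pooled


features = backbone_shapes()
pooled = neck_shapes(features)
total_channels = sum(c for c, _, _ in pooled)
print(features)
print(pooled)
print(total_channels)
```

All four pooled features end up with a 4 x 18 spatial size, and their channel counts add up to 448 + 68, which is the `in_channels` of the 1x1 convolution in `container`.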

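2.2.2 Input and Output

  1. Model input
    Image processing steps:
    (1) Process the image. Read the image, convert the channels from BGR to RGB, resize it to [94, 24] (width, height), then normalize it and move the channel axis to the front via transform.
    (2) Build the label. The file name carries the label: strip the directory and the extension, map each character to its index, and check that the label is valid.
    (3) Return the image array, the label, and the label length. The label length is used by the loss. The code is as follows: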
from imutils import paths
import numpy as np
import random
import cv2
import os

from torch.utils.data import Dataset

CHARS = ['京', '沪', '津', '渝', '冀', '晋', '蒙', '辽', '吉', '黑',
         '苏', '浙', '皖', '闽', '赣', '鲁', '豫', '鄂', '湘', '粤',
         '桂', '琼', '川', '贵', '云', '藏', '陕', '甘', '青', '宁',
         '新',
         '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
         'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K',
         'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
         'W', 'X', 'Y', 'Z', 'I', 'O', '-'
         ]

CHARS_DICT = {char: i for i, char in enumerate(CHARS)}

class LPRDataLoader(Dataset):
    def __init__(self, img_dir, imgSize, lpr_max_len, PreprocFun=None):
        self.img_dir = img_dir
        self.img_paths = []
        for i in range(len(img_dir)):
            self.img_paths += [el for el in paths.list_images(img_dir[i])]
        random.shuffle(self.img_paths)
        self.img_size = imgSize         # [94, 24]
        self.lpr_max_len = lpr_max_len  # 8
        if PreprocFun is not None:
            self.PreprocFun = PreprocFun
        else:
            self.PreprocFun = self.transform

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, index):
        filename = self.img_paths[index]
        Image = cv2.imdecode(np.fromfile(filename, dtype=np.uint8), -1)
        Image = cv2.cvtColor(Image, cv2.COLOR_BGR2RGB)  # OpenCV decodes to BGR; convert to RGB
        height, width, _ = Image.shape
        if height != self.img_size[1] or width != self.img_size[0]:
            Image = cv2.resize(Image, self.img_size)
        Image = self.PreprocFun(Image)

        basename = os.path.basename(filename)         # 'datasets/rec_images/train/沪A9B821.jpg'-->'沪A9B821.jpg'
        imgname, suffix = os.path.splitext(basename)  # '沪A9B821.jpg' -->  ('沪A9B821', '.jpg')
        imgname = imgname.split("-")[0].split("_")[0]
        label = list()
        for c in imgname:
            label.append(CHARS_DICT[c])

        if len(label) == 8:
            if self.check(label) == False:
                print(imgname)
                assert 0, "Error label ^~^!!!"

        return Image, label, len(label)

    def transform(self, img):
        img = img.astype('float32')   # convert the image from uint8 to float32
        img -= 127.5                  # subtract 127.5 and multiply by 1/128 to scale pixels to roughly [-1, 1]
        img *= 0.0078125
        img = np.transpose(img, (2, 0, 1))  #  [h,w,c]-->[c,h,w]

        return img

    def check(self, label):     # check whether the label is valid
        if label[2] != CHARS_DICT['D'] and label[2] != CHARS_DICT['F'] \
                and label[-1] != CHARS_DICT['D'] and label[-1] != CHARS_DICT['F']:
            print("Error label, Please check!")
            return False
        else:
            return True

if __name__ == "__main__":
    train_img_dirs = "datasets/rec_images/train"
    img_size = [94, 24]
    lpr_max_len = 8
    train_dataset = LPRDataLoader(train_img_dirs.split(','), img_size, lpr_max_len)
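The label step is easy to try in isolation. The sketch below repeats the file-name parsing from `__getitem__` as a standalone function and then flattens a batch of variable-length labels into one list plus a list of lengths. The flattening helper is an assumption about how a batch might be assembled for the loss (the listing above does not show a collate function), and both file names are made up for illustration.

```python
import os

CHARS = ['京', '沪', '津', '渝', '冀', '晋', '蒙', '辽', '吉', '黑',
         '苏', '浙', '皖', '闽', '赣', '鲁', '豫', '鄂', '湘', '粤',
         '桂', '琼', '川', '贵', '云', '藏', '陕', '甘', '青', '宁',
         '新',
         '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
         'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K',
         'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
         'W', 'X', 'Y', 'Z', 'I', 'O', '-'
         ]
CHARS_DICT = {char: i for i, char in enumerate(CHARS)}


def encode_label(path):
    """Turn an image path into a list of class indices, one per plate character."""
    stem, _ = os.path.splitext(os.path.basename(path))
    plate = stem.split("-")[0].split("_")[0]  # drop any '-xxx' or '_xxx' suffix
    return [CHARS_DICT[c] for c in plate]


def collate_labels(labels):
    """Hypothetical helper: flatten labels and record each label's length."""
    flat = [idx for label in labels for idx in label]
    lengths = [len(label) for label in labels]
    return flat, lengths


example_paths = [
    "datasets/rec_images/train/沪A9B821.jpg",    # 7-character plate
    "datasets/rec_images/train/京AD12345_0.jpg",  # 8-character plate with a suffix
]
labels = [encode_label(p) for p in example_paths]
flat, lengths = collate_labels(labels)
print(labels)
print(lengths)
```

Keeping the lengths alongside the flattened indices is what lets a loss that accepts concatenated targets split them back into per-image labels.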

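  2. Model output
    Output steps:
    (1) Feed the images into the model to get logits.
    (2) Permute the logits from [6, 68, 18] to [18, 6, 68], where 6 is the batch size, 68 is the number of classes, and 18 is the length of the output character sequence.
    (3) Apply log_softmax over the last dimension to turn the logits into log-probabilities. The code is as follows: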

logits = lprnet(images)
log_probs = logits.permute(2, 0, 1) # for ctc loss: T x N x C  torch.Size([18, 6, 68])
log_probs = log_probs.log_softmax(2).requires_grad_()  # [18, bs, 68]
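At inference time the per-position scores still have to be turned into a plate string. The text above does not show that step; a common choice for CTC-trained models is greedy decoding: take the best class at each position, merge consecutive repeats, and drop the blank. The sketch below assumes the blank is a dedicated class (for this model that would presumably be the trailing '-' in CHARS, which the listing does not confirm) and uses a three-class toy alphabet to stay readable.

```python
def greedy_decode(step_scores, chars, blank_index):
    """Greedy CTC decoding for one sample.

    step_scores: one list of class scores per output position.
    """
    # Best class at every position.
    best = [max(range(len(s)), key=s.__getitem__) for s in step_scores]

    decoded = []
    prev = None
    for idx in best:
        # Keep a class only if it is not the blank and not a repeat of the previous step.
        if idx != prev and idx != blank_index:
            decoded.append(chars[idx])
        prev = idx
    return "".join(decoded)


toy_chars = ['A', 'B', '-']  # toy alphabet; index 2 plays the role of the blank
scores = [
    [0.1, 0.2, 0.7],    # blank
    [0.8, 0.1, 0.1],    # A
    [0.9, 0.05, 0.05],  # A again (merged with the previous step)
    [0.1, 0.1, 0.8],    # blank
    [0.2, 0.7, 0.1],    # B
]
text = greedy_decode(scores, toy_chars, blank_index=2)
print(text)
```

A blank between two identical characters is what allows genuine doubles such as "AA" to survive the merge step.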

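2.2.3 Loss Function

ctc_loss handles the case where the output sequence and the label have different lengths. It uses dynamic programming to sum over all output sequences that collapse to the label, and updates the parameters by maximizing that total probability. The code is as follows: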

loss = ctc_loss(log_probs, labels, input_lengths=input_lengths, target_lengths=target_lengths)
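# input_lengths is [18, 18, ..., 18], where 18 is the number of character positions the model outputs.
# target_lengths is e.g. [7, 7, ..., 7], where 7 is the number of characters in the ground-truth label; some plates have 8 characters, so it depends on the actual labels.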
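To make the dynamic programming concrete, here is a small pure-Python sketch of the CTC forward recursion for a single sample, checked against brute-force enumeration of every alignment. It is meant as an illustration of the idea, not as a description of how the library implements `ctc_loss`; the toy probabilities and the three-class alphabet are made up.

```python
import itertools
import math


def ctc_prob(probs, target, blank):
    """Probability of `target` under CTC, where probs[t][k] is P(class k at step t)."""
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for c in target:
        ext += [c, blank]
    n = len(ext)

    alpha = [0.0] * n
    alpha[0] = probs[0][blank]
    if n > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * n
        for s in range(n):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank between two different characters
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last character or on the trailing blank.
    return alpha[-1] + (alpha[-2] if n > 1 else 0.0)


def collapse(path, blank):
    """Merge repeats, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def brute_force_prob(probs, target, blank):
    """Sum the probability of every alignment that collapses to `target`."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


toy_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]  # 3 steps, classes 0, 1, blank=2
p_dp = ctc_prob(toy_probs, [0, 1], blank=2)
p_bf = brute_force_prob(toy_probs, [0, 1], blank=2)
loss = -math.log(p_dp)  # the CTC loss for this sample is the negative log-probability
print(p_dp, p_bf, loss)
```

The brute-force version grows exponentially with the number of steps, which is why the recursion matters for 18 positions and 68 classes.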

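3. Pipeline

3.1 Data Processing

3.1.1 YOLOv5 Data Processing

Dataset: the official CCPD data, https://github.com/detectRecog/CCPD

  1. Each image file name in CCPD encodes the position of the plate bounding box and the plate number. The goal of data processing is to compute the plate's center point, width, and height relative to the image size and save them in .txt format.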
  2. Code
import shutil
import cv2
import os

def txt_translate(path, txt_path):
   '''Parse the plate's top-left and bottom-right corners from each image file name, convert them to center/width/height form, divide by the image width and height, and save the result as a .txt file at the given location.'''
   for filename in os.listdir(path):
       print(filename)
       if "-" not in filename:  # skip images without a label, such as np
           continue
       subname = filename.split("-", 3)[2]  # first split on '-': the two box corners, e.g. '231&522_405&574'
       extension = filename.split(".", 1)[1]  # check that the file is an image
       if extension != 'jpg':
           continue
       lt, rb = subname.split("_", 1)  # second split on '_'
       lx, ly = lt.split("&", 1)  # top-left corner
       rx, ry = rb.split("&", 1)  # bottom-right corner
       width = int(rx) - int(lx)  # bounding box width
       height = int(ry) - int(ly)  # bounding box height
       cx = float(lx) + width / 2
       cy = float(ly) + height / 2  # bounding box center

       img = cv2.imread(os.path.join(path , filename))
       if img is None:  # delete unreadable images (some downloaded files cannot be read)
           os.remove(os.path.join(path, filename))
           continue
       width = width / img.shape[1]
       height = height / img.shape[0]
       cx = cx / img.s
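The conversion described in this section, from a CCPD file name to a normalized center/width/height box, can be sketched without reading any image. This is a minimal sketch under stated assumptions, not the author's full function: the image size is passed in rather than read with OpenCV, the file name is an illustrative one in the CCPD layout, the 720 x 1160 size is assumed, and the single class id 0 and the "class cx cy w h" line layout follow the usual YOLO label convention.

```python
def ccpd_name_to_box(filename, img_w, img_h):
    """Return (cx, cy, w, h), each divided by the image width or height."""
    corners = filename.split("-")[2]          # e.g. '154&383_386&473'
    lt, rb = corners.split("_")
    lx, ly = (int(v) for v in lt.split("&"))  # top-left corner
    rx, ry = (int(v) for v in rb.split("&"))  # bottom-right corner
    w = rx - lx
    h = ry - ly
    cx = lx + w / 2
    cy = ly + h / 2
    return cx / img_w, cy / img_h, w / img_w, h / img_h


def yolo_line(box, class_id=0):
    """Format one label line: class id followed by the four normalized values."""
    return "%d %.6f %.6f %.6f %.6f" % ((class_id,) + tuple(box))


# Illustrative file name in the CCPD layout; the image size is an assumption.
name = "025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg"
box = ccpd_name_to_box(name, img_w=720, img_h=1160)
line = yolo_line(box)
print(line)
```

Each image would get one such line in a .txt file that shares the image's base name.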