[Project Reproduction] Text-Click CAPTCHA Recognition

一、Project Background

With the rapid development of Internet technology, network security problems have become increasingly serious. To protect websites and online services, many sites rely on CAPTCHAs to block malicious automation such as brute-force login attempts and spam. The text-click CAPTCHA is one common variant: the user must recognize and click specific characters in an image to prove that the operator is a human rather than an automated program.

However, with advances in deep learning and computer vision, traditional text-click CAPTCHAs have gradually revealed security weaknesses: modern algorithms can already solve them and bypass the protection they provide. Studying more efficient and accurate text-click CAPTCHA recognition is therefore worthwhile. This project reproduces an open-source project from the PaddlePaddle community (the original project link is in the appendix) and aims to recognize text-click CAPTCHAs with high accuracy and efficiency. The main components are:

  • Object detection: YOLOv5 is used for the detection stage. YOLOv5 is a real-time object detector with good accuracy and speed; in this project it detects and localizes the characters in the CAPTCHA image.
  • Feature extraction: Insightface combined with Triplet Loss is used to train the feature extractor. Insightface is a deep-learning framework originally built for face recognition with strong feature-extraction ability; pairing it with Triplet Loss improves character-matching accuracy (a minimal loss sketch follows this list).
  • Model deployment: the trained models are exported to ONNX and then quantized and deployed with the OpenVINO toolkit, so inference can run efficiently on different hardware platforms.
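
As a quick illustration of the Triplet Loss idea mentioned above: the loss pulls an anchor sample towards a positive sample of the same character and pushes it away from a negative sample of a different character by at least a margin. The following is only a minimal NumPy sketch of the objective, not the project's actual training code (that lives in siamese_network/train.py); the margin value 0.3 simply mirrors the --triplet_margin argument used later.

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Euclidean distances between feature vectors
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    # Zero loss once the negative is farther from the anchor than the positive by at least `margin`
    return max(0.0, d_ap - d_an + margin)

# A well-separated triplet gives zero loss; a confusable one is penalized
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])
print(triplet_loss(anchor, positive, negative))  # 0.0
print(triplet_loss(anchor, negative, positive))  # ~2.16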

Given a CAPTCHA image, the pipeline first runs YOLOv5 to detect the Chinese-character targets. Each detected crop is then turned into a feature vector by the Insightface + Triplet Loss feature extractor. Computing the similarities between these vectors yields a similarity matrix, from which the final recognition result is produced; the models are then exported for inference deployment to further improve speed. The matching step is sketched below.
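
The sketch below uses random features purely to show the mechanics of the matching step (feature dimension 64, four characters); the deployed pipeline in section 六 additionally resolves the case where two prompt characters pick the same target box.

import numpy as np

np.random.seed(0)
# Four prompt characters (rows) and four detected target boxes (columns), 64-d features
char_feats = np.random.randn(4, 64)
target_feats = np.random.randn(4, 64)
# L2-normalize so that the dot product equals cosine similarity
char_feats /= np.linalg.norm(char_feats, axis=1, keepdims=True)
target_feats /= np.linalg.norm(target_feats, axis=1, keepdims=True)

sim_matrix = char_feats @ target_feats.T   # cosine similarities in [-1, 1]
order = sim_matrix.argmax(axis=1)          # naive assignment: best target box for each prompt character
print(sim_matrix.round(2))
print('click order of target boxes:', order)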

二、Environment Setup

1. Switch the pip mirror

For stability, we replace the Tsinghua mirror with the Baidu mirror:

!pip config set global.index-url https://mirror.baidu.com/pypi/simple

After switching the mirror, restart the kernel so the new configuration takes effect.

三、Data Processing

1. Dataset overview

The data was collected from a real CAPTCHA application and contains 355 annotated images, 14 unannotated images, and 15 pure background images. The annotated images include both images whose characters carry a reading order and images with unordered characters.

The annotation of each image can be inspected through its shapes list, where text is the character of each target and points gives the top-left and bottom-right coordinates of its bounding box (the dataset is public; the link is in the appendix):

. . . 
"shapes": [
    {
      "label": "target",
      "text": "鸡",
      "points": [[182.0, 136.0], [247.0, 202.0]],
      "group_id": null,
      "shape_type": "rectangle",
      "flags": {}
    },
 . . . 

2. Unzip the dataset

# Unzip the dataset
!unzip -o -q -d /home/aistudio/data /home/aistudio/data/data222386/captcha_click.zip

3. Convert the dataset format

Because the detector is YOLOv5, the annotations have to be converted into the YOLO label format and then split into a training set and a validation set.

import json
import os
import random
import shutil


def xyxy2xywh(xyxy, img_size):
    img_w, img_h = img_size
    (x1, y1), (x2, y2) = xyxy
    x_c = (x1 + x2) / 2 / img_w
    y_c = (y1 + y2) / 2 / img_h
    h = (y2 - y1) / img_h
    w = (x2 - x1) / img_w
    return x_c, y_c, w, h


data_dir = 'data/captcha_click'
train_dir = 'datasets/train'
val_dir = 'datasets/val'
save_path = 'data/labels'
for folder in [train_dir, val_dir, save_path]:
    if not os.path.exists(folder):
        os.makedirs(folder)
split_ratio = 0.9  # 0.9 for training, 0.1 for validation
class_dict = {'target': 0, 'char': 1}

# Convert to YOLO format (class x_center y_center w h, normalized to [0, 1])
for img_name in os.listdir(data_dir):
    if '.jpg' in img_name:
        annotation = os.path.join(data_dir, img_name.replace('.jpg', '.json'))
        with open(annotation, 'r', encoding='utf-8') as f:
            img_labels = json.load(f)
        img_h, img_w = (img_labels['imageHeight'], img_labels['imageWidth'])
        shapes = img_labels['shapes']

        save_label = os.path.join(save_path, img_name.replace('.jpg', '.txt'))
        f = open(save_label, 'w', encoding='utf-8')
        box_list = []
        for shape in shapes:
            label = shape['label']
            class_index = class_dict[label]
            points = shape['points']
            x_c, y_c, w, h = xyxy2xywh(points, (img_w, img_h))
            box_list.append(f'{class_index} {x_c} {y_c} {w} {h}')
        f.write('\n'.join(box_list))
        f.close()

# Split into training and validation sets
label_list = os.listdir(save_path)
random.seed(100)
random.shuffle(label_list)
for subset in [train_dir, val_dir]:
    for folder in ['labels/', 'images/']:
        path = os.path.join(subset, folder)
        if not os.path.exists(path):
            os.mkdir(path)
# Training set
train_set = label_list[:int(len(label_list)*split_ratio)]
print('train set num:', len(train_set))
for item in train_set:
    label_file = os.path.join(save_path, item)
    img_file = os.path.join(data_dir, item.replace('.txt', '.jpg'))
    train_label_path = os.path.join(train_dir, 'labels/' + item)
    train_img_path = os.path.join(train_dir, 'images/' + item.replace('.txt', '.jpg'))
    shutil.copy(label_file, train_label_path)
    shutil.copy(img_file, train_img_path)
# Validation set
val_set = label_list[int(len(label_list)*split_ratio):]
print('val set num:', len(val_set))
for item in val_set:
    label_file = os.path.join(save_path, item)
    img_file = os.path.join(data_dir, item.replace('.txt', '.jpg'))
    val_label_path = os.path.join(val_dir, 'labels/' + item)
    val_img_path = os.path.join(val_dir, 'images/' + item.replace('.txt', '.jpg'))
    shutil.copy(label_file, val_label_path)
    shutil.copy(img_file, val_img_path)

4. Adjust the training set

To improve detection performance, reduce false positives, and raise precision, the pure background images (with empty labels) are also added to the training set, 15 in total.

# Add background images to the training set
import os
import shutil

bg_img_path = 'data/backgrounds/images'
train_img_path = 'datasets/train/images'
count = 0
for item in os.listdir(bg_img_path):
    if '.jpg' in item:
        img_path = os.path.join(bg_img_path, item)
        shutil.copy(img_path, os.path.join(train_img_path, item))
        count += 1
print(f'add {count} images')

Check the number of training samples:

# Verify the training-set size: 319 labeled images + 15 background images
!cd datasets/train/images && ls -l|grep "^-"| wc -l
!cd datasets/train/labels && ls -l|grep "^-"| wc -l

5. Write the dataset config file

%%writefile yolov5-Paddle/data/captcha.yaml

path: /home/aistudio/datasets  # dataset root dir
train: train
val: val  # 36 images
test:
# Classes
names:
  0: target
  1: char

四、YOLOv5 Object Detection Training

1. Model training

The detection model is YOLOv5s, trained with a cosine learning-rate decay schedule (sketched after the training command below). Place Arial.ttf and the pretrained weights into the yolov5-Paddle folder.

# Start training. If a dependency error occurs, restart the kernel and run again
!cd yolov5-Paddle && \
python train.py --data captcha.yaml --img 320 --epochs 50 --cfg yolov5s.yaml --cos-lr --weights yolov5s.pdparams  --batch-size 128
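
For intuition, --cos-lr makes the learning rate follow a cosine curve over the training epochs. The snippet below is a generic illustration with assumed values (initial lr 0.01 decaying to 0.001 over 50 epochs); the actual schedule and hyper-parameters are defined inside yolov5-Paddle, not here.

import math

def cosine_lr(epoch, total_epochs=50, lr0=0.01, lr_final=0.001):
    # Smoothly decays from lr0 at epoch 0 to lr_final at the last epoch
    cos_factor = 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
    return lr_final + (lr0 - lr_final) * cos_factor

for epoch in (0, 25, 50):
    print(epoch, round(cosine_lr(epoch), 4))  # 0.01 -> 0.0055 -> 0.001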

2. Model validation

Validation measures how well the trained model performs on the held-out data. In this dataset the target-class objects are well separated, while char-class objects may overlap slightly, i.e. with a small IoU. We therefore lower the IoU threshold to 0.3, so that boxes whose IoU exceeds 0.3 are treated as the same object, which improves precision. Since the model's confidence scores are generally high, the confidence threshold is also raised to 0.4. A short worked IoU example follows.
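
To make the 0.3 threshold concrete, here is a minimal IoU computation (an illustration only; val.py computes IoU internally):

def iou(box1, box2):
    # Boxes are (x1, y1, x2, y2)
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

# Two partially overlapping character boxes
print(iou((0, 0, 60, 60), (30, 0, 90, 60)))  # ~0.33 -> above 0.3, treated as the same target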

# Validate; change the path after --weights to the model you want to evaluate
!cd yolov5-Paddle && \
python val.py --data captcha.yaml --weights runs/train/exp2/weights/best.pdparams --img 320 --conf 0.4 --iou-thres 0.3

3. Export to ONNX

To deploy the model with ONNX, we first export it to a format that ONNX can read.

# Change the path after --weights to the weights you want to export
!cd yolov5-Paddle && \
python export.py --weights runs/train/exp2/weights/best.pdparams --include onnx --img 320 --opset 11

4. Test the ONNX model

4.1 Set up the test environment

# Install onnxruntime
!pip install onnxruntime-gpu==1.9

4.2 Import dependencies

import onnxruntime
import cv2
import matplotlib.pyplot as plt
import numpy as np
from siamese_network.utils import pre_process, get_color_list, post_process, draw_boxes
%matplotlib inline

4.3 Data preprocessing

img_path = 'data/test/img_1000.jpg'
img_raw = cv2.imread(img_path)
img_data, scale, padd_data = pre_process(img_raw, img_size=320)

4.4 ONNX inference

# Load the ONNX model and create an inference session; replace with your own onnx file
onnx_path = "yolov5-Paddle/runs/train/exp2/weights/best.onnx"
sess = onnxruntime.InferenceSession(onnx_path)

# Run inference with ONNXRuntime
ort_inputs = {sess.get_inputs()[0].name: img_data}
result = sess.run(None, ort_inputs)
result = np.array(result)  # (1, 1, 6300, 7)
result = np.squeeze(result)

4.5 Post-processing

# Post-processing
confidence = 0.4
iou = 0.3
hw = img_raw.shape[:2]
boxes, confs, classes = post_process(result, confidence, iou, scale, padd_data[0], padd_data[1], hw)

print(confs)
print(classes)

4.6 Visualization

# Visualization
label_list = list(range(2))
colors = get_color_list(label_list)
img = draw_boxes(img_raw, boxes, confs, classes, colors, thickness=2)
img = img[:, :, ::-1]
plt.imshow(img)
plt.show()

五、Feature Extraction

1. Prepare the dataset

Crop the character regions out of the training images to serve as training samples, then collect statistics on their widths, heights, and label distribution.

# data/captcha_click
# In total there are 1032 sample pairs covering 451 distinct characters; 269 characters appear only once, while the most frequent one appears 24 times
"""
    Crop character images to build the dataset for the siamese network
"""
import json
import os
from collections import defaultdict
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline


def crop_gtbox(xyxy, img_file):
    image = cv2.imread(img_file)
    x1, y1, x2, y2 = xyxy
    cropped_image = image[y1:y2, x1:x2]
    return cropped_image

data_file = 'data/captcha_click'
file_list = os.listdir(data_file)

# Path where the siamese-network training data is saved
save_path = 'datasets/data_siamese'
if not os.path.exists(save_path):
    os.makedirs(save_path)

# Collect width/height statistics for analysis
w_list = []
h_list = []
data_dict = {'char': defaultdict(list), 'target': defaultdict(list)}
count = 0
for file in file_list:
    if '.json' in file:
        file_path = os.path.join(data_file, file)
        file_name = file.split('.')[0]
        with open(file_path, 'r', encoding='utf-8') as f:
            img_labels = json.load(f)
        img_path = file_path.replace('.json', '.jpg')
        shapes = img_labels['shapes']
        for item in shapes:
            text = item['text']
            text_unicode = text.encode('unicode_escape').decode().replace('\\', '')
            label = item['label']
            # Naming convention for the cropped images
            new_img_name = f'{label}_{text_unicode}_{file_name}.jpg'
            save_file = os.path.join(save_path, new_img_name)
            if label == 'char':
                data_dict['char'][text].append(new_img_name)
            elif label == 'target':
                data_dict['target'][text].append(new_img_name)
            count += 1
            points = item['points']
            np_data = np.array(points, dtype=np.int64)
            w = np_data[1][0] - np_data[0][0]
            h = np_data[1][1] - np_data[0][1]
            w_list.append(w)
            h_list.append(h)
            xyxy = np_data.reshape((-1,))
            # Crop the image and save it
            crop_img = crop_gtbox(xyxy, img_path)
            cv2.imwrite(save_file, crop_img)
        
with open('datasets/data_annotation.json', 'w', encoding='utf-8') as f:
    json.dump(data_dict, f, indent=2, ensure_ascii=False)

print(f'Total samples: {count}, distinct char characters: {len(data_dict["char"])}, distinct target characters: {len(data_dict["target"])}')

num_char_list = [len(data_dict["char"][i]) for i in data_dict["char"].keys()]
# Visualization
# Width/height distribution
plt.figure(figsize=(12, 5))
plt.subplot(121)
plt.scatter(w_list, h_list)
plt.xlabel('Width')
plt.ylabel('Height')

# Distribution of character repetition counts
plt.subplot(122)
data = np.array(num_char_list)
x, y = np.unique(data, return_counts=True)
plt.bar(range(len(x)), y)
plt.xticks(range(len(x)), x)
plt.xlabel('character repetitions')
plt.ylabel('count')
plt.show()

The plots show that the characters are unevenly distributed: characters appearing only once make up most of the samples, while a few characters appear many times. We therefore augment the characters that have few samples.

2. Sample augmentation

2.1 Set up the augmentation environment

!pip install fontTools

2.2 Data augmentation

A subset of the characters that appear only once is set aside as the validation set; every other character is augmented to 50 samples for training.

# Build fixed (anchor, positive, negative) sample triplets for the validation set
# Augment the remaining characters

import os
import numpy as np
from PIL import Image, ImageDraw, ImageFont
import cv2
import random
import colorsys
from fontTools.ttLib import TTFont
import json

def check_character_in_font(ttfont, character):
    for table in ttfont['cmap'].tables:
        if ord(character) in table.cmap.keys():
            return True
    return False

def crop_gtbox(xyxy, img_file):
    image = cv2.imread(img_file)
    x1, y1, x2, y2 = xyxy
    cropped_image = image[y1:y2, x1:x2]
    return cropped_image

def random_color():
    rgb = colorsys.hsv_to_rgb(random.random(), 1, 1)
    result = (int(rgb[0] * 255), int(rgb[1] * 255), int(rgb[2] * 255))
    return result

def random_crop_bg(bg_path, size):
    bg_imgs = os.listdir(bg_path)
    while True:
        bg_img = random.choice(bg_imgs)
        if '.jpg' in bg_img:
            break
    img_path = os.path.join(bg_path, bg_img)
    bg_data = cv2.imread(img_path)
    # Random flip or rotation
    i = random.random()
    if i < 0.5:
        # Rotate the image 90 degrees counter-clockwise
        bg_data = cv2.rotate(bg_data, cv2.ROTATE_90_COUNTERCLOCKWISE)
    i = random.random()
    if i < 0.5:
        # Flip the image horizontally
        bg_data = cv2.flip(bg_data, 1)
    h, w = bg_data.shape[:2]
    side_len = min(h, w)
    image_rgb = cv2.cvtColor(bg_data, cv2.COLOR_BGR2RGB)
    left_top = (random.randint(0, side_len - size), random.randint(0, side_len - size))
    crop_img = image_rgb[left_top[1]:left_top[1] + size, left_top[0]:left_top[0] + size]
    return crop_img

def random_chinese():
    # Pick a random code point from the common CJK ideograph range
    char_code = random.randint(0x4E00, 0x9FA5)
    # Convert the code point into a character
    character = chr(char_code)
    return character

def create_a_target(font_style, font_color, char, bg_img, rotate=(-60, 60), scale=(1, 1)):
    img_size = bg_img.shape[:2][::-1]  # hw to wh
    crop_img = Image.fromarray(bg_img)
    draw = ImageDraw.Draw(crop_img)
    draw.text((0, 0), char, font_color, font=font_style)
    char_data = np.array(crop_img)
    char_box = font_style.getbbox(char)
    x_center = char_box[2] / 2
    y_center = char_box[3] / 2
    center = (x_center, y_center)
    angle = random.randint(rotate[0], rotate[1])
    scale = random.uniform(scale[0], scale[1])
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(char_data, M, img_size, flags=cv2.INTER_AREA, borderMode=cv2.BORDER_REPLICATE)
    return rotated

# Split the 451 characters into training and validation sets; each validation character only needs one sample pair
# To avoid wasting data, the validation characters are picked from those that appear only once
annotation_file = 'datasets/data_annotation.json'
save_path = 'datasets/annotations'
save_eval_file = 'datasets/annotations/eval_samples.txt'
save_train_file = 'datasets/annotations/train_annotation.json'
if not os.path.exists(save_path):
    os.mkdir(save_path)

split_ratio = 0.1
with open(annotation_file, 'r', encoding='utf-8') as f_anno:
    json_file = json.load(f_anno)
char_set1 = []
for char, file_list in json_file['char'].items():
    if len(file_list) == 1:
        char_set1.append(char)
random.seed(0)
random.shuffle(char_set1)
eval_set = char_set1[:int(len(json_file['char'])*split_ratio)]
print(f'length of eval_set:{len(eval_set)}')

# Validation sample triplets
with open(save_eval_file, 'w', encoding='utf-8') as f_eval:
    for char_name in eval_set:
        negative = random.choice(eval_set)
        while negative == char_name:
            negative = random.choice(eval_set)
            
        char_img = json_file['char'][char_name][0]
        target_img = json_file['target'][char_name][0]
        negative_img = json_file['target'][negative][0]
        f_eval.write(f'{char_img} {target_img} {negative_img}\n')

# Top up the number of samples per character for training
char_size = 36
target_size = 64
char_font1 = 'font_files/konxin.ttf'
char_font2 = 'font_files/ShuangXianTiJian.ttf'
target_font1 = 'font_files/simhei.ttf'
target_font2 = 'font_files/simsun.ttf'
style1 = ImageFont.truetype(char_font1, char_size)
style2 = ImageFont.truetype(char_font2, char_size)
style3 = ImageFont.truetype(target_font1, target_size)
style4 = ImageFont.truetype(target_font2, target_size)

bg_path = 'data/backgrounds/images'
# Generate character data for both the char and target labels
save_path = 'datasets/data_siamese'
if not os.path.exists(save_path):
    os.mkdir(save_path)
chinese_list = []

font1 = TTFont(char_font1)
font2 = TTFont(char_font2)
font3 = TTFont(target_font1)
font4 = TTFont(target_font2)

for char, img_list in json_file['char'].items(): 
    if char in eval_set:
        continue
    # Check whether the font files contain this character
    char_list = []
    if check_character_in_font(font1, char):
        char_list.append(style1)
    if check_character_in_font(font2, char):
        char_list.append(style2)
    assert len(char_list) != 0, f'{char} not in char_font files'
    target_list = []
    if check_character_in_font(font3, char):
        target_list.append(style3)
    if check_character_in_font(font4, char):
        target_list.append(style4)
    assert len(target_list) != 0, f'{char} not in target_font files'

    text = char.encode('unicode_escape').decode().replace('\\', '')
    num = 0
    while len(img_list) < 50:
        num += 1
        # char part
        char_style = random.choice(char_list)
        char_img_name = f'char_{text}_add-{num}.jpg'
        color1 = (255, 255, 255)
        img_data = Image.new("RGB", (char_size, char_size), color1)
        bg_data = np.array(img_data)
        color2 = random_color()
        # Ensure the foreground color is distinguishable from the background
        while sum([(color1[i]-color2[i])**2 for i in range(3)]) < 1200:
            color2 = random_color()
        char_data = create_a_target(char_style, color2, char, bg_data, (-45, 45))
        char_img = Image.fromarray(char_data)
        save_file = os.path.join(save_path, char_img_name)
        char_img.save(save_file)
        # target part
        target_style = random.choice(target_list)
        target_img_name = f'target_{text}_add-{num}.jpg'
        crop_img = random_crop_bg(bg_path, target_size)
        color = random_color()
        char_data = create_a_target(target_style, color, char, crop_img, (-45, 45))
        target_img = Image.fromarray(char_data)
        save_file = os.path.join(save_path, target_img_name)
        target_img.save(save_file)
        # Update the annotation
        json_file['char'][char].append(char_img_name)
        json_file['target'][char].append(target_img_name)

for char in eval_set:
    del json_file['char'][char]
    del json_file['target'][char]
with open(save_train_file, 'w', encoding='utf-8') as f:
    json.dump(json_file, f, indent=2, ensure_ascii=False)

2.3 Save the label file

# Save the label file; it is needed by the Insightface loss
annotation_file = 'datasets/annotations/train_annotation.json'
save_label = 'datasets/annotations/text_label.txt'
with open(annotation_file, 'r', encoding='utf-8') as f:
    datas = json.load(f)
with open(save_label, 'w', encoding='utf-8') as f_label:
    index = 0
    for key in datas['char'].keys():
        f_label.write(f'{key} {index}\n')
        index += 1
# Check the number of samples in the augmented feature-extraction dataset: 406*2*50 + 45*2*1 = 40690
!cd datasets/data_siamese && ls -l|grep "^-"| wc -l

3. Check the data

Check the effect of the data augmentation:

import matplotlib.pyplot as plt
from siamese_network.data_loader import CaptchaDataset, MyResize
import paddle.vision.transforms as T
import random
%matplotlib inline

image_path = 'datasets/data_siamese'
annotation_file = 'datasets/annotations/train_annotation.json'

classifies_file = 'datasets/annotations/text_label.txt'
label_list = []
with open(classifies_file, 'r') as f:
    datas = f.readlines()
    for item in datas:
        text_label = item.strip()
        label_list.append(text_label)

transforms = T.Compose([
    T.ColorJitter(0.2, 0.1, 0.4, 0.4),
    T.RandomRotation(60, 'bilinear'),
    MyResize(70),
    T.RandomCrop(64),
    T.RandomErasing(scale=(0.02, 0.2)),
])
dataset = CaptchaDataset(image_path, annotation_file, label_list, transforms)
random_index = random.randint(0, len(dataset) - 1)
data = dataset[random_index]
img_list, _ = data
for i in range(3):
    plt.subplot(1, 3, i+1)
    plt.imshow(img_list[i])
plt.show()

4. Train the feature-extraction network

!python siamese_network/train.py --savefolder 'siamese_network/checkpoint' \
                                 --feature_dim 64 \
                                 --epoch 100 \
                                 --lr 0.001 \
                                 --weight_decay 0.01 \
                                 --triplet_margin 0.3 \
                                 --insightface_loss_param 1.0 0.5 0 10 \
                                 --batch_size 128

5. Visualization

# Visualize the training process
import matplotlib.pyplot as plt
%matplotlib inline

log_file = 'siamese_network/checkpoint/train_log.txt'
# log_file = 'siamese_network/checkpoint/copy/train_log.txt'
with open(log_file, 'r', encoding='utf-8') as f:
    raw = f.read()
items = raw.strip().split()
train_iter = []
train_loss = []
train_loss1 = []
train_loss2 = []
train_acc = []
eval_iter = []
eval_loss = []
eval_acc = []
for item in items:
    datas = item.split('@')
    if item.startswith('train'):
        _, iter_num, loss, loss1, loss2, acc = datas
        train_iter.append(int(iter_num))
        train_loss.append(float(loss))
        train_loss1.append(float(loss1))
        train_loss2.append(float(loss2))
        train_acc.append(float(acc))
    elif item.startswith('eval'):
        _, iter_num, loss, acc = datas
        eval_iter.append(int(iter_num))
        eval_loss.append(float(loss))
        eval_acc.append(float(acc))
plt.figure(figsize=(12, 5))
plt.subplot(131)
plt.plot(train_iter, train_loss2, label='trainloss_InsightfaceLoss')
plt.title("InsightfaceLoss vs iter")
plt.legend()
plt.subplot(132)
plt.plot(train_iter, train_loss1, label='trainloss_Triplet')
plt.plot(eval_iter, eval_loss, label='eval_loss_Triplet')
plt.title("TripletLoss vs iter")
plt.legend()
plt.subplot(133)
plt.plot(train_iter, train_acc, label='train_Acc')
plt.plot(eval_iter, eval_acc, label='eval_Acc')
plt.title('Acc vs iter')
plt.legend()
plt.show()

6. Export to ONNX

from siamese_network.siamese_resnet18 import SiameseNet18
import paddle

# Change to the weights you want to load
param_path = 'siamese_network/checkpoint/last.pdparams'
# param_path = 'siamese_network/checkpoint/best.pdparams'
params = paddle.load(param_path)
model = SiameseNet18(64)
model.set_state_dict(params)
# Save path
save_path = 'siamese_network/onnx_save/SiameseNet18' 
x_spec = paddle.static.InputSpec([None, 3, 64, 64], 'float32', 'x')  # Specify the input shape and dtype; Tensor or InputSpec is supported, and InputSpec allows dynamic shapes
paddle.onnx.export(model, save_path, input_spec=[x_spec], opset_version=11)

7. Test the ONNX model

import onnxruntime
import cv2
import matplotlib.pyplot as plt
import numpy as np
from siamese_network.utils import pre_process
%matplotlib inline


def cos_similar(v1, v2):
    dot = float(np.dot(v1, v2))
    mold = np.linalg.norm(v1) * np.linalg.norm(v2)
    return 0.5 + 0.5 * (dot / (mold + 1E-6))  # mapped to [0, 1]: 0 opposite, 1 identical

# Read and preprocess the data
# Replace with a sample triplet from your own validation set
anchor_img = 'datasets/data_siamese/char_u6324_img_3450.jpg'
positive_img = 'datasets/data_siamese/target_u6324_img_3450.jpg'
negative_img = 'datasets/data_siamese/target_u6f6e_img_3407.jpg'

anchor_raw = cv2.imread(anchor_img)
positive_raw = cv2.imread(positive_img)
negative_raw = cv2.imread(negative_img)

img_size = 64
anchor_data, _, _ = pre_process(anchor_raw, img_size)
positive_data, _, _ = pre_process(positive_raw, img_size)
negative_data, _, _ = pre_process(negative_raw, img_size)
input_data = np.concatenate((anchor_data, positive_data, negative_data), axis=0)

# Load the ONNX model
onnx_path = 'siamese_network/onnx_save/SiameseNet18.onnx'
sess = onnxruntime.InferenceSession(onnx_path)

# ONNX inference
ort_inputs = {sess.get_inputs()[0].name: input_data}
result = sess.run(None, ort_inputs)
result = np.array(result)
result = np.squeeze(result)  # [3, 64]

# Similarity computation
anchor_feature, positive_feature, negative_feature = result

sim1 = cos_similar(anchor_feature, positive_feature)
sim2 = cos_similar(anchor_feature, negative_feature)
threshold = 0.7
print(f'anchor & positive: {sim1>threshold}, cos_similar:{sim1}')
print(f'anchor & negative: {sim2>threshold}, cos_similar:{sim2}')

img_list = [anchor_raw[:, :, ::-1], positive_raw[:, :, ::-1], negative_raw[:, :, ::-1]]
for i, img in enumerate(img_list):
    plt.subplot(1,3,i+1)
    plt.imshow(img)
plt.show()

六、Deployment Test

Visualize the detection results:

# ONNX test
import onnxruntime
import cv2
import matplotlib.pyplot as plt
import numpy as np
from siamese_network.utils import pre_process, post_process, draw_boxes
import os
from siamese_network.data_loader import MyResize
from siamese_network.siamese_resnet18 import SiameseNet18
%matplotlib inline
np.set_printoptions(precision=2, suppress=True)

def crop_gtbox_nd(xyxy, img_ndarray):
    x1, y1, x2, y2 = xyxy
    cropped_data = img_ndarray[int(y1):int(y2), int(x1):int(x2)]
    return cropped_data

def postprocess_sort(sim_matrix):
    # Indices are shifted by +1 so the multiplications below are not thrown off by zeros
    max_arg = np.argsort(sim_matrix, axis=-1) + 1
    # Initialize the keep matrix
    matrix1 = np.zeros_like(sim_matrix)
    matrix1[:, -1] += 1
    keep_matrix = matrix1 > 0
    unique_values, counts = np.unique(max_arg[keep_matrix], return_counts=True)
    repeat_index = unique_values[counts > 1]
    # If an index is picked by more than one row, keep the row with the largest similarity and shift the others to their next-best index, then check for duplicates again
    while len(repeat_index):
        for index in repeat_index:
            # Rows that picked the same index need to be re-compared; a mask records their positions
            recompare_mask = max_arg*keep_matrix==index
            recompare_rows = np.sum(recompare_mask, axis=-1, keepdims=True)
            # Position of the largest similarity among the conflicting rows (subtract 1 to recover the real index)
            max_index = np.argmax((sim_matrix*recompare_rows)[:, index-1])
            # Keep only the positions that need to change
            recompare_rows[max_index] = 0
            # First zero out the positions to be changed in keep_matrix
            to_zero = recompare_mask*recompare_rows
            keep_matrix = np.logical_xor(to_zero, keep_matrix)
            # The changed rows move on to their next-best index, implemented by rolling the mask
            array_new = np.roll(to_zero, -1, axis=-1)
            # Update keep_matrix
            keep_matrix = np.logical_or(array_new, keep_matrix)
            # print(keep_matrix)
        # Re-check for duplicate indices with the updated keep_matrix
        # print(keep_matrix)
        unique_values, counts = np.unique(max_arg[keep_matrix], return_counts=True)
        repeat_index = unique_values[counts > 1]
    order = max_arg[keep_matrix] - 1
    sim_list = sim_matrix[np.arange(len(order)), order]
    return order, sim_list

def draw_order(img, box_items, order_list, sim_list):
    for i, target in enumerate(order_list):
        box = box_items[target]['box']
        center = (int(box[0]), int((box[1]+box[3])/2))
        cv2.putText(img, f'{i}:{sim_list[i]:.2f}', center, cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
    return img

def cos_sim_metrix(m1, m2):
    # The network already L2-normalizes the features, so there is no need to divide by the product of norms
    dot = np.dot(m1, m2.T)
    return dot*0.5 + 0.5  # rescaled to [0, 1]: 0 opposite vectors, 0.5 orthogonal, 1 identical

# Load the ONNX models and create inference sessions; replace with the paths to your own exported models
# Detection model
det_onnx = "yolov5-Paddle/runs/train/exp3/weights/best.onnx"
sess_det = onnxruntime.InferenceSession(det_onnx)
# Feature-extraction model
feature_onnx = "siamese_network/onnx_save/SiameseNet18.onnx"
sess_feature = onnxruntime.InferenceSession(feature_onnx)

show_img  = []
sims = []
img_path = 'data/test'
# img_path = 'test_new'
imgs = sorted(os.listdir(img_path))
for img_name in imgs:
    if '.jpg' not in img_name:
        continue
    img_file = os.path.join(img_path, img_name)
    img_raw = cv2.imread(img_file)  # BGR
    img_hw = img_raw.shape[:2]
    img_data, scale, padd_data = pre_process(img_raw, img_size=320)  # RGB

    # 使用 ONNXRuntime 推理
    ort_inputs = {sess_det.get_inputs()[0].name: img_data}
    result = sess_det.run(None, ort_inputs)
    result = np.array(result)  # (1, 1, 6300, 7)
    result = np.squeeze(result)
    # Post-processing
    confidence = 0.4
    iou = 0.3
    boxes, confs, classes = post_process(result, confidence, iou, scale, padd_data[0], padd_data[1], img_hw)
    # Character-matching part
    # Crop the character images
    target_crop_item = []
    char_list = []
    for index, box in enumerate(boxes):
        if classes[index] == 0:
            # Keep the target order fixed
            crop_img = crop_gtbox_nd(box, img_raw)  # BGR
            target_data, _, _ = pre_process(crop_img, img_size=64)  # RGB
            target_crop_item.append({'box': box, 'crop_img': target_data})
        else:
            crop_img = crop_gtbox_nd(box, img_raw)
            char_data, _, _ = pre_process(crop_img, img_size=64)
            char_list.append((box, char_data))

    # Sort the char boxes from left to right
    sorted_pairs = sorted(char_list, key=lambda x: x[0][0])
    _, char_crops = zip(*sorted_pairs)
    target_crop_list = [target_crop_item[i]['crop_img'] for i in range(len(target_crop_item))]
    # Concatenate the char and target crops and feed them to the model for feature extraction
    datas = np.concatenate(list(char_crops)+target_crop_list, axis=0)
    # Feature extraction
    feature_inputs = {sess_feature.get_inputs()[0].name: datas}
    result = sess_feature.run(None, feature_inputs)
    result = result[0]
    # Post-process the extracted features
    char_feature = result[:len(char_crops)]
    target_feature = result[len(char_crops):]
    # Feature matching
    result_matrix = cos_sim_metrix(char_feature, target_feature)
    # print(result_matrix)
    order_list, sim_list = postprocess_sort(result_matrix)
    sims.extend(sim_list)
    # Visualize the detection boxes
    img = draw_boxes(img_raw, boxes, confs, classes, [(255,255,0), (0,255,0)], thickness=1)
    # Visualize the click order of the target boxes
    img = draw_order(img, target_crop_item, order_list, sim_list)
    show_img.append(img)

# Save the visualization of the predictions
height = 384
width = 344
rows = 3
cols = 5
canvas = np.full((rows * height, cols * width, 3), 255, dtype=np.uint8)
for i, image in enumerate(show_img):
    row = i // cols
    col = i % cols
    image = cv2.resize(image, (width, height))
    x_start = col * width
    y_start = row * height
    x_end = x_start + width
    y_end = y_start + height
    canvas[y_start:y_end, x_start:x_end] = image
cv2.imwrite('result.jpg', canvas)
plt.figure(figsize=(12, 18))
canvas = canvas[:, :, ::-1]
plt.imshow(canvas)
plt.axis('off')
plt.show()

Finally, adjust the output into a list of boxes in reading order, where each box holds the top-left and bottom-right coordinates of a target.

from onnx_test import ClickCaptcha
import time
import os
import json
session = ClickCaptcha(use_gpu=True)

img_path = 'data/test'
for img_name in os.listdir(img_path):
    img_file = os.path.join(img_path, img_name)
    with open(img_file, 'rb') as f:
        img_bytes = f.read()
    start_time = time.time()
    datas = session.run_inference(img_bytes)
    end_time = time.time()
    print('time:{}ms'.format((end_time-start_time) * 1000))
    result = json.loads(datas)
    print(result)

七、Conclusions and Outlook

  1. Importance of the dataset: during the reproduction we found that the quality and quantity of the data have a large impact on model performance. Improving the feature-extraction network required considerable time spent collecting, cleaning, and tuning additional training data.

  2. Debugging and optimization: in a real project, debugging and tuning are the key steps for improving model performance; training a model once is not the end of the job.

  3. Continuous improvement: although the project has been reproduced successfully, the feature-extraction network still has room to improve. In follow-up work we will collect more training data and optimize the model structure and parameters to obtain better recognition results.

八、Appendix

Original project: Text-Click CAPTCHA Recognition

Dataset: Text-Click CAPTCHA Recognition dataset

The project was first published on the PaddlePaddle platform: [Project Reproduction] Text-Click CAPTCHA Recognition

Author: 鸿源
