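I. Project Background
With the rapid development of internet technology, network security problems have become increasingly serious. To protect websites and online services, many sites use CAPTCHAs to block malicious activity such as brute-force attacks and spam. The text click-selection CAPTCHA is one common form: the user must identify and click specific characters in an image to confirm that the operator is a human rather than an automated program.
However, progress in deep learning and computer vision has exposed security weaknesses in traditional text click-selection CAPTCHAs: modern models can already solve them and thereby bypass the protection. Research into more efficient and more reliable recognition of text click-selection CAPTCHAs has therefore become especially important. This project reproduces a project from the PaddlePaddle open-source community (the original project is listed in the appendix) and aims to recognize text click-selection CAPTCHAs with improved accuracy and efficiency. Its main components are:
- Object detection: the project trains YOLOv5 for object detection. YOLOv5 is a real-time detection algorithm that combines good accuracy with fast inference. Here it detects and localizes the characters in the CAPTCHA image.
- Feature extraction: the project trains the feature extractor with InsightFace + Triplet Loss. InsightFace is a deep learning toolkit for face recognition with strong feature-extraction capability. Combining it with Triplet Loss improves the accuracy of character matching.
- Model deployment: the project deploys with ONNX and with OpenVINO quantization. The trained models are converted to the ONNX format and then quantized with the OpenVINO toolkit so that inference runs efficiently on different hardware platforms.
Given a CAPTCHA image, the pipeline first runs YOLOv5 to detect the Chinese character objects, then uses the network trained with InsightFace and Triplet Loss to turn each detected character into a feature vector. The similarities between feature vectors form a similarity matrix, from which the final recognition result is read. The models are then deployed for inference to further improve speed.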
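II. Environment Setup
1. Switching the package index
For stability, we switch pip from the Tsinghua mirror to the Baidu mirror.
!pip config set global.index-url https://mirror.baidu.com/pypi/simple
After switching, restart the kernel so that the new configuration takes effect.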
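III. Data Processing
1. Dataset overview
The data was captured from a real CAPTCHA deployment and contains 355 annotated images, 14 unannotated images and 15 pure background images. The annotated images include both images whose characters must be clicked in a given order and images whose characters are unordered.
The annotations of an image are stored in its shapes list. In each entry, text is the character of the object and points holds the top-left and bottom-right corners of its bounding box (the dataset is public and is also listed in the appendix).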
. . .
"shapes": [
  {
    "label": "target",
    "text": "鸡",
    "points": [[182.0, 136.0], [247.0, 202.0]],
    "group_id": null,
    "shape_type": "rectangle",
    "flags": {}
  },
. . .
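As a minimal sketch of how these fields can be read, the snippet below parses an annotation shaped like the excerpt above and lists each object's label, text and box corners. The inline JSON string and the helper name read_shapes are hypothetical stand-ins for illustration; only the field names come from the dataset.

```python
import json

# Hypothetical annotation with the same fields as the excerpt above
sample = '''
{"shapes": [
  {"label": "target", "text": "鸡",
   "points": [[182.0, 136.0], [247.0, 202.0]],
   "group_id": null, "shape_type": "rectangle", "flags": {}}
]}
'''


def read_shapes(annotation_text):
    """Return (label, text, (x1, y1, x2, y2)) for every annotated box."""
    shapes = json.loads(annotation_text)["shapes"]
    boxes = []
    for shape in shapes:
        (x1, y1), (x2, y2) = shape["points"]
        boxes.append((shape["label"], shape["text"], (x1, y1, x2, y2)))
    return boxes


for label, text, box in read_shapes(sample):
    print(label, text, box)
```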
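2. Extracting the dataset
# Extract the dataset
!unzip -o -q -d /home/aistudio/data /home/aistudio/data/data222386/captcha_click.zip
3. Converting the dataset format
Because object detection uses YOLOv5, the annotations must be converted to the YOLO label format (one line per box: the class index followed by the normalized center x, center y, width and height) and the data split into a training set and a validation set.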
import json
import os
import random
import shutil
def xyxy2xywh(xyxy, img_size):
    """Convert corner coordinates to a normalized YOLO box (x_c, y_c, w, h)."""
    img_w, img_h = img_size
    (x1, y1), (x2, y2) = xyxy
    x_c = (x1 + x2) / 2 / img_w
    y_c = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return x_c, y_c, w, h
data_dir = 'data/captcha_click'
train_dir = 'datasets/train'
val_dir = 'datasets/val'
save_path = 'data/labels'
for folder in [train_dir, val_dir, save_path]:
    if not os.path.exists(folder):
        os.makedirs(folder)
split_ratio = 0.9  # 0.9 for training, 0.1 for validation
class_dict = {'target': 0, 'char': 1}
# Convert the annotations to YOLO format
for img_name in os.listdir(data_dir):
    if img_name.endswith('.jpg'):
        annotation = os.path.join(data_dir, img_name.replace('.jpg', '.json'))
        with open(annotation, 'r', encoding='utf-8') as f:
            img_labels = json.load(f)
        img_h, img_w = (img_labels['imageHeight'], img_labels['imageWidth'])
        shapes = img_labels['shapes']
        box_list = []
        for shape in shapes:
            label = shape['label']
            class_index = class_dict[label]
            points = shape['points']
            x_c, y_c, w, h = xyxy2xywh(points, (img_w, img_h))
            box_list.append(f'{class_index} {x_c} {y_c} {w} {h}')
        # One box per line; the with-block closes the file even if writing fails
        save_label = os.path.join(save_path, img_name.replace('.jpg', '.txt'))
        with open(save_label, 'w', encoding='utf-8') as f:
            f.write('\n'.join(box_list))
# Split into training and validation sets
# os.listdir returns files in arbitrary order, so sort first to make the seeded shuffle reproducible
label_list = sorted(os.listdir(save_path))
random.seed(100)
random.shuffle(label_list)
# subset_dir replaces the original loop variable "set", which shadowed the built-in
for subset_dir in [train_dir, val_dir]:
    for folder in ['labels/', 'images/']:
        path = os.path.join(subset_dir, folder)
        if not os.path.exists(path):
            os.mkdir(path)
# Training set
train_set = label_list[:int(len(label_list)*split_ratio)]
print('train set num:', len(train_set))
for item in train_set:
    label_file = os.path.join(save_path, item)
    img_file = os.path.join(data_dir, item.replace('.txt', '.jpg'))
    train_label_path = os.path.join(train_dir, 'labels/' + item)
    train_img_path = os.path.join(train_dir, 'images/' + item.replace('.txt', '.jpg'))
    shutil.copy(label_file, train_label_path)
    shutil.copy(img_file, train_img_path)
# Validation set
val_set = label_list[int(len(label_list)*split_ratio):]
print('val set num:', len(val_set))
for item in val_set:
    label_file = os.path.join(save_path, item)
    img_file = os.path.join(data_dir, item.replace('.txt', '.jpg'))
    val_label_path = os.path.join(val_dir, 'labels/' + item)
    val_img_path = os.path.join(val_dir, 'images/' + item.replace('.txt', '.jpg'))
    shutil.copy(label_file, val_label_path)
    shutil.copy(img_file, val_img_path)
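To make the conversion concrete, here is a small worked example that applies the same formula to the sample box from the annotation excerpt and then inverts it. The image size of 344 x 384 pixels and the helper xywh2xyxy are assumptions made only for this illustration; real sizes come from imageWidth and imageHeight in each annotation file.

```python
def xyxy2xywh(xyxy, img_size):
    """Corner box -> normalized YOLO box (x_c, y_c, w, h)."""
    img_w, img_h = img_size
    (x1, y1), (x2, y2) = xyxy
    return ((x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h)


def xywh2xyxy(xywh, img_size):
    """Inverse mapping (hypothetical helper): normalized YOLO box -> pixel corners."""
    img_w, img_h = img_size
    x_c, y_c, w, h = xywh
    return ((x_c - w / 2) * img_w, (y_c - h / 2) * img_h,
            (x_c + w / 2) * img_w, (y_c + h / 2) * img_h)


box = [[182.0, 136.0], [247.0, 202.0]]  # sample box from the annotation excerpt
img_size = (344, 384)                   # assumed (width, height)
yolo_box = xyxy2xywh(box, img_size)
label_line = '0 ' + ' '.join(f'{v:.6f}' for v in yolo_box)  # class 0 = target
restored = xywh2xyxy(yolo_box, img_size)
print(label_line)
print(restored)
```

The round trip back to pixel corners is a quick way to check that width and height were not swapped during conversion.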
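4. Adjusting the training set
To reduce false detections and improve precision, the project also adds the background images to the training set as samples with empty labels, 15 in total.
# Add the background images to the training set
import os
import shutil
bg_img_path = 'data/backgrounds/images'
train_img_path = 'datasets/train/images'
count = 0
for item in os.listdir(bg_img_path):
    if '.jpg' in item:
        img_path = os.path.join(bg_img_path, item)
        shutil.copy(img_path, os.path.join(train_img_path, item))
        count += 1
print(f'add {count} images')
Check the size of the training set
# Expected: 319 + 15 = 334 images, and 319 label files because the code above copies the background images without label files
!cd datasets/train/images && ls -l|grep "^-"| wc -l
!cd datasets/train/labels && ls -l|grep "^-"| wc -l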
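YOLOv5-style data loaders generally treat an image that has no label file as a pure background sample, so copying only the images is usually enough. If you would rather have the labels folder mirror the images folder, the sketch below writes an empty label file for every image that lacks one. The function name add_empty_labels and the throwaway demo folders are assumptions for illustration, not part of the original project.

```python
import os
import tempfile


def add_empty_labels(img_dir, label_dir):
    """Create an empty .txt label for every .jpg in img_dir that has none."""
    os.makedirs(label_dir, exist_ok=True)
    created = 0
    for name in sorted(os.listdir(img_dir)):
        if not name.endswith('.jpg'):
            continue
        label_file = os.path.join(label_dir, name[:-len('.jpg')] + '.txt')
        if not os.path.exists(label_file):
            open(label_file, 'w', encoding='utf-8').close()
            created += 1
    return created


# Demo on temporary folders so that no real data is touched
with tempfile.TemporaryDirectory() as root:
    demo_imgs = os.path.join(root, 'images')
    demo_labels = os.path.join(root, 'labels')
    os.makedirs(demo_imgs)
    for name in ['bg_1.jpg', 'bg_2.jpg', 'notes.md']:
        open(os.path.join(demo_imgs, name), 'w').close()
    demo_created = add_empty_labels(demo_imgs, demo_labels)
    demo_files = sorted(os.listdir(demo_labels))
    print(demo_created, demo_files)
```

Running it on datasets/train/images and datasets/train/labels would bring the label count up to match the image count.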
5. Configuring the dataset file
%%writefile yolov5-Paddle/data/captcha.yaml
path: /home/aistudio/datasets # dataset root dir
train: train
val: val # 36 images
test:
# Classes
names:
  0: target
  1: char
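IV. YOLOv5 Object Detection Training
1. Model training
The detection model is YOLOv5s, trained with a cosine learning-rate decay schedule. Before training, place Arial.ttf and the pretrained weights in the yolov5-Paddle folder.
# Start training; if a dependency error occurs, restart the kernel and run again
!cd yolov5-Paddle && \
    python train.py --data captcha.yaml --img 320 --epochs 50 --cfg yolov5s.yaml --cos-lr --weights yolov5s.pdparams --batch-size 128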
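The --cos-lr flag selects a cosine learning-rate schedule. As a rough sketch of what such a schedule does, the function below returns the factor by which the initial learning rate is multiplied at each epoch. The final factor of 0.01 and the function name are assumptions for illustration; the value actually used depends on the hyperparameter file of the training code.

```python
import math


def cosine_lr_factor(epoch, total_epochs, final_factor=0.01):
    """Multiplier applied to the initial learning rate at a given epoch."""
    progress = epoch / total_epochs
    return final_factor + (1 - final_factor) * (1 + math.cos(math.pi * progress)) / 2


total_epochs = 50  # matches --epochs 50 in the command above
for epoch in (0, 12, 25, 37, 50):
    print(epoch, round(cosine_lr_factor(epoch, total_epochs), 4))
```

The factor starts at 1, falls slowly at first, fastest around the middle of training, and flattens out again near the end.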
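2. Model validation
Validation measures how well the trained model performs on the held-out validation set. In this dataset the target objects are scattered across the image, and the char objects may overlap but only slightly, so the IoU between two distinct objects is small. We therefore lower the IoU threshold to 0.3 to improve precision: two boxes whose IoU exceeds 0.3 are treated as the same object. Because the model's confidence scores are high, we also raise the confidence threshold, which is set to 0.4 in this project.
# Validate; replace the path after --weights with the model you want to validate
!cd yolov5-Paddle && \
    python val.py --data captcha.yaml --weights runs/train/exp2/weights/best.pdparams --img 320 --conf 0.4 --iou-thres 0.3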
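To illustrate how the two thresholds interact, here is a minimal, class-agnostic sketch of confidence filtering followed by non-maximum suppression. It is not the implementation used by val.py (which also handles classes and batches); the boxes and scores are made-up values.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, conf_thres=0.4, iou_thres=0.3):
    """Indices of the boxes kept after confidence filtering and NMS."""
    # Drop low-confidence boxes, then visit the rest from highest to lowest score
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much
        if all(box_iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Made-up detections: box 1 duplicates box 0, box 3 has low confidence
boxes = [(10, 10, 60, 60), (15, 12, 65, 62), (100, 100, 150, 150), (55, 55, 105, 105)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores)
print(kept)
```

A lower IoU threshold removes more near-duplicate boxes, which suits this dataset because genuine characters rarely overlap by much.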
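3. ONNX model conversion
To run the model with ONNX Runtime, first export it to the ONNX format.
# Change the path after --weights to the weights you want to convert
!cd yolov5-Paddle && \
    python export.py --weights runs/train/exp2/weights/best.pdparams --include onnx --img 320 --opset 11
4. Testing the ONNX model
4.1 Installing the test environment
# Install onnxruntime
!pip install onnxruntime-gpu==1.9
4.2 Importing dependencies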
import onnxruntime
import cv2
import matplotlib.pyplot as plt
import numpy as np
from siamese_network.utils import pre_process, get_color_list, post_process, draw_boxes
%matplotlib inline
4.3 Data preprocessing
img_path = 'data/test/img_1000.jpg'
img_raw = cv2.imread(img_path)
img_data, scale, padd_data = pre_process(img_raw, img_size=320)
4.4 ONNX inference
# Load the ONNX model and create an inference session; replace the path with your own ONNX file
onnx_path = "yolov5-Paddle/runs/train/exp2/weights/best.onnx"
sess = onnxruntime.InferenceSession(onnx_path)
# Run inference with ONNX Runtime
ort_inputs = {sess.get_inputs()[0].name: img_data}
result = sess.run(None, ort_inputs)
result = np.array(result) # (1, 1, 6300, 7)
result = np.squeeze(result)
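The output shape noted in the comment, (1, 1, 6300, 7), can be checked with a little arithmetic. The sketch below assumes the exported model keeps the usual YOLOv5 layout of three detection scales with strides 8, 16 and 32 and three anchors per grid cell; these defaults and the function name are assumptions, since the model configuration is not shown here.

```python
def num_predictions(img_size, strides=(8, 16, 32), anchors_per_cell=3):
    """Number of candidate boxes produced for a square input image."""
    return sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)


num_classes = 2                  # target and char
row_width = 4 + 1 + num_classes  # box (x, y, w, h) + objectness + class scores
print(num_predictions(320), row_width)
```

Each of the rows therefore holds four box coordinates, one objectness score and two class scores, which post-processing filters down to a handful of boxes.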
4.5 Post-processing
# Post-processing
confidence = 0.4
iou = 0.3
hw = img_raw.shape[:2]
boxes, confs, classes = post_process(result, confidence, iou, scale, padd_data[0], padd_data[1], hw)
print(confs)
print(classes)
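pre_process returns a scale and padding values, and post_process uses them to map boxes back onto the original image. The project's implementation is not shown here, so the following is only a sketch of the usual letterbox geometry under assumed conventions (centered padding, rounding to whole pixels); all function names and the image size of 384 x 344 are hypothetical.

```python
def letterbox_params(img_hw, new_size=320):
    """Scale factor and (left, top) padding for an aspect-preserving resize."""
    h, w = img_hw
    scale = min(new_size / h, new_size / w)
    new_h, new_w = round(h * scale), round(w * scale)
    return scale, ((new_size - new_w) // 2, (new_size - new_h) // 2)


def to_network(box, scale, pad):
    """Map an (x1, y1, x2, y2) box from the original image into network-input space."""
    return (box[0] * scale + pad[0], box[1] * scale + pad[1],
            box[2] * scale + pad[0], box[3] * scale + pad[1])


def to_original(box, scale, pad, img_hw):
    """Map a box from network-input space back to the original image, clipped to its bounds."""
    h, w = img_hw
    x1 = min(max((box[0] - pad[0]) / scale, 0), w)
    y1 = min(max((box[1] - pad[1]) / scale, 0), h)
    x2 = min(max((box[2] - pad[0]) / scale, 0), w)
    y2 = min(max((box[3] - pad[1]) / scale, 0), h)
    return (x1, y1, x2, y2)


img_hw = (384, 344)  # assumed (height, width) of a CAPTCHA image
scale, pad = letterbox_params(img_hw)
net_box = to_network((182.0, 136.0, 247.0, 202.0), scale, pad)
back = to_original(net_box, scale, pad, img_hw)
print(scale, pad, net_box, back)
```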
4.6 Visualization
# Visualization
label_list = list(range(2))
colors = get_color_list(label_list)
img = draw_boxes(img_raw, boxes, confs, classes, colors, thickness=2)
img = img[:, :, ::-1]  # BGR -> RGB for matplotlib
plt.imshow(img)
plt.show()
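V. Feature Extraction
1. Preparing the dataset
Crop the character images out of the annotated data to use as training samples, and collect statistics on the sample widths and heights and on the label distribution.
# data/captcha_click
# 1032 sample pairs in total, covering 451 distinct characters; 269 of them appear only once, and the most frequent character appears 24 times
"""
Crop images to build the dataset for the siamese network
"""
import json
import os
from collections import defaultdict
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
def crop_gtbox(xyxy, img_file):
    image = cv2.imread(img_file)
    x1, y1, x2, y2 = xyxy
    cropped_image = image[y1:y2, x1:x2]
    return cropped_image
data_file = 'data/captcha_click'
file_list = os.listdir(data_file)
# Folder that stores the training data for the siamese network
save_path = 'datasets/data_siamese'
if not os.path.exists(save_path):
    os.makedirs(save_path)
# Keep widths and heights for later analysis
w_list = []
h_list = []
data_dict = {'char': defaultdict(list), 'target': defaultdict(list)}
count = 0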
for file in file_list:
    if '.json' in file:
        file_path = os.path.join(data_file, file)
        file_name = file.split('.')[0]
        with open(file_path, 'r', encoding='utf-8') as f:
            img_labels = json.load(f)
        img_path = file_path.replace('.json', '.jpg')
        shapes = img_labels['shapes']
        for item in shapes:
            text = item['text']
            text_unicode = text.encode('unicode_escape').decode().replace('\\', '')
            label = item['label']
            # Naming scheme of the cropped images
            new_img_name = f'{label}_{text_unicode}_{file_name}.jpg'
            save_file = os.path.join(save_path, new_img_name)
            if label == 'char':
                data_dict['char'][text].append(new_img_name)
            elif label == 'target':
                data_dict['target'][text].append(new_img_name)
                # Count one char/target pair per target box (placement inferred from the reported total of 1032 pairs)
                count += 1
            points = item['points']
            np_data = np.array(points, dtype=np.int64)
            w = np_data[1][0] - np_data[0][0]
            h = np_data[1][1] - np_data[0][1]
            w_list.append(w)
            h_list.append(h)
            xyxy = np_data.reshape((-1,))
            # Crop the image and save it
            crop_img = crop_gtbox(xyxy, img_path)
            cv2.imwrite(save_file, crop_img)
with open('datasets/data_annotation.json', 'w', encoding='utf-8') as f:
    json.dump(data_dict, f, indent=2, ensure_ascii=False)
print(f'{count} sample pairs in total; distinct characters: char {len(data_dict["char"])}, target {len(data_dict["target"])}')
num_char_list = [len(data_dict["char"][i]) for i in data_dict["char"].keys()]
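# Visualization
# Width/height distribution
plt.figure(figsize=(12, 5))
plt.subplot(121)
plt.scatter(w_list, h_list)
plt.xlabel('Width')
plt.ylabel('Height')
# Distribution of character repetition counts
plt.subplot(122)
data = np.array(num_char_list)
x, y = np.unique(data, return_counts=True)
plt.bar(range(len(x)), y)
plt.xticks(range(len(x)), x)
plt.xlabel('character repetitions')
plt.ylabel('count')
plt.show()
The plots show that the characters are unevenly distributed: characters that appear only once make up most of the set, while a few characters appear many times. We therefore augment the characters that have few samples.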
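The same statistics can be computed without plotting. The sketch below uses collections.Counter on a tiny made-up mapping from characters to cropped image names; in the project this mapping would be the char section of datasets/data_annotation.json.

```python
from collections import Counter

# Hypothetical character -> cropped image names mapping
char_images = {
    '鸡': ['a.jpg'],
    '潮': ['b.jpg', 'c.jpg', 'd.jpg'],
    '挤': ['e.jpg'],
    '山': ['f.jpg', 'g.jpg'],
}

# How many characters occur once, twice, three times, ...
repetitions = Counter(len(files) for files in char_images.values())
# Characters with a single sample are the candidates for the evaluation set
singletons = [char for char, files in char_images.items() if len(files) == 1]
# The most frequent character and its number of samples
top_char, top_files = max(char_images.items(), key=lambda kv: len(kv[1]))

print(dict(repetitions), singletons, top_char, len(top_files))
```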
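2. Sample augmentation
2.1 Installing the augmentation dependencies
!pip install fontTools
2.2 Data augmentation
Part of the characters that appear only once are set aside as the evaluation set, and every other character is augmented to 50 samples to form the training set.
# Arrange the evaluation set into fixed (anchor, positive, negative) triplets
# Augment the remaining characters
import os
import numpy as np
from PIL import Image, ImageDraw, ImageFont
import cv2
import random
import colorsys
from fontTools.ttLib import TTFont
import json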
def check_character_in_font(ttfont, character):
    for table in ttfont['cmap'].tables:
        if ord(character) in table.cmap.keys():
            return True
    return False
def crop_gtbox(xyxy, img_file):
    image = cv2.imread(img_file)
    x1, y1, x2, y2 = xyxy
    cropped_image = image[y1:y2, x1:x2]
    return cropped_image
def random_color():
    rgb = colorsys.hsv_to_rgb(random.random(), 1, 1)
    result = (int(rgb[0] * 255), int(rgb[1] * 255), int(rgb[2] * 255))
    return result
def random_crop_bg(bg_path, size):
    bg_imgs = os.listdir(bg_path)
    while True:
        bg_img = random.choice(bg_imgs)
        if '.jpg' in bg_img:
            break
    img_path = os.path.join(bg_path, bg_img)
    bg_data = cv2.imread(img_path)
    # Randomly rotate and/or flip
    i = random.random()
    if i < 0.5:
        # Rotate the image by 90 degrees counterclockwise
        bg_data = cv2.rotate(bg_data, cv2.ROTATE_90_COUNTERCLOCKWISE)
    i = random.random()
    if i < 0.5:
        # Flip the image horizontally
        bg_data = cv2.flip(bg_data, 1)
    h, w = bg_data.shape[:2]
    side_len = min(h, w)
    image_rgb = cv2.cvtColor(bg_data, cv2.COLOR_BGR2RGB)
    left_top = (random.randint(0, side_len - size), random.randint(0, side_len - size))
    crop_img = image_rgb[left_top[1]:left_top[1] + size, left_top[0]:left_top[0] + size]
    return crop_img
def random_chinese():
    # Pick a random code point in the Unicode range of common Chinese characters
    char_code = random.randint(0x4E00, 0x9FA5)
    # Build the character from its code point
    character = chr(char_code)
    return character
def create_a_target(font_style, font_color, char, bg_img, rotate=(-60, 60), scale=(1, 1)):
    img_size = bg_img.shape[:2][::-1]  # hw to wh
    crop_img = Image.fromarray(bg_img)
    draw = ImageDraw.Draw(crop_img)
    draw.text((0, 0), char, font_color, font=font_style)
    char_data = np.array(crop_img)
    char_box = font_style.getbbox(char)
    x_center = char_box[2] / 2
    y_center = char_box[3] / 2
    center = (x_center, y_center)
    angle = random.randint(rotate[0], rotate[1])
    scale = random.uniform(scale[0], scale[1])
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(char_data, M, img_size, flags=cv2.INTER_AREA, borderMode=cv2.BORDER_REPLICATE)
    return rotated
# Split the 451 characters into a training set and an evaluation set; each evaluation character needs only one sample pair
# To avoid wasting data, the evaluation characters are picked from those that appear only once
annotation_file = 'datasets/data_annotation.json'
save_path = 'datasets/annotations'
save_eval_file = 'datasets/annotations/eval_samples.txt'
save_train_file = 'datasets/annotations/train_annotation.json'
if not os.path.exists(save_path):
    os.mkdir(save_path)
split_ratio = 0.1
with open(annotation_file, 'r', encoding='utf-8') as f_anno:
    json_file = json.load(f_anno)
char_set1 = []
for char, file_list in json_file['char'].items():
    if len(file_list) == 1:
        char_set1.append(char)
random.seed(0)
random.shuffle(char_set1)
eval_set = char_set1[:int(len(json_file['char'])*split_ratio)]
print(f'length of eval_set:{len(eval_set)}')
# Evaluation triplets: anchor (char), positive (matching target), negative (a different target)
with open(save_eval_file, 'w', encoding='utf-8') as f_eval:
    for char_name in eval_set:
        negative = random.choice(eval_set)
        while negative == char_name:
            negative = random.choice(eval_set)
        char_img = json_file['char'][char_name][0]
        target_img = json_file['target'][char_name][0]
        negative_img = json_file['target'][negative][0]
        f_eval.write(f'{char_img} {target_img} {negative_img}\n')
# Generate extra samples for the training characters
char_size = 36
target_size = 64
char_font1 = 'font_files/konxin.ttf'
char_font2 = 'font_files/ShuangXianTiJian.ttf'
target_font1 = 'font_files/simhei.ttf'
target_font2 = 'font_files/simsun.ttf'
style1 = ImageFont.truetype(char_font1, char_size)
style2 = ImageFont.truetype(char_font2, char_size)
style3 = ImageFont.truetype(target_font1, target_size)
style4 = ImageFont.truetype(target_font2, target_size)
bg_path = 'data/backgrounds/images'
# Generated samples are stored with a char or target prefix
save_path = 'datasets/data_siamese'
if not os.path.exists(save_path):
    os.mkdir(save_path)
chinese_list = []
font1 = TTFont(char_font1)
font2 = TTFont(char_font2)
font3 = TTFont(target_font1)
font4 = TTFont(target_font2)
for char, img_list in json_file['char'].items():
    if char in eval_set:
        continue
    # Check which font files contain the character
    char_list = []
    if check_character_in_font(font1, char):
        char_list.append(style1)
    if check_character_in_font(font2, char):
        char_list.append(style2)
    assert len(char_list) != 0, f'{char} not in char_font files'
    target_list = []
    if check_character_in_font(font3, char):
        target_list.append(style3)
    if check_character_in_font(font4, char):
        target_list.append(style4)
    assert len(target_list) != 0, f'{char} not in target_font files'
    text = char.encode('unicode_escape').decode().replace('\\', '')
    num = 0
    # img_list is the same list object as json_file['char'][char], so it grows with each append below
    while len(img_list) < 50:
        num += 1
        # char sample: colored character on a white background
        char_style = random.choice(char_list)
        char_img_name = f'char_{text}_add-{num}.jpg'
        color1 = (255, 255, 255)
        img_data = Image.new("RGB", (char_size, char_size), color1)
        bg_data = np.array(img_data)
        color2 = random_color()
        # Make sure the foreground color can be told apart from the background
        while sum([(color1[i]-color2[i])**2 for i in range(3)]) < 1200:
            color2 = random_color()
        char_data = create_a_target(char_style, color2, char, bg_data, (-45, 45))
        char_img = Image.fromarray(char_data)
        save_file = os.path.join(save_path, char_img_name)
        char_img.save(save_file)
        # target sample: character drawn on a random background crop
        target_style = random.choice(target_list)
        target_img_name = f'target_{text}_add-{num}.jpg'
        crop_img = random_crop_bg(bg_path, target_size)
        color = random_color()
        char_data = create_a_target(target_style, color, char, crop_img, (-45, 45))
        target_img = Image.fromarray(char_data)
        save_file = os.path.join(save_path, target_img_name)
        target_img.save(save_file)
        # Update the annotation
        json_file['char'][char].append(char_img_name)
        json_file['target'][char].append(target_img_name)
# Remove the evaluation characters from the training annotation
for char in eval_set:
    del json_file['char'][char]
    del json_file['target'][char]
with open(save_train_file, 'w', encoding='utf-8') as f:
    json.dump(json_file, f, indent=2, ensure_ascii=False)
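The evaluation file pairs each held-out character with its own target image and with the target image of a different character. As a standalone sketch of that pairing, the function below builds such triplets from a list of characters. The function name, the local random generator and the guard against lists with fewer than two characters (where the search for a negative could never finish) are assumptions added for illustration.

```python
import random


def build_triplets(chars, seed=0):
    """Return (anchor, positive, negative) names; the negative is always a different character."""
    if len(chars) < 2:
        raise ValueError('need at least two characters to pick a negative')
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    triplets = []
    for anchor in chars:
        negative = rng.choice(chars)
        while negative == anchor:
            negative = rng.choice(chars)
        triplets.append((f'char_{anchor}', f'target_{anchor}', f'target_{negative}'))
    return triplets


demo_chars = ['鸡', '潮', '挤']  # made-up evaluation characters
demo_triplets = build_triplets(demo_chars)
for triplet in demo_triplets:
    print(' '.join(triplet))
```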
2.3 Saving the labels
# Save the label file, which the InsightFace loss needs
annotation_file = 'datasets/annotations/train_annotation.json'
save_label = 'datasets/annotations/text_label.txt'
with open(annotation_file, 'r', encoding='utf-8') as f:
    datas = json.load(f)
with open(save_label, 'w', encoding='utf-8') as f_label:
    index = 0
    for key in datas['char'].keys():
        f_label.write(f'{key} {index}\n')
        index += 1
# Check the number of samples in the augmented feature-extraction dataset: 406*2*50 + 45*2*1 = 40690
!cd datasets/data_siamese && ls -l|grep "^-"| wc -l
3. Checking the data
Check the effect of the data augmentation
import matplotlib.pyplot as plt
from siamese_network.data_loader import CaptchaDataset, MyResize
import paddle.vision.transforms as T
import random
%matplotlib inline
image_path = 'datasets/data_siamese'
annotation_file = 'datasets/annotations/train_annotation.json'
classifies_file = 'datasets/annotations/text_label.txt'
label_list = []
# The label file was written as UTF-8, so read it with the same encoding
with open(classifies_file, 'r', encoding='utf-8') as f:
    datas = f.readlines()
for item in datas:
    text_label = item.strip()
    label_list.append(text_label)
transforms = T.Compose([
    T.ColorJitter(0.2, 0.1, 0.4, 0.4),
    T.RandomRotation(60, 'bilinear'),
    MyResize(70),
    T.RandomCrop(64),
    T.RandomErasing(scale=(0.02, 0.2)),
])
dataset = CaptchaDataset(image_path, annotation_file, label_list, transforms)
# random.randint includes its upper bound and could return len(dataset); randrange cannot
random_index = random.randrange(len(dataset))
data = dataset[random_index]
img_list, _ = data
for i in range(3):
    plt.subplot(1, 3, i+1)
    plt.imshow(img_list[i])
plt.show()
4. Training the feature extraction network
!python siamese_network/train.py --savefolder 'siamese_network/checkpoint' \
    --feature_dim 64 \
    --epoch 100 \
    --lr 0.001 \
    --weight_decay 0.01 \
    --triplet_margin 0.3 \
    --insightface_loss_param 1.0 0.5 0 10 \
    --batch_size 128
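To give a feel for the two loss terms, here is a small pure-Python sketch. It rests on two assumptions that the command above does not confirm: that the triplet loss compares cosine distances, and that the four values of --insightface_loss_param are the combined-margin parameters (m1, m2, m3) and the scale s used in InsightFace-style margin losses. The actual definitions live in siamese_network/train.py.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: the positive must be closer to the anchor than the negative by at least `margin`."""
    d_pos = 1 - cosine(anchor, positive)  # assumed distance: 1 - cosine similarity
    d_neg = 1 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def margin_logit(cos_theta, m1=1.0, m2=0.5, m3=0.0, s=10.0):
    """Combined-margin logit of the true class: s * (cos(m1 * theta + m2) - m3)."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return s * (math.cos(m1 * theta + m2) - m3)


anchor, same, other = (1.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(triplet_loss(anchor, same, other))   # well separated triplet
print(triplet_loss(anchor, same, anchor))  # negative identical to the anchor
print(margin_logit(1.0), margin_logit(1.0, m2=0.0))
```

Under these assumptions the margin term makes the true class harder to score, which pushes features of the same character closer together than a plain softmax would.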
5. Visualization
# Visualize the training process
import matplotlib.pyplot as plt
%matplotlib inline
log_file = 'siamese_network/checkpoint/train_log.txt'
# log_file = 'siamese_network/checkpoint/copy/train_log.txt'
with open(log_file, 'r', encoding='utf-8') as f:
    raw = f.read()
items = raw.strip().split()
train_iter = []
train_loss = []
train_loss1 = []
train_loss2 = []
train_acc = []
eval_iter = []
eval_loss = []
eval_acc = []
# Each log item is a record whose fields are joined by '@'
for item in items:
    datas = item.split('@')
    if item.startswith('train'):
        _, iter_num, loss, loss1, loss2, acc = datas
        train_iter.append(int(iter_num))
        train_loss.append(float(loss))
        train_loss1.append(float(loss1))
        train_loss2.append(float(loss2))
        train_acc.append(float(acc))
    elif item.startswith('eval'):
        _, iter_num, loss, acc = datas
        eval_iter.append(int(iter_num))
        eval_loss.append(float(loss))
        eval_acc.append(float(acc))
plt.figure(figsize=(12, 5))
plt.subplot(131)
plt.plot(train_iter, train_loss2, label='trainloss_InsightfaceLoss')
plt.title("InsightfaceLoss vs iter")
plt.legend()
plt.subplot(132)
plt.plot(train_iter, train_loss1, label='trainloss_Triplet')
plt.plot(eval_iter, eval_loss, label='eval_loss_Triplet')
plt.title("TripletLoss vs iter")
plt.legend()
plt.subplot(133)
plt.plot(train_iter, train_acc, label='train_Acc')
plt.plot(eval_iter, eval_acc, label='eval_Acc')
plt.title('Acc vs iter')
plt.legend()
plt.show()
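The parsing code above implies a log made of whitespace-separated records whose fields are joined by '@'. The sketch below parses a made-up log of that shape into dictionaries; the numeric values and the function name are invented for illustration, and the field order follows the unpacking used in the plotting code (loss1 is plotted as the triplet loss, loss2 as the InsightFace loss).

```python
def parse_log(text):
    """Split a training log into lists of train and eval records."""
    train, evals = [], []
    for item in text.split():
        fields = item.split('@')
        if fields[0] == 'train':
            _, it, loss, triplet, insight, acc = fields
            train.append({'iter': int(it), 'loss': float(loss), 'triplet': float(triplet),
                          'insightface': float(insight), 'acc': float(acc)})
        elif fields[0] == 'eval':
            _, it, loss, acc = fields
            evals.append({'iter': int(it), 'loss': float(loss), 'acc': float(acc)})
    return train, evals


# Made-up log content in the assumed format
sample_log = 'train@10@1.52@0.31@1.21@0.42 eval@10@0.28@0.55 train@20@1.10@0.22@0.88@0.61'
train_records, eval_records = parse_log(sample_log)
print(train_records)
print(eval_records)
```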
6. Exporting to ONNX
from siamese_network.siamese_resnet18 import SiameseNet18
import paddle
# Change this to the weights you want to load
param_path = 'siamese_network/checkpoint/last.pdparams'
# param_path = 'siamese_network/checkpoint/best.pdparams'
params = paddle.load(param_path)
model = SiameseNet18(64)
model.set_state_dict(params)
# Output path (the exporter appends the .onnx suffix)
save_path = 'siamese_network/onnx_save/SiameseNet18'
# Input shape and dtype of the model; Tensor or InputSpec is accepted, and InputSpec allows a dynamic batch dimension
x_spec = paddle.static.InputSpec([None, 3, 64, 64], 'float32', 'x')
paddle.onnx.export(model, save_path, input_spec=[x_spec], opset_version=11)
7. Testing the ONNX model
import onnxruntime
import cv2
import matplotlib.pyplot as plt
import numpy as np
from siamese_network.utils import pre_process
%matplotlib inline
def cos_similar(v1, v2):
    dot = float(np.dot(v1, v2))
    mold = np.linalg.norm(v1) * np.linalg.norm(v2)
    return 0.5 + 0.5 * (dot / (mold + 1E-6))  # in [0, 1]: 0 = opposite, 1 = identical
# Read and preprocess the data
# Replace these with a sample triplet from your own evaluation set
anchor_img = 'datasets/data_siamese/char_u6324_img_3450.jpg'
positive_img = 'datasets/data_siamese/target_u6324_img_3450.jpg'
negative_img = 'datasets/data_siamese/target_u6f6e_img_3407.jpg'
anchor_raw = cv2.imread(anchor_img)
positive_raw = cv2.imread(positive_img)
negative_raw = cv2.imread(negative_img)
img_size = 64
anchor_data, _, _ = pre_process(anchor_raw, img_size)
positive_data, _, _ = pre_process(positive_raw, img_size)
negative_data, _, _ = pre_process(negative_raw, img_size)
input_data = np.concatenate((anchor_data, positive_data, negative_data), axis=0)
# Load the ONNX model
onnx_path = 'siamese_network/onnx_save/SiameseNet18.onnx'
sess = onnxruntime.InferenceSession(onnx_path)
# ONNX inference
ort_inputs = {sess.get_inputs()[0].name: input_data}
result = sess.run(None, ort_inputs)
result = np.array(result)
result = np.squeeze(result)  # [3, 64]
# Similarity computation
anchor_feature, positive_feature, negative_feature = result
sim1 = cos_similar(anchor_feature, positive_feature)
sim2 = cos_similar(anchor_feature, negative_feature)
threshold = 0.7
print(f'anchor & positive: {sim1>threshold}, cos_similar:{sim1}')
print(f'anchor & negative: {sim2>threshold}, cos_similar:{sim2}')
img_list = [anchor_raw[:, :, ::-1], positive_raw[:, :, ::-1], negative_raw[:, :, ::-1]]
for i, img in enumerate(img_list):
    plt.subplot(1,3,i+1)
    plt.imshow(img)
plt.show()
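The cos_similar function rescales cosine similarity from [-1, 1] to [0, 1]. The pure-Python version below, written only to illustrate the mapping on hand-picked 2D vectors, shows the three reference points: identical vectors score close to 1, perpendicular vectors 0.5 and opposite vectors close to 0. The name cos_similar_py is hypothetical.

```python
import math


def cos_similar_py(v1, v2):
    """Same formula as cos_similar above, without NumPy."""
    dot = sum(a * b for a, b in zip(v1, v2))
    mold = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return 0.5 + 0.5 * (dot / (mold + 1e-6))


same = cos_similar_py((1.0, 0.0), (2.0, 0.0))        # same direction
orthogonal = cos_similar_py((1.0, 0.0), (0.0, 3.0))  # perpendicular
opposite = cos_similar_py((1.0, 0.0), (-1.0, 0.0))   # opposite direction
print(same, orthogonal, opposite)
```

With the threshold of 0.7 used above, a pair is therefore accepted only when the raw cosine similarity exceeds 0.4.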
VI. Deployment Test
Visualize the detection results
# ONNX test
import onnxruntime
import cv2
import matplotlib.pyplot as plt
import numpy as np
from siamese_network.utils import pre_process, post_process, draw_boxes
import os
from siamese_network.data_loader import MyResize
from siamese_network.siamese_resnet18 import SiameseNet18
%matplotlib inline
np.set_printoptions(precision=2, suppress=True)
def crop_gtbox_nd(xyxy, img_ndarray):
    x1, y1, x2, y2 = xyxy
    cropped_data = img_ndarray[int(y1):int(y2), int(x1):int(x2)]
    return cropped_data
def postprocess_sort(sim_matrix):
    # Indices are multiplied with masks later, so shift them by 1 to keep index 0 from vanishing
    max_arg = np.argsort(sim_matrix, axis=-1) + 1
    # Initialize the keep matrix: each row starts at its most similar column
    matrix1 = np.zeros_like(sim_matrix)
    matrix1[:, -1] += 1
    keep_matrix = matrix1 > 0
    unique_values, counts = np.unique(max_arg[keep_matrix], return_counts=True)
    repeat_index = unique_values[counts > 1]
    # If several rows pick the same column, the row with the highest similarity keeps it,
    # the others move on to their next-best column, and duplicates are checked again
    while len(repeat_index):
        for index in repeat_index:
            # Mask of the positions that share this column index and must be compared again
            recompare_mask = max_arg*keep_matrix==index
            recompare_rows = np.sum(recompare_mask, axis=-1, keepdims=True)
            # Row with the highest similarity among them; subtract 1 to restore the real index
            max_index = np.argmax((sim_matrix*recompare_rows)[:, index-1])
            # Leave only the rows that have to change
            recompare_rows[max_index] = 0
            # First clear the positions to change in keep_matrix
            to_zero = recompare_mask*recompare_rows
            keep_matrix = np.logical_xor(to_zero, keep_matrix)
            # Move the changed rows to their next-largest index by rolling the mask
            array_new = np.roll(to_zero, -1, axis=-1)
            # Update keep_matrix
            keep_matrix = np.logical_or(array_new, keep_matrix)
        # Check again for duplicate indices with the new keep_matrix
        unique_values, counts = np.unique(max_arg[keep_matrix], return_counts=True)
        repeat_index = unique_values[counts > 1]
    order = max_arg[keep_matrix] - 1
    sim_list = sim_matrix[np.arange(len(order)), order]
    return order, sim_list
def draw_order(img, box_items, order_list, sim_list):
    for i, target in enumerate(order_list):
        box = box_items[target]['box']
        center = (int(box[0]), int((box[1]+box[3])/2))
        cv2.putText(img, f'{i}:{sim_list[i]:.2f}', center, cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
    return img
def cos_sim_metrix(m1, m2):
    # The network already normalizes the features to unit length, so the product of the norms is not divided out here
    dot = np.dot(m1, m2.T)
    return dot*0.5 + 0.5  # rescaled to [0, 1]: 0 = opposite, 0.5 = perpendicular, 1 = identical
# Load the ONNX models and create inference sessions; replace the paths with your own exported models
# Object detection model
det_onnx = "yolov5-Paddle/runs/train/exp3/weights/best.onnx"
sess_det = onnxruntime.InferenceSession(det_onnx)
# Feature extraction model
feature_onnx = "siamese_network/onnx_save/SiameseNet18.onnx"
sess_feature = onnxruntime.InferenceSession(feature_onnx)
show_img = []
sims = []
img_path = 'data/test'
# img_path = 'test_new'
imgs = sorted(os.listdir(img_path))
for img_name in imgs:
    if '.jpg' not in img_name:
        continue
    img_file = os.path.join(img_path, img_name)
    img_raw = cv2.imread(img_file)  # BGR
    img_hw = img_raw.shape[:2]
    img_data, scale, padd_data = pre_process(img_raw, img_size=320)  # RGB
    # Run the detector with ONNX Runtime
    ort_inputs = {sess_det.get_inputs()[0].name: img_data}
    result = sess_det.run(None, ort_inputs)
    result = np.array(result)  # (1, 1, 6300, 7)
    result = np.squeeze(result)
    # Post-processing
    confidence = 0.4
    iou = 0.3
    boxes, confs, classes = post_process(result, confidence, iou, scale, padd_data[0], padd_data[1], img_hw)
    # Character matching
    # Crop the character images
    target_crop_item = []
    char_list = []
    for index, box in enumerate(boxes):
        if classes[index] == 0:
            # Keep the target order fixed
            crop_img = crop_gtbox_nd(box, img_raw)  # BGR
            target_data, _, _ = pre_process(crop_img, img_size=64)  # RGB
            target_crop_item.append({'box': box, 'crop_img': target_data})
        else:
            crop_img = crop_gtbox_nd(box, img_raw)
            char_data, _, _ = pre_process(crop_img, img_size=64)
            char_list.append((box, char_data))
    # Skip images without both kinds of boxes; zip(*...) below would fail on an empty list
    if not char_list or not target_crop_item:
        continue
    # Sort the char boxes from left to right
    sorted_pairs = sorted(char_list, key=lambda x: x[0][0])
    _, char_crops = zip(*sorted_pairs)
    target_crop_list = [target_crop_item[i]['crop_img'] for i in range(len(target_crop_item))]
    # Feed the chars and targets to the model together
    datas = np.concatenate(list(char_crops)+target_crop_list, axis=0)
    # Feature extraction
    feature_inputs = {sess_feature.get_inputs()[0].name: datas}
    result = sess_feature.run(None, feature_inputs)
    result = result[0]
    # Split the features back into chars and targets
    char_feature = result[:len(char_crops)]
    target_feature = result[len(char_crops):]
    # Feature matching
    result_matrix = cos_sim_metrix(char_feature, target_feature)
    # print(result_matrix)
    order_list, sim_list = postprocess_sort(result_matrix)
    sims.extend(sim_list)
    # Draw the detection boxes
    img = draw_boxes(img_raw, boxes, confs, classes, [(255,255,0), (0,255,0)], thickness=1)
    # Draw the click order on the target boxes
    img = draw_order(img, target_crop_item, order_list, sim_list)
    show_img.append(img)
# Save the visualized predictions (the 3 x 5 grid holds at most 15 images)
height = 384
width = 344
rows = 3
cols = 5
canvas = np.full((rows * height, cols * width, 3), 255, dtype=np.uint8)
for i, image in enumerate(show_img):
    row = i // cols
    col = i % cols
    image = cv2.resize(image, (width, height))
    x_start = col * width
    y_start = row * height
    x_end = x_start + width
    y_end = y_start + height
    canvas[y_start:y_end, x_start:x_end] = image
cv2.imwrite('result.jpg', canvas)
plt.figure(figsize=(12, 18))
canvas = canvas[:, :, ::-1]
plt.imshow(canvas)
plt.axis('off')
plt.show()
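The matching step has to give every char its own target even when two chars prefer the same one. postprocess_sort does this with masks on the sorted similarity matrix. For comparison, the sketch below shows a simpler greedy alternative (it is not the author's method): pairs are visited from the highest similarity downwards, and a pair is accepted only if neither its row nor its column has been used. The similarity values are made up.

```python
def greedy_match(sim_matrix):
    """Assign each row (char) a distinct column (target), taking the best pairs first."""
    pairs = sorted(
        ((sim, r, c) for r, row in enumerate(sim_matrix) for c, sim in enumerate(row)),
        reverse=True,
    )
    assigned, used_cols = {}, set()
    for sim, r, c in pairs:
        if r not in assigned and c not in used_cols:
            assigned[r] = (c, sim)
            used_cols.add(c)
    rows = sorted(assigned)
    return [assigned[r][0] for r in rows], [assigned[r][1] for r in rows]


# Made-up similarities: rows are chars (left to right), columns are targets
sim_matrix = [
    [0.91, 0.40, 0.35],
    [0.88, 0.52, 0.30],  # also prefers target 0, but row 0 has the stronger claim
    [0.20, 0.45, 0.83],
]
order, sims = greedy_match(sim_matrix)
print(order, sims)
```

Here the second char loses target 0 to the first char and falls back to its next-best target, which is the same conflict the original function resolves.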
Adjust the output so that it is a list of boxes in reading order, where each box holds the top-left and bottom-right coordinates of a target
from onnx_test import ClickCaptcha
import time
import os
import json
session = ClickCaptcha(use_gpu=True)
img_path = 'data/test'
for img_name in os.listdir(img_path):
    # Skip anything in the folder that is not a JPEG image
    if '.jpg' not in img_name:
        continue
    img_file = os.path.join(img_path, img_name)
    with open(img_file, 'rb') as f:
        img_bytes = f.read()
    start_time = time.time()
    datas = session.run_inference(img_bytes)
    end_time = time.time()
    print('time:{}ms'.format((end_time-start_time) * 1000))
    result = json.loads(datas)
    print(result)
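A client that has to click the characters needs points rather than boxes. Assuming the JSON returned by run_inference is a list of [x1, y1, x2, y2] boxes in click order (the exact structure is not shown here, so this layout is an assumption), the click points can be taken as the box centers:

```python
import json


def click_points(result_json):
    """Convert an ordered list of [x1, y1, x2, y2] boxes into center points."""
    boxes = json.loads(result_json)
    return [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]


# Hypothetical output in the assumed format
sample_output = json.dumps([[182, 136, 247, 202], [30, 40, 90, 100]])
points = click_points(sample_output)
print(points)
```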
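VII. Conclusions and Outlook
- The importance of the dataset: while reproducing the project, we found that the quality and quantity of the data strongly affect model performance. Improving the training of the feature extraction network required a large amount of time spent collecting and organizing more training data and tuning it.
- Debugging and optimization: in a real project, debugging and optimization are key to improving model performance. The work is not finished once the model has been trained.
- Continuous improvement: although we have successfully reproduced the project, the feature extraction network still has room for improvement. In future work we will continue to collect more training data and to optimize the model structure and parameters in order to obtain better recognition results.
VIII. Appendix
Original project: 文字点选验证码识别 (Text Click-Selection CAPTCHA Recognition)
Dataset: 文字点选验证码识别数据集 (Text Click-Selection CAPTCHA Recognition Dataset)
First published on the PaddlePaddle platform: 【项目复现】文字点选验证码识别
Author: 鸿源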