[OCR] A Method for Training a Deep-Learning-Based CAPTCHA Recognition Model

幸福清风

Published 2025-02-18 09:49:24

Article link: https://blog.csdn.net/xun527/article/details/145697275

1. Preface

Installing dependencies

requirements.txt: this list is fairly complete; install only the packages you need.

tensorflow==2.9.1
Pillow==9.1.1
requests
numpy==1.23.2
opencv-python >=4.5.4, <4.6
torch==1.10.0

Install command

pip install -r requirements.txt

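1.1 Input requirements

Place the training set and the validation set in the directories specified in the configuration file.
All images in these directories must have the same size.
Image naming rule: <captcha text>_<number>.<image format>, for example abce_01.jpg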
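
The naming rule above can be checked before training. The following is a minimal sketch, not part of the original project: the helper name parse_captcha_label is hypothetical, and it assumes the label is everything before the first underscore, which is how the training script in section 2.4 splits file names.

```python
import os

LABELS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def parse_captcha_label(file_name, labels, fixed_length):
    """Return the label encoded in a name like 'abce_01.jpg', or raise ValueError."""
    stem, _ext = os.path.splitext(os.path.basename(file_name))
    if "_" not in stem:
        raise ValueError("expected <label>_<number>.<ext>, got %r" % file_name)
    label = stem.split("_")[0]  # text before the first underscore
    if len(label) != fixed_length:
        raise ValueError("label %r does not have %d characters" % (label, fixed_length))
    unknown = [ch for ch in label if ch not in labels]
    if unknown:
        raise ValueError("label %r contains characters outside the label set: %r" % (label, unknown))
    return label


print(parse_captcha_label("abce_01.jpg", LABELS, 4))  # abce
```

Running a check like this over both image directories catches misnamed files early, instead of failing with an exception partway through data loading.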

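1.2 Configuration file

Default file: captcha.json. Note that the code in section 2.4 looks for fixed_length_captcha.json by default, so either name the file accordingly or change the default path in the code.
The field names are self-explanatory.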

1.3 Project structure

venv: the virtual environment, which differs from machine to machine

1.4 Training

python captcha.py

1.5 Prediction

predictor = Predictor()
# Predict a file on the local disk
predictor.predict('xxx.jpg')
# Predict directly from binary content
predictor.predict_single_image_content(b'PNGxxxxx')
# Predict a remote image
predictor.predict_remote_image('http://xxxxxx/xx.jpg', save_image_to_file='remote.jpg')

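1.6 Results

Accuracy depends on the size of the training set.
For this kind of image, a model trained on roughly 20,000 images reaches over 90% accuracy in practice.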

2. Project files

captcha.json

{
  "image_height": 45,
  "image_width": 125,
  "fixed_length": 4,
  "batch_size": 128,
  "save_path": "model\\model.dat",
  "labels": "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
  "train_image_dir": "train_images",
  "validation_image_dir": "validation_images",
  "learning_rate": 0.0001,
  "dropout_rate": 0.25,
  "epochs": 100
}

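2.1 Configuration fields explained

1. image_height and image_width

Meaning: the height and width of the input images, in pixels.

2. fixed_length

Meaning: the fixed length of a CAPTCHA, i.e. the number of characters in each one.

3. batch_size

Meaning: the number of samples in each training batch.

4. save_path

Meaning: the path where the model weights are saved. The sample value uses a Windows path separator.

5. labels

Meaning: the set of all characters that can appear in a CAPTCHA.

6. train_image_dir and validation_image_dir

Meaning:
- train_image_dir: path of the folder that holds the training images.
- validation_image_dir: path of the folder that holds the validation images.

7. learning_rate

Meaning: the learning rate used for training.

8. dropout_rate

Meaning: the rate of the Dropout layers (the fraction of units dropped), used to reduce overfitting.

9. epochs

Meaning: the total number of training epochs.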
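
Several of these fields interact: the size of the model's output layer is fixed_length multiplied by the number of characters in labels. The sketch below illustrates that relationship with the sample values from captcha.json. The function name check_config and its checks are assumptions added for illustration, not part of the original code.

```python
def check_config(config):
    """Run a few basic sanity checks and return the size of the model's output layer."""
    labels = config["labels"]
    if len(set(labels)) != len(labels):
        raise ValueError("labels must not contain duplicate characters")
    if not 0.0 <= config["dropout_rate"] < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    for key in ("image_height", "image_width", "fixed_length", "batch_size", "epochs"):
        if config[key] <= 0:
            raise ValueError("%s must be positive" % key)
    # One group of len(labels) outputs per character position
    return config["fixed_length"] * len(labels)


sample_config = {
    "image_height": 45,
    "image_width": 125,
    "fixed_length": 4,
    "batch_size": 128,
    "labels": "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "learning_rate": 0.0001,
    "dropout_rate": 0.25,
    "epochs": 100,
}

print(check_config(sample_config))  # 248
```

With 62 characters and 4 positions, the final Dense layer has 248 units. Removing characters that never appear in your CAPTCHAs from labels shrinks that layer accordingly.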

2.2 Example images in the train_image_dir folder

2.3 Example images in the validation_image_dir folder

2.4 Training code: captcha.py

# -*- coding: utf-8 -*-
"""
Train a CNN model that recognizes fixed-length character CAPTCHAs
"""
import json
import io
import os
import time

import keras_preprocessing.image
import numpy as np
import PIL.Image
import requests
import tensorflow as tf
from tensorflow import keras


def label_to_array(text, labels):
    """
    Convert a label to a vector of concatenated one-hot groups
    :param text: CAPTCHA text
    :param labels: all characters that can appear in a CAPTCHA
    :return: numpy array
    """
    hots = np.zeros(shape=(len(labels) * len(text)))
    for i, char in enumerate(text):
        index = i * len(labels) + labels.index(char)
        hots[index] = 1
    return hots


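def array_to_label(array, labels):
    """
    Convert an array of class indices back to a label
    :param array: numpy array holding one class index per character
    :param labels: all characters that can appear in a CAPTCHA
    :return: label string
    """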
    text = []
    for index in array:
        text.append(labels[index])
    return ''.join(text)


def load_image_data(image_dir_path, image_height, image_width, labels, target_label_length):
    """
    Load image data
    The label is read from the file name, which should follow the format label_xxxx.jpg (or .png)
    RGB images are converted to grayscale
    :param image_dir_path: image directory path
    :param image_height: image height
    :param image_width: image width
    :param labels: all possible label characters
    :param target_label_length: fixed length of each image label
    :return: image_data, label_data
    """
    image_name_list = os.listdir(image_dir_path)
    image_data = np.zeros(shape=(len(image_name_list), image_height, image_width, 1))
    label_data = np.zeros(shape=(len(image_name_list), len(labels) * target_label_length))

    for index, image_name in enumerate(image_name_list):
        img = keras_preprocessing.image.utils.load_img(os.path.join(image_dir_path, image_name), color_mode='grayscale')
        x = keras_preprocessing.image.utils.img_to_array(img)
        y = label_to_array(image_name.split('_')[0], labels)
        if hasattr(img, 'close'):
            img.close()
        image_data[index] = x
        label_data[index] = y
    return image_data, label_data


class FixCaptchaLengthModel(object):
    """
    Fixed-length CAPTCHA model
    Attributes:
        image_height: image height
        image_width: image width
        learning_rate: learning rate
        dropout: dropout rate
        label_number: number of distinct characters
        fixed_length: fixed length of the CAPTCHA
    """

    def __init__(self, image_height, image_width, label_number, fixed_length,
                 learning_rate=0.0001, dropout=0.25):
        self.image_height = image_height
        self.image_width = image_width
        # Images are always converted to grayscale
        self.image_channel = 1
        self.learning_rate = learning_rate
        self.dropout = dropout
        self.label_number = label_number
        self.fixed_length = fixed_length
        self.kernel_size = (3, 3)
        self.pool_size = (2, 2)
        self.padding = 'valid'
        self.activation = 'relu'

    def model(self):
        """
        :return: keras.Sequential instance
        """
        model = keras.Sequential()
        # Input layer
        input_layer = keras.Input(shape=(self.image_height, self.image_width, self.image_channel), batch_size=None)
        model.add(input_layer)
        # Layer 1: convolution
        model.add(keras.layers.Convolution2D(filters=32, kernel_size=self.kernel_size, strides=1, padding=self.padding,
                                       activation=self.activation))
        model.add(keras.layers.MaxPooling2D(pool_size=self.pool_size, strides=self.pool_size))
        model.add(keras.layers.Dropout(rate=self.dropout))
        # Layer 2: convolution
        model.add(keras.layers.Convolution2D(filters=64, kernel_size=self.kernel_size, strides=1, padding=self.padding,
                                       activation=self.activation))
        model.add(keras.layers.MaxPooling2D(pool_size=self.pool_size, strides=self.pool_size))
        model.add(keras.layers.Dropout(rate=self.dropout))
        # Layer 3: convolution
        model.add(keras.layers.Convolution2D(filters=128, kernel_size=self.kernel_size, strides=1, padding=self.padding,
                                       activation=self.activation))
        model.add(keras.layers.MaxPooling2D(pool_size=self.pool_size, strides=self.pool_size))
        model.add(keras.layers.Dropout(rate=self.dropout))
        model.add(keras.layers.Flatten())
        # Layer 4: fully connected
        model.add(keras.layers.Dense(units=1024, activation=self.activation))
        model.add(keras.layers.Dropout(rate=self.dropout))
        # Layer 5: fully connected, one sigmoid output per (position, character) pair
        model.add(keras.layers.Dense(units=self.fixed_length * self.label_number, activation="sigmoid"))
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=self.learning_rate), loss="binary_crossentropy",
                      metrics=["binary_accuracy"])
        return model

    # def load_from_disk(self, model_file_path):
    #     """
    #     Load a trained model from disk
    #     :param model_file_path: path to the model file
    #     :return: keras.Sequential
    #     """
    #     if not os.path.exists(model_file_path):
    #         raise Exception('%s do not exists' % model_file_path)
    #     model = self.model()
    #     model.load_weights(model_file_path)
    #     return model

    def load_from_disk(self, model_file_path):
        """
        Load a trained model from disk
        :param model_file_path: checkpoint path prefix (the path without the ".index" suffix)
        :return: keras.Model
        """
        # Check that the checkpoint index file exists
        index_path = model_file_path + ".index"
        if not os.path.exists(index_path):
            raise FileNotFoundError(f"Model index file not found at path: {index_path}")

        print(f"Loading model weights from {model_file_path}")
        model = self.model()  # Build the model structure

        # Load the weights
        model.load_weights(model_file_path)
        print("Model weights loaded successfully")
        return model


class CheckAccuracyCallback(keras.callbacks.Callback):
    """
    Check the whole-image accuracy at the end of each training epoch
    """

    def __init__(self, train_x, train_y, validation_x, validation_y, label_number, fixed_label_length, batch_size=128):
        super(CheckAccuracyCallback, self).__init__()
        self.train_x = train_x
        self.train_y = train_y
        self.validation_x = validation_x
        self.validation_y = validation_y
        self.label_number = label_number
        self.fixed_label_length = fixed_label_length
        self.batch_size = batch_size

    # def _compare_accuracy(self, data_x, data_y):
    #     predict_y = self.model.predict_on_batch(data_x)
    #     predict_y = keras.backend.reshape(predict_y, [len(data_x), self.fixed_label_length, self.label_number])
    #     data_y = keras.backend.reshape(data_y, [len(data_y), self.fixed_label_length, self.label_number])
    #     equal_result = keras.backend.equal(keras.backend.argmax(predict_y, axis=2),
    #                                        keras.backend.argmax(data_y, axis=2))
    #     return keras.backend.mean(keras.backend.min(keras.backend.cast(equal_result, tf.float32), axis=1))

    def _compare_accuracy(self, data_x, data_y):
        # Predict
        predict_y = self.model.predict(data_x)
        predict_y = tf.reshape(predict_y, [len(data_x), self.fixed_label_length, self.label_number])
        data_y = tf.reshape(data_y, [len(data_y), self.fixed_label_length, self.label_number])

        # Take the argmax of the predictions and of the true labels
        predict_labels = tf.argmax(predict_y, axis=2)
        true_labels = tf.argmax(data_y, axis=2)

        # Compare predicted and true labels character by character
        correct_predictions = tf.equal(predict_labels, true_labels)

        # A sample counts as correct only if every character is predicted correctly
        sample_accuracy = tf.reduce_all(correct_predictions, axis=1)

        # Average over all samples
        accuracy = tf.reduce_mean(tf.cast(sample_accuracy, tf.float32))
        return accuracy

    # def on_epoch_end(self, epoch, logs=None):
    #     print('\nEpoch %s with logs: %s' % (epoch, logs))
    #     # Pick one batch and compute its accuracy
    #     batches = (len(self.train_x) + self.batch_size - 1) / self.batch_size
    #     target_batch = (epoch + 1) % batches
    #     batch_start = int((target_batch - 1) * self.batch_size)
    #     batch_x = self.train_x[batch_start: batch_start + self.batch_size]
    #     batch_y = self.train_y[batch_start: batch_start + self.batch_size]
    #     on_train_batch_acc = self._compare_accuracy(batch_x, batch_y)
    #     print('Epoch %s with image accuracy on train batch: %s' % (epoch, keras.backend.eval(on_train_batch_acc)))
    #     on_test_batch_acc = self._compare_accuracy(self.validation_x, self.validation_y)
    #     print('Epoch %s with image accuracy on validation: %s\n' % (epoch, keras.backend.eval(on_test_batch_acc)))

    def on_epoch_end(self, epoch, logs=None):
        print(f'\nEpoch {epoch} with logs: {logs}')

        # Accuracy on the first training batch
        batch_start = 0
        batch_x = self.train_x[batch_start: batch_start + self.batch_size]
        batch_y = self.train_y[batch_start: batch_start + self.batch_size]
        on_train_batch_acc = self._compare_accuracy(batch_x, batch_y)
        print(f'Epoch {epoch} with image accuracy on train batch: {on_train_batch_acc.numpy()}')

        # Accuracy on the validation set
        on_test_batch_acc = self._compare_accuracy(self.validation_x, self.validation_y)
        print(f'Epoch {epoch} with image accuracy on validation: {on_test_batch_acc.numpy()}\n')


class Config(object):

    def __init__(self, **kwargs):
        self.image_height = kwargs['image_height']
        self.image_width = kwargs['image_width']
        self.fixed_length = kwargs['fixed_length']
        self.train_batch_size = kwargs['batch_size']
        self.model_save_path = kwargs['save_path']
        self.labels = kwargs['labels']
        self.train_image_dir = kwargs['train_image_dir']
        self.validation_image_dir = kwargs['validation_image_dir']
        self.learning_rate = kwargs['learning_rate']
        self.dropout_rate = kwargs['dropout_rate']
        self.epochs = kwargs['epochs']

    @staticmethod
    def load_configs_from_json_file(file_path='fixed_length_captcha.json'):
        """
        :param file_path: file path
        :return: dict instance
        """
        with open(file_path, 'r') as fd:
            config_content = fd.read()
        return Config(**json.loads(config_content))


class Predictor(object):
    """
    Predictor
    """

    def __init__(self, config_file_path='fixed_length_captcha.json'):
        self.config = Config.load_configs_from_json_file(config_file_path)
        self.model = FixCaptchaLengthModel(self.config.image_height, self.config.image_width, len(self.config.labels),
                                           self.config.fixed_length, learning_rate=self.config.learning_rate,
                                           dropout=self.config.dropout_rate).load_from_disk(self.config.model_save_path)
        self.label_number = len(self.config.labels)

    def predict(self, image_file_path):
        """
        Predict a single image
        :param image_file_path: path of a single image file
        :return: predicted text
        """
        with open(image_file_path, 'rb') as f:
            return self.predict_single_image_content(f.read())

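    def predict_remote_image(self, remote_image_url, headers=None, timeout=30, save_image_to_file=None):
        """
        Predict a remote image
        :param remote_image_url: remote image URL
        :param headers: request headers
        :param timeout: timeout in seconds
        :param save_image_to_file: if not None, file path to save the downloaded image to
        :return: predicted text
        """
        response = requests.get(remote_image_url, headers=headers, timeout=timeout, stream=True)
        # Fail early on HTTP errors instead of trying to decode an error page as an image
        response.raise_for_status()
        content = response.content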
        if save_image_to_file is not None:
            with open(save_image_to_file, 'wb') as fd:
                fd.write(content)
        return self.predict_single_image_content(content)

    def predict_single_image_content(self, image_content):
        """
        Predict a single image
        :param image_content: raw image bytes
        :return: predicted text
        """
        p_image = PIL.Image.open(io.BytesIO(image_content))
        if p_image.mode not in ('L', 'I;16', 'I'):
            p_image = p_image.convert('L')
        image_data = np.zeros(shape=(1, self.config.image_height, self.config.image_width, 1))
        image_data[0] = keras_preprocessing.image.img_to_array(p_image)
        if hasattr(p_image, 'close'):
            p_image.close()
        result = self.model.predict_on_batch(image_data)
        # Split the flat output into one group of scores per character, then take each group's argmax
        result = np.reshape(result, [1, self.config.fixed_length, self.label_number])
        result = np.argmax(result, axis=2)
        return array_to_label(result[0], self.config.labels)


def train():
    """
    Train the model
    """
    config = Config.load_configs_from_json_file()
    train_x, train_y = load_image_data(config.train_image_dir, config.image_height, config.image_width,
                                       config.labels, config.fixed_length)
    validation_x, validation_y = load_image_data(config.validation_image_dir, config.image_height, config.image_width,
                                                 config.labels, config.fixed_length)
    print('total train image number: %s' % len(train_x))
    print('total validation image number: %s' % len(validation_x))
    model = FixCaptchaLengthModel(config.image_height, config.image_width, len(config.labels), config.fixed_length,
                                  learning_rate=config.learning_rate, dropout=config.dropout_rate)
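    # The weights are saved in TensorFlow checkpoint format, so look for the ".index"
    # file that load_from_disk also expects rather than the bare save path
    if os.path.exists(config.model_save_path + '.index'):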
        model = model.load_from_disk(config.model_save_path)
    else:
        model = model.model()
    callbacks = [
        keras.callbacks.ModelCheckpoint(filepath=config.model_save_path, save_weights_only=True, save_best_only=True),
        CheckAccuracyCallback(train_x, train_y, validation_x, validation_y, len(config.labels), config.fixed_length,
                              batch_size=config.train_batch_size)
    ]
    model.fit(train_x, train_y, batch_size=config.train_batch_size, epochs=config.epochs,
              validation_data=(validation_x, validation_y), callbacks=callbacks)


if __name__ == '__main__':
    start_time = time.time()
    train()
    
    # predictor = Predictor()
    # # Predict a file on the local disk
    # image_path = r'C:\Users\Administrator\Desktop\get.jpg'
    # ret = predictor.predict(image_path)
    # print(ret)

    end_time = time.time()
    print('total time: %s' % (end_time - start_time))
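
To see how large the input to the first fully connected layer is, the spatial size can be traced through the three convolution and pooling blocks defined in FixCaptchaLengthModel. The sketch below is an illustration added here, with hypothetical helper names. It assumes the settings used in the code above: 3x3 kernels with 'valid' padding and stride 1, followed by 2x2 max pooling with stride 2.

```python
def conv_pool_output(size, kernel=3, pool=2):
    """Size of one spatial dimension after a 'valid' convolution and a max-pooling step."""
    after_conv = size - kernel + 1  # 'valid' padding, stride 1
    return after_conv // pool       # pooling window and stride both equal to pool


def flattened_features(height, width, filters=(32, 64, 128)):
    """Return (height, width, flattened size) after all convolution/pooling blocks."""
    for _ in filters:
        height = conv_pool_output(height)
        width = conv_pool_output(width)
    return height, width, height * width * filters[-1]


print(flattened_features(45, 125))  # (3, 13, 4992)
```

For the 45x125 images in the sample configuration, the Flatten layer therefore feeds 4992 values into the 1024-unit Dense layer. Much smaller images would shrink to zero height or width before the third block, so the image size in the configuration cannot be chosen arbitrarily.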

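2.5 Overview of the code's main functions and structure

Code functions

Data preprocessing:
- label_to_array: converts a CAPTCHA text label into a vector of concatenated one-hot groups, one group per character.
- array_to_label: converts an array of per-character class indices (the argmax of each group) back into a text label.
- load_image_data: loads the images from a directory as grayscale (RGB images are converted) and extracts each label from the file name.
Model definition:
- FixCaptchaLengthModel: defines a convolutional neural network (CNN) for recognizing fixed-length CAPTCHAs.
- The model stacks three convolution and pooling blocks followed by two fully connected layers; the last layer outputs the prediction.
Training:
- train function: loads the training and validation data, builds the model, monitors training with callbacks (such as CheckAccuracyCallback), and saves the best weights.
Prediction:
- Predictor class: loads the trained model and predicts single images.
- Images can come from a local file, raw bytes, or a remote URL; the output is the predicted CAPTCHA text.
Configuration management:
- Config class: loads the training and prediction parameters from a JSON file, such as image size, label set, and learning rate.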
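
The label encoding described above can be illustrated without NumPy. This is a simplified sketch using plain lists, not the project's own functions: encode mirrors label_to_array, while decode combines the argmax step and array_to_label into one hypothetical helper.

```python
LABELS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(text, labels):
    """Concatenate one one-hot group of len(labels) entries per character."""
    vector = [0] * (len(labels) * len(text))
    for position, char in enumerate(text):
        vector[position * len(labels) + labels.index(char)] = 1
    return vector


def decode(vector, labels):
    """Take the argmax of each group and map it back to a character."""
    chars = []
    for start in range(0, len(vector), len(labels)):
        group = vector[start:start + len(labels)]
        chars.append(labels[group.index(max(group))])
    return "".join(chars)


encoded = encode("abce", LABELS)
print(len(encoded), sum(encoded))  # 248 4
print(decode(encoded, LABELS))     # abce
```

The same decode step works on the model's raw sigmoid outputs, because only the largest score within each group matters.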

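Code structure

Utility functions:
- label_to_array and array_to_label convert between labels and vectors.
- load_image_data loads and preprocesses the image data.
Model class:
- FixCaptchaLengthModel defines the CNN structure and its training parameters.
- Its model method builds the model, and its load_from_disk method loads saved weights.
Callback class:
- CheckAccuracyCallback: at the end of each epoch, computes and prints the whole-image accuracy on one training batch and on the validation set.
Configuration class:
- The Config class loads and holds the training and prediction parameters.
- It supports loading the configuration from a JSON file.
Prediction class:
- The Predictor class loads the model and runs predictions.
- It can predict both local and remote images.
Main function:
- The train function starts model training.
- The sample code also shows (commented out) how to use the Predictor class for prediction.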
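
CheckAccuracyCallback is useful because the binary_accuracy metric reported by Keras counts each of the 248 outputs separately, and almost all of them are 0 for any label. The sketch below is a plain-Python illustration written for this article, not project code. It compares the two measures for a hypothetical model that always outputs zeros.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual outputs that fall on the correct side of the threshold."""
    hits = sum((p > threshold) == (t == 1) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def captcha_correct(y_true, y_pred, num_classes):
    """True only if the argmax of every per-character group matches."""
    for start in range(0, len(y_true), num_classes):
        true_group = y_true[start:start + num_classes]
        pred_group = y_pred[start:start + num_classes]
        if true_group.index(max(true_group)) != pred_group.index(max(pred_group)):
            return False
    return True


NUM_CLASSES, LENGTH = 62, 4
y_true = [0] * (NUM_CLASSES * LENGTH)
for position, class_index in enumerate([10, 11, 12, 14]):  # indices of "abce"
    y_true[position * NUM_CLASSES + class_index] = 1
y_pred = [0.0] * len(y_true)  # a useless model that always outputs zeros

print(round(binary_accuracy(y_true, y_pred), 4))     # 0.9839
print(captcha_correct(y_true, y_pred, NUM_CLASSES))  # False
```

A binary accuracy near 98% early in training therefore says little; the whole-image accuracy printed by the callback is the number to watch.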

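Notes

Dependencies: the code depends on TensorFlow, Keras, Pillow, NumPy, and related libraries; make sure they are installed correctly.

Data format:
- Image file names should follow the format label_xxxx.jpg or label_xxxx.png, where label is the CAPTCHA text.
- Training and validation images should be stored in their respective configured directories.
Model save path:
- The model weights are saved to the path specified in the configuration file.
Prediction:
- Prediction supports local and remote images; remote images are loaded by URL.
Configuration file:
- The configuration file (for example fixed_length_captcha.json) should contain the parameters needed for training and prediction, such as image size, label set, and learning rate.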
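
One detail about the save path: load_from_disk looks for a file named save_path plus ".index", because the weights are stored as a TensorFlow checkpoint (an index file plus data files) rather than as a single file at save_path. A small helper along these lines can test whether saved weights are present. The function name is hypothetical, and the temporary directory only stands in for a real model folder.

```python
import os
import tempfile


def weights_checkpoint_exists(save_path):
    """Return True if a checkpoint index file exists for this path prefix."""
    return os.path.exists(save_path + ".index")


with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = os.path.join(tmp_dir, "model.dat")
    print(weights_checkpoint_exists(prefix))  # False
    open(prefix + ".index", "wb").close()     # stand-in for a real checkpoint index
    print(weights_checkpoint_exists(prefix))  # True
```

Checking the bare save_path with os.path.exists reports a missing model even after a successful training run, which is worth remembering when resuming training.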

3. Calling the model

import captcha

# Image path
image_path = r'C:\Users\Administrator\Desktop\get.jpg'
predictor = captcha.Predictor()
# Predict a file on the local disk and print the recognized text
print(predictor.predict(image_path))
# # Predict directly from binary content
# predictor.predict_single_image_content(b'PNGxxxxx')
# # Predict a remote image
# predictor.predict_remote_image('http://xxxxxx/xx.jpg', save_image_to_file='remote.jpg')