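Deep learning is increasingly used in computer vision, and automatic recognition of handwritten digits and symbols matters in education, finance, postal services, and other fields. This post walks through building a YOLO-based handwritten digit and symbol recognition system, covering data preparation, model training, web UI development, and deployment.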

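I. Introduction
Background and Motivation

An automatic recognition system for handwritten digits and symbols can speed up information processing considerably and reduce manual data-entry errors. With deep learning, and in particular the YOLO (You Only Look Once) object detection model, handwritten characters can be recognized efficiently and accurately.

Goals
  • Build a YOLO-based deep learning model that accurately recognizes handwritten digits and symbols
  • Develop a user-friendly web interface where users upload an image and get the recognition result
  • Deploy the system for real-time online recognition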
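II. Technical Approach
Development Environment
  • Operating system: Windows/Linux/macOS
  • Language: Python 3.8+
  • Tools: PyCharm/VSCode
  • Deep learning framework: PyTorch
  • Web framework: Flask
Installing Dependencies

First, create a new Python virtual environment and install the required libraries: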

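conda create -n handwriting_recognition python=3.8
conda activate handwriting_recognition
pip install torch torchvision torchaudio
pip install flask opencv-python pandas scikit-learn
# train.py lives in the YOLOv5 repository, so clone it and install its requirements
git clone https://github.com/ultralytics/yolov5
pip install -r yolov5/requirements.txt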
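After installation, a quick check can confirm that the environment is complete before any training starts. The sketch below is only one way to do this; `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list holds the import names of the packages this post relies on (for example, opencv-python is imported as `cv2` and scikit-learn as `sklearn`).

```python
import importlib.util

# Import names (not pip names) of the packages used in this post
REQUIRED_MODULES = ["torch", "torchvision", "flask", "cv2", "pandas", "sklearn"]

def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules from `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found" if not missing else f"Missing: {', '.join(missing)}")
```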
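III. Data Preparation
Obtaining the Dataset

Use a public handwritten digit or symbol dataset such as MNIST or Kuzushiji-MNIST. Both are publicly available for download.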
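MNIST-style datasets provide one small character crop per image with a class label but no bounding boxes, while a detector needs both an image and box annotations. One possible way to bridge that gap (an assumption of this sketch, not a step the post prescribes) is to paste several crops onto a blank canvas and record where each one lands. The sketch uses nested lists as grayscale images so it runs without extra libraries; `compose_canvas` and its parameters are hypothetical, and the returned rows use the `x_min`, `y_min`, `x_max`, `y_max`, `class_id` column names that the annotation file in this post assumes.

```python
import random

def compose_canvas(crops, canvas_w=128, canvas_h=128, max_tries=50, seed=None):
    """Paste (pixels, class_id) crops onto a blank canvas without overlap.

    `pixels` is a list of rows of grayscale values. Returns (canvas, rows), where
    each row is a dict with x_min, y_min, x_max, y_max (pixels) and class_id.
    A crop that finds no free position within `max_tries` is skipped.
    """
    rng = random.Random(seed)
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    rows = []
    for pixels, class_id in crops:
        h, w = len(pixels), len(pixels[0])
        for _ in range(max_tries):
            x0 = rng.randint(0, canvas_w - w)
            y0 = rng.randint(0, canvas_h - h)
            # Accept the position only if it does not intersect an earlier box
            if all(x0 >= r["x_max"] or x0 + w <= r["x_min"] or
                   y0 >= r["y_max"] or y0 + h <= r["y_min"] for r in rows):
                for dy in range(h):
                    canvas[y0 + dy][x0:x0 + w] = pixels[dy]
                rows.append({"x_min": x0, "y_min": y0, "x_max": x0 + w,
                             "y_max": y0 + h, "class_id": class_id})
                break
    return canvas, rows

# Two dummy 28x28 squares stand in for real MNIST crops of the digits 3 and 7
dummy = [([[255] * 28 for _ in range(28)], 3), ([[128] * 28 for _ in range(28)], 7)]
canvas, rows = compose_canvas(dummy, seed=0)
print(rows)
```

With real data, each canvas would be saved as an image file and its rows appended to the annotation table.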

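Data Preprocessing

Convert the dataset to YOLO format, which pairs each image file with a label file of the same name. Each line of a label file describes one object, with the box center and size normalized to the range 0-1 relative to the image width and height: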

<class_id> <x_center> <y_center> <width> <height>

Write a preprocessing script that converts the annotations to YOLO format:

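import os
import pandas as pd
from PIL import Image

def convert_to_yolo_format(df, output_dir):
    """Write one YOLO label file per image; each box becomes one line."""
    os.makedirs(output_dir, exist_ok=True)
    # Group by image so that an image with several boxes keeps all of them
    # (writing per row in 'w' mode would overwrite earlier boxes)
    for image_id, rows in df.groupby('image_id'):
        with Image.open(rows.iloc[0]['image_path']) as img:
            width, height = img.size
        lines = []
        for _, row in rows.iterrows():
            # Normalize the box center and size by the image dimensions
            x_center = (row['x_min'] + row['x_max']) / 2 / width
            y_center = (row['y_min'] + row['y_max']) / 2 / height
            w = (row['x_max'] - row['x_min']) / width
            h = (row['y_max'] - row['y_min']) / height
            lines.append(f"{int(row['class_id'])} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}")

        label_path = os.path.join(output_dir, f"{image_id}.txt")
        with open(label_path, 'w') as f:
            f.write("\n".join(lines) + "\n")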

# Example usage
# df = pd.read_csv('annotations.csv')  # Assuming annotations.csv has columns: image_id, image_path, x_min, y_min, x_max, y_max, class_id
# convert_to_yolo_format(df, 'labels/')
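To spot-check a generated label file, it can help to convert a line back into pixel coordinates and compare it with the original annotation. The helper below is a small sketch of that inverse mapping; `yolo_line_to_box` is a hypothetical name, and the sample line is hand-written for illustration.

```python
def yolo_line_to_box(line, img_w, img_h):
    """Parse one YOLO label line into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

# A 28x28 box with its top-left corner at (50, 36) in a 128x128 image:
# x_center = (50 + 78) / 2 / 128 = 0.5, width = 28 / 128 = 0.21875
print(yolo_line_to_box("3 0.5 0.390625 0.21875 0.21875", 128, 128))
# prints (3, 50.0, 36.0, 78.0, 64.0)
```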
Splitting the Dataset

Split the dataset into training, validation, and test sets:

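import os
import random
import shutil

def split_dataset(image_dir, label_dir, output_dir, train_ratio=0.7, val_ratio=0.2, seed=42):
    # Shuffle the image names once and move each image together with its label,
    # so that a pair never ends up in two different subsets
    images = sorted(os.listdir(image_dir))
    random.Random(seed).shuffle(images)
    train_count = int(len(images) * train_ratio)
    val_count = int(len(images) * val_ratio)

    for i, image_name in enumerate(images):
        if i < train_count:
            subset = 'train'
        elif i < train_count + val_count:
            subset = 'val'
        else:
            subset = 'test'
        label_name = os.path.splitext(image_name)[0] + '.txt'
        for src_dir, kind, name in ((image_dir, 'images', image_name),
                                    (label_dir, 'labels', label_name)):
            dst_dir = os.path.join(output_dir, subset, kind)
            os.makedirs(dst_dir, exist_ok=True)
            src = os.path.join(src_dir, name)
            if os.path.exists(src):  # an image without objects may have no label file
                shutil.move(src, os.path.join(dst_dir, name))

# Produces data/{train,val,test}/images and data/{train,val,test}/labels
split_dataset('data/images', 'data/labels', 'data')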
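Since YOLOv5 pairs each image with the label file of the same name, an image whose label landed in a different subset is typically treated as containing no objects. A quick consistency check after splitting can catch this. The following is a minimal sketch; `unpaired_files` and `IMAGE_EXTENSIONS` are hypothetical names, and the demo runs on a temporary directory rather than real data.

```python
import os
import tempfile

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}

def unpaired_files(image_dir, label_dir):
    """Return (images without a label, labels without an image) as sorted lists of stems."""
    image_stems = {os.path.splitext(f)[0] for f in os.listdir(image_dir)
                   if os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS}
    label_stems = {os.path.splitext(f)[0] for f in os.listdir(label_dir)
                   if f.endswith(".txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)

# Self-contained demo on a throwaway directory
with tempfile.TemporaryDirectory() as root:
    img_dir, lbl_dir = os.path.join(root, "images"), os.path.join(root, "labels")
    os.makedirs(img_dir)
    os.makedirs(lbl_dir)
    for name in ("a.png", "b.png"):
        open(os.path.join(img_dir, name), "w").close()
    for name in ("a.txt", "c.txt"):
        open(os.path.join(lbl_dir, name), "w").close()
    print(unpaired_files(img_dir, lbl_dir))  # prints (['b'], ['c'])
```

Running it on each of `data/train`, `data/val`, and `data/test` should return two empty lists when every image has its label next to it.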
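IV. Model Training
Configuring the YOLO Model

Download the YOLOv5 pretrained weights (train.py fetches yolov5s.pt automatically if it is not found locally) and create the dataset configuration file: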

# handwrite.yaml
train: data/train
val: data/val
nc: 10  # number of classes (0-9 for MNIST, adjust accordingly for symbols)
names: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
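When symbols are added to the class list, `nc` and `names` must stay in sync. One way to avoid a mismatch is to generate the file from a single list. The sketch below assumes a hypothetical `build_data_yaml` helper, and the four symbol classes in the usage line are illustrative only, not a class set defined by this post.

```python
import json

def build_data_yaml(names, train="data/train", val="data/val"):
    """Render a YOLOv5 dataset config whose nc always matches len(names)."""
    if len(set(names)) != len(names):
        raise ValueError("class names must be unique")
    return (
        f"train: {train}\n"
        f"val: {val}\n"
        f"nc: {len(names)}\n"
        # A JSON array is also a valid YAML flow sequence
        f"names: {json.dumps(names)}\n"
    )

# Digits plus four example symbols (illustrative class list)
classes = [str(d) for d in range(10)] + ["+", "-", "=", "x"]
print(build_data_yaml(classes))
```

The class order matters: the position of a name in the list is the `class_id` written into the label files.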
Training

Run the following command from the YOLOv5 repository directory to start training:

python train.py --img 640 --batch 16 --epochs 50 --data handwrite.yaml --cfg yolov5s.yaml --weights yolov5s.pt
Evaluation

Evaluate the model on the validation set and tune hyperparameters as needed:

from sklearn.metrics import accuracy_score, recall_score, f1_score

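# Illustrative placeholder values; replace them with the class IDs collected on the validation set
y_true = [0, 1, 2, 2, 1]  # true labels
y_pred = [0, 1, 2, 1, 1]  # predicted labels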

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred, average='macro')
f1 = f1_score(y_true, y_pred, average='macro')

print(f"Accuracy: {accuracy}, Recall: {recall}, F1 Score: {f1}")
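The metrics above expect `y_true` and `y_pred` to be aligned element by element, which a detector does not provide directly because it returns boxes. One common approach, shown here as a sketch under that assumption, is to match each ground-truth box with the unused prediction that overlaps it most (by intersection over union) and to count unmatched ground truths as misses. `iou`, `match_labels`, the 0.5 threshold, and the `missed` class are all choices made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_labels(truths, preds, iou_threshold=0.5, missed=-1):
    """Pair each ground-truth (box, class_id) with its best-overlapping prediction.

    Returns (y_true, y_pred); a ground truth with no prediction at or above the
    threshold gets the `missed` class so that it counts as an error.
    """
    y_true, y_pred, used = [], [], set()
    for t_box, t_cls in truths:
        best_j, best_iou = None, iou_threshold
        for j, (p_box, _) in enumerate(preds):
            overlap = iou(t_box, p_box)
            if j not in used and overlap >= best_iou:
                best_j, best_iou = j, overlap
        y_true.append(t_cls)
        if best_j is None:
            y_pred.append(missed)
        else:
            used.add(best_j)
            y_pred.append(preds[best_j][1])
    return y_true, y_pred

truths = [((0, 0, 10, 10), 3), ((20, 20, 30, 30), 7)]
preds = [((1, 1, 10, 10), 3), ((50, 50, 60, 60), 1)]
print(match_labels(truths, preds))  # prints ([3, 7], [3, -1])
```

Note that this pairing ignores extra predictions that match no ground truth, so it measures recognition of the annotated characters rather than false alarms.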
V. Web UI Development
Building the Web Application with Flask
  1. Create the project directory structure:
handwriting_recognition/
├── app.py
├── templates/
│   ├── index.html
│   └── result.html
├── static/
│   └── uploads/
└── models/
    └── yolov5s.pt
  2. Write the page templates:
  • index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Handwriting Recognition</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='styles.css') }}">
</head>
<body>
    <h1>Handwriting Recognition</h1>
    <form action="/predict" method="post" enctype="multipart/form-data">
        <input type="file" name="file">
        <button type="submit">Upload</button>
    </form>
</body>
</html>
  • result.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Result</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='styles.css') }}">
</head>
<body>
    <h1>Detection Result</h1>
    <img src="{{ url_for('static', filename='uploads/' + filename) }}" alt="Uploaded Image">
    <p>{{ result }}</p>
</body>
</html>
Implementing the Backend Logic
  • app.py
import os

import torch
from flask import Flask, render_template, request
from PIL import Image
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'static/uploads/'
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

# Load the weights once at startup instead of on every request
model = torch.hub.load('ultralytics/yolov5', 'custom', path='models/yolov5s.pt')

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    if 'file' not in request.files:
        return 'No file part', 400
    file = request.files['file']
    # secure_filename strips path components and unsafe characters
    filename = secure_filename(file.filename or '')
    if filename == '':
        return 'No selected file', 400

    filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    file.save(filepath)
    with Image.open(filepath) as img:
        results = model(img)

    # Depending on the YOLOv5 version, results.save() may write into a new
    # auto-numbered directory, so draw the boxes with render() and overwrite
    # the uploaded file that result.html displays
    Image.fromarray(results.render()[0]).save(filepath)
    detections = results.pandas().xyxy[0].to_json(orient="records")
    return render_template('result.html', filename=filename, result=detections)

if __name__ == '__main__':
    # Flask's built-in server with debug=True is for local development only
    app.run(debug=True)
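The result page shows the raw detection JSON, which is hard to read for an end user. As a possible refinement, the detections can be sorted from left to right and joined into the recognized string. This sketch assumes a single line of handwriting and the record layout that `results.pandas().xyxy[0]` produces (`xmin`, `ymin`, `xmax`, `ymax`, `confidence`, `class`, `name`); `detections_to_text`, its `min_confidence` parameter, and the sample records are hypothetical.

```python
import json

def detections_to_text(detections_json, min_confidence=0.25):
    """Join detected class names from left to right into a single string."""
    detections = json.loads(detections_json)
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    kept.sort(key=lambda d: d["xmin"])  # reading order for one line of writing
    return "".join(d["name"] for d in kept)

# Hand-written sample records for illustration, not real model output
sample = json.dumps([
    {"xmin": 120.0, "ymin": 10.0, "xmax": 150.0, "ymax": 50.0,
     "confidence": 0.91, "class": 5, "name": "5"},
    {"xmin": 20.0, "ymin": 12.0, "xmax": 48.0, "ymax": 52.0,
     "confidence": 0.88, "class": 3, "name": "3"},
    {"xmin": 70.0, "ymin": 15.0, "xmax": 90.0, "ymax": 45.0,
     "confidence": 0.10, "class": 1, "name": "1"},
])
print(detections_to_text(sample))  # prints 35
```

In `predict`, the returned string could be passed to the template in place of, or alongside, the JSON.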
VI. Deployment
Deploying to a Cloud Server
  1. Deploy with Gunicorn
pip install gunicorn
gunicorn -w 4 -b 127.0.0.1:8000 app:app
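The command-line flags can also be kept in a Gunicorn configuration file, which is itself a Python module. The values below are a starting point under stated assumptions rather than tuned settings: the worker cap of 4 and the 120-second timeout are illustrative choices.

```python
# gunicorn.conf.py (start with: gunicorn -c gunicorn.conf.py app:app)
import multiprocessing

# Address that the Nginx proxy_pass directive forwards to
bind = "127.0.0.1:8000"

# Every worker process imports app.py and therefore holds its own copy of the
# model, so cap the worker count instead of scaling it freely with CPU cores
workers = min(4, multiprocessing.cpu_count())

# Seconds a worker may spend on one request before Gunicorn restarts it;
# inference on CPU can exceed the default of 30
timeout = 120
```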

  2. Configure an Nginx reverse proxy
server {
    listen 80;
    server_name your_domain;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}