How a Beginner Can Use a Neural Network to Recognize Handwritten Digits

Preface: The goal of this post is to recognize handwritten digits with a neural network, while pushing the recognition accuracy as high as possible and making the model more robust. The work focuses on data augmentation and on converting uploaded images into the MNIST data format.


About Neural Networks

A neural network is a machine learning model: by feeding it large amounts of data as training and test sets, it can eventually recognize or classify things somewhat like a human brain.
For more detail, see 神经网络——最易懂最清晰的一篇文章-CSDN博客.

1. Recognizing a Single Handwritten Digit with a Neural Network

1.1 Downloading the MNIST Dataset

[Pytorch系列-29]:神经网络基础 - 全连接浅层神经网络实现10分类手写数字识别_手写数字识别(10分类),利用卷积层、池化层、bn层、激活函数、残差连接等搭建网络,-CSDN博客

Part of the neural-network code used here can be found in that blog post.

Sometimes the MNIST download is very slow, so we may need code that switches the download to a mirror source (the original post showed this as two screenshots that belong together):

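Since those snippets appeared only as screenshots, here is a sketch based on the mirror-switching logic that also appears in the complete code of section 1.5 (the S3 mirror below is the one used there; a Tsinghua or other mirror URL could be added to the list in the same way):

from torchvision.datasets.utils import download_and_extract_archive

# Candidate mirrors; for each file, the first mirror that works is used.
mirrors = [
    "http://yann.lecun.com/exdb/mnist/",
    "https://ossci-datasets.s3.amazonaws.com/mnist/",
]

resources = [
    ("train-images-idx3-ubyte.gz", "f68b3c2dcbeaaa9fbdd348bbdeb94873"),
    ("train-labels-idx1-ubyte.gz", "d53e105ee54ea40749a09fcbcd1e9432"),
    ("t10k-images-idx3-ubyte.gz", "9fb629c4189551a2d022fa330f9573f3"),
    ("t10k-labels-idx1-ubyte.gz", "ec29112dd5afa0611ce80d1b7f02629c"),
]

for filename, md5 in resources:
    for mirror in mirrors:
        try:
            download_and_extract_archive(mirror + filename,
                                          download_root="mnist/MNIST/raw",
                                          filename=filename, md5=md5)
            break
        except Exception as e:
            print(f"Failed to download from {mirror + filename}: {e}")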
Then we can print a batch of images to check that the training images look right, as sketched below:
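A minimal sketch of this step, mirroring the visualization in the complete code of section 1.5 (train_loader is the DataLoader defined in the next subsection):

import matplotlib.pyplot as plt
import torchvision.utils as utils

imgs, labels = next(iter(train_loader))        # one batch of shape (64, 1, 28, 28)
grid = utils.make_grid(imgs)                   # tile the batch into a single image
plt.imshow(grid.numpy().transpose(1, 2, 0), cmap='gray')
plt.show()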

(example output)

1.2 Training and Testing the Network (Including the Loss Function and Optimizer)

There is quite a lot of code in this part, so I will only outline the main steps here. If you want more of the underlying theory, the following post walks through this code:

[Pytorch系列-29]:神经网络基础 - 全连接浅层神经网络实现10分类手写数字识别_手写数字识别(10分类),利用卷积层、池化层、bn层、激活函数、残差连接等搭建网络,-CSDN博客

1. Data loader: DataLoader batches the dataset (the shuffle flag controls whether the sample order is shuffled), as sketched below.
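A sketch of the loaders, as they appear in the complete code of section 1.5 (train_data and test_data are the torchvision MNIST datasets):

import torch.utils.data as data_utils

# shuffle=True randomizes the order of training samples each epoch
train_loader = data_utils.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
test_loader = data_utils.DataLoader(dataset=test_data, batch_size=64, shuffle=False)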

2. Define a simple fully connected network (or a convolutional network).

(accuracy comparison)

For the differences between fully connected and convolutional networks, see: 机器学习入门(15)— 全连接层与卷积层的区别、卷积神经网络结构、卷积运算、填充、卷积步幅、三维数据卷积、多维卷积核运算以及批处理_全连接层和卷积层的区别-CSDN博客
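For reference, the two-layer fully connected model used in the complete code of section 1.5 (784 flattened pixels, one hidden layer, 10 output classes):

import torch
import torch.nn as nn

class NetB(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(NetB, self).__init__()
        self.h1 = nn.Linear(n_feature, n_hidden)    # hidden layer
        self.out = nn.Linear(n_hidden, n_output)    # output layer
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(x.size()[0], -1)    # flatten (N, 1, 28, 28) -> (N, 784)
        return self.softmax(self.out(self.h1(x)))

model_b = NetB(28 * 28, 64, 10)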

3. Define the loss function and the optimizer:
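As in the complete code of section 1.5, cross-entropy loss is paired with SGD plus momentum; Adam is kept as a commented-out alternative:

loss_fn = nn.CrossEntropyLoss()

learning_rate = 0.01
optimizer = torch.optim.SGD(model_b.parameters(), lr=learning_rate, momentum=0.9)
# optimizer = torch.optim.Adam(model_b.parameters(), lr=learning_rate)  # alternative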

4. Then train the model. Training is usually time-consuming, but this is a simple problem, so we only train for 5 epochs, which saves a lot of time. A condensed version of the loop is sketched below.
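This sketch condenses the training loop from the complete code of section 1.5 (5 epochs over train_loader, recording loss and batch accuracy):

epochs = 5
loss_history, accuracy_history = [], []

for epoch in range(epochs):
    for x_train, y_train in train_loader:
        optimizer.zero_grad()              # reset gradients
        y_pred = model_b(x_train)          # forward pass
        loss = loss_fn(y_pred, y_train)    # compute loss
        loss.backward()                    # backpropagate
        optimizer.step()                   # update weights

        loss_history.append(loss.item())
        _, predicted = torch.max(y_pred.data, dim=1)
        accuracy_history.append(100 * (predicted == y_train).sum().item() / y_train.size(0))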

Plot the loss and accuracy curves recorded during training:
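Plotting the two recorded histories, as in section 1.5:

plt.grid()
plt.xlabel("iters")
plt.title("loss")
plt.plot(loss_history, "r")
plt.show()

plt.grid()
plt.xlabel("iters")
plt.ylabel("%")
plt.title("accuracy")
plt.plot(accuracy_history, "b+")
plt.show()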

(these plots appear when you run the code)

5. Finally, evaluate the model on the test set and compute the overall accuracy as well as the accuracy for each digit class.

(The posts linked above print only the overall accuracy after running, but in practice I found it best to also add the per-class code below: when the overall accuracy is low, it shows you which digit class is being recognized poorly.)
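That per-class bookkeeping, as it appears in the complete code of section 1.5 (model_b and test_loader as defined there):

class_correct = [0] * 10   # correct predictions per digit class
class_total = [0] * 10     # number of samples per digit class

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model_b(images)
        _, predicted = torch.max(outputs.data, 1)
        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            if label == pred:
                class_correct[label.item()] += 1

print("\nClass-wise accuracy:")
for i in range(10):
    if class_total[i] > 0:
        print(f'Class {i}: Accuracy = {100 * class_correct[i] / class_total[i]:.2f}%')
    else:
        print(f'Class {i}: No samples')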

(result)

1.3 Uploading and Preprocessing a Handwritten Digit Image

After uploading a custom image, we need to convert it into something close to the training images, i.e. the MNIST format. The conversion has to satisfy a few conditions (these are properties of the MNIST dataset itself):

For a detailed introduction, see mnist数据集转换为图片+测试手写字的demo_如何用pil制作mnist 测试用图-CSDN博客

(1) The image must be 8-bit grayscale (256 gray levels)
(2) White digit on a black background
(3) 28×28 pixels
(4) The digit should sit in the middle of the image

The conversion code in the original post follows these requirements; let's go through the key steps:
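The original code was shown only as a screenshot; a minimal sketch covering the four conditions might look like the following (the function name to_mnist_format and its path argument are mine; centering and binarization are handled later in this section):

import numpy as np
from PIL import Image, ImageOps

def to_mnist_format(image_path):
    img = Image.open(image_path)
    img = img.convert('L')                      # (1) 8-bit grayscale
    img = ImageOps.invert(img)                  # (2) white digit on a black background
    img = img.resize((28, 28), Image.LANCZOS)   # (3) 28x28 pixels
    # (4) centering the digit is discussed further below
    return np.array(img).astype(np.float32) / 255.0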

First, the defining trait of MNIST: white digits on a black background. So when writing digits by hand, use plain white paper (not grid paper), otherwise inverting the image colors works poorly.

Printing the converted image at this point:

The prediction is wrong.

Some more converted images:

These digits are recognized with very low accuracy because they come out too dark after conversion.

So we need to boost the brightness of the digits, which is where binarization comes in. For background, see:

[2] 图像处理之----二值化处理-CSDN博客

Here is a comparison of using an adaptive threshold directly versus using Otsu's binarization:
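A sketch of the two options with OpenCV (img is assumed to be a grayscale uint8 array; the blockSize and C values of adaptiveThreshold are illustrative):

import cv2 as cv

# Adaptive threshold: a separate threshold is computed for each neighbourhood
adaptive = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv.THRESH_BINARY, 11, 2)

# Otsu: one global threshold chosen automatically from the histogram
_, otsu = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)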

(The first image is the original I uploaded, drawn with an online paint tool.)

They look similar, but in some cases using an adaptive threshold alone gives very poor results.

The result is poor, so I recommend Otsu's binarization instead.

Even then, some recognition errors remain when we test again:

What these failures have in common: the digit in the uploaded image is not centered, so after conversion the digit is not in the middle either, and the strokes that do sit near the center happen to resemble the digit that was predicted. Combined with the fact that MNIST digits sit in the middle of the image, we can guess that the errors come from our digits not being centered.

There are two ways to fix this: the first is to move the digit to the center after converting to MNIST format; the second is to use data augmentation to make the model more robust. We start with the first approach.

The first approach can itself be implemented in two ways. One is to move the digit's centroid to the center of the image; for background on centroids see 质心计算公式_质心公式-CSDN博客.

The code:

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

# 加载图像并进行预处理
image_path = "D:\\pythonProject8\\dataset\\number1\\number03.png"
image = cv.imread(image_path, cv.IMREAD_GRAYSCALE)

# 高斯模糊
image = cv.GaussianBlur(image, (5, 5), 0)

# 二值化图像
_, binary_image = cv.threshold(image, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)

# 查找轮廓
contours, _ = cv.findContours(binary_image, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

# 添加检查条件
if not contours:
    print("无数字")
    exit()

# 按x坐标对轮廓排序
contours = sorted(contours, key=lambda c: cv.boundingRect(c)[0])

# Ensure the final image size is 28x28
digit_images = []
for contour in contours:
    x, y, w, h = cv.boundingRect(contour)
    digit_image = binary_image[y:y+h, x:x+w]

    # 计算质心
    M = cv.moments(digit_image)
    if M["m00"] != 0:
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
    else:
        cX, cY = w // 2, h // 2

    # 创建28x28的空白图像
    img_padded = np.zeros((28, 28), dtype=np.uint8)

    # 计算平移量
    start_x = 14 - cX
    start_y = 14 - cY

    # 确保平移后的图像在边界内
    translation_matrix = np.float32([[1, 0, start_x], [0, 1, start_y]])
    digit_image = cv.warpAffine(digit_image, translation_matrix, (28, 28), borderValue=(0, 0, 0))

    # 归一化图像
    digit_image = digit_image.astype(np.float32) / 255.0
    digit_images.append(digit_image)

# 显示分割后的数字图像
for idx, digit_image in enumerate(digit_images):
    plt.subplot(1, len(digit_images), idx + 1)
    plt.imshow(digit_image, cmap='gray')
    plt.axis('off')
plt.show()
The second way is to scale the digit down and place the resized digit directly at the center of the image.

The code:

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

# 加载图像并进行预处理
image_path = "D:\\pythonProject8\\dataset\\number1\\number03.png"
image = cv.imread(image_path, cv.IMREAD_GRAYSCALE)

# 高斯模糊
image = cv.GaussianBlur(image, (5, 5), 0)

# 二值化图像
_, binary_image = cv.threshold(image, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)

# 查找轮廓
contours, _ = cv.findContours(binary_image, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

# 添加检查条件
if not contours:
    print("无数字")
    exit()

# 按x坐标对轮廓排序
contours = sorted(contours, key=lambda c: cv.boundingRect(c)[0])

# Ensure the final image size is 28x28
digit_images = []
for contour in contours:
    x, y, w, h = cv.boundingRect(contour)
    digit_image = binary_image[y:y+h, x:x+w]

    # 创建28x28的空白图像
    img_padded = np.zeros((28, 28), dtype=np.uint8)

    # 计算缩放因子
    scale = min(20 / w, 20 / h)
    new_w = int(w * scale)
    new_h = int(h * scale)

    # 确保数字在中心
    digit_image = cv.resize(digit_image, (new_w, new_h), interpolation=cv.INTER_AREA)
    start_x = (28 - new_w) // 2
    start_y = (28 - new_h) // 2
    img_padded[start_y:start_y + new_h, start_x:start_x + new_w] = digit_image

    # 归一化图像
    digit_image = img_padded.astype(np.float32) / 255.0
    digit_images.append(digit_image)

# 显示分割后的数字图像
for idx, digit_image in enumerate(digit_images):
    plt.subplot(1, len(digit_images), idx + 1)
    plt.imshow(digit_image, cmap='gray')
    plt.axis('off')
plt.show() 

1.4 Outputting the Prediction

Print the predicted digit.

Below is a successful example.

1.5 Complete Code

Two-layer fully connected network:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torchvision.datasets.utils import download_and_extract_archive
import torchvision.utils as utils
import torch.utils.data as data_utils
from PIL import Image, ImageOps
import os
import cv2 as cv

# Modify MNIST dataset download URL
mirrors = [
    "http://yann.lecun.com/exdb/mnist/",
    "https://ossci-datasets.s3.amazonaws.com/mnist/"
]

resources = [
    ("train-images-idx3-ubyte.gz", "f68b3c2dcbeaaa9fbdd348bbdeb94873"),
    ("train-labels-idx1-ubyte.gz", "d53e105ee54ea40749a09fcbcd1e9432"),
    ("t10k-images-idx3-ubyte.gz", "9fb629c4189551a2d022fa330f9573f3"),
    ("t10k-labels-idx1-ubyte.gz", "ec29112dd5afa0611ce80d1b7f02629c")
]

for filename, md5 in resources:
    for mirror in mirrors:
        url = f"{mirror}{filename}"
        try:
            download_and_extract_archive(url, download_root="mnist/MNIST/raw", filename=filename, md5=md5)
            break
        except Exception as e:
            print(f"Failed to download from {url}. Error: {e}")

print("Hello World")
print(torch.__version__)
print(torch.cuda.is_available())

# Use California housing dataset
housing = fetch_california_housing()

# Data preprocessing
scaler = MinMaxScaler()
data = scaler.fit_transform(housing.data)
target = housing.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=42)

# Prepare MNIST dataset
train_data = datasets.MNIST(root="mnist",
                            train=True,
                            transform=transforms.ToTensor(),
                            download=False)

test_data = datasets.MNIST(root="mnist",
                           train=False,
                           transform=transforms.ToTensor(),
                           download=False)

# DataLoader for batch processing
train_loader = data_utils.DataLoader(dataset=train_data,
                                     batch_size=64,
                                     shuffle=True)

test_loader = data_utils.DataLoader(dataset=test_data,
                                    batch_size=64,
                                    shuffle=True)

# Display a batch of images
imgs, labels = next(iter(train_loader))
images = utils.make_grid(imgs)
images = images.numpy().transpose(1, 2, 0)

plt.imshow(images, cmap='gray')
plt.show()


# 2-3 定义网络模型:两层全连接神经网络
class NetB(torch.nn.Module):
    # 定义神经网络
    def __init__(self, n_feature, n_hidden, n_output):
        super(NetB, self).__init__()
        self.h1 = nn.Linear(n_feature, n_hidden)
        self.out = nn.Linear(n_hidden, n_output)
        self.softmax = nn.Softmax(dim=1)

    # 定义前向运算
    def forward(self, x):
        # 得到的数据格式torch.Size([64, 1, 28, 28])需要转变为(64,784)
        x = x.view(x.size()[0], -1)  # -1表示自动匹配
        h1 = self.h1(x)
        out = self.out(h1)
        a_out = self.softmax(out)
        return a_out


model_b = NetB(28 * 28, 64, 10)
print(model_b)
print(model_b.parameters)
print(model_b.parameters())

# 2-4 定义网络预测输出
x_train, _ = next(iter(train_loader))
y_pred = model_b.forward(x_train)
print(y_pred.shape)

# 3-1 定义loss函数:
loss_fn = nn.CrossEntropyLoss()
print(loss_fn)

# 3-2 定义优化器

model = model_b

Learning_rate = 0.01  # 学习率

# optimizer = SGD: 基本梯度下降法
# parameters:指明要优化的参数列表
# lr:指明学习率
# optimizer = torch.optim.Adam(model.parameters(), lr = Learning_rate)
optimizer = torch.optim.SGD(model.parameters(), lr=Learning_rate, momentum=0.9)
print(optimizer)

# 3-3 模型训练
# 定义迭代次数
epochs = 5

loss_history = []  # 训练过程中的loss数据
accuracy_history = []  # 中间的预测结果

accuracy_batch = 0.0

for i in range(0, epochs):
    for j, (x_train, y_train) in enumerate(train_loader):

        # (0) 复位优化器的梯度
        optimizer.zero_grad()

        # (1) 前向计算
        y_pred = model(x_train)

        # (2) 计算loss
        loss = loss_fn(y_pred, y_train)

        # (3) 反向求导
        loss.backward()

        # (4) 反向迭代
        optimizer.step()

        # 记录训练过程中的损失值
        loss_history.append(loss.item())  # loss for a batch

        # 记录训练过程中的准确率
        number_batch = y_train.size()[0]  # 图片的个数
        _, predicted = torch.max(y_pred.data, dim=1)
        correct_batch = (predicted == y_train).sum().item()  # 预测正确的数目
        accuracy_batch = 100 * correct_batch / number_batch
        accuracy_history.append(accuracy_batch)

        if (j % 100 == 0):
            print('epoch {} batch {} In {} loss = {:.4f} accuracy = {:.4f}%'.format(i, j, len(train_data) / 64,
                                                                                     loss.item(), accuracy_batch))

print("\n迭代完成")
print("final loss =", loss.item())
print("final accu =", accuracy_batch)

# 显示loss的历史数据
plt.grid()
plt.xlabel("iters")
plt.ylabel("")
plt.title("loss", fontsize=12)
plt.plot(loss_history, "r")
plt.show()

# 显示准确率的历史数据
plt.grid()
plt.xlabel("iters")
plt.ylabel("%")
plt.title("accuracy", fontsize=12)
plt.plot(accuracy_history, "b+")
plt.show()

# 手工检查
index = 0
print("获取一个batch样本")
images, labels = next(iter(test_loader))
print(images.shape)
print(labels.shape)
print(labels)

print("\n对batch中所有样本进行预测")
outputs = model(images)
print(outputs.data.shape)

print("\n对batch中每个样本的预测结果,选择最可能的分类")
_, predicted = torch.max(outputs, 1)
print(predicted.data.shape)
print(predicted)

print("\n对batch中的所有结果进行比较")
bool_results = (predicted == labels)
print(bool_results.shape)
print(bool_results)

print("\n统计预测正确样本的个数和精度")
corrects = bool_results.sum().item()
accuracy = corrects / (len(bool_results))
print("corrects=", corrects)
print("accuracy=", accuracy)

print("\n样本index =", index)
print("标签值    :", labels[index].item())
print("分类可能性:", outputs.data[index].numpy())
print("最大可能性:", predicted.data[index].item())
print("正确性    :", bool_results.data[index].item())

# 对模型进行评估,测试其在训练集上的准确率
correct_dataset = 0
total_dataset = 0
accuracy_dataset = 0.0

# 进行评测的时候网络不更新梯度
with torch.no_grad():
    for i, data in enumerate(train_loader):
        # 获取一个batch样本"
        images, labels = data

        # 对batch中所有样本进行预测
        outputs = model(images)

        # 对batch中每个样本的预测结果,选择最可能的分类
        _, predicted = torch.max(outputs.data, 1)

        # 对batch中的样本数进行累计
        total_dataset += labels.size()[0]

        # 对batch中的所有结果进行比较"
        bool_results = (predicted == labels)

        # 统计预测正确样本的个数
        correct_dataset += bool_results.sum().item()

        # 统计预测正确样本的精度
        accuracy_dataset = 100 * correct_dataset / total_dataset

        if (i % 100 == 0):
            print('batch {} In {} accuracy = {:.4f}'.format(i, len(train_data) / 64, accuracy_dataset))

print('Final result with the model on the dataset, accuracy =', accuracy_dataset)

# 对模型进行评估,测试其在测试集上的准确率
correct_dataset = 0
total_dataset = 0
accuracy_dataset = 0.0

# 进行评测的时候网络不更新梯度
with torch.no_grad():
    for i, data in enumerate(test_loader):
        # 获取一个batch样本"
        images, labels = data

        # 对batch中所有样本进行预测
        outputs = model(images)

        # 对batch中每个样本的预测结果,选择最可能的分类
        _, predicted = torch.max(outputs.data, 1)

        # 对batch中的样本数进行累计
        total_dataset += labels.size()[0]

        # 对batch中的所有结果进行比较"
        bool_results = (predicted == labels)

        # 统计预测正确样本的个数
        correct_dataset += bool_results.sum().item()

        # 统计预测正确样本的精度
        accuracy_dataset = 100 * correct_dataset / total_dataset

        if (i % 100 == 0):
            print('batch {} In {} accuracy = {:.4f}'.format(i, len(test_data) / 64, accuracy_dataset))

print('Final result with the model on the dataset, accuracy =', accuracy_dataset)

# 对模型进行评估,测试其在测试集上的准确率
correct_dataset = 0
total_dataset = 0
accuracy_dataset = 0.0

# 用于跟踪每个数字类别的正确预测次数和总预测次数
class_correct = [0] * 10
class_total = [0] * 10

# 进行评测的时候网络不更新梯度
with torch.no_grad():
    for i, data in enumerate(test_loader):
        # 获取一个batch样本
        images, labels = data
        # 对batch中所有样本进行预测
        outputs = model(images)
        # 对batch中每个样本的预测结果,选择最可能的分类
        _, predicted = torch.max(outputs.data, 1)
        # 对batch中的样本数进行累计
        total_dataset += labels.size()[0]
        # 对batch中的所有结果进行比较
        bool_results = (predicted == labels)
        # 统计预测正确样本的个数
        correct_dataset += bool_results.sum().item()
        # 统计预测正确样本的精度
        accuracy_dataset = 100 * correct_dataset / total_dataset

        # 更新每个数字类别的正确预测次数和总预测次数
        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            if label == pred:
                class_correct[label.item()] += 1

        if (i % 100 == 0):
            print('batch {} In {} accuracy = {:.4f}'.format(i, len(test_data) / 64, accuracy_dataset))

print('Final result with the model on the dataset, accuracy =', accuracy_dataset)

# 输出每个数字类别的准确率
print("\nClass-wise accuracy:")
for i in range(10):
    if class_total[i] > 0:
        class_accuracy = 100 * class_correct[i] / class_total[i]
        print(f'Class {i}: Accuracy = {class_accuracy:.2f}%')
    else:
        print(f'Class {i}: No samples')


# Function to preprocess an uploaded image
def preprocess_image(image_path):
    img = Image.open(image_path)
    img = img.convert('L')  # 转换为灰度图像
    img = ImageOps.invert(img)  # 反转颜色,确保是黑底白字
    img = np.array(img)
    # 使用Otsu阈值法进行二值化
    _, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
    # 计算图像质心距
    M = cv.moments(img)
    if M['m00'] != 0:
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
    else:
        cx, cy = 14, 14
    center_x, center_y = img.shape[1] // 2, img.shape[0] // 2
    shift_x = center_x - cx
    shift_y = center_y - cy
    M_translation = np.float32([[1, 0, shift_x], [0, 1, shift_y]])
    img = cv.warpAffine(img, M_translation, (img.shape[1], img.shape[0]))
    img = Image.fromarray(img)
    img = img.resize((28, 28), Image.LANCZOS)  # 调整大小为28x28
    img_array = np.array(img).astype(np.float32) / 255.0  # 转换为numpy数组并归一化
    return img_array


# Load and preprocess an uploaded image
uploaded_image_path = "D:/pythonProject8/dataset/number/51.png"
uploaded_image_array = preprocess_image(uploaded_image_path)

# Display the processed image
plt.imshow(uploaded_image_array.squeeze(), cmap='gray')
plt.title("Uploaded Image")
plt.show()

# Convert to Tensor and add batch dimension
uploaded_image_tensor = torch.tensor(uploaded_image_array, dtype=torch.float32).unsqueeze(0)

# Ensure the model is in evaluation mode
model_b.eval()

# Predict the uploaded image
with torch.no_grad():
    output = model_b(uploaded_image_tensor)
    _, predicted = torch.max(output, 1)
    print("Predicted digit:", predicted.item())

# During conversion of the uploaded test image to MNIST format, a step was added to center the image on its centroid
A more complex fully connected network (three layers + dropout):
import numpy as np
from PIL import Image, ImageEnhance, ImageOps
import os
import gzip
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import cv2 as cv
from torchvision.datasets.utils import download_and_extract_archive


# 下载 MNIST 数据集
def download_mnist():
    mirrors = [
        "http://yann.lecun.com/exdb/mnist/",
        "https://ossci-datasets.s3.amazonaws.com/mnist/"
    ]

    resources = [
        ("train-images-idx3-ubyte.gz", "f68b3c2dcbeaaa9fbdd348bbdeb94873"),
        ("train-labels-idx1-ubyte.gz", "d53e105ee54ea40749a09fcbcd1e9432"),
        ("t10k-images-idx3-ubyte.gz", "9fb629c4189551a2d022fa330f9573f3"),
        ("t10k-labels-idx1-ubyte.gz", "ec29112dd5afa0611ce80d1b7f02629c")
    ]

    for filename, md5 in resources:
        for mirror in mirrors:
            url = f"{mirror}{filename}"
            try:
                download_and_extract_archive(url, download_root="mnist/MNIST/raw", filename=filename, md5=md5)
                break
            except Exception as e:
                print(f"Failed to download from {url}. Error: {e}")


# 加载 MNIST 数据集
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path, f'{kind}-labels-idx1-ubyte.gz')
    images_path = os.path.join(path, f'{kind}-images-idx3-ubyte.gz')

    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8, offset=8)

    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8, offset=16).reshape(len(labels), 28, 28)

    return images, labels


# 预处理自定义图片
def preprocess_image(image_path):
    img = Image.open(image_path)
    img = img.convert('L')  # 转换为灰度图像
    img = ImageOps.invert(img)  # 反转颜色,确保是黑底白字
    img = np.array(img)

    # 使用Otsu阈值法进行二值化
    _, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

    # 计算图像的矩
    M = cv.moments(img)
    if M['m00'] != 0:
        # 计算质心位置
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
    else:
        cx, cy = 14, 14  # 如果图像全黑,默认为中心

    # 计算图像中心位置
    center_x, center_y = img.shape[1] // 2, img.shape[0] // 2

    # 计算平移量
    shift_x = center_x - cx
    shift_y = center_y - cy

    # 平移图像
    M_translation = np.float32([[1, 0, shift_x], [0, 1, shift_y]])
    img = cv.warpAffine(img, M_translation, (img.shape[1], img.shape[0]))

    img = Image.fromarray(img)
    img = img.resize((28, 28), Image.LANCZOS)  # 调整大小为28x28
    img_array = np.array(img).astype(np.float32) / 255.0  # 转换为numpy数组并归一化
    return img_array


# 增强图像
def augment_image(image_array):
    img = Image.fromarray((image_array * 255).astype(np.uint8), 'L')
    augmented_images = []

    # 原图
    augmented_images.append(image_array)

    # 旋转
    for angle in range(-10, 11, 5):
        rotated = img.rotate(angle)
        rotated_array = np.array(rotated).astype(np.float32) / 255.0
        augmented_images.append(rotated_array)

    # 对比度增强
    enhancer = ImageEnhance.Contrast(img)
    for factor in [0.5, 1.5]:
        enhanced = enhancer.enhance(factor)
        enhanced_array = np.array(enhanced).astype(np.float32) / 255.0
        augmented_images.append(enhanced_array)

    return augmented_images


# 处理自定义数据集
def preprocess_images_in_directory(directory_path):
    images = []
    labels = []
    for filename in os.listdir(directory_path):
        if filename.endswith('.png') or filename.endswith('.jpg') or filename.endswith('.jpeg'):
            image_path = os.path.join(directory_path, filename)
            processed_image = preprocess_image(image_path)
            augmented_images = augment_image(processed_image)
            for img in augmented_images:
                images.append(img)
                label = int(filename.split('.')[0]) % 10  # 假设标签在文件名中
                labels.append(label)
    return np.array(images), np.array(labels)


# 使用示例
def main():
    # 下载 MNIST 数据集
    download_mnist()

    # 加载 MNIST 数据集
    mnist_path = 'mnist/MNIST/raw'
    mnist_train_images, mnist_train_labels = load_mnist(mnist_path, kind='train')
    mnist_test_images, mnist_test_labels = load_mnist(mnist_path, kind='t10k')

    # 转换数据为 PyTorch 张量
    mnist_train_images = torch.tensor(mnist_train_images, dtype=torch.float32).unsqueeze(1) / 255.0
    mnist_train_labels = torch.tensor(mnist_train_labels, dtype=torch.long)
    mnist_test_images = torch.tensor(mnist_test_images, dtype=torch.float32).unsqueeze(1) / 255.0
    mnist_test_labels = torch.tensor(mnist_test_labels, dtype=torch.long)

    # 创建 MNIST 数据集
    mnist_train_data = TensorDataset(mnist_train_images, mnist_train_labels)
    mnist_test_data = TensorDataset(mnist_test_images, mnist_test_labels)

    # 处理自定义数据集
    directory_path = 'D:\\pythonProject8\\dataset\\number'  # 替换为你的图片目录路径
    processed_images, labels = preprocess_images_in_directory(directory_path)

    # 转换自定义数据为 PyTorch 张量
    processed_images = processed_images.reshape(-1, 1, 28, 28)  # 调整形状为 (num_samples, 1, 28, 28)
    images_tensor = torch.tensor(processed_images, dtype=torch.float32)
    labels_tensor = torch.tensor(labels, dtype=torch.long)

    # 创建自定义数据集
    custom_data = TensorDataset(images_tensor, labels_tensor)

    # 合并 MNIST 训练数据集和自定义数据集
    combined_train_data = ConcatDataset([mnist_train_data, custom_data])

    # 创建数据加载器
    train_loader = DataLoader(combined_train_data, batch_size=32, shuffle=True)
    test_loader = DataLoader(mnist_test_data, batch_size=32, shuffle=False)

    # 定义简单的全连接神经网络
    class SimpleNN(nn.Module):
        def __init__(self):
            super(SimpleNN, self).__init__()
            self.flatten = nn.Flatten()
            self.fc1 = nn.Linear(28 * 28, 128)
            self.dropout1 = nn.Dropout(0.5)
            self.fc2 = nn.Linear(128, 64)
            self.dropout2 = nn.Dropout(0.5)
            self.fc3 = nn.Linear(64, 10)
            self.l2_reg = 1e-4

        def forward(self, x):
            x = self.flatten(x)
            x = torch.relu(self.fc1(x))
            x = self.dropout1(x)
            x = torch.relu(self.fc2(x))
            x = self.dropout2(x)
            x = self.fc3(x)
            return x

    # 实例化模型、损失函数和优化器
    model = SimpleNN()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=model.l2_reg)

    # 训练模型
    def train(model, train_loader, criterion, optimizer, epochs=10):
        for epoch in range(epochs):
            model.train()
            running_loss = 0.0
            for images, labels in train_loader:
                optimizer.zero_grad()
                outputs = model(images)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
                running_loss += loss.item()
            print(f'Epoch {epoch + 1}/{epochs}, Loss: {running_loss / len(train_loader)}')

    # 评估模型
    def evaluate(model, test_loader):
        model.eval()
        correct = 0
        total = 0
        global class_correct, class_total
        class_correct = [0] * 10
        class_total = [0] * 10

        with torch.no_grad():
            for images, labels in test_loader:
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                # 计算每个类别的准确率
                c = (predicted == labels).squeeze()
                for i in range(len(labels)):
                    label = labels[i]
                    class_correct[label] += c[i].item()
                    class_total[label] += 1

        print(f'Accuracy: {100 * correct / total}%')

        # 输出每个数字类别的准确率
        print("\nClass-wise accuracy:")
        for i in range(10):
            if class_total[i] > 0:
                class_accuracy = 100 * class_correct[i] / class_total[i]
                print(f'Class {i}: Accuracy = {class_accuracy:.2f}%')
            else:
                print(f'Class {i}: No samples')

    # 运行训练和评估
    train(model, train_loader, criterion, optimizer, epochs=10)
    evaluate(model, test_loader)

    # 处理上传的图像
    uploaded_image_path = "D:\\pythonProject8\\dataset\\number\\20.png"
    uploaded_image_array = preprocess_image(uploaded_image_path)

    # 显示处理后的图像
    plt.imshow(uploaded_image_array.squeeze(), cmap='gray')
    plt.title("Uploaded Image")
    plt.show()

    # 转换为 Tensor 并添加批量维度
    uploaded_image_tensor = torch.tensor(uploaded_image_array, dtype=torch.float32).unsqueeze(0)

    # 确保模型处于评估模式
    model.eval()

    # 预测上传的图像
    with torch.no_grad():
        output = model(uploaded_image_tensor)
        _, predicted = torch.max(output, 1)
        print("Predicted digit:", predicted.item())


if __name__ == "__main__":
    main()

# The uploaded images are likewise used as part of the dataset (in this listing they are merged into the training set)
Convolutional neural network (takes longer to run):
import numpy as np
from PIL import Image, ImageEnhance, ImageOps
import os
import gzip
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset
from torchvision.datasets.utils import download_and_extract_archive
import cv2 as cv
import matplotlib.pyplot as plt

# 下载 MNIST 数据集
def download_mnist():
    mirrors = [
        "http://yann.lecun.com/exdb/mnist/",
        "https://ossci-datasets.s3.amazonaws.com/mnist/"
    ]

    resources = [
        ("train-images-idx3-ubyte.gz", "f68b3c2dcbeaaa9fbdd348bbdeb94873"),
        ("train-labels-idx1-ubyte.gz", "d53e105ee54ea40749a09fcbcd1e9432"),
        ("t10k-images-idx3-ubyte.gz", "9fb629c4189551a2d022fa330f9573f3"),
        ("t10k-labels-idx1-ubyte.gz", "ec29112dd5afa0611ce80d1b7f02629c")
    ]

    for filename, md5 in resources:
        for mirror in mirrors:
            url = f"{mirror}{filename}"
            try:
                download_and_extract_archive(url, download_root="mnist/MNIST/raw", filename=filename, md5=md5)
                break
            except Exception as e:
                print(f"Failed to download from {url}. Error: {e}")

# 加载 MNIST 数据集
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path, f'{kind}-labels-idx1-ubyte.gz')
    images_path = os.path.join(path, f'{kind}-images-idx3-ubyte.gz')

    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8, offset=8)

    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8, offset=16).reshape(len(labels), 28, 28)

    return images, labels

# 预处理自定义图片
def preprocess_image(image_path):
    img = Image.open(image_path)
    img = img.convert('L')  # 转换为灰度图像
    img = ImageOps.invert(img)  # 反转颜色,确保是黑底白字
    img = np.array(img)

    # 使用Otsu阈值法进行二值化
    _, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

    # 计算图像的矩
    M = cv.moments(img)
    if M['m00'] != 0:
        # 计算质心位置
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
    else:
        cx, cy = 14, 14  # 如果图像全黑,默认为中心

    # 计算图像中心位置
    center_x, center_y = img.shape[1] // 2, img.shape[0] // 2

    # 计算平移量
    shift_x = center_x - cx
    shift_y = center_y - cy

    # 平移图像
    M_translation = np.float32([[1, 0, shift_x], [0, 1, shift_y]])
    img = cv.warpAffine(img, M_translation, (img.shape[1], img.shape[0]))

    img = Image.fromarray(img)
    img = img.resize((28, 28), Image.LANCZOS)  # 调整大小为28x28
    img_array = np.array(img).astype(np.float32) / 255.0  # 转换为numpy数组并归一化
    return img_array

# 增强图像
def augment_image(image_array):
    img = Image.fromarray((image_array * 255).astype(np.uint8), 'L')
    augmented_images = []

    # 原图
    augmented_images.append(image_array)

    # 旋转
    for angle in range(-10, 11, 5):
        rotated = img.rotate(angle)
        rotated_array = np.array(rotated).astype(np.float32) / 255.0
        augmented_images.append(rotated_array)

    # 对比度增强
    enhancer = ImageEnhance.Contrast(img)
    for factor in [0.5, 1.5]:
        enhanced = enhancer.enhance(factor)
        enhanced_array = np.array(enhanced).astype(np.float32) / 255.0
        augmented_images.append(enhanced_array)

    return augmented_images

# 处理自定义数据集
def preprocess_images_in_directory(directory_path):
    images = []
    labels = []
    for filename in os.listdir(directory_path):
        if filename.endswith('.png') or filename.endswith('.jpg') or filename.endswith('.jpeg'):
            image_path = os.path.join(directory_path, filename)
            processed_image = preprocess_image(image_path)
            augmented_images = augment_image(processed_image)
            for img in augmented_images:
                images.append(img)
                label = int(filename.split('.')[0]) % 10  # 假设标签在文件名中
                labels.append(label)
    return np.array(images), np.array(labels)

# 使用示例
def main():
    # 下载 MNIST 数据集
    print("Downloading MNIST dataset...")
    download_mnist()

    # 加载 MNIST 数据集
    print("Loading MNIST dataset...")
    mnist_path = 'mnist/MNIST/raw'
    mnist_train_images, mnist_train_labels = load_mnist(mnist_path, kind='train')
    mnist_test_images, mnist_test_labels = load_mnist(mnist_path, kind='t10k')

    print("MNIST training data shape:", mnist_train_images.shape)
    print("MNIST test data shape:", mnist_test_images.shape)

    # 转换数据为 PyTorch 张量
    mnist_train_images = torch.tensor(mnist_train_images, dtype=torch.float32).unsqueeze(1) / 255.0
    mnist_train_labels = torch.tensor(mnist_train_labels, dtype=torch.long)
    mnist_test_images = torch.tensor(mnist_test_images, dtype=torch.float32).unsqueeze(1) / 255.0
    mnist_test_labels = torch.tensor(mnist_test_labels, dtype=torch.long)

    print("MNIST training tensor shape:", mnist_train_images.shape)
    print("MNIST test tensor shape:", mnist_test_images.shape)

    # 创建 MNIST 数据集
    mnist_train_data = TensorDataset(mnist_train_images, mnist_train_labels)
    mnist_test_data = TensorDataset(mnist_test_images, mnist_test_labels)

    # 处理自定义数据集
    directory_path = 'D:\\pythonProject8\\dataset\\number'  # 替换为你的图片目录路径
    print(f"Preprocessing custom images in {directory_path}...")
    processed_images, labels = preprocess_images_in_directory(directory_path)

    print("Custom images shape:", processed_images.shape)
    print("Custom labels shape:", labels.shape)

    # 转换自定义数据为 PyTorch 张量
    processed_images = processed_images.reshape(-1, 1, 28, 28)  # 调整形状为 (num_samples, 1, 28, 28)
    images_tensor = torch.tensor(processed_images, dtype=torch.float32)
    labels_tensor = torch.tensor(labels, dtype=torch.long)

    # 创建自定义数据集
    custom_data = TensorDataset(images_tensor, labels_tensor)

    # 合并 MNIST 训练数据集和自定义数据集
    combined_train_data = ConcatDataset([mnist_train_data, custom_data])

    # 创建数据加载器
    train_loader = DataLoader(combined_train_data, batch_size=32, shuffle=True)
    test_loader = DataLoader(mnist_test_data, batch_size=32, shuffle=False)

    # 定义简单的卷积神经网络
    class SimpleCNN(nn.Module):
        def __init__(self):
            super(SimpleCNN, self).__init__()
            self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
            self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
            self.fc1 = nn.Linear(64 * 7 * 7, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.pool(torch.relu(self.conv1(x)))
            x = self.pool(torch.relu(self.conv2(x)))
            x = x.view(-1, 64 * 7 * 7)
            x = torch.relu(self.fc1(x))
            x = self.fc2(x)
            return x

    # 实例化模型、损失函数和优化器
    model = SimpleCNN()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # 训练模型
    def train(model, train_loader, criterion, optimizer, epochs=10):
        for epoch in range(epochs):
            running_loss = 0.0
            for images, labels in train_loader:
                optimizer.zero_grad()
                outputs = model(images)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
                running_loss += loss.item()
            print(f'Epoch {epoch + 1}/{epochs}, Loss: {running_loss / len(train_loader)}')

    # 评估模型
    def evaluate(model, test_loader):
        model.eval()
        correct = 0
        total = 0
        class_correct = [0] * 10
        class_total = [0] * 10

        with torch.no_grad():
            for images, labels in test_loader:
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                c = (predicted == labels).squeeze()
                for i in range(len(labels)):
                    label = labels[i]
                    class_correct[label] += c[i].item()
                    class_total[label] += 1

        print(f'Accuracy: {100 * correct / total}%')

        # 输出每个数字类别的准确率
        print("\nClass-wise accuracy:")
        for i in range(10):
            if class_total[i] > 0:
                class_accuracy = 100 * class_correct[i] / class_total[i]
                print(f'Class {i}: Accuracy = {class_accuracy:.2f}%')
            else:
                print(f'Class {i}: No samples')

    # 运行训练和评估
    print("Training model...")
    train(model, train_loader, criterion, optimizer, epochs=10)
    print("Evaluating model...")
    evaluate(model, test_loader)

    # 处理上传的图像
    uploaded_image_path = "D:\\pythonProject8\\dataset\\number\\20.png"
    uploaded_image_array = preprocess_image(uploaded_image_path)

    # 显示处理后的图像
    plt.imshow(uploaded_image_array.squeeze(), cmap='gray')
    plt.title("Uploaded Image")
    plt.show()

    # 转换为 Tensor 并添加批量维度
    uploaded_image_tensor = torch.tensor(uploaded_image_array, dtype=torch.float32).unsqueeze(0).unsqueeze(0)

    # 确保模型处于评估模式
    model.eval()

    # 预测上传的图像
    with torch.no_grad():
        output = model(uploaded_image_tensor)
        _, predicted = torch.max(output, 1)
        print("Predicted digit:", predicted.item())

2. Data Augmentation for the Neural Network's Training Set

This part uses data augmentation to increase the diversity of the training set and thereby improve the overall robustness of the model. For background on data augmentation, see 数据增强(Data Augmentation)汇总 - 知乎 (zhihu.com)

数据增强(Data Augmentation)常用方法汇总-CSDN博客

2.1 Applying Data Augmentation to the Training Set

As a first attempt, you can simply shift the digits in the training set and then print the augmented training images (60,000 in total). They look roughly like this:

Two problems show up at this point: some digits are not clear enough (the white strokes are too dark), and some digits lose part of their strokes because of the shift.

There are two ways to deal with the stroke loss. The simple one, which treats the symptom rather than the cause, is to cap the shift at a fixed value (say 5 pixels). The better one is to binarize the image, find the digit's contour, and constrain the shift so the contour never moves out of the frame, as sketched below.
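A sketch of the contour-based limit (this is essentially what the RandomTranslate transform in section 2.2 does; img is assumed to be a 28×28 grayscale array):

import numpy as np
import cv2 as cv

_, binary_img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
contours, _ = cv.findContours(binary_img, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

if contours:
    x, y, w, h = cv.boundingRect(contours[0])
    rows, cols = img.shape[:2]
    # the digit can shift at most this far before part of it leaves the frame
    max_dx = min(cols - x - w, x)
    max_dy = min(rows - y - h, y)
    dx = np.random.uniform(-max_dx, max_dx)
    dy = np.random.uniform(-max_dy, max_dy)
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    img = cv.warpAffine(img, M, (cols, rows))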

(example output)

The unclear digits are also easy to handle: binarize to raise the brightness of the digits. Otsu's method is recommended here, because plain thresholding can leave a lot of jagged edges on the digits.

This gives a decent result.

After this simple augmentation, you can also try shrinking the digits, which makes the model more robust.

In the simply shrunk training set, some digits are still unclear, and some end up with very thin strokes after shrinking. Here a dilation operation may help; for background on dilation, see:

数字图像处理:形态学操作、腐蚀、膨胀、开运算、闭运算_arcgis膨胀腐蚀运算-CSDN博客

One note: the kernel used for dilation is normally odd-sized, e.g. 3×3, as in the sketch below.
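A minimal dilation example (img is assumed to be a binarized digit image):

import numpy as np
import cv2 as cv

kernel = np.ones((3, 3), np.uint8)              # odd-sized 3x3 structuring element
dilated = cv.dilate(img, kernel, iterations=1)  # thickens the white strokes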

Now binarize to brighten the digits and add dilation:

The result is not great; we can try Otsu binarization to improve it.

That looks better, but after dilation the inner loops of some digits disappear. So my approach is: only apply dilation to digits whose strokes are very thin after shrinking, and leave the rest unchanged.

This looks somewhat better than before...

Because a fully connected network is used, the model itself may be the limiting factor: the accuracy hovers between 50% and 60%, much lower than in Part 1. I even hit a bigger problem: when printing the per-class accuracy, the accuracy for the digit "5" was often 0.

To diagnose this, we can print how often a "5" in the test set is classified as each digit, which gives a rough picture of what is going wrong (see the sketch below). Then we can print the test images labeled "5" and compare them with the "5"s in our training set, to tell whether the training set is at fault or the fully connected network simply is not strong enough.
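A sketch of the first check, counting what the true "5"s are predicted as (model_b and test_loader as in the code below; the variable names are mine):

import torch

pred_counts = [0] * 10   # how often a true "5" is predicted as each digit

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model_b(images)
        _, predicted = torch.max(outputs.data, 1)
        for label, pred in zip(labels, predicted):
            if label.item() == 5:
                pred_counts[pred.item()] += 1

for digit, count in enumerate(pred_counts):
    print(f'true "5" predicted as {digit}: {count} times')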

Below are the test-set images of "5":

There are two versions of the augmentation code that include the shrinking step.

The first:

# Define the RandomTranslate transformation with proportion-based translation
class RandomTranslate(object):
    def __init__(self, max_translate_ratio):
        self.max_translate_ratio = max_translate_ratio

    def __call__(self, img):
        img = np.array(img)
        rows, cols = img.shape[:2]

        # Apply binary threshold
        _, binary_img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

        # Find contours
        contours, _ = cv.findContours(binary_img, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

        if len(contours) > 0:
            # Get bounding box of the largest contour
            x, y, w, h = cv.boundingRect(contours[0])

            # Set a minimum size threshold
            min_size = 7
            if w <= min_size or h <= min_size:
                return Image.fromarray(img)

            # Scale down the digit before applying translation
            scale_factor = 0.7
            new_w, new_h = int(w * scale_factor), int(h * scale_factor)

            # Ensure dimensions are positive and within valid bounds
            new_w = max(new_w, min_size)
            new_h = max(new_h, min_size)

            digit = img[y:y + h, x:x + w]

            # Resize only if digit dimensions are valid
            if new_w > 0 and new_h > 0:
                digit = cv.resize(digit, (new_w, new_h), interpolation=cv.INTER_AREA)

                # Increase brightness
                digit = cv.convertScaleAbs(digit, alpha=1.5, beta=0)
            else:
                digit = img[y:y + h, x:x + w]  # Fallback to original if scaling fails

            # Create a new blank image with the same size as the original
            img = np.zeros_like(img)
            x = (cols - new_w) // 2
            y = (rows - new_h) // 2
            img[y:y + new_h, x:x + new_w] = digit

            # Adjust translation limits
            max_translate_x = min(cols - new_w, new_w) * self.max_translate_ratio
            max_translate_y = min(rows - new_h, new_h) * self.max_translate_ratio

            translation_x = np.random.uniform(-max_translate_x, max_translate_x)
            translation_y = np.random.uniform(-max_translate_y, max_translate_y)

            # Apply translation
            M = np.float32([[1, 0, translation_x], [0, 1, translation_y]])
            img = cv.warpAffine(img, M, (cols, rows), borderMode=cv.BORDER_REFLECT)

        return Image.fromarray(img)

Result:

The second version checks, after shrinking, whether a digit's strokes have become thin; if so it applies dilation, otherwise it does not.

The second:

# Define the RandomTranslate transformation with proportion-based translation and conditional dilation
class RandomTranslate(object):
    def __init__(self, max_translate_ratio):
        self.max_translate_ratio = max_translate_ratio

    def __call__(self, img):
        img = np.array(img)
        rows, cols = img.shape[:2]

        # Apply binary threshold
        _, binary_img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

        # Find contours
        contours, _ = cv.findContours(binary_img, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

        if len(contours) > 0:
            # Get bounding box of the largest contour
            x, y, w, h = cv.boundingRect(contours[0])

            # Ensure the bounding box does not exceed image boundaries
            max_translate_x = min(cols - x - w, x) * self.max_translate_ratio
            max_translate_y = min(rows - y - h, y) * self.max_translate_ratio

            translation_x = np.random.uniform(-max_translate_x, max_translate_x)
            translation_y = np.random.uniform(-max_translate_y, max_translate_y)

            # If the digit is large, scale it down before translation
            if w > cols * 0.7 or h > rows * 0.7:
                scale_factor = 0.7
                new_w, new_h = int(w * scale_factor), int(h * scale_factor)
                digit = img[y:y + h, x:x + w]
                digit = cv.resize(digit, (new_w, new_h), interpolation=cv.INTER_AREA)
                digit = cv.convertScaleAbs(digit, alpha=1.5, beta=0)  # Increase brightness
                img = np.zeros_like(img)
                x, y = (cols - new_w) // 2, (rows - new_h) // 2
                img[y:y + new_h, x:x + new_w] = digit

            M = np.float32([[1, 0, translation_x], [0, 1, translation_y]])
            img = cv.warpAffine(img, M, (cols, rows), borderMode=cv.BORDER_REFLECT)

            # Check if dilation is needed
            area = cv.contourArea(contours[0])
            perimeter = cv.arcLength(contours[0], True)
            if perimeter > 0:
                ratio = area / (perimeter ** 2)
                # Use dilation if ratio is small (indicating thin strokes)
                if ratio < 0.02:
                    # Apply dilation to make the lines more prominent
                    kernel = np.ones((3, 3), np.uint8)  # Kernel size remains 3x3
                    img = cv.dilate(img, kernel, iterations=1)  # Apply dilation

        return Image.fromarray(img)

Result:

2.2 Saving the Augmented Training Set to Disk (Complete Code for the Fully Connected Network)

Three data augmentation schemes were tried in total:

My augmentation pipeline contains only four operations. Every scheme includes randomly shifting the digit, while using binarization to find the digit's contour so the digit never moves out of the frame and loses strokes. Below is the accuracy comparison of the three schemes with the fully connected network and the convolutional network:

The complete code follows:

Note: replace the path in "uploaded_image_path" with the path of the custom image you want to recognize.

All augmented MNIST training images are saved into the folder "mnist1", and all test images labeled "5" into the folder "mnist2"; the code creates these two folders in the project directory with os.makedirs if they do not already exist.

1. This version does not shrink the digits; it only shifts them and uses binarization to brighten them:
import os
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torchvision.utils as utils
import torch.utils.data as data_utils
from PIL import Image, ImageOps
import cv2 as cv


# Define the RandomTranslate transformation with proportion-based translation and dilation
class RandomTranslate(object):
    def __init__(self, max_translate_ratio):
        self.max_translate_ratio = max_translate_ratio

    def __call__(self, img):
        img = np.array(img)
        rows, cols = img.shape[:2]

        # Apply Otsu's binary threshold
        _, binary_img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

        # Check if the image needs Otsu’s binarization for better visibility
        num_white_pixels = cv.countNonZero(binary_img)
        num_total_pixels = binary_img.size
        ratio = num_white_pixels / num_total_pixels

        if ratio < 0.02:  # If the digit strokes are too thin, apply dilation
            kernel = np.ones((2, 2), np.uint8)
            img = cv.dilate(binary_img, kernel, iterations=1)
        else:
            img = binary_img

        # Find contours
        contours, _ = cv.findContours(img, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

        if len(contours) > 0:
            # Get bounding box of the largest contour
            x, y, w, h = cv.boundingRect(contours[0])

            # Ensure the bounding box does not exceed image boundaries
            max_translate_x = min(cols - x - w, x)
            max_translate_y = min(rows - y - h, y)

            translation_x = np.random.uniform(-max_translate_x * self.max_translate_ratio, max_translate_x * self.max_translate_ratio)
            translation_y = np.random.uniform(-max_translate_y * self.max_translate_ratio, max_translate_y * self.max_translate_ratio)

            M = np.float32([[1, 0, translation_x], [0, 1, translation_y]])
            img = cv.warpAffine(img, M, (cols, rows), borderMode=cv.BORDER_REFLECT)

        return Image.fromarray(img)


# Define the network model
class NetB(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(NetB, self).__init__()
        self.h1 = nn.Linear(n_feature, n_hidden)
        self.out = nn.Linear(n_hidden, n_output)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(x.size()[0], -1)
        h1 = self.h1(x)
        out = self.out(h1)
        a_out = self.softmax(out)
        return a_out


# Create a directory to save the augmented images
save_dir = 'mnist1'
os.makedirs(save_dir, exist_ok=True)

save_dir_test_5 = 'mnist2'
os.makedirs(save_dir_test_5, exist_ok=True)

# Define the data transformations including data augmentation
transform_train = transforms.Compose([
    RandomTranslate(max_translate_ratio=0.1),  # Proportion-based translation
    transforms.ToTensor()
])

transform_test = transforms.Compose([
    transforms.ToTensor()
])

# Load MNIST dataset with data augmentation
train_data = datasets.MNIST(root="mnist",
                            train=True,
                            transform=transform_train,
                            download=False)

test_data = datasets.MNIST(root="mnist",
                           train=False,
                           transform=transform_test,
                           download=False)

# DataLoader for batch processing
train_loader = data_utils.DataLoader(dataset=train_data,
                                     batch_size=64,
                                     shuffle=True)  # Shuffle training data

test_loader = data_utils.DataLoader(dataset=test_data,
                                    batch_size=64,
                                    shuffle=False)  # Do not shuffle test data


# Function to save images from a batch to the specified directory
def save_images(images, labels, start_index, save_path):
    for i in range(images.size(0)):
        img = images[i].cpu().numpy().transpose(1, 2, 0)  # Convert tensor to numpy array
        img = (img * 255).astype(np.uint8)  # Convert to uint8 format
        label = labels[i].item()
        file_path = os.path.join(save_path, f'image_{start_index + i}_label_{label}.png')
        cv.imwrite(file_path, img)


# Save the augmented images
for batch_index, (images, labels) in enumerate(train_loader):
    start_index = batch_index * 64
    save_images(images, labels, start_index, save_dir)
    if batch_index % 10 == 0:  # Print progress every 10 batches
        print(f'Saved batch {batch_index + 1}')

print("Finished saving augmented MNIST images.")


# Save images of label 5 from test dataset
def save_label_5_images(images, labels, start_index, save_path):
    for i in range(images.size(0)):
        if labels[i].item() == 5:
            img = images[i].cpu().numpy().transpose(1, 2, 0)  # Convert tensor to numpy array
            img = (img * 255).astype(np.uint8)  # Convert to uint8 format
            file_path = os.path.join(save_path, f'image_{start_index + i}_label_5.png')
            cv.imwrite(file_path, img)


for batch_index, (images, labels) in enumerate(test_loader):
    start_index = batch_index * 64
    save_label_5_images(images, labels, start_index, save_dir_test_5)
    if batch_index % 10 == 0:  # Print progress every 10 batches
        print(f'Saved batch {batch_index + 1} of label 5 images')

print("Finished saving MNIST images with label 5 from the test set.")

# Display a batch of images
imgs, labels = next(iter(train_loader))
images = utils.make_grid(imgs)
images = images.numpy().transpose(1, 2, 0)

plt.imshow(images, cmap='gray')
plt.show()

# Initialize the network model
model_b = NetB(28 * 28, 32, 10)
print(model_b)

# Define the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_b.parameters(), lr=0.01, momentum=0.9)

# Training loop
epochs = 5
loss_history = []
accuracy_history = []

for epoch in range(epochs):
    model_b.train()  # Ensure the model is in training mode
    for batch_idx, (x_train, y_train) in enumerate(train_loader):
        optimizer.zero_grad()
        y_pred = model_b(x_train)
        loss = loss_fn(y_pred, y_train)
        loss.backward()
        optimizer.step()

        loss_history.append(loss.item())

        number_batch = y_train.size()[0]
        _, predicted = torch.max(y_pred.data, dim=1)
        correct_batch = (predicted == y_train).sum().item()
        accuracy_batch = 100 * correct_batch / number_batch
        accuracy_history.append(accuracy_batch)

        if batch_idx % 100 == 0:
            print(f'Epoch {epoch} Batch {batch_idx} Loss = {loss.item():.4f} Accuracy = {accuracy_batch:.4f}%')

print("\nTraining completed")
print(f"Final loss = {loss.item()}")
print(f"Final accuracy = {accuracy_batch}")

# Plot loss history
plt.figure()
plt.grid()
plt.xlabel("Iterations")
plt.ylabel("Loss")
plt.title("Loss History")
plt.plot(loss_history, "r")
plt.show()

# Plot accuracy history
plt.figure()
plt.grid()
plt.xlabel("Iterations")
plt.ylabel("Accuracy (%)")
plt.title("Accuracy History")
plt.plot(accuracy_history, "b+")
plt.show()

# Evaluate on test dataset
model_b.eval()  # Ensure the model is in evaluation mode
correct_dataset = 0
total_dataset = 0
accuracy_dataset = 0.0
class_correct = [0] * 10
class_total = [0] * 10

with torch.no_grad():
    for batch_idx, (images, labels) in enumerate(test_loader):
        outputs = model_b(images)
        _, predicted = torch.max(outputs.data, 1)
        total_dataset += labels.size()[0]
        correct_dataset += (predicted == labels).sum().item()

        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            if label == pred:
                class_correct[label.item()] += 1

        accuracy_dataset = 100 * correct_dataset / total_dataset

        if batch_idx % 100 == 0:
            print(f'Batch {batch_idx} Accuracy = {accuracy_dataset:.4f}')

print(f'Final result on the dataset: Accuracy = {accuracy_dataset:.2f}%')

# Output class-wise accuracy
print("\nClass-wise accuracy:")
for i in range(10):
    if class_total[i] > 0:
        class_accuracy = 100 * class_correct[i] / class_total[i]
        print(f'Class {i}: Accuracy = {class_accuracy:.2f}%')
    else:
        print(f'Class {i}: No samples')


# Function to preprocess an uploaded image
def preprocess_image(image_path):
    # Open and preprocess the image
    img = Image.open(image_path)
    img = img.convert('L')  # Convert to grayscale
    img = ImageOps.invert(img)  # Invert the image
    img = np.array(img)  # Convert to NumPy array

    # Apply Otsu's binary threshold
    _, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

    # Resize the image to 28x28
    img = Image.fromarray(img)
    img = img.resize((28, 28), Image.LANCZOS)

    # Convert to float array and normalize
    img_array = np.array(img).astype(np.float32) / 255.0
    return img_array


# Load and preprocess an uploaded image
uploaded_image_path = "D:/pythonProject8/dataset/number/15.png"
uploaded_image_array = preprocess_image(uploaded_image_path)

# Display the processed image
plt.imshow(uploaded_image_array.squeeze(), cmap='gray')
plt.title("Uploaded Image")
plt.show()

# Convert to Tensor and add batch dimension
uploaded_image_tensor = torch.tensor(uploaded_image_array, dtype=torch.float32).unsqueeze(0)

# Ensure the model is in evaluation mode
model_b.eval()

# Predict the uploaded image
with torch.no_grad():
    output = model_b(uploaded_image_tensor)
    _, predicted = torch.max(output, 1)

print(f"Predicted class for the uploaded image: {predicted.item()}")

Result:

2. This version shrinks the digits and does not apply dilation:
import os
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torchvision.utils as utils
import torch.utils.data as data_utils
from torchvision.datasets.utils import download_and_extract_archive
from PIL import Image, ImageOps
import cv2 as cv

# Define the RandomTranslate transformation with proportion-based translation
class RandomTranslate(object):
    def __init__(self, max_translate_ratio):
        self.max_translate_ratio = max_translate_ratio

    def __call__(self, img):
        img = np.array(img)
        rows, cols = img.shape[:2]

        # Apply binary threshold
        _, binary_img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

        # Find contours
        contours, _ = cv.findContours(binary_img, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

        if len(contours) > 0:
            # Get bounding box of the largest contour
            x, y, w, h = cv.boundingRect(contours[0])

            # Set a minimum size threshold
            min_size = 7
            if w <= min_size or h <= min_size:
                return Image.fromarray(img)

            # Scale down the digit before applying translation
            scale_factor = 0.7
            new_w, new_h = int(w * scale_factor), int(h * scale_factor)

            # Ensure dimensions are positive and within valid bounds
            new_w = max(new_w, min_size)
            new_h = max(new_h, min_size)

            digit = img[y:y + h, x:x + w]

            # Resize only if digit dimensions are valid
            if new_w > 0 and new_h > 0:
                digit = cv.resize(digit, (new_w, new_h), interpolation=cv.INTER_AREA)

                # Increase brightness
                digit = cv.convertScaleAbs(digit, alpha=1.5, beta=0)
            else:
                digit = img[y:y + h, x:x + w]  # Fallback to original if scaling fails

            # Create a new blank image with the same size as the original
            img = np.zeros_like(img)
            x = (cols - new_w) // 2
            y = (rows - new_h) // 2
            img[y:y + new_h, x:x + new_w] = digit

            # Adjust translation limits
            max_translate_x = min(cols - new_w, new_w) * self.max_translate_ratio
            max_translate_y = min(rows - new_h, new_h) * self.max_translate_ratio

            translation_x = np.random.uniform(-max_translate_x, max_translate_x)
            translation_y = np.random.uniform(-max_translate_y, max_translate_y)

            # Apply translation
            M = np.float32([[1, 0, translation_x], [0, 1, translation_y]])
            img = cv.warpAffine(img, M, (cols, rows), borderMode=cv.BORDER_REFLECT)

        return Image.fromarray(img)

# Define the optimized network model
class NetB(torch.nn.Module):
    def __init__(self, n_feature, n_hidden1, n_hidden2, n_output):
        super(NetB, self).__init__()
        self.h1 = nn.Linear(n_feature, n_hidden1)
        self.h2 = nn.Linear(n_hidden1, n_hidden2)
        self.out = nn.Linear(n_hidden2, n_output)

    def forward(self, x):
        x = x.view(x.size()[0], -1)
        h1 = torch.relu(self.h1(x))
        h2 = torch.relu(self.h2(h1))
        out = self.out(h2)
        return out

# Create directories to save the images
train_save_dir = 'mnist1'
os.makedirs(train_save_dir, exist_ok=True)

test_save_dir = 'mnist2'
os.makedirs(test_save_dir, exist_ok=True)

# Define the data transformations including data augmentation
transform_train = transforms.Compose([
    RandomTranslate(max_translate_ratio=0.1),  # Proportion-based translation
    transforms.ToTensor()
])

transform_test = transforms.Compose([
    transforms.ToTensor()
])

# Load MNIST dataset with data augmentation
train_data = datasets.MNIST(root="mnist",
                            train=True,
                            transform=transform_train,
                            download=False)

test_data = datasets.MNIST(root="mnist",
                           train=False,
                           transform=transform_test,
                           download=False)

# DataLoader for batch processing
train_loader = data_utils.DataLoader(dataset=train_data,
                                     batch_size=64,
                                     shuffle=True)  # Shuffle training data

test_loader = data_utils.DataLoader(dataset=test_data,
                                    batch_size=64,
                                    shuffle=False)  # Do not shuffle test data


# Function to save images from a batch to the specified directory
def save_images(images, labels, start_index, save_path, save_label=None):
    for i in range(images.size(0)):
        if save_label is None or labels[i].item() == save_label:
            img = images[i].cpu().numpy().transpose(1, 2, 0)  # Convert tensor to numpy array
            img = (img * 255).astype(np.uint8)  # Convert to uint8 format
            label = labels[i].item()
            file_path = os.path.join(save_path, f'image_{start_index + i}_label_{label}.png')
            cv.imwrite(file_path, img)


# Save the augmented images from the training dataset
for batch_index, (images, labels) in enumerate(train_loader):
    start_index = batch_index * 64
    save_images(images, labels, start_index, train_save_dir)
    if batch_index % 10 == 0:  # Print progress every 10 batches
        print(f'Saved batch {batch_index + 1} of training images')

print("Finished saving augmented MNIST images from training set.")

# Save images of label '5' from the test dataset
for batch_index, (images, labels) in enumerate(test_loader):
    start_index = batch_index * 64
    save_images(images, labels, start_index, test_save_dir, save_label=5)
    if batch_index % 10 == 0:  # Print progress every 10 batches
        print(f'Saved batch {batch_index + 1} of label 5 images')

print("Finished saving MNIST images with label '5' from the test set.")

# Display a batch of images
imgs, labels = next(iter(train_loader))
images = utils.make_grid(imgs)
images = images.numpy().transpose(1, 2, 0)

plt.imshow(images, cmap='gray')
plt.show()

# Initialize the network model
model_b = NetB(28 * 28, 128, 64, 10)
print(model_b)

# Define the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_b.parameters(), lr=0.01, momentum=0.9)

# Training loop
epochs = 5
loss_history = []
accuracy_history = []

for epoch in range(epochs):
    model_b.train()  # Ensure the model is in training mode
    for batch_idx, (x_train, y_train) in enumerate(train_loader):
        optimizer.zero_grad()
        y_pred = model_b(x_train)
        loss = loss_fn(y_pred, y_train)
        loss.backward()
        optimizer.step()

        loss_history.append(loss.item())

        number_batch = y_train.size()[0]
        _, predicted = torch.max(y_pred.data, dim=1)
        correct_batch = (predicted == y_train).sum().item()
        accuracy_batch = 100 * correct_batch / number_batch
        accuracy_history.append(accuracy_batch)

        if batch_idx % 100 == 0:
            print(f'Epoch {epoch} Batch {batch_idx} Loss = {loss.item():.4f} Accuracy = {accuracy_batch:.4f}%')

print("\nTraining completed")
print(f"Final loss = {loss.item()}")
print(f"Final accuracy = {accuracy_batch}")

# Plot loss history
plt.figure()
plt.grid()
plt.xlabel("Iterations")
plt.ylabel("Loss")
plt.title("Loss History")
plt.plot(loss_history, "r")
plt.show()

# Plot accuracy history
plt.figure()
plt.grid()
plt.xlabel("Iterations")
plt.ylabel("Accuracy (%)")
plt.title("Accuracy History")
plt.plot(accuracy_history, "b+")
plt.show()

# Evaluate on test dataset
model_b.eval()  # Ensure the model is in evaluation mode
correct_dataset = 0
total_dataset = 0
accuracy_dataset = 0.0
class_correct = [0] * 10
class_total = [0] * 10

with torch.no_grad():
    for batch_idx, (images, labels) in enumerate(test_loader):
        outputs = model_b(images)
        _, predicted = torch.max(outputs.data, 1)
        total_dataset += labels.size()[0]
        correct_dataset += (predicted == labels).sum().item()

        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            if label == pred:
                class_correct[label.item()] += 1

        accuracy_dataset = 100 * correct_dataset / total_dataset

        if batch_idx % 100 == 0:
            print(f'Batch {batch_idx} Accuracy = {accuracy_dataset:.4f}')

print(f'Final result on the dataset: Accuracy = {accuracy_dataset:.2f}%')

# Output class-wise accuracy
print("\nClass-wise accuracy:")
for i in range(10):
    if class_total[i] > 0:
        class_accuracy = 100 * class_correct[i] / class_total[i]
        print(f'Class {i}: Accuracy = {class_accuracy:.2f}%')
    else:
        print(f'Class {i}: No samples')

# Function to preprocess an uploaded image
def preprocess_image(image_path):
    # Open and preprocess the image
    img = Image.open(image_path)
    img = img.convert('L')  # Convert to grayscale
    img = ImageOps.invert(img)  # Invert the image
    img = np.array(img)  # Convert to NumPy array

    # Apply thresholding
    _, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

    # Resize the image to 28x28
    img = Image.fromarray(img)
    img = img.resize((28, 28), Image.LANCZOS)

    # Convert to float array and normalize
    img_array = np.array(img).astype(np.float32) / 255.0
    return img_array

# Load and preprocess an uploaded image
uploaded_image_path = "D:/pythonProject8/dataset/number/19.png"
uploaded_image_array = preprocess_image(uploaded_image_path)

# Display the processed image
plt.imshow(uploaded_image_array.squeeze(), cmap='gray')
plt.title("Uploaded Image")
plt.show()

# Convert to Tensor and add batch dimension
uploaded_image_tensor = torch.tensor(uploaded_image_array, dtype=torch.float32).unsqueeze(0)

# Ensure the model is in evaluation mode
model_b.eval()

# Predict the uploaded image
with torch.no_grad():
    output = model_b(uploaded_image_tensor)
    _, predicted = torch.max(output, 1)

print(f"Predicted class for the uploaded image: {predicted.item()}")

(Result image)
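As a small optional extension that is not in the original listing, you can also check how confident the model is about the uploaded image by looking at the class probabilities. The sketch below assumes model_b and uploaded_image_tensor from the code above are already defined.

import torch

# Optional: inspect the model's confidence for the uploaded image.
# Assumes model_b and uploaded_image_tensor already exist (see the listing above).
model_b.eval()
with torch.no_grad():
    logits = model_b(uploaded_image_tensor)
    probs = torch.softmax(logits, dim=1).squeeze(0)

top_probs, top_classes = torch.topk(probs, k=3)
for p, c in zip(top_probs.tolist(), top_classes.tolist()):
    print(f"Digit {c}: probability {p * 100:.2f}%")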

3. This is the version that "scales an oversized digit down first, then checks whether the strokes are thin and applies a dilation operation if they are" (a standalone sketch of the stroke-thickness test appears right after this listing):
import os
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torchvision.utils as utils
import torch.utils.data as data_utils
from PIL import Image, ImageOps
import cv2 as cv


# Define the RandomTranslate transformation with proportion-based translation and conditional dilation
class RandomTranslate(object):
    def __init__(self, max_translate_ratio):
        self.max_translate_ratio = max_translate_ratio

    def __call__(self, img):
        img = np.array(img)
        rows, cols = img.shape[:2]

        # Apply binary threshold
        _, binary_img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

        # Find contours
        contours, _ = cv.findContours(binary_img, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

        if len(contours) > 0:
            # Use the largest contour (by area) as the digit region
            largest_contour = max(contours, key=cv.contourArea)
            x, y, w, h = cv.boundingRect(largest_contour)

            # Ensure the bounding box does not exceed image boundaries
            max_translate_x = min(cols - x - w, x) * self.max_translate_ratio
            max_translate_y = min(rows - y - h, y) * self.max_translate_ratio

            translation_x = np.random.uniform(-max_translate_x, max_translate_x)
            translation_y = np.random.uniform(-max_translate_y, max_translate_y)

            # If the digit is large, scale it down before translation
            if w > cols * 0.7 or h > rows * 0.7:
                scale_factor = 0.7
                new_w, new_h = int(w * scale_factor), int(h * scale_factor)
                digit = img[y:y + h, x:x + w]
                digit = cv.resize(digit, (new_w, new_h), interpolation=cv.INTER_AREA)
                digit = cv.convertScaleAbs(digit, alpha=1.5, beta=0)  # Increase brightness
                img = np.zeros_like(img)
                x, y = (cols - new_w) // 2, (rows - new_h) // 2
                img[y:y + new_h, x:x + new_w] = digit

            M = np.float32([[1, 0, translation_x], [0, 1, translation_y]])
            img = cv.warpAffine(img, M, (cols, rows), borderMode=cv.BORDER_REFLECT)

            # Check if dilation is needed
            area = cv.contourArea(largest_contour)
            perimeter = cv.arcLength(largest_contour, True)
            if perimeter > 0:
                ratio = area / (perimeter ** 2)
                # Use dilation if ratio is small (indicating thin strokes)
                if ratio < 0.02:
                    # Apply dilation to make the lines more prominent
                    kernel = np.ones((3, 3), np.uint8)  # Kernel size remains 3x3
                    img = cv.dilate(img, kernel, iterations=1)  # Apply dilation

        return Image.fromarray(img)


# Define the network model
class NetB(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(NetB, self).__init__()
        self.h1 = nn.Linear(n_feature, n_hidden)
        self.out = nn.Linear(n_hidden, n_output)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(x.size()[0], -1)
        h1 = self.h1(x)
        out = self.out(h1)
        a_out = self.softmax(out)
        return a_out


# Create a directory to save the augmented images
save_dir = 'mnist1'
os.makedirs(save_dir, exist_ok=True)

save_dir_test_5 = 'mnist2'
os.makedirs(save_dir_test_5, exist_ok=True)

# Define the data transformations including data augmentation
transform_train = transforms.Compose([
    RandomTranslate(max_translate_ratio=0.1),  # Proportion-based translation
    transforms.ToTensor()
])

transform_test = transforms.Compose([
    transforms.ToTensor()
])

# Load MNIST dataset with data augmentation
train_data = datasets.MNIST(root="mnist",
                            train=True,
                            transform=transform_train,
                            download=False)

test_data = datasets.MNIST(root="mnist",
                           train=False,
                           transform=transform_test,
                           download=False)

# Split the training dataset into training and validation sets
train_size = int(0.8 * len(train_data))
val_size = len(train_data) - train_size
train_dataset, val_dataset = data_utils.random_split(train_data, [train_size, val_size])

# DataLoader for batch processing
train_loader = data_utils.DataLoader(dataset=train_dataset,
                                     batch_size=64,
                                     shuffle=True)  # Shuffle training data

val_loader = data_utils.DataLoader(dataset=val_dataset,
                                   batch_size=64,
                                   shuffle=False)  # Do not shuffle validation data

test_loader = data_utils.DataLoader(dataset=test_data,
                                    batch_size=64,
                                    shuffle=False)  # Do not shuffle test data


# Function to save images from a batch to the specified directory
def save_images(images, labels, start_index, save_path):
    for i in range(images.size(0)):
        img = images[i].cpu().numpy().transpose(1, 2, 0)  # Convert tensor to numpy array
        img = (img * 255).astype(np.uint8)  # Convert to uint8 format
        label = labels[i].item()
        file_path = os.path.join(save_path, f'image_{start_index + i}_label_{label}.png')
        cv.imwrite(file_path, img)


# Save the augmented images
for batch_index, (images, labels) in enumerate(train_loader):
    start_index = batch_index * 64
    save_images(images, labels, start_index, save_dir)
    if batch_index % 10 == 0:  # Print progress every 10 batches
        print(f'Saved batch {batch_index + 1}')

print("Finished saving augmented MNIST images.")


# Save images of label 5 from test dataset
def save_label_5_images(images, labels, start_index, save_path):
    for i in range(images.size(0)):
        if labels[i].item() == 5:
            img = images[i].cpu().numpy().transpose(1, 2, 0)  # Convert tensor to numpy array
            img = (img * 255).astype(np.uint8)  # Convert to uint8 format
            file_path = os.path.join(save_path, f'image_{start_index + i}_label_5.png')
            cv.imwrite(file_path, img)


for batch_index, (images, labels) in enumerate(test_loader):
    start_index = batch_index * 64
    save_label_5_images(images, labels, start_index, save_dir_test_5)
    if batch_index % 10 == 0:  # Print progress every 10 batches
        print(f'Saved batch {batch_index + 1} of label 5 images')

print("Finished saving MNIST images with label 5 from the test set.")

# Display a batch of images
imgs, labels = next(iter(train_loader))
images = utils.make_grid(imgs)
images = images.numpy().transpose(1, 2, 0)

plt.imshow(images, cmap='gray')
plt.show()

# Initialize the network model
model_b = NetB(28 * 28, 32, 10)
print(model_b)

# Define the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_b.parameters(), lr=0.01, momentum=0.9)

# Training loop
epochs = 3
loss_history = []
accuracy_history = []

for epoch in range(epochs):
    model_b.train()  # Ensure the model is in training mode
    for batch_idx, (x_train, y_train) in enumerate(train_loader):
        optimizer.zero_grad()
        y_pred = model_b(x_train)
        loss = loss_fn(y_pred, y_train)
        loss.backward()
        optimizer.step()

        loss_history.append(loss.item())

        number_batch = y_train.size()[0]
        _, predicted = torch.max(y_pred.data, dim=1)
        correct_batch = (predicted == y_train).sum().item()
        accuracy_batch = 100 * correct_batch / number_batch
        accuracy_history.append(accuracy_batch)

        if batch_idx % 100 == 0:
            print(f'Epoch {epoch} Batch {batch_idx} Loss = {loss.item():.4f} Accuracy = {accuracy_batch:.4f}%')

print("\nTraining completed")
print(f"Final loss = {loss.item()}")
print(f"Final accuracy = {accuracy_batch}")

# Plot loss history
plt.figure()
plt.grid()
plt.xlabel("Iterations")
plt.ylabel("Loss")
plt.title("Loss History")
plt.plot(loss_history, "r")
plt.show()

# Plot accuracy history
plt.figure()
plt.grid()
plt.xlabel("Iterations")
plt.ylabel("Accuracy (%)")
plt.title("Accuracy History")
plt.plot(accuracy_history, "b+")
plt.show()

# Evaluate on test dataset
model_b.eval()  # Ensure the model is in evaluation mode
correct_dataset = 0
total_dataset = 0
accuracy_dataset = 0.0
class_correct = [0] * 10
class_total = [0] * 10

# Create lists to store predictions and true labels for digits labeled as '5'
true_labels_5 = []
predicted_labels_5 = []

with torch.no_grad():
    for batch_idx, (images, labels) in enumerate(test_loader):
        outputs = model_b(images)
        _, predicted = torch.max(outputs.data, 1)
        total_dataset += labels.size()[0]
        correct_dataset += (predicted == labels).sum().item()

        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            if label == pred:
                class_correct[label.item()] += 1

        # Filter out images labeled as '5'
        mask = (labels == 5)
        true_labels_5.extend(labels[mask].tolist())
        predicted_labels_5.extend(predicted[mask].tolist())

        accuracy_dataset = 100 * correct_dataset / total_dataset

        if batch_idx % 100 == 0:
            print(f'Batch {batch_idx} Accuracy = {accuracy_dataset:.4f}')

print(f'Final result on the dataset: Accuracy = {accuracy_dataset:.2f}%')

# Output class-wise accuracy
print("\nClass-wise accuracy:")
for i in range(10):
    if class_total[i] > 0:
        class_accuracy = 100 * class_correct[i] / class_total[i]
        print(f'Class {i}: Accuracy = {class_accuracy:.2f}%')
    else:
        print(f'Class {i}: No samples')

# Calculate and print the distribution of predictions for true '5' labels
if len(true_labels_5) > 0:
    predicted_labels_5 = np.array(predicted_labels_5)
    unique, counts = np.unique(predicted_labels_5, return_counts=True)
    prediction_distribution = dict(zip(unique, counts))

    print("\nDistribution of predictions for true '5' labels:")
    total_count = len(true_labels_5)
    for label, count in prediction_distribution.items():
        percentage = (count / total_count) * 100
        print(f'Predicted as {label}: {count} times ({percentage:.2f}%)')
else:
    print("No samples with true label '5' found in the test set.")

# Function to preprocess an uploaded image
def preprocess_image(image_path):
    # Open and preprocess the image
    img = Image.open(image_path)
    img = img.convert('L')  # Convert to grayscale
    img = ImageOps.invert(img)  # Invert the image
    img = np.array(img)  # Convert to NumPy array

    # Apply thresholding
    _, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)

    # Resize the image to 28x28
    img = Image.fromarray(img)
    img = img.resize((28, 28), Image.LANCZOS)

    # Convert to float array and normalize
    img_array = np.array(img).astype(np.float32) / 255.0
    return img_array


# Load and preprocess an uploaded image
uploaded_image_path = "D:/pythonProject8/dataset/number/15.png"
uploaded_image_array = preprocess_image(uploaded_image_path)

# Display the processed image
plt.imshow(uploaded_image_array.squeeze(), cmap='gray')
plt.title("Uploaded Image")
plt.show()

# Convert to Tensor and add batch dimension
uploaded_image_tensor = torch.tensor(uploaded_image_array, dtype=torch.float32).unsqueeze(0)

# Ensure the model is in evaluation mode
model_b.eval()

# Predict the uploaded image
with torch.no_grad():
    output = model_b(uploaded_image_tensor)
    _, predicted = torch.max(output, 1)

print(f"Predicted class for the uploaded image: {predicted.item()}")

(This is the augmented MNIST training set that was saved to disk.)
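If you want to try the stroke-thickness test from the listing above on a single image of your own, here is a minimal standalone sketch. The file name digit.png is a placeholder, and the 0.02 threshold simply mirrors the value used in RandomTranslate above; both are assumptions you may need to adjust for your own data.

import cv2 as cv
import numpy as np

# Standalone sketch of the compactness-based stroke test (digit.png is a placeholder path).
# Expects a black-background, white-digit image like the MNIST samples used above.
img = cv.imread("digit.png", cv.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError("digit.png not found")

_, binary = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
contours, _ = cv.findContours(binary, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

if contours:
    c = max(contours, key=cv.contourArea)
    area = cv.contourArea(c)
    perimeter = cv.arcLength(c, True)
    if perimeter > 0:
        ratio = area / (perimeter ** 2)  # small ratio -> long, thin strokes
        print(f"compactness ratio = {ratio:.4f}")
        if ratio < 0.02:  # same threshold as in RandomTranslate above
            kernel = np.ones((3, 3), np.uint8)
            binary = cv.dilate(binary, kernel, iterations=1)
            print("thin strokes detected, dilation applied")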

3. Recognizing images that contain multiple handwritten digits (convolution-based)

To recognize an image containing several digits, we first need to segment the individual digits.

3.1 Segmenting the digits in an image (full code)

Digit segmentation works by binarizing the image and finding the digit contours, then sorting the contours by their x coordinate so the digits are recognized from left to right. (If the digits span several rows, they are still ordered purely by horizontal position, from the smallest x coordinate of each bounding box to the largest, rather than row by row.) The code also checks up front whether any digit contours were found at all; if the uploaded image contains no digits, the script terminates immediately, which saves a lot of time. A minimal sketch of just this step is shown below; the complete scripts then follow in 1. and 2.
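A minimal sketch of the early-exit check and the left-to-right sorting, on its own (digits.png is a placeholder path; the full scripts below do the same thing inside a larger pipeline):

import cv2 as cv

# Minimal sketch: exit early when no digits are found, then sort contours left to right.
# digits.png is a placeholder path for a white-background image with dark digits.
image = cv.imread("digits.png", cv.IMREAD_GRAYSCALE)
if image is None:
    raise FileNotFoundError("digits.png not found")

image = cv.GaussianBlur(image, (5, 5), 0)
_, binary_image = cv.threshold(image, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)

contours, _ = cv.findContours(binary_image, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
if not contours:
    print("No digits found")  # stop here instead of running the model
else:
    # Sort by the left edge (x) of each bounding box -> left-to-right reading order
    contours = sorted(contours, key=lambda c: cv.boundingRect(c)[0])
    for c in contours:
        x, y, w, h = cv.boundingRect(c)
        print(f"digit region: x={x}, y={y}, w={w}, h={h}")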

1. Fully connected neural network (lower recognition accuracy)
import torch
import torch.nn as nn
import torch.utils.data as data_utils
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torchvision.utils as utils
import matplotlib.pyplot as plt
import numpy as np
import cv2 as cv

# 加载图像并进行预处理
image_path = "D:\\pythonProject8\\dataset\\number1\\number03.png"  # 修改为你的图像路径
image = cv.imread(image_path, cv.IMREAD_GRAYSCALE)

# 高斯模糊
image = cv.GaussianBlur(image, (5, 5), 0)

# 二值化图像
_, binary_image = cv.threshold(image, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)

# 查找轮廓
contours, _ = cv.findContours(binary_image, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

# 添加检查条件
if not contours:
    print("无数字")
    exit()

# 下载并准备MNIST数据集
train_data = datasets.MNIST(root="mnist",
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_data = datasets.MNIST(root="mnist",
                           train=False,
                           transform=transforms.ToTensor(),
                           download=True)

# DataLoader for batch processing
train_loader = data_utils.DataLoader(dataset=train_data,
                                     batch_size=64,
                                     shuffle=True)

test_loader = data_utils.DataLoader(dataset=test_data,
                                    batch_size=64,
                                    shuffle=False)
# shuffle 是问是否要打乱顺序

# 显示一个批次的图像
imgs, labels = next(iter(train_loader))
images = utils.make_grid(imgs)
images = images.numpy().transpose(1, 2, 0)

plt.imshow(images, cmap='gray')
plt.show()


# 定义神经网络模型
class NetB(nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(NetB, self).__init__()
        self.h1 = nn.Linear(n_feature, n_hidden)
        self.bn1 = nn.BatchNorm1d(n_hidden)  # 添加批归一化
        self.out = nn.Linear(n_hidden, n_output)
        self.softmax = nn.Softmax(dim=1)
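        # Note: nn.CrossEntropyLoss (used below) already applies log-softmax internally,
        # so returning raw logits without this extra Softmax is usually preferred.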

    def forward(self, x):
        x = x.view(x.size()[0], -1)
        h1 = self.bn1(self.h1(x))  # 批归一化
        out = self.out(h1)
        a_out = self.softmax(out)
        return a_out


model_b = NetB(28 * 28, 128, 10)  # 增加隐藏层节点数
print(model_b)

# 定义loss函数和优化器
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_b.parameters(), lr=0.001)  # 使用Adam优化器

# 模型训练
epochs = 5
loss_history = []
accuracy_history = []

for epoch in range(epochs):
    for i, (x_train, y_train) in enumerate(train_loader):
        optimizer.zero_grad()
        y_pred = model_b(x_train)
        loss = loss_fn(y_pred, y_train)
        loss.backward()
        optimizer.step()

        loss_history.append(loss.item())
        _, predicted = torch.max(y_pred.data, 1)
        accuracy_batch = (predicted == y_train).sum().item() / y_train.size(0)
        accuracy_history.append(accuracy_batch)

        if i % 100 == 0:
            print(
                f'Epoch {epoch + 1}/{epochs}, Batch {i}, Loss: {loss.item():.4f}, Accuracy: {accuracy_batch * 100:.2f}%')

print("训练完成")

# 显示loss和准确率的历史数据
plt.figure()
plt.plot(loss_history, label='Loss')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.figure()
plt.plot(accuracy_history, label='Accuracy')
plt.xlabel('Iteration')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# 对测试集进行评估
model_b.eval()
correct = 0
total = 0
class_correct = [0 for _ in range(10)]
class_total = [0 for _ in range(10)]

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model_b(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        for label, prediction in zip(labels, predicted):
            if label == prediction:
                class_correct[label] += 1
            class_total[label] += 1

print('测试集上的总体准确率: {:.2f}%'.format(100 * correct / total))

# 输出每个数字类别的准确率
print("\n每个数字类别的准确率:")
for i in range(10):
    if class_total[i] > 0:
        class_accuracy = 100 * class_correct[i] / class_total[i]
        print(f'数字 {i}: 准确率 = {class_accuracy:.2f}%')
    else:
        print(f'数字 {i}: 没有样本')


# 按x坐标对轮廓排序,可保证最终识别顺序跟图片上一致(按照数字左边界来排序)
contours = sorted(contours, key=lambda c: cv.boundingRect(c)[0])

# Ensure the final image size is 28x28
digit_images = []
for contour in contours:
    x, y, w, h = cv.boundingRect(contour)
    digit_image = binary_image[y:y+h, x:x+w]

    # 创建28x28的空白图像
    img_padded = np.zeros((28, 28), dtype=np.uint8)

    # 计算缩放因子
    scale = min(20 / w, 20 / h)
    new_w = int(w * scale)
    new_h = int(h * scale)

    # 确保数字在中心
    digit_image = cv.resize(digit_image, (new_w, new_h), interpolation=cv.INTER_AREA)
    start_x = (28 - new_w) // 2
    start_y = (28 - new_h) // 2
    img_padded[start_y:start_y + new_h, start_x:start_x + new_w] = digit_image

    # 归一化图像
    digit_image = img_padded.astype(np.float32) / 255.0
    digit_images.append(digit_image)

# 显示分割后的数字图像
for idx, digit_image in enumerate(digit_images):
    plt.subplot(1, len(digit_images), idx + 1)
    plt.imshow(digit_image, cmap='gray')
    plt.axis('off')
plt.show()

# 确保模型处于评估模式
model_b.eval()

# 将数字图像转换为张量,并添加批次维度
digit_tensors = [torch.tensor(digit_image, dtype=torch.float32).unsqueeze(0).unsqueeze(0) for digit_image in digit_images]

# 预测每个数字并存储结果
predicted_digits = []
with torch.no_grad():
    for digit_tensor in digit_tensors:
        output = model_b(digit_tensor)
        _, predicted = torch.max(output, 1)
        predicted_digits.append(predicted.item())

print("预测的数字:", predicted_digits)
2. Convolutional neural network (higher accuracy)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data_utils
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torchvision.utils as utils
import matplotlib.pyplot as plt
import numpy as np
import cv2 as cv

# 加载图像并进行预处理
image_path = "D:\\pythonProject8\\dataset\\number1\\number03.png"  # 修改为你的图像路径
image = cv.imread(image_path, cv.IMREAD_GRAYSCALE)

# 高斯模糊
image = cv.GaussianBlur(image, (5, 5), 0)

# 二值化图像
_, binary_image = cv.threshold(image, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)

# 查找轮廓
contours, _ = cv.findContours(binary_image, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
# 如果图片中没有数字,cv.findContours 将会返回一个空的 contours 列表。

# 添加检查条件
if not contours:
    print("无数字")
    exit()

# 数据增强
transform = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# 下载并准备MNIST数据集
train_data = datasets.MNIST(root="mnist",
                            train=True,
                            transform=transform,
                            download=True)

test_data = datasets.MNIST(root="mnist",
                           train=False,
                           transform=transforms.ToTensor(),
                           download=True)

# DataLoader for batch processing
train_loader = data_utils.DataLoader(dataset=train_data,
                                     batch_size=64,
                                     shuffle=True)

test_loader = data_utils.DataLoader(dataset=test_data,
                                    batch_size=64,
                                    shuffle=False)
# shuffle 是问是否要打乱顺序

# 显示一个批次的图像
imgs, labels = next(iter(train_loader))
images = utils.make_grid(imgs)
images = images.numpy().transpose(1, 2, 0)

plt.imshow(images, cmap='gray')
plt.show()

# 定义卷积神经网络模型
class NetC(nn.Module):
    def __init__(self):
        super(NetC, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
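        # The 320 below = 20 channels * 4 * 4: each 5x5 conv and 2x2 max-pool shrinks 28 -> 24 -> 12 -> 8 -> 4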
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
model_c = NetC()
print(model_c)

# 定义loss函数和优化器
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_c.parameters(), lr=0.01, momentum=0.9)

# 模型训练
epochs = 5
loss_history = []
accuracy_history = []

for epoch in range(epochs):
    for i, (x_train, y_train) in enumerate(train_loader):
        optimizer.zero_grad()
        y_pred = model_c(x_train)
        loss = loss_fn(y_pred, y_train)
        loss.backward()
        optimizer.step()

        loss_history.append(loss.item())
        _, predicted = torch.max(y_pred.data, 1)
        accuracy_batch = (predicted == y_train).sum().item() / y_train.size(0)
        accuracy_history.append(accuracy_batch)

        if i % 100 == 0:
            print(f'Epoch {epoch+1}/{epochs}, Batch {i}, Loss: {loss.item():.4f}, Accuracy: {accuracy_batch*100:.2f}%')

print("训练完成")

# 显示loss和准确率的历史数据
plt.figure()
plt.plot(loss_history, label='Loss')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.figure()
plt.plot(accuracy_history, label='Accuracy')
plt.xlabel('Iteration')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# 对测试集进行评估
model_c.eval()
correct = 0
total = 0
class_correct = [0 for _ in range(10)]
class_total = [0 for _ in range(10)]

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model_c(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        for label, prediction in zip(labels, predicted):
            if label == prediction:
                class_correct[label] += 1
            class_total[label] += 1

print('测试集上的总体准确率: {:.2f}%'.format(100 * correct / total))

# 输出每个数字类别的准确率
print("\n每个数字类别的准确率:")
for i in range(10):
    if class_total[i] > 0:
        class_accuracy = 100 * class_correct[i] / class_total[i]
        print(f'数字 {i}: 准确率 = {class_accuracy:.2f}%')
    else:
        print(f'数字 {i}: 没有样本')


# 按x坐标对轮廓排序
contours = sorted(contours, key=lambda c: cv.boundingRect(c)[0])

# Ensure the final image size is 28x28
digit_images = []
for contour in contours:
    x, y, w, h = cv.boundingRect(contour)
    digit_image = binary_image[y:y+h, x:x+w]
    # 如果 contours 列表为空,这个循环体将不会执行。因此,digit_images 列表将会是空的。
    # 创建28x28的空白图像
    img_padded = np.zeros((28, 28), dtype=np.uint8)

    # 计算缩放因子
    scale = min(20 / w, 20 / h)
    new_w = int(w * scale)
    new_h = int(h * scale)

    # 确保数字在中心
    digit_image = cv.resize(digit_image, (new_w, new_h), interpolation=cv.INTER_AREA)
    start_x = (28 - new_w) // 2
    start_y = (28 - new_h) // 2
    img_padded[start_y:start_y + new_h, start_x:start_x + new_w] = digit_image

    # 归一化图像
    digit_image = img_padded.astype(np.float32) / 255.0
    digit_images.append(digit_image)

# 显示分割后的数字图像
for idx, digit_image in enumerate(digit_images):
    plt.subplot(1, len(digit_images), idx + 1)
    plt.imshow(digit_image, cmap='gray')
    plt.axis('off')
plt.show()

# 确保模型处于评估模式
model_c.eval()

# 将数字图像转换为张量,并添加批次维度
digit_tensors = [torch.tensor(digit_image, dtype=torch.float32).unsqueeze(0).unsqueeze(0) for digit_image in digit_images]

# 预测每个数字并存储结果
predicted_digits = []
with torch.no_grad():
    for digit_tensor in digit_tensors:
        output = model_c(digit_tensor)
        _, predicted = torch.max(output, 1)
        predicted_digits.append(predicted.item())

print("预测的数字:", predicted_digits)

The recognition results are shown below:

References:

[Pytorch系列-29]:神经网络基础 - 全连接浅层神经网络实现10分类手写数字识别_量子电路手写数字识别10分类-CSDN博客

mnist数据集转换为图片+测试手写字的demo_如何用pil制作mnist 测试用图-CSDN博客

[2] 图像处理之----二值化处理-CSDN博客

基于灰度图像的质心算法和三角测距原理(学习笔记)_灰度质心法-CSDN博客

数字图像处理:形态学操作、腐蚀、膨胀、开运算、闭运算_arcgis膨胀腐蚀运算-CSDN博客

I hope this post has been helpful.

If it was, I'd really appreciate a like.
