Deep Learning Image Denoising Project, PyTorch in Practice - Chapter 2 "Dataset Construction"

Article link: https://blog.csdn.net/qq_41103479/article/details/140615872

This article uses the KODAK24 dataset.

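In deep learning research, the training : validation : test ratio is often set to 6:2:2, which relates to the idea of cross-validation. Note: please do not overlook the test set. I have seen some published papers, and classmates around me doing AI research, skip the test set and judge their models only by validation performance. That is not sound science: the validation set is already used for tuning and model selection, so good results on it do not prove that your contribution has value!! One classmate nearly had graduation delayed because the thesis experiments had no test set; the advisor was lenient and let them graduate after an urgent revision. Many published journal articles do not even have a test set. Such studies are meaningless and can even mislead readers.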
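
To make the 6:2:2 split concrete, here is a minimal sketch that uses only the Python standard library. The helper name `split_three_way`, the seed, and the rounding rule are my own assumptions for illustration, and the file names are placeholders. With 24 images this rule gives 14 training, 5 validation, and 5 test images.

```python
import random


def split_three_way(items, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle items and split them into train/val/test lists.

    Hypothetical helper: the name, seed and rounding rule are assumptions.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Placeholder file names standing in for the 24 dataset images.
    names = [f"img{i:02d}.png" for i in range(1, 25)]
    train, val, test = split_three_way(names)
    print(len(train), len(val), len(test))  # 14 5 5
```

Use the validation part for tuning and model selection, and touch the test part only once, for the final reported numbers.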

First, our Python project has the following directory layout:

-denoise  # our project name
-----data
---------train
---------test
-----Kodak24

We need the following libraries:

import cv2
from sklearn.model_selection import train_test_split
import os

Next, we read the images from the Kodak dataset and write them into the data folder, splitting them into training and test sets along the way.


def make_set(datafold_name, source_dataset):
    # Sort the listing: os.listdir returns entries in arbitrary order,
    # so sorting keeps the seeded split reproducible across machines.
    img_list = sorted(os.listdir(source_dataset))
    traindir = os.path.join(datafold_name, "train")
    testdir = os.path.join(datafold_name, "test")
    # makedirs also creates datafold_name; exist_ok avoids errors on reruns.
    os.makedirs(traindir, exist_ok=True)
    os.makedirs(testdir, exist_ok=True)

    # Shuffle the dataset and randomly assign images to the training and test sets.
    train, test = train_test_split(img_list, test_size=0.2, random_state=42)
    for item in train:
        srcimgpath = os.path.join(source_dataset, item)
        img_preprocess(srcimgpath, write_path=traindir)
    for item in test:
        srcimgpath = os.path.join(source_dataset, item)
        img_preprocess(srcimgpath, write_path=testdir)
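
One thing to watch for: `os.listdir` returns every entry in the folder, including hidden files and subfolders, so anything that is not an image would end up in the split. Below is a minimal sketch of a filter. The helper name `list_images` and the extension list are assumptions for illustration, not part of the original code.

```python
import os

# Assumed set of extensions; adjust it to match the files in your dataset.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def list_images(folder):
    """Return the sorted names of files in folder that look like images."""
    names = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Keep regular files only, matched case-insensitively by extension.
        if os.path.isfile(full_path) and name.lower().endswith(IMAGE_EXTENSIONS):
            names.append(name)
    return sorted(names)
```

The returned list could be passed to `train_test_split` in place of the raw `os.listdir` output.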

img_preprocess is our function that reads, preprocesses, and writes each image; here every image is resized to the same size:


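def img_preprocess(img_filepath, write_path):
    _, img_name = os.path.split(img_filepath)
    src_img = cv2.imread(img_filepath)
    # cv2.imread returns None (no exception) when a file cannot be decoded.
    if src_img is None:
        print(f"Skipping unreadable file: {img_filepath}")
        return
    # dsize is (width, height); the aspect ratio is not preserved.
    resized_img = cv2.resize(src_img, dsize=(256, 256))
    cv2.imwrite(os.path.join(write_path, img_name), resized_img)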
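
Note that resizing straight to a square stretches non-square inputs. The Kodak images are 768×512 or 512×768, so the resized copies are distorted. If that matters for your experiments, one option is to crop a centered square before resizing. The sketch below only computes the crop coordinates in plain Python. `center_square_box` is a hypothetical helper, not part of the original code, and the commented usage assumes `src_img` is the array returned by `cv2.imread`.

```python
def center_square_box(height, width):
    """Return (top, left, side) of the largest centered square in an image."""
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return top, left, side


# Possible use inside img_preprocess (src_img has shape (height, width, channels)):
#   top, left, side = center_square_box(*src_img.shape[:2])
#   square = src_img[top:top + side, left:left + side]
#   resized_img = cv2.resize(square, dsize=(256, 256))
print(center_square_box(512, 768))  # (0, 128, 512)
```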


if __name__ == '__main__':
    foldname = "data"
    source_dataset = "Kodak24"
    make_set(foldname, source_dataset)

The above is the main entry point; running it produces the directory layout shown earlier.
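
As a quick sanity check after running the script, you can count the files in each output folder. Assuming the source folder contains exactly the 24 images, `test_size=0.2` makes scikit-learn round the test set up to 5 images, so you should see 19 files in `train` and 5 in `test`. The helper name `count_split` is an assumption for illustration.

```python
import os


def count_split(data_dir, subsets=("train", "test")):
    """Return {subset: number of files} for each subset folder under data_dir."""
    counts = {}
    for subset in subsets:
        subset_dir = os.path.join(data_dir, subset)
        # A missing folder is reported as 0 rather than raising an error.
        if os.path.isdir(subset_dir):
            counts[subset] = sum(
                1 for name in os.listdir(subset_dir)
                if os.path.isfile(os.path.join(subset_dir, name))
            )
        else:
            counts[subset] = 0
    return counts


if __name__ == "__main__":
    print(count_split("data"))
```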