IMDB-WIKI dataset: splitting, processing, and training

Tninaiwohe

Category: Machine Learning & Deep Learning. Tags: deep learning, python, pytorch

Article link: https://blog.csdn.net/Tninaiwohe/article/details/122994999

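This post describes how to use Python to process a face dataset for age and gender recognition, in particular how to select images by age. The author first sorts images from the WIKI dataset into age groups. Some age groups have too few images, so the author extracts images for those groups from the IMDB dataset. During processing, the author found that some images carry age labels that do not match the apparent age of the face, which is a data-quality problem.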

Processing a dataset for face age and gender recognition

Using Python to batch-move files into a specified folder

Reference code:

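import os
import shutil

path_img = 'path/to/source/images'  # folder to read images from
path_out = 'path/to/output'  # folder to move matching files into
keyword = 'keyword'  # substring to look for in the file names

os.makedirs(path_out, exist_ok=True)  # make sure the target folder exists
ls = os.listdir(path_img)
print(len(ls))  # number of files found
for i in ls:
    if keyword in i:  # file name contains the keyword
        shutil.move(os.path.join(path_img, i), os.path.join(path_out, i))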

Example:

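import os
import shutil

path_img = 'C:\\Users\\chriszhang\\Desktop\\gender\\test'  # folder to read images from
path_out = 'C:/Users/chriszhang/Desktop/male'  # target folder for the male images

os.makedirs(path_out, exist_ok=True)  # make sure the target folder exists
ls = os.listdir(path_img)
print(len(ls))  # number of files found
for i in ls:
    if 'testnan' in i:  # files whose name contains 'testnan' go to the male folder
        shutil.move(os.path.join(path_img, i), os.path.join(path_out, i))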
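The template and the example differ only in their paths and keyword, so both can be folded into one reusable function. The sketch below is an illustration and not the original code. `move_matching` is a hypothetical helper name. It uses `pathlib`, creates the target folder if it is missing, and returns the names it moved so you can check the result.

```python
import shutil
from pathlib import Path


def move_matching(src_dir, dst_dir, keyword):
    """Move every file in src_dir whose name contains keyword into dst_dir.

    Hypothetical helper for illustration; returns the sorted list of moved file names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    moved = []
    for path in sorted(src.iterdir()):
        if path.is_file() and keyword in path.name:
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved
```

For the example above, the call would be `move_matching(r'C:\Users\chriszhang\Desktop\gender\test', r'C:\Users\chriszhang\Desktop\male', 'testnan')`. To copy instead of move, swap `shutil.move` for `shutil.copy`.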

Problem description

My situation: to build a deep learning dataset, I needed to pick out images of people in different age groups.

The open-source WIKI dataset consists of 100 image folders, 00 to 99, plus a .mat file that holds the labels.
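The labels can also be read straight from the .mat file. This route additionally exposes a gender field, which the file-name approach used below does not provide. The following is a minimal sketch under several assumptions. SciPy must be installed. The file is assumed to follow the usual IMDB-WIKI layout: a struct named `wiki` whose fields include `full_path`, `dob`, `photo_taken` and `gender`. `dob` is assumed to be a MATLAB serial date number, and `gender` is assumed to be 1 for male, 0 for female and NaN if unknown. All function names here are hypothetical. The date conversion relies on MATLAB day numbers being exactly 366 larger than Python date ordinals for the same day.

```python
import math
from datetime import date


def matlab_datenum_to_date(datenum):
    """Convert a MATLAB serial date number (as assumed for the dob field) to a date.

    MATLAB's day 1 is 1 January of year 0 and Python's ordinal day 1 is
    1 January of year 1; year 0 is a leap year, so the offset is 366 days.
    """
    return date.fromordinal(int(datenum) - 366)


def age_at_mid_year(dob_datenum, photo_taken):
    """Age in the year photo_taken, using the same mid-year rule as the scripts in this article."""
    born = matlab_datenum_to_date(dob_datenum)
    age = photo_taken - born.year
    if born.month >= 7:  # born in the second half of the year: birthday assumed not yet reached
        age -= 1
    return age


def gender_label(value):
    """Map the assumed gender encoding (1 = male, 0 = female, NaN = unknown) to a string."""
    if math.isnan(value):
        return "unknown"
    return "male" if value == 1 else "female"


def load_wiki_labels(mat_path):
    """Yield (relative_path, age, gender) for every entry of wiki.mat.

    Assumes SciPy is installed and the usual IMDB-WIKI layout: a struct named
    'wiki' whose fields are 1 x N arrays.
    """
    from scipy.io import loadmat  # lazy import: the helpers above need only the standard library

    meta = loadmat(mat_path)["wiki"][0, 0]
    rows = zip(meta["full_path"][0], meta["dob"][0],
               meta["photo_taken"][0], meta["gender"][0])
    for path, dob, taken, gender in rows:
        yield str(path[0]), age_at_mid_year(dob, int(taken)), gender_label(float(gender))
```

Iterating over `load_wiki_labels('wiki.mat')` (path hypothetical) gives one `(relative_path, age, gender)` tuple per image. This makes it possible to route images by gender as well as by age.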

Solution:

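Iterate over the image file names in every folder and select the images for 8 different age groups.
Code (it sorts by age only, not by gender; the destination folder name contains "men", but no gender filter is applied):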

import os
import shutil

root = 'D:/pythoncode/PycharmProjects/CNN_SE_ELM Project/dataset_new'  # WIKI image root (folders 00-99)
path_destination = 'D:/pythoncode/PycharmProjects/CNN_SE_ELM Project/dataset_16000/25-32men'
os.makedirs(path_destination, exist_ok=True)  # otherwise shutil.copy treats the path as a file name and overwrites it each time

for j in range(0, 100):
    path_img = os.path.join(root, str(j).rjust(2, '0'))  # sub-folder 00, 01, ..., 99
    ls = os.listdir(path_img)  # all file names in this sub-folder
    print(len(ls))
    for filename in ls:
        # WIKI file names end in '..._YYYY-MM-DD_YYYY.jpg': birth date, then the year the photo was taken
        birth_year = int(filename[-19:-15])
        birth_month = int(filename[-14:-12])
        taken_year = int(filename[-8:-4])
        age = taken_year - birth_year  # compute age
        if birth_month >= 7:  # assume the photo was taken in the middle of the year
            age -= 1
        if age in range(25, 33):  # ages 25-32
            shutil.copy(os.path.join(path_img, filename), path_destination)

Result: [screenshot not preserved]
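The WIKI script handles one age group per run, so it has to be edited and rerun for each of the 8 groups. The following sketch does all groups in a single pass and reports how many images each group received, which also shows which groups are short. It rests on several assumptions. `AGE_GROUPS` lists only the ranges named in this article, so you would add the rest of your 8 groups yourself. The destination sub-folder names and all function names are hypothetical. File names are assumed to follow the WIKI pattern that the slicing above relies on (`..._YYYY-MM-DD_YYYY.jpg`).

```python
import os
import shutil

# Hypothetical mapping: destination folder name -> inclusive age range.
# Only the ranges mentioned in this article are listed; add the rest of your 8 groups.
AGE_GROUPS = {
    "0-2": (0, 2),
    "4-6": (4, 6),
    "8-13": (8, 13),
    "25-32": (25, 32),
}


def wiki_age(filename):
    """Age from a WIKI file name ending in '_YYYY-MM-DD_YYYY.jpg' (mid-year rule)."""
    birth_year = int(filename[-19:-15])
    birth_month = int(filename[-14:-12])
    taken_year = int(filename[-8:-4])
    age = taken_year - birth_year
    if birth_month >= 7:  # photo assumed taken mid-year, birthday not yet reached
        age -= 1
    return age


def group_of(age):
    """Return the name of the group whose range contains age, or None."""
    for name, (low, high) in AGE_GROUPS.items():
        if low <= age <= high:
            return name
    return None


def split_by_age(root, dest_root):
    """Copy every image under root/00 ... root/99 into dest_root/<group>/ in one pass.

    Returns a dict with the number of images copied per group.
    """
    counts = {name: 0 for name in AGE_GROUPS}
    for name in AGE_GROUPS:
        os.makedirs(os.path.join(dest_root, name), exist_ok=True)
    for j in range(100):
        folder = os.path.join(root, f"{j:02d}")
        for filename in os.listdir(folder):
            group = group_of(wiki_age(filename))
            if group is not None:
                shutil.copy(os.path.join(folder, filename), os.path.join(dest_root, group))
                counts[group] += 1
    return counts
```

Printing the dict returned by `split_by_age` shows at a glance which groups need topping up from another source.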

Problem encountered

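The WIKI dataset has too few images in the 0-2, 4-6 and 8-13 age groups, so I extracted additional images for these groups from IMDB.
Code for extracting them from IMDB (shown here for the 8-13 group):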

# -*- coding:utf-8 -*-
"""
Author: ASUS
Date: 2022-02-17
"""
import os
import shutil

# The WIKI dataset has too few images aged 0-2, 4-6 and 8-13, so some are taken from IMDB
root = 'D:/pythoncode/PycharmProjects/CNN_SE_ELM Project/imdb_crop'  # IMDB image root (folders 00-99)
path_destination = 'D:/pythoncode/PycharmProjects/CNN_SE_ELM Project/dataset_16000/8-13men'
os.makedirs(path_destination, exist_ok=True)  # otherwise shutil.copy treats the path as a file name and overwrites it each time

for j in range(0, 100):
    path_img = os.path.join(root, str(j).rjust(2, '0'))  # sub-folder 00, 01, ..., 99
    ls = os.listdir(path_img)  # all file names in this sub-folder
    print(len(ls))
    for filename in ls:
        birth = filename.split('_')[2]  # birth date (year-month-day, not zero-padded), e.g. '1955-1-6'
        birth_year = int(birth.split('-')[0])  # birth year, e.g. 1955
        birth_month = int(birth.split('-')[1])  # birth month, e.g. 1
        taken_year = int(filename[-8:-4])  # year the photo was taken
        age = taken_year - birth_year  # compute age
        if birth_month >= 7:  # assume the photo was taken in the middle of the year
            age -= 1
        if age in range(8, 14):  # ages 8-13
            shutil.copy(os.path.join(path_img, filename), path_destination)
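The WIKI and IMDB scripts parse the birth date differently. WIKI names use fixed-width, zero-padded dates, as the slicing in the first script assumes. IMDB dates such as '1955-1-6' are not padded, so fixed slicing breaks there. Splitting the name on `_` from the right handles both layouts with one function. This is a minimal sketch, assuming every file name ends in `_<birth date>_<year taken>.jpg`, as both scripts expect. `parse_age` is a hypothetical helper name.

```python
import os


def parse_age(filename):
    """Age from a WIKI or IMDB crop file name (mid-year rule).

    Splits on '_' from the right instead of slicing fixed positions, so it works
    for zero-padded WIKI dates and unpadded IMDB dates alike.
    """
    stem = os.path.splitext(filename)[0]  # drop the '.jpg' extension
    birth, taken = stem.rsplit("_", 2)[-2:]  # last two '_'-separated fields
    birth_year, birth_month = (int(x) for x in birth.split("-")[:2])
    age = int(taken) - birth_year
    if birth_month >= 7:  # same mid-year assumption as the scripts above
        age -= 1
    return age
```

With this helper, the same loop can read both `dataset_new` and `imdb_crop`, so the two scripts differ only in their root folder and age range.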