Large-scale CelebFaces Attributes (CelebA) Dataset
CelebA is a large-scale face attributes dataset containing 202,599 face images, each annotated with 40 binary attribute labels.
Dataset download link: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
Sample images
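The 40 attributes:
5_o_Clock_Shadow: light stubble (about a day's beard growth)
Arched_Eyebrows: arched eyebrows
Attractive: attractive
Bags_Under_Eyes: bags under the eyes
Bald: bald
Bangs: bangs (fringe)
Big_Lips: big lips
Big_Nose: big nose
Black_Hair: black hair
Blond_Hair: blond hair
Blurry: blurry image
Brown_Hair: brown hair
Bushy_Eyebrows: bushy eyebrows
Chubby: chubby
Double_Chin: double chin
Eyeglasses: wearing eyeglasses
Goatee: goatee
Gray_Hair: gray or white hair
Heavy_Makeup: heavy makeup
High_Cheekbones: high cheekbones
Male: male
Mouth_Slightly_Open: mouth slightly open
Mustache: mustache
Narrow_Eyes: narrow eyes
No_Beard: no beard
Oval_Face: oval face
Pale_Skin: pale skin
Pointy_Nose: pointy nose
Receding_Hairline: receding hairline
Rosy_Cheeks: rosy cheeks
Sideburns: sideburns
Smiling: smiling
Straight_Hair: straight hair
Wavy_Hair: wavy hair
Wearing_Earrings: wearing earrings
Wearing_Hat: wearing a hat
Wearing_Lipstick: wearing lipstick
Wearing_Necklace: wearing a necklace
Wearing_Necktie: wearing a necktie
Young: young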
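The same 40 names appear, in the same order, on the second line of `Anno/list_attr_celeba.txt` (the first line holds the image count). As a minimal sketch, assuming that layout, the helpers below build the name list and turn a label row into the attributes that are present. `parse_attr_lines` and `decode_row` are hypothetical names chosen for illustration, and the sample data is synthetic.

```python
def parse_attr_lines(lines):
    """Parse lines of a CelebA-style attribute file.

    Assumed layout: line 1 = image count, line 2 = attribute names,
    remaining lines = image name followed by one -1/1 label per attribute.
    Returns (attr_names, {image_name: [0/1 labels]}).
    """
    lines = [ln for ln in lines if ln.strip()]
    attr_names = lines[1].split()
    labels = {}
    for ln in lines[2:]:
        row = ln.split()
        if len(row) != len(attr_names) + 1:
            continue  # skip malformed rows
        # Map the original {-1, 1} labels to {0, 1}.
        labels[row[0]] = [1 if v == "1" else 0 for v in row[1:]]
    return attr_names, labels


def decode_row(attr_names, row_labels):
    """Return the names of the attributes marked as present (label 1)."""
    return [name for name, v in zip(attr_names, row_labels) if v == 1]


# Tiny synthetic sample with 3 attributes instead of 40, for illustration only.
sample = [
    "2",
    "Bald Eyeglasses Smiling",
    "000001.jpg -1  1  1",
    "000002.jpg  1 -1 -1",
]
names, labels = parse_attr_lines(sample)
print(decode_row(names, labels["000001.jpg"]))  # ['Eyeglasses', 'Smiling']
```

With the real file, the same helper can be fed the open file object directly, for example `with open("Anno/list_attr_celeba.txt") as f: names, labels = parse_attr_lines(f)`.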
Splitting into training, validation, and test sets
# Split list_attr_celeba.txt into train/val/test attribute files,
# following the official partition in list_eval_partition.txt.

# Partition codes in list_eval_partition.txt: 0 = train, 1 = validation, 2 = test.
split_files = {
    "0": "train_attr_list.txt",
    "1": "val_attr_list.txt",
    "2": "test_attr_list.txt",
}

# Map each image name to its partition code (dict lookup is much faster
# than searching a list of ~200k names for every row).
partition = {}
with open("Anno/list_eval_partition.txt", encoding="utf-8") as f:
    for line in f:
        row = line.split()
        if len(row) == 2:
            partition[row[0]] = row[1]

# Open each output file once, in write mode, so re-running the script
# does not append duplicate rows.
outputs = {code: open(name, "w", encoding="utf-8") for code, name in split_files.items()}
try:
    with open("Anno/list_attr_celeba.txt", encoding="utf-8") as f:
        for line in f:
            row = line.split()
            # Data rows hold an image name plus 40 labels; this skips the two header lines.
            if len(row) != 41:
                continue
            img_name = row[0]
            # Convert labels from {-1, 1} to {0, 1}.
            labels = ["0" if v == "-1" else v for v in row[1:]]
            code = partition.get(img_name)
            if code in outputs:
                outputs[code].write(" ".join([img_name] + labels) + "\n")
finally:
    for out in outputs.values():
        out.close()
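After the script runs, a quick sanity check is to read a split file back and count its rows. The sketch below assumes the output format written above (an image name followed by space-separated 0/1 labels). `load_split` and `positive_counts` are hypothetical helper names, and the demo uses a small temporary file rather than the real output.

```python
import os
import tempfile


def load_split(path):
    """Read a split file: each line is an image name followed by 0/1 labels."""
    names, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = line.split()
            if not row:
                continue  # ignore blank lines
            names.append(row[0])
            labels.append([int(v) for v in row[1:]])
    return names, labels


def positive_counts(labels):
    """Count, per attribute column, how many images carry label 1."""
    if not labels:
        return []
    return [sum(col) for col in zip(*labels)]


# Demo on a small temporary file (3 attributes instead of 40, illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "train_attr_list.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("000001.jpg 0 1 1\n000002.jpg 1 0 1\n")
    demo_names, demo_labels = load_split(demo_path)

print(len(demo_names))               # 2
print(positive_counts(demo_labels))  # [1, 1, 2]
```

On the full dataset, the official partition should give 162,770 training, 19,867 validation, and 19,962 test rows, which together add up to the 202,599 images; a different count suggests a parsing problem.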
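Tip:
If you use PyTorch and your machine has internet access (some companies keep their servers offline for security reasons), using this dataset is very simple. The code below does it directly and should be easy to follow.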
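import torch
import torchvision
from torchvision import transforms

# PyTorch already wraps CelebA loading in a ready-made API; a few lines are enough.
# download=True fetches the files if they are missing from the root directory;
# ToTensor() converts the PIL images to tensors so the DataLoader can batch them.
celeba_data = torchvision.datasets.CelebA('path/to/celeba_root/',
                                          transform=transforms.ToTensor(),
                                          download=True)
data_loader = torch.utils.data.DataLoader(celeba_data,
                                          batch_size=32,
                                          shuffle=True,
                                          num_workers=8)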