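CelebA is an open dataset released by the Chinese University of Hong Kong. It contains 202,599 images of 10,177 celebrity identities, all with annotated attributes, which makes it a very useful dataset for face-related training.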
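The dataset consists of three folders and one description document.
The img folder contains two archives: img_align_celeba.zip and img_align_celeba_png.7z.
I chose to download img_align_celeba.zip.
After extraction it contains 202,599 images.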
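If you prefer to extract and check the archive from Python, here is a minimal sketch using the standard-library zipfile module. The function name extract_and_count is hypothetical, and it assumes the .jpg members of the archive are what you want to count.

```python
import zipfile


def extract_and_count(zip_path, out_dir):
    """Extract zip_path into out_dir and return the number of .jpg members in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # Count before extracting so the number reflects the archive itself.
        jpgs = [n for n in zf.namelist() if n.lower().endswith('.jpg')]
        zf.extractall(out_dir)
    return len(jpgs)
```

For example, extract_and_count('img_align_celeba.zip', '.') should report 202,599 for the full archive, matching the image count above.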
The Anno folder contains a file, identity_CelebA.txt; part of its contents is shown below:
```
000001.jpg 2880
000002.jpg 2937
000003.jpg 8692
000004.jpg 5805
000005.jpg 9295
000006.jpg 4153
000007.jpg 9040
000008.jpg 6369
000009.jpg 3332
000010.jpg 612
```
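This file assigns every image to one of the 10,177 celebrity identities: the number after each image name is that image's label (its identity ID).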
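Since everything downstream depends on this mapping, it can help to load it once into a dictionary and check the totals. The sketch below is an illustration rather than code from the original post; load_identities and identity_stats are hypothetical helper names.

```python
from collections import Counter


def load_identities(path):
    """Parse identity_CelebA.txt into a dict mapping file name -> identity ID (int)."""
    labels = {}
    with open(path, 'r') as f:
        for line in f:
            items = line.split()
            if len(items) == 2:  # skip blank or malformed lines
                labels[items[0]] = int(items[1])
    return labels


def identity_stats(labels):
    """Return (number of images, number of distinct identities, Counter of images per identity)."""
    per_id = Counter(labels.values())
    return len(labels), len(per_id), per_id
```

On the full file, identity_stats(load_identities('identity_CelebA.txt')) should give 202,599 images and 10,177 identities, the figures quoted at the top of this post.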
Next, we use these two resources (the extracted images and identity_CelebA.txt) to process the dataset.
First, we use the dlib library to detect faces, crop each detected face, and save the crop. The code is as follows:
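```python
import os

import cv2
import dlib


def read_txt_file(file):
    """Return the image file names (first column) of identity_CelebA.txt, in file order."""
    names = []
    with open(file, 'r') as f:
        for line in f:
            items = line.split()
            if items:
                names.append(items[0])
    return names


def face_path(path):
    """Return full paths of all images in `path`, sorted numerically (000001.jpg, 000002.jpg, ...)."""
    file_names = os.listdir(path)
    file_names.sort(key=lambda x: int(x[:-4]))
    return [os.path.join(path, name) for name in file_names]


def face_detection():
    names = read_txt_file('/home/zy/PycharmProjects/CelebA/identity_CelebA.txt')
    file_paths = face_path('/home/zy/PycharmProjects/CelebA/img_align_celeba')
    out_dir = '/home/zy/PycharmProjects/CelebA/cropdata'
    os.makedirs(out_dir, exist_ok=True)
    # Build the detector once instead of once per image.
    detector = dlib.get_frontal_face_detector()
    for name, f in zip(names, file_paths):
        # Both lists are in 000001.jpg ... order; make sure they really line up,
        # so every crop is saved under the name of the image it came from.
        assert os.path.basename(f) == name, (name, f)
        img = cv2.imread(f, cv2.IMREAD_COLOR)
        if img is None:
            print('could not read', f)
            continue
        # OpenCV loads images as BGR; dlib expects RGB.
        rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        dets = detector(rgb, 1)  # 1 = upsample once so smaller faces are found
        print('{}: number of faces detected: {}'.format(name, len(dets)))
        if len(dets) == 0:
            continue
        # Keep only the largest face, so each image yields at most one crop.
        face = max(dets, key=lambda d: d.width() * d.height())
        h, w = img.shape[:2]
        # Detection boxes can extend past the image border; clip them so the slice is valid.
        left, top = max(face.left(), 0), max(face.top(), 0)
        right, bottom = min(face.right(), w), min(face.bottom(), h)
        if right <= left or bottom <= top:
            continue
        # Crop from the original BGR image, since cv2.imwrite expects BGR.
        cv2.imwrite(os.path.join(out_dir, name), img[top:bottom, left:right])


face_detection()
```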
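Clipping the box matters because the detector can return a rectangle that extends past the image border, and NumPy treats a negative start index as counting from the end, so the slice silently comes out empty or wrong. Here is a small sketch of that clipping as a standalone, testable function; clamp_box and crop are hypothetical names, and boxes are given as (left, top, right, bottom) to mirror dlib's rectangle accessors.

```python
import numpy as np


def clamp_box(left, top, right, bottom, width, height):
    """Clip a (left, top, right, bottom) box to a width x height image.

    Returns the clipped box, or None if no area is left inside the image.
    """
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width), min(bottom, height)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


def crop(img: np.ndarray, box):
    """Crop an H x W (x C) array to a (left, top, right, bottom) box, clipping it first."""
    h, w = img.shape[:2]
    clipped = clamp_box(*box, width=w, height=h)
    if clipped is None:
        return None
    left, top, right, bottom = clipped
    return img[top:bottom, left:right]
```

For example, on a 218×178 image (the size of the aligned CelebA images), the raw slice img[-5:100, 10:50] has shape (0, 40, 3), while crop(img, (10, -5, 50, 100)) returns a (100, 40, 3) crop.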
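After detection you will find that some faces could not be detected, so those images have no crop. We therefore rebuild the list of image paths and their labels from identity_CelebA.txt, keeping only the images that were cropped: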
```python
import os

img_path = '/home/zy/PycharmProjects/CelebA/cropdata'
text_file = '/home/zy/PycharmProjects/CelebA/identity_CelebA.txt'


def train_path():
    # Map file name -> identity label, e.g. '000001.jpg' -> '2880'.
    # A dict lookup avoids rescanning all ~200k lines for every cropped image.
    labels = {}
    with open(text_file, 'r') as f:
        for line in f:
            items = line.split()
            if len(items) == 2:
                labels[items[0]] = items[1]
    # Only images in which a face was detected exist in cropdata.
    file_path = os.listdir(img_path)
    file_path.sort(key=lambda x: int(x[:-4]))
    inde = []
    for name in file_path:
        if name in labels:
            inde.append(img_path + '/' + name + ' ' + labels[name])
    return inde


data_set = train_path()
# One "path label" pair per line.
with open('trainggg_text', 'w') as f:
    for entry in data_set:
        f.write(entry + '\n')
```
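To see how much the face-detection step actually discarded, you can compare the full label mapping with the files that ended up in cropdata. This sketch assumes the labels are already in a dict mapping file name to identity ID (for example, parsed from identity_CelebA.txt); detection_report is a hypothetical name.

```python
def detection_report(labels, kept_files):
    """Compare all labeled images with the files that survived face detection.

    labels: dict mapping file name -> identity ID, for every image
    kept_files: iterable of file names present in the crop folder
    Returns (number of images without a crop, sorted identities with no crop at all).
    """
    kept = set(kept_files)
    n_dropped = sum(1 for name in labels if name not in kept)
    kept_ids = {labels[name] for name in kept if name in labels}
    lost_ids = sorted(set(labels.values()) - kept_ids)
    return n_dropped, lost_ids
```

For example, detection_report(labels, os.listdir(img_path)) gives the number of images without a crop and the identities that lost every image; those identities simply have no entries in trainggg_text.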
If you want to reorganize the dataset so that each folder holds the images of a single person, you can use the following code:
```python
import os
import shutil
from collections import defaultdict

# Group the cropped image paths by identity label in a single pass,
# instead of rescanning the whole list once per identity.
groups = defaultdict(list)
with open('./trainggg_text', 'r') as f:
    for line in f:
        items = line.split()
        if len(items) == 2:
            path, label = items
            groups[int(label)].append(path)

# Write each identity's images to ./ace/<label>/0/zy<n>.jpg.
# Folders are created only for identities that still have images.
for label in sorted(groups):
    out_dir = os.path.join('./ace', str(label), '0')
    os.makedirs(out_dir, exist_ok=True)
    for l, path in enumerate(groups[label], start=1):
        # Copy the file directly; reading and rewriting with cv2 would re-encode the JPEG.
        shutil.copy(path, os.path.join(out_dir, 'zy' + str(l) + '.jpg'))
```
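The labels in identity_CelebA.txt are raw integer IDs rather than 0-based class indices, and after face detection some IDs may have no images left. Many training setups expect class indices 0..K-1, so one option is to remap the IDs before training. A minimal sketch, assuming the "path label" line format of trainggg_text; contiguous_labels is a hypothetical name.

```python
def contiguous_labels(lines):
    """Map raw CelebA identity IDs to class indices 0..K-1.

    lines: iterable of 'path label' strings (the trainggg_text format)
    Returns (samples, id_to_index), where samples is a list of (path, index) pairs.
    """
    pairs = []
    for line in lines:
        items = line.split()
        if len(items) == 2:  # skip blank lines
            pairs.append((items[0], int(items[1])))
    # Sort the distinct raw IDs so the mapping is deterministic.
    raw_ids = sorted({raw for _, raw in pairs})
    id_to_index = {raw: i for i, raw in enumerate(raw_ids)}
    samples = [(path, id_to_index[raw]) for path, raw in pairs]
    return samples, id_to_index
```

The id_to_index dict can be saved alongside the data so that predicted class indices can be mapped back to the original CelebA identity IDs.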
---------------------
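Author: 益达888
Source: CSDN
Original: https://blog.csdn.net/qq_29023939/article/details/81299178?utm_source=copy
Copyright notice: This is an original article by the blogger; please include a link to the original post when reposting!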