Dataset source: http://download.tensorflow.org/example_images/flower_photos.tgz
Renaming
Before renaming
Code
import os

train_image_path = r'D:\Projects\DeepLearning\Dataset\flower_photos\train'
flower_class = os.listdir(train_image_path)  # one sub-folder per flower class
num = 0  # running index shared across all classes
for flower in flower_class:
    flower_path = os.path.join(train_image_path, flower)
    for file in os.listdir(flower_path):
        # rename each image to its running index: 0.jpg, 1.jpg, 2.jpg, ...
        os.rename(os.path.join(flower_path, file), os.path.join(flower_path, str(num) + '.jpg'))
        num += 1
After renaming
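Problem encountered
After renaming, I subclassed Dataset and overrode __len__ and __getitem__, and found that __getitem__ was returning the files in the wrong order.
If the files are named in a loop, the names are 1, 2, 3, …, 10, 11, 12, 13, …, 100, 101, …, but when these files are traversed the names are compared as strings, so the order becomes 1, 10, 100, 101, …, 109, 11, …, 19, …, 2, 20, 200, …. So how can the files be traversed in the order 1, 2, 3, 4, …?
Solution:
Rename the files again and pad the names with leading zeros, e.g. 00001, 00002, …, so that string order matches numeric order and the files are traversed in sequence.
Note: here I first named the images 0, 1, 2, …, without the flower name in front, to keep things simple. (Really, it is because I am still a beginner.)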
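The ordering issue described above can be reproduced without touching the file system. The following is a minimal sketch, not part of the original code: the list `names` and the helper `numeric_key` are made up for illustration. It also shows one alternative to renaming, namely sorting with a numeric key, which is not the approach taken in this post.

```python
import os

# Hypothetical file names, as produced by a simple counting loop.
names = [f"{i}.jpg" for i in (1, 2, 3, 10, 11, 100)]

# A plain sort compares strings character by character,
# so "10.jpg" sorts before "2.jpg".
lexicographic = sorted(names)
# -> ['1.jpg', '10.jpg', '100.jpg', '11.jpg', '2.jpg', '3.jpg']


def numeric_key(filename):
    """Return the integer stem of a name such as '12.jpg' (assumes a numeric stem)."""
    return int(os.path.splitext(filename)[0])


# Sorting by the integer value restores the expected order.
numeric = sorted(names, key=numeric_key)
# -> ['1.jpg', '2.jpg', '3.jpg', '10.jpg', '11.jpg', '100.jpg']
```

Sorting with a key leaves the files untouched, but every piece of code that lists the folder has to remember to apply it.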
Code
import os

train_image_path = r'D:\Projects\DeepLearning\Dataset\flower_photos\train'
flower_class = os.listdir(train_image_path)
for flower in flower_class:
    flower_path = os.path.join(train_image_path, flower)
    for file in os.listdir(flower_path):
        name = os.path.splitext(file)[0]  # numeric stem, e.g. '12' from '12.jpg'
        # '%05d' pads the number with leading zeros to a width of 5 digits
        os.rename(os.path.join(flower_path, file), os.path.join(flower_path, '%05d' % int(name) + '.jpg'))
Now the files are traversed in order: index 2 is 00002.jpg, no longer 10.jpg.
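To check this behaviour, here is a small sketch with a pure-Python stand-in for the Dataset subclass. The class `FlowerFiles` and the function `demo` are hypothetical names, not the original code, and the files are empty placeholders in a temporary folder. Since `os.listdir` returns entries in arbitrary order, the sketch sorts the listing explicitly.

```python
import os
import tempfile


class FlowerFiles:
    """Hypothetical stand-in for a Dataset subclass: maps an index to a file path."""

    def __init__(self, folder):
        self.folder = folder
        # os.listdir makes no ordering guarantee, so sort explicitly.
        self.files = sorted(os.listdir(folder))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        return os.path.join(self.folder, self.files[index])


def demo():
    """Create 12 zero-padded placeholder files and look a few of them up by index."""
    with tempfile.TemporaryDirectory() as folder:
        for i in range(12):
            open(os.path.join(folder, '%05d' % i + '.jpg'), 'w').close()
        data = FlowerFiles(folder)
        return len(data), os.path.basename(data[2]), os.path.basename(data[10])


print(demo())  # (12, '00002.jpg', '00010.jpg')
```

With zero-padded names, a plain string sort is enough, so index i lands on the file numbered i.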
Putting it together
import os


def ReNameFlower(image_path, flower_class):
    # Pass 1: rename every image to a running index (0.jpg, 1.jpg, ...)
    num = 0
    for flower in flower_class:
        flower_path = os.path.join(image_path, flower)
        for file in os.listdir(flower_path):
            os.rename(os.path.join(flower_path, file), os.path.join(flower_path, str(num) + '.jpg'))
            num += 1
    # Pass 2: zero-pad each index to 5 digits (00000.jpg, 00001.jpg, ...)
    for flower in flower_class:
        flower_path = os.path.join(image_path, flower)
        for file in os.listdir(flower_path):
            name = os.path.splitext(file)[0]
            os.rename(os.path.join(flower_path, file), os.path.join(flower_path, '%05d' % int(name) + '.jpg'))


def main():
    train_image_path = r'D:\Projects\DeepLearning\Dataset\flower_photos\train'
    train_flower_class = os.listdir(train_image_path)
    ReNameFlower(train_image_path, train_flower_class)
    val_image_path = r'D:\Projects\DeepLearning\Dataset\flower_photos\val'
    val_flower_class = os.listdir(val_image_path)
    ReNameFlower(val_image_path, val_flower_class)


if __name__ == '__main__':
    main()
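The two renaming steps could probably be folded into one pass by formatting the running index with zero padding directly. The sketch below is one possible variant, not the original code. `rename_zero_padded`, its `width` parameter, and `demo` are hypothetical names. It assumes that no existing file already carries one of the target names, because `os.rename` may overwrite or fail in that case depending on the platform. The demo runs on empty placeholder files in a temporary folder rather than on the real dataset.

```python
import os
import tempfile


def rename_zero_padded(image_path, width=5):
    """Rename all files under each class folder to a zero-padded running index.

    Assumes no existing file already has a target name such as '00000.jpg'.
    Returns the number of files renamed.
    """
    num = 0
    for flower in sorted(os.listdir(image_path)):
        flower_path = os.path.join(image_path, flower)
        if not os.path.isdir(flower_path):
            continue  # skip stray files next to the class folders
        for file in sorted(os.listdir(flower_path)):
            new_name = f"{num:0{width}d}.jpg"
            os.rename(os.path.join(flower_path, file),
                      os.path.join(flower_path, new_name))
            num += 1
    return num


def demo():
    """Build a tiny fake dataset, rename it, and report the resulting names."""
    with tempfile.TemporaryDirectory() as root:
        layout = {"daisy": ["x.jpg", "y.jpg"], "roses": ["z.jpg"]}
        for flower, files in layout.items():
            os.makedirs(os.path.join(root, flower))
            for name in files:
                open(os.path.join(root, flower, name), "w").close()
        total = rename_zero_padded(root)
        result = {flower: sorted(os.listdir(os.path.join(root, flower)))
                  for flower in sorted(os.listdir(root))}
        return total, result


print(demo())
# (3, {'daisy': ['00000.jpg', '00001.jpg'], 'roses': ['00002.jpg']})
```

Sorting both listings makes the numbering reproducible across runs, which the unsorted `os.listdir` order does not guarantee.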