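FGVC-Aircraft (fgvc-aircraft-2013b) is a classic benchmark for fine-grained image classification and recognition. It provides four kinds of annotations:
(1) By manufacturer: 30 classes, for example ATR, Airbus, Antonov, Beechcraft, Boeing.
(2) By family: 70 classes.
(3) By variant: 100 classes (the labeling most commonly used for fine-grained image classification).
(4) Bounding boxes for the images.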
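Each annotation .txt file lists one image per line. Here is a minimal sketch of reading such lines, assuming the layout is `<image id> <label>` with a single space after the id (the layout the split script in this post relies on). `parse_annotation_line` and `load_annotations` are hypothetical helper names, and the image ids in the sample are made up for illustration.

```python
def parse_annotation_line(line):
    """Split one annotation line into (image_id, label).

    Assumes the "<image id> <label>" layout. The label may itself
    contain spaces or slashes, so only the first space is a separator.
    """
    image_id, label = line.rstrip("\n").split(" ", 1)
    return image_id, label


def load_annotations(lines):
    """Build an {image_id: label} dict from an iterable of lines."""
    return dict(parse_annotation_line(l) for l in lines if l.strip())


# Made-up image ids, used only to illustrate the parsing
sample = ["0000001 F-16A/B\n", "0000002 Boeing 707\n"]
print(load_annotations(sample))
```

Splitting on the first space only keeps labels such as "Boeing 707" in one piece.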
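Below is a Python script that splits the 100-class (variant) annotation of fgvc-aircraft-2013b into training and test sets (for reference).
My folder layout is as follows:
The images folder holds the 10000 aircraft images.
The dataset folder contains four subfolders, train, test, trainval, and val, which store the images of each split.
The folders 30, 70, 100, and bounding_box hold the four kinds of annotations listed above, each saved as .txt files.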
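Before running the split it can help to confirm that this layout is in place. The following is a rough sketch, not part of the original script: `EXPECTED_DIRS` simply restates the folder names described above, and `missing_dirs` is a hypothetical helper name.

```python
import os

# Sub-folders named in the description above
EXPECTED_DIRS = [
    "images",
    "dataset/train", "dataset/test", "dataset/trainval", "dataset/val",
    "30", "70", "100", "bounding_box",
]


def missing_dirs(root):
    """Return the expected sub-folders that do not exist under root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, *d.split("/")))]
```

Calling `missing_dirs(path)` with the dataset root returns an empty list when every folder is present.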
# -*- coding: utf-8 -*-
# author --liming--
"""
Given the train, test, trainval, and val txt files, copy the images of the
selected split into one folder per class.
"""
import argparse
import os
import shutil

path = '/media/lm/1E7FBDC6EEE168BC/fine_grained_dataset/FGVC_Aircraft/fgvc-aircraft-2013b'
image_path = path + '/images/'
save_paths = {
    'train': path + '/dataset/train/',
    'test': path + '/dataset/test/',
    'trainval': path + '/dataset/trainval/',
    'val': path + '/dataset/val/',
}

parser = argparse.ArgumentParser(description='Data Split based on Txt')
parser.add_argument('--dataset',
                    default='test',
                    choices=['test', 'train', 'trainval', 'val'],
                    help='Select which dataset split, test, train, trainval, or val')
args = parser.parse_args()

# Read the txt file of the selected split; each line is "<image id> <label>"
labels = {}
with open(path + '/100/images_variant_%s.txt' % args.dataset, 'r') as f:
    for line in f:
        line = line.rstrip('\n')
        if line:
            image_id, label = line.split(' ', 1)
            labels[image_id] = label

# Copy every image listed in the split into the folder of its class
print('==> data processing...')
save_path = save_paths[args.dataset]
count = 0
for img in sorted(os.listdir(image_path)):
    image_id = os.path.splitext(img)[0]
    if image_id not in labels:
        continue
    # A label that contains '/' (F-16A/B, F/A-18) produces a nested folder here
    target_dir = save_path + labels[image_id]
    os.makedirs(target_dir, exist_ok=True)
    shutil.copy(image_path + img, target_dir + '/' + img)
    count += 1
    print('Image %d belongs to the %s split' % (count, args.dataset))
print('Finished!!')
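After splitting, the class names that contain a slash cause a problem with the folder names: F-16A/B is saved as a folder B nested inside a folder F-16A, and F/A-18 is saved as a folder A-18 nested inside a folder F. Move these folders out and give each of the two classes a single folder name.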
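This cleanup can also be scripted. The sketch below is one possible way to do it, not the procedure used in this post: it assumes "/" is replaced with "_" (any character that is legal in a folder name would do), and `safe_label` and `flatten_class_dir` are hypothetical helper names.

```python
import os
import shutil


def safe_label(label, replacement="_"):
    """Turn a class label into a name usable as a single folder."""
    return label.replace("/", replacement)


def flatten_class_dir(split_dir, label):
    """Move split_dir/<nested label path> to split_dir/<safe label>.

    Returns the new folder path, or None if there is nothing to move.
    """
    split_dir = os.path.normpath(split_dir)
    nested = os.path.join(split_dir, *label.split("/"))
    if "/" not in label or not os.path.isdir(nested):
        return None
    target = os.path.join(split_dir, safe_label(label))
    if os.path.exists(target):
        raise FileExistsError(target)
    shutil.move(nested, target)
    # Remove the now-empty parent folder(s), e.g. "F-16A" or "F"
    parent = os.path.dirname(nested)
    while parent != split_dir and not os.listdir(parent):
        os.rmdir(parent)
        parent = os.path.dirname(parent)
    return target
```

For each of the four split folders, calling `flatten_class_dir(split_dir, 'F-16A/B')` and `flatten_class_dir(split_dir, 'F/A-18')` would leave the folders `F-16A_B` and `F_A-18`. Alternatively, applying `safe_label` to the label before the folder is created avoids the nesting in the first place.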
The final training and test sets are as follows:
(1) Test set (100 classes, 3333 images)
(2) Training set (100 classes, 6667 images)
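To check these totals, the class folders and the files inside them can be counted. This is a small sketch that assumes each split folder contains exactly one sub-folder per class, with no nested class folders left over; `count_images` and `summarize` are hypothetical helper names.

```python
import os


def count_images(split_dir):
    """Return {class folder: number of files} for one split folder."""
    counts = {}
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                os.path.isfile(os.path.join(class_dir, f))
                for f in os.listdir(class_dir))
    return counts


def summarize(split_dir):
    """Return (number of classes, total number of images)."""
    counts = count_images(split_dir)
    return len(counts), sum(counts.values())
```

If the figures above hold, `summarize` on the test folder should give 100 classes and 3333 images.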