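In deep learning you often need to randomly pick a subset of images from several classes for training or testing.
For example: given a training label file train.txt that contains images from 3 classes, randomly select 10 images from each of the 3 classes.
The contents of train.txt look like this: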
…
/media/data/train/1.jpg 1
/media/data/train/88.jpg 2
/media/data/train/677.png 0
…
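Each line holds an image path followed by its class label, separated by whitespace. As a minimal sketch of reading one such line, the hypothetical helper `parse_line` below (not part of the original scripts) splits only at the last whitespace run, on the assumption that a path could contain spaces; the sample lines above do not show such paths.

```python
def parse_line(line):
    """Split one label-file line into (path, label).

    rsplit(None, 1) splits only at the last run of whitespace, so a
    path that happens to contain spaces stays in one piece. This is an
    assumption about the data, not something the sample lines require.
    """
    path, label = line.strip().rsplit(None, 1)
    return path, label


# The label comes back as a string, which is how both methods compare it
print(parse_line('/media/data/train/1.jpg 1\n'))
```

Keeping the label as a string matches the comparisons against '0', '1' and '2' used in the scripts.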
Method 1:
code:
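import random
from tqdm import tqdm

# Read all label lines; the with-block closes the file automatically
with open('train.txt', 'r') as file:
    lines = file.readlines()

# One list per class
A = []
B = []
C = []
for line in tqdm(lines):
    content = line.strip().split()
    if not content:  # skip blank lines
        continue
    label = content[-1]
    if label == '0':
        A.append(line)
    elif label == '1':
        B.append(line)
    elif label == '2':
        C.append(line)

# Randomly pick 10 lines per class
# (random.sample raises ValueError if a class has fewer than 10 lines)
A1 = random.sample(A, 10)
B1 = random.sample(B, 10)
C1 = random.sample(C, 10)

# Merge the picks and shuffle them so the classes are interleaved
output = A1 + B1 + C1
random.shuffle(output)

# Open the output file once instead of once per line.
# 'a+' appends, so delete test.txt before re-running to avoid duplicates.
with open('test.txt', 'a+') as f:
    for line in output:
        f.write(line)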
Method 2:
code:
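import random

# Map each class label to the list of its image paths
list_files = {}
with open('train.txt', 'r') as f:
    for line in f:
        line = line.strip().split()
        if not line:  # skip blank lines
            continue
        filename = line[0]
        cls = line[1]
        if cls not in list_files:
            list_files[cls] = []
        list_files[cls].append(filename)

res_list = {}
# Open the output file once instead of once per line.
# 'a+' appends, so delete test.txt before re-running to avoid duplicates.
with open('test.txt', 'a+') as f:
    for key, val in list_files.items():
        # Randomly pick 10 paths for this class
        # (random.sample raises ValueError if the class has fewer than 10)
        res_val = random.sample(val, 10)
        res_list[key] = res_val
        # Write "path label" lines; the output stays grouped by class
        for root in res_val:
            f.write(root + ' ' + key + '\n')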
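Both methods can be folded into one reusable function. The sketch below is only one possible way to do it: the name `sample_per_class` and its parameters `k` and `seed` are hypothetical, not from the scripts above. It assumes the label is the last whitespace-separated field, takes at most `k` lines per class (so a small class does not raise an error), and accepts an optional seed so a split can be reproduced.

```python
import random
from collections import defaultdict


def sample_per_class(lines, k, seed=None):
    """Return up to k randomly chosen lines per label, shuffled together."""
    rng = random.Random(seed)  # private generator; a fixed seed repeats the pick
    by_label = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        # Assumption: the label is the last whitespace-separated field
        by_label[parts[-1]].append(line.strip())

    picked = []
    for label in sorted(by_label):  # sorted so the result depends only on the seed
        items = by_label[label]
        # min() avoids the ValueError that sample() raises when k > len(items)
        picked.extend(rng.sample(items, min(k, len(items))))
    rng.shuffle(picked)  # interleave the classes, as Method 1 does
    return picked


# Small in-memory demo; real use would pass the lines read from train.txt
demo = ['a.jpg 0', 'b.jpg 0', 'c.jpg 0', 'd.jpg 1', 'e.jpg 1', 'f.jpg 2']
for entry in sample_per_class(demo, k=2, seed=0):
    print(entry)
```

To reproduce the task above, pass the lines read from train.txt with k=10 and write each returned entry to test.txt followed by a newline.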