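ICDAR 2015 annotations describe each box with eight corner coordinates (the x, y of four corner points). Training needs the top-left and bottom-right corner format instead, and boxes annotated as ### should be dropped. Since I am only training a detector, every remaining box is labeled with the single class text.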
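As a small worked example of the conversion described above, here is a sketch for a single annotation line. The helper name `quad_to_bbox` and the sample lines are illustrative assumptions, not part of the original scripts. The coordinates are parsed as integers first, because `min`/`max` on the raw strings would compare them lexicographically (for example `min('98', '105')` gives `'105'`).

```python
def quad_to_bbox(line):
    """Convert one ICDAR 2015 ground-truth line to (x1, y1, x2, y2).

    Returns None for boxes whose transcription contains '###'.
    Hypothetical helper, for illustration only.
    """
    # Split at most 8 times so commas inside the transcription stay in parts[8]
    parts = line.strip().split(',', 8)
    coords = [int(v) for v in parts[:8]]
    xs, ys = coords[0::2], coords[1::2]  # even indices are x, odd indices are y
    if '###' in parts[8]:
        return None
    return min(xs), min(ys), max(xs), max(ys)


sample = '377,117,463,117,465,130,378,130,Genaxis Theatre'
print(quad_to_bbox(sample))                               # (377, 117, 465, 130)
print(quad_to_bbox('98,10,105,12,104,30,97,28,###'))      # None
```

The resulting box is the axis-aligned rectangle enclosing the quadrilateral, so it will be slightly larger than the original region for rotated text.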
Test set annotations
The Test_GT folder must be created beforehand.
for i in range(1, 501):
    txts = 'Challenge4_Test_Task1_GT/gt_img_' + str(i) + '.txt'
    w_txts = 'Test_GT/gt_img_' + str(i) + '.txt'
    # utf-8-sig strips the BOM at the start of the ground-truth files
    with open(txts, encoding='utf-8-sig') as f:
        lines = f.readlines()
    with open(w_txts, 'w') as txt_file:
        for line in lines:
            # split at most 8 times so commas inside the transcription stay in numbers[8]
            numbers = line.strip().split(',', 8)
            if len(numbers) < 9:
                continue  # skip blank or malformed lines
            # convert to int: min/max on strings would compare lexicographically
            xs = [int(v) for v in numbers[0:8:2]]
            ys = [int(v) for v in numbers[1:8:2]]
            x1, x2 = min(xs), max(xs)
            y1, y2 = min(ys), max(ys)
            # drop boxes annotated as ###
            if '###' not in numbers[8]:
                txt_file.write(f'{x1} {y1} {x2} {y2} text\n')
Training set annotations
The Training_GT folder must be created beforehand.
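# Note: the full ICDAR 2015 training set has 1000 images; use range(1, 1001) to convert all of them
for i in range(1, 501):
    txts = 'ch4_training_localization_transcription_gt/gt_img_' + str(i) + '.txt'
    w_txts = 'Training_GT/gt_img_' + str(i) + '.txt'
    # utf-8-sig strips the BOM at the start of the ground-truth files
    with open(txts, encoding='utf-8-sig') as f:
        lines = f.readlines()
    with open(w_txts, 'w') as txt_file:
        for line in lines:
            # split at most 8 times so commas inside the transcription stay in numbers[8]
            numbers = line.strip().split(',', 8)
            if len(numbers) < 9:
                continue  # skip blank or malformed lines
            # convert to int: min/max on strings would compare lexicographically
            xs = [int(v) for v in numbers[0:8:2]]
            ys = [int(v) for v in numbers[1:8:2]]
            x1, x2 = min(xs), max(xs)
            y1, y2 = min(ys), max(ys)
            # drop boxes annotated as ###
            if '###' not in numbers[8]:
                txt_file.write(f'{x1} {y1} {x2} {y2} text\n')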
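Since the test and training loops differ only in their folder names and file count, they could be folded into one function. The following is a minimal sketch under that assumption. The name `convert_split` and its parameters are hypothetical, and it creates the output folder itself with `os.makedirs` rather than requiring it to exist.

```python
import os


def convert_split(src_dir, dst_dir, count):
    """Convert gt_img_1.txt .. gt_img_<count>.txt from eight-coordinate
    quads to 'x1 y1 x2 y2 text' lines. Returns the number of boxes kept.

    Hypothetical helper; the folder layout is assumed to match the scripts above.
    """
    os.makedirs(dst_dir, exist_ok=True)  # create the output folder if missing
    kept = 0
    for i in range(1, count + 1):
        name = f'gt_img_{i}.txt'
        # utf-8-sig strips a leading BOM if one is present
        with open(os.path.join(src_dir, name), encoding='utf-8-sig') as f:
            lines = f.readlines()
        with open(os.path.join(dst_dir, name), 'w') as out:
            for line in lines:
                parts = line.strip().split(',', 8)
                # skip blank or malformed lines and boxes annotated as ###
                if len(parts) < 9 or '###' in parts[8]:
                    continue
                coords = [int(v) for v in parts[:8]]
                xs, ys = coords[0::2], coords[1::2]
                out.write(f'{min(xs)} {min(ys)} {max(xs)} {max(ys)} text\n')
                kept += 1
    return kept
```

With this sketch, the test set would be converted by calling `convert_split('Challenge4_Test_Task1_GT', 'Test_GT', 500)`, and the training set by calling it with the training folder names and the number of files to process.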