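1. ICDAR2015 dataset
(1) Download
I searched for the dataset for a long time and finally downloaded it from CSDN. Download link:
https://download.csdn.net/download/moonshapedpool/10645292
I had no CSDN points, so I paid 2 yuan on Taobao for someone to download it for me. After extraction there are three folders.
(2) Contents and format
Training images: ch4_training_images
Training annotations: ch4_training_localization_transcription_gt
Test images: ch4_test_images
ICDAR2015 does not include test annotations, but it provides a web interface for testing. For that reason, only the training set is converted here.
Annotation format: x1,y1,x2,y2,x3,y3,x4,y4,text
Here (x1,y1) is the top-left corner, (x2,y2) the top-right corner, (x3,y3) the bottom-right corner, and (x4,y4) the bottom-left corner. A text value of '###' means the text is illegible.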
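To make the annotation format concrete, here is a minimal sketch of parsing a single ground-truth line. The function name `parse_gt_line` is made up for illustration and is not part of the conversion script; the sample line in the usage note is only an example. The sketch assumes that stray non-digit characters (such as a byte-order mark) may precede the first coordinate, which is also what the conversion script guards against.

```python
def parse_gt_line(line):
    """Parse one ground-truth line into (points, text, difficult)."""
    # Split at most 8 times so that commas inside the transcription survive.
    fields = line.strip().split(',', 8)
    if len(fields) < 9:
        raise ValueError('expected 8 coordinates and a transcription: %r' % line)
    # Keep only the digits of the first field (drops a leading byte-order mark).
    fields[0] = ''.join(ch for ch in fields[0] if ch.isdigit())
    coords = [int(v) for v in fields[:8]]
    # Pair up as (x, y): top-left, top-right, bottom-right, bottom-left.
    points = list(zip(coords[0::2], coords[1::2]))
    text = fields[8]
    # '###' marks text that is too hard to read.
    difficult = 1 if '###' in text else 0
    return points, text, difficult
```

For example, `parse_gt_line('377,117,463,117,465,130,378,130,Genaxis Theatre')` returns the four corner points in order, the text `'Genaxis Theatre'`, and a difficult flag of 0.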
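2. Folder preparation
Create a folder named VOC2007 and, inside it, create the folders Annotations, ImageSets, and JPEGImages. Then create a folder named Main inside ImageSets. The layout is: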
VOC2007
-VOC2007/Annotations
-VOC2007/ImageSets
-VOC2007/ImageSets/Main
-VOC2007/JPEGImages
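If you prefer not to create the folders by hand, the layout above can be sketched in a few lines of Python. The helper name `make_voc_dirs` and its `root` parameter are assumptions for illustration; pass whatever parent directory you want VOC2007 to live in.

```python
import os

def make_voc_dirs(root):
    """Create the VOC2007 folder layout under `root` and return the VOC2007 path."""
    voc_dir = os.path.join(root, 'VOC2007')
    for sub in ('Annotations', os.path.join('ImageSets', 'Main'), 'JPEGImages'):
        path = os.path.join(voc_dir, sub)
        # Skip folders that already exist so the helper can be re-run safely.
        if not os.path.isdir(path):
            os.makedirs(path)
    return voc_dir
```

Creating ImageSets/Main with `os.makedirs` also creates ImageSets itself, so it does not need its own entry.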
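3. Python implementation
(1) Create a new project in PyCharm based on Python 2, create a new Python file, and then install the dependency packages through File->Settings (the script below imports PIL, cv2, and numpy).
Note: I ran into a problem here. My machine runs 64-bit Windows 10, and installing PIL directly fails. The fix I found on Baidu is to install Pillow-PIL instead.
Reference blog:
https://blog.csdn.net/weixin_39837709/article/details/79829428
(2) Add the following code to the new Python file: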
#!/usr/bin/python
# coding:utf-8
import os
import glob
from PIL import Image
import cv2
import numpy as np
# target dir
base_dir = "E:/RuiJie/py-faster-rcnn/VOC2007"
target_img_dir = base_dir + "/" + "JPEGImages/"
target_ann_dir = base_dir + "/" + "Annotations/"
target_set_dir = base_dir + "/" + "ImageSets/"
# source train dir
train_img_dir = "E:/RuiJie/py-faster-rcnn/ICDAR2015data/ch4_training_images/"
train_txt_dir = "E:/RuiJie/py-faster-rcnn/ICDAR2015data/ch4_training_localization_transcription_gt/"
test_img_dir = "E:/RuiJie/py-faster-rcnn/ICDAR2015data/ch4_test_images"
# rename and move img to target_img_dir
# train img
for file in os.listdir(train_img_dir):
    os.rename(os.path.join(train_img_dir, file),
              os.path.join(target_img_dir, "ICDAR2015_Train_" + os.path.basename(file)))
# test img
for file in os.listdir(test_img_dir):
    os.rename(os.path.join(test_img_dir, file),
              os.path.join(target_img_dir, "ICDAR2015_Test_" + os.path.basename(file)))
img_list = os.listdir(target_img_dir)
for img_file in img_list:
    img_name = os.path.join(target_img_dir, img_file)
    stem = img_file.split('.')[0]
    # "ICDAR2015_Train_img_12" -> "gt_img_12.txt"
    # Note: a renamed test image such as "ICDAR2015_Test_img_12" resolves to the same
    # training annotation file, so its xml is filled with the training boxes of that index.
    gt_name = os.path.join(train_txt_dir, 'gt_img_' + stem.split('_')[3] + '.txt')
    im = Image.open(img_name)
    imgwidth, imgheight = im.size
    # write in xml file
    xml_file = open(os.path.join(target_ann_dir, stem + '.xml'), 'w')
    xml_file.write('<annotation>\n')
    xml_file.write('    <folder>VOC2007</folder>\n')
    xml_file.write('    <filename>' + img_file + '</filename>\n')
    xml_file.write('    <size>\n')
    xml_file.write('        <width>' + str(imgwidth) + '</width>\n')
    xml_file.write('        <height>' + str(imgheight) + '</height>\n')
    xml_file.write('        <depth>3</depth>\n')
    xml_file.write('    </size>\n')
    with open(gt_name) as gt_file:
        for gt_line in gt_file:
            gt_ind = gt_line.strip().split(',')
            # need 8 coordinates plus the transcription
            if len(gt_ind) > 8:
                # keep only the digits of the first field (drops stray leading bytes)
                gt_ind[0] = ''.join(ch for ch in gt_ind[0] if ch.isdigit())
                pt1 = (int(gt_ind[0]), int(gt_ind[1]))  # top-left
                pt2 = (int(gt_ind[2]), int(gt_ind[3]))  # top-right
                pt3 = (int(gt_ind[4]), int(gt_ind[5]))  # bottom-right
                pt4 = (int(gt_ind[6]), int(gt_ind[7]))  # bottom-left (not used below)
                dtxt = gt_ind[8]
                # '###' marks illegible text
                difficult = 1 if "###" in dtxt else 0
                # lengths of the top edge and the right edge
                edge1 = np.sqrt((pt1[0] - pt2[0]) ** 2 + (pt1[1] - pt2[1]) ** 2)
                edge2 = np.sqrt((pt2[0] - pt3[0]) ** 2 + (pt2[1] - pt3[1]) ** 2)
                # the longer edge becomes the width and defines the angle
                if edge1 > edge2:
                    width, height = edge1, edge2
                    dx, dy = pt1[0] - pt2[0], pt1[1] - pt2[1]
                else:
                    width, height = edge2, edge1
                    dx, dy = pt2[0] - pt3[0], pt2[1] - pt3[1]
                if dx != 0:
                    angle = -np.arctan(float(dy) / float(dx)) / np.pi * 180
                else:
                    angle = 90.0
                if angle < -45.0:
                    angle = angle + 180
                # box centre: midpoint of the pt1-pt3 diagonal
                x_ctr = float(pt1[0] + pt3[0]) / 2
                y_ctr = float(pt1[1] + pt3[1]) / 2
                # write the region of text on xml file
                xml_file.write('    <object>\n')
                xml_file.write('        <name>text</name>\n')
                xml_file.write('        <pose>Unspecified</pose>\n')
                xml_file.write('        <truncated>0</truncated>\n')
                xml_file.write('        <difficult>' + str(difficult) + '</difficult>\n')
                xml_file.write('        <bndbox>\n')
                xml_file.write('            <x>' + str(x_ctr) + '</x>\n')
                xml_file.write('            <y>' + str(y_ctr) + '</y>\n')
                xml_file.write('            <w>' + str(width) + '</w>\n')
                xml_file.write('            <h>' + str(height) + '</h>\n')
                xml_file.write('            <theta>' + str(angle) + '</theta>\n')
                xml_file.write('        </bndbox>\n')
                xml_file.write('    </object>\n')
    xml_file.write('</annotation>')
    xml_file.close()
# write info into target_set_dir
img_lists = glob.glob(target_ann_dir + '/*.xml')
img_names = []
for item in img_lists:
    # file name without directory and without the .xml extension
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    img_names.append(temp1)
train_fd = open(target_set_dir + "/Main/trainval.txt", 'w')
for item in img_names:
    train_fd.write(str(item) + '\n')
train_fd.close()
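Note: change the paths.
base_dir is the path on your own machine to the VOC2007 folder created earlier.
train_img_dir, train_txt_dir, and test_img_dir are the paths on your own machine to the three ICDAR2015 folders.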
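The core of the script is the step that turns a four-point quadrilateral into a rotated box (x, y, w, h, theta). The sketch below isolates that step so it can be tried on its own. The function name `quad_to_rotated_box` is hypothetical, and it uses the standard `math` module instead of numpy; the logic is meant to mirror the script rather than define an official format.

```python
import math

def quad_to_rotated_box(pt1, pt2, pt3):
    """Return (x_ctr, y_ctr, width, height, angle_deg) from three corners.

    pt1, pt2, pt3 are (x, y) tuples for the top-left, top-right and
    bottom-right corners, as in the ICDAR2015 annotation order.
    """
    edge1 = math.hypot(pt1[0] - pt2[0], pt1[1] - pt2[1])  # top edge
    edge2 = math.hypot(pt2[0] - pt3[0], pt2[1] - pt3[1])  # right edge
    # The longer edge is treated as the width and gives the box direction.
    if edge1 > edge2:
        width, height = edge1, edge2
        dx, dy = pt1[0] - pt2[0], pt1[1] - pt2[1]
    else:
        width, height = edge2, edge1
        dx, dy = pt2[0] - pt3[0], pt2[1] - pt3[1]
    # Image y grows downward, hence the sign flip; a vertical edge is 90 degrees.
    angle = -math.degrees(math.atan(float(dy) / dx)) if dx != 0 else 90.0
    # Fold the result into the range (-45, 135].
    if angle < -45.0:
        angle += 180
    # The centre is the midpoint of the top-left / bottom-right diagonal.
    x_ctr = (pt1[0] + pt3[0]) / 2.0
    y_ctr = (pt1[1] + pt3[1]) / 2.0
    return x_ctr, y_ctr, width, height, angle
```

For an axis-aligned box with corners (0, 0), (100, 0), (100, 20), this gives a centre of (50, 10), a width of 100, a height of 20, and an angle of 0. A tall box such as (0, 0), (20, 0), (20, 100) gets width 100, height 20, and an angle of 90.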
4. Result after conversion
VOC2007\Annotations contains 1500 xml files
VOC2007\ImageSets\Main contains 1 trainval.txt file
E:\RuiJie\py-faster-rcnn\VOC2007\JPEGImages contains 1500 images
Format after conversion:
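As a rough illustration of that format, the sketch below reads one annotation back with the standard `xml.etree.ElementTree` module. The tag names follow what the script writes; the file name and numbers inside `sample` are made-up values, and the helper name `read_rotated_boxes` is hypothetical.

```python
import xml.etree.ElementTree as ET

def read_rotated_boxes(xml_text):
    """Return (filename, [(x, y, w, h, theta, difficult), ...]) for one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall('object'):
        bnd = obj.find('bndbox')
        values = [float(bnd.find(tag).text) for tag in ('x', 'y', 'w', 'h', 'theta')]
        boxes.append(tuple(values) + (int(obj.find('difficult').text),))
    return root.find('filename').text, boxes

# Made-up sample with the same tags the conversion script writes.
sample = """<annotation>
    <folder>VOC2007</folder>
    <filename>ICDAR2015_Train_img_1.jpg</filename>
    <size>
        <width>1280</width>
        <height>720</height>
        <depth>3</depth>
    </size>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <x>50.0</x>
            <y>10.0</y>
            <w>100.0</w>
            <h>20.0</h>
            <theta>0.0</theta>
        </bndbox>
    </object>
</annotation>"""

filename, boxes = read_rotated_boxes(sample)
```

Note that the bndbox here holds a centre, a size, and an angle rather than the xmin/ymin/xmax/ymax fields of standard VOC annotations, so a loader written for standard VOC would need to be adapted.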
5. Reference blog:
https://blog.csdn.net/u013250416/article/details/78821877