1. Data annotation
See the post "Dataset Annotation" (《数据集标注》).
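2. Single-file conversion
After labelme is installed with conda, the conda installation directory contains several labelme executables. One of them, labelme_json_to_dataset.exe, converts a single JSON file, but it cannot convert files in batches.
Run the following command to convert one file:
labelme_json_to_dataset.exe <path to JSON file>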
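Because the tool takes one JSON file per call, a small wrapper can simply run it once per file. The sketch below is a minimal illustration and not part of labelme: the helper names build_commands and convert_all are made up for this example, and it assumes that labelme_json_to_dataset is on PATH and accepts -o for the output folder.

```python
import subprocess
from pathlib import Path


def build_commands(json_dir, out_root):
    """Return one labelme_json_to_dataset command per JSON file in json_dir."""
    commands = []
    for json_path in sorted(Path(json_dir).glob("*.json")):
        # a.json -> <out_root>/a_json, mirroring the tool's default folder naming
        out_dir = Path(out_root) / json_path.name.replace(".", "_")
        commands.append(
            ["labelme_json_to_dataset", str(json_path), "-o", str(out_dir)]
        )
    return commands


def convert_all(json_dir, out_root):
    """Run the converter once per file and stop at the first failure."""
    Path(out_root).mkdir(parents=True, exist_ok=True)
    for command in build_commands(json_dir, out_root):
        subprocess.run(command, check=True)
```

Each call runs as a separate process, so label values are presumably assigned per file and may not match across files whose label sets differ.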
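3. Batch processing
In the conda installation directory, go to "<your conda install path>\Lib\site-packages\labelme\cli\". Back up the original json_to_dataset.py first, then replace its contents with the following code: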
import argparse
import base64
import json
import os
import os.path as osp
import PIL.Image
import yaml
from labelme.logger import logger
from labelme import utils
def label(json_file, out_dir, label_name_to_value):
    """Convert one labelme JSON file into an image, a label mask and label lists."""
    with open(json_file) as f:
        data = json.load(f)
    # Use the embedded image if there is one; otherwise read the file named by imagePath.
    imageData = data.get('imageData')
    if not imageData:
        imagePath = os.path.join(os.path.dirname(json_file), data['imagePath'])
        with open(imagePath, 'rb') as f:
            imageData = f.read()
            imageData = base64.b64encode(imageData).decode('utf-8')
    img = utils.img_b64_to_arr(imageData)
    # Give every label not seen so far the next free integer. The mapping is
    # shared across files, so a class keeps the same value in every mask.
    for shape in sorted(data['shapes'], key=lambda x: x['label']):
        label_name = shape['label']
        if label_name not in label_name_to_value:
            label_name_to_value[label_name] = len(label_name_to_value)
    lbl = utils.shapes_to_label(img.shape, data['shapes'], label_name_to_value)
    # Invert the mapping into a list indexed by label value.
    label_names = [None] * (max(label_name_to_value.values()) + 1)
    for name, value in label_name_to_value.items():
        label_names[value] = name
    lbl_viz = utils.draw_label(lbl, img, label_names)
    PIL.Image.fromarray(img).save(osp.join(out_dir, 'img.png'))
    utils.lblsave(osp.join(out_dir, 'label.png'), lbl)
    PIL.Image.fromarray(lbl_viz).save(osp.join(out_dir, 'label_viz.png'))
    with open(osp.join(out_dir, 'label_names.txt'), 'w') as f:
        for lbl_name in label_names:
            f.write(lbl_name + '\n')
    logger.warning('info.yaml is being replaced by label_names.txt')
    info = dict(label_names=label_names)
    with open(osp.join(out_dir, 'info.yaml'), 'w') as f:
        yaml.safe_dump(info, f, default_flow_style=False)
    logger.info('Saved to: {}'.format(out_dir))
def main():
    # Adjacent string literals are concatenated, so each part ends with a space.
    logger.warning('This script is aimed to demonstrate how to convert the '
                   'JSON file to a single image dataset, and not to handle '
                   'multiple JSON files to generate a real-use dataset.')
    parser = argparse.ArgumentParser()
    parser.add_argument('json_file_dir')
    parser.add_argument('-o', '--out', default=None)
    args = parser.parse_args()
    # One mapping shared by all files, so label values stay consistent.
    label_name_to_value = {'_background_': 0}
    path = args.json_file_dir
    for json_name in sorted(os.listdir(path)):
        # Skip anything that is not a JSON file, such as earlier output folders.
        if not json_name.endswith('.json'):
            continue
        # a.json -> a_json, created in the current directory unless --out is given.
        out_dir = json_name.replace('.', '_')
        if args.out is not None:
            out_dir = osp.join(args.out, out_dir)
        os.makedirs(out_dir, exist_ok=True)
        json_file = osp.join(path, json_name)
        label(json_file, out_dir, label_name_to_value)


if __name__ == '__main__':
    main()
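The label-numbering step in label() is easier to see in isolation. The sketch below reproduces only that step on made-up shape lists (the labels cat and dog and the helper name assign_label_values are illustrative, not from labelme): labels are visited in sorted order, each unseen label takes the next free integer, and one dictionary is passed through every file.

```python
def assign_label_values(shapes, label_name_to_value):
    """Give every unseen label in shapes the next free integer, in sorted order."""
    for shape in sorted(shapes, key=lambda s: s["label"]):
        name = shape["label"]
        if name not in label_name_to_value:
            label_name_to_value[name] = len(label_name_to_value)
    return label_name_to_value


# Hypothetical annotations from two JSON files.
file_a_shapes = [{"label": "dog"}, {"label": "cat"}]
file_b_shapes = [{"label": "dog"}]

# One dictionary shared across both files.
shared = {"_background_": 0}
assign_label_values(file_a_shapes, shared)
assign_label_values(file_b_shapes, shared)
print(shared)  # {'_background_': 0, 'cat': 1, 'dog': 2}

# With a fresh dictionary per file, "dog" in file B gets a different value.
per_file = assign_label_values(file_b_shapes, {"_background_": 0})
print(per_file)  # {'_background_': 0, 'dog': 1}
```

Because values depend on the order in which labels are first seen, the numbering can change when files are added or processed in a different order. Pre-filling the dictionary with a fixed mapping is one way to pin it down.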
Put the JSON files for all of the annotated data into one folder. In cmd, change to that folder, activate the conda environment, and run the converter:
activate labelme
labelme_json_to_dataset <JSON folder>
Reference: "Making Your Own Semantic Segmentation Dataset" (《制作自己的语义分割数据集》)
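After a batch run it can be worth confirming that every JSON file produced a complete output folder. The following sketch is a hypothetical helper (find_incomplete is not part of labelme) that assumes the <name>_json folder naming and the five output files written by the script above.

```python
import os

# Files the conversion script writes for every JSON file.
EXPECTED_FILES = ("img.png", "label.png", "label_viz.png",
                  "label_names.txt", "info.yaml")


def find_incomplete(json_dir, out_root):
    """Map each JSON file name to the expected outputs missing from its folder."""
    problems = {}
    for json_name in sorted(os.listdir(json_dir)):
        if not json_name.endswith(".json"):
            continue
        # a.json is expected to have produced <out_root>/a_json
        out_dir = os.path.join(out_root, json_name.replace(".", "_"))
        missing = [name for name in EXPECTED_FILES
                   if not os.path.isfile(os.path.join(out_dir, name))]
        if missing:
            problems[json_name] = missing
    return problems
```

An empty dictionary means that every JSON file has all five outputs; otherwise the keys name the files that need to be converted again.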
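4. Renaming the dataset files
Every label.png needs to be renamed. Move all of the converted folders into a single folder (I named mine dataset) to serve as the input of the batch script, and create a new folder (I named mine labelpng) for the output. The batch script is as follows: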
import os
import shutil

inputdir = r'E:\temp\dataset'
outputdir = r'E:\temp\labelpng'
for folder in os.listdir(inputdir):
    # Source: the label.png inside each converted folder, e.g. dataset\0001_json\label.png
    oldname = os.path.join(inputdir, folder, 'label.png')
    # Target: drop the trailing "_json" so the mask is named after the original image
    newname = os.path.join(outputdir, folder.rsplit('_', 1)[0] + '.png')
    shutil.copyfile(oldname, newname)  # copies the file; the original label.png is kept
    print(oldname, '======>', newname)
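When the script finishes, every label.png has been copied to a .png file named after its original image.
Then put all of the masks and original images into the corresponding VOC2012 folders.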
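As a rough sketch of that last step, the function below copies image/mask pairs into the folder names used by the standard VOC2012 release: JPEGImages for images and SegmentationClass for masks. The function name copy_into_voc and the .jpg extension for the original images are assumptions made for illustration; adjust them to the actual data.

```python
import os
import shutil


def copy_into_voc(image_dir, mask_dir, voc_root, image_ext=".jpg"):
    """Copy each mask and its same-named image into a VOC-style folder tree."""
    jpeg_dir = os.path.join(voc_root, "JPEGImages")
    seg_dir = os.path.join(voc_root, "SegmentationClass")
    os.makedirs(jpeg_dir, exist_ok=True)
    os.makedirs(seg_dir, exist_ok=True)

    copied, skipped = [], []
    for mask_name in sorted(os.listdir(mask_dir)):
        stem, ext = os.path.splitext(mask_name)
        if ext.lower() != ".png":
            continue
        image_path = os.path.join(image_dir, stem + image_ext)
        if not os.path.isfile(image_path):
            skipped.append(stem)  # mask without a matching image
            continue
        shutil.copyfile(image_path, os.path.join(jpeg_dir, stem + image_ext))
        shutil.copyfile(os.path.join(mask_dir, mask_name),
                        os.path.join(seg_dir, mask_name))
        copied.append(stem)
    return copied, skipped
```

The second return value lists masks that have no matching image, which makes naming mismatches visible before training.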
5. Creating ImageSets
Use the following code to split the generated masks into training, validation, and test sets:
from sklearn.model_selection import train_test_split
import os

imagedir = 'E:/temp/segment_dataset/0918/labelpng/'
outdir = 'E:/temp/segment_dataset/0918/imagesets_seg/'
# Collect the file names without extensions; sort so the split is reproducible.
images = sorted(os.path.splitext(file)[0] for file in os.listdir(imagedir))
# 70% train; the remaining 30% is split 2:1 into val (20%) and test (10%).
train, test = train_test_split(images, train_size=0.7, random_state=0)
val, test = train_test_split(test, train_size=0.2/0.3, random_state=0)
with open(outdir+"train.txt", 'w') as f:
    f.write('\n'.join(train))
with open(outdir+"val.txt", 'w') as f:
    f.write('\n'.join(val))
with open(outdir+"test.txt", 'w') as f:
    f.write('\n'.join(test))
# trainval is the union of train and val; the test images are left out.
with open(outdir+"trainval.txt", 'w') as f:
    f.write('\n'.join(train + val))
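The two calls above give a 70/20/10 split: the second call keeps 0.2/0.3 (two thirds) of the remaining 30% for validation, which is 20% of the whole set, and leaves 10% for testing. The same arithmetic can be sketched with only the standard library. This is an illustrative alternative rather than a drop-in replacement, since random.Random shuffles differently from scikit-learn, and the function name split_names is made up here.

```python
import random


def split_names(names, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle names reproducibly and cut them into train, val and test lists."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, val, test


# Ten hypothetical image names.
names = ["img{:03d}".format(i) for i in range(10)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))  # 7 2 1
```

In the VOC convention, trainval.txt lists the union of train and val, which here would be train + val.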