Data cleaning: https://tianchi.aliyun.com/forum/postDetail?spm=5176.12586969.1002.21.125b13e2xpCMec&postId=87373
After cleaning, 3370 images remain.
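Splitting into training and validation sets
An analogy:
Training set: the student's textbook; the student learns from what is in it.
Validation set: homework; it shows how well each student is learning and how quickly they are improving.
Test set: the exam; the questions have not been seen before, so it tests the student's ability to generalize.
Traditionally the three sets are split 6:2:2. The validation set is not strictly required.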
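The 6:2:2 split described above can be sketched as follows. This is only a minimal illustration: the function name `split_dataset`, the fixed seed, and the placeholder file names are assumptions, not part of the original workflow.

```python
import random


def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items and split them into train/val/test lists by ratio."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(round(len(shuffled) * ratios[0]))
    n_val = int(round(len(shuffled) * ratios[1]))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test


# Hypothetical file names standing in for the 3370 cleaned images.
names = ["img_{:04d}.jpg".format(i) for i in range(3370)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))
```

Passing `ratios=(0.8, 0.0, 0.2)` gives a split without a validation set.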
json2txt.py: converts the cleaned COCO-format dataset into a txt-format dataset.
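The contents of json2txt.py are not shown in these notes, so the following is only a sketch of what such a conversion might look like. The output layout (one line per image: `file_name x1,y1,x2,y2,class_id ...`) is an assumption, as is the function name `coco_to_txt_lines`. The COCO side relies only on the standard `images` / `annotations` keys and the `[x, y, width, height]` box convention.

```python
from collections import defaultdict


def coco_to_txt_lines(coco):
    """Turn a COCO-style dict into lines 'file_name x1,y1,x2,y2,class_id ...'."""
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores [x_min, y_min, width, height]
        corners = [int(round(v)) for v in (x, y, x + w, y + h)]
        boxes[ann["image_id"]].append(
            ",".join(str(v) for v in corners + [ann["category_id"]])
        )
    # Images without annotations are skipped in this sketch.
    return [
        "{} {}".format(id_to_name[image_id], " ".join(box_list))
        for image_id, box_list in sorted(boxes.items())
    ]


# Tiny hand-made example (not real data from the dataset).
sample = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"image_id": 1, "bbox": [10, 20, 30, 40], "category_id": 3}],
}
lines = coco_to_txt_lines(sample)
print(lines)
```

For a real annotation file, the dict would come from `json.load` and the lines would be written out with `"\n".join(lines)`.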
9/28
Trained with classes 6, 7, and 8 removed; mAP improved by 7.53%.
Added FPN
Code: https://github.com/guoruoqian/FPN_Pytorch
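Building the VOC dataset:
Convert the original COCO dataset to the VOC format.
Delete the classes that are not needed from the dataset.
Use ready-made XML files: https://tianchi.aliyun.com/forum/postDetail?spm=5176.12586969.1002.36.125b13e2pXM9Nl&postId=86731 ; https://blog.csdn.net/qq_35153620/article/details/101902502
Build the VOC dataset from the XML files: https://www.cnblogs.com/tianxxl/p/10893285.html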
[0 Background, 1 CapPoSun, 2 CapBianXing, 3 CapHuaiBian, 4 CapDaXuan, 5 CapDuanDian, 6 LabelWaiXie, 7 LabelQiZhou, 8 LabelQiPao, 9 CodeZhengChang, 10 CodeYiChang]
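To illustrate the COCO-to-VOC step, here is a minimal sketch that builds one VOC-style XML annotation from COCO-style boxes. It assumes that the COCO `category_id` indexes directly into the class list above and that images have 3 channels. The function name `make_voc_xml` and the example image size are hypothetical.

```python
import xml.etree.ElementTree as ET

# Class list from the notes; the list index is assumed to equal the COCO category_id.
CLASS_NAMES = [
    "Background", "CapPoSun", "CapBianXing", "CapHuaiBian", "CapDaXuan",
    "CapDuanDian", "LabelWaiXie", "LabelQiZhou", "LabelQiPao",
    "CodeZhengChang", "CodeYiChang",
]


def make_voc_xml(file_name, width, height, coco_boxes):
    """Build a VOC-style annotation; coco_boxes is a list of (category_id, [x, y, w, h])."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = file_name
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for category_id, (x, y, w, h) in coco_boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = CLASS_NAMES[category_id]
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        # VOC stores corner coordinates, COCO stores corner plus width/height.
        corners = (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h))
        for tag, value in corners:
            ET.SubElement(box, tag).text = str(int(round(value)))
    return ET.tostring(root, encoding="unicode")


# Hypothetical image with a single CapDuanDian box.
xml_text = make_voc_xml("a.jpg", 640, 480, [(5, [10, 20, 30, 40])])
print(xml_text)
```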
Splitting the VOC dataset into training and test sets: https://blog.csdn.net/qq_41627642/article/details/104954331?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522163512422416780366515873%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=163512422416780366515873&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2alltop_positive~default-1-104954331.pc_search_result_control_group&utm_term=FPN_Tensorflow&spm=1018.2226.3001.4187#t5
# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import
import sys
sys.path.append('../../')
import shutil
import os
import random
import math


def mkdir(path):
    # Create the directory (and any missing parents) if it does not exist yet.
    if not os.path.exists(path):
        os.makedirs(path)


divide_rate = 0.8  # fraction of the images that goes to the training set

# root_path = '/mnt/ExtraDisk/yangxue/data_ship_clean'
root_path = "D:/Python base/Test2/FPN_Tensorflow-master"  # change to your own project root
# image_path = root_path + '/VOCdevkit/JPEGImages'
image_path = root_path + "/data/VOC/VOC_test/VOC2007/JPEGImages/"  # change to the directory holding the images
xml_path = root_path + "/data/VOC/VOC_test/VOC2007/Annotations/"  # change to the directory holding the annotations

image_list = os.listdir(image_path)
# splitext keeps file names that contain extra dots intact
image_name = [os.path.splitext(n)[0] for n in image_list]

random.shuffle(image_name)

# Round up after applying the rate (the original applied ceil before multiplying, which had no effect).
split_index = int(math.ceil(len(image_name) * divide_rate))
train_image = image_name[:split_index]
test_image = image_name[split_index:]

image_output_train = os.path.join(root_path, 'VOCdevkit_train/JPEGImages')  # output directory for training images
mkdir(image_output_train)
image_output_test = os.path.join(root_path, 'VOCdevkit_test/JPEGImages')  # output directory for test images
mkdir(image_output_test)

xml_train = os.path.join(root_path, 'VOCdevkit_train/Annotations')  # output directory for training annotations
mkdir(xml_train)
xml_test = os.path.join(root_path, 'VOCdevkit_test/Annotations')  # output directory for test annotations
mkdir(xml_test)

count = 0
for i in train_image:
    shutil.copy(os.path.join(image_path, i + '.jpg'), image_output_train)  # images are expected to be .jpg
    shutil.copy(os.path.join(xml_path, i + '.xml'), xml_train)
    if count % 1000 == 0:
        print("process step {}".format(count))
    count += 1

for i in test_image:
    shutil.copy(os.path.join(image_path, i + '.jpg'), image_output_test)
    shutil.copy(os.path.join(xml_path, i + '.xml'), xml_test)
    if count % 1000 == 0:
        print("process step {}".format(count))
    count += 1
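10/28: AP for the cap break-point class (CapDuanDian) is 0.55, a large improvement over the previous 0.22.
Per-class AP (class ID-AP): 4-0.99 10-0.82 9-0.94 2-0.57 3-0.318 1-0.32 5-0.55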
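Assuming each pair above is a class ID followed by that class's AP (class 5, CapDuanDian, matches the 0.55 reported for the cap break-point class), the mean over these seven classes can be worked out as below. The notes do not state an overall mAP for this run, so treat the result as arithmetic on the listed values rather than a reported figure.

```python
# Class ID -> AP, copied from the list above (assumed to be per-class AP).
per_class_ap = {4: 0.99, 10: 0.82, 9: 0.94, 2: 0.57, 3: 0.318, 1: 0.32, 5: 0.55}

# mAP is the unweighted mean of the per-class AP values.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print("mean AP over {} classes: {:.3f}".format(len(per_class_ap), mean_ap))
```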
10/29
Background
CapPoSun
CapBianXing
CapHuaiBian
CapDaXuan
CapDuanDian
LabelWaiXie
LabelQiZhou
LabelQiPao
CodeZhengChang
CodeYiChang
Labels deleted: Background, LabelWaiXie, LabelQiZhou, LabelQiPao
Per-class counts before deletion:
Per-class counts after deletion:
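The per-class counts could be gathered with a sketch like the one below, run once on the annotation directory before the deletion and once after. It assumes standard VOC XML files with `<object><name>` entries. The function name `count_objects` and the directory name in the usage comment are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def count_objects(xml_dir):
    """Count annotated objects per class name across all VOC XML files in xml_dir."""
    counts = Counter()
    for file_name in sorted(os.listdir(xml_dir)):
        if not file_name.endswith(".xml"):
            continue  # ignore anything that is not an annotation file
        root = ET.parse(os.path.join(xml_dir, file_name)).getroot()
        for obj in root.iter("object"):
            counts[obj.findtext("name")] += 1
    return counts


# Usage (hypothetical path):
# for name, n in sorted(count_objects("Annotations").items()):
#     print(name, n)
```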
Train on the new dataset
Processing the VOC dataset
Delete the labels of specified classes (https://blog.csdn.net/qq_35153620/article/details/101902502)
GitHub project (includes usage instructions):
https://github.com/A-mockingbird/VOCtype-datasetOperation
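For reference, deleting the labels of specified classes from a single VOC XML file can be sketched with the standard library alone. This is not the code from the project linked above, only a minimal illustration. The function name `remove_classes` is hypothetical, and the class set is the list of deleted labels from these notes.

```python
import xml.etree.ElementTree as ET

# Labels that the notes say were deleted.
REMOVE = {"Background", "LabelWaiXie", "LabelQiZhou", "LabelQiPao"}


def remove_classes(xml_path, out_path, remove=REMOVE):
    """Drop <object> entries whose class is in `remove`; return how many objects remain."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    # findall returns a list, so removing elements while looping is safe.
    for obj in root.findall("object"):
        if obj.findtext("name") in remove:
            root.remove(obj)
    tree.write(out_path, encoding="utf-8")
    return len(root.findall("object"))
```

A file for which this returns 0 has no remaining objects; whether to keep such images in the training set is a separate decision.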
Count the actual number of objects in each class (https://blog.csdn.net/DD_PP_JJ/article/details/102772793#comments_12179946)
import os
import os.path
from xml.etree.ElementTree import parse, Element
def changeName(xml_fold, origin_name, new_name):
&