Dataset Introduction
MNIST
Download link
MNIST is a handwritten digit dataset created by LeCun. The training set contains 60,000 sample images with labels and the test set contains 10,000 sample images with labels; each sample is a 28px*28px grayscale image. The images and labels are stored in binary form (the IDX format) across four files.
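For reference, a minimal sketch of parsing those IDX files, assuming the gzipped files downloaded from the official page with their original file names:

import gzip
import struct
import numpy as np

def load_mnist_images(path):
    # IDX image file: 16-byte big-endian header (magic, count, rows, cols), then raw uint8 pixels
    with gzip.open(path, "rb") as f:
        magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(num, rows, cols)

def load_mnist_labels(path):
    # IDX label file: 8-byte big-endian header (magic, count), then one uint8 per label
    with gzip.open(path, "rb") as f:
        magic, num = struct.unpack(">II", f.read(8))
        return np.frombuffer(f.read(), dtype=np.uint8)

# train_images = load_mnist_images("train-images-idx3-ubyte.gz")
# train_labels = load_mnist_labels("train-labels-idx1-ubyte.gz")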
ImageNet
Download link
ImageNet is an image dataset created by Fei-Fei Li's group at Stanford University. It contains over 14 million sample images organized into 27 high-level categories and more than 20,000 subcategories, and may only be used for non-commercial research and teaching. Associated with ImageNet is the well-known ILSVRC competition, from which a series of new machine-learning algorithms emerged (AlexNet, ZFNet, GoogLeNet, ResNet, ...) and image recognition accuracy improved dramatically; making a name for oneself at ILSVRC has been the dream of computer-vision practitioners in recent years.
PASCAL_VOC
The PASCAL VOC challenge is a benchmark for visual object classification, recognition, and detection; it provides a standard annotated image dataset and a standard evaluation procedure for detection algorithms and learning performance. The PASCAL VOC images cover 20 classes: person; animals (bird, cat, cow, dog, horse, sheep); vehicles (aeroplane, bicycle, boat, bus, car, motorbike, train); indoor objects (bottle, chair, dining table, potted plant, sofa, TV/monitor). The challenge has not been held since 2012, but the images are of good quality and the annotations are complete, which makes the dataset well suited for testing algorithm performance.
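For reference, the 20 VOC class names as they appear in the annotation files, grouped the same way as above:

# The 20 PASCAL VOC object classes
VOC_CLASSES = [
    "person",
    "bird", "cat", "cow", "dog", "horse", "sheep",
    "aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train",
    "bottle", "chair", "diningtable", "pottedplant", "sofa", "tvmonitor",
]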
COCO
COCO (Common Objects in Context) is a newer dataset for image recognition, segmentation, and image captioning. Its main features are:
1) Object segmentation
2) Recognition in context
3) Multiple objects per image
4) More than 300,000 images
5) More than 2 million instances
6) 80 object categories
7) 5 captions per image
8) Keypoints on 100,000 people
The COCO dataset is sponsored by Microsoft. Its annotations include not only class and location information but also semantic text descriptions of each image. The release of COCO has driven great progress in image segmentation and semantic understanding over the past few years, and it has become more or less the "standard" dataset for evaluating image semantic understanding algorithms.
COCO was collected by a Microsoft team as a dataset for image recognition, segmentation, and captioning.
It targets scene understanding: images are mostly taken from complex everyday scenes, and objects are localized with precise segmentation masks. The dataset covers 91 object categories, 328,000 images, and 2,500,000 labels.
It mainly addresses three problems: object detection, contextual relationships between objects, and precise 2D localization of objects.
COCO has 91 categories; although that is fewer than ImageNet or SUN, each category has more images, which helps models learn how objects of each category appear in particular scenes. Compared with PASCAL VOC, COCO has both more categories and more images.
COCO is also more difficult: each image contains many objects, so detection accuracy on COCO tends to be lower than on other datasets.
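As a sketch of how these annotations are typically consumed, the snippet below uses the pycocotools API; the annotation file path is only an assumption and should point to your own JSON:

from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # assumed path to a COCO-style json
cat_ids = coco.getCatIds(catNms=["person"])        # look up category ids by name
img_ids = coco.getImgIds(catIds=cat_ids)           # images containing that category
img_info = coco.loadImgs(img_ids[0])[0]
ann_ids = coco.getAnnIds(imgIds=img_info["id"], catIds=cat_ids, iscrowd=None)
for ann in coco.loadAnns(ann_ids):
    print(ann["category_id"], ann["bbox"])         # bbox is [x, y, width, height]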
Dataset Creation
PASCAL_VOC
Data collection and annotation
After taking the JPG photos, they need to be annotated, which is done with the help of a tool such as LabelImg.
Keyboard shortcuts:
Ctrl + u  Load all images in a directory (same as clicking Open dir)
Ctrl + r  Change the default annotation target directory (where the xml files are saved)
Ctrl + s  Save
Ctrl + d  Copy the current label and bounding box
Space     Mark the current image as verified
w         Create a bounding box
d         Next image
a         Previous image
Del       Delete the selected bounding box
Ctrl + +  Zoom in
Ctrl + -  Zoom out
↑ → ↓ ←   Move the selected bounding box with the arrow keys
The generated xml files are the annotation files, one per image.
File structure
The directory tree can be created by hand or generated by code (a short sketch follows the structure below); the JPG and XML files are then moved into the corresponding folders.
- dataset
  - VOCdevkit2007
    - VOC2007
      - Annotations (the annotation files described above, *.xml)
      - JPEGImages (the image files, *.jpg)
      - ImageSets (the txt files can be generated by code)
        - Main
          - test.txt
          - train.txt
          - val.txt
          - trainval.txt
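A minimal sketch of creating this tree in code (the root path is an assumption, change it to your own location):

import os

root = "dataset/VOCdevkit2007/VOC2007"  # assumed root directory
for sub in ("Annotations", "JPEGImages", "ImageSets/Main"):
    os.makedirs(os.path.join(root, sub), exist_ok=True)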
XML file content
<annotation>
    <folder>xxx</folder>
    <filename>xxx.jpg</filename>
    <path>D:\xxx.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>1000</width>
        <height>600</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>xxx</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>xxx</xmin>
            <ymin>xxx</ymin>
            <xmax>xxx</xmax>
            <ymax>xxx</ymax>
        </bndbox>
    </object>
</annotation>
Some useful code snippets follow.
Code to batch-rename JPG files
import os
# Directory that contains one sub-folder of pictures per class; adjust to your own
# location and make sure the path ends with a "/".
dir = "the location of your own picture"
file_name = os.listdir(dir)
# print(file_name)
n = 1
for file in file_name:
    pic_name = os.listdir(dir + file)
    print(pic_name)
    for pic in pic_name:
        oldname = dir + file + "/" + pic
        newname = dir + file + "/" + str(n).zfill(6) + ".jpg"
        os.rename(oldname, newname)
        n = n + 1
        print(oldname, '--->', newname)
XML generation code
For datasets annotated in a special format (e.g. a csv file), the xml files can be generated with code (a sketch of reading such a csv follows the generation code below).
import codecs
from xml.dom.minidom import Document
# Fill in the following values for each image / object (placeholder values shown):
img = "000001"           # image name without the .jpg extension
width = 1000             # image width
height = 600             # image height
classname = "xxx"        # name of the object class
nxmin = 0                # xmin
nymin = 0                # ymin
nxmax = 0                # xmax
nymax = 0                # ymax
xml_dir = "Annotations"  # directory in which to save the xml file
doc = Document()
ann = doc.createElement("annotation")
doc.appendChild(ann)
folder = doc.createElement("folder")
ann.appendChild(folder)
cf = doc.createTextNode("VOC2007")
folder.appendChild(cf)
filename = doc.createElement("filename")
cn = doc.createTextNode(str(img+'.jpg'))
filename.appendChild(cn)
ann.appendChild(filename)
size = doc.createElement("size")
w= doc.createElement("width")
cw = doc.createTextNode(str(width))
w.appendChild(cw)
h= doc.createElement("height")
ch = doc.createTextNode(str(height))
h.appendChild(ch)
d= doc.createElement("depth")
cd = doc.createTextNode(str(3))
d.appendChild(cd)
size.appendChild(w)
size.appendChild(h)
size.appendChild(d)
ann.appendChild(size)
obj= doc.createElement("object")
ann.appendChild(obj)
name= doc.createElement("name")
obj.appendChild(name)
cname = doc.createTextNode(str(classname))
name.appendChild(cname)
pose= doc.createElement("pose")
obj.appendChild(pose)
cuns = doc.createTextNode("Unspecified")
pose.appendChild(cuns)
truncated= doc.createElement("truncated")
obj.appendChild(truncated)
ctru = doc.createTextNode(str(0))
truncated.appendChild(ctru)
difficult= doc.createElement("difficult")
obj.appendChild(difficult)
cdif = doc.createTextNode(str(0))
difficult.appendChild(cdif)
bndbox = doc.createElement("bndbox")
xmin= doc.createElement("xmin")
cxmin = doc.createTextNode(str(nxmin))
xmin.appendChild(cxmin)
ymin= doc.createElement("ymin")
cymin = doc.createTextNode(str(nymin))
ymin.appendChild(cymin)
xmax= doc.createElement("xmax")
cxmax = doc.createTextNode(str(nxmax))
xmax.appendChild(cxmax)
ymax= doc.createElement("ymax")
cymax = doc.createTextNode(str(nymax))
ymax.appendChild(cymax)
bndbox.appendChild(xmin)
bndbox.appendChild(ymin)
bndbox.appendChild(xmax)
bndbox.appendChild(ymax)
obj.appendChild(bndbox)
f = codecs.open(xml_dir + '/' + img + '.xml','w','utf-8')
doc.writexml(f,addindent = ' ',newl='\n',encoding = 'utf-8')
f.close()
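As a sketch of where those input values might come from, assuming a hypothetical csv file with one object per row and the columns filename, width, height, class, xmin, ymin, xmax, ymax, each row can fill the variables used above:

import csv

with open("annotations.csv", newline="") as f:       # hypothetical csv file
    for row in csv.DictReader(f):
        img = row["filename"]                        # image name without the .jpg extension
        width, height = int(row["width"]), int(row["height"])
        classname = row["class"]
        nxmin, nymin = int(row["xmin"]), int(row["ymin"])
        nxmax, nymax = int(row["xmax"]), int(row["ymax"])
        # ...then build and write the xml document as shown above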
Code to generate the txt split files
import os
import random
trainval_percent = 0.66  # fraction of all samples that go into trainval (train + val)
train_percent = 0.5      # fraction of trainval that goes into train
# the final train fraction is the product of the two values above
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
list = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(list, tv)
train = random.sample(trainval, tr)
ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')
for i in list:
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
COCO
File structure
VOC2007
|_ JPEGImages
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ annotations
| |_ pascal_trainval2007.json
| |_ ...
|_ VOCdevkit2007
The VOCdevkit2007 folder is the extracted official VOC2007 devkit; it must be modified to match the object classes of your own dataset:
VOCopts.classes={... 'label1' 'label2' 'label3' 'label4'};
Below is Python code that converts a cow face recognition dataset, provided for reference only.
# -*- coding:utf-8 -*-
# @Author: frothmoon
# @Description: convert VOC xml annotations to COCO-style json
import os, sys, json
from collections import OrderedDict
from xml.etree.ElementTree import ElementTree, Element
txt_Path="F:\\COW\\datasets\\VOCdevkit2007\\VOC2007\\ImageSets\\Mains\\"
XML_PATH = "F:\\COW\\datasets\\VOCdevkit2007\\VOC2007\\Annotations\\"
JSON_PATH = "F:\\COW\\datasets\\VOCdevkit2007\\VOC2007\\JsonAnnotations\\"
txt_Name=["train","test","trainval","val"]
def read_xml(in_path):
    '''Read and parse an xml file'''
    tree = ElementTree()
    tree.parse(in_path)
    return tree
def if_match(node, kv_map):
    '''Check whether a node carries all of the given attributes and values
    node: the node to test
    kv_map: map of attribute names to expected values'''
    for key in kv_map:
        if node.get(key) != kv_map.get(key):
            return False
    return True
def get_node_by_keyvalue(nodelist, kv_map):
    '''Return the nodes whose attributes match kv_map
    nodelist: list of candidate nodes
    kv_map: map of attribute names to expected values'''
    result_nodes = []
    for node in nodelist:
        if if_match(node, kv_map):
            result_nodes.append(node)
    return result_nodes
def find_nodes(tree, path):
    '''Find all nodes matching a path
    tree: the xml tree
    path: node path'''
    return tree.findall(path)
print ("-----------------Start------------------")
for div in txt_Name:
f = open(txt_Path+div+".txt")
lines = f.readlines()
f.close()
mol_list = []
sorttxt=[]
for line in lines:
line = line.strip('\n')
line=line.replace('Cow_','')
linediv=line.split("_")
linediv[0]=int(linediv[0])
linediv[1]=int(linediv[1])
linediv=tuple(linediv)
sorttxt.append(linediv)
sorttxt=sorted(sorttxt,key=lambda x: (x[0], x[1]))
print(sorttxt)
for i in sorttxt:
add="Cow_"+str(i[0])+"_"+str(i[1])+".xml"
mol_list.append(add)
if not os.path.exists(XML_PATH+div):
os.mkdir(XML_PATH+div)
for mol in mol_list:
molpath=XML_PATH+mol
copyto=XML_PATH+div
# print mol_path
cmd = "move %s %s" % (molpath,copyto)
os.system(cmd)
json_obj = {}
images = []
annotations = []
categories = []
categories_list = []
annotation_id = 1
xml_names = []
for xml in os.listdir(copyto):
xml_names.append(xml)
for xml in xml_names:
tree = read_xml(copyto+ "\\" + xml)
object_nodes = get_node_by_keyvalue(find_nodes(tree, "object"), {})
if len(object_nodes) == 0:
print (xml, "no object")
continue
else:
image = OrderedDict()
file_name = os.path.splitext(xml)[0];
para1 = file_name + ".jpg"
height_nodes = get_node_by_keyvalue(find_nodes(tree, "size/height"), {})
para2 = int(height_nodes[0].text)
width_nodes = get_node_by_keyvalue(find_nodes(tree, "size/width"), {})
para3 = int(width_nodes[0].text)
fname=file_name[4:]
para4 = int(fname)
for f,i in [("file_name",para1),("height",para2),("width",para3),("id",para4)]:
image.setdefault(f,i)
images.append(image) #构建images
# image = {}
# file_name = os.path.splitext(xml)[0]; # 文件名
# image["file_name"] = file_name + ".jpg"
# width_nodes = get_node_by_keyvalue(find_nodes(tree, "size/width"), {})
# image["width"] = int(width_nodes[0].text)
# height_nodes = get_node_by_keyvalue(find_nodes(tree, "size/height"), {})
# image["height"] = int(height_nodes[0].text)
# fname=file_name[4:]
# image["id"] = int(fname)
# images.append(image) #构建images
            name_nodes = get_node_by_keyvalue(find_nodes(tree, "object/name"), {})
            xmin_nodes = get_node_by_keyvalue(find_nodes(tree, "object/bndbox/xmin"), {})
            ymin_nodes = get_node_by_keyvalue(find_nodes(tree, "object/bndbox/ymin"), {})
            xmax_nodes = get_node_by_keyvalue(find_nodes(tree, "object/bndbox/xmax"), {})
            ymax_nodes = get_node_by_keyvalue(find_nodes(tree, "object/bndbox/ymax"), {})
            # print(ymax_nodes)
            for index, node in enumerate(object_nodes):
                annotation = {}
                segmentation = []
                bbox = []
                seg_coordinate = []  # polygon corners of the box: (xmin,ymin),(xmin,ymax),(xmax,ymax),(xmax,ymin)
                seg_coordinate.append(int(xmin_nodes[index].text))
                seg_coordinate.append(int(ymin_nodes[index].text))
                seg_coordinate.append(int(xmin_nodes[index].text))
                seg_coordinate.append(int(ymax_nodes[index].text))
                seg_coordinate.append(int(xmax_nodes[index].text))
                seg_coordinate.append(int(ymax_nodes[index].text))
                seg_coordinate.append(int(xmax_nodes[index].text))
                seg_coordinate.append(int(ymin_nodes[index].text))
                segmentation.append(seg_coordinate)
                width = int(xmax_nodes[index].text) - int(xmin_nodes[index].text)
                height = int(ymax_nodes[index].text) - int(ymin_nodes[index].text)
                area = width * height
                bbox.append(int(xmin_nodes[index].text))
                bbox.append(int(ymin_nodes[index].text))
                bbox.append(width)
                bbox.append(height)
                annotation["segmentation"] = segmentation
                annotation["area"] = area
                annotation["iscrowd"] = 0
                fname=file_name[4:]
                annotation["image_id"] = int(fname)
                annotation["bbox"] = bbox
                cate=name_nodes[index].text
                if cate=='head':
                    category_id=1
                elif cate=='eye':
                    category_id=2
                elif cate=='nose':
                    category_id=3
                annotation["category_id"] = category_id
                annotation["id"] = annotation_id
                annotation_id += 1
                annotation["ignore"] = 0
                annotations.append(annotation)
                if category_id in categories_list:
                    pass
                else:
                    categories_list.append(category_id)
                    categorie = {}
                    categorie["supercategory"] = "none"
                    categorie["id"] = category_id
                    categorie["name"] = name_nodes[index].text
                    categories.append(categorie)
    json_obj["images"] = images
    json_obj["type"] = "instances"
    json_obj["annotations"] = annotations
    json_obj["categories"] = categories
    f = open(JSON_PATH+div+".json", "w")
    # json.dump(json_obj, f)
    json_str = json.dumps(json_obj)
    f.write(json_str)
    f.close()
print ("------------------End-------------------")
Dataset Augmentation
Slicing
Again using the cow dataset as an example: when an image is very large, the features extracted for small objects become tiny and detection suffers, so the image is sliced into smaller overlapping windows (see the code below).
Slicing code; the thresholds and other parameters can be adjusted to your own situation.
# -*- coding: utf-8 -*-
"""
@author: frothmoon
"""
import os,xml,codecs
from xml.dom.minidom import Document
import glob
import cv2
import time
image_path="F://COW//datasets//VOCdevkit2007//img"
xml_path="F://COW//datasets//VOCdevkit2007//ann"
image_outdir="F://COW//datasets//VOCdevkit2007//imgdiv"
xml_outdir="F://COW//datasets//VOCdevkit2007//anndiv"
def slice_im(image_name, sliceHeight=416, sliceWidth=416,
             zero_frac_thresh=0.2, overlap=0.2, verbose=False):
    '''Slice a large image into smaller pieces,
    ignoring slices whose fraction of null (black) pixels is greater than zero_frac_thresh.
    Assumes three bands!'''
    ext = '.jpg'
    image0 = cv2.imread(image_path+"//"+image_name+ext, 1)  # color
    win_h, win_w = image0.shape[:2]
    # if the slice size is larger than the image, pad the edges
    pad = 0
    if sliceHeight > win_h:
        pad = sliceHeight - win_h
    if sliceWidth > win_w:
        pad = max(pad, sliceWidth - win_w)
    # pad the edge of the image with black pixels
    if pad > 0:
        border_color = (0,0,0)
        image0 = cv2.copyMakeBorder(image0, pad, pad, pad, pad,
                                    cv2.BORDER_CONSTANT, value=border_color)
    win_size = sliceHeight*sliceWidth
    t0 = time.time()
    n_ims = 0
    n_ims_nonull = 0
    dx = int((1. - overlap) * sliceWidth)
    dy = int((1. - overlap) * sliceHeight)
    for y0 in range(0, image0.shape[0], dy):
        for x0 in range(0, image0.shape[1], dx):
            n_ims += 1
            # make sure we don't have a tiny image on the edge
            if y0+sliceHeight > image0.shape[0]:
                y = image0.shape[0] - sliceHeight
            else:
                y = y0
            if x0+sliceWidth > image0.shape[1]:
                x = image0.shape[1] - sliceWidth
            else:
                x = x0
            # extract the window
            window_c = image0[y:y + sliceHeight, x:x + sliceWidth]
            # get a grayscale version of the window
            window = cv2.cvtColor(window_c, cv2.COLOR_BGR2GRAY)
            # threshold to find the pixels that are not black
            # https://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html?highlight=threshold
            ret,thresh1 = cv2.threshold(window, 2, 255, cv2.THRESH_BINARY)
            non_zero_counts = cv2.countNonZero(thresh1)
            zero_counts = win_size - non_zero_counts
            zero_frac = float(zero_counts) / win_size
            # print("zero_frac", zero_frac)
            # skip if the window is mostly empty
            if zero_frac >= zero_frac_thresh:
                if verbose:
                    print ("Zero frac too high at:", zero_frac)
                continue
            # else save the slice and write its xml
            else:
                out_name=image_name+"_"+str(n_ims)
                slice_xml(image_name, out_name, x, y, x + sliceWidth-1, y + sliceHeight-1)
                outpath = os.path.join(image_outdir, out_name + ext)
                if verbose:
                    print ("outpath:", outpath)
                cv2.imwrite(outpath, window_c)
                n_ims_nonull += 1
    print ("Num slices:", n_ims, "Num non-null slices:", n_ims_nonull,
           "sliceHeight", sliceHeight, "sliceWidth", sliceWidth)
    print ("Time to slice", image_path+"//"+image_name, time.time()-t0, "seconds")
    return
def isOverlap(xmin,ymin,xmax,ymax,bxmin,bymin,bxmax,bymax):
    if xmax>bxmin and bxmax>xmin and ymax>bymin and bymax>ymin:
        return True
    else:
        return False
def objbbx(xmin,ymin,xmax,ymax,bxmin,bymin,bxmax,bymax):
    rxmin=max(xmin,bxmin)
    rymin=max(ymin,bymin)
    rxmax=min(xmax,bxmax)
    rymax=min(ymax,bymax)
    return rxmin,rymin,rxmax,rymax
def slice_xml(imname, outname, imxmin, imymin, imxmax, imymax):
    if os.path.exists(xml_path+ '//' + imname +'.xml'):
        dom = xml.dom.minidom.parse(xml_path + '//' + imname + '.xml')
        root = dom.documentElement
        objs= root.getElementsByTagName("object")
        bbx=[[0 for col in range(5)] for row in range(len(objs))]
        i=0
        for obj in objs:
            bbx[i][0]=str(obj.getElementsByTagName("name")[0].childNodes[0].nodeValue.strip())
            bbx[i][1]=int(obj.getElementsByTagName("xmin")[0].childNodes[0].nodeValue.strip())
            bbx[i][2]=int(obj.getElementsByTagName("ymin")[0].childNodes[0].nodeValue.strip())
            bbx[i][3]=int(obj.getElementsByTagName("xmax")[0].childNodes[0].nodeValue.strip())
            bbx[i][4]=int(obj.getElementsByTagName("ymax")[0].childNodes[0].nodeValue.strip())
            i=i+1
        doc=Document()
        ann=doc.createElement("annotation")
        doc.appendChild(ann)
        folder=root.getElementsByTagName("folder")[0]
        ann.appendChild(folder)
        filename=doc.createElement("filename")
        name=doc.createTextNode(outname+".jpg")
        filename.appendChild(name)
        ann.appendChild(filename)
        path=doc.createElement("path")
        cpath=doc.createTextNode(image_outdir+"//"+outname+".jpg")
        path.appendChild(cpath)
        ann.appendChild(path)
        size=doc.createElement("size")
        w=doc.createElement("width")
        cw=doc.createTextNode(str(imxmax-imxmin+1))
        w.appendChild(cw)
        h=doc.createElement("height")
        ch=doc.createTextNode(str(imymax-imymin+1))
        h.appendChild(ch)
        d=doc.createElement("depth")
        cd=doc.createTextNode(str(3))
        d.appendChild(cd)
        size.appendChild(w)
        size.appendChild(h)
        size.appendChild(d)
        ann.appendChild(size)
        for i in range(0,len(objs)):
            if isOverlap(imxmin,imymin,imxmax,imymax,bbx[i][1],bbx[i][2],bbx[i][3],bbx[i][4]):
                res=objbbx(imxmin,imymin,imxmax,imymax,bbx[i][1],bbx[i][2],bbx[i][3],bbx[i][4])
                objecttag=doc.createElement("object")
                ann.appendChild(objecttag)
                name=doc.createElement("name")
                objecttag.appendChild(name)
                cname=doc.createTextNode(bbx[i][0])
                name.appendChild(cname)
                pose= doc.createElement("pose")
                objecttag.appendChild(pose)
                cuns=doc.createTextNode("Unspecified")
                pose.appendChild(cuns)
                truncated=doc.createElement("truncated")
                objecttag.appendChild(truncated)
                ctru=doc.createTextNode(str(0))
                truncated.appendChild(ctru)
                difficult=doc.createElement("difficult")
                objecttag.appendChild(difficult)
                cdif=doc.createTextNode(str(0))
                difficult.appendChild(cdif)
                bndbox=doc.createElement("bndbox")
                xmin=doc.createElement("xmin")
                cxmin=doc.createTextNode(str(res[0]-imxmin))
                xmin.appendChild(cxmin)
                ymin=doc.createElement("ymin")
                cymin=doc.createTextNode(str(res[1]-imymin))
                ymin.appendChild(cymin)
                xmax=doc.createElement("xmax")
                cxmax=doc.createTextNode(str(res[2]-imxmin))
                xmax.appendChild(cxmax)
                ymax=doc.createElement("ymax")
                cymax=doc.createTextNode(str(res[3]-imymin))
                ymax.appendChild(cymax)
                bndbox.appendChild(xmin)
                bndbox.appendChild(ymin)
                bndbox.appendChild(xmax)
                bndbox.appendChild(ymax)
                objecttag.appendChild(bndbox)
        f=codecs.open(xml_outdir+"//"+outname+'.xml','w','utf-8')
        doc.writexml(f,addindent = ' ',newl='\n',encoding = 'utf-8')
        f.close()
if __name__=="__main__":
    img_Lists = glob.glob(image_path + '//*.jpg')
    img_basenames = []
    for item in img_Lists:
        img_basenames.append(os.path.basename(item))
    img_names = []
    for item in img_basenames:
        temp1, temp2 = os.path.splitext(item)
        img_names.append(temp1)
    for item in img_names:
        slice_im(item)
Other augmentations
Images can also be processed in other ways: adding noise, changing contrast, and even flipping, among other transforms.
There are many image-processing libraries on GitHub that can help with this.
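A minimal OpenCV sketch of the transforms mentioned above (the file names are placeholders); note that for detection data the boxes in the XML must be transformed consistently, e.g. mirrored after a horizontal flip:

import cv2
import numpy as np

img = cv2.imread("000001.jpg")                           # placeholder file name

flipped = cv2.flip(img, 1)                               # horizontal flip
contrast = cv2.convertScaleAbs(img, alpha=1.3, beta=10)  # contrast/brightness: alpha*img + beta
noise = np.random.normal(0, 15, img.shape).astype(np.float32)
noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)  # additive Gaussian noise

cv2.imwrite("000001_flip.jpg", flipped)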
References
https://www.jianshu.com/p/9990284bc4d5?from=singlemessage
https://blog.csdn.net/meccaendless/article/details/79457330