Python - Deep Learning Series 3: Image Region Annotation and Cropping


yukai08008


Original article: https://blog.csdn.net/yukai08008/article/details/108256554


Overview

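The simplest kind of annotation labels a whole image through its file name. This post deals with something finer: marking regions inside an image, labeling each region, and then cutting those regions out and saving each one, together with its label, as a separate image. Picture dragging a small rectangle over part of an image and tagging it with a class name; we then use that rectangle and its label to crop the area out and save it on its own.
Topics covered:

1 The VOC annotation format and tools
2 The PyTorch Dataset format
3 The corresponding conversion script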

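Object detection work usually relies on one of two dataset formats, COCO or VOC; this post focuses on VOC.

COCO dataset. COCO stands for Common Objects in COntext, a dataset for image recognition released by a Microsoft team. The MS COCO images are split into training, validation and test sets. The images were collected by searching Flickr for 80 object categories combined with various scene types, and were annotated using Amazon Mechanical Turk (AMT).
VOC dataset. PASCAL stands for Pattern Analysis, Statistical Modelling and Computational Learning, and VOC for Visual Object Classes. The first PASCAL VOC challenge was held in 2005; it ran once a year until it ended in 2012.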
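Since the two formats come up side by side, one practical difference is worth seeing in code: a VOC bndbox stores two corners (xmin, ymin, xmax, ymax), whereas a COCO annotation's bbox stores [x, y, width, height]; both put the origin at the top-left corner. A minimal conversion sketch follows; voc_to_coco and coco_to_voc are hypothetical helper names, not library functions, and the sketch ignores the one-pixel offset that some converters apply when moving between the two formats.

```python
def voc_to_coco(box):
    """VOC corners (xmin, ymin, xmax, ymax) -> COCO-style [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def coco_to_voc(bbox):
    """COCO-style [x, y, width, height] -> VOC corners (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# First chair box from the VOC2007 example used later in this post
print(voc_to_coco((263, 211, 324, 339)))  # [263, 211, 61, 128]
print(coco_to_voc([263, 211, 61, 128]))   # (263, 211, 324, 339)
```

The two functions are exact inverses, so a round trip returns the original box.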


1 VOC format

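The article "YOLOV3训练自己的数据集PyTorch版本" (Training YOLOv3 on your own dataset, PyTorch version) explains how to label images with an annotation tool and what file format you get. LabelImg is generally easy to install on Windows and Linux; on a Mac you can follow "LabelImg 图片标注工具 for Mac", but saving always seemed to fail for me, so I gave up on the Mac install.
[Figure: using LabelImg on Ubuntu]
The XML file produced after annotation: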

<annotation>
	<folder>Desktop</folder>
	<filename>BloodImage_00000.jpg</filename>
	<path>/Users/xxx/Desktop/BloodImage_00000.jpg</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>640</width>
		<height>480</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>cell</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>200</xmin>
			<ymin>337</ymin>
			<xmax>304</xmax>
			<ymax>446</ymax>
		</bndbox>
	</object>
</annotation>

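Next, let's crop an image based on its XML, using an image and its annotation taken from VOC2007.
Image:
[Figure: the sample image, VOC2007 000005.jpg]
XML file: the objects labeled in this image are chairs (chair), each described under its own object tag.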

<annotation>
	<folder>VOC2007</folder>
	<filename>000005.jpg</filename>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>325991873</flickrid>
	</source>
	<owner>
		<flickrid>archintent louisville</flickrid>
		<name>?</name>
	</owner>
	<size>
		<width>500</width>
		<height>375</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>chair</name>
		<pose>Rear</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>263</xmin>
			<ymin>211</ymin>
			<xmax>324</xmax>
			<ymax>339</ymax>
		</bndbox>
	</object>
	<object>
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>165</xmin>
			<ymin>264</ymin>
			<xmax>253</xmax>
			<ymax>372</ymax>
		</bndbox>
	</object>
	<object>
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>1</truncated>
		<difficult>1</difficult>
		<bndbox>
			<xmin>5</xmin>
			<ymin>244</ymin>
			<xmax>67</xmax>
			<ymax>374</ymax>
		</bndbox>
	</object>
	<object>
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>241</xmin>
			<ymin>194</ymin>
			<xmax>295</xmax>
			<ymax>299</ymax>
		</bndbox>
	</object>
	<object>
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>1</truncated>
		<difficult>1</difficult>
		<bndbox>
			<xmin>277</xmin>
			<ymin>186</ymin>
			<xmax>312</xmax>
			<ymax>220</ymax>
		</bndbox>
	</object>
</annotation>
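To see at a glance what such a file contains, and why only some of the five chairs survive the "difficult" filter applied later, a small sketch can list every object with its flags and check its box against the image size declared in size. It uses only the standard library's xml.etree.ElementTree; list_objects is a hypothetical helper, and the XML string is a trimmed copy of the annotation above (two of its five objects).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the 000005.xml annotation above: two of its five chairs
XML = """<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object><name>chair</name><truncated>0</truncated><difficult>0</difficult>
    <bndbox><xmin>263</xmin><ymin>211</ymin><xmax>324</xmax><ymax>339</ymax></bndbox>
  </object>
  <object><name>chair</name><truncated>1</truncated><difficult>1</difficult>
    <bndbox><xmin>5</xmin><ymin>244</ymin><xmax>67</xmax><ymax>374</ymax></bndbox>
  </object>
</annotation>"""


def list_objects(xml_text):
    """Return one dict per <object>: name, flags, box, and an in-bounds check."""
    root = ET.fromstring(xml_text)
    width = int(root.find('size/width').text)
    height = int(root.find('size/height').text)
    objects = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        box = tuple(int(bb.find(t).text) for t in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append({
            'name': obj.find('name').text,
            'truncated': int(obj.find('truncated').text),
            'difficult': int(obj.find('difficult').text),
            'box': box,
            # Sanity check: the box must be non-empty and lie inside the declared image size
            'inside': 0 <= box[0] < box[2] <= width and 0 <= box[1] < box[3] <= height,
        })
    return objects


for o in list_objects(XML):
    print(o)
```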

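For cropping we don't need all of this information, only the class and the coordinates, so we extract those first. We'll use PIL's Image.crop, which takes a single box argument:

Image.crop(box), where box = (left, upper, right, lower)

left: distance from the left edge of the image;

upper: distance from the top edge;

right: also measured from the left edge, and must be greater than left;

lower: also measured from the top edge, and must be greater than upper.

By image convention, xmin, ymin, xmax, ymax are not coordinates relative to a bottom-left origin, as in a typical math plot; the origin is the top-left corner and y grows downward. PIL uses the same convention, so the box passed to crop is simply (xmin, ymin, xmax, ymax), with no further transformation.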
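A quick way to confirm this without the real photo is to crop a blank image of the same size (500x375, taken from the size element of the XML). This sketch assumes Pillow is installed; the blank image is only a stand-in for 000005.jpg.

```python
from PIL import Image

# Blank stand-in for the 500x375 VOC image
img = Image.new('RGB', (500, 375))

# VOC (xmin, ymin, xmax, ymax) passed straight through as (left, upper, right, lower)
box = (263, 211, 324, 339)
crop = img.crop(box)

# The crop is (right - left) pixels wide and (lower - upper) pixels tall
print(crop.size)  # (61, 128)
```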

from PIL import Image
import xml.etree.ElementTree as ET


classes = ['chair']

# Read one VOC XML file and write "class_id xmin ymin xmax ymax" lines to a txt file
def voc_xml_extract(xml_fpath, txt_fpath, classes):
    # Parse the whole XML file into an ElementTree
    root = ET.parse(xml_fpath).getroot()

    # Write each labeled object to the output file, one per line
    with open(txt_fpath, 'w') as f:
        for obj in root.iter('object'):
            difficult = obj.find('difficult')
            clsname = obj.find('name').text
            # Skip classes we don't need and objects marked "difficult"
            if clsname not in classes or (difficult is not None and int(difficult.text) == 1):
                continue
            cls_id = classes.index(clsname)
            xmlbox = obj.find('bndbox')
            # (xmin, ymin, xmax, ymax) is already the (left, upper, right, lower) order crop expects
            bb = [float(xmlbox.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
            f.write(str(cls_id) + " " + " ".join(str(a) for a in bb) + '\n')
    return True

voc_xml_extract('pic1.xml', 'pic1.txt', classes=classes)

Running this function produces a text file, pic1.txt:

0 263.0 211.0 324.0 339.0
0 165.0 264.0 253.0 372.0
0 241.0 194.0 295.0 299.0

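Only three of the five chairs appear, because the two marked difficult=1 were skipped. Reading the class and position from each line of the txt file, we then generate three new images: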

# Read the txt file back: one "class_id xmin ymin xmax ymax" record per line
with open('pic1.txt', 'r') as f:
    lines = [line.split() for line in f if line.strip()]

records = []
for line in lines:
    classname = classes[int(line[0])]
    # Box in PIL order (left, upper, right, lower), as integers
    box = tuple(int(float(x)) for x in line[1:])
    records.append([classname, box])

# img.crop(box) takes a single 4-tuple:
# left:  distance from the left edge
# upper: distance from the top edge
# right: also measured from the left edge
# lower: also measured from the top edge
# In short: top-left corner, then bottom-right corner.
with Image.open('pic1.jpg') as img:
    for i, (classname, box) in enumerate(records, start=1):
        img_crop = img.crop(box)
        # Saves chair1.jpg, chair2.jpg, chair3.jpg
        img_crop.save(classname + str(i) + '.jpg')

The first cropped chair (chair1.jpg):
[Figure: chair1.jpg]
The second (chair2.jpg):
[Figure: chair2.jpg]
The third (chair3.jpg):
[Figure: chair3.jpg]
2 PyTorch data format

My other article on face comparison, "Python - 深度学习系列2-人脸比对 Siamese" (Deep Learning Series 2: face comparison with Siamese networks), uses this data layout:
├── LICENSE
├── README.md
├── Siamese-networks-medium.ipynb   all the functions are here
├── conda-env.yml                   conda environment file; ignore it if you don't use conda
├── data                            data
│   └── faces                       face dataset
│       ├── testing                 test set
│       │   ├── s5                  10 images of one subject (s5)
│       │   │   ├── 1.pgm
│       │   │   ├── 10.pgm
│       │   │   ├── 2.pgm
│       │   │   ├── 3.pgm
│       │   │   ├── 4.pgm
│       │   │   ├── 5.pgm
│       │   │   ├── 6.pgm
│       │   │   ├── 7.pgm
│       │   │   ├── 8.pgm
│       │   │   └── 9.pgm
│       └── training                training set
│           ├── README
│           ├── s1                  10 images of one subject (s1)
│           │   ├── 1.pgm
│           │   ├── 10.pgm
│           │   ├── 2.pgm
│           │   ├── 3.pgm
│           │   ├── 4.pgm
│           │   ├── 5.pgm
│           │   ├── 6.pgm
│           │   ├── 7.pgm
│           │   ├── 8.pgm
│           │   └── 9.pgm
├── my_model_all.pth                not in the original project; saved by me
└── my_model_param.pth              not in the original project; saved by me
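The key convention in this layout is one subfolder per class (here, per person) holding that class's images. torchvision.datasets.ImageFolder reads this kind of root/class/image layout and assigns class indices from the sorted folder names. The dependency-free sketch below imitates that indexing step so the idea is visible; index_class_folders is a hypothetical helper and not how torchvision or the Siamese notebook implement it internally. Plain files at the root, such as the README in training, are skipped because only directories count as classes.

```python
import os


def index_class_folders(root, exts=('.pgm', '.jpg', '.png')):
    """Return (classes, samples) for a root/<class>/<image> layout.

    classes: sorted subfolder names; samples: (file path, class index) pairs.
    """
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    samples = []
    for idx, name in enumerate(classes):
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith(exts):
                samples.append((os.path.join(folder, fname), idx))
    return classes, samples


# Hypothetical path following the layout above:
# classes, samples = index_class_folders('./data/faces/training')
```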

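We can take some images from the VOC classes and place them under the data folder following the same pattern, in voc/train and voc/test.
The plan is therefore:

Set up the output locations and a file-naming scheme (files numbered sequentially within each class)
Read the classes and their counts from the VOC training set, take 10 crops per class (mainly because training is slow), and save the crops as PGM files
Fine-tune the face-comparison model on the VOC crops and compare similarity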
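For the last step, a Siamese network is trained on pairs of images rather than on single images. The sketch below shows one way to draw such pairs from (path, class index) samples of a class-per-folder layout; make_pairs is a hypothetical helper, and both the roughly 50/50 same/different split and the label convention (0.0 for the same class, 1.0 for different classes) are assumptions that should be matched to whatever loss the model actually uses.

```python
import random


def make_pairs(samples, n_pairs, seed=0):
    """Draw (path_a, path_b, label) pairs from (path, class_index) samples.

    Assumed label convention: 0.0 = same class, 1.0 = different classes.
    Needs at least two classes. Same-class pairs may repeat an image.
    """
    rng = random.Random(seed)
    by_class = {}
    for path, idx in samples:
        by_class.setdefault(idx, []).append(path)
    class_ids = list(by_class)
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < 0.5:
            # Same-class pair
            c = rng.choice(class_ids)
            pairs.append((rng.choice(by_class[c]), rng.choice(by_class[c]), 0.0))
        else:
            # Different-class pair
            c1, c2 = rng.sample(class_ids, 2)
            pairs.append((rng.choice(by_class[c1]), rng.choice(by_class[c2]), 1.0))
    return pairs


# Hypothetical sample list in the (path, class_index) form
print(make_pairs([('a1.pgm', 0), ('a2.pgm', 0), ('b1.pgm', 1)], 3))
```

Seeding a private random.Random keeps the pair list reproducible without touching the global random state that the conversion script seeds separately.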

3 Conversion script

The conversion script is as follows:

import os
import random
import xml.etree.ElementTree as ET

import pandas as pd
from PIL import Image


# Read one VOC XML file and write "class_id xmin ymin xmax ymax" lines to a txt file
def voc_xml_extract(xml_fpath, txt_fpath, classes):
    # Parse the whole XML file into an ElementTree
    root = ET.parse(xml_fpath).getroot()

    # Write each labeled object to the output file, one per line
    with open(txt_fpath, 'w') as f:
        for obj in root.iter('object'):
            difficult = obj.find('difficult')
            clsname = obj.find('name').text
            # Skip classes we don't need and objects marked "difficult"
            if clsname not in classes or (difficult is not None and int(difficult.text) == 1):
                continue
            cls_id = classes.index(clsname)
            xmlbox = obj.find('bndbox')
            # (xmin, ymin, xmax, ymax) is already the (left, upper, right, lower) order crop expects
            bb = [float(xmlbox.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
            f.write(str(cls_id) + " " + " ".join(str(a) for a in bb) + '\n')
    return True


# The 20 PASCAL VOC classes
classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
# 1 Source paths (replace YOUR_PATH with your own directory)
## 1.1 XML annotations
source_xml = 'YOUR_PATH/VOC2007/train/Annotations/'
## 1.2 Images
source_img = 'YOUR_PATH/VOC2007/train/JPEGImages/'
# 2 Intermediate output
## 2.1 One txt file per VOC image
voc_txt = './voctxt/'
os.makedirs(voc_txt, exist_ok=True)
# 3 Target paths for the crops
target_train = 'YOUR_PATH/data/voc/train/'
target_test = 'YOUR_PATH/data/voc/test/'
os.makedirs(target_train, exist_ok=True)
os.makedirs(target_test, exist_ok=True)


# -- Convert every XML annotation into a txt file
# sorted() keeps the file order, and therefore which boxes get picked, reproducible
filelist = sorted(x for x in os.listdir(source_xml) if x.endswith('.xml'))
for xml in filelist:
    xmlfpath = source_xml + xml
    voc_txtfpath = voc_txt + xml.replace('.xml', '.txt')
    voc_xml_extract(xmlfpath, voc_txtfpath, classes=classes)

# -- Collect every box as [class name, box, source image file name]
txtlist = sorted(x for x in os.listdir(voc_txt) if x.endswith('.txt'))
records = []
for txt in txtlist:
    with open(voc_txt + txt, 'r') as f:
        lines = [line.split() for line in f if line.strip()]
    for line in lines:
        classname = classes[int(line[0])]
        box = tuple(int(float(x)) for x in line[1:])
        records.append([classname, box, txt.replace('.txt', '.jpg')])

df = pd.DataFrame(records, columns=['cate', 'coordinate', 'filename'])

# Keep the first 10 boxes of each class; .copy() avoids pandas' SettingWithCopyWarning below
df1 = df.groupby('cate').head(10).copy()
# Number the boxes 1, 2, 3, ... within each class; the number becomes the file name
df1['ord'] = df1.groupby('cate').cumcount() + 1
df1 = df1.sort_values(['cate', 'ord'])

# Randomly hold out three classes as the test set
random.seed(123)
test_set = random.sample(classes, 3)

for _, row in df1.iterrows():
    pic_name = str(row['ord']) + '.pgm'
    source_img_fpath = source_img + row['filename']
    # Held-out test classes go to the test folder, all others to train
    base = target_test if row['cate'] in test_set else target_train
    save_path = base + row['cate'] + '/'
    os.makedirs(save_path, exist_ok=True)
    # Crop the box, convert to grayscale ('L') and save as PGM
    with Image.open(source_img_fpath) as img:
        img.crop(row['coordinate']).convert('L').save(save_path + pic_name)
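After the script runs, counting the crops in each class folder is a quick check that the split came out as intended: at most 10 files (1.pgm to 10.pgm) per class, the three held-out classes under the test folder and the rest under train. count_crops is a hypothetical helper, and the commented usage assumes the script's target_train and target_test variables.

```python
import os


def count_crops(split_root):
    """Map each class folder under split_root to its number of .pgm crops."""
    counts = {}
    for cate in sorted(os.listdir(split_root)):
        folder = os.path.join(split_root, cate)
        if os.path.isdir(folder):
            counts[cate] = sum(f.endswith('.pgm') for f in os.listdir(folder))
    return counts


# Hypothetical usage with the script's target paths:
# print(count_crops(target_train))
# print(count_crops(target_test))
```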