TensorFlow 2.3.0 Image Localization (Part 2)

wchwdog13

Article link: https://blog.csdn.net/wchwdog13/article/details/111742592

Continuing from TensorFlow 2.3.0 Image Localization (Part 1), this part reads the dataset, then cleans and preprocesses the data.

1. Reading and processing the image data

Read all the images and print the first five file paths.

import glob

images = sorted(glob.glob('/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/*.jpg'))
images[:5]

Output:

['/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_1.jpg',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_10.jpg',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_100.jpg',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_101.jpg',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_102.jpg']

Extract the image names (file name without directory or extension) and print the first three.

images_names = [x.split('/')[-1].split('.jpg')[0] for x in images]
images_names[:3]

Output: ['Abyssinian_1', 'Abyssinian_10', 'Abyssinian_100']
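
The chained split calls above work for these Linux paths. As a minimal sketch of the same step, the standard library's os.path helpers give the same result without hard-coding the separator or the extension. The helper name stem and the shortened sample paths are made up for illustration.

```python
import os

def stem(path):
    """Return the file name without its directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]

# Hypothetical, shortened paths used only for this illustration
sample_paths = ['/data/images/Abyssinian_1.jpg', '/data/images/Abyssinian_10.jpg']
sample_names = [stem(p) for p in sample_paths]
print(sample_names)
```

The same helper would work unchanged for the .xml annotation files, since it does not depend on the extension.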

2. Reading and processing the annotation (XML) data

Read all the XML annotation files and print the first five file paths.

xmls = sorted(glob.glob('/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/*.xml'))
xmls[:5]

Output:

['/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_1.xml',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_10.xml',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_100.xml',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_101.xml',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_102.xml']

Extract the annotation file names (without directory or extension) and print the first three.

xmls_names = [x.split('/')[-1].split('.xml')[0] for x in xmls]
xmls_names[:3]

Output: ['Abyssinian_1', 'Abyssinian_10', 'Abyssinian_100']

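3. Data cleaning

Why is cleaning needed? The number of images does not match the number of annotation files, so the two cannot be paired one to one. Checking the lengths shows the mismatch: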

len(images_names)

Output: 7390

len(xmls_names)

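Output: 3686

There are 7390 images but only 3686 annotation files, because some images have no annotation file. I therefore take the intersection of the image names and the annotation file names.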

names = sorted(list(set(images_names)&set(xmls_names)))
len(names)

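Output: 3686

So names is the intersection, and every annotation file has a matching image.

However, names holds only the file names (shared by images and annotation files). I need the full paths as well, so the next step uses names to recover the full path of each file.

Get the full paths of the images

imgs = [img for img in images if img.split('/')[-1].split('.jpg')[0] in names]
# Sort by file name; the paths use '/' as the separator, not '\\'
imgs.sort(key=lambda x: x.split('/')[-1].split('.jpg')[0])
len(imgs)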
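
To make the cleaning step concrete, here is a small sketch of the same set operations on toy data. The names below are invented for illustration. The set difference is an addition that the article does not use; it simply lists which images would be dropped.

```python
# Hypothetical file stems standing in for images_names and xmls_names
image_stems = ['cat_1', 'cat_2', 'dog_1', 'dog_2']
xml_stems = ['cat_1', 'dog_2']

# Names present in both collections, in a deterministic order
common = sorted(set(image_stems) & set(xml_stems))

# Images that have no annotation file and are therefore discarded
missing_xml = sorted(set(image_stems) - set(xml_stems))

print(common)
print(missing_xml)
```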

Output: 3686, which matches the size of the intersection.

Look at the first three image paths

imgs[:3]

Output:

['/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_1.jpg',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_10.jpg',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/images/Abyssinian_100.jpg']

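The paths are as expected.

Get the full paths of the annotation files

xmls_ = [xml for xml in xmls if xml.split('/')[-1].split('.xml')[0] in names]
len(xmls_)

Output: 3686, which again matches.

Look at the first three annotation file paths

xmls_[:3]

Output: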

['/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_1.xml',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_10.xml',
 '/home/wchw/资料/人工智能/下载数据集/The Oxford-IIIT Pet Dataset/annotations/xmls/Abyssinian_100.xml']

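The paths are as expected, and they are in the same order as the image paths.

4. Preprocessing

4.1 Getting the relative position of the target in the image

Define a function that returns the position of the target relative to the image size.

from lxml import etree

def to_labels(path):
    # Read the annotation file and parse it with lxml
    with open(path) as f:
        xml = f.read()
    sel = etree.HTML(xml)
    width = int(sel.xpath('//size/width/text()')[0])
    height = int(sel.xpath('//size/height/text()')[0])
    xmin = int(sel.xpath('//bndbox/xmin/text()')[0])
    ymin = int(sel.xpath('//bndbox/ymin/text()')[0])
    xmax = int(sel.xpath('//bndbox/xmax/text()')[0])
    ymax = int(sel.xpath('//bndbox/ymax/text()')[0])
    # Divide by the image size so that every coordinate lies between 0 and 1
    return [xmin/width, ymin/height, xmax/width, ymax/height]

Call this function on every annotation file that has a matching image (xmls_) to get the relative position of the target in each image.

labels = [to_labels(path) for path in xmls_]

labels is now a list with one [xmin/width, ymin/height, xmax/width, ymax/height] entry per annotation file.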
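
As a self-contained sketch of what one entry of labels looks like, the snippet below parses an inline annotation and returns the four relative coordinates. It is not the article's code: it swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages, and the XML string and the name relative_box are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with the size and bndbox fields that to_labels reads
SAMPLE_XML = """
<annotation>
  <size><width>600</width><height>400</height><depth>3</depth></size>
  <object>
    <bndbox><xmin>150</xmin><ymin>100</ymin><xmax>450</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
""".strip()

def relative_box(xml_text):
    """Return [xmin, ymin, xmax, ymax] as fractions of the image size."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext('size/width'))
    height = int(root.findtext('size/height'))
    box = root.find('.//bndbox')
    xmin, ymin, xmax, ymax = (
        int(box.findtext(tag)) for tag in ('xmin', 'ymin', 'xmax', 'ymax')
    )
    return [xmin / width, ymin / height, xmax / width, ymax / height]

sample_label = relative_box(SAMPLE_XML)
print(sample_label)  # [0.25, 0.25, 0.75, 0.75]
```

Storing fractions rather than pixels means the labels stay valid after the images are resized to a fixed square later on.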

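4.2 Shuffling the dataset

Define out1_label, out2_label, out3_label, out4_label to hold the four coordinate values. For example, out1_label is a tuple whose elements are the relative xmin values of the individual images.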

out1_label, out2_label, out3_label, out4_label = list(zip(*labels))

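Generate a random permutation of the indices

index = np.random.permutation(len(imgs))

Use the permutation to shuffle the image paths

images = np.array(imgs)[index]

Shuffle the target positions with the same permutation. Because the same permutation is applied to both, each target position still corresponds to the correct image after shuffling.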

out1_label = np.array(out1_label)[index]
out2_label = np.array(out2_label)[index]
out3_label = np.array(out3_label)[index]
out4_label = np.array(out4_label)[index]
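
The point of reusing one permutation can be checked on toy data. The sketch below is pure Python, with random.shuffle on a list of indices standing in for np.random.permutation; the paths and label values are invented for illustration.

```python
import random

# Hypothetical image paths and one label value per path
paths = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']
xmins = [0.1, 0.2, 0.3, 0.4]

# One shared index order, playing the role of np.random.permutation(len(imgs))
order = list(range(len(paths)))
random.Random(0).shuffle(order)

# Apply the same order to both sequences
shuffled_paths = [paths[i] for i in order]
shuffled_xmins = [xmins[i] for i in order]

# Every path is still paired with its own label
original = dict(zip(paths, xmins))
pairs_intact = all(original[p] == x for p, x in zip(shuffled_paths, shuffled_xmins))
print(pairs_intact)
```

Shuffling the two sequences independently would break this pairing, which is why the article indexes every array with the same index.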

4.3 Combining

Group xmin, ymin, xmax, ymax into a tuple and create a dataset of type TensorSliceDataset from it. The image paths go into a second dataset.

label_datset = tf.data.Dataset.from_tensor_slices((
                                              out1_label, 
                                              out2_label, 
                                              out3_label, 
                                              out4_label))
image_dataset = tf.data.Dataset.from_tensor_slices(images)

4.4 Defining the image preprocessing functions

def read_jpg(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    return img

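def normalize(input_image):
    # scal: target side length in pixels, assumed to be defined earlier
    # (224, judging by the dataset shape shown later)
    input_image = tf.image.resize(input_image, [scal, scal])
    input_image = tf.cast(input_image, tf.float32)/127.5 - 1   # scale to the range -1 to 1
    return input_image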

@tf.function
def load_image(input_image_path):
    input_image = read_jpg(input_image_path)
    input_image = normalize(input_image)
    return input_image
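
The arithmetic in normalize maps 8-bit pixel values from 0..255 onto -1..1. A plain-Python sketch of that mapping and its inverse follows. The function names are invented for illustration, and the inverse is an addition that may be handy when displaying a normalized image.

```python
def to_unit_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1], as normalize does."""
    return pixel / 127.5 - 1

def to_pixel(value):
    """Inverse mapping from [-1, 1] back to [0, 255]."""
    return (value + 1) * 127.5

print(to_unit_range(0), to_unit_range(127.5), to_unit_range(255))  # -1.0 0.0 1.0
print(to_pixel(-1), to_pixel(1))  # 0.0 255.0
```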

Call the preprocessing function on every image.

image_dataset = image_dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)

Zip the images and the position labels together so that they correspond one to one.

dataset = tf.data.Dataset.zip((image_dataset, label_datset))

Inspect dataset

dataset

Output: <ZipDataset shapes: ((224, 224, 3), ((), (), (), ())), types: (tf.float32, (tf.float64, tf.float64, tf.float64, tf.float64))>

4.5 Defining the training parameters

Split the data into a training set and a test set

test_count = int(len(imgs)*0.2)
train_count = len(images) - test_count
dataset_train = dataset.skip(test_count)
dataset_test = dataset.take(test_count)

Define the parameters

BATCH_SIZE = 8
BUFFER_SIZE = 300
STEPS_PER_EPOCH = train_count // BATCH_SIZE
VALIDATION_STEPS = test_count // BATCH_SIZE

train_dataset = dataset_train.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = dataset_test.batch(BATCH_SIZE)
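
With the 3686 samples found above, the split and step counts work out as in the following sketch. It repeats the article's arithmetic in plain Python and uses list slicing as a stand-in for dataset.take and dataset.skip; the variable samples is invented for illustration.

```python
total = 3686          # number of image/annotation pairs after cleaning
BATCH_SIZE = 8

test_count = int(total * 0.2)        # 737
train_count = total - test_count     # 2949

# take(test_count) keeps the first elements, skip(test_count) keeps the rest
samples = list(range(total))
test_part = samples[:test_count]
train_part = samples[test_count:]

STEPS_PER_EPOCH = train_count // BATCH_SIZE    # 368
VALIDATION_STEPS = test_count // BATCH_SIZE    # 92

print(test_count, train_count, STEPS_PER_EPOCH, VALIDATION_STEPS)
```

Integer division drops the final partial batch from the step counts: 5 training samples and 1 test sample fall outside the counted steps.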

Verify the result

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

for img, label in train_dataset.take(1):
    plt.imshow(tf.keras.preprocessing.image.array_to_img(img[0]))
    out1, out2, out3, out4 = label
    # Convert the relative coordinates of the first sample back to pixels
    xmin, ymin, xmax, ymax = [out[0].numpy()*scal for out in (out1, out2, out3, out4)]
    rect = Rectangle((xmin, ymin), (xmax-xmin), (ymax-ymin), fill=False, color='red')
    ax = plt.gca()
    ax.add_patch(rect)

The output is the first image of the batch with its bounding box drawn in red.
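
The conversion from a relative label to the arguments that Rectangle expects (top-left corner, width, height) can be sketched without TensorFlow or matplotlib. The function name and the sample label below are invented for illustration, and 224 is taken from the dataset shape shown earlier.

```python
def to_rectangle_args(label, side):
    """Turn relative [xmin, ymin, xmax, ymax] into ((x, y), width, height) in pixels."""
    xmin, ymin, xmax, ymax = (v * side for v in label)
    return (xmin, ymin), xmax - xmin, ymax - ymin

# Hypothetical label covering the central half of a 224 x 224 image
corner, box_width, box_height = to_rectangle_args([0.25, 0.25, 0.75, 0.75], 224)
print(corner, box_width, box_height)  # (56.0, 56.0) 112.0 112.0
```

Because the labels are fractions of the original width and height, multiplying by the resized side length gives correct pixel positions even though the resize does not preserve the aspect ratio.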
