Annotation x and y values must be between 0 and 1: implementing YOLO V1 with Keras and TensorFlow 2.0


Davider_Wu


Article link: https://blog.csdn.net/weixin_26713059/article/details/113669040


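Summary (auto-generated by CSDN): This article describes in detail how to implement the YOLO V1 model with TensorFlow 2.0, covering data preprocessing, preparation of the input and output data, model training, and the definition of the loss function. In the preprocessing stage, the XML annotations are converted to txt files and the images are resized. The model outputs a 7x7x30 tensor in which the position information is normalized. During training the learning rate is adjusted in stages. The final model does not reach the accuracy reported in the paper, but the exercise deepens the understanding of YOLO V1.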

This article attempts to reproduce the results of the YOLO V1 paper using TensorFlow 2.0.

import tensorflow as tf
# for plotting the images
import matplotlib.pyplot as plt

1. Data preprocessing

The VOC 2007 dataset (http://host.robots.ox.ac.uk/pascal/VOC/voc2007/) is used to train the neural network.

Download the training, validation, and test data:

!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

!tar xvf VOCtrainval_06-Nov-2007.tar
!tar xvf VOCtest_06-Nov-2007.tar

!rm VOCtrainval_06-Nov-2007.tar
!rm VOCtest_06-Nov-2007.tar

Preprocess the annotations by converting the XML files into txt files, which are easier to work with in later steps.

import argparse
import xml.etree.ElementTree as ET
import os

# Note: the parser is defined here but never invoked in this snippet.
parser = argparse.ArgumentParser(description='Build Annotations.')
parser.add_argument('dir', default='..', help='Annotations.')

sets = [('2007', 'train'), ('2007', 'val'), ('2007', 'test')]

# Map each VOC class name to an integer id.
classes_num = {
    'aeroplane': 0, 'bicycle': 1, 'bird': 2, 'boat': 3, 'bottle': 4, 'bus': 5, 'car': 6, 'cat': 7, 'chair': 8, 'cow': 9, 'diningtable': 10, 'dog': 11, 'horse': 12, 'motorbike': 13, 'person': 14, 'pottedplant': 15, 'sheep': 16, 'sofa': 17, 'train': 18, 'tvmonitor': 19}


def convert_annotation(year, image_id, f):
    # Append every usable box of one image to f as ' xmin,ymin,xmax,ymax,cls_id'.
    in_file = os.path.join('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    tree = ET.parse(in_file)
    root = tree.getroot()

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        # Skip unknown classes and objects marked as difficult.
        if cls not in classes_num or int(difficult) == 1:
            continue
        cls_id = classes_num[cls]
        xmlbox = obj.find('bndbox')
        b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text),
             int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text))
        f.write(' ' + ','.join([str(a) for a in b]) + ',' + str(cls_id))


# Write one txt file per image set, with one line per image.
for year, image_set in sets:
    print(year, image_set)
    with open(os.path.join('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)), 'r') as f:
        image_ids = f.read().strip().split()
    with open(os.path.join("VOCdevkit", '%s_%s.txt' % (year, image_set)), 'w') as f:
        for image_id in image_ids:
            f.write('%s/VOC%s/JPEGImages/%s.jpg' % ("VOCdevkit", year, image_id))
            convert_annotation(year, image_id, f)
            f.write('\n')

The generated text looks like this (the path prefix depends on where the dataset was extracted):

 ./data/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages/000012.jpg 156,97,351,270,6
 ./data/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages/000017.jpg 185,62,279,199,14 90,78,403,336,12
 ./data/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages/000023.jpg 9,230,245,500,1 230,220,334,500,1 2,1,117,369,14 3,2,243,462,14 225,1,334,486,14
 ./data/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages/000026.jpg 90,125,337,212,6
 ./data/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages/000032.jpg 104,78,375,183,0 133,88,197,123,0 195,180,213,229,14 26,189,44,238,14
 ./data/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages/000033.jpg 9,107,499,263,0 421,200,482,226,0 325,188,411,223,0
 ......
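Each line of the generated file holds an image path followed by zero or more space-separated boxes in `xmin,ymin,xmax,ymax,class_id` form. Below is a minimal sketch of splitting one such line into an image path and a list of box strings. The helper name `parse_annotation_line` is hypothetical and does not come from the original article, and the sketch assumes that paths contain no spaces.

```python
def parse_annotation_line(line):
    """Split one line of the generated txt file into (image_path, boxes).

    Hypothetical helper, not from the original article. Boxes are returned
    as raw 'xmin,ymin,xmax,ymax,cls' strings.
    """
    parts = line.strip().split()
    image_path = parts[0]
    boxes = parts[1:]  # empty if every object in the image was skipped
    return image_path, boxes


line = "VOCdevkit/VOC2007/JPEGImages/000017.jpg 185,62,279,199,14 90,78,403,336,12"
image_path, boxes = parse_annotation_line(line)
print(image_path)
print(boxes)
```

The two returned values have the shape that a loader taking an image path and a list of comma-separated box strings would expect.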

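2. Preparing the input and output data

The input to YOLO V1 is an image of size 448x448x3. Every image in the dataset is resized to 448x448, and all pixel values are then scaled into the range [0, 1].

The output of YOLO is a tensor of size 7x7x30.

In this tensor, the (x, y) of a bounding box is the offset from the top-left corner of its grid cell, normalized by the width and height of the grid cell. (w, h) are the box width and height as fractions of the width and height of the whole image. All of (x, y, w, h) lie in the range [0, 1].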
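To make the encoding concrete, the sketch below works through the arithmetic for a single box. The function name `encode_box` is hypothetical, and the image size of 500x333 is an assumption chosen only for illustration; the box coordinates are taken from the first line of the sample annotation output.

```python
S = 7  # number of grid cells per side in YOLO V1


def encode_box(xmin, ymin, xmax, ymax, image_w, image_h, grid=S):
    """Hypothetical helper: return (row, col, x_off, y_off, w, h) for one box."""
    # Box centre and size as fractions of the whole image.
    cx = (xmin + xmax) / 2 / image_w
    cy = (ymin + ymax) / 2 / image_h
    w = (xmax - xmin) / image_w
    h = (ymax - ymin) / image_h
    # Grid cell that contains the box centre.
    col = int(grid * cx)
    row = int(grid * cy)
    # Offset of the centre from the cell's top-left corner, in cell units.
    x_off = grid * cx - col
    y_off = grid * cy - row
    return row, col, x_off, y_off, w, h


# Assumed image size (500x333) for illustration only.
row, col, x, y, w, h = encode_box(156, 97, 351, 270, image_w=500, image_h=333)
print(row, col)
print(round(x, 3), round(y, 3), round(w, 3), round(h, 3))
```

Under these assumptions the box centre falls in row 3, column 3 of the grid, and all four encoded values lie in [0, 1].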

import cv2 as cv
import numpy as np

def read(image_path, label):
    # Load the image, convert BGR to RGB, resize to 448x448 and scale to [0, 1].
    image = cv.imread(image_path)
    image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
    image_h, image_w = image.shape[0:2]
    image = cv.resize(image, (448, 448))
    image = image / 255.

    # Channels 0-19: class one-hot; 20-23: x, y, w, h; 24: response flag.
    label_matrix = np.zeros([7, 7, 30])
    for l in label:
        l = l.split(',')
        l = np.array(l, dtype=int)  # np.int was removed in NumPy 1.24
        xmin = l[0]
        ymin = l[1]
        xmax = l[2]
        ymax = l[3]
        cls = l[4]
        # Box centre and size as fractions of the original image.
        x = (xmin + xmax) / 2 / image_w
        y = (ymin + ymax) / 2 / image_h
        w = (xmax - xmin) / image_w
        h = (ymax - ymin) / image_h
        # Grid cell containing the centre, and the offset inside that cell.
        loc = [7 * x, 7 * y]
        loc_i = int(loc[1])
        loc_j = int(loc[0])
        y = loc[1] - loc_i
        x = loc[0] - loc_j

        # Only the first object that lands in a cell is recorded.
        if label_matrix[loc_i, loc_j, 24] == 0:
            label_matrix[loc_i, loc_j, cls] = 1
            label_matrix[loc_i, loc_j, 20:24] = [x, y, w, h]
            label_matrix[loc_i, loc_j, 24] = 1  # response

    return image, label_matrix
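As a sanity check on the encoding, it can be inverted. The sketch below recovers a pixel-space box from one filled grid cell, assuming the channel layout used in `read` (channels 0-19 for the class one-hot, 20-23 for x, y, w, h, and 24 for the response flag). The function name `decode_cell` and the example box and image size are hypothetical.

```python
import numpy as np


def decode_cell(label_matrix, row, col, image_w, image_h, grid=7):
    """Hypothetical helper: return (xmin, ymin, xmax, ymax, cls) or None."""
    cell = label_matrix[row, col]
    if cell[24] == 0:  # no object was assigned to this cell
        return None
    x_off, y_off, w, h = cell[20:24]
    cls = int(np.argmax(cell[:20]))
    # Undo the per-cell offset to get the centre in original-image pixels.
    cx = (col + x_off) / grid * image_w
    cy = (row + y_off) / grid * image_h
    bw = w * image_w
    bh = h * image_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, cls)


# Hand-built label for a box (100, 60, 300, 180) of class 6 in a 500x300 image.
label = np.zeros([7, 7, 30])
label[2, 2, 6] = 1
label[2, 2, 20:24] = [0.8, 0.8, 0.4, 0.4]
label[2, 2, 24] = 1

box = decode_cell(label, 2, 2, image_w=500, image_h=300)
print(box)
```

The decoded coordinates refer to the original image size rather than the resized 448x448 input, because w and h are stored as fractions of the whole image.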
