Object detection with TensorFlow + SSD on Ubuntu 16.04: building a model on the widerface dataset (Part 3)


bingbing0607

Tags: deep learning, tensorflow, neural networks

Article link: https://blog.csdn.net/bingbign0607/article/details/105727746


Face detection with TensorFlow + SSD on Ubuntu 16.04 using the widerface dataset: packaging and cleaning the data (Part 3)

Tasks:
1. Package the data in VOC format
2. Convert the VOC-format dataset into TFRecord files

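Note: the models repository I published in Part 1 is too old; some of its code still targets Python 2, while I use Python 3. I therefore hit Python 2/3 incompatibilities during training and switched to a newer models repository. The steps from Part 1 work the same way with the newer models. My preconfigured models can be downloaded from Baidu Netdisk: Link: https://pan.baidu.com/s/1PCk3rkrB8c-YacpbefKCUQ Extraction code: tssj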

1. Download the widerface dataset. My Baidu Netdisk link: https://pan.baidu.com/s/1AqRdjUpOA0-QLLCwNTe82Q Extraction code: 9w7s

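2. There are four archives: the training set, the test set, the validation set, and the face annotations. After extraction the directory looks like this: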
[Figure: the extracted dataset directory]
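If you prefer to script the extraction, the following is a minimal sketch that unpacks every .zip file in a download folder into the dataset folder. It assumes the four downloads are ordinary .zip archives; the function name `extract_all` and the paths in the usage line are hypothetical.

```python
import glob
import os
import zipfile


def extract_all(download_dir, dataset_dir):
    """Extract every .zip archive found in download_dir into dataset_dir.

    Returns the sorted names of the archives that were extracted.
    """
    os.makedirs(dataset_dir, exist_ok=True)
    extracted = []
    for archive in sorted(glob.glob(os.path.join(download_dir, "*.zip"))):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dataset_dir)  # keeps the folder structure stored in each zip
        extracted.append(os.path.basename(archive))
    return extracted
```

For example, `extract_all("/home/hyb/Downloads/widerface", "/home/hyb/models/dataset/widerface")`, where both paths are placeholders for your own.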

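3. In the same directory as the extracted dataset, I created five folders:
JPEGImages: holds your image files
Annotations: holds the annotations for all images; each image's annotation must be an XML file
ImageSets/Main: holds train.txt and val.txt
TF-record: holds the TFRecord files converted from the VOC-format data
fit-model: holds the trained model and the TensorBoard log data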
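The folders listed above can also be created from Python. A small sketch, assuming `rootdir` is the same directory later used as `rootdir` in to_voc.py; `make_voc_layout` is a hypothetical helper, and `exist_ok=True` makes it safe to run more than once:

```python
import os

# Folder names from the list above; ImageSets/Main is a nested pair.
VOC_SUBDIRS = ["JPEGImages", "Annotations", os.path.join("ImageSets", "Main"),
               "TF-record", "fit-model"]


def make_voc_layout(rootdir):
    """Create the working folders under rootdir and return their paths."""
    created = []
    for sub in VOC_SUBDIRS:
        path = os.path.join(rootdir, sub)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```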

[Figure: the folders created above]
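4. Convert the data to VOC format
The only things you need to change are the paths below:
rootdir is the directory that contains the five folders you created
gtfile is the annotation .txt file; the one below is for the validation set
im_folder is the path to the validation-set images
fwrite is the list file that records the packaged images; as mentioned above, you must create ImageSets/Main first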

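Note that my #1 settings (wider_face_val_bbx_gt.txt, WIDER_val/images, ImageSets/Main/val.txt) package only the validation set. In practice you also need to package the training set, for which I give the #2 settings. Adjust the paths to your own layout; in general, replace val with train.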

Running: from the research directory I ran python3 object_detection/to_voc.py twice, once with the train settings and once with the val settings.
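Rather than editing gtfile, im_folder and the list file by hand for each run, the three paths can be derived from the split name. A minimal sketch, assuming the archives were extracted directly under rootdir so that wider_face_split/, WIDER_train/images and WIDER_val/images sit side by side; `split_paths` is a hypothetical helper, not part of the author's to_voc.py:

```python
import os


def split_paths(rootdir, split):
    """Return (gtfile, im_folder, list_file) for the 'train' or 'val' split."""
    if split not in ("train", "val"):
        raise ValueError("split must be 'train' or 'val'")
    gtfile = os.path.join(rootdir, "wider_face_split",
                          "wider_face_{}_bbx_gt.txt".format(split))
    im_folder = os.path.join(rootdir, "WIDER_{}".format(split), "images")
    list_file = os.path.join(rootdir, "ImageSets", "Main", split + ".txt")
    return gtfile, im_folder, list_file
```

In to_voc.py one could then write `gtfile, im_folder, list_path = split_paths(rootdir, sys.argv[1])` (with `import sys`) followed by `fwrite = open(list_path, "w")`, and run the script with `train` or `val` as its argument.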

#1
rootdir = "/home/hyb/models/dataset/widerface"
gtfile = "/home/hyb/models/dataset/widerface/wider_face_split/wider_face_val_bbx_gt.txt"
im_folder = "/home/hyb/models/dataset/widerface/WIDER_val/images"
fwrite = open("/home/hyb/models/dataset/widerface/ImageSets/Main/val.txt", "w")

#2
rootdir = "/home/hyb/models/dataset/widerface"
gtfile = "/home/hyb/models/dataset/widerface/wider_face_split/wider_face_train_bbx_gt.txt"
im_folder = "/home/hyb/models/dataset/widerface/WIDER_train/images"
fwrite = open("/home/hyb/models/dataset/widerface/ImageSets/Main/train.txt", "w")

import os

import cv2
from xml.dom.minidom import Document


# This script converts the widerface annotations to PASCAL VOC format
def writexml(filename, saveimg, bboxes, xmlpath):
    doc = Document()

    annotation = doc.createElement('annotation')

    doc.appendChild(annotation)

    folder = doc.createElement('folder')

    folder_name = doc.createTextNode('widerface')
    folder.appendChild(folder_name)
    annotation.appendChild(folder)
    filenamenode = doc.createElement('filename')
    filename_name = doc.createTextNode(filename)
    filenamenode.appendChild(filename_name)
    annotation.appendChild(filenamenode)
    source = doc.createElement('source')
    annotation.appendChild(source)
    database = doc.createElement('database')
    database.appendChild(doc.createTextNode('wider face Database'))
    source.appendChild(database)
    annotation_s = doc.createElement('annotation')
    annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
    source.appendChild(annotation_s)
    image = doc.createElement('image')
    image.appendChild(doc.createTextNode('flickr'))
    source.appendChild(image)
    flickrid = doc.createElement('flickrid')
    flickrid.appendChild(doc.createTextNode('-1'))
    source.appendChild(flickrid)
    owner = doc.createElement('owner')
    annotation.appendChild(owner)
    flickrid_o = doc.createElement('flickrid')
    flickrid_o.appendChild(doc.createTextNode('muke'))
    owner.appendChild(flickrid_o)
    name_o = doc.createElement('name')
    name_o.appendChild(doc.createTextNode('muke'))
    owner.appendChild(name_o)

    size = doc.createElement('size')
    annotation.appendChild(size)

    width = doc.createElement('width')
    width.appendChild(doc.createTextNode(str(saveimg.shape[1])))
    height = doc.createElement('height')
    height.appendChild(doc.createTextNode(str(saveimg.shape[0])))
    depth = doc.createElement('depth')
    depth.appendChild(doc.createTextNode(str(saveimg.shape[2])))

    size.appendChild(width)

    size.appendChild(height)
    size.appendChild(depth)
    segmented = doc.createElement('segmented')
    segmented.appendChild(doc.createTextNode('0'))
    annotation.appendChild(segmented)
    for i in range(len(bboxes)):
        bbox = bboxes[i]
        objects = doc.createElement('object')
        annotation.appendChild(objects)
        object_name = doc.createElement('name')
        object_name.appendChild(doc.createTextNode('face'))
        objects.appendChild(object_name)
        pose = doc.createElement('pose')
        pose.appendChild(doc.createTextNode('Unspecified'))
        objects.appendChild(pose)
        truncated = doc.createElement('truncated')
        truncated.appendChild(doc.createTextNode('0'))
        objects.appendChild(truncated)
        difficult = doc.createElement('difficult')
        difficult.appendChild(doc.createTextNode('0'))
        objects.appendChild(difficult)
        bndbox = doc.createElement('bndbox')
        objects.appendChild(bndbox)
        xmin = doc.createElement('xmin')
        xmin.appendChild(doc.createTextNode(str(bbox[0])))
        bndbox.appendChild(xmin)
        ymin = doc.createElement('ymin')
        ymin.appendChild(doc.createTextNode(str(bbox[1])))
        bndbox.appendChild(ymin)
        xmax = doc.createElement('xmax')
        xmax.appendChild(doc.createTextNode(str(bbox[0] + bbox[2])))
        bndbox.appendChild(xmax)
        ymax = doc.createElement('ymax')
        ymax.appendChild(doc.createTextNode(str(bbox[1] + bbox[3])))
        bndbox.appendChild(ymax)
    with open(xmlpath, "w") as f:
        f.write(doc.toprettyxml(indent=''))


rootdir = "/home/hyb/models/dataset/widerface"
gtfile = "/home/hyb/models/dataset/widerface/wider_face_split/wider_face_val_bbx_gt.txt"
im_folder = "/home/hyb/models/dataset/widerface/WIDER_val/images"
fwrite = open("/home/hyb/models/dataset/widerface/ImageSets/Main/val.txt", "w")

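# Layout of wider_face_train_bbx_gt.txt (the val file has the same layout):
# line 1: image path, relative to the images folder
# line 2: number of faces n
# next n lines: one face per line:
#   x1 y1 w h blur expression illumination invalid occlusion pose
# Example: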
# 0--Parade/0_Parade_marchingband_1_117.jpg
# 9
# 69 359 50 36 1 0 0 0 0 1
# 227 382 56 43 1 0 1 0 0 1
# 296 305 44 26 1 0 0 0 0 1
# 353 280 40 36 2 0 0 0 2 1
# 885 377 63 41 1 0 0 0 0 1
# 819 391 34 43 2 0 0 0 1 0
# 727 342 37 31 2 0 0 0 0 1
# 598 246 33 29 2 0 0 0 0 1
# 740 308 45 33 1 0 0 0 2 1

with open(gtfile, "r") as gt:
    while True:
        gt_con = gt.readline().strip()  # image path, e.g. 0--Parade/xxx.jpg
        if gt_con == "":
            break  # end of file
        im_path = im_folder + "/" + gt_con
        print(im_path)

        # Read the face count and all box lines BEFORE loading the image, so the
        # file position stays in sync even when an image cannot be read.
        numbox = int(gt.readline())
        bboxes = []
        if numbox == 0:
            gt.readline()  # an image with 0 faces is still followed by one placeholder line
        else:
            for i in range(numbox):
                infos = gt.readline().split()  # split on whitespace
                # x y w h blur expression illumination invalid occlusion pose
                bbox = (int(infos[0]), int(infos[1]), int(infos[2]), int(infos[3]))
                bboxes.append(bbox)  # collect every face of this image

        im_data = cv2.imread(im_path)
        if im_data is None or numbox == 0:
            continue  # skip unreadable images and images without faces
        # Optional visualization:
        # for (x, y, w, h) in bboxes:
        #     cv2.rectangle(im_data, (x, y), (x + w, y + h), color=(0, 0, 255), thickness=1)
        # cv2.imshow(im_path, im_data)
        # cv2.waitKey(0)

        # Use the relative path as the file name, turning "/" into "_"
        filename = gt_con.replace("/", "_")
        fwrite.write(filename.split(".")[0] + "\n")
        cv2.imwrite("{}/JPEGImages/{}".format(rootdir, filename), im_data)
        xmlpath = "{}/Annotations/{}.xml".format(rootdir, filename.split(".")[0])
        writexml(filename, im_data, bboxes, xmlpath)
fwrite.close()
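The reading loop above interleaves annotation parsing with image I/O. A standalone sketch of just the parsing, which makes the format easy to test on its own. It follows the same rules as the loop (an image with 0 faces is followed by one placeholder line) and additionally drops boxes narrower or shorter than `min_size` pixels; that filter and the name `parse_wider_gt` are assumptions of this sketch, not something the original script does:

```python
def parse_wider_gt(lines, min_size=1):
    """Parse *_bbx_gt.txt lines into a list of (image_path, [(x, y, w, h), ...])."""
    it = iter(line.strip() for line in lines)
    results = []
    for image_path in it:
        if not image_path:
            break  # trailing blank line
        numbox = int(next(it))
        boxes = []
        # An image with 0 faces still has one placeholder line to consume.
        for _ in range(max(numbox, 1)):
            fields = next(it).split()
            x, y, w, h = (int(v) for v in fields[:4])
            if numbox > 0 and w >= min_size and h >= min_size:
                boxes.append((x, y, w, h))
        results.append((image_path, boxes))
    return results
```

Passing `open(gtfile).readlines()` gives every image with its usable boxes; images whose list comes back empty would be skipped, as in the loop above.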

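5. Convert the packaged data to TFRecord files
In my models repository, the script /home/hyb/muke/models/research/object_detection/dataset_tools/create_face_tf_record.py packages the data into TFRecord files. It is an adapted copy of create_pascal_tf_record.py: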

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Convert a PASCAL VOC-style face dataset (widerface or fddb) to TFRecord for object_detection.

Example usage:
    python object_detection/dataset_tools/create_face_tf_record.py \
        --data_dir=/home/user/dataset \
        --year=widerface \
        --set=train \
        --output_path=/home/user/train.record
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import hashlib
import io
import logging
import os

from lxml import etree
import PIL.Image
import tensorflow as tf

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.')
flags.DEFINE_string('set', 'train', 'Convert training set, validation set or '
                    'merged set.')
flags.DEFINE_string('annotations_dir', 'Annotations',
                    '(Relative) path to annotations directory.')
flags.DEFINE_string('year', 'VOC2007', 'Desired challenge year.')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('label_map_path', 'object_detection/data/face_label_map.pbtxt',
                    'Path to label map proto')
flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore '
                     'difficult instances')
FLAGS = flags.FLAGS

SETS = ['train', 'val', 'trainval', 'test']
YEARS = ["fddb", 'widerface']


def dict_to_tf_example(data,
                       dataset_directory,
                       label_map_dict,
                       ignore_difficult_instances=False,
                       image_subdirectory='JPEGImages'):
  """Convert XML derived dict to tf.Example proto.

  Notice that this function normalizes the bounding box coordinates provided
  by the raw data.

  Args:
    data: dict holding PASCAL XML fields for a single image (obtained by
      running dataset_util.recursive_parse_xml_to_dict)
    dataset_directory: Path to root directory holding PASCAL dataset
    label_map_dict: A map from string label names to integers ids.
    ignore_difficult_instances: Whether to skip difficult instances in the
      dataset  (default: False).
    image_subdirectory: String specifying subdirectory within the
      PASCAL dataset directory holding the actual image data.

  Returns:
    example: The converted tf.Example.

  Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
  """
  img_path = os.path.join(data['folder'], image_subdirectory, data['filename'])
  full_path = os.path.join(dataset_directory, img_path)
  with tf.gfile.GFile(full_path, 'rb') as fid:
    encoded_jpg = fid.read()
  encoded_jpg_io = io.BytesIO(encoded_jpg)
  image = PIL.Image.open(encoded_jpg_io)
  if image.format != 'JPEG':
    raise ValueError('Image format not JPEG')
  key = hashlib.sha256(encoded_jpg).hexdigest()

  width = int(data['size']['width'])
  height = int(data['size']['height'])

  xmin = []
  ymin = []
  xmax = []
  ymax = []
  classes = []
  classes_text = []
  truncated = []
  poses = []
  difficult_obj = []
  if 'object' in data:
    for obj in data['object']:
      difficult = bool(int(obj['difficult']))
      if ignore_difficult_instances and difficult:
        continue

      difficult_obj.append(int(difficult))

      xmin.append(float(obj['bndbox']['xmin']) / width)
      ymin.append(float(obj['bndbox']['ymin']) / height)
      xmax.append(float(obj['bndbox']['xmax']) / width)
      ymax.append(float(obj['bndbox']['ymax']) / height)
      classes_text.append(obj['name'].encode('utf8'))
      classes.append(label_map_dict[obj['name']])
      truncated.append(int(obj['truncated']))
      poses.append(obj['pose'].encode('utf8'))

  example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }))
  return example


def main(_):
  if FLAGS.set not in SETS:
    raise ValueError('set must be in : {}'.format(SETS))
  if FLAGS.year not in YEARS:
    raise ValueError('year must be in : {}'.format(YEARS))

  data_dir = FLAGS.data_dir
  years = ["fddb", 'widerface']
  if FLAGS.year != 'merged':
    years = [FLAGS.year]

  writer = tf.python_io.TFRecordWriter(FLAGS.output_path)

  label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)

  for year in years:
    logging.info('Reading from PASCAL %s dataset.', year)
    examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main',
                                 FLAGS.set + '.txt')
    annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir)
    examples_list = dataset_util.read_examples_list(examples_path)
    for idx, example in enumerate(examples_list):
      if idx % 100 == 0:
        logging.info('On image %d of %d', idx, len(examples_list))
      path = os.path.join(annotations_dir, example + '.xml')
      with tf.gfile.GFile(path, 'r') as fid:
        xml_str = fid.read()
      xml = etree.fromstring(xml_str)
      data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']

      tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict,
                                      FLAGS.ignore_difficult_instances)
      writer.write(tf_example.SerializeToString())

  writer.close()


if __name__ == '__main__':
  tf.app.run()
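dict_to_tf_example stores box corners as fractions of the image width and height. A small worked sketch of that conversion; the clipping to [0, 1] and the empty-box check are additions that the script above does not perform, and `normalize_box` is a hypothetical name. For example, the first face in the annotation sample (69 359 50 36) is written by writexml as xmin=69, ymin=359, xmax=119, ymax=395; in a hypothetical 1000×500 image it normalizes to (0.069, 0.718, 0.119, 0.79):

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel corners to [0, 1] fractions, as dict_to_tf_example does.

    Clipping and the empty-box check are extras added for this sketch.
    """
    def clip(v):
        return min(max(v, 0.0), 1.0)

    nx0, nx1 = clip(xmin / width), clip(xmax / width)
    ny0, ny1 = clip(ymin / height), clip(ymax / height)
    if nx1 <= nx0 or ny1 <= ny0:
        raise ValueError("box is empty after normalization")
    return nx0, ny0, nx1, ny1
```

Rejecting empty boxes at this stage makes a zero-width widerface annotation fail loudly during packaging instead of surfacing later in training.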

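You need to modify this script as follows:
1. YEARS = ["fddb", 'widerface']: in the models repository from GitHub (create_pascal_tf_record.py) this list is YEARS = ['VOC2007', 'VOC2012', 'merged'], so change it to your dataset names.
2. years = ["fddb", 'widerface']: this second list, further down in main(), must be changed in the same way.
3. examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main', FLAGS.set + '.txt'): in my version nothing is prepended to the file name after 'Main'. The GitHub version prepends a prefix ('aeroplane_' + FLAGS.set + '.txt'); delete it (my code already has). The reason is that my ImageSets/Main folder contains train.txt and val.txt directly, with no prefix or extra subdirectory.
4. To run the script, I again use a terminal in the research directory, with the commands below.
5. Packaging the training set: --data_dir is the dataset root and --output_path is where the packaged record is written. Because the script joins --data_dir with --year (see examples_path above) and with the <folder> value that writexml stores in every XML ('widerface'), the files must actually be found under data_dir/widerface/ImageSets/Main, data_dir/widerface/Annotations and data_dir/widerface/JPEGImages; choose --data_dir accordingly.
Also set the default of flags.DEFINE_string('label_map_path', ..., 'Path to label map proto') so that it points to your face label map, e.g. object_detection/data/face_label_map.pbtxt.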
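The label map must contain the same class name that writexml puts in every <object> (`face`), because dict_to_tf_example looks each name up with `label_map_dict[obj['name']]`. A sketch that writes such a file in the Object Detection API's text format; `write_label_map` is a hypothetical helper, and ids start at 1 because id 0 is reserved for the background class:

```python
LABEL_MAP_TEMPLATE = """item {{
  id: {id}
  name: '{name}'
}}
"""


def write_label_map(path, class_names):
    """Write one `item { ... }` entry per class name, numbering ids from 1."""
    with open(path, "w") as f:
        for idx, name in enumerate(class_names, start=1):
            f.write(LABEL_MAP_TEMPLATE.format(id=idx, name=name))
```

Run from the research directory, `write_label_map("object_detection/data/face_label_map.pbtxt", ["face"])` produces a single-class map matching the XML files.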

python3 object_detection/dataset_tools/create_face_tf_record.py \
        --data_dir=/home/hyb/muke/models/dataset/widerface \
        --year=widerface \
        --output_path=/home/hyb/muke/models/dataset/widerface/TF-record/train.record \
        --set=train

Packaging the validation set:

python3 object_detection/dataset_tools/create_face_tf_record.py \
        --data_dir=/home/hyb/muke/models/dataset/widerface \
        --year=widerface \
        --output_path=/home/hyb/muke/models/dataset/widerface/TF-record/val.record \
        --set=val
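Because main() joins --data_dir with --year, and dict_to_tf_example joins --data_dir with the <folder> value stored in each XML ('widerface', set by writexml), a wrong --data_dir usually shows up only as a file-not-found error after the run starts. A minimal pre-flight sketch that reproduces those joins and reports missing paths; the helper names are hypothetical:

```python
import os


def expected_paths(data_dir, year, split, folder="widerface",
                   annotations_dir="Annotations"):
    """Paths create_face_tf_record.py builds for one --set value.

    Mirrors the os.path.join calls in main() and dict_to_tf_example().
    """
    return {
        "examples_list": os.path.join(data_dir, year, "ImageSets", "Main",
                                      split + ".txt"),
        "annotations": os.path.join(data_dir, year, annotations_dir),
        "images": os.path.join(data_dir, folder, "JPEGImages"),
    }


def missing_paths(data_dir, year, split):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_paths(data_dir, year, split).values()
            if not os.path.exists(p)]
```

For instance, `missing_paths("/home/hyb/muke/models/dataset/widerface", "widerface", "train")` returning an empty list means the paths the script will read are in place.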

After packaging:
[Figure: result after packaging]
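To confirm that train.record and val.record are not empty, one option is to count their records. With TensorFlow 1.x, `sum(1 for _ in tf.python_io.tf_record_iterator(path))` does this; the sketch below does the same without importing TensorFlow by walking the TFRecord framing directly. It skips CRC verification, so treat it as a quick sanity check; `count_tfrecords` is a hypothetical name:

```python
import os
import struct


def count_tfrecords(path):
    """Count records in a TFRecord file by walking its framing.

    Each record is: uint64 payload length, uint32 masked CRC of the length,
    the payload, uint32 masked CRC of the payload (little-endian).
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length field")
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, os.SEEK_CUR)  # skip length CRC, payload, payload CRC
            if f.tell() > size:
                raise ValueError("truncated record")
            count += 1
    return count
```

Since main() writes one example per line of the list file, the count for train.record should equal the number of lines in ImageSets/Main/train.txt.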