Training on the VOC dataset with the TensorFlow Object Detection API on Ubuntu 16.04

Latest recommended article published 2022-04-12 13:05:38

xulei_Tao

Article link: https://blog.csdn.net/xulei_Tao/article/details/88976098

With the TensorFlow Object Detection API installed, the natural next step is to train something with it. Here we use the VOC dataset.

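I. Data preparation

(1) The VOC dataset

The PASCAL VOC dataset defines a standard format for image recognition and classification data, and many self-annotated datasets follow the VOC format.

Links: the Pascal VOC website, the algorithm leaderboard, and the VOC2012 dataset.

Main tasks: object detection and segmentation. There are 20 object classes to recognize: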

person
bird, cat, cow, dog, horse, sheep
aeroplane, bicycle, boat, bus, car, motorbike, train
bottle, chair, dining table, potted plant, sofa, tv/monitor

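Data:

All images have object detection annotations; only some have segmentation annotations.
Detection task: VOC2012 trainval/test contains all corresponding images from 2008 to 2011. trainval has 11,540 images with 27,450 objects.
Segmentation task: VOC2012 trainval contains all corresponding images from 2007 to 2011, while test contains only 2008 to 2011. trainval has 2,913 images with 6,929 objects.

The VOC2012 directory layout is as follows: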

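└── VOCdevkit     # root directory
    └── VOC2012   # one folder per year; only 2012 was downloaded here, other years such as 2007 also exist
        ├── Annotations   # xml files, one per image in JPEGImages, describing the image content
        ├── ImageSets     # txt files only; each line holds an image name, followed by ±1 in the per-class files to mark positive/negative samples
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages         # source images
        ├── SegmentationClass  # images showing the class segmentation result
        └── SegmentationObject # images showing the object (instance) segmentation result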
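
Before converting anything it can help to confirm that the extracted archive really has the layout above. The following is a minimal sketch, not part of the API; the helper name missing_voc_dirs is made up for illustration, and it only checks the three folders that detection needs.

```python
import os
import tempfile

# Sub-directories of VOCdevkit/VOC2012 that object detection relies on.
REQUIRED_DIRS = ("Annotations", os.path.join("ImageSets", "Main"), "JPEGImages")


def missing_voc_dirs(devkit_root, year="VOC2012"):
    """Return the required sub-directories that are absent under devkit_root/year."""
    base = os.path.join(devkit_root, year)
    return [d for d in REQUIRED_DIRS if not os.path.isdir(os.path.join(base, d))]


if __name__ == "__main__":
    # Build a throwaway tree that lacks JPEGImages to show what gets reported.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "VOC2012", "Annotations"))
        os.makedirs(os.path.join(root, "VOC2012", "ImageSets", "Main"))
        print(missing_voc_dirs(root))  # ['JPEGImages']
```

On a real download you would call missing_voc_dirs("dataset/VOCdevkit") and expect an empty list.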

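For object detection, the three folders that matter are Annotations, ImageSets and JPEGImages, described below.

Annotations folder

Holds the xml files with the annotation for each image, one xml file per image. Each file records basic information about the image: the folder it comes from, the file name, the source, the image size, which objects the image contains, and the bounding box (bbox) of each object.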

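<annotation>
	<folder>VOC2012</folder>              <!-- folder containing the image -->
	<filename>2007_003194.jpg</filename>  <!-- image file name -->
	<source>                              <!-- where the image comes from -->
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
	</source>
	<size>                                <!-- image size -->
		<width>500</width>
		<height>333</height>
		<depth>3</depth>
	</size>
	<segmented>1</segmented>              <!-- whether segmentation annotations exist (irrelevant for detection) -->
	<object>                              <!-- one object element per annotated object -->
		<name>dog</name>              <!-- object class -->
		<pose>Right</pose>            <!-- viewpoint -->
		<truncated>0</truncated>      <!-- whether the object is truncated, i.e. part of it (>15%) lies outside the box -->
		<difficult>0</difficult>      <!-- whether the object is hard to recognize -->
		<bndbox>                      <!-- bounding box in pixels -->
			<xmin>1</xmin>
			<ymin>16</ymin>
			<xmax>374</xmax>
			<ymax>299</ymax>
		</bndbox>
	</object>
</annotation>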
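
To make the structure concrete, here is a minimal sketch that reads the fields used for detection with Python's standard xml.etree.ElementTree. The function name parse_voc_annotation and the shape of the returned dictionary are choices made for this example, not something the API defines. SAMPLE_XML is a trimmed copy of the annotation shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation for 2007_003194.jpg.
SAMPLE_XML = """<annotation>
  <filename>2007_003194.jpg</filename>
  <size><width>500</width><height>333</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>1</xmin><ymin>16</ymin><xmax>374</xmax><ymax>299</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Extract file name, image size and object boxes from one VOC xml string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            # int(float(...)) tolerates coordinates written as decimals.
            "bbox": tuple(int(float(box.findtext(k)))
                          for k in ("xmin", "ymin", "xmax", "ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


if __name__ == "__main__":
    info = parse_voc_annotation(SAMPLE_XML)
    print(info["filename"], info["objects"][0]["name"], info["objects"][0]["bbox"])
```

For a file on disk, pass the result of open(path).read() to the same function.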

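The corresponding image (2007_003194.jpg):

ImageSets folder

It contains the following four folders:

Action: human actions (running, jumping and so on; part of the VOC challenge)
Layout: data with human body parts (head, hands, feet and so on; part of the VOC challenge)
Main: data for object recognition, 20 classes in total.
Segmentation: data that can be used for segmentation.

Each folder contains only txt files that list image names. For detection the Main folder is the relevant one: train.txt lists the training images, val.txt the validation images, and trainval.txt the union of the two.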
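
The two kinds of listing in Main can be read with a few lines of Python. This is a small sketch under the assumption that split files such as train.txt hold one image name per line and that per-class files such as aeroplane_train.txt hold an image name followed by an integer label; the image names in the demo are made up.

```python
def read_image_ids(lines):
    """Image names from a train.txt-style listing (one name per line)."""
    return [line.split()[0] for line in lines if line.strip()]


def read_class_labels(lines):
    """(image name, label) pairs from a per-class listing such as aeroplane_train.txt."""
    pairs = []
    for line in lines:
        if line.strip():
            image_id, label = line.split()
            pairs.append((image_id, int(label)))
    return pairs


if __name__ == "__main__":
    # Hypothetical file contents, used only to show the expected shape.
    split_file = ["2008_000008\n", "2008_000015\n", "\n"]
    class_file = ["2008_000008 -1\n", "2008_000033  1\n"]
    print(read_image_ids(split_file))
    print(read_class_labels(class_file))
```

Both functions accept any iterable of lines, so an open file object can be passed directly, for example read_image_ids(open("dataset/VOCdevkit/VOC2012/ImageSets/Main/train.txt")).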

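JPEGImages folder

Holds the source images; the file names match the xml files in Annotations.

(2) Converting the data to TFRecord

The TensorFlow Object Detection API reads data in the TFRecord file format. It ships a script (models/research/object_detection/dataset_tools/create_pascal_tf_record.py) that converts the PASCAL VOC2012 dataset to TFRecords. First place VOCdevkit in a folder of your choice, then edit line 164 of create_pascal_tf_record.py (the line number may differ between versions) and delete the 'aeroplane_' prefix, so that the script reads Main/train.txt or Main/val.txt instead of the per-class aeroplane list. After the change the line reads: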

examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main', FLAGS.set + '.txt')

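Run the following commands, adjusting the paths to wherever you placed the data. When I ran them the terminal printed nothing, but the record files were generated in the end.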

python dataset/create_pascal_tf_record.py \
    --data_dir=dataset/VOCdevkit \
    --year=VOC2012 \
    --set=train \
    --output_path=record/voc2012_train.record

python dataset/create_pascal_tf_record.py \
    --data_dir=dataset/VOCdevkit \
    --year=VOC2012 \
    --set=val \
    --output_path=record/voc2012_val.record
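
One detail of the conversion is worth seeing in isolation: the pixel coordinates from the xml are divided by the image width and height, so the boxes stored in the record lie in [0, 1]. The sketch below mimics that step in plain Python without TensorFlow. The function build_bbox_fields is hypothetical, and the key names are the ones I believe the script uses, so check them against your copy of create_pascal_tf_record.py.

```python
def build_bbox_fields(width, height, objects, class_ids):
    """Collect per-image box fields with coordinates scaled to [0, 1].

    objects is a list of (class name, (xmin, ymin, xmax, ymax)) in pixels,
    class_ids maps a class name to its integer id from the label map.
    """
    fields = {
        "image/width": width,
        "image/height": height,
        "image/object/bbox/xmin": [],
        "image/object/bbox/ymin": [],
        "image/object/bbox/xmax": [],
        "image/object/bbox/ymax": [],
        "image/object/class/text": [],
        "image/object/class/label": [],
    }
    for name, (xmin, ymin, xmax, ymax) in objects:
        fields["image/object/bbox/xmin"].append(xmin / width)
        fields["image/object/bbox/ymin"].append(ymin / height)
        fields["image/object/bbox/xmax"].append(xmax / width)
        fields["image/object/bbox/ymax"].append(ymax / height)
        fields["image/object/class/text"].append(name)
        fields["image/object/class/label"].append(class_ids[name])
    return fields


if __name__ == "__main__":
    # The dog box from 2007_003194.jpg (500 x 333 pixels); dog has id 12 in the VOC label map.
    fields = build_bbox_fields(500, 333, [("dog", (1, 16, 374, 299))], {"dog": 12})
    print(fields["image/object/bbox/xmin"])  # [0.002]
```

Because the stored boxes are relative, the image_resizer in the training config can change the input size without touching the annotations.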

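II. Training

(1) Download a pretrained model

By default the TensorFlow Object Detection API provides 5 pretrained models, all trained on the COCO dataset (the model zoo linked below lists more). Their architectures are: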

SSD+MobileNet
SSD+Inception
R-FCN+ResNet101
Faster RCNN+ResNet101
Faster RCNN+Inception_ResNet

Source: https://blog.csdn.net/c20081052/article/details/81710436

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

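(2) Add the label map

The label map for the VOC dataset already exists: it is in the object_detection/data/ folder (pascal_label_map.pbtxt). VOC has 20 classes.

The format is as follows: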

item {
  id: 1
  name: 'aeroplane'
}

item {
  id: 2
  name: 'bicycle'
}

item {
  id: 3
  name: 'bird'
}
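
For your own data you have to write this file yourself. As a minimal sketch, the text can be generated from a list of class names; ids start at 1 because 0 is reserved for the background. The function build_label_map is made up for this example, and VOC_CLASSES lists the 20 VOC class names in alphabetical order.

```python
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def build_label_map(class_names):
    """Return label map text with one item block per class, ids starting at 1."""
    items = []
    for idx, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)


if __name__ == "__main__":
    # Print the first three items to compare with the format shown above.
    print(build_label_map(VOC_CLASSES[:3]))
```

Writing the returned string to a .pbtxt file gives something you can point label_map_path at.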

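(3) Edit the config file

Using ssd_mobilenet_v1_fpn as an example:

model {      # network architecture and related hyperparameters
  ssd {
    num_classes: 90  # change to your own number of classes (20 for VOC)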
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }

    feature_extractor {  # feature extraction network
      type: "ssd_mobilenet_v1_fpn"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          random_normal_initializer {
            mean: 0.0
            stddev: 0.00999999977648
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019
          scale: true
          epsilon: 0.0010000000475
        }
      }
      override_base_feature_extractor_hyperparams: true
    }

    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }

    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }

    similarity_calculator {
      iou_similarity {
      }
    }

    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.00999999977648
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019
            scale: true
            epsilon: 0.0010000000475
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.59999990463
      }
    }

    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }

    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }

    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }

    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}

# Optimizer choice and related training parameters
train_config {     
  batch_size: 128
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.0799999982119
          total_steps: 12500
          warmup_learning_rate: 0.0266660004854
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.899999976158
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  num_steps: 12500
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

# Location of the training dataset
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
  }
}

# Evaluation settings
eval_config {
  num_examples: 8000
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}

# Location of the evaluation dataset
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
  }
}
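
The fields that need changing for VOC are num_classes, fine_tune_checkpoint, both label_map_path entries and both input_path entries. Editing by hand works fine; the sketch below shows one possible way to do it with plain text substitution. The function customize_config is hypothetical, it assumes the train reader appears before the eval reader (as in the config above), and the checkpoint path in the demo is a placeholder.

```python
import re


def customize_config(text, num_classes, checkpoint, label_map, train_record, val_record):
    """Fill in the dataset-specific fields of a pipeline config given as a string."""
    text = re.sub(r"num_classes:\s*\d+",
                  lambda m: "num_classes: %d" % num_classes, text, count=1)
    text = re.sub(r'fine_tune_checkpoint:\s*"[^"]*"',
                  lambda m: 'fine_tune_checkpoint: "%s"' % checkpoint, text)
    text = re.sub(r'label_map_path:\s*"[^"]*"',
                  lambda m: 'label_map_path: "%s"' % label_map, text)
    # input_path occurs twice: first in train_input_reader, then in eval_input_reader.
    records = iter([train_record, val_record])
    text = re.sub(r'input_path:\s*"[^"]*"',
                  lambda m: 'input_path: "%s"' % next(records), text, count=2)
    return text


if __name__ == "__main__":
    sample = (
        'num_classes: 90\n'
        'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"\n'
        'label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"\n'
        'input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"\n'
        'label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"\n'
        'input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"\n'
    )
    print(customize_config(
        sample, 20,
        "pretrained/model.ckpt",                       # placeholder checkpoint path
        "object_detection/data/pascal_label_map.pbtxt",
        "record/voc2012_train.record",
        "record/voc2012_val.record",
    ))
```

Replacement values are passed through lambdas so that backslashes in paths are inserted literally rather than interpreted by re.sub.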

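(4) Start training

Adjust the paths to match where your files are.

# Train
# --pipeline_config_path: location of the config file
# --model_dir: where the model is saved
python ./models-master/research/object_detection/model_main.py \
    --pipeline_config_path=pipeline.config \
    --model_dir=./models-master/research/data/voc/model \
    --alsologtostderr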
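
During training the learning rate follows the cosine_decay_learning_rate block of the config: a linear warm-up from warmup_learning_rate to learning_rate_base over warmup_steps, then a cosine decay to zero at total_steps. The sketch below reproduces that shape with the values from the config above. It assumes the usual formula for cosine decay with linear warm-up; the API's own implementation may differ in details, so treat it as an illustration of the curve you should see in TensorBoard.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base=0.08, total_steps=12500,
                             warmup_learning_rate=0.026666, warmup_steps=1000):
    """Learning rate at a given global step (assumed formula, see the text above)."""
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    if step > total_steps:
        return 0.0
    # Cosine decay from learning_rate_base down to zero.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 500, 1000, 6750, 12500):
        print(step, round(cosine_decay_with_warmup(step), 6))
```

If you lower batch_size to fit a single GPU, the base learning rate usually has to be lowered as well; the values here are simply the ones from the sample config.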

(5) View in TensorBoard

Change to the directory where the model is being written, then run:

tensorboard --logdir=./

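(6) Export a pb model

# Export
# export_inference_graph.py is located in the object_detection directory
# --pipeline_config_path: location of the config file
# --trained_checkpoint_prefix: the trained checkpoint
python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path pipeline.config \
    --trained_checkpoint_prefix model.ckpt-20000 \
    --output_directory exported_model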
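
The exported graph returns detection_boxes in normalized [ymin, xmin, ymax, xmax] order, together with detection_scores and detection_classes. Running the graph needs TensorFlow, but the post-processing is plain arithmetic, so here is a minimal sketch of it. The function to_pixel_boxes and the 0.5 score cut-off are choices made for this example, and the detections in the demo are invented numbers.

```python
def to_pixel_boxes(boxes, scores, classes, width, height, min_score=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel (xmin, ymin, xmax, ymax).

    Detections scoring below min_score are dropped.
    """
    results = []
    for (ymin, xmin, ymax, xmax), score, cls in zip(boxes, scores, classes):
        if score < min_score:
            continue
        results.append({
            "class_id": int(cls),
            "score": float(score),
            "box": (int(round(xmin * width)), int(round(ymin * height)),
                    int(round(xmax * width)), int(round(ymax * height))),
        })
    return results


if __name__ == "__main__":
    # Invented model output for a 500 x 300 image; class 12 is dog in the VOC label map.
    boxes = [(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)]
    scores = [0.9, 0.3]
    classes = [12.0, 15.0]
    print(to_pixel_boxes(boxes, scores, classes, width=500, height=300))
```

Only the first detection survives the threshold here, and its box comes out as (100, 30, 300, 150) in pixels.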