Training a Model for Remote Sensing Images with the TensorFlow Object Detection API and Running Single-Image Object Detection on Multiple Images

1. Training a New Model with the TensorFlow Object Detection API

This article mainly draws on Section 5.2.3, "Training a New Model," of the book 《21个项目玩转深度学习:基于TensorFlow的实践详解》 (roughly, "21 Projects to Master Deep Learning: A Practical Guide Based on TensorFlow"), 1st edition, March 2018, written by He Zhiyuan (何之源). My thanks go to the author first of all.


The book works with the official Pascal VOC dataset. I tried it and it works fine, but that dataset is not suitable for research on remote sensing images.
So I converted a dataset of my own instead; for details see:
Converting the NWPU VHR-10 dataset to Pascal VOC format - Jerry_liu20080504's column - CSDN blog
https://blog.csdn.net/Jerry_liu20080504/article/details/90483131

The overall approach is basically the same as in the book, except that the dataset is NWPU VHR-10 converted to Pascal VOC format. As the book puts it:
"The conversion code here was written in advance for the VOC 2012 dataset. If readers want to use their own dataset, there are two options: the first is to change the annotation format of your dataset so that it is exactly the same as VOC 2012, after which the create_pascal_tf_record.py script can be used directly for the conversion; the second is to modify create_pascal_tf_record.py and adapt the label-reading code."

I took the first option and converted the dataset into VOC format.
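
As an illustration of what the converted annotations look like, here is a minimal sketch (my own, not the author's actual conversion script) that writes one VOC-style XML annotation file; the image name, image size, box coordinates and class label are hypothetical placeholders:

import xml.etree.ElementTree as ET

def write_voc_annotation(xml_path, image_name, width, height, boxes):
    # boxes: list of (xmin, ymin, xmax, ymax, class_name) tuples
    root = ET.Element('annotation')
    ET.SubElement(root, 'folder').text = 'NWPUdevkit'
    ET.SubElement(root, 'filename').text = image_name
    size = ET.SubElement(root, 'size')
    ET.SubElement(size, 'width').text = str(width)
    ET.SubElement(size, 'height').text = str(height)
    ET.SubElement(size, 'depth').text = '3'
    for xmin, ymin, xmax, ymax, name in boxes:
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = name
        ET.SubElement(obj, 'pose').text = 'Unspecified'
        ET.SubElement(obj, 'truncated').text = '0'
        ET.SubElement(obj, 'difficult').text = '0'
        bbox = ET.SubElement(obj, 'bndbox')
        ET.SubElement(bbox, 'xmin').text = str(xmin)
        ET.SubElement(bbox, 'ymin').text = str(ymin)
        ET.SubElement(bbox, 'xmax').text = str(xmax)
        ET.SubElement(bbox, 'ymax').text = str(ymax)
    ET.ElementTree(root).write(xml_path)

# Example: one hypothetical airplane box from an NWPU VHR-10 image.
write_voc_annotation('001.xml', '001.jpg', 958, 808,
                     [(563, 478, 630, 573, 'airplane')])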

Back to training on the VOC 2012-style dataset. After the dataset is ready, a suitable model needs to be chosen. Here the Faster R-CNN + Inception_ResNet_v2 model is used as the example. First download the Faster R-CNN + Inception_ResNet_v2 model pre-trained on COCO from: http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz
Extracting the archive yields 5 files: frozen_inference_graph.pb, graph.pbtxt, model.ckpt.data-00000-of-00001, model.ckpt.index, and model.ckpt.meta. Create a new pretrained folder inside the voc folder (or any other folder; mine is object_detection\voc\NWPUdevkit, because I am not using the official VOC data and set up my own directory for the related resources; just make sure the corresponding paths are set correctly for training) and copy these 5 files into it.
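
If you prefer to script this step, here is a minimal sketch (my own, not from the book) that downloads and unpacks the archive with Python; the model.ckpt.* files can then be copied into voc/NWPUdevkit/pretrained/:

import tarfile
import six.moves.urllib as urllib

DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
MODEL_FILE = 'faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz'

# Download the archive into the current directory.
urllib.request.urlretrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)

# Unpack everything; the checkpoint files end up under the extracted folder.
with tarfile.open(MODEL_FILE) as tar:
    tar.extractall('.')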

The TensorFlow Object Detection API relies on a special configuration file for training. Some example configuration files can be found under object_detection/samples/configs/. The configuration file here is created by referring to faster_rcnn_inception_resnet_v2_atrous_pets.config. First copy faster_rcnn_inception_resnet_v2_atrous_pets.config into the voc folder and rename it voc.config. The same file was also used when converting NWPU VHR-10 to Pascal format; take care to adjust the file paths.

There are 7 places in voc.config that need to be modified. The modified configuration is as follows:

model {
  faster_rcnn {
    num_classes: 10
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "voc/NWPUdevkit/pretrained/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "voc/NWPUdevkit/pascal_train.record"
  }
  label_map_path: "voc/NWPUdevkit/pascal_label_map.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 650
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "voc/NWPUdevkit/pascal_val.record"
  }
  label_map_path: "voc/NWPUdevkit/pascal_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
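
Before launching training, it can be worth double-checking that num_classes in voc.config matches the number of entries in pascal_label_map.pbtxt (10 classes for NWPU VHR-10). A minimal sketch, assuming the object_detection package is importable and the paths follow the layout above:

# Sanity check (my own, not from the book): count the classes in the label map.
from object_detection.utils import label_map_util

label_map_dict = label_map_util.get_label_map_dict('voc/NWPUdevkit/pascal_label_map.pbtxt')
print('classes in label map:', len(label_map_dict))      # should print 10
print(sorted(label_map_dict.items(), key=lambda kv: kv[1]))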

Finally, create a new train_dir folder inside voc\NWPUdevkit as the directory for saving the model and logs, and start training with the following command:
python legacy/train.py --train_dir voc/NWPUdevkit/train_dir/ --pipeline_config_path voc/NWPUdevkit/voc.config

The training logs and the final model are both saved in train_dir, so TensorBoard can likewise be used to monitor training:
tensorboard --logdir voc/NWPUdevkit/train_dir/

The book also describes how to handle insufficient RAM or GPU memory; I did not run into this, so it is not covered here.

 

2. Exporting the Model and Predicting on Single Images
How do you export the checkpoint files in train_dir and use them for object detection on a single image? The TensorFlow Object Detection API provides an export_inference_graph.py script for exporting a trained model. Specifically, run:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path voc/NWPUdevkit/voc.config --trained_checkpoint_prefix voc/NWPUdevkit/train_dir/model.ckpt-276 --output_directory voc/NWPUdevkit/export/

Here, model.ckpt-276 means the model saved at step 276. Replace 276 with an appropriate value according to the checkpoints actually present in voc/NWPUdevkit/train_dir/. The exported model is the file voc/NWPUdevkit/export/frozen_inference_graph.pb.
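
To avoid guessing the step number, the newest checkpoint prefix can also be looked up programmatically. A minimal sketch, assuming TensorFlow 1.x and the train_dir path used above:

import tensorflow as tf

# Returns the prefix of the most recent checkpoint,
# e.g. 'voc/NWPUdevkit/train_dir/model.ckpt-276'.
latest = tf.train.latest_checkpoint('voc/NWPUdevkit/train_dir/')
print(latest)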

Prediction on single images mainly uses the object_detection\object_detection_tutorial.py program. I made some simple changes to this source code, mainly disabling the model download from the internet (so that my locally trained model is loaded instead) and changing the paths of the test data and the model. In addition, the original code is meant to be run in Jupyter Notebook in a browser; after moving it to PyCharm, the part that displays the result images also had to be changed. The modified code is as follows:

#from:
# models/object_detection_tutorial.ipynb at master · tensorflow/models · GitHub
#https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO

#Jerry added, 2019/04/25
import matplotlib

print(matplotlib.get_backend())

from matplotlib import pyplot as plt
from PIL import Image

#Jerry added.20190425
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from tensorflow.models.research.object_detection.utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.12.0'):
  raise ImportError('Please upgrade your TensorFlow installation to v1.12.*.')

# This is needed to display the images.
#%matplotlib inline

from tensorflow.models.research.object_detection.utils import label_map_util
from tensorflow.models.research.object_detection.utils import visualization_utils as vis_util
matplotlib.use('TkAgg')
# What model to download.
# MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
# PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
PATH_TO_FROZEN_GRAPH = 'voc/NWPUdevkit/export' + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
# PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
PATH_TO_LABELS = os.path.join('voc/NWPUdevkit', 'pascal_label_map.pbtxt')

# The download of the original SSD demo model is disabled; the archive below is
# only unpacked if it happens to be present locally, and is not needed when
# loading the locally exported frozen graph.
opener = urllib.request.URLopener()
#opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
# print(""+os.getcwd())
print("model file:" + MODEL_FILE)
if os.path.exists(MODEL_FILE):
  tar_file = tarfile.open(MODEL_FILE)
  for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
      tar_file.extract(file, os.getcwd())

detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

# The images image1.jpg ... image8.jpg in the test_images directory are used.
# If you want to test the code with your own images, just add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1,9) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[1], image.shape[2])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: image})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.int64)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
  plt.show()

Run it and wait for each image's detection result. The images are processed one at a time; as in Jupyter Notebook, a window pops up with the result. Close the window and the next one pops up, until all the images you want to detect have been processed.
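
If you would rather save the results to disk than click through pop-up windows, the last lines of the loop can be replaced with something like the following sketch (my own variation; the output file names are arbitrary):

  # Instead of plt.show(), write each visualized result next to the input image.
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
  plt.axis('off')
  plt.savefig(image_path + '.detected.png', bbox_inches='tight')
  plt.close()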
