Training Your Own Object Detection Model with the TensorFlow Object Detection API (Part 2): Environment Setup and Training

陌生的天花板

Last modified 2022-03-29 15:50:39


Column: TEMI Robot | Tags: tensorflow, object detection, cuda

First published 2021-12-09 09:57:45

Article link: https://blog.csdn.net/weixin_41680653/article/details/121796281


1. Environment Setup

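I don't have a GPU, so I rented a cloud GPU server. Many providers offer this, such as Alibaba Cloud and Tencent Cloud, but the big providers use high-end GPUs and charge accordingly, so I went with a cheaper one, 智星云 (Zhixingyun), at about 4 CNY per hour. These services are all much alike; follow your provider's documentation. I used Ubuntu, and once you are connected over SSH it feels much like working on your own computer. Note that when the server is shut down, all data on it is lost, and so is any change you made to the environment. It is a lot like renting a seat at an internet café, and possibly cheaper. You can reach a graphical desktop over VNC, but it is very laggy and not really necessary. Downloads on the server were sometimes very slow, while copying files over SSH was fast, so it is best to download things locally and copy them over. FileZilla works well for this.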
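Because everything on the rented server disappears when it is shut down, it helps to pack the workspace into a single archive before releasing the machine and copy that one file back (with scp or FileZilla). This is a minimal standard-library sketch; the function name and the paths in the usage note are hypothetical.

```python
import tarfile
from pathlib import Path


def bundle_workspace(workspace: str, archive: str) -> Path:
    """Pack a directory into a .tar.gz so it can be copied off the server
    before the instance (and its disk) is released."""
    src = Path(workspace)
    if not src.is_dir():
        raise NotADirectoryError(src)
    out = Path(archive)
    with tarfile.open(out, "w:gz") as tar:
        # arcname stores entries relative to the workspace's own name,
        # e.g. training_demo/models/... instead of the full absolute path
        tar.add(src, arcname=src.name)
    return out
```

For example, `bundle_workspace("TensorFlow/workspace/training_demo", "training_demo.tar.gz")` produces one file to download, which is usually much faster to transfer than thousands of small checkpoint and record files.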

Install the Object Detection API by following the official TensorFlow Object Detection API documentation. There are quite a few pitfalls along the way:

1. TensorFlow Object Detection API version

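You must download the master branch of the TensorFlow Object Detection API (the tensorflow/models repository); other versions do not contain the research directory. Master in turn requires the latest TensorFlow, so the TensorFlow version you install in the first step has to be the matching one. For me that was TensorFlow 2.8.0, which meant the whole system environment had to be configured for TensorFlow 2.8: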

Version	Python version	Compiler	Build tools	cuDNN	CUDA
tensorflow-2.8.0	3.7-3.10	GCC 7.3.1	Bazel 4.2.1	8.1	11.2
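The table can be turned into a quick pre-flight check before installing anything. This is a sketch that hard-codes only the TensorFlow 2.8.0 row shown above (an assumption: other rows would have to be added by hand from the official tested-build list), and it compares major.minor versions only.

```python
# Only the row from the table above; add more rows yourself if needed.
TESTED_BUILDS = {
    "2.8.0": {"python": ((3, 7), (3, 10)), "cuda": "11.2", "cudnn": "8.1"},
}


def _major_minor(version: str) -> str:
    """'11.2.152' -> '11.2'"""
    return ".".join(version.split(".")[:2])


def check_env(tf_version, python_version, cuda, cudnn):
    """Return a list of mismatches against the tested build; [] means it matches."""
    row = TESTED_BUILDS.get(tf_version)
    if row is None:
        return [f"no tested-build row for tensorflow {tf_version}"]
    problems = []
    lo, hi = row["python"]
    py = tuple(python_version[:2])
    if not lo <= py <= hi:
        problems.append(f"python {py[0]}.{py[1]} not in {lo[0]}.{lo[1]}-{hi[0]}.{hi[1]}")
    if _major_minor(cuda) != row["cuda"]:
        problems.append(f"CUDA {cuda} installed, tensorflow {tf_version} expects {row['cuda']}")
    if _major_minor(cudnn) != row["cudnn"]:
        problems.append(f"cuDNN {cudnn} installed, tensorflow {tf_version} expects {row['cudnn']}")
    return problems
```

Usage: `check_env("2.8.0", sys.version_info, "11.2", "8.1")`, typing in the CUDA version reported by `nvcc --version` and the cuDNN version you installed. A non-empty list means the server's preinstalled stack has to be replaced.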

The CUDA and cuDNN preinstalled on the server therefore had to be replaced with the matching versions.

Uninstalling CUDA:

1. Run the uninstaller found in /usr/local/cuda-xx.x/bin.

2. Delete any leftover cuda directories under /usr/local.

3. This also removes cuDNN.
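Before deleting anything under /usr/local it is worth seeing what is actually there. This dry-run sketch lists the cuda-* directories and any uninstaller in their bin/ folders and deletes nothing. The `*uninstall*` file-name pattern is an assumption, since the uninstaller's name has differed between CUDA releases.

```python
from pathlib import Path


def find_cuda_installs(root="/usr/local"):
    """Dry run: return (directory name, [uninstaller file names]) for each
    cuda-* directory under root. Nothing is modified or deleted."""
    results = []
    for d in sorted(Path(root).glob("cuda-*")):
        if not d.is_dir():
            continue
        bin_dir = d / "bin"
        uninstallers = []
        if bin_dir.is_dir():
            uninstallers = sorted(p.name for p in bin_dir.glob("*uninstall*"))
        results.append((d.name, uninstallers))
    return results
```

Printing `find_cuda_installs()` shows which version to uninstall (step 1) and which directories are left over afterwards (step 2); the actual removal is still done by hand with sudo.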

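Installing CUDA:

1. First determine which version you need, i.e. the CUDA version TensorFlow requires.

2. Download and install CUDA. During installation, choose not to install the NVIDIA driver, and check the official site for the driver version you need. A long time ago I found that installing the driver together with CUDA caused errors; I don't know whether that has been fixed, so you can try it yourself.

3. Download and install cuDNN. This is simple: copy the extracted files to the corresponding locations, and remember to add the library directory to the LD_LIBRARY_PATH environment variable, otherwise the libraries will not be found.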
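Step 3 tends to fail silently: TensorFlow simply falls back to the CPU. This sketch reports which library directories are missing from LD_LIBRARY_PATH and builds the line to add to ~/.bashrc. The example directories are assumptions based on the usual layout where cuDNN's files are copied into the CUDA directory; adjust them to where you actually put the files.

```python
import os


def ld_library_path_entries(env=None):
    """Split LD_LIBRARY_PATH into its non-empty entries."""
    env = os.environ if env is None else env
    return [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]


def missing_lib_dirs(required, env=None):
    """Return the directories in `required` that LD_LIBRARY_PATH lacks
    (trailing slashes and similar differences are normalized away)."""
    present = {os.path.normpath(p) for p in ld_library_path_entries(env)}
    return [d for d in required if os.path.normpath(d) not in present]


def export_line(dirs):
    """Shell line for ~/.bashrc that prepends dirs to LD_LIBRARY_PATH."""
    return "export LD_LIBRARY_PATH=" + ":".join(dirs) + ":$LD_LIBRARY_PATH"
```

For example, `missing_lib_dirs(["/usr/local/cuda-11.2/lib64", "/usr/local/cuda-11.2/extras/CUPTI/lib64"])` lists what is absent, and `export_line(...)` on that result gives the line to append.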

Reference:

Uninstalling and upgrading CUDA and cuDNN on Ubuntu 16.04 (Ubuntu16.04下cuda和cudnn的卸载和升级), 隔壁老王's blog, CSDN

2. Errors when installing the Object Detection API with pip

# From within TensorFlow/models/research/
cp object_detection/packages/tf2/setup.py .
python -m pip install .

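Don't switch pip to a mirror source. After I changed the source I got several errors, and the "install the object detection API" step failed.

Also, if you run into strange errors, just start a fresh machine instead of fighting them; it wastes less time. For example: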

DEPRECATION: A future pip version will change local packages to be built in-

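This setup installs TensorFlow 2.8, so it is best to uninstall any TensorFlow already on the system first. Since TensorFlow 2.1 the tensorflow pip package no longer has separate CPU and GPU variants; the GPU is detected automatically.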
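To confirm the setup left exactly one TensorFlow behind, the installed versions can be read from package metadata without importing TensorFlow. A sketch; the expected `2.8.` prefix matches this article's setup and is a parameter you would change.

```python
from importlib import metadata


def installed_version(dist_name):
    """Version string of an installed distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def tf_warnings(versions, expected_prefix="2.8."):
    """Check a {name: version-or-None} mapping for leftover TensorFlow installs."""
    warnings = []
    if versions.get("tensorflow-gpu"):
        warnings.append(f"tensorflow-gpu {versions['tensorflow-gpu']} is still installed; uninstall it")
    tf = versions.get("tensorflow")
    if tf is None:
        warnings.append("tensorflow is not installed")
    elif not tf.startswith(expected_prefix):
        warnings.append(f"tensorflow {tf} is installed, expected {expected_prefix}x")
    return warnings


if __name__ == "__main__":
    found = {name: installed_version(name) for name in ("tensorflow", "tensorflow-gpu")}
    print(found, tf_warnings(found))
```

Once TensorFlow imports cleanly, `tf.config.list_physical_devices("GPU")` shows whether the GPU was actually picked up.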

3. cv2 error

cannot import name '_registerMatType' from 'cv2.cv2' (/home/vipuser/miniconda3/lib/python3.8/site-packages/cv2/cv2.cpython-38-x86_64-linux-gnu.so

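This error appeared when running the training script. It is another version problem, and a nasty one; I fixed it by switching OpenCV versions as described in a reference I found.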
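The post does not say which versions were swapped. One frequently reported cause of this particular error (an assumption here, not something the author confirmed) is having more than one OpenCV wheel installed, for example opencv-python and opencv-python-headless at different versions; they all install into the same cv2 package and overwrite each other's files. This sketch lists the OpenCV wheels present so you can see whether that applies.

```python
from importlib import metadata

OPENCV_WHEELS = ("opencv-python", "opencv-python-headless",
                 "opencv-contrib-python", "opencv-contrib-python-headless")


def installed_opencv_wheels():
    """Map each installed OpenCV wheel name to its version."""
    found = {}
    for name in OPENCV_WHEELS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


def opencv_problem(found):
    """Return a message if more than one OpenCV wheel is installed, else None."""
    if len(found) <= 1:
        return None
    listing = ", ".join(f"{k}=={v}" for k, v in sorted(found.items()))
    return f"multiple OpenCV wheels share the cv2 package: {listing}"


if __name__ == "__main__":
    print(opencv_problem(installed_opencv_wheels()) or "only one OpenCV wheel installed")
```

If a conflict shows up, uninstalling all of the listed wheels and reinstalling a single one at one version is the usual way to get a consistent cv2.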

2. Training

Follow the official documentation; the main work is adapting the pipeline.config file to your own needs. My configuration follows.
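The train and eval readers in the config point label_map_path at annotations/label_map.pbtxt, and num_classes has to agree with it. A minimal sketch that writes the label map from a list of class names; the class names in the usage note are hypothetical. IDs start at 1 because 0 is reserved for the background class, which is not listed.

```python
def make_label_map(class_names):
    """Build label_map.pbtxt text. IDs start at 1 because 0 is reserved
    for the background class, which is not listed."""
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    items = [f"item {{\n  id: {i}\n  name: '{name}'\n}}\n"
             for i, name in enumerate(class_names, start=1)]
    return "\n".join(items)
```

For example, writing `make_label_map(["person", "robot"])` to annotations/label_map.pbtxt gives two items, and num_classes in the config is then `len(class_names)`, i.e. 2.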

model {
  ssd {
    num_classes: 2            # number of classes you want to detect; background is NOT counted in this API
    image_resizer {
      fixed_shape_resizer {
        height: 300            # set to your input size; smaller sizes reduce computation
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2_keras"
      depth_multiplier: 0.1    # scales the number of channels in each conv layer
      min_depth: 8             # minimum number of channels per layer
      conv_hyperparams {       # I left the parameters below unchanged
        regularizer {
          l2_regularizer {
            weight: 3.9999998989515007e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.029999999329447746
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.9700000286102295
          center: true
          scale: true
          epsilon: 0.0010000000474974513
          train: true
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.9999998989515007e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.009999999776482582
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.9700000286102295
            center: true
            scale: true
            epsilon: 0.0010000000474974513
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011920929
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.599999904632568
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298023224
        max_scale: 0.949999988079071
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.33329999446868896
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993922529e-09
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.75
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 32            # size this to your GPU memory to avoid out-of-memory errors
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {        # worth experimenting with; tuning it is largely trial and error
        cosine_decay_learning_rate {
          learning_rate_base: 0.800000011920929
          total_steps: 50000
          warmup_learning_rate: 0.13333000242710114
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.8999999761581421
    }
    use_moving_average: false
  }
  num_steps: 50000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 10
  unpad_groundtruth_tensors: false
}
train_input_reader {            # change to your own paths
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
}
eval_config {                   # only used for evaluation; requires the COCO API (coco-api) to be installed
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "annotations/test.record"
  }
}
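Since the learning rate is mostly trial and error, it helps to see what the cosine_decay_learning_rate block above actually produces. The function below is my reading of that schedule (linear warmup from warmup_learning_rate to learning_rate_base over warmup_steps, then cosine decay to zero at total_steps, and zero afterwards); treat it as an approximation of the API's implementation, not a copy of it. The API's optional hold_base_rate_steps is not modeled.

```python
import math


def cosine_decay_with_warmup(step, base_lr, total_steps, warmup_lr=0.0, warmup_steps=0):
    """Approximate learning rate at `step` for a cosine_decay_learning_rate block."""
    if step > total_steps:
        return 0.0
    if warmup_steps > 0 and step < warmup_steps:
        # linear ramp from warmup_lr up to base_lr
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    # cosine from base_lr (end of warmup) down to 0 (total_steps)
    progress = (step - warmup_steps) / float(total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # values from the train_config above (rounded)
    for s in (0, 1000, 2000, 10000, 26000, 40000, 50000):
        print(s, round(cosine_decay_with_warmup(s, 0.8, 50000, 0.13333, 2000), 4))
```

Printing a handful of steps like this makes it easy to see, for instance, that the rate is still about half of learning_rate_base at the midpoint of the decay, before committing hours of rented GPU time to a setting.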

Once training is finished, the model still has to be converted to TFLite. First export it to a SavedModel (.pb) with this script:

/research/object_detection/export_tflite_graph_tf2.py

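There are several other conversion scripts in that directory, but this one is the TF2 version; I haven't tried the others and don't know whether they work. Don't use the export script from the tutorial: that one exports a model for local inference.

Then convert the exported model to a .tflite file with the following script: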
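When the export is scripted, it can be handy to build the command in one place. This sketch assembles the argument list for export_tflite_graph_tf2.py using the `--pipeline_config_path`, `--trained_checkpoint_dir` and `--output_directory` flags as I understand them from the script; the default script path and the example paths are assumptions relative to where you cloned tensorflow/models.

```python
import sys


def export_tflite_cmd(pipeline_config, checkpoint_dir, output_dir,
                      script="models/research/object_detection/export_tflite_graph_tf2.py"):
    """Argument list for the TF2 TFLite-friendly SavedModel export (not run here)."""
    return [
        sys.executable, script,
        f"--pipeline_config_path={pipeline_config}",
        f"--trained_checkpoint_dir={checkpoint_dir}",
        f"--output_directory={output_dir}",
    ]
```

Running it is then `subprocess.run(export_tflite_cmd("models/my_ssd/pipeline.config", "models/my_ssd", "exported/my_ssd"), check=True)`; as far as I know the script writes the result to `<output_dir>/saved_model`, which is the directory the conversion script takes as saved_model_dir.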


import tensorflow as tf

# Replace the path below with your own
saved_model_dir = "/home/ss/TensorFlow/workspace/training_demo/models/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model"

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # path to the SavedModel directory
# Uncomment to apply post-training quantization:
# converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Together with the line above, this selects float16 quantization:
# converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Save the model.
with open("./my_ssd.tflite", "wb") as f:
    f.write(tflite_model)
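A cheap sanity check before copying the file to the robot: TFLite models are FlatBuffers whose 4-byte file identifier "TFL3" sits at byte offset 4. This sketch only checks that header; a fuller check is loading the file with `tf.lite.Interpreter(model_path=...)` and calling `allocate_tensors()`.

```python
def looks_like_tflite(path):
    """True if the file carries the TFLite FlatBuffer identifier 'TFL3'
    at byte offset 4. Does not validate the rest of the model."""
    with open(path, "rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == b"TFL3"
```

For example, `looks_like_tflite("./my_ssd.tflite")` should be True for the file written above; False usually means a truncated copy or the wrong file.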

After training, the model can be evaluated with the following command:

python model_main_tf2.py --model_dir=models/my_ssd_resnet50_v1_fpn --pipeline_config_path=models/my_ssd_resnet50_v1_fpn/pipeline.config --checkpoint_dir=models/my_ssd_resnet50_v1_fpn
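The same script handles both training and evaluation: as I understand model_main_tf2.py, it trains when `--checkpoint_dir` is absent and evaluates the checkpoints in that directory when it is given. A small sketch that builds either command from the flags used above; the function name and default script path are hypothetical.

```python
import sys


def model_main_cmd(model_dir, pipeline_config, evaluate=False, script="model_main_tf2.py"):
    """Argument list for model_main_tf2.py; evaluate=True adds --checkpoint_dir."""
    cmd = [sys.executable, script,
           f"--model_dir={model_dir}",
           f"--pipeline_config_path={pipeline_config}"]
    if evaluate:
        # pointing --checkpoint_dir at the training output switches to evaluation
        cmd.append(f"--checkpoint_dir={model_dir}")
    return cmd
```

For example, `model_main_cmd("models/my_ssd_resnet50_v1_fpn", "models/my_ssd_resnet50_v1_fpn/pipeline.config", evaluate=True)` reproduces the evaluation command above; with `evaluate=False` it starts training instead.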
