An Introduction to Detectron2
1. Background
Detectron was built on Caffe2 and Python and implemented more than ten state-of-the-art computer vision results. Facebook AI Research has since open-sourced its successor, which is what we introduce here: Detectron2.
Detectron2 is a computer vision library from Facebook AI Research. It implements state-of-the-art object detection algorithms and is a complete rewrite of the earlier Detectron; it is often counted among the three major open-source object detection toolkits (Detectron2 / mmdetection / SimpleDet).
Like mmdetection and the TensorFlow Object Detection API, Detectron2 sets all of its parameters through configuration files, which you tweak step by step until the detector does what you need.
2. Features
- Built on top of the PyTorch deep learning framework: PyTorch provides an intuitive imperative programming model, so developers can iterate on model designs and experiments faster.
- More functionality: supports panoptic segmentation (Kirillov, He, et al., CVPR 2019), DensePose, Cascade R-CNN, rotated bounding boxes, and more.
- Highly extensible: starting with Detectron2, Facebook introduced a custom-design mechanism that lets users more easily build detectors tailored to their own tasks, making Detectron2 much more flexible.
- More timely and comprehensive support for the latest research in semantic and panoptic segmentation, with updates planned to continue.
- Implementation quality: Detectron2 was rewritten from scratch, fixing several implementation issues in the original Detectron, and runs faster than the original.
The model is the core of detection and segmentation, and detectron2's models are modular, built from four main parts:
- backbone: the model's CNN feature extractor. Currently only ResNet is supported; note that detectron2 also treats FPN as part of the backbone.
- proposal_generator: the candidate-box generator. Currently only RPN is supported. It is usually one component of Faster R-CNN, although an RPN taken on its own is also a simple one-stage detector.
- roi_heads: the detector part of the Faster R-CNN family, including the ROIPooler, box_head, mask_head, and so on; ROIPooler refers to the ROIPool and ROIAlign operations.
- meta_arch: defines the final model. It is less a standalone module than the glue that assembles the backbone, proposal_generator, and roi_heads into the complete model.
The benefit of modularity is reuse through composition: for example, you can combine a backbone, a proposal_generator, and roi_heads to build different models. Since detectron2 officially supports only the R-CNN family, the modules it ships may not be fully general.
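The composition idea can be sketched in a few lines of plain Python. Everything below is a toy illustration of the pattern, not detectron2's real API (its actual classes live in detectron2.modeling):

```python
# Toy sketch of detectron2's modular design: a meta-architecture composing a
# backbone, a proposal generator, and ROI heads. All names are illustrative.
class ToyRCNN:
    def __init__(self, backbone, proposal_generator, roi_heads):
        self.backbone = backbone                      # image -> features
        self.proposal_generator = proposal_generator  # features -> candidate boxes
        self.roi_heads = roi_heads                    # features + boxes -> detections

    def forward(self, image):
        features = self.backbone(image)
        proposals = self.proposal_generator(features)
        return self.roi_heads(features, proposals)

# Swapping any one component yields a different model without touching the rest:
model = ToyRCNN(
    backbone=lambda img: {"res4": img},                 # stand-in feature extractor
    proposal_generator=lambda feats: [(0, 0, 10, 10)],  # stand-in RPN
    roi_heads=lambda feats, props: [{"box": b} for b in props],
)
detections = model.forward("fake image")  # [{'box': (0, 0, 10, 10)}]
```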
3. Requirements
- Linux or macOS with Python ≥ 3.6
- PyTorch ≥ 1.3
- torchvision matching your PyTorch installation. You can install them together from pytorch.org to make sure of this.
- OpenCV, needed for the demo and visualization (optional)
- pycocotools
4. Setup
Official installation guide: https://detectron2.readthedocs.io/en/latest/tutorials/install.html
Detectron2 source code:
https://github.com/facebookresearch/detectron2
Detectron2 documentation:
https://detectron2.readthedocs.io/index.html
Detectron2 setup on Ubuntu (video):
https://www.bilibili.com/video/BV1bK4y1Y7Qq?from=search&seid=8204924034312637977
Detectron2 setup on Windows 10 (video):
https://www.bilibili.com/video/BV1jZ4y1W7Nb?from=search&seid=8204924034312637977
If you are interested in the Windows 10 setup, give it a try.
5. The default configuration (defaults.py) explained
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from .config import CfgNode as CN
# -----------------------------------------------------------------------------
# Convention about Training / Test specific parameters
# -----------------------------------------------------------------------------
# Whenever an argument can be either used for training or for testing, the
# corresponding name will be post-fixed by a _TRAIN for a training parameter,
# or _TEST for a test-specific parameter.
# For example, the number of images during training will be
# IMAGES_PER_BATCH_TRAIN, while the number of images for testing will be
# IMAGES_PER_BATCH_TEST
# -----------------------------------------------------------------------------
# Config definition
# -----------------------------------------------------------------------------
_C = CN()  # the root config node
# The version number, to upgrade from old configs to new ones if any
# changes happen. It's recommended to keep a VERSION in your config file.
_C.VERSION = 2
_C.MODEL = CN()
_C.MODEL.LOAD_PROPOSALS = False  # whether to load pre-computed proposals instead of generating them with an RPN
_C.MODEL.MASK_ON = False  # whether this is a segmentation task (used by Mask R-CNN)
_C.MODEL.KEYPOINT_ON = False  # whether this is a keypoint detection task
_C.MODEL.DEVICE = "cuda"  # run on GPU ("cuda") or CPU; set to "cpu" to force CPU
_C.MODEL.META_ARCHITECTURE = "GeneralizedRCNN"  # the meta-architecture, i.e. how the overall network is assembled
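# META_ARCHITECTURE is a string rather than a class because detectron2 resolves
# the name through a registry at build time. The following is a hand-rolled
# sketch of that pattern for illustration, not detectron2's actual registry:
META_ARCH_SKETCH_REGISTRY = {}

def register_meta_arch(cls):
    # decorator that registers a model class under its class name
    META_ARCH_SKETCH_REGISTRY[cls.__name__] = cls
    return cls

@register_meta_arch
class SketchRCNN:
    def __init__(self, device):
        self.device = device

def build_sketch_model(name, device):
    # the config only stores the string; the registry maps it back to a class
    return META_ARCH_SKETCH_REGISTRY[name](device)

sketch_model = build_sketch_model("SketchRCNN", "cpu")  # an instance of SketchRCNN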
# Path (a file path, or URL like detectron2://.., https://..) to a checkpoint file
# to be loaded to the model. You can find available models in the model zoo.
# detectron2 ships many pre-trained weights; after training you can also point this at your own checkpoint
_C.MODEL.WEIGHTS = ""
# Values to be used for image normalization (BGR order, since INPUT.FORMAT defaults to BGR).
# To train on images of different number of channels, just set different mean & std.
# Default values are the mean pixel value from ImageNet: [103.53, 116.28, 123.675]
_C.MODEL.PIXEL_MEAN = [103.530, 116.280, 123.675]
# When using pre-trained models in Detectron1 or any MSRA models,
# std has been absorbed into its conv1 weights, so the std needs to be set 1.
# Otherwise, you can use [57.375, 57.120, 58.395] (ImageNet std)
_C.MODEL.PIXEL_STD = [1.0, 1.0, 1.0]
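# The normalization implied by PIXEL_MEAN / PIXEL_STD is, per channel c (BGR order):
#     out[c] = (in[c] - PIXEL_MEAN[c]) / PIXEL_STD[c]
# A tiny pure-Python sketch of this (illustration only, not detectron2's code):
def normalize_pixel(bgr, mean=(103.530, 116.280, 123.675), std=(1.0, 1.0, 1.0)):
    return [(v - m) / s for v, m, s in zip(bgr, mean, std)]

# A pixel equal to the per-channel mean maps to [0.0, 0.0, 0.0]:
# normalize_pixel([103.530, 116.280, 123.675]) -> [0.0, 0.0, 0.0]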
# -----------------------------------------------------------------------------
# INPUT: input / preprocessing configuration
# -----------------------------------------------------------------------------
_C.INPUT = CN()
# Size of the smallest side of the image during training
_C.INPUT.MIN_SIZE_TRAIN = (800,)
# How the smallest-side size is sampled from INPUT.MIN_SIZE_TRAIN:
# "choice" picks one of the listed values; "range" samples uniformly from the
# interval given by INPUT.MIN_SIZE_TRAIN
_C.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
# Maximum size of the side of the image during training
_C.INPUT.MAX_SIZE_TRAIN = 1333
# Size of the smallest side of the image during testing. Set to zero to disable resize in testing.
_C.INPUT.MIN_SIZE_TEST = 800
# Maximum size of the side of the image during testing
_C.INPUT.MAX_SIZE_TEST = 1333
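# MIN_SIZE_* and MAX_SIZE_* together implement a shortest-edge resize: scale the
# image so its short side reaches MIN_SIZE, but reduce the scale if that would
# push the long side past MAX_SIZE. A simplified sketch of that rule (detectron2
# implements it in its ResizeShortestEdge transform; this re-derivation is for
# illustration only):
def shortest_edge_resize(h, w, min_size=800, max_size=1333):
    scale = min_size / min(h, w)          # bring the short side up to min_size
    if max(h, w) * scale > max_size:      # cap the long side at max_size
        scale = max_size / max(h, w)
    return round(h * scale), round(w * scale)

# shortest_edge_resize(480, 640)  -> (800, 1067): long side stays under 1333
# shortest_edge_resize(480, 1280) -> (500, 1333): the max-size cap kicks in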
# `True` if cropping is used for data augmentation during training
_C.INPUT.CROP = CN({"ENABLED": False})
# Cropping type:
# - "relative": crop a (H * CROP.SIZE[0], W * CROP.SIZE[1]) part of an input of size (H, W)
# - "relative_range": uniformly sample a relative crop size between
#   [CROP.SIZE[0], CROP.SIZE[1]] and [1, 1], then crop as in the "relative" case
# - "absolute": crop a part of the input with an absolute pixel size (CROP.SIZE[0], CROP.SIZE[1])
_C.INPUT.CROP.TYPE = "relative_range"
# Size of crop in range (0, 1] if CROP.TYPE is "relative" or "relative_range" and in number of
# pixels if CROP.TYPE is "absolute"
_C.INPUT.CROP.SIZE = [0.9, 0.9]
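# The three crop types can be summarized as a small function; this is a
# re-derivation of the semantics described in the comments above, not
# detectron2's own code:
import random

def sample_crop_size(h, w, crop_type, size):
    if crop_type == "relative":
        # fixed fraction of the input size
        return int(h * size[0]), int(w * size[1])
    if crop_type == "relative_range":
        # sample a fraction uniformly between size and [1, 1]
        return int(h * random.uniform(size[0], 1.0)), int(w * random.uniform(size[1], 1.0))
    if crop_type == "absolute":
        # fixed pixel size
        return size[0], size[1]
    raise ValueError(crop_type)

# sample_crop_size(480, 640, "relative", [0.9, 0.9]) -> (432, 576)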
# Whether the model needs RGB, YUV, HSV etc.
# Should be one of the modes defined here, as we use PIL to read the image:
# https://pillow.readthedocs.io/en/stable/handbook/concepts.html#concept-modes
# with BGR being the one exception. One can set image format to BGR, we will
# internally use RGB for conversion and flip the channels over
# There are many color modes (e.g. RGB, YUV, HSV), each suited to different
# purposes; here we only specify the mode, and detectron2 converts internally
_C.INPUT.FORMAT = "BGR"
# The ground truth mask format that the model will use.
# Mask R-CNN supports either "polygon" or "bitmask" as ground truth.
_C.INPUT.MASK_FORMAT = "polygon" # alternative: "bitmask"
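# To make the two ground-truth formats concrete: "polygon" stores mask
# boundaries as coordinate lists, while "bitmask" stores a per-pixel 0/1 array.
# A toy conversion for an axis-aligned rectangle (real rasterization, e.g. in
# pycocotools, handles arbitrary polygons; this sketch is illustration only):
def rect_polygon_to_bitmask(poly, height, width):
    # poly is a flat list [x0, y0, x1, y1, ...] describing a rectangle
    xs, ys = poly[0::2], poly[1::2]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0
             for x in range(width)] for y in range(height)]

# rect_polygon_to_bitmask([1, 1, 3, 1, 3, 3, 1, 3], 4, 4) ->
# [[0, 0, 0, 0],
#  [0, 1, 1, 0],
#  [0, 1, 1, 0],
#  [0, 0, 0, 0]]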