Reproducing Detectron2 Pretrained Models: Data Preparation, Training Commands, Log Analysis, and Output Directory

fydw_715

Published 2024-09-08 04:49:16


Article link: https://blog.csdn.net/fydw_715/article/details/142007810


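Object detection is an important task in deep learning projects. This article walks through reproducing the training of an object detection model with Detectron2. It covers training data preparation, the training commands, training log analysis, the training metrics, and the files in the training output directory and what each one is for. In particular, it shows how to continue training with the resume feature after a run is interrupted, and compares the reproduced model with the corresponding model in the Model Zoo.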

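1. Training Data Preparation

COCO (Common Objects in Context) is a widely used dataset for image recognition, object detection, and segmentation. We use it for both training and evaluation. The COCO directory is laid out as follows: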

/mnt/coco
├── annotations
├── annotations_trainval2014.zip
├── annotations_trainval2017.zip
├── test2014
├── test2014.zip
├── test2017
├── test2017.zip
├── train2014
├── train2014.zip
├── train2017
├── train2017.zip
├── val2014
├── val2014.zip
├── val2017
└── val2017.zip

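Directory and file descriptions

annotations/: holds the COCO annotation files. These are JSON files containing the image labels, bounding boxes, segmentation masks, and related information.
annotations_trainval2014.zip and annotations_trainval2017.zip: archives of the annotation files for the COCO 2014 and 2017 training and validation sets.
test2014/ and test2017/: image files of the COCO 2014 and 2017 test sets, used for model testing.
test2014.zip and test2017.zip: archives of the COCO 2014 and 2017 test set images.
train2014/ and train2017/: image files of the COCO 2014 and 2017 training sets, used for model training.
train2014.zip and train2017.zip: archives of the COCO 2014 and 2017 training set images.
val2014/ and val2017/: image files of the COCO 2014 and 2017 validation sets, used for model validation.
val2014.zip and val2017.zip: archives of the COCO 2014 and 2017 validation set images.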
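The extracted directories listed above sit next to their archives, which suggests the archives were unpacked in place. A minimal sketch of that step, assuming the archives already sit in the dataset root; the helper name `extract_archives` is hypothetical and not part of Detectron2:

```python
import zipfile
from pathlib import Path


def extract_archives(root, names):
    """Extract each named .zip found in root into root; return those extracted."""
    root = Path(root)
    done = []
    for name in names:
        archive = root / name
        if not archive.is_file():
            continue  # skip archives that were not downloaded
        with zipfile.ZipFile(archive) as zf:
            # Each COCO archive carries its own top-level folder
            # (e.g. val2017/), so extracting into root recreates the layout.
            zf.extractall(root)
        done.append(name)
    return done
```

For the 2017 split used in this article, a call might look like `extract_archives("/mnt/coco", ["train2017.zip", "val2017.zip", "annotations_trainval2017.zip"])`.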

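2. Training Commands

Before training, set the environment variable that tells Detectron2 where the datasets live. Detectron2 looks for a coco/ directory under this path, so the value is /mnt/ rather than /mnt/coco: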

export DETECTRON2_DATASETS=/mnt/
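Before launching a long run, it can help to confirm that the files the built-in `coco_2017_train` and `coco_2017_val` datasets read are actually present under the dataset root. The following is a small sketch of such a check; the function name `missing_coco_paths` is an assumption made for illustration, and the check only looks at path existence, not at file contents:

```python
import os
from pathlib import Path

# Paths read by Detectron2's built-in coco_2017_train / coco_2017_val
# datasets, relative to the DETECTRON2_DATASETS root.
REQUIRED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
]


def missing_coco_paths(root=None):
    """Return the required COCO 2017 paths that do not exist under root."""
    if root is None:
        # Detectron2 falls back to ./datasets when the variable is unset.
        root = os.environ.get("DETECTRON2_DATASETS", "datasets")
    base = Path(root).expanduser()
    return [rel for rel in REQUIRED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths()
    if missing:
        print("Missing paths:", ", ".join(missing))
    else:
        print("COCO 2017 layout looks complete.")
```

With `DETECTRON2_DATASETS=/mnt/`, the check resolves to `/mnt/coco/...`, which matches the annotation path that appears in the training log.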

First training command

nohup ./train_net.py --config-file ../configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml --num-gpus 8 OUTPUT_DIR /mnt/output/ > train.log 2>&1 &

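nohup: makes the command ignore the hangup signal (SIGHUP), so it keeps running after the terminal is closed.
./train_net.py: the training script, which launches the training process.
--config-file: path to the configuration file.
--num-gpus 8: train on 8 GPUs.
OUTPUT_DIR /mnt/output/: a config override given as a KEY VALUE pair; here it sets the output directory.
> train.log 2>&1: redirects standard output and standard error to train.log.
&: runs the command in the background.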
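The trailing `OUTPUT_DIR /mnt/output/` is not a flag: everything after the named options is collected as a flat list of KEY VALUE pairs and merged into the config, with dotted keys such as `MODEL.WEIGHTS` addressing nested sections. The sketch below illustrates that idea on a plain nested dict. It is a simplified stand-in, not the actual config implementation (which also casts value types), and `apply_overrides` is a hypothetical name:

```python
def apply_overrides(cfg, opts):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for name in parents:
            node = node[name]  # KeyError here means an unknown config section
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = value
    return cfg


# Example values only; a real config has many more keys.
cfg = {
    "OUTPUT_DIR": "./output",
    "MODEL": {"WEIGHTS": "detectron2://ImageNetPretrained/MSRA/R-50.pkl"},
}
apply_overrides(cfg, ["OUTPUT_DIR", "/mnt/output/"])
print(cfg["OUTPUT_DIR"])
```

Rejecting unknown keys mirrors the useful property of config overrides: a typo in a key name fails early instead of being silently ignored.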

Second training command (using the resume feature)

If training is interrupted, it can be continued with the resume feature:

nohup ./train_net.py --config-file ../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml --num-gpus 8 --resume OUTPUT_DIR /mnt/output/ MODEL.WEIGHTS /mnt/output/model_0029999.pth > train.log 2>&1 &

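--config-file: path to the configuration file. Note that this command points at the 3x schedule config (faster_rcnn_R_50_FPN_3x.yaml), whereas the first command used the 1x config.
--resume: continue training from where the previous run stopped. The model weights, optimizer state, scheduler state, and iteration counter are all restored from the last checkpoint in OUTPUT_DIR.
MODEL.WEIGHTS: path to the weights file to load. Here it points at the checkpoint saved at iteration 29999 rather than at the ImageNet-pretrained backbone. When --resume finds a last_checkpoint file in OUTPUT_DIR, the checkpoint named there takes precedence and MODEL.WEIGHTS is ignored.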
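How `--resume` and `MODEL.WEIGHTS` interact can be summarized in a few lines. The sketch below is a simplified reading of the checkpointer's resume rule, written with the standard library only; `pick_checkpoint` is a hypothetical helper, not a Detectron2 function:

```python
import os


def pick_checkpoint(output_dir, model_weights, resume):
    """Return (path, restore_training_state) following the resume rule.

    When resuming and output_dir contains a `last_checkpoint` file, the
    checkpoint named in that file wins and model_weights is ignored.
    """
    marker = os.path.join(output_dir, "last_checkpoint")
    if resume and os.path.isfile(marker):
        with open(marker) as f:
            last_saved = f.read().strip()
        # Optimizer, scheduler and iteration counter are restored as well.
        return os.path.join(output_dir, last_saved), True
    # No resume (or nothing to resume from): only model weights are loaded
    # and training starts from iteration 0.
    return model_weights, False
```

This is why a resumed run that loads `model_0029999.pth` reports that it starts training from iteration 30000: the iteration counter is part of the restored state.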

3. Training Log Analysis
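The metric lines that `d2.utils.events` writes to the log below can be cross-checked against the SOLVER settings in the same log. The sketch here parses such a line into a dict and compares the logged `lr` with a linear-warmup multi-step schedule built from BASE_LR, GAMMA, STEPS, WARMUP_ITERS, and WARMUP_FACTOR. The helper names `parse_metrics` and `expected_lr` are assumptions for illustration, and the schedule formula is a plain re-implementation rather than a call into Detectron2:

```python
import math
import re
from bisect import bisect_right

# Schedule values copied from the SOLVER section of the config dump.
BASE_LR = 0.02
GAMMA = 0.1
STEPS = (210000, 250000)
WARMUP_ITERS = 1000
WARMUP_FACTOR = 0.001

PAIR = re.compile(r"(\w+):\s+(\S+)")


def parse_metrics(line):
    """Extract the `key: value` pairs that follow the logger prefix."""
    _, _, body = line.partition("]: ")
    out = {}
    for key, raw in PAIR.findall(body):
        try:
            out[key] = float(raw)
        except ValueError:
            out[key] = raw  # e.g. eta "13:29:58" or max_mem "2903M"
    return out


def expected_lr(it):
    """Learning rate of a linear-warmup multi-step schedule at iteration it."""
    lr = BASE_LR * GAMMA ** bisect_right(STEPS, it)
    if it < WARMUP_ITERS:
        alpha = it / WARMUP_ITERS
        lr *= WARMUP_FACTOR * (1 - alpha) + alpha
    return lr


# Two lines copied from the log: one right after resuming, one near the end.
FIRST = (
    "[09/06 02:17:35 d2.utils.events]:  eta: 13:29:58  iter: 30019  "
    "total_loss: 0.6783  loss_cls: 0.2512  loss_box_reg: 0.2507  "
    "loss_rpn_cls: 0.05119  loss_rpn_loc: 0.08998    time: 0.2031  "
    "last_time: 0.2046  data_time: 0.4823  last_data_time: 0.0075   "
    "lr: 0.02  max_mem: 2903M"
)
LATE = (
    "[09/06 15:45:25 d2.utils.events]:  eta: 0:01:19  iter: 269599  "
    "total_loss: 0.4819  loss_cls: 0.1778  loss_box_reg: 0.221  "
    "loss_rpn_cls: 0.02964  loss_rpn_loc: 0.05748    time: 0.1988  "
    "last_time: 0.2063  data_time: 0.0083  last_data_time: 0.0076   "
    "lr: 0.0002  max_mem: 2904M"
)

for line in (FIRST, LATE):
    metrics = parse_metrics(line)
    it = int(metrics["iter"])
    matches = math.isclose(metrics["lr"], expected_lr(it))
    print(it, metrics["lr"], "matches schedule:", matches)
```

The same dict also gives access to the loss terms and timing fields, so the parser can be reused to plot `total_loss` over iterations from train.log.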

nohup: ignoring input
Command Line Args: Namespace(config_file='../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=['OUTPUT_DIR', '/mnt/output/', 'MODEL.WEIGHTS', '/mnt/output/model_0029999.pth'], resume=True)
[09/06 02:16:26 detectron2]: Rank of current process: 0. World size: 8
[09/06 02:16:30 detectron2]: Environment info:
-------------------------------  --------------------------------------------------------------
sys.platform                     linux
Python                           3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
numpy                            1.22.2
detectron2                       0.6 @/root/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 12.0
detectron2 arch flags            5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6, 9.0
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          1.14.0a0+44dac51 @/usr/local/lib/python3.8/dist-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  True
GPU available                    Yes
GPU 0,1,2,3,4,5,6,7              Tesla V100-SXM2-16GB (arch=7.0)
Driver version                   535.161.08
CUDA_HOME                        /usr/local/cuda
Pillow                           9.2.0
torchvision                      0.15.0a0 @/usr/local/lib/python3.8/dist-packages/torchvision
torchvision arch flags           5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6, 9.0
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.6.0
-------------------------------  --------------------------------------------------------------
PyTorch built with:
  - GCC 9.4
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.0 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 12.0
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
  - CuDNN 8.7  (built against CUDA 11.8)
  - Magma 2.6.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.0, CUDNN_VERSION=8.7.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=1.14.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

[09/06 02:16:30 detectron2]: Command line arguments: Namespace(config_file='../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=['OUTPUT_DIR', '/mnt/output/', 'MODEL.WEIGHTS', '/mnt/output/model_0029999.pth'], resume=True)
[09/06 02:16:30 detectron2]: Contents of args.config_file=../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml:
_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000

[09/06 02:16:30 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_SQRT: true
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - coco_2017_val
  TRAIN:
  - coco_2017_train
GLOBAL:
  HACK: 1.0
INPUT:
  CROP:
    ENABLED: false
    SIZE:
    - 0.9
    - 0.9
    TYPE: relative_range
  FORMAT: BGR
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
    - - 64
    - - 128
    - - 256
    - - 512
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_resnet_fpn_backbone
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    NORM: ''
    OUT_CHANNELS: 256
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: false
  META_ARCHITECTURE: GeneralizedRCNN
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 103.53
  - 116.28
  - 123.675
  PIXEL_STD:
  - 1.0
  - 1.0
  - 1.0
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res2
    - res3
    - res4
    - res5
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS: &id002
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - &id001
      - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id001
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    FED_LOSS_FREQ_WEIGHT_POWER: 0.5
    FED_LOSS_NUM_CLASSES: 50
    NAME: FastRCNNConvFCHead
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
    USE_FED_LOSS: false
    USE_SIGMOID_CE: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: StandardROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id002
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    - p6
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 1000
    PRE_NMS_TOPK_TEST: 1000
    PRE_NMS_TOPK_TRAIN: 2000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  WEIGHTS: /mnt/output/model_0029999.pth
OUTPUT_DIR: /mnt/output/
SEED: -1
SOLVER:
  AMP:
    ENABLED: false
  BASE_LR: 0.02
  BASE_LR_END: 0.0
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  CLIP_GRADIENTS:
    CLIP_TYPE: value
    CLIP_VALUE: 1.0
    ENABLED: false
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 16
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 270000
  MOMENTUM: 0.9
  NESTEROV: false
  NUM_DECAYS: 3
  REFERENCE_WORLD_SIZE: 0
  RESCALE_INTERVAL: false
  STEPS:
  - 210000
  - 250000
  WARMUP_FACTOR: 0.001
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 0
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0

[09/06 02:16:30 detectron2]: Full config saved to /mnt/output/config.yaml
[09/06 02:16:30 d2.utils.env]: Using a generated random seed 30790687
[09/06 02:16:32 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv1): Conv2d(
            64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
      )
      (res3): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv1): Conv2d(
            256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
      )
      (res4): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
          (conv1): Conv2d(
            512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (4): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (5): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
      )
      (res5): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
          (conv1): Conv2d(
            1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
      )
    )
  )
  (proposal_generator): RPN(
    (rpn_head): StandardRPNHead(
      (conv): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
    (anchor_generator): DefaultAnchorGenerator(
      (cell_anchors): BufferList()
    )
  )
  (roi_heads): StandardROIHeads(
    (box_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (box_head): FastRCNNConvFCHead(
      (flatten): Flatten(start_dim=1, end_dim=-1)
      (fc1): Linear(in_features=12544, out_features=1024, bias=True)
      (fc_relu1): ReLU()
      (fc2): Linear(in_features=1024, out_features=1024, bias=True)
      (fc_relu2): ReLU()
    )
    (box_predictor): FastRCNNOutputLayers(
      (cls_score): Linear(in_features=1024, out_features=81, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
    )
  )
)
[09/06 02:16:53 d2.data.datasets.coco]: Loading /mnt/coco/annotations/instances_train2017.json takes 21.55 seconds.
[09/06 02:16:55 d2.data.datasets.coco]: Loaded 118287 images in COCO format from /mnt/coco/annotations/instances_train2017.json
[09/06 02:17:05 d2.data.build]: Removed 1021 images with no usable annotations. 117266 images left.
[09/06 02:17:10 d2.data.build]: Distribution of instances among all 80 categories:
|   category    | #instances   |   category   | #instances   |   category    | #instances   |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
|    person     | 257253       |   bicycle    | 7056         |      car      | 43533        |
|  motorcycle   | 8654         |   airplane   | 5129         |      bus      | 6061         |
|     train     | 4570         |    truck     | 9970         |     boat      | 10576        |
| traffic light | 12842        | fire hydrant | 1865         |   stop sign   | 1983         |
| parking meter | 1283         |    bench     | 9820         |     bird      | 10542        |
|      cat      | 4766         |     dog      | 5500         |     horse     | 6567         |
|     sheep     | 9223         |     cow      | 8014         |   elephant    | 5484         |
|     bear      | 1294         |    zebra     | 5269         |    giraffe    | 5128         |
|   backpack    | 8714         |   umbrella   | 11265        |    handbag    | 12342        |
|      tie      | 6448         |   suitcase   | 6112         |    frisbee    | 2681         |
|     skis      | 6623         |  snowboard   | 2681         |  sports ball  | 6299         |
|     kite      | 8802         | baseball bat | 3273         | baseball gl.. | 3747         |
|  skateboard   | 5536         |  surfboard   | 6095         | tennis racket | 4807         |
|    bottle     | 24070        |  wine glass  | 7839         |      cup      | 20574        |
|     fork      | 5474         |    knife     | 7760         |     spoon     | 6159         |
|     bowl      | 14323        |    banana    | 9195         |     apple     | 5776         |
|   sandwich    | 4356         |    orange    | 6302         |   broccoli    | 7261         |
|    carrot     | 7758         |   hot dog    | 2884         |     pizza     | 5807         |
|     donut     | 7005         |     cake     | 6296         |     chair     | 38073        |
|     couch     | 5779         | potted plant | 8631         |      bed      | 4192         |
| dining table  | 15695        |    toilet    | 4149         |      tv       | 5803         |
|    laptop     | 4960         |    mouse     | 2261         |    remote     | 5700         |
|   keyboard    | 2854         |  cell phone  | 6422         |   microwave   | 1672         |
|     oven      | 3334         |   toaster    | 225          |     sink      | 5609         |
| refrigerator  | 2634         |     book     | 24077        |     clock     | 6320         |
|     vase      | 6577         |   scissors   | 1464         |  teddy bear   | 4729         |
|  hair drier   | 198          |  toothbrush  | 1945         |               |              |
|     total     | 849949       |              |              |               |              |
[09/06 02:17:10 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[09/06 02:17:10 d2.data.build]: Using training sampler TrainingSampler
[09/06 02:17:11 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[09/06 02:17:11 d2.data.common]: Serializing 117266 elements to byte tensors and concatenating them all ...
[09/06 02:17:16 d2.data.common]: Serialized dataset takes 450.77 MiB
[09/06 02:17:16 d2.data.build]: Making batched data loader with batch_size=2
[09/06 02:17:19 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /mnt/output/model_0029999.pth ...
[09/06 02:17:19 fvcore.common.checkpoint]: [Checkpointer] Loading from /mnt/output/model_0029999.pth ...
[09/06 02:17:19 fvcore.common.checkpoint]: Loading trainer from /mnt/output/model_0029999.pth ...
[09/06 02:17:19 d2.engine.hooks]: Loading scheduler from state_dict ...
[09/06 02:17:20 d2.engine.train_loop]: Starting training from iteration 30000
[09/06 02:17:35 d2.utils.events]:  eta: 13:29:58  iter: 30019  total_loss: 0.6783  loss_cls: 0.2512  loss_box_reg: 0.2507  loss_rpn_cls: 0.05119  loss_rpn_loc: 0.08998    time: 0.2031  last_time: 0.2046  data_time: 0.4823  last_data_time: 0.0075   lr: 0.02  max_mem: 2903M
[09/06 02:17:39 d2.utils.events]:  eta: 13:29:54  iter: 30039  total_loss: 0.7224  loss_cls: 0.2514  loss_box_reg: 0.2828  loss_rpn_cls: 0.05798  loss_rpn_loc: 0.09668    time: 0.2028  last_time: 0.2075  data_time: 0.0088  last_data_time: 0.0105   lr: 0.02  max_mem: 2903M
[09/06 02:17:43 d2.utils.events]:  eta: 13:22:12  iter: 30059  total_loss: 0.6778  loss_cls: 0.2609  loss_box_reg: 0.28  loss_rpn_cls: 0.0531  loss_rpn_loc: 0.08019    time: 0.2015  last_time: 0.1991  data_time: 0.0087  last_data_time: 0.0075   lr: 0.02  max_mem: 2903M
[09/06 02:17:47 d2.utils.events]:  eta: 13:20:22  iter: 30079  total_loss: 0.6472  loss_cls: 0.2415  loss_box_reg: 0.2495  loss_rpn_cls: 0.04993  loss_rpn_loc: 0.09604    time: 0.2003  last_time: 0.1824  data_time: 0.0092  last_data_time: 0.0114   lr: 0.02  max_mem: 2903M
[09/06 02:17:51 d2.utils.events]:  eta: 13:20:18  iter: 30099  total_loss: 0.6032  loss_cls: 0.2444  loss_box_reg: 0.2445  loss_rpn_cls: 0.05288  loss_rpn_loc: 0.0732    time: 0.2008  last_time: 0.2081  data_time: 0.0090  last_data_time: 0.0170   lr: 0.02  max_mem: 2904M
[09/06 02:17:55 d2.utils.events]:  eta: 13:20:14  iter: 30119  total_loss: 0.5806  loss_cls: 0.2233  loss_box_reg: 0.2357  loss_rpn_cls: 0.04176  loss_rpn_loc: 0.07658    time: 0.2006  last_time: 0.2017  data_time: 0.0080  last_data_time: 0.0070   lr: 0.02  max_mem: 2904M
[09/06 15:45:25 d2.utils.events]:  eta: 0:01:19  iter: 269599  total_loss: 0.4819  loss_cls: 0.1778  loss_box_reg: 0.221  loss_rpn_cls: 0.02964  loss_rpn_loc: 0.05748    time: 0.1988  last_time: 0.2063  data_time: 0.0083  last_data_time: 0.0076   lr: 0.0002  max_mem: 2904M
[09/06 15:45:29 d2.utils.events]:  eta: 0:01:15  iter: 269619  total_loss: 0.4636  loss_cls: 0.1539  loss_box_reg: 0.2098  loss_rpn_cls: 0.02256  loss_rpn_loc: 0.06406    time: 0.1988  last_time: 0.2030  data_time: 0.0086  last_data_time: 0.0077   lr: 0.0002  max_mem: 2904M
[09/06 15:45:33 d2.utils.events]:  eta: 0:01:11  iter: 269639  total_loss: 0.5086  loss_cls: 0.1783  loss_box_reg: 0.2321  loss_rpn_cls: 0.02421  loss_rpn_loc: 0.06799    time: 0.1988  last_time: 0.1933  data_time: 0.0089  last_data_time: 0.0124   lr: 0.0002  max_mem: 2904M
[09/06 15:45:37 d2.utils.events]:  eta: 0:01:07  iter: 269659  total_loss: 0.4706  loss_cls: 0.1592  loss_box_reg: 0.2124  loss_rpn_cls: 0.02371  loss_rpn_loc: 0.06897    time: 0.1988  last_time: 0.2003  data_time: 0.0083  last_data_time: 0.0089   lr: 0.0002  max_mem: 2904M
[09/06 15:45:41 d2.utils.events]:  eta: 0:01:03  iter: 269679  total_loss: 0.4713  loss_cls: 0.1709  loss_box_reg: 0.2129  loss_rpn_cls: 0.02803  loss_rpn_loc: 0.06166    time: 0.1988  last_time: 0.1977  data_time: 0.0081  last_data_time: 0.0068   lr: 0.0002  max_mem: 2904M
[09/06 15:45:45 d2.utils.events]:  eta: 0:00:59  iter: 269699  total_loss: 0.4516  loss_cls: 0.1572  loss_box_reg: 0.2083  loss_rpn_cls: 0.02273  loss_rpn_loc: 0.06108    time: 0.1988  last_time: 0.1891  data_time: 0.0086  last_data_time: 0.0120   lr: 0.0002  max_mem: 2904M
[09/06 15:45:49 d2.utils.events]:  eta: 0:00:55  iter: 269719  total_loss: 0.4771  loss_cls: 0.1766  loss_box_reg: 0.2144  loss_rpn_cls: 0.02578  loss_rpn_loc: 0.06036    time: 0.1988  last_time: 0.1882  data_time: 0.0081  last_data_time: 0.0060   lr: 0.0002  max_mem: 2904M
[09/06 15:45:53 d2.utils.events]:  eta: 0:00:51  iter: 269739  total_loss: 0.4586  loss_cls: 0.168  loss_box_reg: 0.2119  loss_rpn_cls: 0.02173  loss_rpn_loc: 0.05767    time: 0.1988  last_time: 0.1921  data_time: 0.0091  last_data_time: 0.0063   lr: 0.0002  max_mem: 2904M
[09/06 15:45:57 d2.utils.events]:  eta: 0:00:47  iter: 269759  total_loss: 0.442  loss_cls: 0.1605  loss_box_reg: 0.2144  loss_rpn_cls: 0.02168  loss_rpn_loc: 0.05123    time: 0.1988  last_time: 0.1978  data_time: 0.0084  last_data_time: 0.0089   lr: 0.0002  max_mem: 2904M
[09/06 15:46:01 d2.utils.events]:  eta: 0:00:43  iter: 269779  total_loss: 0.4803  loss_cls: 0.1671  loss_box_reg: 0.2056  loss_rpn_cls: 0.02325  loss_rpn_loc: 0.06247    time: 0.1988  last_time: 0.1842  data_time: 0.0091  last_data_time: 0.0063   lr: 0.0002  max_mem: 2904M
[09/06 15:46:05 d2.utils.events]:  eta: 0:00:39  iter: 269799  total_loss: 0.4994  loss_cls: 0.181  loss_box_reg: 0.2173  loss_rpn_cls: 0.02877  loss_rpn_loc: 0.06976    time: 0.1988  last_time: 0.2037  data_time: 0.0082  last_data_time: 0.0078   lr: 0.0002  max_mem: 2904M
[09/06 15:46:09 d2.utils.events]:  eta: 0:00:35  iter: 269819  total_loss: 0.4605  loss_cls: 0.162  loss_box_reg: 0.2145  loss_rpn_cls: 0.02834  loss_rpn_loc: 0.06403    time: 0.1988  last_time: 0.2047  data_time: 0.0078  last_data_time: 0.0067   lr: 0.0002  max_mem: 2904M
[09/06 15:46:13 d2.utils.events]:  eta: 0:00:31  iter: 269839  total_loss: 0.5042  loss_cls: 0.1746  loss_box_reg: 0.2268  loss_rpn_cls: 0.02664  loss_rpn_loc: 0.05538    time: 0.1988  last_time: 0.2013  data_time: 0.0087  last_data_time: 0.0077   lr: 0.0002  max_mem: 2904M
[09/06 15:46:17 d2.utils.events]:  eta: 0:00:27  iter: 269859  total_loss: 0.4772  loss_cls: 0.1592  loss_box_reg: 0.2132  loss_rpn_cls: 0.02413  loss_rpn_loc: 0.05851    time: 0.1988  last_time: 0.1826  data_time: 0.0074  last_data_time: 0.0107   lr: 0.0002  max_mem: 2904M
[09/06 15:46:21 d2.utils.events]:  eta: 0:00:23  iter: 269879  total_loss: 0.4978  loss_cls: 0.1759  loss_box_reg: 0.2295  loss_rpn_cls: 0.02774  loss_rpn_loc: 0.07485    time: 0.1988  last_time: 0.2152  data_time: 0.0080  last_data_time: 0.0072   lr: 0.0002  max_mem: 2904M
[09/06 15:46:26 d2.utils.events]:  eta: 0:00:19  iter: 269899  total_loss: 0.4582  loss_cls: 0.157  loss_box_reg: 0.2078  loss_rpn_cls: 0.02078  loss_rpn_loc: 0.05431    time: 0.1988  last_time: 0.2094  data_time: 0.0076  last_data_time: 0.0074   lr: 0.0002  max_mem: 2904M
[09/06 15:46:30 d2.utils.events]:  eta: 0:00:15  iter: 269919  total_loss: 0.477  loss_cls: 0.1648  loss_box_reg: 0.2149  loss_rpn_cls: 0.02556  loss_rpn_loc: 0.06299    time: 0.1988  last_time: 0.1939  data_time: 0.0075  last_data_time: 0.0061   lr: 0.0002  max_mem: 2904M
[09/06 15:46:34 d2.utils.events]:  eta: 0:00:11  iter: 269939  total_loss: 0.4678  loss_cls: 0.1682  loss_box_reg: 0.2207  loss_rpn_cls: 0.02335  loss_rpn_loc: 0.06278    time: 0.1988  last_time: 0.1984  data_time: 0.0086  last_data_time: 0.0074   lr: 0.0002  max_mem: 2904M
[09/06 15:46:38 d2.utils.events]:  eta: 0:00:07  iter: 269959  total_loss: 0.4705  loss_cls: 0.1607  loss_box_reg: 0.2123  loss_rpn_cls: 0.02339  loss_rpn_loc: 0.06207    time: 0.1988  last_time: 0.1914  data_time: 0.0090  last_data_time: 0.0083   lr: 0.0002  max_mem: 2904M
[09/06 15:46:42 d2.utils.events]:  eta: 0:00:03  iter: 269979  total_loss: 0.4843  loss_cls: 0.168  loss_box_reg: 0.2255  loss_rpn_cls: 0.0248  loss_rpn_loc: 0.07147    time: 0.1988  last_time: 0.2150  data_time: 0.0081  last_data_time: 0.0128   lr: 0.0002  max_mem: 2904M
[09/06 15:46:46 fvcore.common.checkpoint]: Saving checkpoint to /mnt/output/model_0269999.pth
[09/06 15:46:47 fvcore.common.checkpoint]: Saving checkpoint to /mnt/output/model_final.pth
[09/06 15:46:48 d2.utils.events]:  eta: 0:00:00  iter: 269999  total_loss: 0.4217  loss_cls: 0.1577  loss_box_reg: 0.191  loss_rpn_cls: 0.02127  loss_rpn_loc: 0.0584    time: 0.1988  last_time: 0.2100  data_time: 0.0084  last_data_time: 0.0062   lr: 0.0002  max_mem: 2904M
[09/06 15:46:48 d2.engine.hooks]: Overall training speed: 239998 iterations in 13:15:06 (0.1988 s / it)
[09/06 15:46:48 d2.engine.hooks]: Total training time: 13:29:17 (0:14:10 on hooks)
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[09/06 15:46:49 d2.data.datasets.coco]: Loaded 5000 images in COCO format from /mnt/coco/annotations/instances_val2017.json
[09/06 15:46:49 d2.data.build]: Distribution of instances among all 80 categories:
|   category    | #instances   |   category   | #instances   |   category    | #instances   |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
|    person     | 10777        |   bicycle    | 314          |      car      | 1918         |
|  motorcycle   | 367          |   airplane   | 143          |      bus      | 283          |
|     train     | 190          |    truck     | 414          |     boat      | 424          |
| traffic light | 634          | fire hydrant | 101          |   stop sign   | 75           |
| parking meter | 60           |    bench     | 411          |     bird      | 427          |
|      cat      | 202          |     dog      | 218          |     horse     | 272          |
|     sheep     | 354          |     cow      | 372          |   elephant    | 252          |
|     bear      | 71           |    zebra     | 266          |    giraffe    | 232          |
|   backpack    | 371          |   umbrella   | 407          |    handbag    | 540          |
|      tie      | 252          |   suitcase   | 299          |    frisbee    | 115          |
|     skis      | 241          |  snowboard   | 69           |  sports ball  | 260          |
|     kite      | 327          | baseball bat | 145          | baseball gl.. | 148          |
|  skateboard   | 179          |  surfboard   | 267          | tennis racket | 225          |
|    bottle     | 1013         |  wine glass  | 341          |      cup      | 895          |
|     fork      | 215          |    knife     | 325          |     spoon     | 253          |
|     bowl      | 623          |    banana    | 370          |     apple     | 236          |
|   sandwich    | 177          |    orange    | 285          |   broccoli    | 312          |
|    carrot     | 365          |   hot dog    | 125          |     pizza     | 284          |
|     donut     | 328          |     cake     | 310          |     chair     | 1771         |
|     couch     | 261          | potted plant | 342          |      bed      | 163          |
| dining table  | 695          |    toilet    | 179          |      tv       | 288          |
|    laptop     | 231          |    mouse     | 106          |    remote     | 283          |
|   keyboard    | 153          |  cell phone  | 262          |   microwave   | 55           |
|     oven      | 143          |   toaster    | 9            |     sink      | 225          |
| refrigerator  | 126          |     book     | 1129         |     clock     | 267          |
|     vase      | 274          |   scissors   | 36           |  teddy bear   | 190          |
|  hair drier   | 11           |  toothbrush  | 57           |               |              |
|     total     | 36335        |              |              |               |              |
[09/06 15:46:49 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[09/06 15:46:49 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[09/06 15:46:49 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[09/06 15:46:50 d2.data.common]: Serialized dataset takes 19.08 MiB
[09/06 15:46:50 d2.evaluation.evaluator]: Start inference on 625 batches
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[09/06 15:46:54 d2.evaluation.evaluator]: Inference done 11/625. Dataloading: 0.0013 s/iter. Inference: 0.0479 s/iter. Eval: 0.0003 s/iter. Total: 0.0495 s/iter. ETA=0:00:30
[09/06 15:46:59 d2.evaluation.evaluator]: Inference done 121/625. Dataloading: 0.0018 s/iter. Inference: 0.0437 s/iter. Eval: 0.0003 s/iter. Total: 0.0459 s/iter. ETA=0:00:23
[09/06 15:47:04 d2.evaluation.evaluator]: Inference done 218/625. Dataloading: 0.0018 s/iter. Inference: 0.0463 s/iter. Eval: 0.0004 s/iter. Total: 0.0486 s/iter. ETA=0:00:19
[09/06 15:47:09 d2.evaluation.evaluator]: Inference done 327/625. Dataloading: 0.0019 s/iter. Inference: 0.0454 s/iter. Eval: 0.0004 s/iter. Total: 0.0477 s/iter. ETA=0:00:14
[09/06 15:47:14 d2.evaluation.evaluator]: Inference done 439/625. Dataloading: 0.0019 s/iter. Inference: 0.0447 s/iter. Eval: 0.0004 s/iter. Total: 0.0470 s/iter. ETA=0:00:08
[09/06 15:47:19 d2.evaluation.evaluator]: Inference done 548/625. Dataloading: 0.0018 s/iter. Inference: 0.0446 s/iter. Eval: 0.0004 s/iter. Total: 0.0468 s/iter. ETA=0:00:03
[09/06 15:47:23 d2.evaluation.evaluator]: Total inference time: 0:00:29.375996 (0.047381 s / iter per device, on 8 devices)
[09/06 15:47:23 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:27 (0.044494 s / iter per device, on 8 devices)
[09/06 15:47:25 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[09/06 15:47:25 d2.evaluation.coco_evaluation]: Saving results to /mnt/output/inference/coco_instances_results.json
[09/06 15:47:26 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.87s)
creating index...
index created!
[09/06 15:47:27 d2.evaluation.fast_eval_api]: Evaluate annotation type *bbox*
[09/06 15:47:38 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 11.66 seconds.
[09/06 15:47:39 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[09/06 15:47:40 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 1.11 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.401
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.608
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.435
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.238
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.434
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.326
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.537
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.350
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.573
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
[09/06 15:47:40 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 40.064 | 60.844 | 43.457 | 23.807 | 43.418 | 52.071 |
[09/06 15:47:40 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 54.495 | bicycle      | 30.683 | car            | 44.135 |
| motorcycle    | 42.270 | airplane     | 63.287 | bus            | 63.137 |
| train         | 60.485 | truck        | 33.669 | boat           | 26.893 |
| traffic light | 27.209 | fire hydrant | 66.158 | stop sign      | 65.836 |
| parking meter | 43.886 | bench        | 24.038 | bird           | 36.551 |
| cat           | 62.168 | dog          | 58.581 | horse          | 56.822 |
| sheep         | 49.903 | cow          | 53.541 | elephant       | 59.534 |
| bear          | 68.135 | zebra        | 65.108 | giraffe        | 64.143 |
| backpack      | 15.814 | umbrella     | 37.993 | handbag        | 14.656 |
| tie           | 32.060 | suitcase     | 37.098 | frisbee        | 63.240 |
| skis          | 22.691 | snowboard    | 32.603 | sports ball    | 46.379 |
| kite          | 41.527 | baseball bat | 27.225 | baseball glove | 34.695 |
| skateboard    | 49.108 | surfboard    | 35.116 | tennis racket  | 47.308 |
| bottle        | 38.765 | wine glass   | 35.143 | cup            | 40.831 |
| fork          | 34.533 | knife        | 17.379 | spoon          | 15.792 |
| bowl          | 40.787 | banana       | 23.224 | apple          | 19.382 |
| sandwich      | 31.478 | orange       | 29.419 | broccoli       | 21.541 |
| carrot        | 21.991 | hot dog      | 30.782 | pizza          | 50.570 |
| donut         | 42.976 | cake         | 34.088 | chair          | 26.252 |
| couch         | 39.110 | potted plant | 26.181 | bed            | 37.305 |
| dining table  | 26.871 | toilet       | 58.641 | tv             | 54.275 |
| laptop        | 57.916 | mouse        | 62.390 | remote         | 30.654 |
| keyboard      | 52.291 | cell phone   | 33.529 | microwave      | 52.223 |
| oven          | 31.390 | toaster      | 44.195 | sink           | 37.394 |
| refrigerator  | 52.529 | book         | 15.838 | clock          | 47.557 |
| vase          | 37.813 | scissors     | 23.726 | teddy bear     | 43.291 |
| hair drier    | 4.950  | toothbrush   | 21.977 |                |        |
[09/06 15:47:41 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format:
[09/06 15:47:41 d2.evaluation.testing]: copypaste: Task: bbox
[09/06 15:47:41 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[09/06 15:47:41 d2.evaluation.testing]: copypaste: 40.0645,60.8442,43.4570,23.8066,43.4178,52.0706

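The training log records detailed information about the run, including environment information, the contents of the config file, the model definition, dataset loading, and the training process itself. The key parts of the log are excerpted and explained below: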

Environment information

[09/06 02:16:30 detectron2]: Environment info:
-------------------------------  --------------------------------------------------------------
sys.platform                     linux
Python                           3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
numpy                            1.22.2
detectron2                       0.6 @/root/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 12.0
...
-------------------------------  --------------------------------------------------------------

Config file contents

[09/06 02:16:30 detectron2]: Contents of args.config_file=../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml:
_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
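
The STEPS and MAX_ITER values above describe a step-decay learning-rate schedule. As a minimal sketch, assuming a base learning rate of 0.02 (the `lr: 0.02` value logged near iteration 30000) and a decay factor of 0.1 per step (consistent with the `lr: 0.0002` logged near the end of training), and ignoring warmup, the learning rate at a given iteration could be computed like this. The names `BASE_LR`, `GAMMA`, and `lr_at` are ours for illustration and do not come from the config file:

```python
from bisect import bisect_right

BASE_LR = 0.02             # assumption: taken from the `lr: 0.02` value in the log
GAMMA = 0.1                # assumption: decay factor applied at each step
STEPS = (210000, 250000)   # SOLVER.STEPS from the config excerpt
MAX_ITER = 270000          # SOLVER.MAX_ITER from the config excerpt


def lr_at(iteration: int) -> float:
    """Return the step-decayed learning rate at `iteration` (warmup ignored)."""
    if not 0 <= iteration < MAX_ITER:
        raise ValueError(f"iteration must be in [0, {MAX_ITER})")
    # Count how many decay steps have already been reached.
    steps_passed = bisect_right(STEPS, iteration)
    return BASE_LR * GAMMA ** steps_passed


for it in (30059, 210000, 269599):
    print(f"iter {it}: lr = {lr_at(it):.4g}")
```

Under these assumptions the sketch gives 0.02 at iteration 30059 and 0.0002 at iteration 269599, which agrees with the `lr` column in the log lines shown earlier.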

Model definition

[09/06 02:16:32 d2.engine.defaults]: Model: GeneralizedRCNN(...)

Dataset loading

[09/06 02:16:53 d2.data.datasets.coco]: Loading /mnt/coco/annotations/instances_train2017.json takes 21.55 seconds.
[09/06 02:16:55 d2.data.datasets.coco]: Loaded 118287 images in COCO format from /mnt/coco/annotations/instances_train2017.json

Training process

[09/06 02:17:20 d2.engine.train_loop]: Starting training from iteration 30000
...
[09/06 15:46:48 d2.engine.hooks]: Overall training speed: 239998 iterations in 13:15:06 (0.1988 s / it)
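
To analyze the per-iteration lines of the training log programmatically, one option is to parse them into dictionaries. The following is a minimal sketch that assumes the `key: value` layout seen in the `d2.utils.events` lines above; `parse_events_line` and `FIELD_RE` are hypothetical helper names, not part of Detectron2:

```python
import re

# One `key: value` pair, e.g. `total_loss: 0.6778` or `eta: 13:22:12`.
FIELD_RE = re.compile(r"(\w+):\s+(\S+)")
MARKER = "d2.utils.events]:"


def parse_events_line(line):
    """Parse one `d2.utils.events` log line into a dict, or return None."""
    if MARKER not in line:
        return None
    body = line.split(MARKER, 1)[1]
    record = {}
    for key, raw in FIELD_RE.findall(body):
        try:
            record[key] = int(raw) if key == "iter" else float(raw)
        except ValueError:
            # Non-numeric values such as `eta` or `max_mem` stay as strings.
            record[key] = raw
    return record


sample = (
    "[09/06 02:17:43 d2.utils.events]:  eta: 13:22:12  iter: 30059  "
    "total_loss: 0.6778  loss_cls: 0.2609  loss_box_reg: 0.28  "
    "loss_rpn_cls: 0.0531  loss_rpn_loc: 0.08019    time: 0.2015  "
    "last_time: 0.1991  data_time: 0.0087  last_data_time: 0.0075   "
    "lr: 0.02  max_mem: 2903M"
)
record = parse_events_line(sample)
print(record["iter"], record["total_loss"], record["lr"])
```

Applying the function to every line of train.log and dropping the `None` results would yield a list of records suitable for plotting the loss curve or the learning rate over time.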

Inference time

[09/06 15:47:23 d2.evaluation.evaluator]: Total inference time: 0:00:29.375996 (0.047381 s / iter per device, on 8 devices)
[09/06 15:47:23 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:27 (0.044494 s / iter per device, on 8 devices)
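
The two lines above can be turned into throughput figures with a little arithmetic. The sketch below uses only numbers from the log (5000 validation images, 625 batches, 8 devices). The warm-up count of 5 is an assumption on our part: it is not printed in the log, but the logged per-iteration time matches the total divided by 620 rather than by 625:

```python
TOTAL_SECONDS = 29.375996     # "Total inference time" from the log
LOGGED_S_PER_ITER = 0.047381  # per-device seconds per iteration from the log
NUM_IMAGES = 5000             # size of coco_2017_val reported in the log
NUM_BATCHES = 625             # "Start inference on 625 batches"
NUM_DEVICES = 8
ASSUMED_WARMUP_ITERS = 5      # assumption: inferred from the arithmetic, not logged

# 5000 images over 8 devices x 625 batches means one image per batch.
images_per_batch = NUM_IMAGES / (NUM_BATCHES * NUM_DEVICES)

# Reconstruct the logged per-iteration time from the total.
reconstructed = TOTAL_SECONDS / (NUM_BATCHES - ASSUMED_WARMUP_ITERS)

per_device_fps = images_per_batch / LOGGED_S_PER_ITER
aggregate_fps = per_device_fps * NUM_DEVICES

print(f"images per batch: {images_per_batch:.0f}")
print(f"reconstructed s/iter: {reconstructed:.6f}")
print(f"per-device throughput: {per_device_fps:.1f} img/s")
print(f"aggregate throughput: {aggregate_fps:.1f} img/s")
```

Because each batch holds a single image, the per-device s/iter value can be read directly as s/im when comparing against the Model Zoo table.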

Average precision (AP)

[09/06 15:47:40 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 40.064 | 60.844 | 43.457 | 23.807 | 43.418 | 52.071 |

Per-category AP

[09/06 15:47:40 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 54.495 | bicycle      | 30.683 | car            | 44.135 |
| motorcycle    | 42.270 | airplane     | 63.287 | bus            | 63.137 |
| train         | 60.485 | truck        | 33.669 | boat           | 26.893 |
...
| hair drier    | 4.950  | toothbrush   | 21.977 |                |        |

4. Training Metrics Comparison

Metrics from the Model Zoo

Name	lr sched	train time (s/iter)	inference time (s/im)	train mem (GB)	box AP	model id	download
R50-C4	1x	0.551	0.102	4.8	35.7	137257644	model | metrics
R50-DC5	1x	0.380	0.068	5.0	37.3	137847829	model | metrics
R50-FPN	1x	0.210	0.038	3.0	37.9	137257794	model | metrics
R50-C4	3x	0.543	0.104	4.8	38.4	137849393	model | metrics
R50-DC5	3x	0.378	0.070	5.0	39.0	137849425	model | metrics
R50-FPN	3x	0.209	0.038	3.0	40.2	137849458	model | metrics
R101-C4	3x	0.619	0.139	5.9	41.1	138204752	model | metrics
R101-DC5	3x	0.452	0.086	6.1	40.6	138204841	model | metrics
R101-FPN	3x	0.286	0.051	4.1	42.0	137851257	model | metrics
X101-FPN	3x	0.638	0.098	6.7	43.0	139173657	model | metrics

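Metrics of our reproduced model

Training time (s/iter): 0.1988
Inference time (s/im): 0.047381 total per image (0.044494 pure compute)
Training memory (GB): peak of about 2.9 GB (max_mem: 2904M in the log)
Detection accuracy (box AP): 40.0645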

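Comparative analysis

Training time (s/iter):
- The Model Zoo R50-FPN 3x model trains at 0.209 s/iter, while our run averaged 0.1988 s/iter. Our training is slightly faster, possibly because of differences in hardware or software optimizations.
Inference time (s/im):
- The Model Zoo R50-FPN 3x model takes 0.038 s/im, while our reproduced model takes 0.047381 s/im, which is roughly 0.009 s (about 25%) longer per image.
Training memory (GB):
- The Model Zoo R50-FPN 3x model uses 3.0 GB of training memory, while ours peaks at about 2.9 GB, which is essentially the same.
Detection accuracy (box AP):
- The Model Zoo R50-FPN 3x model reaches a box AP of 40.2, while our model reaches 40.0645. The two are essentially on par, with ours about 0.14 AP lower.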
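
The comparison above can also be expressed as relative differences. The following sketch uses only the numbers quoted in this section; the dictionary keys and the `relative_diff` helper are names we made up for illustration:

```python
# Values for R50-FPN 3x from the Model Zoo table.
model_zoo = {
    "train_time_s_per_iter": 0.209,
    "inference_time_s_per_im": 0.038,
    "train_mem_gb": 3.0,
    "box_ap": 40.2,
}

# Values read from our training and evaluation log.
ours = {
    "train_time_s_per_iter": 0.1988,
    "inference_time_s_per_im": 0.047381,
    "train_mem_gb": 2.9,
    "box_ap": 40.0645,
}


def relative_diff(measured: float, reference: float) -> float:
    """Return the difference of `measured` from `reference` in percent."""
    return (measured - reference) / reference * 100.0


diffs = {key: relative_diff(ours[key], model_zoo[key]) for key in model_zoo}

for key, pct in diffs.items():
    print(f"{key}: ours={ours[key]} zoo={model_zoo[key]} diff={pct:+.2f}%")
```

Seen this way, the accuracy gap (well under 1%) is much smaller than the inference-time gap, which is the one metric where our run deviates noticeably from the published figure.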

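Summary of results

The comparison shows that our trained model is very close to the Model Zoo R50-FPN 3x model on most metrics: training is slightly faster, memory usage is about the same, inference is somewhat slower, and detection accuracy is marginally lower. Overall, the reproduction can be considered successful, which indicates that the training procedure is reliable and the resulting model is sound.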

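5. Training Output Directory

The training output directory contains all of the important files generated during training: the config file, event logs, training logs, evaluation metrics, and multiple model checkpoint files.

Directory structure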

/mnt/output
├── config.yaml
├── events.out.tfevents.*
├── inference/
├── last_checkpoint
├── log.txt
├── log.txt.rank*
├── metrics.json
├── model_*.pth
└── model_final.pth

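File descriptions

config.yaml: records the training configuration so that the experiment can be reproduced.
events.out.tfevents.*: event files for TensorBoard visualization, useful for monitoring training.
inference/: stores inference results, such as the coco_instances_results.json file written during evaluation.
last_checkpoint: records the filename of the most recent checkpoint so that training can be resumed.
log.txt and log.txt.rank*: detailed logs of the training process for debugging and analysis.
metrics.json: records the metrics computed during training and validation.
model_*.pth and model_final.pth: saved model weights, used for resuming training, evaluation, or deployment.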
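
As an illustration of how last_checkpoint can be used, the sketch below resolves the most recent checkpoint path by reading that file. It assumes the file contains just the checkpoint filename as plain text; `latest_checkpoint` is a hypothetical helper, and the demo runs against a temporary directory rather than the real /mnt/output:

```python
import tempfile
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the path named in `last_checkpoint`, or None if it is absent."""
    marker = Path(output_dir) / "last_checkpoint"
    if not marker.is_file():
        return None
    name = marker.read_text(encoding="utf-8").strip()
    return Path(output_dir) / name if name else None


# Demo on a throwaway directory that mimics the output layout.
with tempfile.TemporaryDirectory() as tmp:
    missing_result = latest_checkpoint(tmp)  # no marker file yet
    (Path(tmp) / "last_checkpoint").write_text("model_final.pth\n", encoding="utf-8")
    found_result = latest_checkpoint(tmp)
    found_name = found_result.name

print(missing_result, found_name)
```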

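Summary of file roles

Config file (config.yaml): records the training configuration so that the experiment can be reproduced.
Event files (events.out.tfevents.*): used for TensorBoard visualization to monitor training.
Inference directory (inference/): stores inference results.
Checkpoint record (last_checkpoint): records the filename of the most recent checkpoint so that training can be resumed.
Log files (log.txt, log.txt.rank*): record detailed training logs for debugging and analysis.
Metrics file (metrics.json): records the metrics computed during training and validation for later analysis and comparison.
Model checkpoint files (model_*.pth, model_final.pth): store model weights for resuming training, evaluation, or deployment.

Together, these files and directories make up a complete training output that supports later analysis, debugging, and deployment.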
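
For later analysis, metrics.json can be loaded in a few lines. This is a minimal sketch that assumes the file holds one JSON object per line and that each object has an `iteration` key; the `bbox/AP` key name is also an assumption, and the two demo records reuse values from the log above rather than being copied from a real metrics.json:

```python
import json
import tempfile
from pathlib import Path


def read_metrics(path):
    """Load a JSON-lines metrics file into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def series(records, key):
    """Return (iteration, value) pairs for every record that has `key`."""
    return [(r["iteration"], r[key]) for r in records if key in r and "iteration" in r]


# Demo with two hand-written records (layout assumed, values from the log).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metrics.json"
    demo_path.write_text(
        json.dumps({"iteration": 269979, "total_loss": 0.4843, "lr": 0.0002}) + "\n"
        + json.dumps({"iteration": 269999, "total_loss": 0.4217, "bbox/AP": 40.0645}) + "\n",
        encoding="utf-8",
    )
    demo_records = read_metrics(demo_path)

loss_curve = series(demo_records, "total_loss")
ap_points = series(demo_records, "bbox/AP")
print(loss_curve, ap_points)
```

The resulting pairs can be passed to any plotting library to draw loss or AP curves without going through TensorBoard.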

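The value of reproduction

Reproducing the model not only confirms that the published results are reliable, but also deepens our understanding of how the model is trained and evaluated. By comparing metrics across models, we can choose the most suitable architecture and training strategy and improve both performance and efficiency.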

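Conclusion

This article described in detail how to train an object detection model with Detectron2, covering data preparation, the training commands, training log analysis, training metrics, and the role of each file in the training output directory. By reproducing the training of a Model Zoo model, and by using the resume feature to continue after training was interrupted, we verified that the training results are reliable and gained a deeper understanding of the model's performance metrics. We hope this article serves as a useful reference and helps readers train and evaluate their own models.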