Table of Contents
Other articles in this series:
- A survey of papers on incremental few-shot object detection.
- A record of bug fixes encountered while setting up the code of "Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection".
- To be continued…
Single-Machine Single-GPU vs. Single-Machine Multi-GPU Training with the Same batch_size
1. Background
- My research direction is incremental few-shot object detection.
- I am reproducing the paper "Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection".
- Official code: https://github.com/facebookresearch/sylph-few-shot-detection
Although the code has been open-sourced, no training logs are provided, and when asked on the issue tracker the author also said they could not help. The author's reply:

The code is built on AdelaiDet + detectron2 + mobile-vision + d2go, which is noticeably different from a plain detectron2 codebase, so if the verification below looks trivial to you, feel free to skip ahead.
The configurations used in the paper assume a cluster with 64 GPUs, which most of us cannot reproduce for lack of compute. One of the config files is shown below:
Config file path: sylph-few-shot-detection/configs/COCO-Detection/Meta-FCOS-pretrain.yaml
# Note: REFERENCE_WORLD_SIZE is the number of gpus, IMS_PER_BATCH/REFERENCE_WORLD_SIZE is the images/gpu
# modify this to your use
_BASE_: "Base-FCOS.yaml"
......
SOLVER:
  IMS_PER_BATCH: 128  # batch size per iteration = num_gpus * batch_size_per_gpu
  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  REFERENCE_WORLD_SIZE: 64  # trained on a 64-GPU cluster
  CHECKPOINT_PERIOD: 10000
  # LR_MULTIPLIER_OVERWRITE, [{'backbone': 0.1}, {'reference_points': 0.1, 'sampling_offsets': 0.1}]
......
If training is launched with only 1 GPU, the auto_scale_world_size function in d2go rescales the solver parameters according to the actual number of GPUs.
The implementation of auto_scale_world_size:
def auto_scale_world_size(cfg, new_world_size):
    """
    Usually the config file is written for a specific number of devices, this method
    scales the config (in-place!) according to the actual world size using the
    pre-registered scaling methods specified as cfg.SOLVER.AUTO_SCALING_METHODS.
    Note for registering scaling methods:
        - The method will only be called when scaling is needed. It won't be called
          if SOLVER.REFERENCE_WORLD_SIZE is 0 or equal to target world size. Thus
          cfg.SOLVER.REFERENCE_WORLD_SIZE will always be positive.
        - The method updates cfg in-place, no return is required.
        - No need for changing SOLVER.REFERENCE_WORLD_SIZE.
    Args:
        cfg (CfgNode): original config which contains SOLVER.REFERENCE_WORLD_SIZE and
            SOLVER.AUTO_SCALING_METHODS.
        new_world_size: the target world size
    """
    old_world_size = cfg.SOLVER.REFERENCE_WORLD_SIZE
    if old_world_size == 0 or old_world_size == new_world_size:
        return cfg
    if len(cfg.SOLVER.AUTO_SCALING_METHODS) == 0:
        return cfg

    original_cfg = cfg.clone()
    frozen = original_cfg.is_frozen()
    cfg.defrost()

    assert len(cfg.SOLVER.AUTO_SCALING_METHODS) > 0, cfg.SOLVER.AUTO_SCALING_METHODS
    for scaling_method in cfg.SOLVER.AUTO_SCALING_METHODS:
        logger.info("Applying auto scaling method: {}".format(scaling_method))
        CONFIG_SCALING_METHOD_REGISTRY.get(scaling_method)(cfg, new_world_size)

    assert (
        cfg.SOLVER.REFERENCE_WORLD_SIZE == old_world_size
    ), "Runner's scale_world_size shouldn't change SOLVER.REFERENCE_WORLD_SIZE"
    cfg.SOLVER.REFERENCE_WORLD_SIZE = new_world_size

    if frozen:
        cfg.freeze()

    from d2go.config.utils import get_cfg_diff_table

    table = get_cfg_diff_table(cfg, original_cfg)
    logger.info("Auto-scaled the config according to the actual world size: \n" + table)
    return cfg
Running with 1 GPU, the parameters after auto-scaling are:
[06/28 21:26:30] d2go.config.config INFO: Applying auto scaling method: default_scale_d2_configs
[06/28 21:26:30] d2go.config.config INFO: Applying auto scaling method: default_scale_quantization_configs
[06/28 21:26:30] d2go.config.config INFO: Auto-scaled the config according to the actual world size:
| config key | old value | new value |
|:------------------------------------------------|:---------------|:-------------------|
| QUANTIZATION.QAT.DISABLE_OBSERVER_ITER | 38000 | 2432000 |
| QUANTIZATION.QAT.ENABLE_LEARNABLE_OBSERVER_ITER | 36000 | 2304000 |
| QUANTIZATION.QAT.ENABLE_OBSERVER_ITER | 35000 | 2240000 |
| QUANTIZATION.QAT.FREEZE_BN_ITER | 37000 | 2368000 |
| QUANTIZATION.QAT.START_ITER | 35000 | 2240000 |
| SOLVER.BASE_LR | 0.01 | 0.00015625 |
| SOLVER.IMS_PER_BATCH | 128 | 2 |
| SOLVER.MAX_ITER | 90000 | 5760000 |
| SOLVER.REFERENCE_WORLD_SIZE | 64 | 1 |
| SOLVER.STEPS | (60000, 80000) | (3840000, 5120000) |
| SOLVER.WARMUP_ITERS | 1000 | 64000 |
Note that the per-GPU batch size is identical before and after scaling; the idea is simply to split one large-batch update into many small-batch iterations. The total batch size shrinks with the number of GPUs, but the batch size per GPU stays the same:
128 / 64 = 2 / 1 = 2 images per GPU
The number of iterations is scaled so that the total number of training images stays the same:
128 × 90000 = 2 × new_iters, so new_iters = 5,760,000
The other parameters are scaled proportionally in the same way; the key ones are the total batch size, the base learning rate, and the iteration-related settings. See the paper "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour" for the underlying linear scaling rule.
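To make the scaling arithmetic concrete, here is a minimal pure-Python sketch (the function scale_solver is ours for illustration, not d2go's actual default_scale_d2_configs) that reproduces the values in the auto-scaling table above:

```python
def scale_solver(ims_per_batch, base_lr, max_iter, steps, warmup_iters,
                 ref_world_size, new_world_size):
    # Per-GPU batch size is kept fixed, so the total batch shrinks with the GPU count.
    ims_per_gpu = ims_per_batch // ref_world_size       # 128 / 64 = 2
    new_batch = ims_per_gpu * new_world_size            # 2 * 1 = 2
    scale = ims_per_batch / new_batch                   # 64x fewer images per iteration
    return {
        "IMS_PER_BATCH": new_batch,                          # 2
        "BASE_LR": base_lr / scale,                          # 0.01 / 64 = 0.00015625
        "MAX_ITER": int(max_iter * scale),                   # 90000 * 64 = 5760000
        "STEPS": tuple(int(s * scale) for s in steps),       # (3840000, 5120000)
        "WARMUP_ITERS": int(warmup_iters * scale),           # 1000 * 64 = 64000
    }

print(scale_solver(128, 0.01, 90000, (60000, 80000), 1000,
                   ref_world_size=64, new_world_size=1))
```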
Judging from the iteration count (several million iterations), reproducing the run this way would take far too long. At the same time, the per-GPU batch size is tiny: if you only have a few GPUs but each has plenty of memory, this setting wastes most of it. So we want to raise the utilization of each GPU instead. How should the parameters be configured then?
Tips: I have two Quadro RTX 5000 cards (16 GB of memory each).
2. Task Description
The following experiments are run to answer this question:
- Keep the original total batch size (SOLVER.IMS_PER_BATCH), learning rate (SOLVER.BASE_LR), number of iterations (SOLVER.MAX_ITER), and GPU count (SOLVER.REFERENCE_WORLD_SIZE) to obtain a baseline result.
- Keep the total batch size, learning rate, and number of iterations unchanged, but reduce the number of GPUs (SOLVER.REFERENCE_WORLD_SIZE), and check whether this noticeably affects the result.
- Raise the utilization of each GPU as much as possible: increase the per-GPU batch size and therefore the total batch size (SOLVER.IMS_PER_BATCH), scale the learning rate (SOLVER.BASE_LR) up and the number of iterations (SOLVER.MAX_ITER) down linearly, and check whether this noticeably affects the result.
The config file used is sylph-few-shot-detection/configs/COCO-Detection/TFA/FCOS_pretrain.yaml:
MODEL:
  META_ARCHITECTURE: "MetaOneStageDetector"
  BACKBONE:
    NAME: "build_fcos_resnet_fpn_backbone"
    FREEZE: False # freeze backbone
  RESNETS:
    OUT_FEATURES: ["res3", "res4", "res5"]
  FPN:
    IN_FEATURES: ["res3", "res4", "res5"]
  PROPOSAL_GENERATOR:
    NAME: "MetaFCOS"
  META_LEARN:
    EPISODIC_LEARNING: False # pretraining
    TFA:
      TRAIN_SHOT: 10
  FCOS:
    NUM_CLASSES: 60 #80
  # PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
DATASETS:
  # TRAIN: ("coco_pretrain_train_all",)
  # TEST: ("coco_pretrain_val_all", ) # "coco_pretrain_val_novel", )
  TRAIN: ("coco_pretrain_train_base",)
  TEST: ("coco_pretrain_val_base", ) # "coco_pretrain_val_novel", )
TEST:
  EVAL_PERIOD: 10000
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01 # Note that RetinaNet uses a different default learning rate
  STEPS: (60000, 80000)
  MAX_ITER: 10 #90000
  # LR_MULTIPLIER_OVERWRITE, [{'backbone': 0.1}, {'reference_points': 0.1, 'sampling_offsets': 0.1}]
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
If SOLVER.REFERENCE_WORLD_SIZE is not set, it defaults to 8. (As the auto_scale_world_size code above shows, setting it to 0, or to the actual number of GPUs, skips auto-scaling entirely.)
2.1 Baseline with two GPUs
IMS_PER_BATCH: 16, BASE_LR: 0.01, MAX_ITER: 30000, REFERENCE_WORLD_SIZE: 2, STEPS: (20000, 25000)
The full config:
DATASETS:
  TEST:
  - coco_pretrain_val_base
  TRAIN:
  - coco_pretrain_train_base
INPUT:
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
MODEL:
  BACKBONE:
    NAME: build_fcos_resnet_fpn_backbone
  DDP_FIND_UNUSED_PARAMETERS: true
  FCOS:
    NUM_CLASSES: 60
  FPN:
    IN_FEATURES:
    - res3
    - res4
    - res5
  META_ARCHITECTURE: MetaOneStageDetector
  PROPOSAL_GENERATOR:
    NAME: MetaFCOS
  RESNETS:
    OUT_FEATURES:
    - res3
    - res4
    - res5
  WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
OUTPUT_DIR: output_sylph/TFA_shuangka/coco/pretrain/
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01
  MAX_ITER: 30000
  REFERENCE_WORLD_SIZE: 2
  STEPS:
  - 20000
  - 25000
TEST:
  EVAL_PERIOD: 10000
Training results:
[09/02 03:46:48] d2.engine.hooks INFO: Overall training speed: 29998 iterations in 11:05:49 (1.3317 s / it)
[09/02 03:46:48] d2.engine.hooks INFO: Total training time: 11:17:36 (0:11:47 on hooks) # total training time
[09/02 03:50:43] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[09/02 03:50:43] d2.evaluation.coco_evaluation INFO: Saving results to output_sylph/TFA_shuangka/coco/pretrain/inference/default/final/coco_pretrain_val_base/coco_instances_results.json
[09/02 03:50:45] d2.evaluation.coco_evaluation INFO: Evaluating predictions with official COCO API...
[09/02 03:52:09] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl | ARdet1 | ARdet10 | ARdet100 | ARs | ARm | ARl |
|:------:|:------:|:------:|:------:|:------:|:------:|:--------:|:---------:|:----------:|:------:|:------:|:------:|
| 32.887 | 50.833 | 35.163 | 19.228 | 37.868 | 42.126 | 29.625 | 50.698 | 54.897 | 38.173 | 60.524 | 69.414 |
[09/02 03:52:09] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
| category | AP | category | AP | category | AP |
|:---------------|:-------|:--------------|:-------|:-------------|:-------|
| truck | 25.783 | traffic light | 24.838 | fire hydrant | 58.756 |
| stop sign | 60.566 | parking meter | 36.520 | bench | 18.567 |
| elephant | 53.398 | bear | 59.048 | zebra | 60.039 |
| giraffe | 60.111 | backpack | 13.367 | umbrella | 34.310 |
| handbag | 12.899 | tie | 27.715 | suitcase | 30.286 |
| frisbee | 62.595 | skis | 15.911 | snowboard | 21.826 |
| sports ball | 43.707 | kite | 32.891 | baseball bat | 25.121 |
| baseball glove | 34.357 | skateboard | 44.367 | surfboard | 31.114 |
| tennis racket | 44.440 | wine glass | 33.262 | cup | 37.429 |
| fork | 20.322 | knife | 11.687 | spoon | 9.416 |
| bowl | 38.036 | banana | 20.496 | apple | 17.709 |
| sandwich | 29.315 | orange | 29.050 | broccoli | 20.888 |
| carrot | 17.289 | hot dog | 24.932 | pizza | 46.245 |
| donut | 41.060 | cake | 30.549 | bed | 33.140 |
| toilet | 52.188 | laptop | 51.783 | mouse | 57.023 |
| remote | 23.473 | keyboard | 41.127 | cell phone | 30.168 |
| microwave | 49.580 | oven | 27.093 | toaster | 9.304 |
| sink | 31.164 | refrigerator | 44.687 | book | 11.876 |
| clock | 48.607 | vase | 35.526 | scissors | 17.430 |
| teddy bear | 35.318 | hair drier | 0.712 | toothbrush | 12.794 |
[09/02 03:52:10] d2.evaluation.testing INFO: copypaste: Task: bbox
[09/02 03:52:10] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl,ARdet1,ARdet10,ARdet100,ARs,ARm,ARl
[09/02 03:52:10] d2.evaluation.testing INFO: copypaste: 32.8868,50.8326,35.1630,19.2278,37.8680,42.1257,29.6249,50.6976,54.8969,38.1732,60.5238,69.4136
[09/02 03:52:10] d2go.utils.misc INFO: Dumping trained config file: output_sylph/TFA_shuangka/coco/pretrain/trained_model_configs/model_final.yaml
[09/02 03:52:10] d2go.utils.misc INFO: Finished dumping trained config file
[09/02 03:52:10] mobile_cv.torch.utils_pytorch.distributed_helper INFO: Save tools.setup.new_main_func return to: /tmp/mcvdh_detectron2.utils.serialize.new_main_func_returnpriwfza6.pth.rank0
Tips: DDP_FIND_UNUSED_PARAMETERS: true needs to be set for the multi-GPU run.
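For reference, this d2go config key corresponds to the find_unused_parameters option of PyTorch's DistributedDataParallel. A minimal sketch of the equivalent plain-PyTorch wrapping (the helper wrap_for_ddp is made up for illustration; model and local_rank are assumed to be provided by the launcher):

```python
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model, local_rank):
    # Sketch only: MODEL.DDP_FIND_UNUSED_PARAMETERS: true roughly maps to the
    # find_unused_parameters flag below. Without it, DDP raises an error when
    # some parameters receive no gradient during a backward pass.
    return DDP(
        model.to(f"cuda:{local_rank}"),  # assumed: model is an nn.Module
        device_ids=[local_rank],         # assumed: local_rank set by the launcher
        find_unused_parameters=True,
    )
```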
2.2 Result with a single GPU
IMS_PER_BATCH: 16, BASE_LR: 0.01, MAX_ITER: 30000, REFERENCE_WORLD_SIZE: 1, STEPS: (20000, 25000)
The full config:
DATASETS:
  TEST:
  - coco_pretrain_val_base
  TRAIN:
  - coco_pretrain_train_base
INPUT:
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
MODEL:
  BACKBONE:
    NAME: build_fcos_resnet_fpn_backbone
  FCOS:
    NUM_CLASSES: 60
  FPN:
    IN_FEATURES:
    - res3
    - res4
    - res5
  META_ARCHITECTURE: MetaOneStageDetector
  PROPOSAL_GENERATOR:
    NAME: MetaFCOS
  RESNETS:
    OUT_FEATURES:
    - res3
    - res4
    - res5
  WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
OUTPUT_DIR: output_sylph/TFA_danka/coco/pretrain/
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01
  MAX_ITER: 30000
  REFERENCE_WORLD_SIZE: 1
  STEPS:
  - 20000
  - 25000
TEST:
  EVAL_PERIOD: 10000
Training results:
[09/01 15:07:15] d2.engine.hooks INFO: Overall training speed: 29998 iterations in 19:58:49 (2.3978 s / it)
[09/01 15:07:15] d2.engine.hooks INFO: Total training time: 20:15:22 (0:16:32 on hooks) # total training time
[09/01 15:13:40] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[09/01 15:13:41] d2.evaluation.coco_evaluation INFO: Saving results to output_sylph/TFA_danka/coco/pretrain/inference/default/final/coco_pretrain_val_base/coco_instances_results.json
[09/01 15:13:43] d2.evaluation.coco_evaluation INFO: Evaluating predictions with official COCO API...
[09/01 15:15:05] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl | ARdet1 | ARdet10 | ARdet100 | ARs | ARm | ARl |
|:------:|:------:|:------:|:------:|:------:|:------:|:--------:|:---------:|:----------:|:------:|:------:|:------:|
| 32.882 | 50.836 | 35.346 | 19.495 | 37.770 | 41.777 | 30.014 | 50.698 | 55.017 | 38.052 | 60.512 | 69.699 |
[09/01 15:15:05] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
| category | AP | category | AP | category | AP |
|:---------------|:-------|:--------------|:-------|:-------------|:-------|
| truck | 26.234 | traffic light | 24.540 | fire hydrant | 57.536 |
| stop sign | 61.126 | parking meter | 39.529 | bench | 19.172 |
| elephant | 52.451 | bear | 57.825 | zebra | 59.816 |
| giraffe | 60.008 | backpack | 13.684 | umbrella | 33.598 |
| handbag | 12.576 | tie | 27.643 | suitcase | 30.193 |
| frisbee | 60.575 | skis | 16.626 | snowboard | 22.355 |
| sports ball | 42.712 | kite | 35.412 | baseball bat | 24.698 |
| baseball glove | 33.945 | skateboard | 44.483 | surfboard | 31.571 |
| tennis racket | 44.272 | wine glass | 32.958 | cup | 37.498 |
| fork | 21.523 | knife | 10.945 | spoon | 8.277 |
| bowl | 37.415 | banana | 20.202 | apple | 16.854 |
| sandwich | 28.217 | orange | 29.647 | broccoli | 21.222 |
| carrot | 18.313 | hot dog | 25.558 | pizza | 46.495 |
| donut | 42.006 | cake | 31.377 | bed | 32.272 |
| toilet | 51.671 | laptop | 51.830 | mouse | 58.112 |
| remote | 23.474 | keyboard | 41.571 | cell phone | 28.927 |
| microwave | 48.750 | oven | 26.837 | toaster | 10.041 |
| sink | 32.883 | refrigerator | 44.787 | book | 12.040 |
| clock | 47.192 | vase | 35.346 | scissors | 17.160 |
| teddy bear | 34.904 | hair drier | 0.921 | toothbrush | 13.110 |
[09/01 15:15:06] d2.evaluation.testing INFO: copypaste: Task: bbox
[09/01 15:15:06] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl,ARdet1,ARdet10,ARdet100,ARs,ARm,ARl
[09/01 15:15:06] d2.evaluation.testing INFO: copypaste: 32.8819,50.8364,35.3460,19.4948,37.7705,41.7771,30.0138,50.6982,55.0174,38.0516,60.5119,69.6988
[09/01 15:15:06] d2go.utils.misc INFO: Dumping trained config file: output_sylph/TFA_danka/coco/pretrain/trained_model_configs/model_final.yaml
[09/01 15:15:06] d2go.utils.misc INFO: Finished dumping trained config file
[09/01 15:15:06] mobile_cv.torch.utils_pytorch.distributed_helper INFO: Launched jobs finished, dist_url: file:///tmp/d2go_dist_file_1725101434.1924598
The two sets of results are nearly identical, which leads to the following conclusion:
As long as the total batch size, learning rate, and number of iterations are kept fixed, the training result is independent of the number of GPUs, so one GPU with a lot of memory can replace several GPUs with less memory.
The TensorBoard loss curves at the end of this article support the same conclusion.
2.3 Result with two GPUs at the maximum per-GPU batch size
IMS_PER_BATCH: 30, BASE_LR: 0.02, MAX_ITER: 15000, REFERENCE_WORLD_SIZE: 2, STEPS: (10000, 12500)
Tips: IMS_PER_BATCH should really be 32, but a per-GPU batch size of 16 runs out of memory, so it was set to 30 (15 images per GPU).
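The numbers above follow the same linear scaling rule, starting from the two-GPU baseline; a small sketch of the arithmetic (with the caveat that the actual total batch was capped at 30 instead of 32 because of the 16 GB cards):

```python
# Linear scaling from the baseline (IMS_PER_BATCH 16, BASE_LR 0.01,
# MAX_ITER 30000, STEPS (20000, 25000)) when doubling the batch size.
factor = 32 / 16                                          # intended 2x larger batch
base_lr = 0.01 * factor                                   # 0.02
max_iter = int(30000 / factor)                            # 15000
steps = tuple(int(s / factor) for s in (20000, 25000))    # (10000, 12500)
# In practice IMS_PER_BATCH was lowered to 30 (15 images per GPU) to fit in
# memory, while BASE_LR/MAX_ITER/STEPS kept the 2x-scaled values above.
print(base_lr, max_iter, steps)
```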
The full config:
DATASETS:
  TEST:
  - coco_pretrain_val_base
  TRAIN:
  - coco_pretrain_train_base
INPUT:
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
MODEL:
  BACKBONE:
    NAME: build_fcos_resnet_fpn_backbone
  DDP_FIND_UNUSED_PARAMETERS: true
  FCOS:
    NUM_CLASSES: 60
  FPN:
    IN_FEATURES:
    - res3
    - res4
    - res5
  META_ARCHITECTURE: MetaOneStageDetector
  PROPOSAL_GENERATOR:
    NAME: MetaFCOS
  RESNETS:
    OUT_FEATURES:
    - res3
    - res4
    - res5
  WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
OUTPUT_DIR: output_sylph/TFA_shuangbeibs/coco/pretrain/
SOLVER:
  BASE_LR: 0.02
  IMS_PER_BATCH: 30
  MAX_ITER: 15000
  REFERENCE_WORLD_SIZE: 2
  STEPS:
  - 10000
  - 12500
TEST:
  EVAL_PERIOD: 10000
Training results:
[09/02 19:14:36] d2.engine.hooks INFO: Overall training speed: 14998 iterations in 9:43:15 (2.3333 s / it)
[09/02 19:14:36] d2.engine.hooks INFO: Total training time: 9:49:10 (0:05:55 on hooks) # total training time
[09/02 19:18:28] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[09/02 19:18:29] d2.evaluation.coco_evaluation INFO: Saving results to output_sylph/TFA_shuangbeibs/coco/pretrain/inference/default/final/coco_pretrain_val_base/coco_instances_results.json
[09/02 19:18:31] d2.evaluation.coco_evaluation INFO: Evaluating predictions with official COCO API...
[09/02 19:19:53] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl | ARdet1 | ARdet10 | ARdet100 | ARs | ARm | ARl |
|:------:|:------:|:------:|:------:|:------:|:------:|:--------:|:---------:|:----------:|:------:|:------:|:------:|
| 32.662 | 50.715 | 35.003 | 19.986 | 37.365 | 41.966 | 29.936 | 50.815 | 54.903 | 39.312 | 60.476 | 69.437 |
[09/02 19:19:53] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
| category | AP | category | AP | category | AP |
|:---------------|:-------|:--------------|:-------|:-------------|:-------|
| truck | 24.719 | traffic light | 25.109 | fire hydrant | 56.087 |
| stop sign | 59.856 | parking meter | 36.393 | bench | 18.575 |
| elephant | 52.799 | bear | 55.483 | zebra | 60.004 |
| giraffe | 58.331 | backpack | 12.968 | umbrella | 33.007 |
| handbag | 12.360 | tie | 27.676 | suitcase | 31.035 |
| frisbee | 61.520 | skis | 16.335 | snowboard | 24.008 |
| sports ball | 43.596 | kite | 36.135 | baseball bat | 24.210 |
| baseball glove | 34.805 | skateboard | 44.214 | surfboard | 31.307 |
| tennis racket | 45.242 | wine glass | 32.089 | cup | 37.601 |
| fork | 20.472 | knife | 10.713 | spoon | 9.313 |
| bowl | 37.740 | banana | 20.521 | apple | 18.616 |
| sandwich | 27.233 | orange | 30.228 | broccoli | 21.418 |
| carrot | 16.801 | hot dog | 24.280 | pizza | 46.001 |
| donut | 39.685 | cake | 31.516 | bed | 30.078 |
| toilet | 49.645 | laptop | 51.554 | mouse | 57.478 |
| remote | 24.332 | keyboard | 41.024 | cell phone | 29.425 |
| microwave | 46.203 | oven | 26.922 | toaster | 16.231 |
| sink | 31.143 | refrigerator | 43.622 | book | 11.921 |
| clock | 47.546 | vase | 34.431 | scissors | 17.359 |
| teddy bear | 36.482 | hair drier | 0.428 | toothbrush | 13.893 |
[09/02 19:19:54] d2.evaluation.testing INFO: copypaste: Task: bbox
[09/02 19:19:54] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl,ARdet1,ARdet10,ARdet100,ARs,ARm,ARl
[09/02 19:19:54] d2.evaluation.testing INFO: copypaste: 32.6619,50.7154,35.0027,19.9863,37.3647,41.9661,29.9360,50.8152,54.9029,39.3122,60.4762,69.4366
[09/02 19:19:54] d2go.utils.misc INFO: Dumping trained config file: output_sylph/TFA_shuangbeibs/coco/pretrain/trained_model_configs/model_final.yaml
[09/02 19:19:54] d2go.utils.misc INFO: Finished dumping trained config file
[09/02 19:19:54] mobile_cv.torch.utils_pytorch.distributed_helper INFO: Save tools.setup.new_main_func return to: /tmp/mcvdh_detectron2.utils.serialize.new_main_func_returnp17qhdor.pth.rank0
3. Results and Conclusions
Definitions:
- Method 1: the baseline, 2 GPUs, keeping the total batch size, iterations, and base_lr unchanged.
- Method 2: 1 GPU, keeping the total batch size, iterations, and base_lr unchanged.
- Method 3: 2 GPUs, with the batch size and base_lr doubled and the iterations halved.
3.1 Training results
| Method | AP | AP50 | AP75 | APs | APm | APl | ARdet1 | ARdet10 | ARdet100 | ARs | ARm | ARl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method 1 | 32.887 | 50.833 | 35.163 | 19.228 | 37.868 | 42.126 | 29.625 | 50.698 | 54.897 | 38.173 | 60.524 | 69.414 |
| Method 2 | 32.882 | 50.836 | 35.346 | 19.495 | 37.770 | 41.777 | 30.014 | 50.698 | 55.017 | 38.052 | 60.512 | 69.699 |
| Method 3 | 32.662 | 50.715 | 35.003 | 19.986 | 37.365 | 41.966 | 29.936 | 50.815 | 54.903 | 39.312 | 60.476 | 69.437 |
The three results are very close, especially Method 1 and Method 2. Method 3 lags by about 0.2 AP, most likely because the total batch size had to be 30 instead of the intended 32 due to GPU memory limits.
3.2 Total training time
| Baseline (2 GPUs) | Same iterations (1 GPU) | Halved iterations (2 GPUs) |
|---|---|---|
| 11:17:36 | 20:15:22 | 9:49:10 |
A larger batch size clearly saves training time while reaching essentially the same accuracy.
3.3 TensorBoard visualizations
- Cyan: Method 3, halved iterations (2 GPUs)
- Light blue: Method 2, same iterations (1 GPU)
- Orange: Method 1, baseline (2 GPUs)
(TensorBoard screenshots, not reproduced here: training time, loss_fcos_cls, loss_fcos_ctr, loss_fcos_loc, lr, total_loss.)
3.4 Conclusion
The loss curves in 3.3 are nearly identical across the three methods, which matches the evaluation results in 3.1.
Therefore, a large batch size with proportionally fewer iterations can replace the cluster-style training used in the original code and cut the total training time.