Sylph模型单机多卡与单机单卡训练的区别

其他文章列表:

相同batch_size下单机单卡与单机多卡运行代码的区别


一、背景描述

虽然代码开源出来了,但是没有任何的训练日志,issue上提问作者也表示无能为力,作者的回复如下:

Snipaste_2024-09-02_20-42-25

这篇论文代码采用 AdelaiDet + detectron2 + mobile-vision + d2go 进行构建的,与单纯的detectron2架构代码还是有区别的,所以后面验证的内容如果看着很简单的可以跳过,不喜勿喷!!!

由于论文中采用的配置基本都是64张gpu这种集群环境进行训练而我们一般人没有这么多的算力无法进行复现,其中一个配置文件如下 :

这个配置文件路径为:sylph-few-shot-detection/configs/COCO-Detection/Meta-FCOS-pretrain.yaml

# Note: REFERENCE_WORLD_SIZE is the number of gpus, IMS_PER_BATCH/REFERENCE_WORLD_SIZE is the images/gpu
# modify this to your use
_BASE_: "Base-FCOS.yaml"

......

SOLVER:
  IMS_PER_BATCH: 128	# 每一次迭代的Batch_size = GPUS * bs_pre_gpu
  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  REFERENCE_WORLD_SIZE: 64 # 采用64块gpu的集群进行训练
  CHECKPOINT_PERIOD: 10000

  # LR_MULTIPLIER_OVERWRITE, [{'backbone': 0.1}, {'reference_points': 0.1, 'sampling_offsets': 0.1}]
......

如果仅仅使用1块gpu进行训练,那么d2go中的auto_scale_world_size函数将根据实际gpu数量对运行参数进行调整

auto_scale_world_size函数的实现如下:

def auto_scale_world_size(cfg, new_world_size):
    """
    Usually the config file is written for a specific number of devices, this method
    scales the config (in-place!) according to the actual world size using the
    pre-registered scaling methods specified as cfg.SOLVER.AUTO_SCALING_METHODS.

    Note for registering scaling methods:
        - The method will only be called when scaling is needed. It won't be called
            if SOLVER.REFERENCE_WORLD_SIZE is 0 or equal to target world size. Thus
            cfg.SOLVER.REFERENCE_WORLD_SIZE will always be positive.
        - The method updates cfg in-place, no return is required.
        - No need for changing SOLVER.REFERENCE_WORLD_SIZE.

    Args:
        cfg (CfgNode): original config which contains SOLVER.REFERENCE_WORLD_SIZE and
            SOLVER.AUTO_SCALING_METHODS.
        new_world_size: the target world size
    """

    old_world_size = cfg.SOLVER.REFERENCE_WORLD_SIZE
    if old_world_size == 0 or old_world_size == new_world_size:
        return cfg

    if len(cfg.SOLVER.AUTO_SCALING_METHODS) == 0:
        return cfg

    original_cfg = cfg.clone()
    frozen = original_cfg.is_frozen()
    cfg.defrost()

    assert len(cfg.SOLVER.AUTO_SCALING_METHODS) > 0, cfg.SOLVER.AUTO_SCALING_METHODS
    for scaling_method in cfg.SOLVER.AUTO_SCALING_METHODS:
        logger.info("Applying auto scaling method: {}".format(scaling_method))
        CONFIG_SCALING_METHOD_REGISTRY.get(scaling_method)(cfg, new_world_size)

    assert (
        cfg.SOLVER.REFERENCE_WORLD_SIZE == cfg.SOLVER.REFERENCE_WORLD_SIZE
    ), "Runner's scale_world_size shouldn't change SOLVER.REFERENCE_WORLD_SIZE"
    cfg.SOLVER.REFERENCE_WORLD_SIZE = new_world_size

    if frozen:
        cfg.freeze()

    from d2go.config.utils import get_cfg_diff_table

    table = get_cfg_diff_table(cfg, original_cfg)
    logger.info("Auto-scaled the config according to the actual world size: \n" + table)

使用1块gpus进行运行,转换之后的参数如下:

[06/28 21:26:30] d2go.config.config INFO: Applying auto scaling method: default_scale_d2_configs
[06/28 21:26:30] d2go.config.config INFO: Applying auto scaling method: default_scale_quantization_configs
[06/28 21:26:30] d2go.config.config INFO: Auto-scaled the config according to the actual world size: 
| config key                                      | old value      | new value          |
|:------------------------------------------------|:---------------|:-------------------|
| QUANTIZATION.QAT.DISABLE_OBSERVER_ITER          | 38000          | 2432000            |
| QUANTIZATION.QAT.ENABLE_LEARNABLE_OBSERVER_ITER | 36000          | 2304000            |
| QUANTIZATION.QAT.ENABLE_OBSERVER_ITER           | 35000          | 2240000            |
| QUANTIZATION.QAT.FREEZE_BN_ITER                 | 37000          | 2368000            |
| QUANTIZATION.QAT.START_ITER                     | 35000          | 2240000            |
| SOLVER.BASE_LR                                  | 0.01           | 0.00015625         |
| SOLVER.IMS_PER_BATCH                            | 128            | 2                  |
| SOLVER.MAX_ITER                                 | 90000          | 5760000            |
| SOLVER.REFERENCE_WORLD_SIZE                     | 64             | 1                  |
| SOLVER.STEPS                                    | (60000, 80000) | (3840000, 5120000) |
| SOLVER.WARMUP_ITERS                             | 1000           | 64000              |

观察可以发现,转换前后batch_size/gpu不变,主要的思想就是将之前一次大batch_size的运算分开成若干次小batch_size的运算。

batch_size的计算如下:

总的batch_size由于gpu数量的减少而减少,但是每块gpu的batch_size却不变。
128 / 64 = = 2 / 1 = = 2 128 / 64 == 2 / 1 == 2 128/64==2/1==2

迭代次数的计算如下:

128 ∗ 90000 = 2 ∗ ( n e w i t e r s ) 128 * 90000 = 2 * (newiters) 12890000=2(newiters)

其他参数也都是按照等比例进行转换的,其中主要的参数就是Batch_Size和Base_LR,还有迭代参数。可以阅读论文《Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour》

从迭代次数来看,很浪费时间(达到了百万次),因此这种方法由于耗时又耗力而无法复现。但是通过观察可以发现每一块gpus的batch_size很小,如果我们的gpu显存很大,但是数量不多的时候,这种方式无异于暴殄天物,所以需要提高每一块gpu显存的利用率,那么参数又该如何进行配置呢。

Tips: 我有两张Quadro RTX 5000(16G显存)的显卡。


二、任务描述:

接下来需要进行实验的验证,主要的任务如下:

  • 保持原始的总Batch_Size(即配置文件中的SOLVER.IMS_PER_BATCH)、学习率(即配置文件中的SOLVER.BASE_LR)、迭代次数(即配置文件中的SOLVER.MAX_ITER)、Gpu数量(即配置文件中的SOLVER.REFERENCE_WORLD_SIZE),得到一个baseline结果。
  • 总Batch_Size(即配置文件中的SOLVER.IMS_PER_BATCH)、学习率(即配置文件中的SOLVER.BASE_LR)、迭代次数(即配置文件中的SOLVER.MAX_ITER)保持不变的情况之下,通过减少gpu的数量(即配置文件中的SOLVER.REFERENCE_WORLD_SIZE)来观察对于结果是否有太大的影响。
  • 尽可能的提高每一块gpu的利用率,即通过增加每一块gpu的Batch_Size从而间接增加总Batch_Size(即配置文件中的SOLVER.IMS_PER_BATCH),并按照线性变换的方式来调整(减倍)学习率(即配置文件中的SOLVER.BASE_LR)、迭代次数(即配置文件中的SOLVER.MAX_ITER),最后观察对于结果是否有太大的影响。

采用的配置文件路径为:sylph-few-shot-detection/configs/COCO-Detection/TFA/FCOS_pretrain.yaml,代码如下:

MODEL:
  META_ARCHITECTURE: "MetaOneStageDetector"
  BACKBONE:
    NAME: "build_fcos_resnet_fpn_backbone"
    FREEZE: False # freeze backbone
  RESNETS:
    OUT_FEATURES: ["res3", "res4", "res5"]
  FPN:
    IN_FEATURES: ["res3", "res4", "res5"]
  PROPOSAL_GENERATOR:
    NAME: "MetaFCOS"
  META_LEARN:
    EPISODIC_LEARNING: False # pretraining
  TFA:
    TRAIN_SHOT: 10
  FCOS:
    NUM_CLASSES: 60 #80
  # PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
DATASETS:
  # TRAIN: ("coco_pretrain_train_all",)
  # TEST: ("coco_pretrain_val_all", ) # "coco_pretrain_val_novel", )
  TRAIN: ("coco_pretrain_train_base",)
  TEST: ("coco_pretrain_val_base", ) # "coco_pretrain_val_novel", )
TEST:
  EVAL_PERIOD: 10000
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate
  STEPS: (60000, 80000)
  MAX_ITER: 10 #90000
  # LR_MULTIPLIER_OVERWRITE, [{'backbone': 0.1}, {'reference_points': 0.1, 'sampling_offsets': 0.1}]
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)

如果没有配置SOLVER.REFERENCE_WORLD_SIZE参数,默认值是8.

2.1、利用两块GPU跑出一个Baseline结果:

IMS_PER_BATCH: 16、BASE_LR: 0.01、MAX_ITER: 30000、REFERENCE_WORLD_SIZE: 2、 STEPS:(20000,25000)

配置如下:

DATASETS:
  TEST:
  - coco_pretrain_val_base
  TRAIN:
  - coco_pretrain_train_base
INPUT:
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
MODEL:
  BACKBONE:
    NAME: build_fcos_resnet_fpn_backbone
  DDP_FIND_UNUSED_PARAMETERS: true
  FCOS:
    NUM_CLASSES: 60
  FPN:
    IN_FEATURES:
    - res3
    - res4
    - res5
  META_ARCHITECTURE: MetaOneStageDetector
  PROPOSAL_GENERATOR:
    NAME: MetaFCOS
  RESNETS:
    OUT_FEATURES:
    - res3
    - res4
    - res5
  WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
OUTPUT_DIR: output_sylph/TFA_shuangka/coco/pretrain/
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01
  MAX_ITER: 30000
  REFERENCE_WORLD_SIZE: 2
  STEPS:
  - 20000
  - 25000
TEST:
  EVAL_PERIOD: 10000

训练结果如下:

[09/02 03:46:48] d2.engine.hooks INFO: Overall training speed: 29998 iterations in 11:05:49 (1.3317 s / it)
[09/02 03:46:48] d2.engine.hooks INFO: Total training time: 11:17:36 (0:11:47 on hooks) # 训练时间
[09/02 03:50:43] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[09/02 03:50:43] d2.evaluation.coco_evaluation INFO: Saving results to output_sylph/TFA_shuangka/coco/pretrain/inference/default/final/coco_pretrain_val_base/coco_instances_results.json
[09/02 03:50:45] d2.evaluation.coco_evaluation INFO: Evaluating predictions with official COCO API...
[09/02 03:52:09] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |  ARdet1  |  ARdet10  |  ARdet100  |  ARs   |  ARm   |  ARl   |
|:------:|:------:|:------:|:------:|:------:|:------:|:--------:|:---------:|:----------:|:------:|:------:|:------:|
| 32.887 | 50.833 | 35.163 | 19.228 | 37.868 | 42.126 |  29.625  |  50.698   |   54.897   | 38.173 | 60.524 | 69.414 |
[09/02 03:52:09] d2.evaluation.coco_evaluation INFO: Per-category bbox AP: 
| category       | AP     | category      | AP     | category     | AP     |
|:---------------|:-------|:--------------|:-------|:-------------|:-------|
| truck          | 25.783 | traffic light | 24.838 | fire hydrant | 58.756 |
| stop sign      | 60.566 | parking meter | 36.520 | bench        | 18.567 |
| elephant       | 53.398 | bear          | 59.048 | zebra        | 60.039 |
| giraffe        | 60.111 | backpack      | 13.367 | umbrella     | 34.310 |
| handbag        | 12.899 | tie           | 27.715 | suitcase     | 30.286 |
| frisbee        | 62.595 | skis          | 15.911 | snowboard    | 21.826 |
| sports ball    | 43.707 | kite          | 32.891 | baseball bat | 25.121 |
| baseball glove | 34.357 | skateboard    | 44.367 | surfboard    | 31.114 |
| tennis racket  | 44.440 | wine glass    | 33.262 | cup          | 37.429 |
| fork           | 20.322 | knife         | 11.687 | spoon        | 9.416  |
| bowl           | 38.036 | banana        | 20.496 | apple        | 17.709 |
| sandwich       | 29.315 | orange        | 29.050 | broccoli     | 20.888 |
| carrot         | 17.289 | hot dog       | 24.932 | pizza        | 46.245 |
| donut          | 41.060 | cake          | 30.549 | bed          | 33.140 |
| toilet         | 52.188 | laptop        | 51.783 | mouse        | 57.023 |
| remote         | 23.473 | keyboard      | 41.127 | cell phone   | 30.168 |
| microwave      | 49.580 | oven          | 27.093 | toaster      | 9.304  |
| sink           | 31.164 | refrigerator  | 44.687 | book         | 11.876 |
| clock          | 48.607 | vase          | 35.526 | scissors     | 17.430 |
| teddy bear     | 35.318 | hair drier    | 0.712  | toothbrush   | 12.794 |
[09/02 03:52:10] d2.evaluation.testing INFO: copypaste: Task: bbox
[09/02 03:52:10] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl,ARdet1,ARdet10,ARdet100,ARs,ARm,ARl
[09/02 03:52:10] d2.evaluation.testing INFO: copypaste: 32.8868,50.8326,35.1630,19.2278,37.8680,42.1257,29.6249,50.6976,54.8969,38.1732,60.5238,69.4136
[09/02 03:52:10] d2go.utils.misc INFO: Dumping trained config file: output_sylph/TFA_shuangka/coco/pretrain/trained_model_configs/model_final.yaml
[09/02 03:52:10] d2go.utils.misc INFO: Finished dumping trained config file
[09/02 03:52:10] mobile_cv.torch.utils_pytorch.distributed_helper INFO: Save tools.setup.new_main_func return to: /tmp/mcvdh_detectron2.utils.serialize.new_main_func_returnpriwfza6.pth.rank0

Tips: 需要设置DDP_FIND_UNUSED_PARAMETERS: true。

2.2、利用一块GPU运行出来结果:

IMS_PER_BATCH: 16、BASE_LR: 0.01、MAX_ITER: 30000、REFERENCE_WORLD_SIZE: 1、 STEPS:(20000,25000)

配置如下:

DATASETS:
  TEST:
  - coco_pretrain_val_base
  TRAIN:
  - coco_pretrain_train_base
INPUT:
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
MODEL:
  BACKBONE:
    NAME: build_fcos_resnet_fpn_backbone
  FCOS:
    NUM_CLASSES: 60
  FPN:
    IN_FEATURES:
    - res3
    - res4
    - res5
  META_ARCHITECTURE: MetaOneStageDetector
  PROPOSAL_GENERATOR:
    NAME: MetaFCOS
  RESNETS:
    OUT_FEATURES:
    - res3
    - res4
    - res5
  WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
OUTPUT_DIR: output_sylph/TFA_danka/coco/pretrain/
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01
  MAX_ITER: 30000
  REFERENCE_WORLD_SIZE: 1
  STEPS:
  - 20000
  - 25000
TEST:
  EVAL_PERIOD: 10000

训练结果如下:

[09/01 15:07:15] d2.engine.hooks INFO: Overall training speed: 29998 iterations in 19:58:49 (2.3978 s / it)
[09/01 15:07:15] d2.engine.hooks INFO: Total training time: 20:15:22 (0:16:32 on hooks) # 总训练时间
[09/01 15:13:40] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[09/01 15:13:41] d2.evaluation.coco_evaluation INFO: Saving results to output_sylph/TFA_danka/coco/pretrain/inference/default/final/coco_pretrain_val_base/coco_instances_results.json
[09/01 15:13:43] d2.evaluation.coco_evaluation INFO: Evaluating predictions with official COCO API...
[09/01 15:15:05] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |  ARdet1  |  ARdet10  |  ARdet100  |  ARs   |  ARm   |  ARl   |
|:------:|:------:|:------:|:------:|:------:|:------:|:--------:|:---------:|:----------:|:------:|:------:|:------:|
| 32.882 | 50.836 | 35.346 | 19.495 | 37.770 | 41.777 |  30.014  |  50.698   |   55.017   | 38.052 | 60.512 | 69.699 |
[09/01 15:15:05] d2.evaluation.coco_evaluation INFO: Per-category bbox AP: 
| category       | AP     | category      | AP     | category     | AP     |
|:---------------|:-------|:--------------|:-------|:-------------|:-------|
| truck          | 26.234 | traffic light | 24.540 | fire hydrant | 57.536 |
| stop sign      | 61.126 | parking meter | 39.529 | bench        | 19.172 |
| elephant       | 52.451 | bear          | 57.825 | zebra        | 59.816 |
| giraffe        | 60.008 | backpack      | 13.684 | umbrella     | 33.598 |
| handbag        | 12.576 | tie           | 27.643 | suitcase     | 30.193 |
| frisbee        | 60.575 | skis          | 16.626 | snowboard    | 22.355 |
| sports ball    | 42.712 | kite          | 35.412 | baseball bat | 24.698 |
| baseball glove | 33.945 | skateboard    | 44.483 | surfboard    | 31.571 |
| tennis racket  | 44.272 | wine glass    | 32.958 | cup          | 37.498 |
| fork           | 21.523 | knife         | 10.945 | spoon        | 8.277  |
| bowl           | 37.415 | banana        | 20.202 | apple        | 16.854 |
| sandwich       | 28.217 | orange        | 29.647 | broccoli     | 21.222 |
| carrot         | 18.313 | hot dog       | 25.558 | pizza        | 46.495 |
| donut          | 42.006 | cake          | 31.377 | bed          | 32.272 |
| toilet         | 51.671 | laptop        | 51.830 | mouse        | 58.112 |
| remote         | 23.474 | keyboard      | 41.571 | cell phone   | 28.927 |
| microwave      | 48.750 | oven          | 26.837 | toaster      | 10.041 |
| sink           | 32.883 | refrigerator  | 44.787 | book         | 12.040 |
| clock          | 47.192 | vase          | 35.346 | scissors     | 17.160 |
| teddy bear     | 34.904 | hair drier    | 0.921  | toothbrush   | 13.110 |
[09/01 15:15:06] d2.evaluation.testing INFO: copypaste: Task: bbox
[09/01 15:15:06] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl,ARdet1,ARdet10,ARdet100,ARs,ARm,ARl
[09/01 15:15:06] d2.evaluation.testing INFO: copypaste: 32.8819,50.8364,35.3460,19.4948,37.7705,41.7771,30.0138,50.6982,55.0174,38.0516,60.5119,69.6988
[09/01 15:15:06] d2go.utils.misc INFO: Dumping trained config file: output_sylph/TFA_danka/coco/pretrain/trained_model_configs/model_final.yaml
[09/01 15:15:06] d2go.utils.misc INFO: Finished dumping trained config file
[09/01 15:15:06] mobile_cv.torch.utils_pytorch.distributed_helper INFO: Launched jobs finished, dist_url: file:///tmp/d2go_dist_file_1725101434.1924598

从上述结果可以看出两者结果近似,得出的结论是:

当保持总BS、lr、iters不变的时候,训练结果与GPU数量无关,可以使用一张大显存的GPU来代替若干张小显存GPU进行运算。

使用Tensorboard来可视化一下损失图也可以验证这个想法(在文章的最后)。

2.3、利用两块GPU(bs_pre最大)运行出来结果:

IMS_PER_BATCH: 30、BASE_LR: 0.02、MAX_ITER: 15000、REFERENCE_WORLD_SIZE: 1、 STEPS:(10000,12500)

Tips: 原本IMS_PER_BATCH应该是32的,但是由于每张卡的bs=16会爆显存,所以设置成30.

配置如下:

DATASETS:
  TEST:
  - coco_pretrain_val_base
  TRAIN:
  - coco_pretrain_train_base
INPUT:
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
MODEL:
  BACKBONE:
    NAME: build_fcos_resnet_fpn_backbone
  DDP_FIND_UNUSED_PARAMETERS: true
  FCOS:
    NUM_CLASSES: 60
  FPN:
    IN_FEATURES:
    - res3
    - res4
    - res5
  META_ARCHITECTURE: MetaOneStageDetector
  PROPOSAL_GENERATOR:
    NAME: MetaFCOS
  RESNETS:
    OUT_FEATURES:
    - res3
    - res4
    - res5
  WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
OUTPUT_DIR: output_sylph/TFA_shuangbeibs/coco/pretrain/
SOLVER:
  BASE_LR: 0.02
  IMS_PER_BATCH: 30
  MAX_ITER: 15000
  REFERENCE_WORLD_SIZE: 2
  STEPS:
  - 10000
  - 12500
TEST:
  EVAL_PERIOD: 10000

训练结果如下:

[09/02 19:14:36] d2.engine.hooks INFO: Overall training speed: 14998 iterations in 9:43:15 (2.3333 s / it)
[09/02 19:14:36] d2.engine.hooks INFO: Total training time: 9:49:10 (0:05:55 on hooks)	# 总的训练时间
[09/02 19:18:28] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[09/02 19:18:29] d2.evaluation.coco_evaluation INFO: Saving results to output_sylph/TFA_shuangbeibs/coco/pretrain/inference/default/final/coco_pretrain_val_base/coco_instances_results.json
[09/02 19:18:31] d2.evaluation.coco_evaluation INFO: Evaluating predictions with official COCO API...
[09/02 19:19:53] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |  ARdet1  |  ARdet10  |  ARdet100  |  ARs   |  ARm   |  ARl   |
|:------:|:------:|:------:|:------:|:------:|:------:|:--------:|:---------:|:----------:|:------:|:------:|:------:|
| 32.662 | 50.715 | 35.003 | 19.986 | 37.365 | 41.966 |  29.936  |  50.815   |   54.903   | 39.312 | 60.476 | 69.437 |
[09/02 19:19:53] d2.evaluation.coco_evaluation INFO: Per-category bbox AP: 
| category       | AP     | category      | AP     | category     | AP     |
|:---------------|:-------|:--------------|:-------|:-------------|:-------|
| truck          | 24.719 | traffic light | 25.109 | fire hydrant | 56.087 |
| stop sign      | 59.856 | parking meter | 36.393 | bench        | 18.575 |
| elephant       | 52.799 | bear          | 55.483 | zebra        | 60.004 |
| giraffe        | 58.331 | backpack      | 12.968 | umbrella     | 33.007 |
| handbag        | 12.360 | tie           | 27.676 | suitcase     | 31.035 |
| frisbee        | 61.520 | skis          | 16.335 | snowboard    | 24.008 |
| sports ball    | 43.596 | kite          | 36.135 | baseball bat | 24.210 |
| baseball glove | 34.805 | skateboard    | 44.214 | surfboard    | 31.307 |
| tennis racket  | 45.242 | wine glass    | 32.089 | cup          | 37.601 |
| fork           | 20.472 | knife         | 10.713 | spoon        | 9.313  |
| bowl           | 37.740 | banana        | 20.521 | apple        | 18.616 |
| sandwich       | 27.233 | orange        | 30.228 | broccoli     | 21.418 |
| carrot         | 16.801 | hot dog       | 24.280 | pizza        | 46.001 |
| donut          | 39.685 | cake          | 31.516 | bed          | 30.078 |
| toilet         | 49.645 | laptop        | 51.554 | mouse        | 57.478 |
| remote         | 24.332 | keyboard      | 41.024 | cell phone   | 29.425 |
| microwave      | 46.203 | oven          | 26.922 | toaster      | 16.231 |
| sink           | 31.143 | refrigerator  | 43.622 | book         | 11.921 |
| clock          | 47.546 | vase          | 34.431 | scissors     | 17.359 |
| teddy bear     | 36.482 | hair drier    | 0.428  | toothbrush   | 13.893 |
[09/02 19:19:54] d2.evaluation.testing INFO: copypaste: Task: bbox
[09/02 19:19:54] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl,ARdet1,ARdet10,ARdet100,ARs,ARm,ARl
[09/02 19:19:54] d2.evaluation.testing INFO: copypaste: 32.6619,50.7154,35.0027,19.9863,37.3647,41.9661,29.9360,50.8152,54.9029,39.3122,60.4762,69.4366
[09/02 19:19:54] d2go.utils.misc INFO: Dumping trained config file: output_sylph/TFA_shuangbeibs/coco/pretrain/trained_model_configs/model_final.yaml
[09/02 19:19:54] d2go.utils.misc INFO: Finished dumping trained config file
[09/02 19:19:54] mobile_cv.torch.utils_pytorch.distributed_helper INFO: Save tools.setup.new_main_func return to: /tmp/mcvdh_detectron2.utils.serialize.new_main_func_returnp17qhdor.pth.rank0

三、 结果与结论:

定义如下:

  • 方法1 即:Baseline,采用2GPUS,保持总的BS、iteration、base_lr不变。
  • 方法2 即:采用1GPU,保持总的Batch_Size、iteration、base_lr不变。
  • 方法3 即: 采用2GPUs,适当调整iteration(减倍)、Batch_Size(加倍)、base_lr(加倍)。

3.1、训练结果:

方法APAP50AP75APsAPmAPlARdet1ARdet10ARdet100ARsARmARl
方法132.88750.83335.16319.22837.86842.12629.62550.69854.89738.17360.52469.414
方法232.88250.83635.34619.49537.77041.77730.01450.69855.01738.05260.51269.699
方法332.62250.71535.00319.98637.36541.96629.93650.81554.90339.31260.47669.437

观察训练结果可以发现三者比较接近,尤其是 方法1 与 方法2。方法3稍有0.2的落后,主要是由于总的Batch_Size理应是32,由于显存爆炸设置成了30.

3.2、总的训练时间:

Baseline(2GPU)不改变iteration(1GPU)改变iteration(2GPU)
11:17:3620:15:229:49:10

可以发现使用大的Batch_size确实能够节省训练时间,并且达到同样的训练效果。

3.3、tensorboard可视化的结果:

  • 青靛色为方法3:改变iteration(2GPU)
  • 浅蓝色为方法2:不改变iteration(1GPU)
  • 橙黄色为方法1:Baseline(2GPU)

训练时间

data_time

loss_fcos_cls

loss_fcos_cls

loss_fcos_ctr

loss_fcos_ctr

loss_fcos_loc

loss_fcos_loc

lr

lr

total_loss

total_loss

3.4 结论:

从3.3中的各个可视化结果图来看,损失函数都比较相似,与3.1的训练结果相匹配。

因此,对于可以采用这种大Batch_Size的方式来减少总的iteration,替代源代码中那种集群的训练方式,从而减少训练的时间。

  • 5
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

做一个徘徊在牛a与牛c之间

熬夜不易,谢谢支持!

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值