DeFCN debug notes (training process), plus an analysis of the cvpods framework

Uses Megvii's cvpods framework; the environment is deployed in the cloud, and debugging is done locally in PyCharm.

Table of Contents

I. Preliminaries

II. Debug notes

1. train_net.py

2. cvpods/engine/launch.py --- (initialization for distributed training)

1) main

3. cvpods/engine/setup.py

1) default_setup

4. cvpods/engine/runner.py

1) __init__

5. cvpods/data/build.py

1) build_transform_gens

2) build_dataset

3) build_train_loader

6. cvpods/data/datasets/coco.py

1) _get_coco_instance_meta

2) _get_metadata

3) _load_annotations

4) __init__

7. cvpods/data/base_dataset.py

1) __init__

2) filter_images_with_only_crowd_annotations

3) print_instances_class_histogram

4) _set_group_flag

8. net.py

9. fcos.py

10. cvpods/modeling/backbone/resnet.py

1) build_resnet_backbone

11. cvpods/modeling/backbone/fpn.py

1) FPN

III. Entry points for various settings

1. Command-line argument definitions (argparse.ArgumentParser)

2. Where logs are printed

Printing output as a table

3. Entry point for the model

4. The runner: running evaluation or testing

5. Entry point for dataset processing

Creation of the data_loader

6. Distributed training (DDP) setup

7. Model creation

1) Building the ResNet backbone ------ from the cvpods library

2) Building the FPN ------ from the cvpods library

3) Building the FCOSHead ------ user-defined

8. Where the 3D max filter is defined

9. Optimizer definition

10. lr-scheduler definition

11. The training process

1) Entry point

2) Execution flow

12. Implementation details of the hooks used during training

1) OptimizationHook

2) IterationTimer

3) LRScheduler

4) PeriodicCheckpointer

5) EvalHook

6) PeriodicWriter

13. Entry point where data is fed into the model

14. Settings you may want to change


I. Preliminaries

When training from the terminal, the command is run against the corresponding experiment's config directory under DeFCN, like this:

pods_train --num-gpus 1 --dir ~/data/zjx/cvpods1/DeFCN-main/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf

To debug directly in a local PyCharm, a little setup is needed: configure the run configuration with the --dir input argument, and

add the environment variable PYTHONPATH = '<the path to the DeFCN config directory used above>'

OK, done.
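For reference, a minimal sketch of what the PyCharm run configuration amounts to, written as Python (the paths are placeholders for your own checkout):

import os, sys

# Point PYTHONPATH at the experiment's config directory so that its
# config.py / net.py become importable, then mimic the pods_train arguments.
config_dir = os.path.expanduser(
    "~/data/zjx/cvpods1/DeFCN-main/playground/detection/coco/"
    "poto.res50.fpn.coco.800size.3x_ms.3dmf")
os.environ["PYTHONPATH"] = config_dir
sys.argv = ["train_net.py", "--num-gpus", "1", "--dir", config_dir]
# With these set, running cvpods' tools/train_net.py under the PyCharm
# debugger behaves like the pods_train command above.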

Problems encountered along the way

pydev debugger: warning: trying to add breakpoint to file that does not exist

Solution (shown as a screenshot in the original post)

Lag while stepping through the debugger

Solution (shown as a screenshot in the original post)

II. Debug notes

1. train_net.py

args

 config

runner

2. cvpods/engine/launch.py --- (initialization for distributed training)

args

 1) main

config

3. cvpods/engine/setup.py

1) default_setup

args

cfg (after config adjustments)

4. cvpods/engine/runner.py

1) __init__

self

2) 

5. cvpods/data/build.py

1) build_transform_gens

tfm_gens

2) build_dataset

dataset (as returned by _build_single_dataset)

dataset (as finally returned)

 3) build_train_loader

sampler (after DDP)

sampler (after the infinite wrapper)

 data_loader

 6. cvpods/data/datasets/coco.py

1) _get_coco_instance_meta

thing_ids

 thing_color

 thing_dataset_id_to_contiguous_id

 thing_classes

 2) _get_metadata

meta (see inside its source function, and part 1) above)

after some paths are written into it

 3) _load_annotations

coco_api

 cat_ids

 cats

 thing_classes

 id_map

 img_ids

 imgs

 anns

 ann_ids

 img_anns

 img_dict

 anno_dict_list

objs (after the first loop iteration)

 obj

 segm

4) __init__

self (final)

7. cvpods/data/base_dataset.py

1) __init__

self

2)  filter_images_with_only_crowd_annotations

dataset_dicts

3) print_instances_class_histogram

entry

 data

 4)_set_group_flag

self 

 dataset_dict

8. net.py

1) build_backbone

input_shape

9. fcos.py

cfg

 backbone_shape

 self.shift_generator

 10. cvpods/modeling/backbone/resnet.py

1) build_resnet_backbone

stem

p

stage_kargs, looping over stage_idx = 2, 3, 4, 5

11. cvpods/modeling/backbone/fpn.py

1) FPN

input_shapes

III. Entry points for various settings

1. Command-line argument definitions (argparse.ArgumentParser)

train_net.py -----71

parser = default_argument_parser()

cvpods/engine/setup.py -----27

Argument breakdown (a usage sketch follows the list):

--dir: the path containing config and net; defaults to the current working directory. Together with the setup above, it effectively adds that path to PYTHONPATH.

--resume: whether to resume training from the checkpoint directory, i.e., resuming from a checkpoint.

--eval-only: run evaluation only.

--num-gpus: total number of GPUs to use.

--num-machines: defaults to 1.

--machine-rank: defaults to 0; identifies the main machine.

opts: options for overriding entries in the config.
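A quick usage sketch of the parser (assuming default_argument_parser is importable from cvpods.engine, as its use in train_net.py suggests):

from cvpods.engine import default_argument_parser

# opts entries are positional and come last; they override config keys
args = default_argument_parser().parse_args(
    ["--num-gpus", "1", "--dir", ".", "OUTPUT_DIR", "./output"])
print(args.num_gpus, args.dir, args.opts)
# -> 1 . ['OUTPUT_DIR', './output']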

2. Where logs are printed

train_net.py -----169

 logger.info("Create soft link to {}".format(config.OUTPUT_DIR))

 cvpods/engine/setup.py ----- 135

setup_logger(output_dir, distributed_rank=rank)

The main point here is that the setup_logger function contains the corresponding log-output configuration.

  cvpods/engine/setup.py -----137

logger.info("Rank of current process: {}. World size: {}".format(
        rank, comm.get_world_size()))
logger.info("Environment info:\n" + collect_env_info())
logger.info("Command line arguments: " + str(args))

cvpods/data/build.py -----127

logger.info(f"TransformGens used: {transform_gens} in training")

Printed inside the pycocotools library:

print('loading annotations into memory...')
print('Done (t={:0.2f}s)'.format(time.time()- tic))
print('creating index...')

 cvpods/data/datasets/coco.py -----238

logger.info("Loading {} takes {:.2f} seconds.".format(
                json_file, timer.seconds()))

 cvpods/data/datasets/coco.py -----308

logger.info("Loaded {} images in COCO format from {}".format(
            len(imgs_anns), json_file))

cvpods/data/base_dataset.py -----219 

 logger.info(
        "Removed {} images with no usable annotations. {} images left.".format(
            num_before - num_after, num_after
        )
    )

 cvpods/data/base_dataset.py -----170 

print_instances_class_histogram(dataset_dicts, class_names)

train_net.py -----104 106

logger.info("Running with full config:\n{}".format(cfg))
logger.info("different config with base class:\n{}".format(cfg.diff(base_config)))

 runner.py ----- 281

 logger.info("Starting training from iteration {}".format(self.start_iter))

Printing output as a table

cvpods/data/base_dataset.py ------ 344

Implemented with the tabulate library:

 table = tabulate(
        data,
        headers=["category", "#instances"] * (N_COLS // 2),
        tablefmt="pipe",
        numalign="left",
        stralign="center",
    )  # print the output as a nicely formatted table
 log_first_n(
        "INFO",
        "Distribution of instances among all {} categories:\n".format(
            num_classes) + colored(table, "cyan"),
        key="message",
    )
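A self-contained toy version of the same table (category names and counts are made up):

from tabulate import tabulate

# Flat list of (category, count) pairs, reflowed into N_COLS-wide rows the
# way print_instances_class_histogram lays them out.
data = ["person", 1234, "car", 567, "dog", 89, "total", 1890]
N_COLS = 4
rows = [data[i:i + N_COLS] for i in range(0, len(data), N_COLS)]
table = tabulate(
    rows,
    headers=["category", "#instances"] * (N_COLS // 2),
    tablefmt="pipe",
    numalign="left",
    stralign="center",
)
print(table)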

 cvpods/data/build.py -----133  147

logger.info("Using training sampler {}".format(sampler_name))
logger.info("Wrap sampler with infinite warpper...")

net.py -----37

logger = logging.getLogger(__name__)
logger.info("Model:\n{}".format(model))

 cvpods/utils/compat_wrapper.py -----14

logger.warning("{} will be deprecated. {}".format(func.__name__, extra_info))

cvpods/engine/runner.py -----90 ------ (prints the full model structure)

logger.info(f"Model: \n{self.model}")

 hooks.py ---- 201

logger.info(
                "Overall training speed: {} iterations in {} ({:.4f} s / it)".format(
                    num_iter,
                    str(datetime.timedelta(seconds=int(total_time_minus_hooks))),
                    total_time_minus_hooks / num_iter,
                )
            )

        logger.info(
            "Total training time: {} ({} on hooks)".format(
                str(datetime.timedelta(seconds=int(total_time))),
                str(datetime.timedelta(seconds=int(hook_time))),
            )
        )

3. Entry point for the model

First, the model's network architecture and its parameter settings must be defined; the architecture builder and the config file are then passed in:

train_net.py ----- 171

launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args, config, build_model),
    )

The launch function first performs some initialization for distributed training and then runs the training itself. The key piece is the main function passed in, which is defined at

train_net.py ----- 81
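Conceptually, launch behaves like a torch.multiprocessing spawn wrapper around main (a rough analogue for intuition, not the cvpods implementation):

import torch.multiprocessing as mp

def _worker(local_rank, main_func, args):
    # the real launch would initialize the distributed process group here
    main_func(*args)

def launch_like(main_func, num_gpus, args=()):
    if num_gpus > 1:
        mp.spawn(_worker, nprocs=num_gpus, args=(main_func, args))
    else:
        main_func(*args)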

4. The runner: running evaluation or testing

train_net.py ----89

runner = runner_decrator(RUNNERS.get(cfg.TRAINER.NAME))(cfg, build_model)

The runner obtained here bundles cvpods' built-in training pipeline and models; to run your own training logic, use SimpleRunner or rewrite the training loop.

Execution then moves on to cvpods/engine/runner.py, where training begins.

5. Entry point for dataset processing


To change datasets, modify DATASETS.TRAIN in the config file, and check whether cvpods/data/datasets/paths_route.py needs corresponding updates. For example:

_PREDEFINED_SPLITS_COCO["coco"] = {
    "coco_2014_train":
    ("coco/train2014", "coco/annotations/instances_train2014.json"),
    "coco_2014_val":
    ("coco/val2014", "coco/annotations/instances_val2014.json"),
    "coco_2014_minival":
    ("coco/val2014", "coco/annotations/instances_minival2014.json"),
    "coco_2014_minival_100":
    ("coco/val2014", "coco/annotations/instances_minival2014_100.json"),
    "coco_2014_valminusminival": (
        "coco/val2014",
        "coco/annotations/instances_valminusminival2014.json",
    ),
    "coco_2017_train": ("coco/train2017",
                        "coco/annotations/instances_train2017.json"),
    "coco_2017_val": ("coco/val2017",
                      "coco/annotations/instances_val2017.json"),
    "coco_2017_test": ("coco/test2017",
                       "coco/annotations/image_info_test2017.json"),
    "coco_2017_test-dev": ("coco/test2017",
                           "coco/annotations/image_info_test-dev2017.json"),
    "coco_2017_val_100": ("coco/val2017",
                          "coco/annotations/instances_val2017_100.json"),
}

This dataset-path registry is consumed at cvpods/data/datasets/coco.py ----- 381:

 image_root, json_file = _PREDEFINED_SPLITS_COCO[self.task_key][self.name]

The dataset paths are looked up in the corresponding dictionary according to the dataset name set in the config file.
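Following the same pattern, registering a hypothetical custom split would look like this (names and paths are placeholders):

# in cvpods/data/datasets/paths_route.py:
_PREDEFINED_SPLITS_COCO["coco"]["my_coco_train"] = (
    "my_coco/images",                  # image root, relative to the datasets dir
    "my_coco/annotations/train.json",  # COCO-format annotation file
)
# then set DATASETS.TRAIN = ("my_coco_train",) in the experiment's config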


cvpods/data/datasets/coco.py -----236

coco_api = COCO(json_file)

This is where the annotations in the JSON file are loaded, using the pycocotools.coco library; its functionality is the same as in earlier COCO data-processing work (see the notes on the COCO preprocessing API for object tracking and on training-data augmentation).

Note that no image cropping happens here; only the annotations in the JSON file are loaded.
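As a short refresher on the pycocotools calls involved (the annotation path is a placeholder):

from pycocotools.coco import COCO

coco_api = COCO("coco/annotations/instances_train2017.json")  # load and index the JSON
cat_ids = sorted(coco_api.getCatIds())           # category ids
img_ids = sorted(coco_api.imgs.keys())           # all image ids
ann_ids = coco_api.getAnnIds(imgIds=img_ids[0])  # annotation ids for one image
anns = coco_api.loadAnns(ann_ids)                # the annotation dicts themselves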


 cvpods/data/base_dataset.py ----161

dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts)

This removes images whose annotations are all marked "iscrowd" != 0; only images with at least one usable (non-crowd) instance are kept.
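In essence (a simplified sketch of the detectron2-style implementation, minus logging):

def filter_images_with_only_crowd_annotations(dataset_dicts):
    # keep an image iff it has at least one non-crowd annotation
    def has_valid_annotation(anns):
        return any(ann.get("iscrowd", 0) == 0 for ann in anns)
    return [d for d in dataset_dicts if has_valid_annotation(d["annotations"])]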


cvpods/data/build.py -----102

def build_train_loader(cfg):

 cvpods/data/build.py -----128 

dataset = build_dataset(
        cfg, cfg.DATASETS.TRAIN, transforms=transform_gens, is_train=True
    )

Creation of the data_loader

cvpods/data/build.py -----149  <---- runner.py ------322 < ----- runner.py-----86

data_loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=images_per_minibatch,
        sampler=sampler,
        num_workers=cfg.DATALOADER.NUM_WORKERS,
        collate_fn=trivial_batch_collator,
        worker_init_fn=worker_init_reset_seed,
    )
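The trivial_batch_collator used here does no tensor stacking; it keeps the batch as a plain list of dataset dicts (a minimal equivalent):

def trivial_batch_collator(batch):
    # each element is one dataset dict (image tensor + annotations);
    # the model consumes the list directly
    return batch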

6. Distributed training (DDP) setup

 cvpods/data/build.py -----142

sampler = SAMPLERS.get(sampler_name)(
            dataset, images_per_minibatch, num_devices, rank)
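The sampler fetched from the SAMPLERS registry plays a role conceptually similar to PyTorch's own distributed sampler, sharding the dataset across processes (a rough analogue using the variables from the snippet above, not the cvpods implementation):

import torch

sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=num_devices, rank=rank, shuffle=True)
# each of the num_devices processes then draws a disjoint shard per epoch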

7. Model creation

  cvpods/engine/runner.py -----88

model = build_model(cfg)

From this entry point, execution moves to net.py in the corresponding experiment's config directory; the cvpods library is still used during construction, e.g. for building the backbone and the FPN.

net.py -----31 

def build_model(cfg):

    cfg.build_backbone = build_backbone
    cfg.build_shift_generator = build_shift_generator

    model = FCOS(cfg)
    logger = logging.getLogger(__name__)
    logger.info("Model:\n{}".format(model))
    return model

As you can see, two new entries are first added to cfg, build_backbone and build_shift_generator. Both are functions, and their work is carried out inside FCOS(cfg).

1) Building the ResNet backbone ------ from the cvpods library

Inside the FCOS class:

self.backbone = cfg.build_backbone(
            cfg, input_shape=ShapeSpec(channels=len(cfg.MODEL.PIXEL_MEAN)))

ShapeSpec here is just a small namedtuple-based class:

namedtuple("_ShapeSpec", ["channels", "height", "width", "stride"])
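For illustration, and assuming cvpods' ShapeSpec wraps this namedtuple with None defaults the way detectron2's does:

from collections import namedtuple

class ShapeSpec(namedtuple("_ShapeSpec", ["channels", "height", "width", "stride"])):
    def __new__(cls, channels=None, height=None, width=None, stride=None):
        return super().__new__(cls, channels, height, width, stride)

spec = ShapeSpec(channels=3)  # what len(cfg.MODEL.PIXEL_MEAN) supplies above
print(spec.channels)          # -> 3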

Execution then enters the build_backbone function in net.py, where

backbone = build_retinanet_resnet_fpn_p5_backbone(cfg, input_shape)

which in turn calls, in cvpods/modeling/backbone/fpn.py:

def build_retinanet_resnet_fpn_p5_backbone(cfg, input_shape: ShapeSpec):
    return build_retinanet_resnet_fpn_backbone(cfg, input_shape)

def build_retinanet_resnet_fpn_backbone(cfg, input_shape: ShapeSpec):
    return build_retinanet_fpn_backbone(cfg, input_shape)

The real work ends up in build_retinanet_fpn_backbone in this same fpn.py. The backbone families selectable in this function are

"resnet", "shufflev2", "mobilev2", "timm"

and resnet is used here:

    if backbone_name == "resnet":
        bottom_up = build_resnet_backbone(cfg, input_shape)

This leads into the build_resnet_backbone function in cvpods/modeling/backbone/resnet.py. The network settings, including the variant chosen (ResNet-50 here), all come from the cfg passed in.

depth = cfg.MODEL.RESNETS.DEPTH  # 50
num_blocks_per_stage = {
        18: [2, 2, 2, 2],
        34: [3, 4, 6, 3],
        50: [3, 4, 6, 3],
        101: [3, 4, 23, 3],
        152: [3, 8, 36, 3],
        200: [3, 24, 36, 3],
        269: [3, 30, 48, 8],
    }[depth]  # the list gives the number of blocks in each stage

The stem is built by the BasicStem class:

class BasicStem(nn.Module):

There is also a freeze setting here, controlled by:

freeze_at = cfg.MODEL.BACKBONE.FREEZE_AT  # 2 in this config
    if freeze_at >= 1:
        for p in stem.parameters():
            p.requires_grad = False
        stem = FrozenBatchNorm2d.convert_frozen_batchnorm(stem)

When freeze_at >= 1, the stem's parameters stop updating (requires_grad = False), and its BatchNorm layers are converted to FrozenBatchNorm2d, fixing their statistics.

The basic ResNet block is built by the BottleneckBlock class in this same resnet.py file:

class BottleneckBlock(ResNetBlockBase):

It is invoked by:

blocks = make_stage(**stage_kargs)
def make_stage(block_class, num_blocks, first_stride, **kwargs):
    blocks = []
    for i in range(num_blocks):
        blocks.append(block_class(stride=first_stride if i == 0 else 1, **kwargs))
        kwargs["in_channels"] = kwargs["out_channels"]
    return blocks

stage_kargs is a dict; its contents, and the parameters it passes into BottleneckBlock for the first stage, were shown as screenshots in the original post (not reproduced here).

After one stage is built, the next stage's input channels become the current stage's output channels, and the output (and bottleneck) channel counts double. This is how the per-stage channel widths are derived:

in_channels = out_channels
out_channels *= 2
bottleneck_channels *= 2
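Running this rule forward from the ResNet-50 starting values reproduces the widths visible in the printed structure below:

in_channels, bottleneck_channels, out_channels = 64, 64, 256
for stage in ["res2", "res3", "res4", "res5"]:
    print(stage, in_channels, bottleneck_channels, out_channels)
    in_channels = out_channels  # next stage's input
    out_channels *= 2
    bottleneck_channels *= 2
# res2 64 64 256 | res3 256 128 512 | res4 512 256 1024 | res5 1024 512 2048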

The res2 stage's parameters are then frozen as well (with freeze_at = 2, the stem and res2 are frozen):

        if freeze_at >= stage_idx:  # with freeze_at=2 this freezes res2 (stage_idx starts at 2)
            for block in blocks:
                block.freeze()

The loop then repeats to build the remaining stages. Finally, the ResNet class chains all the stages together. This ResNet class

class ResNet(Backbone):

inherits from the Backbone base class, defined in backbone.py in the same folder, which itself inherits from PyTorch's nn.Module:

class Backbone(nn.Module, metaclass=ABCMeta):

so ResNet can use calls such as

self.add_module

inherited from nn.Module. (A tiny illustration of add_module follows.) The final ResNet backbone structure is printed below the sketch:
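A minimal illustration of add_module (my own example, not cvpods code):

import torch.nn as nn

class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        # registers the submodule under the name "res2"; this is how the
        # named stages end up in the printed structure
        self.add_module("res2", nn.Sequential(nn.Conv2d(3, 8, 3)))

print(TinyBackbone())  # shows (res2): Sequential(...)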

ResNet(
  (stem): BasicStem(
    (conv1): Conv2d(
      3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
      (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
    )
    (activation): ReLU(inplace=True)
    (max_pool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
  (res2): Sequential(
    (0): BottleneckBlock(
      (shortcut): Conv2d(
        64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
      (conv2): Conv2d(
        64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
      (conv3): Conv2d(
        64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
    )
    (1): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
      (conv2): Conv2d(
        64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
      (conv3): Conv2d(
        64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
    )
    (2): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
      (conv2): Conv2d(
        64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
      (conv3): Conv2d(
        64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
    )
  )
  (res3): Sequential(
    (0): BottleneckBlock(
      (shortcut): Conv2d(
        256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv2): Conv2d(
        128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv3): Conv2d(
        128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
    )
    (1): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv2): Conv2d(
        128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv3): Conv2d(
        128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
    )
    (2): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv2): Conv2d(
        128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv3): Conv2d(
        128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
    )
    (3): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv2): Conv2d(
        128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
      )
      (conv3): Conv2d(
        128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
    )
  )
  (res4): Sequential(
    (0): BottleneckBlock(
      (shortcut): Conv2d(
        512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
        (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv3): Conv2d(
        256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
    )
    (1): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv3): Conv2d(
        256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
    )
    (2): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv3): Conv2d(
        256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
    )
    (3): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv3): Conv2d(
        256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
    )
    (4): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv3): Conv2d(
        256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
    )
    (5): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
      )
      (conv3): Conv2d(
        256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
      )
    )
  )
  (res5): Sequential(
    (0): BottleneckBlock(
      (shortcut): Conv2d(
        1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
        (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      )
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
      (conv2): Conv2d(
        512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
      (conv3): Conv2d(
        512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      )
    )
    (1): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
      (conv2): Conv2d(
        512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
      (conv3): Conv2d(
        512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      )
    )
    (2): BottleneckBlock(
      (activation): ReLU(inplace=True)
      (conv1): Conv2d(
        2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
      (conv2): Conv2d(
        512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
      )
      (conv3): Conv2d(
        512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
      )
    )
  )
)

2) Building the FPN ------ from the cvpods library

The P6 and P7 levels are created first:

top_block=LastLevelP6P7(in_channels_p6p7, out_channels, in_feature=block_in_feature),

and the rest is handled by the FPN class:

backbone = FPN(
        bottom_up=bottom_up,
        in_features=in_features,
        out_channels=out_channels,
        norm=cfg.MODEL.FPN.NORM,
        top_block=LastLevelP6P7(in_channels_p6p7, out_channels, in_feature=block_in_feature),
        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,
    )

The resulting backbone contains both the ResNet backbone and the FPN structure, as shown below:

FPN(
  (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
  (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
  (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
  (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (top_block): LastLevelP6P7(
    (p6): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (p7): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  )
  (bottom_up): ResNet(
    ... (identical to the ResNet structure printed in full above)
  )
)

At this point the backbone is complete, and execution returns to where it started, fcos.py ----- 78:

self.backbone = cfg.build_backbone(
            cfg, input_shape=ShapeSpec(channels=len(cfg.MODEL.PIXEL_MEAN)))

3) Building the FCOSHead ------ user-defined

The head must be defined for your own method; here it is implemented by the FCOSHead class, with the following structure:

FCOSHead(
  (cls_subnet): Sequential(
    (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): GroupNorm(32, 256, eps=1e-05, affine=True)
    (2): ReLU()
    (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): GroupNorm(32, 256, eps=1e-05, affine=True)
    (5): ReLU()
    (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): GroupNorm(32, 256, eps=1e-05, affine=True)
    (8): ReLU()
    (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): GroupNorm(32, 256, eps=1e-05, affine=True)
    (11): ReLU()
  )
  (bbox_subnet): Sequential(
    (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): GroupNorm(32, 256, eps=1e-05, affine=True)
    (2): ReLU()
    (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): GroupNorm(32, 256, eps=1e-05, affine=True)
    (5): ReLU()
    (6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): GroupNorm(32, 256, eps=1e-05, affine=True)
    (8): ReLU()
    (9): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): GroupNorm(32, 256, eps=1e-05, affine=True)
    (11): ReLU()
  )
  (cls_score): Conv2d(256, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bbox_pred): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (max3d): MaxFiltering(
    (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
    (nonlinear): ReLU()
    (max_pool): MaxPool3d(kernel_size=(3, 3, 3), stride=1, padding=(1, 1, 1), dilation=1, ceil_mode=False)
  )
  (filter): Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (scales): ModuleList(
    (0): Scale()
    (1): Scale()
    (2): Scale()
    (3): Scale()
    (4): Scale()
  )
)

At this point the model definition is essentially complete, and execution returns to net.py:

model = FCOS(cfg)

(The original post showed the full printed model here.) Finally, we return to the starting point, cvpods/engine/runner.py ----- 88:

model = build_model(cfg)

This concludes model creation.

8. Where the 3D max filter is defined

fcos.py ----- 509

It is defined together with the head:

self.max3d = MaxFiltering(in_channels,
                                  kernel_size=cfg.MODEL.POTO.FILTER_KERNEL_SIZE,
                                  tau=cfg.MODEL.POTO.FILTER_TAU)
self.filter = nn.Conv2d(in_channels,
                                num_shifts * 1,
                                kernel_size=3,
                                stride=1,
                                padding=1)
MaxFiltering(
  (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
  (nonlinear): ReLU()
  (max_pool): MaxPool3d(kernel_size=(3, 3, 3), stride=1, padding=(1, 1, 1), dilation=1, ceil_mode=False)
)
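The printed module is consistent with a construction like the following (a hedged reconstruction: with kernel_size=3 and tau=2 the 3D pooling window spans tau+1 = 3 adjacent FPN levels; the stacking of levels into the depth axis happens in forward and is omitted here):

import torch.nn as nn

in_channels, kernel_size, tau = 256, 3, 2
conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
norm = nn.GroupNorm(32, in_channels)
nonlinear = nn.ReLU()
max_pool = nn.MaxPool3d(
    kernel_size=(tau + 1, kernel_size, kernel_size),
    stride=1,
    padding=(tau // 2, kernel_size // 2, kernel_size // 2),
)
# -> MaxPool3d(kernel_size=(3, 3, 3), stride=1, padding=(1, 1, 1)), matching the repr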

9. Optimizer definition

runner.py ----- 93

self.optimizer = self.build_optimizer(cfg, self.model)

The optimizer type set in cfg determines which optimizer is built; the construction lives in cvpods/solver/optimizer_builder.py.

SGD is used here. The printed optimizer, condensed, looks like this:

SGD (
Parameter Group 0
    dampening: 0
    lr: 0.00125
    momentum: 0.9
    nesterov: False
    weight_decay: 0.0001

... 105 parameter groups in total (0-104), all identical to Group 0, except
that Groups 60-61, 64-65, 68-69, 72-73, 76-77, 80-81, 84-85, 88-89 and
96-97 have weight_decay: 0.0 ...
)
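Why so many groups: the builder evidently creates one group per parameter (or per tiny parameter set) so lr and weight_decay can be set per parameter, and the dump suggests weight decay is disabled for some (presumably bias/norm) parameters. A rough, self-contained analogue of that pattern, not the cvpods builder itself:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.GroupNorm(4, 8))  # stand-in model
param_groups = []
for name, p in model.named_parameters():
    # heuristic matching the dump: biases get no weight decay (assumption)
    wd = 0.0 if name.endswith("bias") else 1e-4
    param_groups.append({"params": [p], "lr": 0.00125, "weight_decay": wd})
optimizer = torch.optim.SGD(param_groups, lr=0.00125, momentum=0.9)
print(len(optimizer.param_groups))  # one group per parameter tensor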

10. lr-scheduler definition

runner.py ----- 134

11. The training process

1) Entry point

train_net.py ----- 108

runner.train()

-----> runner.py ----- 273, inside the train function:

super().train(self.start_iter, self.start_epoch, self.max_iter)

-----> base_runner.py ----- 66, inside the train function.

The training loop is defined here; the arguments above determine the total number of iterations. The loop:

self.before_train()
for self.iter in range(start_iter, max_iter):
    self.inner_iter = 0
    self.before_step()
    # by default, a step contains data_loading and model forward,
    # loss backward is executed in after_step for better expansibility
    self.run_step()
    self.after_step()

This is implemented by registering hooks. The flow is:

before_train() --> before_step() --------> run_step() -----> after_step() --> after_train()
                        |----------------< loop <----------------|

with the dispatch implemented in base_runner.py as:
    def before_train(self):
        for h in self._hooks:
            h.before_train()

    def after_train(self):
        self.storage._iter = self.iter
        for h in self._hooks:
            h.after_train()

    def before_step(self):
        # Maintain the invariant that storage.iter == runner.iter
        # for the entire execution of each step
        self.storage._iter = self.iter

        for h in self._hooks:
            h.before_step()

    def after_step(self):
        for h in self._hooks:
            h.after_step()

2) Execution flow

Each phase of the loop above dispatches to the registered hooks, which implement the individual pieces of functionality. Hooks are registered at runner.py ----- 153:

self.register_hooks(self.build_hooks())

build_hooks() mainly returns the following ret list, containing the various hooks:

ret = [
            hooks.OptimizationHook(
                accumulate_grad_steps=cfg.SOLVER.BATCH_SUBDIVISIONS,
                grad_clipper=None,
                mixed_precision=cfg.TRAINER.FP16.ENABLED
            ),
            hooks.LRScheduler(self.optimizer, self.scheduler),
            hooks.IterationTimer(),
            hooks.PreciseBN(
                # Run at the same freq as (but before) evaluation.
                cfg.TEST.EVAL_PERIOD,
                self.model,
                # Build a new data loader to not affect training
                self.build_train_loader(cfg),
                cfg.TEST.PRECISE_BN.NUM_ITER,
            )
            if cfg.TEST.PRECISE_BN.ENABLED and get_bn_modules(self.model)
            else None,
        ]

self.register_hooks registers the hooks so that each one is invoked at the right phase; it lives at base_runner.py ---- 48. (The registered hooks were shown in a screenshot in the original post.)

These hooks are all implemented as classes in hooks.py, and every one of them inherits from the HookBase base class:

class HookBase:

    def before_train(self):
        """
        Called before the first iteration.
        """
        pass

    def after_train(self):
        """
        Called after the last iteration.
        """
        pass

    def before_step(self):
        """
        Called before each iteration.
        """
        pass

    def after_step(self):
        """
        Called after each iteration.
        """
        pass

A subclass overrides only the methods whose functionality it needs; anything left unoverridden falls through to the base class and is a no-op (pass).

During training, when the loop reaches a hook phase (see part 12 below) that a given hook does not implement or override, that hook is simply skipped: no action is taken for that module.
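Writing your own hook therefore just means subclassing HookBase and overriding the phases you need. A minimal hypothetical example (the step_outputs key is borrowed from OptimizationHook below):

class PrintLossHook(HookBase):
    """Log the loss used for backward every 20 iterations."""

    def after_step(self):
        loss = self.trainer.step_outputs["loss_for_backward"]
        if self.trainer.iter % 20 == 0:
            print(f"iter {self.trainer.iter}: loss = {float(loss):.4f}")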

12. Implementation details of the hooks used during training

1) OptimizationHook

class OptimizationHook(HookBase):
    def __init__(self, accumulate_grad_steps=1, grad_clipper=None, mixed_precision=False):
        self.accumulate_grad_steps = accumulate_grad_steps
        self.grad_clipper = grad_clipper
        self.mixed_precision = mixed_precision

    def before_step(self):
        self.trainer.optimizer.zero_grad()

    def after_step(self):
        losses = self.trainer.step_outputs["loss_for_backward"]
        losses /= self.accumulate_grad_steps

        if self.mixed_precision:
            from apex import amp
            with amp.scale_loss(losses, self.trainer.optimizer) as scaled_loss:
                scaled_loss.backward()
        else:
            losses.backward()

        if self.trainer.inner_iter == self.accumulate_grad_steps:
            if self.grad_clipper is not None:
                self.grad_clipper(self.trainer.model.parameters())
            self.trainer.optimizer.step()
            self.trainer.optimizer.zero_grad()

Here, the base class's before_step and after_step methods are overridden: before_step zeroes the optimizer's gradients; after_step backpropagates to compute gradients and then steps the optimizer (accumulating gradients over accumulate_grad_steps inner iterations). This repeats every iteration.

2) IterationTimer

class IterationTimer(HookBase):

    def __init__(self, warmup_iter=3):
        """
        Args:
            warmup_iter (int): the number of iterations at the beginning to exclude
                from timing.
        """
        self._warmup_iter = warmup_iter
        self._step_timer = Timer()

    def before_train(self):
        self._start_time = time.perf_counter()
        self._total_timer = Timer()
        self._total_timer.pause()

    def after_train(self):
        total_time = time.perf_counter() - self._start_time
        total_time_minus_hooks = self._total_timer.seconds()
        hook_time = total_time - total_time_minus_hooks

        num_iter = self.trainer.iter + 1 - self.trainer.start_iter - self._warmup_iter

        if num_iter > 0 and total_time_minus_hooks > 0:
            # Speed is meaningful only after warmup
            # NOTE this format is parsed by grep in some scripts
            logger.info(
                "Overall training speed: {} iterations in {} ({:.4f} s / it)".format(
                    num_iter,
                    str(datetime.timedelta(seconds=int(total_time_minus_hooks))),
                    total_time_minus_hooks / num_iter,
                )
            )

        logger.info(
            "Total training time: {} ({} on hooks)".format(
                str(datetime.timedelta(seconds=int(total_time))),
                str(datetime.timedelta(seconds=int(hook_time))),
            )
        )

    def before_step(self):
        self._step_timer.reset()
        self._total_timer.resume()

    def after_step(self):
        # +1 because we're in after_step
        iter_done = self.trainer.iter - self.trainer.start_iter + 1
        if iter_done >= self._warmup_iter:
            sec = self._step_timer.seconds()
            self.trainer.storage.put_scalars(time=sec)
        else:
            self._start_time = time.perf_counter()
            self._total_timer.reset()

        self._total_timer.pause()

All of the base class's methods are overridden:

before_train: records the start time before training begins

before_step: resets the step timer

after_step: counts completed iterations and records the per-step time

after_train: computes the overall speed and total time

 3) LRScheduler

class LRScheduler(HookBase):
    """
    A hook which executes a torch builtin LR scheduler and summarizes the LR.
    It is executed after every iteration.
    """

    def __init__(self, optimizer, scheduler):
        """
        Args:
            optimizer (torch.optim.Optimizer):
            scheduler (torch.optim._LRScheduler)
        """
        self._optimizer = optimizer
        self._scheduler = scheduler

        # NOTE: some heuristics on what LR to summarize
        # summarize the param group with most parameters
        largest_group = max(len(g["params"]) for g in optimizer.param_groups)

        if largest_group == 1:
            # If all groups have one parameter,
            # then find the most common initial LR, and use it for summary
            lr_count = Counter([g["lr"] for g in optimizer.param_groups])
            lr = lr_count.most_common()[0][0]
            for i, g in enumerate(optimizer.param_groups):
                if g["lr"] == lr:
                    self._best_param_group_id = i
                    break
        else:
            for i, g in enumerate(optimizer.param_groups):
                if len(g["params"]) == largest_group:
                    self._best_param_group_id = i
                    break

    def after_step(self):
        lr = self._optimizer.param_groups[self._best_param_group_id]["lr"]
        self.trainer.storage.put_scalar("lr", lr, smoothing_hint=False)
        self._scheduler.step()

Only after_step is overridden; it logs the representative learning rate and steps the scheduler, so the LR changes with the iteration count.
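For the one-parameter-per-group case, the "summary" LR is just the most common initial LR, e.g.:

from collections import Counter

lrs = [0.00125, 0.00125, 0.0025]       # initial LRs of the param groups
lr = Counter(lrs).most_common()[0][0]  # -> 0.00125, the most common one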

4) PeriodicCheckpointer

class PeriodicCheckpointer(_PeriodicCheckpointer, HookBase):
    """
    Same as :class:`cvpods.checkpoint.PeriodicCheckpointer`, but as a hook.

    Note that when used as a hook,
    it is unable to save additional data other than what's defined
    by the given `checkpointer`.

    It is executed every ``period`` iterations and after the last iteration.
    """

    def before_train(self):
        # `self.max_iter` and `self.max_epoch` will be initialized in __init__
        pass

    def after_step(self):
        # No way to use **kwargs
        self.step(self.trainer.iter)

The step function in the parent class:

    def step(self, iteration: int, **kwargs: Any):
        """
        Perform the appropriate action at the given iteration.

        Args:
            iteration (int): the current iteration, ranged in [0, max_iter-1].
            kwargs (Any): extra data to save, same as in
                :meth:`Checkpointer.save`.
        """
        iteration = int(iteration)
        additional_state = {"iteration": iteration}
        additional_state.update(kwargs)
        if self.period < 0:
            return
        if self.period > 0 and (iteration + 1) % self.period == 0:
            if self.max_epoch is not None:
                epoch_iters = self.max_iter // self.max_epoch
                curr_epoch = (iteration + 1) // epoch_iters
                ckpt_name = "model_epoch_{:04d}".format(curr_epoch)
            else:
                ckpt_name = "model_iter_{:07d}".format(iteration + 1)
            self.checkpointer.save(ckpt_name, **additional_state)
        if iteration >= self.max_iter - 1:
            self.checkpointer.save("model_final", **additional_state)

after_step is overridden to call step() on every iteration; step() itself only writes a checkpoint every `period` iterations and at the final iteration.
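Worked through with hypothetical numbers (period = 5000, max_iter = 90000, no max_epoch):

# step(iteration=4999):  (4999 + 1) % 5000 == 0   -> saves "model_iter_0005000"
# step(iteration=9999):  (9999 + 1) % 5000 == 0   -> saves "model_iter_0010000"
# step(iteration=89999): iteration >= max_iter-1  -> also saves "model_final"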

5) EvalHook

"""
Run an evaluation function periodically, and at the end of training.

It is executed every ``eval_period`` iterations and after the last iteration.
"""

6) PeriodicWriter

Used for periodically writing logs, e.g. to TensorBoard.

13. Entry point where data is fed into the model

base_runner.py -----87

self.run_step()

14. Settings you may want to change

1) config.OUTPUT_DIR

The path for output/log files; it is created if it does not exist. There is a default value, but it is best to change it manually to wherever you want the output to land.

The directory is created by the ensure_dir call at cvpods/engine/setup.py ----- 131, which then works together with

cvpods/utils/dump/logger.py ---- 53 to define the log file path (log.txt).
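One hedged way to pin it down, assuming the playground-style config module (the _config_dict pattern and the path below are illustrative):

# in the experiment's config.py:
_config_dict = dict(
    OUTPUT_DIR="/data/experiments/defcn_poto_r50_3x",  # hypothetical path
    # ... the rest of the experiment's settings ...
)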
