First, how to do it; then the source-code side of things.
- Dataset

For convenience, please convert your dataset to COCO format yourself; my changes build on that layout. If you don't want to convert, write your own data_loader following the example later. The COCO dataset layout (assuming everything sits under the detectron2 project root):
```
- datasets
  - coco
    - annotations
      - instances_train2017.json
      - instances_val2017.json
    - train2017
      - image001.jpg
      - image002.jpg
      - image004.jpg
    - val2017
      - image003.jpg
      - image005.jpg
```
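If your labels are not in COCO format yet, a minimal person-only `instances_*.json` can be assembled with just the standard library. The field names below follow the COCO instances schema; the image name, size, and box values are made-up placeholders you would replace with your own:

```python
import json

# Minimal COCO "instances" skeleton with a single "person" category.
# The image entry and box values are placeholders, not real data.
coco = {
    "images": [
        {"id": 1, "file_name": "image001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,            # COCO's dataset id for "person"
            "bbox": [100, 50, 80, 200],  # COCO boxes are [x, y, w, h]
            "area": 80 * 200,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
}

with open("instances_train2017.json", "w") as f:
    json.dump(coco, f)
```

A real converter would loop over your own annotation files and append one entry per image and per box, but the three top-level keys above are all the loader needs.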
- Take training a pedestrian detector as the example (only one class: `person`)

Modify the `_get_coco_instances_meta()` function in `./detectron2/data/datasets/builtin_meta.py`. Just before the final `return ret`, comment out the code that precedes it and replace `ret` with what you need. Here is my code:

```python
def _get_coco_instances_meta():
    # thing_ids = [k["id"] for k in COCO_CATEGORIES if k["isthing"] == 1]
    # thing_colors = [k["color"] for k in COCO_CATEGORIES if k["isthing"] == 1]
    # assert len(thing_ids) == 80, len(thing_ids)
    # # Mapping from the incontiguous COCO category id to an id in [0, 79]
    # thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
    # thing_classes = [k["name"] for k in COCO_CATEGORIES if k["isthing"] == 1]
    # ret = {
    #     "thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
    #     "thing_classes": thing_classes,
    #     "thing_colors": thing_colors,
    # }
    ret = {
        "thing_dataset_id_to_contiguous_id": {1: 0},
        "thing_classes": ["person"],
        "thing_colors": [[220, 20, 60]],
    }
    # print("my ret: ", ret)
    return ret
```
Notes:

- I'm doing pedestrian detection, so `_get_coco_instances_meta()` is the function I modified; if you work on segmentation or keypoints, this change doesn't apply directly, but you can adapt it yourself once you understand the logic below.
- The fields in `ret` describe my single pedestrian label. The values for the first and third fields come from the `COCO_CATEGORIES` definition at the top of `builtin_meta.py`; alternatively you could crudely edit `COCO_CATEGORIES` itself, but I haven't tried that and can't rule out bugs.
- If you are also doing pedestrian detection, you can apply this change as-is without problems. For other tasks, be sure to read the code first, or leave a comment and I can offer some advice.
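The `thing_dataset_id_to_contiguous_id` field is what remaps COCO's non-contiguous dataset category ids into the `[0, num_classes)` range the model predicts over. A plain-Python sketch of that mapping (the helper function is mine, not a detectron2 API; no detectron2 install needed):

```python
# With only "person", COCO's dataset id 1 maps to contiguous model id 0.
ret = {
    "thing_dataset_id_to_contiguous_id": {1: 0},
    "thing_classes": ["person"],
    "thing_colors": [[220, 20, 60]],
}

def to_contiguous(dataset_category_id):
    """Map a raw COCO category_id to the model's contiguous class index."""
    return ret["thing_dataset_id_to_contiguous_id"][dataset_category_id]

print(to_contiguous(1), ret["thing_classes"][to_contiguous(1)])  # 0 person
```

This is why `category_id: 1` in the annotation json ends up as class index 0 inside the network.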
- Edit the config file

Two places: set both `MODEL.RETINANET.NUM_CLASSES` and `MODEL.ROI_HEADS.NUM_CLASSES` to 1 (for the full COCO dataset they would be 80).
My config file, `config.yaml`, reads as follows:

```yaml
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - coco_2017_val
  TRAIN:
  - coco_2017_train
GLOBAL:
  HACK: 1.0
INPUT:
  CROP:
    ENABLED: false
    SIZE:
    - 0.9
    - 0.9
    TYPE: relative_range
  FORMAT: BGR
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    SIZES:
    - - 32
      - 40.31747359663594
      - 50.79683366298238
    - - 64
      - 80.63494719327188
      - 101.59366732596476
    - - 128
      - 161.26989438654377
      - 203.18733465192952
    - - 256
      - 322.53978877308754
      - 406.37466930385904
    - - 512
      - 645.0795775461751
      - 812.7493386077181
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_retinanet_resnet_fpn_backbone
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - res3
    - res4
    - res5
    NORM: ''
    OUT_CHANNELS: 256
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: false
  META_ARCHITECTURE: RetinaNet
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 103.53
  - 116.28
  - 123.675
  PIXEL_STD:
  - 1.0
  - 1.0
  - 1.0
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res3
    - res4
    - res5
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 1
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: ''
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 1
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    BOUNDARY_THRESH: -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  WEIGHTS: models/COCORetinaNet_R50.pkl
OUTPUT_DIR: ./output
SEED: -1
SOLVER:
  BASE_LR: 0.0001
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  GAMMA: 0.1
  IMS_PER_BATCH: 32
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 270000
  MOMENTUM: 0.9
  STEPS:
  - 210000
  - 250000
  WARMUP_FACTOR: 0.001
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0.0001
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 0
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
```
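If you prefer to patch the two `NUM_CLASSES` fields programmatically rather than edit the file by hand, a rough text-level approach is sketched below. The helper is mine, not a detectron2 API, and it assumes the default dumped-YAML layout where each field sits on its own line; a real YAML parser would be safer:

```python
import re

def set_num_classes(config_text, num_classes):
    """Rewrite every 'NUM_CLASSES: <n>' line in a dumped config.

    Caution: this also touches SEM_SEG_HEAD.NUM_CLASSES. That head is
    unused in a RetinaNet config, but check your own file before relying
    on this shortcut.
    """
    return re.sub(r"(NUM_CLASSES:\s*)\d+", r"\g<1>%d" % num_classes, config_text)

# Tiny stand-in for the real config file, just to show the effect.
cfg = "RETINANET:\n  NUM_CLASSES: 80\nROI_HEADS:\n  NUM_CLASSES: 80\n"
print(set_num_classes(cfg, 1))
```

Within detectron2 itself, the equivalent is setting `cfg.MODEL.RETINANET.NUM_CLASSES` and `cfg.MODEL.ROI_HEADS.NUM_CLASSES` on the config object before building the trainer.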
- Example: writing your own data_loader

Skipped for now; I'll explain what it means, and the logic detectron2 uses to read data, later. For the moment, just read the code yourselves:
```python
import os
import json
import itertools

import cv2  # needed for reading image sizes; the import was missing originally
import numpy as np

from detectron2.structures import BoxMode

# Write a function that loads the dataset into detectron2's standard format.
# img_dir = "coco_person"
def get_balloon_dicts(img_dir):
    # VIA-style annotation file inside img_dir (as in the official balloon tutorial)
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx  # required by detectron2's dataset dicts
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = list(itertools.chain.from_iterable(poly))

            obj = {
                # box is the axis-aligned extent of the polygon, in XYXY pixels
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
                "iscrowd": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

from detectron2.data import DatasetCatalog, MetadataCatalog

# Register the loader under a dataset name, plus its class metadata.
for d in ["train", "val"]:
    DatasetCatalog.register("balloon/" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon/" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon/train")
```
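Each dict that `get_balloon_dicts` returns has to follow detectron2's standard dataset-dict format, otherwise training fails with opaque errors. A small stdlib checker (my own helper, not a detectron2 API) for the required keys; the `bbox_mode` string here stands in for the real `BoxMode.XYXY_ABS` enum so the example runs without detectron2 installed:

```python
# Hypothetical validator for detectron2's standard dataset-dict format.
def validate_record(record):
    """Sanity-check one dataset dict before registering the dataset."""
    for key in ("file_name", "height", "width", "annotations"):
        assert key in record, "missing key: %s" % key
    for obj in record["annotations"]:
        x0, y0, x1, y1 = obj["bbox"]
        assert x0 <= x1 and y0 <= y1, "XYXY box must satisfy x0<=x1, y0<=y1"
        assert "bbox_mode" in obj and "category_id" in obj
    return True

# Toy record; real code would use BoxMode.XYXY_ABS, not a string.
record = {
    "file_name": "balloon/train/img.jpg",
    "height": 480,
    "width": 640,
    "annotations": [
        {"bbox": [10, 5, 40, 60], "bbox_mode": "XYXY_ABS", "category_id": 0},
    ],
}
print(validate_record(record))  # True
```

Running it over `get_balloon_dicts(...)` output before calling `DatasetCatalog.register` catches most format mistakes early.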