Table of contents
- Detectron2 code reading
- Detectron2 installation
- Reference blogs for getting started with Detectron2
- Detectron2 usage notes
- Problem log
- ImportError: cannot import name '_C' from 'detectron2'
- raise SizeMismatchError(detectron2.data.detection_utils.SizeMismatchError: Mismatched image shape for image
- RuntimeError: CUDA out of memory raised by F.interpolate() and .float() when Tensor.shape is too large
- RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1556653114079/work/torch/lib/c10d/ProcessGroupNCCL.cpp
- The environment is inconsistent, please check the package plan carefully. The following packages are causing the inconsistency
- Solving environment: failed with initial frozen solve. Retrying with flexible solve; Found conflicts! Looking for incompatible packages. UnsatisfiableError; Package xxx conflicts for;
- Second-level heading
- To be added
Detectron2 code reading
Commit messages used under Git version control
detectron2_0106
# ---------------------------20210108 divider---------------------------
the initial project version of detectron2_0106;
# ---------------------------2020xxxx divider---------------------------
to be added
# ---------------------------2020xxxx divider---------------------------
AdelaiDet_0107
# ---------------------------2021010? divider---------------------------
# ---------------------------20210426 divider---------------------------
the initial project version of AdelaiDet_0107;
# ---------------------------2020xxxx divider---------------------------
to be added
# ---------------------------2020xxxx divider---------------------------
The default configuration parameters for the network models in AdelaiDet are defined in the following script:
adet/config/defaults.py
AdelaiDet_0420
# ---------------------------20210426 divider---------------------------
the initial project version of AdelaiDet_0420;
# ---------------------------20210505 divider---------------------------
add comments to the official code;
- adet/modeling/condinst/condinst.py
- adet/modeling/condinst/mask_branch.py
- adet/modeling/condinst/dynamic_mask_head.py
- adet/modeling/fcos/fcos.py
- adet/modeling/fcos/fcos_outputs.py
write the code for the following schemes:
- CondInstv1v1_MS_R_50_1x
- DynamicMaskHeadV1v1,DynamicMaskHeadV1v2
# ---------------------------2020xxxx divider---------------------------
to be added
# ---------------------------2020xxxx divider---------------------------
Code I do not understand yet
- What does metadata mean
xxx\AdelaiDet_0107\tools\train_net_Detectron2.py
What does `metadata` mean?
For attributes shared among the entire dataset, use Metadata (see below). To avoid extra memory, do not save such information inside each sample.
Each dataset is associated with some metadata, accessible through MetadataCatalog.get(dataset_name).some_metadata. Metadata is a key-value mapping that contains information that’s shared among the entire dataset, and usually is used to interpret what’s in the dataset, e.g., names of classes, colors of classes, root of files, etc. This information will be useful for augmentation, evaluation, visualization, logging, etc. The structure of metadata depends on what is needed from the corresponding downstream code.
Quoted from ***Standard Dataset Dicts — detectron2 0.4 documentation
Code reading notes
The aligned_bilinear() function in adet.utils.comm
import torch.nn.functional as F

def aligned_bilinear(tensor, factor):
    assert tensor.dim() == 4
    assert factor >= 1
    assert int(factor) == factor

    if factor == 1:
        return tensor

    h, w = tensor.size()[2:]
    # replicate-pad one column on the right and one row at the bottom
    tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode="replicate")
    oh = factor * h + 1
    ow = factor * w + 1
    tensor = F.interpolate(
        tensor, size=(oh, ow),
        mode='bilinear',
        align_corners=True
    )
    # replicate-pad factor // 2 columns on the left and rows on the top
    tensor = F.pad(
        tensor, pad=(factor // 2, 0, factor // 2, 0),
        mode="replicate"
    )

    # crop to (factor * h, factor * w)
    return tensor[:, :, :oh - 1, :ow - 1]
torch.Size([1, 1, 7, 5])
torch.Size([1, 1, 4, 3])
torch.Size([1, 1, 8, 6])
import torch
import torch.nn.functional as F
from adet.utils.comm import aligned_bilinear
tmp_in = torch.arange(12, dtype=torch.float).reshape(1, 1, 4, 3)
tmp_in
tensor([[[[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.],
[ 9., 10., 11.]]]])
tmp_1 = F.pad(tmp_in, pad=(0, 1, 0, 1), mode="replicate")
tmp_1
tensor([[[[ 0., 1., 2., 2.],
[ 3., 4., 5., 5.],
[ 6., 7., 8., 8.],
[ 9., 10., 11., 11.],
[ 9., 10., 11., 11.]]]])
tmp_1.shape
torch.Size([1, 1, 5, 4])
tmp_2 = F.interpolate(tmp_1, size=(9,7), mode='bilinear', align_corners=True)
tmp_2
tensor([[[[ 0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.0000, 2.0000],
[ 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 3.5000, 3.5000],
[ 3.0000, 3.5000, 4.0000, 4.5000, 5.0000, 5.0000, 5.0000],
[ 4.5000, 5.0000, 5.5000, 6.0000, 6.5000, 6.5000, 6.5000],
[ 6.0000, 6.5000, 7.0000, 7.5000, 8.0000, 8.0000, 8.0000],
[ 7.5000, 8.0000, 8.5000, 9.0000, 9.5000, 9.5000, 9.5000],
[ 9.0000, 9.5000, 10.0000, 10.5000, 11.0000, 11.0000, 11.0000],
[ 9.0000, 9.5000, 10.0000, 10.5000, 11.0000, 11.0000, 11.0000],
[ 9.0000, 9.5000, 10.0000, 10.5000, 11.0000, 11.0000, 11.0000]]]])
tmp_2.shape
torch.Size([1, 1, 9, 7])
tmp_3 = F.pad(tmp_2, pad=(1, 0, 1, 0), mode="replicate")
tmp_3
tensor([[[[ 0.0000, 0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.0000, 2.0000],
[ 0.0000, 0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.0000, 2.0000],
[ 1.5000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 3.5000, 3.5000],
[ 3.0000, 3.0000, 3.5000, 4.0000, 4.5000, 5.0000, 5.0000, 5.0000],
[ 4.5000, 4.5000, 5.0000, 5.5000, 6.0000, 6.5000, 6.5000, 6.5000],
[ 6.0000, 6.0000, 6.5000, 7.0000, 7.5000, 8.0000, 8.0000, 8.0000],
[ 7.5000, 7.5000, 8.0000, 8.5000, 9.0000, 9.5000, 9.5000, 9.5000],
[ 9.0000, 9.0000, 9.5000, 10.0000, 10.5000, 11.0000, 11.0000, 11.0000],
[ 9.0000, 9.0000, 9.5000, 10.0000, 10.5000, 11.0000, 11.0000, 11.0000],
[ 9.0000, 9.0000, 9.5000, 10.0000, 10.5000, 11.0000, 11.0000, 11.0000]]]])
tmp_3.shape
torch.Size([1, 1, 10, 8])
tmp_out = tmp_3[:, :, :8, :6]
tmp_out
tensor([[[[ 0.0000, 0.0000, 0.5000, 1.0000, 1.5000, 2.0000],
[ 0.0000, 0.0000, 0.5000, 1.0000, 1.5000, 2.0000],
[ 1.5000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000],
[ 3.0000, 3.0000, 3.5000, 4.0000, 4.5000, 5.0000],
[ 4.5000, 4.5000, 5.0000, 5.5000, 6.0000, 6.5000],
[ 6.0000, 6.0000, 6.5000, 7.0000, 7.5000, 8.0000],
[ 7.5000, 7.5000, 8.0000, 8.5000, 9.0000, 9.5000],
[ 9.0000, 9.0000, 9.5000, 10.0000, 10.5000, 11.0000]]]])
tmp_out.shape
torch.Size([1, 1, 8, 6])
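The shapes printed in the session above follow directly from the four steps of aligned_bilinear(). As a minimal sketch (pure Python, no PyTorch needed; the helper name aligned_bilinear_shapes is hypothetical and not part of AdelaiDet), the spatial size after each step can be traced like this:

```python
def aligned_bilinear_shapes(h, w, factor):
    """Trace the (height, width) after each step of aligned_bilinear().

    Hypothetical helper for illustration only: it mirrors the padding,
    interpolation and cropping arithmetic, not the pixel values.
    """
    assert factor >= 1 and int(factor) == factor
    if factor == 1:
        return [(h, w)]
    # Step 1: replicate-pad one row at the bottom and one column on the right.
    padded = (h + 1, w + 1)
    # Step 2: bilinear interpolation to (factor * h + 1, factor * w + 1).
    oh, ow = factor * h + 1, factor * w + 1
    interpolated = (oh, ow)
    # Step 3: replicate-pad factor // 2 rows on top and columns on the left.
    shifted = (oh + factor // 2, ow + factor // 2)
    # Step 4: crop to the first oh - 1 rows and ow - 1 columns.
    cropped = (oh - 1, ow - 1)
    return [padded, interpolated, shifted, cropped]


if __name__ == "__main__":
    steps = ["pad", "interpolate", "pad", "crop"]
    for name, shape in zip(steps, aligned_bilinear_shapes(4, 3, 2)):
        print(name, shape)
```

For the 4x3 input with factor=2 used above, this gives (5, 4), (9, 7), (10, 8) and (8, 6), matching tmp_1, tmp_2, tmp_3 and tmp_out.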
Detectron2 installation
Note, 2021-01-06: on the GPU 1080 machine
# create conda environment `usr_detectron2` with:
Add the conda-forge channel: conda config --add channels conda-forge
conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch -c conda-forge
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/win-64/
# First install Detectron2 with:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# Then build AdelaiDet with:
git clone https://github.com/aim-uofa/AdelaiDet.git
cd AdelaiDet
python setup.py build develop
Note, 2021-05-07: on the GPU 2080 Ti machine
conda info --envs or conda env list
conda config --add channels conda-forge
conda create -n usr_detectron2 python=3.9
conda activate usr_detectron2
conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch -c conda-forge
# to install Detectron2 from a local clone:
git clone -b v0.3 https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
# You often need to rebuild AdelaiDet after reinstalling Detectron2 with:
cd AdelaiDet
python setup.py build develop
Reference blogs for getting started with Detectron2
Documents to read before getting started with Detectron2
***Detectron2_Tutorial.ipynb Colab Notebook
Getting Started with Detectron2 — detectron2 0.4 documentation
***Use Builtin Datasets — detectron2 0.4 documentation
***Standard Dataset Dicts — detectron2 0.4 documentation
Use Custom Datasets — detectron2 0.4 documentation
**Model Output Format - Use Models — detectron2 0.4 documentation
***detectron2.config — detectron2 0.4 documentation
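***Training Detectron2 on your own dataset (fairly detailed)_丑小鸭-CSDN blog 20200615
Detectron2 source code analysis_shang3988's column-CSDN blog 20200923
9.3.1 RetinaNet with Detectron2 · PyTorch Deep Learning 20200617
Detectron2 Beginner’s Tutorial (Colab notebook repost, including the ipynb file)_桜見's blog-CSDN blog 20200206
Overview of detectron2 (what each script file does)_艺's blog-CSDN blog_what is detectron2 20200128
Classes to focus on when getting started with Detectron2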
cfg = get_cfg() # detectron2/config/config.py
predictor = DefaultPredictor(cfg) # detectron2/engine/defaults.py
trainer = DefaultTrainer(cfg) # detectron2/engine/defaults.py
v = Visualizer(img, MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2) # detectron2/utils/visualizer.py
DatasetCatalog.register() # detectron2/data/catalog.py
MetadataCatalog.get().set() # detectron2/data/catalog.py
evaluator = COCOEvaluator("mydataset_val", ("bbox", "segm"), False, output_dir="./output/") # detectron2/evaluation/coco_evaluation.py
val_loader = build_detection_test_loader(cfg, "mydataset_val") # detectron2/data/build.py
print(inference_on_dataset(trainer.model, val_loader, evaluator)) # detectron2/evaluation/evaluator.py
detectron2.structures.instances — detectron2 0.4 documentation
Detectron2 usage notes
Things to watch for when using your own dataset
Note, 2021-04-23:
The category_id values in your own dataset's .json annotation file need to start from 0;
category_id
(int, required): an integer in the range [0, num_categories-1] representing the category label. The value num_categories is reserved to represent the “background” category, if applicable.
Use Custom Datasets — detectron2 0.4 documentation
Note, 2021-05-22:
The category ids in the COCO dataset's .json files are not contiguous. In Detectron2, when load_coco_json() in detectron2/data/datasets/coco.py loads a json file with COCO's instances annotation format, it maps them to contiguous ids in [0, 80);
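The remapping idea can be illustrated with a small sketch. This is not the code of load_coco_json(); the function name build_contiguous_id_map and the sample ids below are made up for illustration, and the sketch assumes the mapping simply follows the sorted order of the original ids.

```python
def build_contiguous_id_map(category_ids):
    """Map arbitrary dataset category ids to contiguous ids 0..N-1.

    Assumption for this sketch: contiguous ids follow the sorted order
    of the original ids.
    """
    return {cat_id: i for i, cat_id in enumerate(sorted(set(category_ids)))}


# Hypothetical category ids with gaps, as found in COCO-style annotations.
original_ids = [1, 2, 3, 5, 8, 90]
id_map = build_contiguous_id_map(original_ids)

# Hypothetical annotations that still use the original ids.
annotations = [{"category_id": 5}, {"category_id": 90}]
remapped = [dict(a, category_id=id_map[a["category_id"]]) for a in annotations]

print(id_map)
print(remapped)
```

After remapping, every category_id lies in [0, num_categories - 1], which is the range the standard dataset dicts expect.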
How to apply a model configuration
detectron2.config — detectron2 0.4 documentation
Setting num_GPU, IMS_PER_BATCH and BASE_LR in Detectron2
Benchmarks — detectron2 0.4 documentation
Training & Evaluation in Command Line - Getting Started with Detectron2 — detectron2 0.4 documentation
static auto_scale_workers() - detectron2.engine — detectron2 0.4 documentation
In mmdetection, the original setting is lr=0.01 with num_GPU=8 and samples_per_gpu=2, which gives batch_size=16;
with num_GPU=2 and samples_per_gpu=6, lr must be multiplied by 2*6/(8*2) = 0.75;
We evaluate CondInst on the large-scale benchmark MS-COCO [23]. Following the common practice [14, 22, 37], our models are trained with split train2017 (115K images) and all the ablation experiments are evaluated on split val2017 (5K images).
Quoted from CondInst
In Detectron2, the configs are by default made for 8-GPU training: num_GPU=8, SOLVER.IMS_PER_BATCH=16, SOLVER.BASE_LR=0.02, so each GPU will see 2 images per batch;
- num_GPU=16, SOLVER.IMS_PER_BATCH=32: set SOLVER.BASE_LR=0.02*32/16; each GPU will see 2 images per batch;
- num_GPU=2, SOLVER.IMS_PER_BATCH=12: set SOLVER.BASE_LR=0.02*12/16; each GPU will see 6 images per batch;
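The scaling used in the bullets above is plain proportional arithmetic on the total batch size. A minimal sketch, with hypothetical helper names (scale_base_lr and images_per_gpu are not Detectron2 functions):

```python
def scale_base_lr(reference_lr, reference_batch, new_batch):
    """Scale the learning rate in proportion to the total batch size."""
    return reference_lr * new_batch / reference_batch


def images_per_gpu(ims_per_batch, num_gpus):
    """Images each GPU sees per iteration; the batch must divide evenly."""
    assert ims_per_batch % num_gpus == 0
    return ims_per_batch // num_gpus


if __name__ == "__main__":
    # Reference setting from the notes: 8 GPUs, IMS_PER_BATCH=16, BASE_LR=0.02.
    for num_gpus, ims_per_batch in [(8, 16), (16, 32), (2, 12)]:
        lr = scale_base_lr(0.02, 16, ims_per_batch)
        per_gpu = images_per_gpu(ims_per_batch, num_gpus)
        print(f"num_GPU={num_gpus} IMS_PER_BATCH={ims_per_batch} "
              f"per_gpu={per_gpu} BASE_LR={lr:.4f}")
```

The same function covers the mmdetection case noted earlier: scale_base_lr(0.01, 16, 12) gives 0.01 * 0.75.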
On computing iters_in_one_epoch and MAX_ITER:
- The training log reports:
d2.data.datasets.coco INFO: Loaded 118287 images in COCO format from coco/annotations/instances_train2017.json
d2.data.build INFO: Removed 1021 images with no usable annotations. 117266 images left.
- Taking COCO train2017 as 115k images in total, with batch_size=16, iters_in_one_epoch = int(dataset_imgs/batch_size) + 1 = int(115k/16) + 1 = int(7.1875k) + 1 = 7188; to train for 12 epochs, set the maximum number of iterations cfg.SOLVER.MAX_ITER = (iters_in_one_epoch * 12) = 86256;
- LSCD train has 6735 images in total and LSCD val has 1000 images; with batch_size=8, iters_in_one_epoch = int(6735/8) + 1 = int(841.875) + 1 = 842; to train for 12 epochs, set the maximum number of iterations cfg.SOLVER.MAX_ITER = (iters_in_one_epoch * 12) = 10104;
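The arithmetic above can be wrapped in two small helpers. This is only a sketch of the calculation in these notes; the function names are hypothetical and nothing here is a Detectron2 API.

```python
def iters_per_epoch(num_images, batch_size):
    """Same arithmetic as in the notes: int(num_images / batch_size) + 1."""
    return int(num_images / batch_size) + 1


def max_iter_for_epochs(num_images, batch_size, epochs):
    """Value to assign to cfg.SOLVER.MAX_ITER for the given number of epochs."""
    return iters_per_epoch(num_images, batch_size) * epochs


if __name__ == "__main__":
    # COCO train2017, counted as 115k images as in the notes.
    print(iters_per_epoch(115000, 16), max_iter_for_epochs(115000, 16, 12))
    # LSCD train, 6735 images.
    print(iters_per_epoch(6735, 8), max_iter_for_epochs(6735, 8, 12))
```

Note that when num_images is an exact multiple of batch_size, this formula yields one iteration more than strictly needed; math.ceil(num_images / batch_size) would avoid that.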
Other settings in Detectron2
Setting the period=10 argument in /AdelaiDet_0420/detectron2/detectron2/engine/defaults.py changes the interval at which entries such as iter: 19 in log.txt and "iteration": 19 in metrics.json are written;
/AdelaiDet_0420/detectron2/detectron2/engine/defaults.py
def build_hooks(self):
    # ... (omitted)
    ret.append(hooks.PeriodicWriter(self.build_writers(), period=10))  # line375
    # ... (omitted)
/AdelaiDet_0420/detectron2/detectron2/engine/hooks.py
class PeriodicWriter(HookBase):
    def __init__(self, writers, period=20):  # line155
        # ... (omitted)
/AdelaiDet_0420/detectron2/tools/plain_train_net.py
def do_train(cfg, model, resume=False):
    # ... (omitted)
    if iteration - start_iter > 5 and (
        (iteration + 1) % 20 == 0 or iteration == max_iter - 1
    ):
        for writer in writers:
            writer.write()
    # ... (omitted)
In addition, right-click and use "Project | Find in Path.." to search for "Starting training from iteration" to find the related script files;
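To see which iterations end up in the log for a given period, the condition quoted from plain_train_net.py can be evaluated on its own. This sketch models only that quoted condition, with the hard-coded 20 turned into a parameter; the function name write_iterations is hypothetical.

```python
def write_iterations(start_iter, max_iter, period=20):
    """Iterations at which the quoted plain_train_net.py condition fires."""
    return [
        it
        for it in range(start_iter, max_iter)
        if it - start_iter > 5
        and ((it + 1) % period == 0 or it == max_iter - 1)
    ]


if __name__ == "__main__":
    print(write_iterations(0, 100))             # period of 20
    print(write_iterations(0, 45, period=10))   # period of 10, odd max_iter
```

With a period of 20 the first entries are iterations 19, 39, 59 and 79, which is the spacing seen in the Detectron2 metrics.json excerpt later in these notes. The final iteration (max_iter - 1) is always written, even when it does not fall on a period boundary.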
Warmup strategy
detectron2.solver.lr_scheduler — detectron2 0.4 documentation
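Deep learning training strategy: learning rate warmup_豆芽菜-CSDN blog 20200405
Deep learning warmup strategy_comway_Li's blog-CSDN blog 20200321
"Training warm-up" (warm up) in deep learning: setting the learning rate_dreamandgo's blog-CSDN blog 20200720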
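A linear warmup can be sketched in a few lines. The parameter values warmup_iters=1000 and warmup_factor=0.001 are assumptions, not read from any config in these notes; they were chosen because they reproduce the lr values in the Detectron2 metrics.json excerpt later in this post (0.00039962 at iteration 19, 0.00079922 at iteration 39). The function name linear_warmup_lr is hypothetical.

```python
def linear_warmup_lr(base_lr, iteration, warmup_iters=1000, warmup_factor=0.001):
    """Learning rate under linear warmup (sketch; parameter values assumed).

    The multiplier grows linearly from warmup_factor at iteration 0
    to 1.0 at iteration warmup_iters, then stays at 1.0.
    """
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters
    return base_lr * (warmup_factor * (1 - alpha) + alpha)


if __name__ == "__main__":
    for it in (19, 39, 59, 79, 1000):
        print(it, f"{linear_warmup_lr(0.02, it):.8f}")
```

The later step decay of the schedule (for example lr dropping to 0.0002 near the end of training in the same log) is outside the scope of this sketch.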
Mask R-CNN performance comparison: Detectron2 vs. MMDetection
detectron2/MODEL_ZOO.md at master · facebookresearch/detectron2 · GitHub
- COCO Instance Segmentation Baselines with Mask R-CNN in Detectron2
Name | lr sched | train time (s/iter) | inference time (s/iter) | train mem (GB) | box AP | mask AP | model id | Notes |
---|---|---|---|---|---|---|---|---|
R50-FPN | 1x | 0.261 | 0.043 | 3.4 | 38.6 | 35.2 | 137260431 | compared |
R101-FPN | 3x | 0.340 | 0.056 | 4.6 | 42.9 | 38.6 | 138205316 | - |
mmdetection/configs/mask_rcnn at master · open-mmlab/mmdetection · GitHub
- Results and models of Mask R-CNN in MMDetection
Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Notes |
---|---|---|---|---|---|---|---|
R-50-FPN | caffe | 1x | 4.3 | - | 38.0 | 34.4 | - |
R-50-FPN | pytorch | 1x | 4.4 | 16.1 | 38.2 | 34.7 | compared |
R-50-FPN | pytorch | 2x | - | - | 39.2 | 35.4 | - |
R-101-FPN | caffe | 1x | - | - | 40.4 | 36.4 | - |
R-101-FPN | pytorch | 1x | 6.4 | 13.5 | 40.0 | 36.1 | - |
R-101-FPN | pytorch | 2x | - | - | 40.8 | 36.6 | - |
- Training log.json of Mask R-CNN
Detectron2 R50-FPN | 1x
https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/137260431/metrics.json
{"data_time": 0.002598743070848286, "fast_rcnn/cls_accuracy": 0.97265625, "fast_rcnn/false_negative": 1.0, "fast_rcnn/fg_cls_accuracy": 0.0, "iteration": 19,
"loss_box_reg": 0.032130290812347084, "loss_cls": 0.639292468316853, "loss_mask": 0.6927643306553364,
"loss_rpn_cls": 0.6572066731750965, "loss_rpn_loc": 0.16647980932611972,
"lr": 0.00039962, "mask_rcnn/accuracy": 0.49910617656153367, "mask_rcnn/false_negative": 0.3886190705390907, "mask_rcnn/false_positive": 0.6187156134052185, "roi_head/num_bg_samples": 502.5, "roi_head/num_fg_samples": 9.5, "rpn/num_neg_anchors": 225.25, "rpn/num_pos_anchors": 30.75, "time": 0.25318309606518596,
"total_loss": 2.24487855611369}
{"data_time": 0.0025207019643858075, "fast_rcnn/cls_accuracy": 0.9716796875, "fast_rcnn/false_negative": 1.0, "fast_rcnn/fg_cls_accuracy": 0.0, "iteration": 39,
"loss_box_reg": 0.09106488188263029, "loss_cls": 0.5796218859031796, "loss_mask": 0.6907654628157616,
"loss_rpn_cls": 0.45275296457111835, "loss_rpn_loc": 0.1657287187408656,
"lr": 0.0007992199999999999, "mask_rcnn/accuracy": 0.531749595389413, "mask_rcnn/false_negative": 0.11595830385098957, "mask_rcnn/false_positive": 0.8639941425849713, "roi_head/num_bg_samples": 497.5, "roi_head/num_fg_samples": 14.5, "rpn/num_neg_anchors": 230.25, "rpn/num_pos_anchors": 25.75, "time": 0.2263991323998198,
"total_loss": 2.01636832227814}
{"data_time": 0.0023499008966609836, "fast_rcnn/cls_accuracy": 0.96435546875, "fast_rcnn/false_negative": 1.0, "fast_rcnn/fg_cls_accuracy": 0.0, "iteration": 59,
"loss_box_reg": 0.12914634053595364, "loss_cls": 0.4017837508581579, "loss_mask": 0.690228708088398,
"loss_rpn_cls": 0.31959519628435373, "loss_rpn_loc": 0.13147040538024157,
"lr": 0.00119882, "mask_rcnn/accuracy": 0.5578946873589731, "mask_rcnn/false_negative": 0.08273137213201628, "mask_rcnn/false_positive": 0.9230921910050085, "roi_head/num_bg_samples": 493.75, "roi_head/num_fg_samples": 18.25, "rpn/num_neg_anchors": 235.75, "rpn/num_pos_anchors": 20.25, "time": 0.23010813840664923,
"total_loss": 1.6990175214014016}
{"data_time": 0.0023776369635015726, "fast_rcnn/cls_accuracy": 0.955078125, "fast_rcnn/false_negative": 1.0, "fast_rcnn/fg_cls_accuracy": 0.0, "iteration": 79, "loss_box_reg": 0.1658823168836534, "loss_cls": 0.37299142871052027, "loss_mask": 0.6893040910363197,
"loss_rpn_cls": 0.2666563978418708, "loss_rpn_loc": 0.1486980152549222,
"lr": 0.0015984200000000001, "mask_rcnn/accuracy": 0.5633592547171357, "mask_rcnn/false_negative": 0.04547704788242386, "mask_rcnn/false_positive": 0.9446711285129321, "roi_head/num_bg_samples": 489.0, "roi_head/num_fg_samples": 23.0, "rpn/num_neg_anchors": 226.5, "rpn/num_pos_anchors": 29.5, "time": 0.23420310846995562,
"total_loss": 1.6437977316963952}
--------------------
{"data_time": 0.002329822047613561, "fast_rcnn/cls_accuracy": 0.94970703125, "fast_rcnn/false_negative": 0.3859087630724847, "fast_rcnn/fg_cls_accuracy": 0.591547539893617, "iteration": 44979,
"loss_box_reg": 0.2508180043660104, "loss_cls": 0.227490158053115, "loss_mask": 0.27231015264987946,
"loss_rpn_cls": 0.04181082101422362, "loss_rpn_loc": 0.07604182060458697,
"lr": 0.02, "mask_rcnn/accuracy": 0.8922324188088697, "mask_rcnn/false_negative": 0.10118279120451368, "mask_rcnn/false_positive": 0.12274398554618456, "roi_head/num_bg_samples": 449.75, "roi_head/num_fg_samples": 62.25, "rpn/num_neg_anchors": 230.75, "rpn/num_pos_anchors": 25.25, "time": 0.25466170033905655,
"total_loss": 0.8704715783533175}
{"data_time": 0.002378121018409729, "fast_rcnn/cls_accuracy": 0.9287109375, "fast_rcnn/false_negative": 0.388353581901969, "fast_rcnn/fg_cls_accuracy": 0.5810359231411864, "iteration": 44999,
"loss_box_reg": 0.2347505206707865, "loss_cls": 0.2301798826083541, "loss_mask": 0.27007893938571215,
"loss_rpn_cls": 0.04352796872262843, "loss_rpn_loc": 0.08090459188679233,
"lr": 0.02, "mask_rcnn/accuracy": 0.8877158907400843, "mask_rcnn/false_negative": 0.089878663914568, "mask_rcnn/false_positive": 0.1376528427255747, "roi_head/num_bg_samples": 443.25, "roi_head/num_fg_samples": 68.75, "rpn/num_neg_anchors": 229.75, "rpn/num_pos_anchors": 26.25, "time": 0.2536838488886133,
"total_loss": 0.8512819005991332,
"bbox/AP": 29.6848, "bbox/AP50": 49.736, "bbox/AP75": 31.6153, "bbox/APs": 17.0325, "bbox/APm": 32.3724, "bbox/APl": 38.9994,
"segm/AP": 27.9303, "segm/AP50": 46.4976, "segm/AP75": 29.4796, "segm/APs": 12.7666, "segm/APm": 30.2016, "segm/APl": 41.198}
{"data_time": 0.002324605593457818, "fast_rcnn/cls_accuracy": 0.9228515625, "fast_rcnn/false_negative": 0.4123746826728626, "fast_rcnn/fg_cls_accuracy": 0.5709211553473849, "iteration": 45019,
"loss_box_reg": 0.23696620645932853, "loss_cls": 0.24506191909313202, "loss_mask": 0.2732751062139869,
"loss_rpn_cls": 0.046370368072530255, "loss_rpn_loc": 0.07181740016676486,
"lr": 0.02, "mask_rcnn/accuracy": 0.897779099108315, "mask_rcnn/false_negative": 0.09148855694390579, "mask_rcnn/false_positive": 0.10694905510167176, "roi_head/num_bg_samples": 440.0, "roi_head/num_fg_samples": 72.0, "rpn/num_neg_anchors": 220.0, "rpn/num_pos_anchors": 36.0, "time": 0.2497749866452068,
"total_loss": 0.8699611588672269}
--------------------
{"data_time": 0.0026094965869560838, "fast_rcnn/cls_accuracy": 0.93701171875, "fast_rcnn/false_negative": 0.3187358810240964, "fast_rcnn/fg_cls_accuracy": 0.6734516189759037, "iteration": 89959,
"loss_box_reg": 0.2107297400943935, "loss_cls": 0.19628392602317035, "loss_mask": 0.23165522469207644,
"loss_rpn_cls": 0.035993807558043045, "loss_rpn_loc": 0.0715790189569816,
"lr": 0.00020000000000000004, "mask_rcnn/accuracy": 0.8943493506033365, "mask_rcnn/false_negative": 0.09956806044260469, "mask_rcnn/false_positive": 0.1258371461357612, "roi_head/num_bg_samples": 443.75, "roi_head/num_fg_samples": 68.25, "rpn/num_neg_anchors": 218.75, "rpn/num_pos_anchors": 37.25, "time": 0.2603230450768024,
"total_loss": 0.7695412554403447}
{"data_time": 0.0024753035977482796, "fast_rcnn/cls_accuracy": 0.94482421875, "fast_rcnn/false_negative": 0.2629855958096901, "fast_rcnn/fg_cls_accuracy": 0.6700900961473812, "iteration": 89979,
"loss_box_reg": 0.2162564373575151, "loss_cls": 0.18739765952341259, "loss_mask": 0.24809803813695908,
"loss_rpn_cls": 0.031174708376056515, "loss_rpn_loc": 0.07452140923123807,
"lr": 0.00020000000000000004, "mask_rcnn/accuracy": 0.8987971277948823, "mask_rcnn/false_negative": 0.08529945120560417, "mask_rcnn/false_positive": 0.11964548731891606, "roi_head/num_bg_samples": 443.75, "roi_head/num_fg_samples": 68.25, "rpn/num_neg_anchors": 227.25, "rpn/num_pos_anchors": 28.75, "time": 0.26268363802228123,
"total_loss": 0.739351074967999}
{"bbox/AP": 38.64992550664329, "bbox/AP50": 59.47717812766804, "bbox/AP75": 42.12190719744732, "bbox/APl": 49.888680953062895, "bbox/APm": 42.02547896334861, "bbox/APs": 22.480254061778187, "data_time": 0.0024516225093975663, "fast_rcnn/cls_accuracy": 0.9326171875, "fast_rcnn/false_negative": 0.3482697426796806, "fast_rcnn/fg_cls_accuracy": 0.6329931972789116, "iteration": 89999,
"loss_box_reg": 0.21414381475187838, "loss_cls": 0.1887940641026944, "loss_mask": 0.23086919263005257,
"loss_rpn_cls": 0.027912706733332016, "loss_rpn_loc": 0.06928918394260108,
"lr": 0.00020000000000000004, "mask_rcnn/accuracy": 0.8918846949181082, "mask_rcnn/false_negative": 0.08113309024818016, "mask_rcnn/false_positive": 0.135757795864421, "roi_head/num_bg_samples": 440.5, "roi_head/num_fg_samples": 71.5, "rpn/num_neg_anchors": 228.75, "rpn/num_pos_anchors": 27.25,
"segm/AP": 35.24377920043061, "segm/AP50": 56.31648013808561, "segm/AP75": 37.49476279170336, "segm/APl": 50.34300779993869, "segm/APm": 37.70836473109089, "segm/APs": 17.1625607932606, "time": 0.26572991348803043, "total_loss": 0.7517079937970266}
MMDetection R-50-FPN | pytorch | 1x
http://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205_050542.log.json
{"CUDA_HOME": "/mnt/lustre/share/cuda-10.0", "GPU 0,1,2,3,4,5,6,7": "Tesla V100-SXM2-32GB", "PyTorch": "1.3.1", "TorchVision": "0.4.2", "MMCV": "0.2.16", "MMDetection": "1.0rc1+d1c761c", "seed": 0}
{"mode": "train", "epoch": 1, "iter": 50, "lr": 0.00797, "time": 3.87105, "data_time": 3.52078, "memory": 4207,
"loss_rpn_cls": 0.31397, "loss_rpn_bbox": 0.09809, "loss_cls": 0.68102, "acc": 93.2146, "loss_bbox": 0.15389, "loss_mask": 0.71273, "loss": 1.95971}
{"mode": "train", "epoch": 1, "iter": 100, "lr": 0.00931, "time": 0.33465, "data_time": 0.00731, "memory": 4207,
"loss_rpn_cls": 0.17393, "loss_rpn_bbox": 0.09358, "loss_cls": 0.48114, "acc": 93.85815, "loss_bbox": 0.2136, "loss_mask": 0.6744, "loss": 1.63665}
{"mode": "train", "epoch": 1, "iter": 150, "lr": 0.01064, "time": 0.30104, "data_time": 0.00618, "memory": 4207,
"loss_rpn_cls": 0.14034, "loss_rpn_bbox": 0.0951, "loss_cls": 0.4919, "acc": 92.52588, "loss_bbox": 0.26593, "loss_mask": 0.64107, "loss": 1.63434}
{"mode": "train", "epoch": 1, "iter": 200, "lr": 0.01197, "time": 0.30138, "data_time": 0.00583, "memory": 4207,
"loss_rpn_cls": 0.13552, "loss_rpn_bbox": 0.09778, "loss_cls": 0.52423, "acc": 91.6001, "loss_bbox": 0.30218, "loss_mask": 0.62185, "loss": 1.68157}
--------------------
{"mode": "train", "epoch": 6, "iter": 7300, "lr": 0.02, "time": 0.31512, "data_time": 0.00871, "memory": 4441,
"loss_rpn_cls": 0.04038, "loss_rpn_bbox": 0.05534, "loss_cls": 0.25806, "acc": 91.79053, "loss_bbox": 0.25501, "loss_mask": 0.27165, "loss": 0.88044}
{"mode": "val", "epoch": 6, "iter": 7330, "lr": 0.02,
"bbox_mAP": 0.298, "bbox_mAP_50": 0.494, "bbox_mAP_75": 0.319, "bbox_mAP_s": 0.167, "bbox_mAP_m": 0.332, "bbox_mAP_l": 0.376,
"bbox_mAP_copypaste": "0.298 0.494 0.319 0.167 0.332 0.376",
"segm_mAP": 0.283, "segm_mAP_50": 0.469, "segm_mAP_75": 0.301, "segm_mAP_s": 0.145, "segm_mAP_m": 0.313, "segm_mAP_l": 0.374,
"segm_mAP_copypaste": "0.283 0.469 0.301 0.145 0.313 0.374"}
{"mode": "train", "epoch": 7, "iter": 50, "lr": 0.02, "time": 3.53933, "data_time": 3.1868, "memory": 4441,
"loss_rpn_cls": 0.0374, "loss_rpn_bbox": 0.05666, "loss_cls": 0.24946, "acc": 91.99194, "loss_bbox": 0.25296, "loss_mask": 0.26755, "loss": 0.86403}
--------------------
{"mode": "train", "epoch": 12, "iter": 7200, "lr": 0.0002, "time": 0.30634, "data_time": 0.00588, "memory": 4441,
"loss_rpn_cls": 0.02146, "loss_rpn_bbox": 0.04223, "loss_cls": 0.17699, "acc": 93.70728, "loss_bbox": 0.20831, "loss_mask": 0.229, "loss": 0.678}
{"mode": "train", "epoch": 12, "iter": 7250, "lr": 0.0002, "time": 0.33768, "data_time": 0.00644, "memory": 4441,
"loss_rpn_cls": 0.0231, "loss_rpn_bbox": 0.04785, "loss_cls": 0.19285, "acc": 93.18335, "loss_bbox": 0.22734, "loss_mask": 0.24249, "loss": 0.73362}
{"mode": "train", "epoch": 12, "iter": 7300, "lr": 0.0002, "time": 0.30858, "data_time": 0.00727, "memory": 4441,
"loss_rpn_cls": 0.02357, "loss_rpn_bbox": 0.04924, "loss_cls": 0.19678, "acc": 93.0603, "loss_bbox": 0.22557, "loss_mask": 0.24389, "loss": 0.73906}
{"mode": "val", "epoch": 12, "iter": 7330, "lr": 0.0002,
"bbox_mAP": 0.382, "bbox_mAP_50": 0.588, "bbox_mAP_75": 0.414, "bbox_mAP_s": 0.219, "bbox_mAP_m": 0.409, "bbox_mAP_l": 0.495,
"bbox_mAP_copypaste": "0.382 0.588 0.414 0.219 0.409 0.495",
"segm_mAP": 0.347, "segm_mAP_50": 0.557, "segm_mAP_75": 0.372, "segm_mAP_s": 0.183, "segm_mAP_m": 0.374, "segm_mAP_l": 0.472,
"segm_mAP_copypaste": "0.347 0.557 0.372 0.183 0.374 0.472"}
Problem log
Problem description:
Start
Cause analysis:
Start
Solution:
Start
ImportError: cannot import name '_C' from 'detectron2'
Problem description:
cd xxx/Pytorch_WorkSpace/OpenSourcePlatform/detectron2
python demo/demo.py \
--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
--input my_workspace/test_demo_model/000019.jpg \
--opts MODEL.WEIGHTS checkpoints/mask_rcnn_r50_fpn_1x_coco_137260431final_a54504.pkl \
MODEL.DEVICE cpu
Traceback (most recent call last):
File "xxx/OpenSourcePlatform/detectron2/detectron2/layers/__init__.py", line 3, in <module>
from .deform_conv import DeformConv, ModulatedDeformConv
File "xxx/OpenSourcePlatform/detectron2/detectron2/layers/deform_conv.py", line 11, in <module>
from detectron2 import _C
ImportError: cannot import name '_C' from 'detectron2' (xxx/OpenSourcePlatform/detectron2/detectron2/__init__.py)
Cause analysis:
ppwwyyxx commented on 10 Mar 2020
If you install detectron2 following INSTALL.md, you cannot import detectron2 at the root of the github repo because it will not find the one you installed, but will find the one in the repo (which has not been compiled).
Try a different directory. You should execute the code in any different directory.
ImportError: cannot import name ‘_C’ from ‘detectron2’ · Issue #1018 · facebookresearch/detectron2
Create from_detectron2_import_C.py with the following content:
import os
print('current working directory of this process:\n', os.getcwd())
from detectron2 import _C
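Running this script from different directories shows that it fails when run from /AdelaiDet/ but succeeds from any other directory. The reason is that /AdelaiDet/ contains a detectron2 folder whose name clashes with the detectron2 package installed in the environment.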
(detectron2) xxx@xxx-System-Product-Name:xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet$
python xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/my_workspace/usr_from_detectron2_import_C.py
current working directory of this process:
xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet
Traceback (most recent call last):
File "xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/my_workspace/usr_from_detectron2_import_C.py", line 23, in <module>
from detectron2 import _C
ImportError: cannot import name '_C' from 'detectron2' (xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/detectron2/__init__.py)
(detectron2) xxx@xxx-System-Product-Name:xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet$
cd my_workspace/
(detectron2) xxx@xxx-System-Product-Name:xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/my_workspace$
python xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/my_workspace/usr_from_detectron2_import_C.py
current working directory of this process:
xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/my_workspace
(detectron2) xxx@xxx-System-Product-Name:xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/my_workspace$
cd ../..
(detectron2) xxx@xxx-System-Product-Name:xxx/Pytorch_WorkSpace/OpenSourcePlatform$
python xxx/Pytorch_WorkSpace/OpenSourcePlatform/AdelaiDet/my_workspace/usr_from_detectron2_import_C.py
current working directory of this process:
xxx/Pytorch_WorkSpace/OpenSourcePlatform
(detectron2) xxx@xxx-System-Product-Name:xxx/Pytorch_WorkSpace/OpenSourcePlatform$
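One way to check which copy of a package an import would pick up, without actually importing it, is importlib.util.find_spec from the standard library. The sketch below reproduces the name clash with a throwaway package in a temporary directory; shadow_demo_pkg, locate_module and demo_shadowing are hypothetical names used only for this illustration.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def locate_module(name):
    """Return the file a top-level `import name` would load, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def demo_shadowing():
    """Show that a same-named folder early on sys.path wins the lookup."""
    pkg_name = "shadow_demo_pkg"  # hypothetical package name
    with tempfile.TemporaryDirectory() as repo_root:
        pkg_dir = os.path.join(repo_root, pkg_name)
        os.makedirs(pkg_dir)
        # An empty __init__.py stands in for the uncompiled source folder.
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("")
        # Putting the "repo root" first on sys.path mimics running from it.
        sys.path.insert(0, repo_root)
        try:
            importlib.invalidate_caches()
            origin = locate_module(pkg_name)
        finally:
            sys.path.remove(repo_root)
        if origin is None:
            return False
        return os.path.realpath(origin).startswith(os.path.realpath(repo_root))


if __name__ == "__main__":
    print("folder on sys.path shadows the lookup:", demo_shadowing())
```

In the situation described above, printing locate_module("detectron2") from the repository root and from another directory would show whether the local source folder or the installed package is being found.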
Solution:
Execute the code from a different directory.
cd ./my_workspace
python ../demo/demo.py \
--config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
--input test_demo_model/000019.jpg \
--output ./ \
--opts MODEL.WEIGHTS ../checkpoints/mask_rcnn_r50_fpn_1x_coco_137260431final_a54504.pkl \
MODEL.DEVICE cpu
When running or debugging through a PyCharm Python Run/Debug Configuration, go to "Run… / Debug… | Edit Configurations… | Configuration tab" and make sure to uncheck "Add content roots to PYTHONPATH", because the PyCharm content root includes the /AdelaiDet/ path.
raise SizeMismatchError(detectron2.data.detection_utils.SizeMismatchError: Mismatched image shape for image
Problem description:
File "/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/detectron2/data/dataset_mapper.py", line 126, in __call__
utils.check_image_size(dataset_dict, image)
File "/home/usrname/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/detectron2/data/detection_utils.py", line 194, in check_image_size
raise SizeMismatchError(
detectron2.data.detection_utils.SizeMismatchError: Mismatched image shape for image /data/carton_net_/images/train2017/net (4114).jpg, got (500, 375), expect (500, 500). Please check the width/height in your annotation.
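Cause analysis and solution:
Checked on 2021-04-23: the width/height values in the json annotation file of the carton_net_ dataset are wrong and do not match the actual image dimensions;
Change this entry in instances_train2017.json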
{
"height": 500,
"width": 500,
"id": 5782,
"file_name": "net (4114).jpg"
},
to
{
"height": 375,
"width": 500,
"id": 5782,
"file_name": "net (4114).jpg"
},
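However, other images in the file turned out to have the same problem.
So for now I decided to comment out File "/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/detectron2/data/dataset_mapper.py",
line 126, in __call__ utils.check_image_size(dataset_dict, image) to bypass the check;
A corrected instances_train2017.json still needs to be regenerated later;
The solutions below have not been tried yet: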
where I need to change the code · Issue #2463 · facebookresearch/detectron2 · GitHub 20210109
SizeMismatchError When trying to train a model using a custom dataset · Issue #194 · facebookresearch/detectron2 · GitHub 20191029
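Rather than fixing entries one at a time, all mismatched entries can be listed in one pass. The sketch below assumes the actual image sizes have already been measured and collected into a dict (how they are read from disk is left open); find_size_mismatches and the sample data are hypothetical, apart from the net (4114).jpg entry taken from the error above.

```python
def find_size_mismatches(coco_images, actual_sizes):
    """List entries whose annotated size differs from the measured size.

    coco_images:  the "images" list of a COCO-format annotation file.
    actual_sizes: dict mapping file_name -> (width, height) measured
                  from the image files (assumed to be available).
    """
    mismatches = []
    for img in coco_images:
        actual = actual_sizes.get(img["file_name"])
        if actual is None:
            continue  # image not measured; skip it
        annotated = (img["width"], img["height"])
        if annotated != actual:
            mismatches.append((img["file_name"], annotated, actual))
    return mismatches


# The first entry is the one from the error message; the second is made up.
images = [
    {"height": 500, "width": 500, "id": 5782, "file_name": "net (4114).jpg"},
    {"height": 375, "width": 500, "id": 1, "file_name": "example_ok.jpg"},
]
measured = {"net (4114).jpg": (500, 375), "example_ok.jpg": (500, 375)}

for file_name, annotated, actual in find_size_mismatches(images, measured):
    print(f"{file_name}: annotated (w, h)={annotated}, actual (w, h)={actual}")
```

The reported list could then be used to rewrite the width and height fields before regenerating instances_train2017.json.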
RuntimeError: CUDA out of memory raised by F.interpolate() and .float() when Tensor.shape is too large
Note, 2021-04-24:
RuntimeError: CUDA out of memory. Tried to allocate 4.47 GiB; UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown;
Problem description:
[05/07 12:21:05 d2.evaluation.evaluator]: Inference done 276/500. 0.5077 s / img. ETA=0:04:39
val2017/818.jpg h:3087 w:2561
val2017/2394 (2).jpg h:3024 w:2956
val2017/2 (76).jpg h:4000 w:3000
val2017/4 (213).jpg h:3264 w:2448
val2017/4 (315).jpg h:3264 w:2243
val2017/5 (8).jpg h:4000 w:3000
Traceback (most recent call last):
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/tools/train_net_usr.py", line 367, in <module>
launch(
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/detectron2/detectron2/engine/launch.py", line 55, in launch
mp.spawn(
File "/home/amax/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/amax/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/amax/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/amax/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/detectron2/detectron2/engine/launch.py", line 94, in _distributed_worker
main_func(*args)
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/tools/train_net_usr.py", line 344, in main
res = Trainer.test(cfg, model)
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/detectron2/detectron2/engine/defaults.py", line 534, in test
results_i = inference_on_dataset(model, data_loader, evaluator)
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/detectron2/detectron2/evaluation/evaluator.py", line 141, in inference_on_dataset
outputs = model(inputs)
File "/home/amax/miniconda2/envs/usr_detectron2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/adet/modeling/condinst/condinst.py", line 181, in forward
instances_per_im = self.postprocess(
File "/home/amax/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/adet/modeling/condinst/condinst.py", line 396, in postprocess
results.pred_masks = (pred_global_masks > mask_threshold).float()
RuntimeError: CUDA out of memory. Tried to allocate 4.47 GiB (GPU 0; 10.73 GiB total capacity; 6.10 GiB already allocated; 1.05 GiB free; 8.69 GiB reserved in total by PyTorch)
(usr_detectron2) amax@amax:~/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/tools$ /home/amax/miniconda2/envs/usr_detectron2/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 20 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(usr_detectron2) amax@amax:~/usr_users/usrname/OpenSourcePlatform/AdelaiDet_0420/tools$
Cause analysis:
/adet/modeling/condinst/condinst.py line363
def postprocess(self, results, output_height, output_width, padded_im_h, padded_im_w, mask_threshold=0.5):
    # ... (omitted)
    pred_global_masks = F.interpolate(
        pred_global_masks,
        size=(output_height, output_width),
        mode="bilinear", align_corners=False
    )
    pred_global_masks = pred_global_masks[:, 0, :, :]
    results.pred_masks = (pred_global_masks > mask_threshold).float()
    # ... (omitted)
pred_masks_tmp = (pred_global_masks > mask_threshold)
pred_masks_tmp2 = pred_masks_tmp.clone().detach()
pred_masks_tmp2[0].float()
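None of the three commands above raises an error, but running pred_masks_tmp2.float() fails with "RuntimeError: CUDA out of memory." In other words, the boolean mask tensor and the float conversion of a single mask both fit in GPU memory, while converting all masks to float at once does not.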
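To make the observation above concrete, here is a rough memory estimate. It is a minimal sketch that assumes the upsampled mask shape of [100, 3648, 2736] reported in this section and the standard element sizes of 1 byte for bool and 4 bytes for float32. The helper name tensor_gib is hypothetical and is not part of AdelaiDet or PyTorch.

```python
from math import prod

# Standard element sizes in bytes for the dtypes involved.
BYTES_PER_ELEMENT = {"bool": 1, "uint8": 1, "float32": 4}


def tensor_gib(shape, dtype):
    """Size in GiB of a dense tensor with the given shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype] / 1024 ** 3


all_masks = (100, 3648, 2736)  # upsampled shape reported in this section
one_mask = (3648, 2736)        # a single mask, as in pred_masks_tmp2[0]

for label, shape, dtype in [
    ("all masks, bool", all_masks, "bool"),
    ("one mask, float32", one_mask, "float32"),
    ("all masks, float32", all_masks, "float32"),
]:
    print(f"{label}: {tensor_gib(shape, dtype):.3f} GiB")
```

Under these assumptions the float32 copy of all masks is four times the size of the boolean tensor, which is consistent with only the full .float() call running out of memory. The exact figure in a given traceback depends on the size of the image being processed.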
/detectron2/evaluation/coco_evaluation.py line384
def instances_to_coco_json(instances, img_id):
    # ... (omitted)
    has_mask = instances.has("pred_masks")
    if has_mask:
        # use RLE to encode the masks, because they are too large and takes memory
        # since this evaluator stores outputs of the entire dataset
        rles = [
            mask_util.encode(np.array(mask[:, :, None], order="F", dtype="uint8"))[0]
            for mask in instances.pred_masks
        ]
    # ... (omitted)
pred_masks_adarray1 = np.array(instances.pred_masks[0][:, :, None], order="F", dtype="uint8")
pred_masks_adarray2 = np.array(instances.pred_masks[0].float()[:, :, None], order="F", dtype="uint8")
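The two conversions above differ only in whether .float() is applied before the cast to uint8. The following is a minimal sketch with NumPy, using a small hypothetical mask in place of instances.pred_masks[0]. It suggests that both routes give the same uint8 array, so for RLE encoding the boolean mask can presumably be cast directly, without the intermediate float tensor.

```python
import numpy as np

# Hypothetical 2x2 mask standing in for instances.pred_masks[0].
bool_mask = np.array([[True, False], [False, True]])
float_mask = bool_mask.astype(np.float32)  # mimics the effect of .float()

# Same conversion as in instances_to_coco_json: add a trailing axis,
# use Fortran (column-major) order, and cast to uint8.
from_bool = np.array(bool_mask[:, :, None], order="F", dtype="uint8")
from_float = np.array(float_mask[:, :, None], order="F", dtype="uint8")

print(np.array_equal(from_bool, from_float))
print(from_bool.shape, from_bool.dtype)
```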
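Solution:
pred_global_masks has shape torch.Size([100, 800, 608]). Upsampling it to the original image size, torch.Size([100, 3648, 2736]), produces a tensor that is too large for the available GPU memory, which triggers "RuntimeError: CUDA out of memory." One solution is to run the F.interpolate() and .float() operations on the CPU.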
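The CPU approach can be sketched as follows. This is a minimal sketch, not the AdelaiDet implementation: the function name upsample_masks_on_cpu is hypothetical, and it assumes pred_global_masks is a 4D float tensor of shape (N, 1, H, W), as F.interpolate with mode="bilinear" requires.

```python
import torch
import torch.nn.functional as F


def upsample_masks_on_cpu(pred_global_masks, output_height, output_width,
                          mask_threshold=0.5):
    """Upsample and binarize masks on the CPU instead of the GPU.

    Assumes pred_global_masks has shape (N, 1, H, W).
    """
    # Move the small, low-resolution masks to host memory first, so the
    # large upsampled tensors are never allocated on the GPU.
    masks = pred_global_masks.cpu()
    masks = F.interpolate(
        masks,
        size=(output_height, output_width),
        mode="bilinear", align_corners=False,
    )
    masks = masks[:, 0, :, :]
    return (masks > mask_threshold).float()


# Small hypothetical input; real masks would be far larger.
demo = upsample_masks_on_cpu(torch.rand(3, 1, 8, 6), 16, 12)
print(demo.shape, demo.dtype, demo.device)
```

The trade-off is speed: interpolation on the CPU is slower than on the GPU, and the result lives in host memory, so any later step that expects a CUDA tensor would need an explicit transfer.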
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1556653114079/work/torch/lib/c10d/ProcessGroupNCCL.cpp
(Not yet read or tried) Pitfalls encountered when reconfiguring a semantic segmentation experiment environment - Oliver-cs - 博客园 (cnblogs)
The environment is inconsistent, please check the package plan carefully. The following packages are causing the inconsistency
Note (20210505):
Problem description:
(base) amax@amax:~$ conda update conda
Collecting package metadata (current_repodata.json): done
Solving environment: /
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:
- conda-forge/linux-64::libgfortran4==7.5.0=h14aa051_19
- defaults/linux-64::numpy-base==1.15.4=py27hde5b4d6_0
- defaults/linux-64::mkl_fft==1.0.6=py27hd81dba3_0
- defaults/linux-64::mkl_random==1.0.2=py27hd81dba3_0
- defaults/linux-64::blas==1.0=mkl
failed with repodata from current_repodata.json, will retry with next repodata source.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
(omitted)
For details, see: GPU2080Ti操作日志_20200505 environment is inconsistent.log
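Cause analysis and solution:
None of the methods in the blog posts listed below solved the problem. Reading the error output line by line suggested uninstalling the packages reported as inconsistent, one at a time. The steps were:
- Run conda list to view the packages installed in the current environment. The packages came from different channels, so the packages whose Channel was not defaults were uninstalled first, one by one.
- Uninstalling some of these packages still produced errors similar to the one above. Alternating between conda uninstall package_name, pip uninstall package_name, and conda clean -y --all eventually removed some of them.
- Run conda config --show to view the configured channels, then remove every channel other than defaults, for example with conda config --remove channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- Finally, run conda clean -y --all and then conda update -n base conda. After this, the problem appeared to be resolved.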
*** anaconda - The environment is inconsistent, please check the package plan carefully - Stack Overflow
Fixing the error raised by conda update -n base -c defaults conda - ljx0951's blog - CSDN博客 20200428
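The channel check described in this section can also be scripted. The sketch below assumes that the output of conda list --json is a JSON list of records with "name" and "channel" keys, and that packages from the defaults channel are reported under names such as pkgs/main. Both points are assumptions to verify against your conda version. The function name and the sample records are hypothetical.

```python
import json

# Assumption: channel names that count as "defaults"; adjust as needed.
DEFAULT_CHANNELS = {"defaults", "pkgs/main", "pkgs/free", "pkgs/r"}


def non_default_packages(conda_list_json):
    """Return sorted (name, channel) pairs for packages not from defaults."""
    records = json.loads(conda_list_json)
    return sorted(
        (rec["name"], rec.get("channel", ""))
        for rec in records
        if rec.get("channel", "") not in DEFAULT_CHANNELS
    )


# Hypothetical sample in the shape assumed for `conda list --json`.
sample = json.dumps([
    {"name": "numpy-base", "version": "1.15.4", "channel": "pkgs/main"},
    {"name": "libgfortran4", "version": "7.5.0", "channel": "conda-forge"},
    {"name": "blas", "version": "1.0", "channel": "pkgs/main"},
])

for name, channel in non_default_packages(sample):
    print(f"{name}\t{channel}")
```

In practice the input string would come from running conda list --json in the affected environment, and the resulting names are candidates for conda uninstall.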
Solving environment: failed with initial frozen solve. Retrying with flexible solve; Found conflicts! Looking for incompatible packages. UnsatisfiableError; Package xxx conflicts for;
Note (20210505):
Problem description:
(base) amax@amax:~$ conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
Examining torchaudio: 4%|████
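For details, see: GPU2080Ti操作日志_20200505 Package xxx conflicts for.log
Cause analysis and solution:
None of the methods in the blog post listed below solved the problem. Reading the error output line by line suggested uninstalling the packages named in it, one at a time. The steps were:
- Run conda list to view the packages installed in the current environment. The packages came from different channels, so the packages whose Channel was not defaults were uninstalled first, one by one.
- Uninstalling some of these packages still produced errors similar to the one above. Alternating between conda uninstall package_name, pip uninstall package_name, and conda clean -y --all eventually removed some of them.
- Run conda config --show to view the configured channels, then remove every channel other than defaults, for example with conda config --remove channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- Finally, run conda clean -y --all and then conda update -n base conda. After this, the problem appeared to be resolved.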
Errors when installing an environment with conda - Tools&Platform Guide
Second-level heading
To be added