python out of memory: GPU memory is sufficient, but an out of memory error is reported

Article link: https://blog.csdn.net/weixin_31310737/article/details/114389418

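When training a custom dataset with cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml, an out of memory error occurs even though GPU memory appears to be sufficient. Setting batch_size to 1 and switching to multi-process training did not help. The error log shows that GPU 2 had only 31.062500 MB available while the allocation request was 66.797119 MB. Suggested next steps: check for other processes occupying GPU resources, or reduce the model's batch size.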
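Since the log reports GPU 2 with only about 31 MB free, a quick per-GPU memory check can show whether another process already holds that card. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the sample numbers in the comment are illustrative, not taken from the original log.

```python
import subprocess

# Real nvidia-smi query flags; values are reported in MiB when "nounits" is set.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines of 'index, used, total' (MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append(
            {
                "index": index,
                "used_mib": used,
                "total_mib": total,
                "free_mib": total - used,
            }
        )
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Illustrative input only (hypothetical values):
#   parse_gpu_memory("0, 120, 11178\n2, 11147, 11178\n")
```

If one card shows almost no free memory before training starts, `nvidia-smi --query-compute-apps=pid,used_memory --format=csv` should list the processes holding it.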


I am training my own dataset with cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml.

GPU memory is sufficient, but an out of memory error is reported. How can I solve this problem?

`python3 -u tools/train.py -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms --use_tb=True --tb_log_dir=tb_log_caltech/scalar --eval`

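P.S. batch_size is already set to 1.

I also tried the multi-process approach, python -m paddle.distributed.launch --selected_gpus 0,1,2,3 tools/train.py ..., and hit the same problem.

I really don't know what else to try. Any advice would be appreciated...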
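One thing worth trying, offered here as a hedged sketch rather than a confirmed fix: the log below shows all 4 cards being used even for the single-process command, so a card that is already occupied by another job would still be picked up. Restricting the visible devices through the standard `CUDA_VISIBLE_DEVICES` environment variable keeps the trainer off such a card. The helper names `pick_gpus` and `training_env`, the free-memory figures, and the 8000 MiB threshold are all hypothetical.

```python
import os


def pick_gpus(free_mib_by_index, min_free_mib):
    """Return the sorted GPU indices that have at least min_free_mib free."""
    return sorted(
        index
        for index, free_mib in free_mib_by_index.items()
        if free_mib >= min_free_mib
    )


def training_env(gpu_indices, base_env=None):
    """Copy the environment and expose only the chosen GPUs to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return env


# Hypothetical free memory per GPU in MiB; GPU 2 is nearly full.
free = {0: 11000, 1: 10900, 2: 31, 3: 11000}
usable = pick_gpus(free, min_free_mib=8000)
env = training_env(usable)
print(env["CUDA_VISIBLE_DEVICES"])
```

The resulting `env` can be passed to `subprocess.run([...], env=env)` together with the training command above. Note that CUDA renumbers the visible devices starting from 0, so any device IDs passed on the command line would need to match the renumbered set.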

2020-04-06 17:36:49,707-INFO: 6707 samples in file dataset/coco/annotations/instances_val2007.json

2020-04-06 17:36:49,712-INFO: places would be ommited when DataLoader is not iterable

W0406 17:36:50.419684 27808 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0

W0406 17:36:50.422271 27808 device_context.cc:245] device: 0, cuDNN Version: 7.6.

2020-04-06 17:36:51,523-INFO: Loading parameters from models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms...

2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]

loading annotations into memory...

Done (t=0.20s)

creating index...

index created!

2020-04-06 17:36:53,468-WARNING: Found an invalid bbox in annotations: im_id: 5387, area: -10.0 x1: 348, y1: 176, x2: 348, y2: 196.

2020-04-06 17:36:53,481-WARNING: Found an invalid bbox in annotations: im_id: 5765, area: -10.0 x1: 71, y1: 174, x2: 71, y2: 197.

2020-04-06 17:36:53,686-INFO: 15649 samples in file dataset/coco/annotations/instances_train2007.json

2020-04-06 17:36:53,699-INFO: places would be ommited when DataLoader is not iterable

I0406 17:36:54.286912 27808 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 4 cards are used, so 4 programs are executed in parallel.

W0406 17:37:00.890586 27808 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 730. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 604.

I0406 17:37:01.025799 27808 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1

I0406 17:37:28.992170 27808 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True

I0406