python out of memory: GPU memory is sufficient, but training fails with out of memory

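When training a custom dataset with cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml, an out of memory error occurs even though GPU memory appears to be sufficient. Setting batch_size to 1 and switching to multi-process training did not help. The error log shows that GPU 2 had only 31.062500MB of available memory, while the allocation request was 66.797119MB. Suggested steps: check whether other processes are occupying GPU resources, or reduce the model's batch size.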
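To check whether another process is already holding memory on one of the cards, the per-GPU usage can be read from `nvidia-smi`. The following is a minimal sketch, not part of the original post: the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the sketch assumes `nvidia-smi` is on the PATH of the training machine.

```python
import subprocess

# Standard nvidia-smi query: one CSV row per GPU, values in MiB, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows such as '2, 11147, 11178' into {gpu_index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        usage[index] = (used, total)
    return usage


def query_gpu_memory():
    """Run nvidia-smi and return the parsed usage (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def report(usage):
    """Format one human-readable line per GPU."""
    return [
        f"GPU {index}: {used} MiB used, {total - used} MiB free of {total} MiB"
        for index, (used, total) in sorted(usage.items())
    ]
```

Calling `report(query_gpu_memory())` before launching training lists how much memory is free on each card. A card that is nearly full before training starts points to another process, which plain `nvidia-smi` lists in its process table.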


I am training my own dataset with cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml.

GPU memory is sufficient, but I get an out of memory error. How can I solve this?

`python3 -u tools/train.py -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms --use_tb=True --tb_log_dir=tb_log_caltech/scalar --eval`

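P.S. batch_size is already set to 1.

I also tried multi-process training with `python -m paddle.distributed.launch --selected_gpus 0,1,2,3 tools/train.py ...` and hit the same problem.

I really don't know what else to try. Any advice would be appreciated...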

2020-04-06 17:36:49,707-INFO: 6707 samples in file dataset/coco/annotations/instances_val2007.json

2020-04-06 17:36:49,712-INFO: places would be ommited when DataLoader is not iterable

W0406 17:36:50.419684 27808 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0

W0406 17:36:50.422271 27808 device_context.cc:245] device: 0, cuDNN Version: 7.6.

2020-04-06 17:36:51,523-INFO: Loading parameters from models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms...

2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]

2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]

loading annotations into memory...

Done (t=0.20s)

creating index...

index created!

2020-04-06 17:36:53,468-WARNING: Found an invalid bbox in annotations: im_id: 5387, area: -10.0 x1: 348, y1: 176, x2: 348, y2: 196.

2020-04-06 17:36:53,481-WARNING: Found an invalid bbox in annotations: im_id: 5765, area: -10.0 x1: 71, y1: 174, x2: 71, y2: 197.

2020-04-06 17:36:53,686-INFO: 15649 samples in file dataset/coco/annotations/instances_train2007.json

2020-04-06 17:36:53,699-INFO: places would be ommited when DataLoader is not iterable

I0406 17:36:54.286912 27808 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 4 cards are used, so 4 programs are executed in parallel.

W0406 17:37:00.890586 27808 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 730. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 604.

I0406 17:37:01.025799 27808 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1

I0406 17:37:28.992170 27808 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True

I0406
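The log above reports that 4 cards are used, so a single card that is already occupied (the summary mentions GPU 2 with only about 31MB available) can cause the failure even with batch_size set to 1. One common workaround is to expose only the free cards to the process through the standard `CUDA_VISIBLE_DEVICES` environment variable. The following is a minimal sketch under stated assumptions: the helper names `pick_free_gpus` and `build_env` are hypothetical, the free-memory figures for GPUs 0, 1 and 3 are made up for illustration, and the 8000 MiB threshold is an arbitrary placeholder rather than a measured requirement of this model.

```python
import os


def pick_free_gpus(free_mib_by_index, required_mib):
    """Return the sorted indices of GPUs with at least required_mib free."""
    return sorted(
        index for index, free in free_mib_by_index.items() if free >= required_mib
    )


def build_env(gpu_ids, base_env=None):
    """Copy the environment and restrict CUDA to the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(index) for index in gpu_ids)
    return env


# Hypothetical free memory per card in MiB; only the GPU 2 value comes from the post.
free_mib = {0: 10900.0, 1: 10900.0, 2: 31.0625, 3: 10900.0}

# Placeholder threshold (assumption): keep cards with at least 8000 MiB free.
usable = pick_free_gpus(free_mib, required_mib=8000)
env = build_env(usable)
```

The resulting `env` can be passed to `subprocess.run(command, env=env)` when launching `tools/train.py`. Inside the launched process the visible cards are renumbered from 0, so device indices in later log lines no longer match the physical indices shown by `nvidia-smi`.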
