Python/PyTorch multi-CPU: PyTorch DataLoader creates too many threads and allocates too much CPU memory

I'm training a model using PyTorch. To load the data, I'm using torch.utils.data.DataLoader with a custom dataset I've implemented. A strange problem occurs: every time the second `for` loop in the following code executes, the number of threads/processes increases and a huge amount of memory is allocated:

```python
for epoch in range(start_epoch, opt.niter + opt.niter_decay + 1):
    epoch_start_time = time.time()
    if epoch != start_epoch:
        epoch_iter = epoch_iter % dataset_size
    for i, item in tqdm(enumerate(dataset, start=epoch_iter)):
```

I suspect that the threads and memory of the previous iterators are not released after each `__iter__()` call on the data loader.

The allocated memory is close to the amount of memory used by the main thread/process at the moment the workers are created. That is, in the initial epoch the main process uses 2 GB of memory, so two workers of 2 GB each are created. In later epochs, 5 GB of memory is allocated by the main process and two 5 GB workers are constructed (num_workers is 2).

I suspect that the fork() call copies most of the parent process's context into the new worker processes.
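That suspicion matches how fork() behaves on Linux, where DataLoader workers are forked by default: each child starts as a copy-on-write snapshot of the parent, so whatever the main process has accumulated shows up in each worker's reported memory. A minimal stdlib-only demonstration (illustrative, not PyTorch code; requires a Unix-like system where the "fork" start method exists):

```python
import multiprocessing as mp

# State accumulated in the main process before workers are created.
PARENT_STATE = {"epoch": 3, "cache": list(range(5))}

def read_parent_state(q):
    # Under fork(), the child begins with a copy-on-write snapshot of
    # the parent's memory, so it sees PARENT_STATE without any IPC.
    q.put(PARENT_STATE["epoch"])

ctx = mp.get_context("fork")  # fork is the default start method on Linux
q = ctx.Queue()
p = ctx.Process(target=read_parent_state, args=(q,))
p.start()
inherited_epoch = q.get()
p.join()
print(inherited_epoch)  # the child inherited the parent's state
```

Because the copy is copy-on-write, the memory is shared until a page is written to, but monitoring tools often attribute the full footprint to every worker.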

The following is the Activity Monitor showing the processes created by Python; the ZMQbg/1 entries are processes related to Python.

[Screenshot: Activity Monitor process list]

The dataset used by the data loader has 100 sub-datasets; the `__getitem__` call randomly selects one, ignoring the index. (The sub-datasets are AlignedDataset from the pix2pixHD GitHub repository.)
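Such a wrapper can be sketched as follows; the class name and toy data are illustrative, not the actual pix2pixHD code:

```python
import random

class RandomSubDataset:
    """Wraps several sub-datasets; __getitem__ ignores the index and
    draws a sample from a randomly chosen sub-dataset (illustrative,
    not the pix2pixHD AlignedDataset implementation)."""

    def __init__(self, sub_datasets):
        self.sub_datasets = sub_datasets

    def __len__(self):
        # Report the total number of samples across all sub-datasets.
        return sum(len(d) for d in self.sub_datasets)

    def __getitem__(self, index):
        sub = random.choice(self.sub_datasets)   # the index is ignored
        return sub[random.randrange(len(sub))]

# Usage: 100 toy sub-datasets, each holding 10 copies of its own id.
dataset = RandomSubDataset([[i] * 10 for i in range(100)])
sample = dataset[0]
```

An object with `__len__` and `__getitem__` like this is enough to be consumed by torch.utils.data.DataLoader as a map-style dataset.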

Solution

torch.utils.data.DataLoader prefetches 2 * num_workers batches (prefetch_factor defaults to 2), so that data is always ready to send to the GPU/CPU; this could be the reason you see the memory increase.
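As a rough back-of-envelope check (the helper and numbers below are illustrative), the extra resident memory from prefetching scales with num_workers * prefetch_factor batches on top of the batch being consumed:

```python
def prefetched_bytes(num_workers, batch_size, bytes_per_sample, prefetch_factor=2):
    """Rough upper bound on the memory held by prefetched batches.
    prefetch_factor=2 mirrors the PyTorch DataLoader default."""
    return num_workers * prefetch_factor * batch_size * bytes_per_sample

# e.g. 2 workers and batches of 8 images at ~6 MiB per image:
extra = prefetched_bytes(num_workers=2, batch_size=8, bytes_per_sample=6 * 2**20)
print(extra / 2**20, "MiB")  # 192.0 MiB
```

This only accounts for the prefetch queue; it does not explain memory that grows epoch over epoch, which points back at the forked-worker copies described in the question.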

PyTorch can use multiple workers to speed up data transfer between the CPU and the NPU. Especially with large datasets, loading data into memory concurrently and sending it to the NPU in parallel reduces waiting time. The basic steps are:

1. **Import the required libraries**: `torch.utils.data.DataLoader` supports multi-process data loading, and its `num_workers` parameter specifies the number of workers.

```python
import torch
from torch.utils.data import DataLoader
```

2. **Create the DataLoader**: set `num_workers > 0` and `pin_memory=True`. This enables worker processes and places batches in page-locked host memory, which speeds up host-to-device transfers.

```python
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_threads, pin_memory=True)
```

3. **Per-worker setup**: a `worker_init_fn` can configure each worker process individually, for example setting the NPU environment or other per-worker state.

```python
def worker_init_fn(worker_id):
    # Set up the NPU environment for this worker, if applicable
    torch.npu.set_device(device)

dataloader = DataLoader(dataset, ..., worker_init_fn=worker_init_fn)
```

4. **Loading and transfer**: once batches have been loaded into memory by the worker processes, move them to the NPU; in the model's forward pass, make sure the model lives on the correct device context (e.g. the NPU).

5. **Caveats**:
   - Make sure the dataset itself is safe for concurrent access, otherwise data races may occur.
   - Set `num_workers` according to the system load and available hardware resources.
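One subtlety worth noting, since the question's `__getitem__` picks sub-datasets with Python's random module: forked workers start with identical RNG state, so without reseeding, every worker would replay the same "random" choices. A stdlib-only sketch of the reseeding idea (BASE_SEED is illustrative; inside a real worker_init_fn you could derive it from torch.initial_seed()):

```python
import random

BASE_SEED = 1234  # illustrative; derive from torch.initial_seed() in practice

def worker_init_fn(worker_id):
    # Give each DataLoader worker its own RNG stream so that forked
    # workers do not all draw identical "random" sub-dataset choices.
    random.seed(BASE_SEED + worker_id)

# Simulate two workers: each gets its own deterministic stream.
worker_init_fn(0)
draw0 = random.randrange(100)
worker_init_fn(1)
draw1 = random.randrange(100)
```

Passing this function as `worker_init_fn=worker_init_fn` to the DataLoader makes PyTorch call it once in each worker process before loading begins.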