While training a model today, I hit the following error when PyTorch was loading data:
Epoch 1/800: 31%|███ | 4/13 [00:03<00:07, 1.27img/s, loss (batch)=1.29]
Traceback (most recent call last):
File "/home/ubuntu/jiahong/pytorch-unet/train.py", line 193, in <module>
amp=args.amp)
File "/home/ubuntu/jiahong/pytorch-unet/train.py", line 79, in train_net
for batch in train_loader:
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1024, 1536] at entry 0 and [1024, 1536, 3] at entry 1
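From the log, the immediate cause is that in the call to torch.stack, entries 0 and 1 of the batch argument have different shapes. batch is a list whose members are the images read in for one batch (each sample here is a dict, so default_collate first recurses into each key and then stacks that key's values across the batch). torch.stack requires every member of the list to have exactly the same shape, so when it tries to pack this list into a single tensor, the mismatch raises the error above.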
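The mechanism can be reproduced in isolation with a minimal sketch. The dataset class ToyMaskDataset, the keys "image" and "mask", and the tiny 4×6 size are hypothetical stand-ins, not the pytorch-unet code. num_workers=0 is used so the error surfaces directly from torch.stack instead of being re-raised as "Caught RuntimeError in DataLoader worker process N".

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyMaskDataset(Dataset):
    """Hypothetical dataset: sample 0 has a single-channel mask, sample 1 a 3-channel one."""

    def __init__(self, height: int = 4, width: int = 6):
        self.height, self.width = height, width
        self.mask_shapes = [(height, width), (height, width, 3)]

    def __len__(self) -> int:
        return len(self.mask_shapes)

    def __getitem__(self, idx: int) -> dict:
        return {
            "image": torch.zeros(3, self.height, self.width),  # same shape for every sample
            "mask": torch.zeros(self.mask_shapes[idx]),        # shape differs between samples
        }


def try_one_batch() -> str:
    """Collate one batch of two samples and return the error message, if any."""
    # num_workers=0: collation runs in the main process, so torch.stack's
    # RuntimeError propagates as-is rather than being wrapped by a worker.
    loader = DataLoader(ToyMaskDataset(), batch_size=2, num_workers=0)
    try:
        next(iter(loader))
    except RuntimeError as err:
        return str(err)
    return "no error"


if __name__ == "__main__":
    print(try_one_batch())
```

The "image" key stacks without trouble; only "mask" fails. If both samples returned a (4, 6) mask, the batch would collate into a mask tensor of shape [2, 4, 6].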
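Why did the images read in have different shapes? When the labels were generated, some label images were saved as grayscale (bit depth 8) and others as color (bit depth 24). A single-channel label becomes an array of shape [H, W], while a three-channel label becomes [H, W, 3], which matches the [1024, 1536] versus [1024, 1536, 3] in the log. Mixing label images with these two channel counts therefore produces arrays of different shapes during training.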
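[Figure: a label image and its properties dialog (not preserved)]
[Figure: another label image and its properties dialog (not preserved)]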
In other words, the second image has three channels, which is not what a valid label image should look like.
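Before fixing anything, it can help to list every label that is not single-channel, not just the one that happened to land in a failing batch. The sketch below assumes the labels are PNG files in a single directory (the post does not say which format or directory is used) and reads only the IHDR header defined by the PNG specification, so it needs nothing beyond the standard library. The helper names are hypothetical. PNG stores bit depth per sample, so an RGB label has bit_depth 8 but 8 × 3 = 24 bits per pixel; that per-pixel figure is what corresponds to the 8 vs. 24 bit depths described above.

```python
import struct
from pathlib import Path
from typing import List, NamedTuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG colour type (IHDR field) -> number of samples (channels) per pixel.
CHANNELS_BY_COLOR_TYPE = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


class PngInfo(NamedTuple):
    width: int
    height: int
    bit_depth: int       # bits per sample, as stored in the IHDR chunk
    color_type: int
    channels: int
    bits_per_pixel: int  # bit_depth * channels, e.g. 8 for gray, 24 for RGB


def png_header_info(data: bytes) -> PngInfo:
    """Decode the IHDR chunk from the first 26 bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8..16: chunk length + chunk type; the spec requires IHDR to come first.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    channels = CHANNELS_BY_COLOR_TYPE[color_type]
    return PngInfo(width, height, bit_depth, color_type, channels, bit_depth * channels)


def find_multichannel_labels(label_dir: str) -> List[str]:
    """Return names of *.png files in label_dir that have more than one channel."""
    bad = []
    for path in sorted(Path(label_dir).glob("*.png")):
        with open(path, "rb") as f:
            info = png_header_info(f.read(26))
        if info.channels != 1:
            bad.append(path.name)
    return bad


if __name__ == "__main__":
    import sys
    print(find_multichannel_labels(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Palette images (colour type 3) are counted as single-channel, since they decode to one index per pixel.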
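So the fix is simply to convert the second label image to a single-channel (grayscale) image; once every label has the same shape, torch.stack no longer raises the error.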
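A minimal sketch of that conversion using Pillow, assuming the labels are PNGs in one directory and that each three-channel label is really a grayscale mask saved as RGB (R = G = B). Under that assumption Image.convert("L") reproduces the original pixel values; if the colours instead encode different classes, an explicit colour-to-class mapping would be needed rather than a luminance conversion. The function name is hypothetical, and it overwrites files in place, so back up the label directory first.

```python
from pathlib import Path
from typing import List

from PIL import Image


def convert_labels_to_single_channel(label_dir: str, pattern: str = "*.png") -> List[str]:
    """Overwrite every multi-channel label image in label_dir as 8-bit grayscale.

    Assumes multi-channel labels are grayscale masks saved as RGB(A), i.e. R == G == B,
    for which convert("L") yields the original pixel value. Returns rewritten file names.
    """
    converted = []
    for path in sorted(Path(label_dir).glob(pattern)):
        with Image.open(path) as img:
            if len(img.getbands()) == 1:  # "L", "P", "1", "I", ... are already one channel
                continue
            gray = img.convert("L")  # new, fully loaded image; alpha (if any) is dropped
        gray.save(path)
        converted.append(path.name)
    return converted
```

Running it a second time on the same directory should return an empty list, since every label is then single-channel.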