PyTorch dataset loading: "TypeError: Cannot handle this data type" when converting between PIL.Image and numpy
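When loading an image dataset in PyTorch, the usual approach is to read each image with PIL, apply some preprocessing, convert it to a Tensor, and let a DataLoader feed it to training or testing. Sometimes the preprocessing is demanding enough that PIL alone cannot do it conveniently, so numpy is often used to help. Used carelessly, though, this can cause problems, for example: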
Traceback (most recent call last):
  File "train.py", line 288, in <module>
    main()
  File "train.py", line 231, in main
    for i, (frontRGB, rePro, topRGB) in enumerate(trainDataLoader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2533, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 3), '<f8')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/username/Project/viewSynthesis/crossViewWithDepth/src/datasets.py", line 52, in __getitem__
    img= Image.fromarray(img)
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2535, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type
Two errors are actually reported here. The first comes from the DataLoader:
TypeError: Caught TypeError in DataLoader worker process 0.
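This message only says that the DataLoader failed to fetch a sample from the dataset, and it can have many causes, for example a bug in the dataset class's __getitem__ method. A failed conversion between numpy arrays and PIL images, which prevents the sample from being returned, triggers it as well. The message is therefore quite generic; the real cause is in the "Original Traceback" printed below it, and each occurrence has to be analyzed on its own.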
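One way to get past the generic worker message is to index the dataset directly in the main process, where the exception surfaces with an ordinary traceback. The sketch below assumes a hypothetical `ToyDataset` that reproduces the bug from this post; it is a plain class so the example runs without torch, whereas a real dataset would subclass `torch.utils.data.Dataset`. The helper name `find_first_bad_index` is also made up for illustration.

```python
import numpy as np
from PIL import Image


class ToyDataset:
    """Hypothetical stand-in for the real dataset class."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # NumPy preprocessing typically leaves a float64 array of shape (H, W, 3).
        img = np.random.rand(8, 8, 3) * 255.0
        return Image.fromarray(img)  # same mistake as in the traceback above


def find_first_bad_index(dataset):
    """Index the dataset without a DataLoader and return the first failing sample."""
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except Exception as exc:  # broad on purpose: we only want to locate the failure
            return idx, exc
    return None, None


bad_idx, error = find_first_bad_index(ToyDataset())
print(bad_idx, type(error).__name__, error)
```

Creating the DataLoader with `num_workers=0` has a similar effect, since samples are then fetched in the main process instead of in a worker.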
The second error message is:
TypeError: Cannot handle this data type
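The traceback points to this line in __getitem__:
img= Image.fromarray(img)
I had converted img to an np.ndarray in order to do more complex preprocessing, and converting it back to an image failed. The reason is that fromarray chooses the image mode from the array's shape and dtype, and for a three-channel (H, W, 3) array it only accepts uint8, that is, RGB with 8 bits per channel. The KeyError in the traceback, ((1, 1, 3), '<f8'), shows that the array here was three-channel float64, for which no mode exists. So we need: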
img= Image.fromarray(np.uint8(img))
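Casting the array to uint8 before converting it to an image resolves the error. Note that np.uint8 discards the fractional part and does not clip, so the values should already lie in [0, 255]; an array normalized to [0, 1] has to be multiplied by 255 first.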
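A slightly more defensive version of the cast rounds and clips before changing the dtype. This is only a sketch: the helper name `to_uint8_image` and its `scale_unit_range` flag are hypothetical, not part of PIL or numpy.

```python
import numpy as np
from PIL import Image


def to_uint8_image(arr, scale_unit_range=False):
    """Convert an (H, W, 3) numeric array to a PIL RGB image.

    `scale_unit_range` is a hypothetical flag: set it when values lie in [0, 1].
    """
    arr = np.asarray(arr, dtype=np.float64)
    if scale_unit_range:
        arr = arr * 255.0
    # Round and clip first: a bare uint8 cast truncates and does not clip.
    arr = np.clip(np.rint(arr), 0, 255).astype(np.uint8)
    return Image.fromarray(arr)


float_img = np.full((4, 4, 3), 300.7)  # float64 values outside [0, 255]
img = to_uint8_image(float_img)
print(img.mode, img.size, img.getpixel((0, 0)))  # RGB (4, 4) (255, 255, 255)

unit_img = to_uint8_image(np.ones((2, 2, 3)), scale_unit_range=True)
```

Whether to clip or to rescale depends on what the preprocessing produced, so the right choice has to be made per dataset.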
To be stricter, you can also write:
img= Image.fromarray(np.uint8(img)).convert('RGB')
This guarantees the image mode and the number of channels.
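The effect of the extra convert('RGB') is easiest to see with arrays that are not three-channel. The small check below uses made-up zero arrays: a 2-D uint8 array becomes a grayscale image and a four-channel one becomes RGBA, while after the conversion both are RGB.

```python
import numpy as np
from PIL import Image

gray = np.zeros((4, 4), dtype=np.uint8)     # (H, W)    -> mode "L"
rgba = np.zeros((4, 4, 4), dtype=np.uint8)  # (H, W, 4) -> mode "RGBA"

# Mode chosen by fromarray alone, then mode after forcing RGB.
modes_before = [Image.fromarray(a).mode for a in (gray, rgba)]
modes_after = [Image.fromarray(a).convert("RGB").mode for a in (gray, rgba)]

print(modes_before, modes_after)  # ['L', 'RGBA'] ['RGB', 'RGB']
```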