Data Preprocessing

breezehasai

Published 2021-11-20 10:28:31

Tags: deep learning, pytorch, artificial intelligence

Article link: https://blog.csdn.net/breezehasai/article/details/121418395

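1. Once you have a dataset, first convert it into a format you can work with.

For .mat datasets, use scipy (scipy.io.loadmat). Files that scipy cannot read, such as MATLAB v7.3 files, which are stored as HDF5, can be read with h5py.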
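The scipy-then-h5py fallback described above could be sketched roughly as follows. The helper name `load_mat` is made up for illustration, and the sketch assumes the file only contains plain numeric arrays at the top level; structs and cell arrays would need extra handling.

```python
import numpy as np
import scipy.io


def load_mat(path):
    """Load a .mat file into a dict mapping variable names to NumPy arrays."""
    try:
        # scipy handles the older MATLAB file formats.
        mat = scipy.io.loadmat(path)
        # Drop metadata entries such as __header__ and __version__.
        return {k: v for k, v in mat.items() if not k.startswith("__")}
    except NotImplementedError:
        # loadmat raises NotImplementedError for v7.3 files, which are HDF5 files.
        import h5py

        with h5py.File(path, "r") as f:
            # Only top-level datasets are read; groups (structs, cells) are skipped.
            return {k: np.array(v) for k, v in f.items() if isinstance(v, h5py.Dataset)}
```

Arrays read through h5py usually come back with their axes in reverse order compared with MATLAB, so a transpose may be needed before further processing.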

2. For supervised learning, first sort the data by class and save each class in its own folder.

ImageFolder: supported extensions are .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp
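ImageFolder expects a layout of the form root/class_name/file, so the sorting step can be done with the standard library alone. The sketch below assumes you already have a mapping from file name to class name; the function name `sort_into_class_folders` and the `labels` dict are hypothetical, not part of torchvision.

```python
import shutil
from pathlib import Path


def sort_into_class_folders(labels, src_dir, dst_root):
    """Copy each file into dst_root/<class_name>/, the layout ImageFolder reads.

    labels: dict mapping a file name inside src_dir to its class name
    (a hypothetical input; build it from whatever annotation you have).
    """
    src_dir, dst_root = Path(src_dir), Path(dst_root)
    for file_name, class_name in labels.items():
        class_dir = dst_root / class_name
        # Create the class folder on first use.
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_dir / file_name, class_dir / file_name)
```

After this, `torchvision.datasets.ImageFolder(root=dst_root)` should be able to read the result, with class indices assigned from the sorted folder names.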

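3. When converting data from other formats to jpg

Normalize the data to the 0-255 range and change the dtype to uint8.
Values in .mat files are usually stored and computed as double (64-bit floating point, float64 in NumPy).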

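import numpy as np

# image: a numeric np.ndarray, e.g. float64 data loaded from a .mat file
hi = np.max(image)
lo = np.min(image)
# Scale the data to the 0-255 range (assumes hi > lo, i.e. the image is not constant)
image = (((image - lo) / (hi - lo)) * 255).astype(np.uint8)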
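The snippet above divides by zero when every pixel has the same value, and `astype(np.uint8)` truncates rather than rounds. A slightly more defensive variant might look like this; the function name `to_uint8` and the choice to map a constant image to all zeros are assumptions made for this sketch.

```python
import numpy as np


def to_uint8(image):
    """Linearly rescale an array to 0-255 and cast it to uint8."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image has no range to stretch; return zeros instead of dividing by zero.
        return np.zeros(image.shape, dtype=np.uint8)
    scaled = (image - lo) / (hi - lo) * 255.0
    # Round before casting, because astype(np.uint8) truncates toward zero.
    return np.round(scaled).astype(np.uint8)
```

With Pillow installed, the result can then be written out with something like `Image.fromarray(to_uint8(arr)).save("out.jpg")`, assuming `arr` is an H x W or H x W x 3 array.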

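Converting a tensor to np.ndarray format

import numpy as np
from torchvision import transforms

# img: a PIL image or an H x W x C uint8 ndarray
transform1 = transforms.ToTensor()
img1 = transform1(img)  # C x H x W float tensor; uint8 input is scaled to [0, 1]
img2 = np.round(img1.numpy() * 255)  # back to 0-255; rounding guards against float error
img2 = img2.astype('uint8')
img2 = np.transpose(img2, (1, 2, 0))  # C x H x W -> H x W x C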

4. Custom dataset

https://blog.csdn.net/wsp_1138886114/article/details/83620869
https://www.cnblogs.com/wanghui-garcia/p/11514368.html

from torch.utils import data

class CustomDataset(data.Dataset):  # must inherit from data.Dataset
    def __init__(self):
        # TODO
        # 1. Initialize file path or list of file names.
        pass
    def __getitem__(self, index):
        # TODO
        # 1. Read one sample from file (e.g. using numpy.fromfile, PIL.Image.open).
        # 2. Preprocess the data (e.g. torchvision.transforms).
        # 3. Return a data pair (e.g. image and label).
        # Note on step 1: read exactly one sample here, not a batch; batching is done by the DataLoader.
        pass
    def __len__(self):
        # You should change 0 to the total size of your dataset.
        return 0
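As one possible way to fill in the template, the sketch below assumes each sample is stored as its own .npy file and that labels are plain integers. The class name `NpyDataset` and its constructor arguments are hypothetical, chosen only to illustrate the three steps.

```python
import numpy as np
import torch
from torch.utils import data


class NpyDataset(data.Dataset):
    """Hypothetical dataset: each sample is one .npy file plus an integer label."""

    def __init__(self, paths, labels, transform=None):
        assert len(paths) == len(labels)
        self.paths = list(paths)    # only file paths are stored, not the data
        self.labels = list(labels)
        self.transform = transform  # optional callable applied to each sample

    def __getitem__(self, index):
        # Read exactly one sample from disk.
        sample = torch.from_numpy(np.load(self.paths[index])).float()
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.labels[index]

    def __len__(self):
        return len(self.paths)
```

Wrapping it as `data.DataLoader(NpyDataset(paths, labels), batch_size=32, shuffle=True)` then yields batches, provided all samples share the same shape.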