Data Operations and Preprocessing in PyTorch: Preprocessing High-Dimensional Array Data

hhhhxxn

Article link: https://blog.csdn.net/hhhhxxn/article/details/110227033

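In PyTorch, the torch.utils.data module provides commonly used utilities for data handling, mainly for reading, splitting, and preparing data. The commonly used functions and classes are listed in the table:
[Image: table of commonly used functions and classes in torch.utils.data; not reproduced here]
These classes can preprocess many kinds of data, such as high-dimensional arrays and images, so that deep learning models can consume them.
For text data, the torchtext library can be used for the corresponding data preparation.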
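
The table in the original post is an image and is not reproduced, so the following is only a minimal sketch of a few torch.utils.data classes that are commonly used for reading, splitting, and batching data. Which of them the table actually listed is an assumption, and the toy tensors and the 80/20 split are made up for illustration.

```python
import torch
import torch.utils.data as Data

# Toy data (hypothetical): 100 samples with 5 features and one float target each
x = torch.randn(100, 5)
y = torch.randn(100)

# TensorDataset pairs features and targets sample by sample
dataset = Data.TensorDataset(x, y)

# random_split divides a dataset into non-overlapping parts of the given lengths
train_set, val_set = Data.random_split(dataset, [80, 20])

# Subset selects samples by index
first_ten = Data.Subset(dataset, list(range(10)))

# ConcatDataset chains several datasets into one
combined = Data.ConcatDataset([train_set, val_set])

# DataLoader serves a dataset in (optionally shuffled) mini-batches
loader = Data.DataLoader(train_set, batch_size=16, shuffle=True)
batch_x, batch_y = next(iter(loader))

print(len(train_set), len(val_set), len(first_ten), len(combined))
print(batch_x.shape, batch_y.shape)
```

Here num_workers is left at its default of 0, so batches are loaded in the main process.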

Preprocessing high-dimensional array data

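To prepare data for demonstrating fully connected neural network models, the following uses the load_boston and load_iris datasets provided by sklearn for regression and classification, respectively. Note that load_boston was removed in scikit-learn 1.2, so the regression example requires an older scikit-learn version.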

1. Preparing regression data

## Load the required modules
import torch
import torch.utils.data as Data
from sklearn.datasets import load_boston, load_iris
import numpy as np

## Load the Boston housing regression data
boston_x, boston_y = load_boston(return_X_y=True)
print("boston_x.dtype:", boston_x.dtype)
print("boston_y.dtype:", boston_y.dtype)
# boston_x.dtype: float64
# boston_y.dtype: float64


## Convert the data to 32-bit floating-point tensors
train_xt = torch.from_numpy(boston_x.astype(np.float32))
train_yt = torch.from_numpy(boston_y.astype(np.float32))
print("train_xt.dtype:", train_xt.dtype)
print("train_yt.dtype:", train_yt.dtype)
# train_xt.dtype: torch.float32
# train_yt.dtype: torch.float32


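## After converting the training set to tensors, use TensorDataset to bundle X and y together
train_data = Data.TensorDataset(train_xt, train_yt)
## Define a data loader that serves the training data in batches
train_loader = Data.DataLoader(
    dataset=train_data,  ## dataset to load from
    batch_size=64,  ## number of samples per batch
    shuffle=True,  ## reshuffle the data at the start of every epoch
    num_workers=1,  ## load data in one worker subprocess
)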
## Check that the dimensions of one batch from the training set are correct
for step, (b_x, b_y) in enumerate(train_loader):
    if step > 0:
        break

## Print the shapes and data types of the batch features and targets
print("b_x.shape:",b_x.shape)
print("b_y.shape:",b_y.shape)
print("b_x.dtype:",b_x.dtype)
print("b_y.dtype:",b_y.dtype)
# b_x.shape: torch.Size([64, 13])
# b_y.shape: torch.Size([64])
# b_x.dtype: torch.float32
# b_y.dtype: torch.float32
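
The batching behaviour above can also be checked without loading any dataset. The following is a minimal sketch, assuming random NumPy arrays with the same shape as the Boston data (506 samples, 13 features); the array contents are made up, and num_workers is left at its default of 0 so the loop runs in the main process.

```python
import numpy as np
import torch
import torch.utils.data as Data

# Synthetic stand-in (assumption): same shape as the Boston data, random values
rng = np.random.default_rng(0)
fake_x = rng.normal(size=(506, 13))  # float64, like the sklearn arrays
fake_y = rng.normal(size=506)

# Same conversion steps as in the post: float64 NumPy arrays -> float32 tensors
xt = torch.from_numpy(fake_x.astype(np.float32))
yt = torch.from_numpy(fake_y.astype(np.float32))

loader = Data.DataLoader(Data.TensorDataset(xt, yt), batch_size=64, shuffle=True)

# Collect the size of every batch in one pass over the data (one epoch)
batch_sizes = [b_x.shape[0] for b_x, b_y in loader]
print("number of batches:", len(batch_sizes))
print("batch sizes:", batch_sizes)
# 506 = 7 * 64 + 58, so the last batch holds only 58 samples
```

If a model needs batches of identical size, passing drop_last=True to DataLoader discards the final incomplete batch.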

2. Preparing classification data

## Load the required modules (only load_iris is needed in this part)
import torch
import torch.utils.data as Data
from sklearn.datasets import load_iris
import numpy as np

## Load the iris classification data
iris_x,irisy = load_iris(return_X_y = True)
print("iris_x.dtype:",iris_x.dtype)
print("iris_y.dtype:",irisy.dtype)
# iris_x.dtype: float64
# iris_y.dtype: int64


## Convert the training features X to a float32 tensor and the training labels y to an int64 tensor
train_xt = torch.from_numpy(iris_x.astype(np.float32))
train_yt = torch.from_numpy(irisy.astype(np.int64))
print("train_xt.dtype:",train_xt.dtype)
print("train_yt.dtype:",train_yt.dtype)
# train_xt.dtype: torch.float32
# train_yt.dtype: torch.int64

## After converting the training set to tensors, use TensorDataset to bundle X and y together
train_data = Data.TensorDataset(train_xt, train_yt)
## Define a data loader that serves the training data in batches
train_loader = Data.DataLoader(
    dataset = train_data,
    batch_size = 10,
    shuffle = True,
    num_workers = 1,
)
## Check that the dimensions of one batch from the training set are correct
for step,(b_x,b_y) in enumerate(train_loader):
    if step > 0:
        break

## Print the shapes and data types of the batch features and labels
print("b_x.shape:",b_x.shape)
print("b_y.shape:",b_y.shape)
print("b_x.dtype:",b_x.dtype)
print("b_y.dtype:",b_y.dtype)
# b_x.shape: torch.Size([10, 4])
# b_y.shape: torch.Size([10])
# b_x.dtype: torch.float32
# b_y.dtype: torch.int64
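
The int64 label dtype is what a typical classification loss expects: nn.CrossEntropyLoss accepts class indices as int64 (torch.long) tensors. The following is a minimal sketch, assuming a single nn.Linear layer as a stand-in model and synthetic data shaped like iris (150 samples, 4 features, 3 classes); neither the model nor the random features come from the original post.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.utils.data as Data

# Synthetic stand-in (assumption): iris-like shapes, random feature values
rng = np.random.default_rng(0)
fake_x = rng.normal(size=(150, 4))
fake_y = np.repeat(np.arange(3), 50)  # labels 0, 1, 2 with 50 samples each

xt = torch.from_numpy(fake_x.astype(np.float32))
yt = torch.from_numpy(fake_y.astype(np.int64))

loader = Data.DataLoader(Data.TensorDataset(xt, yt), batch_size=10, shuffle=True)
b_x, b_y = next(iter(loader))

model = nn.Linear(4, 3)          # hypothetical stand-in: 4 features -> 3 class scores
loss_fn = nn.CrossEntropyLoss()  # takes raw scores (logits) and int64 class indices

logits = model(b_x)              # one row of 3 class scores per sample
loss = loss_fn(logits, b_y)      # scalar tensor averaged over the batch
print(logits.shape, b_y.dtype, loss.shape)
```

For regression, by contrast, the targets stay float32, as in the first part.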
