(6) PyTorch Deep Learning: Loading Datasets

Kkh_8686

Last modified: 2022-12-20 22:07:56

First published: 2022-07-16 21:50:06

Article link: https://blog.csdn.net/K_AAbb/article/details/125824615

Loading datasets in PyTorch

1. Dataset: the dataset itself (supports indexing);

2. DataLoader: loads data from a Dataset in batches;

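3. Batch: use all of the data for each update. Advantage: makes maximal use of vectorized computation, so each step is fast. Disadvantage: the updates carry no randomness, so training is more likely to get stuck at saddle points and the resulting model is usually worse;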

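4. Stochastic gradient descent: use a single sample for each update. Advantage: the noisy updates provide good randomness, which helps escape saddle points, so the trained model tends to be better than with full-batch training. Disadvantage: because each step sees only one sample, it cannot exploit the parallelism of the CPU/GPU, and training takes far too long.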

5. Code:

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader    # Dataset is an abstract class: subclass it rather than instantiating it directly

##################### 1 Prepare and load the dataset #####################
# DiabetesDataset inherits the basic functionality of its parent class Dataset
class DiabetesDataset(Dataset):
    def __init__(self, filepath):         # Initialize: load the features and labels from the file at filepath
        xy = np.loadtxt(filepath, delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, :-1])    # all rows, every column except the last (features)
        self.y_data = torch.from_numpy(xy[:, [-1]])   # all rows, last column only (labels); [-1] keeps shape (N, 1)

    # Lets an instance support indexing (dataset[index]) to fetch one sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # Return the number of samples
    def __len__(self):
        return self.len

# data_path = "F:\\Python_Deep_Learning\\data\\diabetes.csv.gz"

dataset = DiabetesDataset('diabetes.csv.gz')
train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,
                          num_workers=2)

######################### 2 Design the model with a class #########################
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8,6)
        self.linear2= torch.nn.Linear(6,4)
        self.linear3 = torch.nn.Linear(4,1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x

model = Model()
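################### 3 Construct the loss function and optimizer ###################
criterion = torch.nn.BCELoss(reduction='sum')              # BCE loss; reduction='sum' replaces the deprecated size_average=False
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # SGD over the model parameters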

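##################### 4 Training loop #####################
# The __main__ guard is required when num_workers > 0 on platforms that start
# DataLoader workers by spawning new processes (e.g. Windows)
if __name__ == "__main__":
    for epoch in range(100):
        for i, data in enumerate(train_loader, 0):
            # Prepare data: one mini-batch of inputs and labels
            inputs, labels = data
            # Forward pass
            y_pred = model(inputs)
            loss = criterion(y_pred, labels)
            print(epoch, i, loss.item())
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            # Update weights
            optimizer.step()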
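The trade-off between full-batch training and stochastic gradient descent in items 3 and 4 comes down to DataLoader's batch_size. The sketch below is illustrative only: because diabetes.csv.gz may not be at hand, it uses a hypothetical ToyDataset with 100 random samples of 8 features (an assumption standing in for DiabetesDataset). It checks __getitem__/__len__ and counts how many weight updates one epoch performs when batch_size equals the dataset size (full batch), 1 (SGD), or 32 (the mini-batch size used in the script above).

```python
import torch
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(0)  # make the synthetic data reproducible


class ToyDataset(Dataset):
    """Hypothetical stand-in for DiabetesDataset: random features, 0/1 labels."""

    def __init__(self, n_samples=100, n_features=8):
        self.x_data = torch.randn(n_samples, n_features)
        self.y_data = (torch.rand(n_samples, 1) > 0.5).float()

    def __getitem__(self, index):
        # dataset[index] returns one (features, label) pair
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        # len(dataset) is the number of samples
        return self.x_data.shape[0]


def iterations_per_epoch(dataset, batch_size):
    """Count the batches (one weight update each) in a single pass over the data."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    return sum(1 for _ in loader)


dataset = ToyDataset()
x0, y0 = dataset[0]
print(len(dataset), x0.shape, y0.shape)  # 100 torch.Size([8]) torch.Size([1])

# Full batch, stochastic gradient descent, and the mini-batch size from the script
for name, bs in [("full batch", len(dataset)), ("SGD", 1), ("mini-batch", 32)]:
    print(name, bs, iterations_per_epoch(dataset, bs))

# Per-batch sizes for batch_size=32: the last batch holds the 4 leftover samples
loader = DataLoader(dataset, batch_size=32, shuffle=True)
sizes = [inputs.shape[0] for inputs, labels in loader]
print(sizes)  # [32, 32, 32, 4]
```

Mini-batches sit between the two extremes: each step still processes 32 samples as one vectorized tensor, while reshuffling every epoch keeps some of the randomness of SGD. With the default drop_last=False, the final, smaller batch is kept rather than discarded.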