1. Constructing the dataset
Step 1: build the dataset with the Dataset class from torch.utils.data by defining a custom dataset class (here, DiabetesDataset).
Step 2: load the dataset with the DataLoader class from torch.utils.data.
import torch
import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class DiabetesDataset(Dataset):
    def __init__(self):
        pass
    def __getitem__(self, index):
        # return the sample (and label) at position index
        pass
    def __len__(self):
        # return the number of samples in the dataset
        pass

dataset = DiabetesDataset()
# batch_size is set when the loader is actually built (see the full example below)
train_loader = DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True)
The signature of DataLoader is as follows:
torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=None, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=None, persistent_workers=False, pin_memory_device='')
Parameters
- dataset (Dataset) – dataset from which to load the data.
- batch_size (int, optional) – how many samples per batch to load (default: 1).
- shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
- sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
- batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
- num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
- collate_fn (Callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
- pin_memory (bool, optional) – if True, the data loader will copy Tensors into device/CUDA pinned memory before returning them.
- drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of the dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
- timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)
- worker_init_fn (Callable, optional) – if not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
- multiprocessing_context (str or multiprocessing.context.BaseContext, optional) – if None, the default multiprocessing context of your operating system will be used. (default: None)
- generator (torch.Generator, optional) – if not None, this RNG will be used by RandomSampler to generate random indexes and by multiprocessing to generate base_seed for workers. (default: None)
- prefetch_factor (int, optional, keyword-only arg) – number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (the default depends on num_workers: if num_workers=0 the default is None, otherwise the default is 2)
- persistent_workers (bool, optional) – if True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. (default: False)
- pin_memory_device (str, optional) – the device to pin_memory to if pin_memory is True.
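A small illustration of how a few of these parameters interact (a minimal sketch on toy data; TensorDataset and the sizes below are only for demonstration and are unrelated to the diabetes example):
import torch
from torch.utils.data import TensorDataset, DataLoader

toy_dataset = TensorDataset(torch.randn(10, 3), torch.randn(10, 1))   # 10 samples, 3 features, 1 label each
# batch_size=4 with drop_last=True: 10 // 4 = 2 full batches, the last 2 samples are dropped
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, num_workers=0, drop_last=True)
for inputs, labels in toy_loader:
    print(inputs.shape, labels.shape)   # torch.Size([4, 3]) torch.Size([4, 1])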
The following is the complete implementation of data loading with Dataset and DataLoader:
import torch
import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class DiabetesDataset(Dataset):
    def __init__(self, file_path):
        xy = np.loadtxt(file_path, delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]                        # number of samples
        self.x_data = torch.from_numpy(xy[:, :-1])    # all columns except the last are features
        self.y_data = torch.from_numpy(xy[:, [-1]])   # the last column is the label, kept 2-D
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]
    def __len__(self):
        return self.len

file_path = r"D:\jupyter\pytorch基础\diabetes.csv.gz"
dataset = DiabetesDataset(file_path)
train_loader = DataLoader(dataset=dataset, batch_size=20, shuffle=True)
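A quick sanity check of the loader (assuming the file has 8 feature columns plus one label column, as the network in the next section expects):
inputs, labels = next(iter(train_loader))   # fetch one mini-batch
print(inputs.shape)    # torch.Size([20, 8])
print(labels.shape)    # torch.Size([20, 1])
print(len(dataset))    # total number of samples, reported by __len__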
2. Constructing the network
A network is defined by subclassing torch.nn.Module. A minimal single-feature example:
class LinearModel(torch.nn.Module):
    def __init__(self):
        super(LinearModel, self).__init__()    # call the parent class __init__
        self.linear = torch.nn.Linear(1, 1)    # Linear(number of input features, number of output features)
    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

model = LinearModel()
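Calling the instance as model(x) dispatches to forward through Module.__call__. A quick check on dummy single-feature inputs (the values here are arbitrary, only to show the shapes):
x = torch.Tensor([[1.0], [2.0], [3.0]])   # 3 samples, 1 feature each, matching Linear(1, 1)
y_pred = model(x)                          # equivalent to running forward, plus hooks
print(y_pred.shape)                        # torch.Size([3, 1])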
torch.nn provides the linear layer Linear and activation functions such as Sigmoid and ReLU. Below is a network built from these components:
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)   # 8 input features -> 6
        self.linear2 = torch.nn.Linear(6, 4)   # 6 -> 4
        self.linear3 = torch.nn.Linear(4, 1)   # 4 -> 1 output
        self.activate1 = torch.nn.Sigmoid()    # final activation, squashes the output to (0, 1)
        self.activate2 = torch.nn.ReLU()       # hidden-layer activation
    def forward(self, x):
        x = self.activate2(self.linear1(x))
        x = self.activate2(self.linear2(x))
        x = self.activate1(self.linear3(x))
        return x

model = Model()
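A quick shape check on a dummy batch (random values, only to confirm the 8 -> 6 -> 4 -> 1 pipeline; not part of training):
x = torch.randn(20, 8)    # a batch of 20 samples with 8 features
y_pred = model(x)
print(y_pred.shape)       # torch.Size([20, 1]); values lie in (0, 1) because of the final Sigmoid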
3. Constructing the criterion and optimizer
# BCE loss
criterion = torch.nn.BCELoss(reduction='mean')
# SGD optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
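BCELoss(reduction='mean') averages the binary cross entropy -[y*log(p) + (1-y)*log(1-p)] over the batch. A toy hand check (the probabilities and labels below are made up):
p = torch.tensor([[0.9], [0.2]])   # predicted probabilities, must lie in (0, 1)
y = torch.tensor([[1.0], [0.0]])   # ground-truth labels
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
print(criterion(p, y).item(), manual.item())   # both print the same value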
4. Training
for epoch in range(100):
    for i, data in enumerate(train_loader, 0):
        # 1 prepare data
        inputs, labels = data
        # 2 forward
        y_pred = model(inputs)
        loss = criterion(y_pred, labels)
        print(epoch, i, loss.item())
        # 3 backward
        optimizer.zero_grad()
        loss.backward()
        # 4 update
        optimizer.step()
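After training, a rough check of how well the model fits the data (a sketch only; it evaluates on the training set itself because no separate test split is built in this example, and the 0.5 threshold is an assumption):
with torch.no_grad():                                    # no gradients needed for evaluation
    y_pred = model(dataset.x_data)
    predicted = (y_pred >= 0.5).float()                  # threshold the sigmoid output
    accuracy = (predicted == dataset.y_data).float().mean()
    print('accuracy:', accuracy.item())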