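A Brief Introduction to Implementing Logistic Regression in PyTorch

1. Building the dataset

Step1: Use the Dataset class from torch.utils.data to build the dataset by defining your own dataset class (here, DiabetesDataset).

Step2: Use the DataLoader class from torch.utils.data to load the dataset.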

import torch
import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

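class DiabetesDataset(Dataset):
    def __init__(self):
        # load the data and keep it on the instance
        pass
    def __getitem__(self,index):
        # return the sample (features, label) at position index
        pass
    def __len__(self):
        # return the number of samples
        pass
batch_size = 20  # example value
dataset = DiabetesDataset()
train_loader = DataLoader(dataset = dataset, batch_size=batch_size, shuffle=True)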

The DataLoader signature is as follows:

torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=None, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=None, persistent_workers=False, pin_memory_device='')

Parameters

  • dataset (Dataset) – dataset from which to load the data.
  • batch_size (int, optional) – how many samples per batch to load (default: 1).
  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
  • sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
  • batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
  • collate_fn (Callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
  • pin_memory (bool, optional) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the memory pinning section of the PyTorch documentation.
  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
  • timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)
  • worker_init_fn (Callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
  • multiprocessing_context (str or multiprocessing.context.BaseContext, optional) – If None, the default multiprocessing context of your operating system will be used. (default: None)
  • generator (torch.Generator, optional) – If not None, this RNG will be used by RandomSampler to generate random indexes and multiprocessing to generate base_seed for workers. (default: None)
  • prefetch_factor (int, optional, keyword-only arg) – Number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (The default depends on num_workers: None if num_workers=0, otherwise 2.)
  • persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This allows to maintain the workers Dataset instances alive. (default: False)
  • pin_memory_device (str, optional) – the device to pin_memory to if pin_memory is True.
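
To make the effect of batch_size, shuffle and drop_last concrete, here is a minimal sketch. The toy tensors and the use of TensorDataset are assumptions made for illustration only; they are not part of the original example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data (an assumption for illustration): 10 samples with 3 features each.
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
targets = torch.arange(10, dtype=torch.float32).reshape(10, 1)
toy_dataset = TensorDataset(features, targets)

# batch_size=4 over 10 samples yields batches of 4, 4 and 2 samples.
loader_keep = DataLoader(toy_dataset, batch_size=4, shuffle=False)
sizes_keep = [x.shape[0] for x, _ in loader_keep]

# drop_last=True discards the final incomplete batch.
loader_drop = DataLoader(toy_dataset, batch_size=4, shuffle=False, drop_last=True)
sizes_drop = [x.shape[0] for x, _ in loader_drop]

# shuffle=True changes the order, but each sample still appears once per epoch.
gen = torch.Generator().manual_seed(0)
loader_shuffled = DataLoader(toy_dataset, batch_size=4, shuffle=True, generator=gen)
seen = sorted(int(y.item()) for _, ys in loader_shuffled for y in ys)

print(sizes_keep, sizes_drop, len(loader_keep), len(loader_drop))
```

Note that len() of a DataLoader counts batches, not samples, so it changes with batch_size and drop_last.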

The complete steps for loading data with Dataset and DataLoader are as follows:

import torch
import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
class DiabetesDataset(Dataset):
    def __init__(self,file_path):
        # each row is one sample: feature columns first, label in the last column
        xy = np.loadtxt(file_path,delimiter=',',dtype = np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:,:-1])
        # indexing with [-1] keeps the labels as a column of shape (N, 1)
        self.y_data = torch.from_numpy(xy[:,[-1]])
    def __getitem__(self,index):
        return self.x_data[index],self.y_data[index]
    def __len__(self):
        return self.len
file_path = r"D:\jupyter\pytorch基础\diabetes.csv.gz"
dataset = DiabetesDataset(file_path)
train_loader = DataLoader(dataset=dataset, batch_size=20, shuffle=True)
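
If the diabetes file is not at hand, the same pattern can be tried on a small synthetic file. The sketch below is only an illustration: the class name CsvDataset, the temporary file and the random data are all hypothetical, and the layout (8 feature columns followed by one label column) is assumed to match the network defined later.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class CsvDataset(Dataset):
    """Hypothetical stand-in for DiabetesDataset: the last column is the label."""

    def __init__(self, file_path):
        xy = np.loadtxt(file_path, delimiter=",", dtype=np.float32)
        self.x_data = torch.from_numpy(xy[:, :-1])
        # [-1] in a list keeps the labels two-dimensional: shape (N, 1).
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.x_data.shape[0]


# Write a small synthetic file: 50 rows, 8 feature columns, 1 binary label column.
rng = np.random.default_rng(0)
table = np.hstack([rng.random((50, 8)), rng.integers(0, 2, size=(50, 1))])
csv_path = os.path.join(tempfile.mkdtemp(), "synthetic.csv")
np.savetxt(csv_path, table, delimiter=",")

dataset = CsvDataset(csv_path)
loader = DataLoader(dataset, batch_size=20, shuffle=True)
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(len(dataset), batch_shapes)
```

Keeping the labels as shape (N, 1) matters later: BCELoss expects the target to have the same shape as the model output.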

2. Building the network

A network is defined by subclassing torch.nn.Module:

class LinearModel(torch.nn.Module):
    def __init__(self):
        super(LinearModel,self).__init__() # call the parent class's __init__
        self.linear = torch.nn.Linear(1,1) # Linear takes (number of input features, number of output features)
    def forward(self,x):
        y_pred = self.linear(x)
        return y_pred
    
model = LinearModel()

torch.nn provides the linear unit Linear and activation functions such as Sigmoid and ReLU. Below is an implementation of a network:

class Model(torch.nn.Module):
    def __init__(self):
        super(Model,self).__init__()
        self.linear1 = torch.nn.Linear(8,6)
        self.linear2 = torch.nn.Linear(6,4)
        self.linear3 = torch.nn.Linear(4,1)
        self.activate1 = torch.nn.Sigmoid()
        self.activate2 = torch.nn.ReLU()
    def forward(self,x):
        x = self.activate2(self.linear1(x))
        x = self.activate2(self.linear2(x))        
        x = self.activate1(self.linear3(x))        
        return x
model = Model()
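
A quick way to sanity-check such a network is to push a random batch through it and inspect the output. The sketch below repeats the Model definition so that it runs on its own; the random input batch and the seed are assumptions for illustration.

```python
import torch


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        self.activate1 = torch.nn.Sigmoid()
        self.activate2 = torch.nn.ReLU()

    def forward(self, x):
        x = self.activate2(self.linear1(x))
        x = self.activate2(self.linear2(x))
        # The final Sigmoid squashes the output into the range of a probability.
        x = self.activate1(self.linear3(x))
        return x


torch.manual_seed(0)
model = Model()

# A random batch of 5 samples with 8 features each (illustrative input).
batch = torch.randn(5, 8)
with torch.no_grad():
    probs = model(batch)

# Weights plus biases: (8*6 + 6) + (6*4 + 4) + (4*1 + 1)
n_params = sum(p.numel() for p in model.parameters())
print(probs.shape, n_params)
```

The output has one column per sample, which is the shape BCELoss will compare against the (N, 1) label tensor.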

3. Building the criterion and optimizer

# BCE (binary cross-entropy) loss function
criterion = torch.nn.BCELoss(reduction='mean')
# SGD optimizer
optimizer = torch.optim.SGD(model.parameters(),lr=0.1)
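
To see what BCELoss with reduction='mean' computes, the sketch below compares it with the binary cross-entropy formula written out by hand, that is, the mean of -(y*log(p) + (1-y)*log(1-p)). The three predictions and labels are made-up values for illustration.

```python
import torch

# Made-up predicted probabilities and binary labels, both of shape (3, 1).
y_pred = torch.tensor([[0.9], [0.2], [0.6]])
y_true = torch.tensor([[1.0], [0.0], [1.0]])

criterion = torch.nn.BCELoss(reduction="mean")
loss_builtin = criterion(y_pred, y_true)

# The same quantity written out: mean of -(y*log(p) + (1-y)*log(1-p)).
per_sample = -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred))
loss_manual = per_sample.mean()

print(loss_builtin.item(), loss_manual.item())
```

Because BCELoss takes probabilities as input, the network must end in a Sigmoid, as the Model above does.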

4. Training

for epoch in range(100):
    for i, data in enumerate(train_loader, 0):
        # 1 prepare data
        inputs, labels = data
        # 2 forward
        y_pred = model(inputs)
        loss = criterion(y_pred, labels)
        print(epoch, i, loss.item())
        # 3 backward
        optimizer.zero_grad()
        loss.backward()
        # 4 update
        optimizer.step()
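
The loop above only prints the loss. As a rough sketch of how one might check that training actually helps, the example below runs the same four steps on synthetic data and compares loss and accuracy before and after. Everything here is an assumption for illustration: the synthetic linearly separable data, the single-layer model, the evaluate helper and the 0.5 decision threshold are not part of the original post, and the metrics are measured on the training data itself.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic, linearly separable data (not the diabetes set): label is 1 when x0 + x1 > 0.
x = torch.randn(200, 2)
y = (x[:, [0]] + x[:, [1]] > 0).float()
loader = DataLoader(TensorDataset(x, y), batch_size=20, shuffle=True)

# A single linear layer followed by Sigmoid is plain logistic regression.
model = torch.nn.Sequential(torch.nn.Linear(2, 1), torch.nn.Sigmoid())
criterion = torch.nn.BCELoss(reduction="mean")
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def evaluate(model, inputs, labels):
    """Hypothetical helper: mean BCE loss and accuracy at a 0.5 threshold."""
    with torch.no_grad():
        probs = model(inputs)
        loss = criterion(probs, labels).item()
        accuracy = ((probs >= 0.5).float() == labels).float().mean().item()
    return loss, accuracy


loss_before, acc_before = evaluate(model, x, y)

for epoch in range(20):
    for inputs, labels in loader:
        # forward, then backward, then update, as in the loop above
        loss = criterion(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

loss_after, acc_after = evaluate(model, x, y)
print(loss_before, acc_before, loss_after, acc_after)
```

For a real dataset, a held-out validation split would give a more honest estimate than the training data used here.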