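I. Logistic Regression
(I) A Small Example
Logistic regression can be understood as binary classification (0 or 1): problems with exactly two outcomes, such as passing or failing a course, or winning or losing a match.
Continuing the earlier example of study hours and grade points, we use the grade points to decide whether the course is passed, as shown in the figure below:
Given a study time of 4 hours, decide whether the course is passed. This is a simple binary classification problem. We set it up as follows:
The model we build is as follows:
We use the cross-entropy loss function: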
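Written out, and assuming the model is the single linear unit followed by a sigmoid that the code in this section uses, the model and the loss can be sketched as below. The symbols $w$, $b$ and $N$ are notation introduced here for the weight, the bias, and the number of samples; they do not come from the original text.

```latex
\hat{y} = \sigma(wx + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\mathrm{loss} = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log\left(1 - \hat{y}_n\right) \right]
```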
(II) Building the Model Step by Step
1. Define the model as a class
import torch

x_data = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # study hours
y_data = torch.tensor([[0.0], [0.0], [1.0], [1.0]])  # 0 = fail, 1 = pass

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)  # one input feature, one output

    def forward(self, x):
        # The sigmoid squashes the linear output into a probability in (0, 1)
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

# our model
model = Model()
2. Build the loss and optimizer (BCELoss, SGD)
criterion = torch.nn.BCELoss(reduction='mean')  # binary cross entropy loss, averaged over samples
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent
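To make the loss concrete, here is a minimal plain-Python sketch of mean binary cross entropy, which averages the same per-sample terms that BCELoss averages under mean reduction. The helper name `bce_mean` and both probability lists are made up for illustration; only the labels match `y_data`.

```python
import math

def bce_mean(probs, labels):
    """Mean binary cross entropy over paired probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

labels = [0.0, 0.0, 1.0, 1.0]      # same labels as y_data
undecided = [0.5, 0.5, 0.5, 0.5]   # hypothetical: a model that always outputs 0.5
leaning = [0.1, 0.2, 0.8, 0.9]     # hypothetical: predictions leaning the right way

print(bce_mean(undecided, labels))  # ln 2, about 0.693
print(bce_mean(leaning, labels))    # smaller, about 0.164
```

A model that always answers 0.5 scores ln 2, so a training loss near 0.69 means the model has learned almost nothing yet.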
3. Training loop (forward, backward, update)
# Training loop
for epoch in range(1000):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)
    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training: a probability above 0.5 is read as "pass"
hour_var = torch.tensor([[1.0]])
print("predict 1 hour ", 1.0, model(hour_var).item() > 0.5)
hour_var = torch.tensor([[7.0]])
print("predict 7 hours", 7.0, model(hour_var).item() > 0.5)
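To see what the three steps of the loop amount to for this tiny model, the sketch below repeats the training by hand in plain Python, with the gradients of the mean cross-entropy loss written out analytically. It illustrates the idea and is not what PyTorch executes internally; the zero starting values for `w` and `b` are an assumption made here, since `nn.Linear` starts from random values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [1.0, 2.0, 3.0, 4.0]   # same inputs as x_data
ys = [0.0, 0.0, 1.0, 1.0]   # same labels as y_data
w, b = 0.0, 0.0             # hypothetical starting point
lr = 0.01                   # same learning rate as the SGD optimizer

def mean_bce(w, b):
    terms = []
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(terms) / len(terms)

initial_loss = mean_bce(w, b)
for epoch in range(1000):
    # Forward: predicted probabilities for every sample
    preds = [sigmoid(w * x + b) for x in xs]
    # Backward: gradients of the mean BCE loss with respect to w and b
    grad_w = sum((p - y) * x for p, x, y in zip(preds, xs, ys)) / len(xs)
    grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)
    # Update: one gradient descent step
    w -= lr * grad_w
    b -= lr * grad_b
final_loss = mean_bce(w, b)

print(initial_loss, final_loss)
```

Because the whole dataset is used in every step, this is full-batch gradient descent, which is also what the loop above does when it passes all of `x_data` to the model at once.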
(III) Complete Program
import torch

x_data = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # study hours
y_data = torch.tensor([[0.0], [0.0], [1.0], [1.0]])  # 0 = fail, 1 = pass

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)  # one input feature, one output

    def forward(self, x):
        # The sigmoid squashes the linear output into a probability in (0, 1)
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

# our model
model = Model()
criterion = torch.nn.BCELoss(reduction='mean')  # binary cross entropy loss, averaged over samples
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent

# Training loop
for epoch in range(1000):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)
    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training: a probability above 0.5 is read as "pass"
hour_var = torch.tensor([[1.0]])
print("predict 1 hour ", 1.0, model(hour_var).item() > 0.5)
hour_var = torch.tensor([[7.0]])
print("predict 7 hours", 7.0, model(hour_var).item() > 0.5)
Output:
0 0.5711952447891235
1 0.5709328055381775
2 0.5706718564033508
3 0.5704125761985779
:
:
:
996 0.4135492146015167
997 0.41343745589256287
998 0.41332578659057617
999 0.41321417689323425
predict 1 hour 1.0 False
predict 7 hours 7.0 True
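The `> 0.5` test on the sigmoid output is the same as asking whether the linear part is positive, so for a positive weight the model passes everyone above one particular number of hours. A small sketch of this, with made-up parameters `w` and `b` that are not the values learned in the run above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 1.0, -2.5   # hypothetical parameters, not the trained values

def predict_pass(hours):
    # sigmoid(z) > 0.5 exactly when z > 0
    return sigmoid(w * hours + b) > 0.5

boundary = -b / w   # hours at which the predicted probability is exactly 0.5

print(boundary)           # 2.5
print(predict_pass(1.0))  # False
print(predict_pass(7.0))  # True
```

For the trained model, the corresponding numbers could be read with `model.linear.weight.item()` and `model.linear.bias.item()`.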
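II. DataLoader
(I) Binary Classification on the Diabetes Dataset
Diabetes dataset download: https://www.kaggle.com/saurabh00007/diabetescsv#diabetes.csv
The diabetes dataset table has 9 columns. The last column is the outcome (0 or 1). The column names are as follows:
1. Load the data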
import torch
import numpy as np

xy = np.loadtxt('data-03-diabetes.csv', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, 0:-1])  # every column except the last: the 8 features
y_data = torch.from_numpy(xy[:, [-1]])  # the last column only: the 0/1 class label
print(x_data.shape)  # torch.Size([759, 8])
print(y_data.shape)  # torch.Size([759, 1])
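The label column is selected with `[-1]` rather than `-1` so that it keeps its column axis and has the same two-dimensional shape as the model output, which is what BCELoss expects. A minimal sketch of the difference, using a small made-up array in place of the real table:

```python
import numpy as np

# Hypothetical stand-in for the loaded table: 5 rows, 8 features + 1 label
xy = np.arange(45, dtype=np.float32).reshape(5, 9)

features = xy[:, 0:-1]    # all columns except the last
labels_2d = xy[:, [-1]]   # a list index keeps the column axis
labels_1d = xy[:, -1]     # an integer index drops the column axis

print(features.shape)   # (5, 8)
print(labels_2d.shape)  # (5, 1)
print(labels_1d.shape)  # (5,)
```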
2. Build the model
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Three linear layers narrow the 8 input features down to 1 output
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred

# our model
model = Model()
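As a quick check on the size of this network, the number of learnable parameters can be counted by hand: each linear layer holds one weight per input-output pair plus one bias per output. The helper `linear_param_count` below is introduced here only for illustration.

```python
layers = [(8, 6), (6, 4), (4, 1)]   # (in_features, out_features) of l1, l2, l3

def linear_param_count(n_in, n_out):
    # weight matrix (n_out x n_in) plus one bias per output
    return n_in * n_out + n_out

counts = [linear_param_count(n_in, n_out) for n_in, n_out in layers]

print(counts)       # [54, 28, 5]
print(sum(counts))  # 87
```

On the real model, `sum(p.numel() for p in model.parameters())` should report the same total.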
3. Loss function and optimizer
criterion = torch.nn.BCELoss(reduction='mean')
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD alternative: about 0.6
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
4. Training loop
# Training loop
for epoch in range(100):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)
    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
5. Complete program
import torch
import numpy as np

xy = np.loadtxt('data-03-diabetes.csv', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, 0:-1])  # every column except the last: the 8 features
y_data = torch.from_numpy(xy[:, [-1]])  # the last column only: the 0/1 class label
print(x_data.shape)  # torch.Size([759, 8])
print(y_data.shape)  # torch.Size([759, 1])

class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules.
        """
        super().__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a tensor of input data and must return
        a tensor of output data. We can use modules defined in the constructor as
        well as arbitrary operators on tensors.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred

# our model
model = Model()
# Construct our loss function and an optimizer. The call to model.parameters()
# in the optimizer constructor yields the learnable parameters of the three
# nn.Linear modules which are members of the model.
criterion = torch.nn.BCELoss(reduction='mean')
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD alternative: about 0.6
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Training loop
for epoch in range(100):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)
    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Output:
torch.Size([759, 8])
torch.Size([759, 1])
0 0.6538757681846619
1 0.6466246843338013
2 0.649458646774292
3 0.6435197591781616
:
:
:
96 0.4327313005924225
97 0.43345311284065247
98 0.43255946040153503
99 0.430547297000885
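The loss by itself does not say how many samples end up on the correct side of the 0.5 threshold. One possible way to measure that is sketched below; the two small tensors are made-up stand-ins for `model(x_data)` and `y_data`, not results from the run above.

```python
import torch

# Hypothetical stand-ins for model(x_data) and y_data
y_pred = torch.tensor([[0.9], [0.2], [0.6], [0.4]])
y_true = torch.tensor([[1.0], [0.0], [0.0], [1.0]])

predicted_class = (y_pred > 0.5).float()   # threshold the probabilities at 0.5
accuracy = (predicted_class == y_true).float().mean().item()

print(accuracy)  # 0.5: two of the four made-up samples are classified correctly
```

With the trained model, the prediction would typically be computed inside `with torch.no_grad():` so that the evaluation does not track gradients.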
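(II) Standardizing Data Loading with DataLoader
1. Load the data through a custom class
We create the dataset as a class: load the data in __init__(), define __getitem__() so that a sample can be fetched by index, and define __len__() to report the length of the dataset, which here is the number of rows in the table.
After defining the class we instantiate it. When setting up train_loader we can also set batch_size, the number of samples in each training batch.
Apart from the data-loading part, the rest of the process is much the same as before, so we do not walk through it step by step here.
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader

class DiabetesDataset(Dataset):
    """Diabetes dataset."""

    # Initialize your data, download, etc.
    def __init__(self):
        xy = np.loadtxt('data-03-diabetes.csv', delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

dataset = DiabetesDataset()
# num_workers=2 loads batches in two worker processes; on platforms that start
# workers by spawning (Windows, macOS), iterate the loader under
# `if __name__ == '__main__':`
train_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True, num_workers=2)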
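To show why `__len__` and `__getitem__` are all the loader needs from the dataset, here is a simplified plain-Python illustration of shuffled batching. It is a conceptual sketch and not how DataLoader is implemented; `ToyDataset` and `iterate_batches` are names invented for this example.

```python
import random

class ToyDataset:
    """Hypothetical stand-in that follows the same __getitem__/__len__ protocol."""

    def __init__(self, n):
        self.samples = [(float(i), float(i % 2)) for i in range(n)]

    def __getitem__(self, index):
        return self.samples[index]

    def __len__(self):
        return len(self.samples)

def iterate_batches(dataset, batch_size, shuffle=True):
    """Yield lists of samples, batch_size at a time."""
    indices = list(range(len(dataset)))       # relies on __len__
    if shuffle:
        random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [dataset[i] for i in chunk]      # relies on __getitem__

batches = list(iterate_batches(ToyDataset(10), batch_size=4))

print([len(batch) for batch in batches])  # [4, 4, 2]
```

The real DataLoader additionally stacks the samples of each batch into tensors and can fetch them in worker processes.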
2. Complete program
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader

class DiabetesDataset(Dataset):
    """Diabetes dataset."""

    # Initialize your data, download, etc.
    def __init__(self):
        xy = np.loadtxt('data-03-diabetes.csv', delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True, num_workers=2)

class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules.
        """
        super().__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a tensor of input data and must return
        a tensor of output data. We can use modules defined in the constructor as
        well as arbitrary operators on tensors.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred

# our model
model = Model()
# Construct our loss function and an optimizer. The call to model.parameters()
# in the optimizer constructor yields the learnable parameters of the three
# nn.Linear modules which are members of the model.
criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# The guard keeps the worker processes (num_workers=2) from re-running the
# training loop on platforms that start workers by spawning (Windows, macOS).
if __name__ == '__main__':
    # Training loop
    for epoch in range(2):
        for i, data in enumerate(train_loader, 0):
            # Get the inputs of this mini-batch
            inputs, labels = data
            # Forward pass: compute predicted y by passing x to the model
            y_pred = model(inputs)
            # Compute and print loss
            loss = criterion(y_pred, labels)
            print(epoch, i, loss.item())
            # Zero gradients, perform a backward pass, and update the weights
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
Output:
0 0 0.6559445261955261
0 1 0.6424900889396667
0 2 0.6229329705238342
0 3 0.7626698017120361
:
:
:
1 20 0.35893598198890686
1 21 0.556236743927002
1 22 0.7378787398338318
1 23 0.47327911853790283
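The batch index in the output runs from 0 to 23 in each epoch. That follows from the dataset size and the batch size, as this short piece of arithmetic shows (759 is the row count printed by the earlier program):

```python
import math

num_samples = 759   # rows reported by x_data.shape earlier
batch_size = 32

batches_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch)  # 24
print(last_batch_size)    # 23
```

The last batch is smaller than the others because DataLoader keeps an incomplete final batch by default (`drop_last=False`). Since each printed loss is computed on a different shuffled mini-batch, the values fluctuate from line to line much more than the full-batch losses in the previous section.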