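1 Problem description
- Build a logistic regression model in PyTorch with the framework's built-in nn.Linear(). The setup is the same as the linear regression in the previous post, except that the model outputs scores for more than one class, so it can solve multi-class classification. At its core, logistic regression is a generalized linear model.
- The dataset is MNIST handwritten digits, a 10-class problem. The decision rule is softmax: pick the class with the highest probability.
- A data loader feeds the data to the model in batches of 100 images. MNIST has 60,000 training images, so each epoch has 600 batches, and the model trains for 5 epochs.
- The learning rate (lr) controls how fast the model trains. A large lr trains quickly but may oscillate around the minimum without settling into it; a small lr trains and converges slowly, but oscillates less and usually ends at a better result. In practice, lr has to be traded off against how training actually behaves.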
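The learning-rate trade-off described above can be seen on a toy problem. The following is a minimal sketch, not the MNIST model: it runs plain gradient descent on the one-dimensional function f(w) = w**2, and the function name gradient_descent and the four lr values are assumptions chosen only for illustration.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return every iterate."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - lr * grad     # the same kind of update SGD applies to each parameter
        history.append(w)
    return history

# Illustrative values only; they are not tuned for MNIST.
for lr in (0.01, 0.4, 0.95, 1.1):
    final = gradient_descent(lr)[-1]
    print(f"lr={lr:<5} |w| after 20 steps = {abs(final):.4g}")
```

On this toy function the smallest lr moves toward the minimum slowly, a moderate lr gets there quickly, lr = 0.95 flips the sign of w at every step while shrinking only slowly (the oscillation mentioned above), and lr = 1.1 moves away from the minimum altogether. Real loss surfaces are not this simple, so the exact thresholds do not carry over.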
2 Code
# Logistic Regression
import torch
import torch.nn as nn
import torchvision # provides the MNIST dataset
import torchvision.transforms as transforms # image preprocessing, e.g. ToTensor
# Hyper-parameters
input_size = 28 * 28 # 784, width * height of an image
num_classes = 10
num_epochs = 5 # number of passes over the training set
batch_size = 100 # number of images per mini-batch
learning_rate = 0.003
# MNIST data (images and labels)
train_dataset = torchvision.datasets.MNIST(
    root = 'data',
    train = True, # use the training split
    transform = transforms.ToTensor(), # convert images to tensors
    download = True # download the data; skipped if it already exists
)
test_dataset = torchvision.datasets.MNIST(
    root = 'data',
    train = False, # use the 10,000-image test split (the default is train=True)
    transform = transforms.ToTensor()
)
# Data loader (input pipeline): serves each dataset in mini-batches
train_loader = torch.utils.data.DataLoader(
    dataset = train_dataset, # the training data
    batch_size = batch_size, # images per mini-batch
    shuffle = True # reshuffle the training data every epoch
)
test_loader = torch.utils.data.DataLoader(
    dataset = test_dataset,
    batch_size = batch_size,
    shuffle = False
)
# Logistic regression model: a single nn.Linear layer, as in the linear regression
model = nn.Linear(input_size, num_classes)
# Loss and optimizer
# nn.CrossEntropyLoss() computes softmax internally
criterion = nn.CrossEntropyLoss() # cross entropy, not MSE
optimizer = torch.optim.SGD( # stochastic gradient descent
    model.parameters(),
    lr = learning_rate
)
# Train the model
total_step = len(train_loader) # number of batches per epoch
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape the images to (batch_size, input_size)
        images = images.reshape(-1, input_size)
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels) # compute the loss
        # Backward and optimize
        optimizer.zero_grad() # clear the old gradients
        loss.backward() # compute gradients by backpropagation
        optimizer.step() # update parameters
        if (i+1) % 100 == 0:
            # Print progress every 100 steps
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(
                epoch+1, num_epochs, i+1, total_step, loss.item()))
# Test the model
# In the test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad(): # disable gradient tracking inside this block
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, input_size)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1) # index of the largest score is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item() # count the correct predictions
    print('Accuracy of the model on the 10000 test images: {} %'.format(100 * correct // total))
# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')
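The script never calls softmax explicitly: the model outputs raw scores, nn.CrossEntropyLoss() applies the softmax (in log form) inside the loss, and at test time torch.max over the raw scores already picks the most probable class, because softmax preserves the ordering of the scores. The sketch below shows this for a single sample in plain Python; the helper names softmax and cross_entropy and the three-class scores are made up for illustration and are not part of the script.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of the true class, for one sample."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.5, -1.0]                         # made-up scores for a 3-class example
probs = softmax(logits)
# argmax over the raw scores; softmax is monotonic, so it is also the argmax of probs
predicted = max(range(len(logits)), key=lambda k: logits[k])

print("probabilities:", probs)
print("predicted class:", predicted)
print("loss if the true class is 0:", cross_entropy(logits, 0))
print("loss if the true class is 2:", cross_entropy(logits, 2))
```

The loss is small when the true class has the highest score and large when it has the lowest. With its default settings, nn.CrossEntropyLoss() averages this per-sample value over the batch.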
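3 Program output
Training has largely converged, and the logistic regression model classifies with more than 80% accuracy (85% in the run below), somewhat below what a neural network achieves. On the other hand, logistic regression is more widely applicable: the model is simple, training and testing are fast, generalization is fairly stable, and it is used in many settings, such as personalized recommendation.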
Epoch [1/5], Step [100/600], Loss: 2.0555
Epoch [1/5], Step [200/600], Loss: 1.7926
Epoch [1/5], Step [300/600], Loss: 1.6185
Epoch [1/5], Step [400/600], Loss: 1.4833
Epoch [1/5], Step [500/600], Loss: 1.4163
Epoch [1/5], Step [600/600], Loss: 1.3495
Epoch [2/5], Step [100/600], Loss: 1.2309
Epoch [2/5], Step [200/600], Loss: 1.0242
Epoch [2/5], Step [300/600], Loss: 1.0835
Epoch [2/5], Step [400/600], Loss: 0.9478
Epoch [2/5], Step [500/600], Loss: 0.9879
Epoch [2/5], Step [600/600], Loss: 0.9484
Epoch [3/5], Step [100/600], Loss: 0.7804
Epoch [3/5], Step [200/600], Loss: 0.8536
Epoch [3/5], Step [300/600], Loss: 0.8493
Epoch [3/5], Step [400/600], Loss: 0.8510
Epoch [3/5], Step [500/600], Loss: 0.6945
Epoch [3/5], Step [600/600], Loss: 0.7752
Epoch [4/5], Step [100/600], Loss: 0.6894
Epoch [4/5], Step [200/600], Loss: 0.7288
Epoch [4/5], Step [300/600], Loss: 0.7954
Epoch [4/5], Step [400/600], Loss: 0.7401
Epoch [4/5], Step [500/600], Loss: 0.7364
Epoch [4/5], Step [600/600], Loss: 0.5671
Epoch [5/5], Step [100/600], Loss: 0.6541
Epoch [5/5], Step [200/600], Loss: 0.6945
Epoch [5/5], Step [300/600], Loss: 0.7652
Epoch [5/5], Step [400/600], Loss: 0.6326
Epoch [5/5], Step [500/600], Loss: 0.6274
Epoch [5/5], Step [600/600], Loss: 0.5558
Accuracy of the model on the 10000 test images: 85 %
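The step counts in the log follow directly from the hyper-parameters. As a small sanity check, assuming the standard MNIST training split of 60,000 images and a data loader that keeps every batch, the arithmetic is:

```python
import math

num_train_images = 60000   # size of the MNIST training split
batch_size = 100           # as in the script
num_epochs = 5             # as in the script
log_every = 100            # the script prints when (i+1) % 100 == 0

# Number of batches per epoch, i.e. what len(train_loader) reports here
steps_per_epoch = math.ceil(num_train_images / batch_size)
# Steps within one epoch at which a log line is printed
logged_steps = [s for s in range(1, steps_per_epoch + 1) if s % log_every == 0]

print(steps_per_epoch)                  # 600
print(logged_steps)                     # [100, 200, 300, 400, 500, 600]
print(len(logged_steps) * num_epochs)   # 30 loss lines in total
```

This matches the output above: every epoch runs 600 steps and prints 6 loss lines, for 30 loss lines over 5 epochs.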