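This article uses a classic Kaggle competition, MNIST handwritten digit recognition, as a simple introduction to image processing.
Kaggle competition link
It also uses a very common model, the CNN.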
First, load the data and do some preparation:
import pandas as pd
import numpy as np
train_data = pd.read_csv('./train.csv')
test_data = pd.read_csv('./test.csv')
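A quick look at the data shows that the training set includes the label, stored in the first column of each row, while the test set has no label. For convenient storage in a CSV file, each image is also flattened into a one-dimensional array of 784 pixel values rather than kept as a 2D picture. Each row therefore needs to be reshaped into a form the model can process, which also makes visualization easier.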
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
train_X = []
train_Y = []
test_X = []
# Convert the DataFrames to NumPy arrays once, outside the loops; doing it inside a loop eats memory
train_data = np.array(train_data)
test_data = np.array(test_data)
for i in range(len(train_data)):
    data = train_data[i][1:].reshape((1, 28, 28))  # pixel columns -> (channel, height, width)
    label = train_data[i][0]  # the first column is the label
    train_X.append(data / 255)  # scale pixels to [0, 1]; overall shape: len, 1, 28, 28
    train_Y.append(label)
for i in range(len(test_data)):
    data = test_data[i].reshape((1, 28, 28))  # the test set has no label column
    test_X.append(data / 255)
You can change the dimensions in whatever way suits your own approach.
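As one example of a different approach, the per-row loop can be written as a single vectorized NumPy operation. The following is a minimal sketch, assuming the same layout as train.csv (label in column 0, followed by 784 pixel values); the array `fake_train` is synthetic stand-in data, not the real file.

```python
import numpy as np

# Synthetic stand-in for np.array(train_data): 5 rows, 1 label + 784 pixels each (assumption)
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(5, 785))
fake_train[:, 0] = rng.integers(0, 10, size=5)  # labels 0..9 in the first column

labels = fake_train[:, 0]
# Drop the label column, reshape every row to (channel, height, width), scale to [0, 1]
images = fake_train[:, 1:].reshape(-1, 1, 28, 28) / 255

print(images.shape)  # (5, 1, 28, 28)
print(labels.shape)  # (5,)
```

Reshaping the whole array at once avoids building Python lists of small arrays, and the result can be passed to `torch.tensor` directly.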
Take a quick look at what an image looks like:
import matplotlib.pyplot as plt
# plt.imshow(train_data[0][1:].reshape((28,28)))
plt.imshow(train_X[0][0])
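Because I reshaped each image to three dimensions (1 channel, 28x28), it has to be indexed as train_X[i][0] when displayed.
So that the PyTorch model can read the data, convert it from NumPy into a TensorDataset and then build a DataLoader. The DataLoader lets you specify that each training step processes batch_size samples together, and it can also shuffle the order of the data.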
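# Stack each list into one NumPy array first; building a tensor from a list of arrays is very slow
train_X = torch.tensor(np.array(train_X), dtype=torch.float32)
train_Y = torch.tensor(train_Y).type(torch.LongTensor)
test_X = torch.tensor(np.array(test_X), dtype=torch.float32)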
train_dataset = torch.utils.data.TensorDataset(train_X, train_Y)
test_dataset = torch.utils.data.TensorDataset(test_X)
num_epochs = 16
batch_size = 16
learning_rate = 0.01
momentum = 0.5
# PyTorch's DataLoader can shuffle the data, batch it, and load it in parallel with multiple worker processes
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
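To see what a DataLoader actually yields, here is a minimal sketch on synthetic tensors. The names `fake_X`, `fake_Y`, and `fake_loader` are hypothetical stand-ins, and the sample count of 40 is chosen only to show a partial last batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 40 random "images" and labels (assumption, not the real dataset)
fake_X = torch.rand(40, 1, 28, 28)
fake_Y = torch.randint(0, 10, (40,))

fake_loader = DataLoader(TensorDataset(fake_X, fake_Y), batch_size=16, shuffle=True)

batch_shapes = []
for images, labels in fake_loader:
    batch_shapes.append((tuple(images.shape), tuple(labels.shape)))

print(len(fake_loader))  # 3 batches: 16 + 16 + 8 samples
print(batch_shapes)

# A TensorDataset built from a single tensor still returns a tuple with one element
single_item = TensorDataset(fake_X)[0]
print(type(single_item), len(single_item))
```

The last point is why a loader built from a label-free dataset yields one-element batches that need to be indexed with `[0]` to get the images.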
Next, build the CNN model. This architecture is not the only option; you can modify it to suit your own needs:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            # activation function
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        # flatten each sample from 64 x 7 x 7 into a vector of 3136 values
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out

model = ConvNet()
model.to(device)
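The input size of fc1, 7*7*64, follows from how each layer changes the spatial size of a 28x28 image. The sketch below works through that arithmetic with the standard output-size formula for convolution and pooling layers; the helper `conv_out` is a hypothetical name introduced here for illustration.

```python
def conv_out(size, kernel_size, stride, padding=0):
    """Output length of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28
size = conv_out(size, kernel_size=5, stride=1, padding=2)  # layer1 conv: 28
size = conv_out(size, kernel_size=2, stride=2)             # layer1 pool: 14
size = conv_out(size, kernel_size=5, stride=1, padding=2)  # layer2 conv: 14
size = conv_out(size, kernel_size=2, stride=2)             # layer2 pool: 7

flat_features = size * size * 64  # layer2 produces 64 channels
print(size, flat_features)  # 7 3136
```

If you change a kernel size, padding, or the number of channels in the model, rerun this calculation and update the first argument of fc1 accordingly.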
Next comes the training stage:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
total_step = len(train_loader)
loss_list = []
acc_list = []
for epoch in range(num_epochs):
    running_loss = 0.0
    correct = 0
    y_train = 0
    for i, (images, labels) in enumerate(train_loader):
        images = images.type(torch.FloatTensor).to(device)
        labels = labels.to(device)
        outputs = model(images)
        # compute the loss
        loss = criterion(outputs, labels)
        loss_list.append(loss.item())
        # backpropagation
        # first clear the gradient buffers of all parameters, otherwise gradients accumulate
        optimizer.zero_grad()
        # compute the gradients
        loss.backward()
        # update the parameters
        optimizer.step()
        # track accuracy
        total = labels.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct += (predicted == labels).sum().item()
        y_train += total
        running_loss += loss.item()
        if (i + 1) % 400 == 0:
            print('Epoch[{}/{}], Step[{}/{}], Loss Current Batch: {:.4f}, Loss Avg: {:.4f}, Accuracy: {:.2f} %'
                  .format(epoch + 1,
                          num_epochs,
                          i + 1,
                          total_step,
                          loss.item(),
                          running_loss / 400,
                          100 * correct / y_train))
            # reset the statistics for the next 400 steps
            running_loss = 0.0
            correct = 0
            y_train = 0
How you log the loss is up to you; the important thing is not to skip any of the core steps of a deep learning training loop.
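Those core steps can be isolated in a much smaller example. This is a minimal sketch, assuming a tiny linear classifier and random data in place of the CNN and MNIST; all `toy_` names are hypothetical and exist only for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Tiny stand-in model and random data (assumptions for illustration only)
toy_model = nn.Linear(10, 3)
toy_X = torch.randn(64, 10)
toy_Y = torch.randint(0, 3, (64,))

toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(20):
    outputs = toy_model(toy_X)             # 1. forward pass
    loss = toy_criterion(outputs, toy_Y)   # 2. compute the loss
    toy_optimizer.zero_grad()              # 3. clear old gradients
    loss.backward()                        # 4. backpropagate
    toy_optimizer.step()                   # 5. update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss should go down over the 20 steps
```

Whatever logging you add, these five calls should appear once per batch, with `zero_grad()` called before `backward()`.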
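Now produce the predictions. Kaggle checks the test-set predictions on its own servers and we never see the test labels, so all we need to do is write the predicted results to a CSV file and submit it.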
model.eval()
test_pred = []
# disable autograd during inference to speed up computation
with torch.no_grad():
    for images in test_loader:
        # each batch is a one-element list because the TensorDataset holds only the images
        images = images[0].type(torch.FloatTensor).to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        test_pred.extend(predicted.cpu().numpy())
# write the file in the required format; ImageId counts from 1
image_ids = np.arange(1, len(test_pred) + 1)
submission = pd.DataFrame({'ImageId': image_ids, 'Label': test_pred})
submission.to_csv("submission.csv", index=False)
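Finally, submit the generated file. Note that the predictions must be converted to NumPy values before they are written out; do not leave them as CUDA tensors or other tensor objects, otherwise the submission may score 0. 😀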
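Before uploading, it can help to check what the CSV text looks like. The sketch below uses three made-up predictions in place of the real test_pred list, and assumes the two-column format with ImageId starting at 1, as in the competition's sample submission.

```python
import numpy as np
import pandas as pd

# Stand-in predictions (assumption): in the real script this is the test_pred list
fake_pred = [np.int64(2), np.int64(0), np.int64(9)]

submission = pd.DataFrame({
    'ImageId': np.arange(1, len(fake_pred) + 1),  # ids start at 1
    'Label': np.array(fake_pred),
})

# to_csv with no path returns the CSV content as a string
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Each line after the header should be a plain integer pair such as `1,2`. If a Label shows up as something like `tensor(2)`, the predictions were not converted to NumPy values first.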