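Competition link
I worked on this problem once before while learning SVMs, and PCA + SVM already reached about 0.98, so this time I'll try a CNN. Submitting after running locally requires a proxy to get around the firewall, so I ran everything directly in a Kaggle kernel. Plenty of experienced people share their code in kernels too, so there is a lot to learn from them.
First, import a bunch of packages: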
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
import os
print(os.listdir("../input"))
Read the training and test sets, and split the training data further into a training set and a validation set:
data = pd.read_csv('../input/train.csv')
test = pd.read_csv('../input/test.csv')
train, valid = train_test_split(data, stratify=data.label, test_size=0.2)
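As a quick sanity check of what `stratify` does, here is a minimal sketch on a made-up, deliberately imbalanced label column (the toy DataFrame, the `pixel0` column, and `random_state=0` are assumptions for illustration, not part of the competition data). It shows that each class keeps the same share in both splits:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 3 classes with 500 / 300 / 200 rows
toy = pd.DataFrame({
    "label": [0] * 500 + [1] * 300 + [2] * 200,
    "pixel0": range(1000),
})

# Same call pattern as above; random_state is added only for reproducibility
toy_train, toy_valid = train_test_split(
    toy, stratify=toy.label, test_size=0.2, random_state=0
)

# Per-class counts in each split; the 5:3:2 ratio is preserved in both
train_counts = toy_train.label.value_counts().sort_index()
valid_counts = toy_valid.label.value_counts().sort_index()
print(train_counts.to_dict())
print(valid_counts.to_dict())
```

Without `stratify`, the split is purely random, so a rare digit could end up under-represented in the validation set.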
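Define a custom dataset. Each image is stored as a flat row of 784 pixels, shape (1, 784); it has to be reshaped to (28, 28) and then given one extra dimension, yielding (1, 28, 28).
At first I left out that extra channel dimension and later got dimension-mismatch errors. Ordinary images have three channels, (3, x, y), whereas handwritten-digit images have a single channel.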
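The reshape can be checked in isolation before it goes into the dataset class. This is a minimal sketch using dummy pixel values (the `row` tensor and the batch size of 4 are assumptions for illustration); convolution layers such as `nn.Conv2d` expect batches shaped (N, C, H, W), which is why the channel dimension is needed:

```python
import torch

# One flattened image: 784 pixel values, as in a CSV row (dummy values here)
row = torch.arange(784, dtype=torch.float32)

image = row.view(28, 28)     # (784,) -> (28, 28)
image = image.unsqueeze(0)   # (28, 28) -> (1, 28, 28): add the channel dimension
print(image.shape)

# Stacking several such images gives the (N, C, H, W) layout used for batches
batch = torch.stack([image] * 4)
print(batch.shape)
```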
class MyDataset(Dataset):
    def __init__(self, df, transform, train=True):
        self.df = df.values
        self.transform = transform
        self.train = train

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        if self.train:  # training or validation set
            label = self.df[index, 0]
            image = torch.FloatTensor(self.df[index, 1:]).view(28, 28).unsqueeze(0