Contents
1. RMB binary classification
2. DataLoader and Dataset
torch.utils.data.DataLoader
torch.utils.data.Dataset
1. RMB binary classification
2. DataLoader and Dataset

torch.utils.data.DataLoader
Purpose: builds an iterable data loader. Its main arguments:
- dataset: a Dataset instance, which determines where the data is read from and how it is read
- batch_size: the batch size
- num_workers: the number of worker subprocesses used to read data (0 means loading in the main process)
- shuffle: whether to reshuffle the data at every epoch
- drop_last: when the number of samples is not divisible by batch_size, whether to drop the last (incomplete) batch
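As a minimal sketch of these arguments, using a toy TensorDataset in place of a real dataset (the data here is made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset: 80 samples with 2 features each, plus dummy integer labels
data = TensorDataset(torch.randn(80, 2), torch.zeros(80, dtype=torch.long))

loader = DataLoader(dataset=data,     # where/how to read samples
                    batch_size=8,     # samples per batch
                    shuffle=True,     # reshuffle at every epoch
                    num_workers=0,    # 0 = load in the main process
                    drop_last=False)  # keep the final partial batch

inputs, labels = next(iter(loader))
print(inputs.shape)  # torch.Size([8, 2])
```

Each element yielded by the loader is a batch: a stack of `batch_size` samples along a new first dimension.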
Epoch: one complete pass of all training samples through the model is one Epoch
Iteration: one batch of samples passed through the model is one Iteration
Batchsize: the batch size, which determines how many Iterations make up one Epoch

Total samples: 80, batch_size: 8
1 Epoch = 10 Iterations

Total samples: 87, batch_size: 8
1 Epoch = 10 Iterations if drop_last = True (the last 7 samples are dropped)
1 Epoch = 11 Iterations if drop_last = False (the last batch contains fewer than batch_size samples)
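The arithmetic above can be checked directly, since `len()` of a DataLoader is its number of iterations per epoch (sketch with a dummy 87-sample dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.zeros(87, 1))  # 87 dummy samples

# drop_last=True: 87 // 8 = 10 full batches; the last 7 samples are dropped
print(len(DataLoader(ds, batch_size=8, drop_last=True)))   # 10
# drop_last=False: ceil(87 / 8) = 11 batches; the last one holds only 7 samples
print(len(DataLoader(ds, batch_size=8, drop_last=False)))  # 11
```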
torch.utils.data.Dataset
Purpose: the abstract Dataset class; every custom dataset must inherit from it and override
__getitem__()
__getitem__: receives an index and returns the corresponding sample
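A minimal custom Dataset might look like the sketch below. In practice `__len__` should also be overridden, so that samplers and `len(loader)` work; the class and data here are invented for illustration:

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Toy dataset: sample i is the pair (i, i**2)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # receives an index, returns one sample (here: input tensor and target)
        x = torch.tensor([float(index)])
        return x, x ** 2

    def __len__(self):
        # number of samples; needed by samplers and len(loader)
        return self.n

ds = SquaresDataset(5)
x, y = ds[3]
print(x.item(), y.item())  # 3.0 9.0
```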
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms
# RMBDataset (the custom Dataset) and LeNet (the model) are defined elsewhere in this tutorial

# hyperparameters (values here are typical assumptions; adjust to your setup)
MAX_EPOCH = 10
BATCH_SIZE = 16
LR = 0.01
log_interval = 10
val_interval = 1

# ============ step 1/5: data ============
split_dir = os.path.join('..', '..', 'data', 'rmb_split')
train_dir = os.path.join(split_dir, 'train')
valid_dir = os.path.join(split_dir, 'valid')

train_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
valid_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

# build the MyDataset instances
train_data = RMBDataset(data_dir=train_dir, transform=train_transform)
valid_data = RMBDataset(data_dir=valid_dir, transform=valid_transform)

# build the DataLoaders
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
valid_loader = DataLoader(dataset=valid_data, batch_size=BATCH_SIZE)
# ============ step 2/5: model ============
net = LeNet(classes=2)
net.initialize_weights()

# ============ step 3/5: loss function ============
criterion = nn.CrossEntropyLoss()

# ============ step 4/5: optimizer ============
optimizer = optim.SGD(net.parameters(), lr=LR, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# ============ step 5/5: training ============
train_curve = list()
valid_curve = list()
for epoch in range(MAX_EPOCH):

    loss_mean = 0.
    correct = 0.
    total = 0.

    net.train()
    for i, data in enumerate(train_loader):

        # forward
        inputs, labels = data
        outputs = net(inputs)

        # backward
        optimizer.zero_grad()
        loss = criterion(outputs, labels)
        loss.backward()

        # update weights (this step was missing in the original listing)
        optimizer.step()

        # track classification accuracy
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).squeeze().sum().numpy()

        # print training info
        loss_mean += loss.item()
        train_curve.append(loss.item())
        if (i + 1) % log_interval == 0:
            loss_mean = loss_mean / log_interval
            print("Training: Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc: {:.2%}".format(
                epoch, MAX_EPOCH, i + 1, len(train_loader), loss_mean, correct / total))
            loss_mean = 0.

    scheduler.step()  # update the learning rate
    # validate the model
    if (epoch + 1) % val_interval == 0:

        correct_val = 0.
        total_val = 0.
        loss_val = 0.
        net.eval()
        with torch.no_grad():
            for j, data in enumerate(valid_loader):
                inputs, labels = data
                outputs = net(inputs)
                loss = criterion(outputs, labels)

                _, predicted = torch.max(outputs.data, 1)
                total_val += labels.size(0)
                correct_val += (predicted == labels).squeeze().sum().numpy()
                loss_val += loss.item()