Ten-class image classification on the Kaggle Animal-10 dataset with a VGG network

The dataset is the Animal-10 dataset downloaded from Kaggle. It contains images of ten animal species, about 26,000 images in total.

The VGG network replaces a single 7x7 convolution kernel with three stacked 3x3 kernels, and a single 5x5 kernel with two stacked 3x3 kernels. The main purpose is to increase the depth of the network while keeping the same receptive field, which improves the network's performance to some extent (and also uses fewer parameters for the same receptive field).
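The receptive-field equivalence is easy to verify with a short computation (a minimal sketch; the `receptive_field` helper is ours, not from the original post):

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions.

    Each extra k x k stride-1 layer grows the receptive field by (k - 1).
    """
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# Three stacked 3x3 convs see the same 7x7 region as one 7x7 conv,
print(receptive_field([3, 3, 3]))  # 7
# and two stacked 3x3 convs match one 5x5 conv.
print(receptive_field([3, 3]))     # 5

# Parameter count per position (C input and C output channels):
# one 7x7 conv uses 49*C*C weights, three 3x3 convs only 27*C*C.
```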

We use the VGG16 network here, which has 16 weight layers (13 convolutional layers and 3 fully connected layers). VGG19 (19 weight layers: 16 convolutional and 3 fully connected) could be used instead; the structure is very similar, using 3x3 convolutions and 2x2 max pooling throughout. The figure below shows the VGG architecture.

First, import the packages we need:

import torchvision.models as models
import torch
import torch.nn.functional as F
from torchvision import transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from torch import optim
from torch import nn
import cv2
import numpy as np
from sklearn.metrics import classification_report

Next come file reading, image preprocessing, and the train/test split.

def my_cv_imread(filepath):
    # cv2.imread fails on paths containing non-ASCII characters on Windows;
    # reading the bytes with np.fromfile and decoding works around this.
    img = cv2.imdecode(np.fromfile(filepath, dtype=np.uint8), -1)
    return img


# Image preprocessing
transform = T.Compose([
    T.Resize((48, 48)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.4, 0.4, 0.4], std=[0.2, 0.2, 0.2])
])
dataset = ImageFolder(r'E:\dataset\archive\raw-img', transform=transform)

train = int(len(dataset) * 0.8)
other_train = len(dataset) - train
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train, other_train])

train_dataloader_train = DataLoader(train_dataset, batch_size=1024, shuffle=True)
train_dataloader_test = DataLoader(test_dataset, batch_size=1024, shuffle=False)  # no need to shuffle the test set

80% of the dataset is used for training; the remaining 20% is held out for testing.
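The 80/20 split logic can be sanity-checked on a toy dataset (a standalone sketch using random tensors in place of the Animal-10 image folder):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# 100 fake 3x48x48 "images" with random labels in 0..9
toy = TensorDataset(torch.randn(100, 3, 48, 48), torch.randint(0, 10, (100,)))

n_train = int(len(toy) * 0.8)
n_test = len(toy) - n_train
train_set, test_set = random_split(toy, [n_train, n_test])

print(len(train_set), len(test_set))  # 80 20
```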

Next, build our VGG-based network:

class Net(nn.Module):
    def __init__(self, model):
        super(Net, self).__init__()
        # freeze all pretrained VGG parameters
        for name, value in model.named_parameters():
            value.requires_grad = False
        # keep only the convolutional feature extractor (drop avgpool and classifier)
        self.vgg_layer = nn.Sequential(*list(model.children())[:-2])
        # first fully connected layer
        self.Linear_layer1 = nn.Linear(512, 4096)
        # second fully connected layer
        self.Linear_layer2 = nn.Linear(4096, 512)
        # third fully connected layer (10 output classes)
        self.Linear_layer3 = nn.Linear(512, 10)
        # dropout layer
        self.drop_layer = torch.nn.Dropout(p=0.5)

    def forward(self, x):
        x = self.vgg_layer(x)
        # print(x.shape)
        x = x.view(x.size(0), -1)
        # print(x.shape)
        x = F.relu(self.Linear_layer1(x))
        x = self.drop_layer(x)
        x = F.relu(self.Linear_layer2(x))
        x = self.drop_layer(x)
        x = self.Linear_layer3(x)
        return x


vgg = models.vgg16(pretrained=True)  # newer torchvision: models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model = Net(vgg)
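A quick check of why the first fully connected layer takes 512 inputs: VGG16's 3x3 convolutions preserve spatial size (padding 1), and its five 2x2 max-pool stages each halve it, so a 48x48 input leaves a 1x1 map with 512 channels (a back-of-the-envelope sketch):

```python
size = 48
for _ in range(5):       # VGG16 has five 2x2 max-pool stages
    size = size // 2     # each pool halves height and width (floor division)
    print(size)          # 24, 12, 6, 3, 1

flat_features = 512 * size * size  # 512 channels in the last conv block
print(flat_features)               # 512 -> matches nn.Linear(512, 4096)
```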

We modified the VGG network somewhat: the first figure shows the original VGG model structure, and the second shows the modified network.


Training the model:

optimizer = optim.Adam(model.parameters())
loss_func = nn.CrossEntropyLoss()

for epoch in range(max_epoch):
    model.train()
    batch = 0
    all_loss = 0
    train_acc = 0
    train_total = 0
    max_batch = len(train_dataloader_train)
    for train_data in train_dataloader_train:
        batch_images, batch_labels = train_data
        out = model(batch_images)
        # loss
        loss = loss_func(out, batch_labels)
        all_loss += loss.item()  # .item() avoids keeping the computation graph alive
        # predicted class = index of the largest logit
        prediction = torch.max(out, 1)[1]
        # number of correct predictions in this batch
        train_correct = (prediction == batch_labels).sum().item()
        # running total of correct predictions
        train_acc += train_correct
        # running total of samples seen
        train_total += len(batch_labels)
        # clear old gradients
        optimizer.zero_grad()
        # backpropagate
        loss.backward()
        # take one optimizer step
        optimizer.step()
        batch += 1
        print("Epoch: %d/%d || batch: %d/%d || average_loss: %.3f || train_acc: %.2f || loss: %.2f"
              % (epoch + 1, max_epoch, batch, max_batch, all_loss / batch, train_correct / len(batch_labels), loss.item()))
    print("Epoch: %d/%d || acc: %.2f || all_loss: %.2f" % (epoch + 1, max_epoch, train_acc / train_total, all_loss))

We use the Adam optimizer and the CrossEntropyLoss cross-entropy loss function; links with detailed introductions are below:

https://blog.csdn.net/leadai/article/details/79178787

https://zhuanlan.zhihu.com/p/98785902
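In short, `nn.CrossEntropyLoss` takes raw logits and internally combines `log_softmax` with the negative log-likelihood, which is why the network's last layer has no softmax (a minimal check):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 1.5, 0.2]])
targets = torch.tensor([0, 1])

ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent manual computation: mean of -log_softmax at the target indices
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()

print(torch.allclose(ce, manual))  # True
```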

The training hyperparameters are as follows:

learning_rate = 0.2  # note: not passed to optim.Adam above, which therefore uses its default lr (1e-3)
max_epoch = 5
batch_size = 1024
max_batch = 16

After training, we can save the model, and then print a text report of the main classification metrics.
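Saving and reloading can look like this (a sketch; the filename `vgg16_animal10.pth` is our placeholder, and a tiny `nn.Linear` stands in for the trained `Net` above):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the trained Net

# Save only the learned parameters (the usual, portable way)
torch.save(model.state_dict(), "vgg16_animal10.pth")

# Reload into a freshly constructed model of the same architecture
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("vgg16_animal10.pth"))

print(torch.equal(model.weight, restored.weight))  # True
```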

model.eval()  # switch off dropout for evaluation
with torch.no_grad():
    true_label = []
    pre_label = []
    for test_data in train_dataloader_test:
        batch_images, batch_labels = test_data
        out = model(batch_images)
        prediction = torch.max(out, 1)[1]
        true_label += batch_labels.tolist()
        pre_label += prediction.tolist()

print(classification_report(true_label, pre_label))

This is the classification report after 5 epochs:

This is the classification report after 30 epochs:
