PaddlePaddle Tutorial III: Fashion-MNIST with a Fully Connected Network

Getting the dataset

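Fashion-MNIST uses exactly the same file format as MNIST. To keep the approach general, we read the data straight from the raw data files. For other ways of loading data, see my earlier blog post.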

You can get this dataset from the Datasets tab on AI Studio, or from the GitHub repository.

Label  Class
0      T-shirt/top
1      Trouser
2      Pullover
3      Dress
4      Coat
5      Sandal
6      Shirt
7      Sneaker
8      Bag
9      Ankle boot

Import the necessary packages

import paddle
import gzip
import math
import numpy as np
import os
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import paddle.nn.functional as F
import warnings
from paddle.io import Dataset
warnings.filterwarnings("ignore")

paddle.__version__
'2.0.0-rc1'

Data processing

Define a function that decompresses a raw file and reads its data

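def extract_data(filename, num_data, head_size, data_size):
    # Skip the fixed-size IDX header, then read num_data records of data_size bytes each
    with gzip.open(filename) as bytestream:
        bytestream.read(head_size)
        buf = bytestream.read(data_size * num_data)
        # np.float was removed in NumPy 1.24, so use an explicit dtype
        data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
    return data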
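extract_data skips a fixed header (16 bytes for image files, 8 for label files) and trusts the caller's sample counts. Those headers follow the IDX format used by both MNIST and Fashion-MNIST: two zero bytes, a type code (0x08 for unsigned bytes), the number of dimensions, then one big-endian 32-bit size per dimension. That gives 4 + 3 × 4 = 16 bytes for the 3-D image files and 4 + 1 × 4 = 8 bytes for the 1-D label files. The following is a minimal sketch, assuming only the standard library and NumPy, of a reader that takes the shape from the header instead of hard-coding it; read_idx is a hypothetical helper, not part of the original code.

```python
import gzip
import struct

import numpy as np


def read_idx(filename):
    """Read a gzipped IDX file, taking the array shape from its header."""
    with gzip.open(filename, 'rb') as f:
        # 2 zero bytes, 1-byte type code, 1-byte number of dimensions
        zero, dtype_code, ndim = struct.unpack('>HBB', f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError('not an unsigned-byte IDX file: %s' % filename)
        # One big-endian uint32 per dimension
        shape = struct.unpack('>' + 'I' * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)
```

Under these assumptions, read_idx('fashion_mnist/train-images-idx3-ubyte.gz') should return an array of shape (60000, 28, 28), and the label file should give shape (60000,).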

Define the data-loading function. The onehot argument controls whether the labels are returned as one-hot vectors.

def load_data(dataset_name, onehot=False):
    # dataset_name is the directory that holds the four raw .gz files
    data_dir = dataset_name

    data = extract_data(os.path.join(data_dir, 'train-images-idx3-ubyte.gz'), 60000, 16, 28 * 28)
    train_images = data.reshape((-1, 28, 28, 1))

    data = extract_data(os.path.join(data_dir, 'train-labels-idx1-ubyte.gz'), 60000, 8, 1)
    train_labels = data.reshape((-1))

    data = extract_data(os.path.join(data_dir, 't10k-images-idx3-ubyte.gz'), 10000, 16, 28 * 28)
    test_images = data.reshape((-1, 28, 28, 1))

    data = extract_data(os.path.join(data_dir, 't10k-labels-idx1-ubyte.gz'), 10000, 8, 1)
    test_labels = data.reshape((-1))

    # Stack train (first 60,000) and test (last 10,000) samples into one array
    X = np.concatenate((train_images, test_images), axis=0)
    y = np.concatenate((train_labels, test_labels), axis=0).astype(np.int64)

    if onehot:
        # Set a single 1.0 per row at the column given by the label
        y_vec = np.zeros((len(y), 10), dtype=np.float32)
        for i, label in enumerate(y):
            y_vec[i, label] = 1.0
        return X, y_vec

    return X, y
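With onehot=True, load_data builds the one-hot matrix row by row in a loop. The same matrix can be obtained by indexing an identity matrix with the label array, since row k of the identity is exactly the one-hot vector for class k. A small NumPy sketch comparing the two (both function names are hypothetical). Note that this tutorial keeps onehot=False, because paddle.nn.CrossEntropyLoss expects integer class labels by default (soft_label=False).

```python
import numpy as np


def to_onehot_loop(y, num_classes=10):
    # Same approach as load_data: start from zeros, set one entry per row
    y_vec = np.zeros((len(y), num_classes), dtype=np.float32)
    for i, label in enumerate(y):
        y_vec[i, label] = 1.0
    return y_vec


def to_onehot_eye(y, num_classes=10):
    # Row y[i] of the identity matrix is the one-hot vector for y[i]
    return np.eye(num_classes, dtype=np.float32)[y]


labels = np.array([9, 0, 3], dtype=np.int64)
print(to_onehot_eye(labels).argmax(axis=1))  # [9 0 3]
```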

Read the data

class_dict = {0:'T-shirt', 1:'Trouser', 2:'Pullover', 3:'Dress', 4:'Coat',
              5:'Sandal', 6:'Shirt', 7:'Sneaker', 8:'Bag', 9:'Ankle boot'}

all_images, all_labels = load_data('fashion_mnist')
train_images, train_labels = all_images[:60000], all_labels[:60000]
val_images, val_labels = all_images[60000:], all_labels[60000:]

Data visualization

def plot_num_images(num):
    if num < 1:
        print('INFO: The number of input pictures must be greater than zero!')
    else:
        # Pick num random training images (the same image may be drawn twice)
        choose_list = np.random.randint(len(train_images), size=num)
        # Lay the images out in rows of 8
        cols = 8
        rows = math.ceil(num / cols)
        fig = plt.gcf()
        fig.set_size_inches(18, 2.5 * rows)
        for i in range(num):
            ax_img = plt.subplot(rows, cols, i + 1)
            plt_img = train_images[choose_list[i]].reshape(28, 28)
            ax_img.imshow(plt_img, cmap='binary')
            ax_img.set_title(class_dict[train_labels[choose_list[i]]],
                             fontsize=10)
        plt.show()


plot_num_images(16)


Data loading

Define the FashionDataset class

class FashionDataset(Dataset):
    def __init__(self, mode='train'):
        super(FashionDataset, self).__init__()
        # 'train' uses the first 60,000 samples; any other mode ('val', 'test')
        # uses the 10,000 held-out test images
        if mode == 'train':
            images, labels = train_images, train_labels
        else:
            images, labels = val_images, val_labels
        # Flatten each 28x28x1 image into a 784-dim float32 vector for the Linear layers
        self.data = [[images[i].astype('float32').flatten(), labels[i].astype('int64')]
                     for i in range(images.shape[0])]

    def __getitem__(self, index):
        data = self.data[index][0]
        label = self.data[index][1]
        return data, label

    def __len__(self):
        return len(self.data)


train_loader = paddle.io.DataLoader(FashionDataset(mode='train'), batch_size=10000, shuffle=True)
# Shuffling does not change evaluation results, so the validation loader keeps a fixed order
val_loader = paddle.io.DataLoader(FashionDataset(mode='val'), batch_size=10000, shuffle=False)
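With batch_size=10000, each training epoch performs only a handful of parameter updates, which is why the training log below shows "step 6/6". The step counts follow from simple arithmetic: a loader yields ceil(N / B) batches, or floor(N / B) if the last partial batch is dropped (paddle.io.DataLoader's drop_last option, False by default). A small sketch; num_batches is a hypothetical helper.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    # ceil(N / B) batches, or floor(N / B) when the last partial batch is dropped
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(num_batches(60000, 10000))  # 6    -> "step 6/6" per training epoch
print(num_batches(10000, 10000))  # 1    -> "step 1/1" per evaluation
print(num_batches(10000, 64))     # 157  -> "step 157/157" in model.evaluate
print(num_batches(60000, 32))     # 1875 updates per epoch at batch size 32
```

A smaller batch size would give many more updates per epoch for the same 50 epochs, at the cost of more time per epoch.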

Building and training the model

Here we use Paddle's high-level API to build and train the model.

classification = paddle.nn.Sequential(
    paddle.nn.Linear(784, 256),
    paddle.nn.Linear(256, 10),
)
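The Sequential above stacks two Linear layers with no activation function between them. Two affine maps compose into a single affine map, (x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2), so this network can represent only what one Linear(784, 10) layer can, i.e. a softmax-regression classifier. A NumPy check with random weights, using the same [in_features, out_features] weight layout as paddle.nn.Linear:

```python
import numpy as np

rng = np.random.default_rng(0)
# Shapes mirror the model: 784 -> 256 -> 10, weights stored as [in, out]
W1, b1 = rng.normal(size=(784, 256)), rng.normal(size=256)
W2, b2 = rng.normal(size=(256, 10)), rng.normal(size=10)
x = rng.normal(size=(5, 784))  # a batch of 5 flattened "images"

two_layers = (x @ W1 + b1) @ W2 + b2

# Fold both layers into one equivalent 784 -> 10 affine layer
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a nonlinearity such as paddle.nn.ReLU() between the two Linear layers would make the 256-unit hidden layer meaningful. The original tutorial does not do this, so the summary and training logs that follow correspond to the purely linear model.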

model = paddle.Model(classification)
model.summary((784))
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Linear-1           [[784]]               [256]             200,960    
   Linear-2           [[256]]                [10]              2,570     
===========================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.78
Estimated Total Size (MB): 0.78
---------------------------------------------------------------------------

{'total_params': 203530, 'trainable_params': 203530}
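The parameter counts in the summary follow directly from the layer shapes: a Linear(in, out) layer holds an in × out weight matrix plus out biases. A small sketch reproducing the numbers; linear_params is a hypothetical helper, and the size estimate assumes 4-byte float32 parameters, Paddle's default dtype.

```python
def linear_params(in_features, out_features, bias=True):
    # in x out weights, plus one bias per output unit
    return in_features * out_features + (out_features if bias else 0)


layers = [(784, 256), (256, 10)]
counts = [linear_params(i, o) for i, o in layers]
print(counts, sum(counts))  # [200960, 2570] 203530

# float32 parameters take 4 bytes each
print(round(sum(counts) * 4 / 1024 ** 2, 2))  # 0.78 (MB)
```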
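model.prepare(optimizer=paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()),
              loss=paddle.nn.CrossEntropyLoss(),
              metrics=paddle.metric.Accuracy())
# Write training logs to ./log so they can be viewed in VisualDL
callback = paddle.callbacks.VisualDL(log_dir='log')
# train_loader and val_loader are DataLoaders, so fit() uses their batch size (10000);
# a batch_size argument passed to fit() would be ignored
model.fit(train_loader,
          val_loader,
          epochs=50,
          verbose=1,
          callbacks=[callback])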
The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/50
step 6/6 [==============================] - loss: 169.0003 - acc: 0.2945 - 650ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 126.2418 - acc: 0.4275 - 267ms/step
Eval samples: 10000
Epoch 2/50
step 6/6 [==============================] - loss: 60.4907 - acc: 0.5221 - 189ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 52.0316 - acc: 0.6389 - 176ms/step
Eval samples: 10000
Epoch 3/50
step 6/6 [==============================] - loss: 59.7795 - acc: 0.6465 - 169ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 62.0711 - acc: 0.6366 - 174ms/step
Eval samples: 10000
Epoch 4/50
step 6/6 [==============================] - loss: 41.2441 - acc: 0.6783 - 175ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 43.7353 - acc: 0.6991 - 179ms/step
Eval samples: 10000
Epoch 5/50
step 6/6 [==============================] - loss: 34.1188 - acc: 0.7276 - 171ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 34.5387 - acc: 0.7427 - 186ms/step
Eval samples: 10000
...
Epoch 46/50
step 6/6 [==============================] - loss: 22.3755 - acc: 0.7249 - 180ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 36.3304 - acc: 0.7274 - 187ms/step
Eval samples: 10000
Epoch 47/50
step 6/6 [==============================] - loss: 18.3063 - acc: 0.7474 - 177ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 26.0110 - acc: 0.7577 - 175ms/step
Eval samples: 10000
Epoch 48/50
step 6/6 [==============================] - loss: 14.9705 - acc: 0.7828 - 184ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 19.3751 - acc: 0.7808 - 199ms/step
Eval samples: 10000
Epoch 49/50
step 6/6 [==============================] - loss: 11.3665 - acc: 0.8097 - 239ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 14.4292 - acc: 0.7874 - 200ms/step
Eval samples: 10000
Epoch 50/50
step 6/6 [==============================] - loss: 9.3327 - acc: 0.8186 - 182ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 1/1 [==============================] - loss: 11.6504 - acc: 0.8024 - 166ms/step
Eval samples: 10000
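The logged cross-entropy values (169.0 in epoch 1, still 9.3 in epoch 50) are far above ln 10 ≈ 2.30, the loss of a uniform guess over the ten classes. One plausible cause is that FashionDataset feeds raw 0-255 pixel values into the network, which can produce very large logits. A common remedy, not applied in this tutorial, is to rescale the pixels to [0, 1] before training. A minimal sketch; normalize is a hypothetical helper.

```python
import math

import numpy as np

# Cross-entropy of a uniform prediction over 10 classes
print(round(math.log(10), 4))  # 2.3026


def normalize(images):
    # Map raw pixel values in [0, 255] to float32 values in [0, 1]
    return np.asarray(images, dtype=np.float32) / 255.0


raw = np.array([[0.0, 127.5, 255.0]])
print(normalize(raw))
```

If applied, for example as train_images = normalize(train_images) and val_images = normalize(val_images) right after load_data, the training curve would change and the log above would no longer apply.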

Model evaluation

model.evaluate(FashionDataset(mode='test'), batch_size=64, verbose=1)
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 157/157 [==============================] - loss: 12.4378 - acc: 0.8024 - 3ms/step          
Eval samples: 10000
{'loss': [12.43781], 'acc': 0.8024}

Compute the accuracy of each class on the validation set

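# Count correct predictions per class on the 10,000 held-out images
Correct_num = {key: 0 for key in class_dict}
predict_out = model.predict(FashionDataset(mode='test'), batch_size=10000)
# predict() returns one list per model output, each holding one array per batch;
# concatenating keeps this correct for any batch_size
logits = np.concatenate(predict_out[0], axis=0)
predict_result = np.argmax(logits, axis=1)
for i in range(len(val_labels)):
    if predict_result[i] == val_labels[i]:
        Correct_num[val_labels[i]] += 1
Correct_rate = {}
for key in Correct_num:
    Correct_rate[class_dict[key]] = Correct_num[key] / np.sum(val_labels == key)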
Predict begin...
step 1/1 [==============================] - 118ms/step
Predict samples: 10000
Correct_rate
{'T-shirt': 0.65,
 'Trouser': 0.952,
 'Pullover': 0.72,
 'Dress': 0.747,
 'Coat': 0.774,
 'Sandal': 0.898,
 'Shirt': 0.58,
 'Sneaker': 0.884,
 'Bag': 0.887,
 'Ankle boot': 0.932}
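Because the Fashion-MNIST test set contains exactly 1,000 images per class, the overall accuracy reported by model.evaluate (0.8024) should equal the unweighted mean of the ten per-class rates. A quick arithmetic check using the values printed above:

```python
# Per-class accuracies copied from the Correct_rate output
correct_rate = {'T-shirt': 0.65, 'Trouser': 0.952, 'Pullover': 0.72,
                'Dress': 0.747, 'Coat': 0.774, 'Sandal': 0.898,
                'Shirt': 0.58, 'Sneaker': 0.884, 'Bag': 0.887,
                'Ankle boot': 0.932}

# With balanced classes, overall accuracy is the plain mean of per-class accuracy
overall = sum(correct_rate.values()) / len(correct_rate)
print(round(overall, 4))  # 0.8024
```

The weakest classes are Shirt (0.58) and T-shirt (0.65), while Trouser (0.952) and Ankle boot (0.932) are the easiest.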

Plot the per-class accuracy as a bar chart for a more intuitive view

fig = plt.gcf()
fig.set_size_inches(10, 5)
# Use a list (not a set) so the bar order is fixed and follows the label ids 0-9
class_name = [class_dict[i] for i in range(10)]
plt.bar(range(10), [Correct_rate.get(name, 0) for name in class_name], align='center')

plt.xticks(range(10), class_name)
plt.xlabel('Class Name')
plt.ylabel('Rate')
plt.title('Correct Rate of Each Class')
plt.show()
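The per-class rate is the diagonal of a confusion matrix divided by its row sums. Building the full matrix additionally shows where the errors go, for example which classes the low-scoring Shirt and T-shirt images are mistaken for. A NumPy-only sketch; confusion_matrix and per_class_accuracy are hypothetical helpers, not part of the tutorial.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=10):
    # cm[t, p] counts samples whose true class is t and predicted class is p
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_accuracy(cm):
    # Diagonal = correct predictions; row sums = number of samples per true class
    return np.diag(cm) / cm.sum(axis=1)


# Toy example with 3 classes and 2 samples per class
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 2, 1, 0, 2, 2])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(per_class_accuracy(cm))
```

With predict_result and val_labels from above, confusion_matrix(val_labels, predict_result) should reproduce the Correct_rate values on its normalized diagonal, while the off-diagonal entries of the Shirt row show which classes those images were assigned to.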


VisualDL visualization

Run visualdl --logdir log to view the training logs.
