Building a multi-label dataset from MNIST

HackerTom

Category: Machine Learning. Tags: MNIST, multi-label, tensorflow, numpy, matplotlib

Article link: https://blog.csdn.net/HackerTom/article/details/106613878

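MNIST is a single-label (multi-class) dataset in which every image is $28\times28$. Tiling 4 images into a 2x2 grid gives one $56\times56$ image, and taking the union of the 4 corresponding labels gives a simple multi-label dataset. Example:
[Figure: multi-MNIST example]
Data built this way effectively assumes that the classes are independent of each other, whereas in real multi-label data the classes are usually correlated.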
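To make the independence remark a bit more concrete, here is a rough sketch (not part of the original post) that estimates how many distinct labels a tiled image carries when its 4 digits are drawn independently. It assumes uniformly distributed classes, which MNIST satisfies only approximately, and the function names are made up for illustration. Under that assumption the expected count is $C\,(1-(1-1/C)^k)$ for $C$ classes and $k$ tiles.

```python
import numpy as np


def expected_distinct_labels(num_classes=10, tiles=4):
    """Closed form: expected number of distinct classes among
    `tiles` independent, uniform draws from `num_classes` classes."""
    return num_classes * (1.0 - (1.0 - 1.0 / num_classes) ** tiles)


def simulate_distinct_labels(num_classes=10, tiles=4, trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity (assumption: uniform classes)."""
    rng = np.random.default_rng(seed)
    draws = rng.integers(0, num_classes, size=(trials, tiles))
    # Multi-hot label per simulated image: mark every class that was drawn.
    multi_hot = np.zeros((trials, num_classes), dtype=bool)
    multi_hot[np.arange(trials)[:, None], draws] = True
    return multi_hot.sum(axis=1).mean()


print(expected_distinct_labels())   # roughly 3.44 for 10 classes and 4 tiles
print(simulate_distinct_labels())   # should land close to the closed form
```

Any label co-occurrence in such a dataset is pure chance, so it cannot exercise methods that rely on label correlations.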

Code

Read the MNIST data with the loader bundled with TensorFlow.
TensorFlow version: 1.12
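The `tensorflow.examples.tutorials.mnist` module used below was removed in TensorFlow 2.x. If it is not available, something like the following sketch may help bring raw MNIST arrays into the layout the rest of this post works with (flattened float images in [0, 1] and one-hot labels). `to_loader_format` is a hypothetical helper, and the commented-out `tf.keras.datasets.mnist.load_data()` call is just one possible data source, not what the original post uses. A validation split would have to be carved out by hand.

```python
import numpy as np


def to_loader_format(images_uint8, labels_int, num_classes=10):
    """Convert raw MNIST-style arrays to the layout assumed in this post.

    images_uint8: [N, 28, 28] uint8 pixels in 0..255
    labels_int:   [N] integer class ids
    returns:      ([N, 784] float32 in [0, 1], [N, num_classes] one-hot)
    """
    x = images_uint8.reshape(len(images_uint8), -1).astype(np.float32) / 255.0
    y = np.eye(num_classes)[labels_int]  # row i of the identity = one-hot of class i
    return x, y


# Possible data source (assumption, needs TensorFlow 2.x and a download):
# import tensorflow as tf
# (raw_x, raw_y), (raw_x_test, raw_y_test) = tf.keras.datasets.mnist.load_data()

# Synthetic stand-in so the sketch runs without TensorFlow.
raw_x = np.random.default_rng(0).integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
raw_y = np.array([0, 1, 2, 3, 4, 5, 6, 7])
x, y = to_loader_format(raw_x, raw_y)
print(x.shape, y.shape)  # (8, 784) (8, 10)
```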

import os
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
#%matplotlib inline


MNIST_P = "E:/iTom/dataset/MNIST"
mnist = input_data.read_data_sets(MNIST_P, one_hot=True)
print(mnist.train.num_examples,  # 55000
      mnist.validation.num_examples,  # 5000
      mnist.test.num_examples)  # 10000


x_train = mnist.train.images
x_val = mnist.validation.images
x_test = mnist.test.images
print(x_train.shape, x_val.shape, x_test.shape)

x_train = x_train.reshape(-1, 28, 28)
x_val = x_val.reshape(-1, 28, 28)
x_test = x_test.reshape(-1, 28, 28)
print(x_train.shape, x_val.shape, x_test.shape)
print(np.max(x_train), x_train.min())

y_train = mnist.train.labels
y_val = mnist.validation.labels
y_test = mnist.test.labels
print(y_train.shape, y_val.shape, y_test.shape)


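def integrate(x4, y4):
    """Tile 4 images into 1 image (2x2 grid)
    x4: [4, 28, 28] images
    y4: [4, 10] one-hot labels
    """
    up = np.concatenate([x4[0], x4[1]], 1)  # top row: images 0 and 1 side by side
    down = np.concatenate([x4[2], x4[3]], 1)  # bottom row: images 2 and 3
    x4 = np.concatenate([up, down], 0)  # stack the two rows -> [56, 56]
    # plt.imshow(x4, cmap="Greys")
    # plt.show()

    # union of the 4 one-hot labels: a class is 1 if any tile shows it
    y4 = (y4.sum(0) > 0).astype(y4.dtype)
    # print(y4)
    return x4, y4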


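def make(images, labels):
    """Apply `integrate` to a whole dataset, 4 consecutive samples at a time"""
    X, Y = [], []
    # assumes the sample count is a multiple of 4 (true for 55000 / 5000 / 10000)
    for i in range(0, labels.shape[0], 4):
        img = images[i: i + 4]
        lab = labels[i: i + 4]
        _x, _y = integrate(img, lab)
        X.append(_x[np.newaxis, :])
        Y.append(_y[np.newaxis, :])
    X = np.vstack(X)
    Y = np.vstack(Y)
    return X, Y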


# Save
X_test, Y_test = make(x_test, y_test)
print(X_test.shape, Y_test.shape)
np.save(os.path.join(MNIST_P, "x_test.npy"), X_test)
np.save(os.path.join(MNIST_P, "y_test.npy"), Y_test)

X_train, Y_train = make(x_train, y_train)
print(X_train.shape, Y_train.shape)
np.save(os.path.join(MNIST_P, "x_train.npy"), X_train)
np.save(os.path.join(MNIST_P, "y_train.npy"), Y_train)

X_val, Y_val = make(x_val, y_val)
print(X_val.shape, Y_val.shape)
np.save(os.path.join(MNIST_P, "x_val.npy"), X_val)
np.save(os.path.join(MNIST_P, "y_val.npy"), Y_val)
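For larger datasets the Python loop in `make` can likely be replaced by a single reshape and transpose. The sketch below is one possible vectorized variant, not the author's code; `make_vectorized` and `make_loop` are illustrative names. Unlike the original, it drops trailing samples that do not fill a group of 4, which is an added assumption. The loop version is included only to check that both produce the same arrays.

```python
import numpy as np


def make_vectorized(images, labels):
    """Tile every 4 consecutive images into a 2x2 grid without a Python loop.

    images: [N, H, W], labels: [N, C] one-hot.
    Trailing samples that do not fill a group of 4 are dropped (assumption).
    """
    n = (images.shape[0] // 4) * 4
    h, w = images.shape[1:]
    # axes after reshape: (group, grid_row, grid_col, pixel_row, pixel_col)
    x = images[:n].reshape(-1, 2, 2, h, w)
    # move pixel_row next to grid_row, then merge to (group, 2H, 2W)
    x = x.transpose(0, 1, 3, 2, 4).reshape(-1, 2 * h, 2 * w)
    # union of the 4 one-hot labels in each group
    y = labels[:n].reshape(-1, 4, labels.shape[1])
    y = (y.sum(1) > 0).astype(labels.dtype)
    return x, y


def make_loop(images, labels):
    """Reference loop that mirrors the tiling used in the post."""
    n = (images.shape[0] // 4) * 4
    X, Y = [], []
    for i in range(0, n, 4):
        up = np.concatenate([images[i], images[i + 1]], 1)
        down = np.concatenate([images[i + 2], images[i + 3]], 1)
        X.append(np.concatenate([up, down], 0))
        Y.append((labels[i: i + 4].sum(0) > 0).astype(labels.dtype))
    return np.stack(X), np.stack(Y)


# Synthetic stand-in data: 10 random "images" with random one-hot labels.
rng = np.random.default_rng(0)
demo_x = rng.random((10, 28, 28))
demo_y = np.eye(10)[rng.integers(0, 10, size=10)]
vx, vy = make_vectorized(demo_x, demo_y)
lx, ly = make_loop(demo_x, demo_y)
print(vx.shape, vy.shape)  # (2, 56, 56) (2, 10)
print(np.array_equal(vx, lx), np.array_equal(vy, ly))  # True True
```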
