1. Import the required modules
import numpy as np
from sklearn.datasets import fetch_openml
2. MNIST Dataset
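# Request NumPy arrays: newer scikit-learn versions return a pandas DataFrame by default,
# whose own column names would not line up with the feat_cols assigned later.
mnist = fetch_openml("mnist_784", version=1, as_frame=False)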
X = mnist.data / 255.0
y = mnist.target
X.shape, y.shape
Convert the data to a pandas DataFrame
import pandas as pd
feat_cols = ['pixel' + str(i) for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feat_cols)
df['label'] = y
df['label'] = df['label'].apply(lambda i: str(i))
X, y = None, None
print('Size of the dataframe: {}'.format(df.shape))
Because the examples in the DataFrame are sorted by class, we need an index vector in random order to shuffle them.
rndperm = np.random.permutation(df.shape[0])
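As a minimal sketch of how such a permutation vector is used, the example below shuffles a small toy DataFrame (a stand-in for df, not the real MNIST data) and takes a random subset of its rows. The names toy, perm, and subset and the fixed seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy DataFrame sorted by label, standing in for the MNIST frame `df`.
toy = pd.DataFrame({
    "pixel0": np.arange(10) / 10.0,
    "label": ["0"] * 5 + ["1"] * 5,
})

# Fixed seed (an assumption, not in the original) so reruns give the same order.
rng = np.random.default_rng(0)

# Each row index appears exactly once, in random order.
perm = rng.permutation(toy.shape[0])

# The first 4 shuffled indices select a random sample of 4 rows.
subset = toy.loc[perm[:4], :]
print(subset)
```

Indexing with a slice of the permutation, as in perm[:4], gives a random sample without replacement, so no row is picked twice.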
Visualize random images
matshow displays a 2D matrix or array as a color image.
%matplotlib inline
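To show what matshow does with a 2D array, here is a small sketch on a synthetic 28x28 array rather than a real MNIST digit. The array img and its diagonal pattern are invented for illustration; a real digit would come from one DataFrame row reshaped to (28, 28).

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic 28x28 "image": a bright diagonal band on a dark background.
# It stands in for one row of pixel values reshaped to (28, 28).
img = np.zeros((28, 28))
for i in range(28):
    img[i, max(0, i - 1):min(28, i + 2)] = 1.0

fig, ax = plt.subplots(figsize=(3, 3))
# matshow maps each array cell to one colored square; row 0 is drawn at the top.
im = ax.matshow(img, cmap="gray")
ax.set_title("synthetic 28x28 array")
```

With cmap="gray", values near 0 are drawn dark and values near 1 are drawn bright, which matches pixel data scaled to the range 0 to 1.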