将NMF应用于之前用过的Wild数据集中的Labeled Faces。NMF的主要参数是我们想要提取的分量个数。通常来说,这个数字要小于输入特征的个数(否则的话,将每个像素作为单独的分量就可以对数据进行解释)。
首先,观察分类个数如何影响NMF重建数据的好坏:
import mglearn.plots
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
people=fetch_lfw_people(data_home = "C:\\Users\\86185\\Downloads\\",min_faces_per_person=20,resize=.7)
image_shape=people.images[0].shape
mask=np.zeros(people.target.shape,dtype=np.bool_)
for target in np.unique(people.target):
mask[np.where(people.target==target)[0][:50]]=1
X_people=people.data[mask]
y_people=people.target[mask]
X_train,X_test,y_train,y_test=train_test_split(X_people,y_people,stratify=y_people,random_state=0)
mglearn.plots.plot_nmf_faces(X_train,X_test,image_shape)
plt.show()