Linear Discriminant Analysis (LDA) is a supervised feature-extraction technique that can increase computational efficiency and reduce the degree of overfitting caused by the curse of dimensionality in non-regularized models.
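All of the snippets below assume the standardized Wine training and test data (X_train_std, y_train, X_test_std, y_test) prepared earlier in the book. A minimal setup sketch; the stratify option is my addition, the rest follows the book's earlier chapter:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Wine dataset: 178 samples, 13 features, class labels 1-3 in the first column
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/'
                      'machine-learning-databases/wine/wine.data',
                      header=None)
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# standardize the features
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)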
1.1 Computing the scatter matrices
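For reference, in the book's notation, with c classes, class mean vectors m_i, overall mean vector m, n_i samples in class i, and D_i the set of samples of class i, the two scatter matrices computed in this section are

S_W = \sum_{i=1}^{c} S_i, \qquad S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T

S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T

Section 2 then solves the eigenvalue problem of the matrix S_W^{-1} S_B.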
import numpy as np

np.set_printoptions(precision=4)

mean_vecs = []
for label in range(1, 4):
    mean_vecs.append(np.mean(X_train_std[y_train == label], axis=0))
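To inspect the three 13-dimensional mean vectors (one per class), they can simply be printed:

for label in range(1, 4):
    print('MV %s: %s\n' % (label, mean_vecs[label - 1]))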
d = 13  # number of features
S_W = np.zeros((d, d))
for label, mv in zip(range(1, 4), mean_vecs):
    class_scatter = np.zeros((d, d))  # scatter matrix for each class
    for row in X_train_std[y_train == label]:
        row, mv = row.reshape(d, 1), mv.reshape(d, 1)  # make column vectors
        class_scatter += (row - mv).dot((row - mv).T)
    S_W += class_scatter  # sum class scatter matrices
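Summing the raw scatter matrices like this implicitly assumes the class labels are uniformly distributed in the training set. A quick check of the class counts (labels 1-3, as above):

print('Class label distribution: %s' % np.bincount(y_train)[1:])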
1.2 The covariance matrix is a normalized version of the scatter matrix

Because the class labels are not uniformly distributed in the training set, it is better to scale each class scatter matrix by its number of class samples before summing, which is the same as computing the class covariance matrices:

d = 13  # number of features
S_W = np.zeros((d, d))
for label, mv in zip(range(1, 4), mean_vecs):
    class_scatter = np.cov(X_train_std[y_train == label].T)  # normalized scatter
    S_W += class_scatter
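As a quick sanity check of the claim above, the covariance matrix of one class equals that class's scatter matrix divided by n_i - 1 (variable names here are illustrative):

X_c = X_train_std[y_train == 1]      # samples of class 1
mv = X_c.mean(axis=0).reshape(d, 1)  # class mean as column vector
scatter = np.zeros((d, d))
for row in X_c:
    r = row.reshape(d, 1)
    scatter += (r - mv).dot((r - mv).T)
print(np.allclose(scatter / (X_c.shape[0] - 1), np.cov(X_c.T)))  # expect True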
Next, compute the between-class scatter matrix S_B:

mean_overall = np.mean(X_train_std, axis=0)
d = 13  # number of features
mean_overall = mean_overall.reshape(d, 1)  # make column vector
S_B = np.zeros((d, d))
for i, mean_vec in enumerate(mean_vecs):
    n = X_train[y_train == i + 1, :].shape[0]  # number of samples in class i+1
    mean_vec = mean_vec.reshape(d, 1)  # make column vector
    S_B += n * (mean_vec - mean_overall).dot((mean_vec - mean_overall).T)
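Both matrices should come out as d x d (13 x 13); printing the dimensions is a cheap sanity check:

print('Within-class scatter matrix: %sx%s' % (S_W.shape[0], S_W.shape[1]))
print('Between-class scatter matrix: %sx%s' % (S_B.shape[0], S_B.shape[1]))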
2.1 Selecting linear discriminants for the new feature subspace
eigen_vals, eigen_vecs = np.linalg.eig(np.linalg.inv(S_W).dot(S_B))

# Make a list of (eigenvalue, eigenvector) tuples
eigen_pairs = [(np.abs(eigen_vals[i]), eigen_vecs[:, i])
               for i in range(len(eigen_vals))]

# Sort the (eigenvalue, eigenvector) tuples from high to low
eigen_pairs = sorted(eigen_pairs, key=lambda k: k[0], reverse=True)
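In LDA, the number of linear discriminants is at most c - 1, where c is the number of class labels, because S_B is the sum of c rank-one matrices. With the three Wine classes we therefore expect only two substantially nonzero eigenvalues; printing them in descending order makes this visible:

print('Eigenvalues in descending order:\n')
for eigen_val in eigen_pairs:
    print(eigen_val[0])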
2.2 Measuring the class-discriminatory information ("discriminability")

To quantify how much class-discriminatory information each linear discriminant captures, compute each eigenvalue's share of the total, by analogy with the explained-variance ratio in PCA:

tot = sum(eigen_vals.real)
discr = [(i / tot) for i in sorted(eigen_vals.real, reverse=True)]
cum_discr = np.cumsum(discr)
2.3 Plotting the "discriminability"

Since at most c - 1 eigenvalues are nonzero, the first two linear discriminants should capture virtually all of the useful information in the Wine training data:
import matplotlib.pyplot as plt

plt.bar(range(1, 14), discr, alpha=0.5, align='center',
        label='individual "discriminability"')
plt.step(range(1, 14), cum_discr, where='mid',
         label='cumulative "discriminability"')
plt.ylabel('"discriminability" ratio')
plt.xlabel('Linear Discriminants')
plt.ylim([-0.1, 1.1])
plt.legend(loc='best')
plt.show()
3. Projecting samples onto the new feature space
# Stack the two most discriminative eigenvectors into the
# 13x2 transformation matrix W
w = np.hstack((eigen_pairs[0][1][:, np.newaxis].real,
               eigen_pairs[1][1][:, np.newaxis].real))
X_train_lda = X_train_std.dot(w)

colors = ['r', 'b', 'g']
markers = ['s', 'x', 'o']
for l, c, m in zip(np.unique(y_train), colors, markers):
    # multiplying by -1 only mirrors the plot; the sign of an
    # eigenvector is arbitrary
    plt.scatter(X_train_lda[y_train == l, 0] * (-1),
                X_train_lda[y_train == l, 1] * (-1),
                c=c, label=l, marker=m)
plt.xlabel('LD 1')
plt.ylabel('LD 2')
plt.legend(loc='lower right')
plt.show()
4. LDA via scikit-learn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression

lda = LDA(n_components=2)
X_train_lda = lda.fit_transform(X_train_std, y_train)

lr = LogisticRegression()
lr = lr.fit(X_train_lda, y_train)

# plot_decision_regions is the helper from earlier in the book
# (a sketch follows below)
plot_decision_regions(X_train_lda, y_train, classifier=lr)
plt.xlabel('LD 1')
plt.ylabel('LD 2')
plt.legend(loc='lower left')
# plt.tight_layout()
# plt.savefig('./images/lda3.png', dpi=300)
plt.show()
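plot_decision_regions is not part of scikit-learn; it is the small plotting helper defined earlier in the book. A minimal sketch of such a helper for two-dimensional inputs, relying on the numpy and matplotlib imports above (the exact signature here is an assumption):

from matplotlib.colors import ListedColormap

def plot_decision_regions(X, y, classifier, resolution=0.02):
    # set up marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])
    # evaluate the classifier on a grid spanning the two features
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())
    # overlay the samples, one marker style per class
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, color=colors[idx],
                    marker=markers[idx], label=cl)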
4.1 By lowering the regularization strength, we could probably shift the decision boundaries so that the logistic regression model classifies all samples in the training dataset correctly. Let's also look at how it performs on the test set:
X_test_lda = lda.transform(X_test_std)

plot_decision_regions(X_test_lda, y_test, classifier=lr)
plt.xlabel('LD 1')
plt.ylabel('LD 2')
plt.legend(loc='lower left')
plt.show()
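To complement the visual inspection with a number, the fitted model can be scored on the projected test data (an added check, not in the original):

print('Test accuracy: %.3f' % lr.score(X_test_lda, y_test))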
Reference: Python Machine Learning, Sebastian Raschka