PCA Outlier Detection
From the pyod documentation
Principal component analysis (PCA) can be used to detect outliers. PCA is a linear dimensionality reduction method that uses the Singular Value Decomposition of the data to project it into a lower-dimensional space.
In this procedure, the covariance matrix of the data is decomposed into orthogonal vectors, called eigenvectors, each associated with an eigenvalue. The eigenvectors with high eigenvalues capture most of the variance in the data.
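As a quick illustration of that decomposition (a numpy sketch on synthetic data, not part of the pyod docs), the eigenvalues of the sample covariance matrix tell you how much variance each eigenvector direction captures:

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 samples in 3-D: most variance lies along the first axis
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

cov = np.cov(X, rowvar=False)              # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# fraction of total variance captured by each eigenvector
explained = eigvals / eigvals.sum()
print(explained)   # the leading direction dominates
```

Here the first eigenvector alone accounts for the bulk of the variance, which is exactly why a low-dimensional hyperplane built from the top-k eigenvectors summarizes the data well.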
Therefore, a low-dimensional hyperplane constructed from k eigenvectors can capture most of the variance in the data. Outliers, however, deviate from normal data points, and this deviation is most visible along the eigenvectors with small eigenvalues.
Outlier scores can therefore be obtained as the sum of the projected distances of a sample onto all eigenvectors. See [BSCSC03, BAgg15] for details.
Score(X) = sum of the weighted Euclidean distances from each sample to the hyperplane constructed from the selected eigenvectors
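A minimal numpy sketch of this idea, using one common weighting (squared projection onto each eigenvector divided by its eigenvalue, as in Shyu et al. [BSCSC03]); this is an illustration of the scoring principle, not necessarily pyod's exact internals:

```python
import numpy as np

def pca_outlier_scores(X):
    """Sum over all eigenvectors of the eigenvalue-weighted squared
    projection distance; larger score = more anomalous."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    proj = Xc @ eigvecs                    # coordinates along each eigenvector
    # dividing by the eigenvalue up-weights deviations along the
    # small-eigenvalue directions, where outliers stand out most
    return (proj ** 2 / eigvals).sum(axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) * np.array([3.0, 0.1])   # data near a line
X_all = np.vstack([X, [[0.0, 2.0]]])                   # one point off the hyperplane
scores = pca_outlier_scores(X_all)
print(scores[-1] > scores[:-1].max())
```

The appended point sits far from the main hyperplane along the low-variance direction, so its score dwarfs those of the normal samples.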
Practice on the breast cancer dataset
import numpy as np
from pyod.models import pca

# train_data is a DataFrame whose last column holds the labels ('n' = normal)
data = train_data.values
X = data[:, :-1].astype(float)   # feature columns
y = data[:, -1]                  # label column

n_samples = X.shape[0]
split = int(n_samples * 0.8)
train_set, test_set = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

my_pca = pca.PCA()
my_pca.fit(train_set)             # unsupervised: labels are not used in fitting
y_pre = my_pca.predict(test_set)  # 0 = inlier, 1 = outlier

def trans(c):
    # map the raw label to pyod's convention: 'n' (normal) -> 0, else -> 1
    return 0 if c == 'n' else 1

y_ = np.array(list(map(trans, y_test)))
print('Prediction accuracy: %.2f%%' % ((y_ == y_pre).sum() / y_pre.shape[0] * 100))
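Raw accuracy can be misleading when outliers are rare, so it is often worth also checking a ranking metric such as ROC AUC on the detector's continuous scores. A hedged sketch with sklearn; the label and score arrays below are made up for illustration, not taken from the run above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# illustrative ground truth and detector outputs
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])               # binary labels, e.g. predict()
scores = np.array([0.1, 0.2, 0.15, 0.6, 0.9, 0.4])  # continuous outlier scores

print('accuracy: %.2f' % accuracy_score(y_true, y_pred))
print('ROC AUC : %.2f' % roc_auc_score(y_true, scores))
```

Here the detector mislabels two points (accuracy 0.67) yet ranks outliers above most inliers (AUC 0.88), which accuracy alone would not reveal.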