I have been studying the principles of KPCA.
For the theory, see this article: https://zhuanlan.zhihu.com/p/59775730
Below is my Python implementation of KPCA.
#!/usr/bin/env python
# coding: utf-8
'''
myKPCA 2019.10.31
Reference:
Zhihua Zhou. Machine Learning[M]. Tsinghua University Press, 2016
Training steps: 1. compute the kernel matrix K; 2. center K to obtain hat_K;
3. compute the largest k eigenvalues and eigenvectors of hat_K
Dataset: HNR
Notes: the linear and poly kernels run at normal speed and give acceptable accuracy.
The rbf kernel is slow because the kernel matrix is built with a for loop and
has not been vectorized.
As with PCA, the reduced dimension can be chosen via the reconstruction error.
'''
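The three training steps listed above can be sketched directly in NumPy. This is a minimal illustration, not the class implementation: the data matrix `X` below is made up, the linear kernel is assumed, and the centering follows the standard formula hat_K = K - 1K - K1 + 1K1 with 1 = ones(m, m)/m:

```python
import numpy as np

# Toy data: 5 samples, 3 features (made up for illustration)
X = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [2., 0., 1.],
              [1., 1., 1.],
              [0., 2., 2.]])
m = X.shape[0]

# Step 1: kernel matrix K (linear kernel assumed here)
K = X @ X.T

# Step 2: center K in feature space: hat_K = K - 1K - K1 + 1K1
one_m = np.ones((m, m)) / m
hat_K = K - one_m @ K - K @ one_m + one_m @ K @ one_m

# Step 3: top-k eigenvalues/eigenvectors of hat_K
# (np.linalg.eigh returns them in ascending order, so reverse)
k = 2
eigvals, eigvecs = np.linalg.eigh(hat_K)
top_vals = eigvals[::-1][:k]
top_vecs = eigvecs[:, ::-1][:, :k]
```

A quick sanity check on the centering: every row of hat_K sums to zero, because the implicit feature-space mean has been subtracted out.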
import numpy as np
# from scipy.spatial.distance import pdist, squareform
'''
Definition of class myKPCA.
Three kernel functions are available; the polynomial and Gaussian kernels take an
extra parameter sigma. The default kernel is 'linear'.
As with PCA, the reduced dimension d' can either be given directly or be
determined from the reconstruction error.
'''
class myKPCA:
    '''
    Initialization of class myKPCA.
    Input:
        kernel: the kernel function, default 'linear'
            'linear' ==> linear kernel; 'poly' ==> polynomial kernel; 'rbf' ==> Gaussian kernel
        n_components: the dimension after dimensionality reduction
            if n_components=0, n_components is set from the reconstruction threshold
        t: reconstruction threshold, default t=0.95
        sigma: parameter of the rbf kernel (bandwidth) and of the polynomial kernel (degree)
    '''
    def __init__(self, kernel='linear', n_components=0, t=0.95, sigma=1):
        self.kernel = kernel
        self.n_components = n_components
        self.t = t
        self.sigma = sigma
        self.w = []  # [x_train num_sample, n_components]
        self.K = []  # [x_train num_sample, x_train num_sample]
        self.X = []  # [x_train num_sample, x_train num_feature]
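When `n_components=0`, the dimension can be chosen the same way as in ordinary PCA: keep the smallest d' whose cumulative eigenvalue ratio reaches the threshold `t`. A minimal sketch of that rule, with made-up eigenvalues of the centered kernel matrix:

```python
import numpy as np

# Made-up eigenvalues of hat_K, sorted in descending order
eigvals = np.array([4.0, 3.0, 0.5, 0.25, 0.25])
t = 0.95  # reconstruction threshold, as in __init__

# Cumulative share of the total eigenvalue mass
ratio = np.cumsum(eigvals) / np.sum(eigvals)
# Smallest d' with cumulative ratio >= t
n_components = int(np.searchsorted(ratio, t) + 1)
print(n_components)  # 4
```

Here the first three eigenvalues explain 93.75% of the mass, so four components are needed to reach the 95% threshold.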
    '''
    Definitions of the rbf, linear and poly kernel functions.
    Input:
        X1: numpy.ndarray, size: [X1_num_sample, num_feature]
        X2: numpy.ndarray, size: [X2_num_sample, num_feature]
    Returns:
        k: numpy.ndarray, size: [X2_num_sample, X1_num_sample]
    '''
    def rbf(self, X1, X2):
        m = X1.shape[0]
        p = X2.shape[0]
        '''
        Computing the distances with pdist runs faster than the for loop
        below, but ...
        k = pdist(X1, 'sqeuclidean')
        k = squareform(k)
        k = np.exp( -k/(2*self.sigma**2) )
        '''
        k = np.ones([p, m])
        for i in range(p):
            # Gaussian kernel as in the textbook: exp(-||x - z||^2 / (2*sigma^2))
            sq_dist = np.sum(np.square(X1 - X2[i, :]), axis=1)
            k[i, :] = np.exp(-sq_dist / (2 * self.sigma ** 2))
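The for loop above is why the rbf kernel is slow. The kernel matrix can instead be built in one shot with broadcasting, using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. A standalone sketch (independent of the class; `sigma=1` is assumed as the default):

```python
import numpy as np

def rbf_vectorized(X1, X2, sigma=1.0):
    """Gaussian kernel matrix of shape [X2_num_sample, X1_num_sample],
    matching the loop version: k[i, j] = exp(-||X1[j] - X2[i]||^2 / (2*sigma^2))."""
    sq1 = np.sum(np.square(X1), axis=1)  # ||X1_j||^2, shape [m]
    sq2 = np.sum(np.square(X2), axis=1)  # ||X2_i||^2, shape [p]
    # Broadcasting gives the [p, m] matrix of squared distances
    sq_dist = sq2[:, None] + sq1[None, :] - 2.0 * (X2 @ X1.T)
    sq_dist = np.maximum(sq_dist, 0.0)   # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```

The `X2 @ X1.T` product replaces the p separate row computations, so the whole matrix is built with BLAS-backed operations instead of a Python-level loop.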