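Building the similarity matrix
Build a single graph over all the data. Each node is one data point, and the nodes cover both labeled and unlabeled data. The edge between node i and node j represents their similarity, and its weight is: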
$$w_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\alpha^2}\right)$$
Here α is a hyperparameter.
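As a rough illustration of the weight formula above, the following sketch computes the dense weight matrix with NumPy broadcasting. The function name `rbf_weights` and the choice `alpha=1.0` are assumptions made for this example only. They do not come from the original code.

```python
import numpy as np

def rbf_weights(X, alpha):
    """Dense weights w_ij = exp(-||x_i - x_j||^2 / alpha^2) (illustrative helper)."""
    # Pairwise differences via broadcasting: shape (n, n, d)
    diff = X[:, None, :] - X[None, :, :]
    # Squared Euclidean distance between every pair of points: shape (n, n)
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / alpha ** 2)

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
W = rbf_weights(X, alpha=1.0)  # alpha=1.0 is an arbitrary example value
print(W)
```

The resulting matrix is symmetric with ones on the diagonal, because every point is at distance 0 from itself. A smaller α makes the weights fall off faster with distance.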
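Graph construction method: kNN
Keep only the weights to each node's k nearest neighbors and set all the others to 0, meaning no edge exists. The resulting similarity matrix is therefore sparse.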
import numpy as np

# return k neighbors index
def naive_knn(data_set, query, k):
    """
    Return the indices of the k nearest neighbors of query in data_set.

    :param data_set: the data set, one sample per row
    :param query: the reference point
    :param k: number of nearest neighbors of the reference point to return

    Worked example used in the comments below:
        data_set = np.array([[0, 1], [1, 2], [2, 3]])
        query = data_set[0, :], i.e. [0, 1]
        k = 2
    """
    num_samples = data_set.shape[0]  # matrix's row size
    # worked example: num_samples = 3

    # step 1: calculate Euclidean distance
    diff = np.tile(query, (num_samples, 1)) - data_set
    # worked example:
    # np.tile(query, (num_samples, 1)) = [[0 1]
    #                                     [0 1]
    #                                     [0 1]]
    # diff = [[ 0  0]
    #         [-1 -1]
    #         [-2 -2]]
    squared_diff = diff ** 2
    # worked example: squared_diff = [[0 0]
    #                                 [1 1]
    #                                 [4 4]]
    squared_dist = np.sum(squared_diff, axis=1)  # sum is performed by row
    # worked example: squared_dist = [0 2 8]

    # step 2: sort the distance
    sorted_dist_indices = np.argsort(squared_dist)
    # worked example: sorted_dist_indices = [0 1 2]
    if k > len(sorted_dist_indices):
        k = len(sorted_dist_indices)
    # worked example: sorted_dist_indices[0:k] = [0 1]
    return sorted_dist_indices[0:k]
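The function above sorts all distances and then keeps the first k. For larger data sets, one possible alternative is a partial sort with `np.argpartition`, sketched below. The name `knn_indices` is hypothetical, and this is a swapped-in variant rather than the original implementation.

```python
import numpy as np

def knn_indices(data_set, query, k):
    """Indices of the k rows of data_set closest to query (illustrative variant)."""
    k = min(k, data_set.shape[0])  # cannot return more neighbors than samples
    # Broadcasting subtracts query from every row, so no np.tile is needed
    sq_dist = np.sum((data_set - query) ** 2, axis=1)
    # argpartition moves the k smallest distances to the front, in arbitrary order
    nearest = np.argpartition(sq_dist, k - 1)[:k]
    # Order just those k candidates by distance
    return nearest[np.argsort(sq_dist[nearest])]

data_set = np.array([[0, 1], [1, 2], [2, 3]])
neighbors = knn_indices(data_set, data_set[0, :], 2)
print(neighbors)  # [0 1]
```

As in the original function, the query point itself is returned as its own nearest neighbor when it is part of the data set, since its distance is 0.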
Building the graph
MatX = np.array([[0, 1], [1, 2], [2, 3]])
build_graph(MatX, 'knn', knn_num_neighbors=2)
Inside build_graph, the kNN branch produces the weight matrix:
# build a big graph (normalized weight matrix)
def build_graph(MatX, kernel_type, rbf_sigma=None, knn_num_neighbors=None):
    num_samples = MatX.shape[0]
    affinity_matrix = np.zeros((num_samples, num_samples), np.float32)
    if kernel_type == 'rbf':
        if rbf_sigma is None:
            raise ValueError('You should input a sigma of rbf kernel!')
        for i in range(num_samples):
            row_sum = 0.0
            for j in range(num_samples):
                diff = MatX[i, :] - MatX[j, :]
                affinity_matrix[i][j] = np.exp(np.sum(diff**2) / (-2.0 * rbf_sigma**2))
                row_sum += affinity_matrix[i][j]
            # normalize row i so that its weights sum to 1
            affinity_matrix[i][:] /= row_sum
    elif kernel_type == 'knn':
        if knn_num_neighbors is None:
            raise ValueError('You should input a k of knn kernel!')
        for i in range(num_samples):
            k_neighbors = naive_knn(MatX, MatX[i, :], knn_num_neighbors)
            # each of the k neighbors gets an equal share of the row's weight
            affinity_matrix[i][k_neighbors] = 1.0 / knn_num_neighbors
    else:
        raise NameError('Unsupported kernel type! You can use knn or rbf!')
    return affinity_matrix
The three loop iterations of the construction are traced below:
init affinity_matrix
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
loop: 0
k_neighbors:[0 1]
affinity_matrix:
[[ 0.5 0.5 0. ]
[ 0. 0. 0. ]
[ 0. 0. 0. ]]
loop: 1
k_neighbors:[1 0]
affinity_matrix:
[[ 0.5 0.5 0. ]
[ 0.5 0.5 0. ]
[ 0. 0. 0. ]]
loop: 2
k_neighbors:[2 1]
affinity_matrix:
[[ 0.5 0.5 0. ]
[ 0.5 0.5 0. ]
[ 0. 0.5 0.5]]
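The trace above can be reproduced with a compact, self-contained sketch. The helper name `knn_affinity` is an assumption for this example, and a stable sort is requested explicitly so that ties in distance are broken by index, which matches the neighbor order shown in the trace.

```python
import numpy as np

def knn_affinity(X, k):
    """Row-normalized kNN affinity matrix (illustrative re-implementation)."""
    n = X.shape[0]
    W = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        sq_dist = np.sum((X - X[i]) ** 2, axis=1)
        # stable sort: equal distances keep their index order
        neighbors = np.argsort(sq_dist, kind='stable')[:k]
        W[i, neighbors] = 1.0 / k
    return W

X = np.array([[0, 1], [1, 2], [2, 3]])
W = knn_affinity(X, k=2)
print(W)
print(W.sum(axis=1))  # every row sums to 1
```

Each row sums to 1 because exactly k entries are set to 1/k. Note that the matrix is not symmetric here: row 2 has weight 0.5 on node 1, while row 1 has weight 0 on node 2, so the kNN graph built this way is directed.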
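LPA: iterative version
Labels propagate along the edges between nodes. The larger an edge's weight, the more similar its two nodes are, and the more easily a label propagates across it. We define an N×N probability transition matrix P: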
Pij=