Label Propagation Algorithm

Building the similarity matrix

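Build a graph over all of the data. Each node is a data point, covering both labeled and unlabeled data. The edge between node i and node j represents their similarity, and its weight is: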

w_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{\alpha^2}\right)

α is a hyperparameter that controls how quickly the weight decays with distance.
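
As a minimal sketch of this weight, the following computes the full weight matrix with NumPy broadcasting. The function name `rbf_weights` and the value `alpha=1.0` are illustrative assumptions, not part of the original code.

```python
import numpy as np


def rbf_weights(X, alpha):
    """Return W with W[i, j] = exp(-||x_i - x_j||^2 / alpha^2)."""
    X = np.asarray(X, dtype=np.float64)
    # Pairwise differences via broadcasting: shape (n, n, d)
    diff = X[:, None, :] - X[None, :, :]
    # Squared Euclidean distance between every pair of points: shape (n, n)
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / alpha ** 2)


X = np.array([[0, 1], [1, 2], [2, 3]])
W = rbf_weights(X, alpha=1.0)  # alpha chosen arbitrarily for illustration
```

The resulting matrix is symmetric with ones on the diagonal, since every point is at distance 0 from itself.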

Graph construction with kNN

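Keep only the weights to each node's k nearest neighbors and set all others to 0, meaning no edge exists. The resulting similarity matrix is therefore sparse.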

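import numpy as np


# return the indices of the k nearest neighbors
def naive_knn(data_set, query, k):
    """
    :param data_set: the data set, one sample per row
    :param query: the reference point
    :param k: number of nearest neighbors of the reference point to return
    Worked example:
    data_set = np.array([[0, 1], [1, 2], [2, 3]])
    query = data_set[0, :], i.e. [0, 1]
    k = 2
    """
    num_samples = data_set.shape[0]  # number of rows in the matrix
    # in the example, num_samples=3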

    # step 1: calculate Euclidean distance
    diff = np.tile(query, (num_samples, 1)) - data_set
    # in the example:
    # np.tile(query, (num_samples, 1))=[[0 1]
    #  [0 1]
    #  [0 1]]
    # diff=[[0 0]
    #  [-1 -1]
    #  [-2 -2]]

    squared_diff = diff ** 2
    # in the example, squared_diff=[[0 0]
    # [1 1]
    # [4 4]]

    squared_dist = np.sum(squared_diff, axis=1)  # sum is performed by row
    # in the example, squared_dist=[0 2 8]

    # step 2: sort the distance
    sorted_dist_indices = np.argsort(squared_dist)
    # in the example, sorted_dist_indices=[0 1 2]

    if k > len(sorted_dist_indices):
        k = len(sorted_dist_indices)

    return sorted_dist_indices[0:k]
    # in the example, sorted_dist_indices[0:k]=[0 1]

Building the graph

MatX = np.array([[0, 1], [1, 2], [2, 3]])
build_graph(MatX, 'knn', knn_num_neighbors=2)

Inside build_graph, the kNN branch produces the weight matrix:

# build the graph as a normalized weight matrix
def build_graph(MatX, kernel_type, rbf_sigma=None, knn_num_neighbors=None):
    num_samples = MatX.shape[0]
    affinity_matrix = np.zeros((num_samples, num_samples), np.float32)
    if kernel_type == 'rbf':
        if rbf_sigma is None:
            raise ValueError('You should input a sigma of rbf kernel!')
        for i in range(num_samples):
            row_sum = 0.0
            for j in range(num_samples):
                diff = MatX[i, :] - MatX[j, :]
                # RBF weight: exp(-||x_i - x_j||^2 / (2 * sigma^2))
                affinity_matrix[i][j] = np.exp(np.sum(diff**2) / (-2.0 * rbf_sigma**2))
                row_sum += affinity_matrix[i][j]
            # normalize row i so that it sums to 1
            affinity_matrix[i][:] /= row_sum
    elif kernel_type == 'knn':
        if knn_num_neighbors is None:
            raise ValueError('You should input a k of knn kernel!')
        for i in range(num_samples):
            k_neighbors = naive_knn(MatX, MatX[i, :], knn_num_neighbors)
            # each of the k nearest neighbors (including i itself) gets weight 1/k
            affinity_matrix[i][k_neighbors] = 1.0 / knn_num_neighbors
    else:
        raise NameError('Not support kernel type! You can use knn or rbf!')

    return affinity_matrix

The three iterations of the construction proceed as follows:

init affinity_matrix
[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
loop: 0
k_neighbors:[0 1]
affinity_matrix:
[[ 0.5  0.5  0. ]
 [ 0.   0.   0. ]
 [ 0.   0.   0. ]]
loop: 1
k_neighbors:[1 0]
affinity_matrix:
[[ 0.5  0.5  0. ]
 [ 0.5  0.5  0. ]
 [ 0.   0.   0. ]]
loop: 2
k_neighbors:[2 1]
affinity_matrix:
[[ 0.5  0.5  0. ]
 [ 0.5  0.5  0. ]
 [ 0.   0.5  0.5]]
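
The same matrix can be obtained without an explicit Python loop. The sketch below is an alternative, vectorized formulation rather than the original implementation; `knn_affinity` is a hypothetical name, and a stable sort is assumed so that ties between equidistant points are broken by index.

```python
import numpy as np


def knn_affinity(X, k):
    """Return an (n, n) matrix where each row puts weight 1/k on its k nearest points."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    k = min(k, n)  # cannot have more neighbors than samples
    # Pairwise squared Euclidean distances: shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Indices of the k closest points per row; a stable sort keeps index order on ties
    neighbors = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    A = np.zeros((n, n), dtype=np.float32)
    rows = np.repeat(np.arange(n), k)
    A[rows, neighbors.ravel()] = 1.0 / k
    return A


A = knn_affinity(np.array([[0, 1], [1, 2], [2, 3]]), k=2)
# A is [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
```

Each point is its own nearest neighbor (distance 0), so the diagonal is always nonzero, as in the trace above.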

Iterative LPA

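Labels propagate along the edges between nodes. The larger an edge weight, the more similar the two nodes are and the more easily a label propagates across that edge. We define an N×N probability transition matrix P: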

Pij=
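
As a rough sketch of how such a transition matrix might be used, the code below assumes the common row-normalized form P_ij = w_ij / Σ_k w_ik and repeats the propagation step F ← P F, resetting the labeled rows to their known labels after each step. The names `propagate`, `W`, `Y`, and `labeled_mask`, the iteration count, and the toy data are all illustrative assumptions rather than the original implementation.

```python
import numpy as np


def propagate(W, Y, labeled_mask, num_iters=100):
    """Propagate one-hot labels Y over a graph with weight matrix W."""
    W = np.asarray(W, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    labeled_mask = np.asarray(labeled_mask, dtype=bool)
    # Assumed transition matrix: each row of W normalized to sum to 1
    P = W / W.sum(axis=1, keepdims=True)
    F = Y.copy()
    for _ in range(num_iters):
        F = P @ F                          # every node absorbs its neighbors' label mass
        F[labeled_mask] = Y[labeled_mask]  # clamp the known labels
    return F.argmax(axis=1)


# Toy graph: nodes 0 and 1 are strongly connected, as are nodes 2 and 3
W = [[1.0, 0.9, 0.1, 0.0],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9],
     [0.0, 0.1, 0.9, 1.0]]
Y = [[1, 0], [0, 0], [0, 0], [0, 1]]         # node 0 is class 0, node 3 is class 1
labeled_mask = [True, False, False, True]
labels = propagate(W, Y, labeled_mask)       # -> [0, 0, 1, 1]
```

Clamping keeps the labeled nodes acting as fixed sources, so the unlabeled nodes settle on the class of the labeled node they are most strongly connected to.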
