影响力最大化——CELF算法的简介与python实现

        CELF算法是Leskovecl等人利用IC模型的子模特性对爬山贪心算法进一步改进得到的优化算法。子模函数的定义为:任意函数f(·)将有限集合映射为非负实数集并且满足收益递减特性即为子模函数。设集合s ∈T,任意元素v添加到集合S中获得的边际效益大于等于添加到集合T中所获得的边际效益。

        Kempe已经对独立级联模型和线性阈值模型的影响期望值函数加以证明,得出其满足子模特性。由子模特性,把一个节点v添加到结合S时,如果集合S越小,v节点的边际影响期望值就越大。因此,对于那些在前一轮中计算的边际影响期望值小于某些点在当前轮中所计算的结果,那么这些节点就不需要重新计算边际影响期望值,根据子模特性,可以省去大量节点的重复计算,从而在时间复杂度方面有了很大的提高,效率提升了约为爬山算法的700倍而精度上基本保持一致。

        CELF的原理是,假设已知A,B,C,D 四个节点的边际影响力范围增量分别是d(A)、d(B)、d(C)、d(D),且 d(A)> d(B)> d(C)> d(D),那么根据贪心策略,会将A加入到种子集合S当中之后,然后在下一轮继续计算B,C,D的边际影响力时,如果B此时的边际影响力d(B)>d(C),那么依据子模性,d(B)必定是这一轮所有节点的边际影响力当中最大的,因为C、D等其他节点在这一轮的边际增量必定小于它们各自在上一轮的增量,即d(C)<=d(C)<d(B),d(D)<=d(D)<d'(B),因此可直接把B加入到S当中去,这样就省去了计算C,D,E...等等节点边际影响力的步骤,大大提高了效率。

import matplotlib.pyplot as plt
from random import uniform, seed
import numpy as np
import time
import networkx as nx
from igraph import *


def IC(g, S, p=0.5, mc=1000):
 
    spread = []
    for i in range(mc):

        # Simulate propagation process
        new_active, A = S[:], S[:]
        while new_active:

            # For each newly active node, find its neighbors that become activated
            new_ones = []
            for node in new_active:
                # Determine neighbors that become infected
                np.random.seed(i)
                success = np.random.uniform(0, 1, len(g.neighbors(node, mode="out"))) < p
                new_ones += list(np.extract(success, g.neighbors(node, mode="out")))

            new_active = list(set(new_ones) - set(A))

            # Add newly activated nodes to the set of activated nodes
            A += new_active

        spread.append(len(A))

    return (np.mean(spread))



def celf(g, k, p=0.1, mc=1000):
   
    start_time = time.time()
    marg_gain = [IC(g, [node], p, mc) for node in range(g.vcount())]

    # Create the sorted list of nodes and their marginal gain
    Q = sorted(zip(range(g.vcount()), marg_gain), key=lambda x: x[1], reverse=True)

    # Select the first node and remove from candidate list
    S, spread, SPREAD = [Q[0][0]], Q[0][1], [Q[0][1]]
    Q, LOOKUPS, timelapse = Q[1:], [g.vcount()], [time.time() - start_time]

    for _ in range(k - 1):

        check, node_lookup = False, 0

        while not check:
            # Count the number of times the spread is computed
            node_lookup += 1

            # Recalculate spread of top node
            current = Q[0][0]

            # Evaluate the spread function and store the marginal gain in the list
            Q[0] = (current, IC(g, S + [current], p, mc) - spread)

            # Re-sort the list
            Q = sorted(Q, key=lambda x: x[1], reverse=True)

            # Check if previous top node stayed on top after the sort
            check = (Q[0][0] == current)

        # Select the next node
        spread += Q[0][1]
        S.append(Q[0][0])
        SPREAD.append(spread)
        LOOKUPS.append(node_lookup)
        timelapse.append(time.time() - start_time)

        # Remove the selected node from the list
        Q = Q[1:]

    return (S, SPREAD, timelapse, LOOKUPS)

source = []
target = []
network_file = 'E:/DPSO_demo-master/CA-HepPh.txt'
file = open(network_file, "r", encoding="utf-8")
Vnumber, Enumber = file.readline().split()
Enumber = int(Enumber)
for i in range(0, Enumber):
    start, end = file.readline().split()
    source.append(int(start))
    target.append(int(end))

print('之前', source)
print('之前', target)
A = set(source + target)
g = Graph(directed=True)
g.add_vertices(range(100000))
g.add_edges(list(zip(source,target)))

start_time = time.time()

# Run algorithms
celf_output   = celf(g,50,p = 0.01,mc = 1000)
end_time = time.time()
runningtime1 = end_time - start_time
print("总时间:", runningtime1)
# Print results
print("celf output:   " + str(celf_output[0]))

CELF介绍与图的引用:胡怀雄. 基于独立级联模型的社交网络影响力最大化研究[D]. 深圳大学, 2018. 

  • 6
    点赞
  • 25
    收藏
    觉得还不错? 一键收藏
  • 12
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 12
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值