CELF算法是Leskovecl等人利用IC模型的子模特性对爬山贪心算法进一步改进得到的优化算法。子模函数的定义为:任意函数f(·)将有限集合映射为非负实数集并且满足收益递减特性即为子模函数。设集合s ∈T,任意元素v添加到集合S中获得的边际效益大于等于添加到集合T中所获得的边际效益。
Kempe已经对独立级联模型和线性阈值模型的影响期望值函数加以证明,得出其满足子模特性。由子模特性,把一个节点v添加到结合S时,如果集合S越小,v节点的边际影响期望值就越大。因此,对于那些在前一轮中计算的边际影响期望值小于某些点在当前轮中所计算的结果,那么这些节点就不需要重新计算边际影响期望值,根据子模特性,可以省去大量节点的重复计算,从而在时间复杂度方面有了很大的提高,效率提升了约为爬山算法的700倍而精度上基本保持一致。
CELF的原理是,假设已知A,B,C,D 四个节点的边际影响力范围增量分别是d(A)、d(B)、d(C)、d(D),且 d(A)> d(B)> d(C)> d(D),那么根据贪心策略,会将A加入到种子集合S当中之后,然后在下一轮继续计算B,C,D的边际影响力时,如果B此时的边际影响力d(B)>d(C),那么依据子模性,d(B)必定是这一轮所有节点的边际影响力当中最大的,因为C、D等其他节点在这一轮的边际增量必定小于它们各自在上一轮的增量,即d(C)<=d(C)<d(B),d(D)<=d(D)<d'(B),因此可直接把B加入到S当中去,这样就省去了计算C,D,E...等等节点边际影响力的步骤,大大提高了效率。
import matplotlib.pyplot as plt
from random import uniform, seed
import numpy as np
import time
import networkx as nx
from igraph import *
def IC(g, S, p=0.5, mc=1000):
spread = []
for i in range(mc):
# Simulate propagation process
new_active, A = S[:], S[:]
while new_active:
# For each newly active node, find its neighbors that become activated
new_ones = []
for node in new_active:
# Determine neighbors that become infected
np.random.seed(i)
success = np.random.uniform(0, 1, len(g.neighbors(node, mode="out"))) < p
new_ones += list(np.extract(success, g.neighbors(node, mode="out")))
new_active = list(set(new_ones) - set(A))
# Add newly activated nodes to the set of activated nodes
A += new_active
spread.append(len(A))
return (np.mean(spread))
def celf(g, k, p=0.1, mc=1000):
start_time = time.time()
marg_gain = [IC(g, [node], p, mc) for node in range(g.vcount())]
# Create the sorted list of nodes and their marginal gain
Q = sorted(zip(range(g.vcount()), marg_gain), key=lambda x: x[1], reverse=True)
# Select the first node and remove from candidate list
S, spread, SPREAD = [Q[0][0]], Q[0][1], [Q[0][1]]
Q, LOOKUPS, timelapse = Q[1:], [g.vcount()], [time.time() - start_time]
for _ in range(k - 1):
check, node_lookup = False, 0
while not check:
# Count the number of times the spread is computed
node_lookup += 1
# Recalculate spread of top node
current = Q[0][0]
# Evaluate the spread function and store the marginal gain in the list
Q[0] = (current, IC(g, S + [current], p, mc) - spread)
# Re-sort the list
Q = sorted(Q, key=lambda x: x[1], reverse=True)
# Check if previous top node stayed on top after the sort
check = (Q[0][0] == current)
# Select the next node
spread += Q[0][1]
S.append(Q[0][0])
SPREAD.append(spread)
LOOKUPS.append(node_lookup)
timelapse.append(time.time() - start_time)
# Remove the selected node from the list
Q = Q[1:]
return (S, SPREAD, timelapse, LOOKUPS)
source = []
target = []
network_file = 'E:/DPSO_demo-master/CA-HepPh.txt'
file = open(network_file, "r", encoding="utf-8")
Vnumber, Enumber = file.readline().split()
Enumber = int(Enumber)
for i in range(0, Enumber):
start, end = file.readline().split()
source.append(int(start))
target.append(int(end))
print('之前', source)
print('之前', target)
A = set(source + target)
g = Graph(directed=True)
g.add_vertices(range(100000))
g.add_edges(list(zip(source,target)))
start_time = time.time()
# Run algorithms
celf_output = celf(g,50,p = 0.01,mc = 1000)
end_time = time.time()
runningtime1 = end_time - start_time
print("总时间:", runningtime1)
# Print results
print("celf output: " + str(celf_output[0]))
CELF介绍与图的引用:胡怀雄. 基于独立级联模型的社交网络影响力最大化研究[D]. 深圳大学, 2018.