There may be errors here; if you spot any, please point them out in the comments.
1 Analyzing the Wikipedia voters network [27 points]
import snap
G = snap.LoadEdgeList(snap.TNGraph, "Wiki-Vote.txt", 0, 1)
snap.PrintInfo(G, "Wiki-Vote", "result.txt", False)
result.txt:
Wiki-Vote: Directed
Nodes: 7115
Edges: 103689
Zero Deg Nodes: 0
Zero InDeg Nodes: 4734
Zero OutDeg Nodes: 1005
NonZero In-Out Deg Nodes: 1376
Unique directed edges: 103689
Unique undirected edges: 100762
Self Edges: 0
BiDir Edges: 5854
Closed triangles: 608389
Open triangles: 12720413
Frac. of closed triads: 0.045645
Connected component size: 0.993113
Strong conn. comp. size: 0.182713
Approx. full diameter: 6
90% effective diameter: 3.791225
1. The number of nodes in the network.
7115
2. The number of nodes with a self-edge (self-loop).
0
3. The number of directed edges in the network.
103689
4. The number of undirected edges in the network.
100762
5. The number of reciprocated edges in the network.
5854
6. The number of nodes of zero out-degree.
1005
7. The number of nodes of zero in-degree.
4734
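The answers above are read straight from the PrintInfo output. As a cross-check, the same seven counts can be sketched in plain Python on a list of (source, destination) pairs. The function name basic_stats and the toy edge list are illustrative assumptions, not part of snap. The sketch also makes the relationship between the numbers visible: each reciprocated pair contributes two directed edges but one undirected edge, which matches 103689 - 5854 / 2 = 100762 above.

```python
def basic_stats(edges):
    """Counts for questions 1-7, given a list of (src, dst) pairs."""
    directed = set(edges)                       # unique directed edges
    nodes = {n for e in directed for n in e}
    self_loop_nodes = {u for u, v in directed if u == v}
    # An undirected edge is an unordered pair: (a, b) and (b, a) collapse into one.
    undirected = {frozenset(e) for e in directed}
    # A reciprocated edge is a directed edge (a, b), a != b, whose reverse also exists.
    reciprocated = {(u, v) for u, v in directed if u != v and (v, u) in directed}
    has_out = {u for u, _ in directed}
    has_in = {v for _, v in directed}
    return {
        "nodes": len(nodes),
        "self_loop_nodes": len(self_loop_nodes),
        "directed_edges": len(directed),
        "undirected_edges": len(undirected),
        "reciprocated_edges": len(reciprocated),
        "zero_out_degree": len(nodes - has_out),
        "zero_in_degree": len(nodes - has_in),
    }


# Hypothetical toy graph: one reciprocated pair (1, 2), one self-loop on node 3.
toy_edges = [(1, 2), (2, 1), (1, 3), (3, 3), (4, 1), (1, 5)]
stats = basic_stats(toy_edges)
print(stats)
```

To run it on the real data, the pairs would be parsed from the non-comment lines of Wiki-Vote.txt; that parsing step is left out of the sketch.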
k1 = 0  # nodes with out-degree > 10
k2 = 0  # nodes with in-degree < 10
for NI in G.Nodes():
    if NI.GetOutDeg() > 10:
        k1 += 1
    if NI.GetInDeg() < 10:
        k2 += 1
print(k1, k2)
8. The number of nodes with more than 10 outgoing edges (out-degree > 10).
1612
9. The number of nodes with fewer than 10 incoming edges (in-degree < 10).
5165
2 Further Analyzing the Wikipedia voters network [33 points]
1. (18 points) Plot the distribution of out-degrees of nodes in the network on a log-log scale. Each data point is a pair (x, y) where x is a positive integer and y is the number of nodes in the network with out-degree equal to x. Restrict the range of x between the minimum and maximum out-degrees. You may filter out data points with a 0 entry. For the log-log scale, use base 10 for both x and y axes.
snap.PlotOutDegDistr(G, "Wiki-Vote", "Wiki-Vote Out Degree")
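PlotOutDegDistr produces the plot directly, so the (x, y) pairs themselves never appear. A minimal sketch of how those data points could be built by hand is shown below, in plain Python and assuming a duplicate-free edge list. The helper names out_degree_distribution and to_log10 are assumptions made for illustration. Nodes with out-degree 0 never appear as a source, so x is always a positive integer and no zero entries need filtering.

```python
import math
from collections import Counter


def out_degree_distribution(edges):
    """Return sorted (x, y) pairs: y nodes have out-degree exactly x."""
    out_deg = Counter(u for u, _ in edges)   # node -> out-degree (>= 1 only)
    dist = Counter(out_deg.values())         # out-degree x -> node count y
    return sorted(dist.items())


def to_log10(points):
    """Map each (x, y) to (log10 x, log10 y) for a base-10 log-log plot."""
    return [(math.log10(x), math.log10(y)) for x, y in points]


toy_edges = [(1, 2), (1, 3), (2, 3), (4, 1), (5, 1)]
points = out_degree_distribution(toy_edges)
log_points = to_log10(points)
print(points)
```

The transformed pairs can be handed to any plotting library, or the raw pairs can be drawn on axes that are set to a base-10 log scale.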
2. (15 points) Compute and plot the least-square regression line for the out-degree distribution in the log-log scale plot. Note we want to find coefficients a and b such that the function log10 y = a · log10 x + b, equivalently, y = 10^b · x^a, best fits the out-degree distribution. What are the coefficients a and b? For this part, you might want to use the method called polyfit in NumPy with deg parameter equal to 1.
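import math
import numpy as np
# Find the maximum out-degree so the loop covers the full range of x.
maxOutDeg = 0
for NI in G.Nodes():
    if NI.GetOutDeg() > maxOutDeg:
        maxOutDeg = NI.GetOutDeg()
log10x = []
log10y = []
# range() excludes its upper bound, so use maxOutDeg + 1 to include the maximum.
for deg in range(1, maxOutDeg + 1):
    pointNo = G.CntOutDegNodes(deg)
    if pointNo != 0:
        log10x.append(math.log10(deg))
        log10y.append(math.log10(pointNo))
# Fit log10 y = a * log10 x + b; polyfit returns the coefficients as [a, b].
a, b = np.polyfit(log10x, log10y, deg=1)
print(a, b)
result (from an earlier version of this code, which fitted the raw counts y rather than log10 y against log10 x and stopped one short of the maximum out-degree, so these are not the requested a and b; rerun the corrected code above to obtain them):
[-164.99965984 355.24262157]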
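The question asks for a straight-line fit in log-log space, so both coordinates have to be log-transformed before fitting. As a sanity check on what a degree-1 least-squares fit computes, here is a small closed-form sketch in plain Python. fit_line is a hypothetical helper, not a NumPy function, and the data is a synthetic exact power law y = 1000 · x^(-2), for which the fit should recover a close to -2 and b close to 3.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for ys = a * xs + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic data following y = 1000 * x^(-2) exactly.
degrees = [1, 2, 4, 8, 16]
counts = [1000 / d ** 2 for d in degrees]
a, b = fit_line([math.log10(d) for d in degrees],
                [math.log10(c) for c in counts])
print(a, b)
```

Fitting the raw counts against log10 x instead would give coefficients on the scale of the counts themselves, which is a quick way to recognise that a transform was skipped.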
3 Finding Experts on the Java Programming Language on StackOverflow [40 points]
1. The number of weakly connected components in the network.
G = snap.LoadEdgeList(snap.TNGraph, "stackoverflow-Java.txt", 0, 1)
Components = G.GetWccs()
print(len(Components))
result:
10143
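A weakly connected component ignores edge direction: two nodes belong to the same component if they are linked by a path when every edge is treated as undirected. The sketch below illustrates the idea with a union-find structure in plain Python. It is an assumption-laden illustration, not a description of how snap implements GetWccs, and weakly_connected_components is a hypothetical name.

```python
def weakly_connected_components(edges):
    """Group nodes into weakly connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    # Direction is ignored: each edge merges the components of its endpoints.
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return list(components.values())


toy_edges = [(1, 2), (3, 2), (4, 5), (6, 6)]
components = weakly_connected_components(toy_edges)
print(len(components))
```

A node that appears only in a self-loop, like node 6 in the toy data, forms a component of its own.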
2. The number of edges and the number of nodes in the largest weakly connected component.
MxWcc = G.GetMxWcc()
snap.PrintInfo(MxWcc, "MxWcc", "result-MxWcc.txt", False)
result-MxWcc.txt:
MxWcc: Directed
Nodes: 131188
Edges: 322486
Zero Deg Nodes: 0
Zero InDeg Nodes: 78365
Zero OutDeg Nodes: 26008
NonZero In-Out Deg Nodes: 26815
Unique directed edges: 322486
Unique undirected edges: 322371
Self Edges: 15035
BiDir Edges: 15265
Closed triangles: 41388
Open triangles: 51596519
Frac. of closed triads: 0.000802
Connected component size: 1.000000
Strong conn. comp. size: 0.032953
Approx. full diameter: 12
90% effective diameter: 5.527031
result (edges, nodes):
322486 131188
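The two numbers come from the Nodes and Edges lines of the PrintInfo output for the largest component. The same pair can be sketched in plain Python with a breadth-first search over the undirected view of the graph, then counting the unique directed edges whose endpoints both lie in the largest component. largest_wcc_size is a hypothetical helper, and it returns (nodes, edges), the reverse of the order printed above.

```python
from collections import defaultdict, deque


def largest_wcc_size(edges):
    """Return (node count, directed edge count) of the largest weak component."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)   # treat every edge as undirected
        neighbours[v].add(u)

    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component

    # Count unique directed edges with both endpoints inside the component.
    n_edges = sum(1 for u, v in set(edges) if u in best and v in best)
    return len(best), n_edges


toy_edges = [(1, 2), (3, 2), (2, 1), (4, 5), (6, 6)]
size = largest_wcc_size(toy_edges)
print(size)
```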
3. IDs of the top 3 most central nodes in the network by PageRank scores.
PRankH = G.GetPageRank()
# Track the three best [node ID, score] pairs seen so far.
top1, top2, top3 = [0, 0], [0, 0], [0, 0]
for item in PRankH:
    if PRankH[item] > top1[1]:
        top3 = top2
        top2 = top1
        top1 = [item, PRankH[item]]
    elif PRankH[item] > top2[1]:
        top3 = top2
        top2 = [item, PRankH[item]]
    elif PRankH[item] > top3[1]:
        top3 = [item, PRankH[item]]
print(top1[0], top2[0], top3[0])
result:
992484 135152 22656
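For intuition about what the PageRank scores measure, here is a minimal power-iteration sketch in plain Python, together with a shorter way to pick the top entries using heapq.nlargest. The damping factor 0.85, the fixed 100 iterations, and the even redistribution of rank from nodes without out-links are assumptions of this sketch; snap's own implementation may differ in those details. pagerank and top_k are hypothetical names.

```python
import heapq


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a list of (src, dst) pairs."""
    nodes = sorted({n for e in edges for n in e})
    out_links = {n: set() for n in nodes}
    for u, v in edges:
        out_links[u].add(v)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank


def top_k(scores, k=3):
    """IDs of the k highest-scoring nodes, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


# Toy graph: nodes 2, 3 and 4 all point to node 1, which points back to node 2.
toy_edges = [(2, 1), (3, 1), (4, 1), (1, 2)]
ranks = pagerank(toy_edges)
print(top_k(ranks, 2))
```

The same top_k helper works on any mapping from node ID to score, so it could replace the hand-written three-way comparison when the scores are first copied into a Python dict.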
4. IDs of the top 3 hubs and top 3 authorities in the network by HITS scores.
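NIdHubH, NIdAuthH = G.GetHits()
# Top 3 hubs: rank by hub score (NIdHubH), not by PageRank.
top1, top2, top3 = [0, 0], [0, 0], [0, 0]
for item in NIdHubH:
    if NIdHubH[item] > top1[1]:
        top3 = top2
        top2 = top1
        top1 = [item, NIdHubH[item]]
    elif NIdHubH[item] > top2[1]:
        top3 = top2
        top2 = [item, NIdHubH[item]]
    elif NIdHubH[item] > top3[1]:
        top3 = [item, NIdHubH[item]]
print(top1[0], top2[0], top3[0])
# Top 3 authorities: rank by authority score (NIdAuthH).
top1, top2, top3 = [0, 0], [0, 0], [0, 0]
for item in NIdAuthH:
    if NIdAuthH[item] > top1[1]:
        top3 = top2
        top2 = top1
        top1 = [item, NIdAuthH[item]]
    elif NIdAuthH[item] > top2[1]:
        top3 = top2
        top2 = [item, NIdAuthH[item]]
    elif NIdAuthH[item] > top3[1]:
        top3 = [item, NIdAuthH[item]]
print(top1[0], top2[0], top3[0])
result (from an earlier version of this code, which compared PRankH[item] in both loops, so both lines repeat the PageRank top 3 rather than the HITS hubs and authorities; rerun the corrected code above to obtain them):
992484 135152 22656
992484 135152 22656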
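Hub and authority scores measure different things, so their top-3 lists will generally differ from each other and from the PageRank list. A good hub points to good authorities, and a good authority is pointed to by good hubs. The mutual update can be sketched in plain Python as below. The fixed 50 iterations and the Euclidean normalisation are assumptions of this sketch, and hits is a hypothetical name, not snap's implementation.

```python
import math


def hits(edges, iterations=50):
    """Iterative HITS; returns (hub scores, authority scores) as dicts."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the nodes pointing to it.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        # Hub score: sum of the authority scores of the nodes it points to.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        # Normalise each score vector to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Toy graph: node 1 points to 3 and 4, node 2 points to 3.
toy_edges = [(1, 3), (1, 4), (2, 3)]
hub, auth = hits(toy_edges)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

In the toy graph the best hub (node 1) has authority score 0 and the best authority (node 3) has hub score 0, which shows why each ranking has to read from its own score table.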