CS224W Homework 0

There may be mistakes; if you find any, please point them out in the comments.

1 Analyzing the Wikipedia voters network [27 points]

import snap

G = snap.LoadEdgeList(snap.TNGraph, "Wiki-Vote.txt", 0, 1)
snap.PrintInfo(G, "Wiki-Vote", "result.txt", False)

result.txt: 

Wiki-Vote: Directed
  Nodes:                    7115
  Edges:                    103689
  Zero Deg Nodes:           0
  Zero InDeg Nodes:         4734
  Zero OutDeg Nodes:        1005
  NonZero In-Out Deg Nodes: 1376
  Unique directed edges:    103689
  Unique undirected edges:  100762
  Self Edges:               0
  BiDir Edges:              5854
  Closed triangles:         608389
  Open triangles:           12720413
  Frac. of closed triads:   0.045645
  Connected component size: 0.993113
  Strong conn. comp. size:  0.182713
  Approx. full diameter:    6
  90% effective diameter:  3.791225

1. The number of nodes in the network.

7115

2. The number of nodes with a self-edge (self-loop).

0

3. The number of directed edges in the network.

103689

4. The number of undirected edges in the network.

100762

5. The number of reciprocated edges in the network.

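2927 (SNAP's "BiDir Edges" figure of 5854 counts both directions of every reciprocated pair; counted once per pair this is 5854 / 2 = 2927, which matches 103689 - 100762 = 2927)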

6. The number of nodes of zero out-degree.

1005

7. The number of nodes of zero in-degree.

4734

# k1: nodes with out-degree > 10; k2: nodes with in-degree < 10
k1 = 0
k2 = 0
for NI in G.Nodes():
    if NI.GetOutDeg() > 10:
        k1 += 1
    if NI.GetInDeg() < 10:
        k2 += 1
print(k1, k2)

8. The number of nodes with more than 10 outgoing edges (out-degree > 10).

1612

9. The number of nodes with fewer than 10 incoming edges (in-degree < 10).

5165
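As a cross-check that does not depend on SNAP, the Part 1 counts can be recomputed from the raw edge list in plain Python. This is a sketch under assumptions: the file has whitespace-separated `src dst` lines plus `#` comment lines, the nodes are exactly the IDs that appear in some edge, and a self-loop is assumed to count once toward both in- and out-degree. The function names are my own.

```python
from collections import Counter


def read_edge_list(path):
    """Read 'src dst' lines (any whitespace), skipping blank and '#' lines."""
    edges = []
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue
            src, dst = line.split()[:2]
            edges.append((int(src), int(dst)))
    return edges


def directed_stats(edges):
    """Part 1 statistics for a directed graph given as (src, dst) pairs."""
    edge_set = set(edges)  # ignore duplicate lines
    nodes = {u for e in edge_set for u in e}
    # degrees: a self-loop adds one to out-degree and one to in-degree
    out_deg = Counter(u for u, _ in edge_set)
    in_deg = Counter(v for _, v in edge_set)
    # directed edges (a, b) with a != b
    directed = {(u, v) for u, v in edge_set if u != v}
    # unordered pairs {a, b}: both directions collapse into one edge
    undirected = {frozenset(e) for e in directed}
    # pairs where (a, b) and (b, a) both exist, each pair counted once
    reciprocated = sum(1 for u, v in directed if (v, u) in directed) // 2
    return {
        "nodes": len(nodes),
        "self_loops": len(edge_set) - len(directed),
        "directed": len(directed),
        "undirected": len(undirected),
        "reciprocated": reciprocated,
        "zero_out": sum(1 for n in nodes if out_deg[n] == 0),
        "zero_in": sum(1 for n in nodes if in_deg[n] == 0),
        "out_gt_10": sum(1 for n in nodes if out_deg[n] > 10),
        "in_lt_10": sum(1 for n in nodes if in_deg[n] < 10),
    }
```

Usage: `directed_stats(read_edge_list("Wiki-Vote.txt"))`. Whatever the input, "undirected" always equals "directed" minus "reciprocated"; with the SNAP numbers above that is 103689 - 100762 = 2927 reciprocated pairs.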


2 Further Analyzing the Wikipedia voters network [33 points]

1. (18 points) Plot the distribution of out-degrees of nodes in the network on a log-log scale. Each data point is a pair (x, y) where x is a positive integer and y is the number of nodes in the network with out-degree equal to x. Restrict the range of x between the minimum and maximum out-degrees. You may filter out data points with a 0 entry. For the log-log scale, use base 10 for both x and y axes.

# SNAP draws the plot with gnuplot, which must be installed
snap.PlotOutDegDistr(G, "Wiki-Vote", "Wiki-Vote Out Degree")
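The data points the question describes can also be built directly: compute each node's out-degree, histogram those values, and keep only positive x with nonzero counts so that both logarithms are defined. A minimal sketch with a hypothetical helper name, assuming the graph is given as a list of (src, dst) pairs:

```python
from collections import Counter


def out_degree_distribution(edges):
    """Return sorted (x, y) points: y = number of nodes with out-degree x.

    Only out-degrees that actually occur are produced, so there are no
    y = 0 entries, and x = 0 is dropped because log10(0) is undefined.
    """
    nodes = {u for e in edges for u in e}
    out_deg = Counter(u for u, _ in set(edges))  # nodes with no out-edges get 0
    hist = Counter(out_deg[n] for n in nodes)    # out-degree -> node count
    return sorted((x, y) for x, y in hist.items() if x > 0)
```

The points can then be drawn with matplotlib's `plt.loglog(xs, ys, "o")`, which uses base-10 logarithmic axes by default.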

2. (15 points) Compute and plot the least-square regression line for the out-degree distribution in the log-log scale plot. Note we want to find coefficients a and b such that the function $\log_{10} y = a \cdot \log_{10} x + b$, equivalently, $y = 10^{b} \cdot x^{a}$, best fits the out-degree distribution. What are the coefficients a and b? For this part, you might want to use the method called polyfit in NumPy with deg parameter equal to 1.

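import math
import numpy as np

# Largest out-degree in the graph
maxOutDeg = 0
for NI in G.Nodes():
    maxOutDeg = max(maxOutDeg, NI.GetOutDeg())

log10x = []
log10y = []
# range() excludes its end, so use maxOutDeg + 1 to include the maximum
for deg in range(1, maxOutDeg + 1):
    pointNo = G.CntOutDegNodes(deg)  # number of nodes with out-degree == deg
    if pointNo != 0:                 # skip empty bins: log10(0) is undefined
        log10x.append(math.log10(deg))
        log10y.append(math.log10(pointNo))

# Fit log10(y) = a * log10(x) + b; with deg=1 polyfit returns [a, b]
a, b = np.polyfit(log10x, log10y, deg=1)
print(a, b)

result:

The output originally posted here, [-164.99965984  355.24262157], came from fitting the raw counts y (not log10 y) against log10 x, with a loop that stopped one short of the maximum out-degree, so it is not the requested pair a, b. Running the corrected code prints the actual coefficients.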
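For reference, the line that `np.polyfit(..., deg=1)` computes has a closed form: with X = log10 x and Y = log10 y, the slope is a = sum((X - mean X)(Y - mean Y)) / sum((X - mean X)^2) and the intercept is b = mean Y - a * mean X. A dependency-free sketch (the function name is my own):

```python
import math


def fit_power_law(points):
    """Least-squares fit of log10(y) = a * log10(x) + b.

    points: iterable of (x, y) with x > 0 and y > 0.
    Returns (a, b); the slope/intercept ordinary least squares gives.
    """
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # zero if all x are equal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b
```

The regression line to overlay on the log-log plot is then y = 10**b * x**a.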


3 Finding Experts on the Java Programming Language on StackOverflow [40 points]

1. The number of weakly connected components in the network.

G = snap.LoadEdgeList(snap.TNGraph, "stackoverflow-Java.txt", 0, 1)

Components = G.GetWccs()  # one entry per weakly connected component
print(len(Components))

result:

10143
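The weakly connected components can also be counted without SNAP by ignoring edge direction and merging endpoints with union-find. A sketch assuming the graph is a list of (src, dst) pairs; a node that only has a self-loop still forms its own component. The function name is my own.

```python
def weakly_connected_components(edges):
    """Group the nodes of a directed edge list into weakly connected components.

    Edge direction is ignored. Uses union-find with path halving.
    Returns a list of node sets.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    groups = {}
    for u in list(parent):
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())
```

`len(weakly_connected_components(edges))` then corresponds to the number of WCCs.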

2. The number of edges and the number of nodes in the largest weakly connected component.

MxWcc = G.GetMxWcc()  # subgraph of the largest weakly connected component
snap.PrintInfo(MxWcc, "MxWcc", "result-MxWcc.txt", False)

result-MxWcc.txt:

MxWcc: Directed
  Nodes:                    131188
  Edges:                    322486
  Zero Deg Nodes:           0
  Zero InDeg Nodes:         78365
  Zero OutDeg Nodes:        26008
  NonZero In-Out Deg Nodes: 26815
  Unique directed edges:    322486
  Unique undirected edges:  322371
  Self Edges:               15035
  BiDir Edges:              15265
  Closed triangles:         41388
  Open triangles:           51596519
  Frac. of closed triads:   0.000802
  Connected component size: 1.000000
  Strong conn. comp. size:  0.032953
  Approx. full diameter:    12
  90% effective diameter:  5.527031

result:

322486 131188
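The node and edge counts of the largest WCC can likewise be obtained with a breadth-first search over the undirected view of the graph. In this sketch every distinct directed edge inside the component is counted, self-loops included, which is assumed to match how the SNAP edge count above (322486, alongside 15035 self edges) is reported. The function name is my own.

```python
from collections import defaultdict, deque


def largest_wcc_size(edges):
    """Return (num_edges, num_nodes) of the largest weakly connected component."""
    edge_set = set(edges)
    adj = defaultdict(set)  # undirected adjacency
    for u, v in edge_set:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # BFS collects the component containing `start`
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        if len(comp) > len(best):
            best = comp
    # both endpoints of an edge share a component, so checking u is enough
    n_edges = sum(1 for u, _ in edge_set if u in best)
    return n_edges, len(best)
```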

3. IDs of the top 3 most central nodes in the network by PageRank scores.

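PRankH = G.GetPageRank()  # hash: node ID -> PageRank score
# top1..top3 hold [node ID, score] for the three highest scores seen so far
top1, top2, top3 = [0, 0], [0, 0], [0, 0]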
for item in PRankH:
    if PRankH[item] > top1[1]:
        top3 = top2
        top2 = top1
        top1 = [item, PRankH[item]]
    elif PRankH[item] > top2[1]:
        top3 = top2
        top2 = [item, PRankH[item]]
    elif PRankH[item] > top3[1]:
        top3 = [item, PRankH[item]]

print(top1[0], top2[0], top3[0])

result:

992484 135152 22656
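To show what a PageRank score is, here is a textbook power-iteration version. It is a sketch, not SNAP's `GetPageRank` implementation: the damping factor 0.85 is the conventional choice, rank held by dangling nodes is spread uniformly, and the stopping rule is my own, so the scores may differ slightly from SNAP's even where the ranking agrees.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Textbook power-iteration PageRank on a list of (src, dst) pairs.

    Rank held by dangling nodes (out-degree 0) is spread uniformly over
    all nodes in every iteration, so the scores always sum to 1.
    Returns a dict node -> score.
    """
    edge_set = set(edges)
    nodes = {u for e in edge_set for u in e}
    n = len(nodes)
    if n == 0:
        return {}
    out_links = {u: [] for u in nodes}
    for u, v in edge_set:
        out_links[u].append(v)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # total rank sitting on nodes with no outgoing edges
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        # teleport share plus the redistributed dangling rank
        base = (1.0 - damping) / n + damping * dangling / n
        new = {u: base for u in nodes}
        for u, targets in out_links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        diff = sum(abs(new[u] - rank[u]) for u in nodes)  # L1 change
        rank = new
        if diff < tol:
            break
    return rank
```

`sorted(rank, key=rank.get, reverse=True)[:3]` then gives the IDs of the three highest-ranked nodes.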

4. IDs of the top 3 hubs and top 3 authorities in the network by HITS scores.

NIdHubH, NIdAuthH = G.GetHits()  # node ID -> hub score, node ID -> authority score

# IDs of the k nodes with the highest score in a SNAP hash (node ID -> score)
def top_k(scores, k=3):
    return sorted(scores, key=lambda nid: scores[nid], reverse=True)[:k]

print(*top_k(NIdHubH))   # top 3 hubs
print(*top_k(NIdAuthH))  # top 3 authorities

result:

The output originally posted here was 992484 135152 22656 on both lines, identical to the PageRank ranking, because both loops compared PRankH[item] instead of the hub and authority scores. Running the corrected code prints the actual top 3 hubs and top 3 authorities.
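Similarly, a textbook HITS iteration shows what hub and authority scores are: a node's authority score sums the hub scores of the nodes pointing to it, and its hub score sums the authority scores of the nodes it points to, with both vectors normalized every round. The L2 normalization and the convergence test are my choices for this sketch and may differ from SNAP's `GetHits`.

```python
import math


def hits(edges, max_iter=100, tol=1e-10):
    """Iterative HITS on a list of (src, dst) pairs (a textbook sketch).

    authority(v) = sum of hub(u) over edges u -> v
    hub(u)       = sum of authority(v) over edges u -> v
    Returns (hub, authority) as dicts node -> score.
    """
    edge_set = set(edges)
    nodes = {u for e in edge_set for u in e}
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(max_iter):
        new_auth = {u: 0.0 for u in nodes}
        for u, v in edge_set:
            new_auth[v] += hub[u]
        norm = math.sqrt(sum(s * s for s in new_auth.values())) or 1.0
        new_auth = {u: s / norm for u, s in new_auth.items()}
        new_hub = {u: 0.0 for u in nodes}
        for u, v in edge_set:
            new_hub[u] += new_auth[v]
        norm = math.sqrt(sum(s * s for s in new_hub.values())) or 1.0
        new_hub = {u: s / norm for u, s in new_hub.items()}
        # stop once neither vector changes noticeably
        diff = sum(abs(new_hub[u] - hub[u]) + abs(new_auth[u] - auth[u]) for u in nodes)
        hub, auth = new_hub, new_auth
        if diff < tol:
            break
    return hub, auth
```

As with PageRank, sorting each dict by score and taking the first three keys gives the top hubs and authorities.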
