Chapter 2 Programming Tools for Data Science
Life is short, I use Python.
Python is an object-oriented, interpreted programming language.
Features
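- Free, powerful, and widely used.
- Compared with R and MATLAB, Python is easier to learn and a more rigorous programming language; scripts written in Python are easier to understand and maintain.
- As with other programming languages, the fundamentals of Python include types, lists and tuples, dictionaries, conditionals, loops, and exception handling.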
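To make the list of fundamentals concrete, here is a small self-contained sketch that touches each item: a list and a tuple, a dictionary used as a counter, an if/elif/else chain inside a for loop, and a try/except block. The function names describe and safe_ratio and the sample numbers are illustrative only.

```python
def describe(values):
    """Count positive, negative, and zero entries in a list of numbers."""
    counts = {"positive": 0, "negative": 0, "zero": 0}  # dictionary
    for v in values:                                   # loop
        if v > 0:                                      # conditional
            counts["positive"] += 1
        elif v < 0:
            counts["negative"] += 1
        else:
            counts["zero"] += 1
    return counts


def safe_ratio(a, b):
    """Return a / b, or None when b is zero (exception handling)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


numbers = [3, -1, 0, 2]   # list: a mutable sequence
point = (1.5, 2.5)        # tuple: an immutable sequence
print(type(numbers).__name__, type(point).__name__)  # list tuple
print(describe(numbers))  # {'positive': 2, 'negative': 1, 'zero': 1}
print(safe_ratio(1, 0))   # None
```

Lists are used where the contents may change, tuples where a fixed group of values (such as a coordinate) should stay intact.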
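Python comes with a rich collection of libraries.
Many open-source scientific computing packages provide Python interfaces, for example the well-known computer vision library OpenCV. Python's own scientific computing libraries are also mature, such as NumPy, SciPy, and matplotlib. For social network analysis, libraries such as igraph, networkx, graph-tool, and Snap.py offer a wide range of network analysis tools.
Third-party packages can be installed with pip install, for example: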
pip install flownetwork
from flownetwork import flownetwork as fn
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
print(fn.__version__)
$version = py3.0.1$
help(fn.constructFlowNetwork)
constructFlowNetwork(C)
C is an array of two dimentions, e.g.,
    C = np.array([[user1, item1],
                  [user1, item2],
                  [user2, item1],
                  [user2, item3]])
Return a balanced flow network
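The docstring specifies the input (a two-column array of user-item records) and promises a balanced flow network, but it does not say how the edges are derived. One common construction, sketched below under stated assumptions, gives each user a path source -> item -> ... -> item -> sink and counts how many users traverse each edge, so that inflow equals outflow at every item node. The helpers build_flow_edges and is_balanced are hypothetical names; this is not necessarily how flownetwork implements constructFlowNetwork. The sketch uses only the standard library and assumes records are grouped by user and time-ordered within each user.

```python
from collections import defaultdict


def build_flow_edges(records):
    """Count weighted edges of a flow network from (user, item) records.

    Hypothetical sketch: assumes records are grouped by user and ordered
    in time within each user. Each user's path is source -> items -> sink.
    """
    weights = defaultdict(int)
    previous_user, previous_item = None, None
    for user, item in records:
        if user != previous_user:
            if previous_user is not None:
                weights[(previous_item, "sink")] += 1  # close the previous user's path
            weights[("source", item)] += 1            # open a new path
        else:
            weights[(previous_item, item)] += 1       # consecutive transition
        previous_user, previous_item = user, item
    if previous_user is not None:
        weights[(previous_item, "sink")] += 1         # close the last path
    return dict(weights)


def is_balanced(edges):
    """Check that inflow equals outflow at every node except source and sink."""
    inflow, outflow = defaultdict(int), defaultdict(int)
    for (u, v), w in edges.items():
        outflow[u] += w
        inflow[v] += w
    nodes = (set(inflow) | set(outflow)) - {"source", "sink"}
    return all(inflow[n] == outflow[n] for n in nodes)


# The same toy input as in the docstring, with strings for users and items.
C = [("user1", "item1"), ("user1", "item2"),
     ("user2", "item1"), ("user2", "item3")]
edges = build_flow_edges(C)
print(edges)
print(is_balanced(edges))  # True
```

Here item1 receives 2 units from the source (one per user) and passes 1 unit each to item2 and item3, so it is balanced. The resulting dictionary can be loaded into NetworkX with nx.DiGraph() and G.add_edge(u, v, weight=w) for each entry, giving the same kind of DiGraph that the example below works with.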
# constructing a flow network
demo = fn.attention_data
gd = fn.constructFlowNetwork(demo)
# drawing a demo network
fig = plt.figure(figsize=(12, 8), facecolor='white')
pos = {0: np.array([0.2, 0.8]),
2: np.array([0.2, 0.2]),
1: np.array([0.4, 0.6]),
6: np.array([0.4, 0.4]),
4: np.array([0.7, 0.8]),
5: np.array([0.7, 0.5]),
3: np.array([0.7, 0.2]),
'sink': np.array([1, 0.5]),
'source': np.array([0, 0.5])}
# edge widths scale with the flow weight; edge labels show the weight itself
width = [float(d['weight'] * 1.2) for u, v, d in gd.edges(data=True)]
edge_labels = {(u, v): d['weight'] for u, v, d in gd.edges(data=True)}
nx.draw_networkx_edge_labels(gd, pos, edge_labels=edge_labels, font_size=15, alpha=.5)
nx.draw(gd, pos, node_size=3000, node_color='orange', alpha=0.2, width=width, edge_color='orange', style='solid')
nx.draw_networkx_labels(gd, pos, font_size=18)
plt.show()
print(gd)  # nx.info() was removed in NetworkX 3.0; str(G) prints the same summary
DiGraph with 9 nodes and 15 edges
# flow matrix
m = fn.getFlowMatrix(gd)
print(m)
[[0. 5. 1. 0. 2. 1. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 3. 1.]
[0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 3. 0. 0. 0.]
[0. 0. 0. 2. 0. 0. 2. 0. 0.]
[0. 0. 0. 2. 0. 0. 0. 0. 0.]
[0. 0. 0. 2. 0. 0. 0. 0. 1.]
[0. 0. 0. 2. 0. 0. 0. 0. 0.]]
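The flow matrix can be checked against the graph summary printed earlier. Assuming entry (i, j) holds the flow from node i to node j (an interpretation inferred from the numbers, not taken from the flownetwork documentation), row sums are outflows and column sums are inflows. The sketch below copies the printed matrix, takes the source to be the only node with zero inflow and the sink the only node with zero outflow, and confirms that there are 15 nonzero entries (one per edge), that the 9 units leaving the source all reach the sink, and that every intermediate node is balanced.

```python
import numpy as np

# Flow matrix copied from the getFlowMatrix output above.
m = np.array([
    [0, 5, 1, 0, 2, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 3, 1],
    [0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 2, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0, 1],
    [0, 0, 0, 2, 0, 0, 0, 0, 0],
], dtype=float)

outflow = m.sum(axis=1)  # row sums: flow leaving each node
inflow = m.sum(axis=0)   # column sums: flow entering each node

# Assumption: the node with no inflow is the source and the node with
# no outflow is the sink (inferred from the matrix, not from library docs).
source = int(np.flatnonzero(inflow == 0)[0])
sink = int(np.flatnonzero(outflow == 0)[0])
middle = [i for i in range(len(m)) if i not in (source, sink)]

print(np.count_nonzero(m))                           # 15
print(outflow[source], inflow[sink])                 # 9.0 9.0
print(np.allclose(inflow[middle], outflow[middle]))  # True
```

With this reading, node index 0 acts as the source and index 3 as the sink; the mapping from matrix indices back to node labels depends on the node ordering that getFlowMatrix uses.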