Visualizing MongoDB data lineage with Python, pymongo, and networkx

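Data lineage usually refers to the chain along which data is produced. It is typically collected by automatically parsing artifacts such as stored procedures, SQL, and ETL jobs, supplemented by manual collection. This article does not cover how to obtain lineage; it only offers some ideas on how to work with MongoDB from Python and visualize lineage relationships.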

First, connect to the local database with pymongo and insert some test data:

import pymongo

# Connect to the local MongoDB instance and select the test collection
myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
mydb = myclient["world"]
mycol = mydb["areas"]

# Each document points to its parent through "belongsto"; the root (Asia) has none
mylist = [
    {"_id": 1, "name": "Asia"},
    {"_id": 2, "name": "China", "belongsto": "Asia"},
    {"_id": 3, "name": "ZheJiang", "belongsto": "China"},
    {"_id": 4, "name": "HangZhou", "belongsto": "ZheJiang"},
    {"_id": 5, "name": "NingBo", "belongsto": "ZheJiang"},
    {"_id": 6, "name": "Xihu", "belongsto": "HangZhou"},
]
mycol.insert_many(mylist)

# Print every document to confirm the insert
for doc in mycol.find():
    print(doc)

Output:

{'_id': 1, 'name': 'Asia'}
{'_id': 2, 'name': 'China', 'belongsto': 'Asia'}
{'_id': 3, 'name': 'ZheJiang', 'belongsto': 'China'}
{'_id': 4, 'name': 'HangZhou', 'belongsto': 'ZheJiang'}
{'_id': 5, 'name': 'NingBo', 'belongsto': 'ZheJiang'}
{'_id': 6, 'name': 'Xihu', 'belongsto': 'HangZhou'}

Recursively look up the parent nodes (ancestors) of the node with name='Xihu':

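pipeline = [
    # Follow each document's 'belongsto' to the document whose 'name' matches, recursively
    {'$graphLookup': {
        'from': "areas",
        'startWith': "$belongsto",
        'connectFromField': "belongsto",
        'connectToField': "name",
        'as': "belongHierarchy"
    }},
    # Keep only the node we are interested in
    {'$match': {'name': 'Xihu'}}
]
for doc in mycol.aggregate(pipeline):
    print(doc)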

Output:

{'_id': 6, 'name': 'Xihu', 'belongsto': 'HangZhou', 'belongHierarchy':
[{'_id': 1, 'name': 'Asia'},
{'_id': 2, 'name': 'China', 'belongsto': 'Asia'},
{'_id': 3, 'name': 'ZheJiang', 'belongsto': 'China'},
{'_id': 4, 'name': 'HangZhou', 'belongsto': 'ZheJiang'}]}
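
To make the recursion easier to follow, here is a minimal plain-Python sketch of roughly what the lookup does for one starting node: it repeatedly follows `belongsto` to the document whose `name` matches. The helper name `find_ancestors` is hypothetical, and the sketch assumes that names are unique and that the data lives in an in-memory list rather than in MongoDB.

```python
def find_ancestors(docs, start_name):
    """Return the ancestors of start_name, nearest first, by following 'belongsto'.

    Assumption for this sketch: every 'name' is unique within docs.
    """
    by_name = {doc["name"]: doc for doc in docs}
    ancestors = []
    seen = {start_name}  # guards against cycles in the data
    parent = by_name[start_name].get("belongsto")
    while parent is not None and parent in by_name and parent not in seen:
        seen.add(parent)
        doc = by_name[parent]
        ancestors.append(doc)
        parent = doc.get("belongsto")
    return ancestors


# Same test data as in the article, kept in memory
areas = [
    {"_id": 1, "name": "Asia"},
    {"_id": 2, "name": "China", "belongsto": "Asia"},
    {"_id": 3, "name": "ZheJiang", "belongsto": "China"},
    {"_id": 4, "name": "HangZhou", "belongsto": "ZheJiang"},
    {"_id": 5, "name": "NingBo", "belongsto": "ZheJiang"},
    {"_id": 6, "name": "Xihu", "belongsto": "HangZhou"},
]

print([doc["name"] for doc in find_ancestors(areas, "Xihu")])
# → ['HangZhou', 'ZheJiang', 'China', 'Asia']
```

Unlike this sketch, $graphLookup does not guarantee any particular order for the documents in the result array, so the order shown in the output above should not be relied on.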

Parse the result and visualize the relationships between the nodes:

import networkx as nx
import matplotlib.pyplot as plt

rs = list(mycol.aggregate(pipeline))

def get_relation(rs):
    """Build a directed graph with one child -> parent edge per relationship."""
    G = nx.DiGraph()
    for node in rs:
        # The root node has no 'belongsto', so there is no edge to add for it
        if 'belongsto' in node:
            G.add_edge(node['name'], node['belongsto'])
        # Add the edges of every ancestor returned by $graphLookup
        for item in node.get('belongHierarchy', []):
            if 'belongsto' in item:
                G.add_edge(item['name'], item['belongsto'])
    return G

G = get_relation(rs)
nx.draw(G, with_labels=True, font_weight='bold')
plt.show()

[Figure: graph of the Xihu node and its ancestors]
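
By default `nx.draw` places nodes with a force-directed layout, so the hierarchy is not obvious at a glance. One possible refinement is to compute a position per node from its depth in the tree and pass it through the `pos` argument of `nx.draw`. The helper below, `layered_positions`, is a hypothetical name and only a sketch; it assumes each child has exactly one parent and that the edges contain no cycle.

```python
def layered_positions(edges):
    """Map each node to (x, y) with y = -depth, so that roots sit at the top.

    edges: iterable of (child, parent) pairs, the direction used by get_relation.
    Assumptions for this sketch: one parent per child, no cycles.
    """
    parent_of = dict(edges)
    nodes = set(parent_of) | set(parent_of.values())

    def depth(node):
        # Count the steps needed to reach a node without a parent
        steps = 0
        while node in parent_of:
            node = parent_of[node]
            steps += 1
        return steps

    # Group nodes by depth; sorting keeps the layout deterministic
    levels = {}
    for node in sorted(nodes):
        levels.setdefault(depth(node), []).append(node)

    # Spread the nodes of each level horizontally around x = 0
    pos = {}
    for level, names in levels.items():
        for i, name in enumerate(names):
            pos[name] = (i - (len(names) - 1) / 2, -level)
    return pos


edges = [
    ("China", "Asia"),
    ("ZheJiang", "China"),
    ("HangZhou", "ZheJiang"),
    ("NingBo", "ZheJiang"),
    ("Xihu", "HangZhou"),
]
pos = layered_positions(edges)
print(pos["Asia"], pos["HangZhou"], pos["NingBo"])
# → (0.0, 0) (-0.5, -3) (0.5, -3)
```

With the graph built by get_relation, this could be used as `nx.draw(G, pos=layered_positions(G.edges()), with_labels=True)`.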

Display all nodes in the areas collection:

# Same $graphLookup stage as before, but without $match, so every document is returned
pipeline = [
    {'$graphLookup': {
        'from': "areas",
        'startWith': "$belongsto",
        'connectFromField': "belongsto",
        'connectToField': "name",
        'as': "belongHierarchy"
    }}
]
rs = list(mycol.aggregate(pipeline))

# get_relation is the function defined in the previous step
G = get_relation(rs)
nx.draw(G, with_labels=True, font_weight='bold')
plt.show()

[Figure: graph of all nodes in the areas collection]
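
The two pipelines in this article differ only in the $match stage, so they could be produced by one small helper. The sketch below is an assumption about how one might structure it, not part of the original code: `build_hierarchy_pipeline` and its parameters are hypothetical names. It places $match before $graphLookup, so the lookup only has to run for the selected document, and it optionally sets `depthField`, a $graphLookup option that records how many recursion steps away each ancestor was found.

```python
def build_hierarchy_pipeline(collection_name, name=None, depth_field=None):
    """Build the aggregation pipeline used in this article as a plain list of dicts.

    name:        if given, only the document with this name is expanded.
    depth_field: if given, each ancestor gets this field with its recursion depth.
    """
    graph_lookup = {
        "from": collection_name,
        "startWith": "$belongsto",
        "connectFromField": "belongsto",
        "connectToField": "name",
        "as": "belongHierarchy",
    }
    if depth_field is not None:
        graph_lookup["depthField"] = depth_field

    pipeline = []
    if name is not None:
        # Filter first so that $graphLookup runs for one document only
        pipeline.append({"$match": {"name": name}})
    pipeline.append({"$graphLookup": graph_lookup})
    return pipeline


single = build_hierarchy_pipeline("areas", name="Xihu", depth_field="depth")
everything = build_hierarchy_pipeline("areas")
print(len(single), len(everything))
# → 2 1
```

The result can be passed to `mycol.aggregate(...)` as before. Because the order of the ancestors in `belongHierarchy` is not guaranteed, sorting them by the depth field on the client side is one way to get a stable nearest-first order.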

Reposted from: https://my.oschina.net/aubao/blog/3035977
