Neo4j Graph Algorithms

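Relational databases are not well suited to modeling many-to-many relationships. Neo4j, by contrast, handles them well. Let's see how it uses the same dataset but models users as nodes, rather than as tables, columns, or foreign keys, and models friendships as edges in the graph.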
 
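Graph traversal is the operation of visiting a set of nodes in a graph by moving between nodes that are connected to each other. It is the basic operation for retrieving data in a graph database. A key concept is that traversal is local: a traversal query only has to consider the data it needs, and does not, like a join in a relational database, run an expensive grouping operation over the whole dataset.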
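To make the locality point concrete, here is a minimal Python sketch, not Neo4j code. The `friends` graph and the `within_hops` helper are hypothetical names invented for this illustration. The traversal only follows adjacency lists outward from the start node, so nodes outside the hop limit are never read.

```python
from collections import deque

# Hypothetical friendship graph stored as adjacency lists (node -> neighbors).
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice"],
    "Dave": ["Bob", "Erin"],
    "Erin": ["Dave"],
    "Zed": [],  # unrelated node, never touched by the traversal below
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning {node: hops} for nodes within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


result = within_hops(friends, "Alice", 2)
print(result)
```

Only the neighbors of visited nodes are ever looked up. The cost depends on the size of the neighborhood, not on the size of the whole graph.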
 
 
Chapter 4: Pathfinding and Graph Search Algorithms
 
LOAD CSV WITH HEADERS FROM 'file:///transport-nodes.csv' AS row 
MERGE (place:Place {id:row.id}) 
SET place.latitude = toFloat(row.latitude), 
place.longitude = toFloat(row.longitude), 
place.population = toInteger(row.population);
 
 
LOAD CSV WITH HEADERS FROM  'file:///transport-relationships.csv'  AS row 
MATCH (origin:Place {id: row.src}) 
MATCH (destination:Place {id: row.dst}) 
MERGE (origin)-[:EROAD {distance: toInteger(row.cost)}]->(destination);
 
MATCH (source:Place {id: "Amsterdam"}), 
(destination:Place {id: "London"}) 
CALL algo.shortestPath.stream(source, destination, null) 
YIELD nodeId, cost 
RETURN algo.getNodeById(nodeId).id AS place, cost
 
MATCH (source:Place {id: "Amsterdam"}), 
(destination:Place {id: "London"}) 
CALL algo.shortestPath(source, destination, null) 
YIELD  writeMillis,loadMillis,nodeCount, totalCost
RETURN  writeMillis,loadMillis,nodeCount, totalCost
 
Different shortest-path definitions give very different results (the figure could not be uploaded):
 
 
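The unweighted shortest path picks the route that visits the fewest nodes, which can be very useful in cases such as subway systems, where fewer stops are preferred. In a driving scenario, however, we are probably more interested in the total cost of the shortest weighted path.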
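The difference between the two notions of "shortest" can be sketched in plain Python. The `roads` graph and its distances are made up for illustration and are not the transport dataset loaded above. Breadth-first search finds the path with the fewest hops, while Dijkstra's algorithm finds the path with the lowest total weight.

```python
import heapq
from collections import deque

# Hypothetical undirected road graph; distances are invented for this sketch.
roads = {
    "A": {"B": 100, "C": 10},
    "B": {"A": 100, "D": 100},
    "C": {"A": 10, "E": 10},
    "E": {"C": 10, "D": 10},
    "D": {"B": 100, "E": 10},
}


def rebuild(parent, target):
    """Walk parent pointers back from target to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


def fewest_hops(graph, source, target):
    """Unweighted shortest path via breadth-first search (target must be reachable)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return rebuild(parent, target)


def lowest_cost(graph, source, target):
    """Weighted shortest path via Dijkstra's algorithm (target must be reachable)."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return rebuild(parent, target), dist[target]


def path_length(graph, path):
    """Sum of edge weights along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


hop_path = fewest_hops(roads, "A", "D")
cost_path, cost = lowest_cost(roads, "A", "D")
print(hop_path, path_length(roads, hop_path))
print(cost_path, cost)
```

In this toy graph the two-hop route is far longer in distance than the three-hop route, which mirrors the gap between the unweighted and weighted results.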
Unweighted, more than 700 km:
MATCH (source:Place {id: "Amsterdam"}),
(destination:Place {id: "London"})
CALL algo.shortestPath.stream(source, destination, null)
YIELD nodeId, cost
WITH collect(algo.getNodeById(nodeId)) AS path
UNWIND range(0, size(path)-1) AS index
WITH path[index] AS current, path[index+1] AS next
WITH current, next, [(current)-[r:EROAD]-(next) | r.distance][0] AS distance
WITH collect({current: current, next:next, distance: distance}) AS stops
UNWIND range(0, size(stops)-1) AS index
WITH stops[index] AS location, stops, index
RETURN location.current.id AS place,
reduce(acc=0.0,
distance in [stop in stops[0..index] | stop.distance] |
acc + distance) AS cost; 
 
Weighted, 453 km:
MATCH (source:Place {id: "Amsterdam"}), 
(destination:Place {id: "London"}) 
CALL algo.shortestPath.stream(source, destination, "distance") 
YIELD nodeId, cost 
RETURN algo.getNodeById(nodeId).id AS place, cost
 
MATCH (source:Place {id: "Amsterdam"}),
(destination:Place {id: "London"})
CALL algo.shortestPath.stream(source, destination, null)
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).id AS place, cost
This query returns the following output:
+--------------------+
| place       | cost |
+--------------------+
| "Amsterdam" | 0.0  |
| "Immingham" | 1.0  |
| "Doncaster" | 2.0  |
| "London"    | 3.0  |
+--------------------+
 
A*
MATCH (source:Place {id: "Den Haag"}),
(destination:Place {id: "London"})
CALL algo.shortestPath.astar.stream(source,
destination, "distance", "latitude", "longitude")
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).id AS place, cost 
 
Yen's k-shortest paths
MATCH (start:Place {id:"Gouda"}),
(end:Place {id:"Felixstowe"})
CALL algo.kShortestPaths.stream(start, end, 5, "distance")
YIELD index, nodeIds, path, costs
RETURN index,
[node in algo.getNodesById(nodeIds[1..-1]) | node.id] AS via,
reduce(acc=0.0, cost in costs | acc + cost) AS totalCost 
 
CALL algo.allShortestPaths.stream(null)
YIELD sourceNodeId, targetNodeId, distance 
WHERE sourceNodeId < targetNodeId 
RETURN algo.getNodeById(sourceNodeId).id AS source, 
algo.getNodeById(targetNodeId).id AS target, 
distance 
ORDER BY distance DESC 
LIMIT 10
 
CALL algo.allShortestPaths.stream("distance") 
YIELD sourceNodeId, targetNodeId, distance 
WHERE sourceNodeId < targetNodeId 
RETURN algo.getNodeById(sourceNodeId).id AS source, 
algo.getNodeById(targetNodeId).id AS target, 
distance 
ORDER BY distance DESC 
LIMIT 10
 
Single Source Shortest Path
MATCH (n:Place {id:"London"})
CALL algo.shortestPath.deltaStepping.stream(n, "distance", 1.0)
YIELD nodeId, distance
WHERE algo.isFinite(distance)
RETURN algo.getNodeById(nodeId).id AS destination, distance
ORDER BY distance 
 
Let's see the MST algorithm in action. The following query finds a spanning tree starting from Amsterdam:
MATCH (n:Place {id:"Amsterdam"}) 
CALL algo.spanningTree.minimum("Place", "EROAD", "distance", id(n), {write:true, writeProperty:"MINST"}) 
YIELD loadMillis, computeMillis, writeMillis, effectiveNodeCount 
RETURN loadMillis, computeMillis, writeMillis, effectiveNodeCount
 
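View the minimum spanning tree (the author hit a problem in Neo4j here; the relationship type matched below must be the one given as writeProperty in the previous query):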
MATCH path = (n:Place {id:"Amsterdam"})-[:MINST*]-()
WITH relationships(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel AS rel
RETURN startNode(rel).id AS source, endNode(rel).id AS destination,
rel.distance AS cost 
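As a rough illustration of what a minimum spanning tree computation does, here is a small Python sketch of Prim's algorithm, one common way to build an MST. It is not a description of how the `algo.spanningTree.minimum` procedure is implemented. The `graph` data and the `prim_mst` name are hypothetical.

```python
import heapq

# Hypothetical undirected weighted graph, invented for this sketch.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}


def prim_mst(graph, start):
    """Grow a tree from start, always adding the cheapest edge that reaches a new node."""
    visited = {start}
    edges = [(weight, start, neighbor) for neighbor, weight in graph[start].items()]
    heapq.heapify(edges)
    tree = []
    while edges and len(visited) < len(graph):
        weight, src, dst = heapq.heappop(edges)
        if dst in visited:
            continue  # this edge would close a cycle
        visited.add(dst)
        tree.append((src, dst, weight))
        for neighbor, w in graph[dst].items():
            if neighbor not in visited:
                heapq.heappush(edges, (w, dst, neighbor))
    return tree


mst = prim_mst(graph, "A")
total = sum(weight for _, _, weight in mst)
print(mst, total)
```

Each tuple in `mst` plays the role of one written-back relationship in the Cypher result: a source, a destination, and the cost of the edge between them.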
 
Chapter 5: Centrality Algorithms
CALL algo.degree(label:String, relationship:String,{write: true, writeProperty:'degree', concurrency:4})
YIELD nodes, loadMillis, computeMillis, writeMillis, write, writeProperty
 
call algo.degree.stream('User','FOLLOWS',{direction:'incoming'})
yield nodeId,score
return algo.getNodeById(nodeId).id as name,score
order by score desc
Weighted:
call algo.degree.stream('User','FOLLOWS',{direction:'incoming',weightProperty:'xxxx'})
yield nodeId,score
return algo.getNodeById(nodeId).id as name,score
order by score desc
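What the two degree queries compute can be sketched in a few lines of Python. The `follows` edge list, its weights, and the `degree_scores` helper are hypothetical and only illustrate the idea. The unweighted score counts incoming relationships, and the weighted score sums a property over them.

```python
from collections import defaultdict

# Hypothetical FOLLOWS edges as (source, destination, weight); invented data.
follows = [
    ("Alice", "Bob", 1.0),
    ("Carol", "Bob", 2.0),
    ("Dave", "Bob", 1.0),
    ("Bob", "Alice", 5.0),
]


def degree_scores(edges, direction="incoming", weighted=False):
    """Return (node, score) pairs sorted by descending score, then by name."""
    scores = defaultdict(float)
    for src, dst, weight in edges:
        scores[src] += 0.0  # make sure every node appears, even with score 0
        scores[dst] += 0.0
        node = dst if direction == "incoming" else src
        scores[node] += weight if weighted else 1.0
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


unweighted = degree_scores(follows)
weighted = degree_scores(follows, weighted=True)
print(unweighted)
print(weighted)
```

In this toy data Bob has the most followers, but Alice ranks first once the weights are taken into account.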
 
 
 
Import the data
LOAD CSV WITH HEADERS FROM 'file:///social-nodes.csv' AS row 
MERGE (:User {id: row.id})
 
LOAD CSV WITH HEADERS FROM 'file:///social-relationships.csv' AS row 
MATCH (source:User {id: row.src}) 
MATCH (destination:User {id: row.dst}) 
MERGE (source)-[:FOLLOWS]->(destination)
 
Closeness Centrality
CALL algo.closeness.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id, centrality
ORDER BY centrality DESC
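The following query uses the Wasserman & Faust closeness centrality algorithm, which is more effective for graphs with multiple disconnected components: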
CALL algo.closeness.stream("User", "FOLLOWS", {improved: true})
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC
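The effect of the `improved` flag can be sketched in Python under the textbook formulas. The exact normalization used by the library is an assumption here. Plain normalized closeness is taken as (reachable nodes) / (sum of distances), and the Wasserman & Faust variant additionally multiplies by (reachable nodes) / (total nodes - 1). The `graph` data and function names are hypothetical.

```python
from collections import deque

# Hypothetical undirected graph with two separate components; invented data.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
    "X": ["Y"],
    "Y": ["X"],
}


def distances_from(graph, start):
    """Hop distances from start to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph, node, improved=False):
    """Normalized closeness; improved=True applies the Wasserman & Faust scaling."""
    dist = distances_from(graph, node)
    reachable = len(dist) - 1  # exclude the node itself
    total = sum(dist.values())
    if total == 0:
        return 0.0  # isolated node
    score = reachable / total
    if improved:
        score *= reachable / (len(graph) - 1)
    return score


for name in ("C", "X"):
    print(name, closeness(graph, name), closeness(graph, name, improved=True))
```

Without the scaling, `X` in the tiny two-node component scores as high as `C`, which is close to everything in the larger component. The scaled variant ranks `C` well above `X`.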
 
Harmonic Centrality
CALL algo.closeness.harmonic.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC 
 
Betweenness Centrality
CALL algo.betweenness.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC
 
WITH ["James", "Michael", "Alice", "Doug", "Amy"] AS existingUsers
MATCH (existing:User) WHERE existing.id IN existingUsers
MERGE (newUser:User {id: "Jason"})
MERGE (newUser)<-[:FOLLOWS]-(existing)
MERGE (newUser)-[:FOLLOWS]->(existing)
 
 
MATCH (user:User {id: "Jason"}) DETACH DELETE user
 
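Betweenness Centrality variant: RA-Brandes
RA-Brandes (Randomized-Approximate Brandes) is a more efficient, approximate betweenness centrality: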
 
 CALL algo.betweenness.sampled.stream("User", "FOLLOWS", {strategy:"degree"})
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC 
 
 
PageRank
 
CALL algo.pageRank.stream('User', 'FOLLOWS', {iterations:20, dampingFactor:0.85})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).id AS page, score
ORDER BY score DESC 
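The `iterations` and `dampingFactor` parameters can be illustrated with a small Python sketch of the classic iterative formula PR(u) = (1 - d) + d * sum(PR(v) / outdegree(v)) over the nodes v that link to u. This is a simplified sketch and not the library's implementation. The initial score of 1.0, the handling of nodes without outgoing edges, and the `edges` data are all assumptions made for illustration.

```python
def pagerank(edges, iterations=20, damping=0.85):
    """Iterative PageRank on a list of directed (source, destination) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {node: 0 for node in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    scores = {node: 1.0 for node in nodes}  # assumed starting value
    for _ in range(iterations):
        incoming = {node: 0.0 for node in nodes}
        for src, dst in edges:
            # each node splits its current score evenly among its out-links
            incoming[dst] += scores[src] / out_degree[src]
        scores = {node: (1 - damping) + damping * incoming[node] for node in nodes}
    return scores


# Hypothetical FOLLOWS edges; invented data.
edges = [
    ("Alice", "Doug"),
    ("Bob", "Doug"),
    ("Carol", "Doug"),
    ("Doug", "Alice"),
    ("Doug", "Bob"),
]

scores = pagerank(edges, iterations=20, damping=0.85)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, round(score, 3))
```

Doug is followed by everyone and ends up with the highest score. Alice and Bob inherit part of Doug's score, and Carol, whom nobody follows, stays at the baseline of 1 - d.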
 
articleRank
 
Chapter 6: Community Detection Algorithms
LOAD CSV WITH HEADERS FROM 'file:///sw-nodes.csv' AS row 
MERGE (:Library {id: row.id})
 
LOAD CSV WITH HEADERS FROM 'file:///sw-relationships.csv' AS row 
MATCH (source:Library {id: row.src}) 
MATCH (destination:Library {id: row.dst}) 
MERGE (source)-[:DEPENDS_ON]->(destination)
 
Triangle Count
CALL algo.triangle.stream("Library","DEPENDS_ON") 
YIELD nodeA, nodeB, nodeC 
RETURN algo.getNodeById(nodeA).id AS nodeA, 
algo.getNodeById(nodeB).id AS nodeB, 
algo.getNodeById(nodeC).id AS nodeC
 
Local Clustering Coefficient
CALL algo.triangleCount.stream('Library', 'DEPENDS_ON')
YIELD nodeId, triangles, coefficient
WHERE coefficient > 0
RETURN algo.getNodeById(nodeId).id AS library, coefficient
ORDER BY coefficient DESC
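The `triangles` and `coefficient` columns can be understood through a short Python sketch. It treats the graph as undirected and uses the usual definition of the local clustering coefficient, triangles / (k * (k - 1) / 2) for a node of degree k. The edge list and the `triangle_stats` name are hypothetical.

```python
from itertools import combinations


def triangle_stats(edges):
    """Return {node: (triangle count, local clustering coefficient)}."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    stats = {}
    for node, nbrs in neighbors.items():
        # a triangle exists for every pair of neighbors that are themselves connected
        triangles = sum(
            1 for u, v in combinations(sorted(nbrs), 2) if v in neighbors[u]
        )
        degree = len(nbrs)
        possible = degree * (degree - 1) / 2
        coefficient = triangles / possible if possible else 0.0
        stats[node] = (triangles, coefficient)
    return stats


# Hypothetical dependency edges; invented data.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
stats = triangle_stats(edges)
print(stats)
```

Node `a` has a coefficient of 1.0 because its two neighbors are connected to each other. Node `c` takes part in the same triangle but has three neighbors, so only one of its three possible neighbor pairs is connected.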
 
Strongly Connected Components (deterministic algorithm)
CALL algo.scc.stream("Library", "DEPENDS_ON")
YIELD nodeId, partition
RETURN partition, collect(algo.getNodeById(nodeId)) AS libraries
ORDER BY size(libraries) DESC
 
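Connected Components (deterministic algorithm)
The Connected Components algorithm finds sets of connected nodes in an undirected graph, where each node is reachable from any other node in the same set. It differs from the SCC algorithm because it only needs a path to exist between pairs of nodes in one direction, whereas SCC needs a path to exist in both directions.
   When to use:
Like SCC, Connected Components is often used early in an analysis to understand a graph's structure. Because it scales efficiently, consider this algorithm for graphs that need frequent updates. It can quickly show new nodes in common between groups, which is useful for analyses such as fraud detection.
Make it a habit to run Connected Components first to test whether a graph is connected, as a preparatory step for general graph analysis. This quick test can avoid running algorithms on just one disconnected component of a graph and ending up with incorrect results.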
CALL algo.unionFind.stream("Library", "DEPENDS_ON")
YIELD nodeId,setId
RETURN setId, collect(algo.getNodeById(nodeId)) AS libraries
ORDER BY size(libraries) DESC
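The procedure name `unionFind` refers to the disjoint-set data structure, which can be sketched in Python as follows. This is a generic illustration of the technique and makes no claim about the library's internals. The node and edge data are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into components with a union-find (disjoint-set) structure."""
    parent = {node: node for node in nodes}

    def find(node):
        # follow parent pointers to the set representative, halving the path on the way
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical libraries and dependencies; direction is ignored on purpose.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]

components = connected_components(nodes, edges)
is_connected = len(components) == 1
print(components, is_connected)
```

The `is_connected` check is the quick pre-flight test described above: if it is false, later algorithms would be working on separate islands.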
 
 
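Label Propagation Algorithm (LPA)
When to use:
  Use Label Propagation in large-scale networks for initial community detection, especially when weights are available.
The algorithm can be parallelized, so it is very fast at graph partitioning.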
CALL algo.labelPropagation.stream("Library", "DEPENDS_ON",
{ iterations: 10 })
YIELD nodeId, label
RETURN label,
collect(algo.getNodeById(nodeId).id) AS libraries
ORDER BY size(libraries) DESC 
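We can also run the algorithm under the assumption that the graph is undirected, which means that nodes will try to adopt labels from the libraries they depend on as well as from the libraries that depend on them: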
CALL algo.labelPropagation.stream("Library", "DEPENDS_ON",
{ iterations: 10, direction: "BOTH" })
YIELD nodeId, label
RETURN label,
collect(algo.getNodeById(nodeId).id) AS libraries
ORDER BY size(libraries) DESC 
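The label-adoption idea can be sketched in Python for the undirected case. This is a simplified, deterministic illustration and not the library's algorithm. Nodes are visited in sorted order and ties are broken by taking the largest label, purely so the sketch is reproducible, whereas real implementations typically randomize both. The edge list and the `label_propagation` name are hypothetical.

```python
from collections import Counter


def label_propagation(edges, max_iterations=10):
    """Each node repeatedly adopts the most common label among its neighbors."""
    neighbors = {}
    for a, b in edges:  # undirected: labels flow both ways along an edge
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    labels = {node: node for node in neighbors}  # every node starts in its own community
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            # assumption: deterministic tie-break by largest label
            new_label = max(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged before using all iterations
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, []).append(node)
    return sorted(sorted(members) for members in communities.values())


# Hypothetical graph: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
communities = label_propagation(edges)
print(communities)
```

On this toy graph the two triangles settle into two communities. With a different visiting order or tie-break the result can differ, which is why the algorithm is usually described as non-deterministic.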
 
Louvain Modularity
CALL algo.louvain.stream("Library", "DEPENDS_ON")
YIELD nodeId, communities
RETURN algo.getNodeById(nodeId).id AS libraries, communities
This failed to run in Neo4j.
Streaming version:
CALL algo.louvain.stream("Library", "DEPENDS_ON")
YIELD nodeId, communities
WITH algo.getNodeById(nodeId) AS node, communities
SET node.communities= communities 
 
Find the final communities:
MATCH (l:Library)
RETURN l.communities[-1] AS community, collect(l.id) AS libraries
ORDER BY size(libraries) DESC
 
Look at an intermediate clustering:
MATCH (l:Library)
RETURN l.communities[0] AS community, collect(l.id) AS libraries
ORDER BY size(libraries) DESC
 
 
Overview of the algorithms
 
 

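Neo4j is a graph database, so the algorithms it offers are graph algorithms, which are used to measure graphs, nodes, and relationships.

In Neo4j, CALL algo.list() shows the list of available algorithms.

The official Neo4j documentation mainly covers the following groups of algorithms: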

1.   Centrality algorithms
      Centrality algorithms are mainly used to determine the importance of different nodes in a graph:
       (1) PageRank (algo.pageRank)
       (2) ArticleRank (algo.articleRank)
       (3) Betweenness Centrality (algo.betweenness)
       (4) Closeness Centrality (algo.closeness)
       (5) Harmonic Centrality (algo.closeness.harmonic)
       (6) Eigenvector Centrality (algo.eigenvector)
       (7) Degree Centrality (algo.degree)
       
2.   Community detection algorithms
      These evaluate how a group is clustered or partitioned, and its tendency to strengthen or break apart:
        (1) Louvain (algo.louvain)
        (2) Label Propagation (algo.labelPropagation)
        (3) Connected Components (algo.unionFind)
        (4) Strongly Connected Components (algo.scc)
        (5) Triangle Counting / Clustering Coefficient (algo.triangleCount)
        (6) Balanced Triads (algo.balancedTriads)
        
3.   Path finding algorithms
      These find the shortest path, or evaluate the availability and quality of routes:
        (1) Minimum Weight Spanning Tree (algo.spanningTree.minimum)
        (2) Shortest Path (algo.shortestPath)
        (3) Single Source Shortest Path (algo.shortestPath.deltaStepping)
        (4) All Pairs Shortest Path (algo.allShortestPaths)
        (5) A* (algo.shortestPath.astar)
        (6) Yen's K-shortest paths (algo.kShortestPaths)
        (7) Random Walk (algo.randomWalk)

4.   Similarity algorithms
      These compute the similarity between nodes:
        (1) Jaccard Similarity (algo.similarity.jaccard)
        (2) Cosine Similarity (algo.similarity.cosine)
        (3) Pearson Similarity (algo.similarity.pearson)
        (4) Euclidean Distance (algo.similarity.euclidean)
        (5) Overlap Similarity (algo.similarity.overlap)

5.   Link prediction algorithms
      The following algorithms help determine the closeness of a pair of nodes. The computed scores can then be used as part of a link prediction solution:
        (1) Adamic Adar (algo.linkprediction.adamicAdar)
        (2) Common Neighbors (algo.linkprediction.commonNeighbors)
        (3) Preferential Attachment (algo.linkprediction.preferentialAttachment)
        (4) Resource Allocation (algo.linkprediction.resourceAllocation)
        (5) Same Community (algo.linkprediction.sameCommunity)
        (6) Total Neighbors (algo.linkprediction.totalNeighbors)
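Several of these link prediction scores have simple textbook definitions, sketched here in Python. The formulas are the standard ones, and it is an assumption that the library matches them in every detail (for example the logarithm base in Adamic Adar). The graph data and function names are hypothetical.

```python
import math

# Hypothetical undirected graph as neighbor sets; invented data.
neighbors = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}


def common_neighbors(graph, x, y):
    """Number of neighbors shared by x and y."""
    return len(graph[x] & graph[y])


def total_neighbors(graph, x, y):
    """Number of distinct neighbors of x and y together."""
    return len(graph[x] | graph[y])


def preferential_attachment(graph, x, y):
    """Product of the two degrees."""
    return len(graph[x]) * len(graph[y])


def adamic_adar(graph, x, y):
    """Shared neighbors weighted by 1 / log(degree), so rare neighbors count more."""
    return sum(1 / math.log(len(graph[u])) for u in graph[x] & graph[y])


cn = common_neighbors(neighbors, "a", "b")
tn = total_neighbors(neighbors, "a", "b")
pa = preferential_attachment(neighbors, "a", "b")
aa = adamic_adar(neighbors, "a", "b")
print(cn, tn, pa, aa)
```

Higher scores suggest that the pair `a` and `b` is more likely to become linked. In practice the scores are combined as features rather than used one at a time.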

6.   Preprocessing functions and procedures
        Data preprocessing:
         (1) One Hot Encoding (algo.ml.oneHotEncoding)

Similarity computation

RETURN algo.similarity.cosine([3,8,7,5,2,9], [10,8,6,6,4,5]) AS similarity
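For reference, the same value can be computed by hand in Python using the standard definition of cosine similarity, the dot product divided by the product of the vector lengths. The `cosine_similarity` helper is a hypothetical name for this sketch and is independent of Neo4j.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same two vectors as in the Cypher call above.
similarity = cosine_similarity([3, 8, 7, 5, 2, 9], [10, 8, 6, 6, 4, 5])
print(similarity)  # roughly 0.864
```

Here the dot product is 219 and the squared norms are 232 and 277, so the result is 219 / sqrt(232 * 277).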

 
