Neo4j Graph Algorithms

哈哈和呵呵

Category: Neo4J

Article link: https://blog.csdn.net/wangbaosongmsn/article/details/107851350

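Relational databases are not well suited to modeling many-to-many relationships. Neo4j, by contrast, excels at them. Let's see how it handles the same dataset when users are modeled as nodes rather than as tables, columns, or foreign keys, and friendships are modeled as edges in the graph.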

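Graph traversal is the operation of visiting a set of nodes in a graph by moving between connected nodes. It is the basic data-retrieval operation in a graph database. A key property of traversal is that it is local: a traversal query only touches the data it needs, and does not have to run expensive grouping operations over the whole dataset the way a relational join does.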
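As a rough illustration of why traversal is local, the following Python sketch walks outward from one user over an in-memory adjacency map. The `friends` data and the `friends_within` helper are made up for this example and are not part of Neo4j; the point is only that nodes in an unrelated part of the graph (here `Zed` and `Yan`) are never visited.

```python
from collections import deque

def friends_within(graph, start, max_hops):
    """Breadth-first walk: nodes reachable from start in at most max_hops hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen

# Hypothetical friendship graph (undirected, stored in both directions).
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice"],
    "Dave": ["Bob", "Erin"],
    "Erin": ["Dave"],
    "Zed": ["Yan"],
    "Yan": ["Zed"],
}

print(friends_within(friends, "Alice", 2))
```

Only the neighborhood of `Alice` is explored, so the work grows with the size of that neighborhood rather than with the size of the whole graph.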

https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/

Chapter 4: Pathfinding and Graph Search Algorithms

LOAD CSV WITH HEADERS FROM 'file:///transport-nodes.csv' AS row
MERGE (place:Place {id: row.id})
SET place.latitude = toFloat(row.latitude),
    place.longitude = toFloat(row.longitude),
    place.population = toInteger(row.population);

LOAD CSV WITH HEADERS FROM 'file:///transport-relationships.csv' AS row
MATCH (origin:Place {id: row.src})
MATCH (destination:Place {id: row.dst})
MERGE (origin)-[:EROAD {distance: toInteger(row.cost)}]->(destination);

MATCH (source:Place {id: "Amsterdam"}),
      (destination:Place {id: "London"})
CALL algo.shortestPath.stream(source, destination, null)
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).id AS place, cost

MATCH (source:Place {id: "Amsterdam"}),
      (destination:Place {id: "London"})
CALL algo.shortestPath(source, destination, null)
YIELD writeMillis, loadMillis, nodeCount, totalCost
RETURN writeMillis, loadMillis, nodeCount, totalCost

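The figure below compares the different shortest paths; the results differ considerably (the image could not be uploaded):

An unweighted shortest path picks the route that visits the fewest nodes, which can be very useful in settings such as subway systems where fewer stops are desirable. In a driving scenario, however, we are more likely to be interested in the total cost of the weighted shortest path.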
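The difference between the two notions of "shortest" can be sketched in Python. The graph below is invented for illustration (nodes `A` to `D` with arbitrary distances, not the transport dataset), and both helpers assume the target is reachable from the source. Breadth-first search minimizes the number of hops, while Dijkstra's algorithm minimizes the summed edge weight.

```python
import heapq
from collections import deque

def _rebuild(parent, target):
    """Follow parent pointers back from target to the source."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def fewest_hops(graph, source, target):
    """Unweighted shortest path via breadth-first search."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return _rebuild(parent, target)

def lowest_cost(graph, source, target):
    """Weighted shortest path via Dijkstra's algorithm."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        if node == target:
            break
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return _rebuild(parent, target), dist[target]

# Hypothetical undirected road network: one long direct road, one short detour.
roads = {
    "A": {"B": 2, "D": 10},
    "B": {"A": 2, "C": 2},
    "C": {"B": 2, "D": 2},
    "D": {"A": 10, "C": 2},
}

print(fewest_hops(roads, "A", "D"))   # fewest stops
print(lowest_cost(roads, "A", "D"))   # smallest total distance
```

In this toy network the direct road wins on hop count but the three-hop detour wins on total distance, which mirrors the gap between the two routes discussed in the text.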

Unweighted route: over 700 km

MATCH (source:Place {id: "Amsterdam"}),
      (destination:Place {id: "London"})
CALL algo.shortestPath.stream(source, destination, null)
YIELD nodeId, cost
WITH collect(algo.getNodeById(nodeId)) AS path
UNWIND range(0, size(path)-1) AS index
WITH path[index] AS current, path[index+1] AS next
WITH current, next, [(current)-[r:EROAD]-(next) | r.distance][0] AS distance
WITH collect({current: current, next: next, distance: distance}) AS stops
UNWIND range(0, size(stops)-1) AS index
WITH stops[index] AS location, stops, index
RETURN location.current.id AS place,
       reduce(acc = 0.0,
              distance IN [stop IN stops[0..index] | stop.distance] |
              acc + distance) AS cost;

Weighted route: 453 km

MATCH (source:Place {id: "Amsterdam"}),
      (destination:Place {id: "London"})
CALL algo.shortestPath.stream(source, destination, "distance")
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).id AS place, cost

MATCH (source:Place {id: "Amsterdam"}),
      (destination:Place {id: "London"})
CALL algo.shortestPath.stream(source, destination, null)
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).id AS place, cost

This query returns the following output:

+--------------------+
| place       | cost |
+--------------------+
| "Amsterdam" | 0.0  |
| "Immingham" | 1.0  |
| "Doncaster" | 2.0  |
| "London"    | 3.0  |
+--------------------+

MATCH (source:Place {id: "Den Haag"}),
      (destination:Place {id: "London"})
CALL algo.shortestPath.astar.stream(source,
  destination, "distance", "latitude", "longitude")
YIELD nodeId, cost
RETURN algo.getNodeById(nodeId).id AS place, cost
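The query above passes latitude and longitude properties to the A* procedure. A plausible reading is that they feed a geographic heuristic; the Python sketch below shows that general idea with a haversine (great-circle) estimate. It is an illustration under assumptions, not the library's implementation: the coordinates and node names are invented, and edge weights are set equal to great-circle distances so that the heuristic never overestimates.

```python
import heapq
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def a_star(graph, coords, source, target):
    """A* search using straight-line distance to the target as the heuristic."""
    best = {source: 0.0}            # cheapest known cost from source
    parent = {source: None}
    heap = [(haversine_km(coords[source], coords[target]), source)]
    closed = set()
    while heap:
        _, node = heapq.heappop(heap)
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], best[target]
        if node in closed:
            continue
        closed.add(node)
        for neighbor, weight in graph[node].items():
            candidate = best[node] + weight
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                parent[neighbor] = node
                estimate = candidate + haversine_km(coords[neighbor], coords[target])
                heapq.heappush(heap, (estimate, neighbor))
    return None, math.inf

# Hypothetical places; edge weights are great-circle distances by construction.
coords = {"P": (0.0, 0.0), "Q": (0.0, 1.0), "R": (1.0, 1.0), "S": (0.0, 2.0)}
graph = {name: {} for name in coords}
for u, v in [("P", "Q"), ("Q", "S"), ("P", "R"), ("R", "S")]:
    graph[u][v] = graph[v][u] = haversine_km(coords[u], coords[v])

path, cost = a_star(graph, coords, "P", "S")
print(path, round(cost, 1))
```

The heuristic steers the search toward `Q`, which lies on the straight line to `S`, so the detour through `R` is never expanded.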

K shortest paths

MATCH (start:Place {id: "Gouda"}),
      (end:Place {id: "Felixstowe"})
CALL algo.kShortestPaths.stream(start, end, 5, "distance")
YIELD index, nodeIds, path, costs
RETURN index,
       [node IN algo.getNodesById(nodeIds[1..-1]) | node.id] AS via,
       reduce(acc = 0.0, cost IN costs | acc + cost) AS totalCost

CALL algo.allShortestPaths.stream(null)
YIELD sourceNodeId, targetNodeId, distance
WHERE sourceNodeId < targetNodeId
RETURN algo.getNodeById(sourceNodeId).id AS source,
       algo.getNodeById(targetNodeId).id AS target,
       distance
ORDER BY distance DESC
LIMIT 10

CALL algo.allShortestPaths.stream("distance")
YIELD sourceNodeId, targetNodeId, distance
WHERE sourceNodeId < targetNodeId
RETURN algo.getNodeById(sourceNodeId).id AS source,
       algo.getNodeById(targetNodeId).id AS target,
       distance
ORDER BY distance DESC
LIMIT 10

Single-source shortest path

MATCH (n:Place {id: "London"})
CALL algo.shortestPath.deltaStepping.stream(n, "distance", 1.0)
YIELD nodeId, distance
WHERE algo.isFinite(distance)
RETURN algo.getNodeById(nodeId).id AS destination, distance
ORDER BY distance

Let's see the MST algorithm in action. The following query finds a spanning tree starting from Amsterdam:

MATCH (n:Place {id: "Amsterdam"})
CALL algo.spanningTree.minimum("Place", "EROAD", "distance", id(n),
  {write: true, writeProperty: "mst"})
YIELD loadMillis, computeMillis, writeMillis, effectiveNodeCount
RETURN loadMillis, computeMillis, writeMillis, effectiveNodeCount

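View the minimum spanning tree (this gave trouble in Neo4j; the relationship type has to match the writeProperty value, "mst", used above):

MATCH path = (n:Place {id: "Amsterdam"})-[:mst*]-()
WITH relationships(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel AS rel
RETURN startNode(rel).id AS source, endNode(rel).id AS destination,
       rel.distance AS cost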
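For readers who want to see what a minimum spanning tree computation does, here is a small Python sketch of Prim's algorithm. It is not claimed to be what `algo.spanningTree.minimum` runs internally; the graph and the `prim_mst` name are invented for the example, and the graph is assumed to be undirected and connected.

```python
import heapq

def prim_mst(graph, start):
    """Grow a minimum spanning tree from start; returns (from, to, weight) edges."""
    visited = {start}
    heap = [(weight, start, neighbor) for neighbor, weight in graph[start].items()]
    heapq.heapify(heap)
    tree = []
    while heap:
        weight, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # edge would close a cycle
        visited.add(v)
        tree.append((u, v, weight))
        for neighbor, w in graph[v].items():
            if neighbor not in visited:
                heapq.heappush(heap, (w, v, neighbor))
    return tree

# Hypothetical weighted, undirected graph.
network = {
    "A": {"B": 1, "C": 3},
    "B": {"A": 1, "C": 2},
    "C": {"A": 3, "B": 2, "D": 4},
    "D": {"C": 4},
}

mst = prim_mst(network, "A")
print(mst, sum(weight for _, _, weight in mst))
```

The heavier `A`-`C` edge is skipped because `C` is already reached more cheaply through `B`.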

Chapter 5: Centrality Algorithms

Degree centrality: https://blog.csdn.net/name__student/article/details/90017910

CALL algo.degree(label:String, relationship:String,
  {write: true, writeProperty: 'degree', concurrency: 4})
YIELD nodes, loadMillis, computeMillis, writeMillis, write, writeProperty

CALL algo.degree.stream('User', 'FOLLOWS', {direction: 'incoming'})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).id AS name, score
ORDER BY score DESC
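What the incoming-degree query measures can be sketched in a few lines of Python: for a directed FOLLOWS edge list, count how many edges point at each user. The `follows` data and the `in_degree` helper are hypothetical and only serve the illustration.

```python
from collections import Counter

def in_degree(edges):
    """Count incoming edges per node; nodes with no followers get 0."""
    counts = Counter(dst for _, dst in edges)
    for src, _ in edges:
        counts.setdefault(src, 0)
    # Highest score first, ties broken by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical (follower, followed) pairs.
follows = [
    ("Alice", "Bob"),
    ("Carol", "Bob"),
    ("Bob", "Alice"),
    ("Dave", "Alice"),
    ("Erin", "Bob"),
]

for name, score in in_degree(follows):
    print(name, score)
```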

Weighted:

CALL algo.degree.stream('User', 'FOLLOWS', {direction: 'incoming', weightProperty: 'xxxx'})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).id AS name, score
ORDER BY score DESC

Import data

LOAD CSV WITH HEADERS FROM 'file:///social-nodes.csv' AS row
MERGE (:User {id: row.id});

LOAD CSV WITH HEADERS FROM 'file:///social-relationships.csv' AS row
MATCH (source:User {id: row.src})
MATCH (destination:User {id: row.dst})
MERGE (source)-[:FOLLOWS]->(destination);

Closeness centrality

CALL algo.closeness.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id, centrality
ORDER BY centrality DESC

The following query uses the Wasserman & Faust closeness centrality algorithm, which is more effective:

CALL algo.closeness.stream("User", "FOLLOWS", {improved: true})
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC
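To make the difference between the two closeness variants concrete, the Python sketch below computes both on a tiny invented graph with two disconnected components. It assumes the usual definitions: plain closeness is the number of reachable nodes divided by the sum of distances to them, and the Wasserman & Faust variant additionally multiplies by the fraction of the graph that is reachable. The function names are illustrative, not the library's.

```python
from collections import deque

def hop_distances(graph, source):
    """BFS hop counts from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[source]
    return dist

def closeness(graph, node, improved=False):
    dist = hop_distances(graph, node)
    if not dist:
        return 0.0  # isolated node
    score = len(dist) / sum(dist.values())
    if improved:
        # Wasserman & Faust: scale by the share of other nodes that are reachable.
        score *= len(dist) / (len(graph) - 1)
    return score

# Hypothetical graph: a path A-B-C and a separate pair D-E.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}

for name in graph:
    print(name, round(closeness(graph, name), 3), round(closeness(graph, name, improved=True), 3))
```

With plain closeness, `D` and `E` look maximally central even though they only reach each other; the scaled variant ranks them below the nodes of the larger component.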

Harmonic centrality

CALL algo.closeness.harmonic.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC
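Harmonic centrality sums the inverse distances to other nodes instead of inverting the sum, so unreachable nodes simply contribute zero. The Python sketch below uses the common normalized form (divide by the number of other nodes); whether the library applies exactly this normalization is an assumption here, and the graph is invented.

```python
from collections import deque

def harmonic_centrality(graph, source):
    """Sum of 1/distance over reachable nodes, divided by (n - 1)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(1.0 / d for n, d in dist.items() if n != source)
    return total / (len(graph) - 1)

# Hypothetical graph: a path A-B-C and a separate pair D-E.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}

for name in graph:
    print(name, harmonic_centrality(graph, name))
```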

Betweenness centrality

CALL algo.betweenness.stream("User", "FOLLOWS")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC

WITH ["James", "Michael", "Alice", "Doug", "Amy"] AS existingUsers
MATCH (existing:User) WHERE existing.id IN existingUsers
MERGE (newUser:User {id: "Jason"})
MERGE (newUser)<-[:FOLLOWS]-(existing)
MERGE (newUser)-[:FOLLOWS]->(existing)

MATCH (user:User {id: "Jason"}) DETACH DELETE user

Betweenness centrality variant: RA-Brandes

A more efficient betweenness centrality: RA-Brandes

CALL algo.betweenness.sampled.stream("User", "FOLLOWS", {strategy: "degree"})
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).id AS user, centrality
ORDER BY centrality DESC

PageRank

CALL algo.pageRank.stream('User', 'FOLLOWS', {iterations: 20, dampingFactor: 0.85})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).id AS page, score
ORDER BY score DESC
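The `iterations` and `dampingFactor` parameters correspond to the textbook power-iteration form of PageRank, which can be sketched in Python as follows. This is the normalized textbook variant (scores sum to 1, dangling nodes spread their rank evenly), so its absolute values are not expected to match the scores returned by the procedure; the edge list is invented.

```python
def pagerank(edges, iterations=20, damping=0.85):
    """Textbook PageRank by power iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank

# Hypothetical FOLLOWS edges (follower, followed).
edges = [("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]

scores = pagerank(edges)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, round(score, 4))
```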

articleRank

Chapter 6: Community Detection Algorithms

LOAD CSV WITH HEADERS FROM 'file:///sw-nodes.csv' AS row
MERGE (:Library {id: row.id});

LOAD CSV WITH HEADERS FROM 'file:///sw-relationships.csv' AS row
MATCH (source:Library {id: row.src})
MATCH (destination:Library {id: row.dst})
MERGE (source)-[:DEPENDS_ON]->(destination);

Triangle count

CALL algo.triangle.stream("Library", "DEPENDS_ON")
YIELD nodeA, nodeB, nodeC
RETURN algo.getNodeById(nodeA).id AS nodeA,
       algo.getNodeById(nodeB).id AS nodeB,
       algo.getNodeById(nodeC).id AS nodeC

Local clustering coefficient

CALL algo.triangleCount.stream('Library', 'DEPENDS_ON')
YIELD nodeId, triangles, coefficient
WHERE coefficient > 0
RETURN algo.getNodeById(nodeId).id AS library, coefficient
ORDER BY coefficient DESC
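The `triangles` and `coefficient` columns can be related with a short Python sketch. It uses the standard definition of the local clustering coefficient for an undirected graph: the number of links among a node's neighbors divided by the number of possible links, k(k-1)/2. The graph and the function name are made up for the example.

```python
from itertools import combinations

def triangles_and_coefficient(graph):
    """Per node: (triangles through the node, local clustering coefficient)."""
    result = {}
    for node, neighbors in graph.items():
        k = len(neighbors)
        # Every linked pair of neighbors closes one triangle through this node.
        links = sum(1 for a, b in combinations(sorted(neighbors), 2) if b in graph[a])
        coefficient = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        result[node] = (links, coefficient)
    return result

# Hypothetical undirected dependency graph (adjacency sets).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

for node, (count, coefficient) in triangles_and_coefficient(graph).items():
    print(node, count, round(coefficient, 3))
```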

Strongly connected components (deterministic algorithm)

CALL algo.scc.stream("Library", "DEPENDS_ON")
YIELD nodeId, partition
RETURN partition, collect(algo.getNodeById(nodeId)) AS libraries
ORDER BY size(libraries) DESC

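Connected components (deterministic algorithm)

The Connected Components algorithm finds sets of connected nodes in an undirected graph, where every node is reachable from any other node in the same set. It differs from the SCC algorithm in that it only requires a path between a pair of nodes in one direction, whereas SCC requires paths in both directions.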

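Use cases:

Like SCC, connected components is often used early in an analysis to understand the structure of a graph. Because it scales efficiently, consider this algorithm for graphs that need frequent updates. It can quickly reveal new nodes shared between groups, which is useful for analyses such as fraud detection.

Make it a habit to run connected components first to test whether a graph is connected, as a preparatory step for general graph analysis. This quick test avoids accidentally running an algorithm on a single isolated component of the graph, which would produce incorrect results.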

CALL algo.unionFind.stream("Library", "DEPENDS_ON")
YIELD nodeId, setId
RETURN setId, collect(algo.getNodeById(nodeId)) AS libraries
ORDER BY size(libraries) DESC
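The procedure name suggests a union-find (disjoint-set) structure. The Python sketch below shows that data structure grouping nodes into connected components; it is a generic illustration with invented library names, not the library's code. Because each new edge is merged with a couple of near-constant-time operations, the structure suits graphs that are updated often.

```python
def connected_components(nodes, edges):
    """Group nodes into components using union-find with path halving."""
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical libraries and dependency edges (direction is ignored).
libraries = ["lib_a", "lib_b", "lib_c", "lib_d", "lib_e", "lib_f"]
depends_on = [("lib_a", "lib_b"), ("lib_c", "lib_b"), ("lib_d", "lib_e")]

print(connected_components(libraries, depends_on))
```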

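Label Propagation Algorithm (LPA)

Use cases:

Use label propagation for initial community detection in large networks, especially when weights are available. The algorithm can be parallelized, so it is very fast at graph partitioning.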

CALL algo.labelPropagation.stream("Library", "DEPENDS_ON",
  {iterations: 10})
YIELD nodeId, label
RETURN label,
       collect(algo.getNodeById(nodeId).id) AS libraries
ORDER BY size(libraries) DESC

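We can also run the algorithm treating the graph as undirected, which means that nodes will try to adopt labels both from the libraries they depend on and from the libraries that depend on them: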

CALL algo.labelPropagation.stream("Library", "DEPENDS_ON",
  {iterations: 10, direction: "BOTH"})
YIELD nodeId, label
RETURN label,
       collect(algo.getNodeById(nodeId).id) AS libraries
ORDER BY size(libraries) DESC
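The core loop of label propagation on an undirected graph can be sketched in Python. Every node starts with its own label and repeatedly adopts the most frequent label among its neighbors. To keep the example reproducible, this sketch visits nodes in sorted order and breaks ties by taking the largest label; that tie-breaking rule is an assumption made for the demo, since implementations commonly break ties randomly and may therefore give different communities from run to run.

```python
from collections import Counter

def label_propagation(graph, iterations=10):
    """Asynchronous label propagation with deterministic tie-breaking."""
    labels = {node: node for node in graph}
    for _ in range(iterations):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[neighbor] for neighbor in graph[node])
            best = max(counts.values())
            new_label = max(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged before using all iterations
    return labels

# Hypothetical graph: two triangles joined by the single edge 3-4.
graph = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}

print(label_propagation(graph))
```

On this graph the two triangles settle on different labels, with the bridging edge left between the two communities.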

Louvain modularity

CALL algo.louvain.stream("Library", "DEPENDS_ON")
YIELD nodeId, communities
RETURN algo.getNodeById(nodeId).id AS libraries, communities

This failed to run in Neo4J.

Stream version:

CALL algo.louvain.stream("Library", "DEPENDS_ON")
YIELD nodeId, communities
WITH algo.getNodeById(nodeId) AS node, communities
SET node.communities = communities

Find the final communities:

MATCH (l:Library)
RETURN l.communities[-1] AS community, collect(l.id) AS libraries
ORDER BY size(libraries) DESC

Look at the intermediate clustering:

MATCH (l:Library)
RETURN l.communities[0] AS community, collect(l.id) AS libraries
ORDER BY size(libraries) DESC

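Community detection algorithms

Neo4j is a graph database, so the algorithms involved are graph algorithms, which are used to measure graphs, nodes, and relationships.

In Neo4j, CALL algo.list() shows the list of available algorithms.

The official Neo4j documentation covers the following groups of procedures: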

1. Centrality algorithms
    Centrality algorithms are mainly used to determine the importance of different nodes in a graph:
        (1) PageRank (algo.pageRank)
        (2) ArticleRank (algo.articleRank)
        (3) Betweenness Centrality (algo.betweenness)
        (4) Closeness Centrality (algo.closeness)
        (5) Harmonic Centrality (algo.closeness.harmonic)
        (6) Eigenvector Centrality (algo.eigenvector)
        (7) Degree Centrality (algo.degree)
　　　　　　　
2. Community detection algorithms
    These evaluate how a group is clustered or partitioned, and its tendency to strengthen or break apart:
        (1) Louvain (algo.louvain)
        (2) Label Propagation (algo.labelPropagation)
        (3) Connected Components (algo.unionFind)
        (4) Strongly Connected Components (algo.scc)
        (5) Triangle Counting / Clustering Coefficient (algo.triangleCount)
        (6) Balanced Triads (algo.balancedTriads)
　　　　　　　　
3. Path finding algorithms
    These find the shortest path, or evaluate the availability and quality of paths:
        (1) Minimum Weight Spanning Tree (algo.mst)
        (2) Shortest Path (algo.shortestPath)
        (3) Single Source Shortest Path (algo.shortestPath.deltaStepping)
        (4) All Pairs Shortest Path (algo.allShortestPaths)
        (5) A* (A star, algo.shortestPath.astar)
        (6) Yen's K-shortest paths (algo.kShortestPaths)
        (7) Random Walk (algo.randomWalk)

4. Similarity algorithms
    These compute the similarity between nodes:
        (1) Jaccard Similarity (algo.similarity.jaccard)
        (2) Cosine Similarity (algo.similarity.cosine)
        (3) Pearson Similarity (algo.similarity.pearson)
        (4) Euclidean Distance (algo.similarity.euclidean)
        (5) Overlap Similarity (algo.similarity.overlap)

5. Link prediction algorithms
    The following algorithms help determine how close a pair of nodes is. The computed scores can then be used as part of a link prediction solution:
        (1) Adamic Adar (algo.linkprediction.adamicAdar)
        (2) Common Neighbors (algo.linkprediction.commonNeighbors)
        (3) Preferential Attachment (algo.linkprediction.preferentialAttachment)
        (4) Resource Allocation (algo.linkprediction.resourceAllocation)
        (5) Same Community (algo.linkprediction.sameCommunity)
        (6) Total Neighbors (algo.linkprediction.totalNeighbors)

6. Preprocessing functions and procedures
    Data preprocessing:
        (1) One Hot Encoding (algo.ml.oneHotEncoding)

Similarity computation

RETURN algo.similarity.cosine([3,8,7,5,2,9], [10,8,6,6,4,5]) AS similarity
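For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The Python sketch below applies that formula to the same two lists used in the query above; the `cosine_similarity` helper is written for this example only.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

similarity = cosine_similarity([3, 8, 7, 5, 2, 9], [10, 8, 6, 6, 4, 5])
print(round(similarity, 4))
```

Here the dot product is 219 and the norms are the square roots of 232 and 277, so the result is roughly 0.864.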
