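An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.
Intuitively this doesn't seem to make best use of the graph model, as it means that you have to evaluate many relationships and nodes that you're not interested in.
However, I'd never actually tested the performance difference between the approaches, so I thought I'd try it out.
I created 4 different databases, each of which had one node with 60,000 outgoing relationships: 10,000 which we wanted to retrieve and 50,000 that were irrelevant.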
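To make the intuition concrete, here is a minimal sketch of that data set as a toy in-memory model. It is not how Neo4j stores relationships, and the `HAS_OTHER` type, the `end_label` key and the function names are hypothetical names chosen for illustration. It only counts how many relationships each modelling style has to examine to find the 10,000 wanted ones.

```python
from collections import defaultdict


def build_relationships(wanted=10_000, unrelated=50_000):
    """Build two toy adjacency maps for one node: relationship type -> list of rels."""
    specific = defaultdict(list)  # one bucket per specific type
    generic = defaultdict(list)   # everything lumped under the generic HAS type
    for _ in range(wanted):
        specific["HAS_ADDRESS"].append({"end_label": "Address"})
        generic["HAS"].append({"end_label": "Address"})
    for _ in range(unrelated):
        # HAS_OTHER / Other are made-up stand-ins for the irrelevant relationships
        specific["HAS_OTHER"].append({"end_label": "Other"})
        generic["HAS"].append({"end_label": "Other"})
    return specific, generic


def count_specific(rels_by_type):
    """Specific type: only the HAS_ADDRESS bucket is examined."""
    examined = rels_by_type["HAS_ADDRESS"]
    return len(examined), len(examined)  # (matches, relationships examined)


def count_generic(rels_by_type):
    """Generic type: every HAS relationship is examined, then filtered by end label."""
    matches = 0
    examined = 0
    for rel in rels_by_type["HAS"]:
        examined += 1
        if rel["end_label"] == "Address":
            matches += 1
    return matches, examined


specific, generic = build_relationships()
print("specific:", count_specific(specific))
print("generic: ", count_generic(generic))
```

Both styles find the same matches, but in this toy model the generic style has to look at every relationship on the node before it can discard the ones it doesn't need.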
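I modelled the 'has address' relationship in 4 different ways...
- Using a specific relationship type
(node)-[:HAS_ADDRESS]->(address)
- Using a generic relationship type and then filtering by end node label
(node)-[:HAS]->(address:Address)
- Using a generic relationship type and then filtering by relationship property
(node)-[:HAS {type: "address"}]->(address)
- Using a generic relationship type and then filtering by end node property
(node)-[:HAS]->(address {type: "address"})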
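...and then measured how long it took to retrieve the 'has address' relationships.
The code is on github if you want to take a look.
Although it's obviously not as precise as a JMH micro benchmark, I think it's good enough to get a feel for the difference between the approaches.
I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in milliseconds):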
Using a generic relationship type and then filtering by end node label
50%ile: 6.0 75%ile: 6.0 99%ile: 402.60999999999825
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0 75%ile: 22.0 99%ile: 504.85999999999785
Using a generic relationship type and then filtering by end node label
50%ile: 4.0 75%ile: 4.0 99%ile: 145.65999999999931
Using a specific relationship type
50%ile: 0.0 75%ile: 1.0 99%ile: 25.749999999999872
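For readers who want to reproduce this kind of measurement, the following is a minimal sketch of timing a query repeatedly and reporting percentiles. It is not the benchmark code from the post: `time_query`, `run_query` and the linear-interpolation `percentile` helper are assumptions made for illustration, and the interpolation rule may differ from the one used to produce the numbers above.

```python
import time


def percentile(sorted_values, pct):
    """Percentile of an already sorted list, using linear interpolation between ranks."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = (pct / 100.0) * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def time_query(run_query, runs=100):
    """Call run_query `runs` times; return the 50th/75th/99th percentile in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical callable that executes one query
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return {p: percentile(timings, p) for p in (50, 75, 99)}


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs the real query.
    print(time_query(lambda: sum(range(10_000))))
```

With only 100 samples, the 99th percentile is dominated by the slowest one or two runs, which is likely why it sits so far above the median in the results (for example, warm-up on the first execution).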
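We can drill into why there's a difference in the times for each of the approaches by profiling the equivalent Cypher query. We'll start with the one which uses a specific relationship name:
Using a specific relationship type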
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS_ADDRESS]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+-----------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+-----------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| SimplePatternMatcher | 10000 | 10000 | n, UNNAMED53, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+-----------------------+
Total database accesses: 10001
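Here we can see that there were 10,001 database accesses in order to get a count of our 10,000 HAS_ADDRESS relationships. We get a database access each time we load a node, relationship or property.
By contrast, the other approaches have to load in a lot more data, only to then filter it out:
Using a generic relationship type and then filtering by end node label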
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->(:Address) return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+----------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10000 | 10000 | | hasLabel( UNNAMED45:Address(0)) |
| SimplePatternMatcher | 10000 | 60000 | n, UNNAMED45, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+----------------------------------+
Total database accesses: 70001
Using a generic relationship type and then filtering by relationship property
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS {type: "address"}]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10000 | 20000 | | Property( UNNAMED35,type(0)) == { AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n, UNNAMED63, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
Total database accesses: 140001
Using a generic relationship type and then filtering by end node property
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->({type: "address"}) return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10000 | 20000 | | Property( UNNAMED45,type(0)) == { AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n, UNNAMED45, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
Total database accesses: 140001
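The four profiles are easier to compare side by side. The short sketch below simply sums the per-operator DbHits reported in the plans above and expresses each total relative to the specific relationship type; the dictionary keys are just descriptive labels.

```python
# DbHits per operator, copied from the PROFILE output of each query.
db_hits = {
    "specific relationship type": [10000, 1],
    "generic type + end node label": [10000, 60000, 1],
    "generic type + relationship property": [20000, 120000, 1],
    "generic type + end node property": [20000, 120000, 1],
}

totals = {name: sum(hits) for name, hits in db_hits.items()}
baseline = totals["specific relationship type"]

for name, total in totals.items():
    print(f"{name}: {total} db hits ({total / baseline:.1f}x the specific type)")
```

On these numbers, filtering by label costs roughly 7 times as many database accesses as the specific relationship type, and filtering by either kind of property roughly 14 times as many.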
So in summary...specific relationships #ftw!
Original article: https://www.javacodegeeks.com/2014/10/neo4j-genericvague-relationship-names.html