最近有人问我如何使用Neo4j的LOAD CSV工具在CSV文件的列中处理值的“数组”,尽管我最初认为这不可能,因为每个单元格都被视为字符串, 迈克尔向我展示了一种方法我认为这很整洁。
假设我们有一个代表人和他们的朋友的CSV文件。 它可能看起来像这样:
name,friends
"Mark","Michael,Peter"
"Michael","Peter,Kenny"
"Kenny","Anders,Michael"
我们想要的是具有以下节点:
- 标记
- 麦可
- 彼得
- 肯尼
- 安德斯
与以下朋友的关系:
- 马克->迈克尔
- 马可->彼得
- 迈克尔->彼得
- 迈克尔->肯尼
- 肯尼->安德斯
- 肯尼->迈克尔
我们将从加载CSV文件并返回每一行开始:
$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row RETURN row;
+------------------------------------------------+
| row |
+------------------------------------------------+
| {name -> "Mark", friends -> "Michael,Peter"} |
| {name -> "Michael", friends -> "Peter,Kenny"} |
| {name -> "Kenny", friends -> "Anders,Michael"} |
+------------------------------------------------+
3 rows
正如预期的那样,“朋友”列被视为字符串,这意味着我们可以使用split函数获取要与之成为朋友的人的数组:
$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row RETURN row, split(row.friends, ",") AS friends;
+-----------------------------------------------------------------------+
| row | friends |
+-----------------------------------------------------------------------+
| {name -> "Mark", friends -> "Michael,Peter"} | ["Michael","Peter"] |
| {name -> "Michael", friends -> "Peter,Kenny"} | ["Peter","Kenny"] |
| {name -> "Kenny", friends -> "Anders,Michael"} | ["Anders","Michael"] |
+-----------------------------------------------------------------------+
3 rows
现在我们已经将它们作为数组了,我们可以使用UNWIND来获取要创建的朋友对:
$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row WITH row, split(row.friends, ",") AS friends UNWIND friends AS friend RETURN row.name, friend;
+-----------------------+
| row.name | friend |
+-----------------------+
| "Mark" | "Michael" |
| "Mark" | "Peter" |
| "Michael" | "Peter" |
| "Michael" | "Kenny" |
| "Kenny" | "Anders" |
| "Kenny" | "Michael" |
+-----------------------+
6 rows
现在,我们将介绍一些MERGE语句来创建适当的节点和关系:
$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row WITH row, split(row.friends, ",") AS friends UNWIND friends AS friend MERGE (p1:Person {name: row.name}) MERGE (p2:Person {name: friend}) MERGE (p1)-[:FRIENDS_WITH]->(p2);
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 5
Relationships created: 6
Properties set: 5
Labels added: 5
373 ms
现在,如果我们查询数据库以获取所有节点和关系…
$ match (p1:Person)-[r]->(p2) RETURN p1,r, p2;
+------------------------------------------------------------------------+
| p1 | r | p2 |
+------------------------------------------------------------------------+
| Node[0]{name:"Mark"} | :FRIENDS_WITH[0]{} | Node[1]{name:"Michael"} |
| Node[0]{name:"Mark"} | :FRIENDS_WITH[1]{} | Node[2]{name:"Peter"} |
| Node[1]{name:"Michael"} | :FRIENDS_WITH[2]{} | Node[2]{name:"Peter"} |
| Node[1]{name:"Michael"} | :FRIENDS_WITH[3]{} | Node[3]{name:"Kenny"} |
| Node[3]{name:"Kenny"} | :FRIENDS_WITH[4]{} | Node[4]{name:"Anders"} |
| Node[3]{name:"Kenny"} | :FRIENDS_WITH[5]{} | Node[1]{name:"Michael"} |
+------------------------------------------------------------------------+
6 rows
…您会看到我们拥有一切。 如果不是用逗号分隔的人员列表,我们在单元格中有一个文字数组…
name,friends
"Mark", "[Michael,Peter]"
"Michael", "[Peter,Kenny]"
"Kenny", "[Anders,Michael]"
…我们需要调整查询的一部分,以提取我们的朋友以剥离第一个和最后一个字符:
$ load csv with headers from "file:/Users/markneedham/Desktop/friendsa.csv" AS row RETURN row, split(substring(row.friends, 1, length(row.friends) -2), ",") AS friends;
+-------------------------------------------------------------------------+
| row | friends |
+-------------------------------------------------------------------------+
| {name -> "Mark", friends -> "[Michael,Peter]"} | ["Michael","Peter"] |
| {name -> "Michael", friends -> "[Peter,Kenny]"} | ["Peter","Kenny"] |
| {name -> "Kenny", friends -> "[Anders,Michael]"} | ["Anders","Michael"] |
+-------------------------------------------------------------------------+
3 rows
然后,如果我们将整个查询放在一起,我们将得到以下结果:
$ load csv with headers from "file:/Users/markneedham/Desktop/friendsa.csv" AS row WITH row, split(substring(row.friends, 1, length(row.friends) -2), ",") AS friends UNWIND friends AS friend MERGE (p1:Person {name: row.name}) MERGE (p2:Person {name: friend}) MERGE (p1)-[:FRIENDS_WITH]->(p2);;
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 5
Relationships created: 6
Properties set: 5
Labels added: 5