Companion video course for this post: Natural Language Processing and Knowledge Graphs
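Importing CSV files
Since version 2.2, Neo4j has shipped with a bulk data import tool, neo4j-import, which supports parallel, scalable import of large datasets. Each import must create a new database, and nodes and relationships must be supplied in separate CSV files. In the 3.5 release used below, the tool is invoked as neo4j-admin import.
Official CSV import example: click "CSV header format" to see the layout of the CSV files.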
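Notes
- Neo4j 3.x can still only serve one database per startup. The database is selected in the configuration file under the conf folder: dbms.active_database=neo4j.db
- In a node file, each header column has the form name:field_type. If no type is given, it defaults to string. Besides ordinary properties, a node file must also include an ID column and a label column (:LABEL).
- --database=neo4j.db: the database name must carry the .db suffix, and the dbms.active_database=neo4j.db setting in the conf configuration (the database accessed at startup) must match the name of the database created by the import.
- The files passed to --nodes and --relationships must be given as absolute paths.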
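The requirement that dbms.active_database matches the imported database name can be checked before starting the server. The following is a minimal sketch in Python, assuming the configuration is plain key=value text with # comments. The helper name read_active_database and the sample text are illustrative and not part of Neo4j. The fallback value graph.db is the 3.x default database name.

```python
def read_active_database(conf_text, default="graph.db"):
    """Return the dbms.active_database value found in neo4j.conf-style text.

    Assumption: one key=value pair per line, '#' starts a comment line.
    If the key is absent, the 3.x default name 'graph.db' is returned.
    """
    value = default
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out settings
        key, sep, val = line.partition("=")
        if sep and key.strip() == "dbms.active_database":
            value = val.strip()  # a later occurrence overrides an earlier one
    return value


# Hypothetical excerpt of conf/neo4j.conf, for illustration only.
SAMPLE_CONF = """
# The name of the database to mount
#dbms.active_database=graph.db
dbms.active_database=neo4j.db
"""

imported_name = "neo4j.db"  # the value passed to --database
active_name = read_active_database(SAMPLE_CONF)
print(active_name == imported_name)  # True
```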
Source data for the import
movies.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie
tt0242653,"The Matrix Revolutions",2003,Movie
actors.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
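To make the name:field_type convention in the two header rows concrete, here is a small Python sketch that splits a header into (name, type) pairs and applies the string default. The function parse_header is a hypothetical helper written for this post. It assumes the header contains no quoted commas.

```python
def parse_header(header_line):
    """Split a CSV header row into (name, field_type) pairs.

    A column without an explicit type is treated as 'string'.
    Assumption: column names contain no commas or quotes.
    """
    fields = []
    for column in header_line.split(","):
        name, sep, field_type = column.partition(":")
        if not (sep and field_type):
            field_type = "string"  # no ':type' suffix, so use the default
        fields.append((name, field_type))
    return fields


print(parse_header("movieId:ID,title,year:int,:LABEL"))
# [('movieId', 'ID'), ('title', 'string'), ('year', 'int'), ('', 'LABEL')]
print(parse_header("personId:ID,name,:LABEL"))
# [('personId', 'ID'), ('name', 'string'), ('', 'LABEL')]
```

The empty name in front of LABEL shows that this column carries the node label rather than a named property.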
There are three mandatory fields for relationship data:
- :START_ID: ID referring to a node.
- :END_ID: ID referring to a node.
- :TYPE: The relationship type.
roles.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
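Since every :START_ID and :END_ID has to refer to a node ID, it can be worth checking the three files against each other before running the import. The sketch below does this in Python on the data shown above. The helper names are hypothetical. It assumes all node IDs share a single ID space, as they do here because the headers define no ID groups.

```python
import csv
import io

MOVIES = """movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie
tt0242653,"The Matrix Revolutions",2003,Movie
"""
ACTORS = """personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
"""
ROLES = """:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
"""


def read_rows(csv_text):
    """Parse CSV text into a header row and a list of data rows."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return rows[0], rows[1:]


def column_index(header, field_type):
    """Index of the column whose type (text after the last ':') matches."""
    for index, column in enumerate(header):
        if column.rpartition(":")[2] == field_type:
            return index
    raise ValueError(f"no column of type {field_type!r} in {header}")


def node_ids(csv_text):
    """Collect the values of the :ID column of a node file."""
    header, rows = read_rows(csv_text)
    id_col = column_index(header, "ID")
    return {row[id_col] for row in rows}


def dangling_relationships(csv_text, known_ids):
    """Return relationship rows whose start or end ID is not a known node."""
    header, rows = read_rows(csv_text)
    start_col = column_index(header, "START_ID")
    end_col = column_index(header, "END_ID")
    return [row for row in rows
            if row[start_col] not in known_ids or row[end_col] not in known_ids]


all_ids = node_ids(MOVIES) | node_ids(ACTORS)
print(len(all_ids))                            # 6
print(dangling_relationships(ROLES, all_ids))  # []
```

An empty list means every row in roles.csv points at nodes that exist in movies.csv or actors.csv.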
Importing the linked data
C:\Users\hong>neo4j-admin import --database=neo4j.db ^
--nodes=D:\hadoop\neo4j-community-3.5.28\import\movies.csv ^
--nodes=D:\hadoop\neo4j-community-3.5.28\import\actors.csv ^
--relationships=D:\hadoop\neo4j-community-3.5.28\import\roles.csv
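When the import is scripted, the argument list can be assembled programmatically so that every file is passed as an absolute path. The following Python sketch uses only the three options that appear in the command above (--database, --nodes, --relationships). The function build_import_command and the relative file names are illustrative assumptions, and the sketch only builds the command without running it.

```python
import os


def build_import_command(database, node_files, relationship_files):
    """Assemble a neo4j-admin import argument list with absolute file paths."""
    command = ["neo4j-admin", "import", f"--database={database}"]
    # os.path.abspath resolves relative names against the current directory.
    command += [f"--nodes={os.path.abspath(path)}" for path in node_files]
    command += [f"--relationships={os.path.abspath(path)}"
                for path in relationship_files]
    return command


# Hypothetical relative paths; the post itself uses D:\hadoop\... locations.
cmd = build_import_command(
    "neo4j.db",
    ["import/movies.csv", "import/actors.csv"],
    ["import/roles.csv"],
)
print(" ".join(cmd))
```

Keeping the arguments in a list rather than one long string avoids quoting problems if a path contains spaces.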
The printed output shows that the import succeeded:
C:\Users\hong>neo4j-admin import --database=neo4j.db --nodes=D:\hadoop\neo4j-community-3.5.28\import\movies.csv --nodes=D:\hadoop\neo4j-community-3.5.28\import\actors.csv --relationships=D:\hadoop\neo4j-community-3.5.28\import\roles.csv
Neo4j version: 3.5.28
Importing the contents of these files into D:\hadoop\neo4j-community-3.5.28\data\databases\neo4j.db:
Nodes:
D:\hadoop\neo4j-community-3.5.28\import\movies.csv
D:\hadoop\neo4j-community-3.5.28\import\actors.csv
Relationships:
D:\hadoop\neo4j-community-3.5.28\import\roles.csv
Available resources:
Total machine memory: 13.98 GB
Free machine memory: 6.59 GB
Max heap memory : 3.11 GB
Processors: 8
Configured max memory: 9.78 GB
High-IO: false
Import starting 2022-01-19 20:16:13.582+0800
Estimated number of nodes: 6.00
Estimated number of node properties: 15.00
Estimated number of relationships: 9.00
Estimated number of relationship properties: 9.00
Estimated disk space usage: 1.19 kB
Estimated required memory usage: 1020.01 MB
InteractiveReporterInteractions command list (end with ENTER):
c: Print more detailed information about current stage
i: Print more detailed information
(1/4) Node import 2022-01-19 20:16:13.621+0800
Estimated number of nodes: 6.00
Estimated disk space usage: 547.00 B
Estimated required memory usage: 1020.01 MB
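For this small dataset, the estimates in the log (6 nodes, 15 node properties, 9 relationships, 9 relationship properties) coincide with what a direct count of the CSV files gives. The Python sketch below reproduces that count. It reflects one reading of the numbers, not the importer's actual estimator: a column is counted as a property when it has a name before the colon, so movieId:ID counts while :LABEL, :START_ID, :END_ID and :TYPE do not. The helper name count_rows_and_properties is hypothetical.

```python
import csv
import io

MOVIES = """movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie
tt0242653,"The Matrix Revolutions",2003,Movie
"""
ACTORS = """personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
"""
ROLES = """:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
"""


def count_rows_and_properties(csv_text):
    """Return (data rows, property values) for one import file.

    Assumption: every named column holds a value in every row.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    # Named columns become properties; unnamed ones (':LABEL', ':TYPE', ...) do not.
    named_columns = [col for col in header if col.partition(":")[0]]
    return len(data), len(data) * len(named_columns)


movie_rows, movie_props = count_rows_and_properties(MOVIES)
actor_rows, actor_props = count_rows_and_properties(ACTORS)
print((movie_rows + actor_rows, movie_props + actor_props))  # (6, 15)
print(count_rows_and_properties(ROLES))                      # (9, 9)
```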
The query results in Neo4j are as follows: