Preface
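I have long wanted to keep notes on using the Neo4j APOC library, and since I need some of its features again, I have decided to start writing them down. APOC is an extension library for Neo4j; used well, it makes development considerably more convenient. All of the notes below use the Neo4j server edition on Linux with APOC 3.5.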
Latest official documentation: https://neo4j.com/docs/labs/apoc/current/
Documentation for 3.5: https://neo4j.com/docs/labs/apoc/3.5/
Jar downloads for all versions: https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/
Installation
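- Download the APOC jar that matches your Neo4j version (e.g. an APOC 3.5.x release for Neo4j 3.5) and put it in the $NEO4J_HOME/plugins directory
- Edit $NEO4J_HOME/conf/neo4j.conf and add the following settings (the algo.* entry is for the graph algorithms library and is included here as well; drop it if you don't need it)
dbms.security.procedures.unrestricted=apoc.*,algo.*
apoc.import.file.enabled=true
apoc.export.file.enabled=true
## The following setting is optional: it makes APOC follow Neo4j's own configuration, e.g. the import directory
##apoc.import.file.use_neo4j_config=true
- Restart Neo4j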
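Forgetting one of these settings is a common reason the import and export procedures fail. The sketch below is a hypothetical helper, not part of APOC or Neo4j. It reports which of the settings listed above are missing from a neo4j.conf text. It assumes simple `key=value` lines and compares values exactly, so `apoc.*, algo.*` with a space would still be reported. Remove the `algo.*` part from `REQUIRED_SETTINGS` if you do not use the algorithms library.

```python
# Hypothetical helper: report which APOC settings from the notes are missing in neo4j.conf.
REQUIRED_SETTINGS = {
    "dbms.security.procedures.unrestricted": "apoc.*,algo.*",
    "apoc.import.file.enabled": "true",
    "apoc.export.file.enabled": "true",
}


def missing_apoc_settings(conf_text: str) -> list[str]:
    """Return the required 'key=value' lines that are absent or have a different value."""
    present = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments ('#' in neo4j.conf) and anything that is not key=value.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        present[key.strip()] = value.strip()
    # Exact string comparison: this is a naive check, not a parser for Neo4j's config rules.
    return [f"{k}={v}" for k, v in REQUIRED_SETTINGS.items() if present.get(k) != v]


sample = """
dbms.security.procedures.unrestricted=apoc.*
apoc.import.file.enabled=true
#apoc.export.file.enabled=true
"""
print(missing_apoc_settings(sample))
```

In practice you would pass it the contents of `$NEO4J_HOME/conf/neo4j.conf` and add whatever it reports before restarting Neo4j.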
Data import
Whenever you use the import features, make sure the settings above have been added.
1. Import CSV: CALL apoc.load.csv
Config options:
name | default | description |
---|---|---|
skip | none | skip result rows |
limit | none | limit result rows |
header | true | indicates if file has a header |
sep | ',' | separator character or 'TAB' |
quoteChar | '"' | the char to use for quoted elements |
arraySep | ';' | array separator |
ignore | [] | which columns to ignore |
nullValues | [] | which values to treat as null, e.g. ['na'] |
mapping | {} | per-field mapping, keyed by field name (see below) |
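You need an input file to try these options. The sketch below uses Python's standard `csv` module to write a small hypothetical test.csv; the rows are made-up sample data. The file matches the defaults above. Fields are separated by `,` (sep). A value containing a comma is wrapped in `"` (quoteChar). A list is packed into one field with `;` (arraySep). An empty value is the kind of entry you might list in nullValues.

```python
import csv
import io

# Made-up sample rows; 'beverage' packs a list using ';', matching the default arraySep.
rows = [
    {"name": "Alice", "age": "34", "beverage": "tea;coffee"},
    {"name": "Bob, Jr.", "age": "", "beverage": "water"},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["name", "age", "beverage"],
    delimiter=",",      # corresponds to sep
    quotechar='"',      # corresponds to quoteChar; only fields that need it get quoted
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
text = buf.getvalue()
print(text)
```

Save `text` as test.csv in Neo4j's import directory to load it with apoc.load.csv.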
The mapping option lets you configure individual columns separately. Each entry supports:
name | default | description |
---|---|---|
type | none | 'int', 'string' etc. |
array | false | indicates if field is an array |
arraySep | ';' | separator for array |
name | none | rename field |
ignore | false | ignore/remove this field |
nullValues | [] | which values to treat as null, e.g. ['na'] |
CALL apoc.load.csv('test.csv', {skip:1, limit:1, header:true, ignore:['name'],
mapping:{
age: {type:'int'},
beverage: {array:true, arraySep:';', name:'drinks'}
}
})
YIELD lineNo, map, list
RETURN *;
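The sketch below is a rough Python analogue of what the call above asks for: header, skip, limit, ignore and a per-column mapping. It is for illustration only and is not how APOC is implemented. It handles only the 'int' type, array splitting, renaming and ignoring. The function name `load_csv_like` and the sample data are hypothetical.

```python
import csv
import io


def load_csv_like(text, skip=0, limit=None, ignore=(), mapping=None, sep=","):
    """Illustrative stand-in for apoc.load.csv's header/skip/limit/ignore/mapping options."""
    mapping = mapping or {}
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)  # first line is the header
    out = []
    for i, record in enumerate(reader):
        if i < skip:                      # skip data rows
            continue
        if limit is not None and len(out) >= limit:
            break                         # stop after 'limit' rows
        row = {}
        for field, value in record.items():
            spec = mapping.get(field, {})
            if field in ignore or spec.get("ignore"):
                continue                  # drop ignored columns
            if spec.get("array"):
                value = value.split(spec.get("arraySep", ";"))
            elif spec.get("type") == "int":
                value = int(value)
            row[spec.get("name", field)] = value   # optional rename
        out.append(row)
    return out


sample = "name,age,beverage\nAlice,34,tea;coffee\nBob,27,water;juice;milk\n"
rows = load_csv_like(
    sample, skip=1, limit=1, ignore=["name"],
    mapping={"age": {"type": "int"},
             "beverage": {"array": True, "arraySep": ";", "name": "drinks"}},
)
print(rows)  # [{'age': 27, 'drinks': ['water', 'juice', 'milk']}]
```

With skip:1 and limit:1, only the second data row survives. The name column is dropped, age becomes a number, and beverage is split into a list renamed to drinks.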
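When loading a large amount of data, commit in batches with apoc.periodic.iterate. The file URL is passed to the inner statements as the $url parameter through the params config (set $url beforehand, e.g. with :param in Neo4j Browser):
CALL apoc.periodic.iterate('
CALL apoc.load.csv($url) YIELD map AS row RETURN row
','
CREATE (p:Person) SET p = row
', {batchSize:10000, iterateList:true, parallel:true, params:{url:$url}});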
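The effect of batchSize can be pictured as chunking the stream of rows. The sketch below is a sequential Python analogue, not APOC's implementation. Each chunk of at most `batch_size` rows is handed over as one list (the idea behind iterateList:true) and counted as one commit. The sketch ignores parallel:true, which in APOC runs batches concurrently. The names `batches` and `run_iterate` are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def run_iterate(rows: Iterable[T], batch_size: int,
                apply_batch: Callable[[List[T]], None]) -> dict:
    """Apply apply_batch to each chunk, counting one 'commit' per chunk."""
    committed = 0
    total = 0
    for chunk in batches(rows, batch_size):
        apply_batch(chunk)   # stands in for running the inner statement on one batch
        committed += 1
        total += len(chunk)
    return {"batches": committed, "total": total}


created = []
stats = run_iterate(range(25_000), 10_000, created.extend)
print(stats)  # {'batches': 3, 'total': 25000}
```

Committing every 10,000 rows keeps each transaction small. This is why batched loading avoids the memory pressure of one huge transaction.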
Data export
1. Export to CSV
CALL apoc.export.csv.all("movies.csv", {})
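// Exports the whole database to CSV; the first argument is the file name. Nodes and relationships are all written into a single file, so this is rarely used.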
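If you do end up with such a combined file, you can separate node rows from relationship rows afterwards. The sketch below is a hypothetical Python helper. It assumes a column layout in which node rows fill `_id`/`_labels` and relationship rows fill `_start`/`_end`/`_type`. Check the header of your own export before relying on this. The sample content is hand-written, not real APOC output.

```python
import csv
import io


def split_export(text):
    """Split a combined CSV export into node rows and relationship rows.
    Assumes node rows have _id set and relationship rows have _start and _end set."""
    nodes, rels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("_id"):
            nodes.append(row)
        elif row.get("_start") and row.get("_end"):
            rels.append(row)
    return nodes, rels


# Hand-written sample in the assumed layout.
sample = (
    '"_id","_labels","name","title","_start","_end","_type"\n'
    '"0",":Person","Keanu Reeves",,,,\n'
    '"1",":Movie",,"The Matrix",,,\n'
    ',,,,"0","1","ACTED_IN"\n'
)
nodes, rels = split_export(sample)
print(len(nodes), len(rels))  # 2 1
```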
CALL apoc.export.csv.data
// Export selected nodes and relationships to CSV
MATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WITH collect(DISTINCT person) AS people, collect(DISTINCT movie) AS movies, collect(actedIn) AS actedInRels
CALL apoc.export.csv.data(people + movies, actedInRels, "movies-actedIn.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
CALL apoc.export.csv.query
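// Export the result of a Cypher query. This is the most commonly used approach; note that values in the output file are wrapped in double quotes.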
WITH "MATCH path = (person:Person)-[:DIRECTED]->(movie)
RETURN person.name AS name, person.born AS born,
movie.title AS title, movie.tagline AS tagline, movie.released AS released" AS query
CALL apoc.export.csv.query(query, "movies-directed.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data;
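Because the values are quoted, post-process the file with a real CSV parser rather than splitting on commas. The sketch below uses Python's `csv` module on a made-up sample with the column names from the query above. Naive splitting keeps the quotes and would also break on commas inside values. Every parsed value is a string, so numeric columns are converted explicitly. The `read_export` helper and its `int_columns` default are hypothetical.

```python
import csv
import io

# Made-up sample in the shape of the quoted export (values are illustrative).
exported = (
    '"name","born","title","tagline","released"\n'
    '"Lana Wachowski","1965","The Matrix","Welcome to the Real World","1999"\n'
)

# Naive splitting keeps the surrounding quotes as part of the value.
naive = exported.splitlines()[1].split(",")[0]
print(naive)  # "Lana Wachowski"


def read_export(text, int_columns=("born", "released")):
    """Parse a quoted CSV export and convert the given columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in int_columns:
            if row.get(col):
                row[col] = int(row[col])
        rows.append(row)
    return rows


rows = read_export(exported)
print(rows[0]["name"], rows[0]["born"])  # Lana Wachowski 1965
```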
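Integration with relational databases
With APOC, Neo4j can integrate with relational databases over JDBC.
1. MySQL
(1) Download the MySQL JDBC driver jar and put it in the plugins directory
(2) The first time you use it, load the JDBC driver: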
// With MySQL Connector/J 8.x the driver class is com.mysql.cj.jdbc.Driver instead
CALL apoc.load.driver("com.mysql.jdbc.Driver");
Count the rows in the products table:
WITH "jdbc:mysql://localhost:3306/northwind?user=root" as url
CALL apoc.load.jdbc(url,"products") YIELD row
RETURN count(*);
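The JDBC URL carries the host, port, database and connection properties in one string. When it is assembled from configuration values, a small helper avoids typos. The sketch below is a hypothetical Python function, not part of APOC. It percent-encodes the query values. Whether your driver version decodes such escapes the same way is an assumption worth checking. `useSSL` is used only as an example of an extra Connector/J property.

```python
from urllib.parse import quote, urlencode


def mysql_jdbc_url(host, port, database, **params):
    """Build a URL like jdbc:mysql://localhost:3306/northwind?user=root."""
    url = f"jdbc:mysql://{host}:{port}/{quote(database)}"
    if params:
        # quote_via=quote percent-encodes special characters (space becomes %20, not '+').
        url += "?" + urlencode(params, quote_via=quote)
    return url


print(mysql_jdbc_url("localhost", 3306, "northwind", user="root"))
# jdbc:mysql://localhost:3306/northwind?user=root
print(mysql_jdbc_url("localhost", 3306, "northwind", user="root", useSSL="false"))
# jdbc:mysql://localhost:3306/northwind?user=root&useSSL=false
```

The resulting string is what goes into the `WITH "..." as url` line of the queries above.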
Fetch selected columns:
WITH "jdbc:mysql://localhost:3306/northwind?user=root" as url
CALL apoc.load.jdbc(url,"products") YIELD row
RETURN row.name,row.age limit 10;
Batch import (read rows from MySQL and create nodes in batches):
CALL apoc.periodic.iterate(
'CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company") YIELD row RETURN row',
'CREATE (p:Person) SET p += row',
{batchSize:10000, parallel:true})
YIELD batches, total
RETURN batches, total