JanusGraph Bulk Data Import in Java: A Code Summary

Contents

1. Importing JSON into a local TinkerGraph

2. Importing CSV into a local TinkerGraph

3. Importing JSON into distributed storage (berkeleyje-es)

The code in this article was tested against JanusGraph 0.3.1. All data files are the ones bundled with the JanusGraph distribution.

1. Importing JSON into a local TinkerGraph

1.1 Configuration

conf/hadoop-graph/hadoop-load-json.properties:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

1.2 Sample JSON

{"id":1,"label":"song","inE":{"followedBy":[{"id":3059,"outV":153,"properties":{"weight":1}},{"id":276,"outV":5,"properties":{"weight":2}},{"id":3704,"outV":3,"properties":{"weight":2}},{"id":4383,"outV":62,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}},{"id":1,"inV":3,"properties":{"weight":2}},{"id":2,"inV":4,"properties":{"weight":1}},{"id":3,"inV":5,"properties":{"weight":1}},{"id":4,"inV":6,"properties":{"weight":1}}],"sungBy":[{"id":7612,"inV":340}],"writtenBy":[{"id":7611,"inV":527}]},"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}],"songType":[{"id":2,"value":"cover"}],"performances":[{"id":1,"value":5}]}}
{"id":2,"label":"song","inE":{"followedBy":[{"id":0,"outV":1,"properties":{"weight":1}},{"id":323,"outV":34,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":6190,"inV":123,"properties":{"weight":1}},{"id":6191,"inV":50,"properties":{"weight":1}}],"sungBy":[{"id":7666,"inV":525}],"writtenBy":[{"id":7665,"inV":525}]},"properties":{"name":[{"id":3,"value":"IM A MAN"}],"songType":[{"id":5,"value":"cover"}],"performances":[{"id":4,"value":1}]}}

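Each record in grateful-dead.json is one self-contained GraphSON adjacency-list object: the vertex with its properties plus its inE/outE edge maps. Before launching a Spark job it can be worth a quick sanity pass confirming that every line at least opens with an id and a label. The sketch below is illustrative only (plain JDK, regex-based, not part of the JanusGraph tooling):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GraphSONLineCheck {
    // Each adjacency-list line starts with {"id":<n>,"label":"<label>", ...
    private static final Pattern HEAD =
            Pattern.compile("^\\{\"id\":(\\d+),\"label\":\"([^\"]+)\"");

    /** Returns "id:label" for a well-formed line, or null if the header is missing. */
    static String head(String line) {
        Matcher m = HEAD.matcher(line);
        return m.find() ? m.group(1) + ":" + m.group(2) : null;
    }

    public static void main(String[] args) {
        String sample = "{\"id\":1,\"label\":\"song\",\"inE\":{}}";
        System.out.println(head(sample)); // prints 1:song
    }
}
```

A full JSON parse is unnecessary here; a malformed line would make the Spark job fail much later, so catching an obviously broken header up front saves a run.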

1.3 Code

readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json.properties')
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph.kryo")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()

1.4 Verifying the output file

The newly generated file:

[root@vm03 data]# ls -l /tmp/csv-graph.kryo
-rw-r--r--. 1 root root 726353 May 29 04:09 /tmp/csv-graph.kryo

2. Importing CSV into a local TinkerGraph

2.1 Configuration

conf/hadoop-graph/hadoop-load-csv.properties:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.txt
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.scriptInputFormat.script=./data/script-input-grateful-dead.groovy

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

2.2 Sample CSV

1,song,HEY BO DIDDLEY,cover,5 followedBy,2,1|followedBy,3,2|followedBy,4,1|followedBy,5,1|followedBy,6,1|sungBy,340|writtenBy,527 followedBy,3,2|followedBy,5,2|followedBy,62,1|followedBy,153,1
2,song,IM A MAN,cover,1 followedBy,50,1|followedBy,123,1|sungBy,525|writtenBy,525 followedBy,1,1|followedBy,34,1
3,song,NOT FADE AWAY,cover,531 followedBy,81,1|followedBy,86,5|followedBy,127,10|followedBy,59,1|followedBy,83,3|followedBy,103,2|followedBy,68,1|followedBy,134,2|followedBy,131,1|followedBy,151,1|followedBy,3
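The three groups in each line are tab-separated (rendered above as spaces): the vertex fields id,label,name,songType,performances, then a pipe-delimited out-edge list, then the in-edge list. Each edge token is label,otherVertexId with an optional trailing weight. A minimal tokenizer for that edge syntax, sketched in plain Java for illustration (the actual loading is done by the Groovy script in 2.3):

```java
import java.util.ArrayList;
import java.util.List;

public class CsvEdgeLine {
    /** One parsed edge token: label,otherVertexId[,weight]. */
    record Edge(String label, int otherV, Integer weight) {}

    /** Splits a pipe-delimited edge list such as "followedBy,2,1|sungBy,340". */
    static List<Edge> parseEdges(String edgeList) {
        List<Edge> edges = new ArrayList<>();
        for (String token : edgeList.split("\\|")) {
            String[] parts = token.split(",");
            // A two-field token has no weight; a three-field token carries one.
            Integer weight = parts.length == 3 ? Integer.valueOf(parts[2]) : null;
            edges.add(new Edge(parts[0], Integer.parseInt(parts[1]), weight));
        }
        return edges;
    }

    public static void main(String[] args) {
        // Tab-separated: vertex fields, out-edges, in-edges (adapted from sample line 2).
        String line = "2,song,IM A MAN,cover,1\tfollowedBy,50,1|sungBy,525\tfollowedBy,1,1";
        String[] groups = line.split("\t", 3);
        System.out.println(groups[0]);                    // prints 2,song,IM A MAN,cover,1
        System.out.println(parseEdges(groups[1]).size()); // prints 2
    }
}
```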

2.3 Code

script-input-grateful-dead.groovy:

def parse(line) {
    def (vertex, outEdges, inEdges) = line.split(/\t/, 3)
    def (v1id, v1label, v1props) = vertex.split(/,/, 3)
    def v1 = graph.addVertex(T.id, v1id.toInteger(), T.label, v1label)
    switch (v1label) {
        case "song":
            def (name, songType, performances) = v1props.split(/,/)
            v1.property("name", name)
            v1.property("songType", songType)
            v1.property("performances", performances.toInteger())
            break
        case "artist":
            v1.property("name", v1props)
            break
        default:
            throw new Exception("Unexpected vertex label: ${v1label}")
    }
    [[outEdges, true], [inEdges, false]].each { def edges, def out ->
        edges.split(/\|/).grep().each { def edge ->
            def parts = edge.split(/,/)
            def otherV, eLabel, weight = null
            if (parts.size() == 2) {
                (eLabel, otherV) = parts
            } else {
                (eLabel, otherV, weight) = parts
            }
            def v2 = graph.addVertex(T.id, otherV.toInteger())
            def e = out ? v1.addOutEdge(eLabel, v2) : v1.addInEdge(eLabel, v2)
            if (weight != null) e.property("weight", weight.toInteger())
        }
    }
    return v1
}

Gremlin console code:

readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-csv.properties')
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph2.kryo")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
g = GraphFactory.open(writeGraphConf).traversal()
g.V().valueMap(true)

2.4 Verifying the output file

The newly generated file:

[root@vm03 data]# ls -l /tmp/csv-graph2.kryo
-rw-r--r--. 1 root root 339939 May 29 04:56 /tmp/csv-graph2.kryo

3. Importing JSON into distributed storage (berkeleyje-es)

3.1 Configuration

conf/hadoop-graph/hadoop-load-json-ber-es.properties:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

./conf/janusgraph-berkeleyje-es-bulkload.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=../db/berkeley
index.search.backend=elasticsearch

3.2 Sample JSON

The input is the same grateful-dead.json shown in section 1.2.

3.3 Code

outputGraphConfig = './conf/janusgraph-berkeleyje-es-bulkload.properties'
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json-ber-es.properties')
blvp = BulkLoaderVertexProgram.build().writeGraph(outputGraphConfig).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
g = GraphFactory.open(outputGraphConfig).traversal()
g.V().valueMap(true)

Note that, unlike the earlier examples, no bulkLoader is specified here, so BulkLoaderVertexProgram uses its default IncrementalBulkLoader, which checks for existing elements before creating new ones.

3.4 Verification

Verify the import by standing up a Gremlin Server.

The server configuration file (gremlin-server-berkeleyje-bulkload.yaml) is similar to gremlin-server-berkeleyje.yaml, with the following line adjusted:

graph: conf/janusgraph-berkeleyje-es-bulkload.properties

Start the server with ./gremlin-server.sh conf/gremlin-server/gremlin-server-berkeleyje-bulkload.yaml

Then query the graph through graphexp.
