JanusGraph Study Notes

Noah Gao

Article link: https://blog.csdn.net/weixin_43219903/article/details/106282082

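0. Background

I plan to build a knowledge graph platform with the following architecture:

- Apache Jena as the Java framework for the Semantic Web.
- JanusGraph as the graph database.
- Apache Cassandra as the storage backend.
- Elasticsearch as the index backend.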

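1. JanusGraph concepts

1.1 Backends (Storage & Index Backend)

JanusGraph can keep graph data and indexes in different backends, as shown in the figure below:

Figure 1.1 JanusGraph and its storage and index backends
Reference: https://docs.janusgraph.org/v0.5/index-management/index-performance/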

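2. Building a graph database in Java

2.1 Usage modes

JanusGraph can be used from Java in two ways:

- Embedded: JanusGraph runs inside the Java application.
- Server: JanusGraph runs as a standalone server, and the Java application connects to it as a client over HTTP or WebSocket.

In Java code, the two modes differ in how the graph is opened and in the configuration file that is used. JanusGraph ships with configuration files and example code for the various storage and index backends.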

2.1.1 Embedded in Java

Example code for opening the graph:

graph = GraphFactory.open("your_property_file");
g = graph.traversal();

The corresponding configuration file (based on the Cassandra-over-CQL configuration bundled with JanusGraph):

# janusgraph-cql-es.properties

gremlin.graph=org.janusgraph.core.JanusGraphFactory

storage.backend=cql
storage.cql.keyspace=noah
storage.hostname=127.0.0.1

index.jgex.backend=elasticsearch
index.jgex.index-name=noah
index.jgex.hostname=127.0.0.1
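
The key layout of this file can be made explicit with a small sketch. It assumes nothing beyond the JDK: it parses the same keys with java.util.Properties (not with JanusGraph) and extracts the index backend name, which is the segment between "index." and the option name. The class and method names (BackendConfigDemo, parse, indexBackendNames) are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: parses the properties text with the JDK, not with JanusGraph.
class BackendConfigDemo {

    // Same keys and values as the configuration file shown above.
    static final String CONFIG = String.join("\n",
            "gremlin.graph=org.janusgraph.core.JanusGraphFactory",
            "storage.backend=cql",
            "storage.cql.keyspace=noah",
            "storage.hostname=127.0.0.1",
            "index.jgex.backend=elasticsearch",
            "index.jgex.index-name=noah",
            "index.jgex.hostname=127.0.0.1");

    // Loads key=value pairs from a string.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Collects X from every key of the form index.X.<option>.
    static Set<String> indexBackendNames(Properties props) {
        Set<String> names = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            String[] parts = key.split("\\.");
            if (parts.length >= 3 && parts[0].equals("index")) {
                names.add(parts[1]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties props = parse(CONFIG);
        System.out.println("storage backend: " + props.getProperty("storage.backend"));
        System.out.println("index backend names: " + indexBackendNames(props));
    }
}
```

The name found this way (jgex here) is the one a mixed index has to refer to when it is built.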

2.1.2 JanusGraph running as a server

Example code for connecting to a JanusGraph server and obtaining the graph:

conf = new PropertiesConfiguration("your_property_file");

// Schema: use the remote driver
try {
    cluster = Cluster.open(conf.getString("gremlin.remote.driver.clusterFile"));
    client = cluster.connect();
} catch (Exception e) {
    throw new ConfigurationException(e);
}

// Query: use the remote graph
graph = EmptyGraph.instance();
g = graph.traversal().withRemote(conf);

The corresponding configuration files (based on the remote configuration bundled with the JanusGraph examples):

# remote-graph.properties
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=src/main/resources/remote-objects.yaml
gremlin.remote.driver.sourceName=g


# remote-objects.yaml
hosts: [localhost]
port: 8182
serializer: {
    className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0,
    config: {
        ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry]
    }
}

2.2 Code

The JanusGraph examples also include Java implementations for the different backends.

2.2.1 Opening the graph

Open the graph from the configuration file and obtain a traversal source with g = graph.traversal().

public GraphTraversalSource openGraph() throws ConfigurationException {
    LOGGER.info("opening graph");
    conf = new PropertiesConfiguration(propFileName); // read the configuration file
    LOGGER.info(conf.getString("storage.backend"));
    graph = GraphFactory.open(conf); // open the graph stored in the backend
    g = graph.traversal(); // obtain the traversal source
    return g;
}

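2.2.2 Schema

A JanusGraph schema can be defined either implicitly or explicitly. The official documentation encourages defining it explicitly:

A JanusGraph schema can either be explicitly or implicitly defined. Users are encouraged to explicitly define the graph schema during application development. An explicitly defined schema is an important component of a robust graph application and greatly improves collaborative software development.

Schema types comprise vertex labels, property keys, and edge labels. Each is assigned to the corresponding graph element: vertex labels to vertices, edge labels to edges, and property keys to properties.

Note that property keys and edge labels share one namespace. Both are relation types, so the same name cannot be used for a property key and an edge label. The following code checks whether a relation type already exists.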

mgmt = graph.openManagement()
if (mgmt.containsRelationType('name'))
    name = mgmt.getPropertyKey('name')
mgmt.getRelationTypes(EdgeLabel.class)
mgmt.commit()

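Edge label

Example code (reference: https://docs.janusgraph.org/v0.5/basics/schema/#edge-label-multiplicity):

mgmt = graph.openManagement()
follow = mgmt.makeEdgeLabel('follow').multiplicity(MULTI).make()
mgmt.commit()

Here:

- The edge label name (follow in the code) is unique.
- MULTI sets the multiplicity of the edge label. Here it allows several follow edges between the same pair of vertices. JanusGraph supports the following multiplicities for edge labels:
  - MULTI: any number of edges with this label between any pair of vertices.
  - SIMPLE: at most one edge with this label between any pair of vertices.
  - MANY2ONE: each vertex has at most one outgoing edge with this label; incoming edges are unrestricted.
  - ONE2MANY: each vertex has at most one incoming edge with this label; outgoing edges are unrestricted.
  - ONE2ONE: each vertex has at most one outgoing and at most one incoming edge with this label.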
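
The five multiplicities can be compared side by side with a toy model. This is a sketch in plain Java that does not call JanusGraph: the class MultiplicityDemo, its Edge type, and the allows method are hypothetical names, and the model assumes that SIMPLE treats a vertex pair as ordered (out vertex, in vertex).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of edge-label multiplicity; it does not use the JanusGraph API.
class MultiplicityDemo {

    enum Multiplicity { MULTI, SIMPLE, MANY2ONE, ONE2MANY, ONE2ONE }

    // A directed edge of one fixed label, from vertex "out" to vertex "in".
    static final class Edge {
        final String out;
        final String in;
        Edge(String out, String in) { this.out = out; this.in = in; }
    }

    // Returns true if adding "candidate" keeps the multiplicity constraint satisfied.
    static boolean allows(Multiplicity m, List<Edge> existing, Edge candidate) {
        boolean samePair = false; // an edge out -> in already exists (assumed ordered pair)
        boolean outUsed = false;  // candidate.out already has an outgoing edge
        boolean inUsed = false;   // candidate.in already has an incoming edge
        for (Edge e : existing) {
            boolean sameOut = e.out.equals(candidate.out);
            boolean sameIn = e.in.equals(candidate.in);
            samePair |= sameOut && sameIn;
            outUsed |= sameOut;
            inUsed |= sameIn;
        }
        switch (m) {
            case MULTI:    return true;
            case SIMPLE:   return !samePair;
            case MANY2ONE: return !outUsed;
            case ONE2MANY: return !inUsed;
            case ONE2ONE:  return !outUsed && !inUsed;
            default:       throw new IllegalArgumentException("unknown multiplicity: " + m);
        }
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("alice", "bob")); // alice -> bob already exists
        Edge again = new Edge("alice", "bob");
        Edge other = new Edge("alice", "carol");
        for (Multiplicity m : Multiplicity.values()) {
            System.out.println(m + ": duplicate=" + allows(m, edges, again)
                    + ", new target=" + allows(m, edges, other));
        }
    }
}
```

In the real database the check is performed by JanusGraph when an edge is added; the sketch only mirrors the rules listed above.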

Property keys

Example code (reference: https://docs.janusgraph.org/v0.5/basics/schema/#property-key-cardinality):

mgmt = graph.openManagement()
birthDate = mgmt.makePropertyKey('birthDate').dataType(Long.class).cardinality(Cardinality.SINGLE).make()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SET).make()
sensorReading = mgmt.makePropertyKey('sensorReading').dataType(Double.class).cardinality(Cardinality.LIST).make()
mgmt.commit()

Here:

- The property key names (birthDate, name, and sensorReading in the code) are unique.
- The data type (Long.class, String.class, and Double.class in the code) defines the type of the values stored under the key. JanusGraph natively supports String, Character, Boolean, Byte, Short, Integer, Long, Float, Double, Date, Geoshape, and UUID.
- The cardinality (Cardinality.SINGLE, Cardinality.SET, and Cardinality.LIST in the code) defines how many values a vertex may hold for the key: SINGLE allows at most one value, LIST allows any number of values including duplicates, and SET allows several values without duplicates.
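
The effect of the three cardinalities can be sketched with a toy model in plain Java. It does not call JanusGraph; CardinalityDemo and its write method are hypothetical names, and the model assumes that writing to a SINGLE key replaces the previous value.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of property cardinality; it does not use the JanusGraph API.
class CardinalityDemo {

    enum Cardinality { SINGLE, LIST, SET }

    // Returns the values held for one key on one vertex after writing "value".
    static List<Object> write(Cardinality c, List<Object> current, Object value) {
        List<Object> result = new ArrayList<>(current);
        switch (c) {
            case SINGLE:
                result.clear();        // at most one value: the new one replaces the old
                result.add(value);
                break;
            case LIST:
                result.add(value);     // any number of values, duplicates kept
                break;
            case SET:
                if (!result.contains(value)) {
                    result.add(value); // several values, duplicates dropped
                }
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> names = new ArrayList<>();
        names = write(Cardinality.SET, names, "hercules");
        names = write(Cardinality.SET, names, "hercules");
        names = write(Cardinality.SET, names, "heracles");
        System.out.println("SET name values: " + names);

        List<Object> readings = new ArrayList<>();
        readings = write(Cardinality.LIST, readings, 1.5);
        readings = write(Cardinality.LIST, readings, 1.5);
        System.out.println("LIST sensorReading values: " + readings);
    }
}
```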

Vertex label

Example code (reference: https://docs.janusgraph.org/v0.5/basics/schema/#defining-vertex-labels):

mgmt = graph.openManagement()
person = mgmt.makeVertexLabel('person').make()
mgmt.commit()
// Create a vertex with a label
person = graph.addVertex(label, 'person')
// Create a vertex without a label
v = graph.addVertex()
graph.tx().commit()

Here:

- A vertex can be created with or without a label.
- The vertex label name (person in the code) is unique.

2.2.3 Building indexes

JanusGraph supports graph indexes and vertex-centric indexes.

Graph index: composite index

Example code (reference: https://docs.janusgraph.org/v0.5/index-management/index-performance/#composite-index):

// Build the composite indexes
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

// Queries that use these indexes
g.V().has('name', 'hercules')
g.V().has('age', 30).has('name', 'hercules')

Here:

- A query can use a composite index only if it constrains every key of that index. For example, g.V().has('age', 30) uses neither index above: byNameComposite does not cover age, and byNameAndAgeComposite requires name to be constrained together with age.
- The constraints must also be equality constraints. Range constraints and other predicates cannot be answered by a composite index.
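
The two rules can be written down as a small check. The sketch below is plain Java and does not reflect how JanusGraph selects indexes internally; CompositeIndexDemo and canServe are hypothetical names, and a query is modeled simply as a map from property key to the operator applied to it ("eq", "gt", and so on).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of composite-index eligibility; it does not use the JanusGraph API.
class CompositeIndexDemo {

    // A composite index can serve a query only if every index key
    // appears in the query with an equality constraint.
    static boolean canServe(List<String> indexKeys, Map<String, String> conditions) {
        for (String key : indexKeys) {
            if (!"eq".equals(conditions.get(key))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> byName = Arrays.asList("name");
        List<String> byNameAndAge = Arrays.asList("name", "age");

        Map<String, String> ageOnly = new LinkedHashMap<>();
        ageOnly.put("age", "eq"); // models g.V().has('age', 30)

        Map<String, String> nameAndAge = new LinkedHashMap<>();
        nameAndAge.put("age", "eq");
        nameAndAge.put("name", "eq"); // models g.V().has('age', 30).has('name', 'hercules')

        System.out.println("age only, byNameComposite: " + canServe(byName, ageOnly));
        System.out.println("age only, byNameAndAgeComposite: " + canServe(byNameAndAge, ageOnly));
        System.out.println("name and age, byNameAndAgeComposite: " + canServe(byNameAndAge, nameAndAge));
    }
}
```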

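Graph index: mixed index

Example code (reference: https://docs.janusgraph.org/v0.5/index-management/index-performance/#mixed-index):

// Build the mixed index
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name).addKey(age).buildMixedIndex("search")
mgmt.commit()

// Queries that use this index
g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
g.V().has('name', textContains('hercules'))
g.V().has('age', lt(50))
g.V().has('age', outside(20, 50))
g.V().has('age', lt(50).or(gte(60)))
g.V().or(__.has('name', textContains('hercules')), __.has('age', inside(20, 50)))

Here:

- A mixed index can answer queries on any combination of its keys and supports predicates other than equality.
- Mixed indexes support full-text search, range search, geo search, and similar queries.
- The argument of buildMixedIndex ("search") is the name of an index backend defined in the configuration as index.search.backend. With the configuration file from section 2.1.1 it would be "jgex".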
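
The range predicates in the queries above are easy to misread at the boundaries. The following sketch re-implements lt, gte, inside, and outside as plain Java IntPredicate factories to show their semantics as defined by TinkerPop (inside excludes both bounds; outside means below the first or above the second). RangePredicateDemo is a hypothetical name, and the code stands in for the Gremlin predicates rather than calling them.

```java
import java.util.function.IntPredicate;

// Plain-Java stand-ins for the Gremlin range predicates used above.
class RangePredicateDemo {

    static IntPredicate lt(int x) { return v -> v < x; }

    static IntPredicate gte(int x) { return v -> v >= x; }

    // Strictly between the two bounds.
    static IntPredicate inside(int low, int high) { return v -> v > low && v < high; }

    // Strictly below the first bound or strictly above the second.
    static IntPredicate outside(int low, int high) { return v -> v < low || v > high; }

    public static void main(String[] args) {
        int[] ages = {19, 20, 35, 50, 55, 60};
        IntPredicate ltOrGte = lt(50).or(gte(60)); // models lt(50).or(gte(60))
        for (int age : ages) {
            System.out.println(age
                    + " inside(20, 50)=" + inside(20, 50).test(age)
                    + " outside(20, 50)=" + outside(20, 50).test(age)
                    + " lt(50).or(gte(60))=" + ltOrGte.test(age));
        }
    }
}
```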

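Vertex-centric indexes

Example code (reference: https://docs.janusgraph.org/v0.5/index-management/index-performance/#vertex-centric-indexes):

// Build the vertex-centric index
mgmt = graph.openManagement()
time = mgmt.getPropertyKey('time')
battled = mgmt.getEdgeLabel('battled')
mgmt.buildEdgeIndex(battled, 'battlesByTime', Direction.BOTH, Order.desc, time)
mgmt.commit()

// Corresponding queries
h = g.V().has('name', 'hercules').next()
// Served by battlesByTime: the constraint is on the indexed key 'time'
g.V(h).outE('battled').has('time', inside(10, 20)).inV()
// Not served by battlesByTime: 'rating' is not a key of this index
g.V(h).outE('battled').property('rating', 5.0) // add some rating properties
g.V(h).outE('battled').has('rating', gt(3.0)).inV()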

2.2.4 Inserting data

Creating a vertex

final Vertex saturn = g.addV("titan").property("name", "saturn").property("age", 10000).next();

[Figure: the vertex created by the code above]

Source: https://docs.janusgraph.org/getting-started/basic-usage/

Here, titan is the vertex label (type), and the vertex properties are name and age.

Creating an edge

// Variant 1: an edge without properties
g.V(jupiter).as("a").V(saturn).addE("father").from("a").next();

// Variant 2: an edge with a property
g.V(jupiter).as("a").V(sky).addE("lives").property("reason", "loves fresh breezes").from("a").next();

[Figure: the edges created by variant 1 (left) and variant 2 (right)]

Source: https://docs.janusgraph.org/getting-started/basic-usage/

2.2.5 Querying vertices

2.2.6 Querying edges

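2.3 Visualization

There are five main tools for visualizing a JanusGraph graph:

- Cytoscape
- Gephi
- Graphexp
- KeyLines
- Linkurious

This article uses Gephi. If time permits, I will try the other tools and update this post.

2.3.1 Concept

Gephi is software for visualizing graphs and networks.

Gephi is the leading visualization and exploration software for all kinds of graphs and networks.

2.3.2 Gephi configuration

1. After installing Gephi, install the Graph Streaming plugin: open "Tools" -> "Plugins", select "Graph Streaming" under "Available Plugins", and install it.
2. Start Gephi and create a new project: "File" -> "New Project".
3. Rename the workspace: "Workspace" -> "Rename", and set the name to "janusgraphspace".
4. Start the Gephi Master Server: right-click "Master Server" in the Streaming panel on the left and choose "Start". The URL is http://{IP address}:8080/janusgraphspace.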

2.3.3 JanusGraph configuration

Make sure that Gremlin Server, Cassandra, and Elasticsearch are running.

Run gremlin.bat ./gremlin-server-cql-es.yaml to open the Gremlin Console (reference: https://docs.janusgraph.org/getting-started/basic-usage/).

// Read the configuration file and open the graph
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cql-es.properties')
gremlin> g = graph.traversal()

// Activate the tinkerpop.gephi plugin
gremlin> :plugin use tinkerpop.gephi
// Connect to the tinkerpop.gephi plugin
gremlin> :remote connect tinkerpop.gephi

// By default the tinkerpop.gephi plugin connects to Gephi at "http://localhost:8080/workspace1".
// In practice JanusGraph may be deployed on a remote server, and the URL path may not match the workspace configured in Gephi above,
// so adjust the host and the workspace as needed.

// Set the Gephi host
gremlin> :remote config host 127.0.0.1
// Set the Gephi workspace
gremlin> :remote config workspace janusgraphspace
// Push the graph data to Gephi
gremlin> :> graph

// Finally, clear the graph data in the Gephi workspace if needed
gremlin> :> clear

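2.3.4 Tidying up the Gephi view

After the data has been pushed to Gephi, all vertices and edges are piled up in the center and the graph is unreadable, so the view needs some tidying.

1. Change the layout: "Window" -> "Layout". A typical setting is a repulsion strength of 10000 with "Adjust by Sizes" checked.
2. Change the appearance: "Window" -> "Appearance".
2.1. "Nodes" -> "Color" -> "Partition" -> choose "name" in the drop-down list.
2.2. "Nodes" -> "Size" -> "Unique" -> set the size to 40.
3. Open the Data Laboratory tab to inspect the detailed data.
4. Open the Preview tab for further adjustments; click Refresh at the bottom to apply them.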

