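HugeGraph is a graph database open-sourced by Baidu. It supports HBase, MySQL, RocksDB and others as storage backends. This article uses EDGE storage with HBase as the backend to explore how HugeGraph stores and reads data.
Storing data
Serialization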
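The first step is serialization. The HBase backend uses HbaseSerializer, which extends BinarySerializer.
Judging by the constructor call below, keyWithIdPrefix is false and indexWithIdPrefix is true (assuming the arguments are passed in that order).
The keyWithIdPrefix value comes up again later.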
public class HbaseSerializer extends BinarySerializer {

    public HbaseSerializer() {
        super(false, true);
    }
}
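Before data can be written to the database, it has to be serialized into a BackendEntry, the transfer object between the graph database and the storage backend. For HBase the corresponding class is BinaryBackendEntry: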
public class BinaryBackendEntry implements BackendEntry {

    private static final byte[] EMPTY_BYTES = new byte[]{};

    private final HugeType type;
    private final BinaryId id;
    private Id subId;
    private final List<BackendColumn> columns;
    private long ttl;

    public BinaryBackendEntry(HugeType type, byte[] bytes) {
        this(type, BytesBuffer.wrap(bytes).parseId(type));
    }

    public BinaryBackendEntry(HugeType type, BinaryId id) {
        this.type = type;
        this.id = id;
        this.subId = null;
        this.columns = new ArrayList<>();
        this.ttl = 0L;
    }

    // remaining methods omitted
}
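To make the shape of such an entry concrete, here is a minimal sketch of a row id plus a list of (name, value) columns and a TTL. SimpleEntry, Column, and the accessor names are hypothetical stand-ins invented for illustration; they are not the real BinaryBackendEntry API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for BinaryBackendEntry; not the real class.
final class SimpleEntry {

    // One (name, value) pair, analogous to a backend column.
    static final class Column {
        final byte[] name;
        final byte[] value;

        Column(byte[] name, byte[] value) {
            this.name = name;
            this.value = value;
        }
    }

    private final byte[] id;                          // row key bytes
    private final List<Column> columns = new ArrayList<>();
    private long ttl = 0L;                            // 0 means "no expiry" in this sketch

    SimpleEntry(byte[] id) {
        this.id = Arrays.copyOf(id, id.length);
    }

    // Append one column to the entry.
    void column(byte[] name, byte[] value) {
        this.columns.add(new Column(name, value));
    }

    void ttl(long ttl) { this.ttl = ttl; }
    long ttl() { return this.ttl; }
    byte[] id() { return Arrays.copyOf(this.id, this.id.length); }
    int columnsSize() { return this.columns.size(); }
    Column columnAt(int i) { return this.columns.get(i); }

    public static void main(String[] args) {
        SimpleEntry entry = new SimpleEntry("edge-1".getBytes(StandardCharsets.UTF_8));
        // An empty column name with the serialized properties as the value.
        entry.column(new byte[0], "props".getBytes(StandardCharsets.UTF_8));
        System.out.println("columns: " + entry.columnsSize());
    }
}
```

The point of the sketch is only that the serializer's job reduces to filling the column list of an entry keyed by an id.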
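Now for the serialization itself. Serializing really just means putting the data into the columns of the entry.
For HBase, keyWithIdPrefix is false, so writeEdge uses EMPTY_BYTES as the column name instead of calling formatEdgeName. None of the EdgeId fields listed below, ownerVertexId included, are repeated in the column name.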
public BackendEntry writeEdge(HugeEdge edge) {
    BinaryBackendEntry entry = newBackendEntry(edge);
    byte[] name = this.keyWithIdPrefix ?
                  this.formatEdgeName(edge) : EMPTY_BYTES;
    byte[] value = this.formatEdgeValue(edge);
    entry.column(name, value);

    if (edge.hasTtl()) {
        entry.ttl(edge.ttl());
    }

    return entry;
}
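The effect of the keyWithIdPrefix flag on the column name can be sketched in isolation. Everything here is an assumption made for illustration: EdgeColumnNameSketch is a hypothetical class, the edge id string is made up, and formatEdgeName is reduced to a plain UTF-8 encoding rather than HugeGraph's real binary format.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the name selection in writeEdge; not HugeGraph code.
final class EdgeColumnNameSketch {

    static final byte[] EMPTY_BYTES = new byte[0];

    private final boolean keyWithIdPrefix;

    EdgeColumnNameSketch(boolean keyWithIdPrefix) {
        this.keyWithIdPrefix = keyWithIdPrefix;
    }

    // Stand-in for formatEdgeName: simply the UTF-8 bytes of the edge id.
    static byte[] formatEdgeName(String edgeId) {
        return edgeId.getBytes(StandardCharsets.UTF_8);
    }

    // Same ternary as in writeEdge: a formatted name or an empty one.
    byte[] columnName(String edgeId) {
        return this.keyWithIdPrefix ? formatEdgeName(edgeId) : EMPTY_BYTES;
    }

    public static void main(String[] args) {
        String edgeId = "v1>knows>v2"; // made-up id for the demo
        int without = new EdgeColumnNameSketch(false).columnName(edgeId).length;
        int with = new EdgeColumnNameSketch(true).columnName(edgeId).length;
        System.out.println("keyWithIdPrefix=false -> name length " + without);
        System.out.println("keyWithIdPrefix=true  -> name length " + with);
    }
}
```

With the flag off, as in the HBase case, the column name carries no bytes at all, so the edge has to be identified by the entry id (the row key) alone.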
EdgeId:
private final Id ownerVertexId;
private final Directions direction;
private final Id edgeLabelId;
private final String sortValues;
private final Id otherVertexId;
private final boolean directed;
private String cache;
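As a rough illustration of how these fields can combine into one id, the sketch below joins them into a string and caches the result. EdgeIdSketch is hypothetical: the '>' separator, the use of plain strings for the ids, and the asString method name are assumptions of this sketch, not HugeGraph's actual encoding.

```java
// Hypothetical sketch of an EdgeId-like value object; not HugeGraph's EdgeId.
final class EdgeIdSketch {

    enum Direction { OUT, IN }

    private final String ownerVertexId;
    private final Direction direction;
    private final String edgeLabelId;
    private final String sortValues;
    private final String otherVertexId;
    private String cache; // lazily built string form

    EdgeIdSketch(String ownerVertexId, Direction direction, String edgeLabelId,
                 String sortValues, String otherVertexId) {
        this.ownerVertexId = ownerVertexId;
        this.direction = direction;
        this.edgeLabelId = edgeLabelId;
        this.sortValues = sortValues;
        this.otherVertexId = otherVertexId;
    }

    // Join the fields in declaration order; '>' is an assumed separator.
    String asString() {
        if (this.cache == null) {
            this.cache = String.join(">", this.ownerVertexId,
                                     this.direction.name(), this.edgeLabelId,
                                     this.sortValues, this.otherVertexId);
        }
        return this.cache;
    }

    public static void main(String[] args) {
        EdgeIdSketch id = new EdgeIdSketch("v1", Direction.OUT, "knows", "", "v2");
        System.out.println(id.asString());
    }
}
```

Placing ownerVertexId first means all edges of one vertex share a common prefix, which is the property a prefix scan over row keys would rely on.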
Backend storage
Once the BackendEntry has been generated, it is handed to the backend through the store mechanism, and the backend persists it.
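One way to picture the hand-off is a store that keeps one table object per entry type and routes each entry to the matching table. This is only a sketch under assumptions: StoreDispatchSketch, its Type enum, Table, register, and mutate are hypothetical names, and the real store classes do considerably more (sessions, batching, transactions).

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of type-based dispatch from a store to its tables.
final class StoreDispatchSketch {

    // Assumed entry types for the demo; not the full HugeType enum.
    enum Type { VERTEX, EDGE_OUT, EDGE_IN }

    // Stand-in for a backend table that just remembers inserted row keys.
    static final class Table {
        final String name;
        final List<byte[]> rows = new ArrayList<>();

        Table(String name) {
            this.name = name;
        }

        void insert(byte[] rowKey) {
            this.rows.add(rowKey);
        }
    }

    private final Map<Type, Table> tables = new EnumMap<>(Type.class);

    void register(Type type, Table table) {
        this.tables.put(type, table);
    }

    // Route the entry (reduced here to its row key) to the table for its type.
    void mutate(Type type, byte[] rowKey) {
        Table table = this.tables.get(type);
        if (table == null) {
            throw new IllegalArgumentException("No table registered for " + type);
        }
        table.insert(rowKey);
    }

    public static void main(String[] args) {
        StoreDispatchSketch store = new StoreDispatchSketch();
        Table edgeOut = new Table("edge_out");
        store.register(Type.EDGE_OUT, edgeOut);
        store.mutate(Type.EDGE_OUT, new byte[]{1, 2, 3});
        System.out.println(edgeOut.name + " rows: " + edgeOut.rows.size());
    }
}
```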
Saving an EDGE is handled by HbaseTables.Edge: