TinkerPop Architecture

Latest recommended article published on 2024-05-28 14:55:48

《一夜飘零》

Category: Knowledge Graph

Article link: https://blog.csdn.net/jiaxinhong/article/details/108469414

Why TinkerPop?

The goal of TinkerPop, as a Graph Computing Framework, is to make it easy for developers to create graph applications by providing APIs and tools that simplify their work. One of the fundamental aspects of what TinkerPop offers here is that it is an abstraction layer over different graph databases and different graph processors. As an abstraction layer, TinkerPop provides a way to avoid vendor lock-in to a specific database or processor. This gives developers real options in their architecture and development because:

They can try different implementations using the same code to decide which is best for their environment.
They can grow into a particular implementation if they so desire; e.g., start with a graph that is designed to scale within a single machine and then later switch to a graph that is designed to scale horizontally.
They can feel more confident in graph technology choices, because advances in the different provider implementations sit behind the TinkerPop APIs, which leaves open the possibility of switching providers with limited impact.

TinkerPop has always had the vision of being an abstraction over different graph databases. That much is not new and dates back to TinkerPop 1.x. It is in TinkerPop 3.x, however, that we see the introduction of the notion that TinkerPop is also an abstraction over different graph processors like Spark. The scope of this tutorial does not permit it to delve into "graph processors", but the short story is that the same Gremlin statement we wrote in the examples above can be executed to run in distributed fashion over Spark or Hadoop. The changes required to the code to do this are not in the traversal itself, but in the definition of the TraversalSource. You can again see why we encourage graph operations to be executed through that class as opposed to just using Graph. You can read more about these features in this section on hadoop-gremlin.

TIP	To maintain an abstraction over `Graph` creation, use `GraphFactory.open()` to construct new instances. See the documentation for individual `Graph` implementations to learn about the configuration options to provide.

Loading Data

There are many strategies for getting data into your graph. As you are just getting started, let’s look at the simpler methods aimed at "smaller" graphs. A "small" graph, in this context, is one that has fewer than ten million edges. The most direct way to load this data is to write a Groovy script that can be executed in the Gremlin Console, a tool that you should be well familiar with at this point. For our example, let’s use the Wikipedia Vote Network data set, which contains 7,115 vertices and 103,689 edges.

$ curl -L -O http://snap.stanford.edu/data/wiki-Vote.txt.gz
$ gunzip wiki-Vote.txt.gz

The data is contained in a tab-delimited structure in which vertices are Wikipedia users and edges from one user to another imply a "vote" relationship. Here is the script to parse the file and generate the Graph instance using TinkerGraph:

graph = TinkerGraph.open()
graph.createIndex('userId', Vertex.class) //1

g = graph.traversal()

getOrCreate = { id ->
  g.V().has('user','userId', id).
    fold().
    coalesce(unfold(),
             addV('user').property('userId', id)).next()  //2
}

new File('wiki-Vote.txt').eachLine {
  if (!it.startsWith("#")){
    (fromVertex, toVertex) = it.split('\t').collect(getOrCreate) //3
    g.addE('votesFor').from(fromVertex).to(toVertex).iterate()
  }
}

1. To ensure fast lookups of vertices, we need an index. The createIndex() method is specific to TinkerGraph. Please consult your graph database's documentation for its index creation approach.
2. This "get or create" traversal gets a vertex if it already exists; otherwise, it creates it. It uses coalesce() in a clever way: it first determines whether the list of vertices produced by the previous fold() has anything in it by testing the result of unfold(). If unfold() returns nothing, then that vertex doesn't exist and the subsequent addV() inner traversal is called to create it.
3. We are iterating over each line of the wiki-Vote.txt file. This line splits the input on the tab delimiter, then uses some neat Groovy syntax to apply the getOrCreate() function to each of the two userId fields found in the line, and stores the resulting vertices in the fromVertex and toVertex variables, respectively.
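
To see why the fold() matters in the "get or create" traversal, here is a toy model in plain Python. It is only a sketch of the semantics and does not call TinkerPop: lists stand in for streams of traversers, and the helper names (`coalesce`, `unfold`, `get_or_create`) and the dict-based vertex records are assumptions made for illustration.

```python
# Toy model of fold().coalesce(unfold(), addV(...)); no TinkerPop involved.
# A Python list stands in for a stream of traversers.

def coalesce(traversers, *branches):
    """For each incoming traverser, emit the output of the first branch that yields anything."""
    out = []
    for t in traversers:
        for branch in branches:
            results = branch(t)
            if results:
                out.extend(results)
                break
    return out


def unfold(folded_list):
    """Turn one list-valued traverser back into its individual elements."""
    return list(folded_list)


def get_or_create(vertices, user_id):
    # Stand-in for g.V().has('user', 'userId', id): zero or one matches.
    matches = [v for v in vertices if v["userId"] == user_id]

    # Stand-in for fold(): always exactly ONE traverser, holding a
    # (possibly empty) list, so coalesce() always has something to act on.
    folded = [matches]

    def add_v(_):
        vertex = {"label": "user", "userId": user_id}
        vertices.append(vertex)
        return [vertex]

    # Stand-in for coalesce(unfold(), addV(...)).next()
    return coalesce(folded, unfold, add_v)[0]


vertices = []
a = get_or_create(vertices, "30")    # created
b = get_or_create(vertices, "30")    # found, not created again
c = get_or_create(vertices, "1412")  # created

# Without fold(), a miss produces no traversers at all, so neither
# branch of coalesce() ever runs and nothing is created.
without_fold = coalesce([], unfold, lambda _: ["created"])
```

In this model `a` and `b` are the same record, and `without_fold` is empty, which is the situation that fold() avoids in the real traversal.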

NOTE	While this is a tab-delimited structure, this same pattern can be applied to any data source you require and Groovy tends to have nice libraries that can help make working with data quite enjoyable.
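
As a rough illustration of that pattern applied to another delimiter, the sketch below mirrors the structure of the loading script in plain Python. It is not TinkerPop code: a dict stands in for the indexed graph, the function name `load_edges` is hypothetical, and the sample rows and header lines are made up for the example rather than taken from the data set.

```python
import csv
import io


def load_edges(lines, delimiter="\t"):
    """Parse 'from<delim>to' rows, skipping '#' comment lines.

    Returns (vertices, edges), where vertices maps userId -> record
    (a stand-in for the index lookup) and edges is a list of triples.
    """
    vertices = {}
    edges = []

    def get_or_create(user_id):
        # Same idea as getOrCreate in the script: reuse if present, else add.
        return vertices.setdefault(user_id, {"label": "user", "userId": user_id})

    data_lines = (ln for ln in lines if not ln.startswith("#"))
    for row in csv.reader(data_lines, delimiter=delimiter):
        if len(row) != 2:
            continue  # ignore blank or malformed rows
        from_vertex, to_vertex = (get_or_create(field) for field in row)
        edges.append((from_vertex["userId"], "votesFor", to_vertex["userId"]))
    return vertices, edges


# Made-up sample inputs in two formats.
SAMPLE_TSV = "# from\tto\n30\t1412\n30\t3352\n3\t28\n"
SAMPLE_CSV = "# from,to\n30,1412\n30,3352\n"

tsv_vertices, tsv_edges = load_edges(io.StringIO(SAMPLE_TSV))
csv_vertices, csv_edges = load_edges(io.StringIO(SAMPLE_CSV), delimiter=",")
```

Only the `delimiter` argument changes between the two calls; the get-or-create and edge-building logic stays the same.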

WARNING	Take care if using a Graph implementation that supports transactions. As TinkerGraph does not, there is no need to commit(). If your Graph does support transactions, intermediate commits during load will need to be applied.
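
One way to picture "intermediate commits" is a loader that commits after every fixed number of additions, plus once more at the end for any remainder. The sketch below is plain Python with hypothetical names (`load_in_batches`, `add_item`, `commit`, `batch_size`); in a transactional graph the `commit` callback would play roughly the role of a call such as `graph.tx().commit()`, and a suitable batch size depends on the provider.

```python
def load_in_batches(items, add_item, commit, batch_size=1000):
    """Add every item, calling commit() after each full batch and once for any remainder.

    Returns the number of commits performed.
    """
    pending = 0
    commits = 0
    for item in items:
        add_item(item)
        pending += 1
        if pending >= batch_size:
            commit()
            commits += 1
            pending = 0
    if pending:
        # Flush whatever is left after the last full batch.
        commit()
        commits += 1
    return commits


# Stand-ins for a transactional store: 'staged' holds uncommitted work.
staged, committed = [], []


def commit():
    committed.extend(staged)
    staged.clear()


commit_count = load_in_batches(range(2500), staged.append, commit, batch_size=1000)
```

With 2,500 items and a batch size of 1,000, this performs two full-batch commits and one final commit for the remaining 500 items.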

To load larger data sets, you should read about the CloneVertexProgram, which provides a generalized method for loading graphs of virtually any size, and also consider the native bulk loading features of the underlying graph database that you've chosen.

Gremlin in Other Programming Languages

This tutorial focused on Gremlin usage within the Gremlin Console, which means that the examples were Groovy-based and oriented toward the JVM. Gremlin, however, is far from being a Java-only library. TinkerPop natively supports a number of different programming languages, making it possible to execute all of the examples presented in this tutorial with little modification. These different language implementations of Gremlin are referred to as Gremlin Language Variants, and they help make Gremlin more accessible and easier to use for those who do not use Java as their primary programming language.

Console (Groovy):

gremlin> v1 = g.addV('person').property('name','marko').next()
==>v[0]
gremlin> v2 = g.addV('person').property('name','stephen').next()
==>v[2]
gremlin> g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

Groovy:

v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()

C#:

Vertex v1 = g.AddV("person").Property("name","marko").Next();
Vertex v2 = g.AddV("person").Property("name","stephen").Next();
g.V(v1).AddE("knows").To(v2).Property("weight",0.75).Iterate();

Java:

Vertex v1 = g.addV("person").property("name","marko").next();
Vertex v2 = g.addV("person").property("name","stephen").next();
g.V(v1).addE("knows").to(v2).property("weight",0.75).iterate();

JavaScript:

const v1 = g.addV('person').property('name','marko').next();
const v2 = g.addV('person').property('name','stephen').next();
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate();

Python:

v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(Bindings.of('id',v1)).addE('knows').to(v2).property('weight',0.75).iterate()

Conclusion

And that is the end of The TinkerPop Workout, by Gremlin. You are hopefully feeling more confident in your TinkerPop skills and have a good overview of what the stack has to offer, as well as some entry points to further research within the reference documentation. Welcome to The TinkerPop!