![2c60f5403923e05f52ad61ed91aa4d27.png](https://i-blog.csdnimg.cn/blog_migrate/ed00d1b5d8f786a1b4f42212ccba41a3.png)
Turing Completeness
![a7590966033526cf4e97212039131321.png](https://i-blog.csdnimg.cn/blog_migrate/12a18bb0811fc3919123c1021e7bdbe1.jpeg)
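- A Turing machine M is a tuple (finite symbol set F, finite state set Q, transition function L). It runs by moving back and forth along an infinite tape, which can hold arbitrarily large data
- For an algorithm, the question is whether inputs of any length can be handled with a finite set of states and a finite number of tapes
- The halting problem is undecidable. The proof is self-referential, much like the barber paradox (the barber shaves exactly those people who do not shave themselves, so who shaves the barber?). A Turing machine therefore cannot solve every problem
- A finite state automaton (which partitions an unbounded search space into a finite set of matching patterns) can only express regular languages. A pushdown automaton (a finite state machine with its own stack, so it can store data) can express context-free languages. A Turing machine is more powerful still: it is equivalent to a pushdown automaton with two stacks
- The languages recognized by Turing machines are the recursively enumerable languages. For every string in such a language some Turing machine halts and accepts it, so accepted strings can be found by brute-force enumeration, although the machine may never halt on strings outside the language
- A Turing-complete language is one that can carry out any task a Turing machine can carry out
- NP problems are those whose candidate solutions can be verified in polynomial time; whether all of them can also be solved in polynomial time is the open P vs NP question
- The bar for Turing completeness is fairly low. Loops, for example, can be simulated with tail recursion plus a conditional check on state
- General imperative languages such as C and Java are Turing complete: they have if branches, goto (or loop statements such as while and for), and assignment (that is, the ability to change memory state). SQL-92 is not Turing complete
- Turing completeness only says that something is computable. It guarantees neither efficiency nor readability, which is why so many programming languages have been invented
- Lambda calculus (λ-calculus), written as (lambda x . body), is the smallest programming language, yet it is equivalent to a Turing machine. Lambda calculus takes a mathematical, symbolic view of computation, while the Turing machine takes a mechanical one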
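The tuple description above can be made concrete with a small simulator. This is a minimal sketch, not a reference implementation: the function name `run_turing_machine`, the state names `q0` and `halt`, the blank symbol `_`, and the bit-flipping machine are all assumptions chosen for illustration. The `max_steps` guard is there because, as noted above, halting cannot be decided in general.

```python
from collections import defaultdict


def run_turing_machine(transitions, tape, state="q0", accept="halt",
                       blank="_", max_steps=10_000):
    """Run a one-tape Turing machine and return the final tape contents.

    transitions maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right).
    """
    # The tape is unbounded: any cell not written yet reads as blank.
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    steps = 0
    while state != accept:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; machine may not halt")
        state, cells[head], move = transitions[(state, cells[head])]
        head += move
        steps += 1
    if not cells:
        return ""
    used = sorted(cells)
    return "".join(cells[i] for i in range(used[0], used[-1] + 1)).strip(blank)


# Hypothetical machine: flip every bit, halt at the first blank cell.
flip_bits = {
    ("q0", "0"): ("q0", "1", +1),
    ("q0", "1"): ("q0", "0", +1),
    ("q0", "_"): ("halt", "_", +1),
}

flipped = run_turing_machine(flip_bits, "0110")
```

The whole machine is just a finite table plus a head position, which is why the bar for simulating it in another language is low.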
Graph Traversal Language
![5ba5d40a21033e88205b81b3f6ad6ed3.png](https://i-blog.csdnimg.cn/blog_migrate/9157b1c4ffb3489e7c1d694f1d78a6cc.jpeg)
- The instruction set must at minimum support values(), property(), sack(), choose(), repeat(), in(), and out(). In other words, it must be able to move in either direction and to modify state and data
- The Turing machine state is the traverser's sack
- The function of the Turing machine is the composition of steps that makes up the program
- G is a line graph with each vertex representing a cell in the Turing tape with respective input symbols
Graph Traversal
![6e85a611261e918aca6268b8c28e961b.png](https://i-blog.csdnimg.cn/blog_migrate/e00f42c3ab802588216d4c554275dda2.jpeg)
- TraversalSource: a generator of traversals for a particular graph, DSL, and execution engine.
- GraphTraversalSource provides V() and E(), both of which return a GraphTraversal
- Traversal<S,E>: a functional data flow process that transforms objects of type S into objects of type E. It supports function composition by chaining steps
- GraphTraversal: a traversal DSL that is oriented towards the semantics of the raw graph (i.e. vertices, edges, etc.)
- GraphTraversal is a monoid (like numbers under + or *) in that it is an algebraic structure with a single associative binary operation and an identity element. The binary operation is function composition (i.e. method chaining) and its identity is the step identity(). This is related to a monad as popularized by the functional programming community
- The objects propagating through the traversal are wrapped in a Traverser<T>
- One can write Gremlin traversals in any language, and Gremlin Traversal Machine (GTM) is responsible for processing traversals
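- Traversal<S,E> implements the Iterator<E> interface; S and E are the start and end types. The traversal representation is an abstraction built from single steps, closer to the functor concept in functional programming. Many steps accept a lambda or function object (such as a Predicate) or depend on an Iterator, much like higher-order functions. __ is the anonymous traversal
- Traverser<T>: the object that produces a T in the current traversal. The traverser maintains all the metadata of the traversal and provides get() for the current object, path() for the path history, loops() for the current loop count, bulk() for the number of objects it represents, sack() for traverser-local data, and sideEffects() for traversal-global data
- TraversalStrategy: an interceptor that optimizes, adjusts, or replaces the steps of a traversal. Note that adding a custom Step may affect how strategies rewrite the traversal
- TraversalSideEffects: stores global key/value information during a graph traversal, typed as Map<String,Object>. Used like withSideEffect()-sideEffect(Traversal.store(x)).select(x)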
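The monoid claim (composition is associative and identity() is the neutral element) can be checked with plain functions over iterators. This is a sketch under assumptions: `compose`, `identity`, and `out` are hypothetical helpers that mimic step chaining, and the toy adjacency dict stands in for a graph. None of it is TinkerPop API.

```python
from functools import reduce


def identity(stream):
    """The neutral step: passes every object through unchanged."""
    return stream


def compose(*steps):
    """Chain steps left to right; compose() with no steps behaves like identity."""
    return lambda stream: reduce(lambda s, step: step(s), steps, stream)


def out(graph):
    """Build a step that moves each incoming vertex to its outgoing neighbours."""
    def step(stream):
        for vertex in stream:
            yield from graph.get(vertex, [])
    return step


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
hop = out(graph)

two_hops = compose(hop, hop)
left_grouped = compose(compose(hop, hop), identity)
right_grouped = compose(hop, compose(hop, identity))

two_hop_result = list(two_hops(["a"]))
```

Grouping the composition differently, or inserting `identity` anywhere in the chain, leaves the result unchanged, which is what lets method chaining be treated as a single associative operation.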
Path Data Structure
![2f89ffb8103fe1b76e317b55b7310016.png](https://i-blog.csdnimg.cn/blog_migrate/8754357fd95e28b64f7503d62edc37ea.jpeg)
- The Path data structure is an ordered list of objects, where each object is associated to a Set<String> of labels
- While the path step is a lot more convenient, in some cases it can be very expensive in terms of memory and CPU usage so it is worth remembering these alternative techniques using as() and select()
![e4827b47a7aecdf86ec18d97c0b4a2a6.png](https://i-blog.csdnimg.cn/blog_migrate/b3a6c346e005e41c238d7f54015ed775.jpeg)
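- A Path does not necessarily record the whole traversal history. It records path information selectively, depending on what Gremlin needs to return. For example, the result of g.V().outE().inV().path() also contains the edges along the path, while g.V().out().path() does not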
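The "ordered list of objects, each with a set of labels" description can be sketched as a small immutable structure. The class below is a hypothetical illustration, not the TinkerPop `Path` interface; its `select` returns the most recent object carrying a label, which is an assumption made for this sketch.

```python
class Path:
    """Ordered history of visited objects, each tagged with a set of labels."""

    def __init__(self, objects=(), labels=()):
        self.objects = list(objects)
        self.labels = list(labels)

    def extend(self, obj, *labels):
        """Return a new Path with obj appended; the original is left untouched."""
        return Path(self.objects + [obj], self.labels + [set(labels)])

    def select(self, label):
        """Return the most recent object that carries the given label."""
        for obj, tags in zip(reversed(self.objects), reversed(self.labels)):
            if label in tags:
                return obj
        raise KeyError(label)


# Roughly what as("a") ... as("b") would record while walking vertex-edge-vertex.
p = Path().extend("marko", "a").extend("knows").extend("josh", "b")
```

Looking up a single labelled object only needs that one entry, while returning the full path needs every entry kept in memory, which is the cost difference between as()/select() and path() mentioned above.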
Step
![efcbe09ca24a9ba07ab3d31a8a69cef6.png](https://i-blog.csdnimg.cn/blog_migrate/257f9836057fb15ff04f46213cea3d4b.jpeg)
- step is a stateless function
- A step processes a stream of input traversers and yields a stream (iterator) of output traversers. "Yields" means evaluation is lazy: nothing executes until a result is requested
- Step<S,E> is a stateless single step; the state data lives in the Traverser
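Python generators give a close analogy to this lazy, pull-based behaviour. The sketch below uses a hypothetical `out_step` and a toy adjacency dict, and logs each edge it walks so that the moment of evaluation is visible.

```python
log = []
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def out_step(traversers):
    """Lazily move each incoming vertex to its neighbours, logging each hop."""
    for v in traversers:
        for w in graph.get(v, []):
            log.append(f"out {v}->{w}")
            yield w


# Building the pipeline runs nothing: generators only describe the work.
pipeline = out_step(out_step(iter(["a"])))
log_before_pull = list(log)

# Pulling one result runs just enough of each step to produce it.
first = next(pipeline)
log_after_pull = list(log)
```

After the single `next()` call the edge from `a` to `c` has still not been walked, which is the same reason a Gremlin traversal does nothing until a terminal step pulls on it.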
![8f012a16c7a0930437d42aed1429f509.png](https://i-blog.csdnimg.cn/blog_migrate/cd80c8e51d85ef3381b13315dd5c7d12.jpeg)
- map: e.g. constant(), count(), elementMap(), fold(), group(), groupCount(), id(), identity(), label(), loops(), match(), max(), min(), order(), program(), path(), project(), propertyMap(), sack(), select(), sum(), valueMap()
- flatMap: e.g. unfold()
- filter: e.g. and(), or(), not(), coin(), dedup(), drop(), has(), is(), limit(), none(), range()
- sideEffect: e.g. aggregate(scope, x), group(), groupCount(), inject(), pageRank(), profile(), sack(), subgraph(), tree()
- branch steps: e.g. choose(), coalesce(), local(), repeat(), union()
- step modulators: look back at the previous step and modify its behavior, e.g. as(), by(), emit(), from(), to()
- start steps: configuration methods like with(), withComputer(), withSack(), withSideEffect(), withStrategies()..., and spawn steps like V(), E(), addV(), addE(), inject()
- terminal steps: steps that do not return a traversal, but execute the traversal and return a result, e.g. hasNext(), next(), tryNext(), iterator(), toList(), toSet(), fill(collection), iterate(), explain()
- barrier steps: barrier()-step (barrier) turns the lazy traversal pipeline into a bulk-synchronous pipeline. e.g. barrier(), cap(), count(), fold(), max(), min(), sum()
- sideEffect get/set steps: aggregate() aggregate, cap() get, group() reduce
- traversal objects get/set steps: fold() aggregate, group() reduce
- math steps: string-based math processor. Variables can be path labels, side-effects, or incoming map keys. can use calculator operators
- statistical steps: count(), sum(), max(), min(), mean()
- VertexProgram steps: e.g. pageRank()
- logical operators: and(), or() and not(), return Boolean
- Predicates: eq/lt/gt/inside/outside/between/within/without. and Text searching predicates, startingWith/endingWith/containing
- Random: coin(), sample()
- optional: optional(), coalesce() to select one possible traversal
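The four basic step shapes (map, flatMap, filter, sideEffect) can be sketched as higher-order functions over a stream. The helper names below are hypothetical and only mirror the categories listed above; the comments naming Gremlin steps are loose analogies.

```python
def map_step(fn):
    """One object in, exactly one object out."""
    return lambda ts: (fn(t) for t in ts)


def flat_map_step(fn):
    """One object in, zero or more objects out."""
    return lambda ts: (result for t in ts for result in fn(t))


def filter_step(pred):
    """One object in, the same object out or nothing."""
    return lambda ts: (t for t in ts if pred(t))


def side_effect_step(fn):
    """Object passes through unchanged; fn is run only for its effect."""
    def step(ts):
        for t in ts:
            fn(t)
            yield t
    return step


seen = []
stream = iter([[1, 2], [3], [4, 5, 6]])
stream = flat_map_step(lambda xs: xs)(stream)        # loosely like unfold()
stream = filter_step(lambda x: x % 2 == 0)(stream)   # loosely like is() or has()
stream = side_effect_step(seen.append)(stream)       # loosely like aggregate()
stream = map_step(lambda x: x * 10)(stream)          # a plain map step
result = list(stream)
```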
Graph Step & VertexStep
![e545b8d422268b4697f3492e248d6fb0.png](https://i-blog.csdnimg.cn/blog_migrate/c79a458cf0cf7cb0626422a70d1268df.jpeg)
- start a Traversal
- graph step: G -> V*
- vertex step: V -> V*
- "graph" represents a graph instance, and "g" represents an instance of a graph traversal source object. g only needs to be instantiated once and should then be re-used
Select Step
- select() can go back to a previously seen area of computation
- Select labeled steps within a path (as defined by as(x)...select(x) in a traversal). select() will do its best to avoid calculating the path history, instead will rely on a global data structure for storing the currently selected object
- Select objects out of a Map<String,Object> flow (i.e. a select(keys/values)).
Project Step
- project() is a map step that projects the current object into a Map<String,Object> keyed by the provided labels
- It is similar to select()-step, save that instead of retrieving and modulating historic traverser state, it modulates the current state of the traverser
As Modulator
- as(label) provides a label for the step that can later be accessed by steps and data structures that make use of such labels, e.g. select(), match(), path()
- as step allows us to refer back to the previous state of a traversal but potentially requires a lot of memory to hold this state during complex queries
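- This label is best understood as a marker; it is not the vertex/edge label of the property graph
- Using as() can retrieve part of the path information more efficiently than path()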
By Modulator
![dd7b6d75f8a2fb5e6018512eb97181be.png](https://i-blog.csdnimg.cn/blog_migrate/b220ed96ba069880c4d21091a769d44e.jpeg)
- by modulator steps are processed in a round robin fashion
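- by(key) reads the value of that key from the incoming object; path().by(Traversal) is applied to each element of the path; order().by(Traversal, incr) sorts by an element's property value or by the result of a traversal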
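Round-robin processing means the first by() applies to the first element, the second by() to the second, and the list of modulators wraps around when it runs out. A minimal sketch, assuming a hypothetical `path_by` helper and dict-shaped path elements:

```python
from itertools import cycle
from operator import itemgetter


def path_by(path, *modulators):
    """Apply the modulators to the path elements in round-robin order."""
    if not modulators:
        return list(path)
    return [fn(obj) for obj, fn in zip(path, cycle(modulators))]


path = [
    {"name": "marko", "age": 29},
    {"name": "josh", "age": 32},
    {"name": "peter", "age": 35},
]

# Two modulators over three elements: name, age, then name again.
alternating = path_by(path, itemgetter("name"), itemgetter("age"))
names_only = path_by(path, itemgetter("name"))
```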
Global Scope & Local Scope
![fc4e1719ad171e524137f295a422ac76.png](https://i-blog.csdnimg.cn/blog_migrate/afb08f7d03cf8bd1b63c3716fd4b0d18.jpeg)
- nested functions -> nested steps
Group Step
![e9986b808911368af706e286b60188ff.png](https://i-blog.csdnimg.cn/blog_migrate/479ba63efd4e0294f2650a5bed9d4df7.jpeg)
- The group step returns a map; unfold can be used to unbundle it into its individual entries
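The group-then-unfold pattern can be sketched with a dictionary of lists. `group_by` and `unfold` here are hypothetical stand-ins, with the `key` and `value` arguments playing the role of the two by() modulators.

```python
from collections import defaultdict


def group_by(items, key, value=lambda x: x):
    """Collect value(item) into lists keyed by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(value(item))
    return dict(groups)


def unfold(mapping):
    """Turn one map back into a stream of (key, value) entries."""
    yield from mapping.items()


vertices = [("marko", "person"), ("lop", "software"), ("josh", "person")]

grouped = group_by(vertices, key=lambda v: v[1], value=lambda v: v[0])
entries = list(unfold(grouped))
```

Grouping collapses the stream into a single map object; unfolding turns that one object back into a stream so later steps can filter or order the entries.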
Has Step
![63b2bfb4803fbf266f617d90278608cc.png](https://i-blog.csdnimg.cn/blog_migrate/d9ea0d2656d42f7ea76ecc94311d7e7b.jpeg)
- filter vertices, edges, and vertex properties based on their properties
Local Step
![2c9dd8d1f6b0324375b65229b7f415b0.png](https://i-blog.csdnimg.cn/blog_migrate/90327a3f94b308fd6bf801c3f41614d2.jpeg)
- Scope determines whether the step being scoped applies to the current object at that step (local) or to the entire stream of objects up to that step (global)
![9183b99878ecad7b36a368c032345b9a.png](https://i-blog.csdnimg.cn/blog_migrate/cb459a259230c6c99a55a1d427f8b9c0.jpeg)
- A traversal operates on a continuous stream of objects, but local() operates on a single element within that stream
- local() wraps a section of the traversal in an object-local traversal
- The local step is quite similar in functionality to flatMap and is often confused with it. local() propagates the traverser through the internal traversal as is, without splitting/cloning it, so it is not a flatMap step
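The difference between the two scopes shows up clearly with a limit. In the sketch below (toy graph and hypothetical helpers), the global version limits the whole stream of neighbours, while the local version limits the neighbours of each start vertex separately.

```python
from itertools import islice

graph = {"a": ["b", "c"], "b": ["d", "e"]}


def out(vertices):
    """Stream the outgoing neighbours of every incoming vertex."""
    for v in vertices:
        yield from graph.get(v, [])


def limit(n, stream):
    """Keep only the first n objects of a stream."""
    return islice(stream, n)


starts = ["a", "b"]

# Global scope: one limit over the combined stream of all neighbours.
global_limit = list(limit(1, out(starts)))

# Local scope: the same sub-traversal is run once per incoming object.
local_limit = [w for v in starts for w in limit(1, out([v]))]
```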
Aggregate Step
![673caa37f5605e3c39c86c4a927e49d0.png](https://i-blog.csdnimg.cn/blog_migrate/7077d8085bc1bf522a6e80fa39b674c5.jpeg)
- Using aggregate to create collections during a traversal
- With global scope the step uses eager evaluation: no objects continue on until all previous objects have been fully aggregated
- With local scope the aggregation is evaluated lazily. EarlyLimitStrategy alters the behavior of aggregate(local)
Repeat Step
![c3fe3618740bc6018bb744379673afc5.png](https://i-blog.csdnimg.cn/blog_migrate/55d6916d1995148c2fdc57ed7b0a1067.jpeg)
- looping over a traversal given some break predicate, can explore vertices and edges using both, bothE, bothV and otherV
![7bad03db4229ae649b40c31cae70878c.png](https://i-blog.csdnimg.cn/blog_migrate/402e4eb47292b5a57be6280b03580363.jpeg)
- two modulators for repeat(): until() and emit()
- repeat()...until()/emit() is do/while looping, until()/emit()...repeat() is while/do looping
- emit-predicate: with emit(), the traverser is split in two: one copy exits the code block while the other continues back within the code block (as long as the until() break condition has not been met)
![15ba525b9ec09e95fdc5d8aa1dfbfeac.png](https://i-blog.csdnimg.cn/blog_migrate/0b325004a3fa28c5f168b315522050cb.jpeg)
- use a repeat…until loop to look for shortest paths
- Using emit to return results during a repeat loop
- A graph can have a high or low "fan out" of possible routes depending on the query direction, so path finding can be memory and CPU intensive
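The repeat…until and emit behaviour can be sketched as a breadth-first expansion of paths. This is an illustration under assumptions: `repeat_out` is a hypothetical helper, the airport codes are made-up sample data, and skipping already-visited vertices stands in for a simplePath() filter.

```python
def repeat_out(graph, start, until, emit=None, max_loops=10):
    """Breadth-first sketch of a repeat(out()) loop with until and emit."""
    frontier = [[start]]
    results = []
    for _ in range(max_loops):
        next_frontier = []
        for path in frontier:
            for nxt in graph.get(path[-1], []):
                if nxt in path:
                    continue  # never revisit a vertex on the same path
                new_path = path + [nxt]
                if until(nxt):
                    results.append(new_path)       # traverser leaves the loop
                else:
                    if emit is not None and emit(nxt):
                        results.append(new_path)   # emitted copy
                    next_frontier.append(new_path)  # original keeps looping
        frontier = next_frontier
        if not frontier:
            break
    return results


routes = {"AUS": ["DFW", "ATL"], "DFW": ["LHR"], "ATL": ["LHR", "DFW"], "LHR": []}

to_lhr = repeat_out(routes, "AUS", until=lambda v: v == "LHR")
everything = repeat_out(routes, "AUS", until=lambda v: False, emit=lambda v: True)
```

Because the expansion is breadth-first, the first result in `to_lhr` is a shortest route. The frontier list is also where the fan-out cost appears: every partial path is held in memory until it either ends or is extended.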
Union Step
![39f69f45e164046d4cad5a5846df0dbf.png](https://i-blog.csdnimg.cn/blog_migrate/e55e9cd2ea4055ecc69e212d3ca692bc.jpeg)
- Using union to combine query results
Barrier Step
![d6ef6f32d4c24f84295ff835b11d7435.png](https://i-blog.csdnimg.cn/blog_migrate/7ad83714b69108df19bad52cb7932a3b.jpeg)
- barrier turns the lazy traversal pipeline into a bulk-synchronous pipeline
- When everything prior to barrier() needs to be executed before moving onto the steps after the barrier()
- When "stalling" the traversal may lead to a "bulking optimization" in traversals that repeatedly touch many of the same elements. For example, if there are one million traversers at vertex 1, they can be represented as a single traverser with a Traverser.bulk() equal to one million, so that both() is executed once instead of one million times
- LazyBarrierStrategy inserts barrier() into a traversal where appropriate in order to gain the "bulking optimization."
- Each traverser entering repeat has its recursion bulked
- CollectingBarrierStep: All of the traversers prior to the step are put into a collection and then processed in some way prior to the collection being "drained" one-by-one to the next step. e.g. order(), sample(), aggregate(), barrier().
- ReducingBarrierStep: All of the traversers prior to the step are processed by a reduce function and once all the previous traversers are processed, a single "reduced value" traverser is emitted to the next step. e.g. fold(), count(), sum(), max(), min().
- SupplyingBarrierStep: All of the traversers prior to the step are iterated (no processing) and then some provided supplier yields a single traverser to continue to the next step. e.g. cap()
- TraversalVertexProgramStep: a barrier is introduced at the end of every adjacent vertex step. the traversal does its best to compute as much as possible at the current, local vertex. What it can't compute without referencing an adjacent vertex is aggregated into a barrier collection. When there are no more traversers at the local vertex, the barriered traversers are the messages that are propagated to remote vertices for further processing
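The bulking optimization can be sketched with a Counter that maps each location to the number of traversers sitting there. The function names and the toy graph are assumptions; the point is only to compare how many times the adjacency lookup runs.

```python
from collections import Counter

graph = {1: [2, 3], 2: [4], 3: [4], 4: []}


def out_naive(traversers):
    """One adjacency lookup per traverser."""
    lookups = 0
    result = []
    for v in traversers:
        lookups += 1
        result.extend(graph[v])
    return result, lookups


def out_bulked(bulks):
    """One adjacency lookup per distinct vertex; the bulk is carried along."""
    lookups = 0
    result = Counter()
    for v, bulk in bulks.items():
        lookups += 1
        for w in graph[v]:
            result[w] += bulk
    return result, lookups


many = [1] * 1000
naive_result, naive_lookups = out_naive(many)

# The barrier: collect every traverser first, then merge equal ones into a bulk.
bulked_result, bulked_lookups = out_bulked(Counter(many))
```

Both versions describe the same set of traversers, but the bulked one does a single lookup. The merge is only possible after everything upstream has been collected, which is why it requires a barrier.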
Sack Step
![0ac6bebbc4175f493746ee1371250542.png](https://i-blog.csdnimg.cn/blog_migrate/c13c4b5c7f638e804d09394830c4ea88.jpeg)
- sacks: a data structure local to the traverser
- Each sack of each traverser is created when using GraphTraversal.withSack(initialValueSupplier, splitOperator?, mergeOperator?)
- sack() offers capability in that we can specify how items are added to the collection. For example they can be added using addition, multiplication, subtraction or division
![1b930abb3aa6e58bb771c7891ff9022f.png](https://i-blog.csdnimg.cn/blog_migrate/78fab1b985ad7bd21792407ba77fe3e2.jpeg)
- quantum computing analogy: as in the wave-particle duality of physics, a wave is a diffusion of energy in space, and the sack is the energy. The sack is divided amongst the child traversers' sacks when a parent traverser splits, and sack energy is summed into a single traverser's sack when traversers merge
![ef9ae62f498cc2f0b6516a3b2d0d5da1.png](https://i-blog.csdnimg.cn/blog_migrate/a9e362fd610d4a863f26b55720e5e5c0.jpeg)
- sack with no parameters causes the current contents of the sack to be returned
![e8aa4bb57415ea37a359d786a6c60c99.png](https://i-blog.csdnimg.cn/blog_migrate/8d1e5538f72c650f31f85566680eb3a8.jpeg)
![570fda8bde63dbb2ee0439285e689504.png](https://i-blog.csdnimg.cn/blog_migrate/14b60ca06f5e89b6629ee3f31bc20b1d.jpeg)
- A sack can be used to aggregate values into a collection
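The energy-diffusion picture of sacks can be sketched as one split-and-merge round per hop. This models the analogy described above, with an even split and a summing merge chosen as assumptions; it is not a reproduction of the withSack() operators, and `diffuse` is a hypothetical helper.

```python
from collections import defaultdict

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def diffuse(traversers):
    """One out() hop: split each sack evenly among children, then merge by vertex.

    traversers maps a vertex to the sack value of the traverser located there.
    """
    merged = defaultdict(float)
    for vertex, sack in traversers.items():
        children = graph[vertex]
        for child in children:
            merged[child] += sack / len(children)  # split, then sum on merge
    return dict(merged)


step1 = diffuse({"a": 1.0})
step2 = diffuse(step1)
```

The total energy stays at 1.0 across hops: it spreads out when the traverser at `a` splits and recombines when both children arrive at `d`.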
Tree Step
![7afc2648aaae768500718ed01f6923e4.png](https://i-blog.csdnimg.cn/blog_migrate/07127df72cc1500138bf80a2e3748935.jpeg)
- the emanating paths from that element can be aggregated to form a tree
- see how the paths of all the emanating traversers are united to form the tree
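Uniting paths into a tree amounts to inserting each path into a nested map so that shared prefixes collapse. A minimal sketch, with a hypothetical `build_tree` helper and made-up sample paths:

```python
def build_tree(paths):
    """Merge paths into a nested dict; a shared prefix is stored only once."""
    tree = {}
    for path in paths:
        node = tree
        for obj in path:
            node = node.setdefault(obj, {})
    return tree


paths = [
    ["marko", "josh", "lop"],
    ["marko", "josh", "ripple"],
    ["marko", "vadas"],
]

tree = build_tree(paths)
```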
Program Step
![38ed2ef444aead284c49ac7f0a83fa72.png](https://i-blog.csdnimg.cn/blog_migrate/45f1cf16a910814a06b3837b8b421b8a.jpeg)
- The step takes a VertexProgram as an argument and will process the incoming graph accordingly
- VertexProgramStrategy wraps the OLTP steps in a TraversalVertexProgramStep so that they are computed through TraversalVertexProgram; only the steps that are themselves OLAP VertexProgram algorithms are left unwrapped
Profile Step
![4a70df089b295b8276dc471421f34519.png](https://i-blog.csdnimg.cn/blog_migrate/79541395e98def71a1b50ec8cbceff53.jpeg)
- profile()-step generates a TraversalMetrics sideEffect object
- Traversers can be merged, so the Count represents the sum of all Traverser.bulk() results and can therefore be larger than the number of traverser objects actually processed
Match Step
![08738386ca349ac595bdfb610bfb16ab.png](https://i-blog.csdnimg.cn/blog_migrate/8bfe74282e664de94f2afc63f293fcde.jpeg)
- match() provides a more declarative way of pattern matching
- match() is stateless. The variable bindings of the traversal patterns are stored in the path history of the traverser. As such, the variables used over all match()-steps within a traversal are globally unique. A benefit of this is that subsequent where(), select(), match(), etc. steps can leverage the same variables in their analysis.
- If a variable is at the start of a traversal pattern it must exist as a label in the path history of the traverser else the traverser can not go down that path. If a variable is at the end of a traversal pattern then if the variable exists in the path history of the traverser, the traverser's current location must match (i.e. equal) its historic location at that same label
- match() introduces a runtime traversal strategy: the traverser is able to make endogenous optimization decisions instead of relying on exogenous traversal rewrite rules for optimization
- When a traverser is in match(), a registered MatchAlgorithm analyzes the current state of the traverser (i.e. its history based on its path data), the runtime statistics of the traversal patterns, and returns a traversal-pattern that the traverser should try next. The default CountMatchAlgorithm dynamically revises the pattern execution plan by sorting the patterns according to their filtering capabilities (i.e. largest set reduction patterns execute first)
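- Writing a query with match() is sometimes simpler than the equivalent single-path traversal built from as()-select()-where()
- Gremlin is an imperative language; match provides a declarative style similar to Cypher, which leaves more room for optimization, for example through MatchAlgorithm
- Plain Gremlin is mostly used to describe a traversal path, while match patterns are mostly used to describe the subgraph formed by several query paths that satisfy the conditions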
SQL2Gremlin
![e4bb1be0a3dab58faca0a6bdf9ef208f.png](https://i-blog.csdnimg.cn/blog_migrate/baad2321c2fc0f67a4cd4bdc25aa78ed.jpeg)
- adding out() steps is simpler than having to add join clauses
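- repeat … until can be used to build what would be a long chain of joins, which illustrates how Gremlin traverses highly connected network data more intuitively than SQL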
![f51fc7e02bff8cc0c678f79bdee276b3.png](https://i-blog.csdnimg.cn/blog_migrate/8ac98a6f59ad93fc165eea7a8d9d7a8e.jpeg)
- Select: SELECT ... FROM table translates to hasLabel(table); SELECT columnNames translates to values(col) or valueMap(col); functions such as LEN(col) can be expressed with map() plus a lambda
- Filtering: WHERE translates to has(col, filter(x))
- Ordering: ORDER BY translates to order().by(col)
- Grouping: GROUP BY translates to groupCount().by(col)
- Joining: INNER JOIN translates to in()/out(); LEFT JOIN can be translated as match(out()).select()
References
- https://github.com/tinkerpop/gremlin/wiki
- http://tinkerpop.apache.org/gremlin.html
- http://tinkerpop.apache.org/docs/3.4.4/reference/
- http://tinkerpop-gremlin.cn/
- https://tinkerpop.apache.org/docs/3.4.7/tutorials/gremlins-anatomy/
- https://learnxinyminutes.com/
- https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html
- Turing machines slides. https://www.slideshare.net/lavishka_anuj/turing-machines-12176328
- Gremlin's Anatomy, https://www.slideshare.net/StephenMallette/gremlins-anatomy-88713465
- Gremlin 101.3 On Your FM Dial, https://www.slideshare.net/slidarko/gremlin-1013-on-your-fm-dial (very vivid examples)
- ACM DBPL Keynote: The Graph Traversal Machine and Language, https://www.slideshare.net/slidarko/acm-dbpl-keynote-the-graph-traversal-machine-and-language
- The Gremlin Graph Traversal Language, https://www.slideshare.net/slidarko/the-gremlin-traversal-language
- Quantum Processes in Graph Computing, https://www.slideshare.net/slidarko/quantum-processes-in-graph-computing
- Gremlin's Graph Traversal Machinery, https://www.slideshare.net/slidarko/gremlins-graph-traversal-machinery
- https://www.datastax.com/blog/2016/10/gremlin-implementation-gremlin-traversal-machine
- Gremlin Recipes, https://www.datastax.com/blog/2017/09/gremlin-recipes
- Tales from the TinkerPop, https://www.datastax.com/blog/2015/07/tales-tinkerpop
- The Benefits of the Gremlin Graph Traversal Machine, https://www.datastax.com/blog/2015/09/benefits-gremlin-graph-traversal-machine
- Gremlin's Time Machine, https://www.datastax.com/blog/2016/09/gremlins-time-machine
- The Gremlin Graph Traversal Machine and Language, https://arxiv.org/abs/1508.03843
- http://sql2gremlin.com/