Gremlin Graph Traversal Language Design

Turing Completeness

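  • A Turing machine M is a tuple (finite symbol set F, finite state set Q, transition function L); it moves back and forth over an infinite tape (which can hold arbitrarily large data) while it executes
  • An algorithm must be able to handle inputs of any length with a finite set of states and a finite number of tapes
  • The halting problem is undecidable (compare the barber paradox: the barber shaves all and only those who do not shave themselves, so who shaves the barber?); hence Turing machines cannot solve every problem
  • A finite-state automaton (which partitions an infinite space of inputs into a finite number of matching patterns) can only recognize regular languages. A pushdown automaton (a finite-state machine with its own stack for storing data) can recognize context-free languages. A Turing machine is more powerful still: it is equivalent to an automaton with two stacks
  • The languages recognized by Turing machines are the recursively enumerable languages: a string belongs to the language exactly when some Turing machine eventually accepts it, while on strings outside the language the machine may run forever. Equivalently, the members can be listed by brute-force enumeration
  • A Turing-complete language can carry out any task a Turing machine can carry out
  • NP problems are those whose solutions can be verified in polynomial time (equivalently, solved in polynomial time by a nondeterministic Turing machine); whether every NP problem can also be solved in polynomial time is the open P vs NP question
  • The bar for Turing completeness is low: loops can be simulated with tail recursion plus conditional checks on state
  • Common imperative languages such as C and Java are Turing complete: they have if branches, goto (or loops such as while and for), and assignment (i.e., mutating memory state). SQL-92 is not Turing complete
  • Turing completeness only says what can be computed; it guarantees neither efficiency nor readability, which is why so many programming languages have been invented
  • Lambda calculus (λ-calculus), with terms of the form (λx . body), is a minimal programming language, yet it is equivalent to Turing machines. Lambda calculus takes a symbolic, mathematical view of computation, while the Turing machine takes a mechanical one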
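
As a concrete instance of the automaton hierarchy above: the language a^n b^n is context-free but not regular, so no finite-state automaton recognizes it for every n, yet a single stack is enough. A minimal Python sketch of that pushdown idea (the function name is made up for illustration):

```python
def accepts_anbn(s: str) -> bool:
    """Pushdown-automaton-style recognizer for the language { a^n b^n : n >= 0 }."""
    stack = []        # the one unbounded memory a PDA adds to a finite-state machine
    state = "push"    # finite control: still reading a's, or already reading b's
    for ch in s:
        if state == "push" and ch == "a":
            stack.append("A")   # remember one unmatched 'a'
        elif ch == "b" and stack:
            state = "pop"       # once a 'b' is seen, no further 'a' is allowed
            stack.pop()         # match this 'b' against one earlier 'a'
        else:
            return False        # 'a' after 'b', unmatched 'b', or a foreign symbol
    return not stack            # accept iff every 'a' was matched


for word in ["", "ab", "aabb", "aab", "abab"]:
    print(repr(word), accepts_anbn(word))
```

A finite-state version would need one state per possible count of pending a's, which is exactly what "finite" rules out.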

Graph Traversal Language

  • For the Gremlin machine to be Turing complete, its instruction set must at minimum support values(), property(), sack(), choose(), repeat(), in(), and out(); that is, a traverser must be able to move in either direction and to read and modify both its state and the data
  • The Turing machine state is the traverser's sack
  • The transition function of the Turing machine is the composition of steps that forms the program
  • G is a line (path) graph in which each vertex represents a cell of the Turing tape and holds the corresponding input symbol
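
The mapping above (tape = line graph, state = sack, program = composed steps) can be simulated in a few lines. This is a hypothetical Python sketch, not Gremlin: the while loop plays repeat()/until(), the dictionary lookup plays choose(), and the comments name the Gremlin step each line stands for. The example machine (an assumption for illustration) increments a binary number, with the head starting on the least significant bit and "_" as the blank symbol.

```python
def make_line_graph(symbols):
    """Tape as a line graph: vertex i holds a symbol, out() goes to i+1 and in() to i-1."""
    vertices = {i: {"symbol": s} for i, s in enumerate(symbols)}
    out_adj = {i: i + 1 for i in range(len(symbols) - 1)}
    in_adj = {i: i - 1 for i in range(1, len(symbols))}
    return vertices, out_adj, in_adj


# Transition function for binary increment: (state, symbol read) -> (symbol to write, move, next state)
INCREMENT = {
    ("carry", "1"): ("0", "in", "carry"),  # 1 + carry = 0, keep carrying to the left
    ("carry", "0"): ("1", None, "halt"),   # absorb the carry
    ("carry", "_"): ("1", None, "halt"),   # the number grows into the blank cell
}


def run_machine(symbols, start, sack, delta, max_loops=1000):
    """Run the machine; `sack` is the traverser's state, `start` the vertex it begins on."""
    vertices, out_adj, in_adj = make_line_graph(symbols)
    here = start
    loops = 0
    while sack != "halt":                          # repeat(...).until(the sack says "halt")
        if loops == max_loops:
            raise RuntimeError("machine did not halt within max_loops")
        read = vertices[here]["symbol"]            # values("symbol")
        write, move, sack = delta[(sack, read)]    # choose() on (sack, symbol); the sack is updated
        vertices[here]["symbol"] = write           # property("symbol", write)
        if move == "out":
            here = out_adj[here]                   # out(): one cell to the right
        elif move == "in":
            here = in_adj[here]                    # in(): one cell to the left
        loops += 1                                 # loops()
    return "".join(vertices[i]["symbol"] for i in sorted(vertices))


print(run_machine("_1011", 4, "carry", INCREMENT))  # prints _1100 (binary 1011 + 1)
```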

Graph Traversal

  • TraversalSource: a generator of traversals for a particular graph, DSL, and execution engine.
  • GraphTraversalSource provides V() and E(), both of which return a GraphTraversal
  • Traversal<S,E>: a functional data flow process transforming objects of type S into objects of type E; steps are composed by chaining them
  • GraphTraversal: a traversal DSL that is oriented towards the semantics of the raw graph (i.e. vertices, edges, etc.)
  • GraphTraversal is a monoid (like + or * over numbers) in that it is an algebraic structure with a single binary operation that is associative. The binary operation is function composition (i.e., method chaining) and its identity is the identity() step. This is related to a monad as popularized by the functional programming community
  • The objects propagating through the traversal are wrapped in a Traverser<T>
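  • One can write Gremlin traversals in any host language, and the Gremlin Traversal Machine (GTM) is responsible for processing them
  • Traversal<S,E> implements the Iterator<E> interface, where S and E are the types of the start and end objects. A traversal is represented as a chain of individual steps, an abstraction close to the functor of functional programming. Many steps accept a lambda or a functor (such as a Predicate) or depend on an iterator, much like higher-order functions. __ denotes an anonymous traversal
  • Traverser<T>: wraps an object of type T produced during the traversal. The traverser maintains all of the traversal's metadata: get() returns the current object, path() the history of the traversal path, loops() the current loop count, bulk() the number of objects this traverser represents, sack() the traverser-local data, and sideEffects() the traversal-global data
  • TraversalStrategy: interceptor methods that optimize, adjust, or replace the execution steps of a traversal. Note that newly added custom steps may affect how strategies rewrite the traversal
  • TraversalSideEffects: stores global key/value information for a traversal, as a Map<String,Object>. Used like withSideEffect()-sideEffect(Traversal.store(x)).select(x)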
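
The monoid view and the Traverser metadata above can be modeled with plain functions over iterators. This is a hypothetical Python sketch, not TinkerPop's API: Traverser, map_step(), and compose() are illustrative names. It shows that chaining is associative, that identity() leaves a chain unchanged, and that a traverser carries its path, bulk, loops, and sack along as it moves.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Iterable, Iterator


@dataclass(frozen=True)
class Traverser:
    obj: Any            # get(): the current object
    path: tuple = ()    # path(): every object this traverser has been at, current one last
    bulk: int = 1       # bulk(): how many identical traversers this one stands for
    loops: int = 0      # loops(): current loop counter
    sack: Any = None    # sack(): traverser-local data

    def split(self, new_obj):
        """Child traverser: inherits bulk, loops and sack, and extends the path."""
        return replace(self, obj=new_obj, path=self.path + (new_obj,))


Step = Callable[[Iterable[Traverser]], Iterator[Traverser]]


def identity(ts: Iterable[Traverser]) -> Iterator[Traverser]:
    """identity(): the neutral element of composition."""
    yield from ts


def compose(f: Step, g: Step) -> Step:
    """Method chaining f().g(): feed f's output stream into g."""
    return lambda ts: g(f(ts))


def map_step(fn: Callable[[Any], Any]) -> Step:
    return lambda ts: (t.split(fn(t.obj)) for t in ts)


def run(step: Step, objs):
    starts = [Traverser(o, path=(o,)) for o in objs]
    return [t.obj for t in step(starts)]


inc = map_step(lambda x: x + 1)
dbl = map_step(lambda x: x * 2)
# Associativity: (inc.dbl).inc gives the same result as inc.(dbl.inc)
print(run(compose(compose(inc, dbl), inc), [1, 2]), run(compose(inc, compose(dbl, inc)), [1, 2]))
```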

Path Data Structure

  • The Path data structure is an ordered list of objects, where each object is associated to a Set<String> of labels
  • While path() is a lot more convenient, in some cases it can be very expensive in terms of memory and CPU usage, so it is worth remembering the alternative technique of using as() and select()

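  • Path does not necessarily record the entire traversal history; path information is recorded selectively, depending on what the traversal needs to return. For example, g.V().outE().inV().path() also contains the edges along each path, which g.V().out().path() does not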
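
To see why the two path() results differ, model each traverser's path as the list of objects that were current at each step. In this hypothetical Python sketch (the tiny edge list is made up), out() jumps from vertex to vertex, while outE().inV() makes the edge the current object for one step, so the edge ends up in the path. It ignores labels and how TinkerPop actually stores paths.

```python
# Hypothetical toy graph: each edge is (out_vertex, label, in_vertex)
EDGES = [("marko", "knows", "vadas"), ("marko", "created", "lop")]


def out(paths):
    """out(): hop straight to adjacent vertices; only the vertex is appended."""
    for p in paths:
        for edge in EDGES:
            if edge[0] == p[-1]:
                yield p + [edge[2]]


def outE(paths):
    """outE(): the edge becomes the current object, so it is appended too."""
    for p in paths:
        for edge in EDGES:
            if edge[0] == p[-1]:
                yield p + [edge]


def inV(paths):
    """inV(): from the current edge to its incoming vertex."""
    for p in paths:
        yield p + [p[-1][2]]


start = [["marko"]]
print(list(out(start)))        # paths of vertices only
print(list(inV(outE(start))))  # paths alternate vertex, edge, vertex
```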

Step

  • step is a stateless function
  • A step processes a stream of input traversers and yields a stream (an iterator) of output traversers; "yield" means it is not executed immediately (evaluation is lazy)
  • Step<S,E> is a stateless single step; the state data lives in the Traverser
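
Lazy "yield" semantics map naturally onto Python generators. In this hypothetical sketch the step keeps no state of its own (the touched list is only instrumentation), building the pipeline executes nothing, and a terminal next() pulls exactly one object through even though the source could produce a million.

```python
touched = []  # instrumentation only: which objects the step function actually processed


def map_step(fn):
    """A stateless step: a pure function lifted over a lazy stream of objects."""
    def step(stream):
        for obj in stream:
            touched.append(obj)
            yield fn(obj)
    return step


def source():
    """A large start step; a generator produces objects only on demand."""
    yield from range(1, 1_000_001)


traversal = map_step(lambda x: x * 10)(source())  # building the pipeline runs nothing yet
before = list(touched)                            # still empty
first = next(traversal)                           # a terminal step such as next() pulls one object
print(before, first, touched)                     # [] 10 [1]
```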

  • map: e.g. constant(), count(), elementMap(), fold(), group(), groupCount(), id(), identity(), label(), loops(), match(), max(), min(), order(), program(), path(), project(), propertyMap(), sack(), select(), sum(), valueMap()
  • flatMap: e.g. unfold()
  • filter: e.g. and(), or(), not(), coin(), dedup(), drop(), has(), is(), limit(), none(), range()
  • sideEffect: e.g. aggregate(scope, x), group(), groupCount(), inject(), pageRank(), profile(), sack(), subgraph(), tree()
  • branch steps: e.g. choose(), coalesce(), local(), repeat(), union()
  • step modulators: look back at the previous step and modify its behavior, e.g. as(), by(), emit(), from(), to()
  • start steps: configuration methods like with(), withComputer(), withSack(), withSideEffect(), withStrategies()..., and spawn steps like V(), E(), addV(), addE(), inject()
  • terminal steps: these do not return a traversal; they execute the traversal and return a result, e.g. hasNext(), next(), tryNext(), iterator(), toList(), toSet(), fill(collection), iterate(), explain()
  • barrier steps: barrier() turns the lazy traversal pipeline into a bulk-synchronous pipeline, e.g. barrier(), cap(), count(), fold(), max(), min(), sum()
  • sideEffect get/set steps: aggregate() aggregates objects into a side effect, cap() gets a side effect, group() reduces into one
  • get/set steps on traversal objects: fold() aggregates them, group() reduces them
  • math() step: a string-based math processor. Variables can be path labels, side-effects, or incoming map keys, and the expression can use calculator operators
  • statistical steps: count(), sum(), max(), min(), mean()
  • VertexProgram steps: e.g. pageRank()
  • logical operators: and(), or(), and not(), which filter traversers according to whether their child traversals produce a result
  • Predicates: eq/lt/gt/inside/outside/between/within/without, plus the text-search predicates startingWith/endingWith/containing
  • Random: coin(), sample()
  • optional: optional(), and coalesce(), which selects the first of several traversals that produces a result
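
The categories above differ mainly in how many objects a step emits per input. A hypothetical Python sketch of those shapes, with generator functions standing in for steps (map_, flat_map, and the other names are illustrative, not TinkerPop classes):

```python
from itertools import chain


def map_(fn):            # 1 -> 1, e.g. values(), id(), label()
    return lambda s: (fn(x) for x in s)


def flat_map(fn):        # 1 -> 0..n, e.g. out(), unfold()
    return lambda s: chain.from_iterable(fn(x) for x in s)


def filter_(pred):       # 1 -> 0 or 1, e.g. has(), is(), coin()
    return lambda s: (x for x in s if pred(x))


def side_effect(fn):     # 1 -> the same 1, plus an effect, e.g. aggregate()
    def step(s):
        for x in s:
            fn(x)
            yield x
    return step


def reduce_(fn, seed):   # n -> 1, a reducing barrier, e.g. count(), sum(), fold()
    def step(s):
        acc = seed
        for x in s:
            acc = fn(acc, x)
        yield acc
    return step


def pipe(source, *steps):
    """Chain steps left to right, like method chaining in Gremlin."""
    for step in steps:
        source = step(source)
    return source


seen = []
result = list(pipe([[1, 2], [3], []],
                   flat_map(lambda xs: xs),          # like unfold(): 1, 2, 3
                   side_effect(seen.append),         # record every object that passes
                   filter_(lambda x: x % 2 == 1),    # keep 1 and 3
                   map_(lambda x: x * 10),           # 10, 30
                   reduce_(lambda a, b: a + b, 0)))  # like sum(): 40
print(result, seen)  # [40] [1, 2, 3]
```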

Graph Step & VertexStep

  • start a Traversal
  • graph step: G -> V*
  • vertex step: V -> V*
  • "graph" represents a graph instance, and "g" represents an instance of a graph traversal source object. g only needs to be instantiated once and should then be re-used
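
The two step signatures can be sketched with an adjacency map. In this hypothetical Python model, V() plays the graph step and out() the vertex step; the graph data and function names are made up for illustration, and the nested calls read in the reverse order of Gremlin's method chain.

```python
# Hypothetical toy graph: vertex id -> properties, and vertex id -> [(edge label, adjacent id)]
VERTICES = {1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "lop"}, 4: {"name": "josh"}}
OUT = {1: [("knows", 2), ("knows", 4), ("created", 3)], 4: [("created", 3)]}


def V(*ids):
    """Graph step G -> V*: all vertices, or only the given ids."""
    return iter(ids) if ids else iter(VERTICES)


def out(*labels):
    """Vertex step V -> V*: each vertex yields zero or more adjacent vertices."""
    def step(vs):
        for v in vs:
            for label, w in OUT.get(v, []):
                if not labels or label in labels:
                    yield w
    return step


def values(key):
    return lambda vs: (VERTICES[v][key] for v in vs)


print(list(values("name")(out("knows")(V(1)))))  # like g.V(1).out('knows').values('name')
```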

Select Step

  • select() can go back to a previously seen area of computation
  • Select labeled steps within a path (as defined by as(x)...select(x) in a traversal). select() will do its best to avoid calculating the path history, instead will rely on a global data structure for storing the currently selected object
  • Select objects out of a Map<String,Object> flow (i.e. a select(keys/values)).

Project Step

  • project() projects the current object into a Map<String,Object> keyed by the provided labels
  • It is similar to select()-step, save that instead of retrieving and modulating historic traverser state, it modulates the current state of the traverser
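
The difference between the two steps is where the data comes from: select() reads the traverser's labeled history, while project() reads only the current object. A hypothetical Python sketch with made-up data and helper names:

```python
# Hypothetical data: vertex id -> properties, and who knows whom
PEOPLE = {1: {"name": "marko", "age": 29}, 4: {"name": "josh", "age": 32}}
KNOWS = {1: [4]}


def labeled_histories(start):
    """Emulates g.V(start).as('a').out('knows').as('b'): each traverser keeps its labeled history."""
    for b in KNOWS.get(start, []):
        yield {"a": start, "b": b}


def select(*labels):
    """select(): look back into each traverser's labeled history."""
    return lambda histories: ({key: h[key] for key in labels} for h in histories)


def project(**by):
    """project(k1, k2).by(f1).by(f2): build a map from the *current* object only."""
    return lambda objs: ({k: fn(o) for k, fn in by.items()} for o in objs)


print(list(select("a", "b")(labeled_histories(1))))
print(list(project(name=lambda v: PEOPLE[v]["name"], age=lambda v: PEOPLE[v]["age"])([1, 4])))
```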

As Modulator

  • as(label) provides a label to the step that can later be accessed by steps and data structures that make use of such labels, e.g. select(), match(), path()
  • as step allows us to refer back to the previous state of a traversal but potentially requires a lot of memory to hold this state during complex queries
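  • Here a label is just a marker on a step, not the label of a vertex or edge in the property graph
  • Using as() retrieves part of the path information more efficiently than path()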
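
The efficiency argument can be made concrete by counting what has to be retained. In this hypothetical single-traverser sketch, a five-hop walk labels only its start and end: path() would have to keep all six objects, while as()/select() needs just the two labeled ones. Real engines may track paths more cleverly; this only illustrates the bookkeeping difference.

```python
def walk(start, hops, step, labels):
    """Follow one traverser for `hops` steps.

    labels maps a step index (0 = start) to an as() label.
    Returns what path() would have to keep and what as()/select() have to keep.
    """
    path = [start]                       # path(): every object along the way
    labeled = {}                         # as(): only the labeled objects
    if 0 in labels:
        labeled[labels[0]] = start
    current = start
    for i in range(1, hops + 1):
        current = step(current)
        path.append(current)
        if i in labels:
            labeled[labels[i]] = current
    return path, labeled


path, labeled = walk(0, 5, lambda v: v + 1, {0: "a", 5: "z"})
print(len(path), labeled)  # 6 {'a': 0, 'z': 5}
```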

By Modulator

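  • by() modulators are applied in a round-robin fashion
  • by(key) takes the value of that key from the preceding object; path().by(traversal) applies the traversal to each element of the path; order().by(traversal, incr) sorts by an element's property value or by the value of some traversal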
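
Round-robin means the first by() applies to the first path element, the second by() to the second, and then the cycle starts over. A hypothetical Python sketch, loosely in the spirit of path().by('name').by('age') on a three-element path, with made-up data:

```python
from itertools import cycle

# Hypothetical vertex properties
PROPS = {1: {"name": "marko", "age": 29},
         4: {"name": "josh", "age": 32},
         3: {"name": "lop", "lang": "java"}}


def by(key):
    """by(key): read one property of an element."""
    return lambda v: PROPS[v][key]


def path_by(path, *modulators):
    """path().by(m1).by(m2)...: modulators are applied to path elements round-robin."""
    if not modulators:
        return list(path)
    return [mod(elem) for elem, mod in zip(path, cycle(modulators))]


print(path_by([1, 4, 3], by("name"), by("age")))  # ['marko', 32, 'lop']
```

The third element falls back to by("name") because the modulators cycle.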

Global Scope & Local Scope

  • nested functions -> nested steps

Group Step

  • group() returns a map; use unfold() to unbundle the map into its entries
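
A hypothetical Python sketch of the reduction: group() folds the whole stream into a single map, and unfold() turns that one map back into a stream of entries. The data and helper names are made up; in Gremlin the entries would be Map.Entry objects rather than tuples.

```python
from collections import Counter, defaultdict

# Hypothetical person vertices
PEOPLE = [{"name": "marko", "age": 29}, {"name": "vadas", "age": 27},
          {"name": "josh", "age": 32}, {"name": "peter", "age": 35}]


def group(objs, key_fn, value_fn=lambda o: o):
    """group().by(key).by(value): reduce the whole stream into one map of key -> list."""
    result = defaultdict(list)
    for o in objs:
        result[key_fn(o)].append(value_fn(o))
    return dict(result)


def group_count(objs, key_fn):
    """groupCount().by(key): key -> number of objects."""
    return dict(Counter(key_fn(o) for o in objs))


def unfold(m):
    """unfold() on a map: emit its entries one at a time."""
    yield from m.items()


def decade(p):
    return p["age"] // 10 * 10


print(group(PEOPLE, decade, lambda p: p["name"]))  # {20: ['marko', 'vadas'], 30: ['josh', 'peter']}
print(list(unfold(group_count(PEOPLE, decade))))   # [(20, 2), (30, 2)]
```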

Has Step

  • filter vertices, edges, and vertex properties based on their properties
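
Combined with the predicates listed earlier, has() is a filter on a property value. A hypothetical Python sketch: the predicate helpers imitate the semantics of P.gt(), P.between(), P.within(), and TextP.startingWith() (between() is inclusive of its lower bound and exclusive of its upper bound), and the data is made up.

```python
# Hypothetical predicate factories in the spirit of P.gt(), P.between(), P.within(), TextP.startingWith()
def gt(v):
    return lambda x: x > v


def between(lo, hi):
    return lambda x: lo <= x < hi      # lower bound inclusive, upper bound exclusive


def within(*vs):
    return lambda x: x in vs


def starting_with(prefix):
    return lambda x: isinstance(x, str) and x.startswith(prefix)


def has(key, pred=None):
    """has(key): the property must exist; has(key, value) / has(key, predicate) also test it."""
    if pred is not None and not callable(pred):
        expected = pred
        pred = lambda x: x == expected  # a plain value means equality

    def step(elements):
        for e in elements:
            if key in e and (pred is None or pred(e[key])):
                yield e
    return step


ELEMENTS = [{"name": "marko", "age": 29}, {"name": "vadas", "age": 27},
            {"name": "josh", "age": 32}, {"name": "peter", "age": 35},
            {"name": "lop", "lang": "java"}]


def names(es):
    return [e["name"] for e in es]


print(names(has("age", gt(30))(ELEMENTS)))  # ['josh', 'peter']
```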

Local Step

  • Scope determines whether the particular step being scoped is with respect to the current object (local) at that step or to the entire stream of objects up to that step (global)

  • A traversal operates on a continuous stream of objects, but local() operates on a single element within that stream
  • local() wraps a section of the traversal in an object-local traversal
  • local() is quite similar in functionality to flatMap() and is often confused with it. However, local() propagates the traverser through the internal traversal as is, without splitting or cloning it, so it is not a flatMap step
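
The practical difference shows up with steps such as limit() and count(). A hypothetical Python sketch with made-up adjacency data: a global limit(1) keeps one object for the whole stream, limit(1) inside local() keeps one per vertex, and count(local) counts inside each collection. It does not model traverser splitting, which is where local() and flatMap() actually differ.

```python
# Hypothetical data: each vertex with the list of vertices out() would reach from it
NEIGHBORS = {"marko": ["vadas", "josh", "lop"], "josh": ["ripple", "lop"], "peter": ["lop"]}


def out(v):
    return NEIGHBORS[v]


def count_global(stream):
    """count() / count(global): one number for the whole stream."""
    return sum(1 for _ in stream)


def count_local(stream):
    """count(local): one number per incoming collection."""
    return [len(collection) for collection in stream]


def local(sub, stream):
    """local(sub): evaluate `sub` separately for each object, seeing only that object."""
    return [y for obj in stream for y in sub(obj)]


everyone = list(NEIGHBORS)
global_limit = [w for v in everyone for w in out(v)][:1]  # like g.V().out().limit(1)
local_limit = local(lambda v: out(v)[:1], everyone)       # like g.V().local(out().limit(1))
print(global_limit, local_limit)                           # ['vadas'] ['vadas', 'ripple', 'lop']
```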

Aggregate Step

  • Using aggregate to create collections during a traversal
  • global scope means that the step will use eager evaluation in that no objects continue on until all previous objects have been fully aggregated
  • With local scope, the aggregation is evaluated lazily. Note that EarlyLimitStrategy alters the behavior of aggregate(local)
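
The eager/lazy distinction can be observed by checking the side-effect collection while objects flow downstream. A hypothetical Python sketch using generators (not the TinkerPop implementation, and ignoring strategies such as EarlyLimitStrategy):

```python
def aggregate_global(stream, side_effect):
    """aggregate(global, x): eager; drain every upstream object into x before emitting any."""
    buffered = list(stream)
    side_effect.extend(buffered)
    yield from buffered


def aggregate_local(stream, side_effect):
    """aggregate(local, x): lazy; add each object to x only as it passes through."""
    for obj in stream:
        side_effect.append(obj)
        yield obj


def sizes_seen(step):
    """Size of the side-effect collection each time an object reaches the next step."""
    x = []
    return [len(x) for _ in step(iter([1, 2, 3]), x)]


def after_limit_one(step):
    """Contents of the side effect when downstream (like limit(1)) pulls only one object."""
    x = []
    next(step(iter([1, 2, 3]), x))
    return x


print(sizes_seen(aggregate_global), sizes_seen(aggregate_local))            # [3, 3, 3] [1, 2, 3]
print(after_limit_one(aggregate_global), after_limit_one(aggregate_local))  # [1, 2, 3] [1]
```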

Repeat Step

  • repeat() loops over a traversal until some break predicate holds; inside the loop, vertices and edges can be explored with both(), bothE(), bothV(), and otherV()

  • two modulators for repeat(): until() and emit()
  • repeat()...until()/emit() behaves like do/while looping, and until()/emit()...repeat() behaves like while/do looping
  • emit-predicate: with emit(), the traverser is split in two. One copy exits the code block, while the other continues back within the code block (provided the until() condition has not been met)

  • Use a repeat()...until() loop to look for shortest paths
  • Use emit() to return results during a repeat loop
  • Depending on the query direction, there may be a high or low "fan out" of possible routes, so path finding can be memory- and CPU-intensive
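
The loop semantics above can be sketched over paths. This hypothetical Python model processes loop iterations wave by wave (breadth-first), which is why the first path satisfying until() is a shortest one; a real engine's traverser order may differ. Only emit() placed after repeat() is modeled, max_loops is a safety guard that is not Gremlin behavior, and the graph is made up.

```python
# Hypothetical undirected toy graph for both()
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)


def both_simple(path):
    """both().simplePath(): extend the path to every neighbour not already on it."""
    return [path + [w] for w in ADJ[path[-1]] if w not in path]


def repeat(starts, body, until, emit=None, until_first=False, max_loops=100):
    """Breadth-first sketch of repeat(body).until(pred).emit(pred) over paths.

    until_first=False ~ repeat().until(): do/while, the body runs at least once
    until_first=True  ~ until().repeat(): while/do, the test runs before the first pass
    """
    results, frontier = [], [[s] for s in starts]
    for loops in range(max_loops):           # paths still looping after max_loops are dropped
        if not frontier:
            break
        nxt = []
        for path in frontier:
            if (until_first or loops > 0) and until(path):
                results.append(path)         # the traverser leaves the loop
                continue
            if loops > 0 and emit and emit(path):
                results.append(path)         # emit(): a copy leaves, the original keeps looping
            nxt.extend(body(path))
        frontier = nxt
    return results


# Shortest path a -> d, roughly g.V('a').repeat(both().simplePath()).until(hasId('d')).path().limit(1)
print(repeat(["a"], both_simple, until=lambda p: p[-1] == "d")[0])  # ['a', 'e', 'd']
```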

Union Step

  • Using union to combine query results
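
union() feeds each incoming object to every branch and merges the outputs into one stream. A hypothetical Python sketch with a made-up graph; it emits results object by object and branch by branch, and the real step's output order may differ.

```python
# Hypothetical directed toy graph
OUT = {1: [2, 4, 3], 4: [5, 3]}
IN = {}
for src, dsts in OUT.items():
    for dst in dsts:
        IN.setdefault(dst, []).append(src)


def union(*branches):
    """union(t1, t2, ...): feed every incoming object to each branch and merge the outputs."""
    def step(stream):
        for obj in stream:
            for branch in branches:
                yield from branch(obj)
    return step


def in_(v):   # in(); "in" is a Python keyword
    return IN.get(v, [])


def out(v):
    return OUT.get(v, [])


print(list(union(in_, out)([4])))  # like g.V(4).union(in(), out()): [1, 5, 3]
```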

Barrier Step

  • barrier turns the lazy traversal pipeline into a bulk-synchronous pipeline
  • Use barrier() when everything prior to it needs to be executed before moving on to the steps after it
  • Also use it when "stalling" the traversal may lead to a "bulking optimization" in traversals that repeatedly touch many of the same elements. For example, if there are one million traversers at vertex 1, they can be represented as a single traverser with Traverser.bulk() equal to one million, so both() is executed once instead of one million times
  • LazyBarrierStrategy inserts barrier() into a traversal where appropriate in order to gain the "bulking optimization."
  • Each traverser entering repeat has its recursion bulked
  • CollectingBarrierStep: All of the traversers prior to the step are put into a collection and then processed in some way prior to the collection being "drained" one-by-one to the next step. e.g. order(), sample(), aggregate(), barrier().
  • ReducingBarrierStep: All of the traversers prior to the step are processed by a reduce function and once all the previous traversers are processed, a single "reduced value" traverser is emitted to the next step. e.g. fold(), count(), sum(), max(), min().
  • SupplyingBarrierStep: All of the traversers prior to the step are iterated (no processing) and then some provided supplier yields a single traverser to continue to the next step. e.g. cap()
  • TraversalVertexProgramStep: a barrier is introduced at the end of every adjacent vertex step. the traversal does its best to compute as much as possible at the current, local vertex. What it can't compute without referencing an adjacent vertex is aggregated into a barrier collection. When there are no more traversers at the local vertex, the barriered traversers are the messages that are propagated to remote vertices for further processing
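
The bulking optimization described above can be sketched by counting how often both() actually runs. In this hypothetical Python model (made-up graph, traversers represented by the vertex they sit on), barrier() merges equal traversers into an object-to-bulk count, so both() runs once per distinct vertex while the total number of represented traversers is unchanged.

```python
from collections import Counter

# Hypothetical undirected toy graph
ADJ = {1: [2, 3], 2: [1], 3: [1]}
calls = {"both": 0}  # how many times both() actually ran


def both(v):
    calls["both"] += 1
    return ADJ[v]


def both_without_barrier(traversers):
    """Lazy pipeline: one both() call per traverser."""
    return [w for v in traversers for w in both(v)]


def both_with_barrier(traversers):
    """barrier() first: merge equal traversers into object -> bulk, then call both() once per object."""
    bulked = Counter(traversers)
    result = Counter()
    for v, bulk in bulked.items():
        for w in both(v):
            result[w] += bulk     # each child inherits its parent's bulk
    return result


traversers = [1] * 1_000_000          # one million traversers sitting at vertex 1
bulked = both_with_barrier(traversers)
print(calls["both"], dict(bulked))    # 1 {2: 1000000, 3: 1000000}
```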

Sack Step

  • sacks: a data structure local to the traverser
  • Each sack of each traverser is created when using GraphTraversalSource.withSack(initialValueSupplier, splitOperator?, mergeOperator?)
  • sack() lets us specify how values are combined into the sack, for example by addition, multiplication, subtraction, or division

  • quantum computing: by analogy with wave-particle duality in physics, a wave is a diffusion of energy in space, and the sack is that energy. When a parent traverser splits, its sack is divided among the children traversers' sacks; when traversers merge, their sack energy is summed into a single traverser's sack

  • sack() with no arguments returns the current contents of the sack

  • sack can be used to aggregate collections
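
The energy analogy above can be sketched numerically. In this hypothetical Python model (made-up graph; it does not call TinkerPop's withSack(), whose split and merge operators are user-supplied), a traverser's sack is divided evenly among its children at each out() step, and traversers arriving at the same vertex are merged with their sacks summed, so the total is conserved except where a traverser reaches a dead end.

```python
from collections import defaultdict

# Hypothetical directed toy graph; "d" has no outgoing edges
OUT = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def spread(start, steps, energy=1.0):
    """Split each sack evenly among children per step; merge traversers per vertex by summing."""
    sacks = {start: energy}                      # vertex -> sack of the (merged) traverser there
    for _ in range(steps):
        nxt = defaultdict(float)
        for v, sack in sacks.items():
            children = OUT[v]
            for w in children:
                nxt[w] += sack / len(children)   # split among children, sum on merge
        sacks = dict(nxt)                        # a traverser at a dead end disappears with its sack
    return sacks


print(spread("a", 1))  # {'b': 0.5, 'c': 0.5}
print(spread("a", 2))  # {'d': 1.0}
```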

Tree Step

  • From any one element, the paths emanating from that element can be aggregated to form a tree
  • The paths of all the emanating traversers are united to form the tree
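
Uniting paths into a tree amounts to merging shared prefixes. A hypothetical Python sketch using nested dictionaries (TinkerPop's own Tree type is also a map of maps, but this helper and the sample paths are made up):

```python
def tree(paths):
    """tree(): merge paths that share a prefix into nested dicts {element: subtree}."""
    root = {}
    for path in paths:
        node = root
        for element in path:
            node = node.setdefault(element, {})   # reuse the branch if this prefix already exists
    return root


paths = [["marko", "josh", "ripple"], ["marko", "josh", "lop"], ["marko", "lop"]]
print(tree(paths))  # {'marko': {'josh': {'ripple': {}, 'lop': {}}, 'lop': {}}}
```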

Program Step

  • The step takes a VertexProgram as an argument and will process the incoming graph accordingly
  • VertexProgramStrategy wraps every OLTP step (that is, every step except those that run an OLAP VertexProgram algorithm) in a TraversalVertexProgramStep, so that these steps are computed by TraversalVertexProgram
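
To give a feel for what a VertexProgram such as pageRank() computes, here is a hypothetical bulk-synchronous (Pregel-style) PageRank in Python: in each superstep every vertex sends rank / out-degree to its neighbours as messages, and then every vertex updates its rank from the messages it received. It does not use TinkerPop's VertexProgram interface, and vertices without out-edges simply lose their rank mass.

```python
def page_rank(out_adj, iterations=20, alpha=0.85):
    """BSP PageRank over a dict: vertex -> list of out-neighbours."""
    vertices = list(out_adj)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):                      # one superstep per iteration
        inbox = {v: 0.0 for v in vertices}           # messages, combined by summing
        for v in vertices:                           # a real engine runs vertices in parallel
            targets = out_adj[v]
            for w in targets:
                inbox[w] += rank[v] / len(targets)   # send a message along each out-edge
        rank = {v: (1 - alpha) / n + alpha * inbox[v] for v in vertices}
    return rank


print(page_rank({1: [2], 2: [3], 3: [1]}))  # a 3-cycle: every rank stays at about 1/3
```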

Profile Step

  • profile()-step generates a TraversalMetrics sideEffect object
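  • Traversers can be merged, so the Count represents the sum of all Traverser.bulk() results and thus expresses the number of represented (not enumerated) traversers; Traversers is always less than or equal to Count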
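
The relation between the two columns can be illustrated with (object, bulk) pairs. A hypothetical Python sketch (not TraversalMetrics itself): merging equal traversers lowers the Traversers number but leaves Count unchanged.

```python
from collections import Counter


def merge(traversers):
    """Merge traversers on the same object into one, summing their bulks."""
    merged = Counter()
    for obj, bulk in traversers:
        merged[obj] += bulk
    return list(merged.items())


def profile_row(traversers):
    """Two columns of a profile() row for (object, bulk) pairs leaving a step."""
    return {"Traversers": len(traversers), "Count": sum(bulk for _, bulk in traversers)}


raw = [("v1", 1), ("v1", 1), ("v2", 1)]
print(profile_row(raw), profile_row(merge(raw)))
# {'Traversers': 3, 'Count': 3} {'Traversers': 2, 'Count': 3}
```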

Match Step

  • match() provides a more declarative way of pattern matching
  • match() is stateless. The variable bindings of the traversal patterns are stored in the path history of the traverser. As such, the variables used over all match()-steps within a traversal are globally unique. A benefit of this is that subsequent where(), select(), match(), etc. steps can leverage the same variables in their analysis.
  • If a variable is at the start of a traversal pattern it must exist as a label in the path history of the traverser else the traverser can not go down that path. If a variable is at the end of a traversal pattern then if the variable exists in the path history of the traverser, the traverser's current location must match (i.e. equal) its historic location at that same label
  • match() introduces a runtime traversal strategy: the traverser makes endogenous optimization decisions at runtime, instead of relying on exogenous traversal rewrite rules
  • When a traverser is in match(), a registered MatchAlgorithm analyzes the current state of the traverser (i.e. its history based on its path data), the runtime statistics of the traversal patterns, and returns a traversal-pattern that the traverser should try next. The default CountMatchAlgorithm dynamically revises the pattern execution plan by sorting the patterns according to their filtering capabilities (i.e. largest set reduction patterns execute first)
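  • match() is sometimes simpler to write than single-path traversals built from as(), select(), and where()
  • Gremlin is an imperative language; match() provides a Cypher-like declarative style, which leaves more room for optimization, e.g. through MatchAlgorithm
  • Plain Gremlin mostly describes traversal paths, while a match pattern mostly describes a subgraph made up of several query paths that satisfy the given conditions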
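
The binding rules above (a pattern may start only from a bound variable; a bound end variable must match) and the "most selective pattern first" idea can be sketched with a toy matcher. This is a hypothetical Python model, not MatchAlgorithm: it simply evaluates every runnable pattern and keeps the one leaving the fewest bindings, which is far cruder than CountMatchAlgorithm's runtime statistics. The edge data is made up.

```python
# Hypothetical edge list: (out_vertex, label, in_vertex)
EDGES = [("marko", "knows", "josh"), ("marko", "knows", "vadas"),
         ("marko", "created", "lop"), ("josh", "created", "lop"),
         ("josh", "created", "ripple"), ("peter", "created", "lop")]


def run_pattern(binding, pattern):
    """Extend one variable binding with one pattern (start_var, edge_label, end_var)."""
    start, label, end = pattern
    for src, lbl, dst in EDGES:
        if lbl != label or src != binding[start]:
            continue
        if end in binding and binding[end] != dst:
            continue          # end variable already bound: the current object must equal it
        yield {**binding, end: dst}


def match(start_var, starts, patterns):
    """Toy matcher: repeatedly run the most selective pattern whose start variable is bound."""
    bindings = [{start_var: v} for v in starts]
    bound = {start_var}
    remaining = list(patterns)
    while remaining and bindings:
        runnable = [p for p in remaining if p[0] in bound]  # start variable must already exist
        if not runnable:
            raise ValueError("no remaining pattern starts from a bound variable")
        results = {p: [nb for b in bindings for nb in run_pattern(b, p)] for p in runnable}
        best = min(runnable, key=lambda p: len(results[p]))  # fewest surviving bindings first
        bindings = results[best]
        bound.add(best[2])
        remaining.remove(best)
    return bindings


# People a who know someone b, where a and b both created the same c
print(match("a", ["marko", "vadas", "josh", "peter"],
            [("a", "knows", "b"), ("b", "created", "c"), ("a", "created", "c")]))
```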

SQL2Gremlin

  • adding out() steps is simpler than having to add join clauses
  • repeat()...until() can stand in for long chains of joins, which illustrates Gremlin's advantage over SQL in traversing highly connected network data intuitively

  • Select: FROM table translates to hasLabel(table); SELECT columnNames translates to values(col) or valueMap(col); functions such as LEN(col) can use map() with a lambda
  • Filtering: WHERE translates to has(col, predicate)
  • Ordering: ORDER BY translates to order().by(col)
  • Grouping: GROUP BY translates to groupCount().by(col)
  • Joining: INNER JOIN translates to in()/out(); LEFT JOIN can be translated as match(out()).select()
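
As a sanity check of the Select/Filtering/Ordering/Grouping rows, the hypothetical Python sketch below evaluates the same query over made-up in-memory records twice: once in SQL's set-oriented shape, and once as a chain of stream stages in the order a traversal such as g.V().hasLabel('person').has('age', gt(28)).order().by('age').values('name') would apply them.

```python
from collections import Counter

# Hypothetical records; in a graph database each would be a vertex with a label
ROWS = [
    {"label": "person", "name": "marko", "age": 29},
    {"label": "person", "name": "vadas", "age": 27},
    {"label": "person", "name": "josh", "age": 32},
    {"label": "person", "name": "peter", "age": 35},
    {"label": "software", "name": "lop", "lang": "java"},
]


def sql_style():
    """SELECT name FROM person WHERE age > 28 ORDER BY age"""
    person = [r for r in ROWS if r["label"] == "person"]          # the "person" table
    kept = [r for r in person if r["age"] > 28]
    return [r["name"] for r in sorted(kept, key=lambda r: r["age"])]


def gremlin_style():
    """g.V().hasLabel('person').has('age', gt(28)).order().by('age').values('name')"""
    vs = iter(ROWS)                                               # V()
    vs = (v for v in vs if v["label"] == "person")                # hasLabel('person')
    vs = (v for v in vs if "age" in v and v["age"] > 28)          # has('age', gt(28))
    vs = sorted(vs, key=lambda v: v["age"])                       # order().by('age'), a barrier
    return [v["name"] for v in vs]                                # values('name')


def group_count_by_label():
    """SELECT label, COUNT(*) ... GROUP BY label  ~  g.V().groupCount().by(label)"""
    return dict(Counter(v["label"] for v in ROWS))


print(sql_style(), gremlin_style(), group_count_by_label())
```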

