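Source: http://blog.csdn.net/snail_gesture/article/details/5151058
Background:
Spark Streaming splits work into jobs by batch duration. In practice, however, we often need to compute over the past day or even the past week of data, which makes state management unavoidable. Spark Streaming generates one job per batch duration, and each job operates on RDDs, so the question is how to maintain state across batches. The core of this is handled by the updateStateByKey and mapWithState methods.
Source code analysis:
1. Neither updateStateByKey nor mapWithState is defined on DStream itself. Both are provided through an implicit conversion: the DStream companion object converts a DStream[(K, V)] into a PairDStreamFunctions[K, V], which defines these methods.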
```scala
object DStream {

  // `toPairDStreamFunctions` was in SparkContext before 1.3 and users had to
  // `import StreamingContext._` to enable it. Now we move it here to make the compiler find
  // it automatically. However, we still keep the old function in StreamingContext for backward
  // compatibility and forward to the following function directly.

  implicit def toPairDStreamFunctions[K, V](stream: DStream[(K, V)])
      (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null):
    PairDStreamFunctions[K, V] = {
    new PairDStreamFunctions[K, V](stream)
  }

  // ... (other members omitted)
}
```
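The same "enrich via companion-object implicit" pattern can be reproduced in plain Scala to show why `pairs.updateStateByKey(...)` compiles even though DStream has no such method. This is a minimal sketch: MiniStream and MiniPairFunctions are hypothetical stand-ins for DStream and PairDStreamFunctions, not Spark classes. Because the conversion lives in the companion object, the compiler finds it through the implicit scope of MiniStream without any import at the call site.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for DStream: just the records of one batch.
final case class MiniStream[T](records: List[T])

// Hypothetical stand-in for PairDStreamFunctions: operations that only make sense for (K, V) records.
final class MiniPairFunctions[K, V](stream: MiniStream[(K, V)]) {
  def reduceByKey(f: (V, V) => V): MiniStream[(K, V)] =
    MiniStream(stream.records.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }.toList)
}

object MiniStream {
  // Same idea as DStream.toPairDStreamFunctions: defined in the companion object, so the
  // compiler finds it via implicit scope and no extra import is needed where it is used.
  implicit def toMiniPairFunctions[K, V](stream: MiniStream[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(stream)
}
```

Calling `reduceByKey` on a MiniStream of pairs is rewritten by the compiler to `MiniStream.toMiniPairFunctions(stream).reduceByKey(...)`. A MiniStream of non-pairs gets no such method, just as a DStream[String] has no updateStateByKey.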
updateStateByKey:
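1. updateStateByKey is implemented in PairDStreamFunctions as follows.
Starting from the existing state, updateFunc updates the state of each key with the batch's new values; the method returns a new DStream[(K, S)]. This simplest overload delegates with the default partitioner: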
```scala
/**
 * Return a new "state" DStream where the state for each key is updated by applying
 * the given function on the previous state of the key and the new values of each key.
 * Hash partitioning is used to generate the RDDs with Spark's default number of partitions.
 * @param updateFunc State update function. If `this` function returns None, then
 *                   corresponding state key-value pair will be eliminated.
 * @tparam S State type
 */
def updateStateByKey[S: ClassTag](
    updateFunc: (Seq[V], Option[S]) => Option[S]
  ): DStream[(K, S)] = ssc.withScope {
  // Delegate with defaultPartitioner(), a HashPartitioner with defaultParallelism partitions
  updateStateByKey(updateFunc, defaultPartitioner())
}
```
2. defaultPartitioner:
```scala
private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
  new HashPartitioner(numPartitions)
}
```
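With no arguments, defaultPartitioner builds a HashPartitioner with `sc.defaultParallelism` partitions. The sketch below shows the placement rule a hash partitioner applies, assuming Spark's HashPartitioner behaviour: null keys go to partition 0, and other keys to their hashCode modulo the partition count, shifted to be non-negative. SimpleHashPartitioner is a hypothetical class, not Spark's implementation.

```scala
// Hypothetical class illustrating hash partitioning; not Spark's HashPartitioner itself.
final class SimpleHashPartitioner(val numPartitions: Int) {
  require(numPartitions > 0, s"numPartitions must be positive, got $numPartitions")

  // Assumed rule: null keys go to partition 0; otherwise hashCode modulo numPartitions,
  // shifted into [0, numPartitions) because % can return a negative result.
  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ =>
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
  }
}
```

The same key always lands in the same partition, which is what lets a key's state and its new values meet in one place in every batch.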
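3. The partitioner controls how each RDD is partitioned. This overload also adapts the per-key updateFunc into a function over a whole partition's iterator: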
```scala
/**
 * Return a new "state" DStream where the state for each key is updated by applying
 * the given function on the previous state of the key and the new values of the key.
 * org.apache.spark.Partitioner is used to control the partitioning of each RDD.
 * @param updateFunc State update function. If `this` function returns None, then
 *                   corresponding state key-value pair will be eliminated.
 * @param partitioner Partitioner for controlling the partitioning of each RDD in the new
 *                    DStream.
 * @tparam S State type
 */
def updateStateByKey[S: ClassTag](
    updateFunc: (Seq[V], Option[S]) => Option[S],
    partitioner: Partitioner
  ): DStream[(K, S)] = ssc.withScope {
  val cleanedUpdateF = sparkContext.clean(updateFunc)
  // Lift the per-key function to a per-partition iterator function;
  // flatMap over the Option drops keys whose update returns None.
  val newUpdateFunc = (iterator: Iterator[(K, Seq[V], Option[S])]) => {
    iterator.flatMap(t => cleanedUpdateF(t._2, t._3).map(s => (t._1, s)))
  }
  // Keys are never changed here, so remembering the partitioner (true) is safe.
  updateStateByKey(newUpdateFunc, partitioner, true)
}
```
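The newUpdateFunc adapter above is an ordinary Scala pattern: lift a per-key function into a function over a whole partition's iterator, using flatMap over the Option so that keys whose update returns None simply disappear. A standalone sketch of the same adapter follows; IteratorWrapSketch and toIteratorFunc are hypothetical names.

```scala
object IteratorWrapSketch {
  // Turns (newValues, previousState) => Option[newState] into a function over
  // (key, newValues, previousState) records; None results are dropped by flatMap.
  def toIteratorFunc[K, V, S](f: (Seq[V], Option[S]) => Option[S]): Iterator[(K, Seq[V], Option[S])] => Iterator[(K, S)] =
    iterator => iterator.flatMap(t => f(t._2, t._3).map(s => (t._1, s)))
}
```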
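4. rememberPartitioner: the overload above always passes true. The most general overload takes it as an explicit parameter and creates a StateDStream: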
```scala
/**
 * Return a new "state" DStream where the state for each key is updated by applying
 * the given function on the previous state of the key and the new values of each key.
 * org.apache.spark.Partitioner is used to control the partitioning of each RDD.
 * @param updateFunc State update function. Note, that this function may generate a different
 *                   tuple with a different key than the input key. Therefore keys may be removed
 *                   or added in this way. It is up to the developer to decide whether to
 *                   remember the partitioner despite the key being changed.
 * @param partitioner Partitioner for controlling the partitioning of each RDD in the new
 *                    DStream
 * @param rememberPartitioner Whether to remember the partitioner object in the generated RDDs.
 * @tparam S State type
 */
def updateStateByKey[S: ClassTag](
    updateFunc: (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)],
    partitioner: Partitioner,
    rememberPartitioner: Boolean
  ): DStream[(K, S)] = ssc.withScope {
  new StateDStream(self, ssc.sc.clean(updateFunc), partitioner, rememberPartitioner, None)
}
```
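The Scaladoc's warning about changed keys can be made concrete. When rememberPartitioner is true, the generated RDDs keep claiming the given partitioner, but the update function runs inside each partition, so every output record stays in the partition its input key came from. If the function rewrites keys, some outputs no longer sit where the partitioner would place them. The sketch below checks this for one partition; it assumes a hash rule of non-negative hashCode modulo the partition count, and RememberPartitionerSketch, keepKeys and renameKeys are hypothetical.

```scala
object RememberPartitionerSketch {
  // Assumed hash rule: non-negative hashCode modulo the partition count.
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  type UpdateFunc = Iterator[(String, Seq[Int], Option[Int])] => Iterator[(String, Int)]

  // Key-preserving update: remembering the partitioner stays correct.
  val keepKeys: UpdateFunc = _.map { case (k, vs, prev) => (k, prev.getOrElse(0) + vs.sum) }

  // Hypothetical key-changing update: appends "!" to every key.
  val renameKeys: UpdateFunc = _.map { case (k, vs, prev) => (k + "!", prev.getOrElse(0) + vs.sum) }

  // Runs `update` over the contents of one partition and returns the output keys that
  // would hash to a different partition, i.e. where a remembered partitioner is wrong.
  def misplacedKeys(update: UpdateFunc, partition: Int, numPartitions: Int,
                    input: Seq[(String, Seq[Int], Option[Int])]): Seq[String] =
    update(input.iterator).map(_._1).filter(k => partitionOf(k, numPartitions) != partition).toList
}
```

With two partitions, appending "!" flips the parity of a String's hashCode, so every renamed key taken from partition 0 belongs in partition 1; remembering the partitioner there would mislead later operations that rely on it.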
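5. StateDStream persists its RDDs with StorageLevel.MEMORY_ONLY_SER, that is, in memory in serialized form rather than on disk. Serialized storage is more compact, which helps because the accumulated state can grow very large: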
```scala
class StateDStream[K: ClassTag, V: ClassTag, S: ClassTag](
    parent: DStream[(K, V)],
    updateFunc: (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)],
    partitioner: Partitioner,
    preservePartitioning: Boolean,
    initialRDD : Option[RDD[(K, S)]]
  ) extends DStream[(K, S)](parent.ssc) {

  // State RDDs are cached in memory in serialized form
  super.persist(StorageLevel.MEMORY_ONLY_SER)

  // ... (rest of the class omitted)
}
```
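StateDStream builds each batch's state by cogrouping the batch's new values with the previous state and then running the iterator-based update function over the result; the source of computeUsingPreviousRDD below describes this as a mapPartition over the cogrouped RDD. Here is a single-machine sketch of that step, with Scala collections standing in for RDDs. CogroupStepSketch, cogroup and step are hypothetical names, and Spark's real version runs per partition across the cluster.

```scala
object CogroupStepSketch {
  // Hypothetical stand-in for RDD.cogroup: for every key on either side, gather the
  // batch's new values and the previous state values for that key.
  def cogroup[K, V, S](batch: Seq[(K, V)], prevState: Seq[(K, S)]): Seq[(K, (Iterable[V], Iterable[S]))] = {
    val byKeyNew = batch.groupBy(_._1)
    val byKeyOld = prevState.groupBy(_._1)
    (byKeyNew.keySet ++ byKeyOld.keySet).toSeq.map { k =>
      val newValues: Iterable[V] = byKeyNew.getOrElse(k, Seq.empty[(K, V)]).map(_._2)
      val oldStates: Iterable[S] = byKeyOld.getOrElse(k, Seq.empty[(K, S)]).map(_._2)
      (k, (newValues, oldStates))
    }
  }

  // Mirrors finalFunc: keep at most one previous state per key, turn the new values
  // into a Seq, then pass the whole iterator to the update function.
  def step[K, V, S](batch: Seq[(K, V)], prevState: Seq[(K, S)],
                    updateFunc: Iterator[(K, Seq[V], Option[S])] => Iterator[(K, S)]): Seq[(K, S)] = {
    val i = cogroup(batch, prevState).iterator.map { case (k, (newValues, states)) =>
      val itr = states.iterator
      val headOption = if (itr.hasNext) Some(itr.next()) else None
      (k, newValues.toSeq, headOption)
    }
    updateFunc(i).toList
  }
}
```

Keys that appear only in the previous state still reach the update function, with an empty Seq of new values, which is why updateStateByKey visits every stored key in every batch.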
- The source code of computeUsingPreviousRDD is as follows:
<code class="hljs scala has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">private [this] def computeUsingPreviousRDD (
    parentRDD : RDD[(K, V)], prevStateRDD : RDD[(K, S)]) = {
  // Define the function for the mapPartition operation on cogrouped RDD;
  // first map the cogrouped tuple to tuples of required type,
  // and then apply the update function
  val updateFuncLocal = updateFunc
  val finalFunc = (iterator: Iterator[(K, (Iterable[V], Iterable[S]))]) => {
    val i = iterator.map(t => {
      val itr = t._2._2.iterator
      val headOption = if (itr.hasNext) Some(itr.next()) else None
      (t._1, t._2._1.toSeq, headOption)
    })
    updateFuncLocal(i)
  }
  // Every cogroup computation scans all partitions of prevStateRDD,
  // i.e. the entire historical state, on every batch.
  val cogroupedRDD = parentRDD.cogroup(prevStateRDD, partitioner)
  val stateRDD = cogroupedRDD.mapPartitions(finalFunc, preservePartitioning)
  Some(stateRDD)
}</code>
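Therefore, updateStateByKey is not recommended when there is a lot of data: because of the cogroup, every batch processes the entire historical state, including keys that received no new data in that batch.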
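To make that cost concrete, here is a minimal plain-Scala model of one batch of `computeUsingPreviousRDD`, with no Spark dependency. Ordinary `Map`s stand in for `parentRDD`, `prevStateRDD` and the cogroup, and the names `computeUsingPreviousState` and `runningCount` are hypothetical. The point it illustrates is that the update function is invoked for every key already in the state, so the per-batch work grows with the total state size rather than with the batch size.

```scala
object UpdateStateByKeyModel {
  // Shape of the iterator-based update function consumed by StateDStream:
  // each element is (key, new values in this batch, previous state if any).
  type UpdateFunc[K, V, S] = Iterator[(K, Seq[V], Option[S])] => Iterator[(K, S)]

  // Plain-Scala stand-in for parentRDD.cogroup(prevStateRDD) followed by finalFunc.
  // Returns the new state and the number of keys the update function had to visit.
  def computeUsingPreviousState[K, V, S](
      batch: Seq[(K, V)],
      prevState: Map[K, S])(updateFunc: UpdateFunc[K, V, S]): (Map[K, S], Int) = {
    val newValues: Map[K, Seq[V]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // Union of keys, like the full outer join performed by cogroup:
    // keys with old state but no new data are still visited.
    val allKeys = (prevState.keySet ++ newValues.keySet).toSeq
    val cogrouped = allKeys.iterator.map { k =>
      (k, newValues.getOrElse(k, Seq.empty[V]), prevState.get(k))
    }
    (updateFunc(cogrouped).toMap, allKeys.size)
  }

  // Hypothetical update function: running word count.
  def runningCount(it: Iterator[(String, Seq[Int], Option[Int])]): Iterator[(String, Int)] =
    it.map { case (word, counts, prev) => (word, counts.sum + prev.getOrElse(0)) }
}
```

With three keys in the state and a batch touching only `"a"`, all three keys still go through the update function.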
The updateStateByKey function is implemented as follows:
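Spark's public `updateStateByKey` overloads also accept a simpler per-key function of type `(Seq[V], Option[S]) => Option[S]`, which has to be adapted to the iterator-based `updateFunc` that `StateDStream` consumes. A minimal plain-Scala sketch of such an adapter follows; `PerKeyUpdate` and `lift` are hypothetical names, and this is an illustration of the idea rather than Spark's own code. Returning `None` from the per-key function drops that key from the state.

```scala
object UpdateStateByKeyWrapper {
  // Per-key update function: this batch's new values + old state => new state.
  // Returning None removes the key from the state.
  type PerKeyUpdate[V, S] = (Seq[V], Option[S]) => Option[S]

  // Lift a per-key function into the iterator-based form used by StateDStream.
  def lift[K, V, S](f: PerKeyUpdate[V, S]): Iterator[(K, Seq[V], Option[S])] => Iterator[(K, S)] =
    (it: Iterator[(K, Seq[V], Option[S])]) =>
      it.flatMap { case (key, values, state) => f(values, state).map(s => (key, s)) }
}
```

For example, a function that sums counts and drops keys whose total is zero keeps `"a"` and discards `"b"`.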
mapWithState:
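1. mapWithState returns a MapWithStateDStream. Historical state is maintained and updated per key: a single mapping function is applied to the key-value data to maintain its state.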
<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * :: Experimental ::
 * Return a [[MapWithStateDStream]] by applying a function to every key-value element of
 * `this` stream, while maintaining some state data for each unique key. The mapping function
 * and other specification (e.g. partitioners, timeouts, initial state data, etc.) of this
 * transformation can be specified using [[StateSpec]] class. The state data is accessible
 * as a parameter of type [[State]] in the mapping function.
 *
 * Example of using `mapWithState`:
 * {{{
 *    // A mapping function that maintains an integer state and returns a String
 *    // Here `state` can be viewed as a table recording all historical state maintained so far.
 *    def mappingFunction(key: String, value: Option[Int], state: State[Int]): Option[String] = {
 *      // Use state.exists(), state.get(), state.update() and state.remove()
 *      // to manage state, and return the necessary string
 *    }
 *
 *    val spec = StateSpec.function(mappingFunction).numPartitions(10)
 *
 *    val mapWithStateDStream = keyValueDStream.mapWithState[StateType, MappedType](spec)
 * }}}
 *
 * @param spec          Specification of this transformation
 * @tparam StateType    Class type of the state data
 * @tparam MappedType   Class type of the mapped data
 */
@Experimental
def mapWithState[StateType: ClassTag, MappedType: ClassTag](
    spec: StateSpec[K, V, StateType, MappedType]
  ): MapWithStateDStream[K, V, StateType, MappedType] = {
  new MapWithStateDStreamImpl[K, V, StateType, MappedType](
    self,
    // StateSpecImpl wraps the StateSpec operations.
    spec.asInstanceOf[StateSpecImpl[K, V, StateType, MappedType]]
  )
}</code>
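The "state as a table" idea can be sketched in plain Scala without Spark. In this minimal model, `SimpleState` is a hypothetical stand-in for Spark's `State` (only `exists`/`get`/`update`/`remove`), `mappingFunction` mirrors the shape of the Scaladoc example, and `processBatch` is a hypothetical driver. Unlike the cogroup model of updateStateByKey, only keys present in the current batch are touched; partitioning and timeouts are not modeled.

```scala
object MapWithStateModel {
  // Hypothetical stand-in for Spark's State[S].
  final class SimpleState[S](initial: Option[S]) {
    private var value: Option[S] = initial
    def exists: Boolean = value.isDefined
    def get: S = value.getOrElse(throw new NoSuchElementException("state not set"))
    def update(newValue: S): Unit = value = Some(newValue)
    def remove(): Unit = value = None
    def current: Option[S] = value
  }

  // Same shape as the Scaladoc example: keep a running Int per key, emit a String.
  def mappingFunction(key: String, value: Option[Int], state: SimpleState[Int]): Option[String] = {
    val sum = value.getOrElse(0) + (if (state.exists) state.get else 0)
    state.update(sum)
    Some(s"$key=$sum")
  }

  // Apply the mapping function to one batch; only keys present in the batch are visited.
  def processBatch(batch: Seq[(String, Int)],
                   stateTable: Map[String, Int]): (Map[String, Int], Seq[String]) = {
    var table = stateTable
    val mapped = batch.flatMap { case (k, v) =>
      val st = new SimpleState[Int](table.get(k))
      val out = mappingFunction(k, Some(v), st)
      // Write the (possibly updated or removed) state back into the table.
      table = st.current match {
        case Some(s) => table.updated(k, s)
        case None    => table - k
      }
      out
    }
    (table, mapped)
  }
}
```

Feeding `Seq("a" -> 1, "a" -> 2, "b" -> 5)` into an empty table yields the state `Map("a" -> 3, "b" -> 5)` and one output record per input record.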
2. The MapWithStateDStream source code is as follows:
<code class="hljs fsharp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * :: Experimental ::
 * DStream representing the stream of data generated by `mapWithState` operation on a
 * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair DStream]].
 * Additionally, it also gives access to the stream of state snapshots, that is, the state data of
 * all keys after a batch has updated them.
 *
 * @tparam KeyType Class of the key
 * @tparam ValueType Class of the value
 * @tparam StateType Class of the state data
 * @tparam MappedType Class of the mapped data
 */
@Experimental
sealed abstract class MapWithStateDStream[KeyType, ValueType, StateType, MappedType: ClassTag](
    ssc: StreamingContext) extends DStream[MappedType](ssc) {

  /** Return a pair DStream where each RDD is the snapshot of the state of all the keys. */
  def stateSnapshots(): DStream[(KeyType, StateType)]
}

/** Internal implementation of the [[MapWithStateDStream]] */
private[streaming] class MapWithStateDStreamImpl[
    KeyType: ClassTag, ValueType: ClassTag, StateType: ClassTag, MappedType: ClassTag](
    dataStream: DStream[(KeyType, ValueType)],
    spec: StateSpecImpl[KeyType, ValueType, StateType, MappedType])
  extends MapWithStateDStream[KeyType, ValueType, StateType, MappedType](dataStream.context) {

  private val internalStream =
    new InternalMapWithStateDStream[KeyType, ValueType, StateType, MappedType](dataStream, spec)

  override def slideDuration: Duration = internalStream.slideDuration

  override def dependencies: List[DStream[_]] = List(internalStream)

  // The actual computation is delegated to InternalMapWithStateDStream.
  override def compute(validTime: Time): Option[RDD[MappedType]] = {
    internalStream.getOrCompute(validTime).map { _.flatMap[MappedType] { _.mappedData } }
  }</code>
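`compute` above exposes only the `mappedData` of each per-partition record, while `stateSnapshots()` gives access to the state of all keys. A minimal plain-Scala sketch of that split, assuming a hypothetical `RecordSketch` that stands in for `MapWithStateRDDRecord` (one per partition, holding that partition's full state map plus the current batch's mapped output):

```scala
object MapWithStateOutputModel {
  // Hypothetical stand-in for MapWithStateRDDRecord: one per partition.
  final case class RecordSketch[K, S, E](stateMap: Map[K, S], mappedData: Seq[E])

  // Roughly what compute() emits: only the mapped data, flattened across partitions.
  def mappedStream[K, S, E](partitions: Seq[RecordSketch[K, S, E]]): Seq[E] =
    partitions.flatMap(_.mappedData)

  // Roughly what stateSnapshots() exposes: every (key, state) pair,
  // including keys that received no data in this batch.
  def stateSnapshot[K, S, E](partitions: Seq[RecordSketch[K, S, E]]): Seq[(K, S)] =
    partitions.flatMap(_.stateMap.toSeq)
}
```

A partition with state but no new data contributes nothing to the mapped stream, yet its keys still appear in the snapshot.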
3. Updating the historical data.
<code class="hljs haskell has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * A DStream that allows per-key state to be maintained, and arbitrary records to be generated
 * based on updates to the state. This is the main DStream that implements the `mapWithState`
 * operation on DStreams.
 *
 * @param parent (key, value) stream that is the source
 * @param spec Specifications of the mapWithState operation
 * @tparam K   Key type
 * @tparam V   Value type
 * @tparam S   Type of the state maintained
 * @tparam E   Type of the mapped data
 */
private[streaming]
class InternalMapWithStateDStream[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
    parent: DStream[(K, V)], spec: StateSpecImpl[K, V, S, E])
  extends DStream[MapWithStateRDDRecord[K, S, E]](parent.context) {

  // The in-memory state data structure is updated continuously, batch after batch.
  persist(StorageLevel.MEMORY_ONLY)</code>
4. InternalMapWithStateDStream.compute (defined in MapWithStateDStream.scala)
<code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">  /** Method that generates a RDD for the given time */
  override def compute(validTime: Time): Option[RDD[MapWithStateRDDRecord[K, S, E]]] = {
    // Get the previous state or create a new empty state RDD
    val prevStateRDD = getOrCompute(validTime - slideDuration) match {
      case Some(rdd) =>
        if (rdd.partitioner != Some(partitioner)) {
          // If the RDD is not partitioned the right way, let us repartition it using the
          // partition index as the key. This is to ensure that state RDD is always partitioned
          // before creating another state RDD using it
          MapWithStateRDD.createFromRDD[K, V, S, E](
            rdd.flatMap { _.stateMap.getAll() }, partitioner, validTime)
        } else {
          rdd
        }
      case None =>
        MapWithStateRDD.createFromPairRDD[K, V, S, E](
          spec.getInitialStateRDD().getOrElse(new EmptyRDD[(K, S)](ssc.sparkContext)),
          partitioner,
          validTime
        )
    }

    // Create the RDD for the current batch time (validTime)
    // Compute the new state RDD with previous state RDD and partitioned data RDD
    // Even if there is no data RDD, use an empty one to create a new state RDD
    val dataRDD = parent.getOrCompute(validTime).getOrElse {
      context.sparkContext.emptyRDD[(K, V)]
    }
    val partitionedDataRDD = dataRDD.partitionBy(partitioner)
    val timeoutThresholdTime = spec.getTimeoutInterval().map { interval =>
      (validTime - interval).milliseconds
    }
    Some(new MapWithStateRDD(
      prevStateRDD, partitionedDataRDD, mappingFunction, validTime, timeoutThresholdTime))
  }
}</code>
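The control flow of compute (reuse the previous batch's state, fall back to the initial state on the first batch, then fold in the new batch) can be modeled without Spark. This is a simplified sketch under stated assumptions, not Spark code: a "state RDD" is modeled as an immutable Map from key to (state, last-update time in ms), partitioning (including the repartition branch) and lazy evaluation are omitted, and a running sum stands in for the user's mapping function. `ComputeChainSketch`, `previousState`, `computeBatch` and `timeoutThreshold` are hypothetical names.

```scala
// Spark-free model of how InternalMapWithStateDStream.compute chains state between batches.
// All names are hypothetical; a "state RDD" is modeled as Map[key, (state, lastUpdatedMs)].
object ComputeChainSketch {
  type StateSnapshot[K, S] = Map[K, (S, Long)]

  // Mirrors: spec.getTimeoutInterval().map { interval => (validTime - interval).milliseconds }
  def timeoutThreshold(validTimeMs: Long, timeoutIntervalMs: Option[Long]): Option[Long] =
    timeoutIntervalMs.map(interval => validTimeMs - interval)

  // Mirrors the match on getOrCompute(validTime - slideDuration): reuse the previous batch's
  // state if there is one, otherwise start from the (optional) initial state, else empty.
  def previousState[K, S](
      prev: Option[StateSnapshot[K, S]],
      initial: Option[Map[K, S]],
      validTimeMs: Long): StateSnapshot[K, S] =
    prev.getOrElse {
      initial.getOrElse(Map.empty[K, S]).map { case (k, s) => k -> ((s, validTimeMs)) }
    }

  // One batch: fold this batch's (key, value) pairs into the previous state.
  // The running-sum update is an assumed stand-in for a user mapping function.
  def computeBatch(
      prev: Option[StateSnapshot[String, Int]],
      initial: Option[Map[String, Int]],
      batch: Seq[(String, Int)],
      validTimeMs: Long): StateSnapshot[String, Int] = {
    val base = previousState(prev, initial, validTimeMs)
    batch.foldLeft(base) { case (acc, (k, v)) =>
      val old = acc.get(k).map(_._1).getOrElse(0)
      acc.updated(k, (old + v, validTimeMs))
    }
  }
}
```

Calling `computeBatch` once per batch and feeding each result back in as `prev` mimics the chain prevStateRDD → new state RDD; the initial state is only consulted when there is no previous state, which corresponds to the `case None` branch.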
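5. MapWithStateRDD: an RDD that carries both the data of the mapWithState operation and the logic for processing that data (the mapping function). Each partition of this RDD holds exactly one MapWithStateRDDRecord.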
<code class="hljs fsharp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * RDD storing the keyed states of `mapWithState` operation and corresponding mapped data.
 * Each partition of this RDD has a single record of type [[MapWithStateRDDRecord]]. This contains a
 * [[StateMap]] (containing the keyed-states) and the sequence of records returned by the mapping
 * function of `mapWithState`.
 * @param prevStateRDD The previous MapWithStateRDD on whose StateMap data `this` RDD
 *                     will be created
 * @param partitionedDataRDD The partitioned data RDD which is used update the previous StateMaps
 *                           in the `prevStateRDD` to create `this` RDD
 * @param mappingFunction The function that will be used to update state and return new data
 * @param batchTime The time of the batch to which this RDD belongs to. Use to update
 * @param timeoutThresholdTime The time to indicate which keys are timeout
 */
private[streaming] class MapWithStateRDD[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
    private var prevStateRDD: RDD[MapWithStateRDDRecord[K, S, E]],
    private var partitionedDataRDD: RDD[(K, V)],
    mappingFunction: (Time, K, Option[V], State[S]) => Option[E],
    batchTime: Time,
    timeoutThresholdTime: Option[Long]
  ) extends RDD[MapWithStateRDDRecord[K, S, E]](
    partitionedDataRDD.sparkContext,
    List(
      new OneToOneDependency[MapWithStateRDDRecord[K, S, E]](prevStateRDD),
      new OneToOneDependency(partitionedDataRDD))
  ) {

  @volatile private var doFullScan = false

  require(prevStateRDD.partitioner.nonEmpty)
  require(partitionedDataRDD.partitioner == prevStateRDD.partitioner)

  override val partitioner = prevStateRDD.partitioner

  override def checkpoint(): Unit = {
    super.checkpoint()
    doFullScan = true
  }

  override def compute(
      partition: Partition, context: TaskContext): Iterator[MapWithStateRDDRecord[K, S, E]] = {

    val stateRDDPartition = partition.asInstanceOf[MapWithStateRDDPartition]
    val prevStateRDDIterator = prevStateRDD.iterator(
      stateRDDPartition.previousSessionRDDPartition, context)
    val dataIterator = partitionedDataRDD.iterator(
      stateRDDPartition.partitionedDataRDDPartition, context)

    val prevRecord = if (prevStateRDDIterator.hasNext) Some(prevStateRDDIterator.next()) else None
    val newRecord = MapWithStateRDDRecord.updateRecordWithData(
      prevRecord,
      dataIterator,
      mappingFunction,
      batchTime,
      timeoutThresholdTime,
      removeTimedoutData = doFullScan // remove timedout data only when full scan is enabled
    )
    Iterator(newRecord)
  }
</code>
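Two properties of MapWithStateRDD drive its design: each partition holds a single MapWithStateRDDRecord, and the OneToOneDependency on both parents means partition i of the new RDD reads only partition i of prevStateRDD and partition i of partitionedDataRDD. That is why the two `require(...)` calls insist on a shared partitioner. The plain-Scala sketch below models this layout under assumptions: partitions are Vector elements, a hashCode-mod-n function stands in for the partitioner, "same partitioner" is approximated by "same partition count", and a running sum stands in for the mapping function. `PartitionSketch` and its members are hypothetical.

```scala
// Spark-free model of MapWithStateRDD's layout (all names hypothetical): each partition holds
// exactly one Record, and partition i of the new state is computed only from partition i of
// the previous state and partition i of the batch data (a one-to-one dependency).
object PartitionSketch {
  // Stand-in for MapWithStateRDDRecord: the keyed states plus the mapped output of this batch.
  final case class Record(stateMap: Map[String, Int], mappedData: Vector[String])

  // Stand-in for a hash partitioner: non-negative hashCode modulo the partition count.
  def partitionOf(key: String, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def partitionBy(data: Seq[(String, Int)], numPartitions: Int): Vector[Seq[(String, Int)]] =
    Vector.tabulate(numPartitions) { i =>
      data.filter { case (k, _) => partitionOf(k, numPartitions) == i }
    }

  // Zip previous partition i with data partition i and emit one new Record per partition.
  // The running-sum update is an assumed example of a mapping function.
  def compute(prev: Vector[Record], data: Vector[Seq[(String, Int)]]): Vector[Record] = {
    // Rough analogue of require(partitionedDataRDD.partitioner == prevStateRDD.partitioner)
    require(prev.length == data.length, "previous state and data must be co-partitioned")
    prev.zip(data).map { case (rec, part) =>
      val (newState, out) = part.foldLeft((rec.stateMap, Vector.empty[String])) {
        case ((m, acc), (k, v)) =>
          val total = m.getOrElse(k, 0) + v
          (m.updated(k, total), acc :+ s"$k=$total")
      }
      Record(newState, out)
    }
  }
}
```

Because no partition ever looks at another partition's state, no shuffle is needed once both inputs are co-partitioned; mismatched partitioning is rejected up front instead of silently pairing unrelated keys.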
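6. updateRecordWithData: an RDD itself is immutable, yet it can still carry changing state. Each batch builds a new MapWithStateRDDRecord whose StateMap starts as a copy of the previous record's StateMap and is then updated with the current batch's data.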
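The steps of updateRecordWithData can be traced with a small plain-Scala model. This is a sketch under assumptions, not Spark's implementation: an immutable `Map[K, (S, Long)]` stands in for StateMap (so the "copy" comes for free), `SimpleState` is a hypothetical stand-in for `State[S]`/`StateImpl[S]`, batch time is a `Long` in milliseconds, `removeTimedoutData` is a plain flag (in MapWithStateRDD it is `doFullScan`, which `checkpoint()` sets), and treating "last updated strictly before the threshold" as timed out is an assumption of the model. The model also checks `exists` before re-putting an untouched key so that `get` is never called on a missing state; the source excerpt in this step has no such check.

```scala
import scala.collection.mutable.ArrayBuffer

// Spark-free model of MapWithStateRDDRecord.updateRecordWithData (all names hypothetical).
// SimpleState is a stand-in for Spark's State[S]/StateImpl[S]; batch time is a plain Long (ms).
object UpdateRecordSketch {
  final class SimpleState[S] {
    private var value: Option[S] = None
    private var updated = false
    private var removed = false
    private var timingOut = false

    // Like StateImpl.wrap: reset the flags and expose the existing state (if any).
    def wrap(s: Option[S]): Unit = {
      value = s; updated = false; removed = false; timingOut = false
    }
    // Like StateImpl.wrapTimingOutState: expose a state that is about to be dropped.
    def wrapTimingOut(s: S): Unit = { wrap(Some(s)); timingOut = true }

    def exists: Boolean = value.isDefined
    def get: S = value.get
    def update(s: S): Unit = { value = Some(s); updated = true }
    def remove(): Unit = { value = None; removed = true }
    def isUpdated: Boolean = updated
    def isRemoved: Boolean = removed
    def isTimingOut: Boolean = timingOut
  }

  // State entries are (state, lastUpdatedMs), mirroring StateMap.put(key, state, time).
  def updateRecord[K, V, S, E](
      prevStateMap: Option[Map[K, (S, Long)]],
      data: Iterator[(K, V)],
      mappingFunction: (Long, K, Option[V], SimpleState[S]) => Option[E],
      batchTimeMs: Long,
      timeoutThresholdMs: Option[Long],
      removeTimedoutData: Boolean): (Map[K, (S, Long)], Seq[E]) = {
    // An immutable Map plays the role of stateMap.copy(): the previous record is never mutated.
    var newStateMap = prevStateMap.getOrElse(Map.empty[K, (S, Long)])
    val mappedData = ArrayBuffer.empty[E]
    val wrapped = new SimpleState[S]

    data.foreach { case (key, value) =>
      wrapped.wrap(newStateMap.get(key).map(_._1))
      val returned = mappingFunction(batchTimeMs, key, Some(value), wrapped)
      if (wrapped.isRemoved) {
        newStateMap = newStateMap - key
      } else if (wrapped.isUpdated || (wrapped.exists && timeoutThresholdMs.isDefined)) {
        // Re-putting an untouched state refreshes its timestamp, so an active key does not time out.
        newStateMap = newStateMap.updated(key, (wrapped.get, batchTimeMs))
      }
      mappedData ++= returned
    }

    // Only when removeTimedoutData is set: states last updated before the threshold time out.
    if (removeTimedoutData && timeoutThresholdMs.isDefined) {
      val threshold = timeoutThresholdMs.get
      val timedOut = newStateMap.collect { case (k, (s, t)) if t < threshold => (k, s) }
      timedOut.foreach { case (key, state) =>
        wrapped.wrapTimingOut(state)
        mappedData ++= mappingFunction(batchTimeMs, key, None, wrapped)
        newStateMap = newStateMap - key
      }
    }
    (newStateMap, mappedData.toSeq)
  }
}
```

With a running-sum mapping function run over two batches, a key that receives no new data and whose last update is older than the threshold is passed to the function once with value `None` (and `isTimingOut` set), then dropped, while every `Some(...)` the function returns ends up in the mapped output.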
<code class="hljs coffeescript has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">  def updateRecordWithData[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
    prevRecord: Option[MapWithStateRDDRecord[K, S, E]],
    dataIterator: Iterator[(K, V)],
    mappingFunction: (Time, K, Option[V], State[S]) => Option[E],
    batchTime: Time,
    timeoutThresholdTime: Option[Long],
    removeTimedoutData: Boolean
  ): MapWithStateRDDRecord[K, S, E] = {
    // Create a new state map by cloning the previous one (if it exists) or by creating an empty one
    val newStateMap = prevRecord.map { _.stateMap.copy() }. getOrElse { new EmptyStateMap[K, S]() }

    val mappedData = new ArrayBuffer[E]
    val wrappedState = new StateImpl[S]()

    // Call the mapping function on each record in the data iterator, and accordingly
    // update the states touched, and collect the data returned by the mapping function
    dataIterator.foreach { case (key, value) =>
      wrappedState.wrap(newStateMap.get(key))
      val returned = mappingFunction(batchTime, key, Some(value), wrappedState)
      if (wrappedState.isRemoved) {
        newStateMap.remove(key)
      } else if (wrappedState.isUpdated || timeoutThresholdTime.isDefined) {
        // Iterate over every record of the current batch, apply the user-defined function to it,
        // and update the newStateMap data structure accordingly.
        // newStateMap holds the accumulated (historical) state.
        newStateMap.put(key, wrappedState.get(), batchTime.milliseconds)
      }
      mappedData ++= returned
    }

    // Get the timed out state records, call the mapping function on each and collect the
    // data returned
    if (removeTimedoutData && timeoutThresholdTime.isDefined) {
      newStateMap.getByTime(timeoutThresholdTime.get).foreach { case (key, state, _) =>
        wrappedState.wrapTimingOutState(state)
        val returned = mappingFunction(batchTime, key, None, wrappedState)
        mappedData ++= returned
        newStateMap.remove(key)
      }
    }

    // From the RDD's point of view, the partition represented by MapWithStateRDDRecord has not
    // changed; only the data inside it has changed.
    MapWithStateRDDRecord(newStateMap, mappedData)
  }
}</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px
5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li><li style="box-sizing: border-box; padding: 0px 5px;">24</li><li style="box-sizing: border-box; padding: 0px 5px;">25</li><li style="box-sizing: border-box; padding: 0px 5px;">26</li><li style="box-sizing: border-box; padding: 0px 5px;">27</li><li style="box-sizing: border-box; padding: 0px 5px;">28</li><li style="box-sizing: border-box; padding: 0px 5px;">29</li><li style="box-sizing: border-box; padding: 0px 5px;">30</li><li style="box-sizing: border-box; padding: 0px 5px;">31</li><li style="box-sizing: border-box; padding: 0px 5px;">32</li><li style="box-sizing: border-box; padding: 0px 5px;">33</li><li style="box-sizing: border-box; padding: 0px 5px;">34</li><li style="box-sizing: border-box; padding: 0px 5px;">35</li><li style="box-sizing: border-box; padding: 0px 5px;">36</li><li style="box-sizing: border-box; padding: 0px 5px;">37</li><li style="box-sizing: border-box; padding: 0px 5px;">38</li><li style="box-sizing: border-box; 
padding: 0px 5px;">39</li><li style="box-sizing: border-box; padding: 0px 5px;">40</li><li style="box-sizing: border-box; padding: 0px 5px;">41</li><li style="box-sizing: border-box; padding: 0px 5px;">42</li><li style="box-sizing: border-box; padding: 0px 5px;">43</li><li style="box-sizing: border-box; padding: 0px 5px;">44</li></ul>
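The per-partition update loop in `updateRecordWithData` is easier to follow without Spark around it. The following is a minimal plain-Scala sketch of the same logic, under these assumptions: `SimpleState`, `StateSketch`, `updatePartition` and `runningSum` are hypothetical names, not Spark APIs; an immutable `Map[K, (S, Long)]` stands in for Spark's `StateMap`; batch times are plain `Long` milliseconds; `removeTimedoutData` is assumed to be always true; and a key counts as timed out when its last update time is earlier than the threshold. Unlike the condition in the code quoted above, the timestamp refresh here is guarded with `exists`, so a key that has no state is never dereferenced.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Spark's internal StateImpl: wraps the
// state of one key and records whether the mapping function updated or removed it.
final class SimpleState[S] {
  private var value: Option[S] = None
  private var updated = false
  private var removed = false
  private var timingOut = false

  def wrap(v: Option[S]): Unit = { value = v; updated = false; removed = false; timingOut = false }
  def wrapTimingOut(s: S): Unit = { wrap(Some(s)); timingOut = true }

  def exists: Boolean = value.isDefined
  def get: S = value.getOrElse(throw new NoSuchElementException("no state for this key"))
  def getOption: Option[S] = value
  def update(v: S): Unit = {
    if (timingOut) throw new IllegalStateException("cannot update a state that is timing out")
    value = Some(v); updated = true; removed = false
  }
  def remove(): Unit = { value = None; removed = true; updated = false }
  def isUpdated: Boolean = updated
  def isRemoved: Boolean = removed
  def isTimingOut: Boolean = timingOut
}

object StateSketch {
  // key -> (state, time of last update in ms); plays the role of Spark's StateMap (assumption)
  type StateMap[K, S] = Map[K, (S, Long)]

  // Hypothetical helper mirroring the shape of updateRecordWithData for one partition
  def updatePartition[K, V, S, E](
      prev: Option[StateMap[K, S]],
      data: Iterator[(K, V)],
      mappingFunction: (Long, K, Option[V], SimpleState[S]) => Option[E],
      batchTimeMs: Long,
      timeoutThresholdMs: Option[Long]): (StateMap[K, S], Seq[E]) = {
    // Copy the previous partition's state, or start from an empty map
    val newState = mutable.Map.empty[K, (S, Long)] ++= prev.getOrElse(Map.empty[K, (S, Long)])
    val mapped = mutable.ArrayBuffer.empty[E]
    val wrapped = new SimpleState[S]

    // Only keys that appear in this batch's data are visited here
    data.foreach { case (key, value) =>
      wrapped.wrap(newState.get(key).map(_._1))
      val returned = mappingFunction(batchTimeMs, key, Some(value), wrapped)
      if (wrapped.isRemoved) newState.remove(key)
      else if (wrapped.isUpdated || (wrapped.exists && timeoutThresholdMs.isDefined))
        newState.put(key, (wrapped.get, batchTimeMs)) // also refreshes the key's timestamp
      returned.foreach(mapped += _)
    }

    // Keys not updated since the threshold time out: call the function with None, then drop them
    timeoutThresholdMs.foreach { threshold =>
      val timedOut = newState.toList.collect { case (k, (s, t)) if t < threshold => (k, s) }
      timedOut.foreach { case (key, state) =>
        wrapped.wrapTimingOut(state)
        mappingFunction(batchTimeMs, key, None, wrapped).foreach(mapped += _)
        newState.remove(key)
      }
    }

    (newState.toMap, mapped.toSeq)
  }

  // Hypothetical example mapping function: running sum per key; on timeout, emit the final sum
  val runningSum: (Long, String, Option[Int], SimpleState[Int]) => Option[(String, Int)] =
    (_, key, value, state) => value match {
      case Some(v) =>
        val sum = state.getOption.getOrElse(0) + v
        state.update(sum)
        Some((key, sum))
      case None => Some((key, state.get))
    }
}
```

For example, feeding `("a", 1), ("b", 2), ("a", 3)` at time 1000 with no timeout emits `(a,1), (b,2), (a,4)`. A later batch containing only `("b", 5)` at time 5000 with threshold 3000 emits `(b,7)` and then the timing-out `(a,4)`, leaving only `b` in the state. Each batch touches only the keys present in its data plus the timed-out keys, which is the main difference from updateStateByKey, whose update function is invoked for every key held in the state on each batch.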
MapWithState is implemented as follows:
Summary:
These course notes are based on: