Spark Streaming Source Code Walkthrough: State Management with updateStateByKey and mapWithState

Source: http://blog.csdn.net/snail_gesture/article/details/5151058

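Background:
Spark Streaming divides its work into jobs by batch duration. Often, however, we need to compute over the past day or even the past week of data, and at that point state management becomes unavoidable. Each batch duration produces a job, and each job operates on RDDs, so the question is how to maintain state across batches. The updateStateByKey and mapWithState methods carry out this core step.
Source code analysis:
1. Neither updateStateByKey nor mapWithState is defined on DStream itself; both become available through an implicit conversion function.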

<code class="hljs markdown has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">object DStream {

  // `toPairDStreamFunctions` was in SparkContext before 1.3 and users had to
  // `import StreamingContext._` to enable it. Now we move it here to make the compiler find
  // it automatically. However, we still keep the old function in StreamingContext for backward
  // compatibility and forward to the following function directly.

  implicit def toPairDStreamFunctions[K, V](stream: DStream[(K, V)])
      (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null):
    PairDStreamFunctions[K, V] = {
    new PairDStreamFunctions[K, V](stream)
  }
</code>
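
The implicit-conversion pattern described above can be sketched in plain Scala without Spark. This is only an illustration: `MiniStream`, `PairMiniStreamFunctions`, `toPairFunctions`, and `countByKey` are hypothetical names standing in for DStream, PairDStreamFunctions, and the methods it adds. It shows why putting the conversion in the companion object lets the compiler find it without an explicit import.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for DStream: a thin wrapper around one batch of elements.
final case class MiniStream[T](batch: Seq[T])

// Hypothetical stand-in for PairDStreamFunctions: extra methods for key-value streams only.
final class PairMiniStreamFunctions[K, V](self: MiniStream[(K, V)]) {
  def countByKey(): Map[K, Int] =
    self.batch.groupBy(_._1).map { case (k, kvs) => (k, kvs.size) }
}

object MiniStream {
  // Defined in the companion object, so it is part of the implicit scope of MiniStream
  // and the compiler can apply it without any import at the call site.
  implicit def toPairFunctions[K, V](s: MiniStream[(K, V)]): PairMiniStreamFunctions[K, V] =
    new PairMiniStreamFunctions[K, V](s)
}

object ImplicitDemo {
  def main(args: Array[String]): Unit = {
    val pairs = MiniStream(Seq("a" -> 1, "b" -> 2, "a" -> 3))
    // countByKey is not a member of MiniStream; the implicit conversion supplies it.
    println(pairs.countByKey())
  }
}
```

A `MiniStream[Int]` would not compile with `countByKey()`, because the conversion only applies to streams of pairs; the same restriction is what keeps updateStateByKey limited to key-value DStreams.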

updateStateByKey:
1. In PairDStreamFunctions, updateStateByKey is implemented as follows.
updateFunc updates the state of each key on top of the existing history. The method returns a DStream.

<code class="hljs fsharp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * Return a new "state" DStream where the state for each key is updated by applying
 * the given function on the previous state of the key and the new values of each key.
 * Hash partitioning is used to generate the RDDs with Spark's default number of partitions.
 * @param updateFunc State update function. If `this` function returns None, then
 *                   corresponding state key-value pair will be eliminated.
 * @tparam S State type
 */
def updateStateByKey[S: ClassTag](
    updateFunc: (Seq[V], Option[S]) => Option[S]
  ): DStream[(K, S)] = ssc.withScope {
  // defaultPartitioner
  updateStateByKey(updateFunc, defaultPartitioner())
}
</code>
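
To make the contract of `updateFunc` concrete, here is a minimal sketch in plain Scala that simulates two batches of a running word count. It does not use Spark: `step` is a hypothetical helper that mimics, under simplifying assumptions, how the state of every known key is recomputed per batch, and `Map` stands in for the state RDD.

```scala
object UpdateFuncDemo {
  // Same shape as the updateFunc parameter:
  // (new values for the key in this batch, previous state) => new state.
  // Returning None removes the key from the state.
  val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
    (newValues, previous) => {
      val total = previous.getOrElse(0) + newValues.sum
      if (total == 0) None else Some(total)
    }

  // Hypothetical helper (not Spark API): apply updateFunc to one batch, given the state so far.
  def step(state: Map[String, Int], batch: Seq[(String, Int)]): Map[String, Int] = {
    val grouped: Map[String, Seq[Int]] =
      batch.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    // Every key seen so far is visited, including keys with no new values in this batch.
    val keys = state.keySet ++ grouped.keySet
    keys.toSeq.flatMap { k =>
      updateFunc(grouped.getOrElse(k, Seq.empty), state.get(k)).map(s => (k, s))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(
      Seq("spark" -> 1, "flink" -> 1, "spark" -> 1),
      Seq("spark" -> 1)
    )
    val finalState = batches.foldLeft(Map.empty[String, Int])(step)
    println(finalState)
  }
}
```

Note that "flink" keeps its count in the second batch even though it receives no new values: the update function is still called for it with an empty `Seq` and its previous state.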
2.  defaultPartitioner creates a HashPartitioner whose partition count defaults to the SparkContext's defaultParallelism:
<code class="hljs cs has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
  new HashPartitioner(numPartitions)
}
</code>
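
As a rough illustration of what hash partitioning means here, the sketch below maps a key to a partition index by taking its hash code modulo the number of partitions, adjusted to be non-negative. This is a simplified stand-in written for this walkthrough, assuming the usual hash-modulo scheme; `partitionFor` and `nonNegativeMod` are names defined in the sketch, not calls into Spark.

```scala
object HashPartitionDemo {
  // Map a possibly negative hash code into the range [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Hypothetical stand-in for a hash partitioner's key-to-partition mapping.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val numPartitions = 4
    Seq("spark", "flink", "storm", "spark").foreach { k =>
      println(s"$k -> partition ${partitionFor(k, numPartitions)}")
    }
  }
}
```

The property that matters for state management is that the same key always lands in the same partition, so the previous state of a key and its new values can be brought together without moving the state.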
3.  The partitioner controls how each RDD is partitioned:
<code class="hljs coffeescript has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * Return a new "state" DStream where the state for each key is updated by applying
 * the given function on the previous state of the key and the new values of the key.
 * org.apache.spark.Partitioner is used to control the partitioning of each RDD.
 * @param updateFunc State update function. If `this` function returns None, then
 *                   corresponding state key-value pair will be eliminated.
 * @param partitioner Partitioner for controlling the partitioning of each RDD in the new
 *                    DStream.
 * @tparam S State type
 */
def updateStateByKey[S: ClassTag](
    updateFunc: (Seq[V], Option[S]) => Option[S],
    partitioner: Partitioner
  ): DStream[(K, S)] = ssc.withScope {
  val cleanedUpdateF = sparkContext.clean(updateFunc)
  val newUpdateFunc = (iterator: Iterator[(K, Seq[V], Option[S])]) => {
    iterator.flatMap(t => cleanedUpdateF(t._2, t._3).map(s => (t._1, s)))
  }
  updateStateByKey(newUpdateFunc, partitioner, true)
}
</code>
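
The wrapping step in the source above, where the per-key `updateFunc` is lifted into a function over an iterator of `(key, new values, previous state)` triples, can be tried out in plain Scala. The sketch below fixes the key type to String and the value and state types to Int purely for illustration; the sample data is made up.

```scala
object LiftUpdateFuncDemo {
  // Per-key update function: add new values onto the previous count; drop keys whose count is 0.
  val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
    (values, state) => {
      val total = state.getOrElse(0) + values.sum
      if (total == 0) None else Some(total)
    }

  // Same wrapping as newUpdateFunc in the source, specialised to concrete types.
  // flatMap over the Option result silently drops every key for which updateFunc returns None.
  val newUpdateFunc: Iterator[(String, Seq[Int], Option[Int])] => Iterator[(String, Int)] =
    iterator => iterator.flatMap(t => updateFunc(t._2, t._3).map(s => (t._1, s)))

  def main(args: Array[String]): Unit = {
    val input: Iterator[(String, Seq[Int], Option[Int])] = Iterator(
      ("spark", Seq(1, 1), Some(3)),    // existing key with new values
      ("flink", Seq.empty[Int], None),  // no values and no state: updateFunc returns None
      ("storm", Seq(2), None)           // new key
    )
    newUpdateFunc(input).foreach(println)
  }
}
```

The key for which `updateFunc` returns `None` never reaches the output iterator, which is how a key-value pair gets eliminated from the state.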
4.  The overload above passes rememberPartitioner as true when it calls the following overload:
<code class="hljs applescript has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * Return a new "state" DStream where the state for each key is updated by applying
 * the given function on the previous state of the key and the new values of each key.
 * org.apache.spark.Partitioner is used to control the partitioning of each RDD.
 * @param updateFunc State update function. Note, that this function may generate a different
 *                   tuple with a different key than the input key. Therefore keys may be removed
 *                   or added in this way. It is up to the developer to decide whether to
 *                   remember the partitioner despite the key being changed.
 * @param partitioner Partitioner for controlling the partitioning of each RDD in the new
 *                    DStream
 * @param rememberPartitioner Whether to remember the paritioner object in the generated RDDs.
 * @tparam S State type
 */
def updateStateByKey[S: ClassTag](
    updateFunc: (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)],
    partitioner: Partitioner,
    rememberPartitioner: Boolean
  ): DStream[(K, S)] = ssc.withScope {
  new StateDStream(self, ssc.sc.clean(updateFunc), partitioner, rememberPartitioner, None)
}
</code>
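5.  StateDStream persists its RDDs with StorageLevel.MEMORY_ONLY_SER, that is, serialized in memory rather than on disk; the state can grow very large, and the serialized form keeps it more compact: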
<code class="hljs haskell has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">class StateDStream[K: ClassTag, V: ClassTag, S: ClassTag](
    parent: DStream[(K, V)],
    updateFunc: (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)],
    partitioner: Partitioner,
    preservePartitioning: Boolean,
    initialRDD : Option[RDD[(K, S)]]
  ) extends DStream[(K, S)](parent.ssc) {

  super.persist(StorageLevel.MEMORY_ONLY_SER)
</code>
  1. The source of computeUsingPreviousRDD is as follows:
<code class="hljs scala has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> [<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">this</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">def</span> computeUsingPreviousRDD (
  parentRDD : RDD[(K, V)], prevStateRDD : RDD[(K, S)]) = {
  <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// Define the function for the mapPartition operation on cogrouped RDD;</span>
  <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// first map the cogrouped tuple to tuples of required type,</span>
  <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// and then apply the update function</span>
  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> updateFuncLocal = updateFunc
  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> finalFunc = (iterator: Iterator[(K, (Iterable[V], Iterable[S]))]) => {
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> i = iterator.map(t => {
      <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> itr = t._2._2.iterator
      <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> headOption = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (itr.hasNext) Some(itr.next()) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> None
      (t._1, t._2._1.toSeq, headOption)
    })
    updateFuncLocal(i)
  }
  <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// On every batch, cogroup traverses all partitions of prevStateRDD,</span>
  <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// so the cost grows with the total state size rather than with the batch size.</span>
  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> cogroupedRDD = parentRDD.cogroup(prevStateRDD, partitioner)
  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> stateRDD = cogroupedRDD.mapPartitions(finalFunc, preservePartitioning)
  Some(stateRDD)
}
</code>

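Therefore, updateStateByKey is not recommended when the amount of state data is large. 
The updateStateByKey function is implemented as follows:

[Figure: updateStateByKey implementation (image not preserved)]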
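
The cost of the cogroup-based update can be illustrated without Spark. The following is a minimal sketch using plain Scala collections, not the Spark implementation: `CogroupCostSketch`, `updateAll`, and the sample data are hypothetical names introduced only for illustration. It mimics one batch in which every key of the previous state is visited, even though the batch itself contains a single key.

```scala
object CogroupCostSketch {
  // Hypothetical stand-in for one batch of a cogroup-based state update:
  // join the new batch with the previous state, then call updateFunc once per key.
  // Returns the new state and the number of updateFunc invocations.
  def updateAll(
      batch: Seq[(String, Int)],
      prevState: Map[String, Int],
      updateFunc: (Seq[Int], Option[Int]) => Option[Int]): (Map[String, Int], Int) = {
    // Group the batch values by key, like the "new values" side of a cogroup.
    val grouped: Map[String, Seq[Int]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    var calls = 0
    // Every key from either side is visited, including keys with no new data.
    val allKeys = grouped.keySet ++ prevState.keySet
    val newState = allKeys.toSeq.flatMap { key =>
      calls += 1
      updateFunc(grouped.getOrElse(key, Seq.empty), prevState.get(key)).map(key -> _)
    }.toMap
    (newState, calls)
  }

  def main(args: Array[String]): Unit = {
    val prev = (1 to 1000).map(i => s"key$i" -> i).toMap
    val batch = Seq("key1" -> 10)
    val sum = (values: Seq[Int], old: Option[Int]) => Some(values.sum + old.getOrElse(0))
    val (next, calls) = updateAll(batch, prev, sum)
    // Only one key arrived in the batch, yet all 1000 state entries were touched.
    println(s"keys in batch = 1, update calls = $calls, key1 = ${next("key1")}")
  }
}
```

In this sketch the number of update calls equals the number of keys in the state, which is the behavior the conclusion above warns about.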

mapWithState: 
1. mapWithState returns a MapWithStateDStream. Historical state is maintained and updated per key: a user-supplied function maintains the state for the key-value data.

<code class="hljs lua has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * :: Experimental ::
 * Return a <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[MapWithStateDStream]]</span> by applying a <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">function</span> <span class="hljs-title" style="box-sizing: border-box;">to</span> <span class="hljs-title" style="box-sizing: border-box;">every</span> <span class="hljs-title" style="box-sizing: border-box;">key</span>-<span class="hljs-title" style="box-sizing: border-box;">value</span> <span class="hljs-title" style="box-sizing: border-box;">element</span> <span class="hljs-title" style="box-sizing: border-box;">of</span>
 * `<span class="hljs-title" style="box-sizing: border-box;">this</span>` <span class="hljs-title" style="box-sizing: border-box;">stream</span>, <span class="hljs-title" style="box-sizing: border-box;">while</span> <span class="hljs-title" style="box-sizing: border-box;">maintaining</span> <span class="hljs-title" style="box-sizing: border-box;">some</span> <span class="hljs-title" style="box-sizing: border-box;">state</span> <span class="hljs-title" style="box-sizing: border-box;">data</span> <span class="hljs-title" style="box-sizing: border-box;">for</span> <span class="hljs-title" style="box-sizing: border-box;">each</span> <span class="hljs-title" style="box-sizing: border-box;">unique</span> <span class="hljs-title" style="box-sizing: border-box;">key</span>. <span class="hljs-title" style="box-sizing: border-box;">The</span> <span class="hljs-title" style="box-sizing: border-box;">mapping</span> <span class="hljs-title" style="box-sizing: border-box;">function</span>
 * <span class="hljs-title" style="box-sizing: border-box;">and</span> <span class="hljs-title" style="box-sizing: border-box;">other</span> <span class="hljs-title" style="box-sizing: border-box;">specification</span> <span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(e.g. partitioners, timeouts, initial state data, etc.)</span></span> of this
 * transformation can be specified using <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[StateSpec]]</span> class. The state data is accessible <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>
 * as a parameter of <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">type</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">[[State]]</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> the mapping <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">function</span>.
 *
 * <span class="hljs-title" style="box-sizing: border-box;">Example</span> <span class="hljs-title" style="box-sizing: border-box;">of</span> <span class="hljs-title" style="box-sizing: border-box;">using</span> `<span class="hljs-title" style="box-sizing: border-box;">mapWithState</span>`:
 * {{{
 *    // <span class="hljs-title" style="box-sizing: border-box;">A</span> <span class="hljs-title" style="box-sizing: border-box;">mapping</span> <span class="hljs-title" style="box-sizing: border-box;">function</span> <span class="hljs-title" style="box-sizing: border-box;">that</span> <span class="hljs-title" style="box-sizing: border-box;">maintains</span> <span class="hljs-title" style="box-sizing: border-box;">an</span> <span class="hljs-title" style="box-sizing: border-box;">integer</span> <span class="hljs-title" style="box-sizing: border-box;">state</span> <span class="hljs-title" style="box-sizing: border-box;">and</span> <span class="hljs-title" style="box-sizing: border-box;">return</span> <span class="hljs-title" style="box-sizing: border-box;">a</span> <span class="hljs-title" style="box-sizing: border-box;">String</span>
//At this point <span class="hljs-title" style="box-sizing: border-box;">state</span> can be viewed as a table that records all the historical state being maintained.
 *    <span class="hljs-title" style="box-sizing: border-box;">def</span> <span class="hljs-title" style="box-sizing: border-box;">mappingFunction</span><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(key: String, value: Option[Int], state: State[Int])</span></span>: Option[String] = {
 *      // Use state.exists(), state.get(), state.update() <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> state.remove()
 *      // to manage state, <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> the necessary <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>
 *    }
 *
 *    val spec = StateSpec.function(mappingFunction).numPartitions(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>)
 *
 *    val mapWithStateDStream = keyValueDStream.mapWithState[StateType, MappedType](spec)
 * }}}
 *
 * @param spec          Specification of this transformation
 * @tparam StateType    Class <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">type</span> of the state data
 * @tparam MappedType   Class <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">type</span> of the mapped data
 */
@Experimental
def mapWithState[StateType: ClassTag, MappedType: ClassTag](
    spec: StateSpec[K, V, StateType, MappedType]
  ): MapWithStateDStream[K, V, StateType, MappedType] = {
  new MapWithStateDStreamImpl[K, V, StateType, MappedType](
    self,
// The StateSpecImpl class encapsulates the StateSpec operations.
    spec.asInstanceOf[StateSpecImpl[K, V, StateType, MappedType]]
  )
}
</code>
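
The Scaladoc above leaves the body of `mappingFunction` empty. The following is a minimal sketch of what such a body might look like, written in plain Scala so it runs without Spark. `SimpleState` is a hypothetical single-key stand-in for Spark's `State[S]` that mimics only the `exists()`, `get()`, `update()` and `remove()` calls named in the Scaladoc; the running-sum logic is an assumed example, not taken from the Spark source.

```scala
object MappingFunctionSketch {
  // Hypothetical stand-in for Spark's State[S], holding the state of one key.
  final class SimpleState[S](private var value: Option[S] = None) {
    def exists(): Boolean = value.isDefined
    def get(): S = value.getOrElse(throw new NoSuchElementException("state not set"))
    def update(newValue: S): Unit = { value = Some(newValue) }
    def remove(): Unit = { value = None }
  }

  // Same shape as the mappingFunction in the Scaladoc: keep a running sum
  // for the key and return a String describing the new total.
  def mappingFunction(key: String, value: Option[Int], state: SimpleState[Int]): Option[String] = {
    val previous = if (state.exists()) state.get() else 0
    val total = previous + value.getOrElse(0)
    state.update(total)
    Some(s"$key -> $total")
  }

  def main(args: Array[String]): Unit = {
    val state = new SimpleState[Int]()
    println(mappingFunction("spark", Some(1), state)) // Some(spark -> 1)
    println(mappingFunction("spark", Some(2), state)) // Some(spark -> 3)
  }
}
```

Unlike the cogroup-based update, the function here is invoked only for a key that actually receives a value, and it reads and writes that key's state directly.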
2.  The source of MapWithStateDStream is as follows:
<code class="hljs fsharp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * :: Experimental ::
 * DStream representing the stream <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> data generated by `mapWithState` operation on a
 * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair DStream]].
 * Additionally, it also gives access <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> the stream <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> state snapshots, that is, the state data <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span>
 * all keys after a batch has updated them.
 *
 * @tparam KeyType Class <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the key
 * @tparam ValueType Class <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the value
 * @tparam StateType Class <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the state data
 * @tparam MappedType Class <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the mapped data
 */
@Experimental
sealed <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">abstract</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> MapWithStateDStream[KeyType, ValueType, StateType, MappedType: ClassTag](
    ssc: StreamingContext) extends DStream[MappedType](ssc) {

  /** Return a pair DStream where each RDD is the snapshot <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the state <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> all the keys. */
  def stateSnapshots(): DStream[(KeyType, StateType)]
}

/** Internal implementation <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the [[MapWithStateDStream]] */
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span>[streaming] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> MapWithStateDStreamImpl[
    KeyType: ClassTag, ValueType: ClassTag, StateType: ClassTag, MappedType: ClassTag](
    dataStream: DStream[(KeyType, ValueType)],
    spec: StateSpecImpl[KeyType, ValueType, StateType, MappedType])
  extends MapWithStateDStream[KeyType, ValueType, StateType, MappedType](dataStream.context) {

  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> internalStream =
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> InternalMapWithStateDStream[KeyType, ValueType, StateType, MappedType](dataStream, spec)

  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">override</span> def slideDuration: Duration = internalStream.slideDuration

  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">override</span> def dependencies: List[DStream[_]] = List(internalStream)
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// The actual computation is delegated to InternalMapWithStateDStream.</span>
  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">override</span> def compute(validTime: Time): Option[RDD[MappedType]] = {
    internalStream.getOrCompute(validTime).map { _.flatMap[MappedType] { _.mappedData } }
  }
</code>
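
The delegation in `compute` above can be sketched in plain Scala. This is a simplified illustration under stated assumptions, not the Spark classes: `Record`, `InternalStream` and `PublicStream` are hypothetical names, `Seq` stands in for `RDD`, and `Long` stands in for `Time`. It shows only the shape of the call: the public stream asks the internal stream for a batch, then flattens each record's `mappedData`.

```scala
object DelegationSketch {
  // Hypothetical simplification of MapWithStateRDDRecord: it carries only the
  // records emitted by the mapping function.
  final case class Record[E](mappedData: Seq[E])

  // Stand-in for the internal stream: returns the records of a batch, or None
  // when there is no batch for the given time.
  final class InternalStream[E](batches: Map[Long, Seq[Record[E]]]) {
    def getOrCompute(time: Long): Option[Seq[Record[E]]] = batches.get(time)
  }

  // Stand-in for the public wrapper: delegates to the internal stream and
  // flattens mappedData, mirroring the shape of compute(validTime).
  final class PublicStream[E](internal: InternalStream[E]) {
    def compute(time: Long): Option[Seq[E]] =
      internal.getOrCompute(time).map(_.flatMap(_.mappedData))
  }

  def main(args: Array[String]): Unit = {
    val internal = new InternalStream[String](
      Map(1L -> Seq(Record(Seq("a", "b")), Record(Seq("c")))))
    val public = new PublicStream(internal)
    println(public.compute(1L)) // Some(List(a, b, c))
    println(public.compute(2L)) // None
  }
}
```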
3.  Updating the historical state.
<code class="hljs haskell has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">A</span> <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">DStream</span> that allows per-key state to be maintains, and arbitrary records to be generated
 * based on updates to the state. <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">This</span> is the main <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">DStream</span> that implements the `mapWithState`
 * operation on <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">DStreams</span>.
 *
 * @param parent (key, value) stream that is the source
 * @param spec <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">Specifications</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the mapWithState operation
 * @tparam <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">K</span>   <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">Key</span> <span class="hljs-typedef" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">type</span></span>
 * @tparam <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">V</span>   <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">Value</span> <span class="hljs-typedef" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">type</span></span>
 * @tparam <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">S</span>   <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">Type</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the state maintained
 * @tparam <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">E</span>   <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">Type</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the mapped <span class="hljs-typedef" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">data</span></span>
 */
<span class="hljs-title" style="box-sizing: border-box;">private</span>[streaming]
<span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">InternalMapWithStateDStream</span>[<span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">K</span>: <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">ClassTag</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">V</span>: <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">ClassTag</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">S</span>: <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">ClassTag</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">E</span>: <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">ClassTag</span>]<span class="hljs-container" style="box-sizing: border-box;">(
    <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">parent</span>: <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">DStream</span>[(<span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">K</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">V</span>)</span>], spec: <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">StateSpecImpl</span>[<span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">K</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">V</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">S</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">E</span>])
  extends <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">DStream</span>[<span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">MapWithStateRDDRecord</span>[<span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">K</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">S</span>, <span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">E</span>]]<span class="hljs-container" style="box-sizing: border-box;">(<span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">parent</span>.<span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">context</span>)</span> {
  // Keep the state RDDs in memory; this in-memory data structure is updated continuously.
  persist<span class="hljs-container" style="box-sizing: border-box;">(<span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">StorageLevel</span>.<span class="hljs-type" style="box-sizing: border-box; color: rgb(102, 0, 102);">MEMORY_ONLY</span>)</span>
</span></code>
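4.  InternalMapWithStateDStream.compute: for each batch, it builds a new state RDD from the previous batch's state RDD and the current batch's data.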
<code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">/** Method that generates a RDD for the given time */</span>
  override def compute(validTime: Time): Option[RDD[MapWithStateRDDRecord[K, S, E]]] = {
    // Get the previous state or <span class="hljs-operator" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">create</span> a new empty state RDD
    val prevStateRDD = getOrCompute(validTime - slideDuration) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">match</span> {
      <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">Some</span>(rdd) =>
        <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (rdd.partitioner != <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">Some</span>(partitioner)) {
          // <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">If</span> the RDD <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">not</span> partitioned the <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">right</span> way, let us repartition it <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">using</span> the
          // partition index <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> the <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">key</span>. This <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> ensure that state RDD <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> always partitioned
          // <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">before</span> creating another state RDD <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">using</span> it
          MapWithStateRDD.createFromRDD[K, V, S, E](
            rdd.flatMap { _.stateMap.getAll() }, partitioner, validTime)
        } <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> {
          rdd
        }
      <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> None =>
        MapWithStateRDD.createFromPairRDD[K, V, S, E](
          spec.getInitialStateRDD().getOrElse(new EmptyRDD[(K, S)](ssc.sparkContext)),
          partitioner,
          validTime
        )
    }

    // Create the RDD for the current time window (batch)
    // Compute the new state RDD <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">with</span> previous state RDD <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> partitioned data RDD
    // Even <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> there <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">is</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">no</span> data RDD, use an empty one <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">create</span> a new state RDD
    val dataRDD = parent.getOrCompute(validTime).getOrElse {
      context.sparkContext.emptyRDD[(K, V)]
    }
    val partitionedDataRDD = dataRDD.partitionBy(partitioner)
    val timeoutThresholdTime = spec.getTimeoutInterval().map { <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">interval</span> =>
      (validTime - <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">interval</span>).milliseconds
    }
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">Some</span>(new MapWithStateRDD(
      prevStateRDD, partitionedDataRDD, mappingFunction, validTime, timeoutThresholdTime))
  }
}
</span></code>
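The flow of `compute` can be modeled without Spark. The following is only a minimal plain-Scala sketch under simplifying assumptions: the state is an ordinary `Map`, the "mapping function" is a running sum, and the names `ComputeFlowSketch`, `StateStore`, `computeBatch` and `timeoutThreshold` are hypothetical and do not appear in the Spark source. It illustrates three points from the listing: the first batch falls back to the initial state, an empty batch still produces a new state, and the timeout threshold is the batch time minus the timeout interval.

```scala
// Simplified plain-Scala model of the per-batch flow; it does not use Spark.
object ComputeFlowSketch {
  // Hypothetical stand-in for the state carried from one batch to the next.
  type StateStore = Map[String, Int]

  /** Threshold (ms) before which a key's last update counts as timed out. */
  def timeoutThreshold(validTimeMs: Long, timeoutIntervalMs: Option[Long]): Option[Long] =
    timeoutIntervalMs.map(interval => validTimeMs - interval)

  /** Builds the state for one batch from the previous state (if any) and the batch data. */
  def computeBatch(
      prevState: Option[StateStore],
      initialState: StateStore,
      batchData: Seq[(String, Int)]): StateStore = {
    // Like the `case None` branch: with no previous state, start from the initial state.
    val base = prevState.getOrElse(initialState)
    // An empty batch still yields a state for this batch (here: the unchanged base).
    batchData.foldLeft(base) { case (state, (key, value)) =>
      state.updated(key, state.getOrElse(key, 0) + value)
    }
  }
}
```

Calling `computeBatch` once per batch and feeding each result back in as `prevState` mirrors how `getOrCompute(validTime - slideDuration)` chains one batch's state RDD to the next.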
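5.  MapWithStateRDD: an RDD that holds the keyed state of the mapWithState operation together with the logic for updating it. Each partition of this RDD contains a single MapWithStateRDDRecord, which holds that partition's StateMap and the records returned by the mapping function.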
<code class="hljs fsharp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">/**
 * RDD storing the keyed states <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> `mapWithState` operation <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> corresponding mapped data.
 * Each partition <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> this RDD has a single record <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> <span class="hljs-class" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">type</span> [[<span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">MapWithStateRDDRecord</span>]]. <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">This</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">contains</span> <span class="hljs-title" style="box-sizing: border-box; color: rgb(102, 0, 102);">a</span></span>
 * [[StateMap]] (containing the keyed-states) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> the sequence <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> records returned by the mapping
 * <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">function</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span>  `mapWithState`.
 * @param prevStateRDD The previous MapWithStateRDD on whose StateMap data `this` RDD
  *                    will be created
 * @param partitionedDataRDD The partitioned data RDD which is used update the previous StateMaps
 *                           <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> the `prevStateRDD` <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> create `this` RDD
 * @param mappingFunction  The <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">function</span> that will be used <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> update state <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> data
 * @param batchTime        The time <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">of</span> the batch <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> which this RDD belongs <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span>. Use <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> update
 * @param timeoutThresholdTime The time <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">to</span> indicate which keys are timeout
 */
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span>[streaming] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">class</span> MapWithStateRDD[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> var prevStateRDD: RDD[MapWithStateRDDRecord[K, S, E]],
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> var partitionedDataRDD: RDD[(K, V)],
    mappingFunction: (Time, K, Option[V], State[S]) => Option[E],
    batchTime: Time,
    timeoutThresholdTime: Option[Long]
  ) extends RDD[MapWithStateRDDRecord[K, S, E]](
    partitionedDataRDD.sparkContext,
    List(
      <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> OneToOneDependency[MapWithStateRDDRecord[K, S, E]](prevStateRDD),
      <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> OneToOneDependency(partitionedDataRDD))
  ) {

  @volatile <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">private</span> var doFullScan = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">false</span>

  require(prevStateRDD.partitioner.nonEmpty)
  require(partitionedDataRDD.partitioner == prevStateRDD.partitioner)

  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">override</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> partitioner = prevStateRDD.partitioner

  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">override</span> def checkpoint(): Unit = {
    super.checkpoint()
    doFullScan = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">true</span>
  }

  <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">override</span> def compute(
      partition: Partition, context: TaskContext): Iterator[MapWithStateRDDRecord[K, S, E]] = {

    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> stateRDDPartition = partition.asInstanceOf[MapWithStateRDDPartition]
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> prevStateRDDIterator = prevStateRDD.iterator(
      stateRDDPartition.previousSessionRDDPartition, context)
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> dataIterator = partitionedDataRDD.iterator(
      stateRDDPartition.partitionedDataRDDPartition, context)

    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> prevRecord = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (prevStateRDDIterator.hasNext) Some(prevStateRDDIterator.next()) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> None
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">val</span> newRecord = MapWithStateRDDRecord.updateRecordWithData(
      prevRecord,
      dataIterator,
      mappingFunction,
      batchTime,
      timeoutThresholdTime,
      removeTimedoutData = doFullScan <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// remove timedout data only when full scan is enabled</span>
    )
    Iterator(newRecord)
  }
</code>
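To make the "one record per partition" idea concrete, here is a rough plain-Scala model of what a single partition's update might look like. It is a sketch under stated assumptions, not Spark's implementation: `PartitionUpdateSketch`, `Entry`, `Record` and `update` are hypothetical names, the mapping function is fixed to a running sum, and the expiry rule (last update strictly before the threshold) is an assumption of this sketch. It shows the previous record being copied rather than mutated, and timed-out keys being dropped only when the full-scan flag is set, as `removeTimedoutData = doFullScan` does in the listing.

```scala
import scala.collection.mutable

// Plain-Scala model of one partition's update; all names are hypothetical, not Spark's.
object PartitionUpdateSketch {
  // State value plus the batch time (ms) at which the key was last updated.
  final case class Entry(count: Int, lastUpdatedMs: Long)

  // One record per partition: the keyed state plus the data emitted for this batch.
  final case class Record(state: Map[String, Entry], mapped: Vector[(String, Int)])

  def update(
      prev: Option[Record],
      data: Iterator[(String, Int)],
      batchTimeMs: Long,
      timeoutThresholdMs: Option[Long],
      removeTimedOut: Boolean): Record = {
    // Copy the previous state so the previous record is never mutated.
    val state = mutable.Map.empty[String, Entry]
    prev.foreach(r => state ++= r.state)
    val mapped = Vector.newBuilder[(String, Int)]

    // Apply a running-sum "mapping function" to every record in the batch.
    data.foreach { case (key, value) =>
      val total = state.get(key).map(_.count).getOrElse(0) + value
      state.update(key, Entry(total, batchTimeMs))
      mapped += (key -> total)
    }

    // Drop idle keys only when a full scan is requested.
    if (removeTimedOut) {
      timeoutThresholdMs.foreach { threshold =>
        val expired = state.iterator.collect {
          case (key, entry) if entry.lastUpdatedMs < threshold => key
        }.toList
        state --= expired
      }
    }
    Record(state.toMap, mapped.result())
  }
}
```

Because `update` returns a fresh `Record` each time, the previous batch's record stays intact, which is the property that lets an immutable RDD lineage carry changing state.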
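6.  updateRecordWithData: an RDD itself is immutable, but the data it holds can change. Each batch copies the previous StateMap and applies the new records to the copy.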
<code class="hljs coffeescript has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">def updateRecordWithData[<span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">K</span>: ClassTag, <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">V</span>: ClassTag, <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">S</span>: ClassTag, <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">E</span>: ClassTag](
    <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">prevRecord</span>: Option[MapWithStateRDDRecord[K, S, E]],
    <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">dataIterator</span>: Iterator[(K, V)],
    <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">mappingFunction</span>: <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(Time, K, Option[V], State[S])</span> =></span> Option[E],
    <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">batchTime</span>: Time,
    <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">timeoutThresholdTime</span>: Option[Long],
    <span class="hljs-attribute" style="box-sizing: border-box; color: rgb(0, 136, 0);">removeTimedoutData</span>: Boolean
  ): MapWithStateRDDRecord[K, S, E] = {
    <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> Create a <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> state map <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">by</span> cloning the previous one (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> it exists) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">or</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">by</span> creating an empty one
    val newStateMap = prevRecord.map { _.stateMap.copy() }. getOrElse { <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> EmptyStateMap[K, S]() }

    val mappedData = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> ArrayBuffer[E]
    val wrappedState = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> StateImpl[S]()

    <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> Call the mapping <span class="hljs-reserved" style="box-sizing: border-box;">function</span> <span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">on</span> each record <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> the data iterator, <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> accordingly
    <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> update the states touched, <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> collect the data returned <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">by</span> the mapping <span class="hljs-reserved" style="box-sizing: border-box;">function</span>
    dataIterator.foreach { <span class="hljs-reserved" style="box-sizing: border-box;">case</span> <span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(key, value)</span> =></span>
      wrappedState.wrap(newStateMap.get(key))
      val returned = mappingFunction(batchTime, key, Some(value), wrappedState)
      <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (wrappedState.isRemoved) {
        newStateMap.remove(key)
      } <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (wrappedState.isUpdated || timeoutThresholdTime.isDefined) {
        <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> Iterate over every record of the current batch, apply the user-defined mapping function
        <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> to each one, and write the result into newStateMap, which holds the accumulated (historical) state.
        newStateMap.put(key, wrappedState.get(), batchTime.milliseconds)
      }
      mappedData ++= returned
    }

    <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> Get the timed out state records, call the mapping <span class="hljs-reserved" style="box-sizing: border-box;">function</span> <span class="hljs-literal" style="color: rgb(0, 102, 102); box-sizing: border-box;">on</span> each <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">and</span> collect the
    <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> data returned
    <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> (removeTimedoutData && timeoutThresholdTime.isDefined) {
      newStateMap.getByTime<span class="hljs-function" style="box-sizing: border-box;"><span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(timeoutThresholdTime.get)</span>.<span class="hljs-title" style="box-sizing: border-box;">foreach</span> { <span class="hljs-title" style="box-sizing: border-box;">case</span> <span class="hljs-params" style="color: rgb(102, 0, 102); box-sizing: border-box;">(key, state, _)</span> =></span>
        wrappedState.wrapTimingOutState(state)
        val returned = mappingFunction(batchTime, key, None, wrappedState)
        mappedData ++= returned
        newStateMap.remove(key)
      }
    }
    <span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">//</span> From the RDD's point of view the partition represented by MapWithStateRDDRecord is unchanged; only the data inside it has changed.
    MapWithStateRDDRecord(newStateMap, mappedData)
  }
}
</code>
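The update logic quoted above can be imitated without Spark. The following is only a minimal sketch under stated assumptions: `SimpleState`, `UpdateSketch`, `updateRecord` and `countWords` are hypothetical names invented for illustration, a plain `mutable.Map` stands in for Spark's `StateMap`, and the extra "state exists" check before `put` is an addition of this sketch so that `state.get` cannot fail on a key that has no state.

```scala
import scala.collection.mutable

// Hypothetical stand-in for StateImpl: one reusable wrapper that records
// what the mapping function did to the state of the current key.
final class SimpleState[S] {
  private var value: Option[S] = None
  private var updated = false
  private var removed = false
  def wrap(current: Option[S]): Unit = { value = current; updated = false; removed = false }
  def getOption: Option[S] = value
  def get: S = value.get
  def update(s: S): Unit = { value = Some(s); updated = true; removed = false }
  def remove(): Unit = { value = None; removed = true; updated = false }
  def isUpdated: Boolean = updated
  def isRemoved: Boolean = removed
}

object UpdateSketch {
  // Each entry keeps the state plus the batch time (ms) of its last update.
  type StateMap[K, S] = mutable.Map[K, (S, Long)]

  def updateRecord[K, V, S, E](
      prev: Option[StateMap[K, S]],
      batch: Iterator[(K, V)],
      mappingFunction: (Long, K, Option[V], SimpleState[S]) => Option[E],
      batchTime: Long,
      timeoutThreshold: Option[Long],
      removeTimedOutData: Boolean = true): (StateMap[K, S], Vector[E]) = {
    // Copy the previous map (if any) so the previous record stays untouched.
    val newStateMap: StateMap[K, S] = mutable.Map.empty[K, (S, Long)]
    prev.foreach(newStateMap ++= _)
    val mapped = Vector.newBuilder[E]
    val state = new SimpleState[S]

    // Apply the mapping function to each record and update the touched state.
    batch.foreach { case (key, value) =>
      state.wrap(newStateMap.get(key).map(_._1))
      val returned = mappingFunction(batchTime, key, Some(value), state)
      if (state.isRemoved) {
        newStateMap.remove(key)
      } else if (state.isUpdated || (state.getOption.isDefined && timeoutThreshold.isDefined)) {
        newStateMap.put(key, (state.get, batchTime))
      }
      mapped ++= returned
    }

    // Call the mapping function once more (with no value) for expired keys.
    if (removeTimedOutData) timeoutThreshold.foreach { threshold =>
      // Snapshot the expired keys first so the map is not mutated while iterating.
      val expired = newStateMap.collect { case (k, (s, t)) if t < threshold => (k, s) }.toList
      expired.foreach { case (key, s) =>
        state.wrap(Some(s))
        mapped ++= mappingFunction(batchTime, key, None, state)
        newStateMap.remove(key)
      }
    }
    (newStateMap, mapped.result())
  }
}

object UpdateSketchDemo {
  // Running count per word; emits (word, total) for every call.
  val countWords = (time: Long, word: String, one: Option[Int], state: SimpleState[Int]) => {
    val total = state.getOption.getOrElse(0) + one.getOrElse(0)
    if (one.isDefined) state.update(total)
    Option((word, total))
  }

  def main(args: Array[String]): Unit = {
    val (s1, out1) = UpdateSketch.updateRecord[String, Int, Int, (String, Int)](
      None, Iterator("a" -> 1, "b" -> 1, "a" -> 1), countWords, 1000L, None)
    val (_, out2) = UpdateSketch.updateRecord[String, Int, Int, (String, Int)](
      Some(s1), Iterator("a" -> 1), countWords, 2000L, None)
    println(out1) // Vector((a,1), (b,1), (a,2))
    println(out2) // Vector((a,3))
  }
}
```

Only the keys that appear in the batch are visited, and the previous map is copied rather than mutated, which mirrors the idea in the quoted code that the partition stays the same from the RDD's point of view while its internal data changes.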

mapWithState is implemented as follows:
[Figure: image not preserved in the source]

Summary:
[Figure: image not preserved in the source]

These course notes are based on:


 
