Spark
XYQ2022
What collectAsMap does
scala> val data = sc.parallelize(List((1, "www"), (1, "iteblog"), (1, "com"), (2, "bbs"), (2, "iteblog"), (2, "com"), (3, "good")))
data: org.apache.spark.rdd.RDD[(Int, String)] = ParallelCollectio...
Original · 2019-10-23 11:17:59 · 1589 views · 0 comments
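The preview above is truncated, so here is a minimal self-contained sketch of the same idea, assuming a local Spark setup: `collectAsMap` pulls a pair RDD back to the driver as a `Map`, and duplicate keys collapse so that only one value per key survives.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectAsMapDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CollectAsMapDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val data = sc.parallelize(List((1, "www"), (1, "iteblog"), (1, "com"),
      (2, "bbs"), (2, "iteblog"), (2, "com"), (3, "good")))

    // collectAsMap materializes the pair RDD on the driver as a Map.
    // Duplicate keys are collapsed: only one value per key remains
    // (typically the last one seen, e.g. Map(2 -> com, 1 -> com, 3 -> good)).
    val m = data.collectAsMap()
    println(m)

    sc.stop()
  }
}
```

Note that because the result lives entirely on the driver, `collectAsMap` is only safe for small RDDs.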
Tagging data in Spark Streaming while monitoring a port
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Durations, StreamingCont...
Original · 2019-10-18 20:52:42 · 188 views · 1 comment
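The post body is cut off, so the following is only a sketch of what a port-monitoring-plus-tagging job might look like: it reads lines from a local socket and maps each record to a tagged pair. The host, port, and the length-based tagging rule are all assumptions, not taken from the original post.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.{Durations, StreamingContext}

object StreamTagDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamTagDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Durations.seconds(5))

    // Listen on a local socket (assumed port; can be fed with `nc -lk 9999`).
    val lines: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 9999)

    // Tag each record; the length-based rule here is illustrative only.
    val tagged = lines.map(line => if (line.length > 10) ("long", line) else ("short", line))
    tagged.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```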
Spark custom partitioner
A custom partitioner that routes data to a designated partition by key.
import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SparkPartitionBy_Opter1 {
  def main(args: Array[String]): Unit = {
    v...
Original · 2019-09-28 14:45:58 · 108 views · 0 comments
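Since the original listing is truncated, here is a hedged sketch of the technique: extend `Partitioner`, implement `numPartitions` and `getPartition`, and apply it with `partitionBy`. The class name `KeyPartitioner` and the modulo routing rule are illustrative assumptions.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Routes each Int key to a fixed partition; the modulo rule is illustrative.
class KeyPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = key match {
    case k: Int => math.abs(k) % parts
    case _      => 0
  }
}

object PartitionerDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PartitionerDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val data = sc.parallelize(List((1, "a"), (2, "b"), (3, "c"), (4, "d")))
    val partitioned = data.partitionBy(new KeyPartitioner(3))

    // Each element of glom() is one partition's contents, so we can
    // inspect which keys landed where. A subsequent saveAsTextFile would
    // write one part-file per partition.
    partitioned.glom().collect().zipWithIndex.foreach { case (part, i) =>
      println(s"partition $i: ${part.mkString(", ")}")
    }
    sc.stop()
  }
}
```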
Spark accumulators
import java.util
import org.apache.spark.rdd.RDD
import org.apache.spark.util.{AccumulatorV2, LongAccumulator}
import org.apache.spark.{SparkConf, SparkContext}

object SparkAccumulator {
  def mai...
Original · 2019-09-29 23:56:41 · 69 views · 0 comments
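The imports suggest a custom `AccumulatorV2` backed by a `java.util` collection, but the body is cut off, so this is only a sketch under that assumption: an accumulator that collects matching strings. The class name, the `contains("h")` filter, and the driver program are all hypothetical.

```scala
import java.util
import org.apache.spark.util.AccumulatorV2
import org.apache.spark.{SparkConf, SparkContext}

// A custom AccumulatorV2 that collects strings containing "h";
// the filter is an illustrative assumption, not from the original post.
class WordAccumulator extends AccumulatorV2[String, util.ArrayList[String]] {
  private val list = new util.ArrayList[String]()
  override def isZero: Boolean = list.isEmpty
  override def copy(): AccumulatorV2[String, util.ArrayList[String]] = {
    val acc = new WordAccumulator
    acc.list.addAll(list)
    acc
  }
  override def reset(): Unit = list.clear()
  override def add(v: String): Unit = if (v.contains("h")) list.add(v)
  override def merge(other: AccumulatorV2[String, util.ArrayList[String]]): Unit =
    list.addAll(other.value)
  override def value: util.ArrayList[String] = list
}

object AccumulatorDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val acc = new WordAccumulator
    sc.register(acc, "wordAcc")

    // Executor-side adds are merged back into the driver's copy after the action.
    sc.parallelize(List("hadoop", "spark", "hive")).foreach(acc.add)
    println(acc.value)  // collects the words containing "h"
    sc.stop()
  }
}
```

For plain counting, the built-in `LongAccumulator` (also imported above) via `sc.longAccumulator` is usually enough; a custom `AccumulatorV2` is only needed for non-numeric aggregation like this list.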