Spark
A Spark learning series
DearNingning
Spark SQL case study: flow statistics with the DataFrame DSL
uid,start_time,end_time,flow
1,2020-02-18 14:20:30,2020-02-18 14:46:30,20
1,2020-02-18 14:47:20,2020-02-18 15:20:30,30
1,2020-02-18 15:37:23,2020-02-18 16:05:26,40
1,2020-02-18 16:06:27,2020-02-18 17:20:49,50
1,2020-02-18 17:21:50,2020-02-18 18:03:27,60
2,…

Posted 2021-06-18 23:32:30
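The preview above cuts off mid-data. A common way to solve this kind of problem with the DataFrame DSL is to merge a user's records whenever the gap to the previous record is small (10 minutes here) and then sum the flow per merged session. The sketch below assumes that rule, a header-bearing CSV at data/flow.csv, and string timestamps:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object FlowRollupDSL {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FlowRollupDSL").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical path; columns are uid,start_time,end_time,flow as in the preview
    val df = spark.read.option("header", "true").csv("data/flow.csv")

    val w = Window.partitionBy("uid").orderBy("start_time")

    val result = df
      // Carry the previous row's end_time forward within each uid
      .withColumn("prev_end", lag("end_time", 1).over(w))
      // Flag a new session when the gap to the previous record exceeds 10 minutes
      .withColumn("new_session",
        when(unix_timestamp($"start_time") - unix_timestamp($"prev_end") > 600, 1).otherwise(0))
      // Running sum of the flag gives a session id per uid
      .withColumn("session_id", sum("new_session").over(w))
      // Merge each session: earliest start, latest end, total flow
      .groupBy("uid", "session_id")
      .agg(min("start_time").as("start_time"),
           max("end_time").as("end_time"),
           sum($"flow".cast("long")).as("total_flow"))

    result.show(false)
    spark.stop()
  }
}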
Spark SQL case study: flow statistics
Uses the same uid,start_time,end_time,flow sample data as the DSL version above.

Posted 2021-06-18 22:46:57
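The same merge-then-sum logic can also be written as a single SQL statement over a temp view. Again a sketch, with the path, the 10-minute threshold, and the column types assumed:

import org.apache.spark.sql.SparkSession

object FlowRollupSQL {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FlowRollupSQL").master("local[*]").getOrCreate()

    // Hypothetical path; same uid,start_time,end_time,flow CSV as the DSL version
    spark.read.option("header", "true").csv("data/flow.csv").createOrReplaceTempView("flow")

    val result = spark.sql(
      """
        |SELECT uid,
        |       MIN(start_time) AS start_time,
        |       MAX(end_time)   AS end_time,
        |       SUM(CAST(flow AS BIGINT)) AS total_flow
        |FROM (
        |  SELECT *,
        |         SUM(new_session) OVER (PARTITION BY uid ORDER BY start_time) AS session_id
        |  FROM (
        |    SELECT *,
        |           CASE WHEN UNIX_TIMESTAMP(start_time)
        |                     - UNIX_TIMESTAMP(LAG(end_time, 1) OVER (PARTITION BY uid ORDER BY start_time)) > 600
        |                THEN 1 ELSE 0 END AS new_session
        |    FROM flow
        |  ) t1
        |) t2
        |GROUP BY uid, session_id
      """.stripMargin)

    result.show(false)
    spark.stop()
  }
}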
Spark: finding users who logged in on 3 or more consecutive days
数据 guid01,2018-02-28 guid01,2018-03-01 guid01,2018-03-01 guid01,2018-03-02 guid01,2018-03-05 guid01,2018-03-04 guid01,2018-03-06 guid01,2018-03-07 guid02,2018-03-01 guid02,2018-03-02 guid02,2018-03-03 guid02,2018-03-06 guid03,2018-03-06 guid03,2018-03-07 g原创 2021-06-16 21:44:19 · 111 阅读 · 0 评论 -
A roundup of the different join types
Posted 2021-06-14 10:01:42
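The post itself is not excerpted in this listing; for reference, the usual DataFrame join types look like this (the two tiny datasets are made up for illustration):

import org.apache.spark.sql.SparkSession

object JoinDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JoinDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, just enough to exercise each join type
    val orders = Seq((1, "o1"), (2, "o2"), (4, "o3")).toDF("uid", "order_id")
    val users  = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("uid", "name")

    val on = Seq("uid")
    orders.join(users, on, "inner").show()       // only uids present on both sides
    orders.join(users, on, "left").show()        // all orders, nulls where no matching user
    orders.join(users, on, "right").show()       // all users, nulls where no matching order
    orders.join(users, on, "full").show()        // union of both sides
    orders.join(users, on, "left_semi").show()   // orders whose uid exists in users (left columns only)
    orders.join(users, on, "left_anti").show()   // orders whose uid does NOT exist in users

    spark.stop()
  }
}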
Advanced RDDs
import Utils.SparkUtils
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object Demo {
  def main(args: Array[String]): Unit = {
    val sc: SparkContext = SparkUtils.getSparkContext()
    val rdd: RDD[(String, Int)] = sc.textFile("data/…

Posted 2021-06-05 10:55:57
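The preview stops right after sc.textFile, so the rest of the post is unknown; a minimal pair-RDD sketch in the same spirit (word counting over an assumed data/words.txt), using a plain SparkConf in place of the author's SparkUtils helper:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PairRddSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the author's SparkUtils.getSparkContext() helper
    val conf = new SparkConf().setAppName("PairRddSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input: lines like "word1 word2 word3"
    val lines: RDD[String] = sc.textFile("data/words.txt")

    val pairs: RDD[(String, Int)] = lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .map(word => (word, 1))     // pair each word with a count of 1

    val counts: RDD[(String, Int)] = pairs.reduceByKey(_ + _)  // sum counts per word
    counts.collect().foreach(println)

    sc.stop()
  }
}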
The mapPartitionsWithIndex operator
import Utils.SparkUtils
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object IndexDemo {
  def main(args: Array[String]): Unit = {
    val sc: SparkContext = SparkUtils.getSparkContext()
    val rdd: RDD[Int] = sc.makeRDD(List(1,2,3,4…

Posted 2021-06-02 18:02:25
The mapPartitionsWithIndex operator
import Utils.SparkUtils
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object MapartitionsIndexDemo {
  def main(args: Array[String]): Unit = {
    val sc: SparkContext = SparkUtils.getSparkContext()
    val rdd: RDD[Int] = sc.makeRDD(…

Posted 2021-06-02 17:56:35
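Both mapPartitionsWithIndex posts above are cut off at the makeRDD call. A minimal sketch of the operator itself, which hands the user function each partition's index along with an iterator over that partition's elements:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapPartitionsWithIndexSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapPartitionsWithIndexSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Four elements spread over two partitions, mirroring the makeRDD calls in the previews
    val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 2)

    // The function receives the partition index plus an iterator over that partition's elements
    val tagged: RDD[String] = rdd.mapPartitionsWithIndex { (index, iter) =>
      iter.map(value => s"partition $index holds $value")
    }

    tagged.collect().foreach(println)
    sc.stop()
  }
}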
Finding the maximum value in each partition
import Utils.SparkUtils
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object MapMaxDemo {
  def main(args: Array[String]): Unit = {
    val sc: SparkContext = SparkUtils.getSparkContext()
    val rdd1: RDD[Int] = sc.makeRDD(List(1,2,3…

Posted 2021-06-02 17:44:43
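One way to get a per-partition maximum is mapPartitionsWithIndex again, emitting a single (index, max) pair per non-empty partition; a sketch with made-up data:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionMaxSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PartitionMaxSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical data split across three partitions
    val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 3)

    // Emit one (partition index, max) pair per non-empty partition
    val maxPerPartition: RDD[(Int, Int)] = rdd.mapPartitionsWithIndex { (index, iter) =>
      if (iter.hasNext) Iterator((index, iter.max)) else Iterator.empty
    }

    maxPerPartition.collect().foreach(println)
    sc.stop()
  }
}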
The map operator
import java.sql.{Connection, Driver, DriverManager, PreparedStatement, ResultSet}
import Utils.SparkUtils
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object MapDemo01 {
  def main(args: Array[String]): Unit = {
    val sc: SparkCo…

Posted 2021-06-02 17:17:34
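The java.sql imports in the preview suggest the map function queries a database for each element. A sketch of that pattern with a hypothetical MySQL table (the URL, credentials, and schema are placeholders); note that opening a connection per element is expensive, and mapPartitions with one connection per partition is the usual refinement:

import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}

object MapJdbcSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapJdbcSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val ids = sc.makeRDD(List(1, 2, 3))

    // Hypothetical MySQL database and users table; connection details are placeholders
    val names = ids.map { id =>
      val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "password")
      val stmt = conn.prepareStatement("SELECT name FROM users WHERE id = ?")
      stmt.setInt(1, id)
      val rs = stmt.executeQuery()
      val name = if (rs.next()) rs.getString("name") else "unknown"
      rs.close(); stmt.close(); conn.close()
      (id, name)
    }

    names.collect().foreach(println)
    sc.stop()
  }
}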
Deduplication in Spark
import Utils.SparkUtils
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    val sc: SparkContext = SparkUtils.getSparkContext()
    val tf: RDD[String] = sc.textFile("da…

Posted 2021-06-04 21:37:47
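The preview is cut off at the textFile call. Two common ways to deduplicate an RDD are the built-in distinct() and the reduceByKey trick the object name hints at; a sketch assuming an input file data/dup.txt:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DedupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DedupSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input file containing duplicate lines
    val lines: RDD[String] = sc.textFile("data/dup.txt")

    // Shortest route: the built-in distinct()
    lines.distinct().collect().foreach(println)

    // reduceByKey route: key each line, keep one value per key, then drop the dummy value
    val deduped: RDD[String] = lines
      .map(line => (line, 1))
      .reduceByKey((a, _) => a)
      .map(_._1)
    deduped.collect().foreach(println)

    sc.stop()
  }
}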
Advanced Spark: the sample and takeSample operators
import Utils.SparkUtils
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object SampleDemo {
  def main(args: Array[String]): Unit = {
    val sc: SparkContext = SparkUtils.getSparkContext()
    val rdd: RDD[Int] = sc.makeRDD(List(1,2,3…

Posted 2021-06-04 20:51:47
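A sketch of the two operators named in the title: sample (a transformation returning a new RDD) and takeSample (an action returning an Array to the driver):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SampleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SampleSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val rdd: RDD[Int] = sc.makeRDD(1 to 100)

    // sample is a transformation: keep each element with probability ~0.1,
    // without replacement, fixed seed for repeatability
    val sampled: RDD[Int] = rdd.sample(withReplacement = false, fraction = 0.1, seed = 42L)
    println(sampled.collect().mkString(", "))

    // takeSample is an action: returns exactly 5 elements to the driver as an Array
    val taken: Array[Int] = rdd.takeSample(withReplacement = false, num = 5, seed = 42L)
    println(taken.mkString(", "))

    sc.stop()
  }
}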
spark
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object RDDdEMO {
  Logger.getLogger("org").setLevel(Level.ERROR)
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new…

Posted 2021-06-02 11:29:46
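The preview ends while the SparkConf is still being built; a minimal completion of that setup pattern, with the log4j level lowered so the output stays readable (the app name and data are made up):

import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object RddSetupSketch {
  // Quiet Spark's own log output so println results are easy to see
  Logger.getLogger("org").setLevel(Level.ERROR)

  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("RddSetupSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Tiny RDD just to prove the context works
    val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4))
    println(rdd.map(_ * 2).collect().mkString(", "))

    sc.stop()
  }
}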