Log Analysis
Scenario:
- Log records contain: client IP, URL, response time
- For each URL, report the access count and average response time over the last minute
Solution
Feed the log data into Kafka, pull it out with Spark Streaming, compute each URL's access count and average response time over a sliding one-minute window in real time, and print the results.
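Before adding the streaming machinery, the per-URL count and average-latency computation at the heart of this job can be illustrated on plain Scala collections, with no Spark dependency. The sample lines and field layout ("IP method URL elapsed-seconds") match the simulated data used in the implementation below; the object and method names here are just for illustration:

```scala
object LogAggSketch {
  // For each URL (query string stripped), return (access count, average elapsed time)
  def aggregate(lines: Seq[String]): Map[String, (Int, BigDecimal)] = {
    lines
      .map { line =>
        val arr = line.split("\\s+")            // ip, method, url, elapsed
        (arr(2).split("\\?")(0), BigDecimal(arr(3)))
      }
      .groupBy(_._1)                            // group records by URL
      .map { case (url, pairs) =>
        val total = pairs.map(_._2).sum
        (url, (pairs.size, total / pairs.size)) // count and average
      }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "10.1.96.221 GET /mobile/mobileStat?Imei=123&Para=0&Type=15 0.003",
      "10.2.81.231 GET /mobile/monitoringStat?Imei=223&Para=0&Type=1018 0.005",
      "20.1.61.211 GET /mobile/mobileStat?Imei=333&Para=0&Type=12 0.012")
    aggregate(sample).foreach(println)
  }
}
```

The streaming version below performs the same aggregation, but over a 60-second window that slides forward with each batch.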
Implementation
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable.SynchronizedQueue

val conf = new SparkConf().setAppName("log").setMaster("local[4]")
val sc = new SparkContext(conf)
// 1-second batch interval
val ssc = new StreamingContext(sc, Seconds(1))

// Simulated log records: "IP method URL elapsed-seconds"
val seqData = Seq(
  "10.1.96.221 GET /mobile/mobileStat?Imei=123&Para=0&Type=15 0.003",
  "10.2.81.231 GET /mobile/monitoringStat?Imei=223&Para=0&Type=1018 0.005",
  "20.1.61.211 GET /mobile/mobileStat?Imei=333&Para=0&Type=12 0.012")

// Feed one RDD per batch to simulate a continuous log stream
// (SynchronizedQueue is deprecated in newer Scala versions, but queueStream
// only requires a mutable.Queue[RDD[String]])
val queue = new SynchronizedQueue[RDD[String]]()
for (i <- 0 to 100) {
  queue += sc.parallelize(seqData)
}
val logStream = ssc.queueStream(queue, oneAtATime = true)

// Extract (URL, elapsed time), then strip the query string from the URL
val logs = logStream.map { log =>
  val arr = log.split("\\s+")
  (arr(2), arr(3))
}.map { case (url, elapsed) =>
  (url.split("\\?")(0), elapsed)
}

// Keep the most recent 60 seconds of data
val win60 = logs.window(Seconds(60))

// Group by URL; compute total elapsed time, access count, and average
win60.groupByKey().map { case (api, elapsedTimes) =>
  var total = BigDecimal(0)
  var count = BigInt(0)
  for (t <- elapsedTimes) {
    total += BigDecimal(t)
    count += 1
  }
  (api, total, count, total / BigDecimal(count.toLong))
}.foreachRDD { rdd =>
  rdd.collect().foreach(t4 => println("\t---last 60s---" + t4))
}

ssc.start()             // Start the computation
ssc.awaitTermination()  // Wait for the computation to terminate
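In production, the queueStream simulation above would be replaced by a real Kafka source, as described in the solution. A minimal sketch using the spark-streaming-kafka-0-10 direct-stream integration; the broker address, topic name ("access-log"), and group id are placeholder assumptions, and running it requires a live Kafka broker:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Consumer configuration (broker and group id are placeholders)
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "log-analysis",
  "auto.offset.reset"  -> "latest")

// Each Kafka record's value is one raw log line; the rest of the
// pipeline (map, window, groupByKey, foreachRDD) stays unchanged.
val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("access-log"), kafkaParams))
val logStream = kafkaStream.map(_.value())
```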