Using Spark to read logs and compute the average time per service

Original post, 2015-07-06 17:08:55

Read the log file; each service entry is marked with "#*#". Compute the average time each service takes.


import java.io.{File, PrintWriter}
import org.apache.spark.{SparkContext, SparkConf}

object SimpleApp {

  def main(args: Array[String]) {
    System.setProperty("hadoop.home.dir", "D://spark-1.3.1-bin-hadoop-2.3.0-cdh5.0.2")

    val logFile = "d://Debug.2015-06-12_1556.log" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    // Keep only the lines that carry the "#*#" service marker.
    val result = logData.filter(line => line.contains("#*#"))

    println("********统计开始**********")

    // Transform into a key-value RDD of (serviceName, timeInSeconds).
    // Note: String.split takes a regex, so the literal marker must be escaped
    // as "#\\*#". The field after the service name is the elapsed time in ms;
    // integer division by 1000 truncates it to whole seconds, and trim guards
    // against trailing whitespace in the log line.
    val jobNameAndTime = result.map { line =>
      val payload = line.split("#\\*#").last.trim
      (payload.split(" ").head, payload.split(" ").last.toInt / 1000)
    }

    // Number of occurrences of each service.
    val jobNameTimes = jobNameAndTime.map(line => (line._1, 1)).reduceByKey((x, y) => x + y)

    // Total time per service. Reducing with (x + y) / 2 would NOT yield the
    // average: pairwise averaging is order-dependent and wrong for more than
    // two values, so sum here and divide by the count after the join.
    val jobTotalTime = jobNameAndTime.reduceByKey((x, y) => x + y)

    // Join the counts with the totals, derive the average per service,
    // and sort by the average time.
    val jobTimesAndAvgTime = jobNameTimes.join(jobTotalTime)
      .mapValues { case (times, total) => (times, total / times) }
      .sortBy(x => x._2._2)

    println("********************************************************************")

    // Collect to the driver before printing; a println inside map would run on
    // the executors, where its output is not visible on a real cluster.
    jobTimesAndAvgTime.collect.foreach(x => println(s"jobName: ${x._1} | times: ${x._2._1} | avgTime: ${x._2._2}s"))

    // Write the same report to a local file.
    val writer = new PrintWriter(new File("d://test.txt"))
    writer.write(jobTimesAndAvgTime.map(x => s"jobName: ${x._1} | times: ${x._2._1} | avgTime: ${x._2._2}s\n").collect.mkString)
    writer.close()


    println(s"一共 ${result.count} 统计条数据")

    println("********************************************************************")


    println("********统计结束**********")


  }

}
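
As a side note, the count/total join above needs two reduceByKey passes plus a join. A single-pass alternative is sketched below; it assumes jobNameAndTime is the (serviceName, seconds) pair RDD built in main above, and uses Spark's aggregateByKey to accumulate (sum, count) pairs before dividing:

// Single-pass alternative (sketch): accumulate (sum, count) per service, then divide.
// Assumes jobNameAndTime: RDD[(String, Int)] as built in SimpleApp.main above.
val sumAndCount = jobNameAndTime.aggregateByKey((0, 0))(
  (acc, t) => (acc._1 + t, acc._2 + 1),   // fold one time value into (sum, count)
  (a, b)   => (a._1 + b._1, a._2 + b._2)  // merge per-partition (sum, count) pairs
)
val jobAvgTime = sumAndCount.mapValues { case (sum, count) => sum / count }

This avoids materializing separate count and total RDDs and gets by with a single shuffle.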

------------------------------

Each service entry is marked with "#*#", followed by the time it took in milliseconds; a minimal parsing sketch follows the snippet.
Log snippet:
2015-06-11 00:05:32.23423742063 [Worker-88] DEBUG c.z.b.v.a.u.c.d.ConnectionFactoryPrefs$$anon$1 - Spark useDatabase =use ran
2015-06-11 00:05:32.82023742649 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 109
2015-06-11 00:05:35.18423745013 [Worker-88] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 110
2015-06-11 00:05:35.18423745013 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 102
2015-06-11 00:05:35.18523745014 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 778
2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 96
2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 42
2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 83
2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 40
2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloWorldService 26993
2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.d.ConnectionFactoryPrefs$$anon$1 - database config: DatabaseInfo(jdbc:hive2://192.168.2.110:11000,mr,mr,org.apache.hive.jdbc.HiveDriver,ran)
2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport@c0770c
2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloWorldService 36993 
2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.t.t.TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 6
2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Writing message with status START and payload length 5
2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 6
2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Start message handled
2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Main negotiation loop complete
2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloSUMService 336993 
2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloSUMService 236993 
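
To make the extraction concrete, here is a minimal sketch of how one marked line from this snippet is taken apart (the sample line is abbreviated; "#\\*#" is the regex-escaped marker discussed in the code above):

// Parsing a single marked log line (abbreviated sample from the snippet).
val line = "2015-06-11 00:05:35.186 [18-worker-1] DEBUG ... - #*#HelloWorldService 26993"
val payload = line.split("#\\*#").last.trim        // "HelloWorldService 26993"
val name    = payload.split(" ").head              // "HelloWorldService"
val seconds = payload.split(" ").last.toInt / 1000 // 26993 ms -> 26 s (integer division)

On this snippet, HelloWorldService appears twice (26993 ms and 36993 ms, i.e. 26 s and 36 s after truncation, average 31 s) and HelloSUMService appears twice (336993 ms and 236993 ms, i.e. 336 s and 236 s, average 286 s), so the corrected pipeline would report times: 2 for both, with avgTime 31s and 286s respectively.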


