统计web日志里面一个时间段的get请求数量

日志数据:

0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:32 +0800] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1" 200 332
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:40 +0800] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1" 200 132```
**要求:按照时间每个小时统计get产生的次数**
第一种做法是使用sql的做法:
scala代码:
import org.apache.Spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}

/**
* Created by xiaopengpeng on 2016/12/15.
*/
class countget {

}
object countget{
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName(“countget”).setMaster(“local[*]”)
val spark = SparkSession
.builder()
.config(conf)
.getOrCreate()
import spark.implicits._
//0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] “GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1” 200 13821
val logDF = spark.sparkContext.textFile(“D:\Program\apache-tomcat-7.0.72\logs\localhost_access_log.2016-11-11.txt”)
//.foreach(x=>x.split(” “).map())
.map(line =>line.split(” “)).map(list=>( list(3).substring(list(3).lastIndexOf(“/”)+1,list(3).lastIndexOf(“/”)+8),list(5)))
.toDF(“time”,”method”);
logDF.show();
logDF.createOrReplaceTempView(“log”);
spark.sql(“SELECT time,COUNT(method) FROM log WHERE method=’\”GET’ group by time”).show();

}
}
第二种做法是用的纯粹的scala代码实现的
代码:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
* Created by root on 2016/12/15.
*/
class CountGetByScala {

}
object CountGetByScala{
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName(“countget”).setMaster(“local[*]”)
val spark = SparkSession
.builder()
.config(conf)
.getOrCreate()
import spark.implicits._
//0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] “GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1” 200 13821
val logLine = spark.sparkContext.textFile(“D:\Program\apache-tomcat-7.0.72\logs\localhost_access_log.2016-11-11.txt”)
.map(line =>line.split(” “)).map(list=>( list(3).substring(list(3).lastIndexOf(“/”)+1,list(3).lastIndexOf(“/”)+8),list(5)))
val filter = logLine.filter(y=>y._2.equals(“\”GET”))

val group = filter.groupBy(line=>line._1)
val result = group.map(g =>(g._1,g._2.toList.size))
result.foreach(x=>println(x))

}
}

 

 

转载于:https://www.cnblogs.com/itboys/p/6860772.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值