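Requirements
In a Spark environment, compute the number of requests in each time period from the server log file apache.log.
Each time period here is one hour, and the year and date are ignored.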
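To make the bucketing rule concrete, here is a minimal sketch of how the hour could be read from a single log line. It assumes the layout of the sample lines in this post, where the fourth space-separated field is a timestamp of the form dd/MM/yyyy:HH:mm:ss. The function name `hour_bucket` is hypothetical and chosen only for illustration.

```python
from datetime import datetime


def hour_bucket(line: str) -> int:
    """Return the hour of day (0-23) for one log line, ignoring the date."""
    # Assumed layout: ip - - dd/MM/yyyy:HH:mm:ss +zone METHOD path
    timestamp = line.split(" ")[3]
    return datetime.strptime(timestamp, "%d/%m/%Y:%H:%M:%S").hour


print(hour_bucket("144.76.194.187 - - 17/05/2015:13:05:11 +0000 GET /reset.css"))  # 13
```

Parsing with `strptime` also rejects malformed timestamps by raising `ValueError`, which may or may not be what you want for noisy logs.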
Log data
Resource link (free download)
Part of the data is shown below for testing:
83.149.9.216 - - 17/05/2015:10:05:03 +0000 GET /presentations/logstash-monitorama-2013/images/kibana-search.png
83.149.9.216 - - 17/05/2015:10:05:43 +0000 GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png
83.149.9.216 - - 17/05/2015:10:05:47 +0000 GET /presentations/logstash-monitorama-2013/plugin/highlight/highlight.js
208.115.111.72 - - 17/05/2015:11:05:41 +0000 GET /files/fastsplit/?C=M;O=D
208.115.111.72 - - 17/05/2015:11:05:19 +0000 GET /files/xdotool/docs/man/?C=M;O=D
208.115.111.72 - - 17/05/2015:11:05:16 +0000 GET /scripts/python/wrap/?C=N;O=D
208.115.111.72 - - 17/05/2015:11:05:32 +0000 GET /files/images/?C=S;O=D
208.115.111.72 - - 17/05/2015:11:05:00 +0000 GET /files/blogposts/20080611/
208.115.111.72 - - 17/05/2015:11:05:16 +0000 GET /files/logstash/?C=D;O=D
208.115.111.72 - - 17/05/2015:11:05:53 +0000 GET /presentations/hackday06/
208.115.111.72 - - 17/05/2015:11:05:29 +0000 GET /scripts/grok-py-test/
208.115.111.72 - - 17/05/2015:11:05:08 +0000 GET /?N=A&page=21
144.76.194.187 - - 17/05/2015:13:05:28 +0000 GET /wp-login.php
144.76.194.187 - - 17/05/2015:13:05:37 +0000 GET /administrator/index.php
144.76.194.187 - - 17/05/2015:13:05:11 +0000 GET /reset.css
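
The counting step can be sketched in plain Python on a few of the sample lines above. This is an illustration under assumptions, not a Spark job: it mirrors the "map each line to its hour, then sum per key" shape that Spark's RDD `map` and `reduceByKey` would typically express. The name `count_by_hour` is hypothetical.

```python
from collections import Counter


def count_by_hour(lines):
    """Count log lines per hour of day, ignoring the date."""
    counts = Counter()
    for line in lines:
        # Field 3 is the timestamp dd/MM/yyyy:HH:mm:ss;
        # splitting it on ':' puts the hour at index 1.
        hour = line.split(" ")[3].split(":")[1]
        counts[hour] += 1  # plays the role of summing values per key
    return sorted(counts.items())


# Four lines taken from the sample data above.
sample = [
    "83.149.9.216 - - 17/05/2015:10:05:03 +0000 GET /presentations/logstash-monitorama-2013/images/kibana-search.png",
    "83.149.9.216 - - 17/05/2015:10:05:43 +0000 GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png",
    "208.115.111.72 - - 17/05/2015:11:05:41 +0000 GET /files/fastsplit/?C=M;O=D",
    "144.76.194.187 - - 17/05/2015:13:05:28 +0000 GET /wp-login.php",
]

print(count_by_hour(sample))  # [('10', 2), ('11', 1), ('13', 1)]
```

Applied to all fifteen sample lines, the same logic gives 3 requests for hour 10, 9 for hour 11, and 3 for hour 13, which is a convenient check for a Spark implementation.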