Analyzing nginx access.log with Hive

Start Hive from its installation directory:

[root@master hive]# ./bin/hive
which: no hbase in (/usr/tools/hadoop-2.7.3/bin/:/usr/java/jdk1.7.0_79/bin/:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/tools/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/tools/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/tools/hive/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Next, look at the format of the entries in access.log:

58.60.168.164 - - [07/Jan/2016:09:09:43 +0800] "GET / HTTP/1.1" 200 11250 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164 - - [07/Jan/2016:09:09:43 +0800] "GET /tomcat.css HTTP/1.1" 304 0 "http://120.55.190.57/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164 - - [07/Jan/2016:09:09:43 +0800] "GET /tomcat.png HTTP/1.1" 304 0 "http://120.55.190.57/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164 - - [07/Jan/2016:09:09:44 +0800] "GET / HTTP/1.1" 200 11250 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164 - - [07/Jan/2016:09:09:44 +0800] "GET /tomcat.css HTTP/1.1" 304 0 "http://120.55.190.57/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164 - - [07/Jan/2016:09:09:44 +0800] "GET /tomcat.png HTTP/1.1" 304 0 "http://120.55.190.57/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164 - - [07/Jan/2016:09:09:45 +0800] "GET / HTTP/1.1" 200 11250 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164 - - [07/Jan/2016:09:09:45 +0800] "GET /tomcat.css HTTP/1.1" 304 0 "http://120.55.190.57/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"

Create a table in Hive:
hive> CREATE TABLE apachelog (
  ipaddress STRING, identity STRING, t_user STRING, time STRING,
  request STRING, protocol STRING, status STRING, size STRING,
  referer STRING, agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*) ([^ ]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s")
STORED AS TEXTFILE;
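
To see how the regular expression splits a log line into the ten columns, here is a minimal Python sketch. It uses the same pattern as "input.regex" with the Hive string escaping removed, and applies it to the first sample line above. The names LOG_PATTERN, COLUMNS and parse_line are made up for this illustration. The sketch assumes the SerDe requires the whole line to match, hence fullmatch; treat that as an assumption rather than a statement about Hive internals.

```python
import re

# Same pattern as "input.regex" above, with the Hive string escaping removed.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*) ([^ ]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|".*") ([^ "]*|".*"))?'
)

# Column names in the same order as the CREATE TABLE statement.
COLUMNS = ["ipaddress", "identity", "t_user", "time", "request",
           "protocol", "status", "size", "referer", "agent"]

# First sample line from the access.log excerpt above.
SAMPLE = ('58.60.168.164 - - [07/Jan/2016:09:09:43 +0800] "GET / HTTP/1.1" '
          '200 11250 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
          'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 '
          'Safari/537.36"')


def parse_line(line):
    """Return a dict of column name -> captured text, or None if no match."""
    m = LOG_PATTERN.fullmatch(line)  # assumption: the whole line must match
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))


fields = parse_line(SAMPLE)
for name in COLUMNS:
    print(f"{name:10s} {fields[name]}")
```

Because the request and protocol are captured by two separate groups, the request column keeps its opening quote ("GET /) and the protocol column keeps its closing quote (HTTP/1.1"). The time column likewise keeps its square brackets.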

Then load the contents of access.log into the table.

hive> load data local inpath '/usr/tools/access.log' into table apachelog;
"local inpath" means the path is on the local file system. Without "local", the file is loaded from HDFS, for example: load data inpath '/usr/dfs/access.log' into table apachelog;

At this point you may get an error like the following:

Caused by: java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:137)
	... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.RegexSerDe not found
	at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:329)
	at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:364)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)
	... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.RegexSerDe not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:175)
	at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:295)
	... 24 more

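The RegexSerDe class cannot be found.

Open hive-env.sh in Hive's conf directory and, on the last line, set the path to the required jar. The jar (hive-contrib, which contains this class) is in the lib directory under the Hive installation directory.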

export HIVE_AUX_JARS_PATH=/usr/tools/hive/lib

Restart Hive, then run the load data command above again.

Once load data succeeds, run a simple query to check the table.

hive> select ipaddress,time,agent from apachelog limit 5;
OK
58.60.168.164	[07/Jan/2016:09:09:43 +0800]	"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164	[07/Jan/2016:09:09:43 +0800]	"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164	[07/Jan/2016:09:09:43 +0800]	"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164	[07/Jan/2016:09:09:44 +0800]	"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
58.60.168.164	[07/Jan/2016:09:09:44 +0800]	"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
Time taken: 0.143 seconds, Fetched: 5 row(s)

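Next, run a simple aggregate query: count the number of requests per IP address and sort in descending order. (Here the result is written to the directory /usr/tools/hive/output on the local file system, with fields separated by commas.)

You can also run just the trailing select ... part on its own to print the result to the console.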

hive> INSERT OVERWRITE local DIRECTORY '/usr/tools/hive/output' row format delimited
fields terminated by ',' select ipaddress ,count(ipaddress) as count from apachelog group by ipaddress order by count desc;
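
As a rough illustration of what this query computes, the following Python sketch counts requests per IP address and prints them in descending order as comma-separated lines. It is only an analogy for the group by / order by logic, not how Hive executes the job. The log lines are shortened versions of the sample entries (referer and agent replaced with "-"), and the 192.0.2.1 line is made up so that more than one address appears. The function name count_by_ip is hypothetical.

```python
from collections import Counter

# Shortened sample lines; the 192.0.2.1 entry is invented for illustration.
LINES = [
    '58.60.168.164 - - [07/Jan/2016:09:09:43 +0800] "GET / HTTP/1.1" 200 11250 "-" "-"',
    '58.60.168.164 - - [07/Jan/2016:09:09:43 +0800] "GET /tomcat.css HTTP/1.1" 304 0 "-" "-"',
    '192.0.2.1 - - [07/Jan/2016:09:09:44 +0800] "GET / HTTP/1.1" 200 11250 "-" "-"',
    '58.60.168.164 - - [07/Jan/2016:09:09:44 +0800] "GET /tomcat.png HTTP/1.1" 304 0 "-" "-"',
]


def count_by_ip(lines):
    """Count lines per leading IP field, sorted by count descending."""
    counts = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    # Ties are broken by IP text only to make this sketch deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


rows = count_by_ip(LINES)
for ip, n in rows:
    print(f"{ip},{n}")
```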

The console shows the progress of the job while it runs (screenshot not included here).



You can also check how the MapReduce job is running on Hadoop's web management page.



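In the result, the value before the comma is the ipaddress and the value after it is the number of requests. Only a small fragment of access.log was analyzed here.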
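
If you want to read the exported result back in a script, a minimal Python sketch could look like the following. It assumes the output directory contains plain-text files with one "ipaddress,count" pair per line, and it skips hidden files such as checksum files. The function names parse_result_lines and read_output_dir are hypothetical, and the exact file names Hive writes are not assumed.

```python
from pathlib import Path


def parse_result_lines(lines):
    """Turn 'ipaddress,count' lines into (ipaddress, int(count)) tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ip, count = line.rsplit(",", 1)
        rows.append((ip, int(count)))
    return rows


def read_output_dir(directory):
    """Read every non-hidden file in the directory (assumed to be text)."""
    rows = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and not path.name.startswith("."):
            with path.open(encoding="utf-8") as f:
                rows.extend(parse_result_lines(f))
    return rows


if __name__ == "__main__":
    out_dir = Path("/usr/tools/hive/output")  # directory used in the query
    if out_dir.is_dir():
        for ip, count in read_output_dir(out_dir):
            print(ip, count)
```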




