Maven Hadoop Log Cleaning Project (Part 2)
Hadoop 2.7.2
Hive 2.1.0
Sqoop 1.4.6
Reference:
http://www.cnblogs.com/edisonchou/p/4464349.html
1. Loading the cleaned files from HDFS into Hive
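To run statistical analysis with Hive, we first need to get the cleaned data into Hive, which means creating a table. A table partitioned by date is the natural choice for log data; note, however, that the CREATE statement used here defines a plain, unpartitioned external table. The key point is the HDFS location the table maps to. In my case it is /user/root/logcleanjob_output, the directory where the cleaning job wrote its output.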
File location:
root@py-server:/projects/data# hadoop fs -ls /user/root/logcleanjob_output
Found 2 items
-rw-r--r-- 2 root supergroup 0 2016-08-13 18:46 /user/root/logcleanjob_output/_SUCCESS
-rw-r--r-- 2 root supergroup 50810594 2016-08-13 18:46 /user/root/logcleanjob_output/part-r-00000
hive>create database logtest;
hive>CREATE EXTERNAL TABLE techbbs(ip string, atime string, url string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/root/logcleanjob_output';
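The statement above creates an unpartitioned table. If a date-partitioned table is wanted, a minimal sketch could look like the following. The table name techbbs_part, the partition column logdate, and the partition value 2013_05_31 are assumptions made for illustration (the value is guessed from the timestamps in the sample rows); only the HDFS directory comes from the listing above.

```sql
-- Hypothetical partitioned variant of techbbs; names are illustrative.
CREATE EXTERNAL TABLE techbbs_part(ip string, atime string, url string)
PARTITIONED BY (logdate string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Register the existing cleaned-output directory as one partition.
-- Because the table is external, no data is moved or copied.
ALTER TABLE techbbs_part ADD PARTITION (logdate='2013_05_31')
LOCATION '/user/root/logcleanjob_output';

-- Queries can then filter on the partition column.
SELECT COUNT(*) FROM techbbs_part WHERE logdate='2013_05_31';
```

With this layout, each day's cleaned output would be added as its own partition, so a query for one day only reads that day's directory.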
Verification:
select * from techbbs;
119.127.191.86 20130531235956 forum.php?mod=viewthread&tid=11528&page=60&authorid=53387
157.56.177.164 20130531235957 api.php?mod=js&bid=65
223.240.215.151 20130531235958 source/plugin/pcmgr_url_safeguard/url_api.inc.php
112.64.235.246 20130531235957 home.php?mod=misc&ac=sendmail&rand=1370014195
49.74.113.251 20130531235958 home.php?mod=spacecp&ac=follow&op=checkfeed&rand=1370015996
117.79.176.9 20130531235958 home.php?mod=space&do=notice
Time taken: 0.097 s
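Since part-r-00000 is about 50 MB, an unbounded select * prints every row to the console. As a lighter sanity check, something like the following may be preferable. This is only a sketch: the row limit is arbitrary, and no output is shown because it was not part of the original run.

```sql
-- Preview a handful of rows instead of the whole table.
SELECT * FROM techbbs LIMIT 6;

-- Count the rows; unlike a plain fetch, this typically launches a job.
SELECT COUNT(*) FROM techbbs;

-- Show the table type and the HDFS location it maps to.
DESCRIBE FORMATTED techbbs;
```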