承接上一篇文档《新增访客数量MR统计之MR数据输出到MySQL》
hive-1.2.1的版本可以直接映射HBase已经存在的表
如果说想在hive创建表,同时HBase不存在对应的表,也想做映射,那么采用编译后的hive版本hive-1.2.1-hbase
1. Hive中创建外部表,关联hbase
CREATE EXTERNAL TABLE event_log_20180728(key string,pl string,ver string,s_time string,u_ud string,u_sd string,en string)STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:pl,info:ver,info:s_time,info:u_ud,info:u_sd,info:en")TBLPROPERTIES("hbase.table.name" = "event_log_20180728");
统计多少个新用户:
select count(*) from event_log_20180728 where en="e_l";
2. 提取数据,进行初步的数据过滤操作,最终将数据保存到临时表
创建临时表
CREATE TABLE stats_hourly_tmp01(pl string,ver string,s_time string,u_ud string,u_sd string,en string,`date` string,hour int);
将原始数据提取到临时表中
INSERT OVERWRITE TABLE stats_hourly_tmp01SELECT pl,ver,s_time,u_ud,u_sd,en,from_unixtime(cast(s_time/1000 as int),'yyyy-MM-dd'), hour(from_unixtime(cast(s_time/1000 as int),'yyyy-MM-dd HH:mm:ss'))FROM event_log_20200510WHERE en="e_l" or en="e_pv";