Applications of Hive in Multidimensional Statistical Analysis

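This article summarizes how Hive can be used for multidimensional statistical analysis, covering multidimensional combination statistics over the same attribute and over different attributes. It gives concrete solutions along with notes on using Hive, including practical tips on the limitations of union all, multi-table insert, configuring parallel task execution, and using regular expressions.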

Applications of Hive in Multidimensional Statistical Analysis & Summary of Tips


Multidimensional statistics generally fall into two kinds. Let's look at how each is handled in Hive:

1. Multidimensional combination statistics over the same attribute

(1) Problem:
Given the following data, whose fields are, in order: url, catePath0, catePath1, catePath2, unitparams


https://cwiki.apache.org/confluence 0 1 8 {"store":{"fruit":[{"weight":1,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
http://my.oschina.net/leejun2005/blog/83058 0 1 23 {"store":{"fruit":[{"weight":1,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
http://www.hao123.com/indexnt.html?sto 0 1 25 {"store":{"fruit":[{"weight":1,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
https://cwiki.apache.org/confluence 0 5 18 {"store":{"fruit":[{"weight":5,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
http://my.oschina.net/leejun2005/blog/83058 0 5 118 {"store":{"fruit":[{"weight":5,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
http://www.hao123.com/indexnt.html?sto 0 3 98 {"store":{"fruit":[{"weight":3,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
http://www.hao123.com/indexnt.html?sto 0 3 8 {"store":{"fruit":[{"weight":3,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
http://my.oschina.net/leejun2005/blog/83058 0 5 81 {"store":{"fruit":[{"weight":5,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 
http://www.hao123.com/indexnt.html?sto 0 9 8 {"store":{"fruit":[{"weight":9,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} 

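(2) Requirement:
Compute the pv and uv of the URLs under each combination of the three dimensions catePath0, catePath1, and catePath2 (output columns: catePath0, catePath1, catePath2, pv, uv), for example: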


0 1 23 1 1 
0 1 25 1 1 
0 1 8 1 1 
0 1 ALL 3 3 
0 3 8 1 1 
0 3 98 1 1 
0 3 ALL 2 1 
0 5 118 1 1 
0 5 18 1 1 
0 5 81 1 1 
0 5 ALL 3 2 
0 ALL ALL 8 3 
ALL ALL ALL 8 3 
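
To make the meaning of an ALL row concrete, here is a minimal sketch that computes only the (catePath0, catePath1, ALL) level. It assumes the rows are loaded into the t_log table defined in the solution below (columns c0, c1, c2), that pv is the number of rows and uv is the number of distinct URL hosts, and that rows whose first fruit weight is 9 are excluded as in the solution's where clause. These definitions are inferred from the sample data and expected output, not stated by the source.

```sql
-- Sketch: one aggregation level only, (c0, c1, 'ALL').
-- Assumption: pv = row count, uv = distinct host count.
SELECT c0, c1, 'ALL' AS c2,
       count(host)          AS pv,
       count(DISTINCT host) AS uv
FROM (
    SELECT t1.host AS host, t0.c0 AS c0, t0.c1 AS c1
    FROM t_log t0
    LATERAL VIEW parse_url_tuple(t0.url, 'HOST') t1 AS host
    -- Same filter as the solution's query: drop rows with weight 9.
    WHERE get_json_object(t0.unitparams, '$.store.fruit[0].weight') != 9
) x
GROUP BY c0, c1;
```

Each ALL row is therefore an ordinary group by over fewer columns, with the dropped column replaced by the literal 'ALL'.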

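(3) Solution:
In Hive, multidimensional statistics over the same attribute are usually solved by using union all to assemble each dimension combination and then applying group by: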

create EXTERNAL table IF NOT EXISTS t_log (
    url string, c0 string, c1 string, c2 string, unitparams string
)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/tmp/decli/1';

select * from (
        select host, c0, c1, c2 from t_log t0
        LATERAL VIEW parse_url_tuple(url, 'HOST') t1 as host
        where get_json_object(t0.unitparams, '$.store.fruit[0].weight') != 9
    union all
        select host, c0, c1, 'ALL' c2 from t_log t0
        LATERAL VIEW parse_url_tuple(url, 'HOST') t1 as host
        where get_json_object(t0.unitparams, '$.store.fruit[0].weight'
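
The listing above breaks off partway through its second branch. As a hedged sketch of how the union all plus group by approach could be carried through, the query below adds one branch per aggregation level and then groups the combined rows. The last two branches, the pv and uv definitions (row count and distinct host count), the subquery alias x, and the ORDER BY are assumptions inferred from the expected output, not the author's original text.

```sql
-- Sketch: one UNION ALL branch per level, then a single GROUP BY.
-- Assumption: pv = count(host), uv = count(DISTINCT host).
SELECT c0, c1, c2,
       count(host)          AS pv,
       count(DISTINCT host) AS uv
FROM (
        -- Level (c0, c1, c2): the finest granularity.
        SELECT host, c0, c1, c2 FROM t_log t0
        LATERAL VIEW parse_url_tuple(url, 'HOST') t1 AS host
        WHERE get_json_object(t0.unitparams, '$.store.fruit[0].weight') != 9
    UNION ALL
        -- Level (c0, c1, ALL).
        SELECT host, c0, c1, 'ALL' AS c2 FROM t_log t0
        LATERAL VIEW parse_url_tuple(url, 'HOST') t1 AS host
        WHERE get_json_object(t0.unitparams, '$.store.fruit[0].weight') != 9
    UNION ALL
        -- Level (c0, ALL, ALL); assumed branch.
        SELECT host, c0, 'ALL' AS c1, 'ALL' AS c2 FROM t_log t0
        LATERAL VIEW parse_url_tuple(url, 'HOST') t1 AS host
        WHERE get_json_object(t0.unitparams, '$.store.fruit[0].weight') != 9
    UNION ALL
        -- Level (ALL, ALL, ALL); assumed branch.
        SELECT host, 'ALL' AS c0, 'ALL' AS c1, 'ALL' AS c2 FROM t_log t0
        LATERAL VIEW parse_url_tuple(url, 'HOST') t1 AS host
        WHERE get_json_object(t0.unitparams, '$.store.fruit[0].weight') != 9
) x
GROUP BY c0, c1, c2
-- Sorting is only for easier comparison with the expected output.
ORDER BY c0, c1, c2;
```

Wrapping the union all in a subquery follows the source's own pattern. The trade-off of this design is that each branch scans t_log again, so the cost grows with the number of dimension combinations.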