hive查询实战1

本文介绍了在Hive中进行JSON操作的方法,包括使用get_json_object函数筛选JSON字段,以及处理NULL值的不同方式。同时,还探讨了Hive的不等于操作符(<>)和(<=>)在遇到NULL时的行为。此外,提到了Hive的group by操作和XML处理函数如xpath_int的应用。
摘要由CSDN通过智能技术生成
1)json操作
hive> select get_json_object("{\"a\":1}","$.a") from filterd_url limit 2;               
OK 


Time taken: 364.962 seconds, Fetched: 2 row(s) 
hive> select * from filterd_url where get_json_object("{\"a\":1}","$.a") <=> NULL limit 2; 
OK 
Time taken: 1.06 seconds 
hive> select * from filterd_url where get_json_object("{\"a\":1}","$.b") <=> NULL limit 2; 
OK 
http://my.tv.sohu.com/us/54198372/14304719.shtml 20140505 noblog 
http://www.letv.com/ptv/vplay/2126589.html 20140505 noblog 
Time taken: 0.072 seconds, Fetched: 2 row(s) 
hive> 
hive ;
    > select * from filterd_url where get_json_object("{\"a\":1}","$.a") = '1' limit 2;
OK
http://my.tv.sohu.com/us/54198372/14304719.shtml 20140505 noblog
http://www.letv.com/ptv/vplay/2126589.html 20140505 noblog
Time taken: 0.109 seconds, Fetched: 2 row(s)
hive> select * from filterd_url where get_json_object("{\"a\":1}","$.a") = NULL limit 2;
OK
Time taken: 463.32 seconds
hive> select * from filterd_url where get_json_object("{\"a\":1}","$.b") = NULL limit 2;
OK
Time taken: 151.736 seconds
hive> select * from filterd_url where !(get_json_object("{\"a\":1}","$.a") <=> NULL) limit 2;
OK
http://my.tv.sohu.com/us/54198372/14304719.shtml 20140505 noblog
http://www.letv.com/ptv/vplay/2126589.html 20140505 noblog
Time taken: 1.092 seconds, Fetched: 2 row(s)
hive> 
hive> select * from filterd_url where get_json_object("{\"a\":1}","$.a") is not NULL limit 2;
OK
http://my.tv.sohu.com/us/54198372/14304719.shtml 20140505 noblog
http://www.letv.com/ptv/vplay/2126589.html 20140505 noblog
Time taken: 1.07 seconds, Fetched: 2 row(s)
所以,有效过滤出json字符串中不存在的信息的方法是:!(get_json_object("{\"a\":1}","$.a") <=> NULL)
或者使用:get_json_object("{\"a\":1}","$.a") is not NULL

2)hive的等于和不等于操作时遇见NULL的情况
    不等于操作符:<>,如果操作数中只要有一个为NULL,则返回NULL;否则,执行比较操作
    不等于操作符:<=>,如果操作数中一个为NULL,则返回False;如果两个操作数都为NULL,则返回True;否则,执行比较操作

3)hive group by 操作
select uid, concat_ws(" ", collect_set(cluster_ids)) from site_user_cluster_info group by uid;

4)hive xml 操作
  > SELECT xpath_int ('<a>1</a>', 'a') FROM wxj_test_url;
  1
 
  > SELECT xpath_int ('<a><b>1</b><c>2</c></a>', 'a') FROM wxj_test_url;
  12
 
  > SELECT xpath_int ('<a><b>1</b><c>2</c></a>', 'a/*') FROM wxj_test_url;
  1
 
  > SELECT xpath_int ('<a><b>1</b><c>2</c></a>', 'sum(a/*)') FROM wxj_test_url;
  3
 
  > SELECT xpath_int ('<a><b>5</b><c>2</c></a>', 'a/b * a/c') FROM wxj_test_url;
  10
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值