1、显示当前可用函数
>SHOW FUNCTIONS;
2、显示函数的具体描述信息
>DESC FUNCTION EXTENDED concat;
3、聚合函数
函数处理的数据粒度为多条记录。
sum()—求和
count()—求数据量
avg()—求平均直
distinct—求不同值数(去重)
min—求最小值
max—求最人值
4、字符串函数
(1)字符串长度函数:length
select length('acbfa');
结果=5
(2)字符串反转函数:reverse
reverse('abcde')=edcba
(3)字符串链接函数:concat
concat('aa','bb','cc')=aabbcc
(4)带分隔符字符串连接函数:concat_ws
concat_ws(',','aaa','bbb','ccc')=aaa,bbb,ccc
(5)字符串截取函数:substr,substring
substr('abcde',3)=cde #从第三个字符开始截取
substr('abcde',-1)=e
substr('abcde',-2,2)=de
substr('abcde',3,2)=cd
(6)字符串转大写函数:upper,ucase
upper('Hive')=HIVE
(7).字符串转小写函数:lower,lcase
lower('HIVE')=hive
(8).去空格函数:trim()、ltrim()、rtrim()
trim(' abc ')=abc
(9).正則表达式替换函数:regexp_replace
regexp_replace('foobar','oo|ar','')=fb #替换oo|ar为空字符
(10).正則表达式解析函数:regexp_extract
语法: regexp_extract(string subject, string pattern, int index)
返回值: string
说明:将字符串subject依照pattern正則表達式的规则拆分,返回index指定的字符。
0表示把整个正则表达式对应的结果全部返回
1表示返回正则表达式中第一个() 对应的结果 以此类推
注意点:
要注意的是index的值不能大于表达式中()的个数。
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from lxw_dual;
the
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) from lxw_dual;
bar
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) from lxw_dual;
foothebar
5、json解析函数
+----+
json=
+----+
{"store":
{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
"bicycle":{"price":19.95,"color":"red"}
},
"email":"amy@only_for_json_udf_test.net",
"owner":"amy"
}
+----+
(1)get_json_objec函数
hive>SELECT get_json_object(src_json.json, '$.owner') FROM src_json;
amy
hive>SELECT get_json_object(src_json.json, '$.store.fruit\[0]') FROM src_json;
{"weight":8,"type":"apple"}
hive>SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
NULL
(2)json_tuple函数处理多个key(lateral view关键字)
select src.timestamp, b.* from src_json src lateral view json_tuple(src.json, 'email', 'owner') b as f1, f2;
6、URL解析函数:parse_url
语法: parse_url(string urlString, string partToExtract [, stringkeyToExtract])
返回值: string
说明:返回URL中指定的部分。
partToExtract的有效值为:HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
举例:
hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from lxw_dual;
facebook.com
hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY','k1') from lxw_dual;
v1
7、parse_url_tuple
测试数据:
url1 http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1
url2 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
url3 https://www.google.com.hk/#hl=zh-CN&newwindow=1&safe=strict&q=hive+translate+example&oq=hive+translate+example&gs_l=serp.3...10174.11861.6.12051.8.8.0.0.0.0.132.883.0j7.7.0...0.0...1c.1j4.8.serp.0B9C1T_n0Hs&bav=on.2,or.&bvm=bv.44770516,d.aGc&fp=e13e41a6b9dab3f6&biw=1241&bih=589
create external table if not exists t_url(f1 string, f2 string)
row format delimited fields TERMINATED BY ' '
location '/test/url';
SELECT f1, b.* FROM t_url
LATERAL VIEW parse_url_tuple(f2, 'HOST', 'PATH', 'QUERY', 'QUERY:k1') b as host, path, query, query_id;
(LATERAL VIEW关键字,可理解为将URL字符串分解为各字段host、path等)
结果:
url1 facebook.com /path1/p.php k1=v1&k2=v2 v1
url2 cwiki.apache.org /confluence/display/Hive/LanguageManual+UDF NULL NULL
url3 www.google.com.hk / NULL NULL
8、row_number()分组求topN
例如:
用户信息表user
1,woman,18,吃棒棒糖
2,man,18,养金鱼
3,woman,20,洗头发
4,man,18,养乌龟
5,man,19,养鸭
6,man,38,养鸡
7,woman,22,做头发
8,woman,23,买衣服
需求:
每种性别人群中,年龄最大的两个人
od
8,woman,23,买衣服 1
7,woman,22,做头发 2
3,woman,20,洗头发 3
1,woman,18,吃棒棒糖 4
6,man,38,养鸡 1
5,man,19,养鸭 2
4,man,18,养乌龟 3
2,man,18,养金鱼 4
sql语句:
select * from
(select id,sex,age,hobby,
row_number() over (partition by sex order by age desc) as new_user
from user) c
where c.new_user <= 2;
9、窗口函数
https://blog.csdn.net/qq_26937525/article/details/54925827
https://www.cnblogs.com/wujin/p/6051768.html