# Hive Functions Summary

List the built-in functions:

```
hive> show functions;
OK
!
!=
%
```

Show the usage of a function:

```
hive> DESC FUNCTION concat;
OK
concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data  bin1, bin2, ... binN
Time taken: 0.005 seconds, Fetched: 1 row(s)
```

Show the detailed usage of a function, including examples:

```
DESC FUNCTION EXTENDED concat;
```

### 1. Mathematical Functions

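`round(DOUBLE a)` (returns DOUBLE): the rounded BIGINT value of a, i.e. a rounded to the nearest whole number.

```
hive> select round(2.5);
OK
3.0
Time taken: 0.093 seconds, Fetched: 1 row(s)
```

`round(DOUBLE a, INT d)` (returns DOUBLE): a rounded to d decimal places.

```
hive> select round(0.5002,2);
OK
0.5
Time taken: 0.074 seconds, Fetched: 1 row(s)
```

`bround(DOUBLE a)` (returns DOUBLE): a rounded with HALF_EVEN (banker's) rounding, so a tie goes to the even neighbor.

```
bround(2.5) = 2, bround(3.5) = 4
```

`bround(DOUBLE a, INT d)` (returns DOUBLE): a rounded to d decimal places with HALF_EVEN rounding.

```
bround(8.25, 1) = 8.2, bround(8.35, 1) = 8.4
```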
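To see why `round` and `bround` differ only on ties, here is a minimal Python sketch built on the standard `decimal` module. It assumes `round` uses HALF_UP and `bround` uses HALF_EVEN, and that the double is rounded from its shortest decimal form (so 8.25 is treated as exactly 8.25); `hive_round` and `hive_bround` are hypothetical names, not Hive APIs.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def _round(x: float, d: int, mode: str) -> float:
    # Work on the shortest decimal string of x (repr) rather than its exact
    # binary value, so that 8.25 is rounded as the decimal 8.25.
    quantum = Decimal(1).scaleb(-d)  # 10**-d, e.g. Decimal("0.1") for d=1
    return float(Decimal(repr(x)).quantize(quantum, rounding=mode))


def hive_round(x: float, d: int = 0) -> float:
    """Model of round(): ties are rounded away from zero (HALF_UP)."""
    return _round(x, d, ROUND_HALF_UP)


def hive_bround(x: float, d: int = 0) -> float:
    """Model of bround(): ties are rounded to the even neighbour (HALF_EVEN)."""
    return _round(x, d, ROUND_HALF_EVEN)


print(hive_round(2.5), hive_bround(2.5), hive_bround(3.5))  # 3.0 2.0 4.0
print(hive_bround(8.25, 1), hive_bround(8.35, 1))           # 8.2 8.4
```

Rounding the decimal string instead of the raw binary double is the design choice that makes 8.25 a true tie; with the binary value, ties like this would not exist.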

`floor(DOUBLE a)` (returns BIGINT): the largest integer less than or equal to a.

```
hive> select floor(6.10);
OK
6
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive> select floor(-3.4);
OK
-4
Time taken: 0.104 seconds, Fetched: 1 row(s)
```

`ceil(DOUBLE a)` (returns BIGINT): the smallest integer greater than or equal to a.

```
hive> select ceil(6);
OK
6
Time taken: 0.2 seconds, Fetched: 1 row(s)
hive> select ceil(6.1);
OK
7
Time taken: 0.061 seconds, Fetched: 1 row(s)
hive> select ceil(6.9);
OK
7
Time taken: 0.153 seconds, Fetched: 1 row(s)
```

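`rand()`, `rand(INT seed)` (returns DOUBLE): a random number uniformly distributed between 0 and 1; passing a seed makes the generated sequence deterministic.

```
hive> select rand(2);
OK
0.7311469360199058
Time taken: 0.068 seconds, Fetched: 1 row(s)
hive> select rand();
OK
0.7859071491095923
Time taken: 0.064 seconds, Fetched: 1 row(s)
```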

`exp(DOUBLE a)` (returns DOUBLE): e raised to the power a.

```
hive> select exp(2);
OK
7.38905609893065
Time taken: 0.1 seconds, Fetched: 1 row(s)
```

`ln(DOUBLE a)`, `ln(DECIMAL a)` (returns DOUBLE): the natural (base-e) logarithm of a; a may be fractional.

```
hive> select ln(3);
OK
1.0986122886681098
Time taken: 0.081 seconds, Fetched: 1 row(s)
hive> select ln(3.2);
OK
1.1631508098056809
Time taken: 0.067 seconds, Fetched: 1 row(s)
```

`log10(DOUBLE a)` (returns DOUBLE): the base-10 logarithm of a.

```
hive> select log10(3.2);
OK
0.505149978319906
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive> select log10(3);
OK
0.47712125471966244
Time taken: 0.075 seconds, Fetched: 1 row(s)
```

`log2(DOUBLE a)`, `log2(DECIMAL a)` (returns DOUBLE): the base-2 logarithm of a; a may be fractional.

```
hive> select log2(3);
OK
1.5849625007211563
Time taken: 0.083 seconds, Fetched: 1 row(s)
hive> select log2(3.2);
OK
1.6780719051126378
Time taken: 0.07 seconds, Fetched: 1 row(s)
```

`log(DOUBLE base, DOUBLE a)`, `log(DECIMAL base, DECIMAL a)` (returns DOUBLE): the logarithm of a to the given base.

```
hive> select log(2,3.2);
OK
1.6780719051126378
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive> select log(2,3);
OK
1.5849625007211563
Time taken: 0.066 seconds, Fetched: 1 row(s)
```
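The outputs of `log`, `log2` and `log10` are all instances of the change-of-base identity log_base(a) = ln(a) / ln(base). A small Python check against the values printed above, using only the standard `math` module (`log_base` is a hypothetical helper name); the comparisons use a tolerance because the last floating-point digit may differ between implementations.

```python
import math


def log_base(base: float, a: float) -> float:
    # Change of base: log_base(a) = ln(a) / ln(base)
    return math.log(a) / math.log(base)


# Values printed by the Hive sessions above
assert math.isclose(log_base(2, 3.2), 1.6780719051126378, rel_tol=1e-12)
assert math.isclose(log_base(2, 3), 1.5849625007211563, rel_tol=1e-12)
# log10 is the same identity with base 10
assert math.isclose(log_base(10, 3), 0.47712125471966244, rel_tol=1e-12)
print("change of base matches the Hive output")
```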

`pow(DOUBLE a, DOUBLE p)` (returns DOUBLE): a raised to the power p.

```
hive> select pow(2,4);
OK
16.0
Time taken: 0.065 seconds, Fetched: 1 row(s)
```

`sqrt(DOUBLE a)` (returns DOUBLE): the square root of a.

```
hive> select sqrt(2);
```

`bin(BIGINT a)` (returns STRING): the binary representation of a.

```
hive> select bin(2);
OK
10
Time taken: 0.194 seconds, Fetched: 1 row(s)
```

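`hex(BIGINT a)`, `hex(STRING a)` (returns STRING): for a number, its hexadecimal representation; for a string, each character converted to its hexadecimal representation.

```
hive> select hex(2);
OK
2
Time taken: 0.097 seconds, Fetched: 1 row(s)
```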

`unhex(STRING a)` (returns BINARY): the inverse of hex; each pair of hexadecimal characters is converted to the byte it represents.

```
hive> select unhex(2);
OK

Time taken: 0.077 seconds, Fetched: 1 row(s)
```

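`conv(BIGINT num, INT from_base, INT to_base)`, `conv(STRING num, INT from_base, INT to_base)` (returns STRING): converts num from base from_base to base to_base.

```
hive> select conv(2,10,2);
OK
10
Time taken: 0.075 seconds, Fetched: 1 row(s)
```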
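For non-negative numbers, `conv`, `bin` and `hex` come down to repeated division by the target base. A minimal Python sketch under that restriction: Hive also accepts negative numbers and negative bases, which this sketch deliberately rejects, and the uppercase digit set is an assumption. `conv`, `hive_bin` and `hive_hex` here are hypothetical Python names, not Hive code.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def conv(num: str, from_base: int, to_base: int) -> str:
    """Re-express a non-negative integer string in another base (2..36)."""
    n = int(num, from_base)  # parse the digits in the source base
    if n < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, to_base)  # peel off the least significant digit
        digits.append(DIGITS[r])
    return "".join(reversed(digits))


def hive_bin(n: int) -> str:  # equivalent to conv(n, 10, 2) for n >= 0
    return conv(str(n), 10, 2)


def hive_hex(n: int) -> str:  # equivalent to conv(n, 10, 16) for n >= 0
    return conv(str(n), 10, 16)


print(conv("2", 10, 2), hive_bin(2), hive_hex(2), hive_hex(255))  # 10 10 2 FF
```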

`abs(DOUBLE a)` (returns DOUBLE): the absolute value of a.

```
hive> select abs(-2);
OK
2
Time taken: 0.077 seconds, Fetched: 1 row(s)
```

`pmod(INT a, INT b)`, `pmod(DOUBLE a, DOUBLE b)` (returns INT or DOUBLE): the positive value of a mod b.

```
hive> select pmod(4,2);
OK
0
Time taken: 0.077 seconds, Fetched: 1 row(s)
```
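`pmod` differs from the `%` operator only when an operand is negative. Assuming Hive's `%` follows Java's truncating remainder (the result takes the sign of the dividend) and that `pmod` computes `((a % b) + b) % b`, the following Python sketch reproduces both for integers; `java_rem` and `pmod` are hypothetical helper names.

```python
def java_rem(a: int, b: int) -> int:
    # Java-style remainder: truncates toward zero, so the sign follows a.
    # (Python's own % floors instead, so it is only applied to magnitudes here.)
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def pmod(a: int, b: int) -> int:
    # Shift a negative remainder back into [0, b) for positive b.
    return java_rem(java_rem(a, b) + b, b)


print(pmod(4, 2))                    # 0, as in the Hive example above
print(java_rem(-7, 3), pmod(-7, 3))  # -1 2
```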

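`sin(DOUBLE a)` (returns DOUBLE): the sine of a, with a in radians.

```
hive> select sin(2.5);
OK
0.5984721441039564
Time taken: 0.092 seconds, Fetched: 1 row(s)
```

`asin(DOUBLE a)` (returns DOUBLE): the arc sine of a; a must lie in [-1, 1], so 2.5 gives NaN.

```
hive> select asin(2.5);
OK
NaN
Time taken: 0.097 seconds, Fetched: 1 row(s)
```

`cos(DOUBLE a)` (returns DOUBLE): the cosine of a, with a in radians.

```
hive> select cos(2.5);
OK
-0.8011436155469337
Time taken: 0.087 seconds, Fetched: 1 row(s)
```

`acos(DOUBLE a)` (returns DOUBLE): the arc cosine of a; a must lie in [-1, 1], so 2.5 gives NaN.

```
hive> select acos(2.5);
OK
NaN
Time taken: 0.091 seconds, Fetched: 1 row(s)
```

`tan(DOUBLE a)` (returns DOUBLE): the tangent of a, with a in radians.

```
hive> select tan(2.5);
OK
-0.7470222972386603
Time taken: 0.076 seconds, Fetched: 1 row(s)
```

`atan(DOUBLE a)` (returns DOUBLE): the arc tangent of a.

```
hive> select atan(2.5);
OK
1.1902899496825317
Time taken: 0.074 seconds, Fetched: 1 row(s)
```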

`degrees(DOUBLE a)` (returns DOUBLE): converts a from radians to degrees.

```
hive> select degrees(30);
OK
1718.8733853924698
Time taken: 0.114 seconds, Fetched: 1 row(s)
```

`radians(DOUBLE a)` (returns DOUBLE): converts a from degrees to radians.

```
hive> select radians(30);
OK
0.5235987755982988
Time taken: 0.093 seconds, Fetched: 1 row(s)
```

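`positive(INT a)`, `positive(DOUBLE a)` (returns INT or DOUBLE): returns a.

```
hive> select positive(2);
OK
2
Time taken: 0.124 seconds, Fetched: 1 row(s)
```

`negative(INT a)`, `negative(DOUBLE a)` (returns INT or DOUBLE): returns -a.

```
hive> select negative(2);
OK
-2
Time taken: 0.066 seconds, Fetched: 1 row(s)
```

`sign(DOUBLE a)` (returns DOUBLE or INT): the sign of a: 1.0 if a is positive, -1.0 if it is negative, 0.0 otherwise.

```
hive> select sign(2);
OK
1.0
Time taken: 0.091 seconds, Fetched: 1 row(s)
```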

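`e()` (returns DOUBLE): the value of e.

```
hive> select e();
OK
2.718281828459045
Time taken: 0.07 seconds, Fetched: 1 row(s)
```

`pi()` (returns DOUBLE): the value of pi.

```
hive> select pi();
OK
3.141592653589793
Time taken: 0.082 seconds, Fetched: 1 row(s)
```

`factorial(INT a)` (returns BIGINT): the factorial of a, valid for a between 0 and 20.

```
hive> select factorial(2);
```

`cbrt(DOUBLE a)` (returns DOUBLE): the cube root of a.

```
hive> select cbrt(2);
```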

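`shiftleft(TINYINT|SMALLINT|INT a, INT b)`, `shiftleft(BIGINT a, INT b)` (returns INT or BIGINT): bitwise left shift of a by b positions; the result is INT for TINYINT, SMALLINT and INT arguments and BIGINT for BIGINT.

```
hive> select shiftleft(2,3);
```

`shiftright(TINYINT|SMALLINT|INT a, INT b)`, `shiftright(BIGINT a, INT b)` (returns INT or BIGINT): bitwise signed right shift of a by b positions.

```
hive> select shiftright(2,3);
```

`shiftrightunsigned(TINYINT|SMALLINT|INT a, INT b)`, `shiftrightunsigned(BIGINT a, INT b)` (returns INT or BIGINT): bitwise unsigned right shift of a by b positions.

```
hive> select shiftrightunsigned(2,3);
```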
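The three shifts only differ for negative numbers. A Python sketch for INT arguments, assuming Hive applies Java's `int` operators `<<`, `>>` and `>>>`, including Java's rule that only the low 5 bits of the shift distance are used (BIGINT arguments would use 64 bits instead). Python integers are unbounded, so the hypothetical helper `to_int32` wraps results back into the signed 32-bit range.

```python
MASK32 = 0xFFFFFFFF


def to_int32(x: int) -> int:
    # Wrap an arbitrary Python int into the signed 32-bit range, like Java int.
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def shiftleft(a: int, b: int) -> int:
    return to_int32(a << (b & 31))  # only the low 5 bits of b are used


def shiftright(a: int, b: int) -> int:
    return to_int32(a) >> (b & 31)  # arithmetic shift: the sign bit is copied


def shiftrightunsigned(a: int, b: int) -> int:
    return to_int32((a & MASK32) >> (b & 31))  # zero-fill from the left


print(shiftleft(2, 3), shiftright(2, 3), shiftrightunsigned(2, 3))  # 16 0 0
print(shiftright(-8, 1), shiftrightunsigned(-8, 1))  # -4 2147483644
```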

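`greatest(T v1, T v2, ...)` (returns T): the greatest value in the list.

```
hive> select greatest(2,3,6,7);
OK
7
Time taken: 0.072 seconds, Fetched: 1 row(s)
```

`least(T v1, T v2, ...)` (returns T): the least value in the list.

```
hive> select least(2,3,6,7);
OK
2
Time taken: 0.079 seconds, Fetched: 1 row(s)
```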

### 2. Type Conversion Functions

`binary(string|binary)` (returns BINARY): converts the input value to binary.

```
hive> select binary('4');
OK
4
Time taken: 0.08 seconds, Fetched: 1 row(s)
```

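`cast(expr as <type>)` (returns the target `<type>`): converts the result of expr to the given type; if the conversion does not succeed, NULL is returned.

```
hive> select cast("1" as BIGINT);
OK
1
Time taken: 0.266 seconds, Fetched: 1 row(s)
```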
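According to the Hive documentation, `cast` returns NULL when the conversion does not succeed. A minimal Python model of `cast(x as BIGINT)` for strings, with `None` standing in for NULL. It accepts only optionally signed decimal digits, which is a simplifying assumption (Hive's own parser may accept other forms); `cast_to_bigint` is a hypothetical name.

```python
import re
from typing import Optional

BIGINT_MIN, BIGINT_MAX = -(2 ** 63), 2 ** 63 - 1


def cast_to_bigint(s: Optional[str]) -> Optional[int]:
    if s is None:
        return None  # NULL in, NULL out
    if not re.fullmatch(r"[+-]?[0-9]+", s):
        return None  # not an integer literal: the cast fails and yields NULL
    n = int(s)
    # Values outside the BIGINT range cannot be represented either
    return n if BIGINT_MIN <= n <= BIGINT_MAX else None


print(cast_to_bigint("1"), cast_to_bigint("abc"))  # 1 None
```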

### 3. Date Functions

UNIX timestamp to date: `from_unixtime(bigint unixtime[, string format])` (returns STRING): converts a UNIX timestamp (the number of seconds from 1970-01-01 00:00:00 UTC to the given moment) into a time string in the current time zone, using the given format.

```
hive> select from_unixtime(1323308943,'yyyyMMdd');
OK
20111208
Time taken: 0.152 seconds, Fetched: 1 row(s)
```

`unix_timestamp()` (returns BIGINT): the current UNIX timestamp in seconds.

```
hive> select unix_timestamp();
OK
1487931871
Time taken: 0.106 seconds, Fetched: 1 row(s)
```

`unix_timestamp(string date)` (returns BIGINT): converts a date string in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp; returns 0 if the conversion fails.

```
hive> select unix_timestamp('2011-12-07 13:01:03');
OK
1323234063
Time taken: 0.083 seconds, Fetched: 1 row(s)
```

`unix_timestamp(string date, string pattern)` (returns BIGINT): converts a date string in the given pattern to a UNIX timestamp; returns 0 if the conversion fails.

```
hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');
OK
1323234063
Time taken: 0.079 seconds, Fetched: 1 row(s)
```
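The examples above are consistent with a session time zone of UTC+8 (1323234063 is 2011-12-07 05:01:03 UTC, i.e. 13:01:03 at UTC+8). A Python sketch of both conversions under that assumption; Python's `strftime` codes (`%Y%m%d`) stand in for Hive's Java-style patterns (`yyyyMMdd`), and the fixed-offset zone `TZ` is an assumption, not something Hive exposes.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # assumed session time zone (UTC+8)
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"  # Python spelling of yyyy-MM-dd HH:mm:ss


def from_unixtime(ts: int, fmt: str = DEFAULT_FMT) -> str:
    # Seconds since 1970-01-01 00:00:00 UTC -> local time string
    return datetime.fromtimestamp(ts, tz=TZ).strftime(fmt)


def unix_timestamp(s: str, fmt: str = DEFAULT_FMT) -> int:
    try:
        local = datetime.strptime(s, fmt).replace(tzinfo=TZ)
    except ValueError:
        return 0  # follows the "returns 0 if the conversion fails" description
    return int(local.timestamp())


print(from_unixtime(1323308943, "%Y%m%d"))                     # 20111208
print(unix_timestamp("2011-12-07 13:01:03"))                   # 1323234063
print(unix_timestamp("20111207 13:01:03", "%Y%m%d %H:%M:%S"))  # 1323234063
```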

`to_date(string timestamp)` (returns STRING): the date part of a date-time string.

```
hive> select to_date('2011-12-08 10:03:01');
OK
2011-12-08
Time taken: 0.194 seconds, Fetched: 1 row(s)
```

`year(string date)` (returns INT): the year of a date.

```
hive> select year('2011-12-08 10:03:01');
OK
2011
Time taken: 0.168 seconds, Fetched: 1 row(s)
```

`month(string date)` (returns INT): the month of a date.

```
hive> select month('2011-12-08 10:03:01');
OK
12
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive> select month('2011-08-08');
OK
8
Time taken: 0.095 seconds, Fetched: 1 row(s)
```

`day(string date)` (returns INT): the day of a date.

```
hive> select day('2011-12-08 10:03:01');
OK
8
Time taken: 0.115 seconds, Fetched: 1 row(s)
hive> select day('2011-12-24');
OK
24
Time taken: 0.294 seconds, Fetched: 1 row(s)
```

`hour(string date)` (returns INT): the hour of a date-time.

```
hive> select hour('2011-12-08 10:03:01');
OK
10
Time taken: 0.082 seconds, Fetched: 1 row(s)
```

`minute(string date)` (returns INT): the minute of a date-time.

```
hive> select minute('2011-12-08 10:03:01');
OK
3
Time taken: 0.181 seconds, Fetched: 1 row(s)
```

`second(string date)` (returns INT): the second of a date-time.

```
hive> select second('2011-12-08 10:03:01');
OK
1
Time taken: 0.693 seconds, Fetched: 1 row(s)
```

`weekofyear(string date)` (returns INT): the week number of the year that the date falls in.

```
hive> select weekofyear('2011-12-08 10:03:01');
OK
49
Time taken: 0.119 seconds, Fetched: 1 row(s)
```

`datediff(string enddate, string startdate)` (returns INT): the number of days from startdate to enddate (enddate minus startdate).

```
hive> select datediff('2012-12-08','2012-05-09');
OK
213
Time taken: 0.082 seconds, Fetched: 1 row(s)
```

`date_add(string startdate, int days)` (returns STRING): the date that is days days after startdate.

```
hive> select date_add('2012-12-08',10);
OK
2012-12-18
Time taken: 0.201 seconds, Fetched: 1 row(s)
```

`date_sub(string startdate, int days)` (returns STRING): the date that is days days before startdate.

```
hive> select date_sub('2012-12-08',10);
OK
2012-11-28
Time taken: 0.125 seconds, Fetched: 1 row(s)
```
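Day arithmetic on `yyyy-MM-dd` strings can be modelled with Python's `datetime.date`. A sketch that reproduces the three examples above, assuming (as the Hive examples suggest) that only the date part of the input counts; the helper `_to_date` is a hypothetical name.

```python
from datetime import date, timedelta


def _to_date(s: str) -> date:
    # Only the leading yyyy-MM-dd part matters; any time of day is ignored.
    return date.fromisoformat(s[:10])


def datediff(enddate: str, startdate: str) -> int:
    return (_to_date(enddate) - _to_date(startdate)).days


def date_add(startdate: str, days: int) -> str:
    return (_to_date(startdate) + timedelta(days=days)).isoformat()


def date_sub(startdate: str, days: int) -> str:
    return (_to_date(startdate) - timedelta(days=days)).isoformat()


print(datediff("2012-12-08", "2012-05-09"))  # 213
print(date_add("2012-12-08", 10))            # 2012-12-18
print(date_sub("2012-12-08", 10))            # 2012-11-28
```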

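`last_day(string date)` (returns STRING): the last day of the month that the date belongs to.

```
hive> select last_day('2017-02-17 08:34:23');
OK
2017-02-28
Time taken: 0.082 seconds, Fetched: 1 row(s)
```

`next_day(string start_date, string day_of_week)` (returns STRING): the first date later than start_date that falls on day_of_week, which may be given as a two-letter, three-letter or full day name (for example TU, TUE or TUESDAY).

```
hive> select next_day('2015-01-14', 'TU');
OK
2015-01-20
Time taken: 0.319 seconds, Fetched: 1 row(s)
```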
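Both functions are simple calendar computations. A Python sketch that reproduces the two examples above using only the standard library; matching the day name by its first two letters is a simplification of this sketch (it would also accept invalid names such as "TUX"), and the function names only mirror Hive's.

```python
import calendar
from datetime import date, timedelta

# Two-letter prefixes in Monday-first order, matching date.weekday()
DAY_PREFIXES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def last_day(s: str) -> str:
    d = date.fromisoformat(s[:10])
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month).isoformat()


def next_day(s: str, day_of_week: str) -> str:
    d = date.fromisoformat(s[:10])
    target = DAY_PREFIXES.index(day_of_week[:2].upper())  # 'TU', 'Tue', 'TUESDAY'
    # Days until the target weekday, in 1..7 so the result is strictly later
    delta = (target - d.weekday() - 1) % 7 + 1
    return (d + timedelta(days=delta)).isoformat()


print(last_day("2017-02-17 08:34:23"))  # 2017-02-28
print(next_day("2015-01-14", "TU"))     # 2015-01-20
```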

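`quarter(date/timestamp/string)` (returns INT): the quarter of the year (1 to 4) that the date falls in.

`current_date` (returns DATE): the current date.

```
hive> select current_date;
OK
2017-02-25
Time taken: 0.087 seconds, Fetched: 1 row(s)
```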

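`from_utc_timestamp(timestamp, string timezone)` (returns TIMESTAMP): treats the timestamp as UTC and converts it to the given time zone.

```
hive> select from_utc_timestamp('1970-01-01 08:00:00','PST');
OK
1970-01-01 00:00:00
Time taken: 0.122 seconds, Fetched: 1 row(s)
```

`to_utc_timestamp(timestamp, string timezone)` (returns TIMESTAMP): treats the timestamp as being in the given time zone and converts it to UTC.

```
hive> select to_utc_timestamp('1970-01-01 00:00:00','PST');
OK
1970-01-01 08:00:00
Time taken: 0.099 seconds, Fetched: 1 row(s)
```

`current_timestamp` (returns TIMESTAMP): the current timestamp.

```
hive> select current_timestamp;
OK
2017-02-25 00:28:46.724
Time taken: 0.069 seconds, Fetched: 1 row(s)
```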

Returns: STRING

```
OK
2017-04-10
Time taken: 0.061 seconds, Fetched: 1 row(s)
```

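### 5. Conditional Functions

`if(boolean testCondition, T valueTrue, T valueFalseOrNull)` (returns T): returns valueTrue if testCondition is true, otherwise valueFalseOrNull (valueTrue and valueFalseOrNull share the generic type T).

`nvl(T value, T default_value)` (returns T): returns default_value if value is NULL, otherwise value.

`COALESCE(T v1, T v2, ...)` (returns T): returns the first non-NULL argument, or NULL if all arguments are NULL.

`CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END` (returns T): when a = b returns c, when a = d returns e, otherwise returns f.

`CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END` (returns T): when a is true returns b, when c is true returns d, otherwise returns e.

`isnull(a)` (returns BOOLEAN): true if a is NULL, false otherwise.

`isnotnull(a)` (returns BOOLEAN): true if a is not NULL, false otherwise.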
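With Python's `None` playing the role of NULL, the NULL-handling rules of these functions fit in a few lines. This is a hypothetical sketch, not Hive code; the detail worth noticing is that a NULL condition in `if` takes the second branch, because only a true condition selects valueTrue.

```python
from typing import Any, Optional


def hive_if(test: Optional[bool], value_true: Any, value_false_or_null: Any) -> Any:
    # Only a true condition picks the first branch; false and NULL both fall through
    return value_true if test is True else value_false_or_null


def nvl(value: Any, default_value: Any) -> Any:
    return default_value if value is None else value


def coalesce(*values: Any) -> Any:
    # First non-NULL argument, or NULL when every argument is NULL
    return next((v for v in values if v is not None), None)


def isnull(a: Any) -> bool:
    return a is None


print(hive_if(None, "yes", "no"), nvl(None, 0), coalesce(None, None, 3))  # no 0 3
```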

### 6. Character Functions

`ascii(string str)` (returns INT): the numeric value of the first character of str.

`base64(binary bin)` (returns STRING)

`concat(string|binary A, string|binary B...)` (returns STRING)

`context_ngrams(array<array<string>>, array<string>, int K, int pf)` (returns `array<struct<string,double>>`)

`concat_ws(string SEP, string A, string B...)` (returns STRING)

`concat_ws(string SEP, array<string>)` (returns STRING)

`encode(string src, string charset)` (returns BINARY)

`find_in_set(string str, string strList)` (returns INT)

`format_number(number x, int d)` (returns STRING)

`get_json_object(string json_string, string path)` (returns STRING)

`in_file(string str, string filename)` (returns BOOLEAN)

`instr(string str, string substr)` (returns INT)

`length(string A)` (returns INT)

`locate(string substr, string str[, int pos])` (returns INT)

`lower(string A)`, `lcase(string A)` (returns STRING)

`lpad(string str, int len, string pad)` (returns STRING)

`ltrim(string A)` (returns STRING)

`ngrams(array<array<string>>, int N, int K, int pf)` (returns `array<struct<string,double>>`)

`parse_url(string urlString, string partToExtract [, string keyToExtract])` (returns STRING)

`printf(String format, Obj... args)` (returns STRING)

`regexp_extract(string subject, string pattern, int index)` (returns STRING)

`regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)` (returns STRING)

`repeat(string str, int n)` (returns STRING)

`reverse(string A)` (returns STRING)

`rpad(string str, int len, string pad)` (returns STRING)

`rtrim(string A)` (returns STRING)

`sentences(string str, string lang, string locale)` (returns `array<array<string>>`)

`space(int n)` (returns STRING)

`split(string str, string pat)` (returns `array<string>`)

`str_to_map(text[, delimiter1, delimiter2])` (returns `map<string,string>`)

`substr(string|binary A, int start)`, `substring(string|binary A, int start)` (returns STRING)

`substr(string|binary A, int start, int len)`, `substring(string|binary A, int start, int len)` (returns STRING)

`substring_index(string A, string delim, int count)` (returns STRING)

`translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)` (returns STRING)

`unbase64(string str)` (returns BINARY)

`upper(string A)`, `ucase(string A)` (returns STRING)

`initcap(string A)` (returns STRING)

`levenshtein(string A, string B)` (returns INT)

`soundex(string A)` (returns STRING)

### 7. String Functions

`split(string str, string pat)` (returns ARRAY): splits str around matches of pat (a regular expression) and returns the resulting array of strings.

```
hive> select split('abtcdtef','t');
OK
["ab","cd","ef"]
Time taken: 0.118 seconds, Fetched: 1 row(s)
```

Returns: STRING

```
hive>
OK
abctdtdtdt
Time taken: 0.077 seconds, Fetched: 1 row(s)
```


`ascii(string str)` (returns INT): the ASCII code of the first character of str.

```
hive> select ascii('abcde');
OK
97
Time taken: 0.066 seconds, Fetched: 1 row(s)
```

`repeat(string str, int n)` (returns STRING): str repeated n times.

```
hive> select repeat('abc',5);
OK
abcabcabcabcabc
Time taken: 0.064 seconds, Fetched: 1 row(s)
```

`space(int n)` (returns STRING): a string of n spaces.

```
hive> select space(10);
OK

Time taken: 0.101 seconds, Fetched: 1 row(s)
hive> select length(space(10));
OK
10
Time taken: 1.905 seconds, Fetched: 1 row(s)
```

`length(string A)` (returns INT): the length of string A.

```
hive> select length('abcedfg');
OK
7
Time taken: 0.065 seconds, Fetched: 1 row(s)
```

`reverse(string A)` (returns STRING): string A reversed.

```
hive> select reverse('abcedfg');
OK
gfdecba
Time taken: 0.077 seconds, Fetched: 1 row(s)
```

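`concat(string A, string B...)` (returns STRING): the concatenation of the input strings; any number of inputs is accepted.

```
hive> select concat('abc','def','gh');
OK
abcdefgh
Time taken: 0.063 seconds, Fetched: 1 row(s)
```

`concat_ws(string SEP, string A, string B...)` (returns STRING): like concat, but with the separator SEP placed between the inputs.

```
hive> select concat_ws('-','abc','def','gh');
OK
abc-def-gh
Time taken: 0.06 seconds, Fetched: 1 row(s)
```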

`substr(string A, int start)`, `substring(string A, int start)` (returns STRING): the substring of A from position start to the end. Positions start at 1; a negative start counts from the end of the string.

```
hive> select substr('abcde',3);
OK
cde
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select substring('abcde',3);
OK
cde
Time taken: 0.05 seconds, Fetched: 1 row(s)
hive> select substr('abcde',-1);
OK
e
Time taken: 0.061 seconds, Fetched: 1 row(s)
```

`substr(string A, int start, int len)`, `substring(string A, int start, int len)` (returns STRING): the substring of A that starts at position start and has length len.

```
hive> select substr('abcde',3,2);
OK
cd
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive> select substring('abcde',3,2);
OK
cd
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select substring('abcde',-2,2);
OK
de
Time taken: 0.113 seconds, Fetched: 1 row(s)
```
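The indexing rules shown above (1-based positions, negative positions counted from the end) are easy to get wrong when coming from 0-based languages. A Python model that reproduces the examples; its handling of position 0 and of out-of-range positions is an assumption about Hive's implementation, not something the examples above demonstrate, and `hive_substr` is a hypothetical name.

```python
from typing import Optional


def hive_substr(s: str, pos: int, length: Optional[int] = None) -> str:
    """Model of substr/substring: 1-based pos, negative pos counts from the end."""
    n = len(s)
    if length is None:
        length = n  # no length given: take everything to the end
    if length <= 0 or abs(pos) > n:
        return ""  # assumed: out-of-range requests give an empty string
    if pos > 0:
        start = pos - 1  # 1-based -> 0-based
    elif pos < 0:
        start = n + pos  # -1 is the last character
    else:
        start = 0  # assumed: pos 0 behaves like pos 1
    return s[start:start + length]


print(hive_substr("abcde", 3), hive_substr("abcde", -1))        # cde e
print(hive_substr("abcde", 3, 2), hive_substr("abcde", -2, 2))  # cd de
```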

`upper(string A)`, `ucase(string A)` (returns STRING): string A in upper case.

```
hive> select upper('abSEd');
OK
ABSED
Time taken: 0.059 seconds, Fetched: 1 row(s)
hive> select ucase('abSEd');
OK
ABSED
Time taken: 0.058 seconds, Fetched: 1 row(s)
```

`lower(string A)`, `lcase(string A)` (returns STRING): string A in lower case.

```
hive> select lower('abSEd');
OK
absed
Time taken: 0.068 seconds, Fetched: 1 row(s)
hive> select lcase('abSEd');
OK
absed
Time taken: 0.057 seconds, Fetched: 1 row(s)
```

`trim(string A)` (returns STRING): string A with spaces removed from both ends.

```
hive> select trim(' abc ');
OK
abc
Time taken: 0.058 seconds, Fetched: 1 row(s)
```

`ltrim(string A)` (returns STRING): string A with leading spaces removed.

```
hive> select ltrim(' abc ');
OK
abc
Time taken: 0.059 seconds, Fetched: 1 row(s)
```

`rtrim(string A)` (returns STRING): string A with trailing spaces removed.

```
hive> select rtrim(' abc ');
OK
 abc
Time taken: 0.058 seconds, Fetched: 1 row(s)
```

`regexp_extract(string subject, string pattern, int index)` (returns STRING): matches subject against the regular expression pattern and returns the capture group selected by index (0 returns the whole match). Note that some characters in the pattern must be escaped.

```
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1);
OK
the
Time taken: 0.389 seconds, Fetched: 1 row(s)
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2);
OK
bar
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0);
OK
foothebar
Time taken: 0.058 seconds, Fetched: 1 row(s)
```
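Python's `re` module behaves like Java regular expressions for a pattern this simple, so the three examples above can be replayed directly. A sketch assuming Hive searches for the first match anywhere in the subject and returns an empty string when nothing matches; `regexp_extract` here is a Python stand-in, not the Hive function.

```python
import re


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    # Find the first match anywhere in subject and return the requested group;
    # group 0 is the whole match. No match yields an empty string here.
    m = re.search(pattern, subject)
    return m.group(index) if m else ""


for i in (1, 2, 0):
    print(regexp_extract("foothebar", "foo(.*?)(bar)", i))  # the, bar, foothebar
```

The lazy `(.*?)` matters: it stops at the first "bar", so group 1 is "the" rather than anything longer.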

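`parse_url(url, partToExtract[, key])` (returns STRING): extracts a part from a URL. Valid values of partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, FILE, AUTHORITY and USERINFO.

```
hive> select parse_url('http://facebook.com/path/p1.php?query=1', 'HOST');
OK
facebook.com
Time taken: 0.286 seconds, Fetched: 1 row(s)
hive> select parse_url('http://facebook.com/path/p1.php?query=1', 'PATH');
OK
/path/p1.php
Time taken: 0.069 seconds, Fetched: 1 row(s)
hive> select parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY');
OK
query=1
Time taken: 0.21 seconds, Fetched: 1 row(s)
```

For QUERY, a key can be given to return one specific parameter, for example:

```
hive> select parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY', 'query');
OK
1
Time taken: 0.057 seconds, Fetched: 1 row(s)
hive> select parse_url('http://facebook.com/path/p1.php?query=1#Ref', 'REF');
OK
Ref
Time taken: 0.055 seconds, Fetched: 1 row(s)
hive> select parse_url('http://facebook.com/path/p1.php?query=1#Ref', 'PROTOCOL');
OK
http
Time taken: 0.06 seconds, Fetched: 1 row(s)
```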

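`parse_url_tuple(url, p1, p2, ...)`: like parse_url, but extracts several parts of a URL at once; `QUERY:<key>` selects a single query parameter.

```
hive> select parse_url_tuple('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY:k1', 'QUERY:k2');
OK
v1      v2
Time taken: 0.2 seconds, Fetched: 1 row(s)
```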
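A rough Python equivalent built on `urllib.parse.urlsplit`, covering only HOST, PATH, QUERY (with an optional key), REF and PROTOCOL. Returning `None` for a missing part or key is meant to mirror Hive returning NULL, which is an assumption about edge cases; the function is a stand-in, not Hive's implementation.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_url(url: str, part: str, key: Optional[str] = None) -> Optional[str]:
    u = urlsplit(url)
    if part == "QUERY" and key is not None:
        # Look for key=value among the &-separated query parameters
        for pair in u.query.split("&"):
            k, _, v = pair.partition("=")
            if k == key:
                return v
        return None
    parts = {
        "HOST": u.hostname,
        "PATH": u.path,
        "QUERY": u.query or None,
        "REF": u.fragment or None,
        "PROTOCOL": u.scheme,
    }
    return parts[part]  # FILE, AUTHORITY and USERINFO are left out of this sketch


url = "http://facebook.com/path/p1.php?query=1"
print(parse_url(url, "HOST"), parse_url(url, "PATH"), parse_url(url, "QUERY", "query"))
```

Running it prints `facebook.com /path/p1.php 1`, matching the Hive sessions; `parse_url_tuple` with `QUERY:k1` and `QUERY:k2` corresponds to two keyed QUERY lookups.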

JSON parsing function: `get_json_object(string json_string, string path)` (returns STRING): parses the JSON string json_string and returns the content selected by path. If the input JSON string is invalid, it returns NULL.

```
hive> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.store');
OK
{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}}
Time taken: 0.108 seconds, Fetched: 1 row(s)

hive> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.email');
OK
amy@only_for_json_udf_test.net

hive> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.owner');
OK
amy
Time taken: 0.499 seconds, Fetched: 1 row(s)
```

Turning a row into a column: explode (posexplode is available as of Hive 0.13.0)

`explode(array (or map))`: converts an array or map held in a single input row into multiple output rows.

```
hive> select explode(split(concat_ws('-','1','2','3','4','5','6','7','8','9'),'-'));
OK
1
2
3
4
5
6
7
8
9
Time taken: 0.095 seconds, Fetched: 9 row(s)
```

### II. Collection Functions

Collection lookup function: `find_in_set(string str, string strList)` (returns INT): returns the position of the first occurrence of str in strList, a comma-separated string; returns 0 if str is not found.

```
hive> select find_in_set('ab','ef,ab,de');
OK
2
Time taken: 2.336 seconds, Fetched: 1 row(s)
hive> select find_in_set('at','ef,ab,de');
OK
0
Time taken: 0.094 seconds, Fetched: 1 row(s)
```

Hive provides complex (composite) data types:

- Structs: fields inside a struct are accessed with dot (.) notation. For example, if column c has type `STRUCT{a INT; b INT}`, field a is accessed as `c.a`.
- Maps (key-value pairs): a value is accessed with `["key"]`. For example, if map M contains the pair 'group' -> gid, the gid value is obtained with `M['group']`.
- Arrays: all elements of an array have the same type. For example, if array A holds `['a','b','c']`, then `A[1]` is 'b'.

Struct usage

```
create table qa_test.student_test(id INT, info struct<name:STRING, age:INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':';

hive> desc qa_test.student_test;
OK
id int
info struct<name:string,age:int>
Time taken: 0.048 seconds, Fetched: 2 row(s)

$ cat test5.txt
1,zhou:30
2,yan:30
3,chen:20
4,li:80

hive> select * from qa_test.student_test;
OK
1       {"name":"zhou","age":30}
2       {"name":"yan","age":30}
3       {"name":"chen","age":20}
4       {"name":"li","age":80}

hive> select info.age from qa_test.student_test;
OK
30
30
20
80
Time taken: 0.234 seconds, Fetched: 4 row(s)
hive> select info.name from qa_test.student_test;
OK
zhou
yan
chen
li
Time taken: 0.08 seconds, Fetched: 4 row(s)
```

Array usage

```
create table qa_test.class_test(name string, student_id_list array<INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':';

hive> desc qa_test.class_test;
OK
name                    string
student_id_list         array<int>
Time taken: 0.052 seconds, Fetched: 2 row(s)

$ cat test6.txt
034,1:2:3:4
035,5:6
036,7:8:9:10

LOAD DATA LOCAL INPATH '/home/hadoop/test/test6' INTO TABLE qa_test.class_test;

hive> select * from qa_test.class_test;
OK
034 [1,2,3,4]
035 [5,6]
036 [7,8,9,10]
Time taken: 0.076 seconds, Fetched: 3 row(s)

hive> select student_id_list[3] from qa_test.class_test;
OK
4
NULL
10
Time taken: 0.12 seconds, Fetched: 3 row(s)

hive> select size(student_id_list) from qa_test.class_test;
OK
4
2
4
Time taken: 0.086 seconds, Fetched: 3 row(s)

hive> select array_contains(student_id_list,4) from qa_test.class_test;
OK
true
false
false
Time taken: 0.129 seconds, Fetched: 3 row(s)

hive> select sort_array(student_id_list) from qa_test.class_test;
OK
[1,2,3,4]
[5,6]
[7,8,9,10]
Time taken: 0.085 seconds, Fetched: 3 row(s)
```

Map usage

```
create table qa_test.employee(id string, perf map<string, int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';

$ cat test7.txt
1    job:80,team:60,person:70
2    job:60,team:80
3    job:90,team:70,person:100

hive> select * from qa_test.employee;
OK
1       {"job":80,"team":60,"person":70}
2       {"job":60,"team":80}
3       {"job":90,"team":70,"person":100}
Time taken: 0.075 seconds, Fetched: 3 row(s)

hive> select perf['job'] from qa_test.employee where perf['job'] is not null;
OK
80
60
90
Time taken: 0.096 seconds, Fetched: 3 row(s)

hive> select size(perf) from qa_test.employee;
OK
3
2
3
Time taken: 0.091 seconds, Fetched: 3 row(s)

hive> select map_keys(perf) from qa_test.employee;
OK
["job","team","person"]
["job","team"]
["job","team","person"]
Time taken: 0.136 seconds, Fetched: 3 row(s)

hive> select map_values(perf) from qa_test.employee;
OK
[80,60,70]
[60,80]
[90,70,100]
Time taken: 0.077 seconds, Fetched: 3 row(s)
```

`size(Map<K,V>)` (returns INT): the number of elements in the map.

```
hive> select size(perf) from qa_test.employee;
OK
3
2
3
Time taken: 0.091 seconds, Fetched: 3 row(s)
```

`size(Array<T>)` (returns INT): the number of elements in the array.

```
hive> select size(student_id_list) from qa_test.class_test;
OK
4
2
4
Time taken: 0.086 seconds, Fetched: 3 row(s)
```

`map_keys(Map<K,V>)` (returns `array<K>`): the keys of the map.

```
hive> select map_keys(perf) from qa_test.employee;
OK
["job","team","person"]
["job","team"]
["job","team","person"]
Time taken: 0.136 seconds, Fetched: 3 row(s)
```

`map_values(Map<K,V>)` (returns `array<V>`): the values of the map.

```
hive> select map_values(perf) from qa_test.employee;
OK
[80,60,70]
[60,80]
[90,70,100]
Time taken: 0.077 seconds, Fetched: 3 row(s)
```

`array_contains(Array<T>, value)` (returns BOOLEAN): true if the array contains value.

```
hive> select array_contains(student_id_list,4) from qa_test.class_test;
OK
true
false
false
Time taken: 0.129 seconds, Fetched: 3 row(s)
```

`sort_array(Array<T>)` (returns ARRAY): the array sorted in ascending order according to the natural ordering of its elements.

```
hive> select sort_array(student_id_list) from qa_test.class_test;
OK
[1,2,3,4]
[5,6]
[7,8,9,10]
Time taken: 0.085 seconds, Fetched: 3 row(s)
```
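The collection functions in this section map directly onto Python lists, dicts and strings. A sketch replaying the sample data from the tables above (`element_at` and `find_in_set` are hypothetical helper names). The comma rule in `find_in_set` follows the Hive documentation, which says a first argument containing a comma returns 0; Python dicts keep insertion order, whereas the Hive documentation describes map_keys as returning the keys unordered.

```python
from typing import Any, Dict, List, Optional


def element_at(arr: List[Any], i: int) -> Optional[Any]:
    # arr[i] in HiveQL: 0-based, and an index past the end gives NULL
    return arr[i] if 0 <= i < len(arr) else None


def find_in_set(s: str, str_list: str) -> int:
    # 1-based position of s in a comma-separated list, 0 when absent
    if "," in s:
        return 0  # a search string containing a comma can never match an item
    items = str_list.split(",")
    return items.index(s) + 1 if s in items else 0


# The three rows of qa_test.class_test and the first row of qa_test.employee
student_id_lists = [[1, 2, 3, 4], [5, 6], [7, 8, 9, 10]]
perf: Dict[str, int] = {"job": 80, "team": 60, "person": 70}

print([element_at(a, 3) for a in student_id_lists])  # [4, None, 10]
print([len(a) for a in student_id_lists])            # size(): [4, 2, 4]
print([4 in a for a in student_id_lists])            # array_contains(..., 4): [True, False, False]
print(sorted([3, 1, 2]))                             # sort_array(): [1, 2, 3]
print(list(perf), list(perf.values()))               # map_keys / map_values
print(find_in_set("ab", "ef,ab,de"), find_in_set("at", "ef,ab,de"))  # 2 0
```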

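### Aggregate Functions

`count(*)` (returns BIGINT): the total number of rows, including rows that contain NULL values.

`sum(col)`, `sum(DISTINCT col)` (returns DOUBLE): the sum of the column; with DISTINCT, the sum of its distinct values.

`avg(col)`, `avg(DISTINCT col)` (returns DOUBLE): the average of the column; with DISTINCT, the average of its distinct values.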

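`min(col)` (returns DOUBLE): the minimum value of the column.

`max(col)` (returns DOUBLE): the maximum value of the column.

`variance(col)`, `var_pop(col)` (returns DOUBLE): the population variance of the column.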

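`var_samp(col)` (returns DOUBLE): the unbiased sample variance of the column.

`stddev_pop(col)` (returns DOUBLE): the population standard deviation of the column.

`stddev_samp(col)` (returns DOUBLE): the unbiased sample standard deviation of the column.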

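`covar_pop(col1, col2)` (returns DOUBLE): the population covariance of a pair of numeric columns.

`covar_samp(col1, col2)` (returns DOUBLE): the sample covariance of a pair of numeric columns.

`corr(col1, col2)` (returns DOUBLE): the Pearson correlation coefficient of a pair of numeric columns.

`percentile(BIGINT col, p)` (returns DOUBLE): the exact pth percentile (p between 0 and 1) of an integer column.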
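The NULL handling and the population-versus-sample distinction behind these aggregates can be checked with Python's `statistics` module. A sketch over a made-up column, with `None` as NULL; it assumes Hive's population variants (such as `var_pop` and `stddev_pop`) divide by n and the sample variants (`var_samp`, `stddev_samp`) by n - 1, and that aggregates other than `count(*)` skip NULLs.

```python
import statistics

# A hypothetical numeric column; None plays the role of NULL
col = [3.0, 1.0, None, 3.0, 5.0]
values = [v for v in col if v is not None]  # aggregates other than count(*) skip NULLs

count_star = len(col)                   # count(*): every row, NULL or not
count_col = len(values)                 # count(col): non-NULL rows only
sum_col = sum(values)                   # sum(col)
sum_distinct = sum(set(values))         # sum(DISTINCT col): duplicates removed first
avg_col = sum_col / count_col           # avg(col)
var_pop = statistics.pvariance(values)  # population variance: divides by n
var_samp = statistics.variance(values)  # sample variance: divides by n - 1
stddev_pop = statistics.pstdev(values)
stddev_samp = statistics.stdev(values)

print(count_star, count_col, sum_col, sum_distinct, avg_col)  # 5 4 12.0 9.0 3.0
print(var_pop, var_samp)
```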

### UDTF

### Splitting one row into multiple rows: lateral view

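lateral view is used together with UDTFs such as json_tuple, parse_url_tuple and explode (explode is often applied to the output of split). It turns one input row into multiple rows, and the resulting rows can then be aggregated.

Example:

```
hive> select s.x,sp from test.dual s lateral view explode(split(concat_ws(',','1','2','3','4','5','6','7','8','9'),',')) t as sp;
x sp
a 1
b 2
a 3
```

Explanation: FROM is followed by your table name, and the table name is followed by `lateral view explode(...)` (your row-to-column expression). The lateral view must be given aliases: here the table alias is t and the column alias is sp. In the select list, s.x refers to a column of the original table, which in this example has only one column, x.

A query with multiple lateral views looks like this:

```
SELECT * FROM exampleTable LATERAL VIEW explode(col1) myTable1 AS myCol1 LATERAL VIEW explode(myCol1) myTable2 AS myCol2;
```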
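Conceptually, LATERAL VIEW applies the UDTF to each input row and joins that row with every row the UDTF produces, so each row of test.dual yields nine output rows in the query above. A Python sketch of that behavior with hypothetical contents for test.dual; Hive also offers LATERAL VIEW OUTER, which keeps rows whose UDTF output is empty, and this sketch does not model it.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, ...]


def explode(arr: List[str]) -> Iterator[str]:
    # explode(array): one output row per array element
    yield from arr


def lateral_view(rows: Iterable[Row], udtf: Callable[[Row], Iterable[str]]) -> Iterator[Row]:
    # Each base row is joined with every row its UDTF call produces.
    # A base row whose UDTF produces nothing disappears from the result.
    for row in rows:
        for value in udtf(row):
            yield row + (value,)


dual = [("a",), ("b",)]  # hypothetical contents of test.dual (one column x)
numbers = ",".join(str(i) for i in range(1, 10))  # like concat_ws(',', '1', ..., '9')
result = list(lateral_view(dual, lambda row: explode(numbers.split(","))))
print(len(result), result[:3])  # 18 [('a', '1'), ('a', '2'), ('a', '3')]
```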

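### Example: extracting one row into multiple columns of a new table

http_referer holds the captured request URL with its parameters, in which illegal characters were escaped with a backslash. The first statement strips the double quotes, parses out the host, path, query string and the id query parameter, and stores them in a new table: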

```
drop table if exists t_ods_tmp_referurl;
create table t_ods_tmp_referurl as
SELECT a.*,b.*
FROM ods_origin_weblog a LATERAL VIEW parse_url_tuple(regexp_replace(http_referer, "\"", ""), 'HOST', 'PATH','QUERY', 'QUERY:id') b as host, path, query, query_id;

-- Copy the table and split time_local down to the day.
-- Assumes time_local looks like 'yyyy-MM-dd HH:mm:ss'; Hive's substring positions start at 1.
drop table if exists t_ods_tmp_detail;
create table t_ods_tmp_detail as
select b.*,substring(time_local,1,10) as daystr,
substring(time_local,12) as tmstr,
substring(time_local,6,2) as month,
substring(time_local,9,2) as day,
substring(time_local,12,2) as hour
From t_ods_tmp_referurl b;
```

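### Table-Generating Functions

`explode(array<TYPE> a)` (returns the array's element type): for each element of a, generates a row containing that element.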

`explode(ARRAY)` (returns N rows): one output row per element of the array.

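`explode(MAP)` (returns N rows): one output row per key-value pair, with a key column and a value column.

`posexplode(ARRAY)` (returns N rows): similar to explode, but also returns the position of each element in the array.

`stack(INT n, v_1, v_2, ..., v_k)` (returns N rows): breaks v_1, ..., v_k into n rows, each with k/n columns.

`json_tuple(jsonStr, k1, k2, ...)` (returns a tuple): takes a JSON string and a set of keys and returns a tuple of the corresponding values.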

`parse_url_tuple(url, p1, p2, ...)` (returns a tuple): extracts several parts of a URL at once.
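A Python sketch of the row shapes produced by posexplode and stack; the function names mirror the Hive ones, but this is not Hive code. It assumes posexplode numbers positions from 0, and it simply rejects a stack call whose value count is not a multiple of n rather than guessing how Hive fills the last row.

```python
from typing import Any, List, Tuple


def posexplode(arr: List[Any]) -> List[Tuple[int, Any]]:
    # Like explode, but each row also carries the element's 0-based position
    return list(enumerate(arr))


def stack(n: int, *values: Any) -> List[Tuple[Any, ...]]:
    # Break v_1 .. v_k into n rows of k / n columns each
    if n <= 0 or len(values) % n != 0:
        raise ValueError("this sketch requires k to be a multiple of n")
    width = len(values) // n
    return [tuple(values[i * width:(i + 1) * width]) for i in range(n)]


print(posexplode(["A", "B", "C"]))  # [(0, 'A'), (1, 'B'), (2, 'C')]
print(stack(2, "A", 10, "B", 20))   # [('A', 10), ('B', 20)]
```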
