Big Data Development with Hive, Part 6: Hive Functions in Detail

Note:
Hive version 2.1.1

I. Overview of Hive Functions

1.1 Categories of Hive functions

  1. Regular functions (UDF)
    One input row, one output row
    Examples: upper/lower/length

  2. Aggregate functions (UDAF)
    Many input rows, one output row
    Examples: sum/count/avg

  3. Table-generating functions (UDTF)
    One input row, many output rows
    Example: explode

1.2 Listing Hive functions

1.2.1 The SHOW FUNCTIONS command

SHOW FUNCTIONS lists every function the current Hive build supports -- 248 in this installation. Most of them resemble their MySQL counterparts, but the list also includes functions such as to_date and decode that originate from Oracle (their behavior differs from Oracle's). This is a real convenience for developers moving from relational databases to big-data platforms.

hive> SHOW FUNCTIONS;
OK
!
!=
$sum0
%
&
*
+
-
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
and
array
array_contains
ascii
asin
assert_true
atan
avg
base64
between
bin
bround
case
cbrt
ceil
ceiling
chr
coalesce
collect_list
collect_set
compute_stats
concat
concat_ws
context_ngrams
conv
corr
cos
count
covar_pop
covar_samp
crc32
create_union
cume_dist
current_database
current_date
current_timestamp
current_user
date_add
date_format
date_sub
datediff
day
dayofmonth
dayofweek
decode
degrees
dense_rank
div
e
elt
encode
ewah_bitmap
ewah_bitmap_and
ewah_bitmap_empty
ewah_bitmap_or
exp
explode
factorial
field
find_in_set
first_value
floor
floor_day
floor_hour
floor_minute
floor_month
floor_quarter
floor_second
floor_week
floor_year
format_number
from_unixtime
from_utc_timestamp
get_json_object
get_splits
greatest
grouping
hash
hex
histogram_numeric
hour
if
in
in_file
index
initcap
inline
instr
internal_interval
isnotnull
isnull
java_method
json_tuple
lag
last_day
last_value
lcase
lead
least
length
levenshtein
like
ln
locate
log
log10
log2
logged_in_user
lower
lpad
ltrim
map
map_keys
map_values
mask
mask_first_n
mask_hash
mask_last_n
mask_show_first_n
mask_show_last_n
matchpath
max
md5
min
minute
month
months_between
named_struct
negative
next_day
ngrams
noop
noopstreaming
noopwithmap
noopwithmapstreaming
not
ntile
nvl
or
parse_url
parse_url_tuple
percent_rank
percentile
percentile_approx
pi
pmod
posexplode
positive
pow
power
printf
quarter
radians
rand
rank
reflect
reflect2
regexp
regexp_extract
regexp_replace
repeat
replace
reverse
rlike
round
row_number
rpad
rtrim
second
sentences
sha
sha1
sha2
shiftleft
shiftright
shiftrightunsigned
sign
sin
size
sort_array
soundex
space
split
sqrt
stack
std
stddev
stddev_pop
stddev_samp
str_to_map
struct
substr
substring
substring_index
sum
tan
to_date
to_unix_timestamp
to_utc_timestamp
translate
trim
trunc
ucase
unbase64
unhex
unix_timestamp
upper
uuid
var_pop
var_samp
variance
version
weekofyear
when
windowingtablefunction
xpath
xpath_boolean
xpath_double
xpath_float
xpath_int
xpath_long
xpath_number
xpath_short
xpath_string
year
|
~
Time taken: 1.246 seconds, Fetched: 248 row(s)
hive> 

1.2.2 The DESCRIBE FUNCTION command

Syntax:

DESCRIBE FUNCTION <function_name>;
DESCRIBE FUNCTION EXTENDED <function_name>;

Test transcript:

hive> 
    > 
    > desc function decode;
OK
decode(bin, str) - Decode the first argument using the second argument character set
Time taken: 0.023 seconds, Fetched: 1 row(s)
hive> desc function extended decode;
OK
decode(bin, str) - Decode the first argument using the second argument character set
Possible options for the character set are 'US-ASCII', 'ISO-8859-1',
'UTF-8', 'UTF-16BE', 'UTF-16LE', and 'UTF-16'. If either argument
is null, the result will also be null
Time taken: 0.027 seconds, Fetched: 4 row(s)
hive> 

II. Hive Numeric Functions

Common Hive numeric functions:

Function | Description | Return type
round(DOUBLE a), round(DOUBLE a, INT d) | round half-up, optionally to d decimal places | DOUBLE
floor(DOUBLE a) | largest integer less than or equal to a | BIGINT
ceil(DOUBLE a), ceiling(DOUBLE a) | smallest integer greater than or equal to a | BIGINT
rand(), rand(INT seed) | random number between 0 and 1 | DOUBLE
pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p) | a raised to the power p | DOUBLE
sqrt(DOUBLE a), sqrt(DECIMAL a) | square root | DOUBLE
bin(BIGINT a) | binary representation of a | STRING
abs(DOUBLE a) | absolute value of a | DOUBLE
greatest(T v1, T v2, ...) | largest argument | T
least(T v1, T v2, ...) | smallest argument | T

2.1 The round function

round rounds a number half-up: a fractional part of .5 is carried upward.

Test transcript:

hive> 
    > select round(5.1);
OK
5.0
Time taken: 0.09 seconds, Fetched: 1 row(s)
hive> select round(5.5);
OK
6.0
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive> 
    > select round(5.5,1);
OK
5.5
Time taken: 0.086 seconds, Fetched: 1 row(s)
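As the transcript shows, Hive's round carries 5.5 up to 6, i.e. half-up rounding, while Python's built-in round uses banker's rounding. A minimal pure-Python sketch of the half-up behavior (the name hive_round is illustrative, not a Hive API):

```python
from decimal import Decimal, ROUND_HALF_UP

def hive_round(x, d=0):
    """Round half-up to d decimal places, like Hive's round()."""
    q = Decimal(1).scaleb(-d)  # the rounding unit, 10**-d
    return float(Decimal(str(x)).quantize(q, rounding=ROUND_HALF_UP))

# Hive-style half-up: 5.5 -> 6.0, matching the transcript above.
print(hive_round(5.1), hive_round(5.5), hive_round(5.5, 1))
```

Python's own round(2.5) returns 2 (round-half-to-even), which is the trap this sketch avoids.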

2.2 The floor function

floor rounds down to the nearest integer.

Test transcript:

hive> 
    > 
    > select floor(5.4);
OK
5
Time taken: 0.092 seconds, Fetched: 1 row(s)
hive> select floor(5.5);
OK
5
Time taken: 0.074 seconds, Fetched: 1 row(s)
hive> select floor(5.9);
OK
5
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> 

2.3 The ceil function

ceil rounds up to the nearest integer.

Test transcript:

hive> 
    > select ceil(5.4);
OK
6
Time taken: 0.094 seconds, Fetched: 1 row(s)
hive> select ceil(5.1);
OK
6
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive> select ceil(5.9);
OK
6
Time taken: 0.08 seconds, Fetched: 1 row(s)
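floor and ceil behave like Python's math.floor and math.ceil; a short sketch, including the negative-number case the transcripts do not show (floor moves toward negative infinity, not toward zero):

```python
import math

# Same direction of rounding as Hive's floor/ceil (which return BIGINT).
assert math.floor(5.4) == 5 and math.floor(5.9) == 5
assert math.ceil(5.4) == 6 and math.ceil(5.1) == 6

# On negatives, floor still rounds down and ceil still rounds up.
assert math.floor(-5.4) == -6 and math.ceil(-5.4) == -5
print("floor/ceil semantics verified")
```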

2.4 rand()

rand returns a random number between 0 and 1.

Test transcript:

-- a random number between 0 and 100
hive> 
    > select round(100*rand());
Query ID = root_20201208110850_c85b86e9-012e-449d-a1f5-0b158741315b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0141, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0141/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0141
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-08 11:08:57,146 Stage-1 map = 0%,  reduce = 0%
2020-12-08 11:09:03,404 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.04 sec
MapReduce Total cumulative CPU time: 3 seconds 40 msec
Ended Job = job_1606698967173_0141
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 3.04 sec   HDFS Read: 5300 HDFS Write: 104 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 40 msec
OK
62.0
Time taken: 14.362 seconds, Fetched: 1 row(s)

2.5 The power function

power is the exponentiation function: power(a, p) returns a raised to the power p.

Test transcript:

-- 2 to the 3rd power
hive> 
    > select power(2,3);
OK
8.0
Time taken: 0.087 seconds, Fetched: 1 row(s)

2.6 The sqrt function

sqrt returns the square root.

Test transcript:

hive> 
    > select sqrt(9);
OK
3.0
Time taken: 0.074 seconds, Fetched: 1 row(s)

2.7 The bin function

bin returns the binary representation of an integer, as a string.

Test transcript:

hive> 
    > 
    > select bin(10);
OK
1010
Time taken: 0.085 seconds, Fetched: 1 row(s)
hive> select bin(9);
OK
1001
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> 

2.8 The abs function

abs returns the absolute value.

Test transcript:

hive> 
    > select abs(10);
OK
10
Time taken: 0.09 seconds, Fetched: 1 row(s)
hive> select abs(-10);
OK
10
Time taken: 0.084 seconds, Fetched: 1 row(s)

2.9 The greatest function

greatest returns the largest of its arguments; if any argument is NULL, the result is NULL.

Test transcript:

hive> 
    > select greatest(1,2,3);
OK
3
Time taken: 0.094 seconds, Fetched: 1 row(s)
hive> select greatest(1,2,3,1.5);
OK
3.0
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> 
    > select greatest(1,2,3,null);
OK
NULL
Time taken: 0.084 seconds, Fetched: 1 row(s)

2.10 The least function

least returns the smallest of its arguments; if any argument is NULL, the result is NULL.

Test transcript:

hive> 
    > 
    > select least(1,2,3);
OK
1
Time taken: 0.094 seconds, Fetched: 1 row(s)
hive> select least(1,2,3,-1);
OK
-1
Time taken: 0.079 seconds, Fetched: 1 row(s)
hive> select least(1,2,3,null);
OK
NULL
Time taken: 0.059 seconds, Fetched: 1 row(s)
hive> 
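The NULL-in, NULL-out behavior of greatest/least shown in the transcripts can be sketched in Python, modeling NULL as None (the function names here are illustrative, not Hive APIs):

```python
def greatest(*vals):
    """Largest argument; None (NULL) if any argument is None, as in Hive."""
    if any(v is None for v in vals):
        return None
    return max(vals)

def least(*vals):
    """Smallest argument; None (NULL) if any argument is None."""
    if any(v is None for v in vals):
        return None
    return min(vals)

print(greatest(1, 2, 3), least(1, 2, 3, -1), greatest(1, 2, 3, None))
```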

III. Hive Date and Time Functions

Common Hive date/time functions:

Function | Description | Return type
from_unixtime(bigint unixtime[, string format]) | convert epoch seconds to a date/time string; inverse of unix_timestamp() | string
unix_timestamp(), unix_timestamp(string date), unix_timestamp(string date, string pattern) | convert a time to epoch seconds; inverse of from_unixtime() | bigint
to_date(string timestamp) | date part of a timestamp string | date
year(string date) | year | int
quarter(date/timestamp/string) | quarter | int
month(string date) | month | int
day(string date), dayofmonth(date) | day of month | int
hour(string date) | hour | int
minute(string date) | minute | int
second(string date) | second | int
weekofyear(string date) | week number within the year | int
extract(field FROM source) | extract a date/time field | int
datediff(string enddate, string startdate) | days between two dates | int
date_add(date/timestamp/string startdate, tinyint/smallint/int days) | add days to a date | date
date_sub(date/timestamp/string startdate, tinyint/smallint/int days) | subtract days from a date | date
from_utc_timestamp({any primitive type} ts, string timezone) | convert a UTC timestamp to the given time zone | timestamp
to_utc_timestamp({any primitive type} ts, string timezone) | convert a timestamp in the given time zone to UTC | timestamp
current_date | current date | date
current_timestamp | current timestamp | timestamp
add_months(string start_date, int num_months, output_date_format) | add months to a date | string
last_day(string date) | last day of the month | string
next_day(string start_date, string day_of_week) | first date later than start_date that falls on day_of_week | string
trunc(string date, string format) | date truncated to the unit given by format | string
months_between(date1, date2) | months between two dates | double
date_format(date/timestamp/string ts, string fmt) | format a date with the given pattern | string

3.1 from_unixtime and unix_timestamp

from_unixtime converts epoch seconds to a date/time string.
unix_timestamp does the reverse, converting a date/time string to epoch seconds.

Test transcript:

hive> 
    > select UNIX_TIMESTAMP('2020-12-09 23:50:59');
OK
1607529059
Time taken: 0.101 seconds, Fetched: 1 row(s)
hive> 
    > select from_unixtime(1607529059);
OK
2020-12-09 23:50:59
Time taken: 0.084 seconds, Fetched: 1 row(s)
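The round trip in the transcript can be sketched with Python's time module. One caveat: Hive interprets the string in the session time zone, so the epoch value depends on the server's zone; this sketch pins the conversion to UTC to stay deterministic.

```python
import calendar
import time

FMT = '%Y-%m-%d %H:%M:%S'

def unix_timestamp(s):
    """Parse 'yyyy-MM-dd HH:mm:ss' to epoch seconds (UTC here;
    Hive uses the session time zone, so its value may differ)."""
    return calendar.timegm(time.strptime(s, FMT))

def from_unixtime(ts):
    """Format epoch seconds back to 'yyyy-MM-dd HH:mm:ss' (UTC)."""
    return time.strftime(FMT, time.gmtime(ts))

# The two functions invert each other, as in the transcript.
print(from_unixtime(unix_timestamp('2020-12-09 23:50:59')))
```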

3.2 The to_date function

to_date(string timestamp) returns the date portion of a timestamp string.

hive> 
    > 
    > select to_date('2020-12-09 23:59:59 123456789');
OK
2020-12-09
Time taken: 0.109 seconds, Fetched: 1 row(s)
hive> select to_date('2020-12-09 23:59:59');
OK
2020-12-09
Time taken: 0.086 seconds, Fetched: 1 row(s)

3.3 year/quarter/month/day/hour/minute/second/weekofyear

  1. year(string date) returns the year
  2. quarter(date/timestamp/string) returns the quarter
  3. month(string date) returns the month
  4. day(string date), dayofmonth(date) return the day of month
  5. hour(string date) returns the hour
  6. minute(string date) returns the minute
  7. second(string date) returns the second

Test transcript:

hive> 
    > select year(dt) dt_year,
    >        quarter(dt) dt_quarter,
    >        month(dt) dt_month,
    >        day(dt) dt_day,
    >        hour(dt) dt_hour,
    >        minute(dt) dt_minute,
    >        second(dt) dt_second
    > from
    > (
    > select '2020-12-09 23:50:59' as dt
    > ) tmp;
OK
dt_year dt_quarter      dt_month        dt_day  dt_hour dt_minute       dt_second
2020    4       12      9       23      50      59
Time taken: 0.098 seconds, Fetched: 1 row(s)
hive> 

3.4 The weekofyear function

weekofyear(string date) returns the week number within the year.

Test transcript:
'2020-12-09' falls in week 50 of the year.

hive> 
    > select weekofyear('2020-12-09');
OK
_c0
50
Time taken: 0.073 seconds, Fetched: 1 row(s)

3.5 The extract function

extract(field FROM source) extracts one field of a date/time value.
Code:

-- extract the year
SELECT EXTRACT(YEAR FROM '2020-12-09');
-- extract the month
SELECT EXTRACT(MONTH FROM '2020-12-09');
-- extract the day
SELECT EXTRACT(DAY FROM '2020-12-09');
-- extract the hour
SELECT EXTRACT(HOUR FROM '2020-12-09 13:02:01');
-- extract the minute
SELECT EXTRACT(MINUTE FROM '2020-12-09 13:02:01');
-- extract the second
SELECT EXTRACT(SECOND FROM '2020-12-09 13:02:01');

Test transcript:

hive> 
    > SELECT EXTRACT(YEAR FROM '2020-12-09');
OK
_c0
2020
Time taken: 1.85 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(MONTH FROM '2020-12-09');
OK
_c0
12
Time taken: 0.227 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(DAY FROM '2020-12-09');
OK
_c0
9
Time taken: 0.089 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(HOUR FROM '2020-12-09 13:02:01');
OK
_c0
13
Time taken: 0.097 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(MINUTE FROM '2020-12-09 13:02:01');
OK
_c0
2
Time taken: 0.081 seconds, Fetched: 1 row(s)
hive> SELECT EXTRACT(SECOND FROM '2020-12-09 13:02:01');
OK
_c0
1

3.6 The datediff function

datediff(string enddate, string startdate) returns the number of days between two dates; any time portion is ignored.

Test transcript:

hive> 
    > 
    > select datediff('2020-12-09','2020-11-09');
OK
_c0
30
Time taken: 0.101 seconds, Fetched: 1 row(s)
hive> select datediff('2020-12-09 23:50:59','2020-11-09');
OK
_c0
30
Time taken: 0.113 seconds, Fetched: 1 row(s)
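datediff's day arithmetic is plain date subtraction on the date parts, which is easy to sketch in Python (the function name is illustrative):

```python
from datetime import date

def datediff(end, start):
    """Whole days between the date parts of two date strings,
    ignoring any time-of-day portion, as Hive's datediff does."""
    to_d = lambda s: date.fromisoformat(s[:10])  # keep only yyyy-MM-dd
    return (to_d(end) - to_d(start)).days

print(datediff('2020-12-09', '2020-11-09'))
print(datediff('2020-12-09 23:50:59', '2020-11-09'))
```

Note that the result is negative when enddate is earlier than startdate.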

3.7 The date_add/date_sub functions

date_add(date/timestamp/string startdate, tinyint/smallint/int days) adds a number of days to a date.
date_sub(date/timestamp/string startdate, tinyint/smallint/int days) subtracts a number of days from a date.

Test transcript:

hive> 
    > 
    > SELECT DATE_ADD('2020-01-01', 31);
OK
_c0
2020-02-01
Time taken: 0.098 seconds, Fetched: 1 row(s)
hive> SELECT DATE_ADD('2020-01-01', -31);
OK
_c0
2019-12-01
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> 
    > SELECT DATE_SUB('2020-01-01', 31);
OK
_c0
2019-12-01
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> SELECT DATE_SUB('2020-01-01', -31);
OK
_c0
2020-02-01
Time taken: 0.071 seconds, Fetched: 1 row(s)
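As the transcript shows, negative day counts simply flip the direction, so date_sub(d, n) equals date_add(d, -n). A sketch of that equivalence (illustrative names, not Hive APIs):

```python
from datetime import date, timedelta

def date_add(s, days):
    """Add days to the date part of a date/timestamp string."""
    return (date.fromisoformat(s[:10]) + timedelta(days=days)).isoformat()

def date_sub(s, days):
    """Subtracting n days is the same as adding -n days."""
    return date_add(s, -days)

print(date_add('2020-01-01', 31), date_sub('2020-01-01', 31))
```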

3.8 The from_utc_timestamp/to_utc_timestamp functions

from_utc_timestamp({any primitive type} ts, string timezone) converts a UTC timestamp to the given time zone.
to_utc_timestamp({any primitive type} ts, string timezone) converts a timestamp in the given time zone to UTC.

Test transcript:
Note that Hive interprets CST here as US Central Standard Time (UTC-6), not China Standard Time.

hive> 
    > select from_utc_timestamp('2020-12-09 23:59:59','CST');
OK
_c0
2020-12-09 17:59:59
Time taken: 0.078 seconds, Fetched: 1 row(s)
hive> 
    > select to_utc_timestamp('2020-12-09 23:59:59','CST');
OK
_c0
2020-12-10 05:59:59
Time taken: 0.085 seconds, Fetched: 1 row(s)

3.9 current_date/current_timestamp

current_date returns the current date.
current_timestamp returns the current timestamp.

Test transcript:

hive> 
    > select current_date() cur_date,current_timestamp() cur_timestamp;
OK
cur_date        cur_timestamp
2020-12-09      2020-12-09 14:58:44.535
Time taken: 0.079 seconds, Fetched: 1 row(s)

3.10 The add_months function

add_months(string start_date, int num_months, output_date_format) adds a number of months to a date.

Test transcript:

hive> 
    > 
    > select add_months('2020-12-31 23:59:59',2,'YYYY-MM-dd HH:mm:ss');
OK
_c0
2021-02-28 23:59:59
Time taken: 0.097 seconds, Fetched: 1 row(s)
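The transcript shows the interesting edge case: 2020-12-31 plus two months lands on 2021-02-28, because the day is clamped to the last day of the target month. A sketch of that clamping on the date part only (the time handling and output_date_format argument are omitted here):

```python
import calendar
from datetime import date

def add_months(s, n):
    """Shift the date part of s by n months, clamping the day to the
    last day of the target month (so Dec 31 + 2 months -> Feb 28)."""
    d = date.fromisoformat(s[:10])
    m = d.month - 1 + n                      # zero-based month counter
    y, m = d.year + m // 12, m % 12 + 1      # carry into the year
    day = min(d.day, calendar.monthrange(y, m)[1])
    return date(y, m, day).isoformat()

print(add_months('2020-12-31 23:59:59', 2))
```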

3.11 The last_day function

last_day(string date) returns the last day of the month the date falls in.

Test transcript:

hive> 
    > select last_day('2020-12-09');
OK
_c0
2020-12-31
Time taken: 0.104 seconds, Fetched: 1 row(s)
hive> select last_day('2020-12-09 23:59:59');
OK
_c0
2020-12-31
Time taken: 0.077 seconds, Fetched: 1 row(s)

3.12 The next_day function

next_day(string start_date, string day_of_week) returns the first date later than start_date that falls on the given weekday.

Test transcript:
Find the next Tuesday:

hive> 
    > 
    > select next_day('2020-12-09','TU');
OK
_c0
2020-12-15
Time taken: 0.086 seconds, Fetched: 1 row(s)
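last_day and next_day can both be sketched with the standard library; note that next_day is strict, returning a date later than the start even when the start already falls on the requested weekday (function names and the two-letter weekday table are illustrative):

```python
import calendar
from datetime import date, timedelta

def last_day(s):
    """Last day of the month containing the date part of s."""
    d = date.fromisoformat(s[:10])
    return d.replace(day=calendar.monthrange(d.year, d.month)[1]).isoformat()

# Monday=0 ... Sunday=6, matching datetime.weekday().
_DOW = {'MO': 0, 'TU': 1, 'WE': 2, 'TH': 3, 'FR': 4, 'SA': 5, 'SU': 6}

def next_day(s, dow):
    """First date strictly later than s that falls on weekday dow."""
    d = date.fromisoformat(s[:10])
    delta = (_DOW[dow[:2].upper()] - d.weekday() - 1) % 7 + 1
    return (d + timedelta(days=delta)).isoformat()

print(last_day('2020-12-09'), next_day('2020-12-09', 'TU'))
```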

3.13 The trunc function

trunc(string date, string format) returns the date truncated to the unit given by format.
Supported formats: MONTH/MON/MM, YEAR/YYYY/YY

Test transcript:

hive> 
    > select trunc('2020-12-09 23:59:59','YYYY');
OK
_c0
2020-01-01
hive> 
    > select trunc('2020-12-09 23:59:59','MM');
OK
_c0
2020-12-01
Time taken: 0.063 seconds, Fetched: 1 row(s)
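Truncating to the year or month is just resetting the smaller fields to their minimum, as this sketch shows (only the two unit families the text lists are handled):

```python
from datetime import date

def trunc(s, fmt):
    """Truncate the date part of s to the start of its year or month."""
    d = date.fromisoformat(s[:10])
    if fmt.upper() in ('YEAR', 'YYYY', 'YY'):
        return d.replace(month=1, day=1).isoformat()
    if fmt.upper() in ('MONTH', 'MON', 'MM'):
        return d.replace(day=1).isoformat()
    raise ValueError('unsupported format: ' + fmt)

print(trunc('2020-12-09 23:59:59', 'YYYY'), trunc('2020-12-09 23:59:59', 'MM'))
```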

3.14 The months_between function

months_between(date1, date2) returns the number of months between two dates.

The result can be fractional, and it is negative when date1 is earlier than date2.

Test transcript:

hive> 
    > 
    > select months_between('2020-12-09','2020-01-01');
OK
_c0
11.25806452
Time taken: 0.09 seconds, Fetched: 1 row(s)
hive> 
    > select months_between('2020-12-09','2021-01-01');
OK
_c0
-0.74193548
Time taken: 0.083 seconds, Fetched: 1 row(s)
hive> 
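The fractional values in the transcript come from the Oracle-style formula: whole months of difference plus the day difference divided by 31. A sketch under that assumption (the real function also special-cases month-end dates and time parts, which this omits):

```python
from datetime import date

def months_between(s1, s2):
    """Oracle/Hive-style months_between on the date parts:
    whole-month difference plus (day1 - day2) / 31."""
    d1 = date.fromisoformat(s1[:10])
    d2 = date.fromisoformat(s2[:10])
    return (d1.year - d2.year) * 12 + (d1.month - d2.month) + (d1.day - d2.day) / 31

print(round(months_between('2020-12-09', '2020-01-01'), 8))
print(round(months_between('2020-12-09', '2021-01-01'), 8))
```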

3.15 The date_format function

date_format(date/timestamp/string ts, string fmt) formats a date with the given pattern.

Test transcript:

hive> SELECT date_format('2020-12-09 23:59:59', 'yyyy.MM.dd HH:mm:ss');
OK
_c0
2020.12.09 23:59:59
Time taken: 0.071 seconds, Fetched: 1 row(s)

IV. Conditional Functions

Common Hive conditional functions:

Function | Description | Return type
nvl(a, b) | returns b when a is NULL, otherwise a | T
isnull(a) | true when a is NULL, otherwise false | boolean
isnotnull(a) | false when a is NULL, otherwise true | boolean
if(boolean testCondition, T valueTrue, T valueFalseOrNull) | returns valueTrue when testCondition is true, otherwise valueFalseOrNull | T
COALESCE(T v1, T v2, ...) | first non-NULL argument; NULL when all arguments are NULL | T
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END | searched CASE expression | T
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END | simple (value) CASE expression | T

4.1 The nvl function

nvl(a, b) returns b when a is NULL, otherwise a.

Test transcript:

hive> 
    > select nvl(null,123);
OK
_c0
123
hive> 
    > select nvl('345',123);
OK
_c0
345
Time taken: 0.067 seconds, Fetched: 1 row(s)

4.2 The isnull/isnotnull functions

isnull(a) returns true when a is NULL, otherwise false.
isnotnull(a) returns false when a is NULL, otherwise true.
Note in the transcript that the empty string is not NULL.

Test transcript:

hive> 
    > select isnull(1);
OK
_c0
false
Time taken: 0.083 seconds, Fetched: 1 row(s)
hive> select isnull(null);
OK
_c0
true
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive> select isnull('');
OK
_c0
false
Time taken: 0.076 seconds, Fetched: 1 row(s)
hive> 
    > select isnull(' ');
OK
_c0
false
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> 
    > 
    > select isnotnull(1);
OK
_c0
true
Time taken: 0.081 seconds, Fetched: 1 row(s)
hive> 
    > select isnotnull(null);
OK
_c0
false
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> 
    > select isnotnull('');
OK
_c0
true
Time taken: 0.071 seconds, Fetched: 1 row(s)
hive> 
    > select isnotnull(' ');
OK
_c0
true
Time taken: 0.086 seconds, Fetched: 1 row(s)
hive> 

4.3 The if function

if(boolean testCondition, T valueTrue, T valueFalseOrNull) returns valueTrue when testCondition is true, otherwise valueFalseOrNull.

Test transcript:

hive> 
    > select if(1>2,'abc','def');
OK
_c0
def
Time taken: 0.134 seconds, Fetched: 1 row(s)
hive> 
    > select if(1<2,'abc','def');
OK
_c0
abc
Time taken: 0.1 seconds, Fetched: 1 row(s)

4.4 The COALESCE function

COALESCE(T v1, T v2, ...) returns the first non-NULL argument, or NULL when all arguments are NULL. Be careful with the quoted 'null' in the transcript below: it is an ordinary four-character string, so COALESCE treats it as a non-NULL value and returns it.

Test transcript:

hive> 
    > select coalesce('a','b',null);
OK
_c0
a
Time taken: 0.081 seconds, Fetched: 1 row(s)
hive> 
    > select coalesce('null','b','c');
OK
_c0
null
Time taken: 0.073 seconds, Fetched: 1 row(s)
hive> 
    > select coalesce('null',null,null);
OK
_c0
null
Time taken: 0.073 seconds, Fetched: 1 row(s)
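Modeling NULL as None, coalesce and nvl are a few lines of Python; this sketch also reproduces the string-'null' pitfall from the transcript (illustrative names, not Hive APIs):

```python
def coalesce(*vals):
    """First argument that is not None, or None when all are None."""
    return next((v for v in vals if v is not None), None)

def nvl(a, b):
    """nvl is just the two-argument special case of coalesce."""
    return b if a is None else a

print(coalesce('a', 'b', None))
print(coalesce('null', 'b', 'c'))  # the string 'null' is not NULL
print(nvl(None, 123))
```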

4.5 CASE expressions

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END is the searched (condition) form.
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END is the simple (value) form.

Code:

-- searched CASE: map the English job title to a Chinese label
select e.empno,
     e.job,
     case when e.job = 'SALESMAN' then '销售员'
          when e.job = 'MANAGER' then '管理人员'
          else '未知' end  as job_china
from emp e;

-- simple CASE: the same mapping written in the value form
select e.empno,
     e.hiredate,
     case e.job when  'SALESMAN' then '销售员'
                when  'MANAGER' then '管理人员'
                 else '未知' 
     end  as job_china
from emp e;

-- with no ELSE branch, an unmatched CASE returns NULL
SELECT CASE 'c'
    WHEN 'a' THEN 1 WHEN 'b' THEN 2 END is_c;

Test transcript:

hive> 
    > -- searched CASE: map the English job title to a Chinese label
    > select e.empno,
    >      e.job,
    >      case when e.job = 'SALESMAN' then '销售员'
    >           when e.job = 'MANAGER' then '管理人员'
    >           else '未知' end  as job_china
    > from emp e;
Query ID = root_20201209163913_c6290550-dfa0-4f82-891e-24673ce9c704
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0142, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0142/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0142
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-09 16:39:21,669 Stage-1 map = 0%,  reduce = 0%
2020-12-09 16:39:27,895 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.68 sec
MapReduce Total cumulative CPU time: 6 seconds 680 msec
Ended Job = job_1606698967173_0142
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.68 sec   HDFS Read: 12384 HDFS Write: 764 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 680 msec
OK
e.empno e.job   job_china
7499    SALESMAN        销售员
7876    CLERK   未知
7934    CLERK   未知
7521    SALESMAN        销售员
7654    SALESMAN        销售员
7698    MANAGER 管理人员
7788    ANALYST 未知
7369    CLERK   未知
7900    CLERK   未知
7902    ANALYST 未知
7566    MANAGER 管理人员
7782    MANAGER 管理人员
7839    PRESIDENT       未知
7844    SALESMAN        销售员
Time taken: 15.553 seconds, Fetched: 14 row(s)
hive> 
    > -- simple CASE: the same mapping written in the value form
    > select e.empno,
    >      e.hiredate,
    >      case e.job when  'SALESMAN' then '销售员'
    >                 when  'MANAGER' then '管理人员'
    >                  else '未知' 
    >      end  as job_china
    > from emp e;
Query ID = root_20201209164150_7030d7f3-686c-4bb8-85e7-4f4e1db2dfef
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0143, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0143/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0143
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-09 16:41:57,812 Stage-1 map = 0%,  reduce = 0%
2020-12-09 16:42:05,052 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.23 sec
MapReduce Total cumulative CPU time: 6 seconds 230 msec
Ended Job = job_1606698967173_0143
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.23 sec   HDFS Read: 12252 HDFS Write: 808 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 230 msec
OK
e.empno e.hiredate      job_china
7499    1981-02-20      销售员
7876    1987-06-13      未知
7934    1982-01-23      未知
7521    1981-02-22      销售员
7654    1981-09-28      销售员
7698    1981-05-01      管理人员
7788    1987-06-13      未知
7369    1980-12-17      未知
7900    1981-12-03      未知
7902    1981-12-03      未知
7566    1981-04-02      管理人员
7782    1981-06-09      管理人员
7839    1981-11-17      未知
7844    1981-09-08      销售员
Time taken: 15.383 seconds, Fetched: 14 row(s)
hive> -- with no ELSE branch, an unmatched CASE returns NULL
hive> SELECT CASE 'c'
    >     WHEN 'a' THEN 1 WHEN 'b' THEN 2 END is_c;
OK
is_c
NULL
Time taken: 0.082 seconds, Fetched: 1 row(s)
hive> 

V. String Functions

Common Hive string functions:

Function | Description | Return type
ascii(string str) | ASCII value of the first character | int
chr(bigint or double A) | character for an ASCII value | string
concat(string or binary A, string or binary B, ...) | concatenate strings | string
concat_ws(string SEP, string A, string B, ...) | concatenate strings with a separator | string
decode(binary bin, string charset) | decode binary data using a character set | string
elt(N int, str1 string, str2 string, str3 string, ...) | the N-th string argument | string
field(val T, val1 T, val2 T, val3 T, ...) | position of val among the following arguments | int
find_in_set(string str, string strList) | position of str in a comma-separated list | int
instr(string str, string substr) | position of substr in str | int
length(string A) | string length | int
locate(string substr, string str[, int pos]) | first position of substr in str at or after position pos | int
lower(string A), lcase(string A) | convert to lowercase | string
lpad(string str, int len, string pad) | pad on the left to length len | string
ltrim(string A) | trim spaces on the left | string
repeat(string str, int n) | repeat str n times | string
replace(string A, string OLD, string NEW) | substring replacement | string
reverse(string A) | reverse a string | string
rpad(string str, int len, string pad) | pad on the right to length len | string
rtrim(string A) | trim spaces on the right | string
space(int n) | string of n spaces | string
substr(string or binary A, int start, int len), substring(string or binary A, int start, int len) | substring extraction | string
translate(string or char or varchar input, string or char or varchar from, string or char or varchar to) | per-character replacement, as in PostgreSQL's translate | string
trim(string A) | trim spaces on both ends | string
upper(string A), ucase(string A) | convert to uppercase | string

5.1 ASCII functions

ascii(string str) returns the ASCII value of a character.
chr(bigint or double A) returns the character for an ASCII value.

The two functions are exact inverses of each other.

Test transcript:

hive> 
    > select ascii('a') ascii_a,ascii('A') ascii_a;
OK
ascii_a ascii_a
97      65
Time taken: 0.104 seconds, Fetched: 1 row(s)
hive> 
    > select chr(97) chr_97,chr(65) chr_65;
OK
chr_97  chr_65
a       A
Time taken: 0.075 seconds, Fetched: 1 row(s)

5.2 Concatenation functions

concat(string or binary A, string or binary B, ...) concatenates strings.
concat_ws(string SEP, string A, string B, ...) concatenates strings with a separator.

Test transcript:

hive> 
    > 
    > select concat('A','b','DEF');
OK
_c0
AbDEF
Time taken: 0.214 seconds, Fetched: 1 row(s)
hive> 
    > select concat_ws('-','A','b','DEF');
OK
_c0
A-b-DEF
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive> 

5.3 Case-conversion functions

lower(string A), lcase(string A) convert a string to lowercase.
upper(string A), ucase(string A) convert a string to uppercase.

Test transcript:

hive> 
    > select lower('AbCdE'),upper('AbCdE');
OK
_c0     _c1
abcde   ABCDE
Time taken: 0.085 seconds, Fetched: 1 row(s)

5.4 Padding functions

lpad(string str, int len, string pad) pads on the left.
rpad(string str, int len, string pad) pads on the right.
As the transcript shows, when len is shorter than the string, the string is truncated to len.

Test transcript:

hive> 
    > SELECT LPAD('hi',4,'??');
OK
_c0
??hi
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> SELECT LPAD('hi',1,'??');
OK
_c0
h
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive> SELECT RPAD('hi',4,'??');
OK
_c0
hi??
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> SELECT RPAD('hi',1,'??');
OK
_c0
h
Time taken: 0.08 seconds, Fetched: 1 row(s)
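Two details from the transcript are worth capturing: the pad string is repeated and cut to fit, and a target length shorter than the input truncates it. A Python sketch of both (illustrative names):

```python
def lpad(s, n, pad):
    """Left-pad s with pad to length n; truncate s when n < len(s)."""
    if n <= len(s):
        return s[:n]
    need = n - len(s)
    return (pad * need)[:need] + s

def rpad(s, n, pad):
    """Right-pad s with pad to length n; truncate s when n < len(s)."""
    if n <= len(s):
        return s[:n]
    need = n - len(s)
    return s + (pad * need)[:need]

print(lpad('hi', 4, '??'), lpad('hi', 1, '??'), rpad('hi', 4, '??'))
```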

5.5 Whitespace functions

LTRIM() removes leading spaces
TRIM() removes leading and trailing spaces
RTRIM() removes trailing spaces
SPACE() returns a string of the given number of spaces

To remove spaces in the middle of a string, use replace.

Code:

-- trim spaces on the left, on both ends, and on the right
select ltrim(' abc def '),trim(' abc def '),rtrim(' abc def ');
-- return 10 spaces
select SPACE(10);

Test transcript:

hive> select ltrim(' abc def '),trim(' abc def '),rtrim(' abc def ');
OK
_c0     _c1     _c2
abc def         abc def  abc def
Time taken: 0.077 seconds, Fetched: 1 row(s)
hive> select SPACE(10);
OK
_c0
          
Time taken: 0.062 seconds, Fetched: 1 row(s)

5.6 String-length function

length(string A) returns the length of a string.

Test transcript:

hive> 
    > select length('aaaaaabbbbcccc');
OK
_c0
14
Time taken: 0.081 seconds, Fetched: 1 row(s)

5.7 Substring functions

SUBSTR() extracts a substring
SUBSTRING_INDEX() extracts a substring around a delimiter
INSTR() returns the position of a substring

Syntax:
substr(string or binary A, int start, int len)
substring(string or binary A, int start, int len)
INSTR(str,substr)
locate(string substr, string str[, int pos])

Code:

-- take from position 5; with no len, take through the end
select substr('abcdefghi',5);

-- take 2 characters starting at position 5
select substr('abcdefghi',5,2);

-- take 2 characters starting at position -5 (counted from the end)
select substr('abcdefghi',-5,2);

-- keep what is left of the first '.'
select substring_index('www.mysql.com','.',1);

-- keep what is right of the last '.'
select substring_index('www.mysql.com','.',-1);

-- position of '.'
select instr('www.mysql.com','.');

-- first position of '.' at or after character 6
select locate('.','hive.apache.org',6);

Test transcript:

hive> 
    > 
    > select substr('abcdefghi',5);
OK
_c0
efghi
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive> select substr('abcdefghi',5,2);
OK
_c0
ef
Time taken: 0.066 seconds, Fetched: 1 row(s)
hive> select substr('abcdefghi',-5,2);
OK
_c0
ef
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> select substring_index('www.mysql.com','.',1);
OK
_c0
www
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select substring_index('www.mysql.com','.',-1);
OK
_c0
com
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select instr('www.mysql.com','.');
OK
_c0
4
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select locate('.','hive.apache.org',6);
OK
_c0
12
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> 
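The main traps here are the 1-based indexing and the negative-start convention, both of which differ from Python slicing. A sketch mapping Hive's conventions onto Python strings (illustrative names, not Hive APIs):

```python
def substr(s, start, length=None):
    """1-based substring; a negative start counts from the end, as in Hive."""
    idx = len(s) + start if start < 0 else start - 1
    out = s[idx:]
    return out if length is None else out[:length]

def instr(s, sub):
    """1-based position of the first occurrence of sub; 0 when absent."""
    return s.find(sub) + 1

def locate(sub, s, pos=1):
    """1-based position of sub at or after position pos; 0 when absent."""
    return s.find(sub, pos - 1) + 1

print(substr('abcdefghi', 5), substr('abcdefghi', -5, 2))
print(instr('www.mysql.com', '.'), locate('.', 'hive.apache.org', 6))
```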

5.8 Replacement functions

replace(string A, string OLD, string NEW) replaces occurrences of a substring.
translate(string or char or varchar input, string or char or varchar from, string or char or varchar to) behaves like PostgreSQL's translate function.

replace treats OLD as a single unit to replace.
translate splits from into individual characters and maps each one to the character at the same position in to.

translate is the more powerful of the two:
translate('123abc','2dc','4e') maps 2->4 and d->e, and deletes c (which has no counterpart in to)
translate('123abc','2','4e') maps 2->4 only; the extra 'e' in to is ignored

Test transcript:

hive> 
    > SELECT REPLACE('acdd','cd','ef');
OK
_c0
aefd
Time taken: 0.093 seconds, Fetched: 1 row(s)
hive> SELECT TRANSLATE('acdd','cd','ef');
OK
_c0
aeff
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive> select translate('123abc','2dc','4e');
OK
_c0
143ab
Time taken: 0.073 seconds, Fetched: 1 row(s)
hive> select translate('123abc','2','4e');
OK
_c0
143abc
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> 
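Python's str.translate works the same way once the per-character table is built; mapping a character to None deletes it, which reproduces the "no counterpart in to" case from the transcript (hive_translate is an illustrative name):

```python
def hive_translate(s, frm, to):
    """Per-character replacement: frm[i] maps to to[i]; characters of frm
    past the end of to are deleted. For duplicate characters in frm,
    the first mapping wins."""
    table = {}
    for i, c in enumerate(frm):
        if ord(c) not in table:  # keep the first mapping for duplicates
            table[ord(c)] = to[i] if i < len(to) else None
    return s.translate(table)

print(hive_translate('acdd', 'cd', 'ef'))     # per-character: aeff
print('acdd'.replace('cd', 'ef'))             # whole substring: aefd
```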

5.9 Other functions

decode(binary bin, string charset) decodes binary data using a character set
elt(N int, str1 string, str2 string, str3 string, ...) returns the N-th string argument
field(val T, val1 T, val2 T, val3 T, ...) returns the position of val among the following arguments
repeat(string str, int n) repeats str n times
reverse(string A) reverses a string

Developers coming from Oracle will find decode familiar, but read Hive's syntax first: the two are entirely different functions -- Hive's decode converts character sets.

Test transcript (the first query below uses reflect to call java.net.URLDecoder rather than Hive's decode):

hive> 
    > select reflect('java.net.URLDecoder', 'decode','1' , "GBK");
Query ID = root_20201209172130_cafe84ad-3e66-4e4a-92a0-3868efd52957
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0145, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0145/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0145
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-09 17:21:39,319 Stage-1 map = 0%,  reduce = 0%
2020-12-09 17:21:45,570 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.88 sec
MapReduce Total cumulative CPU time: 2 seconds 880 msec
Ended Job = job_1606698967173_0145
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.88 sec   HDFS Read: 4726 HDFS Write: 101 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 880 msec
OK
_c0
1
Time taken: 15.797 seconds, Fetched: 1 row(s)
hive> 
    > select elt(3,'abc','aaa','def','test');
OK
_c0
def
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive> 
    > select field('ab','abc','ba','ab','dab');
OK
_c0
3
hive> 
    > 
    > select repeat('abc',10);
OK
_c0
abcabcabcabcabcabcabcabcabcabc
Time taken: 0.085 seconds, Fetched: 1 row(s)
hive> 
    > select reverse('abc');
OK
_c0
cba
Time taken: 0.078 seconds, Fetched: 1 row(s)
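elt and field are small lookup helpers; a sketch of their 1-based semantics, with repeat and reverse shown via Python's native operators (function names are illustrative, and the out-of-range/absent behavior follows the usual MySQL-style convention, which is an assumption here):

```python
def elt(n, *strs):
    """Return the n-th argument (1-based); None when n is out of range."""
    return strs[n - 1] if 1 <= n <= len(strs) else None

def field(val, *vals):
    """1-based position of val among the arguments; 0 when absent."""
    return vals.index(val) + 1 if val in vals else 0

print(elt(3, 'abc', 'aaa', 'def', 'test'))
print(field('ab', 'abc', 'ba', 'ab', 'dab'))
print('abc' * 3)      # repeat
print('abc'[::-1])    # reverse
```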

VI. UDTF

UDTFs are table-generating functions: a single input row can produce multiple output rows.

This is where Hive differs most visibly from relational databases; the examples below illustrate it.

Code:

-- explode (array)
select explode(array('A','B','C'));
select explode(array('A','B','C')) as col;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
-- explode (map)
select explode(map('A',10,'B',20,'C',30));
select explode(map('A',10,'B',20,'C',30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;

Test transcript:

hive> 
    > 
    > 
    > select explode(array('A','B','C'));
Query ID = root_20201210161223_3527486d-76e5-4306-b3aa-8521f89774d9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0150, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0150/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0150
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:12:30,620 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:12:37,839 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.92 sec
MapReduce Total cumulative CPU time: 2 seconds 920 msec
Ended Job = job_1606698967173_0150
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.92 sec   HDFS Read: 5007 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 920 msec
OK
col
A
B
C
Time taken: 15.401 seconds, Fetched: 3 row(s)
hive> select explode(array('A','B','C')) as col;
Query ID = root_20201210161249_acc85662-e686-4176-a5e8-032ed0992a45
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0151, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0151/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0151
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:12:56,243 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:13:02,434 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.8 sec
MapReduce Total cumulative CPU time: 2 seconds 800 msec
Ended Job = job_1606698967173_0151
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.8 sec   HDFS Read: 5007 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 800 msec
OK
col
A
B
C
Time taken: 14.35 seconds, Fetched: 3 row(s)
hive> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
Query ID = root_20201210161305_5012fe10-4599-4ffe-838f-b867d212a10a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0152, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0152/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0152
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:13:14,529 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:13:20,716 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.86 sec
MapReduce Total cumulative CPU time: 2 seconds 860 msec
Ended Job = job_1606698967173_0152
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.86 sec   HDFS Read: 5764 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 860 msec
OK
tf.col
A
B
C
Time taken: 16.663 seconds, Fetched: 3 row(s)
hive> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
Query ID = root_20201210161324_8f1bcaa5-931d-49c1-8dbd-5da936bfa67f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0153, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0153/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0153
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:13:32,968 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:13:40,179 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.99 sec
MapReduce Total cumulative CPU time: 2 seconds 990 msec
Ended Job = job_1606698967173_0153
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.99 sec   HDFS Read: 5778 HDFS Write: 129 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 990 msec
OK
tf.col
A
B
C
Time taken: 17.102 seconds, Fetched: 3 row(s)
hive> select explode(map('A',10,'B',20,'C',30));
Query ID = root_20201210161354_1c522046-cac8-4182-9e28-9789a17c1a98
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0154, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0154/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0154
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:14:00,978 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:14:08,199 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.96 sec
MapReduce Total cumulative CPU time: 2 seconds 960 msec
Ended Job = job_1606698967173_0154
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.96 sec   HDFS Read: 5144 HDFS Write: 138 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 960 msec
OK
key     value
A       10
B       20
C       30
Time taken: 15.114 seconds, Fetched: 3 row(s)
hive> select explode(map('A',10,'B',20,'C',30)) as (key,value);
Query ID = root_20201210161411_59b615ac-2733-40c2-bec4-1a17f593140f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0155, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0155/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0155
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:14:19,606 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:14:25,805 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1606698967173_0155
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.95 sec   HDFS Read: 5144 HDFS Write: 138 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 950 msec
OK
key     value
A       10
B       20
C       30
Time taken: 15.767 seconds, Fetched: 3 row(s)
hive> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
Query ID = root_20201210161428_5b45482b-39c0-4a6f-9e67-ec640e03d816
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0156, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0156/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0156
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-10 16:14:38,792 Stage-1 map = 0%,  reduce = 0%
2020-12-10 16:14:46,098 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.92 sec
MapReduce Total cumulative CPU time: 2 seconds 920 msec
Ended Job = job_1606698967173_0156
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.92 sec   HDFS Read: 5987 HDFS Write: 138 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 920 msec
OK
tf.key  tf.value
A       10
B       20
C       30
Time taken: 18.456 seconds, Fetched: 3 row(s)
hive> 
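The transcripts above can be condensed into a short reference. Two related features not shown in the runs above, but worth knowing, are `posexplode` (which also emits each element's position) and `LATERAL VIEW OUTER` (which keeps rows whose collection is empty or NULL); the derived table `(select 0) t` is just a one-row placeholder, as in the examples above:

```sql
-- Array: one output row per element; alias the generated column with AS.
SELECT tf.col
FROM (SELECT 0) t LATERAL VIEW explode(array('A','B','C')) tf AS col;

-- Map: explode produces two generated columns, aliased here as k and v.
SELECT tf.k, tf.v
FROM (SELECT 0) t LATERAL VIEW explode(map('A',10,'B',20,'C',30)) tf AS k, v;

-- posexplode additionally emits the 0-based position of each element.
SELECT tf.pos, tf.val
FROM (SELECT 0) t LATERAL VIEW posexplode(array('A','B','C')) tf AS pos, val;

-- LATERAL VIEW OUTER keeps the source row (with NULL) even when the array is empty.
SELECT tf.col
FROM (SELECT 0) t LATERAL VIEW OUTER explode(array()) tf AS col;
```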

References

1.https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-MathematicalFunctions
