[Hive] Buildin Functions in Hive
2021-09-29
Introduction
时间函数、字符串函数、数学计算函数。
Content
文章目录
Usage
Functions in Hive are categorized as below.
-
Numeric and Mathematical Functions: These functions mainly used to perform mathematical calculations.
-
Date Functions: These functions are used to perform operations on date data types like adding the number of days to the date etc.
-
String Functions: These functions are used to perform operations on strings like finding the length of a string etc.
-
Conditional Functions: These functions are used to test conditions and returns a value based on whether the test condition is true or false.
-
Collection Functions: These functions are used to find the size of the complex types like array and map. The only collection function is
SIZE
. TheSIZE
function is used to find the number of elements in an array and map. The syntax of SIZE function isSIZE(Array<A>) SIZE(MAP<key,value>)
select size(array('1','2','3')) return 3
-
Type Conversion Function: This function is used to convert from one data type to another. The only type conversion function is
CAST
. The syntax of CAST isCAST(expr as <type>)
The
CAST
function converts the expr into the specified type. -
Table Generating Functions: These functions transform a single row into multiple rows.
EXPLODE
is the only table generated function. This function takes array as an input and outputs the elements of array into separate rows. The syntax of EXPLODE isEXPLODE(ARRAY<A>)
select EXPLODE(array('a', 'b', 'c'))
When you use the table generating functions in the SELECT clause, you cannot specify any other columns in the SELECT clause.
Date Functions
Date data types do not exist in Hive. In fact the dates are treated as strings in Hive. The date functions are listed below.
UNIX_TIMESTAMP()
UNIX_TIMESTAMP()
This function returns the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) using the default time zone.
UNIX_TIMESTAMP()
UNIX_TIMESTAMP(string date)
This function converts the date in format ‘yyyy-MM-dd HH:mm:ss’ into Unix timestamp. This will return the number of seconds between the specified date and the Unix epoch. If it fails, then it returns 0.
Example: UNIX_TIMESTAMP('2000-01-01 00:00:00') returns 946713600
UNIX_TIMESTAMP()
UNIX_TIMESTAMP(string date, string pattern)
This function converts the date to the specified date format and returns the number of seconds between the specified date and Unix epoch. If it fails, then it returns 0.
Example: UNIX_TIMESTAMP('2000-01-01 10:20:30','yyyy-MM-dd')
returns 946713600
FROM_UNIXTIME()
FROM_UNIXTIME(bigint number_of_seconds [, string format])
The FROM_UNIX
function converts the specified number of seconds from Unix epoch and returns the date in the format ‘yyyy-MM-dd HH:mm:ss’.
Example: FROM_UNIXTIME(UNIX_TIMESTAMP())
returns the current date including the time. This is equivalent to the SYSDATE in oracle.
SELECT CURRENT_DATE; --/Selecting Current Date/ return 2021-09-30
SELECT CURRENT_TIMESTAMP; --/Selecting Current Time stamp/ return 2021-09-30 16:11:23.138
TO_DATE()
TO_DATE(string timestamp)
The TO_DATE
function returns the date part of the timestamp in the format ‘yyyy-MM-dd’.
Example: TO_DATE('2000-01-01 10:20:30') returns '2000-01-01'
combination clause
select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(), 'yyyy-MM-dd'), 1), CURRENT_DATE(), CURRENT_timestamp(), TO_DATE(CURRENT_timestamp())
return
YEAR()
YEAR(string date)
The YEAR
function returns the year part of the date.
Example: YEAR('2000-01-01 10:20:30') returns 2000
MONTH()
MONTH(string date)
The MONTH
function returns the month part of the date.
Example: YEAR('2000-03-01 10:20:30') returns 3
DAY()
DAY( string date ), DAYOFMONTH( date )
The DAY
or DAYOFMONTH
function returns the day part of the date.
Example: DAY('2000-03-01 10:20:30') returns 1
HOUR()
HOUR(string date)
The HOUR
function returns the hour part of the date.
Example: HOUR('2000-03-01 10:20:30') returns 10
MINUTE()
MINUTE(string date)
The MINUTE
function returns the minute part of the timestamp.
Example: MINUTE('2000-03-01 10:20:30') returns 20
SECOND()
SECOND(string date)
The SECOND
function returns the second part of the timestamp.
Example: SECOND('2000-03-01 10:20:30') returns 30
WEEKOFYEAR()
WEEKOFYEAR(string date)
The WEEKOFYEAR
function returns the week number of the date.
Example: WEEKOFYEAR('2020-08-04 22:33:30') returns 32
DATEDIFF()
DATEDIFF(string date1, string date2)
The DATEDIFF
function returns the number of days between the two given dates.
Example: DATEDIFF('2000-03-01', '2000-01-10') returns 51
DATE_ADD()
DATE_ADD(string date, int days)
The DATE_ADD
function adds the number of days to the specified date
Example: DATE_ADD('2000-03-01', 5) returns '2000-03-06'
DATE_SUB()
DATE_SUB(string date, int days)
The DATE_SUB function subtracts the number of days to the specified date
Example: DATE_SUB('2000-03-01', 5) returns ‘2000-02-25’
String Functions
ASCII()
ASCII(string str)
The ASCII
function converts the first character of the string into its numeric ascii value.
Example1: ASCII('abc') returns 97
Example2: ASCII('A') returns 65
CONCAT()
CONCAT(string str1, string str2...)
The CONCAT
function concatenates all the stings.
Example: CONCAT('hadoop','-','hive') returns 'hadoop-hive'
CONCAT_WS()
CONCAT_WS(string delimiter, string str1, string str2...)
The CONCAT_WS
function is similar to the CONCAT
function. Here you can also provide the delimiter, which can be used in between the strings to concat.
Example: CONCAT_WS('-','hadoop','hive','i','love','u') returns 'hadoop-hive-i-love-u'
批注:注意这个用法。先将appid放入list集合中,然后再按照指定分隔符转换为字符串。
Example: concat_ws('|' ,collect_list(cast(type_id as string))) as second_types,
FIND_IN_SET()
FIND_IN_SET(string search_string, string source_string_list )
The FIND_IN_SET
function searches for the search string in the source_string_list
and returns the position of the first occurrence in the source string list. Here the source string list should be comma delimited one. It returns 0 if the first argument contains comma.
Example: FIND_IN_SET('ha','hao,mn,hc,ha,hef') returns 4
LENGTH()
LENGTH(string str)
The LENGTH
function returns the number of characters in a string.
Example: LENGTH('hive') returns 4
LOWER(), LCASE()
LOWER(string str)
LCASE(string str)
The LOWER
or LCASE
function converts the string into lower case letters.
Example: LOWER('HiVe') returns 'hive'
UPPER(), UCASE()
UPPER(string str)
UCASE(string str)
The UPPER
or UCASE
function converts the string into upper case letters.
Example: UPPER('HiVe') returns 'HIVE'
LPAD()
LPAD(string str, int len, string pad)
The LPAD function returns the string with a length of len characters left-padded with pad.
Example: LPAD('hive',6,'v') returns 'vvhive'
RPAD()
RPAD(string str, int len, string pad)
The RPAD
function returns the string with a length of len characters right-padded with pad.
Example: RPAD('hive',6,'v') returns 'hivevv'
LTRIM()
LTRIM(string str )
The LTRIM
function removes all the leading spaces from the string.
Example: LTRIM(' hive') returns 'hive'
RTRIM()
RTRIM(string str)
The RTRIM
function removes all the tailing spaces from the string.
Example: LTRIM('hive ') returns 'hive'
REPEAT()
REPEAT(string str, int n )
The REPEAT
function repeats the specified string n times.
Example: REPEAT('hive', 2) returns 'hivehive'
REVERSE()
REVERSE(string str)
The REVERSE
function gives the reversed string
Example: REVERSE('hive') returns 'evih'
SPACE()
SPACE(int number_of_spaces)
The SPACE
function returns the specified number of spaces.
SPACE(4) returns ' '
SPLIT()
SPLIT(string str, string pat)
The SPLIT
function splits the string around the pattern pat and returns an array of strings. You can specify regular expressions as patterns.
Example: SPLIT('hive:hadoop', ':') returns ["hive","hadoop"]
SUBSTR()
SUBSTR(string source_str, int start_position [,int length] )
SUBSTRING(string source_str, int start_position [,int length])
The SUBSTR
or SUBSTRING
function returns a part of the source string from the start position with the specified length of characters. If the length is not given, then it returns from the start position to the end of the string.
Example1: SUBSTR('hadoop',4) returns 'oop'
Example2: SUBSTR('hadoop',4,2) returns 'oo'
TRIM()
TRIM( string str )
The TRIM
function removes both the tailing and leading spaces from the string.
Example: TRIM(' hive ') returns 'hive'
Numeric and Mathematical Functions
ABS()
ABS(double n)
The ABS
function returns the absolute value of a number.
Example: ABS(-100)
ACOS()
ACOS(double n )
The ACOS function returns the arc cosine of value n. This function returns Null
if the value n is not in the range of -1<=n<=1
.
Example: ACOS(0.5)
ASIN()
ASIN(double n)
The ASIN
function returns the arc sin of value n. This function returns Null
if the value n is not in the range of -1<=n<=1
.
Example: ASIN(0.5)
BIN()
BIN(bigint n)
The BIN
function returns the number n in the binary format.
Example: BIN(100) return 1100100
CEIL() CEILING()
CEIL(double n),
CEILING(double n)
The CEILING
or CEILING
function returns the smallest integer greater than or equal to the decimal value n.
Example: CEIL(9.5) return 10
FLOOR()
FLOOR(double n)
The FLOOR
function returns the largest integer less than or equal to the given value n.
Example: FLOOR(10.9) return 10
CONV()
CONV(bigint n, int from_base, int to_base)
The CONV function converts the given number n from one base to another base. (进制转换)
EXAMPLE: CONV(4, 10, 2) return 100
COS()
COS(double n)
The COS
function returns the cosine of the value n. Here n should be specified in radians.
Example: COS(180*3.1415926/180)
return -0.9999
HEX()
HEX(bigint n)
This HEX
function converts the value n into hexadecimal format.
Example: HEX(16)
return 10
HEX()
HEX(string n)
This function converts each character into hex representation format.
Example: HEX(‘ABC’)
UNHEX()
UNHEX(string n)
The UNHEX
function is the inverse of HEX function. It converts the specified string to the number format.
Example: UNHEX(‘AB’)
LN()
LN(double n)
The LN
function returns the natural log of a number.
Example: LN(123.45)
LOG()
LOG(double base, double n)
The LOG
function returns the base logarithm of the number n.
Example: LOG(2, 128)
return 7.0
LOG2()
LOG2(double n)
The LOG2
function returns the base-2 logarithm of the number n.
Example: LOG2(44)
LOG10()
LOG10(double n)
The LOG10
function returns the base-10 logarithm of the number n.
Example: LOG10(100)
NEGATIVE()
NEGATIVE(int n),
NEGATIVE(double n)
The NEGATIVE
function returns –n
Example: NEGATIVE(10)
POSITIVE()
POSITIVE(int n),
POSITIVE(double n)
The POSITIVE
function returns n
Example: POSITIVE(-10)
PMOD()
PMOD(int m, int n),
PMOD(double m, double n)
The PMOD
function returns the positive modulus of a number.
Example: PMOD(3,2) return 1
EXP()
EXP(double n)
The EXP
function returns e to the power of n. Where e
is the base of natural logarithm and its value is 2.718.
Example: EXP(50)
POW()
POW(double m, double n),
POWER(double m, double n)
The POW
or POWER
function returns m value raised to the n power.
Example: POW(10,2) return 100
SQRT()
SQRT(double n)
The SQRT
function returns the square root of the number
Example: SQRT(4)
RAND()
RAND([int seed])
The RAND
function returns a random number. If you specify the seed
value, the generated random number will become deterministic.
Example: RAND()
ROUND()
ROUND(double value [, int n])
The ROUND function returns the value rounded to n integer places.
Example: ROUND(123.456,2) return 123.46
SIN()
SIN(double n)
The SIN
function returns the sin of a number. Here n should be specified in radians.
Example: SIN(2)