HIVE全部函数详解

最新推荐文章于 2022-10-17 22:27:21 发布

Yang.Young

最新推荐文章于 2022-10-17 22:27:21 发布

阅读量1.3k

点赞数

分类专栏：数据

数据专栏收录该内容

8 篇文章 0 订阅

订阅专栏

转载From ：http://blog.csdn.net/liuzhoulong/article/details/77969224

base on hive 2.3.0

函数名	含义及实例
abs	abs(x) - returns the absolute value of x Example: > SELECT abs(0) FROM src LIMIT 1; 0 > SELECT abs(-5) FROM src LIMIT 1; 5
acos	acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise Example: > SELECT acos(1) FROM src LIMIT 1; 0 > SELECT acos(2) FROM src LIMIT 1;
add_months	add_months(start_date, num_months) - Returns the date that is num_months after start_date. start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. num_months is a number. The time part of start_date is ignored. Example: > SELECT add_months('2009-08-31', 1) FROM src LIMIT 1; '2009-09-30'
aes_decrypt	aes_decrypt(input binary, key string/binary) - Decrypt input using AES. AES (Advanced Encryption Standard) algorithm. Key lengths of 128, 192 or 256 bits can be used. 192 and 256 bits keys can be used if Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. If either argument is NULL or the key length is not one of the permitted values, the return value is NULL. Example: > SELECT aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='), '1234567890123456'); 'ABC'
aes_encrypt	aes_encrypt(input string/binary, key string/binary) - Encrypt input using AES. AES (Advanced Encryption Standard) algorithm. Key lengths of 128, 192 or 256 bits can be used. 192 and 256 bits keys can be used if Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. If either argument is NULL or the key length is not one of the permitted values, the return value is NULL. Example: > SELECT base64(aes_encrypt('ABC', '1234567890123456')); 'y6Ss+zCYObpCbgfWfyNWTw=='
and	a1 and a2 and ... and an - Logical and
array	array(n0, n1...) - Creates an array with the given elements
array_contains	array_contains(array, value) - Returns TRUE if the array contains value. Example: > SELECT array_contains(array(1, 2, 3), 2) FROM src LIMIT 1; true
ascii	ascii(str) - returns the numeric value of the first character of str Returns 0 if str is empty or NULL if str is NULL Example: > SELECT ascii('222') FROM src LIMIT 1; 50 > SELECT ascii(2) FROM src LIMIT 1; 50
asin	asin(x) - returns the arc sine of x if -1<=x<=1 or NULL otherwise Example: > SELECT asin(0) FROM src LIMIT 1; 0 > SELECT asin(2) FROM src LIMIT 1; NULL
assert_true	assert_true(condition) - Throw an exception if 'condition' is not true. Example: > SELECT assert_true(x >= 0) FROM src LIMIT 1; NULL
atan	atan(x) - returns the atan (arctan) of x (x is in radians) Example: > SELECT atan(0) FROM src LIMIT 1; 0
avg	avg(x) - Returns the mean of a set of numbers
base64	base64(bin) - Convert the argument from binary to a base 64 string
between	between a [NOT] BETWEEN b AND c - evaluate if a is [not] in between b and c
bin	bin(n) - returns n in binary n is a BIGINT. Returns NULL if n is NULL. Example: > SELECT bin(13) FROM src LIMIT 1 '1101'
bloom_filter	There is no documentation for function 'bloom_filter'
bround	bround(x[, d]) - round x to d decimal places using HALF_EVEN rounding mode. Banker's rounding. The value is rounded to the nearest even number. Also known as Gaussian rounding. Example: > SELECT bround(12.25, 1); 12.2
cardinality_violation	cardinality_violation(n0, n1...) - raises Cardinality Violation
case	CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f Example: SELECT CASE deptno WHEN 1 THEN Engineering WHEN 2 THEN Finance ELSE admin END, CASE zone WHEN 7 THEN Americas ELSE Asia-Pac END FROM emp_details
cbrt	cbrt(double) - Returns the cube root of a double value. Example: > SELECT cbrt(27.0); 3.0
ceil	ceil(x) - Find the smallest integer not smaller than x Synonyms: ceiling Example: > SELECT ceil(-0.1) FROM src LIMIT 1; 0 > SELECT ceil(5) FROM src LIMIT 1; 5
ceiling	ceiling(x) - Find the smallest integer not smaller than x Synonyms: ceil Example: > SELECT ceiling(-0.1) FROM src LIMIT 1; 0 > SELECT ceiling(5) FROM src LIMIT 1; 5
char_length	char_length(str \| binary) - Returns the number of characters in str or binary data Synonyms: character_length Example: > SELECT char_length('HUX81') FROM src LIMIT 1; 5
character_length	character_length(str \| binary) - Returns the number of characters in str or binary data Synonyms: char_length Example: > SELECT character_length('HUX81') FROM src LIMIT 1; 5
chr	chr(str) - convert n where n : [0, 256) into the ascii equivalent as a varchar.If n is less than 0 return the empty string. If n > 256, return chr(n % 256). Example: > SELECT chr('48') FROM src LIMIT 1; '0' > SELECT chr('65') FROM src LIMIT 1; 'A'
coalesce	coalesce(a1, a2, ...) - Returns the first non-null argument Example: > SELECT coalesce(NULL, 1, NULL) FROM src LIMIT 1; 1
collect_list	collect_list(x) - Returns a list of objects with duplicates
collect_set	collect_set(x) - Returns a set of objects with duplicate elements eliminated
compute_stats	compute_stats(x) - Returns the statistical summary of a set of primitive type values.
concat	concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data bin1, bin2, ... binN Returns NULL if any argument is NULL. Example: > SELECT concat('abc', 'def') FROM src LIMIT 1; 'abcdef'
concat_ws	concat_ws(separator, [string \| array(string)]+) - returns the concatenation of the strings separated by the separator. Example: > SELECT concat_ws('.', 'www', array('facebook', 'com')) FROM src LIMIT 1; 'www.facebook.com'
context_ngrams	context_ngrams(expr, array<string1, string2, ...>, k, pf) estimates the top-k most frequent n-grams that fit into the specified context. The second parameter specifies a string of words that specify the positions of the n-gram elements, with a null value standing in for a 'blank' that must be filled by an n-gram element. The primary expression must be an array of strings, or an array of arrays of strings, such as the return type of the sentences() UDF. The second parameter specifies the context -- for example, array("i", "love", null) -- which would estimate the top 'k' words that follow the phrase "i love" in the primary expression. The optional fourth parameter 'pf' controls the memory used by the heuristic. Larger values will yield better accuracy, but use more memory. Example usage: SELECT context_ngrams(sentences(lower(review)), array("i", "love", null, null), 10) FROM movies would attempt to determine the 10 most common two-word phrases that follow "i love" in a database of free-form natural language movie reviews.
conv	conv(num, from_base, to_base) - convert num from from_base to to_base If to_base is negative, treat num as a signed integer,otherwise, treat it as an unsigned integer. Example: > SELECT conv('100', 2, 10) FROM src LIMIT 1; '4' > SELECT conv(-10, 16, -10) FROM src LIMIT 1; '16'
corr	corr(y,x) - Returns the Pearson coefficient of correlation between a set of number pairs The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. If NSUM(xx) = SUM(x)SUM(x): NULL is returned. If NSUM(yy) = SUM(y)SUM(y): NULL is returned. Otherwise, it computes the following: COVAR_POP(x,y)/(STDDEV_POP(x)*STDDEV_POP(y)) where neither x nor y is null, COVAR_POP is the population covariance, and STDDEV_POP is the population standard deviation.
cos	cos(x) - returns the cosine of x (x is in radians) Example: > SELECT cos(0) FROM src LIMIT 1; 1
count	count(*) - Returns the total number of retrieved rows, including rows containing NULL values. count(expr) - Returns the number of rows for which the supplied expression is non-NULL. count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
covar_pop	covar_pop(x,y) - Returns the population covariance of a set of number pairs The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If the function is applied to an empty set, NULL will be returned. Otherwise, it computes the following: (SUM(xy)-SUM(x)SUM(y)/COUNT(x,y))/COUNT(x,y) where neither x nor y is null.
covar_samp	covar_samp(x,y) - Returns the sample covariance of a set of number pairs The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. If applied to a set with a single element: NULL is returned. Otherwise, it computes the following: (SUM(xy)-SUM(x)SUM(y)/COUNT(x,y))/(COUNT(x,y)-1) where neither x nor y is null. 1
crc32	crc32(str or bin) - Computes a cyclic redundancy check value for string or binary argument and returns bigint value. Example: > SELECT crc32('ABC'); 2743272264 > SELECT crc32(binary('ABC')); 2743272264
create_union	create_union(tag, obj1, obj2, obj3, ...) - Creates a union with the object for given tag Example: > SELECT create_union(1, 1, "one") FROM src LIMIT 1; one
cume_dist	There is no documentation for function 'cume_dist'
current_database	current_database() - returns currently using database name
current_date	current_date() - Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value.
current_timestamp	current_timestamp() - Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value
current_user	current_user() - Returns current user name
date_add	date_add(start_date, num_days) - Returns the date that is num_days after start_date. start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. num_days is a number. The time part of start_date is ignored. Example: > SELECT date_add('2009-07-30', 1) FROM src LIMIT 1; '2009-07-31'
date_format	date_format(date/timestamp/string, fmt) - converts a date/timestamp/string to a value of string in the format specified by the date format fmt. Supported formats are SimpleDateFormat formats - https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. Second argument fmt should be constant. Example: > SELECT date_format('2015-04-08', 'y'); '2015'
date_sub	date_sub(start_date, num_days) - Returns the date that is num_days before start_date. start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. num_days is a number. The time part of start_date is ignored. Example: > SELECT date_sub('2009-07-30', 1) FROM src LIMIT 1; '2009-07-29'
datediff	datediff(date1, date2) - Returns the number of days between date1 and date2 date1 and date2 are strings in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time parts are ignored.If date1 is earlier than date2, the result is negative. Example: > SELECT datediff('2009-07-30', '2009-07-31') FROM src LIMIT 1; 1
day	day(param) - Returns the day of the month of date/timestamp, or day component of interval Synonyms: dayofmonth param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. 2. A date value 3. A timestamp value 4. A day-time interval valueExample: > SELECT day('2009-07-30') FROM src LIMIT 1; 30
dayofmonth	dayofmonth(param) - Returns the day of the month of date/timestamp, or day component of interval Synonyms: day param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. 2. A date value 3. A timestamp value 4. A day-time interval valueExample: > SELECT dayofmonth('2009-07-30') FROM src LIMIT 1; 30
dayofweek	dayofweek(param) - Returns the day of the week of date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday) param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. 2. A date value 3. A timestamp valueExample: > SELECT dayofweek('2009-07-30') FROM src LIMIT 1; 5
decode	decode(bin, str) - Decode the first argument using the second argument character set Possible options for the character set are 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', and 'UTF-16'. If either argument is null, the result will also be null
degrees	degrees(x) - Converts radians to degrees Example: > SELECT degrees(30) FROM src LIMIT 1; -1
dense_rank	There is no documentation for function 'dense_rank'
div	a div b - Divide a by b rounded to the long integer Example: > SELECT 3 div 2 FROM src LIMIT 1; 1
e	e() - returns E Example: > SELECT e() FROM src LIMIT 1; 2.718281828459045
elt	elt(n, str1, str2, ...) - returns the n-th string Example: > SELECT elt(1, 'face', 'book') FROM src LIMIT 1; 'face'
encode	encode(str, str) - Encode the first argument using the second argument character set Possible options for the character set are 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', and 'UTF-16'. If either argument is null, the result will also be null
ewah_bitmap	ewah_bitmap(expr) - Returns an EWAH-compressed bitmap representation of a column.
ewah_bitmap_and	ewah_bitmap_and(b1, b2) - Return an EWAH-compressed bitmap that is the bitwise AND of two bitmaps.
ewah_bitmap_empty	ewah_bitmap_empty(bitmap) - Predicate that tests whether an EWAH-compressed bitmap is all zeros
ewah_bitmap_or	ewah_bitmap_or(b1, b2) - Return an EWAH-compressed bitmap that is the bitwise OR of two bitmaps.
exp	exp(x) - Returns e to the power of x Example: > SELECT exp(0) FROM src LIMIT 1; 1
explode	explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
extract_union	extract_union(union[, tag]) - Recursively explodes unions into structs or simply extracts the given tag. > SELECT extract_union({0:"foo"}).tag_0 FROM src; foo > SELECT extract_union({0:"foo"}).tag_1 FROM src; null > SELECT extract_union({0:"foo"}, 0) FROM src; foo > SELECT extract_union({0:"foo"}, 1) FROM src; null
factorial	factorial(int) - Returns n factorial. Valid n is [0..20]. Returns null if n is out of [0..20] range. Example: > SELECT factorial(5); 120
field	field(str, str1, str2, ...) - returns the index of str in the str1,str2,... list or 0 if not found All primitive types are supported, arguments are compared using str.equals(x). If str is NULL, the return value is 0.
find_in_set	find_in_set(str,str_array) - Returns the first occurrence of str in str_array where str_array is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument has any commas. Example: > SELECT find_in_set('ab','abc,b,ab,c,def') FROM src LIMIT 1; 3 > SELECT * FROM src1 WHERE NOT find_in_set(key,'311,128,345,956')=0; 311 val_311 128
first_value	There is no documentation for function 'first_value'
floor	floor(x) - Find the largest integer not greater than x Example: > SELECT floor(-0.1) FROM src LIMIT 1; -1 > SELECT floor(5) FROM src LIMIT 1; 5
floor_day	floor_day(param) - Returns the timestamp at a day granularity param needs to be a timestamp value Example: > SELECT floor_day(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-MM-dd 00:00:00
floor_hour	floor_hour(param) - Returns the timestamp at a hour granularity param needs to be a timestamp value Example: > SELECT floor_hour(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-MM-dd HH:00:00
floor_minute	floor_minute(param) - Returns the timestamp at a minute granularity param needs to be a timestamp value Example: > SELECT floor_minute(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-MM-dd HH:mm:00
floor_month	floor_month(param) - Returns the timestamp at a month granularity param needs to be a timestamp value Example: > SELECT floor_month(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-MM-01 00:00:00
floor_quarter	floor_quarter(param) - Returns the timestamp at a quarter granularity param needs to be a timestamp value Example: > SELECT floor_quarter(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-xx-01 00:00:00 5
floor_second	floor_second(param) - Returns the timestamp at a second granularity param needs to be a timestamp value Example: > SELECT floor_second(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-MM-dd HH:mm:ss
floor_week	floor_week(param) - Returns the timestamp at a week granularity param needs to be a timestamp value Example: > SELECT floor_week(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-MM-xx 00:00:00
floor_year	floor_year(param) - Returns the timestamp at a year granularity param needs to be a timestamp value Example: > SELECT floor_year(CAST('yyyy-MM-dd HH:mm:ss' AS TIMESTAMP)) FROM src; yyyy-01-01 00:00:00
format_number	format_number(X, D or F) - Formats the number X to a format like '#,###,###.##', rounded to D decimal places, Or Uses the format specified F to format, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. This is supposed to function like MySQL's FORMAT Example: > SELECT format_number(12332.123456, 4) FROM src LIMIT 1; '12,332.1235' > SELECT format_number(12332.123456, '##################.###') FROM src LIMIT 1; '12332.123'
from_unixtime	from_unixtime(unix_time, format) - returns unix_time in the specified format Example: > SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss') FROM src LIMIT 1; '1970-01-01 00:00:00'
from_utc_timestamp	from_utc_timestamp(timestamp, string timezone) - Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0)
get_json_object	get_json_object(json_txt, path) - Extract a json object from path Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid. A limited version of JSONPath supported: $ : Root object . : Child operator [] : Subscript operator for array * : Wildcard for [] Syntax not supported that's worth noticing: '' : Zero length string as key .. : Recursive descent &#064; : Current object/element () : Script expression ?() : Filter (script) expression. [,] : Union operator [start:end:step] : array slice operator
get_splits	get_splits(string,int) - Returns an array of length int serialized splits for the referenced tables string.
greatest	greatest(v1, v2, ...) - Returns the greatest value in a list of values Example: > SELECT greatest(2, 3, 1) FROM src LIMIT 1; 3
grouping	grouping(a, b) - Indicates whether a specified column expression in is aggregated or not. Returns 1 for aggregated or 0 for not aggregated. a is the grouping id, b is the index we want to extract
hash	hash(a1, a2, ...) - Returns a hash value of the arguments
hex	hex(n, bin, or str) - Convert the argument to hexadecimal If the argument is a string, returns two hex digits for each character in the string. If the argument is a number or binary, returns the hexadecimal representation. Example: > SELECT hex(17) FROM src LIMIT 1; 'H1' > SELECT hex('Facebook') FROM src LIMIT 1; '46616365626F6F6B'
histogram_numeric	histogram_numeric(expr, nb) - Computes a histogram on numeric 'expr' using nb bins. Example: > SELECT histogram_numeric(val, 3) FROM src; [{"x":100,"y":14.0},{"x":200,"y":22.0},{"x":290.5,"y":11.0}] The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximationgets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plusstatistical computing packages.
hour	hour(param) - Returns the hour componemnt of the string/timestamp/interval param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'HH:mm:ss'. 2. A timestamp value 3. A day-time interval valueExample: > SELECT hour('2009-07-30 12:58:59') FROM src LIMIT 1; 12 > SELECT hour('12:58:59') FROM src LIMIT 1; 12
if	IF(expr1,expr2,expr3) - If expr1 is TRUE (expr1 <> 0 and expr1 <> NULL) then IF() returns expr2; otherwise it returns expr3. IF() returns a numeric or string value, depending on the context in which it is used.
in	test in(val1, val2...) - returns true if test equals any valN
in_bloom_filter
in_file	in_file(str, filename) - Returns true if str appears in the file
index	index(a, n) - Returns the n-th element of a
initcap	initcap(str) - Returns str, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by white space. Example: > SELECT initcap('tHe soap') FROM src LIMIT 1; 'The Soap'
inline	inline( ARRAY( STRUCT()[,STRUCT()] - explodes and array and struct into a table
instr	instr(str, substr) - Returns the index of the first occurance of substr in str Example: > SELECT instr('Facebook', 'boo') FROM src LIMIT 1; 5
internal_interval	internal_interval(intervalType,intervalArg) this method is not designed to be used by directly calling it - it provides internal support for 'INTERVAL (intervalArg) intervalType' constructs
isnotnull	isnotnull a - Returns true if a is not NULL and false otherwise
isnull	isnull a - Returns true if a is NULL and false otherwise
java_method	java_method(class,method[,arg1[,arg2..]]) calls method with reflection Synonyms: reflect Use this UDF to call Java methods by matching the argument signature
json_tuple	json_tuple(jsonStr, p1, p2, ..., pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string. Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple
lag	LAG (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause); The LAG function is used to access data from a previous row. Example: select p1.p_mfgr, p1.p_name, p1.p_size, p1.p_size - lag(p1.p_size,1,p1.p_size) over( distribute by p1.p_mfgr sort by p1.p_name) as deltaSz from part p1 join part p2 on p1.p_partkey = p2.p_partkey
last_day	last_day(date) - Returns the last day of the month which the date belongs to. date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time part of date is ignored. Example: > SELECT last_day('2009-01-12') FROM src LIMIT 1; '2009-01-31'
last_value
lcase	lcase(str) - Returns str with all characters changed to lowercase Synonyms: lower Example: > SELECT lcase('Facebook') FROM src LIMIT 1; 'facebook'
lead	LEAD (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause); The LEAD function is used to return data from the next row. Example: select p_name, p_retailprice, lead(p_retailprice) over() as l1, lag(p_retailprice) over() as l2 from part where p_retailprice = 1173.15
least	least(v1, v2, ...) - Returns the least value in a list of values Example: > SELECT least(2, 3, 1) FROM src LIMIT 1; 1
length	length(str \| binary) - Returns the length of str or number of bytes in binary data Example: > SELECT length('Facebook') FROM src LIMIT 1; 8
levenshtein	levenshtein(str1, str2) - This function calculates the Levenshtein distance between two strings. Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.Example: > SELECT levenshtein('kitten', 'sitting'); 3
like	like(str, pattern) - Checks if str matches pattern Example: > SELECT a.* FROM srcpart a WHERE a.hr like '%2' LIMIT 1; 27 val_27 2008-04-08 12
ln	ln(x) - Returns the natural logarithm of x Example: > SELECT ln(1) FROM src LIMIT 1; 0
locate	locate(substr, str[, pos]) - Returns the position of the first occurance of substr in str after position pos Example: > SELECT locate('bar', 'foobarbar', 5) FROM src LIMIT 1; 7
log	log([b], x) - Returns the logarithm of x with base b Example: > SELECT log(13, 13) FROM src LIMIT 1; 1
log10	log10(x) - Returns the logarithm of x with base 10 Example: > SELECT log10(10) FROM src LIMIT 1; 1
log2	log2(x) - Returns the logarithm of x with base 2 Example: > SELECT log2(2) FROM src LIMIT 1; 1
logged_in_user	logged_in_user() - Returns logged in user name SessionState GetUserName - the username provided at session initialization
lower	lower(str) - Returns str with all characters changed to lowercase Synonyms: lcase Example: > SELECT lower('Facebook') FROM src LIMIT 1; 'facebook'
lpad	lpad(str, len, pad) - Returns str, left-padded with pad to a length of len If str is longer than len, the return value is shortened to len characters. In case of empty pad string, the return value is null. Example: > SELECT lpad('hi', 5, '??') FROM src LIMIT 1; '???hi' > SELECT lpad('hi', 1, '??') FROM src LIMIT 1; 'h' > SELECT lpad('hi', 5, '') FROM src LIMIT 1; null
ltrim	ltrim(str) - Removes the leading space characters from str Example: > SELECT ltrim(' facebook') FROM src LIMIT 1; 'facebook'
map	map(key0, value0, key1, value1...) - Creates a map with the given key/value pairs
map_keys	map_keys(map) - Returns an unordered array containing the keys of the input map
map_values	map_values(map) - Returns an unordered array containing the values of the input map.
mask	masks the given value Examples: mask(ccn) mask(ccn, 'X', 'x', '0') mask(ccn, 'x', 'x', 'x') Arguments: mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar, dayValue, monthValue, yearValue) value - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR, DATE upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1' dayValue - value to replace day field in a date with. Specify -1 to retain original value. Valid values: 1-31. Default value: 1 monthValue - value to replace month field in a date with. Specify -1 to retain original value. Valid values: 0-11. Default value: 0 yearValue - value to replace year field in a date with. Specify -1 to retain original value. Default value: 0
mask_first_n	masks the first n characters of the value Examples: mask_first_n(ccn, 8) mask_first_n(ccn, 8, 'x', 'x', 'x') Arguments: mask(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) value - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR charCount - number of characters. Default value: 4 upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1'
mask_hash	returns hash of the given value Examples: mask_hash(value) Arguments: value - value to mask. Supported types: STRING, VARCHAR, CHAR
mask_last_n	masks the last n characters of the value Examples: mask_last_n(ccn, 8) mask_last_n(ccn, 8, 'x', 'x', 'x') Arguments: mask_last_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) value - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR charCount - number of characters. Default value: 4 upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1'
mask_show_first_n	masks all but first n characters of the value Examples: mask_show_first_n(ccn, 8) mask_show_first_n(ccn, 8, 'x', 'x', 'x') Arguments: mask_show_first_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) value - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR charCount - number of characters. Default value: 4 upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1'
mask_show_last_n	masks all but last n characters of the value Examples: mask_show_last_n(ccn, 8) mask_show_last_n(ccn, 8, 'x', 'x', 'x') Arguments: mask_show_last_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar) value - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR charCount - number of characters. Default value: 4 upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1'
matchpath
max	max(expr) - Returns the maximum value of expr
md5	md5(str or bin) - Calculates an MD5 128-bit checksum for the string or binary. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Example: > SELECT md5('ABC'); '902fbdd2b1df0c4f70b4a5d23525e932' > SELECT md5(binary('ABC')); '902fbdd2b1df0c4f70b4a5d23525e932'
min	min(expr) - Returns the minimum value of expr
minute	minute(param) - Returns the minute component of the string/timestamp/interval param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'HH:mm:ss'. 2. A timestamp value 3. A day-time interval valueExample: > SELECT minute('2009-07-30 12:58:59') FROM src LIMIT 1; 58 > SELECT minute('12:58:59') FROM src LIMIT 1; 58
mod	a mod b - Returns the remainder when dividing a by b Synonyms: %
month	month(param) - Returns the month component of the date/timestamp/interval param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. 2. A date value 3. A timestamp value 4. A year-month interval valueExample: > SELECT month('2009-07-30') FROM src LIMIT 1; 7
months_between	months_between(date1, date2, roundOff) - returns number of months between dates date1 and date2 If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. date1 and date2 type can be date, timestamp or string in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal places by default. Set roundOff=false otherwise. Example: > SELECT months_between('1997-02-28 10:30:00', '1996-10-30'); 3.94959677
named_struct	named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values
negative	negative a - Returns -a
next_day	next_day(start_date, day_of_week) - Returns the first date which is later than start_date and named as indicated. start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. day_of_week is day of the week (e.g. Mo, tue, FRIDAY).Example: > SELECT next_day('2015-01-14', 'TU') FROM src LIMIT 1; '2015-01-20'
ngrams	ngrams(expr, n, k, pf) - Estimates the top-k n-grams in rows that consist of sequences of strings, represented as arrays of strings, or arrays of arrays of strings. 'pf' is an optional precision factor that controls memory usage. The parameter 'n' specifies what type of n-grams are being estimated. Unigrams are n = 1, and bigrams are n = 2. Generally, n will not be greater than about 5. The 'k' parameter specifies how many of the highest-frequency n-grams will be returned by the UDAF. The optional precision factor 'pf' specifies how much memory to use for estimation; more memory will give more accurate frequency counts, but could crash the JVM. The default value is 20, which internally maintains 20*k n-grams, but only returns the k highest frequency ones. The output is an array of structs with the top-k n-grams. It might be convenient to explode() the output of this UDAF.
noop
noopstreaming
noopwithmap
noopwithmapstreaming
not	not a - Logical not Synonyms: !
ntile
nullif	SELECT nullif(1,1),nullif(1,2)
nvl	nvl(value,default_value) - Returns default value if value is null else returns value Example: > SELECT nvl(null,'bla') FROM src LIMIT 1; bla
octet_length	octet_length(str \| binary) - Returns the number of bytes in str or binary data Example: > SELECT octet_length('HUX8�') FROM src LIMIT 1; 15
or	a1 or a2 or ... or an - Logical or
parse_url	parse_url(url, partToExtract[, key]) - extracts a part from a URL Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO key specifies which query to extract Example: > SELECT parse_url('http://facebook.com/path/p1.php?query=1', 'HOST') FROM src LIMIT 1; 'facebook.com' > SELECT parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY') FROM src LIMIT 1; 'query=1' > SELECT parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY', 'query') FROM src LIMIT 1; '1'
parse_url_tuple	parse_url_tuple(url, partname1, partname2, ..., partnameN) - extracts N (N>=1) parts from a URL. It takes a URL and one or multiple partnames, and returns a tuple. All the input parameters and output column types are string. Partname: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY_NAME> Note: Partnames are case-sensitive, and should not contain unnecessary white spaces. Example: > SELECT b.* FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id LIMIT 1; > SELECT parse_url_tuple(a.fullurl, 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'FILE', 'AUTHORITY', 'USERINFO', 'QUERY:k1') as (ho, pa, qu, re, pr, fi, au, us, qk1) from src a;
percent_rank
percentile	percentile(expr, pc) - Returns the percentile(s) of expr at pc (range: [0,1]).pc can be a double or double array
percentile_approx	percentile_approx(expr, pc, [nb]) - For very large data, computes an approximate percentile value from a histogram, using the optional argument [nb] as the number of histogram bins to use. A higher value of nb results in a more accurate approximation, at the cost of higher memory usage. 'expr' can be any numeric column, including doubles and floats, and 'pc' is either a single double/float with a requested percentile, or an array of double/float with multiple percentiles. If 'nb' is not specified, the default approximation is done with 10,000 histogram bins, which means that if there are 10,000 or fewer unique values in 'expr', you can expect an exact result. The percentile() function always computes an exact percentile and can run out of memory if there are too many unique values in a column, which necessitates this function. Example (three percentiles requested using a finer histogram approximation): > SELECT percentile_approx(val, array(0.5, 0.95, 0.98), 100000) FROM somedata; [0.05,1.64,2.26]
pi	pi() - returns pi Example: > SELECT pi() FROM src LIMIT 1; 3.14159...
pmod	a pmod b - Compute the positive modulo
posexplode	posexplode(a) - behaves like explode for arrays, but includes the position of items in the original array
positive	positive a - Returns a
pow	pow(x1, x2) - raise x1 to the power of x2 Synonyms: power Example: > SELECT pow(2, 3) FROM src LIMIT 1; 8
power	power(x1, x2) - raise x1 to the power of x2 Synonyms: pow Example: > SELECT power(2, 3) FROM src LIMIT 1; 8
printf	printf(String format, Obj... args) - function that can format strings according to printf-style format strings Example: > SELECT printf("Hello World %d %s", 100, "days")FROM src LIMIT 1; "Hello World 100 days"
quarter	quarter(date/timestamp/string) - Returns the quarter of the year for date, in the range 1 to 4. Example: > SELECT quarter('2015-04-08'); 2
radians	radians(x) - Converts degrees to radians Example: > SELECT radians(90) FROM src LIMIT 1; 1.5707963267949mo
rand	rand([seed]) - Returns a pseudorandom number between 0 and 1
rank
reflect	reflect(class,method[,arg1[,arg2..]]) calls method with reflection Synonyms: java_method Use this UDF to call Java methods by matching the argument signature
reflect2	reflect2(arg0,method[,arg1[,arg2..]]) calls method of arg0 with reflection Use this UDF to call Java methods by matching the argument signature
regexp	str regexp regexp - Returns true if str matches regexp and false otherwise Synonyms: rlike Example: > SELECT 'fb' regexp '.*' FROM src LIMIT 1; true
regexp_extract	regexp_extract(str, regexp[, idx]) - extracts a group that matches regexp Example: > SELECT regexp_extract('100-200', '(\d+)-(\d+)', 1) FROM src LIMIT 1; '100'
regexp_replace	regexp_replace(str, regexp, rep) - replace all substrings of str that match regexp with rep Example: > SELECT regexp_replace('100-200', '(\d+)', 'num') FROM src LIMIT 1; 'num-num'
regr_avgx	egr_avgx(y,x) - evaluates the average of the independent variable The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. Otherwise, it computes the following: AVG(X)
regr_avgy	regr_avgy(y,x) - evaluates the average of the dependent variable The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. Otherwise, it computes the following: AVG(Y)
regr_count	regr_count(y,x) - returns the number of non-null pairs The function takes as arguments any pair of numeric types and returns a long. Any pair with a NULL is ignored.
regr_intercept	regr_intercept(y,x) - returns the y-intercept of the regression line. The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. If NSUM(xx) = SUM(x)SUM(x): NULL is returned. Otherwise, it computes the following: ( SUM(y)SUM(xx)-SUM(X)SUM(xy) ) / ( NSUM(xx)-SUM(x)SUM(x) )
regr_r2	regr_r2(y,x) - returns the coefficient of determination (also called R-squared or goodness of fit) for the regression line. The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. If NSUM(xx) = SUM(x)SUM(x): NULL is returned. If NSUM(yy) = SUM(y)SUM(y): 1 is returned. Otherwise, it computes the following: POWER( NSUM(xy)-SUM(x)SUM(y) ,2) / ( (NSUM(xx)-SUM(x)SUM(x)) * (NSUM(yy)-SUM(y)*SUM(y)) )
regr_slope	regr_slope(y,x) - returns the slope of the linear regression line The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. If NSUM(xx) = SUM(x)SUM(x): NULL is returned (the fit would be a vertical). Otherwise, it computes the following: (NSUM(xy)-SUM(x)SUM(y)) / (NSUM(xx)-SUM(x)*SUM(x))
regr_sxx	regr_sxx(y,x) - auxiliary analytic function The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. Otherwise, it computes the following: SUM(xx)-SUM(x)SUM(x)/N
regr_sxy	regr_sxy(y,x) - return a value that can be used to evaluate the statistical validity of a regression model. The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. If NSUM(xx) = SUM(x)SUM(x): NULL is returned. Otherwise, it computes the following: SUM(xy)-SUM(x)*SUM(y)/N
regr_syy	regr_syy(y,x) - auxiliary analytic function The function takes as arguments any pair of numeric types and returns a double. Any pair with a NULL is ignored. If applied to an empty set: NULL is returned. Otherwise, it computes the following: SUM(yy)-SUM(y)SUM(y)/N
repeat	repeat(str, n) - repeat str n times Example: > SELECT repeat('123', 2) FROM src LIMIT 1; '123123'
replace	replace(str, search, rep) - replace all substrings of 'str' that match 'search' with 'rep' Example: > SELECT replace('Hack and Hue', 'H', 'BL') FROM src LIMIT 1; 'BLack and BLue'
replicate_rows	replicate_rows(n, cols...) - turns 1 row into n rows
reverse	reverse(str) - reverse str Example: > SELECT reverse('Facebook') FROM src LIMIT 1; 'koobecaF'
rlike	str rlike regexp - Returns true if str matches regexp and false otherwise Synonyms: regexp Example: > SELECT 'fb' rlike '.*' FROM src LIMIT 1; true
round	round(x[, d]) - round x to d decimal places Example: > SELECT round(12.3456, 1) FROM src LIMIT 1; 12.3'
row_number
rpad	rpad(str, len, pad) - Returns str, right-padded with pad to a length of len If str is longer than len, the return value is shortened to len characters. In case of empty pad string, the return value is null. Example: > SELECT rpad('hi', 5, '??') FROM src LIMIT 1; 'hi???' > SELECT rpad('hi', 1, '??') FROM src LIMIT 1; 'h' > SELECT rpad('hi', 5, '') FROM src LIMIT 1; null
rtrim	rtrim(str) - Removes the trailing space characters from str Example: > SELECT rtrim('facebook ') FROM src LIMIT 1; 'facebook'
second	second(date) - Returns the second component of the string/timestamp/interval param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'HH:mm:ss'. 2. A timestamp value 3. A day-time interval valueExample: > SELECT second('2009-07-30 12:58:59') FROM src LIMIT 1; 59 > SELECT second('12:58:59') FROM src LIMIT 1; 59
sentences	sentences(str, lang, country) - Splits str into arrays of sentences, where each sentence is an array of words. The 'lang' and'country' arguments are optional, and if omitted, the default locale is used. Example: > SELECT sentences('Hello there! I am a UDF.') FROM src LIMIT 1; [ ["Hello", "there"], ["I", "am", "a", "UDF"] ] > SELECT sentences(review, language) FROM movies; Unnecessary punctuation, such as periods and commas in English, is automatically stripped. If specified, 'lang' should be a two-letter ISO-639 language code (such as 'en'), and 'country' should be a two-letter ISO-3166 code (such as 'us'). Not all country and language codes are fully supported, and if an unsupported code is specified, a default locale is used to process that string.
sha	sha(str or bin) - Calculates the SHA-1 digest for string or binary and returns the value as a hex string. Synonyms: sha1 Example: > SELECT sha('ABC'); '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8' > SELECT sha(binary('ABC')); '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'
sha1	sha1(str or bin) - Calculates the SHA-1 digest for string or binary and returns the value as a hex string. Synonyms: sha Example: > SELECT sha1('ABC'); '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8' > SELECT sha1(binary('ABC')); '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'
sha2	sha2(string/binary, len) - Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The first argument is the string or binary to be hashed. The second argument indicates the desired bit length of the result, which must have a value of 224, 256, 384, 512, or 0 (which is equivalent to 256). SHA-224 is supported starting from Java 8. If either argument is NULL or the hash length is not one of the permitted values, the return value is NULL. Example: > SELECT sha2('ABC', 256); 'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78'
shiftleft	shiftleft(a, b) - Bitwise left shift Returns int for tinyint, smallint and int a. Returns bigint for bigint a. Example: > SELECT shiftleft(2, 1); 4
shiftright	shiftright(a, b) - Bitwise right shift Returns int for tinyint, smallint and int a. Returns bigint for bigint a. Example: > SELECT shiftright(4, 1); 2
shiftrightunsigned	shiftrightunsigned(a, b) - Bitwise unsigned right shift Returns int for tinyint, smallint and int a. Returns bigint for bigint a. Example: > SELECT shiftrightunsigned(4, 1); 2
sign	sign(x) - returns the sign of x ) Example: > SELECT sign(40) FROM src LIMIT 1; 1
sin	sin(x) - returns the sine of x (x is in radians) Example: > SELECT sin(0) FROM src LIMIT 1; 0
size	size(a) - Returns the size of a
sort_array	sort_array(array(obj1, obj2,...)) - Sorts the input array in ascending order according to the natural ordering of the array elements. Example: > SELECT sort_array(array('b', 'd', 'c', 'a')) FROM src LIMIT 1; 'a', 'b', 'c', 'd'
sort_array_by	sort_array_by(array(obj1, obj2,...),'f1','f2',...,['ASC','DESC']) - Sorts the input tuple array in user specified order(ASC,DESC) by desired field[s] name If sorting order is not mentioned by user then dafault sorting order is ascending Example: > SELECT sort_array_by(array(struct('g',100),struct('b',200)),'col1','ASC') FROM src LIMIT 1; array(struct('b',200),struct('g',100))
soundex	soundex(string) - Returns soundex code of the string. The soundex code consist of the first letter of the name followed by three digits. Example: > SELECT soundex('Miller'); M460
space	space(n) - returns n spaces Example: > SELECT space(2) FROM src LIMIT 1; ' '
split	split(str, regex) - Splits str around occurances that match regex Example: > SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1; ["one", "two", "three"]
sq_count_check	sq_count_check(x) - Internal check on scalar subquery expression to make sure atmost one row is returned For internal use only
sqrt	sqrt(x) - returns the square root of x Example: > SELECT sqrt(4) FROM src LIMIT 1; 2
stack	stack(n, cols...) - turns k columns into n rows of size k/n each
std	std(x) - Returns the standard deviation of a set of numbers Synonyms: stddev, stddev_pop
stddev	stddev(x) - Returns the standard deviation of a set of numbers Synonyms: std, stddev_pop
stddev_pop	stddev_pop(x) - Returns the standard deviation of a set of numbers Synonyms: std, stddev
stddev_samp	stddev_samp(x) - Returns the sample standard deviation of a set of numbers
str_to_map	str_to_map(text, delimiter1, delimiter2) - Creates a map by parsing text Split text into key-value pairs using two delimiters. The first delimiter separates pairs, and the second delimiter sperates key and value. If only one parameter is given, default delimiters are used: ',' as delimiter1 and ':' as delimiter2.
struct	struct(col1, col2, col3, ...) - Creates a struct with the given field values
substr	substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len Synonyms: substring pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str. Example: > SELECT substr('Facebook', 5) FROM src LIMIT 1; 'book' > SELECT substr('Facebook', -5) FROM src LIMIT 1; 'ebook' > SELECT substr('Facebook', 5, 1) FROM src LIMIT 1; 'b'
substring	substring(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstring(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len Synonyms: substr pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str. Example: > SELECT substring('Facebook', 5) FROM src LIMIT 1; 'book' > SELECT substring('Facebook', -5) FROM src LIMIT 1; 'ebook' > SELECT substring('Facebook', 5, 1) FROM src LIMIT 1; 'b'
substring_index	substring_index(str, delim, count) - Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: > SELECT substring_index('www.apache.org', '.', 2); 'www.apache'
sum	sum(x) - Returns the sum of a set of numbers
tan	tan(x) - returns the tangent of x (x is in radians) Example: > SELECT tan(0) FROM src LIMIT 1; 1
to_date	to_date(expr) - Extracts the date part of the date or datetime expression expr Example: > SELECT to_date('2009-07-30 04:17:52') FROM src LIMIT 1; '2009-07-30'
to_unix_timestamp	to_unix_timestamp(date[, pattern]) - Returns the UNIX timestamp Converts the specified time to number of seconds since 1970-01-01.
to_utc_timestamp	to_utc_timestamp(timestamp, string timezone) - Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0)
translate	translate(input, from, to) - translates the input string by replacing the characters present in the from string with the corresponding characters in the to string translate(string input, string from, string to) is an equivalent function to translate in PostGreSQL. It works on a character by character basis on the input string (first parameter). A character in the input is checked for presence in the from string (second parameter). If a match happens, the character from to string (third parameter) which appears at the same index as the character in from string is obtained. This character is emitted in the output string instead of the original character from the input string. If the to string is shorter than the from string, there may not be a character present at the same index in the to string. In such a case, nothing is emitted for the original character and it's deleted from the output string. For example, translate('abcdef', 'adc', '19') returns '1b9ef' replacing 'a' with '1', 'd' with '9' and removing 'c' from the input string translate('a b c d', ' ', '') return 'abcd' removing all spaces from the input string If the same character is present multiple times in the input string, the first occurence of the character is the one that's considered for matching. However, it is not recommended to have the same character more than once in the from string since it's not required and adds to confusion. For example, translate('abcdef', 'ada', '192') returns '1bc9ef' replaces 'a' with '1' and 'd' with '9' ignoring the second occurence of 'a' in the from string mapping it to '2'
trim	trim(str) - Removes the leading and trailing space characters from str Example: > SELECT trim(' facebook ') FROM src LIMIT 1; 'facebook'
trunc	trunc(date, fmt) / trunc(N,D) - Returns If input is date returns date with the time portion of the day truncated to the unit specified by the format model fmt. If you omit fmt, then date is truncated to the nearest day. It currently only supports 'MONTH'/'MON'/'MM', 'QUARTER'/'Q' and 'YEAR'/'YYYY'/'YY' as format.If input is a number group returns N truncated to D decimal places. If D is omitted, then N is truncated to 0 places.D can be negative to truncate (make zero) D digits left of the decimal point. date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time part of date is ignored. Example: > SELECT trunc('2009-02-12', 'MM'); OK '2009-02-01' > SELECT trunc('2017-03-15', 'Q'); OK '2017-01-01' > SELECT trunc('2015-10-27', 'YEAR'); OK '2015-01-01' > SELECT trunc(1234567891.1234567891,4); OK 1234567891.1234 > SELECT trunc(1234567891.1234567891,-4); OK 1234560000 > SELECT trunc(1234567891.1234567891,0); OK 1234567891 > SELECT trunc(1234567891.1234567891); OK 1234567891
ucase	ucase(str) - Returns str with all characters changed to uppercase Synonyms: upper Example: > SELECT ucase('Facebook') FROM src LIMIT 1; 'FACEBOOK'
unbase64	unbase64(str) - Convert the argument from a base 64 string to binary
unhex	unhex(str) - Converts hexadecimal argument to binary Performs the inverse operation of HEX(str). That is, it interprets each pair of hexadecimal digits in the argument as a number and converts it to the byte representation of the number. The resulting characters are returned as a binary string. Example: > SELECT DECODE(UNHEX('4D7953514C'), 'UTF-8') from src limit 1; 'MySQL' The characters in the argument string must be legal hexadecimal digits: '0' .. '9', 'A' .. 'F', 'a' .. 'f'. If UNHEX() encounters any nonhexadecimal digits in the argument, it returns NULL. Also, if there are an odd number of characters a leading 0 is appended.
unix_timestamp	unix_timestamp(date[, pattern]) - Converts the time to a number Converts the specified time to number of seconds since 1970-01-01. The unix_timestamp(void) overload is deprecated, use current_timestamp.
upper	upper(str) - Returns str with all characters changed to uppercase Synonyms: ucase Example: > SELECT upper('Facebook') FROM src LIMIT 1; 'FACEBOOK'
uuid	uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string. Example: > SELECT uuid(); '0baf1f52-53df-487f-8292-99a03716b688' > SELECT uuid(); '36718a53-84f5-45d6-8796-4f79983ad49d'
var_pop	var_pop(x) - Returns the variance of a set of numbers Synonyms: variance
var_samp	var_samp(x) - Returns the sample variance of a set of numbers
variance	variance(x) - Returns the variance of a set of numbers Synonyms: var_pop
version	version() - Returns the Hive build version string - includes base version and revision.
weekofyear	weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days. Examples: > SELECT weekofyear('2008-02-20') FROM src LIMIT 1; 8 > SELECT weekofyear('1980-12-31 12:59:59') FROM src LIMIT 1; 1
when	CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - When a = true, returns b; when c = true, return d; else return e Example: SELECT CASE WHEN deptno=1 THEN Engineering WHEN deptno=2 THEN Finance ELSE admin END, CASE WHEN zone=7 THEN Americas ELSE Asia-Pac END FROM emp_details
windowingtablefunction
xpath	xpath(xml, xpath) - Returns a string array of values within xml nodes that match the xpath expression Example: > SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/text()') FROM src LIMIT 1 [] > SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/b/text()') FROM src LIMIT 1 ["b1","b2","b3"] > SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/c/text()') FROM src LIMIT 1 ["c1","c2"]
xpath_boolean	xpath_boolean(xml, xpath) - Evaluates a boolean xpath expression Example: > SELECT xpath_boolean('<a><b>1</b></a>','a/b') FROM src LIMIT 1; true > SELECT xpath_boolean('<a><b>1</b></a>','a/b = 2') FROM src LIMIT 1; false
xpath_double	xpath_double(xml, xpath) - Returns a double value that matches the xpath expression Synonyms: xpath_number Example: > SELECT xpath_double('<a><b>1</b><b>2</b></a>','sum(a/b)') FROM src LIMIT 1; 3.0
xpath_float	xpath_float(xml, xpath) - Returns a float value that matches the xpath expression Example: > SELECT xpath_float('<a><b>1</b><b>2</b></a>','sum(a/b)') FROM src LIMIT 1; 3.0
xpath_int	xpath_int(xml, xpath) - Returns an integer value that matches the xpath expression Example: > SELECT xpath_int('<a><b>1</b><b>2</b></a>','sum(a/b)') FROM src LIMIT 1; 3
xpath_long	xpath_long(xml, xpath) - Returns a long value that matches the xpath expression Example: > SELECT xpath_long('<a><b>1</b><b>2</b></a>','sum(a/b)') FROM src LIMIT 1; 3
xpath_number	xpath_number(xml, xpath) - Returns a double value that matches the xpath expression Synonyms: xpath_double Example: > SELECT xpath_number('<a><b>1</b><b>2</b></a>','sum(a/b)') FROM src LIMIT 1; 3.0
xpath_short	xpath_short(xml, xpath) - Returns a short value that matches the xpath expression Example: > SELECT xpath_short('<a><b>1</b><b>2</b></a>','sum(a/b)') FROM src LIMIT 1; 3
xpath_string	xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the xpath expression Example: > SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c') FROM src LIMIT 1; 'cc' > SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b') FROM src LIMIT 1; 'b1' > SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b[2]') FROM src LIMIT 1; 'b2' > SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a') FROM src LIMIT 1; 'b1b2'
year	year(param) - Returns the year component of the date/timestamp/interval param can be one of: 1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. 2. A date value 3. A timestamp value 4. A year-month interval valueExample: > SELECT year('2009-07-30') FROM src LIMIT 1; 2009