---------------------------------------------------! a - Logical not
Synonyms: not---------------------------------------------------
a != b -ReturnsTRUEif a isnot equal to b
Synonyms: <>---------------------------------------------------
a % b -Returns the remainder when dividing a by b
---------------------------------------------------
a & b - Bitwise and
Example:
>SELECT3&5FROM src LIMIT1;1---------------------------------------------------Function'desc_functions.txt' does not exist.---------------------------------------------------Function'desc.sh' does not exist.---------------------------------------------------Function'functions.txt' does not exist.---------------------------------------------------
a + b -Returns a+b
---------------------------------------------------
a - b -Returns the difference a-b
---------------------------------------------------
a / b - Divide a by b
Example:
>SELECT3/2FROM src LIMIT1;1.5---------------------------------------------------
a < b -ReturnsTRUEif a is less than b
---------------------------------------------------
a <= b -ReturnsTRUEif a isnot greater than b
---------------------------------------------------
a <=> b -Returns same result with EQUAL(=) operator for non-null operands, but returnsTRUEif both are NULL,FALSEif one of the them isNULL---------------------------------------------------
a <> b -ReturnsTRUEif a isnot equal to b
Synonyms: !=---------------------------------------------------
a = b -ReturnsTRUEif a equals b andfalse otherwise
Synonyms: ==---------------------------------------------------
a == b -ReturnsTRUEif a equals b andfalse otherwise
Synonyms: =---------------------------------------------------
a > b -ReturnsTRUEif a is greater than b
---------------------------------------------------
a >= b -ReturnsTRUEif a isnot smaller than b
---------------------------------------------------
a ^ b - Bitwise exclusive or
Example:
>SELECT3^5FROM src LIMIT1;2---------------------------------------------------
abs(x)-returns the absolute valueof x
Example:
>SELECT abs(0)FROM src LIMIT1;0>SELECT abs(-5)FROM src LIMIT1;5---------------------------------------------------
acos(x)-returns the arc cosine of x if-1<=x<=1orNULL otherwise
Example:
>SELECT acos(1)FROM src LIMIT1;0>SELECT acos(2)FROM src LIMIT1;NULL---------------------------------------------------
add_months(start_date, num_months)-Returns the date that is num_months after start_date.
start_date is a string in the format 'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'. num_months is a number. The time part of start_date is ignored.
Example:
>SELECT add_months('2009-08-31',1)FROM src LIMIT1;'2009-09-30'---------------------------------------------------
a and b - Logical and---------------------------------------------------
array(n0, n1...)- Creates an array with the given elements
---------------------------------------------------
array_contains(array,value)-ReturnsTRUEif the array containsvalue.
Example:
>SELECT array_contains(array(1,2,3),2)FROM src LIMIT1;true---------------------------------------------------
ascii(str)-returns the numericvalueof the firstcharacterof str
Returns0if str is empty orNULLif str isNULL
Example:
>SELECT ascii('222')FROM src LIMIT1;50>SELECT ascii(2)FROM src LIMIT1;50---------------------------------------------------
asin(x)-returns the arc sine of x if-1<=x<=1orNULL otherwise
Example:
>SELECT asin(0)FROM src LIMIT1;0>SELECT asin(2)FROM src LIMIT1;NULL---------------------------------------------------
assert_true(condition)- Throw an exception if'condition'isnottrue.
Example:
>SELECT assert_true(x >=0)FROM src LIMIT1;NULL---------------------------------------------------
atan(x)-returns the atan (arctan)of x (x isin radians)
Example:
>SELECT atan(0)FROM src LIMIT1;0---------------------------------------------------avg(x)-Returns the mean of a setof numbers
---------------------------------------------------
base64(bin)-Convert the argument frombinaryto a base 64 string
---------------------------------------------------between a [NOT]BETWEEN b AND c - evaluate if a is[not]inbetween b and c
---------------------------------------------------
bin(n)-returns n inbinary
n is a BIGINT.ReturnsNULLif n isNULL.
Example:
>SELECT bin(13)FROM src LIMIT1'1101'---------------------------------------------------CASE a WHEN b THEN c [WHEN d THEN e]*[ELSE f]END-When a = b,returns c;when a = d,return e;elsereturn f
Example:
SELECTCASE deptno
WHEN1THEN Engineering
WHEN2THEN Finance
ELSE admin
END,CASE zone
WHEN7THEN Americas
ELSE Asia-Pac
ENDFROM emp_details
---------------------------------------------------
cbrt(double)-Returns the cube root of a doublevalue.
Example:
>SELECT cbrt(27.0);3.0---------------------------------------------------
ceil(x)- Find the smallest integernot smaller than x
Synonyms: ceiling
Example:
>SELECT ceil(-0.1)FROM src LIMIT1;0>SELECT ceil(5)FROM src LIMIT1;5---------------------------------------------------
ceiling(x)- Find the smallest integernot smaller than x
Synonyms: ceil
Example:
>SELECT ceiling(-0.1)FROM src LIMIT1;0>SELECT ceiling(5)FROM src LIMIT1;5---------------------------------------------------coalesce(a1, a2,...)-Returns the first non-null argument
Example:
>SELECTcoalesce(NULL,1,NULL)FROM src LIMIT1;1---------------------------------------------------
collect_list(x)-Returns a list of objects with duplicates
---------------------------------------------------
collect_set(x)-Returns a setof objects withduplicate elements eliminated
---------------------------------------------------
compute_stats(x)-Returns the statistical summary of a setof primitive typevalues.---------------------------------------------------
concat(str1, str2,... strN)-returns the concatenation of str1, str2,... strN or concat(bin1, bin2,... binN)-returns the concatenation of bytes inbinarydata bin1, bin2,... binN
ReturnsNULLifany argument isNULL.
Example:
>SELECT concat('abc','def')FROM src LIMIT1;'abcdef'---------------------------------------------------
concat_ws(separator,[string | array(string)]+)-returns the concatenation of the strings separated by the separator.
Example:
>SELECT concat_ws('.','www', array('facebook','com'))FROM src LIMIT1;'www.facebook.com'---------------------------------------------------
context_ngrams(expr, array<string1, string2,...>, k, pf) estimates the top-k most frequent n-grams that fit into the specified context. The second parameter specifies a string of words that specify the positions of the n-gram elements,with a nullvalue standing infor a 'blank' that must be filled by an n-gram element.
The primary expression must be an array of strings,or an array of arrays of strings, such as the returntypeof the sentences() UDF. The second parameter specifies the context -- for example, array("i", "love", null) -- which would estimate the top 'k' words that follow the phrase "i love" in the primary expression. The optional fourth parameter 'pf' controls the memory used by the heuristic. Larger values will yield better accuracy, but use more memory. Example usage:SELECT context_ngrams(sentences(lower(review)), array("i","love",null,null),10)FROM movies
would attempt to determine the 10 most common two-word phrases that follow "i love"in a databaseof free-form naturallanguage movie reviews.---------------------------------------------------
conv(num, from_base, to_base)-convert num from from_base to to_base
If to_base is negative, treat num as a signed integer,otherwise, treat it as an unsignedinteger.
Example:
>SELECT conv('100',2,10)FROM src LIMIT1;'4'>SELECT conv(-10,16,-10)FROM src LIMIT1;'16'---------------------------------------------------
corr(x,y)-Returns the Pearson coefficient of correlation
between a setof number pairs
The function takes as arguments any pair ofnumerictypesandreturns a double.Any pair with a NULLis ignored.If the functionis applied to an empty setor
a singleton set,NULL will be returned. Otherwise, it computes the following:
COVAR_POP(x,y)/(STDDEV_POP(x)*STDDEV_POP(y))where neither x nor y isnull,
COVAR_POP is the population covariance,and STDDEV_POP is the population standard deviation.---------------------------------------------------
cos(x)-returns the cosine of x (x isin radians)
Example:
>SELECT cos(0)FROM src LIMIT1;1---------------------------------------------------count(*)-Returns the total number of retrieved rows, including rows containing NULLvalues.count(expr)-Returns the number ofrowsfor which the supplied expression is non-NULL.count(DISTINCT expr[, expr...])-Returns the number ofrowsfor which the supplied expression(s) are uniqueand non-NULL.---------------------------------------------------
covar_pop(x,y)-Returns the population covariance of a setof number pairs
The function takes as arguments any pair ofnumerictypesandreturns a double.Any pair with a NULLis ignored.If the functionis applied to an empty set,NULL
will be returned. Otherwise, it computes the following:
(SUM(x*y)-SUM(x)*SUM(y)/COUNT(x,y))/COUNT(x,y)where neither x nor y isnull.---------------------------------------------------
covar_samp(x,y)-Returns the sample covariance of a setof number pairs
The function takes as arguments any pair ofnumerictypesandreturns a double.Any pair with a NULLis ignored.If the functionis applied to an empty set,NULL
will be returned. Otherwise, it computes the following:
(SUM(x*y)-SUM(x)*SUM(y)/COUNT(x,y))/(COUNT(x,y)-1)where neither x nor y isnull.---------------------------------------------------
create_union(tag, obj1, obj2, obj3,...)- Creates a unionwith the object for given tag
Example:
>SELECT create_union(1,1,"one")FROM src LIMIT1;
one
---------------------------------------------------
There isno documentation forfunction'cume_dist'---------------------------------------------------
current_database()-returns currently usingdatabase name
---------------------------------------------------current_user()-Returnscurrentuser name
SessionState UserFromAuthenticator
---------------------------------------------------
date_add(start_date, num_days)-Returns the date that is num_days after start_date.
start_date is a string in the format 'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'. num_days is a number. The time part of start_date is ignored.
Example:
>SELECT date_add('2009-07-30',1)FROM src LIMIT1;'2009-07-31'---------------------------------------------------
date_format(date/timestamp/string, fmt)- converts a date/timestamp/string to a valueof string in the format specified by the date format fmt.
Supported formats are SimpleDateFormat formats - https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. Second argument fmt should be constant.
Example: >SELECT date_format('2015-04-08','y');'2015'---------------------------------------------------
date_sub(start_date, num_days)-Returns the date that is num_days before start_date.
start_date is a string in the format 'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'. num_days is a number. The time part of start_date is ignored.
Example:
>SELECT date_sub('2009-07-30',1)FROM src LIMIT1;'2009-07-29'---------------------------------------------------
datediff(date1, date2)-Returns the number of days between date1 and date2
date1 and date2 are strings in the format 'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'. The time parts are ignored.If date1 is earlier than date2, the result is negative.
Example:
>SELECT datediff('2009-07-30','2009-07-31')FROM src LIMIT1;1---------------------------------------------------day(param)-Returns the dayof the monthofdate/timestamp,orday component ofinterval
Synonyms: dayofmonth
param can be one of:
1. A string in the format of'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'.2. A datevalue3. A timestampvalue4. A day-timeinterval valueExample:
>SELECTday('2009-07-30')FROM src LIMIT1;30---------------------------------------------------
dayofmonth(param)-Returns the dayof the monthofdate/timestamp,orday component ofinterval
Synonyms: day
param can be one of:
1. A string in the format of'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'.2. A datevalue3. A timestampvalue4. A day-timeinterval valueExample:
>SELECT dayofmonth('2009-07-30')FROM src LIMIT1;30---------------------------------------------------
decode(bin, str)- Decode the first argument using the second argument characterset
Possible options for the characterset are 'US_ASCII','ISO-8859-1','UTF-8','UTF-16BE','UTF-16LE',and'UTF-16'.If either argument
isnull, the result will also be null---------------------------------------------------
degrees(x)- Converts radians to degrees
Example:
>SELECT degrees(30)FROM src LIMIT1;-1---------------------------------------------------
There isno documentation forfunction'dense_rank'---------------------------------------------------
a div b - Divide a by b rounded to the long integer
Example:
>SELECT3div2FROM src LIMIT1;1---------------------------------------------------
e()-returns E
Example:
>SELECT e()FROM src LIMIT1;2.718281828459045---------------------------------------------------
elt(n, str1, str2,...)-returns the n-th string
Example:
>SELECT elt(1,'face','book')FROM src LIMIT1;'face'---------------------------------------------------
encode(str, str)- Encode the first argument using the second argument characterset
Possible options for the characterset are 'US_ASCII','ISO-8859-1','UTF-8','UTF-16BE','UTF-16LE',and'UTF-16'.If either argument
isnull, the result will also be null---------------------------------------------------
ewah_bitmap(expr)-Returns an EWAH-compressed bitmap representation of a column.---------------------------------------------------
ewah_bitmap_and(b1, b2)-Return an EWAH-compressed bitmap that is the bitwise ANDof two bitmaps.---------------------------------------------------
ewah_bitmap_empty(bitmap)- Predicate that tests whether an EWAH-compressed bitmap isall zeros
---------------------------------------------------
ewah_bitmap_or(b1, b2)-Return an EWAH-compressed bitmap that is the bitwise ORof two bitmaps.---------------------------------------------------
exp(x)-Returns e to the power of x
Example:
>SELECT exp(0)FROM src LIMIT1;1---------------------------------------------------
explode(a)- separates the elements of array a into multiple rows,or the elements of a map into multiple rowsandcolumns---------------------------------------------------
factorial(int)-Returns n factorial. Valid n is[0..20].Returnsnullif n isoutof[0..20] range.
Example:
>SELECT factorial(5);120---------------------------------------------------
field(str, str1, str2,...)-returns the indexof str in the str1,str2,... list or0ifnot found
All primitive types are supported, arguments are compared using str.equals(x).If str isNULL, the returnvalueis0.---------------------------------------------------
find_in_set(str,str_array)-Returns the first occurrence of str in str_array where str_array is a comma-delimited string.Returnsnullif either argument isnull.Returns0if the first argument has any commas.
Example:
>SELECT find_in_set('ab','abc,b,ab,c,def')FROM src LIMIT1;3>SELECT*FROM src1 WHERENOT find_in_set(key,'311,128,345,956')=0;311 val_311
128---------------------------------------------------
There isno documentation forfunction'first_value'---------------------------------------------------
floor(x)- Find the largest integernot greater than x
Example:
>SELECT floor(-0.1)FROM src LIMIT1;-1>SELECT floor(5)FROM src LIMIT1;5---------------------------------------------------
format_number(X, D)- Formats the number X to a format like'#,###,###.##', rounded to D decimal places,andreturns the result as a string.If D is0, the result has nodecimalpointor fractional part. This is supposed tofunctionlike MySQL's FORMAT
Example:
> SELECT format_number(12332.123456, 4) FROM src LIMIT 1;
'12,332.1235'
---------------------------------------------------
from_unixtime(unix_time, format) - returns unix_time in the specified format
Example:
> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss') FROM src LIMIT 1;
'1970-01-0100:00:00'
---------------------------------------------------
from_utc_timestamp(timestamp, string timezone) - Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0)
---------------------------------------------------
get_json_object(json_txt, path) - Extract a json object from path
Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid.
A limited version of JSONPath supported:
$ : Root object
. : Child operator
[] : Subscript operator for array
* : Wildcard for []
Syntax not supported that's worth noticing:
'' : Zero length string askey.. : Recursive descent
&#064; : Current object/element() : Script expression
?() : Filter (script) expression.[,] : Union operator
[start:end:step] : array slice operator
---------------------------------------------------
greatest(v1, v2,...)-Returns the greatest valuein a list ofvalues
Example:
>SELECT greatest(2,3,1)FROM src LIMIT1;3---------------------------------------------------hash(a1, a2,...)-Returns a hashvalueof the arguments
---------------------------------------------------
hex(n, bin,or str)-Convert the argument to hexadecimal
If the argument is a string,returns two hex digits for each characterin the string.If the argument is a number orbinary,returns the hexadecimal representation.
Example:
>SELECT hex(17)FROM src LIMIT1;'H1'>SELECT hex('Facebook')FROM src LIMIT1;'46616365626F6F6B'---------------------------------------------------
histogram_numeric(expr, nb)- Computes a histogram onnumeric'expr'using nb bins.
Example:
>SELECT histogram_numeric(val,3)FROM src;[{"x":100,"y":14.0},{"x":200,"y":22.0},{"x":290.5,"y":11.0}]
The returnvalueis an array of(x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximationgets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plusstatistical computing packages.
---------------------------------------------------
hour(param) - Returns the hour componemnt of the string/timestamp/interval
param can be one of:
1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'HH:mm:ss'.
2. A timestamp value
3. A day-time interval valueExample:
> SELECT hour('2009-07-3012:58:59') FROM src LIMIT 1;
12
> SELECT hour('12:58:59') FROM src LIMIT 1;
12
---------------------------------------------------
IF(expr1,expr2,expr3) - If expr1 is TRUE (expr1 <> 0 and expr1 <> NULL) then IF() returns expr2; otherwise it returns expr3. IF() returns a numeric or string value, depending on the context in which it is used.
---------------------------------------------------
test in(val1, val2...) - returns true if test equals any valN
---------------------------------------------------
in_file(str, filename) - Returns true if str appears in the file
---------------------------------------------------
index(a, n) - Returns the n-th element of a
---------------------------------------------------
initcap(str) - Returns str, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by white space.
Example:
> SELECT initcap('tHe soap') FROM src LIMIT 1;
'The Soap'
---------------------------------------------------
inline( ARRAY( STRUCT()[,STRUCT()] - explodes and array and struct into a table
---------------------------------------------------
instr(str, substr) - Returns the index of the first occurance of substr in str
Example:
> SELECT instr('Facebook', 'boo') FROM src LIMIT 1;
5
---------------------------------------------------
isnotnull a - Returns true if a is not NULL and false otherwise
---------------------------------------------------
isnull a - Returns true if a is NULL and false otherwise
---------------------------------------------------
java_method(class,method[,arg1[,arg2..]]) calls method with reflection
Synonyms: reflect
Use this UDF to call Java methods by matching the argument signature
---------------------------------------------------
json_tuple(jsonStr, p1, p2, ..., pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string.
---------------------------------------------------
LAG (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause); The LAG function is used to access data from a previous row.
Example:
select p1.p_mfgr, p1.p_name, p1.p_size,
p1.p_size - lag(p1.p_size,1,p1.p_size) over( distribute by p1.p_mfgr sort by p1.p_name) as deltaSz
from part p1 join part p2 on p1.p_partkey = p2.p_partkey
---------------------------------------------------
last_day(date) - Returns the last day of the month which the date belongs to.
date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time part of date is ignored.
Example:
> SELECT last_day('2009-01-12') FROM src LIMIT 1;
'2009-01-31'
---------------------------------------------------
There is no documentation for function 'last_value'
---------------------------------------------------
lcase(str) - Returns str with all characters changed to lowercase
Synonyms: lower
Example:
> SELECT lcase('Facebook') FROM src LIMIT 1;
'facebook'
---------------------------------------------------
LEAD (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause); The LEAD function is used to return data from the next row.
Example:
select p_name, p_retailprice, lead(p_retailprice) over() as l1,
lag(p_retailprice) over() as l2
from part
where p_retailprice = 1173.15
---------------------------------------------------
least(v1, v2, ...) - Returns the least value in a list of values
Example:
> SELECT least(2, 3, 1) FROM src LIMIT 1;
1
---------------------------------------------------
length(str | binary) - Returns the length of str or number of bytes in binary data
Example:
> SELECT length('Facebook') FROM src LIMIT 1;
8
---------------------------------------------------
levenshtein(str1, str2) - This function calculates the Levenshtein distance between two strings.
Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.Example:
> SELECT levenshtein('kitten', 'sitting');
3
---------------------------------------------------
like(str, pattern) - Checks if str matches pattern
Example:
> SELECT a.* FROM srcpart a WHERE a.hr like '%2' LIMIT 1;
27 val_27 2008-04-08 12
---------------------------------------------------
ln(x) - Returns the natural logarithm of x
Example:
> SELECT ln(1) FROM src LIMIT 1;
0
---------------------------------------------------
locate(substr, str[, pos]) - Returns the position of the first occurance of substr in str after position pos
Example:
> SELECT locate('bar', 'foobarbar', 5) FROM src LIMIT 1;
7
---------------------------------------------------
log([b], x) - Returns the logarithm of x with base b
Example:
> SELECT log(13, 13) FROM src LIMIT 1;
1
---------------------------------------------------
log10(x) - Returns the logarithm of x with base 10
Example:
> SELECT log10(10) FROM src LIMIT 1;
1
---------------------------------------------------
log2(x) - Returns the logarithm of x with base 2
Example:
> SELECT log2(2) FROM src LIMIT 1;
1
---------------------------------------------------
lower(str) - Returns str with all characters changed to lowercase
Synonyms: lcase
Example:
> SELECT lower('Facebook') FROM src LIMIT 1;
'facebook'
---------------------------------------------------
lpad(str, len, pad) - Returns str, left-padded with pad to a length of len
If str is longer than len, the return value is shortened to len characters.
Example:
> SELECT lpad('hi', 5, '??') FROM src LIMIT 1;
'???hi' > SELECT lpad('hi', 1, '??') FROM src LIMIT 1;
'h'
---------------------------------------------------
ltrim(str) - Removes the leading space characters from str
Example:
> SELECT ltrim(' facebook') FROM src LIMIT 1;
'facebook'
---------------------------------------------------
map(key0, value0, key1, value1...) - Creates a map with the given key/value pairs
---------------------------------------------------
map_keys(map) - Returns an unordered array containing the keys of the input map.
---------------------------------------------------
map_values(map) - Returns an unordered array containing the values of the input map.
---------------------------------------------------
There is no documentation for function 'matchpath'
---------------------------------------------------
max(expr) - Returns the maximum value of expr
---------------------------------------------------
min(expr) - Returns the minimum value of expr
---------------------------------------------------
minute(param) - Returns the minute component of the string/timestamp/interval
param can be one of:
1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'HH:mm:ss'.
2. A timestamp value
3. A day-time interval valueExample:
> SELECT minute('2009-07-3012:58:59') FROM src LIMIT 1;
58
> SELECT minute('12:58:59') FROM src LIMIT 1;
58
---------------------------------------------------
month(param) - Returns the month component of the date/timestamp/interval
param can be one of:
1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'.
2. A date value
3. A timestamp value
4. A year-month interval valueExample:
> SELECT month('2009-07-30') FROM src LIMIT 1;
7
---------------------------------------------------
months_between(date1, date2) - returns number of months between dates date1 and date2
If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2.
date1 and date2 type can be date, timestamp or string in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal places.
Example:
> SELECT months_between('1997-02-2810:30:00', '1996-10-30');
3.94959677
---------------------------------------------------
named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values
---------------------------------------------------
negative a - Returns -a
---------------------------------------------------
next_day(start_date, day_of_week) - Returns the first date which is later than start_date and named as indicated.
start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. day_of_week is day of the week (e.g. Mo, tue, FRIDAY).Example:
> SELECT next_day('2015-01-14', 'TU') FROM src LIMIT 1;
'2015-01-20'
---------------------------------------------------
ngrams(expr, n, k, pf) - Estimates the top-k n-grams in rows that consist of sequences of strings, represented as arrays of strings, or arrays of arrays of strings. 'pf' is an optional precision factor that controls memory usage.
The parameter 'n' specifies what type of n-grams are being estimated. Unigrams are n = 1, and bigrams are n = 2. Generally, n will not be greater than about 5. The 'k' parameter specifies how many of the highest-frequency n-grams will be returned by the UDAF. The optional precision factor 'pf' specifies how much memory to use for estimation; more memory will give more accurate frequency counts, but could crash the JVM. The default value is 20, which internally maintains 20*k n-grams, but only returns the k highest frequency ones. The output is an array of structs with the top-k n-grams. It might be convenient to explode() the output of this UDAF.
---------------------------------------------------
There is no documentation for function 'noop'
---------------------------------------------------
There is no documentation for function 'noopstreaming'
---------------------------------------------------
There is no documentation for function 'noopwithmap'
---------------------------------------------------
There is no documentation for function 'noopwithmapstreaming'
---------------------------------------------------
not a - Logical not
Synonyms: !
---------------------------------------------------
There is no documentation for function 'ntile'
---------------------------------------------------
nvl(value,default_value) - Returns default value if value is null else returns value
Example:
> SELECT nvl(null,'bla') FROM src LIMIT 1;
bla
---------------------------------------------------
a or b - Logical or
---------------------------------------------------
parse_url(url, partToExtract[, key]) - extracts a part from a URL
Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO
key specifies which query to extract
Example:
> SELECT parse_url('http://facebook.com/path/p1.php?query=1', 'HOST') FROM src LIMIT 1;'facebook.com'>SELECT parse_url('http://facebook.com/path/p1.php?query=1','QUERY')FROM src LIMIT1;'query=1'>SELECT parse_url('http://facebook.com/path/p1.php?query=1','QUERY','query')FROM src LIMIT1;'1'---------------------------------------------------
parse_url_tuple(url, partname1, partname2,..., partnameN)- extracts N (N>=1) parts from a URL.
It takes a URL and one or multiple partnames,andreturns a tuple.All the input parameters and output columntypes are string.
Partname: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY,FILE, USERINFO, QUERY:<KEY_NAME>
Note: Partnames are case-sensitive,and should not contain unnecessary white spaces.
Example:
>SELECT b.*FROM src LATERAL VIEW parse_url_tuple(fullurl,'HOST','PATH','QUERY','QUERY:id') b as host, path, query, query_id LIMIT1;>SELECT parse_url_tuple(a.fullurl,'HOST','PATH','QUERY','REF','PROTOCOL','FILE','AUTHORITY','USERINFO','QUERY:k1')as(ho, pa, qu, re, pr, fi, au, us, qk1)from src a;---------------------------------------------------
There isno documentation forfunction'percent_rank'---------------------------------------------------
percentile(expr, pc)-Returns the percentile(s)of expr at pc (range: [0,1]).pc can be a doubleordouble array
---------------------------------------------------
percentile_approx(expr, pc,[nb])-For very large data, computes an approximate percentile valuefrom a histogram,using the optional argument [nb]as the number of histogram bins touse. A higher valueof nb results in a more accurate approximation, at the cost of higher memory usage.'expr' can be anynumericcolumn, including doubles and floats,and'pc'is either a single double/floatwith a requested percentile,or an array ofdouble/floatwith multiple percentiles.If'nb'isnot specified, the default approximation is done with10,000 histogram bins, which means that if there are 10,000or fewer uniquevaluesin'expr', you can expect an exact result. The percentile()function always computes an exact percentile and can run outof memory if there are too many uniquevaluesin a column, which necessitates this function.
Example (three percentiles requested using a finer histogram approximation):
>SELECT percentile_approx(val, array(0.5,0.95,0.98),100000)FROM somedata;[0.05,1.64,2.26]---------------------------------------------------
pi()-returns pi
Example:
>SELECT pi()FROM src LIMIT1;3.14159...---------------------------------------------------
a pmod b -Compute the positive modulo
---------------------------------------------------
posexplode(a)- behaves like explode for arrays, but includes the position of items in the original array
---------------------------------------------------
positive a -Returns a
---------------------------------------------------
pow(x1, x2)- raise x1 to the power of x2
Synonyms: power
Example:
>SELECT pow(2,3)FROM src LIMIT1;8---------------------------------------------------
power(x1, x2)- raise x1 to the power of x2
Synonyms: pow
Example:
>SELECT power(2,3)FROM src LIMIT1;8---------------------------------------------------
printf(String format, Obj... args)-function that can format strings according to printf-style format strings
Example:
>SELECT printf("Hello World %d %s",100,"days")FROM src LIMIT1;"Hello World 100 days"---------------------------------------------------
radians(x)- Converts degrees to radians
Example:
>SELECT radians(90)FROM src LIMIT1;1.5707963267949mo
---------------------------------------------------
rand([seed])-Returns a pseudorandom number between0and1---------------------------------------------------
There isno documentation forfunction'rank'---------------------------------------------------
reflect(class,method[,arg1[,arg2..]]) calls method with reflection
Synonyms: java_method
Use this UDF tocall Java methods by matching the argument signature
---------------------------------------------------
reflect2(arg0,method[,arg1[,arg2..]]) calls method of arg0 with reflection
Use this UDF tocall Java methods by matching the argument signature
---------------------------------------------------
str regexpregexp-Returnstrueif str matches regexpandfalse otherwise
Synonyms: rlike
Example:
>SELECT'fb'regexp'.*'FROM src LIMIT1;true---------------------------------------------------
regexp_extract(str,regexp[, idx])- extracts a group that matches regexp
Example:
>SELECT regexp_extract('100-200','(\d+)-(\d+)',1)FROM src LIMIT1;'100'---------------------------------------------------
regexp_replace(str,regexp, rep)-replaceall substrings of str that matchregexpwith rep
Example:
>SELECT regexp_replace('100-200','(\d+)','num')FROM src LIMIT1;'num-num'---------------------------------------------------repeat(str, n)-repeat str n times
Example:
>SELECTrepeat('123',2)FROM src LIMIT1;'123123'---------------------------------------------------
reverse(str)- reverse str
Example:
>SELECT reverse('Facebook')FROM src LIMIT1;'koobecaF'---------------------------------------------------
str rlikeregexp-Returnstrueif str matches regexpandfalse otherwise
Synonyms: regexp
Example:
>SELECT'fb'rlike'.*'FROM src LIMIT1;true---------------------------------------------------round(x[, d])- round x to d decimal places
Example:
>SELECTround(12.3456,1)FROM src LIMIT1;12.3'
---------------------------------------------------
There is no documentation for function 'row_number'
---------------------------------------------------
rpad(str, len, pad) - Returns str, right-padded with pad to a length of len
If str is longer than len, the return value is shortened to len characters.
Example:
> SELECT rpad('hi', 5, '??') FROM src LIMIT 1;
'hi???' > SELECT rpad('hi', 1, '??') FROM src LIMIT 1;
'h'
---------------------------------------------------
rtrim(str) - Removes the trailing space characters from str
Example:
> SELECT rtrim('facebook ') FROM src LIMIT 1;
'facebook'
---------------------------------------------------
second(date) - Returns the second component of the string/timestamp/interval
param can be one of:
1. A string in the format of 'yyyy-MM-dd HH:mm:ss' or 'HH:mm:ss'.
2. A timestamp value
3. A day-time interval valueExample:
> SELECT second('2009-07-3012:58:59') FROM src LIMIT 1;
59
> SELECT second('12:58:59') FROM src LIMIT 1;
59
---------------------------------------------------
sentences(str, lang, country) - Splits str into arrays of sentences, where each sentence is an array of words. The 'lang' and'country' arguments are optional, and if omitted, the default locale is used.
Example:
> SELECT sentences('Hello there! I am a UDF.') FROM src LIMIT 1;
[ ["Hello", "there"], ["I", "am", "a", "UDF"] ]
> SELECT sentences(review, language) FROM movies;
Unnecessary punctuation, such as periods and commas in English, is automatically stripped. If specified, 'lang' should be a two-letter ISO-639 language code (such as 'en'), and 'country' should be a two-letter ISO-3166 code (such as 'us'). Not all country and language codes are fully supported, and if an unsupported code is specified, a default locale is used to process that string.
---------------------------------------------------
shiftleft(a, b) - Bitwise left shift
Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
Example:
> SELECT shiftleft(2, 1);
4
---------------------------------------------------
shiftright(a, b) - Bitwise right shift
Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
Example:
> SELECT shiftright(4, 1);
2
---------------------------------------------------
shiftrightunsigned(a, b) - Bitwise unsigned right shift
Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
Example:
> SELECT shiftrightunsigned(4, 1);
2
---------------------------------------------------
sign(x) - returns the sign of x )
Example:
> SELECT sign(40) FROM src LIMIT 1;
1
---------------------------------------------------
sin(x) - returns the sine of x (x is in radians)
Example:
> SELECT sin(0) FROM src LIMIT 1;
0
---------------------------------------------------
size(a) - Returns the size of a
---------------------------------------------------
sort_array(array(obj1, obj2,...)) - Sorts the input array in ascending order according to the natural ordering of the array elements.
Example:
> SELECT sort_array(array('b', 'd', 'c', 'a')) FROM src LIMIT 1;
'a', 'b', 'c', 'd'
---------------------------------------------------
soundex(string) - Returns soundex code of the string.
The soundex code consist of the first letter of the name followed by three digits.
Example:
> SELECT soundex('Miller');
M460
---------------------------------------------------
space(n) - returns n spaces
Example:
> SELECT space(2) FROM src LIMIT 1;
''
---------------------------------------------------
split(str, regex) - Splits str around occurances that match regex
Example:
> SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1;
["one", "two", "three"]
---------------------------------------------------
sqrt(x) - returns the square root of x
Example:
> SELECT sqrt(4) FROM src LIMIT 1;
2
---------------------------------------------------
stack(n, cols...) - turns k columns into n rows of size k/n each
---------------------------------------------------
std(x) - Returns the standard deviation of a set of numbers
Synonyms: stddev, stddev_pop
---------------------------------------------------
stddev(x) - Returns the standard deviation of a set of numbers
Synonyms: std, stddev_pop
---------------------------------------------------
stddev_pop(x) - Returns the standard deviation of a set of numbers
Synonyms: std, stddev
---------------------------------------------------
stddev_samp(x) - Returns the sample standard deviation of a set of numbers
---------------------------------------------------
str_to_map(text, delimiter1, delimiter2) - Creates a map by parsing text
Split text into key-value pairs using two delimiters. The first delimiter seperates pairs, and the second delimiter sperates key and value. If only one parameter is given, default delimiters are used: ',' as delimiter1 and '=' as delimiter2.
---------------------------------------------------
struct(col1, col2, col3, ...) - Creates a struct with the given field values
---------------------------------------------------
substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len
Synonyms: substring
pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str.
Example:
> SELECT substr('Facebook', 5) FROM src LIMIT 1;
'book'
> SELECT substr('Facebook', -5) FROM src LIMIT 1;
'ebook'
> SELECT substr('Facebook', 5, 1) FROM src LIMIT 1;
'b'
---------------------------------------------------
substring(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstring(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len
Synonyms: substr
pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str.
Example:
> SELECT substring('Facebook', 5) FROM src LIMIT 1;
'book'
> SELECT substring('Facebook', -5) FROM src LIMIT 1;
'ebook'
> SELECT substring('Facebook', 5, 1) FROM src LIMIT 1;
'b'
---------------------------------------------------
sum(x) - Returns the sum of a set of numbers
---------------------------------------------------
tan(x) - returns the tangent of x (x is in radians)
Example:
> SELECT tan(0) FROM src LIMIT 1;
1
---------------------------------------------------
to_date(expr) - Extracts the date part of the date or datetime expression expr
Example:
> SELECT to_date('2009-07-3004:17:52') FROM src LIMIT 1;
'2009-07-30'
---------------------------------------------------
to_unix_timestamp(date[, pattern]) - Returns the UNIX timestamp
Converts the specified time to number of seconds since 1970-01-01.
---------------------------------------------------
to_utc_timestamp(timestamp, string timezone) - Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0)
---------------------------------------------------
translate(input, from, to) - translates the input string by replacing the characters present in the from string with the corresponding characters in the to string
translate(string input, string from, string to) is an equivalent function to translate in PostGreSQL. It works on a character by character basis on the input string (first parameter). A character in the input is checked for presence in the from string (second parameter). If a match happens, the character from to string (third parameter) which appears at the same index as the character in from string is obtained. This character is emitted in the output string instead of the original character from the input string. If the to string is shorter than the from string, there may not be a character present at the same index in the to string. In such a case, nothing is emitted for the original character and it's deleted from the output string.For example,
translate('abcdef','adc','19')returns'1b9ef' replacing 'a'with'1','d'with'9'and removing 'c'from the input string
translate('a b c d',' ','')return'abcd' removing all spaces from the input string
If the same characteris present multiple times in the input string, the first occurence of the characteris the one that's considered for matching. However, it is not recommended to have the same character more than once in the from string since it's not required and adds to confusion.For example,
translate('abcdef','ada','192')returns'1bc9ef' replaces 'a'with'1'and'd'with'9' ignoring the second occurence of'a'in the from string mapping it to'2'---------------------------------------------------
trim(str)- Removes the leading and trailing space characters from str
Example:
>SELECT trim(' facebook ')FROM src LIMIT1;'facebook'---------------------------------------------------
trunc(date, fmt)-Returnsreturnsdatewith the time portion of the day truncated to the unit specified by the format model fmt.If you omit fmt,thendateis truncated to the nearest day. It now only supports 'MONTH'/'MON'/'MM'and'YEAR'/'YYYY'/'YY'as format.dateis a string in the format 'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'. The time part ofdateis ignored.
Example:
>SELECT trunc('2009-02-12','MM');
OK
'2009-02-01'>SELECT trunc('2015-10-27','YEAR');
OK
'2015-01-01'---------------------------------------------------ucase(str)-Returns str withall characters changed to uppercase
Synonyms: upper
Example:
>SELECTucase('Facebook')FROM src LIMIT1;'FACEBOOK'---------------------------------------------------
unbase64(str)-Convert the argument from a base 64 string tobinary---------------------------------------------------
unhex(str)- Converts hexadecimal argument tobinary
Performs the inverse operation of HEX(str). That is, it interprets
each pair of hexadecimal digits in the argument as a number and
converts it to the byte representation of the number. The
resulting characters are returned as a binary string.
Example:
>SELECT DECODE(UNHEX('4D7953514C'),'UTF-8')from src limit1;'MySQL'
The characters in the argument string must be legal hexadecimal
digits: '0'..'9','A'..'F','a'..'f'.If UNHEX() encounters
any nonhexadecimal digits in the argument, it returnsNULL. Also,if there are an odd number of characters a leading 0is appended.---------------------------------------------------
unix_timestamp([date[, pattern]])-Returns the UNIX timestamp
Converts the currentor specified timeto number of seconds since 1970-01-01.---------------------------------------------------
upper(str)-Returns str withall characters changed to uppercase
Synonyms: ucase
Example:
>SELECT upper('Facebook')FROM src LIMIT1;'FACEBOOK'---------------------------------------------------
var_pop(x)-Returns the variance of a setof numbers
Synonyms: variance
---------------------------------------------------
var_samp(x)-Returns the sample variance of a setof numbers
---------------------------------------------------
variance(x)-Returns the variance of a setof numbers
Synonyms: var_pop
---------------------------------------------------
weekofyear(date)-Returns the week of the yearof the given date. A week is considered tostarton a Monday and week 1is the first week with>3 days.
Examples:
>SELECT weekofyear('2008-02-20')FROM src LIMIT1;8>SELECT weekofyear('1980-12-31 12:59:59')FROM src LIMIT1;1---------------------------------------------------CASEWHEN a THEN b [WHEN c THEN d]*[ELSE e]END-When a =true,returns b;when c =true,return d;elsereturn e
Example:
SELECTCASEWHEN deptno=1THEN Engineering
WHEN deptno=2THEN Finance
ELSE admin
END,CASEWHEN zone=7THEN Americas
ELSE Asia-Pac
ENDFROM emp_details
---------------------------------------------------
There isno documentation forfunction'windowingtablefunction'---------------------------------------------------
xpath(xml, xpath)-Returns a string array ofvalueswithin xml nodes that match the xpath expression
Example:
>SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/text()')FROM src LIMIT1[]>SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()')FROM src LIMIT1["b1","b2","b3"]>SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/c/text()')FROM src LIMIT1["c1","c2"]---------------------------------------------------
xpath_boolean(xml, xpath)- Evaluates a boolean xpath expression
Example:
>SELECT xpath_boolean('<a><b>1</b></a>','a/b')FROM src LIMIT1;true>SELECT xpath_boolean('<a><b>1</b></a>','a/b = 2')FROM src LIMIT1;false---------------------------------------------------
xpath_double(xml, xpath)-Returns a doublevalue that matches the xpath expression
Synonyms: xpath_number
Example:
>SELECT xpath_double('<a><b>1</b><b>2</b></a>','sum(a/b)')FROM src LIMIT1;3.0---------------------------------------------------
xpath_float(xml, xpath)-Returns a floatvalue that matches the xpath expression
Example:
>SELECT xpath_float('<a><b>1</b><b>2</b></a>','sum(a/b)')FROM src LIMIT1;3.0---------------------------------------------------
xpath_int(xml, xpath)-Returns an integervalue that matches the xpath expression
Example:
>SELECT xpath_int('<a><b>1</b><b>2</b></a>','sum(a/b)')FROM src LIMIT1;3---------------------------------------------------
xpath_long(xml, xpath)-Returns a long value that matches the xpath expression
Example:
>SELECT xpath_long('<a><b>1</b><b>2</b></a>','sum(a/b)')FROM src LIMIT1;3---------------------------------------------------
xpath_number(xml, xpath)-Returns a doublevalue that matches the xpath expression
Synonyms: xpath_double
Example:
>SELECT xpath_number('<a><b>1</b><b>2</b></a>','sum(a/b)')FROM src LIMIT1;3.0---------------------------------------------------
xpath_short(xml, xpath)-Returns a short value that matches the xpath expression
Example:
>SELECT xpath_short('<a><b>1</b><b>2</b></a>','sum(a/b)')FROM src LIMIT1;3---------------------------------------------------
xpath_string(xml, xpath)-Returns the text contents of the first xml node that matches the xpath expression
Example:
>SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c')FROM src LIMIT1;'cc'>SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b')FROM src LIMIT1;'b1'>SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a/b[2]')FROM src LIMIT1;'b2'>SELECT xpath_string('<a><b>b1</b><b>b2</b></a>','a')FROM src LIMIT1;'b1b2'---------------------------------------------------year(param)-Returns the year component of the date/timestamp/interval
param can be one of:
1. A string in the format of'yyyy-MM-dd HH:mm:ss'or'yyyy-MM-dd'.2. A datevalue3. A timestampvalue4. A year-monthinterval valueExample:
>SELECTyear('2009-07-30')FROM src LIMIT1;2009---------------------------------------------------
a | b - Bitwise or
Example:
>SELECT3|5FROM src LIMIT1;7---------------------------------------------------~ n - Bitwise not
Example:
>SELECT~0FROM src LIMIT1;-1