Hive的预定义UDF函数

 

Hive的预定义UDF函数列表如下,主要来自于describe function xxx和代码注释的翻译:)

  • abs(x) - returns the absolute value of x
  • acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise
  • ascii(str) - returns the numeric value of the first character of str
  • asin(x) - returns the arc sine of x if -1<=x<=1 or NULL otherwise
  • bin(n) - returns n in binary

                        转换成二进制数

  • ceil(x) - Find the smallest integer not smaller than x
  • ceiling(x) - Find the smallest integer not smaller than x
  • coalesce(a1, a2, ...) - Returns the first non-null argument

                        返回第一个非空的参数

  • concat(str1, str2) - returns the concatenation of str1 and str2
  • conv(num, from_base, to_base) - convert num from from_base to to_base

                         转换进制,从from_base进制转换成to_base

  • cos(x) - returns the cosine of x (x is in radians)
  • date_add(start_date, num_days) - Returns the date that is num_days after start_date.

                         按天做日期加,日期的格式为'yyyy-MM-dd HH:mm:ss'或'yyyy-MM-dd',下同

  • date_sub(start_date, num_days) - Returns the date that is num_days before start_date.

                        按天做日期减

  • datediff(date1, date2) - Returns the number of days between date1 and date2

                         按天做日期差

  • day(date) - Returns the date of the month of date

                         同dayofmonth

  • dayofmonth(date) - Returns the date of the month of date
  • elt(n, str1, str2, ...) - returns the n-th string
  • exp(x) - Returns e to the power of x
  • floor(x) - Find the largest integer not greater than x
  • from_unixtime(unix_time, format) - returns unix_time in the specified format
  • get_json_object(json_txt, path) - Extract a json object from path

解析json object。

path支持JSONPath的一个子集,包括:

  * $ : Root object

  *  . : Child operator

  * [] : Subscript operator for array

  * * : Wildcard for []

例如,src_json 表只包含一列json,其中的一行内容为:

 +-------------------------------------------------------------------+
                                json
 +-------------------------------------------------------------------+
 {"store":
   {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
    "bicycle":{"price":19.95,"color":"red"}
   },
  "email":"amy@only_for_json_udf_test.net",
  "owner":"amy"
 }
 +-------------------------------------------------------------------+

json内容可以用以下查询语句解析

 hive> SELECT get_json_object(src_json.json, '$.owner') FROM src_json;
 amy
 hive> SELECT get_json_object(src_json.json, '$.store.fruit[0]') FROM src_json;
 {"weight":8,"type":"apple"}
 hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
 NULL 
  • hash(a1, a2, ...) - Returns a hash value of the arguments
  • hex(n or str) - Convert the argument to hexadecimal
  • index(a, n) - Returns the n-th element of a
  • instr(str, substr) - Returns the index of the first occurance of substr in str
  • isnotnull a - Returns true if a is not NULL and false otherwise
  • isnull a - Returns true if a is NULL and false otherwise
  • lcase(str) - Returns str with all characters changed to lowercase
  • length(str) - Returns the length of str
  • ln(x) - Returns the natural logarithm of x
  • locate(substr, str[, pos]) - Returns the position of the first occurance of substr in str after position pos
  • log([b], x) - Returns the logarithm of x with base b
  • log10(x) - Returns the logarithm of x with base 10
  • log2(x) - Returns the logarithm of x with base 2
  • lower(str) - Returns str with all characters changed to lowercase
  • lpad(str, len, pad) - Returns str, left-padded with pad to a length of len
  • ltrim(str) - Removes the leading space characters from str
  • month(date) - Returns the month of date
  • negative a - Returns -a
  • parse_url(url, partToExtract[, key]) - extracts a part from a URL

解析URL字符串,partToExtract的选项包含[HOST,PATH,QUERY,REF,PROTOCOL,FILE,AUTHORITY,USERINFO]。

例如, * parse_url('http://facebook.com/path/p1.php?query=1', 'HOST')返回'facebook.com' * parse_url('http://facebook.com/path/p1.php?query=1', 'PATH')返回'/path/p1.php' * parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY')返回'query=1',可以指定key来返回特定参数,key的格式是QUERY:<KEY_NAME>,例如QUERY:k1 * parse_url('http://facebook.com/path/p1.php?query=1#Ref', 'REF')返回'Ref' * parse_url('http://facebook.com/path/p1.php?query=1#Ref', 'PROTOCOL')返回'http'

  • a pmod b - Compute the positive modulo
  • positive a - Returns a
  • pow(x1, x2) - raise x1 to the power of x2
  • power(x1, x2) - raise x1 to the power of x2
  • rand([seed]) - Returns a pseudorandom number between 0 and 1
  • regexp_extract(str, regexp[, idx]) - extracts a group that matches regexp:例如regexp_extract(url,'.*pvid=(.*?)(&|#|$)',1)!=""其中()的位置即为后面参数1,2,3所获得的数值,当前例子为第一个括号匹配的值
  • regexp_replace(str, regexp, rep) - replace all substrings of str that match regexp with rep
  • repeat(str, n) - repeat str n times
  • reverse(str) - reverse str
  • round(x[, d]) - round x to d decimal places
  • rpad(str, len, pad) - Returns str, right-padded with pad to a length of len
  • rtrim(str) - Removes the trailing space characters from str
  • sin(x) - returns the sine of x (x is in radians)
  • size(a) - Returns the size of a
  • space(n) - returns n spaces
  • split(str, regex) - Splits str around occurances that match regex ,例如split(split(f.field_ext1, ",")[2],"_")[0],
  • sqrt(x) - returns the square root of x
  • substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len
  • substring(str, pos[, len]) - returns the substring of str that starts at pos and is of length len
  • to_date(expr) - Extracts the date part of the date or datetime expression expr

                         返回date部分的字符串('yyyy-MM-dd')。expr可以是date或datetime('yyyy-MM-dd HH:mm:ss')。

  • trim(str) - Removes the leading and trailing space characters from str
  • ucase(str) - Returns str with all characters changed to uppercase
  • unix_timestamp([date[, pattern]]) - Returns the UNIX timestamp

返回unix风格的time epoch(1970-01-01 00:00:00以来的秒数)。默认返回当前时间的time epoch。

制定date时,返回指定时间的time epoch。可以用pattern制定格式化字符串,用于格式化date。例如 unix_timestamp('20121228', 'yyyyMMdd')返回1356624000

  • upper(str) - Returns str with all characters changed to uppercase
  • year(date) - Returns the year of date
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值