Hive 实用的第三方 UDF 收集

在GitHub上有丰富的Hive UDF资源,可针对项目需要进行拉取使用。

以下记录hive-third-functions的部署和使用

部署安装

安装依赖的jdo2-api-2.3-ec.jar

[root@server01 hive_third_jars]# wget http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar -O ~/jdo2-api-2.3-ec.jar
--2018-12-28 11:19:48--  http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar
Resolving www.datanucleus.org (www.datanucleus.org)... 80.86.85.8
Connecting to www.datanucleus.org (www.datanucleus.org)|80.86.85.8|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 198552 (194K) [application/java-archive]
Saving to: ‘/root/jdo2-api-2.3-ec.jar’

100%[==============================================================================================>] 198,552      133KB/s   in 1.5s

2018-12-28 11:19:52 (133 KB/s) - ‘/root/jdo2-api-2.3-ec.jar’ saved [198552/198552]

[root@server01 hive_third_jars]# mvn install:install-file -DgroupId=javax.jdo -DartifactId=jdo2-api -Dversion=2.3-ec -Dpackaging=jar -Dfile=jdo2-api-2.3-ec.jar
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-install-plugin:2.4:install-file (default-cli) @ standalone-pom ---
[INFO] Installing /root/hive_third_jars/jdo2-api-2.3-ec.jar to /root/.m2/repository/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar
[INFO] Installing /tmp/mvninstall1123492988314226896.pom to /root/.m2/repository/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.394 s
[INFO] Finished at: 2018-12-28T11:23:25+08:00
[INFO] ------------------------------------------------------------------------
[root@server01 hive_third_jars]# ls
jdo2-api-2.3-ec.jar

MVN Package

[root@server01 hive_third_jars]# cd hive-third-functions/
[root@server01 hive-third-functions]# ls
pom.xml  README-geo.md  README.md  README-zh.md  src
[root@server01 hive-third-functions]# mvn clean package -DskipTests
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------< cc.shanruifeng:hive-third-functions >-----------------
......
[INFO] Attaching shaded artifact.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:28 min
[INFO] Finished at: 2018-12-28T11:28:41+08:00
[INFO] ------------------------------------------------------------------------
[root@server01 hive-third-functions]# ls
pom.xml  README-geo.md  README.md  README-zh.md  src  target
[root@server01 hive-third-functions]# ls target/
classes            generated-test-sources          hive-third-functions-2.1.3-shaded.jar  maven-status
generated-sources  hive-third-functions-2.1.3.jar  maven-archiver                         test-classes

添加jar包至hive

0: jdbc:hive2://node222:10000/demo> add jar /usr/local/hive-2.1.1/datas/hive-third-functions-2.1.3-shaded.jar ;
No rows affected (0.031 seconds)
0: jdbc:hive2://node222:10000/demo> list jar;
+--------------------------------------------------------------------+--+
|                              resource                              |
+--------------------------------------------------------------------+--+
| /usr/local/hive-2.1.1/datas/hive-third-functions-2.1.3-shaded.jar  |
+--------------------------------------------------------------------+--+
1 row selected (0.014 seconds)

创建临时函数关联函数class

0: jdbc:hive2://node222:10000/demo> create temporary function array_contains as 'cc.shanruifeng.functions.array.UDFArrayContains';
No rows affected (0.04 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_equals as 'cc.shanruifeng.functions.array.UDFArrayEquals';
No rows affected (0.032 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_intersect as 'cc.shanruifeng.functions.array.UDFArrayIntersect';
No rows affected (0.039 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_max as 'cc.shanruifeng.functions.array.UDFArrayMax';
No rows affected (0.02 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_min as 'cc.shanruifeng.functions.array.UDFArrayMin';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_join as 'cc.shanruifeng.functions.array.UDFArrayJoin';
No rows affected (0.022 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_distinct as 'cc.shanruifeng.functions.array.UDFArrayDistinct';
No rows affected (0.022 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_position as 'cc.shanruifeng.functions.array.UDFArrayPosition';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_remove as 'cc.shanruifeng.functions.array.UDFArrayRemove';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_reverse as 'cc.shanruifeng.functions.array.UDFArrayReverse';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_sort as 'cc.shanruifeng.functions.array.UDFArraySort';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_concat as 'cc.shanruifeng.functions.array.UDFArrayConcat';
No rows affected (0.022 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_value_count as 'cc.shanruifeng.functions.array.UDFArrayValueCount';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_slice as 'cc.shanruifeng.functions.array.UDFArraySlice';
No rows affected (0.02 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function array_element_at as 'cc.shanruifeng.functions.array.UDFArrayElementAt';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function bit_count as 'cc.shanruifeng.functions.bitwise.UDFBitCount';
No rows affected (0.02 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_and as 'cc.shanruifeng.functions.bitwise.UDFBitwiseAnd';
No rows affected (0.02 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_not as 'cc.shanruifeng.functions.bitwise.UDFBitwiseNot';
No rows affected (0.023 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_or as 'cc.shanruifeng.functions.bitwise.UDFBitwiseOr';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_xor as 'cc.shanruifeng.functions.bitwise.UDFBitwiseXor';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function map_build as 'cc.shanruifeng.functions.map.UDFMapBuild';
No rows affected (0.022 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function map_concat as 'cc.shanruifeng.functions.map.UDFMapConcat';
No rows affected (0.024 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function map_element_at as 'cc.shanruifeng.functions.map.UDFMapElementAt';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function map_equals as 'cc.shanruifeng.functions.map.UDFMapEquals';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function day_of_week as 'cc.shanruifeng.functions.date.UDFDayOfWeek';
No rows affected (0.034 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function day_of_year as 'cc.shanruifeng.functions.date.UDFDayOfYear';
No rows affected (0.017 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function type_of_day as 'cc.shanruifeng.functions.date.UDFTypeOfDay';
No rows affected (0.028 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function zodiac_cn as 'cc.shanruifeng.functions.date.UDFZodiacSignCn';
No rows affected (0.016 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function zodiac_en as 'cc.shanruifeng.functions.date.UDFZodiacSignEn';
No rows affected (0.018 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function pinyin as 'cc.shanruifeng.functions.string.UDFChineseToPinYin';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function md5 as 'cc.shanruifeng.functions.string.UDFMd5';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function sha256 as 'cc.shanruifeng.functions.string.UDFSha256';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://node222:10000/demo> create temporary function json_array_get as 'cc.shanruifeng.functions.json.UDFJsonArrayGet';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function json_array_length as 'cc.shanruifeng.functions.json.UDFJsonArrayLength';
No rows affected (0.019 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function json_array_extract as 'cc.shanruifeng.functions.json.UDFJsonArrayExtract';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function json_array_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonArrayExtractScalar';
No rows affected (0.022 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function json_extract as 'cc.shanruifeng.functions.json.UDFJsonExtract';
No rows affected (0.021 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function json_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonExtractScalar';
No rows affected (0.016 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function json_size as 'cc.shanruifeng.functions.json.UDFJsonSize';
No rows affected (0.017 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function id_card_province as 'cc.shanruifeng.functions.card.UDFChinaIdCardProvince';
No rows affected (0.017 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function id_card_city as 'cc.shanruifeng.functions.card.UDFChinaIdCardCity';
No rows affected (0.016 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function id_card_area as 'cc.shanruifeng.functions.card.UDFChinaIdCardArea';
No rows affected (0.016 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function id_card_birthday as 'cc.shanruifeng.functions.card.UDFChinaIdCardBirthday';
No rows affected (0.016 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function id_card_gender as 'cc.shanruifeng.functions.card.UDFChinaIdCardGender';
No rows affected (0.015 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function is_valid_id_card as 'cc.shanruifeng.functions.card.UDFChinaIdCardValid';
No rows affected (0.013 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function id_card_info as 'cc.shanruifeng.functions.card.UDFChinaIdCardInfo';
No rows affected (0.014 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function wgs_distance as 'cc.shanruifeng.functions.geo.UDFGeoWgsDistance';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://node222:10000/demo> create temporary function gcj_to_bd as 'cc.shanruifeng.functions.geo.UDFGeoGcjToBd';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://node222:10000/demo> create temporary function bd_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoBdToGcj';
No rows affected (0.014 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function wgs_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoWgsToGcj';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://node222:10000/demo> create temporary function gcj_to_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjToWgs';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://node222:10000/demo> create temporary function gcj_extract_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjExtractWgs';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
0: jdbc:hive2://node222:10000/demo> create temporary function url_encode as 'cc.shanruifeng.functions.url.UDFUrlEncode';
No rows affected (0.015 seconds)
0: jdbc:hive2://node222:10000/demo> create temporary function url_decode as 'cc.shanruifeng.functions.url.UDFUrlDecode';
No rows affected (0.041 seconds)

查看所创建的函数

0: jdbc:hive2://node222:10000/demo> describe function zodiac_cn;
+--------------------------------------------------------------------------------------------------------------------+--+
|                                                      tab_name                                                      |
+--------------------------------------------------------------------------------------------------------------------+--+
| zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.  |
+--------------------------------------------------------------------------------------------------------------------+--+
1 row selected (0.041 seconds)
0: jdbc:hive2://node222:10000/demo> describe function extended zodiac_cn;
+--------------------------------------------------------------------------------------------------------------------+--+
|                                                      tab_name                                                      |
+--------------------------------------------------------------------------------------------------------------------+--+
| zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.  |
| Example:                                                                                                           |
|  > select zodiac_cn(date_string) from src;                                                                         |
|  > select zodiac_cn(month, day) from src;                                                                          |
+--------------------------------------------------------------------------------------------------------------------+--+
4 rows selected (0.029 seconds)

测试所定义的函数

0: jdbc:hive2://node222:10000/demo> select pinyin('中国');
+-----------+--+
|    _c0    |
+-----------+--+
| zhongguo  |
+-----------+--+
1 row selected (0.577 seconds)
0: jdbc:hive2://node222:10000/demo> select md5('aaronshan');
+-----------------------------------+--+
|                _c0                |
+-----------------------------------+--+
| 95686bc0483262afe170b550dd4544d1  |
+-----------------------------------+--+
1 row selected (0.117 seconds)
0: jdbc:hive2://node222:10000/demo> select day_of_week('2016-07-12');
+------+--+
| _c0  |
+------+--+
| 2    |
+------+--+
1 row selected (0.126 seconds)
0: jdbc:hive2://node222:10000/demo> select day_of_week('2018-12-28');
+------+--+
| _c0  |
+------+--+
| 5    |
+------+--+
1 row selected (0.108 seconds)
0: jdbc:hive2://node222:10000/demo> select zodiac_cn('1989-01-08');
+------+--+
| _c0  |
+------+--+
| 魔羯座  |
+------+--+
1 row selected (0.129 seconds)

Functions

1. string functions

functiondescription
pinyin(string) -> stringconvert chinese to pinyin
md5(string) -> stringmd5 hash
sha256(string) -> string

sha256 hash

 

2. array functions

functiondescription
array_contains(array<E>, E) -> booleanwhether array contains value or not.
array_equals(array<E>, array<E>) -> booleanwhether two array equals or not.
array_intersect(array, array) -> arrayreturns the two array's intersection, without duplicates.
array_max(array<E>) -> Ereturns the maximum value of input array.
array_min(array<E>) -> Ereturns the minimum value of input array.
array_join(array, delimiter, null_replacement) -> stringconcatenates the elements of the given array using the delimiter and an optional null_replacement to replace nulls.
array_distinct(array) -> arrayremove duplicate values from the array.
array_position(array<E>, E) -> longreturns the position of the first occurrence of the element in array (or 0 if not found).
array_remove(array<E>, E) -> arrayremove all elements that equal element from array.
array_reverse(array) -> arrayreverse the array element.
array_sort(array) -> arraysorts and returns the array. The elements of array must be orderable.
array_concat(array, array) -> arrayconcatenates two arrays.
array_value_count(array<E>, E) -> longcount array's element number that element value equals given value.
array_slice(array, start, length) -> arraysubsets array starting from index start (or starting from the end if start is negative) with a length of length.
array_element_at(array<E>, index) -> Ereturns element of array at given index. If index < 0, element_at accesses elements from the last to the first.

3. map functions

functiondescription
map_build(x<K>, y<V>) -> map<K, V>returns a map created using the given key/value arrays.
map_concat(x<K, V>, y<K, V>) -> map<K,V>returns the union of two maps. If a key is found in both x and y, that key’s value in the resulting map comes from y.
map_element_at(map<K, V>, key) -> Vreturns value for given key, or NULL if the key is not contained in the map.
map_equals(x<K, V>, y<K, V>) -> booleanwhether map x equals with map y or not.

4. date functions

functiondescription
day_of_week(date_string | date) -> intday of week,if monday,return 1, sunday return 7, error return null.
day_of_year(date_string | date) -> intday of year. The value ranges from 1 to 366.
zodiac_en(date_string | date) -> stringconvert date to zodiac
zodiac_cn(date_string | date) -> stringconvert date to zodiac chinese
type_of_day(date_string | date) -> stringfor chinese. 获取日期的类型(1: 法定节假日, 2: 正常周末, 3: 正常工作日 4:攒假的工作日),错误返回-1.

5. json functions

functiondescription
json_array_get(json, jsonPath) -> array(varchar)returns the element at the specified index into the json_array. The index is zero-based.
json_array_length(json, jsonPath) -> array(varchar)returns the array length of json (a string containing a JSON array).
json_array_extract(json, jsonPath) -> array(varchar)extract json array by given jsonPath.
json_array_extract_scalar(json, jsonPath) -> array(varchar)like json_array_extract, but returns the result value as a string (as opposed to being encoded as JSON).
json_extract(json, jsonPath) -> array(varchar)extract json by given jsonPath.
json_extract_scalar(json, jsonPath) -> array(varchar)like json_extract, but returns the result value as a string (as opposed to being encoded as JSON).
json_size(json, jsonPath) -> array(varchar)like json_extract, but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero.

6. bitwise functions

functiondescription
bit_count(x, bits) -> bigintcount the number of bits set in x (treated as bits-bit signed integer) in 2’s complement representation
bitwise_and(x, y) -> bigintreturns the bitwise AND of x and y in 2’s complement arithmetic.
bitwise_not(x) -> bigintreturns the bitwise NOT of x in 2’s complement arithmetic.
bitwise_or(x, y) -> bigintreturns the bitwise OR of x and y in 2’s complement arithmetic.
bitwise_xor(x, y) -> bigintreturns the bitwise XOR of x and y in 2’s complement arithmetic.

7. china id card functions

functiondescription
id_card_province(string) -> stringget user's province
id_card_city(string) -> stringget user's city
id_card_area(string) -> stringget user's area
id_card_birthday(string) -> stringget user's birthday
id_card_gender(string) -> stringget user's gender
is_valid_id_card(string) -> booleandetermine is valid china id card No.
id_card_info(string) -> jsonget china id card info. include province, city, area etc.

8. geographic functions

functiondescription
wgs_distance(double lat1, double lng1, double lat2, double lng2) -> doublecalculate WGS84 coordinate distance, in meters.
gcj_to_bd(double,double) -> jsonGCJ-02(火星坐标系) convert to BD-09(百度坐标系), 谷歌、高德——>百度
bd_to_gcj(double,double) -> jsonBD-09(百度坐标系) convert to GCJ-02(火星坐标系), 百度——>谷歌、高德
wgs_to_gcj(double,double) -> jsonWGS84(地球坐标系) convert to GCJ02(火星坐标系)
gcj_to_wgs(double,double) -> jsonGCJ02(火星坐标系) convert to GPS84(地球坐标系), output coordinate WGS-84 accuracy within 1 to 2 meters.
gcj_extract_wgs(double,double) -> jsonGCJ02(火星坐标系) convert to GPS84, output coordinate WGS-84 accuracy within 0.5 meters. but compute cost more time than gcj_to_wgs.

关于互联网地图坐标系的说明见: 当前互联网地图的坐标系现状

9. url functions

functiondescription
url_encode(value) -> stringescapes value by encoding it so that it can be safely included in URL query parameter names and values
url_decode(value) -> stringunescape the URL encoded value. This function is the inverse of url_encode.

参考资料:

https://github.com/aaronshan/hive-third-functions

 

转载于:https://my.oschina.net/peakfang/blog/2994624

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值