HIVE自定义函数

最新推荐文章于 2024-07-16 10:50:48 发布

Supper宝宝

最新推荐文章于 2024-07-16 10:50:48 发布

阅读量254

点赞数

分类专栏：大数据 hiv

本文链接：https://blog.csdn.net/kebexue/article/details/85225972

版权

大数据同时被 2 个专栏收录

21 篇文章 0 订阅

订阅专栏

hiv

3 篇文章 0 订阅

订阅专栏

-》自定义函数

1)创建工程,加载hive的依赖包

2）编写代码，需要继承UDF

3）打包 export jar file

4）双传jar包到linux目录下

5）启动hive

6）add jar jar路径 //不要加引号

add jar /root/lower.jar

7）关联到hive中

create temporary function 自定义函数名 as '包的函数名'

create temporary function lower as "com.alex.udf.func.lower";
OK
Time taken: 0.1 seconds
hive (default)> use month
month( months_between(
hive (default)> use mongdb;
OK
Time taken: 0.031 seconds
hive (mongdb)> select * from student;
OK
student.id student.name
4 Tonny
1 Alex
2 Amy
3 Mia
NULL NULL
Time taken: 1.647 seconds, Fetched: 5 row(s)
hive (mongdb)> select name, lower(name) as lower_name from student;
OK
name lower_name
Tonny tonny
Alex alex
Amy amy
Mia mia
NULL NULL
Time taken: 0.31 seconds, Fetched: 5 row(s)

-》压缩：

1》开启压缩

set hive.exec.compress.intermediate;

set hive.exec.compress.intermediate = true;

2>map开启

hive (default)>set hive.exec.compress.intermediate；
hive.exec.compress.intermediate=false
hive (default)> set hive.exec.compress.intermediate=true;
hive (default)> set mapreduce.map.output.compress;
mapreduce.map.output.compress=false
hive (default)> set mapreduce.map.output.compress=true;

3》reduce开启

开启最终输出压缩功能

set hive.exec.conpress.output=true

开启最终数据压缩功能

mapreduce.output.fileoutputformat.compress=true;

设置压缩方式：

set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

设置块压缩：

set mapreduce.output.fileoutputformat.compress.type=BLOCK;