Hive Functions

喵不可言zzz

Article link: https://blog.csdn.net/Godboy0001/article/details/127426425

I. Built-in functions

1. Type conversion

cast(expression as data_type)

Example:

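select cast('100' as bigint);
-- returns 100
select cast('money' as bigint);
-- returns NULL: 'money' is not a valid number, and a failed cast yields NULL instead of an error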
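To see why cast('money' as bigint) gives NULL rather than an error, here is a minimal Java sketch of the string-to-bigint case. castToBigint is a hypothetical helper, not a Hive API; it only mirrors the rule that a failed conversion produces NULL (null in Java).

```java
// Sketch only: castToBigint is a hypothetical helper that mimics Hive's
// cast(str as bigint) for strings, where a failed conversion yields NULL.
final class CastDemo {
    static Long castToBigint(String s) {
        if (s == null) {
            return null; // NULL input stays NULL
        }
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null; // e.g. "money" is not a number, so the result is NULL
        }
    }

    public static void main(String[] args) {
        System.out.println(castToBigint("100"));   // 100
        System.out.println(castToBigint("money")); // null
    }
}
```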

2. Splitting a string

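split(string str, string pat): splits str around matches of pat and returns an array. pat is a regular expression, so a literal | must be escaped, written '\\|' inside a Hive string.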

Example:

select split('nihao|hello|nice','\\|');
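Hive documents the second argument of split as a regular expression, which is why the pipe is escaped. Java's String.split follows the same regex rule, so this plain-Java sketch (not Hive itself) shows the difference between an escaped and an unescaped pipe:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: Java's String.split, like Hive's split, interprets its second argument as a regex.
final class SplitDemo {
    static List<String> split(String str, String regex) {
        return Arrays.asList(str.split(regex));
    }

    public static void main(String[] args) {
        // "\\|" in Java source is the regex \| : a literal pipe character.
        System.out.println(split("nihao|hello|nice", "\\|")); // [nihao, hello, nice]
        // An unescaped "|" is an alternation of two empty patterns, so it matches between characters.
        System.out.println(split("a|b", "|"));                // [a, |, b]
    }
}
```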

3. Extracting a substring with a regular expression

regexp_extract(string subject, string pattern, int index)

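Usage: regexp_extract(column_name, regex, index), where index 0 returns the whole match and index 1 returns the first capture group. Example: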

select regexp_extract('hello<B>nice</B>haha','<B>(.*)</B>',1);
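The index argument selects a capture group of the pattern. A Java sketch of the idea with java.util.regex follows; regexpExtract is a hypothetical name, only the first match is used, and returning an empty string when nothing matches is this sketch's own choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of regexp_extract(subject, pattern, index): index 0 = whole match, 1 = first group.
final class RegexpExtractDemo {
    static String regexpExtract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        // Only the first match is considered; no match gives "" in this sketch.
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        String s = "hello<B>nice</B>haha";
        System.out.println(regexpExtract(s, "<B>(.*)</B>", 1)); // nice
        System.out.println(regexpExtract(s, "<B>(.*)</B>", 0)); // <B>nice</B>
    }
}
```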

4. Removing leading and trailing spaces

trim(string A)

Example:

select trim('  浅糙一下天理吧  ');

5. Aggregate functions on a column

1) sum(col): sum

2) avg(col): average

3) min(col): minimum

4) max(col): maximum

Pattern:

select subject, sum(score)
from table_name
group by subject;

Every column in the select list must either appear in the group by clause or be used inside an aggregate function.

Example: data in the worker table
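The group-by rule above can be pictured as sorting rows into buckets keyed by subject and reducing each bucket to one value. A Java streams sketch, with a hypothetical Score row type and made-up sample rows:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch of "select subject, sum(score) ... group by subject" on in-memory rows.
// Score and the sample rows are hypothetical stand-ins for a table.
final class GroupByDemo {
    static final class Score {
        final String subject;
        final int score;

        Score(String subject, int score) {
            this.subject = subject;
            this.score = score;
        }
    }

    // One result entry per distinct subject, like one output row per group.
    static Map<String, Integer> sumBySubject(List<Score> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Score r) -> r.subject, TreeMap::new, Collectors.summingInt((Score r) -> r.score)));
    }

    static Map<String, Double> avgBySubject(List<Score> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Score r) -> r.subject, TreeMap::new, Collectors.averagingInt((Score r) -> r.score)));
    }

    public static void main(String[] args) {
        List<Score> rows = Arrays.asList(
                new Score("math", 90), new Score("math", 70), new Score("english", 80));
        System.out.println(sumBySubject(rows)); // {english=80, math=160}
        System.out.println(avgBySubject(rows)); // {english=80.0, math=80.0}
    }
}
```

min and max follow the same shape with a different downstream collector.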

6. Concatenating strings

concat(string A, string B...)

Example:

select concat("100","200");

7. Extracting a substring

select substr('abcde',3,2);
-- returns cd: positions start at 1, so this takes 2 characters starting at the 3rd
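Positions in substr start at 1, unlike Java's 0-based indexes. A small Java sketch of that conversion; it deliberately handles only positive positions, and substr here is a hypothetical helper rather than Hive code.

```java
// Sketch of substr(str, pos, len) for pos >= 1 only: positions are 1-based.
// Negative or zero positions are not modelled in this sketch.
final class SubstrDemo {
    static String substr(String s, int pos, int len) {
        if (pos < 1) {
            throw new IllegalArgumentException("this sketch only supports pos >= 1");
        }
        int start = pos - 1; // 1-based position -> 0-based Java index
        if (start >= s.length() || len <= 0) {
            return "";
        }
        // Clamp the end so a length running past the string stops at its end (long math avoids overflow).
        int end = (int) Math.min((long) s.length(), (long) start + len);
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(substr("abcde", 3, 2));  // cd
        System.out.println(substr("abcde", 4, 10)); // de
    }
}
```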

8. The explode function

select explode(split("nice|good|well","\\|"));
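explode turns one array into as many rows as it has elements, so the query above yields three rows: nice, good and well. In Java terms that is a flatMap; a minimal sketch with plain Java collections standing in for rows:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: explode emits one output row per array element, which is a flatMap in Java.
final class ExplodeDemo {
    static List<String> explode(List<String[]> arrays) {
        return arrays.stream()
                .flatMap(Arrays::stream) // each element becomes its own "row"
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> input = Collections.singletonList("nice|good|well".split("\\|"));
        explode(input).forEach(System.out::println); // nice, good, well (one per line)
    }
}
```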

9. case when

Create the data file student_level.txt under /opt/testData/hive, with one comma-separated name,score record per line.

Create the table:

create table student_level(name string,score int)
row format delimited fields terminated by ",";

Load the data into the table:

load data local inpath '/opt/testData/hive/student_level.txt' into table student_level;

Query the table to check that the load succeeded:

select * from student_level;

The data loaded successfully.

Next, assign a level to each score with case when:

select name,score,
case when score >= 90 then 'very good'
when score >= 80 and score <90 then 'double good'
when score >= 70 and score <80 then 'good'
when score >= 60 and score <70 then 'go on'
else 'zhencai'
end level
from student_level;
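Because case when checks its branches in order and stops at the first true one, the upper bounds such as score < 90 are not strictly needed. A Java sketch of the same classification; it uses Integer so that a NULL score can be represented, and level is a hypothetical method name.

```java
// Sketch of the case-when above: branches are checked top to bottom and the first
// true condition wins, so upper bounds like "score < 90" are redundant (but harmless).
final class CaseWhenDemo {
    static String level(Integer score) {
        if (score == null) {
            return "zhencai"; // NULL makes every comparison non-true, so SQL falls to ELSE
        }
        if (score >= 90) return "very good";
        if (score >= 80) return "double good"; // only reached when score < 90
        if (score >= 70) return "good";
        if (score >= 60) return "go on";
        return "zhencai"; // the ELSE branch
    }

    public static void main(String[] args) {
        for (Integer s : new Integer[] {95, 85, 72, 60, 40, null}) {
            System.out.println(s + " -> " + level(s));
        }
    }
}
```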

10. Date functions

1) date_format (formats a date according to a pattern)

Pattern letters: yyyy-MM-dd HH:mm:ss (year-month-day hour:minute:second)

select date_format('2020-03-05','yyyy-MM');
-- returns 2020-03
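Hive's date_format takes Java date-pattern letters (its documentation refers to SimpleDateFormat). The sketch below uses java.time's DateTimeFormatter instead, whose letters agree with SimpleDateFormat for the yyyy, MM and dd used here; dateFormat is a hypothetical helper that only accepts date-only input.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of date_format(date, pattern) for date-only input, using java.time.
// Letters: yyyy = year, MM = month, dd = day (lowercase mm would mean minutes).
final class DateFormatDemo {
    static String dateFormat(String isoDate, String pattern) {
        // A LocalDate has no time fields, so time letters such as HH would throw here.
        return LocalDate.parse(isoDate).format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(dateFormat("2020-03-05", "yyyy-MM"));    // 2020-03
        System.out.println(dateFormat("2020-03-05", "yyyy/MM/dd")); // 2020/03/05
    }
}
```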

2) date_add and date_sub (adding and subtracting days)

select date_add('2020-03-05',-1);
-- returns 2020-03-04

select date_add('2020-03-05',1);
-- returns 2020-03-06

select date_sub('2020-03-05',1);
-- returns 2020-03-04
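date_add and date_sub move a date by a whole number of days, crossing month and year boundaries as needed. A java.time sketch with hypothetical helpers dateAdd and dateSub:

```java
import java.time.LocalDate;

// Sketch of date_add / date_sub on 'yyyy-MM-dd' strings.
final class DateAddDemo {
    static String dateAdd(String isoDate, int days) {
        return LocalDate.parse(isoDate).plusDays(days).toString(); // negative days go backwards
    }

    static String dateSub(String isoDate, int days) {
        return LocalDate.parse(isoDate).minusDays(days).toString();
    }

    public static void main(String[] args) {
        System.out.println(dateAdd("2020-03-05", -1)); // 2020-03-04
        System.out.println(dateAdd("2020-03-05", 1));  // 2020-03-06
        System.out.println(dateSub("2020-03-05", 1));  // 2020-03-04
        System.out.println(dateAdd("2020-02-28", 1));  // 2020-02-29 (2020 is a leap year)
    }
}
```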

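3) next_day

(1) Get the first Monday after a given date

select next_day('2020-03-05','MO');
-- returns 2020-03-09 (2020-03-05 is a Thursday)

Note: the day-of-week argument can be an English day name (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday) or its two- or three-letter abbreviation, such as MO or Mon.

(2) Get the Monday of the current week

next_day returns the following Monday, and subtracting 7 days steps back to the Monday of the current week:

select date_add(next_day('2020-03-05','MO'),-7);
-- returns 2020-03-02

4) last_day (returns the last day of the month)

select last_day('2020-03-05');
-- returns 2020-03-31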
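The next_day, week-start and last_day calculations map onto java.time's TemporalAdjusters. A sketch assuming weeks start on Monday, as in the examples above; the method names are illustrative, not Hive APIs.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

final class WeekDemo {
    // next_day(d, 'MO'): the first Monday strictly after d.
    static LocalDate nextMonday(LocalDate d) {
        return d.with(TemporalAdjusters.next(DayOfWeek.MONDAY));
    }

    // date_add(next_day(d, 'MO'), -7): Monday of the week containing d (d itself if d is a Monday).
    static LocalDate mondayOfWeek(LocalDate d) {
        return nextMonday(d).minusDays(7);
    }

    // last_day(d): last day of d's month, leap years included.
    static LocalDate lastDay(LocalDate d) {
        return d.with(TemporalAdjusters.lastDayOfMonth());
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.parse("2020-03-05"); // a Thursday
        System.out.println(nextMonday(d));   // 2020-03-09
        System.out.println(mondayOfWeek(d)); // 2020-03-02
        System.out.println(lastDay(d));      // 2020-03-31
    }
}
```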

II. JSON functions

Sample data (path: /opt/testData/hive/json.txt)

Create a table and load the data above:

create table json(data string);
load data local inpath '/opt/testData/hive/json.txt' into table json;

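Query the JSON fields with get_json_object(json_string, path), where a path such as $.movie selects a top-level key; it returns NULL if the JSON string is invalid: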

select get_json_object(data,'$.movie') as movie,
get_json_object(data,'$.rate') as rate
from json;

III. Window functions

1. Window aggregation

Sample data (path: /opt/testData/hive/cookie1.txt)

Create a table and load the data above:

Create the table:
create table cookie1(cookieid string, createtime string, pv int) row format delimited fields terminated by ',';
Load the data:
load data local inpath "/opt/testData/hive/cookie1.txt" into table cookie1;

select cookieid,createtime, 
   pv, 
   sum(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, 
   sum(pv) over (partition by cookieid order by createtime) as pv2, 
   sum(pv) over (partition by cookieid) as pv3, 
   sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, 
   sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and 2 following) as pv5, 
   sum(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6 
from cookie1;
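Each pvN column differs only in its window frame. In the sketch below, pv holds one cookieid partition already sorted by createtime, and frameSum adds up the rows from `preceding` rows before to `following` rows after the current row; UNBOUNDED is a sentinel invented for this sketch and the sample values are made up. pv2 is not modelled separately: with order by and no frame clause, Hive defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so pv2 equals pv1 unless several rows share the same createtime. pv3, with no order by, covers the whole partition.

```java
// Sketch of the ROWS frames in the query above for a single, already sorted partition.
final class WindowFrameDemo {
    static final int UNBOUNDED = -1; // sentinel for "unbounded preceding/following"

    // Sum of pv over rows [i - preceding, i + following], clipped to the partition.
    static int frameSum(int[] pv, int i, int preceding, int following) {
        int lo = preceding == UNBOUNDED ? 0 : Math.max(0, i - preceding);
        int hi = following == UNBOUNDED ? pv.length - 1 : Math.min(pv.length - 1, i + following);
        int sum = 0;
        for (int k = lo; k <= hi; k++) {
            sum += pv[k];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] pv = {1, 5, 7, 3, 2, 4, 4}; // hypothetical pv values ordered by createtime
        for (int i = 0; i < pv.length; i++) {
            int pv1 = frameSum(pv, i, UNBOUNDED, 0);         // unbounded preceding .. current row
            int pv3 = frameSum(pv, i, UNBOUNDED, UNBOUNDED); // whole partition
            int pv4 = frameSum(pv, i, 3, 0);                 // 3 preceding .. current row
            int pv5 = frameSum(pv, i, 3, 2);                 // 3 preceding .. 2 following
            int pv6 = frameSum(pv, i, 0, UNBOUNDED);         // current row .. unbounded following
            System.out.printf("%d %d %d %d %d %d%n", pv[i], pv1, pv3, pv4, pv5, pv6);
        }
    }
}
```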

2. Window bucketing (ntile)

Sample data (path: /opt/testData/hive/cookie1.txt)

Create a table and load the data above:

Create the table:
create table cookie2(cookieid string, createtime string, pv int)
row format delimited
fields terminated by ',';
Load the data:
load data local inpath "/opt/testData/hive/cookie1.txt" into table cookie2;

select cookieid,createtime,
  pv,
  ntile(2) over (partition by cookieid order by createtime) as rn1,
  ntile(3) over (partition by cookieid order by createtime) as rn2,
  ntile(4) over (order by createtime) as rn3
from cookie2
order by cookieid,createtime;
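ntile(n) deals the ordered rows of each partition into n numbered buckets of nearly equal size; rn3 has no partition by, so it buckets the whole table. A sketch of the bucket arithmetic, assuming the usual convention (as in standard SQL NTILE) that when the rows do not divide evenly, the lower-numbered buckets receive one extra row:

```java
// Sketch of ntile(n): split `total` ordered rows into n buckets whose sizes differ
// by at most one, giving the extra rows to the lowest-numbered buckets.
final class NtileDemo {
    // rowIndex is 0-based; the returned bucket number is 1-based.
    static int ntile(int n, int rowIndex, int total) {
        int base = total / n;             // minimum rows per bucket
        int extra = total % n;            // this many buckets get one extra row
        int bigRows = extra * (base + 1); // rows covered by the larger buckets
        if (rowIndex < bigRows) {
            return rowIndex / (base + 1) + 1;
        }
        return extra + (rowIndex - bigRows) / base + 1;
    }

    public static void main(String[] args) {
        int total = 7; // e.g. 7 rows in one cookieid partition
        for (int n = 2; n <= 4; n++) {
            StringBuilder sb = new StringBuilder("ntile(" + n + "):");
            for (int i = 0; i < total; i++) {
                sb.append(' ').append(ntile(n, i, total));
            }
            System.out.println(sb);
        }
    }
}
```

When there are fewer rows than buckets, base is 0 and every row lands in the first branch, so no division by zero occurs.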

3. Window ranking

select
  cookieid,
  createtime,
  pv,
  rank() over (partition by cookieid order by pv desc) as rn1,
  dense_rank() over (partition by cookieid order by pv desc) as rn2,
  row_number() over (partition by cookieid order by pv desc) as rn3
from cookie2 
where cookieid='cookie1';
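The three ranking functions differ only in how they treat ties. A Java sketch over values already sorted in descending order, as order by pv desc would arrange them; the sample values are made up.

```java
import java.util.Arrays;

// Sketch over pv values already sorted in descending order within one partition.
final class RankDemo {
    // rank(): ties share a rank, and the next distinct value skips ahead (1, 2, 3, 3, 5, ...).
    static int[] rank(int[] sortedDesc) {
        int[] r = new int[sortedDesc.length];
        for (int i = 0; i < sortedDesc.length; i++) {
            boolean tie = i > 0 && sortedDesc[i] == sortedDesc[i - 1];
            r[i] = tie ? r[i - 1] : i + 1; // otherwise: the 1-based row position
        }
        return r;
    }

    // dense_rank(): ties share a rank, and the next distinct value gets rank + 1 (no gaps).
    static int[] denseRank(int[] sortedDesc) {
        int[] r = new int[sortedDesc.length];
        for (int i = 0; i < sortedDesc.length; i++) {
            if (i == 0) {
                r[i] = 1;
            } else {
                r[i] = sortedDesc[i] == sortedDesc[i - 1] ? r[i - 1] : r[i - 1] + 1;
            }
        }
        return r;
    }

    // row_number(): 1, 2, 3, ... regardless of ties (in SQL, tied rows get an arbitrary order).
    static int[] rowNumber(int[] sortedDesc) {
        int[] r = new int[sortedDesc.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = i + 1;
        }
        return r;
    }

    public static void main(String[] args) {
        int[] pv = {7, 5, 4, 4, 3, 2, 1}; // hypothetical pv values, sorted descending
        System.out.println(Arrays.toString(rank(pv)));      // [1, 2, 3, 3, 5, 6, 7]
        System.out.println(Arrays.toString(denseRank(pv))); // [1, 2, 3, 3, 4, 5, 6]
        System.out.println(Arrays.toString(rowNumber(pv))); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```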

4. Shifting rows up and down

Sample data (path: /opt/testData/hive/cookie3.txt)

Create the table:
create table cookie3(cookieid string, createtime string, url string)
row format delimited fields terminated by ',';
Load the data:
load data local inpath "/opt/testData/hive/cookie3.txt" into table cookie3;
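This subsection stops after loading cookie3 and does not show its query. Assuming it refers to Hive's lag and lead window functions (lag(col, n, default) returns the value n rows before the current row in its partition, lead the value n rows after, and default when that row does not exist), the row shifting can be sketched in Java over urls already sorted by createtime within one cookieid. The helper names and sample urls are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LagLeadDemo {
    // lag: value `offset` rows before the current row, or `dflt` when that row does not exist.
    static List<String> lag(List<String> values, int offset, String dflt) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            int j = i - offset;
            out.add(j >= 0 ? values.get(j) : dflt);
        }
        return out;
    }

    // lead: value `offset` rows after the current row, or `dflt` past the end of the partition.
    static List<String> lead(List<String> values, int offset, String dflt) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            int j = i + offset;
            out.add(j < values.size() ? values.get(j) : dflt);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("url1", "url2", "url3"); // hypothetical, sorted by createtime
        System.out.println(lag(urls, 1, null));   // [null, url1, url2]
        System.out.println(lead(urls, 1, "end")); // [url2, url3, end]
    }
}
```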

5. First and last values

select cookieid,createtime,url,
row_number() over (partition by cookieid order by createtime) as rn,
FIRST_VALUE(url) over (partition by cookieid order by createtime desc) as last1,
LAST_VALUE(url) over (partition by cookieid order by createtime rows between unbounded preceding and unbounded following) as last2
from cookie3;
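FIRST_VALUE and LAST_VALUE read the first and last row of the window frame, not of the whole partition. With order by and no explicit frame, the frame ends at the current row, so LAST_VALUE simply returns the current row's url, while FIRST_VALUE over order by createtime desc returns the most recent url. A Java sketch of the cases for one partition sorted by createtime ascending, assuming no ties in createtime; the sample urls are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Each method returns one output value per input row of a single partition.
final class FirstLastValueDemo {
    // FIRST_VALUE(url): the frame always starts at the partition's first row here.
    static List<String> firstValue(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            out.add(urls.get(0));
        }
        return out;
    }

    // LAST_VALUE(url) with ORDER BY but no frame: the frame ends at the current row,
    // so the "last" value is just the current row's url.
    static List<String> lastValueDefaultFrame(List<String> urls) {
        return new ArrayList<>(urls);
    }

    // LAST_VALUE(url) with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING:
    // the frame is the whole partition, so every row sees the final url.
    static List<String> lastValueWholePartition(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            out.add(urls.get(urls.size() - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("url1", "url2", "url3"); // hypothetical, ascending createtime
        System.out.println(firstValue(urls));              // [url1, url1, url1]
        System.out.println(lastValueDefaultFrame(urls));   // [url1, url2, url3]
        System.out.println(lastValueWholePartition(urls)); // [url3, url3, url3]
    }
}
```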

IV. User-defined functions

1. Create a Maven project and add the dependency

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>2.3.3</version>
        <exclusions>
            <exclusion>
                <groupId>jdk.tools</groupId>
                <artifactId>jdk.tools</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>

2. Write a Java class that extends UDF and defines an evaluate method (Hive finds it by reflection, so it may be overloaded)

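package com.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Hive locates evaluate() by reflection and calls it once per row.
public class ToLower extends UDF {
    public String evaluate(String field) {
        // A NULL column value arrives as null; return NULL instead of throwing.
        if (field == null) {
            return null;
        }
        return field.toLowerCase();
    }
}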

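3. Build the jar with mvn package, upload it to /opt/testData/hive, and add it to the Hive session with add jar followed by the jar's full path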

4. Create a temporary function bound to the compiled class

create temporary function tolowercase as 'com.udf.ToLower';

5. Use it in HQL

select tolowercase('ABCD');
-- returns abcd

Note: a temporary function created this way only lasts for the current Hive session; it is gone once the session is restarted.
