hive的函数

最新推荐文章于 2023-05-19 17:18:01 发布

要努力努力再努力啊

最新推荐文章于 2023-05-19 17:18:01 发布

阅读量439

点赞数 1

分类专栏：笔记文章标签： hive 大数据

本文链接：https://blog.csdn.net/qq_39891132/article/details/114802808

版权

笔记专栏收录该内容

1 篇文章 0 订阅

订阅专栏

hive的函数

1. 数学函数

取整函数:round

语法:round(double a)
返回值BIGINT
说明:返回double类型额整数值部分(遵循四舍五入)

举例:

hive> select round(3.1415926);
3

指定精度取整函数: round

语法: round(double a, int d)
返回值: DOUBLE
说明:返回指定精度d的double类型
举例:

hive> select round(3.1415926,4);
3.1416

向下取整函数: floor

语法: floor(double a)
返回值: BIGINT
说明:返回等于或者小于该double变量的最大的整数
举例:

hive> select floor(3.1415926);
3

向上取整函数: ceil

语法: ceil(double a)
返回值: BIGINT
说明:返回等于或者大于该double变量的最小的整数
举例:

hive> select ceil(3.1415926) ;
4

取随机数函数: rand

语法: rand(),rand(int seed)
返回值: double
说明:返回一个0到1范围内的随机数。如果指定种子seed，则会返回固定的随机数
举例:

hive> select rand();
0.5577432776034763

hive> select rand();
0.6638336467363424

hive> select rand(100);
0.7220096548596434

hive> select rand(100);
0.7220096548596434

幂运算函数: pow

语法: pow(double a, double p)
返回值: double
说明:返回a的p次幂
举例:

hive> select pow(2,4) ;
16.0

绝对值函数: abs

语法: abs(double a) abs(int a)
返回值: double int
说明:返回数值a的绝对值
举例:

hive> select abs(-3.9);
3.9
hive> select abs(10.9);
10.9

2. 字符串函数

字符串长度函数：length

语法: length(string A)
返回值: int
说明：返回字符串A的长度
举例:

hive> select length('abcedfg');
7

字符串反转函数：reverse

语法: reverse(string A)
返回值: string
说明：返回字符串A的反转结果
举例:

hive> select reverse('abcedfg');
gfdecba

字符串连接函数：concat

语法: concat(string A, string B…)
返回值: string
说明：返回输入字符串连接后的结果，支持任意个输入字符串
举例:

hive> select concat(‘abc’,'def’,'gh’);;
abcdefgh

字符串连接函数-带分隔符：concat_ws

语法: concat_ws(string SEP, string A, string B…)
返回值: string
说明：返回输入字符串连接后的结果，SEP表示各个字符串间的分隔符
举例:

hive> select concat_ws(',','abc','def','gh');
abc,def,gh

字符串截取函数：substr,substring

语法: substr(string A, int start),substring(string A, int start)
返回值: string
说明：返回字符串A从start位置到结尾的字符串
举例:

hive> select substr('abcde',3);
cde
hive> select substring('abcde',3);
cde
hive>select substr('abcde',-1);
e

字符串截取函数：substr,substring

语法: substr(string A, int start, int len),substring(string A, intstart, int len)
返回值: string
说明：返回字符串A从start位置开始，长度为len的字符串
举例:

hive> select substr('abcde',3,2);
cd
hive> select substring('abcde',3,2);
cd
hive>select substring('abcde',-2,2);
de

字符串转大写函数：upper,ucase

语法: upper(string A) ucase(string A)
返回值: string
说明：返回字符串A的大写格式
举例:

hive> select upper('abSEd');
ABSED
hive> select ucase('abSEd');
ABSED

字符串转小写函数：lower,lcase

语法: lower(string A) lcase(string A)
返回值: string
说明：返回字符串A的小写格式
举例:

hive> select lower('abSEd');
absed
hive> select lcase('abSEd');
absed

去空格函数：trim

语法: trim(string A)
返回值: string
说明:去除字符串两边的空格
举例:

hive> select trim(' abc ');
abc

左边去空格函数：ltrim

语法: ltrim(string A)
返回值: string
说明：去除字符串左边的空格
举例:

hive> select ltrim(' abc ');
abc

右边去空格函数：rtrim

语法: rtrim(string A)
返回值: string
说明：去除字符串右边的空格
举例:

hive> select rtrim(' abc ');
abc

正则表达式替换函数：regexp_replace

语法: regexp_replace(string A, string B, string C)
返回值: string
说明：将字符串A中的符合java正则表达式B的部分替换为C。注意，在有些情况下要使用转义字符,类似oracle中的regexp_replace函数。
举例:

hive> select regexp_replace('foobar', 'oo|ar', '');
fb

URL解析函数：parse_url

语法: parse_url(string urlString, string partToExtract [, stringkeyToExtract])
返回值: string
说明：返回URL中指定的部分。partToExtract的有效值为：HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
举例:

hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST');
facebook.com

hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'PATH');
/path1/p.php

hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY','k1');
v1

分割字符串函数: split

语法: split(string str, stringpat)
返回值: array
说明:按照pat字符串分割str，会返回分割后的字符串数组
举例:

hive> select split('abtcdtef','t');
["ab","cd","ef"]

3. 日期函数

获取当前UNIX时间戳函数:unix_timestamp

语法: unix_timestamp()
返回值: bigint
说明:获得当前时区的UNIX时间戳
举例:

hive> select unix_timestamp();
1323309615

UNIX时间戳转日期函数:from_unixtime

语法: from_unixtime(bigint unixtime[, string format])
返回值: string
说明:转化UNIX时间戳（从1970-01-01 00:00:00 UTC到指定时间的秒数）到当前时区的时间格式
举例:

hive> select from_unixtime(1323308943,'yyyyMMdd');
20111208

日期转UNIX时间戳函数:unix_timestamp

语法: unix_timestamp(string date)
返回值: bigint
说明:转换格式为"yyyy-MM-ddHH:mm:ss"的日期到UNIX时间戳。如果转化失败，则返回NULL。
举例:

hive> select unix_timestamp('2011-12-07 13:01:03');
1323234063

指定格式日期转UNIX时间戳函数:unix_timestamp

语法: unix_timestamp(string date, string pattern)
返回值: bigint
说明:转换pattern格式的日期到UNIX时间戳。如果转化失败，则返回0
举例:

hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');
1323234063

日期时间转日期函数:to_date

语法: to_date(string timestamp)
返回值: string
说明:返回日期时间字段中的日期部分。
举例:

hive> select to_date('2011-12-08 10:03:01');
2011-12-08

日期转年函数: year

语法: year(string date)
返回值: int
说明:返回日期中的年
举例:

hive> select year('2011-12-08 10:03:01');
2011
hive> select year('2012-12-08');
2012

日期转月函数: month

语法: month (string date)
返回值: int
说明:返回日期中的月份
举例:

hive> select month('2011-12-08 10:03:01');
12
hive> select month('2011-08-08');
8

日期转天函数: day

语法: day (string date)
返回值: int
说明:返回日期中的天。
举例:

hive> select day('2011-12-08 10:03:01');
8
hive> select day('2011-12-24');
24

同样的，还有 hour,minute,second函数，分别是获取小时，分钟和秒，使用方式和以上类似，这里就不再讲述。

日期转周函数:weekofyear

语法: weekofyear (string date)
返回值: int
说明:返回日期在当前的周数。
举例:

hive> select weekofyear('2011-12-08 10:03:01');
49

日期比较函数: datediff

语法: datediff(string enddate, string startdate)
返回值: int
说明:返回结束日期减去开始日期的天数。
举例:

hive> select datediff('2012-12-08','2012-05-09');
213

日期增加函数: date_add

语法: date_add(string startdate, int days)
返回值: string
说明:返回开始日期startdate增加days天后的日期。
举例:

hive> select date_add('2012-12-08',10);
2012-12-18

日期减少函数: date_sub

语法: date_sub (string startdate, int days)
返回值: string
说明:返回开始日期startdate减少days天后的日期
举例:

hive> select date_sub('2012-12-08',10);
2012-11-28

4. 条件函数

if函数: if

语法: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
返回值: T
说明: 当条件testCondition为TRUE时，返回valueTrue；否则返回valueFalseOrNull
举例:

hive> select if(1==2,100,200) ;
200
hive> select if(1==1,100,200) ;
100

条件判断函数：CASE

语法: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
返回值: T
说明：如果a等于b，那么返回c；如果a等于d，那么返回e；否则返回f
举例:

select
case 200
   when 100 then "张三"
   when 200 then "李四"
   when 300 then "王五"
   else "over"
end;

条件判断函数：CASE

语法: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
返回值: T
说明：如果a为TRUE,则返回b；如果c为TRUE，则返回d；否则返回e
举例:

select
case
    when 1==2 then "hello"
    when 1==1 then "world"
    else "over"
end

select *,
case
    when s_score>=80 then '优'
    when s_score>=60 then '良'
    else '中'
end
from score;

5. 转换函数

转换函数

报错原因: 因为参数类型必须是字符串
解决方案格式:
举例:

cast(表达式 as 数据类型)

cast("20190607" as int)
cast('2017-06-12' as date)

6. hive的行转列

1、行转列是指多行数据转换为一个列的字段。
2、Hive行转列用到的函数：
concat(str1,str2,…) --字段或字符串拼接
concat_ws(sep, str1,str2) --以分隔符拼接每个字符串
collect_set(col) --将某字段的值进行去重汇总，产生array类型字段

举例:

数据:
10 唐僧
20 刘备
30 乔峰
10 孙悟空
20 关羽
30 段誉
10 八戒
10 沙僧
20 张飞
30 虚竹
将资料中 emp.txt文件上传到 linux中
建表:
create table emp(
    deptno int,
    ename string
)
row format delimited fields terminated by ' ';
插入数据:
load data local inpath '/export/data/hive_data/g_collect_set.csv' into table emp;
转换:
select ename from emp;
select collect_set(ename) from emp;

select collect_set(concat_ws("#", cast(deptno as string), ename))
from emp;

select deptno,collect_set(ename) 
from emp 
group by deptno

select deptno,concat_ws('#',collect_set(ename)) 
from emp 
group by deptno

行转列，COLLECT_SET(col)：函数只接受基本数据类型，它的主要作用是将某字段的值进行去重汇总，产生array类型字段。

7. hive的表生成函数

explode函数

explode(col)：将hive一列中复杂的array或者map结构拆分成多行
explode(ARRAY) 列表中的每个元素生成一行
explode(MAP) map中每个key-value对，生成一行，key为一列，value为一列
举例:

10 唐僧|孙悟空|猪八戒|沙和尚
20 刘备|关羽|张飞
30 乔峰|段誉|虚竹|扫地僧
create table emp2(
    deptno int,
    names array<string>
)
row format delimited fields terminated by ' '
collection items terminated by '|';
load data local inpath '/export/data/hive_data/h_explode.txt' into table emp2;
select * from emp2;
select explode(names) as name from emp2;

LATERAL VIEW侧视图

用法：LATERAL VIEW udtf(expression) tableAlias AS columnAlias
解释：用于和split, explode等UDTF一起使用，它能够将一列数据拆成多行数据，在此基础上可以对拆分后的数据进行聚合。
举例:

select deptno,name from emp2 lateral view explode(names) tmp_tb as name;

8. Reflect函数

reflect函数可以支持在sql中调用java中的自带函数

举例:

@Test
public void demo01() {
    int max = Math.max(10, 20);
    System.out.println("max = " + max);

    int min = Math.min(10, 20);
    System.out.println("min = " + min);
}

select reflect("java.lang.Math", "max", 10, 20);
20

select reflect("java.lang.Math", "min", 10, 20);
10

Hive的开窗函数

ROW_NUMBER

ROW_NUMBER() 从1开始，按照顺序，生成分组内记录的序列
注意: ROW_NUMBER 不会考虑到数据重复的问题, 每个组从 1 开始打标即可.

RANK

RANK() 生成数据项在分组中的排名，排名相等会在名次中留下空位

DENSE_RANK

DENSE_RANK() 生成数据项在分组中的排名，排名相等会在名次中不会留下空位

ntile(N)

主要使用用于解决求几分之几的问题
举例:

select 
*,
ntile(3) over(order by pv desc) ntile3
from itcast_t2;
ntile为三就是求三分之一

sum

累计求和
如果不指定rows between,默认为从起点到当前行
如果不指定order by，则将分组内所有值累加
关键是理解rows between含义,也叫做window子句
- preceding：往前
- following：往后
- current row：当前行
- unbounded：起点
- unbounded preceding 表示从前面的起点
- unbounded following：表示到后面的终点

avg min max

和sum用法一样

lag

LAG(col,n,DEFAULT) 用于统计窗口内往上第n行值第一个参数为列名
第二个参数为往上第n行（可选，默认为1）
第三个参数为默认值（当往上第n行为NULL时候，取默认值，如不指定，则为NULL）

lead

与LAG相反LEAD(col,n,DEFAULT) 用于统计窗口内往下第n行值第一个参数为列名
第二个参数为往下第n行（可选，默认为1）
第三个参数为默认值（当往下第n行为NULL时候，取默认值，如不指定，则为NULL）

first_value

取分组内排序后，截止到当前行，第一个值

last_value

取分组内排序后，截止到当前行，最后一个值
注意:要取最后一个值需要加rows between unbounded preceding and unbounded following才能正常显示

自定义函数

UDF

一进一出

UDAF

多进一出

UDTF

一进多出
第一步：创建maven java 工程，导入jar包

自定义UDF演示 将小写字符串转换为大写字符串
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.5</version>
    </dependency>
</dependencies>

<build>
    <!--指定打包之后的名字: 当前项目名称-->
    <finalName>${artifactId}</finalName>

    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>

第二步：开发java类继承UDF，并重载evaluate 方法

package cn.itcast.a;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyUDF  extends UDF {

    public Text evaluate(final Text s) {
        if (null == s) {
            return null;
        }
        //返回大写字母
        return new Text(s.toString().toUpperCase());
    }
}

第三步：将我们的项目打包，并上传到hive的lib目录下

第四步：添加我们的jar包

add jar /export/server/hive-2.1.0/lib/my_upper.jar;

第五步：设置函数与我们的自定义函数关联

create function my_upper as 'cn.itcast.a.MyUDF';

第六步：使用自定义函数

select my_upper('abc');

自定义UDTF

代码实现

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.function.ObjDoubleConsumer;

public class MyUDTF extends GenericUDTF {
    private final transient Object[] forwardListObj = new Object[1];

    @Override
    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
        //设置列名的类型
        List<String> fieldNames = new ArrayList<>();
        //设置列名
        fieldNames.add("column_01");
        List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>()  ;//检查器列表

        //设置输出的列的值类型
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
         
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);

    }

    @Override
    public void process(Object[] objects) throws HiveException {
        //1:获取原始数据
        String args = objects[0].toString();
        //2:获取数据传入的第二个参数，此处为分隔符
        String splitKey = objects[1].toString();
        //3.将原始数据按照传入的分隔符进行切分
        String[] fields = args.split(splitKey);
        //4:遍历切分后的结果，并写出
        for (String field : fields) {
            //将每一个单词添加值对象数组
            forwardListObj[0] = field;
            //将对象数组内容写出
            forward(forwardListObj);
        }

    }

    @Override
    public void close() throws HiveException {

    }
}

添加我们的jar包

add jar /export/server/hive-2.1.0/lib/my_udtf.jar

创建临时函数与开发后的udtf代码关联

create temporary function my_udtf as 'cn.itcast.udf.ItcastUDF';

使用自定义udtf函数

select my_udtf("zookeeper,hadoop,hdfs,hive,MapReduce",",") word;

要努力努力再努力啊

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
hive的函数

hive的函数1. 数学函数取整函数:round语法:round(double a)返回值BIGINT说明:返回double类型额整数值部分(遵循四舍五入)举例:hive> select round(3.1415926);3指定精度取整函数: round语法: round(double a, int d)返回值: DOUBLE说明:返回指定精度d的double类型举例:hive> select round(3.1415926,4);3.1416向下取整
复制链接

扫一扫