Essential Hive Syntax

中长跑路上crush

Tags: hadoop

Article link: https://blog.csdn.net/weixin_58026490/article/details/134423161

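1. Data loading [must master] (1, 2, 3, 4, 5)

Load from a local file:
    hadoop fs -put local_path hdfs_path    (shell command: uploads the file to HDFS)
    load data local inpath 'local_path' into table table_name;
    load data local inpath 'local_path' overwrite into table table_name;
Load from one table into another:
    insert into table table_name select ...;        -- append
    insert overwrite table table_name select ...;   -- overwrite
    create table table_name like other_table;       -- copy another table's structure when creating the table (no data)
    
    1/ In real projects, data is usually imported straight from MySQL into the ODS tables with sqoop.
    2/ Background: the teacher built his own data source by creating a file on HDFS and saving the data into it through an editor,
    and then he could query it right away?
    -- Then I remembered that Hive's job is to map data in HDFS to tables. What confused me is that I did not remember this as one of the loading methods.
    Looking back, the methods I noted either upload a Linux file to HDFS (for Hive to map) or load it directly into a table.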
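
The loading paths above can be sketched as follows. This is only an illustration: the table names (stu, stu_copy, stu_ext) and every file path are hypothetical. The last statement shows one likely explanation for the teacher's case, assuming he pointed a table at a directory that already held the file: Hive only maps a table onto an HDFS directory, so files placed there are queryable without any load step.

```sql
-- Hypothetical table and paths, for illustration only.
CREATE TABLE IF NOT EXISTS stu (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- LOCAL: copy a file from the local file system into the table (append).
LOAD DATA LOCAL INPATH '/tmp/stu.txt' INTO TABLE stu;

-- Same source, but replace the table's existing contents.
LOAD DATA LOCAL INPATH '/tmp/stu.txt' OVERWRITE INTO TABLE stu;

-- Without LOCAL the path is an HDFS path; the file is moved into the table directory.
LOAD DATA INPATH '/data/stu.txt' INTO TABLE stu;

-- Copy the structure only, then fill the new table from a query.
CREATE TABLE IF NOT EXISTS stu_copy LIKE stu;
INSERT OVERWRITE TABLE stu_copy SELECT id, name FROM stu;

-- Map a table onto an existing HDFS directory: any file already in
-- (or later put into) /data/stu_ext can be queried immediately.
CREATE EXTERNAL TABLE IF NOT EXISTS stu_ext (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stu_ext';
```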

2. Create a bucketed, sorted table (1, 2, 3, 4, 5) [key point]

create table table_name(
    name string,
    id int,
    years array<string>
)
clustered by (name)
    sorted by (id desc)
    into 4 buckets
row format delimited
collection items terminated by '|';

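3. Create an ordinary table, then bucket and sort at query time [key point] (1, 2, 3, 4, 5)

set mapreduce.job.reduces=n;
cluster by: cannot sort in descending order; equivalent to distribute by + sort by on the same column (sort by can be descending; order by is the global sort)

-- Why bucket and sort at query time instead of when the table is created?
    A bucketing column fixed at table creation must be a column that is queried and joined on frequently.
-- Some columns are rarely joined on; when we use one occasionally and still want an efficient join, we can use this method. Why?
Bucketed sorting is more efficient than a global sort.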
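
A minimal sketch of query-time bucketing and sorting, assuming a hypothetical table stu(id INT, name STRING, score INT). The reducer count of 3 is an arbitrary choice for the example.

```sql
-- Hypothetical table stu(id, name, score); 3 reducers means 3 output buckets.
SET mapreduce.job.reduces=3;

-- CLUSTER BY: distribute and sort on the same column, ascending only.
SELECT * FROM stu CLUSTER BY id;

-- The same thing written out explicitly.
SELECT * FROM stu DISTRIBUTE BY id SORT BY id;

-- DISTRIBUTE BY + SORT BY may use different columns and a descending sort;
-- rows are sorted inside each reducer, not across the whole result.
SELECT * FROM stu DISTRIBUTE BY name SORT BY score DESC;

-- ORDER BY: one global sort over all rows.
SELECT * FROM stu ORDER BY score DESC;
```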

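4. Regular-expression matching [know it] (1, 2, 3, 4, 5)

rlike 
    .  any single character    *  any number of repetitions of the preceding item, so .* is any number of arbitrary characters
    ....   is equivalent to  .{4}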
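
A few rlike checks as a sketch; the strings are made up for the example. rlike succeeds when the pattern matches any part of the string, so ^ and $ are needed to match the whole value.

```sql
-- Exactly four characters.
SELECT 'hive' RLIKE '^.{4}$';      -- true
SELECT 'hive' RLIKE '^....$';      -- true, same pattern written with dots

-- Starts with h, ends with p, anything in between.
SELECT 'hadoop' RLIKE '^h.*p$';    -- true

-- Without anchors the pattern only has to match somewhere inside the string.
SELECT 'hadoop' RLIKE 'doo';       -- true
```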

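5. union queries [important] (1, 2, 3, 4, 5)

union adds rows
join  adds columns
union removes duplicates by default; to keep duplicates use union all
The merged rows come out in hash order by default, not in your own order; to sort by your own rule, sort again after the union by adding order by to the merged result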
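
A sketch of the two forms, assuming two hypothetical tables stu_a and stu_b with the same columns (id, name). Wrapping the union in a subquery is one way to make sure the sort applies to the merged result.

```sql
-- UNION ALL keeps every row, duplicates included.
SELECT id, name FROM stu_a
UNION ALL
SELECT id, name FROM stu_b;

-- UNION removes duplicate rows; the outer ORDER BY sorts the merged result.
SELECT u.id, u.name
FROM (
    SELECT id, name FROM stu_a
    UNION
    SELECT id, name FROM stu_b
) u
ORDER BY u.id;
```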

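6. Sampling [understand] (1, 2, 3, 4)

tablesample (bucket x out of y on column)
x is the bucket to start from; this index starts at 1, whereas other indexes start at 0
y: total number of buckets ÷ y is the number of buckets taken
x must never be greater than y
column is the column the rows are bucketed on for sampling
It goes after the table name; if the table has an alias, it goes before the alias
select * from table_name tablesample(bucket x out of y on column)

rand() returns a random floating-point number between 0 and 1,
including 0 but not 1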
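
A sketch of bucket sampling, assuming a hypothetical table stu_bucket that was created with clustered by (id) into 4 buckets.

```sql
-- 4 buckets / y = 2  ->  2 buckets are taken, starting at bucket 1
-- and then every 2nd bucket (buckets 1 and 3).
SELECT * FROM stu_bucket TABLESAMPLE (BUCKET 1 OUT OF 2 ON id) s;

-- Sampling on rand() instead of a column picks rows at random,
-- here roughly a quarter of the table.
SELECT * FROM stu_bucket TABLESAMPLE (BUCKET 1 OUT OF 4 ON rand()) s;
```

Note that the tablesample clause sits between the table name and the alias s.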

7. Virtual columns [know it] (1, 2, 3)

INPUT__FILE__NAME (double underscores): shows which file each data row is located in
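
A short sketch of querying virtual columns, assuming a hypothetical table stu. BLOCK__OFFSET__INSIDE__FILE is a second virtual column, not mentioned in the notes above, that gives the offset within the file.

```sql
-- For every row, show the HDFS file it came from and its offset in that file.
SELECT id,
       INPUT__FILE__NAME,
       BLOCK__OFFSET__INSIDE__FILE
FROM stu;
```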

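Hive functions

1. Distinguish them from Python functions

2. They are divided into:

8. Function categories (1, 2, 3)

UDF: one row in, one row out: round() rounds to the nearest value
UDTF, table-generating function: one row in, many rows out: the explode function
UDAF, aggregate function: many rows in, one row out: count()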

9. Looking up how to use a function [important] (1, 2, 3)

show functions;
desc function extended function_name;
desc function extended rand; (no parentheses)

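10. String functions (1, )

String concatenation (ws) + string splitting (remember the two by contrast)

'传智,有你,会更好'
'我','是','帅哥'
concat()
concat_ws('-','我','是','帅哥') --> the separator comes first
我是
split('传智,有你,会更好',',') --> the delimiter comes last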
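
The contrast between the two functions, sketched with the same strings as above:

```sql
-- concat: plain concatenation, no separator.
SELECT concat('我', '是', '帅哥');                       -- 我是帅哥

-- concat_ws: the separator is the FIRST argument.
SELECT concat_ws('-', '我', '是', '帅哥');               -- 我-是-帅哥

-- split: the delimiter is the LAST argument; the result is an array<string>.
SELECT split('传智,有你,会更好', ',');                    -- ["传智","有你","会更好"]

-- The two combine: concat_ws also accepts an array of strings.
SELECT concat_ws('-', split('传智,有你,会更好', ','));    -- 传智-有你-会更好
```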

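Extracting part of a string [must master]

'我爱北京天安门abc'
/*
  我   爱   北   京   天   安   门   a    b    c
  1    2    3    4    5    6    7    8    9    10
 -10  -9   -8   -7   -6   -5   -4   -3   -2   -1
 */
substr(string, start position, length)
substr('我爱北京天安门abc',5,3)
substr('我爱北京天安门abc',-3,3)
This shows that both read from left to right, even when the start position is negative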
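
The two calls above, with their results, plus the two-argument form as a sketch:

```sql
-- Start at the 5th character, take 3 characters.
SELECT substr('我爱北京天安门abc', 5, 3);    -- 天安门

-- Start 3 characters from the end, then read left to right.
SELECT substr('我爱北京天安门abc', -3, 3);   -- abc

-- Without a length, everything from the start position to the end is returned.
SELECT substr('我爱北京天安门abc', 8);       -- abc
```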

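Where Hive differs from Java and Python

trim
select trim('  su  sf  ')
Removes whitespace on both sides, but not in the middle.
Hive's trim does not remove tabs or newlines;
Java and Python can remove tabs (\t) and newlines (\n)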

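11. Time functions [memorize all]

Time, date, the standard time format,

the two cases where converting a time to a date fails,

getting part of a given time (year, quarter), getting a time difference (3 cases), adding to and subtracting from a time (counted in days),

[Converting a time to a Unix timestamp, converting a Unix timestamp to a time type, formatting a time type into the form you want (_format).] Memorize tomorrow morning

Summary: whether converting or extracting, a standard time format is required

select `current_date`();
select `current_timestamp`();  timestamp  tablesample
yyyy-MM-dd HH:mm:ss
'2023-04-06 14:46:47.040000000'
select  wrong format (2023年11月1日)
select  incomplete date (2023-11-)
select year(`current_timestamp`())
select datediff(time, time) or (date, time)
select date_add(date or time, +5/-4)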
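
The points listed above can be sketched as follows. The literal dates are made up for the example, and quarter() and date_format() assume a reasonably recent Hive version. Results of the Unix-timestamp functions depend on the session time zone, so none are shown for them.

```sql
-- Current date and current time.
SELECT current_date();
SELECT current_timestamp();

-- Time -> date, and parts of a time.
SELECT to_date('2023-11-01 14:46:47');                     -- 2023-11-01
SELECT year('2023-11-01 14:46:47');                        -- 2023
SELECT quarter('2023-11-01');                              -- 4

-- Difference in days, and adding / subtracting days.
SELECT datediff('2023-11-11', '2023-11-01');               -- 10
SELECT date_add('2023-11-01', 5);                          -- 2023-11-06
SELECT date_sub('2023-11-01', 4);                          -- 2023-10-28

-- Time -> Unix timestamp, Unix timestamp -> time string.
SELECT unix_timestamp('2023-11-01 14:46:47');
SELECT from_unixtime(unix_timestamp('2023-11-01 14:46:47'), 'yyyy-MM-dd HH:mm:ss');

-- Format a time into the shape you want.
SELECT date_format('2023-11-01 14:46:47', 'yyyy/MM/dd');   -- 2023/11/01
```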

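12. Math functions (1, )

How do you get a random integer from 1 to 7?
floor(rand()*7) + 1      -- ceil(rand()*7) is nearly the same, but returns 0 when rand() is exactly 0
How do you get a random integer from 5 to 10?
floor(rand()*6) + 5      -- 6 possible values; ceil(rand()*5 + 5) returns 5 only when rand() is exactly 0
Round up, round down
 ceil      floor
rand()
round()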

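13. Conditional functions (extremely important) [every part of the later new-retail project uses them; drill them 5 times first] (1, 2, 3)

1. The if conditional function (Hive's); yesterday's if conditional was the shell one
hive:
    if(condition, value returned when true, value returned when false)
    select name, if(gender='male','boy','girl') from table_name;
2. The null type and null checks
    null = null still returns null, not a boolean
    is null
    is not null
3. Null replacement
    nvl(column, default value)
    create table table_name1 as select if(gender='male',null,'girl') from table_name2;
    select nvl(gender,'boy') from table_name2;
4. Get the first value that is not null
    coalesce(1,null,3,null)
    coalesce(`array`(null,1,3))
    It accepts an array, but treats a single array as one whole value
5. The two usages of case when. Columns: [orderid  paytime  totalmoney1  paypyte]

select                            -- form 1: case column when value then returned data
    orderid,
    paytime,
    case totalmoney1
        when 0 then 'cash'        -- there must be no comma here
        when 2 then 'WeChat'
        else 'unknown'
    end
    as totalmoney2,
    paypyte
from table_name;

select                            -- form 2: case when condition then returned data
    orderid,
    paytime,
    case 
        when totalmoney1 = 0 then ''
        when totalmoney1 = 2 then ''
        else 'unknown'
    end
    as totalmoney2,
    paypyte
from table_name;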
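
The five points above, sketched with literals so that each statement can be run without a table. The values are made up for the example.

```sql
-- 1. if(condition, value when true, value when false)
SELECT if(1 = 1, 'yes', 'no');                 -- yes

-- 2. Comparing null with = gives null; use IS NULL / IS NOT NULL to test.
SELECT NULL = NULL;                            -- NULL
SELECT NULL IS NULL;                           -- true

-- 3. nvl replaces a null with a default.
SELECT nvl(NULL, 'default');                   -- default

-- 4. coalesce returns the first argument that is not null.
SELECT coalesce(NULL, NULL, 3, 4);             -- 3

-- 5a. case <value> when ... : compare one expression against constants.
SELECT CASE 2 WHEN 0 THEN 'cash' WHEN 2 THEN 'WeChat' ELSE 'unknown' END;   -- WeChat

-- 5b. case when <condition> ... : each branch has its own condition.
SELECT CASE WHEN 2 > 1 THEN 'bigger' ELSE 'not bigger' END;                 -- bigger
```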

14. Data type conversion

cast(original value as target data type)
cast('123.4' as int);

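15. Other functions (hash, CRC)

Hash values (used by bucketing and sorting, and by the default ordering of union)
CRC, cyclic redundancy check: used, for example, to verify package integrity when a phone downloads an app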
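
A sketch of both functions, assuming a Hive version that provides crc32. The pmod line only approximates how a hash value is turned into a bucket number for a 4-bucket table; the exact hash Hive uses for bucketing depends on the version.

```sql
-- Hash value of a string.
SELECT hash('abc');

-- Roughly how a row is assigned to one of 4 buckets: a value from 0 to 3.
SELECT pmod(hash('abc'), 4);

-- CRC32 checksum of a string.
SELECT crc32('abc');
```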

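16. Collection functions [understand]

array_contains --> checks whether a value is inside the array:  array_contains(array(1,null,4,2,7),7) checks whether 7 is in the array
sort_array(array(1,3,2,0,4)) -- sorts the array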
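
Both functions take an array as their first argument, which the following sketch illustrates. size() is added here only for comparison.

```sql
SELECT array_contains(array(1, 4, 2, 7), 7);   -- true
SELECT array_contains(array(1, 4, 2, 7), 5);   -- false
SELECT sort_array(array(1, 3, 2, 0, 4));       -- [0,1,2,3,4]
SELECT size(array(1, 3, 2));                   -- 3
```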

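17. CTE (common table expression) [very important] (1, )

with temp_name1 as (query 1), temp_name2 as (query 2), temp_name3 as (query 3)
Query 3 can use temp_name1 and temp_name2
Advantage: the table only needs to be loaded into memory once; without a CTE, it is loaded again every time it is read (whether Hive really computes a CTE only once depends on its configuration)
with table_name1 as (select * from table_name2),
    table_name3 as (select * from table_name1)
select name, id from table_name3;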

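18. The explode function and lateral views [understand]

explode on array + map, combined with lateral view

select explode(`map`())
N rows, two columns: explodes into a key column and a value column
select explode(`array`(1,2,3,4,5,6))   one column, 6 rows

Example:

1. Create the table
create table table_name1(
    id int,
    years array<string>
)
row format delimited 
    collection items terminated by '|';
    
2. How do we explode it?
Option one:
select explode(column_name) as alias from table_name1;
3. Use a lateral view to join the exploded rows back to the table
select id, year from table_name1 lateral view explode(years) b as year;   -- b is the alias of the exploded table and is required; year is the column alias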
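
A sketch of explode on literals and of the lateral view form, assuming the table_name1(id, years) defined above. The aliases t, y and yr are arbitrary names chosen for the example.

```sql
-- Array: one column, one row per element.
SELECT explode(array(1, 2, 3));

-- Map: two columns (key, value), one row per entry.
SELECT explode(map('a', 1, 'b', 2));

-- Lateral view: each id is repeated once per element of its years array.
SELECT t.id, y.yr
FROM table_name1 t
LATERAL VIEW explode(t.years) y AS yr;
```

A plain select explode(...) cannot list other columns next to it, which is why the lateral view is needed to keep id beside each exploded value.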

N. Functions used today

rand() returns a random floating-point number
round() keeps the given number of decimal places

Vocabulary:
extended: expanded
coalesce: 
trim:
substr:
floor:
ceil:

Not remembered clearly

Get the first value that is not null

contains
contains
coalesce
coalesce
