HIVE Query Syntax


Article link: https://blog.csdn.net/weixin_45737446/article/details/103397691


SELECT

Data to prepare:

Create the partitioned table
create table score(s_id string, c_id string, s_score int)
partitioned by (month string)
row format delimited fields terminated by '\t';
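With row format delimited fields terminated by '\t', each line of a data file is one row and its fields are separated by tab characters. A minimal Python sketch of how such a line maps onto the three data columns, assuming Hive's usual text handling in which a missing trailing field, or a value that cannot be cast to int, is read as NULL (None here). The function name parse_score_line and the sample lines are hypothetical. Note that the partition column month is not part of the file.

```python
from typing import List, Optional, Tuple

# Data columns from the CREATE TABLE statement, in order.
# The partition column `month` is NOT stored in the data file.
COLUMNS = ["s_id", "c_id", "s_score"]

Row = Tuple[Optional[str], Optional[str], Optional[int]]


def parse_score_line(line: str) -> Row:
    """Hypothetical helper: split one tab-delimited line into (s_id, c_id, s_score).

    Assumption: mimics Hive's default text handling, where missing trailing
    fields and values that cannot be cast to the column type become NULL (None).
    """
    fields: List[Optional[str]] = list(line.rstrip("\n").split("\t")[: len(COLUMNS)])
    # Pad missing trailing fields with None; extra fields were dropped above.
    fields += [None] * (len(COLUMNS) - len(fields))
    s_id, c_id, raw_score = fields
    try:
        s_score: Optional[int] = int(raw_score) if raw_score is not None else None
    except ValueError:
        s_score = None  # e.g. "abc" in an int column is read as NULL
    return (s_id, c_id, s_score)


print(parse_score_line("01\t01\t80"))   # ('01', '01', 80)
print(parse_score_line("02\t03\tabc"))  # ('02', '03', None)
print(parse_score_line("03\t02"))       # ('03', '02', None)
```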

Load data into the partitioned table

load data local inpath '/export/servers/hivedatas/score.txt' 
into table score partition (month='201806');
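In a partitioned table the partition value is not written into the data rows: Hive keeps each partition as a subdirectory named after it (here month=201806) under the table's directory, and LOAD DATA LOCAL copies the local file into that subdirectory. When the table is queried, the value from the directory name is attached to every row read from it. A minimal Python sketch of that layout, assuming a hypothetical table location /user/hive/warehouse/score (the real path depends on your warehouse configuration); partition_dir and read_partition are illustrative names only.

```python
from typing import Dict, List, Tuple

# Hypothetical table location; the real one depends on the warehouse directory setting.
TABLE_LOCATION = "/user/hive/warehouse/score"


def partition_dir(table_location: str, spec: Dict[str, str]) -> str:
    """Build the subdirectory used for a partition, e.g. .../score/month=201806."""
    parts = [f"{key}={value}" for key, value in spec.items()]
    return "/".join([table_location.rstrip("/")] + parts)


def read_partition(rows: List[Tuple[str, str, int]], spec: Dict[str, str]) -> List[tuple]:
    """Append the partition value(s) to each data row, as a query would see it."""
    return [row + tuple(spec.values()) for row in rows]


spec = {"month": "201806"}
print(partition_dir(TABLE_LOCATION, spec))        # /user/hive/warehouse/score/month=201806
print(read_partition([("01", "01", 80)], spec))   # [('01', '01', 80, '201806')]
```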

score.txt

Basic SELECT operations

Syntax:

SELECT [ALL | DISTINCT] select_expr, select_expr, …
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list [HAVING condition]]
[ORDER BY col_list]
[CLUSTER BY col_list
  | [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
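The clauses are written in the order shown above, but conceptually they are applied as FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. The Python sketch walks one hypothetical query through those steps on a few made-up score rows; it illustrates the logical order only, not how Hive actually plans and executes the job. The helper run_query and its data are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Made-up rows of `score`: (s_id, c_id, s_score, month).
SCORE: List[Tuple[str, str, int, str]] = [
    ("01", "01", 80, "201806"),
    ("01", "02", 90, "201806"),
    ("02", "01", 70, "201806"),
    ("02", "02", 40, "201806"),
    ("03", "01", 95, "201806"),
    ("03", "02", 85, "201807"),
]


def run_query(rows: Sequence[Tuple[str, str, int, str]], month: str,
              min_avg: float, limit: int) -> List[Tuple[str, float]]:
    """Hypothetical helper that mimics, in logical order:

        SELECT s_id, avg(s_score) AS avg_score
        FROM score
        WHERE month = <month>
        GROUP BY s_id HAVING avg(s_score) > <min_avg>
        ORDER BY avg_score DESC
        LIMIT <limit>
    """
    # FROM + WHERE: keep only the rows of the requested partition.
    filtered = [r for r in rows if r[3] == month]
    # GROUP BY s_id: collect each student's scores.
    groups: Dict[str, List[int]] = defaultdict(list)
    for s_id, _c_id, s_score, _month in filtered:
        groups[s_id].append(s_score)
    # HAVING: compute avg per group and drop whole groups that fail the test.
    kept = {s_id: sum(v) / len(v) for s_id, v in groups.items()
            if sum(v) / len(v) > min_avg}
    # SELECT: project (s_id, avg_score) for the surviving groups.
    result = [(s_id, avg) for s_id, avg in kept.items()]
    # ORDER BY avg_score DESC, then LIMIT keeps the first `limit` rows.
    result.sort(key=lambda pair: pair[1], reverse=True)
    return result[:limit]


print(run_query(SCORE, "201806", 60, 10))  # [('03', 95.0), ('01', 85.0)]
```

Student 02 averages 55 in 201806, so HAVING removes that whole group before ordering; WHERE, by contrast, works on individual rows.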

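Notes:
1. ORDER BY sorts the entire input globally, so all data goes through a single reducer; when the input is large, this can take a long time.
2. SORT BY is not a global sort: it sorts the data going into each reducer (the sort is completed before the data enters the reducer). Therefore, if you sort with SORT BY and set mapred.reduce.tasks > 1, SORT BY only guarantees that each reducer's output is ordered, not that the overall output is ordered.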

3. DISTRIBUTE BY (column) sends rows to different reducers according to the given column, using hash partitioning.

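4. CLUSTER BY (column) does everything DISTRIBUTE BY does and additionally sorts by that column.
So when the distribution (bucketing) column and the sort column are the same, CLUSTER BY = DISTRIBUTE BY + SORT BY.
The main purpose of bucketed tables is to make JOIN operations more efficient: rows with the same key value always land in the same bucket, so two tables bucketed on the join column can be joined bucket by bucket.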
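The four notes can be made concrete with a small Python simulation in which each reducer is just a list of rows. It is a sketch under stated assumptions: zlib.crc32 is a deterministic stand-in for Hive's real hash function, SORT BY without DISTRIBUTE BY is modeled as round-robin assignment, and the function names are hypothetical.

```python
import zlib
from operator import itemgetter
from typing import Any, Callable, List, Sequence, Tuple

Row = Tuple[str, str, int]  # (s_id, c_id, s_score)
Reducers = List[List[Row]]  # one list of rows per reducer


def stable_hash(value: str) -> int:
    # Deterministic stand-in for Hive's own hash function (assumption).
    return zlib.crc32(value.encode("utf-8"))


def order_by(rows: Sequence[Row], key: Callable[[Row], Any]) -> Reducers:
    # ORDER BY: a single reducer gets all rows and sorts them globally.
    return [sorted(rows, key=key)]


def sort_by(rows: Sequence[Row], key: Callable[[Row], Any], n: int) -> Reducers:
    # SORT BY without DISTRIBUTE BY: rows are spread over n reducers
    # (round-robin here, an assumption) and each reducer sorts only its share.
    reducers: Reducers = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        reducers[i % n].append(row)
    return [sorted(part, key=key) for part in reducers]


def distribute_by(rows: Sequence[Row], col: Callable[[Row], str], n: int) -> Reducers:
    # DISTRIBUTE BY: hash(column) mod n chooses the reducer; nothing is sorted.
    reducers: Reducers = [[] for _ in range(n)]
    for row in rows:
        reducers[stable_hash(col(row)) % n].append(row)
    return reducers


def cluster_by(rows: Sequence[Row], col: Callable[[Row], str], n: int) -> Reducers:
    # CLUSTER BY col behaves like DISTRIBUTE BY col + SORT BY col.
    return [sorted(part, key=col) for part in distribute_by(rows, col, n)]


rows: List[Row] = [("01", "01", 80), ("02", "01", 70), ("01", "02", 90),
                   ("03", "01", 95), ("02", "02", 40)]
by_score = itemgetter(2)
print(order_by(rows, by_score))            # one reducer, scores 40, 70, 80, 90, 95
print(sort_by(rows, by_score, 2))          # reducer 0: 40, 80, 90; reducer 1: 70, 95
print(cluster_by(rows, itemgetter(0), 2))  # every s_id sits in exactly one reducer
```

Concatenating the two SORT BY outputs gives 40, 80, 90, 70, 95, which is why SORT BY alone does not produce a globally ordered result once there is more than one reducer.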

Full-table query

select * from score;

Querying specific columns
