Hadoop offline stage (Section 14, Part 2): Hive data loading, data import and export, basic queries, and sorting

Latest recommended article published on 2024-09-27 00:45:01

hwq317622817

Tags: hive, big data

Article link: https://blog.csdn.net/hwq317622817/article/details/110759303

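This article describes the ways to load data into Hive: load data, inserting from a query, multi-insert mode, and specifying a location when creating a table. It also covers data import and export, including the export and import operations and the ways to export data to different locations. The query syntax part covers full-table queries, selecting specific columns, column aliases, common functions, LIMIT, WHERE, comparison operators, LIKE and RLIKE, logical operators, GROUP BY, and JOIN. Finally, it discusses the sorting options: global sorting, local (per-reducer) sorting, and the use of DISTRIBUTE BY and CLUSTER BY.

Summary generated by CSDN using AI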

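Loading data into Hive

Loading data from a file with load data

-- Load a local file into a partition of a Hive table. This form appends to existing data; write "overwrite into table" instead to replace the data already in the partition.
load data local inpath '/export/servers/hivedatas/score.csv' into table score partition(month='201806');

-- Load a file that is already on HDFS into a partition of a Hive table (the file is moved into the table's directory). As above, add "overwrite" before "into table" only when existing data should be replaced.
load data inpath '/export/servers/hivedatas/score.csv' into table score partition(month='201806');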
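
As a quick check after loading, something like the following could be run. It is only a sketch: it assumes the score table is partitioned by month, as the statements above imply, and it uses the overwrite form so that running it twice does not duplicate rows.

```sql
-- Replace whatever is in the partition with the contents of the local file
load data local inpath '/export/servers/hivedatas/score.csv'
overwrite into table score partition(month='201806');

-- List the partitions that now exist for the table
show partitions score;

-- Count the rows that ended up in the loaded partition
select count(*) from score where month = '201806';
```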

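Loading data from a query

-- Create an empty table with the same schema as score
create table score2 like score;
-- Load the query result into the table, replacing the contents of the target partition
insert overwrite table score2 partition(month = '201806') select s_id,c_id,s_score from score;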
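
The query above has no filter, so it reads every partition of score. A minimal sketch that restricts the source to one month and appends instead of overwriting might look like this. The where clause, the target partition value '201807', and the use of insert into are illustrative additions, not part of the original example.

```sql
-- Assumes score2 was created with "create table score2 like score"
-- Append (do not overwrite) the rows of one source partition into a new target partition
insert into table score2 partition(month = '201807')
select s_id, c_id, s_score
from score
where month = '201806';

-- Inspect a few of the rows that were written
select * from score2 where month = '201807' limit 10;
```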

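Loading data from one table into several tables (multi-insert mode)

-- Create the two target tables. They must be partitioned by month because the inserts below write to a partition.
create table score_first(s_id string,c_id string) partitioned by (month string);
create table score_second(c_id string,s_score int) partitioned by (month string);
-- Load data from one table into both tables in a single statement
from score
insert overwrite table score_first partition(month='201806') select s_id,c_id
insert overwrite table score_second partition(month='201806') select c_id,s_score;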
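
Each insert clause in a multi-insert can carry its own where filter, while the source table is still scanned only once. The sketch below illustrates this under stated assumptions: the tables score_pass and score_fail and the pass mark of 60 are hypothetical and exist only for this example.

```sql
-- Hypothetical target tables, partitioned so that the inserts can name a partition
create table if not exists score_pass (s_id string, c_id string, s_score int)
partitioned by (month string);
create table if not exists score_fail (s_id string, c_id string, s_score int)
partitioned by (month string);

-- One pass over score, two differently filtered outputs
from score
insert overwrite table score_pass partition(month='201806')
  select s_id, c_id, s_score where month = '201806' and s_score >= 60
insert overwrite table score_fail partition(month='201806')
  select s_id, c_id, s_score where month = '201806' and s_score < 60;
```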

Creating a table from a query

-- Create score5 and fill it with the query result in one step
create table score5 as select * from score;
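
With select *, the partition column month of score ends up as an ordinary column in the new table, since the table created this way is not itself partitioned (at least in older Hive versions). A sketch that copies only one month and then inspects the resulting schema could look like this; the table name score5_201806 is hypothetical.

```sql
-- Copy a single month of data into a new, unpartitioned table
create table score5_201806 as
select s_id, c_id, s_score
from score
where month = '201806';

-- Show the columns Hive derived from the query
describe score5_201806;
```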

Specifying the data path with location when creating a table

-- Note: the path given here must be an HDFS path
create external table score6 (s_id string,c_id string,s_score int) row format delimited fields terminated by '\t' location '/myscore6';
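
Once the external table exists, any file placed under its location becomes queryable without a load statement. The following is a sketch only: it assumes the statements are run from the Hive CLI (which accepts dfs commands) and that score.csv is tab-delimited with the three columns declared above.

```sql
-- Make sure the directory exists, then copy the local file into it
dfs -mkdir -p /myscore6;
dfs -put /export/servers/hivedatas/score.csv /myscore6;

-- The new file is visible to the table immediately
select * from score6 limit 10;

-- Dropping an external table removes only the metadata; the files under /myscore6 stay on HDFS
drop table score6;
```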