Hive 表加载，导出和查询

最新推荐文章于 2023-09-07 15:55:21 发布

枫火燕归林

最新推荐文章于 2023-09-07 15:55:21 发布

阅读量856

点赞数

分类专栏： Hadoop Hive MapReduce 文章标签： hive 加载导出查询

本文链接：https://blog.csdn.net/huoyuyan/article/details/50849005

版权

Hadoop 同时被 3 个专栏收录

8 篇文章 0 订阅

订阅专栏

Hive

5 篇文章 0 订阅

订阅专栏

MapReduce

5 篇文章 0 订阅

订阅专栏

Hive表的数据加载

加载本地文件到数据表

$ local data local inpath '/../../.' into table table_name;

这里写图片描述

加载hdfs文件到hive表

$ load data inpath '/load_students' into student_load_hdfs;

这里写图片描述

覆盖表中所有数据

overwrite 关键字

$ local data local inpath '/../../.'overwrite  into table table_name ;

这里写图片描述

创建表时通过select加载

create table load_create_select as select stuid,stuname from student_load_hdfs;

这里写图片描述

创建表时通过insert加载

$ create table load_create_insert like student_load_hdfs;  
insert into table load_create_insert select * from student_load_hdfs

创建表时，指定location加载

dfs -mkdir -p /user/hive/warehouse/db_hive.db/load_location_put

这里写图片描述

dfs -put /opt/modules/datafiles/load_students   /user/hive/warehouse/db_hive.db/load_location_put/;

这里写图片描述

create table  if not exists load_location_put(
stuid   int ,
stuname string ,
stuage  int 
)
row format delimited fields terminated by '\t'
stored as textfile

这里写代码片

$ location '/user/hive/warehouse/db_hive.db/load_location_put';

这里写图片描述

导出数据(对hive表数据分析后保存)

保存到本地目录文件中

insert overwrite local directory '/opt/modules/datafiles/'  select * from load_location_put

这里写代码片

保存到hdfs目录

insert overwrite directory '/hive_imp/load_localtion_put' select * from load_localtion_put;

这里写图片描述

重定向结果

在hive外部执行

bin/hive "select * from 
db_hive.load_location_put;">/opt/modules/datafiles/load_location_put.txt

这里写图片描述

外部分区表

建表

create external table  if not exists student_external_part(
stuid   int ,
stuname string ,
stuage  int 
)
partitioned  by (class string) 
row format delimited fields terminated by '\t'
stored as textfile
location '/user/student_external_part'
;

加载数据

load data loacal inpath '/,./../.' into table  table_name partition(class='ss');

分区表与常规表加载数据的区别

创建常规表，加载数据
这里写图片描述

创建常规表与分区表对比
这里写图片描述

加载数据前，外部数据表无数据
这里写图片描述
加载数据前外部分区表的分区

加载数据到分区

alter table student_part add partition(class='01');

这里写图片描述
查看分区表信息

通过UI查看信息
这里写图片描述

常规表上传数据到hdfs的表路径中，然后创建表，数据会加载到表中，而分区表则不会，还需要使用alter来加载分区

export，import 数据

export(从hive导出到hdfs) 数据

export  table student_load_hdfs to '/hive_inp/export_tables'

这里写图片描述
通过UI查看

import (从hdfs导入到hive表)数据

import table student_load_hdfs2  from '/hive_inp/export_tables';

这里写图片描述

sort by，distribute by ，cluster by ，order by 的作用

order by

全局排序，只作用于一个reduce

sort by

对每一个Reduce内部的数据进行排序
>set mapreduce.job.reduces=3;
>select * from emp sort by empno desc;  
>insert overwrite local directory '/opt/datafiles/sort-by/ ' select * from emp sort by empno desc;

这里写图片描述
结果截图

distribute by

作用类似于分区partitioner ，底层是mapreduce，通常与sort by进行使用，在sort by之前使用

set setmapre.job.reduces=3  
insert overwrite local directory '/opt/datafiles/sort-by/ ' select * from emp  distribute by deptno sort by empno desc;

cluster by

当sort by，distribute by 字段相同时，，用clustor代替

insert overwrite local directory '/opt/datafiles/sort-by/ ' select * from emp  cluster by empno desc;

这里写图片描述

输出结果
这里写图片描述

我这里因为插入数据间的类型不为制表符的原因导致输出的类型有问题

枫火燕归林

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录