Hive Data Export

Latest recommended article published 2023-05-23 09:46:41

炽天使YRLT

Category: Hive Learning Path (hive学习之路). Tags: hive, hadoop, big data

Article link: https://blog.csdn.net/eyexin2018/article/details/126125341

Add partition information to an external table (static partition creation):

ALTER TABLE extpart_flow ADD PARTITION (year=2017, month=10,day=24)
LOCATION 'hdfs:///extpartition/2017/10/24';
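
When a new partition arrives every day, a statement like the one above is often generated rather than typed by hand. A minimal sketch in bash, assuming the same `extpart_flow` table and the `hdfs:///extpartition/<year>/<month>/<day>` layout from the example. The helper name `add_partition_hql` is hypothetical, and `IF NOT EXISTS` is an addition so that re-running the script is harmless:

```shell
#!/bin/bash
# Hypothetical helper: print the ALTER TABLE statement for one day's partition.
# Arguments: table name, year, month, day.
add_partition_hql() {
  local table="$1" year="$2" month="$3" day="$4"
  printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (year=%s, month=%s, day=%s) LOCATION 'hdfs:///extpartition/%s/%s/%s';\n" \
    "$table" "$year" "$month" "$day" "$year" "$month" "$day"
}

# Print the statement; pass it to `hive -S -e "..."` to actually run it.
add_partition_hql extpart_flow 2017 10 24
```

Printing the statement first makes it easy to inspect before it touches the metastore.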

Create a regular table
Load data
Use case: use dynamic partitioning when the number of partitions is not known in advance.

create table if not exists pv01(
id int,
name string,
sex string
)
partitioned by(age int)
row format delimited fields terminated by '\t'
stored as textfile;

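Before inserting data dynamically, Hive must be switched to "nonstrict" mode, because every partition column in this insert is dynamic.
Enable dynamic partitioning:

hive> set hive.exec.dynamic.partition=true;

Set nonstrict mode:

hive> set hive.exec.dynamic.partition.mode=nonstrict;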

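Optional: by default each node may create at most 100 dynamic partitions. Raise the value if the data has more distinct partition values than that.

hive> set hive.exec.max.dynamic.partitions.pernode=100;

Partition the user table by age and store the rows in the partitioned table: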

insert into table pv01 partition(age) select id,name,sex,age from pv;

View the partitions:

show partitions pv01;
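
The `set` commands only affect the current Hive session, so they have to run in the same session as the insert. A minimal sketch that bundles them into one HQL string, assuming the `pv` and `pv01` tables from above; the function name `dynamic_insert_hql` is hypothetical. Note that the dynamic partition column (`age`) comes last in the select list:

```shell
#!/bin/bash
# Hypothetical helper: print the session settings and the insert together,
# so that they run in a single Hive session.
dynamic_insert_hql() {
  cat <<'EOF'
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table pv01 partition(age) select id, name, sex, age from pv;
show partitions pv01;
EOF
}

# Print the statements; run them with: hive -e "$(dynamic_insert_hql)"
dynamic_insert_hql
```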

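Bucketing follows the same steps:

Create a regular table
Load data
Create the bucketed table (here, bucketing the stu data on the eNo column)
Note the difference between a bucketing column and a partition column: the bucketing column (eNo) is an ordinary column in the table's column list, whereas a partition column (age in pv01) is declared only in the partitioned by clause.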

create table if not exists bk_stu(
eNo int,
name string,
sex string,
age int
)
clustered by(eNo) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;
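
Bucketing assigns each row to a bucket from a hash of the clustering column. A rough sketch of that arithmetic follows. It rests on an assumption about Hive's default behavior that the text above does not state: an int column hashes to its own value, so for a non-negative `eNo` the 0-based bucket index is `eNo` modulo the bucket count. The function name `bucket_of` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: 0-based bucket index for a non-negative int key.
# Assumes the hash of an int is the int itself, so index = key mod buckets.
bucket_of() {
  local key="$1" buckets="$2"
  echo $(( key % buckets ))
}

# With 4 buckets, as declared for bk_stu:
for eNo in 1 2 3 4 5 8; do
  echo "eNo=$eNo -> bucket index $(bucket_of "$eNo" 4)"
done
```

Rows with the same `eNo` always land in the same bucket, which is what makes bucket-based sampling and joins possible.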

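Note:
Force the output through multiple reducers.
Set this before inserting data; otherwise only a single file is produced:

set hive.enforce.bucketing = true;

To populate a bucketed table, hive.enforce.bucketing must be true, so that Hive creates the number of buckets declared in the table definition. In Hive 2.0 and later the property no longer exists and bucketing is always enforced.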

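Note the term used here: result set.
Insert data from the regular table into the bucketed table (note: rows can only be inserted into a bucketed table as the result set of a query).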

insert into table bk_stu select * from stu;

(1) Sample the table and return one bucket of data:

select * from bk_stu tablesample(bucket 3 out of 4);

(2) Sample the table and return two buckets of data:

select * from bk_stu tablesample(bucket 1 out of 2);
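
Why `bucket 1 out of 2` returns two buckets: for a table with n buckets and a sample `bucket x out of y` where y divides n, the sample covers n/y of the table's buckets, namely x, x+y, x+2y and so on, counting from 1. A small sketch of that arithmetic, assuming the sample is taken on the table's clustering column; `sampled_buckets` is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: list the 1-based table buckets covered by
# TABLESAMPLE(BUCKET x OUT OF y) on a table with n buckets.
# Assumes y divides n evenly.
sampled_buckets() {
  local x="$1" y="$2" n="$3" b out=""
  for (( b = x; b <= n; b += y )); do
    out+="${out:+ }$b"
  done
  echo "$out"
}

sampled_buckets 3 4 4   # bucket 3 out of 4 on a 4-bucket table
sampled_buckets 1 2 4   # bucket 1 out of 2 on a 4-bucket table
```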

# Block sampling (TABLESAMPLE (n PERCENT)): samples roughly n% of the table's data size

select * from bk_stu tablesample(50 PERCENT);
select * from bk_stu tablesample(25 PERCENT);

# Sampling by data size (TABLESAMPLE (nM)); M stands for megabytes

select * from bk_stu tablesample(1M);

# Sampling by row count (TABLESAMPLE (n ROWS)); the count applies to each input split

select * from bk_stu tablesample(4 ROWS);

Data import

Four ways

1. Load from the local file system

load data local inpath '/..........' into table user;

2. Load from HDFS (the source file is moved, not copied)

load data inpath '/........' into table user;

3. Query data from another table and insert it into a Hive table

insert into table user select * from users;

4. Create the table and populate it from a query on another table in one statement

create table user as select * from users; -- full import
create table user as select id from users where id<10; -- partial import

Data export

1. Export to the local file system

insert overwrite local directory '/usr/' select * from user; -- as a result set

To specify a delimiter:

insert overwrite local directory '/usr/' row format delimited fields terminated by '\t' select * from user;

2. Export to HDFS

insert overwrite directory '/source' select * from user;

3. Export to another Hive table, using an ordinary insert:

insert into table users select * from user;

If the table contains both primitive and complex data types:

insert overwrite local directory '/home/hadoop/outdir'
 row format delimited fields terminated by '\t'
 collection items terminated by ','
 map keys terminated by ':'
 select * from userses;

 -- the same export written to HDFS; the path is treated as a directory
 insert overwrite directory '/hivedata/user.txt'
 row format delimited fields terminated by '\t'
 collection items terminated by ','
 map keys terminated by ':'
 select * from userses;

Run directly from a Linux terminal (exports the data to the local file system):

hive -S -e 'select * from hive_db.userses' >> /home/hadoop/hivedata/logsdir/users.txt;

// Wrapped in a shell script:

 #!/bin/bash
 HQL="insert overwrite local directory '/home/hadoop/hivedata/logsdir' select log from reglog;"
 hive -S -e "$HQL"
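
A slightly more reusable variant of the script, offered as a sketch rather than the author's method: the output directory and the query become arguments, and the export is tab-delimited. The helper name `export_hql` is hypothetical. Keep in mind that `insert overwrite local directory` replaces whatever is already in the target directory, so it is safest to point it at a directory reserved for the export.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: build a tab-delimited export statement.
# Arguments: local output directory, then the SELECT query.
export_hql() {
  local outdir="$1" query="$2"
  # '\t' is passed through %s so it reaches Hive as the two characters \t.
  printf "insert overwrite local directory '%s' row format delimited fields terminated by '%s' %s;\n" \
    "$outdir" '\t' "$query"
}

HQL="$(export_hql /home/hadoop/hivedata/logsdir 'select log from reglog')"
echo "$HQL"
# To run the export: hive -S -e "$HQL"
```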
