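Dynamic partitioning
Store the data in separate directories according to the value of a field, which makes it convenient to query along that dimension.
Steps: 0 prepare the data, 1 create a plain table and load the data, 2 create the partitioned table, 3 enable dynamic partitioning, 4 insert the data into the partitioned table dynamically
0 Data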
cd /hive/data
vi user.txt
u001 zss 23 M beijiing
u002 yhh 23 M nanjing
u003 lss 43 M beijiing
u004 zy 23 M beijiing
u005 zm 23 M beijiing
u006 cl 23 M dongjing
u007 lx 23 M beijiing
u008 yz 23 M beijiing
u009 ym 23 M nanjiing
u010 xm 23 M beijiing
u011 xd 23 M beijiing
u012 lh 23 M dongjiing
u013 ftm 23 M dongjiing
1 Create a plain table and load the data into it
create table if not exists tb_user(
uid string,
name string,
age int,
gender string,
address string
)
row format delimited fields terminated by " ";
load data local inpath "/hive/data/user.txt" into table tb_user;
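A quick check after loading can catch a delimiter mismatch early. This is only a sketch against the tb_user table defined above: if the space delimiter did not match the file, the trailing columns would come back as NULL.

```sql
-- Count the loaded rows; the sample file above has 13 data lines.
select count(*) from tb_user;

-- Rows whose last column failed to parse would show up here.
select count(*) from tb_user where address is null;
```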
2 Create the partitioned table
create table if not exists tb_p_user(
uid string,
name string,
age int,
gender string,
address string
)
partitioned by (addr string)
row format delimited fields terminated by " ";
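3 Enable dynamic partitioning
General form: set parameter=value;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
In nonstrict mode every partition column may be dynamic, so the data can be imported from the plain table.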
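Hive also caps how many partitions one statement may create dynamically. The settings below are optional and not part of the original steps; the values shown are, to my knowledge, the usual defaults, and they only need raising when the partition field has many distinct values.

```sql
-- Optional limits for dynamic partitioning (values shown are the common defaults).
-- Maximum number of dynamic partitions created by one statement in total.
set hive.exec.max.dynamic.partitions=1000;
-- Maximum number of dynamic partitions created by a single mapper or reducer.
set hive.exec.max.dynamic.partitions.pernode=100;
```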
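4 Insert the data dynamically
The plain table has 5 fields.
The partitioned table has 5 regular fields plus 1 partition field.
When inserting, the number and types of the selected fields must match those of the target table; the last selected field is used as the partition field.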
insert into tb_p_user partition(addr)
select uid, name, age, gender, address, address from tb_user;
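To confirm that the insert produced one partition per distinct address value, the partitions can be listed and one of them queried. A minimal sketch against the tb_p_user table above; 'nanjing' is one of the address values in the sample data.

```sql
-- List the partitions created by the dynamic insert (one per distinct address).
show partitions tb_p_user;

-- Filtering on the partition column reads only the matching directory.
select uid, name, addr from tb_p_user where addr = 'nanjing';
```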
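Bucketed tables make computation more convenient and efficient, for example sampling and joins on the bucketing column.
clustered by (uid) into num buckets    (bucketing syntax, where num is the number of buckets)
A bucketed table stores the data in separate files, similar to partitioning; a partition is a directory, whereas each bucket is a file chosen by hashing the bucketing column.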
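Conceptually, a row goes to bucket number hash(column) mod num. The sketch below previews that mapping for two sample uid values with 4 buckets, using Hive's built-in hash() and pmod() functions. It assumes a Hive version that allows select without a from clause, and the hash Hive actually uses when writing buckets may differ between versions, so treat the result as an illustration only.

```sql
-- Preview: which of 4 buckets would these uid values map to?
-- pmod keeps the result non-negative even if hash() returns a negative number.
select pmod(hash(1001), 4) as bucket_of_1001,
       pmod(hash(1002), 4) as bucket_of_1002;
```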
cd /hive/data
vi cluster.txt
uid name
1001 ss1
1002 ss2
1003 ls1
1004 ks1
1005 ps1
1006 hhh
1007 ss10
1008 ss11
1009 ss12
1010 ss13
1011 ss14
1012 ss15
1013 ss16
1014 ss9
1015 ss8
1016 ss7
1017 ss6
1018 ss5
1 Create the bucketed table
create table if not exists tb_cluster(
uid int,
name string
)
clustered by (uid) into 4 buckets
row format delimited fields terminated by "\t";
desc formatted tb_cluster;
2 Create a plain table and load the data into it
create table if not exists tb_cluster2(
uid int,
name string
)
row format delimited fields terminated by "\t";
load data local inpath "/hive/data/cluster.txt" into table tb_cluster2;
3 Enable bucketing
set hive.enforce.bucketing=true;
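With bucketing enforced, the usual next step is to fill the bucketed table from the plain table with insert ... select, so that Hive distributes the rows across the 4 bucket files. The sketch below assumes that step and then reads back a single bucket with tablesample. As far as I know, hive.enforce.bucketing was removed in Hive 2.0, where bucketing is always enforced, so the set statement only matters on older versions.

```sql
-- Populate the bucketed table from the plain table (assumed next step).
insert into table tb_cluster
select uid, name from tb_cluster2;

-- Read back one bucket: bucket 1 out of 4, hashed on uid.
select * from tb_cluster tablesample(bucket 1 out of 4 on uid);
```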