hive建表,分区,数据操作

1,小明1,lol-book-movie,beijing:shangxuetang-shanghai:pudong
2,小明2,lol-book-movie,beijing:shangxuetang-shanghai:pudong
3,小明3,lol-book-movie,beijing:shangxuetang-shanghai:pudong
4,小明4,lol-book-movie,beijing:shangxuetang-shanghai:pudong
5,小明5,lol-movie,beijing:shangxuetang-shanghai:pudong
6,小明6,lol-book-movie,beijing:shangxuetang-shanghai:pudong
7,小明7,lol-book-movie,beijing:shangxuetang-shanghai:pudong
8,小明8,lol-book-movie,beijing:shangxuetang-shanghai:pudong

内部表:

建表 :

create table psn
(
id int,
name string,
likes array<string>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

加载数据,类似于sql的insert

load data local inpath '/root/data/data' into table psn;


外部表:

先创建文件,再创建表;

hdfs  dfs -mkdir /usr


hdfs dfs -put data /usr

外部表

create external table psn4
(
id int,
name string,
likes array<string>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/usr/';

内部表和外部表的区别:
1、创建表的时候,内部表直接存储再默认的hdfs路径,外部表需要自己指定路径
2、删除表的时候,内部表会将数据和元数据全部删除,外部表只删除元数据,数据不删除

注意:hive:读时检查(实现解耦,提高数据记载的效率)
关系型数据库:写时检查

分区(每一个分区 相当于一个新的层次结构):
单分区

create table psn5
(
id int,
name string,
likes array<string>,
address map<string,string>
)
partitioned by(age int)
row format delimited 
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

加载数据(插入数据):

load data local inpath '/root/data/data' into table psn5 partition(age=10);

双分区(多分区):

create table psn6
(
id int,
name string,
likes array<string>,
address map<string,string>
)
partitioned by(age int,sex string)
row format delimited 
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

加载数据(插入数据):

load data local inpath '/root/data/data' into table psn6 partition(age=10,sex='man');

分区是按照 name匹配的,可以乱序

分区是在表定义时就给定了的,无法在后面添加,但是可以添加/修改分区的值,(这里就是创建新的目录),不添加值。加的时候,分区节点不能少
alter table psn6 add partition(age=30,sex=‘woman’);

但是删的时候可以少:

alter table psn6 drop partition(age=20);

给新创建的表添加已存在的分区:
[root@node4 data]# hdfs dfs -mkdir -p /usr/age=10
[root@node4 data]# hdfs dfs -mkdir -p /usr/age=20

[root@node4 data]# hdfs dfs -put data /usr/age=10/
[root@node4 data]# hdfs dfs -put data /usr/age=20/

create external table psn7
(
id int,
name string,
likes array<string>,
address map<string,string>
)
partitioned by(age int)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/usr/';

修复分区:
msck repair table psn7;

静态分区:分区的值是认为指定的
动态分区:分区的值是由记录的某一列值来决定的

创建表的第二种方式:
create table psn8 as select * from psn7;

创建表的第三种方式:
create table psn9 like psn7;

hive 动态分区
开启支持动态分区
set hive.exec.dynamic.partition=true;
默认:true
set hive.exec.dynamic.partition.mode=nostrict;
默认:strict(至少有一个分区列是静态分区)
相关参数
set hive.exec.max.dynamic.partitions.pernode;
每一个执行mr节点上,允许创建的动态分区的最大数量(100)
set hive.exec.max.dynamic.partitions;
所有执行mr节点上,允许创建的所有动态分区的最大数量(1000)
set hive.exec.max.created.files;
所有的mr job允许创建的文件的最大数量(100000)

动态分区实现案列:

1,小明1,11,girl,lol-book-movie,beijing:shangxuetang-shanghai:pudong
2,小明2,12,boy,lol-book-movie,beijing:shangxuetang-shanghai:pudong
3,小明3,13,girl,lol-book-movie,beijing:shangxuetang-shanghai:pudong
4,小明4,14,boy,lol-book-movie,beijing:shangxuetang-shanghai:pudong
5,小明5,15,girl,lol-movie,beijing:shangxuetang-shanghai:pudong
6,小明6,16,boy,lol-book-movie,beijing:shangxuetang-shanghai:pudong
7,小明7,17,girl,lol-book-movie,beijing:shangxuetang-shanghai:pudong
8,小明8,18,boy,lol-book-movie,beijing:shangxuetang-shanghai:pudong

创建第一张表:

create table psn21(
id int ,
name  string,
age int,
gender string,
likes array<string>,
address map<string,string>
)
row format delimited
fields terminated by  ','
collection items terminated by '-'
map keys terminated by ':';

将数据加载到表中:

load data local inpath  '/root/data/xiaomin' into table psn21;

创建第二张表:

create table psn22(
id int,
name string,
likes array<string>,
address map<string,string>
)
partitioned by(age int,gender string)
row format delimited
fields terminated  by ','
collection items terminated by '-'
map keys terminated by ':';

将 第一张表中的数据按照分区规则加载到第二张表中:

from psn21
insert into psn22 partition(age,gender)
select id,name,likes,address,age,gender;

或者:

insert into psn22 partition(age,gender)
select id,name,likes,address,age,gender from psn21;
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值