Hive Basics

左尔@

Article link: https://blog.csdn.net/SmionFox/article/details/101306932

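Three ways to run Hive

1. Run statements straight from the shell with hive -e:
hive -e "use default; create table test_1(id int,name string) row format delimited fields terminated by ',';"
This creates a table test_1 with two columns, id and name, using ',' as the field delimiter.

2. Run a script file with hive -f:
hive -f a.hql
The script is written with vi a.hql and contains:
use default;
create table test_2(id int,name string,age int)
row format delimited
fields terminated by ',';
load data local inpath '/home/userinfo.txt' into table test_2;
select count(*) from test_2;

3. Start the hiveserver2 service and connect to it with beeline:
On hadoop1 (a virtual machine), the directory /usr/local/src/hive-1.2.1/bin contains a service script, hiveserver2.
Start it with ./hiveserver2. Once it has started, the cursor stays on the next line, because the service runs in the foreground.
On hadoop2 (another virtual machine), go to the same Hive bin directory and run beeline: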

[root@hdp01 bin]# ./beeline 
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hdp02:10000
Connecting to jdbc:hive2://hdp02:10000
Enter username for jdbc:hive2://hdp02:10000: root
Enter password for jdbc:hive2://hdp02:10000: 
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hdp02:10000>

We are now connected to the HiveServer2 service on hdp02 and can enter Hive statements at this prompt.

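Creating a table

When a table is created:
The table definition is recorded in Hive's metastore (the hive database in MySQL).
A directory with the same name as the table is created under Hive's warehouse directory on HDFS.
Any file placed in that table directory becomes the table's data.
At this point test_1 holds no data. Enter the Hive CLI and check its columns:
hive> desc test_1;
On another machine, create a file and upload it to the HDFS directory that belongs to test_1:
[root@hdp02 home]# vi test_1.txt      add some id,name rows
[root@hdp02 home]# hadoop fs -put test_1.txt /user/hive/warehouse/test_1
Back on the original virtual machine, run:
hive> select * from test_1;

Create a second file on the other virtual machine and upload it as well:
[root@hdp02 home]# vi test_1.txt.1      add some more rows
[root@hdp02 home]# hadoop fs -put test_1.txt.1 /user/hive/warehouse/test_1
Back on the original virtual machine:
hive> select * from test_1;

Result: the query now returns the rows from both files, with the rows of the second file appended after those of the first.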
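
The behavior above (a table is simply every file in its directory) can be imitated with a small Python sketch. This is a toy model on the local file system, not Hive itself: the function read_table, the sample rows, and the choice to read files in name order are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def read_table(table_dir, delimiter=","):
    """Toy stand-in for `select * from test_1`: parse every file in the directory."""
    rows = []
    # Reading files in name order is an assumption of this sketch.
    for data_file in sorted(Path(table_dir).iterdir()):
        for line in data_file.read_text().splitlines():
            if line:
                raw_id, name = line.split(delimiter)
                rows.append((int(raw_id), name))
    return rows


with tempfile.TemporaryDirectory() as warehouse:
    table_dir = Path(warehouse) / "test_1"
    table_dir.mkdir()                      # roughly what `create table` does on HDFS
    empty = read_table(table_dir)          # no files yet, so no rows

    # Hypothetical sample rows; the original post does not show the file contents.
    (table_dir / "test_1.txt").write_text("1,alice\n2,bob\n")
    after_first = read_table(table_dir)

    (table_dir / "test_1.txt.1").write_text("3,carol\n")
    after_second = read_table(table_dir)

print(empty)
print(after_first)
print(after_second)
```

Nothing is "inserted" in the database sense: adding a file to the directory is enough for its rows to show up in the next query.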

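Internal and external tables (external)

Internal table:
hive> create table t_2(id int,name string,salary bigint,add string) row format delimited fields terminated by ',';

External table:
hive> create external table t_3(id int,name string,salary bigint,add string) row format delimited fields terminated by ',' location '/aa/bb';

Differences:
The directory of an internal table is created by Hive under the default warehouse directory: /user/hive/warehouse/…
The directory of an external table is chosen by the user when the table is created: location '/path/'
Dropping an internal table deletes both the table's metadata and its data directory.
Dropping an external table deletes only the metadata; the data directory is left in place.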
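
The drop semantics can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: ToyMetastore is a hypothetical in-memory class (not a Hive API), and a temporary local directory stands in for HDFS.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Hypothetical in-memory stand-in for the Hive metastore; not a Hive API."""

    def __init__(self):
        self.tables = {}  # table name -> (data directory, is_external)

    def create_table(self, name, location, external=False):
        location.mkdir(parents=True, exist_ok=True)
        self.tables[name] = (location, external)

    def drop_table(self, name):
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            shutil.rmtree(location)  # internal table: the data directory goes too


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store = ToyMetastore()
    store.create_table("t_2", root / "user/hive/warehouse/t_2")
    store.create_table("t_3", root / "aa/bb", external=True)

    store.drop_table("t_2")
    store.drop_table("t_3")

    internal_dir_left = (root / "user/hive/warehouse/t_2").exists()
    external_dir_left = (root / "aa/bb").exists()

print(internal_dir_left, external_dir_left)
```

After both drops the metastore is empty, but only the external table's directory is still on disk.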

Partitioning (partitioned by)

hive> create table test_4(ip string,url string,staylong int) partitioned by (day string) row format delimited fields terminated by ',';
The table is partitioned by day string.
Note: the partition column day must not also appear among the table's regular columns.

Partition day=2019-05-10
Prepare the data:
[root@hdp01 home]# vi pv.data.2019-05-10
192.168.9.10,www.a.com,1000
192.168.10.10,www.b.com,100
192.168.11.10,www.c.com,900
192.168.12.10,www.d.com,100
192.168.13.10,www.e.com,2000
Load the data into its partition directory:
hive> load data local inpath '/home/pv.data.2019-05-10' into table test_4 partition(day='2019-05-10');
Browse /user/hive/warehouse/test_4 in the HDFS web UI at hdp01:50070.
It contains a directory named day=2019-05-10, which shows that the partition was created.

Partition day=2019-05-11
Prepare the data:
[root@hdp01 home]# vi pv.data.2019-05-11
192.168.9.10,www.a.com,1000
192.168.10.11,www.b.com,100
192.168.11.12,www.c.com,900
192.168.12.13,www.d.com,100
192.168.13.14,www.e.com,2000
Load the data into its partition directory:
hive> load data local inpath '/home/pv.data.2019-05-11' into table test_4 partition(day='2019-05-11');
Browse /user/hive/warehouse/test_4 in the HDFS web UI at hdp01:50070.
It now contains two directories, day=2019-05-10 and day=2019-05-11, which shows that both partitions were created.

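Queries:
hive> select * from test_4;      returns the rows of every partition
hive> select * from test_4 where day="2019-05-11";      returns only the rows in partition day=2019-05-11

Find the visitors on 2019-05-11 (one row per distinct IP):
hive> select distinct ip from test_4 where day="2019-05-11";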
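
To make the partition layout concrete, here is a minimal Python sketch that imitates it on the local file system. It is a toy model, not Hive: load_partition and scan are hypothetical helpers, and only the first two rows of each data file above are used. It shows that the day value lives in the directory name rather than in the data files, and that filtering on day means reading a single directory.

```python
import tempfile
from pathlib import Path


def load_partition(table_dir, day, lines):
    """Mimic `load data ... partition(day=...)`: one subdirectory per partition value."""
    part_dir = Path(table_dir) / f"day={day}"
    part_dir.mkdir(parents=True, exist_ok=True)
    (part_dir / f"pv.data.{day}").write_text("\n".join(lines) + "\n")


def scan(table_dir, day=None):
    """Yield (ip, url, staylong, day); with `day` set, only that directory is read."""
    pattern = f"day={day}" if day else "day=*"
    for part_dir in sorted(Path(table_dir).glob(pattern)):
        part_day = part_dir.name.split("=", 1)[1]  # the value comes from the path
        for data_file in sorted(part_dir.iterdir()):
            for line in data_file.read_text().splitlines():
                ip, url, staylong = line.split(",")
                yield ip, url, int(staylong), part_day


with tempfile.TemporaryDirectory() as warehouse:
    table_dir = Path(warehouse) / "test_4"
    load_partition(table_dir, "2019-05-10",
                   ["192.168.9.10,www.a.com,1000", "192.168.10.10,www.b.com,100"])
    load_partition(table_dir, "2019-05-11",
                   ["192.168.9.10,www.a.com,1000", "192.168.10.11,www.b.com,100"])

    all_rows = list(scan(table_dir))                 # select * from test_4
    may_11 = list(scan(table_dir, day="2019-05-11"))  # ... where day="2019-05-11"
    visitors = sorted({row[0] for row in may_11})     # select distinct ip

print(len(all_rows), visitors)
```

This also suggests why the partition column cannot be repeated among the regular columns: its value is already supplied by the directory name.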

Importing data

1. Load a file from the local disk of the machine where Hive runs:
hive> load data local inpath '/home/pv.data.2019-05-11' overwrite into table t_1;
// overwrite replaces the existing contents of t_1
2. Load a file that is already in HDFS:
hive> load data inpath '/user.data.2' into table t_1;
Without the local keyword, the file is moved from its HDFS path into the table directory.
3. Query another table and store the result in a newly created table:
hive> create table t_1_jz as select id,name from test_1;
4. Query another table and insert the result into an existing table:
hive> create table t_1_hd like test_1;
hive> insert into table t_1_hd select id,name from test_1 where name='lis';
5. Copy one partition's data into another table:
hive> create table t_4_hd like test_4;
hive> insert into table t_4_hd partition(day='2019-05-10') select ip,url,staylong from test_4 where day='2019-05-10';
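
The difference between the two load forms in items 1 and 2 (a local file is copied, an HDFS file is moved) can be sketched as follows. This is a toy model under stated assumptions: load_data is a hypothetical helper, a temporary local directory plays the role of both the local disk and HDFS, and the contents of user.data.2 are a placeholder because the original post does not show them.

```python
import shutil
import tempfile
from pathlib import Path


def load_data(src, table_dir, local, overwrite=False):
    """Toy version of `load data [local] inpath ... [overwrite] into table`."""
    table_dir = Path(table_dir)
    if overwrite:
        for old in table_dir.iterdir():
            old.unlink()                      # overwrite: clear existing data files
    if local:
        shutil.copy(src, table_dir)           # local file is copied; the source stays
    else:
        shutil.move(str(src), str(table_dir))  # "HDFS" file is moved; the source is gone


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    table_dir = root / "warehouse" / "t_1"
    table_dir.mkdir(parents=True)

    local_file = root / "pv.data.2019-05-11"
    local_file.write_text("192.168.9.10,www.a.com,1000\n")
    hdfs_file = root / "user.data.2"
    hdfs_file.write_text("placeholder contents\n")  # hypothetical contents

    load_data(local_file, table_dir, local=True, overwrite=True)
    load_data(hdfs_file, table_dir, local=False)

    local_source_kept = local_file.exists()
    hdfs_source_kept = hdfs_file.exists()
    table_files = sorted(p.name for p in table_dir.iterdir())

print(local_source_kept, hdfs_source_kept, table_files)
```

The practical consequence is that after a load without local, the file no longer exists at its original HDFS path.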

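Exporting data

Export data from a Hive table to an HDFS directory:
hive> insert overwrite directory '/aa/bb'
select * from test_1 where name='lis';

The directory /aa/bb/ is created automatically if it does not yet exist in HDFS; if it does exist, its contents are overwritten.
Export data from a Hive table to a directory on the local disk:
hive> insert overwrite local directory '/aa/bb'
select * from test_1;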

Modifying a table's partitions

Show a table's partitions: show partitions table_name;
Add partitions:
hive> alter table test_4 add partition(day='2019-05-12') partition(day='2019-05-13');
Afterwards, check the partitions of test_4:
hive> show partitions test_4;
Drop a partition:
hive> alter table test_4 drop partition (day='2019-05-13');
hive> select * from test_4;

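Modifying column definitions

Show the definition of table t_seq:
hive> desc t_seq;
Add columns:
hive> alter table t_seq add columns(address string,age int);
This appends two new columns, address and age, after the existing ones such as id.
Change an existing column's definition:
hive> alter table t_seq change id uid string;
This renames the column id to uid and sets its type to string.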
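
For a delimited text table, these alter statements change only the table definition; the data files stay as they are and are simply parsed against the new column list at query time. The sketch below illustrates that idea in plain Python. It is not Hive code: read_row is a hypothetical helper, the sample line is made up, and the assumption that t_seq starts out with the columns id int and name string is mine, since the original post does not show its definition.

```python
def read_row(line, columns, delimiter=","):
    """Parse one delimited text line against the current column list."""
    fields = line.split(delimiter)
    row = {}
    for position, (name, cast) in enumerate(columns):
        # Columns with no matching field in the file come back as None (NULL).
        row[name] = cast(fields[position]) if position < len(fields) else None
    return row


line = "7,tom"  # hypothetical row already stored in the table's data file

# Assumed starting schema of t_seq (not shown in the original post).
before = [("id", int), ("name", str)]
# alter table t_seq add columns(address string,age int);
after_add = before + [("address", str), ("age", int)]
# alter table t_seq change id uid string;
after_change = [("uid", str)] + after_add[1:]

row_before = read_row(line, before)
row_after_add = read_row(line, after_add)
row_after_change = read_row(line, after_change)

print(row_before)
print(row_after_add)
print(row_after_change)
```

The same stored line yields different rows purely because the column list changed, which is why rows written before the columns were added show NULL for address and age.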