Basic Hive Commands

wlk_328909605

Article link: https://blog.csdn.net/wlk_328909605/article/details/82153176

If Hive is not installed yet, see the previous article.
1. Starting Hive
Start HiveServer2 in the foreground:
[root@hadoop-slave02 bin]# hiveserver2
Start HiveServer2 in the background, then connect to it with Beeline:

[root@hadoop-slave02 bin]# nohup ./hiveserver2 &
[root@hadoop-slave02 bin]# beeline
beeline>  !connect jdbc:hive2://hadoop-slave02:10000

Or connect directly when launching Beeline:

[root@hadoop-slave02 bin]# beeline -u jdbc:hive2://hadoop-slave02:10000 -n root

2. Creating a database in Hive:

create database db_user;

Once the database is created, a new directory named db_user.db appears under /user/hive/warehouse/ on HDFS (the default warehouse location).
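
To confirm where the database landed, a quick check from Beeline might look like the sketch below. It assumes the db_user database created above and the default warehouse directory; the reported location will differ if hive.metastore.warehouse.dir has been changed.

```sql
-- List all databases and confirm that db_user exists
show databases;
-- Print the database's HDFS location
-- (under /user/hive/warehouse/ when the default warehouse directory is used)
describe database db_user;
-- Switch to the new database so that later tables are created inside it
use db_user;
```
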
3. Basic table creation statement:
With the statement below, Hive treats "," as the field delimiter in the table's data files.

create table t_user(id string,name string) 
row format delimited fields terminated by ',';

4. Creating an external table; with an external table, the user can specify the table's storage path

create external table t_user(id string,name string)
row format delimited fields terminated by ','
location '/user/table';

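5. Dropping a table
Before dropping a table, check whether it is an internal (managed) table or an external table.
Dropping an internal table deletes both the table's data and its metadata. Dropping an external table deletes only the metadata and leaves the data in place. Most tables in production environments are external tables.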

drop table t_user;
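
Because the effect of a drop depends on the table type, it can help to inspect the table first. The following is a minimal sketch, assuming the t_user table from the earlier sections; the exact layout of the describe output varies between Hive versions.

```sql
-- Inspect the table: the "Table Type" row shows MANAGED_TABLE or EXTERNAL_TABLE,
-- and the "Location" row shows where the data files live
describe formatted t_user;
-- Drop the table without raising an error if it does not exist
drop table if exists t_user;
```

For an external table, the files under the reported location remain on HDFS after the drop and have to be removed separately if they are no longer needed.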

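6. Creating partitioned tables
In essence, a partitioned table creates partition subdirectories for the data files inside the table directory, so that at query time the MapReduce job can process only the data in the relevant partition subdirectories and read less data.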

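For example, a website generates browsing records every day, and these records belong in a single table. Sometimes, however, we only need to analyze the records of one particular day.
In that case the table can be created as a partitioned table, with each day's data loaded into its own partition.
Each daily partition directory needs a name, which is derived from the partition column and its value.
6.1 Single partition column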

create table t_user(id string,name string,age string)
partitioned by(dt string)
row format delimited fields terminated by ',';

Load data into the partitioned table:

load data local inpath '/root/access.log.2017-08-04.log' into table t_user partition(dt='15');
load data local inpath '/root/access.log.2017-08-05.log' into table t_user partition(dt='20');

A query can then be restricted to the partition you need:

 select * from t_user where dt = '15';
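
To see which partitions a table actually has, the sketch below may be useful. It assumes the partitioned t_user table and the two load statements above have run successfully.

```sql
-- List the existing partitions; after the two loads above this should show dt=15 and dt=20
show partitions t_user;
-- Filtering on the partition column lets Hive read only the dt=20 subdirectory
select count(*) from t_user where dt = '20';
```

The partition column dt is not stored in the data files; its value comes from the subdirectory name, yet it can be used in queries like an ordinary column.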

6.2 Example with multiple partition columns

create table t_partition(id int,name string,age int)
partitioned by(department string,sex string,howold int)
row format delimited fields terminated by ',';

Load data:

load data local inpath '/root/p1.dat' into table t_partition partition(department='xiangsheng',sex='male',howold=20);
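
With several partition columns, each column adds one directory level under the table directory. A rough sketch of how the loaded partition could be inspected and queried, assuming the t_partition table and the load statement above:

```sql
-- List only the partitions under one department (a partial partition spec)
show partitions t_partition partition(department='xiangsheng');
-- Filter on all three partition columns so that only the matching subdirectory
-- department=xiangsheng/sex=male/howold=20 is read
select id, name, age
from t_partition
where department = 'xiangsheng' and sex = 'male' and howold = 20;
```
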

7. Creating tables from existing tables (LIKE and CTAS)
7.1 Creating a table from the schema of an existing table

create table t_user_2 like t_user;

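The new table has exactly the same structure as the original table but contains no data.
7.2 Creating a table and populating it at the same time (CTAS)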

create table t_access_user as select ip,uri from t_access;
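
A CTAS statement takes its column names and types from the select list, and storage options can be set for the new table. The sketch below is one possible variation; the table name t_access_user_orc is hypothetical, and the t_access table with columns ip and uri is assumed from the statement above.

```sql
-- CTAS with an explicit storage format; t_access_user_orc is a hypothetical name
create table t_access_user_orc
stored as orc
as select ip, uri from t_access;
-- Check the columns that the new table inherited from the select list
describe t_access_user_orc;
```
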

8. Exporting data
8.1 Exporting data from a Hive table to files on HDFS

insert overwrite directory '/root/access-data'
row format delimited fields terminated by ','
select * from t_access;

8.2 Exporting data from a Hive table to files on the local disk

insert overwrite local directory '/root/access-data'
row format delimited fields terminated by ','
select * from t_access limit 100000;

9. Basic query syntax
9.1 Basic query example

select * from t_user;

9.2 Conditional query example

select * from t_user where id = 1;

9.3 Join query examples
9.3.1 Inner join

select * from  t_a a join t_b b on a.name = b.name;

9.3.2 Left outer join

select * from t_a a
left outer join t_b b
on a.name = b.name;

9.3.3 Right outer join

select * from t_a a 
right outer join t_b b
on a.name = b.name;

9.3.4 Full outer join

select * from t_a a
full join t_b b
on a.name = b.name;

9.3.5 Left semi join

select a.* from t_a a
left semi join t_b b
on a.name = b.name;
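
A left semi join returns each row of the left table that has at least one match in the right table, and only columns of the left table may appear in the select list. As a rough illustration, the same result can usually be written with an IN subquery, assuming the t_a and t_b tables used above:

```sql
-- Keep the rows of t_a whose name also occurs in t_b;
-- each row of t_a appears at most once, however many matches t_b contains
select a.*
from t_a a
where a.name in (select b.name from t_b b);
```
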

9.5 Grouping and aggregation

select dt,count(*) as cnt,max(ip) as max_ip from t_access group by dt having dt > '20180829';
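
The query above filters on the grouping column dt inside HAVING. A common alternative, sketched here, filters rows with WHERE before grouping and keeps HAVING for conditions on aggregates. The threshold of 100 is an arbitrary value chosen for illustration, and t_access is assumed to be partitioned or at least to contain a dt column.

```sql
-- WHERE removes rows before grouping; HAVING then filters the groups by their aggregate
select dt, count(*) as cnt
from t_access
where dt > '20180829'
group by dt
having count(*) > 100;
```
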
