Learning Hive, Part 2

Topersuit

Article link: https://blog.csdn.net/qq_25100219/article/details/100761430

The previous post covered Hive's architecture, use cases, and table types. This post focuses on how to use Hive in practice.

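I. Accessing Hive

1. beeline

beeline -u jdbc:hive2://hadoop02:10000, or just beeline (details vary slightly between vendor distributions)

Note: HiveServer2 is now used almost everywhere; it handles highly concurrent clients better than the original HiveServer (HiveServer1).

2. API

Hive can be accessed from Java through JDBC.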
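
As a rough illustration of the JDBC route, here is a minimal sketch. It assumes the `hive-jdbc` driver jar is on the classpath, reuses the `hadoop02:10000` address from the beeline example, and assumes an unsecured cluster that accepts an empty user name and password. The class name `HiveJdbcSketch` and the helper `buildUrl` are made up for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcSketch {
    // Builds a HiveServer2 JDBC URL such as jdbc:hive2://hadoop02:10000/info.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = buildUrl("hadoop02", 10000, "default");
        // Empty user and password are an assumption for an unsecured test cluster.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show tables")) {
            // Print one table name per row of the result.
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Raised when the driver is missing or the server is unreachable.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

Without the driver jar, or without a reachable HiveServer2, the program only prints the error message; the connection details need to be adapted to the actual cluster.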

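II. Working with Hive

1. List tables

show tables;

2. List databases

show databases;

3. Create and use a database

create database info;

use info;

4. View table details

desc table_name;

5. Drop a table

drop table table_name;

6. Empty a table

truncate table table_name;

7. Create tables

Regular (managed) table: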

create table hello(name string, age int)
row format delimited fields terminated by ','       -- column delimiter
lines terminated by '\n'          -- row delimiter; Hive only supports '\n' here
stored as textfile;        -- textfile is also the default when no storage format is given

Note: By default, Hive table data is stored under /user/hive/warehouse on HDFS, while the metadata is kept in a relational database such as MySQL or GaussDB.

External table:

create external table hello(name string, age int)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/tmp/test/';             -- HDFS directory for the data; defaults to a path under /user/hive/warehouse if omitted

Partitioned table:

create external table hello(id int, name string, address string, age int)
partitioned by (months string)                         -- one partition per month
row format delimited fields terminated by ','
stored as textfile
location '/tmp/test/';

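Loading data into a partitioned table:

load data local inpath '/opt/demo.csv' [overwrite] into table hello partition(months='2019-01');

Note: overwrite is optional. With it, the existing data in the partition is replaced; without it, the new data is appended.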

Add a partition:

alter table hello add if not exists partition(months='2019-02');

Add several partitions at once:

alter table hello add if not exists partition(months='2012-1') partition(months='2012-6') partition(months='2012-12');

Drop a partition:

alter table hello drop if exists partition(months='2019-01');

List partitions:

show partitions hello;
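
It can help to picture where partition data lives. By convention, each partition is a subdirectory named `column=value` under the table location. The sketch below only builds that path as a string and does not touch HDFS; the class name `PartitionPathSketch` and the helper `partitionPath` are hypothetical, and the escaping Hive applies to special characters in partition values is ignored.

```java
public class PartitionPathSketch {
    // Joins a table location and one partition key/value into the
    // directory name Hive conventionally uses for that partition.
    static String partitionPath(String tableLocation, String column, String value) {
        String base = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
        return base + column + "=" + value;
    }

    public static void main(String[] args) {
        // Paths for the two partitions used in the examples above.
        System.out.println(partitionPath("/tmp/test/", "months", "2019-01"));
        System.out.println(partitionPath("/tmp/test/", "months", "2019-02"));
    }
}
```

This layout is why a query that filters on months only needs to read the matching directories.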

Bucketed table: (buckets subdivide a table or a partition at a finer granularity, which can speed up queries)

create table hello(
  id int,
  name string
)
clustered by(id) sorted by(name) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;

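Note: before using a bucketed table, set the parameter below in beeline to turn on bucketing enforcement. It only applies to the current session. In Hive 2.x and later, bucketing is always enforced and this setting is no longer needed.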

set hive.enforce.bucketing = true;
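
To give a feel for how rows end up in the 4 buckets, here is a small sketch of the assignment rule. It assumes the classic scheme in which an int column hashes to its own value and the bucket is that hash, made non-negative, modulo the bucket count. Newer Hive versions can use a different hash function, so the actual file assignment may differ. The class name `BucketSketch` and the helper `bucketFor` are hypothetical.

```java
public class BucketSketch {
    // Classic rule (an assumption, see above): clear the sign bit of the
    // hash, then take the remainder modulo the number of buckets.
    static int bucketFor(int id, int numBuckets) {
        return (id & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Show where ids 1..8 would land with "into 4 buckets".
        for (int id = 1; id <= 8; id++) {
            System.out.println("id=" + id + " -> bucket " + bucketFor(id, 4));
        }
    }
}
```

Because rows with the same id always map to the same bucket, joins and sampling on id can work with individual buckets instead of the whole table.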

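8. Insert data

insert into table hello values(1,'zhangsan','xian',34);    -- single row; the values must match the table's columns
insert into table hello values(1,'name1','company1'),(2,'name2','company2');    -- several rows in one statement
insert into table test select * from hello;    -- copy rows from another table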

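9. Delete data

Hive does not support delete on ordinary (non-transactional) tables. Row-level delete is only available on transactional (ACID) tables, from Hive 0.14 onward. For ordinary tables, remove data by dropping partitions, truncating the table, or rewriting it with insert overwrite.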

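10. Queries

select * from hello;    -- full table scan
select * from hello a;    -- a is a table alias (not a column alias); useful for qualifying columns such as a.id in longer queries
select count(*)  from hello;    -- count rows; count(1) behaves similarly
select * from hello limit 10;     -- return at most 10 rows
select distinct(id) from hello;    -- remove duplicate ids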

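Most other Hive SQL usage is similar to that of a traditional relational database, so it is not covered here; consult the documentation as needed.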