Hive: Getting Started and Basic Operations

IT小鸟鸟

Category: hive  Tags: hive, hdfs, sql

Article link: https://blog.csdn.net/u013111855/article/details/102935973


Hive DDL operations

DDL: Data Definition Language

Database operations

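1) Create a database

	create database if not exists dbname;

2) Switch to a database

	use dbname;

3) Show the database currently in use (by calling a built-in function)

	select current_database();

4) List databases

	show databases;    -- list all databases
	show databases like "test*";   -- list databases whose names start with test

5) Show the description of a database

	desc database dbname;   -- show the database description
	desc database extended dbname;  -- show the detailed description, including properties (good to know)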

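e.g.: desc database db1906;
The output looks like this (database name, HDFS location, owner name, owner type; the comment is empty here):
OK
db1906 hdfs://myha01/user/hive/warehouse/db1906.db hadoop USER
Time taken: 0.06 seconds, Fetched: 1 row(s)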
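
To see where the location and owner fields in that output come from, here is a minimal sketch of a create database statement that sets the optional clauses explicitly. The name demo_db, the path and the property are made up for illustration.

```sql
-- demo_db, its path and its property are hypothetical names for illustration.
create database if not exists demo_db
comment "sandbox for the examples"
location "/user/hive/warehouse/demo_db.db"
with dbproperties ("creator"="hadoop");

-- The extended description additionally lists the properties.
desc database extended demo_db;
```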

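6) Delete a database
- Plain drop

		drop database if exists dbname;
		e.g.: drop database if exists test01;

Note: a plain drop can delete only an empty database; it fails if the database still contains tables.

- Cascading drop (can delete a non-empty database)

	drop database if exists dbname cascade;
	e.g.: drop database if exists test01 cascade;

- Default drop (a plain drop is the same as restrict)

	drop database if exists dbname restrict;
	e.g.: drop database if exists test01 restrict;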
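
The difference between restrict and cascade is easiest to see on a database that is not empty. A small sketch, assuming test01 does not hold anything you want to keep (t1 is a throwaway table name):

```sql
create database if not exists test01;
create table if not exists test01.t1(id int);

-- Expected to fail, because test01 still contains t1.
drop database if exists test01 restrict;

-- Drops t1 together with the database.
drop database if exists test01 cascade;
```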

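7) Modify a database
Hive supports only limited changes to an existing database: alter database can set its properties or its owner, but a database cannot be renamed.

	alter database dbname set dbproperties ("key"="value");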

Table operations

1) Create a table

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]

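Notes:

1.1) EXTERNAL: the keyword for an external table. Without it, the table is a managed (internal) table by default.
1.2) IF NOT EXISTS: prevents an error if the table already exists.
1.3) Column types: int (also tinyint, smallint, bigint), string, and so on.
1.4) COMMENT: descriptive text for a column or for the table.
1.5) PARTITIONED BY: declares the partition columns and marks the table as a partitioned table.
Note: a partition column must not appear in the regular column list; it is an entirely new column.
1.6) CLUSTERED BY: declares bucketing. It needs two key pieces of information: the bucketing column and the number of buckets.
A row is assigned to bucket number: hash(bucketing column) % number of buckets.
The number of buckets is given with: INTO num_buckets BUCKETS.

SORTED BY (col_name [ASC|DESC], ...) specifies whether and how the rows inside each bucket are sorted.
Note: the bucketing column must be one of the regular columns of the table.
1.7) ROW FORMAT: specifies how a row is parsed, and is typically used to set the column delimiter.
The most common way to get data into Hive is load, which puts a local or HDFS file into a Hive table. The row format is what maps the fields in the file to the columns of the table, which is why ROW FORMAT usually specifies the delimiter between columns.

fields terminated by ","   ----> delimiter between fields (columns)
lines terminated by "\n"    ----> delimiter between rows
collection items terminated by ","  ----> delimiter between the elements of a collection

1.8) STORED AS: specifies the file format in which the table data is stored on HDFS.
The default format is textfile (plain text).
SEQUENCEFILE: a binary format.
RCFILE: a storage format that combines row and column layout.
1.9) LOCATION: specifies the HDFS path where the table data is stored.
If given here, it overrides the default path under /user/hive/warehouse.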
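
The delimiter clauses matter most once a table has collection columns. The following is a sketch only: the table stu_scores, its columns and its path are hypothetical. Note that in a real statement the clauses follow row format delimited in the order fields, collection items, map keys, lines.

```sql
-- Hypothetical table with complex-typed columns.
create table if not exists stu_scores (
    id      int,
    name    string,
    hobbies array<string>,
    scores  map<string,int>
)
row format delimited
    fields terminated by ","
    collection items terminated by "|"
    map keys terminated by ":"
    lines terminated by "\n"
stored as textfile
location "/data/hive/stu_scores";

-- A matching line in the data file would look like:
-- 95002,liuchen,reading|music,math:90|english:85
```

Map entries are separated by the collection items delimiter, and the key and value inside each entry by the map keys delimiter.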

Examples:

 Sample row from student.txt:    95002,刘晨,女,19,IS
 
 1) Create a managed (internal) table
create table if not exists stu_managed(id int,name string,sex string,age int,dept string) row format delimited fields terminated by "," stored as textfile;

 2) Create an external table
create external table if not exists stu_external(id int,name string,sex string,age int,dept string) row format delimited fields terminated by "," stored as textfile;


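 3) Create a partitioned table
Partition column: choose a column that queries filter on frequently. In production, the date is the usual choice.
Here the partition column is dept.
PARTITIONED BY declares the partition columns and marks the table as a partitioned table.
Note: a partition column must not appear in the regular column list; it is an entirely new column.

e.g.: create table if not exists stu_ptn(id int,name string,sex string,age int) partitioned by (dept string) row format delimited fields terminated by ",";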
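
A partitioned table only pays off when data is written to and read from specific partitions. A minimal sketch, assuming a local file /home/hadoop/student_is.txt (a made-up path) that holds only the four regular columns:

```sql
-- Load a local file into the dept="IS" partition of stu_ptn.
-- The file holds only id,name,sex,age; dept comes from the partition spec.
load data local inpath "/home/hadoop/student_is.txt" into table stu_ptn partition(dept="IS");

-- Filtering on the partition column lets Hive read only that partition's directory.
select id, name, age from stu_ptn where dept = "IS";

-- List the partitions that now exist.
show partitions stu_ptn;
```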


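4) Create a bucketed table
CLUSTERED BY: declares bucketing. It needs:
    the bucketing column and the number of buckets
    the bucket of a row: hash(bucketing column) % number of buckets
    the bucket count, given as INTO num_buckets BUCKETS
    
SORTED BY (col_name [ASC|DESC], ...): specifies whether and how the rows inside each bucket are sorted.
Note: the bucketing column must be one of the regular columns of the table.

Bucketing column: age (typically a join key)    Number of buckets: 3

e.g.: create table if not exists stu_buk(id int,name string,sex string,age int,dept string) clustered by (age) sorted by (age desc) into 3 buckets row format delimited fields terminated by ",";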
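
Creating the table does not distribute any rows yet; the hashing happens when data is inserted. The sketch below assumes stu_managed already holds data and shows one way to fill stu_buk and then read a single bucket.

```sql
-- On Hive versions before 2.0 the next line is needed so that inserts honor the
-- bucket definition; later versions always enforce bucketing and dropped the setting.
-- set hive.enforce.bucketing=true;

-- load only places files in the table directory, so fill the bucketed table with insert ... select.
insert into table stu_buk select id, name, sex, age, dept from stu_managed;

-- Read only the first of the 3 buckets.
select * from stu_buk tablesample(bucket 1 out of 3 on age);
```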

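5) Copy a table
 like
 Copying with like copies only the table structure (its columns), not the data. Whether the new table is managed or external is set by the new statement and is not taken from the source table.
 
create table tbname like tbname2;
e.g.: create external table stu01 like stu_managed;


 6) Create a table with a CTAS statement
 create table tbname as select.....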
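
A concrete CTAS statement might look like the following sketch. The table name stu_age18 is hypothetical, and the source table stu_managed is assumed to exist.

```sql
-- Create and fill a new table from a query result in one statement.
create table if not exists stu_age18
row format delimited fields terminated by ","
stored as textfile
as
select id, name, dept from stu_managed where age = 18;
```

The column names and types of the new table are taken from the select list. The target of a CTAS statement cannot be an external table.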

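2) List tables

show tables;

3) Show table details

desc tbname;   -- show the columns of the table
desc extended tbname;   -- show detailed table information (good to know)
desc formatted tbname;   -- show detailed table information in a formatted layout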

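4) Alter a table

4.1 Change column information

4.1.1 Change a column's name or type

alter table tbname change old_col new_col type;

alter table stu01 change id sid int;
alter table stu01 change sid sid string;  -- int --> string is allowed
alter table stu01 change sid sid int;  -- string --> int is not allowed

A column type can only be changed from a narrower type to a wider one, not the other way round (narrow ----> wide).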

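4.1.2 Add columns

alter table tbname add columns (col type);

e.g.: alter table stu01 add columns(address string);

4.1.3 Replace columns

Replacing columns swaps the entire column list of the table for the columns given.

alter table tbname replace columns(col type);
alter table stu01 replace columns(idd int,names string);

4.1.4 Drop a column

Hive has no statement that drops a single column.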
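
A common workaround is to use replace columns and list only the columns that should remain. The sketch below uses a hypothetical table demo_cols so that it does not depend on the current state of stu01.

```sql
-- demo_cols is a hypothetical table used only for this sketch.
create table if not exists demo_cols(id int, name string, address string);

-- "Drop" address by replacing the column list with the columns to keep.
alter table demo_cols replace columns(id int, name string);

-- Check the resulting column list.
desc demo_cols;
```

This changes only the table metadata; the data files on HDFS are not rewritten.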

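4.2 Change partition information

4.2.1 Add partitions

......./stu_ptn/dept=IS
    Adding a partition amounts to giving the partition column a value; each partition is stored as a directory like the one above.
    
alter table tbname add partition(partition_col=value);
e.g.: alter table stu_ptn add partition(dept="IS");

Add several partitions at once:
alter table tbname add partition(partition_col=value) partition(partition_col=value);
e.g.: alter table stu_ptn add partition(dept="CS") partition(dept="MA");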

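4.2.2 Modify a partition

Modifying a partition mainly means changing its storage path.

1. Specify the path when adding the partition
alter table tbname add partition(partition_col=value) location "hdfs path";
e.g.: alter table stu_ptn add partition(dept="test") location "/data/hive/test";

2. Change the storage path of an existing partition
alter table tbname partition(partition_col=value) set location "hdfs path";
e.g.: alter table stu_ptn partition(dept="test") set location "/user/hive/warehouse/bd1904.db/stu_ptn";
The corresponding directory is created when data is added to the partition.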
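
To confirm which path a partition currently points to, the partition's metadata can be inspected. A minimal sketch, assuming the dept="test" partition from the examples above exists:

```sql
-- Show the metadata of one partition; the Location field is its storage path.
desc formatted stu_ptn partition(dept="test");
```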

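4.2.3 List partitions

For a table with a single partition column, the result lists all partitions.
    show partitions tbname;
    e.g.: show partitions stu_ptn;
A table with one partition column has single-level partitions.

A table with several partition columns has multi-level partitions, stored as a nested directory structure.
e.g.: select * from stu where age=18 and address="bj";   -- assuming stu is partitioned by age and address

show partitions tbname partition(top_level_col=value);  -- list all sub-partitions under one top-level partition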
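
Multi-level partitions can be tried out with a small sketch like the one below. The table stu_multi and its partition columns city and dept are hypothetical.

```sql
-- stu_multi is a hypothetical table partitioned by city, then by dept.
create table if not exists stu_multi(id int, name string, sex string, age int)
partitioned by (city string, dept string)
row format delimited fields terminated by ",";

-- Each partition maps to a nested directory such as .../stu_multi/city=bj/dept=IS
alter table stu_multi add partition(city="bj", dept="IS") partition(city="bj", dept="CS");

-- List only the sub-partitions under city="bj".
show partitions stu_multi partition(city="bj");
```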

4.2.4 Drop a partition

    This is a table-level operation and needs permission to alter the table.
    alter table tbname drop partition(partition_col=value);
    e.g.: alter table stu_ptn drop partition(dept="IS");

5) Show the full create statement of a table

show create table tbname;
e.g.: show create table stu_buk;

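6) Truncate a table

truncate table tbname;   -- remove all data from the table

Truncating is allowed only on managed (internal) tables. Truncating a managed table also deletes the files under its HDFS directory.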
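
For a partitioned managed table, truncate also accepts a partition spec, which limits the deletion to one partition. A sketch, assuming stu_ptn has a dept="IS" partition:

```sql
-- Remove the data of a single partition and leave the other partitions untouched.
truncate table stu_ptn partition(dept="IS");

-- The partition entry should still be listed afterwards; only its files are removed.
show partitions stu_ptn;
```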

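7) Drop a table

drop table if exists tbname;

Dropping a managed table deletes its data on HDFS as well; dropping an external table removes only the metadata and leaves the files in place.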

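Hive DML operations

DML: Data Manipulation Language

1) Add data to a table

1.1 The load method

Loads an existing file (local or on HDFS) into a Hive table. The file is parsed with the delimiters defined for the table.

Syntax:
    load data [local] inpath "path" into table tbname;

Notes:
   local: the data source is the local file system.
   Without the local keyword, the data comes from HDFS.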

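Examples:

1) Load data from the local file system into a Hive table

	load data local inpath "/home/hadoop/tmp_data" into table stu_managed;

Conclusion: a local load copies the data from the given path into the directory of the Hive table.
Test: a file uploaded by hand into the table's HDFS directory can be parsed by the table as well.

2) Load data from HDFS into a Hive table

    load data inpath "/mydata" into table stu_managed;

This moves the data from the given HDFS path into the table's HDFS directory, so the source path no longer holds it afterwards.
In essence, load just places data in the directory of the Hive table. As long as the data is in that directory, the table can parse it automatically.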
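
load also accepts an optional overwrite keyword, which the syntax line above leaves out. The following sketch contrasts the two forms, reusing the local path from the example:

```sql
-- Append: files already in the table directory are kept.
load data local inpath "/home/hadoop/tmp_data" into table stu_managed;

-- Replace: the existing contents of the table directory are removed first.
load data local inpath "/home/hadoop/tmp_data" overwrite into table stu_managed;
```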

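1.2 The insert method

1.2.1 Single-row insert (one row per statement)

insert into table tbname values(...);
e.g.: insert into table stu_managed values(1,"zs","f",18,"CS");

The row ends up as a file on HDFS, with fields separated by the delimiter given when the table was created. This method is slow, because each statement runs its own job and writes a new small file.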
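
If several rows have to be inserted by hand, they can be put into one values clause (supported since Hive 0.14). A sketch with made-up rows:

```sql
-- One statement, several rows: fewer jobs and fewer small files than one insert per row.
insert into table stu_managed values
    (2, "ls", "m", 19, "IS"),
    (3, "ww", "f", 20, "MA");
```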

1.2.2 Single insert (insert the result of one query)

This inserts the (multi-row) result of one SQL query into a table in one go.

insert into table tbname select ....
e.g.: insert into table stu_external select * from stu_managed where age=18;
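
insert into always appends. When the target should contain only the query result, insert overwrite can be used instead, as in this sketch:

```sql
-- insert into appends to the existing data.
insert into table stu_external select * from stu_managed where age = 18;

-- insert overwrite replaces the existing data of the table with the query result.
insert overwrite table stu_external select * from stu_managed where age = 18;
```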

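1.2.3 Multi-insert

The source table is scanned once, and several query results are inserted into several tables, or into several partitions of one table.

 Syntax:
	  from tbname 
	  insert into table tb1 select ... 
	  insert into table tb2 select .....
	  
e.g.: insert into several tables:
Scan stu_managed; rows with age=18 go to tb1, rows with age=19 go to tb2.
The SQL statement:
from stu_managed 
insert into table tb1 select * where age=18 
insert into table tb2 select * where age=19;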
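
The statement above assumes that tb1 and tb2 already exist with the same columns as stu_managed. One way to prepare them is sketched here:

```sql
-- tb1 and tb2 must exist before the multi-insert runs; like gives them the same columns as the source.
create table if not exists tb1 like stu_managed;
create table if not exists tb2 like stu_managed;
```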


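e.g.: insert into several partitions:
Note: the data of a partitioned table is stored in two parts, so statements that write to a partitioned table normally name the target partition explicitly.
Partition column: stored as directory names under /user/hive/warehouse/test.db/stu_ptn
Regular columns: stored in the table's data files
The SQL statement:
from stu_managed  
insert into table stu_ptn partition(dept="IS") select id,name,sex,age where dept="IS" 
insert into table stu_ptn partition(dept="MA") select id,name,sex,age where dept="MA";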
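
Listing every partition by hand does not scale when there are many distinct values. As an alternative, Hive can derive the partition from the data (dynamic partitioning). This is a sketch of that technique, not part of the original example:

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column comes last in the select list;
-- Hive creates one partition per distinct dept value.
insert into table stu_ptn partition(dept)
select id, name, sex, age, dept from stu_managed;
```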
