Basic Hive Operations: Databases and Tables

菜鸟进阶站

Published 2023-03-30 16:42:48

Tags: hive, database, java

Article link: https://blog.csdn.net/Hao1999_/article/details/129860959

Chapter 3. Basic Hive Operations: Databases and Tables

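3.1 Syntax rules

Case rules:

1. Database names and table names in Hive are case-insensitive.
2. Writing keywords in uppercase is recommended.

Naming rules:

1. Names must not begin with a digit.
2. Names must not be keywords.
3. Avoid special characters where possible.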
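
The case rules above can be checked with a short experiment. The following is a minimal sketch; the names Demo_Case and T_Sample are hypothetical and exist only for this illustration. It also shows backtick quoting, the usual workaround if a reserved word such as date cannot be avoided as a column name, although choosing a different name is the safer option.

```sql
-- Hypothetical database and table names, used only for illustration.
CREATE DATABASE IF NOT EXISTS Demo_Case;
USE demo_case;                          -- different case, same database

-- `date` is a reserved word, so it is quoted with backticks here.
CREATE TABLE IF NOT EXISTS T_Sample (Id INT, `date` STRING);

DESCRIBE t_sample;                      -- different case, same table
SELECT ID, `date` FROM T_SAMPLE;        -- column names are case-insensitive as well
```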

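3.2 Database operation syntax

3.2.1 Creating a database

Creating a database essentially creates a new directory under the directory specified by the Hive parameter ${hive.metastore.warehouse.dir}. The directory is named <database name>.db.

Note: creating a database or a table not only creates a directory but also adds metadata (descriptive information) to the metastore database, which is MySQL in this setup.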

hive> create database wxb;
hive> create database if not exists wxb;
hive> create database if not exists wxb comment 'this is a database of mashangxing';
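
To see where the new database directory ended up, the statements below can be run from the Hive CLI. This is a sketch, assuming the warehouse directory is /user/hive/warehouse (the path this chapter uses later); the actual value depends on your configuration.

```sql
-- Shows the database comment and its location in HDFS.
DESCRIBE DATABASE wxb;

-- Print the warehouse directory configured for this session.
SET hive.metastore.warehouse.dir;

-- List the warehouse directory from inside the Hive CLI.
-- /user/hive/warehouse is an assumed value; substitute the one printed above.
dfs -ls /user/hive/warehouse;
```

If the assumption holds, the listing should contain a directory named wxb.db.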

Log in to MySQL: mysql -u root -p

show databases;

use hive;

show tables;

select * from DBS;   -- databases

select * from TBLS;  -- tables (metadata)

Hive has a default database named default. If you do not explicitly state which database to use, the default database is used.
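
A quick way to confirm which database a session is using is sketched below. It assumes a Hive version that provides current_database() (0.13 or later) and that the wxb database created earlier exists.

```sql
-- In a fresh session this is expected to return: default
SELECT current_database();

-- Optional: show the current database in the Hive CLI prompt.
SET hive.cli.print.current.db=true;

USE wxb;
SELECT current_database();   -- now expected to return: wxb
```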

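3.2.2 Listing all databases (this simply reads the corresponding metadata from the metastore database)

Syntax: show databases;

3.2.3 Switching databases

Syntax: use databaseName;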
use wxb;

3.2.4 Viewing database information

Syntax 1: desc database databaseName;
Syntax 2: desc database extended databaseName;
Syntax 3: describe database extended databaseName;

3.2.5 Dropping a database

Syntax 1: drop database databasename;            -- only works on an empty database
Syntax 2: drop database databasename cascade;    -- add cascade to force-drop a database that is not empty
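
The difference between the two drop forms is easiest to see on a throwaway database. The following is a minimal sketch; demo_db and t_tmp are hypothetical names introduced only for this example.

```sql
-- Hypothetical scratch database, created only to be dropped again.
CREATE DATABASE IF NOT EXISTS demo_db COMMENT 'scratch database for testing';
USE demo_db;
CREATE TABLE IF NOT EXISTS t_tmp (id INT);

-- Switch away before dropping the database.
USE default;

-- A plain "DROP DATABASE demo_db;" is expected to fail here,
-- because the database still contains the table t_tmp.

-- CASCADE drops the contained tables first, then the database.
DROP DATABASE IF EXISTS demo_db CASCADE;
```

Adding IF EXISTS keeps the statement from failing when the database has already been removed, which is convenient in scripts that may be rerun.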

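3.3 Table operation syntax

3.3.1 Data types

Hive data types are divided into primitive types and complex types. The table below lists both; complex types are covered in detail later.

| Category | Type | Description | Literal example |
| --- | --- | --- | --- |
| Primitive | BOOLEAN | true/false | TRUE |
| | TINYINT (Java byte) | 1-byte signed integer, -128 to 127 | 1Y |
| | SMALLINT (Java short) | 2-byte signed integer, -32768 to 32767 | 1S |
| | INT (Java int) | 4-byte signed integer | 1 |
| | BIGINT (Java long) | 8-byte signed integer | 1L |
| | FLOAT | 4-byte single-precision floating point | 1.0 |
| | DOUBLE | 8-byte double-precision floating point | 1.0 |
| | DECIMAL | arbitrary-precision signed decimal | 1.0 |
| | STRING | variable-length string | "a", 'b' |
| | VARCHAR | variable-length string with a declared maximum length | "a", 'b' |
| | CHAR | fixed-length string | "a", 'b' |
| | BINARY | byte array | cannot be written as a literal |
| | TIMESTAMP | timestamp with nanosecond precision | 122327493795 |
| | DATE | date | '2016-03-29' |
| Complex | ARRAY | ordered collection of elements of the same type | array(1,2) |
| | MAP | key-value pairs; keys must be primitive types, values can be any type | map('a',1,'b',2) |
| | STRUCT | collection of fields, which may have different types | struct('1',1,1.0) |
| | UNION | a value that holds exactly one of a fixed set of types (declared as UNIONTYPE) | create_union(1,'a',63) |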
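
The literal forms in the table can be tried directly in a query. The sketch below assumes a Hive version that allows SELECT without a FROM clause (0.13 or later); the column aliases are made up for readability.

```sql
-- Each expression uses a literal or constructor from the table above.
SELECT
  TRUE                        AS bool_col,
  1Y                          AS tinyint_col,
  1S                          AS smallint_col,
  1                           AS int_col,
  1L                          AS bigint_col,
  CAST('2016-03-29' AS DATE)  AS date_col,
  array(1, 2)                 AS array_col,
  map('a', 1, 'b', 2)         AS map_col,
  struct('1', 1, 1.0)         AS struct_col;
```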

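3.3.2 Creating a table

Creating a table essentially creates a subdirectory, named after the table, under the directory of the corresponding database. The data files are stored in this directory.

Syntax 1:
	create table t_user(id int,name string);

Syntax 2: qualify the table name as database.table
	create table mydb.t_user(id int,name string);

Syntax 3: specify the delimiters
create table if not exists t1(
uname string comment 'this is name',
chinese int,
math int,
english int
)
comment 'this is my table'
row format delimited
fields terminated by '\t'  -- delimiter between fields
lines terminated by '\n'   -- delimiter between rows
stored as textfile;   -- file format used to store the data on HDFS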

create table if not exists emp(
eno int,
ename string,
job string, 
mgr int,
hiredate int,
salary int,
comm int,
deptno int
)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;

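3.3.3 Listing all tables in the current database

Syntax: show tables;
-- list the tables in another database
show tables in zoo;

3.3.4 Viewing the table structure

desc tableName;
desc extended tableName;  -- view extended information about the table structure
describe extended tableName;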

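3.3.5 Modifying the table structure

- Rename a table
	alter table oldTableName rename to newTableName;
	alter table t1 rename to t1_1;

- Rename a column: change column (the same syntax also changes a column's type)
	alter table tableName change column oldName newName colType;
    alter table t1_1 change column math shuxue string;


- Change a column's position: note that since version 2.x, a column can only be moved past columns of the same type.
	alter table tableName change column colName colName colType after colName1;
    alter table t1 change column english english int after chinese;

	alter table tableName change column colName colName colType first;
	-- moves the column to the first position, provided the column being moved and the current first column have the same data type
	alter table t2 change column math math int first;


Note: this operation only changes the table structure (the metadata); it does not reorder the data that is already stored.
	For example:  name      Chinese  math
			Xiaoming  98       89
	If math is moved in front of Chinese, the table reads as follows:
			name      math     Chinese
			Xiaoming  98       89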
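
The metadata-only behavior described above can be reproduced with a small experiment. The following is a minimal sketch, assuming a text-format table and a Hive version that supports INSERT ... VALUES (0.14 or later); the table score_demo and its sample row are hypothetical.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS score_demo (
  name    STRING,
  chinese INT,
  math    INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- The underlying file now holds one line: Xiaoming,98,89
INSERT INTO TABLE score_demo VALUES ('Xiaoming', 98, 89);

-- Move math in front of chinese; both columns are INT, so the types match.
ALTER TABLE score_demo CHANGE COLUMN math math INT AFTER name;

-- Only the metadata changed. The text file is still read by position,
-- so 98 is expected to appear under math and 89 under chinese.
SELECT name, math, chinese FROM score_demo;
```

Because the stored values end up under the wrong column names, reordering columns on a table that already holds data usually needs to be followed by rewriting or reloading the data.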

- Add columns: add columns
	alter table tableName add columns (sex int,...);
	alter table t1 add columns(sex string,id int);

- Delete columns: replace columns (note that since version 2.x you must watch the column types, because a replace operation effectively moves column positions)
	alter table tableName replace columns(
    id int,
    name int,
    size int,
    pic string
    );
	Note: in effect, only the columns listed inside the parentheses are kept.

	alter table t1 replace columns(
	sex string,
	id int
    );

3.3.6 Dropping a table

drop table tableName;

3.4 Data import

[root@hadoop02 ~]# mkdir /hivedata
[root@hadoop02 ~]# cd /hivedata
[root@hadoop02 hivedata]# vim user.txt
-- add the following data
1,廉德枫
2,刘浩
3,王鑫
4,司翔


create table t_user2(
id int,
name string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;

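There are generally two ways to load data into Hive:

- from the local Linux file system
- from HDFS

**Method 1:** use hdfs dfs -put to upload a local file into the table's directory (the path below assumes t_user2 was created in the default database)

[root@hadoop02 hivedata]# hdfs dfs -put ./user.txt /user/hive/warehouse/t_user2/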

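**Method 2:** use the load command in Hive

load data [local] inpath 'file path' [overwrite] into table tableName;
-- local: if present, the file is loaded from the local file system into HDFS (a copy); if omitted, the file is loaded from HDFS (a move)
-- overwrite: if present, existing data is overwritten; if omitted, the data is appended
When loading data:
1. Prefer an absolute path, written from the root.
2. A relative path also works, but it is resolved against the directory from which you started Hive.
3. In Hive, ~ is treated as part of a relative path.
4. When logging in remotely with beeline (client and server on different machines), the statement
	load data local inpath 'file path' [overwrite] into table tableName;
	hides a major pitfall: local refers to the file system of the server.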
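
Applied to the t_user2 table and the /hivedata/user.txt file created earlier, the load command might look like the sketch below. The HDFS path /tmp/user.txt is hypothetical and assumes the file was uploaded there beforehand.

```sql
-- Copy the local file into the table directory and append its rows.
LOAD DATA LOCAL INPATH '/hivedata/user.txt' INTO TABLE t_user2;

-- Same source, but replace whatever the table currently holds.
LOAD DATA LOCAL INPATH '/hivedata/user.txt' OVERWRITE INTO TABLE t_user2;

-- Load from HDFS: the file is moved, so it no longer exists at
-- /tmp/user.txt afterwards (hypothetical path).
LOAD DATA INPATH '/tmp/user.txt' INTO TABLE t_user2;

-- Check the result.
SELECT * FROM t_user2;
```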

**Method 3:** load data dynamically from another table (which can also serve as a backup table)

insert into table tableName2 select [.....] from tableName1;


#######   Note: if you run into the following output:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20221106234007_95faacb8-188f-4886-9b7b-3700535dd853
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Solution:
run the following in Hive, which lets Hive run small jobs in local mode
set hive.exec.mode.local.auto=true;  -- default is false
#######



Extended content: syntax for inserting data into multiple tables
    from tableName1
    insert into tableName2 select * [where condition]
    insert into tableName3 select * [where condition]
    .....
    
    


create table u2(
id int,
name string
)
row format delimited
fields terminated by ',';

insert into table u2 select id,name from u1;


create table u3(
id int,
name string
)
row format delimited
fields terminated by ',';
create table u4(
id int,
name string
)
row format delimited
fields terminated by ',';


from u2
insert into u3 select *
insert into u4 select id,name;



create table u8(
id int,
name string
)
row format delimited
fields terminated by ',';

Note: the number of columns in tableName2 must match the number of columns returned by the query on tableName1.
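
The multi-table insert becomes more useful once each branch filters on its own condition. The following is a sketch that reuses the u2, u3 and u4 tables created above; the split at id 2 is an arbitrary choice for illustration.

```sql
-- u2 is scanned once, and each branch keeps only the rows that match its WHERE clause.
FROM u2
INSERT INTO TABLE u3
  SELECT id, name WHERE id <= 2
INSERT OVERWRITE TABLE u4
  SELECT id, name WHERE id > 2;
```

Each branch can independently choose INTO (append) or OVERWRITE (replace), and both select lists return two columns to match the two-column target tables.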

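**Method 4:** clone table data

- create table if not exists tableName2 as select [....] from tableName1;
- create table if not exists tableName2 like tableName1 location 'path of the storage directory of tableName1';     -- the new table does not get its own table directory, because it uses the other table's path

Extended content: copy only the table structure
create table if not exists tableName2 like tableName1;

The essence of loading data:

If the data is on the local file system, loading it essentially copies it into the table directory on HDFS.
If the data is already on HDFS, loading it essentially moves it into the table directory on HDFS.

Note: Hive uses strict schema on read. Data is not validated when it is loaded; if a value turns out not to match the schema when it is read, NULL is returned in its place.
MySQL, by contrast, uses schema on write: data is checked at the time it is written.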
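
Schema on read can be observed by loading a file that contains a value of the wrong type. The sketch below reuses the t_user2 table from earlier and assumes a hypothetical local file /hivedata/bad.txt with the two lines shown in the comment.

```sql
-- Assumed contents of the hypothetical file /hivedata/bad.txt:
--   5,Alice
--   abc,Bob

-- The load succeeds: Hive does not validate the file against the schema.
LOAD DATA LOCAL INPATH '/hivedata/bad.txt' INTO TABLE t_user2;

-- The mismatch only surfaces when the data is read.
-- The row for Bob is expected to show id as NULL, because 'abc' cannot be read as INT.
SELECT id, name FROM t_user2 WHERE name IN ('Alice', 'Bob');
```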

3.5 Worked example

CREATE TABLE flow(
id             string COMMENT 'this is id column',
phonenumber     string,
mac             string,
ip               string,
url              string,
urltype          string,
uppacket		 int,
downpacket       int,
upflow            int,
downflow         int,
issuccess    int
)
COMMENT 'this is log table'
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
stored as textfile;

Load the data:
load data local inpath './data/HTTP_20130313143750.dat' into table flow;

1. Compute the total traffic (in MB) for each phone number
select l.phonenumber,
round(sum(l.upflow + l.downflow) / 1024.0,2) as total
from flow l
group by l.phonenumber
;

2. Find the top 3 URLs by number of visits:
select l.url url,
count(l.url) as urlcount
from flow l
group by l.url
order by urlcount desc
limit 3
;

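3.6 Data export

3.6.1 Categories of Hive data export

1. Export from a Hive table to the local file system (a directory or a file)
2. Export from a Hive table to HDFS
3. Export from a Hive table to another Hive table (that is, the table copying shown above)

3.6.2 Exporting to a directory

--1. Export data to a directory on the local file system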
insert overwrite local directory '/root/out/00'
select * from student; 


#######   Note: if you run into the following output:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20221106234007_95faacb8-188f-4886-9b7b-3700535dd853
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Solution:
run the following in Hive, which lets Hive run small jobs in local mode
set hive.exec.mode.local.auto=true;  -- default is false
#######


--2. Export data to a directory on HDFS
insert overwrite directory '/root/out/01'
select * from student;

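-- By default, fields in the exported files are separated by the non-printing character ^A (\001), so the columns appear to run together.

--3. Change the column delimiter of the exported files: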
insert overwrite local directory '/root/out/01'
row format delimited fields terminated by ','
select * from student;

3.6.3 Exporting directly to a file on the local file system:

[root@hadoop01 ~]# hive -e 'select * from exercise.student' >> /root/out/02;
-- the default field delimiter in the exported file is \t
