HiveQL: Data Definition

Latest recommended article published on 2022-03-26 20:26:02

legotime

Article link: https://blog.csdn.net/legotime/article/details/51263479

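Hive is a data warehouse infrastructure tool built on Hadoop for processing structured data. It provides a simple SQL-like query facility and translates SQL statements into MapReduce jobs for execution.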

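Hadoop is an open-source framework for large-scale distributed processing. Its two main modules are MapReduce and HDFS.
-----MapReduce: a parallel programming model for processing large volumes of structured, semi-structured, and unstructured data on large clusters of commodity hardware.

-----HDFS: the Hadoop Distributed File System, the part of the Hadoop framework used to store and process data sets. It provides a fault-tolerant file system that runs on commodity hardware.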

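Hive is not
a relational database
a system designed for online transaction processing (OLTP)
a language for real-time queries and row-level updates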

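Hive features
It stores table schemas in a database and the processed data in HDFS.
It is designed for online analytical processing (OLAP).
It provides a SQL-like query language called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Hive does not support (for ordinary, non-transactional tables)
row-level insert, update, and delete operations
transactions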

Online transaction processing: OLTP (On-Line Transaction Processing). Online analytical processing: OLAP (On-Line Analytical Processing).

HiveQL: Data Definition
In Hive, a database is essentially just a catalog or namespace of tables.

1. List the existing databases: show databases;

---------------------------------------------
hive> show databases;
OK
default
Time taken: 4.324 seconds, Fetched: 1 row(s)
---------------------------------------------

OK and Time taken: 4.324 seconds, Fetched: 1 row(s) are execution messages returned by the system.
default is the default database.

2. Create a database named sample

create database sample;

note: if a database named sample already exists, creating another one with the same name reports that it already exists:

create database sample;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database sample already exists

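You can also use if not exists so that the database is created only when it does not already exist. The keyword is exists, not exist; the misspelling produces a parse error: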

hive> create database if not exist sample;
FAILED: ParseException line 1:23 missing KW_EXISTS at 'exist' near '<EOF>'
line 1:29 extraneous input 'sample' expecting EOF near '<EOF>'
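
For comparison with the failing statement above, here is the correctly spelled form. It is a one-line sketch that assumes the same sample database name used throughout this walkthrough.

```sql
-- EXISTS (with the trailing S) is the keyword the parser expects.
-- If sample already exists, the statement does nothing instead of failing.
CREATE DATABASE IF NOT EXISTS sample;
```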

3. Use a regular expression to filter database names

hive> show databases like 's.*' ;
OK
sample
Time taken: 0.101 seconds, Fetched: 1 row(s)

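Hive creates a directory for each database, and the tables in a database are stored as subdirectories of that directory. The default database is the exception: it has no directory of its own, and its tables are stored directly under the warehouse directory.
For example, our Hive warehouse directory is /user/hive/warehouse: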

<property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
</property>

This /user/hive/warehouse corresponds to hdfs://Master:9000/user/hive/warehouse/ on our HDFS.

4. Show where a database is stored: describe database sample

hive> describe database sample;
OK
sample		hdfs://Master:9000/user/hive/warehouse/sample.db	root	USER	
Time taken: 0.696 seconds, Fetched: 1 row(s)

The URI scheme here is hdfs. On MapR it would be maprfs.
In local mode, the path would look like file:///user/hive/warehouse/sample.db
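
A database can also be given a comment, an explicit location, and properties when it is created. The following is a minimal sketch. The database name sample_docs, the path /user/hive/custom/sample_docs.db, and the property values are hypothetical and are not part of the original walkthrough.

```sql
-- Hypothetical database name, path, and property values, for illustration only.
CREATE DATABASE IF NOT EXISTS sample_docs
COMMENT 'scratch database for DDL experiments'
LOCATION '/user/hive/custom/sample_docs.db'
WITH DBPROPERTIES ('creator' = 'me');

-- EXTENDED additionally prints the database properties.
DESCRIBE DATABASE EXTENDED sample_docs;
```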

5. Switch to a database: use sample

hive> use sample;
OK
Time taken: 0.15 seconds

You can now work inside the sample database.

6. Drop a database

 drop database sample;

You can also add if exists:
drop database if exists sample;
Hive does not allow dropping a database that still contains tables. If sample contains tables, the drop fails:

hive> drop database sample;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database sample is not empty. One or more tables exist.)

To resolve this, either drop all the tables first and then drop the database,
or append cascade to the statement:

hive> drop database if exists sample cascade;
OK
Time taken: 4.089 seconds

There is also a restrict keyword. restrict behaves the same as the default: the tables must be dropped before the database can be dropped.
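
The two variants side by side, as a short sketch against the same sample database:

```sql
-- RESTRICT (same as the default): fails if the database still contains tables.
DROP DATABASE IF EXISTS sample RESTRICT;

-- CASCADE: drops the contained tables first, then the database itself.
DROP DATABASE IF EXISTS sample CASCADE;
```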

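7. Create a table
Creating a table in Hive is largely the same as in SQL, but Hive adds some extensions, such as where the data files are stored and in what format.
For example, create an employee table: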

create table employee(
    name STRING comment 'employee name',
    salary FLOAT comment 'employee salary',
    subordinates ARRAY<STRING> comment 'name of subordinates', 
    deductions MAP<STRING,FLOAT>  comment 'key are deduction names,value are percentages',
    address STRUCT<Street:STRING,City:STRING,State:STRING,Zip:INT> comment 'home address')
	comment 'description of the table'
	tblproperties('creator'='me','created_at'='2016-01-26 10:00:00',...)
	location '/user/hive/warehouse/sample.db/employee';

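Analysis:
1. You can specify where the table is stored: location '/user/hive/warehouse/sample.db/employee';
2. You can attach a comment to each column or to the table itself: comment 'employee name',
3. tblproperties('creator'='me','created_at'='2016-01-26 10:00:00',...) attaches additional documentation to the table as key-value pairs. Hive also automatically adds two table properties:
1: last_modified_by, the user who last modified the table
2: last_modified_time, the time of the last modification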
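
The statement above contains an elided "..." in tblproperties, so it cannot be pasted as-is. Below is a self-contained sketch with the same columns. The table name employee_demo is hypothetical, the property values are placeholders, and LOCATION is left out so that Hive uses the default path under the warehouse directory.

```sql
-- Sketch only: employee_demo is a hypothetical table name.
CREATE TABLE IF NOT EXISTS sample.employee_demo (
    name         STRING             COMMENT 'employee name',
    salary       FLOAT              COMMENT 'employee salary',
    subordinates ARRAY<STRING>      COMMENT 'names of subordinates',
    deductions   MAP<STRING, FLOAT> COMMENT 'keys are deduction names, values are percentages',
    address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'home address'
)
COMMENT 'description of the table'
TBLPROPERTIES ('creator' = 'me', 'created_at' = '2016-01-26 10:00:00');
```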

8. View a table's properties

hive> show tblproperties employee;
OK
comment	description of table
transient_lastDdlTime	1461649362
Time taken: 0.278 seconds, Fetched: 2 row(s)

9. Copy a table (the schema only; no data is copied)

hive> create table sample.employee11
    > like sample.employees;
OK
Time taken: 0.486 seconds

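10. List the tables in a database
For example, to list the tables in sample:
1. run use sample; and then show tables;
2. or run show tables in sample;
When there are many tables, a regular expression can filter for the ones you need: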

use sample;
show tables 'empl.*';

note: the in database_name clause cannot be combined with a regular expression in this form:

hive> show tables 'empl.*' in sample;
FAILED: ParseException line 1:21 missing EOF at 'in' near ''empl.*''
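
The Hive language manual gives the grammar as SHOW TABLES [IN database_name] ['pattern'], with the in clause before the pattern. A sketch of that ordering follows. Whether it is accepted may depend on the Hive version, so treat it as an assumption to verify rather than a guaranteed fix.

```sql
-- Assumed ordering per the language manual: IN clause first, then the pattern.
SHOW TABLES IN sample 'empl.*';
```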

11. Inspect a table's details

hive> describe sample.employees;
OK
name                	string              	employee name       
salary              	float               	employee salary     
subordinates        	array<string>       	name of subordinates
deductions          	map<string,float>   	key are deduction names,value are percentages
address             	struct<Street:string,City:string,State:string,Zip:int>	                    
Time taken: 0.279 seconds, Fetched: 5 row(s)

hive> describe extended sample.employees;
OK
name                	string              	employee name       
salary              	float               	employee salary     
subordinates        	array<string>       	name of subordinates
deductions          	map<string,float>   	key are deduction names,value are percentages
address             	struct<Street:string,City:string,State:string,Zip:int>	                    
	 	 
Detailed Table Information	Table(tableName:employees, dbName:sample, owner:root, createTime:1461649004, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:employee name), FieldSchema(name:salary, type:float, comment:employee salary), FieldSchema(name:subordinates, type:array<string>, comment:name of subordinates), FieldSchema(name:deductions, type:map<string,float>, comment:key are deduction names,value are percentages), FieldSchema(name:address, type:struct<Street:string,City:string,State:string,Zip:int>, comment:null)], location:hdfs://Master:9000/user/local/hive/warehouse/sample.db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1461649004}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)	
Time taken: 0.306 seconds, Fetched: 7 row(s)

hive> describe formatted sample.employees;
OK
# col_name            	data_type           	comment             
	 	 
name                	string              	employee name       
salary              	float               	employee salary     
subordinates        	array<string>       	name of subordinates
deductions          	map<string,float>   	key are deduction names,value are percentages
address             	struct<Street:string,City:string,State:string,Zip:int>	                    
	 	 
# Detailed Table Information	 	 
Database:           	sample              	 
Owner:              	root                	 
CreateTime:         	Mon Apr 25 22:36:44 PDT 2016	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://Master:9000/user/local/hive/warehouse/sample.db/employees	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1461649004          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	serialization.format	1                   
Time taken: 0.438 seconds, Fetched: 30 row(s)

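Adding extended or formatted lists much more detail, and formatted gives the most detailed output.
note: if no user-defined table properties have been set, last_modified_by and last_modified_time do not appear in the detailed listing either.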

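12. Table management
The tables created so far are so-called managed tables, sometimes also called internal tables. Sometimes we want to work with data that belongs to something else (for example, data produced by Pig) without giving Hive ownership of it. In that case we can create an external table that points to the data, and Hive can then query it without owning the data.
Create an external table: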

hive> create external table stocks(
    > symbol string,
    > ymd string,
    > price_open float,
    > price_high float,
    > price_low float,
    > price_close float,
    > volume int,
    > price_adj_close float)
    > row format delimited fields terminated by ','
    > location '/data/stock';
OK
Time taken: 0.734 seconds

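Analysis: an external table is marked with external before table, and the trailing location clause says where the data lives.
note: Hive does not fully own the data of an external table, so dropping the table removes only the metadata and leaves the data files in place.
Use describe extended stocks; to check whether a table is managed or external. Near the end of the output you will see: ... tableType:EXTERNAL_TABLE)
As with managed tables, you can also copy the schema of an existing table (the data is not copied).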
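
Copying the schema of an existing table into a new external table might look like the sketch below. The table name stocks_copy and the path /data/stock_copy are hypothetical.

```sql
-- Hypothetical target table name and location.
-- Only the schema of stocks is copied; no data files are copied or moved.
CREATE EXTERNAL TABLE IF NOT EXISTS stocks_copy
LIKE stocks
LOCATION '/data/stock_copy';

-- "Table Type:" in the output should read EXTERNAL_TABLE.
DESCRIBE FORMATTED stocks_copy;
```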

13. Rename a table
Rename employee to empl:

hive> alter table employee rename to empl;
OK
Time taken: 5.14 seconds

14. Add a partition

hive> alter table employees add if not exists
    > partition(year = 2011,month = 1,day = 1) location '/logs/2011/01/01';
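
Adding a partition only works on a table that was declared with partition columns, and the employee definition shown earlier declares none. The sketch below therefore uses a hypothetical table log_messages; its name and columns are assumptions made for illustration, and only the partition columns and the location mirror the statement above.

```sql
-- Hypothetical partitioned table; the partition columns are declared
-- separately from the regular columns.
CREATE EXTERNAL TABLE IF NOT EXISTS log_messages (
    hms      INT,
    severity STRING,
    message  STRING
)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Register one partition and point it at an existing directory.
ALTER TABLE log_messages ADD IF NOT EXISTS
PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01';

-- List the partitions currently known to the metastore.
SHOW PARTITIONS log_messages;
```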

15. Drop a partition

alter table employees drop if exists partition(year=2011,month=12,day=2);

16. Add a column. Given the table below, we want to add one column:

-----------------------------------------------------------------------------------------------
hive> describe formatted sample.employees;
OK
# col_name            	data_type           	comment             
	 	 
name                	string              	employee name       
salary              	float               	employee salary     
subordinates        	array<string>       	name of subordinates
deductions          	map<string,float>   	key are deduction names,value are percentages
address             	struct<Street:string,City:string,State:string,Zip:int>
-------------------------------------------------------------------------------------------------

Run:

alter table employees add columns (
school string comment 'school infomation'
);

The new column is appended as the last row:

--------------------------------------------------------------------------------------------------
hive> describe formatted sample.employees;
OK
# col_name            	data_type           	comment             
	 	 
name                	string              	employee name       
salary              	float               	employee salary     
subordinates        	array<string>       	name of subordinates
deductions          	map<string,float>   	key are deduction names,value are percentages
address             	struct<Street:string,City:string,State:string,Zip:int>	                    
school              	string              	school infomation  
--------------------------------------------------------------------------------------------------

17. Drop or replace columns

alter table employee1 replace columns(
	school string comment 'school information',
	name  string   comment '-------------',
	age  int
);
------------------------------------------
hive> describe formatted employee1;
OK
# col_name            	data_type           	comment             
	 	 
school              	string              	school information  
name                	string              	-------------       
age                 	int                 	                    
-----------------------------------

18. Modify table properties

alter table employee1 set tblproperties(
	'note'='the process is add property!!!'
);

note: with set tblproperties you can add and modify table properties, but you cannot delete them.
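
To confirm that the property was stored, it can be read back. This is a short sketch using the same employee1 table and the 'note' key set above.

```sql
-- List every property of the table.
SHOW TBLPROPERTIES employee1;

-- Show only the value stored under the 'note' key.
SHOW TBLPROPERTIES employee1('note');
```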
